In Part I (here), we discussed what Piecewise-Linear Functions (PLFs) are. In Part II, this document, we are going to discuss how to estimate their coefficients.
As mentioned in Part I, if we do not need to guarantee continuity, all you need to do is fit a line to each segment separately. I am assuming you already know how to do that.
When you have to guarantee continuity, we can use the re-written form discussed in Part I. We are going to divide the problem into three different types.
In the first type, we assume we already know the break-points, i.e. all the $b_k$ are known, so we just have to discuss how to determine all the $\lambda_k$.
In the second type, we are going to assume we only know how many break-points there are. Therefore, we need to determine both all the $b_k$ and all the $\lambda_k$. In this case, we do know how many segments we have.
In the third type, we are going to fit a PLF assuming that we also don't know how many segments there are. So not only do we need to fit all the $b_k$ and all the $\lambda_k$, but we also have to decide what the maximum $k$ is for $b_k$ and for $\lambda_k$.
Just in case, a reminder from Part I: if you have $n$ segments, the maximum $k$ for $b_k$ is $n-2$ and the maximum $k$ for $\lambda_k$ is $n$. For example, if you have 5 segments, then $n=5$, so you have four break-points, the max $k$ for $b_k$ is 3, and the max $k$ for $\lambda_k$ is 5.
Type 1: Fitting a PLF when break-points are known
So, in this type, we do know what the break-points, or knots, $b_k$ are. Let's get to it.
A mathematician's answer
Let's say you have a set of $N$ measurements, i.e. $(x_1, y_1), (x_2, y_2), \ldots, (x_N, y_N)$. You can form the following system of equations:

$$\mathbf{y} = \mathbf{A}\,\boldsymbol{\lambda}$$

where:

$$\mathbf{A} = \begin{bmatrix} 1 & x_1 & (x_1 - b_0)_+ & \cdots & (x_1 - b_{n-2})_+ \\ \vdots & \vdots & \vdots & & \vdots \\ 1 & x_N & (x_N - b_0)_+ & \cdots & (x_N - b_{n-2})_+ \end{bmatrix}, \quad \mathbf{y} = \begin{bmatrix} y_1 \\ \vdots \\ y_N \end{bmatrix}, \quad \boldsymbol{\lambda} = \begin{bmatrix} \lambda_0 \\ \lambda_1 \\ \vdots \\ \lambda_n \end{bmatrix}$$

with the hinge notation $(u)_+ = \max(u, 0)$. So all you need to do now to get the coefficients is to compute them, in the least-squares sense, as:

$$\boldsymbol{\lambda} = \left(\mathbf{A}^T \mathbf{A}\right)^{-1} \mathbf{A}^T \mathbf{y}$$
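As a quick illustration, here is a literal transcription of that formula in NumPy. This is a minimal sketch on a hypothetical two-segment setup with a known break-point at $b_0 = 1$ (not code from the original post), and the next section explains why you would not want to implement it this way:

import numpy as np

# Hypothetical two-segment example with a known break-point at b0 = 1.
b0 = 1.0
x = np.linspace(0, 2, 21)
y = np.abs(x - 1.0)

# Design matrix: columns are 1, x, and the hinge max(x - b0, 0).
A = np.column_stack([np.ones_like(x), x, np.clip(x - b0, 0.0, None)])

# Literal transcription of the normal-equations formula above.
lambdas = np.linalg.inv(A.T @ A) @ (A.T @ y)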
A computational scientist's and/or practitioner's answer
If you have read my posts, you know that I am not a big fan of methods that require inverting a matrix. So, even though we showed a mathematical equation above for how you would get the coefficients, when it comes to implementation, there are much more efficient ways of calculating those coefficients than doing all the matrix multiplication and matrix inversion shown in the equation above. So, here is what we are going to do.
Let's say you want to fit a PLF to noisy samples of $y = |x - 1|$. So, let's create some data and add some noise, for example:
import numpy as np
from matplotlib import pyplot as plt
x0 = np.linspace(0, 2, 21)
np.random.seed(42)
noise = np.random.uniform(-0.03, 0.03, x0.shape)
y0 = np.abs(x0 - 1) + noise
Now let's define our piecewise function, pretending we don't know the coefficients:
def my_piecewise_1(x, lambda_0, lambda_1, lambda_2):
    # One break-point at x = 1. np.piecewise uses the last function as
    # the default branch, i.e. wherever x < 1 is False.
    return np.piecewise(
        x,
        [x < 1],
        [
            lambda x: lambda_0 + lambda_1 * x,
            lambda x: lambda_0 + lambda_1 * x + lambda_2 * (x - 1),
        ],
    )
Now, we are going to use scipy.optimize.curve_fit to get the coefficients:
from scipy.optimize import curve_fit

lambda_coefficients, _ = curve_fit(my_piecewise_1, x0, y0)
lambda_coefficients
array([ 1.00311503, -1.00563566, 2.00161582])
If you calculate the slopes and intercepts, you will get the following lines:
x < 1 --> y = +1.00 -1.01 x
1 < x --> y = -1.00 +1.00 x
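For example, a minimal sketch of how those two lines can be recovered from the fitted lambdas (the second segment's intercept and slope fold in $\lambda_2$ and the break-point at 1):

# Recover the slope and intercept of each segment from the lambdas.
l0, l1, l2 = lambda_coefficients
print(f"x < 1 --> y = {l0:+.2f} {l1:+.2f} x")
print(f"1 < x --> y = {l0 - l2 * 1.0:+.2f} {l1 + l2:+.2f} x")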
You can use this procedure to fit any PLF that you want as long as you do know where the break-points are. The only thing that you would need to change is the implementation of the function that you are fitting.
As another example, let's fit data from our W-shape function. One way of implementing the PLF describing the W-shape is:
def my_piecewise_2(x, lambda_0, lambda_1, lambda_2, lambda_3, lambda_4):
    # Three known break-points at -0.5, 0, and 0.5; each branch adds one
    # more hinge term, which keeps the function continuous at the breaks.
    return np.piecewise(
        x,
        [
            (x < -0.5),
            (-0.5 <= x) & (x < 0),
            (0 <= x) & (x < 0.5),
            (0.5 <= x),
        ],
        [
            lambda x: lambda_0 + lambda_1 * x,
            lambda x: lambda_0 + lambda_1 * x + lambda_2 * (x - (-0.5)),
            lambda x: lambda_0 + lambda_1 * x + lambda_2 * (x - (-0.5)) + lambda_3 * (x - 0),
            lambda x: lambda_0 + lambda_1 * x + lambda_2 * (x - (-0.5)) + lambda_3 * (x - 0) + lambda_4 * (x - 0.5),
        ],
    )
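Before fitting, we need data sampled from the W-shape itself. Here is a minimal sketch, assuming (as in Part I) that the W-shape is $y = \left||x| - 0.5\right|$ on $[-1, 1]$, with break-points at $-0.5$, $0$, and $0.5$; the fitted numbers below come from the post's own data, so your values may differ slightly:

# Fresh noisy samples from the W-shape (reusing the names x0 and y0).
x0 = np.linspace(-1, 1, 41)
noise = np.random.uniform(-0.03, 0.03, x0.shape)
y0 = np.abs(np.abs(x0) - 0.5) + noise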
Now you can fit the coefficients similarly:
lambda_coefficients, _ = curve_fit(my_piecewise_2, x0, y0)
lambda_coefficients
array([-0.49962731, -0.99322731, 2.01020448, -2.05157343, 2.05809717])
Now you can compute the slopes and intercepts as follows:
knots = np.array([-0.5, 0.0, 0.5])  # the known break-points
# The slope of segment k is the cumulative sum of the hinge coefficients;
# each break shifts the intercept by -lambda_k * knot.
slopes = np.cumsum(lambda_coefficients[1:])
intercepts = np.concatenate([
    [lambda_coefficients[0]],
    lambda_coefficients[0] - np.cumsum(lambda_coefficients[2:] * knots),
])
and the lines that we are getting are:
x < -0.5 --> y = -0.50 -0.99 x
-0.5 <= x < 0.0 --> y = 0.51 +1.02 x
0.0 <= x < 0.5 --> y = 0.51 -1.03 x
0.5 <= x --> y = -0.52 +1.02 x
Type 2: Fitting a PLF when we know how many segments there are
Using “curve_fit”
In the previous type, we already knew the locations of the break-points, or knots. That's why our PLF implementation only accepted the $\lambda$ coefficients, and no information regarding the knots was passed in.
So, now that we also need to determine the values of the knots along with the $\lambda$ values, we need to modify the implementation, pass the break-points to the function as parameters, and then optimize. Let's see how it works:
def my_piecewise_3(x, l0, l1, l2, l3, l4, b0, b1, b2):
    # Same W-shape structure, but the break-points b0, b1, b2 are now
    # parameters to be fitted along with the lambdas.
    return np.piecewise(
        x,
        [
            (x < b0),
            (b0 <= x) & (x < b1),
            (b1 <= x) & (x < b2),
            (b2 <= x),
        ],
        [
            lambda x: l0 + l1 * x,
            lambda x: l0 + l1 * x + l2 * (x - b0),
            lambda x: l0 + l1 * x + l2 * (x - b0) + l3 * (x - b1),
            lambda x: l0 + l1 * x + l2 * (x - b0) + l3 * (x - b1) + l4 * (x - b2),
        ],
    )
and as before we are going to use scipy.optimize.curve_fit to get the coefficients back.
coefficients, _ = curve_fit(my_piecewise_3, x0, y0)
coefficients
array([-0.52093414, -1.0189633 , 1.16184422, -0.59845358, 1.42896937,
-0.63524498, 1.67113914, 1.18078627])
Wow, that was easy. Let’s plot our graph along with the detected break-points:
fig, ax = plt.subplots(1, 1)
x_high_res = np.linspace(x0.min(), x0.max(), 200)
# Fitted PLF on a dense grid (blue) versus the raw data (red).
ax.plot(
    x_high_res,
    my_piecewise_3(x_high_res, *coefficients),
    "b."
)
ax.plot(x0, y0, "r.")
ax.set_xlabel("x")
ax.set_ylabel("y")
ylim = ax.get_ylim()
# The last three coefficients are the detected break-points;
# draw a vertical line at each one.
colors = ["k", "g", "y"]
for idx in range(-3, 0):
    ax.plot([coefficients[idx]] * 2, ylim, colors[idx])
Hmmm!!! What went wrong?
So, the first 5 coefficients are the $\lambda_k$ and the last 3 are the break-points $b_k$. We have assumed that $b_0 < b_1 < b_2$. But do you see anywhere that we are imposing this constraint?
In fact, if you check the results, you will see that in our case $b_1 > b_2$. That also explains why our graph is not continuous.
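You can verify this directly from the fitted parameters (the values shown are from the output above):

# The last three entries of the fit are the break-points.
print(coefficients[-3:])   # [-0.635  1.671  1.181] -- b1 > b2!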
Before we discuss what the solution is, let's get a couple of things out of the way.
First, you might try "curve_fit" on another problem and get a correct or acceptable result. Yes, that can happen. In fact, if I don't set the random seed and keep re-running the code, sometimes I get a relatively acceptable answer even for this problem. We are not looking, however, for a solution that works only sometimes. We are looking for a solution that is reliable.
Second, overall, if you have only one break-point, that is, only two segments, "curve_fit" provides an acceptable answer, unless the data you are fitting is too noisy, particularly if the noise has a higher amplitude close to the knot. In the presence of such noise, the location of the break-point is harder to detect, and the fit goes wrong more often. If you have two break-points, well, if your luck is like mine, "curve_fit" usually won't work well or reliably; it does worse than with only one break-point. With three break-points and above, the results are unacceptable often enough that I have to abandon "curve_fit" and use a constrained optimization.
We discuss the third point in the next sub-section.
Using “curve_fit” along with bounds
Continuing from the previous section: the third point is that "curve_fit" can accept bounds. Sometimes I have seen people try to use them as a sort of constraint to force where each break-point is. I have done that too. That, i.e. using bounds as a form of constraint on the break-points, could work, depending on the problem. Let's say, for example, you are fitting data related to plant growth. You know that the crop is planted in a region usually between April and May, and is then harvested in October. Furthermore, let's say you have only two break-points. Then you can bound one of the break-points to be before July and the other to be after July.
You might think that bounds help to recover proper break-points only if we can impose non-overlapping bounds that separate the break-points completely. That's actually not correct. When you impose bounds on your parameters, you reduce the size of the search space. Ask yourself: is it easier to search for a good answer from $-\infty$ to $+\infty$, or to search in a much smaller interval, say between $\min(x_i)$ and $\max(x_i)$, where the $x_i$ are the measured data that you have collected? The thing with break-points is that you can usually safely assume they lie somewhere between the minimum and maximum of the values you collected during your measurements. Generally, my personal experience is that if you impose this as bounds on the break-points, the quality of your curve fit improves dramatically.
Let’s try that on our toy problem.
We can impose the bounds as follows:
from scipy.optimize import Bounds

# No bounds on the five lambdas; the three break-points are restricted
# to the range of the measured x values.
bounds = Bounds(
    lb=[
        -np.inf, -np.inf, -np.inf, -np.inf, -np.inf,
        x0.min(), x0.min(), x0.min(),
    ],
    ub=[
        np.inf, np.inf, np.inf, np.inf, np.inf,
        x0.max(), x0.max(), x0.max(),
    ],
)
Notice that we are imposing bounds for the break-points, $b_k$, only (the last three coefficients), but we are letting the $\lambda_k$ go wild.
Let's see how the results have improved:
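A minimal sketch of the call itself ("curve_fit" accepts the lower and upper bound arrays as a 2-tuple):

# Fit again, this time restricting the break-points to the data range.
coefficients, _ = curve_fit(
    my_piecewise_3, x0, y0,
    bounds=(bounds.lb, bounds.ub),
)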
Clearly that’s a great improvement.
Using Constrained Optimization
So, I think you already have some idea of a better way to solve this. There are multiple methods to do so, but in this document we are going to use scipy.optimize.minimize, which allows performing constrained minimization. You can specify linear constraints in the following format:

$$\mathbf{lb} \le \mathbf{A}\,\mathbf{p} \le \mathbf{ub}$$
In our case, the parameter vector, $\mathbf{p}$, is defined as:

$$\mathbf{p} = \begin{bmatrix} \lambda_0 & \lambda_1 & \lambda_2 & \lambda_3 & \lambda_4 & b_0 & b_1 & b_2 \end{bmatrix}^T$$

We have the following linear constraints:

$$b_0 < b_1, \qquad b_1 < b_2$$

If we rearrange that, we can write:

$$b_0 - b_1 < 0, \qquad b_1 - b_2 < 0$$

This is still not in the form that the linear constraints need to be. The $\le$ sign must be on both ends. So, we are going to modify it a bit. We do know that the break-points cannot be equal, so we are going to impose a small difference $\epsilon > 0$ between them and rewrite the constraints in the proper format:

$$-\infty \le b_0 - b_1 \le -\epsilon, \qquad -\infty \le b_1 - b_2 \le -\epsilon$$

Now we can rewrite this in matrix form as follows:

$$\begin{bmatrix} -\infty \\ -\infty \end{bmatrix} \le \begin{bmatrix} 0 & 0 & 0 & 0 & 0 & 1 & -1 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 & 1 & -1 \end{bmatrix} \mathbf{p} \le \begin{bmatrix} -\epsilon \\ -\epsilon \end{bmatrix}$$
Let’s implement this Linear Constraint in Python:
from scipy.optimize import LinearConstraint

# A small positive separation between consecutive break-points
# (the exact value of eps is a modeling choice).
eps = 1e-3
linear_constraints = LinearConstraint(
    [
        [0, 0, 0, 0, 0, 1, -1, 0],
        [0, 0, 0, 0, 0, 0, 1, -1],
    ],
    [-np.inf, -np.inf],
    [-eps, -eps],
)
Finally, we can call minimize and get the results:
from scipy.optimize import minimize, SR1

# Wrapper taking the whole parameter vector p, since minimize
# optimizes a single vector instead of separate arguments.
def my_piecewise_3_vec(x, p):
    return my_piecewise_3(x, *p)

optimization_results = minimize(
    # Objective: root-mean-square error of the PLF against the data.
    fun=lambda p: np.sqrt(
        np.mean(
            np.power(
                my_piecewise_3_vec(x0, p) - y0,
                2
            )
        )
    ),
    x0=[0, 0, 0, 0, 0, -1, 0, 1],
    method="trust-constr",
    jac="2-point",
    hess=SR1(),
    bounds=bounds,
    constraints=[linear_constraints]
)
coefficients = optimization_results.x
coefficients
array([-0.50050653, -1.00240024, 1.94039024, -1.89981795, 1.98129357,
-0.50248266, 0.00385454, 0.51091473])
You can already see that the results are very promising. Let’s plot the outputs:
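A minimal plotting sketch, reusing the pattern from the earlier plot:

# Plot the constrained fit on a dense grid, together with the data.
fig, ax = plt.subplots(1, 1)
x_high_res = np.linspace(x0.min(), x0.max(), 200)
ax.plot(x_high_res, my_piecewise_3(x_high_res, *coefficients), "b.")
ax.plot(x0, y0, "r.")
ax.set_xlabel("x")
ax.set_ylabel("y")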
Practically, you can't see any difference between the two results. Well, that happens in this case. But in more complex problems, using proper constraints based on what you know about the problem could be a differentiator. Don't hesitate to include your domain knowledge in the optimization problem.
Type 3: Not even knowing the number of segments
In this type, all we know is that we want to fit a piecewise-linear function. We don't know how many segments there are and, obviously, we cannot know where the breaks are.
I know I promised to talk about this type in this post, but then I noticed that I would need to discuss a couple more topics before explaining how to handle it. Specifically, I would need to discuss the over-fitting issue in curve fitting or model fitting, as well as another topic in optimization, before I can discuss Type 3 PLFs. So far, we did not have to worry about over-fitting, but in the case of fitting Type 3 PLFs, we do. So, we have to postpone this type of PLF and discuss it in a future post, once the prerequisite topics are properly covered.
So, stay tuned for Type 3.