The graph above shows a simple straight-line example (Y = a + bX). The red dots are the data points. The least-squares algorithm computes the values of a (intercept) and b (slope) of the straight line that is a least-squares best fit to the data points (shown as the the green line in the plot). Least-squares best fits can be calculated by some hand-held calculators, spreadsheets, and dedicated computer programs. Although it is possible to estimate slope and intercept of a straight line by visual estimation and a straightedge, the least-square method is more objective and easier to automate. (If you were to give this set of data to five different people and ask them to estimate the best-fit line visually, you'd get five slightly different different answers, but it you gave the data set to five different computer programs, you'd get the exact same answer every time).
In some cases a fundamentally non-linear relationship can be transformed into a form that is amenable to polynomial curve fitting by means of a coordinate transformation (e.g. taking the log or the reciprocal of the data), and then least-squares method can be applied to the resulting linear equation. For example, the points in Figure 16 are from the simulation of an exponential decay (X=time, Y=signal intensity) that has the mathematical form Y = a exp(bX), where a is the Y-value at X=0 and b is the decay constant. In this particular example, a = 1 and b = -1. This is a fundamentally non-linear problem because Y is a non-linear function of the parameter b. However, by taking the natural log of both sides of the equation, we obtain ln(Y)=a + bX. In this equation, Y is a linear function of both parameters a and b, so it can be fit by the least squares method in order to estimate a and b.
The best fit equation, shown by the green solid line in the figure, is Y = 0.9897 exp(- 0.98896 X), that is a = 0.9897 and b = -0.98896, very close to the expected values. Thus, even in the presence of substantial random noise, it is possible to get good estimates of the parameters of the underlying equation. The most important requirement is that the model be good, that is, that the equation selected for the model accurately describes the underlying behavior of the system (except for noise). Often that is the most difficult aspect, because the underlying models are not always known with certainty. Other examples of non-linear relationships that can be linearized by coordinate transformation include the logarithmic (Y = a ln(bX)) and power (Y=aXb) relationships. Methods of this type used to be very common back in the days before computers, when fitting anything but a straight line was difficult. It is still used today to extend the range of functional relationships that can be handled by common linear least-squares routines.
Another example of the use of transformation to convert a non-linear relationship into a form that is amenable to polynomial curve fitting is the use of the natural log (ln) transformation to
convert a Gaussian peak, which has the fundamental functional form exp(-x2), into a parabola of the form -x2, which can be fit with a second order polynomial (quadratic) function (y = a + bx + cx2). All three parameters of the peak (height, maximum position, and width) can be calculated from the three quadratic coefficients a, b, and c; the peak height is given by exp(a-c*(b/(2*c))^2), the peak position by -b/(2*c), and the peak half-width by 2.35703/(sqrt(2)*sqrt(-c)).
An example is shown in the figure on the left. The signal is a Gaussian peak with a true peak height of 100 units, a true peak position of 100 units, and a true half-width of 100 units, but it is sampled only every 31 units on the x-axis. The resulting data set, shown by the red points in the upper left, has only 6 data points on the peak, too few for accurate measurement of peak height, position, and width. The maximum of those 6 points is at x=87 and has an amplitude of 95, quite a bit from the true values of peak position and height. However, taking the natural log of the data (upper right) produces a parabola that can be fit with a quadratic least-squares fit (shown by the blue line in the lower left). From the three coefficients of the quadratic fit we can calculate much more accurate values of the Gaussian peak parameters, shown at the bottom of the figure. The plot in the lower right shows the resulting Gaussian fit (in blue) displayed with the original data (red points). The accuracy is limited only by the noise in the data (about 1% in this example). This figure was created in Matlab, using this script. (Note: in order for this method to work properly, the data set must not contain any zeros or negatives points, and the original Gaussian peak signal must have a zero baseline, that is, must tend to zero far from the peak center. In practice this means that the baseline must be subtracted from the data set before applying this method.)
Recent versions of Matlab have a convenient tool for interactive manually-controlled (rather than programmed) polynomial curve fitting in the Figure window. Click for a video example: (external link to YouTube).