[regression] text and example for cost function figure

Jan Benda 2020-12-18 13:10:53 +01:00
parent 99a8a9d91e
commit fefe1c3726
2 changed files with 34 additions and 24 deletions

View File

@@ -0,0 +1,9 @@
% plot the cost function for a range of values of the factor c
% (assumes the data x, y and meanSquaredErrorCubic() from the previous exercise)
cs = 2.0:0.1:8.0;                % values of the factor c to try
mses = zeros(1, length(cs));     % preallocate as a row vector, one error per c
for i = 1:length(cs)
    mses(i) = meanSquaredErrorCubic(x, y, cs(i));
end
plot(cs, mses)
xlabel('c')
ylabel('mean squared error')
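The script calls meanSquaredErrorCubic() from the previous exercise, which is not part of this commit. As an orientation only, a minimal sketch of such a function, assuming the cubic model $y = c x^3$, could look like this:

% minimal sketch, not part of this commit; assumes the model y = c*x^3
function mse = meanSquaredErrorCubic(x, y, c)
    mse = mean((y - c .* x.^3).^2);  % mean squared deviation between data and cubic
end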

View File

@@ -154,12 +154,11 @@ function $f_{cost}(c)$ that maps the parameter value $c$ to a scalar
 error value. For a given data set we thus can simply plot the cost
 function as a function of $c$ (\figref{cubiccostfig}).

-\begin{exercise}{errorSurface.m}{}\label{errorsurfaceexercise}
-  Then calculate the mean squared error between the data and straight
-  lines for a range of slopes and intercepts using the
-  \varcode{meanSquaredErrorCubic()} function from the previous
-  exercise. Illustrate the error surface using the \code{surface()}
-  function.
+\begin{exercise}{plotcubiccosts.m}{}
+  Calculate the mean squared error between the data and the cubic
+  function for a range of values of the factor $c$ using the
+  \varcode{meanSquaredErrorCubic()} function from the previous
+  exercise. Plot the graph of the cost function.
 \end{exercise}

 \begin{figure}[t]
@@ -177,20 +176,22 @@ function as a function of $c$ (\figref{cubiccostfig}).
 By looking at the plot of the cost function we can visually identify
 the position of the minimum and thus estimate the optimal value for
-the parameter $c$. How can we use the error surface to guide an
+the parameter $c$. How can we use the cost function to guide an
 automatic optimization process?

-The obvious approach would be to calculate the error surface for any
-combination of slope and intercept values and then find the position
-of the minimum using the \code{min} function. This approach, however
-has several disadvantages: (i) it is computationally very expensive to
-calculate the error for each parameter combination. The number of
-combinations increases exponentially with the number of free
-parameters (also known as the ``curse of dimensionality''). (ii) the
-accuracy with which the best parameters can be estimated is limited by
-the resolution used to sample the parameter space. The coarser the
-parameters are sampled the less precise is the obtained position of
-the minimum.
+The obvious approach would be to calculate the mean squared error for
+a range of parameter values and then find the position of the minimum
+using the \code{min} function. This approach, however has several
+disadvantages: (i) the accuracy of the estimation of the best
+parameter is limited by the resolution used to sample the parameter
+space. The coarser the parameters are sampled the less precise is the
+obtained position of the minimum (\figref{cubiccostfig}, right). (ii)
+the range of parameter values might not include the absolute minimum.
+(iii) in particular for functions with more than a single free
+parameter it is computationally expensive to calculate the cost
+function for each parameter combination. The number of combinations
+increases exponentially with the number of free parameters. This is
+known as the \enterm{curse of dimensionality}.

 So we need a different approach. We want a procedure that finds the
 minimum of the cost function with a minimal number of computations and
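The min-based grid search described in the revised paragraph can be sketched in a few lines of MATLAB; the variables cs and mses are taken from the script added in this commit, everything else is illustrative:

% locate the sampled minimum of the cost function
[minmse, idx] = min(mses);    % smallest mean squared error and its index
cest = cs(idx);               % parameter value at the sampled minimum
fprintf('best c = %.2f, mse = %.3f\n', cest, minmse)

As the revised text points out, the precision of cest is limited by the sampling resolution of cs, and the true minimum may lie outside the sampled range.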
@@ -206,22 +207,22 @@ to arbitrary precision.
     m = \frac{f(x + \Delta x) - f(x)}{\Delta x}
   \end{equation}
   of a function $y = f(x)$ is the slope of the secant (red) defined
-  by the points $(x,f(x))$ and $(x+\Delta x,f(x+\Delta x))$ with the
+  by the points $(x,f(x))$ and $(x+\Delta x,f(x+\Delta x))$ at
   distance $\Delta x$.

-  The slope of the function $y=f(x)$ at the position $x$ (yellow) is
+  The slope of the function $y=f(x)$ at the position $x$ (orange) is
   given by the derivative $f'(x)$ of the function at that position.
   It is defined by the difference quotient in the limit of
-  infinitesimally (orange) small distances $\Delta x$:
+  infinitesimally (red and yellow) small distances $\Delta x$:
   \begin{equation}
     \label{derivative}
     f'(x) = \frac{{\rm d} f(x)}{{\rm d}x} = \lim\limits_{\Delta x \to 0} \frac{f(x + \Delta x) - f(x)}{\Delta x} \end{equation}
   \end{minipage}\vspace{2ex}

-  It is not possible to calculate the derivative, \eqnref{derivative},
-  numerically. The derivative can only be estimated using the
-  difference quotient, \eqnref{difffrac}, by using sufficiently small
-  $\Delta x$.
+  It is not possible to calculate the exact value of the derivative,
+  \eqnref{derivative}, numerically. The derivative can only be
+  estimated by computing the difference quotient, \eqnref{difffrac}
+  using sufficiently small $\Delta x$.
 \end{ibox}

 \begin{ibox}[tp]{\label{partialderivativebox}Partial derivative and gradient}
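To illustrate the closing remark of the box, the derivative can only be approximated numerically by evaluating the difference quotient with a sufficiently small step; the function and numbers below are made up for illustration:

% numerical estimate of f'(x0) via the difference quotient
f = @(x) x.^3;                      % example function with known derivative 3*x^2
x0 = 2.0;                           % position at which the slope is estimated
dx = 1e-6;                          % sufficiently small Delta x
dfdx = (f(x0 + dx) - f(x0)) / dx;   % difference quotient, approximately 12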