[regression] first part n-dim minimization

This commit is contained in:
Jan Benda 2020-12-19 13:41:28 +01:00
parent 7518f9dd47
commit 891515caf8
2 changed files with 93 additions and 12 deletions


@@ -1,4 +1,4 @@
meansquarederrorline; % generate data
c0 = 2.0;
eps = 0.0001;
@@ -21,9 +21,8 @@ hold on;
% generate x-values for plotting the fit:
xx = min(x):0.01:max(x);
yy = cest * xx.^3;
plot(xx, yy);
plot(x, y, 'o'); % plot original data
xlabel('Size [m]');
ylabel('Weight [kg]');
legend('fit', 'data', 'location', 'northwest');


@@ -254,7 +254,7 @@ can be approximated numerically by the difference quotient
The derivative is positive for positive slopes. Since we want to go down
the hill, we choose the opposite direction.
\begin{exercise}{meanSquaredGradientCubic.m}{}\label{gradientcubic}
  Implement a function \varcode{meanSquaredGradientCubic()}, that
  takes the $x$- and $y$-data and the parameter $c$ as input
  arguments. The function should return the derivative of the mean
@@ -312,12 +312,12 @@ descent works as follows:
\item Repeat steps \ref{computegradient} -- \ref{gradientstep}.
\end{enumerate}
\Figref{gradientdescentcubicfig} illustrates the gradient descent ---
the path the imaginary ball has chosen to reach the minimum. We walk
along the parameter axis against the gradient as long as the gradient
differs sufficiently from zero. At steep slopes we take large steps
(the distance between the red dots in \figref{gradientdescentcubicfig}
is large).
\begin{exercise}{gradientDescentCubic.m}{}
  Implement the gradient descent algorithm for the problem of fitting
@@ -339,11 +339,78 @@ large.
  squared errors as a function of iteration step (two plots). Compare
  the result of the gradient descent method with the true value of $c$
  used to simulate the data. Inspect the plots and adapt $\epsilon$
  and the threshold to make the algorithm behave as intended. Finally
  plot the data together with the best fitting cubic relation
  \eqref{cubicfunc}.
\end{exercise}
The $\epsilon$ parameter in \eqnref{gradientdescent} is critical. If
it is too large, the algorithm does not converge to the minimum of the
cost function (try it!). At medium values it oscillates around the
minimum but might nevertheless converge. Only for sufficiently small
values (here $\epsilon = 0.0001$) does the algorithm follow the slope
downwards towards the minimum.
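The effect of $\epsilon$ can be explored directly. The following Python sketch is a minimal stand-in for the MATLAB exercise; the data, the difference-quotient step $h$, and all values are made up for illustration:

```python
# Minimal gradient descent for fitting y = c*x^3, mirroring the
# algorithm described above. All names and values are illustrative.
def mean_squared_error(x, y, c):
    return sum((yi - c * xi**3)**2 for xi, yi in zip(x, y)) / len(x)

def gradient(x, y, c, h=1e-7):
    # difference quotient approximating the derivative w.r.t. c
    return (mean_squared_error(x, y, c + h) - mean_squared_error(x, y, c)) / h

def gradient_descent(x, y, c0, eps, threshold=0.1, max_iter=10000):
    c = c0
    for _ in range(max_iter):
        g = gradient(x, y, c)
        if abs(g) < threshold:
            break                # gradient close enough to zero
        c -= eps * g             # step against the gradient
    return c

x = [i * 0.1 for i in range(1, 41)]    # sizes
y = [6.0 * xi**3 for xi in x]          # noise-free weights, true c = 6.0
cest = gradient_descent(x, y, c0=2.0, eps=0.0001)
print(cest)                            # close to the true c = 6.0
```

With this particular data set, $\epsilon = 0.01$ makes the same loop diverge, and intermediate values make it oscillate around $c = 6$ before settling.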
The terminating condition on the absolute value of the gradient
influences how often the cost function is evaluated. The smaller the
threshold value the more often the cost is computed and the more
precisely the fit parameter is estimated. If it is too small, however,
the increase in precision is negligible, in particular in comparison
to the increased computational effort. Have a look at the derivatives
that we plotted in exercise~\ref{gradientcubic} and decide on a
sensible value for the threshold. Run the gradient descent algorithm
and check how the resulting $c$ parameter values converge and how many
iterations are needed. Then reduce the threshold (by factors of ten)
and check how this changes the results.
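The trade-off can be made visible in a few lines of Python (again an illustrative sketch with simulated, noise-free data rather than the course's MATLAB code):

```python
# Count iterations and the final error of the estimate for decreasing
# termination thresholds. Data and values are illustrative.
def cost(x, y, c):
    return sum((yi - c * xi**3)**2 for xi, yi in zip(x, y)) / len(x)

def descend(x, y, c0, eps, threshold, h=1e-7, max_iter=100000):
    c, n = c0, 0
    g = (cost(x, y, c + h) - cost(x, y, c)) / h
    while abs(g) >= threshold and n < max_iter:
        c -= eps * g
        n += 1
        g = (cost(x, y, c + h) - cost(x, y, c)) / h
    return c, n

x = [i * 0.1 for i in range(1, 41)]
y = [6.0 * xi**3 for xi in x]              # true c = 6.0
results = []
for threshold in (1.0, 0.1, 0.01):         # reduce by factors of ten
    c, n = descend(x, y, 2.0, 0.0001, threshold)
    results.append((threshold, n, abs(c - 6.0)))
    print(threshold, n, abs(c - 6.0))
```

Each tenfold reduction of the threshold buys roughly another digit of precision in $c$, at the price of additional iterations.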
Many modern algorithms for finding the minimum of a function are based
on the basic idea of the gradient descent. Luckily these algorithms
choose $\epsilon$ in a smart adaptive way and they also come up with
sensible default values for the termination condition. On the other
hand, these algorithms often take optional arguments that let you
control how they behave. Now you know what this is all about.
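As a toy illustration of such adaptivity (a deliberately simplified scheme, not the algorithm of any particular library): shrink $\epsilon$ whenever a step would increase the cost, and cautiously enlarge it after each accepted step.

```python
# Toy adaptive step size: halve eps when a step would increase the
# cost, grow it slightly after each accepted step. Illustrative only.
def cost(x, y, c):
    return sum((yi - c * xi**3)**2 for xi, yi in zip(x, y)) / len(x)

def adaptive_descent(x, y, c0, eps=1.0, threshold=1e-4, max_iter=10000):
    c, h = c0, 1e-7
    for _ in range(max_iter):
        f = cost(x, y, c)
        g = (cost(x, y, c + h) - f) / h   # difference quotient
        if abs(g) < threshold:
            break
        while cost(x, y, c - eps * g) > f:
            eps *= 0.5           # step too large: back off
        c -= eps * g
        eps *= 1.1               # carefully try larger steps again
    return c

x = [i * 0.1 for i in range(1, 41)]
y = [6.0 * xi**3 for xi in x]    # true c = 6.0
cest = adaptive_descent(x, y, c0=2.0)
print(cest)
```

Despite the absurdly large initial $\epsilon = 1$, the backtracking loop quickly finds a workable step size; no manual tuning is needed.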
\section{N-dimensional minimization problems}
So far we have been concerned with finding the right value for a
single parameter that minimizes a cost function. The gradient descent
method for such one-dimensional problems seems a bit like overkill.
Often, however, we deal with functions that have more than a single
parameter, in general $n$ parameters. We then need to find the minimum
in an $n$-dimensional parameter space.
For our tiger problem, we could have also fitted the exponent $\alpha$
of the power-law relation between size and weight, instead of assuming
a cubic relation:
\begin{equation}
\label{powerfunc}
y = f(x; c, \alpha) = f(x; \vec p) = c\cdot x^\alpha
\end{equation}
We could then check whether the resulting estimate of the exponent
$\alpha$ is indeed close to the expected power of three. The
power-law \eqref{powerfunc} has two free parameters $c$ and $\alpha$.
Instead of a single parameter we are now dealing with a vector $\vec
p$ containing $n$ parameter values. Here, $\vec p = (c,
\alpha)$. Luckily, all the concepts we introduced for the
one-dimensional problem of the tiger weights generalize to
$n$-dimensional problems. We only need to adapt a few things. The cost
function for the mean squared error reads
\begin{equation}
\label{ndimcostfunc}
f_{cost}(\vec p|\{(x_i, y_i)\}) = \frac{1}{N} \sum_{i=1}^N (y_i - f(x_i;\vec p))^2
\end{equation}
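In code, the only change relative to the one-dimensional case is that the parameter is now a vector. A Python sketch of \eqref{ndimcostfunc} for the power law, with made-up data:

```python
# Mean squared error for the power law y = c*x^alpha with the
# parameter vector p = (c, alpha). Data are simulated for illustration.
def f(x, p):
    c, alpha = p
    return c * x**alpha

def cost(p, data):
    return sum((y - f(x, p))**2 for x, y in data) / len(data)

# noise-free data with true parameters c = 6, alpha = 3
data = [(i * 0.1, 6.0 * (i * 0.1)**3.0) for i in range(1, 41)]
print(cost((6.0, 3.0), data))        # zero at the true parameters
print(cost((6.0, 2.5), data) > 0.0)  # any other p costs more here
```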
For two-dimensional problems the graph of the cost function is an
\enterm{error surface} (\determ{{Fehlerfl\"ache}}). The two parameters
span a two-dimensional plane. The cost function assigns a single value
to each parameter combination on this plane. This results in a
landscape over the parameter plane, with mountains and valleys, and we
are searching for the bottom of the deepest valley.
When we place a ball somewhere on the slopes of a hill it rolls
downwards and eventually stops at the bottom. The ball always rolls in
the direction of the steepest slope.
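The rolling ball translates directly into an algorithm: take steps against the gradient in both parameter dimensions simultaneously. A minimal Python sketch on a toy error surface (an idealized bowl, not the tiger data):

```python
# Steepest descent on the two-dimensional bowl
# f(p) = (p0 - 1)^2 + 2*(p1 + 2)^2, with its minimum at p = (1, -2).
# Surface, start point, and eps are illustrative.
def grad(p):
    # partial derivatives of f with respect to p0 and p1
    return (2.0 * (p[0] - 1.0), 4.0 * (p[1] + 2.0))

p = (5.0, 5.0)                   # where we place the ball
eps = 0.1
for _ in range(200):
    g = grad(p)
    p = (p[0] - eps * g[0], p[1] - eps * g[1])   # roll downhill
print(p)                         # converges to (1.0, -2.0)
```

At every step the ball moves along the direction of steepest descent, here given analytically by the two partial derivatives.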
\begin{ibox}[tp]{\label{partialderivativebox}Partial derivative and gradient}
  Some functions depend on more than a single variable:
  \[ z = f(x,y) \]
@@ -388,6 +455,21 @@ that points to the strongest ascend of the objective function. The
gradient is given by partial derivatives
(Box~\ref{partialderivativebox}) of the mean squared error with
respect to the parameters $m$ and $b$ of the straight line.
For example, you measure the response of a passive membrane to a
current step and you want to estimate the membrane time constant. Then
you need to fit an exponential function
need to fit an exponential function
\begin{equation}
\label{expfunc}
V(t; \tau, \Delta V, V_{\infty}) = \Delta V e^{-t/\tau} + V_{\infty}
\end{equation}
with three free parameters $\tau$, $\Delta V$, $V_{\infty}$ to the
measured time course of the membrane potential $V(t)$. The $(x_i,y_i)$
data pairs are the sampling times $t_i$ and the corresponding
measurements of the membrane potential $V_i$.
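As a sketch of how this example fits the general scheme, here are the model and its cost function in Python (times and parameter values are invented; a real analysis would fit measured data):

```python
import math

# Exponential response of a passive membrane with three free
# parameters tau, dV (Delta V), and Vinf. Values are made up.
def V(t, tau, dV, Vinf):
    return dV * math.exp(-t / tau) + Vinf

def cost(p, data):
    tau, dV, Vinf = p
    return sum((Vi - V(ti, tau, dV, Vinf))**2 for ti, Vi in data) / len(data)

true_p = (10.0, -20.0, -60.0)    # tau = 10 ms, dV = -20 mV, Vinf = -60 mV
data = [(float(t), V(t, *true_p)) for t in range(0, 50)]  # sampled V(t)
print(cost(true_p, data))        # zero at the true parameters
print(cost((5.0, -20.0, -60.0), data) > 0.0)
```

The gradient descent then operates on the three-dimensional vector $\vec p = (\tau, \Delta V, V_{\infty})$ in exactly the same way as on $(c, \alpha)$.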
\section{Summary} \section{Summary}
The gradient descent is an important numerical method for solving The gradient descent is an important numerical method for solving