[regression] first part n-dim minimization
@@ -1,4 +1,4 @@
meansquarederrorline; % generate data

c0 = 2.0;
eps = 0.0001;

@@ -21,9 +21,8 @@ hold on;
% generate x-values for plotting the fit:
xx = min(x):0.01:max(x);
yy = cest * xx.^3;
plot(xx, yy);
plot(x, y, 'o'); % plot original data
xlabel('Size [m]');
ylabel('Weight [kg]');
legend('fit', 'data', 'location', 'northwest');

@@ -254,7 +254,7 @@ can be approximated numerically by the difference quotient
The derivative is positive for positive slopes. Since we want to go down
the hill, we choose the opposite direction.
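
To make this concrete, the difference quotient can be sketched in a few
lines of MATLAB/Octave code. Here \varcode{meanSquaredErrorCubic()} is an
assumed name for a function computing the mean squared error for a given
$c$, and the step size \varcode{h} is an arbitrary small value:

c = 2.0;                     % current parameter value
h = 1e-7;                    % small step for the difference quotient
mse = meanSquaredErrorCubic(x, y, c);
dmsedc = (meanSquaredErrorCubic(x, y, c + h) - mse) / h;  % approximated derivative
% since we want to go downhill, we will later step into the direction -dmsedc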

\begin{exercise}{meanSquaredGradientCubic.m}{}\label{gradientcubic}
Implement a function \varcode{meanSquaredGradientCubic()} that
takes the $x$- and $y$-data and the parameter $c$ as input
arguments. The function should return the derivative of the mean

@@ -312,12 +312,12 @@ descent works as follows:
\item Repeat steps \ref{computegradient} -- \ref{gradientstep}.
\end{enumerate}

\Figref{gradientdescentcubicfig} illustrates the gradient descent ---
the path the imaginary ball has chosen to reach the minimum. We walk
along the parameter axis against the gradient as long as the gradient
differs sufficiently from zero. At steep slopes we take large steps
(the distance between the red dots in \figref{gradientdescentcubicfig}
is large).
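
This loop can be sketched in a few lines of MATLAB/Octave code. The step
size and threshold below are arbitrary choices, \varcode{x} and \varcode{y}
are assumed to hold the data, and \varcode{meanSquaredGradientCubic()} is
the function from exercise~\ref{gradientcubic}:

c = 2.0;                 % starting value of the parameter
eps = 0.0001;            % step size epsilon
thresh = 0.1;            % termination threshold for the gradient
gradient = meanSquaredGradientCubic(x, y, c);
while abs(gradient) > thresh
    c = c - eps * gradient;                       % walk against the gradient
    gradient = meanSquaredGradientCubic(x, y, c); % recompute the gradient
end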

\begin{exercise}{gradientDescentCubic.m}{}
Implement the gradient descent algorithm for the problem of fitting

@@ -339,11 +339,78 @@ large.
squared errors as a function of iteration step (two plots). Compare
the result of the gradient descent method with the true value of $c$
used to simulate the data. Inspect the plots and adapt $\epsilon$
and the threshold to make the algorithm behave as intended. Finally,
plot the data together with the best fitting cubic relation
\eqref{cubicfunc}.
\end{exercise}

The $\epsilon$ parameter in \eqnref{gradientdescent} is critical. If it
is too large, the algorithm does not converge to the minimum of the
cost function (try it!). At medium values it oscillates around the
minimum but might nevertheless converge. Only for sufficiently small
values (here $\epsilon = 0.0001$) does the algorithm follow the slope
downwards towards the minimum.
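
A quick way to see this behavior is to run the descent for several step
sizes and compare the outcomes. The sketch below assumes that your
\varcode{gradientDescentCubic()} takes the data, a starting value, the
step size, and the threshold, and returns the estimated parameter and
the cost values per iteration; adapt it to whatever signature you chose:

for eps = [0.1, 0.01, 0.001, 0.0001]
    [cest, costs] = gradientDescentCubic(x, y, 2.0, eps, 0.1);
    fprintf('eps = %g: c = %.3f after %d iterations\n', eps, cest, length(costs));
end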

The terminating condition on the absolute value of the gradient
influences how often the cost function is evaluated. The smaller the
threshold value, the more often the cost is computed and the more
precisely the fit parameter is estimated. If it is too small, however,
the increase in precision is negligible, in particular in comparison
to the increased computational effort. Have a look at the derivatives
that we plotted in exercise~\ref{gradientcubic} and decide on a
sensible value for the threshold. Run the gradient descent algorithm
and check how the resulting $c$ parameter values converge and how many
iterations were needed. Then reduce the threshold (by factors of ten)
and check how this changes the results.
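
Under the same assumed signature as in the sketch above, reducing the
threshold by factors of ten might look like this:

for thresh = [1.0, 0.1, 0.01, 0.001]
    [cest, costs] = gradientDescentCubic(x, y, 2.0, 0.0001, thresh);
    fprintf('threshold = %g: c = %.4f, %d iterations\n', thresh, cest, length(costs));
end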

Many modern algorithms for finding the minimum of a function are based
on the basic idea of gradient descent. Luckily, these algorithms
choose $\epsilon$ in a smart, adaptive way and they also come with
sensible default values for the termination condition. On the other
hand, these algorithms often take optional arguments that let you
control how they behave. Now you know what this is all about.
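
For example, the built-in \varcode{fminsearch()} of MATLAB and Octave (a
simplex-based minimizer, not a gradient descent, but with the same kind
of interface) accepts an options structure that controls its termination,
much like our $\epsilon$ and threshold. Here
\varcode{meanSquaredErrorCubic()} is again an assumed cost function:

opts = optimset('TolX', 1e-4, 'MaxIter', 1000);               % termination settings
cest = fminsearch(@(c) meanSquaredErrorCubic(x, y, c), 2.0, opts);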

\section{N-dimensional minimization problems}

So far we were concerned with finding the right value for a single
parameter that minimizes a cost function. The gradient descent method
for such one-dimensional problems seems a bit like overkill. However,
we often deal with functions that have more than a single parameter,
in general $n$ parameters. We then need to find the minimum in an
$n$-dimensional parameter space.

For our tiger problem, we could have also fitted the exponent $\alpha$
of the power-law relation between size and weight, instead of assuming
a cubic relation:
\begin{equation}
\label{powerfunc}
y = f(x; c, \alpha) = f(x; \vec p) = c\cdot x^\alpha
\end{equation}
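
In MATLAB/Octave such a function with a parameter vector can be written,
for instance, as an anonymous function (the name \varcode{powerlaw} is
just a suggestion):

powerlaw = @(p, x) p(1) * x.^p(2);   % p(1) = c, p(2) = alpha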

We then could check whether the resulting estimate of the exponent
$\alpha$ indeed is close to the expected power of three. The
power law \eqref{powerfunc} has two free parameters, $c$ and $\alpha$.
Instead of a single parameter we are now dealing with a vector $\vec
p$ containing $n$ parameter values. Here, $\vec p = (c,
\alpha)$. Luckily, all the concepts we introduced using the example of
the one-dimensional problem of tiger weights generalize to
$n$-dimensional problems. We only need to adapt a few things. The cost
function for the mean squared error reads
\begin{equation}
\label{ndimcostfunc}
f_{cost}(\vec p|\{(x_i, y_i)\}) = \frac{1}{N} \sum_{i=1}^N (y_i - f(x_i;\vec p))^2
\end{equation}
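
A possible MATLAB/Octave implementation of this cost function for the
power law \eqref{powerfunc} could look like this (the function name and
the layout of the parameter vector are assumptions):

function mse = meanSquaredErrorPower(p, x, y)
    yest = p(1) * x.^p(2);       % power law with c = p(1) and alpha = p(2)
    mse = mean((y - yest).^2);   % mean squared error
end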

For two-dimensional problems the graph of the cost function is an
\enterm{error surface} (\determ{{Fehlerfl\"ache}}). The two parameters
span a two-dimensional plane. The cost function assigns to each
parameter combination on this plane a single value. This results in a
landscape over the parameter plane with mountains and valleys, and we
are searching for the bottom of the deepest valley.
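
Such an error surface can be visualized by evaluating the cost function
on a grid of parameter combinations, for example for the power-law fit.
Grid ranges and resolution are arbitrary choices here, and \varcode{x}
and \varcode{y} are assumed to hold the size and weight data:

cs = linspace(0.5, 10.0, 50);          % candidate values for c
alphas = linspace(1.0, 5.0, 50);       % candidate values for alpha
mses = zeros(length(alphas), length(cs));
for i = 1:length(alphas)
    for j = 1:length(cs)
        mses(i, j) = mean((y - cs(j) * x.^alphas(i)).^2);
    end
end
surf(cs, alphas, mses);                % plot the error surface
xlabel('c'); ylabel('alpha'); zlabel('mean squared error');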

When we place a ball somewhere on the slopes of a hill it rolls
downwards and eventually stops at the bottom. The ball always rolls in
the direction of the steepest slope.

\begin{ibox}[tp]{\label{partialderivativebox}Partial derivative and gradient}
Some functions depend on more than a single variable:
\[ z = f(x,y) \]

@@ -388,6 +455,21 @@ that points to the strongest ascent of the objective function. The
gradient is given by partial derivatives
(Box~\ref{partialderivativebox}) of the mean squared error with
respect to the parameters $m$ and $b$ of the straight line.
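
Generalizing the difference quotient to a parameter vector, each partial
derivative can be approximated by shifting one parameter at a time. This
is only a sketch; \varcode{meanSquaredErrorPower()} is the assumed cost
function from above:

function gradmse = meanSquaredGradientPower(p, x, y)
    h = 1e-7;                          % small step for the difference quotients
    mse = meanSquaredErrorPower(p, x, y);
    gradmse = zeros(size(p));
    for k = 1:length(p)
        ph = p;
        ph(k) = ph(k) + h;             % shift only the k-th parameter
        gradmse(k) = (meanSquaredErrorPower(ph, x, y) - mse) / h;
    end
end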

For example, you measure the response of a passive membrane to a
current step and you want to estimate the membrane time constant. Then
you need to fit an exponential function
\begin{equation}
\label{expfunc}
V(t; \tau, \Delta V, V_{\infty}) = \Delta V e^{-t/\tau} + V_{\infty}
\end{equation}
with three free parameters $\tau$, $\Delta V$, $V_{\infty}$ to the
measured time course of the membrane potential $V(t)$. The $(x_i,y_i)$
data pairs are the sampling times $t_i$ and the corresponding
measurements of the membrane potential $V_i$.
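
With the parameter-vector notation this fit can be set up just like the
power law. A sketch, assuming \varcode{t} and \varcode{V} hold the
sampling times and measured potentials and using the built-in
\varcode{fminsearch()} (the initial values are rough guesses):

expdecay = @(p, t) p(2) * exp(-t ./ p(1)) + p(3);   % p = (tau, Delta V, V_infinity)
msecost = @(p) mean((V - expdecay(p, t)).^2);       % mean squared error
pest = fminsearch(msecost, [0.01, 10.0, -70.0]);    % tau, Delta V, V_infinity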

\section{Summary}

The gradient descent is an important numerical method for solving