[regression] first part n-dim minimization

This commit is contained in:
parent 7518f9dd47
commit 891515caf8
@@ -1,4 +1,4 @@
-meansquarederrorline % generate data
+meansquarederrorline; % generate data

 c0 = 2.0;
 eps = 0.0001;
@@ -21,9 +21,8 @@ hold on;
 % generate x-values for plotting the fit:
 xx = min(x):0.01:max(x);
 yy = cest * xx.^3;
-plot(xx, yy, 'displayname', 'fit');
+plot(xx, yy);
-plot(x, y, 'o', 'displayname', 'data'); % plot original data
+plot(x, y, 'o'); % plot original data
 xlabel('Size [m]');
 ylabel('Weight [kg]');
-legend("location", "northwest");
+legend('fit', 'data', 'location', 'northwest');
-pause
@@ -254,7 +254,7 @@ can be approximated numerically by the difference quotient
 The derivative is positive for positive slopes. Since we want to go down
 the hill, we choose the opposite direction.

-\begin{exercise}{meanSquaredGradientCubic.m}{}
+\begin{exercise}{meanSquaredGradientCubic.m}{}\label{gradientcubic}
 Implement a function \varcode{meanSquaredGradientCubic()} that
 takes the $x$- and $y$-data and the parameter $c$ as input
 arguments. The function should return the derivative of the mean
@@ -312,12 +312,12 @@ descent works as follows:
 \item Repeat steps \ref{computegradient} -- \ref{gradientstep}.
 \end{enumerate}

-\Figref{gradientdescentcubicfig} illustrates the gradient descent --- the
+\Figref{gradientdescentcubicfig} illustrates the gradient descent ---
-path the imaginary ball has chosen to reach the minimum. We walk along
+the path the imaginary ball has chosen to reach the minimum. We walk
-the parameter axis against the gradient as long as the gradient
+along the parameter axis against the gradient as long as the gradient
 differs sufficiently from zero. At steep slopes we take large steps
-(the distance between the red dots in \figref{gradientdescentcubicfig}) is
+(the distance between the red dots in \figref{gradientdescentcubicfig}
-large.
+is large).

 \begin{exercise}{gradientDescentCubic.m}{}
 Implement the gradient descent algorithm for the problem of fitting
@@ -339,11 +339,78 @@ large.
 squared errors as a function of iteration step (two plots). Compare
 the result of the gradient descent method with the true value of $c$
 used to simulate the data. Inspect the plots and adapt $\epsilon$
-and the threshold to make the algorithm behave as intended. Also
+and the threshold to make the algorithm behave as intended. Finally
 plot the data together with the best fitting cubic relation
 \eqref{cubicfunc}.
 \end{exercise}

+The $\epsilon$ parameter in \eqnref{gradientdescent} is critical. If
+too large, the algorithm does not converge to the minimum of the cost
+function (try it!). At medium values it oscillates around the minimum
+but might nevertheless converge. Only for sufficiently small values
+(here $\epsilon = 0.0001$) does the algorithm follow the slope
+downwards towards the minimum.
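The convergence behavior this added paragraph describes can be reproduced with a self-contained sketch (Python here instead of the chapter's MATLAB; the data and names are made up for illustration, not taken from the repository):

```python
import random

# Illustrative stand-in for the chapter's cubic-fit data (c_true = 6.0).
random.seed(2)
c_true = 6.0
x = [1.0 + 2.0 * i / 39.0 for i in range(40)]
y = [c_true * xi**3 + random.gauss(0.0, 5.0) for xi in x]

def mse(c):
    """Mean squared error of the cubic fit y = c*x^3."""
    return sum((yi - c * xi**3)**2 for xi, yi in zip(x, y)) / len(x)

def mse_gradient(c, h=1e-7):
    """Derivative of the cost with respect to c, as a difference quotient."""
    return (mse(c + h) - mse(c)) / h

def gradient_descent(c0, eps, threshold=0.1, max_steps=10000):
    """Walk against the gradient until it is sufficiently close to zero."""
    c = c0
    for step in range(max_steps):
        g = mse_gradient(c)
        if abs(g) < threshold:
            break
        c -= eps * g  # step in the downhill direction
    return c, step

# A sufficiently small step size converges towards the true parameter:
c_est, steps = gradient_descent(c0=2.0, eps=0.0001)
```

Reducing `eps` further slows convergence; raising it past the stability limit makes the iteration oscillate and eventually diverge, which is the "try it!" case above.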
+
+The terminating condition on the absolute value of the gradient
+influences how often the cost function is evaluated. The smaller the
+threshold value the more often the cost is computed and the more
+precisely the fit parameter is estimated. If it is too small, however,
+the increase in precision is negligible, in particular in comparison
+to the increased computational effort. Have a look at the derivatives
+that we plotted in exercise~\ref{gradientcubic} and decide on a
+sensible value for the threshold. Run the gradient descent algorithm
+and check how the resulting $c$ parameter values converge and how many
+iterations were needed. Then reduce the threshold (by factors of ten)
+and check how this changes the results.
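The effort-versus-precision trade-off of the threshold can be sketched as well (Python; noise-free data so the cost minimum is known to be exactly at $c = 6$, all names illustrative):

```python
# Noise-free cubic data: the cost minimum is exactly at c = 6.0.
x = [1.0 + 2.0 * i / 39.0 for i in range(40)]
y = [6.0 * xi**3 for xi in x]

def mse(c):
    """Mean squared error of the cubic fit y = c*x^3."""
    return sum((yi - c * xi**3)**2 for xi, yi in zip(x, y)) / len(x)

def descend(threshold, c0=2.0, eps=0.0001, h=1e-7):
    """Gradient descent that counts cost-function evaluations."""
    c, evals = c0, 0
    while True:
        g = (mse(c + h) - mse(c)) / h
        evals += 2  # two cost evaluations per difference quotient
        if abs(g) < threshold:
            return c, evals
        c -= eps * g

# Each tenfold reduction of the threshold adds a roughly constant number
# of evaluations while the gain in precision keeps shrinking:
for threshold in (10.0, 1.0, 0.1, 0.01):
    c_est, evals = descend(threshold)
    print(threshold, round(c_est, 5), evals)
```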
+
+Many modern algorithms for finding the minimum of a function are based
+on the basic idea of the gradient descent. Luckily these algorithms
+choose $\epsilon$ in a smart adaptive way and they also come up with
+sensible default values for the termination condition. On the other
+hand, these algorithms often take optional arguments that let you
+control how they behave. Now you know what this is all about.
+
+\section{N-dimensional minimization problems}
+
+So far we were concerned with finding the right value of a single
+parameter that minimizes a cost function. The gradient descent method
+for such one-dimensional problems seems a bit like overkill. However,
+we often deal with functions that have more than a single parameter,
+in general $n$ parameters. We then need to find the minimum in an
+$n$-dimensional parameter space.
+
+For our tiger problem, we could have also fitted the exponent $\alpha$
+of the power-law relation between size and weight, instead of assuming
+a cubic relation:
+\begin{equation}
+  \label{powerfunc}
+  y = f(x; c, \alpha) = f(x; \vec p) = c\cdot x^\alpha
+\end{equation}
+We then could check whether the resulting estimate of the exponent
+$\alpha$ indeed is close to the expected power of three. The
+power-law \eqref{powerfunc} has two free parameters $c$ and $\alpha$.
+Instead of a single parameter we are now dealing with a vector $\vec
+p$ containing $n$ parameter values. Here, $\vec p = (c,
+\alpha)$. Luckily, all the concepts we introduced with the example of
+the one-dimensional problem of tiger weights generalize to
+$n$-dimensional problems. We only need to adapt a few things. The cost
+function for the mean squared error reads
+\begin{equation}
+  \label{ndimcostfunc}
+  f_{cost}(\vec p|\{(x_i, y_i)\}) = \frac{1}{N} \sum_{i=1}^N (y_i - f(x_i;\vec p))^2
+\end{equation}
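For the two-parameter power law this cost function and its gradient of partial derivatives can be written down directly (a Python sketch with made-up data; in the chapter's code this would be a MATLAB function):

```python
# Illustrative data from a power law y = c * x^alpha with c = 6, alpha = 3.
x = [1.0 + 2.0 * i / 19.0 for i in range(20)]
y = [6.0 * xi**3.0 for xi in x]

def cost(p):
    """Mean squared error for f(x; p) = c * x**alpha with p = (c, alpha)."""
    c, alpha = p
    return sum((yi - c * xi**alpha)**2 for xi, yi in zip(x, y)) / len(x)

def gradient(p, h=1e-6):
    """Numerical gradient: one partial difference quotient per parameter."""
    grad = []
    for k in range(len(p)):
        pk = list(p)
        pk[k] += h
        grad.append((cost(pk) - cost(p)) / h)
    return grad

# At the true parameter combination the cost is zero and the gradient vanishes:
print(cost([6.0, 3.0]))  # → 0.0
```

The gradient descent update is the same as before, only applied to every component of $\vec p$ at once.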
+
+For two-dimensional problems the graph of the cost function is an
+\enterm{error surface} (\determ{{Fehlerfl\"ache}}). The two parameters
+span a two-dimensional plane. The cost function assigns to each
+parameter combination on this plane a single value. This results in a
+landscape over the parameter plane with mountains and valleys, and we
+are searching for the bottom of the deepest valley.
+
+When we place a ball somewhere on the slopes of a hill it rolls
+downwards and eventually stops at the bottom. The ball always rolls in
+the direction of the steepest slope.

 \begin{ibox}[tp]{\label{partialderivativebox}Partial derivative and gradient}
 Some functions depend on more than a single variable:
 \[ z = f(x,y) \]
@@ -388,6 +455,21 @@ that points to the strongest ascent of the objective function. The
 gradient is given by partial derivatives
 (Box~\ref{partialderivativebox}) of the mean squared error with
 respect to the parameters $m$ and $b$ of the straight line.

+
+For example, you measure the response of a passive membrane to a
+current step and you want to estimate the membrane time constant. Then
+you need to fit an exponential function
+\begin{equation}
+  \label{expfunc}
+  V(t; \tau, \Delta V, V_{\infty}) = \Delta V e^{-t/\tau} + V_{\infty}
+\end{equation}
+with three free parameters $\tau$, $\Delta V$, $V_{\infty}$ to the
+measured time course of the membrane potential $V(t)$. The $(x_i,y_i)$
+data pairs are the sampling times $t_i$ and the corresponding
+measurements of the membrane potential $V_i$.
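The same machinery carries over; only the model and the parameter vector change (a Python sketch with synthetic data; the parameter values are made up for illustration):

```python
import math

# Synthetic membrane response: tau = 10 ms, Delta V = -10 mV, V_inf = -60 mV.
t = [0.5 * i for i in range(100)]                        # sampling times t_i
v = [-10.0 * math.exp(-ti / 10.0) - 60.0 for ti in t]    # measurements V_i

def model(ti, p):
    """Exponential response V(t; tau, dv, vinf) = dv*exp(-t/tau) + vinf."""
    tau, dv, vinf = p
    return dv * math.exp(-ti / tau) + vinf

def cost(p):
    """Mean squared error between measured and modeled membrane potential."""
    return sum((vi - model(ti, p))**2 for ti, vi in zip(t, v)) / len(t)
```

Gradient descent now follows a three-dimensional gradient, one partial derivative per parameter, but the descent loop itself is unchanged.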

 \section{Summary}

 The gradient descent is an important numerical method for solving