at the position of the ball.
\includegraphics{cubicgradient}
\titlecaption{Derivative of the cost function.}{The gradient, the
derivative \eqref{costderivative} of the cost function, is
negative to the left of the minimum (vertical line) of the cost
function, zero (horizontal line) at, and positive to the right of
the minimum (left). For each value of the parameter $c$ the
negative gradient (arrows) points towards the minimum of the cost
function (right).} \label{gradientcubicfig}
\end{figure}

In our one-dimensional example of a single free parameter the slope is
the derivative \eqref{costderivative} of the cost function with
respect to the parameter $c$. This derivative can be approximated
numerically by the difference quotient
\begin{equation}
\lim\limits_{\Delta c \to 0} \frac{f_{cost}(c + \Delta c) - f_{cost}(c)}{\Delta c}
\approx \frac{f_{cost}(c + \Delta c) - f_{cost}(c)}{\Delta c}
\end{equation}
Choose, for example, $\Delta c = 10^{-7}$. The derivative is positive
for positive slopes. Since we want to go down the hill, we choose the
opposite direction (\figref{gradientcubicfig}, right).
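
For a single parameter this numerical estimate takes only a few lines
of code. The following sketch assumes data vectors \varcode{x} and
\varcode{y}, a parameter value \varcode{c}, and a function
\varcode{meanSquaredErrorCubic(x, y, c)} returning the mean squared
error of the cubic fit; these names and values are only placeholders.
\begin{lstlisting}
dc = 1e-7;                                % small but finite step
f1 = meanSquaredErrorCubic(x, y, c);      % cost at c (assumed helper function)
f2 = meanSquaredErrorCubic(x, y, c + dc); % cost at c + dc
dfdc = (f2 - f1)/dc;                      % difference quotient approximating the derivative
\end{lstlisting}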
\begin{exercise}{meanSquaredGradientCubic.m}{}\label{gradientcubic}
Implement a function \varcode{meanSquaredGradientCubic()}, that

The $\epsilon$ parameter in \eqnref{gradientdescent} is critical. If
too large, the algorithm does not converge to the minimum of the cost
function (try it!). At medium values it oscillates around the minimum
but might nevertheless converge. Only for sufficiently small values
(here $\epsilon = 0.00001$) does the algorithm follow the slope
downwards towards the minimum.
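
Written out as a minimal loop, the gradient descent update with the
$\epsilon$ factor could look like the following sketch. The signature
of \varcode{meanSquaredGradientCubic()} from
exercise~\ref{gradientcubic}, the starting value, and the threshold
are assumptions for illustration only.
\begin{lstlisting}
epsilon = 0.00001;             % small learning rate
c = 2.0;                       % some starting value for the parameter
gradient = meanSquaredGradientCubic(x, y, c);  % assumed signature
while abs(gradient) > 0.1      % threshold, see the following paragraph
    c = c - epsilon*gradient;  % one step against the gradient
    gradient = meanSquaredGradientCubic(x, y, c);
end
\end{lstlisting}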
The terminating condition on the absolute value of the gradient also
needs some thought. If the threshold is chosen too small, the gradient
descent needs many more iterations for only a tiny improvement of the
fit that usually does not justify
the increased computational effort. Have a look at the derivatives
that we plotted in exercise~\ref{gradientcubic} and decide on a
sensible value for the threshold. Run the gradient descent algorithm
and check how the resulting $c$ parameter values converge and how many
iterations were needed. Then reduce the threshold (by factors of ten)
and check how this changes the results.
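
One way to do this systematically is to rerun the descent for a range
of thresholds and count the iterations, as in the following sketch
(starting value, \varcode{epsilon}, and the assumed
\varcode{meanSquaredGradientCubic()} are the same placeholders as in
the sketch above).
\begin{lstlisting}
for threshold = [1.0, 0.1, 0.01, 0.001]  % decreasing by factors of ten
    c = 2.0;                             % restart from the same value
    niter = 0;                           % count the iterations
    gradient = meanSquaredGradientCubic(x, y, c);
    while abs(gradient) > threshold
        c = c - epsilon*gradient;
        gradient = meanSquaredGradientCubic(x, y, c);
        niter = niter + 1;
    end
    fprintf('threshold %6.3f: c = %.4f after %d iterations\n', threshold, c, niter)
end
\end{lstlisting}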
Many modern algorithms for finding the minimum of a function are based

The absolute value of the gradient is given by the square root of
the sum of the squared partial derivatives:
\begin{equation}
\label{ndimabsgradient}
|\nabla f_{cost}(\vec p)| = \sqrt{\sum_{i=1}^n \left(\frac{\partial f_{cost}(\vec p)}{\partial p_i}\right)^2}
\end{equation}
The \code{norm()} function implements this, given a vector with the
partial derivatives.
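
For example, for a gradient with two made-up partial derivatives:
\begin{lstlisting}
gradient = [-0.8; 2.4];  % hypothetical partial derivatives
norm(gradient)           % same as sqrt(sum(gradient.^2))
\end{lstlisting}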
\subsection{Passing a function as an argument to another function}

our tiger data-set (\figref{powergradientdescentfig}):
parameters against each other. Compare the result of the gradient
descent method with the true values of $c$ and $a$ used to simulate
the data. Observe the norm of the gradient and inspect the plots to
adapt $\epsilon$ and the threshold if necessary. Finally plot the
data together with the best fitting power-law \eqref{powerfunc}.
\end{exercise}
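
A possible skeleton for this exercise is sketched below. It assumes
data vectors \varcode{x} and \varcode{y} with the sizes and weights of
the tigers and a function \varcode{meanSquaredGradientPower(x, y, p)}
returning the two partial derivatives of the mean squared error of the
power law \eqref{powerfunc} at $\vec p = (c, a)$ as a column vector;
all names and numbers are placeholders to be adapted.
\begin{lstlisting}
p = [2.0; 1.0];                % starting values for c and a
epsilon = 0.00001;             % learning rate, adapt as needed
gradient = meanSquaredGradientPower(x, y, p);  % assumed helper function
while norm(gradient) > 1.0     % threshold on the gradient's length
    p = p - epsilon*gradient;  % gradient descent step
    gradient = meanSquaredGradientPower(x, y, p);
end
% plot the data together with the best fitting power law c*x^a:
xx = linspace(min(x), max(x), 200);
plot(x, y, 'o', xx, p(1)*xx.^p(2), '-')
\end{lstlisting}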