[regression] gradient descent code
\end{exercise}
\section{Gradient descent}
\begin{figure}[t]
\includegraphics{cubicmse}
\titlecaption{Gradient descent.}{The algorithm starts at an
arbitrary position. At each point the gradient is estimated and
the position is updated as long as the length of the gradient is
sufficiently large. The dots show the positions after each
iteration of the algorithm.} \label{gradientdescentcubicfig}
\end{figure}
Finally, we are able to implement the optimization itself. By now it
should be obvious why it is called the gradient descent method. From a
starting position we iteratively walk down the slope of the cost
function against its gradient. All ingredients necessary for this
algorithm are already there. We need: (i) the cost function
(\varcode{meanSquaredErrorCubic()}), and (ii) the gradient
(\varcode{meanSquaredGradientCubic()}). The gradient descent
algorithm works as follows:
\begin{enumerate}
\item Start at an arbitrary position $p_0$.
\item \label{computegradient} Compute the gradient of the cost
  function at the current position $p_i$.
\item If the length of the gradient falls below a threshold, we assume
  to have reached the minimum and stop the search.
\item \label{gradientstep} If the length of the gradient exceeds the
threshold, we take a small step in the opposite direction:
\begin{equation}
\label{gradientdescent}
p_{i+1} = p_i - \epsilon \cdot \nabla f_{cost}(p_i)
\end{equation}
where $\epsilon$ is a factor linking the gradient to
appropriate steps in the parameter space.
\item Repeat steps \ref{computegradient} -- \ref{gradientstep}.
\end{enumerate}
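For the single parameter $c$ of the cubic function, this loop can be
sketched in a few lines of code. This is only a minimal sketch: it
assumes that \varcode{meanSquaredGradientCubic()} takes the data
\varcode{x} and \varcode{y} and the current parameter value as
arguments, and the starting position, $\epsilon$, and the threshold
are set to plausible example values.
\begin{lstlisting}
c = 2.0;         % arbitrary starting position p_0
epsilon = 0.01;  % factor linking the gradient to parameter steps
gradient = meanSquaredGradientCubic(x, y, c);  % assumed signature
while abs(gradient) > 0.1          % threshold on the gradient length
    c = c - epsilon * gradient;    % step against the gradient
    gradient = meanSquaredGradientCubic(x, y, c);
end
\end{lstlisting}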
\Figref{gradientdescentcubicfig} illustrates the gradient descent --- the
path the imaginary ball has chosen to reach the minimum. We walk along
the parameter axis against the gradient as long as the gradient
differs sufficiently from zero. At steep slopes we take large steps
(the distance between the red dots in
\figref{gradientdescentcubicfig} is large).
\begin{exercise}{gradientDescentCubic.m}{}
Implement the gradient descent algorithm for the problem of fitting
a cubic function \eqref{cubicfunc} to some measured data pairs $x$
and $y$ as a function \varcode{gradientDescentCubic()} that returns
the estimated best fitting parameter value $c$ as well as two
vectors with all the parameter values and the corresponding values
of the cost function that the algorithm iterated through. As
additional arguments, the function takes the initial value for the
parameter $c$, the factor $\epsilon$ connecting the gradient with
iteration steps in \eqnref{gradientdescent}, and the threshold value
for the absolute value of the gradient terminating the algorithm.
\end{exercise}
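A possible skeleton for such a function is sketched below. The
argument order and the assumption that \varcode{meanSquaredErrorCubic()}
and \varcode{meanSquaredGradientCubic()} take the data \varcode{x},
\varcode{y} and a parameter value as arguments are suggestions, not
requirements of the exercise.
\begin{lstlisting}
function [c, cs, mses] = gradientDescentCubic(x, y, c0, epsilon, threshold)
    % Gradient descent on the mean squared error of the cubic fit.
    c = c0;      % start with the initial parameter value
    cs = [];     % history of parameter values
    mses = [];   % history of cost values
    gradient = meanSquaredGradientCubic(x, y, c);
    while abs(gradient) > threshold
        cs(end+1) = c;
        mses(end+1) = meanSquaredErrorCubic(x, y, c);
        c = c - epsilon * gradient;   % step against the gradient
        gradient = meanSquaredGradientCubic(x, y, c);
    end
end
\end{lstlisting}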
\begin{exercise}{plotgradientdescentcubic.m}{}
Use the function \varcode{gradientDescentCubic()} to fit the
simulated data from exercise~\ref{mseexercise}. Plot the returned
values of the parameter $c$ as well as the corresponding mean
squared errors as a function of iteration step (two plots). Compare
the result of the gradient descent method with the true value of $c$
used to simulate the data. Inspect the plots and adapt $\epsilon$
and the threshold to make the algorithm behave as intended. Also
plot the data together with the best fitting cubic relation
\eqref{cubicfunc}.
\end{exercise}
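The two plots of the iterated parameter values and cost values could,
for example, be generated along the following lines. The initial
value, $\epsilon$, and the threshold passed to
\varcode{gradientDescentCubic()} are placeholders that need to be
adapted as described in the exercise.
\begin{lstlisting}
[cest, cs, mses] = gradientDescentCubic(x, y, 2.0, 0.0001, 0.1);
subplot(1, 2, 1);       % parameter values over iterations
plot(cs, '-o');
xlabel('iteration');
ylabel('parameter c');
subplot(1, 2, 2);       % mean squared error over iterations
plot(mses, '-o');
xlabel('iteration');
ylabel('mean squared error');
\end{lstlisting}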
\begin{ibox}[tp]{\label{partialderivativebox}Partial derivative and gradient}
Some functions depend on more than a single variable: