[regression] gradient descent code

Jan Benda 2020-12-19 12:20:18 +01:00
parent 4b624fe981
commit 7518f9dd47
3 changed files with 99 additions and 29 deletions


@ -0,0 +1,25 @@
function [c, cs, mses] = gradientDescentCubic(x, y, c0, epsilon, threshold)
% Gradient descent for fitting a cubic relation.
%
% Arguments: x, vector of the x-data values.
%            y, vector of the corresponding y-data values.
%            c0, initial value for the parameter c.
%            epsilon, factor multiplying the gradient in each iteration step.
%            threshold, minimum absolute value of the gradient;
%                       the iteration stops once the gradient falls below it.
%
% Returns: c, the final value of the parameter c.
%          cs, vector with all the c-values traversed.
%          mses, vector with the corresponding mean squared errors.
  c = c0;
  gradient = 1000.0;  % some large value to make sure the loop is entered
  cs = [];
  mses = [];
  count = 1;
  while abs(gradient) > threshold
    cs(count) = c;                                 % store the current parameter value
    mses(count) = meanSquaredErrorCubic(x, y, c);  % ... and the corresponding cost
    gradient = meanSquaredGradientCubic(x, y, c);  % gradient at the current position
    c = c - epsilon * gradient;                    % step against the gradient
    count = count + 1;
  end
end
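The two helper functions called in the loop, meanSquaredErrorCubic() and meanSquaredGradientCubic(), are defined elsewhere in the repository and are not part of this commit. A minimal sketch of how they could look, assuming the cubic relation y = c*x.^3 and a finite-difference approximation of the gradient (the actual implementations may differ; each function would go in its own m-file):

function mse = meanSquaredErrorCubic(x, y, c)
% Mean squared error between the data y and the cubic prediction c*x.^3.
  mse = mean((y - c * x.^3).^2);
end

function dmsedc = meanSquaredGradientCubic(x, y, c)
% Derivative of the mean squared error with respect to the parameter c,
% approximated by a forward finite difference.
  h = 1e-7;  % small step in c for the finite difference
  dmsedc = (meanSquaredErrorCubic(x, y, c + h) - meanSquaredErrorCubic(x, y, c)) / h;
end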


@ -0,0 +1,29 @@
meansquarederrorline % generate data (defines x, y, and the true parameter value c)
c0 = 2.0;
eps = 0.0001;
thresh = 0.1;
[cest, cs, mses] = gradientDescentCubic(x, y, c0, eps, thresh);
subplot(2, 2, 1); % top left panel
hold on;
plot(cs, '-o');
plot([1, length(cs)], [c, c], 'k'); % line indicating true c value
hold off;
xlabel('Iteration');
ylabel('C');
subplot(2, 2, 3); % bottom left panel
plot(mses, '-o');
xlabel('Iteration steps');
ylabel('MSE');
subplot(1, 2, 2); % right panel
hold on;
% generate x-values for plotting the fit:
xx = min(x):0.01:max(x);
yy = cest * xx.^3;
plot(xx, yy, 'displayname', 'fit');
plot(x, y, 'o', 'displayname', 'data'); % plot original data
xlabel('Size [m]');
ylabel('Weight [kg]');
legend('location', 'northwest');
pause
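The script relies on meansquarederrorline to define the vectors x and y and the true parameter value c. A hypothetical stand-in for that script, useful for trying out the plots when it is not available; the true c, the x-range, and the noise level are assumptions, not values taken from the original script:

% hypothetical stand-in for the meansquarederrorline script:
n = 100;                            % number of simulated data points
c = 6.0;                            % assumed true parameter value
x = 2.0 + 2.0 * rand(n, 1);         % simulated x-values, assumed range
y = c * x.^3 + 20.0 * randn(n, 1);  % cubic relation plus additive noise, assumed level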


@ -269,9 +269,21 @@ the hill, we choose the opposite direction.
\end{exercise}
\section{Gradient descent}
\begin{figure}[t]
\includegraphics{cubicmse}
\titlecaption{Gradient descent.}{The algorithm starts at an
arbitrary position. At each point the gradient is estimated and
the position is updated as long as the length of the gradient is
sufficiently large. The dots show the positions after each
iteration of the algorithm.} \label{gradientdescentcubicfig}
\end{figure}
Finally, we are able to implement the optimization itself. By now it
should be obvious why it is called the gradient descent method. From a
starting position on we iteratively walk down the slope of the cost
function against its gradient. All ingredients necessary for this
algorithm are already there. We need: (i) the cost function
(\varcode{meanSquaredErrorCubic()}), and (ii) the gradient
(\varcode{meanSquaredGradientCubic()}). The algorithm of the gradient
descent works as follows:
@ -292,41 +304,45 @@ descent works as follows:
\item \label{gradientstep} If the length of the gradient exceeds the
threshold we take a small step in the opposite direction (a short
numerical example follows this list):
\begin{equation}
\label{gradientdescent}
p_{i+1} = p_i - \epsilon \cdot \nabla f_{cost}(p_i)
\end{equation}
where $\epsilon$ is a factor linking the gradient to
appropriate steps in the parameter space.
\item Repeat steps \ref{computegradient} -- \ref{gradientstep}.
\end{enumerate}
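For example, with $\epsilon = 0.0001$ a gradient of
$\nabla f_{cost}(p_i) = 20$ moves the parameter by
$-\epsilon \cdot 20 = -0.002$, whereas a gradient of $-20$ moves it by
$+0.002$; in both cases this is a small step downhill.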
\Figref{gradientdescentcubicfig} illustrates the gradient descent --- the
path the imaginary ball has chosen to reach the minimum. We walk along
the parameter axis against the gradient as long as the gradient
differs sufficiently from zero. At steep slopes we take large steps
(the distance between the red dots in \figref{gradientdescentcubicfig}
is large).
\begin{exercise}{gradientDescentCubic.m}{}
Implement the gradient descent algorithm for the problem of fitting
a cubic function \eqref{cubicfunc} to some measured data pairs $x$
and $y$ as a function \varcode{gradientDescentCubic()} that returns
the estimated best fitting parameter value $c$ as well as two
vectors with all the parameter values and the corresponding values
of the cost function that the algorithm iterated through. As
additional arguments, the function takes the initial value for the
parameter $c$, the factor $\epsilon$ connecting the gradient with
iteration steps in \eqnref{gradientdescent}, and the threshold value
for the absolute value of the gradient below which the algorithm terminates.
\end{exercise}
\begin{exercise}{plotgradientdescentcubic.m}{}
Use the function \varcode{gradientDescentCubic()} to fit the
simulated data from exercise~\ref{mseexercise}. Plot the returned
values of the parameter $c$ as well as the corresponding mean
squared errors as a function of iteration step (two plots). Compare
the result of the gradient descent method with the true value of $c$
used to simulate the data. Inspect the plots and adapt $\epsilon$
and the threshold to make the algorithm behave as intended. Also
plot the data together with the best fitting cubic relation
\eqref{cubicfunc}.
\end{exercise}
\begin{ibox}[tp]{\label{partialderivativebox}Partial derivative and gradient}
Some functions that depend on more than a single variable: