[regression] gradient descent code
This commit is contained in:
parent 4b624fe981
commit 7518f9dd47

25	regression/code/gradientDescentCubic.m	(new file)
@@ -0,0 +1,25 @@
function [c, cs, mses] = gradientDescentCubic(x, y, c0, epsilon, threshold)
% Gradient descent for fitting a cubic relation.
%
% Arguments: x, vector of the x-data values.
%            y, vector of the corresponding y-data values.
%            c0, initial value for the parameter c.
%            epsilon: factor multiplying the gradient.
%            threshold: minimum absolute value of the gradient at which the iteration stops.
%
% Returns: c, the final value of the c-parameter.
%          cs: vector with all the c-values traversed.
%          mses: vector with the corresponding mean squared errors.
  c = c0;
  gradient = 1000.0;  % some large value, so that the loop is entered at least once
  cs = [];
  mses = [];
  count = 1;
  while abs(gradient) > threshold
    cs(count) = c;
    mses(count) = meanSquaredErrorCubic(x, y, c);
    gradient = meanSquaredGradientCubic(x, y, c);
    c = c - epsilon * gradient;
    count = count + 1;
  end
end
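
The function calls meanSquaredErrorCubic() and meanSquaredGradientCubic(), the two ingredients named in the text below, which are not part of this commit. A minimal sketch of what these helpers could look like, assuming the cubic relation y = c*x.^3 used by the plotting script below (the bodies here are illustrative, not the repository's versions; in MATLAB/Octave each function would typically live in its own m-file):

function mse = meanSquaredErrorCubic(x, y, c)
% Mean squared error between the data y and the cubic predictions c*x.^3 (illustrative sketch).
  mse = mean((y - c * x.^3).^2);
end

function dmsedc = meanSquaredGradientCubic(x, y, c)
% Derivative of the mean squared error with respect to the parameter c (illustrative sketch).
  dmsedc = mean(-2.0 * (y - c * x.^3) .* x.^3);
end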

29	regression/code/plotgradientdescentcubic.m	(new file)
@@ -0,0 +1,29 @@
meansquarederrorline  % generate data

c0 = 2.0;
eps = 0.0001;
thresh = 0.1;
[cest, cs, mses] = gradientDescentCubic(x, y, c0, eps, thresh);

subplot(2, 2, 1);  % top left panel
hold on;
plot(cs, '-o');
plot([1, length(cs)], [c, c], 'k');  % line indicating true c value
hold off;
xlabel('Iteration');
ylabel('C');
subplot(2, 2, 3);  % bottom left panel
plot(mses, '-o');
xlabel('Iteration steps');
ylabel('MSE');
subplot(1, 2, 2);  % right panel
hold on;
% generate x-values for plotting the fit:
xx = min(x):0.01:max(x);
yy = cest * xx.^3;
plot(xx, yy, 'displayname', 'fit');
plot(x, y, 'o', 'displayname', 'data');  % plot original data
xlabel('Size [m]');
ylabel('Weight [kg]');
legend("location", "northwest");
pause
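
The first line runs the script meansquarederrorline, which is not part of this commit; it is expected to define the data vectors x and y and the true parameter value c that is drawn as the reference line in the top-left panel. A rough sketch of what such a data-generation script might contain, assuming sizes and weights simulated from the cubic relation with additive noise (all numbers are made up for illustration and are not the repository's values):

% meansquarederrorline -- simulate data for the cubic fit (illustrative sketch)
c = 6.0;                              % true parameter value of the cubic relation
x = 1.0 + rand(100, 1);               % simulated sizes
y = c * x.^3 + 2.0 * randn(100, 1);   % simulated weights: cubic relation plus noise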

@@ -269,9 +269,21 @@ the hill, we choose the opposite direction.
 \end{exercise}
 
 \section{Gradient descent}
 
+\begin{figure}[t]
+\includegraphics{cubicmse}
+\titlecaption{Gradient descent.}{The algorithm starts at an
+arbitrary position. At each point the gradient is estimated and
+the position is updated as long as the length of the gradient is
+sufficiently large. The dots show the positions after each
+iteration of the algorithm.} \label{gradientdescentcubicfig}
+\end{figure}
+
 Finally, we are able to implement the optimization itself. By now it
-should be obvious why it is called the gradient descent method. All
-ingredients are already there. We need: (i) the cost function
+should be obvious why it is called the gradient descent method. From a
+starting position we iteratively walk down the slope of the cost
+function against its gradient. All ingredients necessary for this
+algorithm are already there. We need: (i) the cost function
 (\varcode{meanSquaredErrorCubic()}), and (ii) the gradient
 (\varcode{meanSquaredGradientCubic()}). The algorithm of the gradient
 descent works as follows:
@@ -292,41 +304,45 @@ descent works as follows:
 \item \label{gradientstep} If the length of the gradient exceeds the
 threshold we take a small step into the opposite direction:
 \begin{equation}
+\label{gradientdescent}
 p_{i+1} = p_i - \epsilon \cdot \nabla f_{cost}(p_i)
 \end{equation}
-where $\epsilon = 0.01$ is a factor linking the gradient to
+where $\epsilon$ is a factor linking the gradient to
 appropriate steps in the parameter space.
 \item Repeat steps \ref{computegradient} -- \ref{gradientstep}.
 \end{enumerate}
 
-\Figref{gradientdescentfig} illustrates the gradient descent --- the
-path the imaginary ball has chosen to reach the minimum. Starting at
-an arbitrary position we change the position as long as the gradient
-at that position is larger than a certain threshold. If the slope is
-very steep, the change in the position (the distance between the red
-dots in \figref{gradientdescentfig}) is large.
+\Figref{gradientdescentcubicfig} illustrates the gradient descent --- the
+path the imaginary ball has chosen to reach the minimum. We walk along
+the parameter axis against the gradient as long as it differs
+sufficiently from zero. At steep slopes we take large steps
+(the distance between the red dots in \figref{gradientdescentcubicfig}
+is large).
 
-\begin{figure}[t]
-\includegraphics{cubicmse}
-\titlecaption{Gradient descent.}{The algorithm starts at an
-arbitrary position. At each point the gradient is estimated and
-the position is updated as long as the length of the gradient is
-sufficiently large.The dots show the positions after each
-iteration of the algorithm.} \label{gradientdescentfig}
-\end{figure}
-
-\begin{exercise}{gradientDescent.m}{}
-Implement the gradient descent for the problem of fitting a straight
-line to some measured data. Reuse the data generated in
-exercise~\ref{errorsurfaceexercise}.
-\begin{enumerate}
-\item Store for each iteration the error value.
-\item Plot the error values as a function of the iterations, the
-number of optimization steps.
-\item Plot the measured data together with the best fitting straight line.
-\end{enumerate}\vspace{-4.5ex}
+\begin{exercise}{gradientDescentCubic.m}{}
+Implement the gradient descent algorithm for the problem of fitting
+a cubic function \eqref{cubicfunc} to some measured data pairs $x$
+and $y$ as a function \varcode{gradientDescentCubic()} that returns
+the estimated best fitting parameter value $c$ as well as two
+vectors with all the parameter values and the corresponding values
+of the cost function that the algorithm iterated through. As
+additional arguments the function takes the initial value for the
+parameter $c$, the factor $\epsilon$ connecting the gradient with
+iteration steps in \eqnref{gradientdescent}, and the threshold value
+for the absolute value of the gradient terminating the algorithm.
 \end{exercise}
 
+\begin{exercise}{plotgradientdescentcubic.m}{}
+Use the function \varcode{gradientDescentCubic()} to fit the
+simulated data from exercise~\ref{mseexercise}. Plot the returned
+values of the parameter $c$ as well as the corresponding mean
+squared errors as a function of iteration step (two plots). Compare
+the result of the gradient descent method with the true value of $c$
+used to simulate the data. Inspect the plots and adapt $\epsilon$
+and the threshold to make the algorithm behave as intended. Also
+plot the data together with the best fitting cubic relation
+\eqref{cubicfunc}.
+\end{exercise}
+
 \begin{ibox}[tp]{\label{partialderivativebox}Partial derivative and gradient}
 Some functions that depend on more than a single variable: