at the position of the ball.
\includegraphics{cubicgradient}
\titlecaption{Derivative of the cost function.}{The gradient, the
derivative \eqref{costderivative} of the cost function, is
negative to the left of the minimum (vertical line) of the cost
function, zero (horizontal line) at, and positive to the right of
the minimum (left). For each value of the parameter $c$ the
negative gradient (arrows) points towards the minimum of the cost
function (right).} \label{gradientcubicfig}
\end{figure}

In our one-dimensional example of a single free parameter the slope is
the derivative \eqref{costderivative} of the cost function with
respect to the parameter $c$. This derivative can be approximated
numerically by the difference quotient
\begin{equation}
\lim\limits_{\Delta c \to 0} \frac{f_{cost}(c + \Delta c) - f_{cost}(c)}{\Delta c}
\approx \frac{f_{cost}(c + \Delta c) - f_{cost}(c)}{\Delta c}
\end{equation}
Choose, for example, $\Delta c = 10^{-7}$. The derivative is positive
for positive slopes. Since we want to go down the hill, we choose the
opposite direction (\figref{gradientcubicfig}, right).
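
For a single parameter this numerical estimate takes only a few lines
of code. The following sketch assumes data vectors \varcode{x} and
\varcode{y}, a parameter value \varcode{c}, and a function
\varcode{meanSquaredErrorCubic(x, y, c)} returning the mean squared
error of the cubic fit; these names and values are only placeholders.
\begin{lstlisting}
dc = 1e-7;                                % small but finite step
f1 = meanSquaredErrorCubic(x, y, c);      % cost at c (assumed helper function)
f2 = meanSquaredErrorCubic(x, y, c + dc); % cost at c + dc
dfdc = (f2 - f1)/dc;                      % difference quotient approximating the derivative
\end{lstlisting}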
\begin{exercise}{meanSquaredGradientCubic.m}{}\label{gradientcubic}
Implement a function \varcode{meanSquaredGradientCubic()}, that

The $\epsilon$ parameter in \eqnref{gradientdescent} is critical. If
too large, the algorithm does not converge to the minimum of the cost
function (try it!). At medium values it oscillates around the minimum
but might nevertheless converge. Only for sufficiently small values
(here $\epsilon = 0.00001$) does the algorithm follow the slope
downwards towards the minimum.
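
Written out as a minimal loop, the gradient descent update with the
$\epsilon$ factor could look like the following sketch. The signature
of \varcode{meanSquaredGradientCubic()} from
exercise~\ref{gradientcubic}, the starting value, and the threshold
are assumptions for illustration only.
\begin{lstlisting}
epsilon = 0.00001;             % small learning rate
c = 2.0;                       % some starting value for the parameter
gradient = meanSquaredGradientCubic(x, y, c);  % assumed signature
while abs(gradient) > 0.1      % threshold, see the following paragraph
    c = c - epsilon*gradient;  % one step against the gradient
    gradient = meanSquaredGradientCubic(x, y, c);
end
\end{lstlisting}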
The terminating condition on the absolute value of the gradient also
needs some thought. If the threshold is chosen too small, the gradient
descent needs many more iterations for only a tiny improvement of the
fit that usually does not justify
the increased computational effort. Have a look at the derivatives
that we plotted in exercise~\ref{gradientcubic} and decide on a
sensible value for the threshold. Run the gradient descent algorithm
and check how the resulting $c$ parameter values converge and how many
iterations were needed. Then reduce the threshold (by factors of ten)
and check how this changes the results.
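
One way to do this systematically is to rerun the descent for a range
of thresholds and count the iterations, as in the following sketch
(starting value, \varcode{epsilon}, and the assumed
\varcode{meanSquaredGradientCubic()} are the same placeholders as in
the sketch above).
\begin{lstlisting}
for threshold = [1.0, 0.1, 0.01, 0.001]  % decreasing by factors of ten
    c = 2.0;                             % restart from the same value
    niter = 0;                           % count the iterations
    gradient = meanSquaredGradientCubic(x, y, c);
    while abs(gradient) > threshold
        c = c - epsilon*gradient;
        gradient = meanSquaredGradientCubic(x, y, c);
        niter = niter + 1;
    end
    fprintf('threshold %6.3f: c = %.4f after %d iterations\n', threshold, c, niter)
end
\end{lstlisting}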
Many modern algorithms for finding the minimum of a function are based

The absolute value of the gradient is given by the square root of
the sum of the squared partial derivatives:
\begin{equation}
\label{ndimabsgradient}
|\nabla f_{cost}(\vec p)| = \sqrt{\sum_{i=1}^n \left(\frac{\partial f_{cost}(\vec p)}{\partial p_i}\right)^2}
\end{equation}
The \code{norm()} function implements this, given a vector with the
partial derivatives.
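
For example, for a gradient with two made-up partial derivatives:
\begin{lstlisting}
gradient = [-0.8; 2.4];  % hypothetical partial derivatives
norm(gradient)           % same as sqrt(sum(gradient.^2))
\end{lstlisting}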
\subsection{Passing a function as an argument to another function}

our tiger data-set (\figref{powergradientdescentfig}):
parameters against each other. Compare the result of the gradient
descent method with the true values of $c$ and $a$ used to simulate
the data. Observe the norm of the gradient and inspect the plots to
adapt $\epsilon$ and the threshold if necessary. Finally plot the
data together with the best fitting power-law \eqref{powerfunc}.
\end{exercise}
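
A possible skeleton for this exercise is sketched below. It assumes
data vectors \varcode{x} and \varcode{y} with the sizes and weights of
the tigers and a function \varcode{meanSquaredGradientPower(x, y, p)}
returning the two partial derivatives of the mean squared error of the
power law \eqref{powerfunc} at $\vec p = (c, a)$ as a column vector;
all names and numbers are placeholders to be adapted.
\begin{lstlisting}
p = [2.0; 1.0];                % starting values for c and a
epsilon = 0.00001;             % learning rate, adapt as needed
gradient = meanSquaredGradientPower(x, y, p);  % assumed helper function
while norm(gradient) > 1.0     % threshold on the gradient's length
    p = p - epsilon*gradient;  % gradient descent step
    gradient = meanSquaredGradientPower(x, y, p);
end
% plot the data together with the best fitting power law c*x^a:
xx = linspace(min(x), max(x), 200);
plot(x, y, 'o', xx, p(1)*xx.^p(2), '-')
\end{lstlisting}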