[regression] smaller steps for derivative

Jan Benda 2020-12-20 13:15:40 +01:00
parent dea6319e75
commit 17bf940101
6 changed files with 21 additions and 22 deletions

View File

@@ -30,7 +30,7 @@ end
function gradmse = meanSquaredGradient(x, y, func, p)
gradmse = zeros(size(p, 1), size(p, 2));
h = 1e-5; % stepsize for derivatives
h = 1e-7; % stepsize for derivatives
mse = meanSquaredError(x, y, func, p);
for i = 1:length(p) % for each coordinate ...
pi = p;
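The rest of the loop body lies outside this hunk. As a rough, illustrative sketch (not the file's actual code) of how such a forward-difference gradient is typically completed, using only names that appear above:

for i = 1:length(p)                  % for each coordinate ...
    pi = p;
    pi(i) = pi(i) + h;               % ... step that coordinate by h ...
    msepi = meanSquaredError(x, y, func, pi);
    gradmse(i) = (msepi - mse)/h;    % ... and form the difference quotient
end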

View File

@@ -7,7 +7,7 @@ function dmsedc = meanSquaredGradientCubic(x, y, c)
%
% Returns: the derivative of the mean squared error at c.
h = 1e-5; % stepsize for derivatives
h = 1e-7; % stepsize for derivatives
mse = meanSquaredErrorCubic(x, y, c);
mseh = meanSquaredErrorCubic(x, y, c+h);
dmsedc = (mseh - mse)/h;

View File

@@ -1,3 +1,5 @@
meansquarederrorline; % generate data
cs = 2.0:0.1:8.0;
mseg = zeros(size(cs));
for i = 1:length(cs)

View File

@@ -1,8 +1,8 @@
meansquarederrorline; % generate data
c0 = 2.0;
eps = 0.0001;
thresh = 0.1;
eps = 0.00001;
thresh = 1.0;
[cest, cs, mses] = gradientDescentCubic(x, y, c0, eps, thresh);
subplot(2, 2, 1); % top left panel

View File

@@ -2,7 +2,7 @@ meansquarederrorline; % generate data
p0 = [2.0, 1.0];
eps = 0.00001;
thresh = 50.0;
thresh = 1.0;
[pest, ps, mses] = gradientDescent(x, y, @powerLaw, p0, eps, thresh);
pest
@@ -28,5 +28,3 @@ plot(x, y, 'o'); % plot original data
xlabel('Size [m]');
ylabel('Weight [kg]');
legend('fit', 'data', 'location', 'northwest');
pause

View File

@@ -240,11 +240,11 @@ at the position of the ball.
\includegraphics{cubicgradient}
\titlecaption{Derivative of the cost function.}{The gradient, the
derivative \eqref{costderivative} of the cost function, is
negative to the left of the minimum of the cost function, zero at,
and positive to the right of the minimum (left). For each value of
the parameter $c$ the negative gradient (arrows) points towards
the minimum of the cost function
(right).} \label{gradientcubicfig}
negative to the left of the minimum (vertical line) of the cost
function, zero (horizontal line) at, and positive to the right of
the minimum (left). For each value of the parameter $c$ the
negative gradient (arrows) points towards the minimum of the cost
function (right).} \label{gradientcubicfig}
\end{figure}
In our one-dimensional example of a single free parameter the slope is
@@ -263,9 +263,9 @@ can be approximated numerically by the difference quotient
\lim\limits_{\Delta c \to 0} \frac{f_{cost}(c + \Delta c) - f_{cost}(c)}{\Delta c}
\approx \frac{f_{cost}(c + \Delta c) - f_{cost}(c)}{\Delta c}
\end{equation}
The derivative is positive for positive slopes. Since we want to go down
the hill, we choose the opposite direction (\figref{gradientcubicfig},
right).
Choose, for example, $\Delta c = 10^{-7}$. The derivative is positive
for positive slopes. Since we want to go down the hill, we choose the
opposite direction (\figref{gradientcubicfig}, right).
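To see why a smaller step like $\Delta c = 10^{-7}$ helps, one can compare the forward difference against a function with a known derivative. A minimal, illustrative sketch (the function and numbers are made up for this example and are not part of the chapter):

f = @(x) x.^3;                        % example function, derivative 3*x^2
x0 = 2.0;
dtrue = 3.0*x0^2;                     % analytic derivative at x0: 12
for h = [1e-5, 1e-7]
    dapprox = (f(x0 + h) - f(x0))/h;  % forward difference quotient
    fprintf('h = %g: error = %g\n', h, abs(dapprox - dtrue));
end

Down to roughly the square root of the floating-point precision, the smaller step size yields the smaller approximation error, which is presumably the motivation for this commit.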
\begin{exercise}{meanSquaredGradientCubic.m}{}\label{gradientcubic}
Implement a function \varcode{meanSquaredGradientCubic()}, that
@@ -361,7 +361,7 @@ The $\epsilon$ parameter in \eqnref{gradientdescent} is critical. If
too large, the algorithm does not converge to the minimum of the cost
function (try it!). At medium values it oscillates around the minimum
but might nevertheless converge. Only for sufficiently small values
(here $\epsilon = 0.0001$) does the algorithm follow the slope
(here $\epsilon = 0.00001$) does the algorithm follow the slope
downwards towards the minimum.
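For reference, the role of $\epsilon$ and the threshold can be seen in a stripped-down sketch of the descent loop (illustrative only; it assumes the data x, y and meanSquaredGradientCubic() from above, not the actual gradientDescentCubic() implementation):

c = c0;                                        % start value of the parameter
gradient = meanSquaredGradientCubic(x, y, c);
while abs(gradient) > thresh                   % terminate once the slope is flat enough
    c = c - eps*gradient;                      % step downhill, scaled by eps
    gradient = meanSquaredGradientCubic(x, y, c);
end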
The terminating condition on the absolute value of the gradient
@@ -373,7 +373,7 @@ to the increased computational effort. Have a look at the derivatives
that we plotted in exercise~\ref{gradientcubic} and decide on a
sensible value for the threshold. Run the gradient descent algorithm
and check how the resulting $c$ parameter values converge and how many
iterations were needed. The reduce the threshold (by factors of ten)
iterations were needed. Then reduce the threshold (by factors of ten)
and check how this changes the results.
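One way to follow this suggestion is a small sweep over thresholds (illustrative sketch; it assumes the data and functions from the scripts above, and that the returned cs holds the visited parameter values):

for thresh = [10.0, 1.0, 0.1]                  % reduce by factors of ten
    [cest, cs, mses] = gradientDescentCubic(x, y, c0, eps, thresh);
    fprintf('thresh = %g: c = %.4f, %d iterations\n', thresh, cest, length(cs));
end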
Many modern algorithms for finding the minimum of a function are based
@@ -508,7 +508,8 @@ the sum of the squared partial derivatives:
\label{ndimabsgradient}
|\nabla f_{cost}(\vec p_i)| = \sqrt{\sum_{j=1}^n \left(\frac{\partial f_{cost}(\vec p)}{\partial p_j}\right)^2}
\end{equation}
The \code{norm()} function implements this.
The \code{norm()} function implements this given a vector with the
partial derivatives.
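For example (illustrative values):

g = [0.3, -0.4];        % a vector of partial derivatives
norm(g)                 % 0.5
sqrt(sum(g.^2))         % the same value, written out as in the equation above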
\subsection{Passing a function as an argument to another function}
@@ -568,10 +569,8 @@ our tiger data-set (\figref{powergradientdescentfig}):
parameters against each other. Compare the result of the gradient
descent method with the true values of $c$ and $a$ used to simulate
the data. Observe the norm of the gradient and inspect the plots to
adapt $\epsilon$ (smaller than in
exercise~\ref{plotgradientdescentexercise}) and the threshold (much
larger) appropriately. Finally plot the data together with the best
fitting power-law \eqref{powerfunc}.
adapt $\epsilon$ and the threshold if necessary. Finally plot the
data together with the best fitting power-law \eqref{powerfunc}.
\end{exercise}