[regression] smaller steps for derivative

Jan Benda 2020-12-20 13:15:40 +01:00
parent dea6319e75
commit 17bf940101
6 changed files with 21 additions and 22 deletions

View File

@@ -30,7 +30,7 @@ end
 function gradmse = meanSquaredGradient(x, y, func, p)
 gradmse = zeros(size(p, 1), size(p, 2));
-h = 1e-5; % stepsize for derivatives
+h = 1e-7; % stepsize for derivatives
 mse = meanSquaredError(x, y, func, p);
 for i = 1:length(p) % for each coordinate ...
 pi = p;

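The change above shrinks the forward-difference stepsize `h` from `1e-5` to `1e-7`. A minimal Python sketch of the same per-coordinate forward difference that `meanSquaredGradient()` implements (the quadratic cost here is a hypothetical stand-in, not the repository's `meanSquaredError()`):

```python
# Forward-difference gradient: perturb each coordinate of p by h
# and take the difference quotient, as meanSquaredGradient() does.
def grad(cost, p, h=1e-7):
    base = cost(p)
    g = []
    for i in range(len(p)):
        pi = list(p)
        pi[i] += h            # step in coordinate i only
        g.append((cost(pi) - base) / h)
    return g

# Hypothetical cost: sum of squares, so the exact gradient is 2*p.
cost = lambda p: sum(x * x for x in p)
g = grad(cost, [3.0, -1.0])   # exact gradient: [6.0, -2.0]
```

With `h = 1e-7` the truncation error of the one-sided quotient is of order `h`, while round-off error stays small because the cost values still differ well above machine precision.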
View File

@@ -7,7 +7,7 @@ function dmsedc = meanSquaredGradientCubic(x, y, c)
 %
 % Returns: the derivative of the mean squared error at c.
-h = 1e-5; % stepsize for derivatives
+h = 1e-7; % stepsize for derivatives
 mse = meanSquaredErrorCubic(x, y, c);
 mseh = meanSquaredErrorCubic(x, y, c+h);
 dmsedc = (mseh - mse)/h;

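`meanSquaredGradientCubic()` approximates the derivative with a single one-sided difference quotient. The same quotient in a short Python sketch, with a hypothetical cost $f(c)=c^3$ standing in for `meanSquaredErrorCubic()`:

```python
# One-sided difference quotient, as in meanSquaredGradientCubic():
# dmsedc = (mse(c + h) - mse(c)) / h with stepsize h = 1e-7.
def dfdc(f, c, h=1e-7):
    return (f(c + h) - f(c)) / h

f = lambda c: c ** 3   # hypothetical cost; exact derivative is 3*c**2
d = dfdc(f, 2.0)       # exact value at c = 2 is 12
```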
View File

@@ -1,3 +1,5 @@
+meansquarederrorline; % generate data
+
 cs = 2.0:0.1:8.0;
 mseg = zeros(length(cs));
 for i = 1:length(cs)

View File

@@ -1,8 +1,8 @@
 meansquarederrorline; % generate data
 c0 = 2.0;
-eps = 0.0001;
-thresh = 0.1;
+eps = 0.00001;
+thresh = 1.0;
 [cest, cs, mses] = gradientDescentCubic(x, y, c0, eps, thresh);
 subplot(2, 2, 1); % top left panel

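The updated script calls `gradientDescentCubic()` with a smaller `eps` and a larger `thresh`. A hedged Python sketch of such a descent loop (the cubic fit is replaced by a hypothetical one-dimensional cost $f(c)=(c-4)^2$, so this illustrates the roles of `eps` and `thresh`, not the repository's implementation):

```python
# Gradient descent on a hypothetical 1-D cost f(c) = (c - 4)^2,
# stopping when the absolute gradient falls below thresh -- the
# same roles eps and thresh play for gradientDescentCubic().
def descend(c0, eps, thresh, h=1e-7):
    cost = lambda c: (c - 4.0) ** 2
    c = c0
    for _ in range(100000):
        dfdc = (cost(c + h) - cost(c)) / h   # forward difference
        if abs(dfdc) < thresh:               # terminating condition
            break
        c -= eps * dfdc                      # step against the gradient
    return c

cest = descend(c0=2.0, eps=0.00001, thresh=1.0)
```

With `thresh = 1.0` the loop stops where the slope $2(c-4)$ first drops below one in magnitude, i.e. near $c = 3.5$; tightening the threshold moves the estimate closer to the minimum at the cost of more iterations.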
View File

@@ -2,7 +2,7 @@ meansquarederrorline; % generate data
 p0 = [2.0, 1.0];
 eps = 0.00001;
-thresh = 50.0;
+thresh = 1.0;
 [pest, ps, mses] = gradientDescent(x, y, @powerLaw, p0, eps, thresh);
 pest
@@ -28,5 +28,3 @@ plot(x, y, 'o'); % plot original data
 xlabel('Size [m]');
 ylabel('Weight [kg]');
 legend('fit', 'data', 'location', 'northwest');
-
-pause

View File

@@ -240,11 +240,11 @@ at the position of the ball.
 \includegraphics{cubicgradient}
 \titlecaption{Derivative of the cost function.}{The gradient, the
 derivative \eqref{costderivative} of the cost function, is
-negative to the left of the minimum of the cost function, zero at,
-and positive to the right of the minimum (left). For each value of
-the parameter $c$ the negative gradient (arrows) points towards
-the minimum of the cost function
-(right).} \label{gradientcubicfig}
+negative to the left of the minimum (vertical line) of the cost
+function, zero (horizontal line) at, and positive to the right of
+the minimum (left). For each value of the parameter $c$ the
+negative gradient (arrows) points towards the minimum of the cost
+function (right).} \label{gradientcubicfig}
 \end{figure}

 In our one-dimensional example of a single free parameter the slope is
@@ -263,9 +263,9 @@ can be approximated numerically by the difference quotient
 \lim\limits_{\Delta c \to 0} \frac{f_{cost}(c + \Delta c) - f_{cost}(c)}{\Delta c}
 \approx \frac{f_{cost}(c + \Delta c) - f_{cost}(c)}{\Delta c}
 \end{equation}
-The derivative is positive for positive slopes. Since want to go down
-the hill, we choose the opposite direction (\figref{gradientcubicfig},
-right).
+Choose, for example, $\Delta c = 10^{-7}$. The derivative is positive
+for positive slopes. Since we want to go down the hill, we choose the
+opposite direction (\figref{gradientcubicfig}, right).

 \begin{exercise}{meanSquaredGradientCubic.m}{}\label{gradientcubic}
 Implement a function \varcode{meanSquaredGradientCubic()}, that
@@ -361,7 +361,7 @@ The $\epsilon$ parameter in \eqnref{gradientdescent} is critical. If
 too large, the algorithm does not converge to the minimum of the cost
 function (try it!). At medium values it oscillates around the minimum
 but might nevertheless converge. Only for sufficiently small values
-(here $\epsilon = 0.0001$) does the algorithm follow the slope
+(here $\epsilon = 0.00001$) does the algorithm follow the slope
 downwards towards the minimum.

 The terminating condition on the absolute value of the gradient
@@ -373,7 +373,7 @@ to the increased computational effort. Have a look at the derivatives
 that we plotted in exercise~\ref{gradientcubic} and decide on a
 sensible value for the threshold. Run the gradient descent algorithm
 and check how the resulting $c$ parameter values converge and how many
-iterations were needed. The reduce the threshold (by factors of ten)
+iterations were needed. Then reduce the threshold (by factors of ten)
 and check how this changes the results.

 Many modern algorithms for finding the minimum of a function are based
@@ -508,7 +508,8 @@ the sum of the squared partial derivatives:
 \label{ndimabsgradient}
 |\nabla f_{cost}(\vec p_i)| = \sqrt{\sum_{i=1}^n \left(\frac{\partial f_{cost}(\vec p)}{\partial p_i}\right)^2}
 \end{equation}
-The \code{norm()} function implements this.
+The \code{norm()} function implements this given a vector with the
+partial derivatives.

 \subsection{Passing a function as an argument to another function}
@@ -568,10 +569,8 @@ our tiger data-set (\figref{powergradientdescentfig}):
 parameters against each other. Compare the result of the gradient
 descent method with the true values of $c$ and $a$ used to simulate
 the data. Observe the norm of the gradient and inspect the plots to
-adapt $\epsilon$ (smaller than in
-exercise~\ref{plotgradientdescentexercise}) and the threshold (much
-larger) appropriately. Finally plot the data together with the best
-fitting power-law \eqref{powerfunc}.
+adapt $\epsilon$ and the threshold if necessary. Finally plot the
+data together with the best fitting power-law \eqref{powerfunc}.
 \end{exercise}
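The second-to-last hunk clarifies that MATLAB's `norm()` gets the vector of partial derivatives. A small Python sketch of that Euclidean norm, the square root of the sum of squared partial derivatives:

```python
import math

# Euclidean norm of a gradient vector: sqrt of the sum of squared
# partial derivatives (what MATLAB's norm() returns for a vector).
def gradient_norm(grad):
    return math.sqrt(sum(g * g for g in grad))

n = gradient_norm([3.0, 4.0])  # 3-4-5 triangle: norm is 5.0
```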