[regression] gradient descent code

Jan Benda 2020-12-19 12:20:18 +01:00
parent 4b624fe981
commit 7518f9dd47
3 changed files with 99 additions and 29 deletions


@@ -0,0 +1,25 @@
function [c, cs, mses] = gradientDescentCubic(x, y, c0, epsilon, threshold)
% Gradient descent for fitting a cubic relation.
%
% Arguments: x: vector of the x-data values.
%            y: vector of the corresponding y-data values.
%            c0: initial value for the parameter c.
%            epsilon: factor multiplying the gradient.
%            threshold: the iteration stops as soon as the absolute
%                       value of the gradient falls below this value.
%
% Returns: c: the final value of the c-parameter.
%          cs: vector with all the c-values traversed.
%          mses: vector with the corresponding mean squared errors.
    c = c0;
    gradient = 1000.0;   % ensure the loop condition holds for the first iteration
    cs = [];
    mses = [];
    count = 1;
    while abs(gradient) > threshold
        cs(count) = c;                                 % store the current parameter value
        mses(count) = meanSquaredErrorCubic(x, y, c);  % ... and the corresponding cost
        gradient = meanSquaredGradientCubic(x, y, c);  % gradient at the current position
        c = c - epsilon * gradient;                    % step against the gradient
        count = count + 1;
    end
end
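
A minimal usage sketch (the data and the true parameter value c = 6 are made up; it assumes meanSquaredErrorCubic() and meanSquaredGradientCubic() from the previous exercises are on the path):

x = linspace(0.0, 3.0, 40);               % made-up x-values
y = 6.0 * x.^3 + 10.0 * randn(size(x));   % simulated data with true c = 6
[cest, cs, mses] = gradientDescentCubic(x, y, 2.0, 0.0001, 0.1);
fprintf('estimated c = %.2f after %d iterations\n', cest, length(cs));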


@@ -0,0 +1,29 @@
meansquarederrorline  % generate the data x, y, and the true parameter value c
c0 = 2.0;             % initial parameter value
epsilon = 0.0001;     % factor linking gradient to parameter steps
thresh = 0.1;         % threshold for the absolute value of the gradient
[cest, cs, mses] = gradientDescentCubic(x, y, c0, epsilon, thresh);
subplot(2, 2, 1); % top left panel
hold on;
plot(cs, '-o');
plot([1, length(cs)], [c, c], 'k'); % line indicating true c value
hold off;
xlabel('Iteration');
ylabel('C');
subplot(2, 2, 3); % bottom left panel
plot(mses, '-o');
xlabel('Iteration steps');
ylabel('MSE');
subplot(1, 2, 2); % right panel
hold on;
% generate x-values for plotting the fit:
xx = min(x):0.01:max(x);
yy = cest * xx.^3;
plot(xx, yy, 'displayname', 'fit');
plot(x, y, 'o', 'displayname', 'data'); % plot original data
xlabel('Size [m]');
ylabel('Weight [kg]');
legend("location", "northwest");
pause


@@ -269,9 +269,21 @@ the hill, we choose the opposite direction.
\end{exercise}
\section{Gradient descent}
\begin{figure}[t]
  \includegraphics{cubicmse}
  \titlecaption{Gradient descent.}{The algorithm starts at an
    arbitrary position. At each point the gradient is estimated and
    the position is updated as long as the length of the gradient is
    sufficiently large. The dots show the positions after each
    iteration of the algorithm.} \label{gradientdescentcubicfig}
\end{figure}
Finally, we are able to implement the optimization itself. By now it
should be obvious why it is called the gradient descent method.
Starting from an arbitrary position we iteratively walk down the
slope of the cost function against its gradient. All ingredients
necessary for this algorithm are already there. We need: (i) the cost
function (\varcode{meanSquaredErrorCubic()}), and (ii) the gradient
(\varcode{meanSquaredGradientCubic()}). The algorithm of the gradient
descent works as follows:
@@ -292,41 +304,45 @@ descent works as follows:
\item \label{gradientstep} If the length of the gradient exceeds the
  threshold we take a small step into the opposite direction
  (a numerical example follows this list):
  \begin{equation}
    \label{gradientdescent}
    p_{i+1} = p_i - \epsilon \cdot \nabla f_{cost}(p_i)
  \end{equation}
  where $\epsilon$ is a factor linking the gradient to appropriate
  steps in the parameter space.
\item Repeat steps \ref{computegradient} -- \ref{gradientstep}.
\end{enumerate}
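
For illustration, consider a single step of \eqnref{gradientdescent}
with made-up numbers that are not taken from the measured data: with
$\epsilon = 0.1$, a current parameter value $p_i = 2$, and a gradient
$\nabla f_{cost}(p_i) = 5$, the updated parameter value is
$p_{i+1} = 2 - 0.1 \cdot 5 = 1.5$. The sign of the gradient makes the
step go downhill and $\epsilon$ scales how far we move.
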
\Figref{gradientdescentcubicfig} illustrates the gradient descent ---
the path the imaginary ball has chosen to reach the minimum. We walk
along the parameter axis against the gradient as long as the gradient
differs sufficiently from zero. At steep slopes we take large steps
(the distance between the red dots in
\figref{gradientdescentcubicfig} is large).
\begin{exercise}{gradientDescentCubic.m}{}
  Implement the gradient descent algorithm for the problem of fitting
  a cubic function \eqref{cubicfunc} to some measured data pairs $x$
  and $y$ as a function \varcode{gradientDescentCubic()} that returns
  the estimated best fitting parameter value $c$ as well as two
  vectors with all the parameter values and the corresponding values
  of the cost function that the algorithm iterated through. As
  additional arguments the function takes the initial value for the
  parameter $c$, the factor $\epsilon$ connecting the gradient with
  iteration steps in \eqnref{gradientdescent}, and the threshold
  value for the absolute value of the gradient that terminates the
  algorithm.
\end{exercise}
\begin{exercise}{plotgradientdescentcubic.m}{}
  Use the function \varcode{gradientDescentCubic()} to fit the
  simulated data from exercise~\ref{mseexercise}. Plot the returned
  values of the parameter $c$ as well as the corresponding mean
  squared errors as a function of iteration step (two plots). Compare
  the result of the gradient descent method with the true value of
  $c$ used to simulate the data. Inspect the plots and adapt
  $\epsilon$ and the threshold to make the algorithm behave as
  intended. Also plot the data together with the best fitting cubic
  relation \eqref{cubicfunc}.
\end{exercise}
\begin{ibox}[tp]{\label{partialderivativebox}Partial derivative and gradient}
  Some functions that depend on more than a single variable: