[regression] gradient descent code
This commit is contained in:
parent 4b624fe981
commit 7518f9dd47

25	regression/code/gradientDescentCubic.m	(new file)
@@ -0,0 +1,25 @@
function [c, cs, mses] = gradientDescentCubic(x, y, c0, epsilon, threshold)
% Gradient descent for fitting a cubic relation.
%
% Arguments: x, vector of the x-data values.
%            y, vector of the corresponding y-data values.
%            c0, initial value for the parameter c.
%            epsilon: factor multiplying the gradient.
%            threshold: minimum absolute value of the gradient at which the iteration stops.
%
% Returns: c, the final value of the c-parameter.
%          cs: vector with all the c-values traversed.
%          mses: vector with the corresponding mean squared errors.
  c = c0;
  gradient = 1000.0;  % some large value, so that the loop is entered at least once
  cs = [];
  mses = [];
  count = 1;
  while abs(gradient) > threshold
    cs(count) = c;
    mses(count) = meanSquaredErrorCubic(x, y, c);
    gradient = meanSquaredGradientCubic(x, y, c);
    c = c - epsilon * gradient;
    count = count + 1;
  end
end
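
The function calls meanSquaredErrorCubic() and meanSquaredGradientCubic(), the two ingredients named in the text below, which are not part of this commit. A minimal sketch of what these helpers could look like, assuming the cubic relation y = c*x.^3 used by the plotting script below (the bodies here are illustrative, not the repository's versions; in MATLAB/Octave each function would typically live in its own m-file):

function mse = meanSquaredErrorCubic(x, y, c)
% Mean squared error between the data y and the cubic predictions c*x.^3 (illustrative sketch).
  mse = mean((y - c * x.^3).^2);
end

function dmsedc = meanSquaredGradientCubic(x, y, c)
% Derivative of the mean squared error with respect to the parameter c (illustrative sketch).
  dmsedc = mean(-2.0 * (y - c * x.^3) .* x.^3);
end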

29	regression/code/plotgradientdescentcubic.m	(new file)
@@ -0,0 +1,29 @@
meansquarederrorline  % generate data

c0 = 2.0;
eps = 0.0001;
thresh = 0.1;
[cest, cs, mses] = gradientDescentCubic(x, y, c0, eps, thresh);

subplot(2, 2, 1);  % top left panel
hold on;
plot(cs, '-o');
plot([1, length(cs)], [c, c], 'k');  % line indicating true c value
hold off;
xlabel('Iteration');
ylabel('C');
subplot(2, 2, 3);  % bottom left panel
plot(mses, '-o');
xlabel('Iteration steps');
ylabel('MSE');
subplot(1, 2, 2);  % right panel
hold on;
% generate x-values for plotting the fit:
xx = min(x):0.01:max(x);
yy = cest * xx.^3;
plot(xx, yy, 'displayname', 'fit');
plot(x, y, 'o', 'displayname', 'data');  % plot original data
xlabel('Size [m]');
ylabel('Weight [kg]');
legend("location", "northwest");
pause
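
The first line runs the script meansquarederrorline, which is not part of this commit; it is expected to define the data vectors x and y and the true parameter value c that is drawn as the reference line in the top-left panel. A rough sketch of what such a data-generation script might contain, assuming sizes and weights simulated from the cubic relation with additive noise (all numbers are made up for illustration and are not the repository's values):

% meansquarederrorline -- simulate data for the cubic fit (illustrative sketch)
c = 6.0;                              % true parameter value of the cubic relation
x = 1.0 + rand(100, 1);               % simulated sizes
y = c * x.^3 + 2.0 * randn(100, 1);   % simulated weights: cubic relation plus noise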

@@ -269,9 +269,21 @@ the hill, we choose the opposite direction.
 \end{exercise}
 
 \section{Gradient descent}
 
+\begin{figure}[t]
+\includegraphics{cubicmse}
+\titlecaption{Gradient descent.}{The algorithm starts at an
+arbitrary position. At each point the gradient is estimated and
+the position is updated as long as the length of the gradient is
+sufficiently large. The dots show the positions after each
+iteration of the algorithm.} \label{gradientdescentcubicfig}
+\end{figure}
+
 Finally, we are able to implement the optimization itself. By now it
-should be obvious why it is called the gradient descent method. All
-ingredients are already there. We need: (i) the cost function
+should be obvious why it is called the gradient descent method. From a
+starting position we iteratively walk down the slope of the cost
+function against its gradient. All ingredients necessary for this
+algorithm are already there. We need: (i) the cost function
 (\varcode{meanSquaredErrorCubic()}), and (ii) the gradient
 (\varcode{meanSquaredGradientCubic()}). The algorithm of the gradient
 descent works as follows:
@@ -292,41 +304,45 @@ descent works as follows:
 \item \label{gradientstep} If the length of the gradient exceeds the
 threshold we take a small step into the opposite direction:
 \begin{equation}
+\label{gradientdescent}
 p_{i+1} = p_i - \epsilon \cdot \nabla f_{cost}(p_i)
 \end{equation}
-where $\epsilon = 0.01$ is a factor linking the gradient to
+where $\epsilon$ is a factor linking the gradient to
 appropriate steps in the parameter space.
 \item Repeat steps \ref{computegradient} -- \ref{gradientstep}.
 \end{enumerate}
 
-\Figref{gradientdescentfig} illustrates the gradient descent --- the
-path the imaginary ball has chosen to reach the minimum. Starting at
-an arbitrary position we change the position as long as the gradient
-at that position is larger than a certain threshold. If the slope is
-very steep, the change in the position (the distance between the red
-dots in \figref{gradientdescentfig}) is large.
+\Figref{gradientdescentcubicfig} illustrates the gradient descent --- the
+path the imaginary ball has chosen to reach the minimum. We walk along
+the parameter axis against the gradient as long as it differs
+sufficiently from zero. At steep slopes we take large steps
+(the distance between the red dots in \figref{gradientdescentcubicfig}
+is large).
 
-\begin{figure}[t]
-\includegraphics{cubicmse}
-\titlecaption{Gradient descent.}{The algorithm starts at an
-arbitrary position. At each point the gradient is estimated and
-the position is updated as long as the length of the gradient is
-sufficiently large.The dots show the positions after each
-iteration of the algorithm.} \label{gradientdescentfig}
-\end{figure}
-
-\begin{exercise}{gradientDescent.m}{}
-Implement the gradient descent for the problem of fitting a straight
-line to some measured data. Reuse the data generated in
-exercise~\ref{errorsurfaceexercise}.
-\begin{enumerate}
-\item Store for each iteration the error value.
-\item Plot the error values as a function of the iterations, the
-number of optimization steps.
-\item Plot the measured data together with the best fitting straight line.
-\end{enumerate}\vspace{-4.5ex}
+\begin{exercise}{gradientDescentCubic.m}{}
+Implement the gradient descent algorithm for the problem of fitting
+a cubic function \eqref{cubicfunc} to some measured data pairs $x$
+and $y$ as a function \varcode{gradientDescentCubic()} that returns
+the estimated best fitting parameter value $c$ as well as two
+vectors with all the parameter values and the corresponding values
+of the cost function that the algorithm iterated through. As
+additional arguments the function takes the initial value for the
+parameter $c$, the factor $\epsilon$ connecting the gradient with
+iteration steps in \eqnref{gradientdescent}, and the threshold value
+for the absolute value of the gradient terminating the algorithm.
 \end{exercise}
 
+\begin{exercise}{plotgradientdescentcubic.m}{}
+Use the function \varcode{gradientDescentCubic()} to fit the
+simulated data from exercise~\ref{mseexercise}. Plot the returned
+values of the parameter $c$ as well as the corresponding mean
+squared errors as a function of iteration step (two plots). Compare
+the result of the gradient descent method with the true value of $c$
+used to simulate the data. Inspect the plots and adapt $\epsilon$
+and the threshold to make the algorithm behave as intended. Also
+plot the data together with the best fitting cubic relation
+\eqref{cubicfunc}.
+\end{exercise}
+
 \begin{ibox}[tp]{\label{partialderivativebox}Partial derivative and gradient}
 Some functions that depend on more than a single variable: