diff --git a/regression/code/gradientDescentCubic.m b/regression/code/gradientDescentCubic.m
new file mode 100644
index 0000000..a495221
--- /dev/null
+++ b/regression/code/gradientDescentCubic.m
@@ -0,0 +1,25 @@
+function [c, cs, mses] = gradientDescentCubic(x, y, c0, epsilon, threshold)
+% Gradient descent for fitting a cubic relation.
+%
+% Arguments: x: vector of the x-data values.
+%            y: vector of the corresponding y-data values.
+%            c0: initial value for the parameter c.
+%            epsilon: factor multiplying the gradient.
+%            threshold: minimum absolute value of the gradient terminating the iteration.
+%
+% Returns: c: the final value of the c-parameter.
+%          cs: vector with all the c-values traversed.
+%          mses: vector with the corresponding mean squared errors.
+  c = c0;
+  gradient = 1000.0;   % some large value to make sure the loop is entered
+  cs = [];
+  mses = [];
+  count = 1;
+  while abs(gradient) > threshold
+    cs(count) = c;                                 % store current parameter value
+    mses(count) = meanSquaredErrorCubic(x, y, c);  % and the corresponding cost
+    gradient = meanSquaredGradientCubic(x, y, c);  % gradient at the current position
+    c = c - epsilon * gradient;                    % step against the gradient
+    count = count + 1;
+  end
+end
diff --git a/regression/code/plotgradientdescentcubic.m b/regression/code/plotgradientdescentcubic.m
new file mode 100644
index 0000000..4972a92
--- /dev/null
+++ b/regression/code/plotgradientdescentcubic.m
@@ -0,0 +1,29 @@
+meansquarederrorline  % generate data
+
+c0 = 2.0;       % initial value for the parameter c
+eps = 0.0001;   % epsilon: factor linking gradient to parameter steps
+thresh = 0.1;   % threshold on the absolute value of the gradient
+[cest, cs, mses] = gradientDescentCubic(x, y, c0, eps, thresh);
+
+subplot(2, 2, 1);                    % top left panel
+hold on;
+plot(cs, '-o');
+plot([1, length(cs)], [c, c], 'k');  % line indicating true c value
+hold off;
+xlabel('Iteration');
+ylabel('C');
+subplot(2, 2, 3);                    % bottom left panel
+plot(mses, '-o');
+xlabel('Iteration steps');
+ylabel('MSE');
+subplot(1, 2, 2);                    % right panel
+hold on;
+% generate x-values for plotting the fit:
+xx = min(x):0.01:max(x);
+yy = cest * xx.^3;
+plot(xx, yy, 'displayname', 'fit');
+plot(x, y, 'o', 'displayname', 'data');  % plot original data
+xlabel('Size [m]');
+ylabel('Weight [kg]');
+legend('location', 'northwest');
+pause
diff --git a/regression/lecture/regression.tex b/regression/lecture/regression.tex
index 3f8c045..6e9c591 100644
--- a/regression/lecture/regression.tex
+++ b/regression/lecture/regression.tex
@@ -269,9 +269,21 @@ the hill, we choose the opposite direction.
 \end{exercise}
 
 \section{Gradient descent}
+
+\begin{figure}[t]
+  \includegraphics{cubicmse}
+  \titlecaption{Gradient descent.}{The algorithm starts at an
+    arbitrary position. At each point the gradient is estimated and
+    the position is updated as long as the length of the gradient is
+    sufficiently large. The dots show the positions after each
+    iteration of the algorithm.} \label{gradientdescentcubicfig}
+\end{figure}
+
 Finally, we are able to implement the optimization itself. By now it
-should be obvious why it is called the gradient descent method. All
-ingredients are already there. We need: (i) the cost function
+should be obvious why it is called the gradient descent method. Starting
+from an initial position we iteratively walk down the slope of the cost
+function against its gradient. All ingredients necessary for this
+algorithm are already there. We need: (i) the cost function
 (\varcode{meanSquaredErrorCubic()}), and (ii) the gradient
 (\varcode{meanSquaredGradientCubic()}). The algorithm of the gradient
 descent works as follows:
@@ -292,41 +304,45 @@ descent works as follows:
 \item \label{gradientstep} If the length of the gradient exceeds the
   threshold we take a small step into the opposite direction:
   \begin{equation}
+    \label{gradientdescent}
     p_{i+1} = p_i - \epsilon \cdot \nabla f_{cost}(p_i)
   \end{equation}
-  where $\epsilon = 0.01$ is a factor linking the gradient to
+  where $\epsilon$ is a factor linking the gradient to
   appropriate steps in the parameter space.
 \item Repeat steps \ref{computegradient} -- \ref{gradientstep}.
 \end{enumerate}
 
-\Figref{gradientdescentfig} illustrates the gradient descent --- the
-path the imaginary ball has chosen to reach the minimum. Starting at
-an arbitrary position we change the position as long as the gradient
-at that position is larger than a certain threshold. If the slope is
-very steep, the change in the position (the distance between the red
-dots in \figref{gradientdescentfig}) is large.
-
-\begin{figure}[t]
-  \includegraphics{cubicmse}
-  \titlecaption{Gradient descent.}{The algorithm starts at an
-    arbitrary position. At each point the gradient is estimated and
-    the position is updated as long as the length of the gradient is
-    sufficiently large.The dots show the positions after each
-    iteration of the algorithm.} \label{gradientdescentfig}
-\end{figure}
-
-\begin{exercise}{gradientDescent.m}{}
-  Implement the gradient descent for the problem of fitting a straight
-  line to some measured data. Reuse the data generated in
-  exercise~\ref{errorsurfaceexercise}.
-  \begin{enumerate}
-  \item Store for each iteration the error value.
-  \item Plot the error values as a function of the iterations, the
-    number of optimization steps.
-  \item Plot the measured data together with the best fitting straight line.
-  \end{enumerate}\vspace{-4.5ex}
+\Figref{gradientdescentcubicfig} illustrates the gradient descent --- the
+path the imaginary ball has chosen to reach the minimum. We walk along
+the parameter axis against the gradient as long as the gradient
+differs sufficiently from zero. At steep slopes we take large steps
+(the distance between the red dots in \figref{gradientdescentcubicfig} is
+large).
+
+\begin{exercise}{gradientDescentCubic.m}{}
+  Implement the gradient descent algorithm for the problem of fitting
+  a cubic function \eqref{cubicfunc} to some measured data pairs $x$
+  and $y$ as a function \varcode{gradientDescentCubic()} that returns
+  the estimated best fitting parameter value $c$ as well as two
+  vectors with all the parameter values and the corresponding values
+  of the cost function that the algorithm iterated through. As
+  additional arguments the function takes the initial value for the
+  parameter $c$, the factor $\epsilon$ connecting the gradient with
+  iteration steps in \eqnref{gradientdescent}, and the threshold value
+  for the absolute value of the gradient terminating the algorithm.
 \end{exercise}
 
+\begin{exercise}{plotgradientdescentcubic.m}{}
+  Use the function \varcode{gradientDescentCubic()} to fit the
+  simulated data from exercise~\ref{mseexercise}. Plot the returned
+  values of the parameter $c$ as well as the corresponding mean
+  squared errors as a function of the iteration step (two plots). Compare
+  the result of the gradient descent method with the true value of $c$
+  used to simulate the data. Inspect the plots and adapt $\epsilon$
+  and the threshold to make the algorithm behave as intended. Also
+  plot the data together with the best fitting cubic relation
+  \eqref{cubicfunc}.
+\end{exercise}
 
 \begin{ibox}[tp]{\label{partialderivativebox}Partial derivative and gradient}
   Some functions that depend on more than a single variable:
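
Note (not part of the patch above): the loop in gradientDescentCubic.m is the update rule of \eqnref{gradientdescent} applied to the single parameter $c$. For reference, here is a self-contained MATLAB sketch of the same idea that writes the gradient of the mean squared error out inline instead of calling meanSquaredErrorCubic()/meanSquaredGradientCubic(), and uses made-up simulated data (the true c, the x-range, and the noise level are assumptions for illustration only); c0, epsilon, and threshold match the values used in plotgradientdescentcubic.m.

% Standalone sketch, for reference only (not part of the patch above).
% It applies the update c <- c - epsilon * gradient to the single
% parameter c of the cubic fit y = c*x^3. All data values are made up.
c_true = 6.0;                              % assumed true parameter
x = 2.0 + 2.0 * rand(100, 1);              % assumed x-data
y = c_true * x.^3 + 50.0 * randn(100, 1);  % assumed noisy y-data

c = 2.0;                                   % initial parameter value
epsilon = 0.0001;                          % step factor
threshold = 0.1;                           % termination criterion
gradient = 2.0 * threshold;                % anything above threshold
while abs(gradient) > threshold
    % derivative of mean((y - c*x.^3).^2) with respect to c:
    gradient = mean(-2.0 * (y - c * x.^3) .* x.^3);
    c = c - epsilon * gradient;            % step against the gradient
end
fprintf('estimated c = %.3f, true c = %.3f\n', c, c_true);

Depending on the simulated data, too large an epsilon makes the update overshoot and diverge, while a very small threshold lets the loop run needlessly long --- the behavior the plotgradientdescentcubic.m exercise asks to inspect and adjust.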