[regression] finished gradient section

Jan Benda 2020-12-18 23:37:17 +01:00
parent fefe1c3726
commit ca54936245
4 changed files with 115 additions and 111 deletions

View File: meanSquaredGradientCubic.m

@@ -0,0 +1,14 @@
function dmsedc = meanSquaredGradientCubic(x, y, c)
% The gradient of the mean squared error for a cubic relation.
%
% Arguments: x, vector of the x-data values
% y, vector of the corresponding y-data values
% c, the factor for the cubic relation.
%
% Returns: the derivative of the mean squared error at c.
h = 1e-5; % stepsize for derivatives
mse = meanSquaredErrorCubic(x, y, c);
mseh = meanSquaredErrorCubic(x, y, c+h);
dmsedc = (mseh - mse)/h;
end
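
A quick way to try out this function could look like the following sketch; the data generation is illustrative only (an arbitrary c, range, and noise level) and assumes meanSquaredErrorCubic() from the earlier exercise is on the path:

% sketch: try the gradient function on artificial data (values are arbitrary)
c = 6.0;                          % "true" parameter used for the simulation
x = 5.0*rand(100, 1);             % random x values
y = c*x.^3 + 20.0*randn(100, 1);  % cubic relation plus additive noise
dmsedc = meanSquaredGradientCubic(x, y, 2.0)  % negative: the minimum lies at larger c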

View File: plotcubicgradient.m

@@ -0,0 +1,9 @@
% x and y are the data generated in the earlier exercises
cs = 2.0:0.1:8.0;             % range of c values to evaluate
mseg = zeros(1, length(cs));  % preallocate a vector (not a square matrix)
for i = 1:length(cs)
    mseg(i) = meanSquaredGradientCubic(x, y, cs(i));
end
plot(cs, mseg)
xlabel('c')
ylabel('gradient')

View File

@@ -25,19 +25,11 @@
\subsection{Start with one-dimensional problem!}
\begin{itemize}
\item How to plot a function (do not use the data x values!)
\item 1-d gradient descent
\item Describe in words the n-d problem (Boltzmann as example?).
\item Homework is to do the 2d problem with the straight line!
\item NO quiver plot (it is a nightmare to get this right)
\end{itemize}
\subsection{2D fit}

View File

@@ -152,7 +152,7 @@ For each value of the parameter $c$ of the model we can use
function. The cost function $f_{cost}(c|\{(x_i, y_i)\})$ is a
function $f_{cost}(c)$ that maps the parameter value $c$ to a scalar
error value. For a given data set we thus can simply plot the cost
function as a function of the parameter $c$ (\figref{cubiccostfig}).

\begin{exercise}{plotcubiccosts.m}{}
Calculate the mean squared error between the data and the cubic
@@ -181,17 +181,18 @@ automatic optimization process?
The obvious approach would be to calculate the mean squared error for
a range of parameter values and then find the position of the minimum
using the \code{min()} function. This approach, however, has several
disadvantages: (i) The accuracy of the estimation of the best
parameter is limited by the resolution used to sample the parameter
space. The coarser the parameters are sampled, the less precise is the
obtained position of the minimum (\figref{cubiccostfig}, right). (ii)
The range of parameter values might not include the absolute minimum.
(iii) In particular for functions with more than a single free
parameter it is computationally expensive to calculate the cost
function for each parameter combination at a sufficient
resolution. The number of combinations increases exponentially with
the number of free parameters. This is known as the \enterm{curse of
dimensionality}.
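
For illustration, such a brute-force search could be sketched like this, assuming the data \varcode{x}, \varcode{y} and the \varcode{meanSquaredErrorCubic()} function from the earlier exercises:

% sketch: brute-force minimization on a sampled parameter range
cs = 2.0:0.1:8.0;                  % sampled values of the parameter c
mses = zeros(1, length(cs));
for i = 1:length(cs)
    mses(i) = meanSquaredErrorCubic(x, y, cs(i));
end
[minmse, imin] = min(mses);        % smallest error and its index
cest = cs(imin)                    % best c, limited to the 0.1 resolution
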
So we need a different approach. We want a procedure that finds the
minimum of the cost function with a minimal number of computations and
@@ -219,110 +220,54 @@ to arbitrary precision.
f'(x) = \frac{{\rm d} f(x)}{{\rm d}x} = \lim\limits_{\Delta x \to 0} \frac{f(x + \Delta x) - f(x)}{\Delta x}
\end{equation}
\end{minipage}\vspace{2ex}

It is not possible to calculate the exact value of the derivative
\eqref{derivative} numerically. The derivative can only be estimated
by computing the difference quotient \eqref{difffrac} using
sufficiently small $\Delta x$.
\end{ibox}
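
In MATLAB, the difference quotient can be used directly to estimate a derivative numerically. A minimal sketch, here for $f(x)=x^3$ where the analytical derivative $3x^2$ is known for comparison:

% sketch: estimate the derivative of f(x) = x^3 at x = 2 numerically
f = @(x) x.^3;
x0 = 2.0;
dx = 1e-5;                       % small, but not too small (floating point precision!)
dfdx = (f(x0 + dx) - f(x0))/dx;  % difference quotient
fprintf('numerical: %.4f, analytical: %.4f\n', dfdx, 3.0*x0^2)
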
\section{Gradient}
Imagine placing a ball at some point on the cost function
\figref{cubiccostfig}. Naturally, it would roll down the slope and
eventually stop at the minimum of the cost function (if it had no
inertia). We will use this analogy to develop an algorithm to find our
way to the minimum of the cost function. The ball always follows the
steepest slope. Thus we need to figure out the direction of the slope
at the position of the ball.

In our one-dimensional example of a single free parameter the slope is
simply the derivative of the cost function with respect to the
parameter $c$. This derivative is called the
\entermde{Gradient}{gradient} of the cost function:
\begin{equation}
\label{costderivative}
\nabla f_{cost}(c) = \frac{{\rm d} f_{cost}(c)}{{\rm d} c}
\end{equation}
There is no need to calculate this derivative analytically, because it
can be approximated numerically by the difference quotient
(Box~\ref{differentialquotientbox}) for small steps $\Delta c$:
\begin{equation}
\frac{{\rm d} f_{cost}(c)}{{\rm d} c} =
\lim\limits_{\Delta c \to 0} \frac{f_{cost}(c + \Delta c) - f_{cost}(c)}{\Delta c}
\approx \frac{f_{cost}(c + \Delta c) - f_{cost}(c)}{\Delta c}
\end{equation}
The derivative is positive for positive slopes. Since we want to go down
the hill, we choose the opposite direction.
\begin{exercise}{meanSquaredGradientCubic.m}{}
Implement a function \varcode{meanSquaredGradientCubic()}, that
takes the $x$- and $y$-data and the parameter $c$ as input
arguments. The function should return the derivative of the mean
squared error $f_{cost}(c)$ with respect to $c$ at the position
$c$.
\end{exercise}
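
For the cubic relation the derivative of the mean squared error can also be computed analytically, $\frac{{\rm d} f_{cost}(c)}{{\rm d} c} = \frac{2}{N}\sum_{i=1}^N (c x_i^3 - y_i) x_i^3$, which offers a quick sanity check of the numerical estimate (a sketch, assuming the mean squared error is the mean of the squared residuals and \varcode{x}, \varcode{y} are the data):

% sketch: compare the difference quotient with the analytical derivative
c = 3.0;                                   % arbitrary test value
dnum = meanSquaredGradientCubic(x, y, c);  % numerical estimate
dana = 2.0*mean((c*x.^3 - y).*x.^3);       % analytical derivative of the MSE
fprintf('numerical: %g, analytical: %g\n', dnum, dana)
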
\begin{exercise}{plotcubicgradient.m}{}
Using the \varcode{meanSquaredGradientCubic()} function from the
previous exercise, plot the derivative of the cost function as a
function of $c$.
\end{exercise}

\section{Gradient descent}
Finally, we are able to implement the optimization itself. By now it
should be obvious why it is called the gradient descent method. All
@@ -381,6 +326,50 @@ large.
\end{exercise}
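
As a recap, a minimal version of the one-dimensional gradient descent for the cubic fit might look like the following sketch; the starting value, the step size, and the stopping threshold are arbitrary choices that may need to be adapted to the data:

% sketch: minimal 1-d gradient descent for the cubic fit
c = 2.0;          % starting value of the parameter
epsilon = 0.0001; % step size; must be small enough for the descent to converge
thresh = 0.1;     % stop when the gradient gets this small
gradient = meanSquaredGradientCubic(x, y, c);
while abs(gradient) > thresh
    c = c - epsilon*gradient;   % step against the gradient, downhill
    gradient = meanSquaredGradientCubic(x, y, c);
end
c                 % estimated parameter
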
\begin{ibox}[tp]{\label{partialderivativebox}Partial derivative and gradient}
Some functions depend on more than a single variable. For example,
\[ z = f(x,y) \]
depends on both $x$ and $y$. Using the partial derivatives
\[ \frac{\partial f(x,y)}{\partial x} = \lim\limits_{\Delta x \to 0} \frac{f(x + \Delta x,y) - f(x,y)}{\Delta x} \]
and
\[ \frac{\partial f(x,y)}{\partial y} = \lim\limits_{\Delta y \to 0} \frac{f(x, y + \Delta y) - f(x,y)}{\Delta y} \]
one can estimate the slope in the direction of each of the variables
separately by using the respective difference quotient
(Box~\ref{differentialquotientbox}). \vspace{1ex}
\begin{minipage}[t]{0.5\textwidth}
\mbox{}\\[-2ex]
\includegraphics[width=1\textwidth]{gradient}
\end{minipage}
\hfill
\begin{minipage}[t]{0.46\textwidth}
For example, the partial derivatives of
\[ f(x,y) = x^2+y^2 \] are
\[ \frac{\partial f(x,y)}{\partial x} = 2x \; , \quad \frac{\partial f(x,y)}{\partial y} = 2y \; .\]
The gradient is a vector that is constructed from the partial derivatives:
\[ \nabla f(x,y) = \left( \begin{array}{c} \frac{\partial f(x,y)}{\partial x} \\[1ex] \frac{\partial f(x,y)}{\partial y} \end{array} \right) \]
This vector points in the direction of the steepest ascent of
$f(x,y)$.
\end{minipage}
\vspace{0.5ex} The figure shows the contour lines of a bivariate
Gaussian $f(x,y) = \exp(-(x^2+y^2)/2)$, the gradient (thick
arrows), and the corresponding two partial derivatives (thin arrows)
at three different locations.
\end{ibox}
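
Translated into code, the partial derivatives and the gradient of this example can be estimated with difference quotients, just as in the one-dimensional case (a minimal sketch):

% sketch: numerical gradient of f(x,y) = x^2 + y^2 at (x,y) = (1, 2)
f = @(x, y) x.^2 + y.^2;
x0 = 1.0; y0 = 2.0;
h = 1e-5;                              % small step for the difference quotients
dfdx = (f(x0 + h, y0) - f(x0, y0))/h;  % partial derivative with respect to x
dfdy = (f(x0, y0 + h) - f(x0, y0))/h;  % partial derivative with respect to y
gradf = [dfdx, dfdy]                   % should be close to [2*x0, 2*y0] = [2, 4]
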
The \entermde{Gradient}{gradient} (Box~\ref{partialderivativebox}) of the
objective function is the vector
\begin{equation}
\label{gradient}
\nabla f_{cost}(m,b) = \left( \frac{\partial f_{cost}(m,b)}{\partial m},
\frac{\partial f_{cost}(m,b)}{\partial b} \right)
\end{equation}
that points in the direction of the steepest ascent of the objective
function. The gradient is given by the partial derivatives
(Box~\ref{partialderivativebox}) of the mean squared error with
respect to the parameters $m$ and $b$ of the straight line.
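
A sketch of how these two partial derivatives could be estimated numerically for the straight line $y = mx + b$; the function name and the inline mean squared error are illustrative only and not part of this commit:

function gradmse = meanSquaredGradient(x, y, p)
% Numerical gradient of the mean squared error of the straight line
% y = p(1)*x + p(2) with respect to the parameters p = [m, b].
h = 1e-5;                                    % step size for the difference quotients
mse  = mean((y - (p(1)*x + p(2))).^2);       % error at (m, b)
msem = mean((y - ((p(1)+h)*x + p(2))).^2);   % error at (m + h, b)
mseb = mean((y - (p(1)*x + p(2) + h)).^2);   % error at (m, b + h)
gradmse = [(msem - mse)/h, (mseb - mse)/h];
end
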
\section{Summary}
The gradient descent is an important numerical method for solving