[regression] updated gradient descent section text

This commit is contained in:
Jan Benda 2020-12-19 10:13:37 +01:00
parent ca54936245
commit 4b624fe981


@@ -272,24 +272,27 @@ the hill, we choose the opposite direction.
 Finally, we are able to implement the optimization itself. By now it
 should be obvious why it is called the gradient descent method. All
 ingredients are already there. We need: (i) the cost function
-(\varcode{meanSquaredError()}), and (ii) the gradient
-(\varcode{meanSquaredGradient()}). The algorithm of the gradient
+(\varcode{meanSquaredErrorCubic()}), and (ii) the gradient
+(\varcode{meanSquaredGradientCubic()}). The algorithm of the gradient
 descent works as follows:
 \begin{enumerate}
-\item Start with some given combination of the parameters $m$ and $b$
-  ($p_0 = (m_0, b_0)$).
-\item \label{computegradient} Calculate the gradient at the current
-  position $p_i$.
-\item If the length of the gradient falls below a certain value, we
-  assume to have reached the minimum and stop the search. We are
-  actually looking for the point at which the length of the gradient
-  is zero, but finding zero is impossible because of numerical
-  imprecision. We thus apply a threshold below which we are
-  sufficiently close to zero (e.g. \varcode{norm(gradient) < 0.1}).
+\item Start with some arbitrary value $p_0$ for the parameter $c$.
+\item \label{computegradient} Compute the gradient
+  \eqnref{costderivative} at the current position $p_i$.
+\item If the length of the gradient, the absolute value of the
+  derivative \eqnref{costderivative}, is smaller than some threshold
+  value, we assume to have reached the minimum and stop the search.
+  We return the current parameter value $p_i$ as our best estimate of
+  the parameter $c$ that minimizes the cost function.
+
+  We are actually looking for the point at which the derivative of the
+  cost function equals zero, but this is impossible because of
+  numerical imprecision. We thus apply a threshold below which we are
+  sufficiently close to zero.
 \item \label{gradientstep} If the length of the gradient exceeds the
   threshold we take a small step into the opposite direction:
   \begin{equation}
-    p_{i+1} = p_i - \epsilon \cdot \nabla f_{cost}(m_i, b_i)
+    p_{i+1} = p_i - \epsilon \cdot \nabla f_{cost}(p_i)
   \end{equation}
   where $\epsilon = 0.01$ is a factor linking the gradient to
   appropriate steps in the parameter space.
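The descent loop described in the updated steps can be sketched as follows. This is a minimal Python sketch, not the course's MATLAB code: it assumes a single parameter $c$ fitted to cubic data $y = c\,x^3$, and the function names merely mirror the helpers mentioned in the text.

```python
import numpy as np

def mean_squared_error_cubic(x, y, c):
    # cost function: mean squared deviation of the cubic y = c*x**3 from the data
    return np.mean((y - c * x**3) ** 2)

def mean_squared_gradient_cubic(x, y, c):
    # analytical derivative of the cost function with respect to c
    return np.mean(-2.0 * x**3 * (y - c * x**3))

def gradient_descent(x, y, c0, eps=0.01, threshold=1e-4):
    # step 1: start with an arbitrary value for the parameter c
    c = c0
    while True:
        # step 2: compute the gradient at the current position
        grad = mean_squared_gradient_cubic(x, y, c)
        # step 3: stop when the absolute value of the derivative
        # falls below the threshold and return the best estimate
        if abs(grad) < threshold:
            return c
        # step 4: take a small step against the gradient
        c -= eps * grad

# usage: noise-free data generated with c = 2.0
x = np.linspace(-1.0, 1.0, 40)
y = 2.0 * x**3
c_est = gradient_descent(x, y, c0=0.0)
```

With noise-free data the loop converges to a value of `c_est` close to the true 2.0; the stopping threshold on the derivative, not on the cost itself, decides how close.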
@@ -298,11 +301,10 @@ descent works as follows:
 \Figref{gradientdescentfig} illustrates the gradient descent --- the
 path the imaginary ball has chosen to reach the minimum. Starting at
-an arbitrary position on the error surface we change the position as
-long as the gradient at that position is larger than a certain
-threshold. If the slope is very steep, the change in the position (the
-distance between the red dots in \figref{gradientdescentfig}) is
-large.
+an arbitrary position we change the position as long as the gradient
+at that position is larger than a certain threshold. If the slope is
+very steep, the change in the position (the distance between the red
+dots in \figref{gradientdescentfig}) is large.
 \begin{figure}[t]
 \includegraphics{cubicmse}