[regression] updated gradient descent section text

This commit is contained in:
Jan Benda 2020-12-19 10:13:37 +01:00
parent ca54936245
commit 4b624fe981


@@ -272,24 +272,27 @@ the hill, we choose the opposite direction.
Finally, we are able to implement the optimization itself. By now it
should be obvious why it is called the gradient descent method. All
ingredients are already there. We need: (i) the cost function
-(\varcode{meanSquaredError()}), and (ii) the gradient
-(\varcode{meanSquaredGradient()}). The algorithm of the gradient
+(\varcode{meanSquaredErrorCubic()}), and (ii) the gradient
+(\varcode{meanSquaredGradientCubic()}). The algorithm of the gradient
descent works as follows:
\begin{enumerate}
-\item Start with some given combination of the parameters $m$ and $b$
-($p_0 = (m_0, b_0)$).
-\item \label{computegradient} Calculate the gradient at the current
-position $p_i$.
-\item If the length of the gradient falls below a certain value, we
-assume to have reached the minimum and stop the search. We are
-actually looking for the point at which the length of the gradient
-is zero, but finding zero is impossible because of numerical
-imprecision. We thus apply a threshold below which we are
-sufficiently close to zero (e.g. \varcode{norm(gradient) < 0.1}).
+\item Start with some arbitrary value $p_0$ for the parameter $c$.
+\item \label{computegradient} Compute the gradient
+\eqnref{costderivative} at the current position $p_i$.
+\item If the length of the gradient, the absolute value of the
+derivative \eqnref{costderivative}, is smaller than some threshold
+value, we assume that we have reached the minimum and stop the search.
+We return the current parameter value $p_i$ as our best estimate of
+the parameter $c$ that minimizes the cost function.
+We are actually looking for the point at which the derivative of the
+cost function equals zero, but finding exactly zero is impossible
+because of numerical imprecision. We thus apply a threshold below
+which we are sufficiently close to zero.
\item \label{gradientstep} If the length of the gradient exceeds the
threshold, we take a small step in the opposite direction:
\begin{equation}
-  p_{i+1} = p_i - \epsilon \cdot \nabla f_{cost}(m_i, b_i)
+  p_{i+1} = p_i - \epsilon \cdot \nabla f_{cost}(p_i)
\end{equation}
where $\epsilon = 0.01$ is a factor linking the gradient to
appropriate steps in the parameter space.
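
Putting these steps together, the loop can be sketched in MATLAB roughly
as follows. The wrapper name \varcode{gradientDescentCubic()} and the
call signature \varcode{meanSquaredGradientCubic(x, y, c)} are
assumptions made for this illustration and may differ from the actual
exercise code:

function c = gradientDescentCubic(x, y, c0)
  % Minimal gradient-descent sketch for the single parameter c.
  % Assumes meanSquaredGradientCubic(x, y, c) returns the derivative of
  % the cost function with respect to c (this signature is an assumption).
  c = c0;            % start at some arbitrary value p_0
  epsilon = 0.01;    % factor linking the gradient to parameter steps
  threshold = 0.1;   % below this we are sufficiently close to zero
  gradient = meanSquaredGradientCubic(x, y, c);
  while abs(gradient) >= threshold
    c = c - epsilon * gradient;    % small step against the gradient
    gradient = meanSquaredGradientCubic(x, y, c);
  end
end

Such a function would then be called with the data and some initial
guess for the parameter, for instance
\varcode{c = gradientDescentCubic(x, y, 2.0)}.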
@@ -298,11 +301,10 @@ descent works as follows:
\Figref{gradientdescentfig} illustrates the gradient descent --- the
path the imaginary ball has chosen to reach the minimum. Starting at
-an arbitrary position on the error surface we change the position as
-long as the gradient at that position is larger than a certain
-threshold. If the slope is very steep, the change in the position (the
-distance between the red dots in \figref{gradientdescentfig}) is
-large.
+an arbitrary position we change the position as long as the gradient
+at that position is larger than a certain threshold. If the slope is
+very steep, the change in the position (the distance between the red
+dots in \figref{gradientdescentfig}) is large.
\begin{figure}[t]
\includegraphics{cubicmse}