[regression] updated gradient descent section text
parent ca54936245
commit 4b624fe981
@@ -272,24 +272,27 @@ the hill, we choose the opposite direction.
Finally, we are able to implement the optimization itself. By now it
should be obvious why it is called the gradient descent method. All
ingredients are already there. We need: (i) the cost function
-(\varcode{meanSquaredError()}), and (ii) the gradient
-(\varcode{meanSquaredGradient()}). The algorithm of the gradient
+(\varcode{meanSquaredErrorCubic()}), and (ii) the gradient
+(\varcode{meanSquaredGradientCubic()}). The algorithm of the gradient
descent works as follows:
\begin{enumerate}
-\item Start with some given combination of the parameters $m$ and $b$
-($p_0 = (m_0, b_0)$).
-\item \label{computegradient} Calculate the gradient at the current
-position $p_i$.
-\item If the length of the gradient falls below a certain value, we
-assume to have reached the minimum and stop the search. We are
-actually looking for the point at which the length of the gradient
-is zero, but finding zero is impossible because of numerical
-imprecision. We thus apply a threshold below which we are
-sufficiently close to zero (e.g. \varcode{norm(gradient) < 0.1}).
+\item Start with some arbitrary value $p_0$ for the parameter $c$.
+\item \label{computegradient} Compute the gradient
+\eqnref{costderivative} at the current position $p_i$.
+\item If the length of the gradient, the absolute value of the
+derivative \eqnref{costderivative}, is smaller than some threshold
+value, we assume that we have reached the minimum and stop the search.
+We return the current parameter value $p_i$ as our best estimate of
+the parameter $c$ that minimizes the cost function.
+
+We are actually looking for the point at which the derivative of the
+cost function equals zero, but finding exactly zero is impossible
+because of numerical imprecision. We thus apply a threshold below
+which we are sufficiently close to zero.
\item \label{gradientstep} If the length of the gradient exceeds the
threshold, we take a small step in the opposite direction:
\begin{equation}
-p_{i+1} = p_i - \epsilon \cdot \nabla f_{cost}(m_i, b_i)
+p_{i+1} = p_i - \epsilon \cdot \nabla f_{cost}(p_i)
\end{equation}
where $\epsilon = 0.01$ is a factor linking the gradient to
appropriate steps in the parameter space.
@@ -298,11 +301,10 @@ descent works as follows:

\Figref{gradientdescentfig} illustrates the gradient descent --- the
path the imaginary ball has chosen to reach the minimum. Starting at
-an arbitrary position on the error surface we change the position as
-long as the gradient at that position is larger than a certain
-threshold. If the slope is very steep, the change in the position (the
-distance between the red dots in \figref{gradientdescentfig}) is
-large.
+an arbitrary position we change the position as long as the gradient
+at that position is larger than a certain threshold. If the slope is
+very steep, the change in the position (the distance between the red
+dots in \figref{gradientdescentfig}) is large.

\begin{figure}[t]
\includegraphics{cubicmse}
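
A minimal MATLAB sketch of the loop described by the updated enumeration could look like the following. The function name gradientDescentCubic and the signature meanSquaredGradientCubic(x, y, c), returning the derivative of the mean squared error with respect to c, are assumptions for illustration rather than the script's actual implementation:

% Minimal sketch of gradient descent for the single parameter c.
% meanSquaredGradientCubic(x, y, c) is assumed to return the derivative
% of the mean squared error with respect to c.
function c = gradientDescentCubic(x, y, c0)
  c = c0;                                      % step 1: arbitrary start value p_0
  epsilon = 0.01;                              % factor linking gradient to step size
  threshold = 0.1;                             % stop criterion for the gradient length
  grad = meanSquaredGradientCubic(x, y, c);    % step 2: gradient at the current p_i
  while abs(grad) > threshold                  % step 3: stop when close enough to zero
    c = c - epsilon * grad;                    % step 4: small step against the gradient
    grad = meanSquaredGradientCubic(x, y, c);  % step 2: gradient at the new p_i
  end
end

Calling, for example, c = gradientDescentCubic(x, y, 2.0) would then return the value of c at which the descent stopped, i.e. the estimate that minimizes the cost function up to the chosen threshold.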