[regression] updated gradient descent section text
parent ca54936245
commit 4b624fe981
@@ -272,24 +272,27 @@ the hill, we choose the opposite direction.
 Finally, we are able to implement the optimization itself. By now it
 should be obvious why it is called the gradient descent method. All
 ingredients are already there. We need: (i) the cost function
-(\varcode{meanSquaredError()}), and (ii) the gradient
-(\varcode{meanSquaredGradient()}). The algorithm of the gradient
+(\varcode{meanSquaredErrorCubic()}), and (ii) the gradient
+(\varcode{meanSquaredGradientCubic()}). The algorithm of the gradient
 descent works as follows:
 \begin{enumerate}
-\item Start with some given combination of the parameters $m$ and $b$
-($p_0 = (m_0, b_0)$).
-\item \label{computegradient} Calculate the gradient at the current
-position $p_i$.
-\item If the length of the gradient falls below a certain value, we
-assume to have reached the minimum and stop the search. We are
-actually looking for the point at which the length of the gradient
-is zero, but finding zero is impossible because of numerical
-imprecision. We thus apply a threshold below which we are
-sufficiently close to zero (e.g. \varcode{norm(gradient) < 0.1}).
+\item Start with some arbitrary value $p_0$ for the parameter $c$.
+\item \label{computegradient} Compute the gradient
+\eqnref{costderivative} at the current position $p_i$.
+\item If the length of the gradient, the absolute value of the
+derivative \eqnref{costderivative}, is smaller than some threshold
+value, we assume to have reached the minimum and stop the search.
+We return the current parameter value $p_i$ as our best estimate of
+the parameter $c$ that minimizes the cost function.
+
+We are actually looking for the point at which the derivative of the
+cost function equals zero, but this is impossible because of
+numerical imprecision. We thus apply a threshold below which we are
+sufficiently close to zero.
 \item \label{gradientstep} If the length of the gradient exceeds the
 threshold we take a small step into the opposite direction:
 \begin{equation}
-p_{i+1} = p_i - \epsilon \cdot \nabla f_{cost}(m_i, b_i)
+p_{i+1} = p_i - \epsilon \cdot \nabla f_{cost}(p_i)
 \end{equation}
 where $\epsilon = 0.01$ is a factor linking the gradient to
 appropriate steps in the parameter space.
@@ -298,11 +301,10 @@ descent works as follows:
 
 \Figref{gradientdescentfig} illustrates the gradient descent --- the
 path the imaginary ball has chosen to reach the minimum. Starting at
-an arbitrary position on the error surface we change the position as
-long as the gradient at that position is larger than a certain
-threshold. If the slope is very steep, the change in the position (the
-distance between the red dots in \figref{gradientdescentfig}) is
-large.
+an arbitrary position we change the position as long as the gradient
+at that position is larger than a certain threshold. If the slope is
+very steep, the change in the position (the distance between the red
+dots in \figref{gradientdescentfig}) is large.
 
 \begin{figure}[t]
 \includegraphics{cubicmse}
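
The single-parameter procedure described in the updated text (start value, derivative, threshold test, step against the gradient) can be sketched as follows. This is a minimal Python sketch under stated assumptions, not the implementation behind \varcode{meanSquaredErrorCubic()} and \varcode{meanSquaredGradientCubic()}, whose bodies are not shown in this diff. It assumes the model is $y = c \cdot x^3$ (suggested only by the "Cubic" suffix and the parameter $c$) and uses the analytic derivative of the mean squared error for that assumed model. The function names, the `max_steps` safety cap, and the example data are hypothetical. The step size `epsilon = 0.01` is taken from the text, and the threshold `0.1` from the example in the removed lines.

```python
def mse_cubic(x, y, c):
    """Mean squared error between y and the ASSUMED model c * x**3."""
    return sum((yi - c * xi**3) ** 2 for xi, yi in zip(x, y)) / len(x)


def mse_gradient_cubic(x, y, c):
    """Derivative of mse_cubic with respect to c (analytic, for the assumed model)."""
    return -2.0 * sum(xi**3 * (yi - c * xi**3) for xi, yi in zip(x, y)) / len(x)


def gradient_descent(x, y, c0, epsilon=0.01, threshold=0.1, max_steps=10000):
    """Follow the negative gradient until its absolute value drops below threshold.

    max_steps is a hypothetical safety cap that is not part of the described
    algorithm; it only prevents an endless loop if the steps diverge.
    """
    c = c0
    path = [c]                       # visited parameter values
    for _ in range(max_steps):
        gradient = mse_gradient_cubic(x, y, c)
        if abs(gradient) < threshold:
            break                    # sufficiently close to the minimum
        c = c - epsilon * gradient   # small step in the opposite direction
        path.append(c)
    return c, path


# Made-up, noise-free example data generated with c = 2.
x = [0.5, 1.0, 1.5, 2.0]
y = [2.0 * xi**3 for xi in x]
c_est, path = gradient_descent(x, y, c0=5.0)
print(c_est, len(path))
```

Whether a fixed `epsilon` converges depends on the scale of the data: for this assumed cost function the steps overshoot and diverge once `epsilon` times the second derivative of the cost exceeds 2, which is why the example keeps the `x` values small.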