[regression] updated gradient descent section text

This commit is contained in:
Jan Benda 2020-12-19 10:13:37 +01:00
parent ca54936245
commit 4b624fe981


@@ -272,24 +272,27 @@ the hill, we choose the opposite direction.
 Finally, we are able to implement the optimization itself. By now it
 should be obvious why it is called the gradient descent method. All
 ingredients are already there. We need: (i) the cost function
-(\varcode{meanSquaredError()}), and (ii) the gradient
-(\varcode{meanSquaredGradient()}). The algorithm of the gradient
+(\varcode{meanSquaredErrorCubic()}), and (ii) the gradient
+(\varcode{meanSquaredGradientCubic()}). The algorithm of the gradient
 descent works as follows:
 \begin{enumerate}
-\item Start with some given combination of the parameters $m$ and $b$
-  ($p_0 = (m_0, b_0)$).
-\item \label{computegradient} Calculate the gradient at the current
-  position $p_i$.
-\item If the length of the gradient falls below a certain value, we
-  assume to have reached the minimum and stop the search. We are
-  actually looking for the point at which the length of the gradient
-  is zero, but finding zero is impossible because of numerical
-  imprecision. We thus apply a threshold below which we are
-  sufficiently close to zero (e.g. \varcode{norm(gradient) < 0.1}).
+\item Start with some arbitrary value $p_0$ for the parameter $c$.
+\item \label{computegradient} Compute the gradient
+  \eqnref{costderivative} at the current position $p_i$.
+\item If the length of the gradient, the absolute value of the
+  derivative \eqnref{costderivative}, is smaller than some threshold
+  value, we assume to have reached the minimum and stop the search.
+  We return the current parameter value $p_i$ as our best estimate of
+  the parameter $c$ that minimizes the cost function.
+
+  We are actually looking for the point at which the derivative of the
+  cost function equals zero, but this is impossible because of
+  numerical imprecision. We thus apply a threshold below which we are
+  sufficiently close to zero.
 \item \label{gradientstep} If the length of the gradient exceeds the
   threshold we take a small step into the opposite direction:
   \begin{equation}
-    p_{i+1} = p_i - \epsilon \cdot \nabla f_{cost}(m_i, b_i)
+    p_{i+1} = p_i - \epsilon \cdot \nabla f_{cost}(p_i)
   \end{equation}
   where $\epsilon = 0.01$ is a factor linking the gradient to
   appropriate steps in the parameter space.
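The descent loop described in the updated steps can be sketched as follows. This is a minimal Python sketch, not the course's MATLAB code: it assumes a single parameter $c$ fitted to cubic data $y = c\,x^3$, and the function names merely mirror the helpers mentioned in the text.

```python
import numpy as np

def mean_squared_error_cubic(x, y, c):
    # cost function: mean squared deviation of the cubic y = c*x**3 from the data
    return np.mean((y - c * x**3) ** 2)

def mean_squared_gradient_cubic(x, y, c):
    # analytical derivative of the cost function with respect to c
    return np.mean(-2.0 * x**3 * (y - c * x**3))

def gradient_descent(x, y, c0, eps=0.01, threshold=1e-4):
    # step 1: start with an arbitrary value for the parameter c
    c = c0
    while True:
        # step 2: compute the gradient at the current position
        grad = mean_squared_gradient_cubic(x, y, c)
        # step 3: stop when the absolute value of the derivative
        # falls below the threshold and return the best estimate
        if abs(grad) < threshold:
            return c
        # step 4: take a small step against the gradient
        c -= eps * grad

# usage: noise-free data generated with c = 2.0
x = np.linspace(-1.0, 1.0, 40)
y = 2.0 * x**3
c_est = gradient_descent(x, y, c0=0.0)
```

With noise-free data the loop converges to a value of `c_est` close to the true 2.0; the stopping threshold on the derivative, not on the cost itself, decides how close.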
@@ -298,11 +301,10 @@ descent works as follows:
 \Figref{gradientdescentfig} illustrates the gradient descent --- the
 path the imaginary ball has chosen to reach the minimum. Starting at
-an arbitrary position on the error surface we change the position as
-long as the gradient at that position is larger than a certain
-threshold. If the slope is very steep, the change in the position (the
-distance between the red dots in \figref{gradientdescentfig}) is
-large.
+an arbitrary position we change the position as long as the gradient
+at that position is larger than a certain threshold. If the slope is
+very steep, the change in the position (the distance between the red
+dots in \figref{gradientdescentfig}) is large.
 \begin{figure}[t]
 \includegraphics{cubicmse}