[regression] updated gradient descent section text
parent ca54936245
commit 4b624fe981
@@ -272,24 +272,27 @@ the hill, we choose the opposite direction.
Finally, we are able to implement the optimization itself. By now it
should be obvious why it is called the gradient descent method. All
ingredients are already there. We need: (i) the cost function
-(\varcode{meanSquaredError()}), and (ii) the gradient
-(\varcode{meanSquaredGradient()}). The algorithm of the gradient
+(\varcode{meanSquaredErrorCubic()}), and (ii) the gradient
+(\varcode{meanSquaredGradientCubic()}). The algorithm of the gradient
descent works as follows:
\begin{enumerate}
-\item Start with some given combination of the parameters $m$ and $b$
-($p_0 = (m_0, b_0)$).
-\item \label{computegradient} Calculate the gradient at the current
-position $p_i$.
-\item If the length of the gradient falls below a certain value, we
-assume to have reached the minimum and stop the search. We are
-actually looking for the point at which the length of the gradient
-is zero, but finding zero is impossible because of numerical
-imprecision. We thus apply a threshold below which we are
-sufficiently close to zero (e.g. \varcode{norm(gradient) < 0.1}).
+\item Start with some arbitrary value $p_0$ for the parameter $c$.
+\item \label{computegradient} Compute the gradient
+\eqnref{costderivative} at the current position $p_i$.
+\item If the length of the gradient, the absolute value of the
+derivative \eqnref{costderivative}, is smaller than some threshold
+value, we assume that we have reached the minimum and stop the search.
+We return the current parameter value $p_i$ as our best estimate of
+the parameter $c$ that minimizes the cost function.
+
+We are actually looking for the point at which the derivative of the
+cost function equals zero, but finding exactly zero is impossible
+because of numerical imprecision. We thus apply a threshold below
+which we are sufficiently close to zero.
\item \label{gradientstep} If the length of the gradient exceeds the
threshold, we take a small step in the opposite direction:
\begin{equation}
-p_{i+1} = p_i - \epsilon \cdot \nabla f_{cost}(m_i, b_i)
+p_{i+1} = p_i - \epsilon \cdot \nabla f_{cost}(p_i)
\end{equation}
where $\epsilon = 0.01$ is a factor linking the gradient to
appropriate steps in the parameter space.
@@ -298,11 +301,10 @@ descent works as follows:

\Figref{gradientdescentfig} illustrates the gradient descent --- the
path the imaginary ball has chosen to reach the minimum. Starting at
-an arbitrary position on the error surface we change the position as
-long as the gradient at that position is larger than a certain
-threshold. If the slope is very steep, the change in the position (the
-distance between the red dots in \figref{gradientdescentfig}) is
-large.
+an arbitrary position we change the position as long as the gradient
+at that position is larger than a certain threshold. If the slope is
+very steep, the change in the position (the distance between the red
+dots in \figref{gradientdescentfig}) is large.

\begin{figure}[t]
\includegraphics{cubicmse}
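
A minimal MATLAB sketch of the loop described by the updated enumeration could look like the following. The function name gradientDescentCubic and the signature meanSquaredGradientCubic(x, y, c), returning the derivative of the mean squared error with respect to c, are assumptions for illustration rather than the script's actual implementation:

% Minimal sketch of gradient descent for the single parameter c.
% meanSquaredGradientCubic(x, y, c) is assumed to return the derivative
% of the mean squared error with respect to c.
function c = gradientDescentCubic(x, y, c0)
  c = c0;                                      % step 1: arbitrary start value p_0
  epsilon = 0.01;                              % factor linking gradient to step size
  threshold = 0.1;                             % stop criterion for the gradient length
  grad = meanSquaredGradientCubic(x, y, c);    % step 2: gradient at the current p_i
  while abs(grad) > threshold                  % step 3: stop when close enough to zero
    c = c - epsilon * grad;                    % step 4: small step against the gradient
    grad = meanSquaredGradientCubic(x, y, c);  % step 2: gradient at the new p_i
  end
end

Calling, for example, c = gradientDescentCubic(x, y, 2.0) would then return the value of c at which the descent stopped, i.e. the estimate that minimizes the cost function up to the chosen threshold.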