[regression] n-dim is simply n times 1 dim
parent 60a94c9ce6
commit 5a6cca59d3
@@ -259,6 +259,7 @@ There is no need to calculate this derivative analytically, because it
can be approximated numerically by the difference quotient
(Box~\ref{differentialquotientbox}) for small steps $\Delta c$:
\begin{equation}
+\label{costderivativediff}
\frac{{\rm d} f_{cost}(c)}{{\rm d} c} =
\lim\limits_{\Delta c \to 0} \frac{f_{cost}(c + \Delta c) - f_{cost}(c)}{\Delta c}
\approx \frac{f_{cost}(c + \Delta c) - f_{cost}(c)}{\Delta c}
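As an illustration only (not taken from the diff itself), this approximation maps directly onto code. A minimal Python sketch, in which the names derivative and dc are placeholders chosen for this example:

\begin{verbatim}
def derivative(f_cost, c, dc=1e-4):
    # difference quotient: slope of f_cost between c and c + dc
    return (f_cost(c + dc) - f_cost(c)) / dc

# the derivative of c**2 is 2*c, so this prints a value close to 4.0
print(derivative(lambda c: c**2, 2.0))
\end{verbatim}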
@@ -443,7 +444,7 @@ are searching for the position of the bottom of the deepest valley
and
\[ \frac{\partial f(x,y)}{\partial y} = \lim\limits_{\Delta y \to 0} \frac{f(x, y + \Delta y) - f(x,y)}{\Delta y} \]
one can calculate the slopes in the directions of each of the
-variables by means of the respective difference quotient
+variables by means of the respective difference quotients
(see box~\ref{differentialquotientbox}). \vspace{1ex}
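Again for illustration only (not part of the commit), both partial derivatives can be approximated with one difference quotient per variable; the function partial_derivatives and the step sizes dx, dy are placeholders:

\begin{verbatim}
def partial_derivatives(f, x, y, dx=1e-4, dy=1e-4):
    # difference quotient in each variable, holding the other one fixed
    dfdx = (f(x + dx, y) - f(x, y)) / dx
    dfdy = (f(x, y + dy) - f(x, y)) / dy
    return dfdx, dfdy

# for f(x,y) = x*y the slopes at (2, 3) are y = 3 and x = 2
print(partial_derivatives(lambda x, y: x * y, 2.0, 3.0))
\end{verbatim}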
\begin{minipage}[t]{0.5\textwidth}
@@ -489,6 +490,10 @@ $p_j$ the respective partial derivatives as coordinates:
\label{gradient}
\nabla f_{cost}(\vec p) = \left( \frac{\partial f_{cost}(\vec p)}{\partial p_j} \right)
\end{equation}
+Despite the fancy words this simply means that we need to calculate the
+derivatives in the very same way as we have done for the case of a
+single parameter, \eqnref{costderivativediff}, for each parameter
+separately.
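A possible Python sketch of this per-parameter computation (an assumption for illustration, not code from the commit; the helper name gradient, the step size dp, and the use of numpy are all choices made here):

\begin{verbatim}
import numpy as np

def gradient(f_cost, p, dp=1e-4):
    # one difference quotient per parameter, all other parameters held fixed
    p = np.asarray(p, dtype=float)
    grad = np.zeros(len(p))
    for j in range(len(p)):
        step = np.zeros(len(p))
        step[j] = dp               # step only along parameter j
        grad[j] = (f_cost(p + step) - f_cost(p)) / dp
    return grad
\end{verbatim}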
The iterative equation \eqref{gradientdescent} of the gradient descent
stays exactly the same, with the only difference that the current
@@ -497,7 +502,8 @@ parameter value $p_i$ becomes a vector $\vec p_i$ of parameter values:
\label{ndimgradientdescent}
\vec p_{i+1} = \vec p_i - \epsilon \cdot \nabla f_{cost}(\vec p_i)
\end{equation}
-The algorithm proceeds along the negative gradient
+For each parameter we subtract the corresponding derivative multiplied
+by $\epsilon$. The algorithm proceeds along the negative gradient
(\figref{powergradientdescentfig}).
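Sketched in Python under the same assumptions, the update rule becomes a short loop; it reuses the hypothetical gradient() helper from above, and eps, tol and max_steps are placeholder values. The length of the gradient already serves as the termination criterion discussed next:

\begin{verbatim}
import numpy as np

def gradient_descent(f_cost, p0, eps=0.01, tol=1e-4, max_steps=10000):
    p = np.asarray(p0, dtype=float)
    for _ in range(max_steps):
        g = gradient(f_cost, p)       # vector of partial derivatives
        if np.linalg.norm(g) < tol:   # stop once the gradient is short enough
            break
        p = p - eps * g               # step against the gradient
    return p
\end{verbatim}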
For the termination condition we need the length of the gradient. In