From 4b624fe9815f2d489ec72d9a9c3af0c494b63df4 Mon Sep 17 00:00:00 2001
From: Jan Benda
Date: Sat, 19 Dec 2020 10:13:37 +0100
Subject: [PATCH] [regression] updated gradient descent section text

---
 regression/lecture/regression.tex | 38 ++++++++++++++++---------------
 1 file changed, 20 insertions(+), 18 deletions(-)

diff --git a/regression/lecture/regression.tex b/regression/lecture/regression.tex
index 776f745..3f8c045 100644
--- a/regression/lecture/regression.tex
+++ b/regression/lecture/regression.tex
@@ -272,24 +272,27 @@ the hill, we choose the opposite direction.
 Finally, we are able to implement the optimization itself. By now it
 should be obvious why it is called the gradient descent method. All
 ingredients are already there. We need: (i) the cost function
-(\varcode{meanSquaredError()}), and (ii) the gradient
-(\varcode{meanSquaredGradient()}). The algorithm of the gradient
+(\varcode{meanSquaredErrorCubic()}), and (ii) the gradient
+(\varcode{meanSquaredGradientCubic()}). The algorithm of the gradient
 descent works as follows:
 \begin{enumerate}
-\item Start with some given combination of the parameters $m$ and $b$
-  ($p_0 = (m_0, b_0)$).
-\item \label{computegradient} Calculate the gradient at the current
-  position $p_i$.
-\item If the length of the gradient falls below a certain value, we
-  assume to have reached the minimum and stop the search. We are
-  actually looking for the point at which the length of the gradient
-  is zero, but finding zero is impossible because of numerical
-  imprecision. We thus apply a threshold below which we are
-  sufficiently close to zero (e.g. \varcode{norm(gradient) < 0.1}).
+\item Start with some arbitrary value $p_0$ for the parameter $c$.
+\item \label{computegradient} Compute the gradient
+  \eqnref{costderivative} at the current position $p_i$.
+\item If the length of the gradient, the absolute value of the
+  derivative \eqnref{costderivative}, is smaller than some threshold
+  value, we assume that we have reached the minimum and stop the
+  search. We return the current parameter value $p_i$ as our best
+  estimate of the parameter $c$ that minimizes the cost function.
+
+  We are actually looking for the point at which the derivative of the
+  cost function equals zero, but this is impossible because of
+  numerical imprecision. We thus apply a threshold below which we are
+  sufficiently close to zero.
 \item \label{gradientstep} If the length of the gradient exceeds the
   threshold we take a small step into the opposite direction:
   \begin{equation}
-    p_{i+1} = p_i - \epsilon \cdot \nabla f_{cost}(m_i, b_i)
+    p_{i+1} = p_i - \epsilon \cdot \nabla f_{cost}(p_i)
   \end{equation}
   where $\epsilon = 0.01$ is a factor linking the gradient to
   appropriate steps in the parameter space.
@@ -298,11 +301,10 @@ descent works as follows:
 
 \Figref{gradientdescentfig} illustrates the gradient descent --- the
 path the imaginary ball has chosen to reach the minimum. Starting at
-an arbitrary position on the error surface we change the position as
-long as the gradient at that position is larger than a certain
-threshold. If the slope is very steep, the change in the position (the
-distance between the red dots in \figref{gradientdescentfig}) is
-large.
+an arbitrary position we change the position as long as the gradient
+at that position is larger than a certain threshold. If the slope is
+very steep, the change in the position (the distance between the red
+dots in \figref{gradientdescentfig}) is large.
 
 \begin{figure}[t]
   \includegraphics{cubicmse}
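For reference, the enumerated gradient-descent loop in the patched text can be sketched as follows. This is a minimal Python sketch, not the lecture's actual MATLAB code (\varcode{meanSquaredErrorCubic()}, \varcode{meanSquaredGradientCubic()}): the data, the true parameter value 1.5, the starting value 2.0, and the Python function names are illustrative assumptions; only the cubic model $y = c \, x^3$, the threshold idea, and the step rule $p_{i+1} = p_i - \epsilon \cdot \nabla f_{cost}(p_i)$ with $\epsilon = 0.01$ come from the patch.

```python
import numpy as np

# Hypothetical data: noisy samples of a cubic y = c*x^3 with c = 1.5.
rng = np.random.default_rng(0)
x = np.linspace(-2.0, 2.0, 50)
y = 1.5 * x**3 + rng.normal(0.0, 0.5, x.size)

def mean_squared_error(c):
    # Cost function: mean squared deviation of the cubic from the data.
    return np.mean((y - c * x**3) ** 2)

def mean_squared_gradient(c):
    # Derivative of the cost function with respect to the parameter c.
    return np.mean(-2.0 * (y - c * x**3) * x**3)

def gradient_descent(p0, eps=0.01, threshold=0.1):
    # Steps 1-4 of the patched algorithm for a single parameter c.
    p = p0                              # 1. start at some arbitrary value
    g = mean_squared_gradient(p)        # 2. gradient at the current position
    while abs(g) > threshold:           # 3. stop once |gradient| < threshold
        p -= eps * g                    # 4. small step against the gradient
        g = mean_squared_gradient(p)
    return p                            # best estimate of the parameter c

c_est = gradient_descent(2.0)
```

With these assumptions the loop contracts geometrically (the cost is a parabola in $c$), so the returned estimate lands close to the value 1.5 used to generate the data, up to noise and the stopping threshold.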