[regression] translations 2

2018-10-11 16:55:24 +02:00 · 2018-10-11 16:55:24 +02:00 · e500280c07
commit e500280c07
parent 51a8183f33
1 changed files with 164 additions and 155 deletions
--- a/regression/lecture/regression.tex
+++ b/regression/lecture/regression.tex
@@ -117,7 +117,7 @@ Replacing $y^{est}$ with the linear equation (the model) in
  & = & \frac{1}{N} \sum_{i=1}^N (y_i - m x_i - b)^2 \label{mseline}
 \end{eqnarray}
-That is, the meas square error given the pairs $(x_i, y_i)$ and the
+That is, the mean square error given the pairs $(x_i, y_i)$ and the
 parameters $m$ and $b$ of the linear equation. The optimization
 process will not try to optimize $m$ and $b$ to lead to the smallest
 error, the method of the \enterm{least square error}.
@@ -163,66 +163,67 @@ third dimension is used to indicate the error value
  Load the dataset \textit{lin\_regression.mat} into the workspace (20
  data pairs contained in the vectors \varcode{x} and
  \varcode{y}). Implement a script \file{errorSurface.m}, that
-  calculates the mean square error between data and a linear model und
+  calculates the mean square error between data and a linear model and
  illustrates the error surface using the \code{surf()} function
  (consult the help to find out how to use \code{surf}.).
 \end{exercise}
-An der Fehlerfl\"ache kann direkt erkannt werden, bei welcher
+By looking at the error surface we can directly see the position of
-Parameterkombination der Fehler minimal, beziehungsweise die
+the minimum and thus estimate the optimal parameter combination. How
-Parameterisierung optimal an die Daten angepasst ist. Wie kann die
+can we use the error surface to guide an automatic optimization
-Fehlerfunktion und die durch sie definierte Fehlerfl\"ache nun benutzt
+process?
-werden, um den Optimierungsprozess zu leiten?
-Die naheliegenste Variante ist, von der Fehlerfl\"ache einfach den Ort
+The obvious approach would be to calculate the error surface and then
-des globalen Minimums zu bestimmen. Das ist im Allgemeinen jedoch zu
+find the position of the minimum. This approach, however, has several
-rechenintensiv, da f\"ur jede m\"ogliche Kombination der Parameter der
+disadvantages: (I) it is computationally very expensive to calculate
-Fehler berechnet werden muss. Die Anzahl der n\"otigen Berechnungen
+the error for each parameter combination. The number of combinations
-steigt exponentiell mit der Anzahl der Parameter (``Fluch der
+increases exponentially with the number of free parameters (also known
-Dimension''). Auch eine bessere Genauigkeit, mit der das Minimum
+as the ``curse of dimensionality''). (II) the accuracy with which the
-bestimmt werden soll, erh\"oht die Anzahl der n\"otigen
+best parameters can be estimated is limited by the resolution with
-Berechnungen. Wir suchen also ein Verfahren, dass das Minimum der
+which the parameter space was sampled. If the grid is too coarse, one
-Kostenfunktion mit m\"oglichst wenigen Berechnungen findet.
+might miss the minimum.
-\begin{ibox}[t]{\label{differentialquotientbox}Differenzenquotient und Ableitung}
+We thus want a procedure that finds the minimum with a minimal number
+of computations.
+\begin{ibox}[t]{\label{differentialquotientbox}Difference quotient and derivative}
  \includegraphics[width=0.33\textwidth]{derivative}
  \hfill
  \begin{minipage}[b]{0.63\textwidth}
-    Der Differenzenquotient 
+    The difference quotient
    \begin{equation}
      \label{difffrac}
      m = \frac{f(x + \Delta x) - f(x)}{\Delta x}
    \end{equation}
-    einer Funktion $y = f(x)$ ist die Steigung der Sekante (rot) durch
+    of a function $y = f(x)$ is the slope of the secant (red) defined
-    die beiden Punkte $(x,f(x))$ und $(x+\Delta x,f(x+\Delta x))$ mit
+    by the points $(x,f(x))$ and $(x+\Delta x,f(x+\Delta x))$ with the
-    dem Abstand $\Delta x$.
+    distance $\Delta x$.
-    Die Steigung einer Funktion $y=f(x)$ an einer Stelle $x$ (gelb) wird durch
+    The slope of the function $y=f(x)$ at the position $x$ (yellow) is
-    die Ableitung $f'(x)$ der Funktion an dieser Stelle berechnet.  Die
+    given by the derivative $f'(x)$ of the function at that position.
-    Ableitung ist \"uber den Grenzwert (orange) des Differenzenquotienten f\"ur
+    It is defined as the limit (orange) of the difference quotient for
-    unendlich kleine Abst\"ande $\Delta x$ definiert:
+    infinitesimally small distances $\Delta x$:
    \begin{equation}
      \label{derivative}
      f'(x) = \frac{{\rm d} f(x)}{{\rm d}x} = \lim\limits_{\Delta x \to 0} \frac{f(x + \Delta x) - f(x)}{\Delta x} \end{equation}
  \end{minipage}\vspace{2ex} 
-  Numerisch kann der Grenzwert \eqnref{derivative} nicht
+  It is not possible to calculate the limit \eqnref{derivative}
-  gebildet werden. Die Ableitung kann nur durch den
+  numerically. The derivative can only be approximated by the
-  Differenzenquotienten \eqnref{difffrac} mit gen\"ugend kleinem
+  difference quotient \eqnref{difffrac} with a sufficiently
-  $\Delta x$ angen\"ahert werden.
+  small $\Delta x$.
 \end{ibox}
-\begin{ibox}[t]{\label{partialderivativebox}Partielle Ableitungen und Gradient}
+\begin{ibox}[t]{\label{partialderivativebox}Partial derivative and gradient}
-  Bei Funktionen
+  Some functions depend on more than a single variable:
  \[ z = f(x,y) \]
-  die von mehreren Variablen, z.B. $x$ und $y$ abh\"angen,
+  for example, depends on $x$ and $y$. Using the partial derivatives
-  kann die Steigung in Richtung jeder dieser Variablen
-  mit den partiellen Ableitungen 
  \[ \frac{\partial f(x,y)}{\partial x} = \lim\limits_{\Delta x \to 0} \frac{f(x + \Delta x,y) - f(x,y)}{\Delta x} \]
-  und
+  and
  \[ \frac{\partial f(x,y)}{\partial y} = \lim\limits_{\Delta y \to 0} \frac{f(x, y + \Delta y) - f(x,y)}{\Delta y} \]
-  definiert \"uber den jeweiligen Differenzenquotienten
+  one can calculate the slope in the direction of each variable
-  (Box~\ref{differentialquotientbox}) berechnet werden.  \vspace{1ex}
+  individually by using the respective difference quotient
+  (Box~\ref{differentialquotientbox}).  \vspace{1ex}
  \begin{minipage}[t]{0.44\textwidth}
    \mbox{}\\[-2ex]
@@ -230,172 +231,180 @@ Kostenfunktion mit m\"oglichst wenigen Berechnungen findet.
  \end{minipage}
  \hfill
  \begin{minipage}[t]{0.52\textwidth}
-    Z.B. lauten die partiellen Ableitungen von 
+    For example, the partial derivatives of
-    \[ f(x,y) = x^2+y^2 \]
+    \[ f(x,y) = x^2+y^2 \] are 
    \[ \frac{\partial f(x,y)}{\partial x} = 2x \; , \quad \frac{\partial f(x,y)}{\partial y} = 2y \; .\]
-    Der Gradient ist der aus den partiellen Ableitungen gebildete Vektor
+    The gradient is the vector constructed from the partial derivatives:
    \[ \nabla f(x,y) = \left( \begin{array}{c} \frac{\partial f(x,y)}{\partial x} \\[1ex] \frac{\partial f(x,y)}{\partial y} \end{array} \right) \]
-    und zeigt in Richtung des st\"arksten Anstiegs der Funktion $f(x,y)$.
+    This vector points in the direction of the steepest ascent of
+    $f(x,y)$.
  \end{minipage}
-  \vspace{1ex} Die Abbildung zeigt die Konturlinien einer bivariaten
+  \vspace{1ex} The figure shows the contour lines of a bi-variate
-  Gau{\ss}glocke $f(x,y) = \exp(-(x^2+y^2)/2)$ und den Gradienten mit
+  Gaussian $f(x,y) = \exp(-(x^2+y^2)/2)$ and the gradient (thick
-  seinen partiellen Ableitungen an drei verschiedenen Stellen.
+  arrow) and the two partial derivatives (thin arrows) for three
+  different locations.
 \end{ibox}
 \section{Gradient}
+Imagine placing a small ball at some point on the error surface
+\figref{errorsurfacefig}. Naturally, it would follow the steepest
+slope and would stop at the minimum of the error surface (if it had no
+inertia). We will use this picture to develop an algorithm to find our
+way to the minimum of the objective function. The ball will always
+follow the steepest slope. Thus we need to figure out the direction of
+the steepest slope at the position of the ball.
-Wenn eine Kugel an einem beliebigen Startpunkt auf der Fehlerfl\"ache
+The \enterm{gradient} (Box~\ref{partialderivativebox}) of the
-\figref{errorsurfacefig} losgelassen werden w\"urde, dann w\"urde sie
+objective function is the vector
-entlang des steilsten Gef\"alles auf schnellsten Wege zum Minimum der
-Fehlerfl\"ache rollen und dort zum Stehen kommen (wenn sie keine
-Tr\"agheit besitzen w\"urde). Den Weg der Kugel wollen wir nun als
-Grundlage unseres Algorithmus zur Bestimmung des Minimums der
-Kostenfunktion verwenden. Da die Kugel immer entlang des steilsten
-Gef\"alles rollt, ben\"otigen wir Information \"uber die Richtung des
-Gef\"alles an der jeweils aktuellen Position.
-Der \determ{Gradient} (Box~\ref{partialderivativebox}) der Kostenfunktion
 \[ \nabla f_{cost}(m,b) = \left( \frac{\partial f(m,b)}{\partial m},
-  \frac{\partial f(m,b)}{\partial b} \right) \] bzgl. der beiden
+\frac{\partial f(m,b)}{\partial b} \right) \]
-Parameter $m$ und $b$ der Geradengleichung ist ein Vektor, der in
+
-Richtung des steilsten Anstiegs der Kostenfunktion $f_{cost}(m,b)$ zeigt.
+that points in the direction of the steepest ascent of the objective function. Since
-Die L\"ange des Gradienten gibt die St\"arke des Anstiegs an
+we want to reach the minimum we simply choose the opposite direction.
-(\figref{gradientquiverfig})).  Da wir aber abw\"arts zum Minimum
+
-laufen wollen, m\"ussen wir die dem Gradienten entgegengesetzte
+The gradient is given by the partial derivatives
-Richtung einschlagen.
+(Box~\ref{partialderivativebox}) with respect to the parameters $m$
+and $b$ of the linear equation. There is no need to calculate them
+analytically; they can be approximated numerically
+using the difference quotient (Box~\ref{differentialquotientbox}) for
+small steps $\Delta m$ and $\Delta b$. For example, the partial
+derivative with respect to $m$ is approximated by
-Die partiellen Ableitungen m\"ussen nicht analytisch berechnet werden
-sondern k\"onnen numerisch entsprechend dem Differenzenquotienten
-(Box~\ref{differentialquotientbox}) mit kleinen Schrittweiten $\Delta
-m$ und $\Delta b$ angen\"ahert werden. z.B. approximieren wir die
-partielle Ableitung nach $m$ durch
 \[\frac{\partial f_{cost}(m,b)}{\partial m} = \lim\limits_{\Delta m \to
-  0} \frac{f_{cost}(m + \Delta m, b) - f_{cost}(m,b)}{\Delta m} \approx \frac{f_{cost}(m + \Delta m, b) -
+  0} \frac{f_{cost}(m + \Delta m, b) - f_{cost}(m,b)}{\Delta m}
-  f_{cost}(m,b)}{\Delta m} \; . \]
+\approx \frac{f_{cost}(m + \Delta m, b) - f_{cost}(m,b)}{\Delta m} \;
+. \]
+The length of the gradient indicates the steepness of the slope
+(\figref{gradientquiverfig}). Since we want to go down the hill, we
+choose the opposite direction.
 \begin{figure}[t]
  \includegraphics[width=0.75\columnwidth]{error_gradient}
-  \titlecaption{Gradient der Fehlerfl\"ache.} 
+  \titlecaption{Gradient of the error surface.}  {Each arrow points
-  {Jeder Pfeil zeigt die Richtung und die
+    in the direction of the steepest ascent at different positions
-    Steigung f\"ur verschiedene Parameterkombination aus Steigung und
+    of the error surface shown in \figref{errorsurfacefig}. The
-    $y$-Achsenabschnitt an. Die Konturlinien im Hintergrund
+    contour lines in the background illustrate the error surface. Warm
-    illustrieren die Fehlerfl\"ache. Warme Farben stehen f\"ur
+    colors indicate high errors, colder colors low error values. Each
-    gro{\ss}e Fehlerwerte, kalte Farben f\"ur kleine. Jede
+    contour line connects points of equal
-    Konturlinie steht f\"ur eine Linie gleichen
+    error.}\label{gradientquiverfig}
-    Fehlers.}\label{gradientquiverfig}
 \end{figure}
 \begin{exercise}{lsqGradient.m}{}\label{gradientexercise}%
-  Implementiere eine Funktion \code{lsqGradient()}, die den
+  Implement a function \code{lsqGradient()}, that takes the set of
-  Parametersatz $(m, b)$ der Geradengleichung als 2-elementigen Vektor
+  parameters $(m, b)$ of the linear equation as a two-element vector
-  sowie die $x$- und $y$-Werte der Messdaten als Argumente
+  and the $x$- and $y$-data as input arguments. The function should
-  entgegennimmt und den Gradienten an dieser Stelle zur\"uckgibt.
+  return the gradient at that position.
 \end{exercise}
 \begin{exercise}{errorGradient.m}{}
-  Benutze die Funktion aus der vorherigen \"Ubung (\ref{gradientexercise}),
+  Use the functions from the previous
-  um f\"ur jede Parameterkombination aus der Fehlerfl\"ache
+  exercises~\ref{errorsurfaceexercise} and~\ref{gradientexercise} to
-  (\"Ubung \ref{errorsurfaceexercise}) auch den Gradienten zu
+  estimate and plot the error surface including the gradients. Choose
-  berechnen und darzustellen. Vektoren im Raum k\"onnen mithilfe der
+  a subset of parameter combinations for which you plot the
-  Funktion \code{quiver()} geplottet werden.
+  gradient. Vectors in space can be easily plotted using the function
+  \code{quiver()}.
 \end{exercise}
-\section{Gradientenabstieg}
+\section{Gradient descent}
+Finally, we are able to implement the optimization itself. By now it
+should be obvious why it is called the gradient descent method. All
+ingredients are already there. We need: 1. The error function
+(\code{meanSquareError()}), 2. the objective function
+(\code{lsqError()}), and 3. the gradient (\code{lsqGradient()}). The
+algorithm of the gradient descent is:
-Zu guter Letzt muss nur noch der \determ{Gradientenabstieg} implementiert
-werden. Die daf\"ur ben\"otigten Zutaten haben wir aus den
-vorangegangenen \"Ubungen bereits vorbereitet. Wir brauchen: 1. Die Fehlerfunktion
-(\code{meanSquareError()}), 2. die Zielfunktion (\code{lsqError()})
-und 3. den Gradienten (\code{lsqGradient()}).  Der Algorithmus
-f\"ur den Abstieg lautet:
 \begin{enumerate}
-\item Starte mit einer beliebigen Parameterkombination $p_0 = (m_0,
+\item Start with any given combination of the parameters $m$ and $b$ ($p_0 = (m_0,
-  b_0)$.
+  b_0)$).
-\item \label{computegradient} Berechne den Gradienten an der akutellen Position $p_i$.
+\item \label{computegradient} Calculate the gradient at the current
-\item Wenn die L\"ange des Gradienten einen bestimmten Wert
+  position $p_i$.
-  unterschreitet, haben wir das Minum gefunden und k\"onnen die Suche
+\item If the length of the gradient falls below a certain value, we
-  abbrechen.  Wir suchen ja das Minimum, bei dem der Gradient gleich
+  assume that we have reached the minimum and stop the search. We are
-  Null ist. Da aus numerischen Gr\"unden der Gradient nie exakt Null
+  actually looking for the point at which the length of the gradient
-  werden wird, k\"onnen wir nur fordern, dass er hinreichend klein
+  is zero but finding zero is impossible for numerical reasons. We
-  wird (z.B. \varcode{norm(gradient) < 0.1}).
+  thus apply a threshold below which we are sufficiently close to zero
-\item \label{gradientstep} Gehe einen kleinen Schritt ($\epsilon =
+  (e.g. \varcode{norm(gradient) < 0.1}).
-  0.01$) in die entgegensetzte Richtung des Gradienten:
+\item \label{gradientstep} If the length of the gradient exceeds the
+  threshold we take a small step in the opposite direction
+  ($\epsilon = 0.01$):
  \[p_{i+1} = p_i - \epsilon \cdot \nabla f_{cost}(m_i, b_i)\]
-\item Wiederhole die Schritte \ref{computegradient} -- \ref{gradientstep}.
+\item Repeat steps \ref{computegradient} --
+  \ref{gradientstep}.
 \end{enumerate}
-Abbildung \ref{gradientdescentfig} zeigt den Verlauf des
+\Figref{gradientdescentfig} illustrates the gradient descent (the path
-Gradientenabstiegs. Von einer Startposition aus wird die Position
+the imaginary ball has chosen to reach the minimum). Starting at an
-solange ver\"andert, wie der Gradient eine bestimmte Gr\"o{\ss}e
+arbitrary position on the error surface we change the position as long
-\"uberschreitet. An den Stellen, an denen der Gradient sehr stark ist,
+as the gradient at that position is larger than a certain
-ist auch die Ver\"anderung der Position gro{\ss} und der Abstand der
+threshold. If the slope is very steep, the change in the position (the
-Punkte in Abbildung \ref{gradientdescentfig} gro{\ss}.
+distance between the red dots in \figref{gradientdescentfig}) is
+large.
 \begin{figure}[t]
  \includegraphics[width=0.6\columnwidth]{gradient_descent}
-  \titlecaption{Gradientenabstieg.}{Es wird von einer beliebigen
+  \titlecaption{Gradient descent.}{The algorithm starts at an
-    Position aus gestartet und der Gradient berechnet und die Position
+    arbitrary position. At each point the gradient is estimated and
-    ver\"andert. Jeder Punkt zeigt die Position nach jedem
+    the position is updated as long as the length of the gradient is
-    Optimierungsschritt an.} \label{gradientdescentfig}
+    sufficiently large. The dots show the positions after each
+    iteration of the algorithm.} \label{gradientdescentfig}
 \end{figure}
 \setboolean{showexercisesolutions}{false}
 \begin{exercise}{gradientDescent.m}{}
-  Implementiere den Gradientenabstieg f\"ur das Problem der
+  Implement the gradient descent for the problem of fitting the linear
-  Parameteranpassung der linearen Geradengleichung an die Messdaten in
+  equation to the measured data in file \file{lin\_regression.mat}.
-  der Datei \file{lin\_regression.mat}.
  \begin{enumerate}
-  \item Merke Dir f\"ur jeden Schritt den Fehler zwischen
+  \item Store the error value for each iteration.
-    Modellvorhersage und Daten.
+  \item Create a plot that shows the error value as a function of the
-  \item Erstelle eine Plot, der die Entwicklung des Fehlers als
+    number of optimization steps.
-    Funktion der Optimierungsschritte zeigt.
+  \item Create a plot that shows the measured data and the best fit.
-  \item Erstelle einen Plot, der den besten Fit in die Daten plottet.
  \end{enumerate}
 \end{exercise}
-\section{Fazit}
+\section{Summary}
-Mit dem Gradientenabstieg haben wir eine wichtige Methode zur
-Bestimmung eines globalen Minimums einer Kostenfunktion
-kennengelernt. 
-F\"ur den Fall des Kurvenfits mit einer Geradengleichung zeigt der
+The gradient descent is an important method for solving optimization
-mittlere quadratische Abstand als Kostenfunktion in der Tat ein
+problems. It is used to find the global minimum of an objective
-einziges klar definiertes Minimum.  Wie wir im n\"achsten Kapitel
+function.
-sehen werden, kann die Position des Minimums bei Geradengleichungen
-sogar analytisch bestimmt werden, der Gradientenabstieg w\"are also
-gar nicht n\"otig \matlabfun{polyfit()}.
-F\"ur Parameter, die nichtlinear in einer Funktion
+In the case of the linear equation the error surface (using the mean
-enthalten sind, wie z.B. die Rate $\lambda$ als Parameter in der
+square error) shows a clearly defined minimum. The position of the
-Exponentialfunktion $f(x;\lambda) = \exp(\lambda x)$, gibt es keine
+minimum can be analytically calculated. The next chapter will
-analytische L\"osung, und das Minimum der Kostenfunktion muss
+introduce how this can be done without using the gradient descent
-numerisch, z.B. mit dem Gradientenabstiegsverfahren bestimmt werden.
+\matlabfun{polyfit()}.
-Um noch schneller das Minimum zu finden, kann das Verfahren des
+Problems that involve nonlinear computations on parameters, e.g. the
-Gradientenabstiegs auf vielf\"altige Weise verbessert
+rate $\lambda$ in the exponential function $f(x;\lambda) =
-werden. z.B. kann die Schrittweite an die St\"arke des Gradienten
+\exp(\lambda x)$, do not have an analytical solution. To find minima
-angepasst werden. Diese numerischen Tricks sind in bereits vorhandenen
+in such functions numerical methods such as the gradient descent have
-Funktionen implementiert.  Allgemeine Funktionen sind f\"ur beliebige
+to be applied.
-Kostenfunktionen gemacht \matlabfun{fminsearch()}, w\"ahrend spezielle
+
-Funktionen z.B. f\"ur die Minimierung des quadratischen Abstands bei
+The suggested gradient descent algorithm can be improved in multiple
-einem Kurvenfit angeboten werden \matlabfun{lsqcurvefit()}.
+ways to converge faster.  For example one could adapt the step size to
+the length of the gradient. These numerical tricks have already been
+implemented in pre-defined functions. Generic optimization functions
+such as \matlabfun{fminsearch()} have been implemented for arbitrary
+objective functions while more specialized functions are specifically
+designed for optimizations in the least square error sense
+\matlabfun{lsqcurvefit()}.
 \newpage
-\begin{important}[Achtung Nebenminima!]
+\begin{important}[Beware of secondary minima!]
-  Das Finden des globalen Minimums ist leider nur selten so leicht wie
+  Finding the absolute minimum is not always as easy as in the case of
-  bei einem Geradenfit. Oft hat die Kostenfunktion viele Nebenminima,
+  the linear equation. Often, the error surface has secondary or local
-  in denen der Gradientenabstieg enden kann, obwohl das gesuchte
+  minima in which the gradient descent stops even though there is a
-  globale Minimum noch weit entfernt ist. Darum ist es meist sehr
+  better solution. Starting from good start positions is a good
-  wichtig, wirklich gute Startwerte f\"ur die zu bestimmenden
+  approach to avoid getting stuck in local minima. Further it is
-  Parameter der Kostenfunktion zu haben. Auch sollten nur so wenig wie
+  easier to optimize as few parameters as possible. Each additional
-  m\"oglich Parameter gefittet werden, da jeder zus\"atzliche
+  parameter increases complexity and is computationally expensive.
-  Parameter den Optimierungsprozess schwieriger und
-  rechenaufw\"andiger macht.
 \end{important}
 \selectlanguage{english}
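
The gradient descent procedure described in the translated text (mean square error of a line, gradient estimated with the difference quotient, small steps against the gradient until its length falls below a threshold) can be sketched as follows. This is only an illustration in plain Python, not the MATLAB solution the exercises ask for. The function names loosely mirror the names used in the exercises, the step width `h`, the iteration cap `max_iter`, and the example data are assumptions made for this sketch, and `eps = 0.01` and the threshold `0.1` are the example values given in the text.

```python
import math

def mean_square_error(params, x, y):
    """Mean square error between the data and the line y = m*x + b."""
    m, b = params
    return sum((yi - m * xi - b) ** 2 for xi, yi in zip(x, y)) / len(x)

def lsq_gradient(params, x, y, h=1e-6):
    """Approximate the gradient with difference quotients (step h is an assumption)."""
    m, b = params
    f0 = mean_square_error((m, b), x, y)
    dfdm = (mean_square_error((m + h, b), x, y) - f0) / h
    dfdb = (mean_square_error((m, b + h), x, y) - f0) / h
    return dfdm, dfdb

def gradient_descent(x, y, start=(0.0, 0.0), eps=0.01, threshold=0.1,
                     max_iter=100000):
    """Step against the gradient until its length drops below the threshold."""
    m, b = start
    errors = []  # error value at each visited position
    for _ in range(max_iter):  # max_iter is a safety cap, not part of the text
        errors.append(mean_square_error((m, b), x, y))
        gm, gb = lsq_gradient((m, b), x, y)
        if math.hypot(gm, gb) < threshold:
            break
        m -= eps * gm
        b -= eps * gb
    return (m, b), errors

# Made-up example data lying exactly on the line y = 2x + 1
x = [0.0, 1.0, 2.0, 3.0, 4.0]
y = [2.0 * xi + 1.0 for xi in x]
(m_fit, b_fit), errors = gradient_descent(x, y)
```

With the threshold of 0.1 the search stops while the parameters are still slightly off the true values, which illustrates why the stopping criterion limits the accuracy of the fit; a smaller threshold gives a closer estimate at the cost of more iterations.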