[regression] translations I

2018-10-10 18:05:42 +02:00 · 2018-10-10 18:05:42 +02:00 · 51a8183f33
commit 51a8183f33
parent fde6fb6177
1 changed files with 122 additions and 147 deletions
--- a/regression/lecture/regression.tex
+++ b/regression/lecture/regression.tex
@@ -1,196 +1,171 @@
 \chapter{\tr{Optimization and gradient descent}{Optimierung und Gradientenabstieg}}
-\selectlanguage{ngerman}
+% \selectlanguage{ngerman}
 To understand the behaviour of a given system, scientists often probe the
 system with input signals and then try to explain the responses
 through a model. Typically the model has a few parameters that specify
 how input and output signals are related. The question arises which
 combination of parameters is best suited to describe the relation of
 in- and output. The process of finding the best parameter set is called
 optimization or also \enterm{curve fitting}. One rather generic
 approach to the problem is the so-called gradient descent method, which
 will be introduced in this chapter.
 Ein sehr h\"aufiges Problem ist, dass die Abh\"angigkeit von
 Messwerten von einer Eingangsgr\"o{\ss}e durch ein Modell erkl\"art
 werden soll. Das Modell enth\"alt \"ublicherweise einen oder mehrere
 Parameter, die den Zusammenhang modifizieren. Wie soll die beste
 Parameterisierung des Modells gefunden werden, so dass das Modell die
 Daten am besten beschreibt? Dieser Prozess der Parameteranpassung ist
 ein Optimierungsproblem, der als Kurvenfit bekannt ist
 (\enterm{curve fitting}).
 \begin{figure}[t]
  \includegraphics[width=1\textwidth]{lin_regress}\hfill
-  \titlecaption{Beispieldatensatz f\"ur den Geradenfit.}{F\"ur eine
+  \titlecaption{Example data suggesting a linear relation.}{A set of
-    Reihe von Eingangswerten $x$, z.B. Stimulusintensit\"aten, wurden
+    input signals $x$, e.g. stimulus intensities, were used to probe a
-    die Antworten $y$ eines Systems gemessen (links). Der postulierte
+    system. The system's outputs $y$ to the inputs are noted
-    lineare Zusammenhang hat als freie Parameter die Steigung (mitte)
+    (left). Assuming a linear relation between $x$ and $y$ leaves us
-    und den $y$-Achsenabschnitt (rechts).}\label{linregressiondatafig}
+    with 2 parameters, the slope (center) and the intercept with the
    y-axis (right panel).}\label{linregressiondatafig}
 \end{figure}
-Die Punktewolke in \figref{linregressiondatafig} legt
+The data plotted in \figref{linregressiondatafig} suggests a linear
-zum Beispiel nahe, einen (verrauschten) linearen Zusammenhang zwischen
+relation between input and output of the investigated system. We thus
-der Eingangsgr\"o{\ss}e $x$ (\enterm{input}) und der Systemantwort
+assume that the linear equation
-$y$ (\enterm{output}) zu postulieren.
+\[y = f(x; m, b) = m\cdot x + b \] is an appropriate model to describe the system.
-Wir nehmen also an, dass die Geradengleichung 
+The linear equation has two free parameters $m$ and $b$, which denote
-\[y = f(x; m, b) = m\cdot x + b \] 
+the slope and the y-intercept, respectively. In this chapter we will
-ein gutes Modell f\"ur das zugrundeliegende System sein k\"onnte
+use this example to illustrate the methods behind several curve
-(Abbildung \ref{linregressiondatafig}).  Die Geradengleichung hat die
+fitting approaches. We will apply these methods to find the combination
-beiden Parameter Steigung $m$ und $y$-Achsenabschnitt $b$ und es wird
+of slope and intercept that best describes the system.
-die Kombination von $m$ und $b$ gesucht, die die Systemantwort am
+
-besten vorhersagt.
+
-
+\section{The error function --- mean square error}
-In folgenden Kapitel werden wir anhand dieses Beispiels zeigen, welche
+
-Methoden hinter einem Kurvenfit stecken, wie also numerisch die
+Before the optimization can be done we need to specify what is
-optimale Kombination aus Steigung und $y$-Achsen\-abschnitt gefunden
+considered an optimal fit. In our example we search the parameter
-werden kann.
+combination that describes the relation of $x$ and $y$ best. What is
-
+meant by this? Each input $x_i$ leads to an output $y_i$ and for each
-
+$x_i$ there is a \emph{prediction} or \emph{estimation}
-\section{Mittlere quadratischen Abweichung}
+$y^{est}_i$. For each $x_i$, estimation and measurement will have a
-
+certain distance $y_i - y_i^{est}$. In our example the estimation is
-Zuerst m\"u{\ss}en wir pr\"azisieren, was wir unter optimalen
+given by the linear equation $y_i^{est} = f(x_i;m,b)$. The best fit of
-Parametern verstehen. Es sollen die Werte der Parameter der
+the model with the parameters $m$ and $b$ leads to the minimal
-Geradengleichung sein, so dass die entsprechende Gerade am besten die
+distances between observation $y_i$ and estimation $y_i^{est}$
-Daten beschreibt.  Was meinen wir damit? Jeder $y$-Wert der $N$
+(\figref{leastsquareerrorfig}).
-Datenpaare wird einen Abstand $y_i - y^{est}_i$ zu den durch das
+
-Modell vorhergesagten Werten $y^{est}_i$ (\enterm{estimate}) an den
+We could require that the sum $\sum_{i=1}^N (y_i - y^{est}_i)$ is
-entsprechenden $x$-Werten haben. In unserem Beispiel mit der
+minimized. This approach, however, will not work since a minimal sum
-Geradengleichung ist die Modellvorhersage $y^{est}_i=f(x_i;m,b)$
+can also be achieved if half of the measurements are above and the
-gegeben durch die Geradengleichung
+other half below the predicted line. Positive and negative errors
-(\figref{leastsquareerrorfig}). F\"ur den besten Fit sollten dieser
+would cancel out and then sum up to values close to zero. A better
-Abst\"ande m\"oglichst klein sein.
+approach is to consider the absolute value of the distance
-
+$\sum_{i=1}^N |y_i - y^{est}_i|$. The total error can only be small if
-Wir k\"onnten z.B. fordern, die Summe $\sum_{i=1}^N y_i - y^{est}_i$
+all deviations are indeed small no matter if they are above or below
-m\"oglichst klein zu machen. Das funktioniert aber nicht, da diese
+the predicted line. Instead of the sum we could also ask for the
-Summe auch dann klein wird, wenn die H\"alfte der $y$-Daten weit
+\emph{average}
-oberhalb der Geraden und die andere H\"alfte weit darunter liegt, da
+
 sich diese positiven und negativen Werte gegenseitig zu Zahlen nahe
 Null aufsummieren. Besser w\"are es auf jeden Fall, die Summe des
 Betrags der Abst\"ande $\sum_{i=1}^N |y_i - y^{est}_i|$ zu betrachten. Ein
 kleiner Wert der Summe kann dann nur erreicht werden, wenn die
 Abst\"ande der Datenpunkte von der Kurve tats\"achlich klein sind,
 unabh\"angig ob sie \"uber oder unter der Gerade liegen. Statt der
 Summe k\"onnen wir genauso gut fordern, dass der \emph{mittlere} Abstand
 \begin{equation}
  \label{meanabserror}
  f_{dist}(\{(x_i, y_i)\}|\{y^{est}_i\}) = \frac{1}{N} \sum_{i=1}^N |y_i - y^{est}_i|
 \end{equation}
-der Menge der $N$ Datenpaare $(x_i, y_i)$ gegeben die Modellvorhersagen
+should be small. Commonly, the \enterm{mean squared distance} or
-$y_i^{est}$ klein sein soll.
+\enterm{mean squared error}
 Am h\"aufigsten wird jedoch bei einem Kurvenfit der \determ[mittlerer
 quadratische Abstand]{mittlere quadratische Abstand} (\enterm{mean
  squared distance} oder \enterm{mean squared error})
 \begin{equation}
  \label{meansquarederror}
  f_{mse}(\{(x_i, y_i)\}|\{y^{est}_i\}) = \frac{1}{N} \sum_{i=1}^N (y_i - y^{est}_i)^2
 \end{equation}
-verwendet (\figref{leastsquareerrorfig}). Wie beim Betrag sind die
+
-quadratischen Abst\"ande immer positiv, unabh\"angig ob die Datenwerte
+is used (\figref{leastsquareerrorfig}). Similar to the absolute
-\"uber oder unter der Kurve liegen. Durch das Quadrat werden
+distance, the square of the error ($(y_i - y_i^{est})^2$) is always
-zus\"atzlich gro{\ss}e Abst\"ande st\"arker gewichtet.
+positive, so that error values do not cancel out. The square further punishes
 large deviations.
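+
+As a small worked illustration (the numbers are made up for this
+example and are not taken from the example data), assume $N=4$ data
+points with deviations $y_i - y^{est}_i$ of $+2$, $-2$, $+1$, and $-1$:
+\begin{eqnarray*}
+  \sum_{i=1}^N (y_i - y^{est}_i) & = & 2 - 2 + 1 - 1 = 0 \\
+  \frac{1}{N} \sum_{i=1}^N |y_i - y^{est}_i| & = & \frac{2 + 2 + 1 + 1}{4} = 1.5 \\
+  \frac{1}{N} \sum_{i=1}^N (y_i - y^{est}_i)^2 & = & \frac{4 + 4 + 1 + 1}{4} = 2.5
+\end{eqnarray*}
+The plain sum is zero although no data point lies on the line, whereas
+both the mean absolute distance and the mean squared distance are
+clearly larger than zero. The two large deviations contribute $8$ of
+the $10$ units to the sum of squares, but only $4$ of the $6$ units to
+the sum of absolute values.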
 \begin{exercise}{meanSquareError.m}{}\label{mseexercise}%
-  Schreibe eine Funktion \code{meanSquareError()}, die die mittlere
+  Implement a function \code{meanSquareError()}, that calculates the
-  quadratische Abweichung zwischen einem Vektor mit den beobachteten
+  \emph{mean square distance} between a vector of observations ($y$)
-  Werten $y$ und einem Vektor mit den entsprechenden Vorhersagen
+  and respective predictions ($y^{est}$).  \pagebreak[4]
  $y^{est}$ berechnet.
  \pagebreak[4]
 \end{exercise}
-\section{Zielfunktion}
+\section{\tr{Objective function}{Zielfunktion}}
-$f_{cost}(\{(x_i, y_i)\}|\{y^{est}_i\})$ ist eine sogenannte
+$f_{cost}(\{(x_i, y_i)\}|\{y^{est}_i\})$ is a so-called
-\determ{Zielfunktion}, oder \determ{Kostenfunktion} (\enterm{objective
+\enterm{objective function} or \enterm{cost function}. We aim to adapt
-  function}, \enterm{cost function}), da wir die Modellvorhersage so
+the model parameters to minimize the error (mean square error) and
-anpassen wollen, dass der mittlere quadratische Abstand, also die
+thus the \emph{objective function}. In Chapter~\ref{maximumlikelihood}
-Zielfunktion, minimiert wird. In
+we will show that the minimization of the mean square error is
-Kapitel~\ref{maximumlikelihoodchapter} werden wir sehen, dass die
+equivalent to maximizing the likelihood that the observations
-Minimierung des mittleren quadratischen Abstands \"aquivalent zur
+originate from the model (assuming a normal distribution of the data
-Maximierung der Wahrscheinlichkeit ist, dass die Daten aus der
+around the model prediction).
 Modellfunktion stammen, unter der Vorraussetzung, dass die Daten
 um die Modellfunktion normalverteilt streuen.
 \begin{figure}[t]
  \includegraphics[width=1\textwidth]{linear_least_squares}
-  \titlecaption{Ermittlung des mittleren quadratischen Abstands.}
+  \titlecaption{Estimating the \emph{mean square error}.}  {The
-  {Der Abstand (\enterm{error}, orange) zwischen der Vorhersage (rote
+    deviation (\enterm{error}, orange) between the prediction (red
-    Gerade) und den Messdaten (blaue Punkte) wird f\"ur jeden
+    line) and the observations (blue dots) is calculated for each data
-    gemessenen Datenpunkt ermittelt (links). Anschlie{\ss}end werden
+    point (left). Then the deviations are squared and the average is
-    die Differenzen zwischen Messwerten und Vorhersage quadriert
+    calculated (right).}
    (\enterm{squared error}) und der Mittelwert berechnet (rechts).}
  \label{leastsquareerrorfig}
 \end{figure}
-Die Kostenfunktion mu{\ss} nicht immer der mittlere quadratische
+The error function, also called \enterm{cost function}, is not necessarily the mean
-Abstand sein. Je nach Problemstellung kann die Kostenfunktion eine
+square distance but can be any function that maps the predictions to a
-beliebige Funktion sein, die die Parameter eines Modells auf einen
+scalar value describing the quality of the fit. In the optimization
-Wert abbildet, der in irgendeiner Weise die Qualit\"at des Modells
+process we aim for the parameter combination that minimizes the costs
-quantifiziert. Ziel ist es dann, diejenigen Parameterwerte zu finden,
+(error).
-bei der die Kostenfunktion minimiert wird.
+
-%%% Einfaches verbales Beispiel?
+%%% Simple verbal example? Perhaps from population ecology?
-
+Replacing $y^{est}$ with the linear equation (the model) in
-Wenn wir nun in unsere Gleichung \eqref{meansquarederror} f\"ur die
+(\eqnref{meansquarederror}) we obtain:
-Modellvorhersage $y^{est}$ die Geradengleichung einsetzen, erhalten wir
+
 f\"ur die Zielfunktion
 \begin{eqnarray}
  f_{cost}(\{(x_i, y_i)\}|m,b) & = & \frac{1}{N} \sum_{i=1}^N (y_i - f(x_i;m,b))^2 \label{msefunc} \\
  & = & \frac{1}{N} \sum_{i=1}^N (y_i - m x_i - b)^2 \label{mseline}
 \end{eqnarray}
-den mittleren quadratischen Abstand der Datenpaare $(x_i, y_i)$
+
-gegeben die Parameterwerte $m$ und $b$ der Geradengleichung. Ziel des
+That is, the mean square error given the pairs $(x_i, y_i)$ and the
-Kurvenfits ist es, die Werte f\"ur $m$ und $b$ so zu optimieren, dass
+parameters $m$ and $b$ of the linear equation. The optimization
-der Fehler \eqnref{mseline} minimal wird (\determ{Methode der
+process will now try to optimize $m$ and $b$ to lead to the smallest
-  kleinsten Quadrate}, \enterm{least square error}).
+error, the method of the \enterm{least square error}.
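+
+A quick sanity check of \eqnref{mseline}, using three made-up data
+pairs $(0, 1)$, $(1, 3)$, and $(2, 5)$ that are not part of the
+example data: for $m=2$ and $b=1$ the line passes through all three
+points and the error is zero, whereas for $m=1$ and $b=1$ the
+deviations $y_i - m x_i - b$ are $0$, $1$, and $2$, so that
+\[ f_{cost}(\{(x_i, y_i)\}|m,b) = \frac{1}{3}\left(0^2 + 1^2 + 2^2\right) = \frac{5}{3} \; . \]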
 \begin{exercise}{lsqError.m}{}
-  Implementiere die Zielfunktion f\"ur die Optimierung mit der
+  Implement the objective function \code{lsqError()} that applies the
-  linearen Geradengleichung als Funktion \code{lsqError()}.
+  linear equation as a model.
  \begin{itemize}
-  \item Die Funktion \"ubernimmt drei Argumente: Das erste Argument
+  \item The function takes three arguments. The first is a 2-element
-    ist ein 2-elementiger Vektor, der die Parameter \varcode{m} und
+    vector that contains the values of parameters \varcode{m} and
-    \varcode{b} enth\"alt.  Das zweite ist ein Vektor mit den $x$-Werten,
+    \varcode{b}. The second is a vector of $x$-values, the third contains
-    an denen gemessen wurde, und das dritte ein Vektor mit den
+    the measurements for each value of $x$, the respective $y$-values.
-    zugeh\"origen $y$-Werten.
+  \item The function returns the mean square error \eqnref{mseline}.
-  \item Die Funktion gibt als Ergebniss den Fehler als mittleren
+  \item The function should call the function \code{meanSquareError()}
-    quadratischen Abstand \eqnref{mseline} zur\"uck.
+    defined in the previous exercise to calculate the error.
  \item Die Funktion soll die Funktion \code{meanSquareError()} der
    vorherigen \"Ubung benutzen.
  \end{itemize}
 \end{exercise}
-\section{Fehlerfl\"ache}
+\section{Error surface}
-
+The two parameters of the model define a surface. For each combination
-Die beiden Parameter $m$ und $b$ der Geradengleichung spannen eine
+of $m$ and $b$ we can use \eqnref{mseline} to calculate the associated
-F\"ache auf. F\"ur jede Kombination aus $m$ und $b$ k\"onnen wir den
+error. We thus consider the objective function $f_{cost}(\{(x_i,
-Wert der Zielfunktion, hier der mittlere quadratische Abstand
+y_i)\}|m,b)$ as a function $f_{cost}(m,b)$, that maps the variables
-\eqnref{meansquarederror}, berechnen.  Wir betrachten also die
+$m$ and $b$ to an error value.
 Kostenfunktion $f_{cost}(\{(x_i, y_i)\}|m,b)$ nun als Funktion
 $f_{cost}(m,b)$, die die beiden Variablen $m$ und $b$ auf einen
 Fehlerwert abbildet.
-Es gibt also f\"ur jeden Punkt in der sogenannten
+Thus, for each point of the surface we get an error that we can
-\determ{Fehlerfl\"ache} einen Fehlerwert. In diesem Beispiel eines
+illustrate graphically using a 3-d surface-plot, i.e. the error
-2-dimensionalen Problems (zwei freie Parameter) kann die
+surface. $m$ and $b$ are plotted on the $x$- and $y$-axes while the
-Fehlerfl\"ache graphisch durch einen 3-d \enterm{surface-plot}
+third dimension is used to indicate the error value
 dargestellt werden. Dabei werden auf der $x$- und der $y$-Achse die
 beiden Parameter und auf der $z$-Achse der Fehlerwert aufgetragen
 (\figref{errorsurfacefig}).
 \begin{figure}[t]
  \includegraphics[width=0.75\columnwidth]{error_surface.pdf}
-  \titlecaption{Fehlerfl\"ache.}{Die beiden freien Parameter
+  \titlecaption{Error surface.}{The two model parameters $m$ and $b$
-    unseres Modells $m$ und $b$ spannen die Grundfl\"ache des Plots
+    define the base area of the surface plot. For each parameter
-    auf. F\"ur jede Kombination von Steigung $m$ und
+    combination of slope and intercept the error is calculated. The
-    $y$-Achsenabschnitt $b$ wird die errechnete Vorhersage des Modells
+    resulting surface has a minimum which indicates the parameter
-    mit den Messwerten verglichen und der Fehlerwert geplottet. Die
+    combination that best fits the data.}\label{errorsurfacefig}
    sich ergebende Fehlerfl\"ache hat ein Minimum (roter Punkt) bei
    den Werten von $m$ und $b$, f\"ur die die Gerade die Daten am
    besten beschreibt.}\label{errorsurfacefig}
 \end{figure}
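+
+Why the error surface of the linear model has a single minimum can be
+sketched by expanding the square in \eqnref{mseline}. The abbreviation
+$\langle \cdot \rangle$ for the average over the $N$ data pairs is
+notation assumed only for this step:
+\begin{eqnarray*}
+  f_{cost}(m,b) & = & \frac{1}{N} \sum_{i=1}^N \left(y_i^2 + m^2 x_i^2 + b^2 - 2 m x_i y_i - 2 b y_i + 2 m b x_i\right) \\
+  & = & m^2 \langle x^2 \rangle + 2 m b \langle x \rangle + b^2 - 2 m \langle x y \rangle - 2 b \langle y \rangle + \langle y^2 \rangle
+\end{eqnarray*}
+The error is a quadratic function of $m$ and $b$. As long as the $x_i$
+are not all identical, the resulting surface is an upward-opening
+paraboloid with exactly one minimum.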
 \begin{exercise}{errorSurface.m}{}\label{errorsurfaceexercise}%
-  Lade den Datensatz \textit{lin\_regression.mat} in den Workspace (20
+  Load the dataset \textit{lin\_regression.mat} into the workspace (20
-  Datenpaare in den Vektoren \varcode{x} und \varcode{y}). Schreibe ein Skript
+  data pairs contained in the vectors \varcode{x} and
-  \file{errorSurface.m}, dass den Fehler, berechnet als mittleren
+  \varcode{y}). Implement a script \file{errorSurface.m}, that
-  quadratischen Abstand zwischen den Daten und einer Geraden mit
+  calculates the mean square error between data and a linear model and
-  Steigung $m$ und $y$-Achsenabschnitt $b$, in Abh\"angigkeit von $m$
+  illustrates the error surface using the \code{surf()} function
-  und $b$ als surface plot darstellt (siehe Hilfe f\"ur die
+  (consult the help to find out how to use \code{surf()}).
  \code{surf()} Funktion).
 \end{exercise}
 An der Fehlerfl\"ache kann direkt erkannt werden, bei welcher