[regression] started to improve the chapter

Jan Benda 2019-12-10 17:57:06 +01:00
parent 5484a7136e
commit 357477e17f
2 changed files with 35 additions and 27 deletions

View File

@@ -226,7 +226,7 @@
 % \determ[index entry]{<german term>}
 % typeset the term in quotes and add it (or the optional argument) to
 % the german index.
-\newcommand{\determ}[2][]{``#2''\ifthenelse{\equal{#1}{}}{\protect\sindex[determ]{#2}}{\protect\sindex[determ]{#1}}}
+\newcommand{\determ}[2][]{\selectlanguage{german}``#2''\selectlanguage{english}\ifthenelse{\equal{#1}{}}{\protect\sindex[determ]{#2}}{\protect\sindex[determ]{#1}}}
 \newcommand{\file}[1]{\texttt{#1}}
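
For reference, a minimal usage sketch of the revised macro (the sentence fragment is hypothetical; the index entry is one that occurs later in this commit). The added \selectlanguage pair switches to German hyphenation for the quoted term and back to English afterwards:

    % hypothetical usage with the optional index-entry argument:
    ... minimizing the \determ[quadratischer Fehler!mittlerer]{mittlerer
    quadratischer Fehler} of a fit ...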

View File

@@ -1,16 +1,22 @@
 \chapter{Optimization and gradient descent}
 \exercisechapter{Optimization and gradient descent}
 
-To understand the behaviour of a given system sciences often probe the
-system with input signals and then try to explain the responses
-through a model. Typically the model has a few parameter that specify
-how input and output signals are related. The question arises which
-combination of paramters are best suited to describe the relation of
-in- and output. The process of finding the best paramter set is called
-optimization or also \enterm{curve fitting}. One rather generic
-approach to the problem is the so called gradient descent method which
-will be introduced in this chapter.
+Optimization problems arise in many different contexts. For example,
+to understand the behavior of a given system, the system is probed
+with a range of input signals and then the resulting responses are
+measured. This input-output relation can be described by a model. Such
+a model can be a simple function that maps the input signals to
+corresponding responses; it can be a filter, or a system of
+differential equations. In any case, the model has some parameters
+that specify how input and output signals are related. Which
+combination of parameter values is best suited to describe the
+input-output relation? The process of finding the best parameter
+values is an optimization problem. For a simple parameterized function
+that maps input to output values, this is a special case of a
+\enterm{curve fitting} problem, where the average distance between the
+curve and the response values is minimized. One basic numerical method
+for such optimization problems is the so-called gradient descent,
+which is introduced in this chapter.
 
 \begin{figure}[t]
 \includegraphics[width=1\textwidth]{lin_regress}\hfill
@@ -22,20 +28,21 @@ will be introduced in this chapter.
 y-axis (right panel).}\label{linregressiondatafig}
 \end{figure}
 
-The data plotted in \figref{linregressiondatafig} suggests a linear
-relation between input and output of the invesitagted system. We thus
-assume that the linear equation
-\[y = f(x; m, b) = m\cdot x + b \] is an appropriate model to describe the system.
-The linear equation has two free paramteter $m$ and $b$ which denote
-the slope and the y-intercept, respectively. In this chapter we will
-use this example to illustrate the methods behind several curve
-fitting approaches. We will apply this method to find the combination
-of slope and intercept that best describes the system.
+The data plotted in \figref{linregressiondatafig} suggest a linear
+relation between input and output of the system. We thus assume that a
+straight line
+\[y = f(x; m, b) = m\cdot x + b \]
+is an appropriate model to describe the system. The line has two free
+parameters, the slope $m$ and the $y$-intercept $b$. We need to find
+values for the slope and the intercept that best describe the measured
+data. In this chapter we use this example to illustrate the gradient
+descent method and how it can be used to find the combination of
+slope and intercept that best describes the system.
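
To make the model concrete, here is a minimal MATLAB sketch of how the straight line maps inputs to predicted outputs (parameter values and input range are made up for illustration; they are not taken from the chapter's data):

    m = 0.75;            % hypothetical slope
    b = -40.0;           % hypothetical intercept
    x = 0:1:120;         % hypothetical input values
    yest = m * x + b;    % predicted output y = f(x; m, b) for each input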
 
 \section{The error function --- mean squared error}
 
-Before the optimization can be done we need to specify what is
+Before the optimization can be done we need to specify what exactly is
 considered an optimal fit. In our example we search the parameter
 combination that describes the relation of $x$ and $y$ best. What is
 meant by this? Each input $x_i$ leads to an output $y_i$ and for each
@@ -55,22 +62,23 @@ would cancel out and then sum up to values close to zero. A better
 approach is to consider the absolute value of the distance
 $\sum_{i=1}^N |y_i - y^{est}_i|$. The total error can only be small if
 all deviations are indeed small no matter if they are above or below
-the prediced line. Instead of the sum we could also ask for the
+the predicted line. Instead of the sum we could also ask for the
 \emph{average}
 \begin{equation}
 \label{meanabserror}
 f_{dist}(\{(x_i, y_i)\}|\{y^{est}_i\}) = \frac{1}{N} \sum_{i=1}^N |y_i - y^{est}_i|
 \end{equation}
-should be small. Commonly, the \enterm{mean squared distance} oder
-\enterm[square error!mean]{mean square error} (\determ[quadratischer Fehler!mittlerer]{mittlerer quadratischer Fehler})
+should be small. Commonly, the \enterm{mean squared distance} or
+\enterm[square error!mean]{mean square error} (\determ[quadratischer
+Fehler!mittlerer]{mittlerer quadratischer Fehler})
 \begin{equation}
 \label{meansquarederror}
 f_{mse}(\{(x_i, y_i)\}|\{y^{est}_i\}) = \frac{1}{N} \sum_{i=1}^N (y_i - y^{est}_i)^2
 \end{equation}
 is used (\figref{leastsquareerrorfig}). Similar to the absolute
-distance, the square of the error($(y_i - y_i^{est})^2$) is always
-positive error values do not cancel out. The square further punishes
-large deviations.
+distance, the square of the error, $(y_i - y^{est}_i)^2$, is always
+positive and thus error values do not cancel out. The square further
+punishes large deviations more strongly than small ones.
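
Both distance measures translate directly into MATLAB. A minimal sketch, assuming column vectors y of measured outputs and yest of the corresponding model predictions (the variable names are our own, not prescribed by the exercise):

    fdist = mean(abs(y - yest));   % mean absolute distance
    fmse  = mean((y - yest).^2);   % mean squared error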
 \begin{exercise}{meanSquareError.m}{}\label{mseexercise}%
 Implement a function \varcode{meanSquareError()} that calculates the