diff --git a/header.tex b/header.tex
index a9866c7..8a8f55f 100644
--- a/header.tex
+++ b/header.tex
@@ -226,7 +226,7 @@
 % \determ[index entry]{}
 % typeset the term in quotes and add it (or the optional argument) to
 % the german index.
-\newcommand{\determ}[2][]{``#2''\ifthenelse{\equal{#1}{}}{\protect\sindex[determ]{#2}}{\protect\sindex[determ]{#1}}}
+\newcommand{\determ}[2][]{\selectlanguage{german}``#2''\selectlanguage{english}\ifthenelse{\equal{#1}{}}{\protect\sindex[determ]{#2}}{\protect\sindex[determ]{#1}}}
 
 \newcommand{\file}[1]{\texttt{#1}}
 
diff --git a/regression/lecture/regression.tex b/regression/lecture/regression.tex
index 7e020ae..1111796 100644
--- a/regression/lecture/regression.tex
+++ b/regression/lecture/regression.tex
@@ -1,16 +1,22 @@
 \chapter{Optimization and gradient descent}
 \exercisechapter{Optimization and gradient descent}
 
-To understand the behaviour of a given system sciences often probe the
-system with input signals and then try to explain the responses
-through a model. Typically the model has a few parameter that specify
-how input and output signals are related. The question arises which
-combination of paramters are best suited to describe the relation of
-in- and output. The process of finding the best paramter set is called
-optimization or also \enterm{curve fitting}. One rather generic
-approach to the problem is the so called gradient descent method which
-will be introduced in this chapter.
-
+Optimization problems arise in many different contexts. For example,
+to understand the behavior of a given system, the system is probed
+with a range of input signals and the resulting responses are
+measured. This input-output relation can be described by a model.
+Such a model can be a simple function that maps the input signals to
+corresponding responses, a filter, or a system of differential
+equations. In any case, the model has some parameters that specify
+how input and output signals are related. Which combination of
+parameter values is best suited to describe the input-output
+relation? The process of finding the best parameter values is an
+optimization problem. For a simple parameterized function that maps
+input to output values, this is a special case of a \enterm{curve
+  fitting} problem, where the average distance between the curve and
+the response values is minimized. One basic numerical method for
+such optimization problems is the so-called gradient descent, which
+is introduced in this chapter.
 
 \begin{figure}[t]
   \includegraphics[width=1\textwidth]{lin_regress}\hfill
@@ -22,20 +28,21 @@ will be introduced in this chapter.
   y-axis (right panel).}\label{linregressiondatafig}
 \end{figure}
 
-The data plotted in \figref{linregressiondatafig} suggests a linear
-relation between input and output of the invesitagted system. We thus
-assume that the linear equation
-\[y = f(x; m, b) = m\cdot x + b \] is an appropriate model to describe the system.
-The linear equation has two free paramteter $m$ and $b$ which denote
-the slope and the y-intercept, respectively. In this chapter we will
-use this example to illustrate the methods behind several curve
-fitting approaches. We will apply this method to find the combination
-of slope and intercept that best describes the system.
+The data plotted in \figref{linregressiondatafig} suggest a linear
+relation between input and output of the system. We thus assume that a
+straight line
+\[y = f(x; m, b) = m\cdot x + b \]
+is an appropriate model to describe the system. The line has two free
+parameters, the slope $m$ and the $y$-intercept $b$. We need to find
+values for the slope and the intercept that best describe the measured
+data. In this chapter we use this example to illustrate gradient
+descent and how this method can be used to find the combination of
+slope and intercept that best describes the system.
 
 
 \section{The error function --- mean squared error}
 
-Before the optimization can be done we need to specify what is
+Before the optimization can be done we need to specify what exactly is
 considered an optimal fit. In our example we search the parameter
 combination that describe the relation of $x$ and $y$ best. What is
 meant by this? Each input $x_i$ leads to an output $y_i$ and for each
@@ -55,22 +62,23 @@ would cancel out and then sum up to values close to zero. A better
 approach is to consider the absolute value of the distance
 $\sum_{i=1}^N |y_i - y^{est}_i|$. The total error can only be small if
 all deviations are indeed small no matter if they are above or below
-the prediced line. Instead of the sum we could also ask for the
+the predicted line. Instead of the sum we could also ask for the
 \emph{average}
 \begin{equation}
   \label{meanabserror}
   f_{dist}(\{(x_i, y_i)\}|\{y^{est}_i\}) = \frac{1}{N} \sum_{i=1}^N |y_i - y^{est}_i|
 \end{equation}
-should be small. Commonly, the \enterm{mean squared distance} oder
-\enterm[square error!mean]{mean square error} (\determ[quadratischer Fehler!mittlerer]{mittlerer quadratischer Fehler})
+to be small. Commonly, the \enterm{mean squared distance} or
+\enterm[square error!mean]{mean square error} (\determ[quadratischer
+Fehler!mittlerer]{mittlerer quadratischer Fehler})
 \begin{equation}
   \label{meansquarederror}
   f_{mse}(\{(x_i, y_i)\}|\{y^{est}_i\}) = \frac{1}{N} \sum_{i=1}^N (y_i - y^{est}_i)^2
 \end{equation}
 is used (\figref{leastsquareerrorfig}). Similar to the absolute
-distance, the square of the error($(y_i - y_i^{est})^2$) is always
-positive error values do not cancel out. The square further punishes
-large deviations.
+distance, the square of the error, $(y_i - y_i^{est})^2$, is always
+positive and thus error values do not cancel out. In addition, the
+square punishes large deviations more than small ones.
 
 \begin{exercise}{meanSquareError.m}{}\label{mseexercise}%
   Implement a function \varcode{meanSquareError()}, that calculates the
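
A minimal MATLAB sketch of the meanSquareError() function requested in
the exercise (its full text is truncated above, so the signature,
taking the measured values y and the model predictions y_est, is an
assumption and not the reference solution):

    % meanSquareError.m: mean squared error between measured and
    % predicted values, following the definition of f_mse above.
    function mse = meanSquareError(y, y_est)
        mse = mean((y - y_est).^2);
    end

A usage sketch with the straight-line model of the chapter, on
simulated data with hypothetical parameter values:

    m = 0.75; b = -40.0;                   % assumed slope and intercept
    x = 120.0 * rand(100, 1);              % simulated input values
    y = m * x + b + 15.0 * randn(100, 1);  % simulated noisy responses
    y_est = m * x + b;                     % predictions of the line
    mse = meanSquareError(y, y_est)        % small for a good fit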