diff --git a/likelihood/lecture/likelihood.tex b/likelihood/lecture/likelihood.tex
index a505450..1994a8b 100644
--- a/likelihood/lecture/likelihood.tex
+++ b/likelihood/lecture/likelihood.tex
@@ -6,8 +6,9 @@
 A common problem in statistics is to estimate from a probability
 distribution one or more parameters $\theta$ that best describe the
-data $x_1, x_2, \ldots x_n$. \enterm{Maximum likelihood estimators}
-(\enterm[mle|see{Maximum likelihood estimators}]{mle},
+data $x_1, x_2, \ldots x_n$. \enterm[maximum likelihood
+estimator]{Maximum likelihood estimators} (\enterm[mle|see{maximum
+  likelihood estimator}]{mle},
 \determ{Maximum-Likelihood-Sch\"atzer}) choose the parameters such
 that they maximize the likelihood of the data $x_1, x_2, \ldots x_n$
 to originate from the distribution.
@@ -228,7 +229,7 @@ maximized respectively.
 \begin{figure}[t]
   \includegraphics[width=1\textwidth]{mlepropline}
   \titlecaption{\label{mleproplinefig} Maximum likelihood estimation
-    of the slope of line through the origin.}{The data (blue and
+    of the slope of a line through the origin.}{The data (blue and
     left histogram) originate from a straight line $y=mx$ through the
     origin (red). The maximum-likelihood estimation of the slope $m$ of
     the regression line (orange), \eqnref{mleslope}, is close to the true
@@ -257,6 +258,7 @@ respect to $\theta$ and equate it to zero:
 This is an analytical expression for the estimation of the slope
 $\theta$ of the regression line (\figref{mleproplinefig}).
 
+\subsection{Linear and non-linear fits}
 A gradient descent, as we have done in the previous chapter, is not
 necessary for fitting the slope of a straight line, because the slope
 can be directly computed via \eqnref{mleslope}. More generally, this
@@ -275,6 +277,7 @@ exponential decay
 Such cases require numerical solutions for the optimization of the
 cost function, e.g. the gradient descent \matlabfun{lsqcurvefit()}.
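As an aside to the hunk above (not part of the patch): the reason no gradient descent is needed for a line through the origin is that \eqnref{mleslope} gives the slope in closed form. Assuming \eqnref{mleslope} is the familiar least-squares result $\theta = \sum_i x_i y_i / \sum_i x_i^2$ for equal $\sigma_i$, a minimal sketch (in Python rather than the course's MATLAB; the function name `mle_slope` is made up for illustration) looks like this:

```python
# Closed-form maximum-likelihood slope of a line y = theta * x through
# the origin, theta = sum(x_i * y_i) / sum(x_i^2), assuming equal
# standard deviations sigma_i for all data points.

def mle_slope(x, y):
    """Best-fitting slope theta of y = theta*x (no offset term)."""
    return sum(xi * yi for xi, yi in zip(x, y)) / sum(xi * xi for xi in x)

# Noise-free data on the line y = 2x recover the slope exactly:
x = [1.0, 2.0, 3.0, 4.0]
y = [2.0, 4.0, 6.0, 8.0]
print(mle_slope(x, y))  # -> 2.0
```

For a non-linear model such as an exponential decay no such closed form exists, which is why the text points to numerical optimizers like \matlabfun{lsqcurvefit()}.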
+\subsection{Relation between slope and correlation coefficient}
 Let us have a closer look at \eqnref{mleslope}. If the standard
 deviation of the data $\sigma_i$ is the same for each data point,
 i.e. $\sigma_i = \sigma_j \; \forall \; i, j$, the standard deviation
 drops
diff --git a/plotting/lecture/plotting.tex b/plotting/lecture/plotting.tex
index a9d665b..e3d3280 100644
--- a/plotting/lecture/plotting.tex
+++ b/plotting/lecture/plotting.tex
@@ -93,7 +93,7 @@ number of datasets.
 \subsection{Simple plotting}
 Creating a simple line-plot is rather easy. Assuming there exists a
-variable \varcode{y} in the \codeterm{Workspace} that contains the
+variable \varcode{y} in the \codeterm{workspace} that contains the
 measurement data, it is enough to call \code[plot()]{plot(y)}. At the
 first call of this function a new \codeterm{figure} will be opened
 and the data will be plotted as a line plot. If you repeatedly call
diff --git a/pointprocesses/lecture/pointprocesses.tex b/pointprocesses/lecture/pointprocesses.tex
index cbeaf80..a8fae40 100644
--- a/pointprocesses/lecture/pointprocesses.tex
+++ b/pointprocesses/lecture/pointprocesses.tex
@@ -3,7 +3,7 @@
 \chapter{Spiketrain analysis}
 \exercisechapter{Spiketrain analysis}
 
-\enterm[Action potentials]{action potentials} (\enterm{spikes}) are
+\enterm[action potential]{Action potentials} (\enterm{spikes}) are
 the carriers of information in the nervous system. It is the time
 at which the spikes are generated that is of importance for
 information transmission. The waveform of the action potential is
@@ -110,9 +110,9 @@ describing the statistics of stochastic real-valued variables:
   \frac{1}{n}\sum\limits_{i=1}^n T_i$.
 \item Standard deviation of the interspike intervals: $\sigma_{ISI} =
   \sqrt{\langle (T - \langle T \rangle)^2 \rangle}$\vspace{1ex}
-\item \enterm{Coefficient of variation}: $CV_{ISI} =
+\item \enterm[coefficient of variation]{Coefficient of variation}: $CV_{ISI} =
   \frac{\sigma_{ISI}}{\mu_{ISI}}$.
-\item \enterm{Diffusion coefficient}: $D_{ISI} =
+\item \enterm[diffusion coefficient]{Diffusion coefficient}: $D_{ISI} =
   \frac{\sigma_{ISI}^2}{2\mu_{ISI}^3}$.
 \end{itemize}
diff --git a/programmingstyle/lecture/programmingstyle.tex b/programmingstyle/lecture/programmingstyle.tex
index 0542e65..bb24709 100644
--- a/programmingstyle/lecture/programmingstyle.tex
+++ b/programmingstyle/lecture/programmingstyle.tex
@@ -356,7 +356,7 @@ should take care when defining nested functions.
 \section{Specifics when using scripts}
 A similar problem as with nested functions arises when using scripts
 (instead of functions). All variables that are defined within a script
-become available in the global \codeterm{Workspace}. There is the risk
+become available in the global \codeterm{workspace}. There is the risk
 of name conflicts, that is, a called sub-script redefines or uses the
 same variable name and may \emph{silently} change its content. The
 user will not be notified about this change and the calling script may
diff --git a/regression/lecture/regression.tex b/regression/lecture/regression.tex
index 71d415f..2770967 100644
--- a/regression/lecture/regression.tex
+++ b/regression/lecture/regression.tex
@@ -76,8 +76,8 @@ large deviations.
 \begin{exercise}{meanSquareError.m}{}\label{mseexercise}%
   Implement a function \code{meanSquareError()}, that calculates the
-  \emph{mean square distance} bewteen a vector of observations ($y$)
-  and respective predictions ($y^{est}$). \pagebreak[4]
+  \emph{mean square distance} between a vector of observations ($y$)
+  and respective predictions ($y^{est}$).
 \end{exercise}
diff --git a/statistics/lecture/statistics.tex b/statistics/lecture/statistics.tex
index 638136d..642aa49 100644
--- a/statistics/lecture/statistics.tex
+++ b/statistics/lecture/statistics.tex
@@ -147,11 +147,11 @@ data are smaller than the 3$^{\rm rd}$ quartile.
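As an illustrative aside to the interspike-interval list in the pointprocesses hunk above (not part of the patch): the four statistics $\mu_{ISI}$, $\sigma_{ISI}$, $CV_{ISI} = \sigma_{ISI}/\mu_{ISI}$, and $D_{ISI} = \sigma_{ISI}^2/(2\mu_{ISI}^3)$ can be computed directly from their definitions. A Python sketch (the course exercises use MATLAB; the function name `isi_statistics` is hypothetical):

```python
from math import sqrt

def isi_statistics(isis):
    """Mean, standard deviation, coefficient of variation, and
    diffusion coefficient of a list of interspike intervals,
    following the definitions in the list above."""
    n = len(isis)
    mu = sum(isis) / n
    sigma = sqrt(sum((t - mu) ** 2 for t in isis) / n)
    cv = sigma / mu
    d = sigma ** 2 / (2.0 * mu ** 3)
    return mu, sigma, cv, d

# A perfectly regular spike train has zero variability:
mu, sigma, cv, d = isi_statistics([0.1, 0.1, 0.1, 0.1])
print(cv, d)  # -> 0.0 0.0
```

Note that $CV_{ISI}$ is dimensionless while $D_{ISI}$ carries units of one over time, so the two measures are not interchangeable.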
 % from a normal distribution.}
 % \end{figure}
 
-\enterm{Box-whisker plots} are commonly used to visualize and compare
-the distribution of unimodal data. A box is drawn around the median
-that extends from the 1$^{\rm st}$ to the 3$^{\rm rd}$ quartile. The
-whiskers mark the minimum and maximum value of the data set
-(\figref{displayunivariatedatafig} (3)).
+\enterm[box-whisker plots]{Box-whisker plots} are commonly used to
+visualize and compare the distribution of unimodal data. A box is
+drawn around the median that extends from the 1$^{\rm st}$ to the
+3$^{\rm rd}$ quartile. The whiskers mark the minimum and maximum value
+of the data set (\figref{displayunivariatedatafig} (3)).
 
 \begin{exercise}{univariatedata.m}{}
   Generate 40 normally distributed random numbers with a mean of 2 and
@@ -175,7 +175,7 @@ The distribution of values in a data set is estimated by histograms
 \subsection{Histograms}
 
-\enterm[Histogram]{Histograms} count the frequency $n_i$ of
+\enterm[histogram]{Histograms} count the frequency $n_i$ of
 $N=\sum_{i=1}^M n_i$ measurements in each of $M$ bins $i$
 (\figref{diehistogramsfig} left). The bins usually tile the data
 range into intervals of the same size. The width of the bins is
@@ -193,8 +193,9 @@ categories $i$ is the \enterm{histogram}, or the \enterm{frequency
     with the expected theoretical distribution of $P=1/6$.}
 \end{figure}
 
-Histograms are often used to estimate the \enterm{probability
-  distribution} of the data values.
+Histograms are often used to estimate the
+\enterm[probability!distribution]{probability distribution} of the
+data values.
 
 \subsection{Probabilities}
 In the frequentist interpretation of probability, the probability of
@@ -253,13 +254,14 @@ probability can also be expressed as $P(x_0
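As a final aside to the statistics.tex hunks above (not part of the patch): estimating a probability distribution from a histogram simply means dividing each bin count $n_i$ by the total number of measurements $N=\sum_{i=1}^M n_i$, so that the $M$ relative frequencies sum to one. A Python sketch with an invented helper name, `normalized_histogram` (the lecture's die example with $P=1/6$ serves as the check):

```python
def normalized_histogram(data, bin_edges):
    """Estimate a probability distribution from data: bin counts n_i
    divided by the total number N of measurements sum to one."""
    counts = [0] * (len(bin_edges) - 1)
    for x in data:
        for i in range(len(counts)):
            if bin_edges[i] <= x < bin_edges[i + 1]:
                counts[i] += 1
                break
    n = len(data)
    return [c / n for c in counts]

# A balanced sample of die rolls, 10 per face, estimates P = 1/6:
rolls = [face for face in range(1, 7) for _ in range(10)]
probs = normalized_histogram(rolls, [0.5 + i for i in range(7)])
print(probs)  # -> all six entries equal 1/6
```

For real (noisy) samples the estimated frequencies scatter around the theoretical $P=1/6$, which is exactly what \figref{diehistogramsfig} illustrates.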