Descriptive statistics characterizes data sets by means of a few measures.
In addition to histograms that estimate the full distribution of the data,
the following measures are used for characterizing univariate data:
\begin{description}
\item[Location, central tendency] (\determ{Lagema{\ss}e}):
  \entermde[mean!arithmetic]{Mittel!arithmetisches}{arithmetic mean}, \entermde{Median}{median}, \enterm{mode}.
\item[Spread, dispersion] (\determ{Streuungsma{\ss}e}): \entermde{Varianz}{variance},
  \entermde{Standardabweichung}{standard deviation}, inter-quartile range,\linebreak \enterm{coefficient of variation} (\determ{Variationskoeffizient}).
\item[Shape]: \enterm{skewness} (\determ{Schiefe}), \enterm{kurtosis} (\determ{W\"olbung}).
\end{description}
For bivariate and multivariate data sets we can also analyse their
\begin{description}
\item[Dependence, association] (\determ{Zusammenhangsma{\ss}e}): \entermde[correlation!coefficient!Pearson's]{Korrelation!Pearson}{Pearson's correlation coefficient},
  \entermde[correlation!coefficient!Spearman's rank]{Rangkorrelationskoeffizient!Spearman'scher}{Spearman's rank correlation coefficient}.
\end{description}

The following is in no way a complete introduction to descriptive
statistics; it rather provides a few concepts and tools for
daily data-analysis problems.


%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\section{Mean, variance, and standard deviation}

The \entermde[mean!arithmetic]{Mittel!arithmetisches}{arithmetic mean}
is a measure of location. For $n$ data values $x_i$ the arithmetic
mean is computed by
\[ \bar x = \langle x \rangle = \frac{1}{n}\sum_{i=1}^n x_i \; . \]
This computation (summing up all elements of a vector and dividing by
the length of the vector) is provided by the function \mcode{mean()}.
The mean has the same unit as the data values.
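As a language-neutral sketch of this computation (in \matlab{} one simply calls \mcode{mean()}), the formula can be spelled out directly; the data values here are hypothetical:

```python
import statistics

x = [2.0, 1.5, 3.0, 2.5, 2.0]  # hypothetical data values

# arithmetic mean: sum of all values divided by their number
mean = sum(x) / len(x)
assert mean == statistics.mean(x)  # stdlib equivalent
print(mean)  # 2.2
```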

The dispersion of the data values around the mean is quantified by
their \entermde{Varianz}{variance}
\[ \sigma^2_x = \langle (x-\langle x \rangle)^2 \rangle = \frac{1}{n}\sum_{i=1}^n (x_i - \bar x)^2 \; . \]
The variance is computed by the function \mcode{var()}.
The unit of the variance is the unit of the data values squared.
Therefore, variances cannot be compared to the mean or the data values
themselves. In particular, variances cannot be used for plotting error
bars along with the mean.

In contrast to the variance, the
\entermde{Standardabweichung}{standard deviation}
\[ \sigma_x = \sqrt{\sigma^2_x} \; , \]
as computed by the function \mcode{std()}, has the same unit as the
data values and can (and should) be used to display the dispersion of
the data together with their mean.
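The two formulas above can be sketched as follows (hypothetical data values; note that Python's \varcode{pvariance}/\varcode{pstdev} implement the $1/n$ "population" normalization used here, whereas \matlab{}'s \mcode{var()} and \mcode{std()} by default normalize by $n-1$):

```python
import math
import statistics

x = [2.0, 1.5, 3.0, 2.5, 2.0]  # hypothetical data values
mean = sum(x) / len(x)

# variance: mean squared deviation from the mean (1/n normalization)
var = sum((xi - mean) ** 2 for xi in x) / len(x)
# standard deviation: square root of the variance, same unit as the data
std = math.sqrt(var)

assert math.isclose(var, statistics.pvariance(x))
assert math.isclose(std, statistics.pstdev(x))
```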

The mean of a data set can be displayed by a bar-plot
\matlabfun{bar()}. Additional errorbars \matlabfun{errorbar()} can be
used to illustrate the standard deviation of the data
(\figref{displayunivariatedatafig} (2)).
The \enterm{mode} (\determ{Modus}) is the most frequent value,
i.e. the position of the maximum of the probability distribution.

The \entermde{Median}{median} separates a list of data values into two
halves such that one half of the data is not greater and the other
half is not smaller than the median (\figref{medianfig}). The
function \mcode{median()} computes the median.
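The sorting-based computation that the \varcode{mymedian()} exercise below asks for can be sketched like this (a minimal Python sketch, not the \matlab{} solution itself):

```python
def my_median(x):
    """Median: the middle value of the sorted data, or the mean of
    the two middle values if the number of data values is even."""
    xs = sorted(x)
    n = len(xs)
    if n % 2 == 1:
        return xs[n // 2]
    return 0.5 * (xs[n // 2 - 1] + xs[n // 2])

print(my_median([3, 1, 2]))     # 2
print(my_median([4, 1, 3, 2]))  # 2.5
```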

\begin{exercise}{mymedian.m}{}
  Write a function \varcode{mymedian()} that computes the median of a vector.
\end{exercise}

\begin{exercise}{checkmymedian.m}{}
  Write a script that tests whether your median function really
  returns a median above which are the same number of data values as
  below the median.
\end{exercise}
The distribution of data can be further characterized by the position
of its \entermde[quartile]{Quartil}{quartiles}. Neighboring quartiles are
separated by 25\,\% of the data (\figref{quartilefig}).
\entermde[percentile]{Perzentil}{Percentiles} allow us to characterize the
distribution of the data in more detail. The 3$^{\rm rd}$ quartile
corresponds to the 75$^{\rm th}$ percentile, because 75\,\% of the
data are smaller than the 3$^{\rm rd}$ quartile.
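Quartiles can be computed with Python's stdlib as a quick illustration (hypothetical data; \varcode{statistics.quantiles} requires Python 3.8 or later):

```python
import statistics

x = [1, 2, 3, 4, 5, 6, 7, 8, 9]  # hypothetical data values

# quartiles cut the sorted data into four parts of 25% each;
# the 2nd quartile is the median, the 3rd the 75th percentile
q1, q2, q3 = statistics.quantiles(x, n=4, method='inclusive')
assert q2 == statistics.median(x)
print(q1, q2, q3)  # 3.0 5.0 7.0
```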

% from a normal distribution.}
% \end{figure}

\entermde[box-whisker plots]{Box-Whisker-Plot}{Box-whisker plots}, or
\entermde{Box-Plot}{box plots}, are commonly used to visualize and
compare the distribution of unimodal data. A box is drawn around the
median that extends from the 1$^{\rm st}$ to the 3$^{\rm rd}$
quartile. The whiskers mark the minimum and maximum value of the data
set (\figref{displayunivariatedatafig} (3)).

\begin{exercise}{univariatedata.m}{}
  Generate 40 normally distributed random numbers with a mean of 2 and
  illustrate their distribution.
\end{exercise}

||||
\section{Distributions}
|
||||
The distribution of values in a data set is estimated by histograms
|
||||
(\figref{displayunivariatedatafig} (4)).
|
||||
The \enterm{distribution} (\determ{Verteilung}) of values in a data
|
||||
set is estimated by histograms (\figref{displayunivariatedatafig}
|
||||
(4)).
|
||||
|
||||
\subsection{Histograms}
|
||||
|
||||
\enterm[histogram]{Histograms} count the frequency $n_i$ of
|
||||
$N=\sum_{i=1}^M n_i$ measurements in each of $M$ bins $i$
|
||||
\entermde[histogram]{Histogramm}{Histograms} count the frequency $n_i$
|
||||
of $N=\sum_{i=1}^M n_i$ measurements in each of $M$ bins $i$
|
||||
(\figref{diehistogramsfig} left). The bins tile the data range
|
||||
usually into intervals of the same size. The width of the bins is
|
||||
called the bin width. The frequencies $n_i$ plotted against the
|
||||
@@ -194,13 +197,14 @@ categories $i$ is the \enterm{histogram}, or the \enterm{frequency
|
||||
\end{figure}
|
||||
|
||||
Histograms are often used to estimate the
\enterm[probability!distribution]{probability distribution}
(\determ[Wahrscheinlichkeits!-verteilung]{Wahrscheinlichkeitsverteilung})
of the data values.

\subsection{Probabilities}
In the frequentist interpretation of probability, the
\enterm{probability} (\determ{Wahrscheinlichkeit}) of an event
(e.g. getting a six when rolling a die) is the relative occurrence of
this event in the limit of a large number of trials.

For a finite number of trials $N$ where the event $i$ occurred $n_i$
times, the probability $P_i$ of this event is estimated by
\[ P_i \approx \frac{n_i}{N} \; . \]
By construction,
the sum of the probabilities of all possible events is one:
\[ \sum_i P_i = 1 \; , \]
i.e. the probability of getting any event is one.
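The frequentist estimate can be illustrated with simulated die rolls (a Python sketch; the seed and number of trials are arbitrary choices):

```python
import random

random.seed(2)
n = 10000
# simulate rolling a fair die n times and count the sixes
rolls = [random.randint(1, 6) for _ in range(n)]
p_six = rolls.count(6) / n  # relative occurrence estimates P(six) = 1/6

print(p_six)
assert abs(p_six - 1 / 6) < 0.02  # close to 1/6 for large n
```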

\subsection{Probability distributions of categorical data}

For \entermde[data!categorical]{Daten!kategorische}{categorical} data
values (e.g. the faces of a die, as integer numbers or as colors) a
bin can be defined for each category $i$. The histogram is normalized
by the total number of measurements to make it independent of the size
of the data set (\figref{diehistogramsfig}). After this normalization
the height of each histogram bar is an estimate of the probability
$P_i$ of the category $i$, i.e. of getting a data value in the $i$-th
bin.
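This normalization can be sketched for a small, hypothetical set of die rolls:

```python
from collections import Counter

# hypothetical die rolls (categorical data: faces 1 to 6)
rolls = [1, 3, 6, 6, 2, 5, 3, 3, 4, 6]
counts = Counter(rolls)  # frequency n_i of each category i
n = len(rolls)

# normalize by the total number of measurements: P_i = n_i / n
probs = {face: counts[face] / n for face in range(1, 7)}

assert abs(sum(probs.values()) - 1.0) < 1e-12  # probabilities sum to one
print(probs[6])  # 0.3: three sixes in ten rolls
```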

\begin{exercise}{rollthedie.m}{}
  Write a function that simulates rolling a die $n$ times.
\end{exercise}

\subsection{Probability density functions}

In cases where we deal with
\entermde[data!continuous]{Daten!kontinuierliche}{continuous data}
(measurements of real-valued quantities, e.g. lengths of snakes,
weights of elephants, times between succeeding spikes) there is no
natural bin width for computing a histogram. In addition, the
probability of measuring a data value that equals exactly a specific
real number like, e.g., 0.123456789 is zero, because there are
uncountably many real numbers.

We can only ask for the probability to get a measurement value in some
range. For example, we can ask for the probability $P(1.2<x<1.3)$ to
obtain a measurement between 1.2 and 1.3. Such a
probability can also be expressed as $P(x_0<x<x_0 + \Delta x)$.

In the limit to very small ranges $\Delta x$ the probability of
getting a measurement between $x_0$ and $x_0+\Delta x$ scales down to
zero with $\Delta x$:
\[ P(x_0<x<x_0+\Delta x) \approx p(x_0) \cdot \Delta x \; . \]
Here the quantity $p(x_0)$ is a so-called
\enterm[probability!density]{probability density}
(\determ[Wahrscheinlichkeits!-dichte]{Wahrscheinlichkeitsdichte}) that
is larger than zero and that describes the distribution of the data
values. The probability density is not a unitless probability with
values between 0 and 1, but a number that takes on any positive real
number and has as a unit the inverse of the unit of the data values
--- hence the name ``density''.

\begin{figure}[t]
  \includegraphics[width=1\textwidth]{pdfprobabilities}
\end{figure}

The integral of
the probability density over the whole real axis must be one:
\begin{equation}
  \label{pdfnorm}
  \int_{-\infty}^{+\infty} p(x) \, dx = 1 \; .
\end{equation}


The function $p(x)$, that assigns to every $x$ a probability density,
is called \enterm[probability!density function]{probability density
function}, \enterm[pdf|see{probability density function}]{pdf}, or
just \enterm[density|see{probability density function}]{density}
(\determ[Wahrscheinlichkeits!-dichtefunktion]{Wahrscheinlichkeitsdichtefunktion},
\determ[Wahrscheinlichkeits!-dichte]{Wahrscheinlichkeitsdichte}). The
well known \entermde{Normalverteilung}{normal distribution} is an
example of a probability density function
\[ p_g(x) = \frac{1}{\sqrt{2\pi\sigma^2}}e^{-\frac{(x-\mu)^2}{2\sigma^2}} \]
--- the \enterm{Gaussian distribution}
(\determ{Gau{\ss}sche Glockenkurve}) with mean $\mu$ and standard
deviation $\sigma$.
The factor in front of the exponential function ensures normalization to
$\int p_g(x) \, dx = 1$, \eqnref{pdfnorm}.
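The normalization can be checked numerically; the following Python sketch approximates the integral of the standard Gaussian by a Riemann sum over a wide range:

```python
import math

def gauss_pdf(x, mu=0.0, sigma=1.0):
    """Gaussian probability density with mean mu and standard deviation sigma."""
    return math.exp(-(x - mu) ** 2 / (2 * sigma ** 2)) \
        / math.sqrt(2 * math.pi * sigma ** 2)

# Riemann sum over [-10, 10] approximates the integral over the real axis,
# because the Gaussian tails beyond ten standard deviations are negligible
dx = 0.01
integral = sum(gauss_pdf(k * dx) * dx for k in range(-1000, 1000))
assert abs(integral - 1.0) < 1e-6  # normalized to one
```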
\begin{exercise}{gaussianpdf.m}{gaussianpdf.out}
  Plot the Gaussian probability density function.
\end{exercise}

For histograms of continuous data, the data range is divided into
bins and it is counted how many data values fall within each bin
(\figref{pdfhistogramfig} left).

To turn such histograms into estimates of probability densities they
need to be normalized such that according to \eqnref{pdfnorm} their
integral equals one. While histograms of categorical data are
normalized such that their sum equals one, here we need to integrate
over the histogram. The integral is the area (not the height) of the
histogram bars. Each bar has the height $n_i$ and the width $\Delta
x$. The total area $A$ of the histogram is thus
\[ A = \sum_{i=1}^M ( n_i \cdot \Delta x ) = \Delta x \sum_{i=1}^M n_i = N \, \Delta x \]
and the
\entermde[histogram!normalized]{Histogramm!normiertes}{normalized
histogram} has the heights
\[ p(x_i) = \frac{n_i}{A} = \frac{n_i}{\Delta x \sum_{i=1}^M n_i} =
\frac{n_i}{N \Delta x} \; .\]
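The normalization $p(x_i) = n_i/(N \Delta x)$ can be sketched on hypothetical data (bin edges and bin width are arbitrary choices for the illustration):

```python
import math

# hypothetical continuous data values
data = [1.1, 1.2, 1.4, 1.8, 2.0, 2.1, 2.3, 2.4, 2.6, 3.3]
x0, dx, m = 1.0, 0.5, 5  # left edge, bin width, number of bins

# count the frequencies n_i in each bin
counts = [0] * m
for x in data:
    counts[int((x - x0) / dx)] += 1

n = sum(counts)
# normalized histogram: p_i = n_i / (n * dx), so that sum(p_i * dx) = 1
density = [c / (n * dx) for c in counts]
assert math.isclose(sum(p * dx for p in density), 1.0)
```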
A histogram needs to be divided by both the sum of the frequencies
$\sum_i n_i = N$ and the bin width $\Delta x$ in order to result in an
estimate of the probability density. The shape of the resulting
histogram, however, depends on the exact position of its bins
(\figref{kerneldensityfig} left).

To avoid this problem, so-called \entermde[kernel
density]{Kerndichte}{kernel densities} can be used for estimating
probability densities from data. Here every data point is replaced by
a kernel (a function with integral one, like for example the Gaussian)
that is moved exactly to the position indicated by the data
value. Then all the kernels of all the data values are summed up, the
sum is divided by the number of data values, and we get an estimate of
the probability density (\figref{kerneldensityfig} right).
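The steps above — one Gaussian kernel per data value, summed and divided by the number of values — can be sketched as follows (hypothetical data values and kernel width):

```python
import math

def kernel_density(xs, data, sigma):
    """Sum a Gaussian kernel of width sigma centered on every data
    value and divide by the number of data values."""
    norm = 1.0 / (math.sqrt(2 * math.pi) * sigma * len(data))
    return [norm * sum(math.exp(-0.5 * ((x - d) / sigma) ** 2) for d in data)
            for x in xs]

data = [1.2, 1.9, 2.1, 2.4, 3.0]            # hypothetical data values
xs = [k * 0.01 for k in range(-500, 1000)]  # evaluation grid
p = kernel_density(xs, data, sigma=0.2)

# the estimate integrates to one, like any probability density
assert abs(sum(pi * 0.01 for pi in p) - 1.0) < 1e-3
```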

As for the histogram, where we need to choose a bin width, we need to
choose the width of the kernels appropriately.

\section{Correlations}

In bivariate or multivariate data sets, where we have pairs or tuples of
data values (e.g. size and weight of elephants), we want to analyze
dependencies between the variables.

The
\entermde[correlation!coefficient]{Korrelation!-skoeffizient}{correlation
coefficient}
\begin{equation}
  \label{correlationcoefficient}
  r_{x,y} = \frac{Cov(x,y)}{\sigma_x \sigma_y} = \frac{\langle
    (x - \langle x \rangle)(y - \langle y \rangle) \rangle}{\sigma_x \sigma_y}
\end{equation}
quantifies linear relationships between two variables
\matlabfun{corr()}. The correlation coefficient is the
\entermde{Kovarianz}{covariance} normalized by the standard deviations
of the single variables. Perfectly correlated variables result in a
correlation coefficient of $+1$, anti-correlated or negatively
correlated data in a correlation coefficient of $-1$, and uncorrelated
data in a correlation coefficient close to zero.
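The definition in \eqnref{correlationcoefficient} can be spelled out directly (a Python sketch of the formula, not \matlab{}'s \mcode{corr()}; the test data are hypothetical):

```python
import math

def corr_coef(x, y):
    """Correlation coefficient: the covariance normalized by the
    standard deviations of the single variables."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / n
    sx = math.sqrt(sum((a - mx) ** 2 for a in x) / n)
    sy = math.sqrt(sum((b - my) ** 2 for b in y) / n)
    return cov / (sx * sy)

x = [1.0, 2.0, 3.0, 4.0, 5.0]
assert math.isclose(corr_coef(x, [2 * xi + 1 for xi in x]), 1.0)  # perfectly correlated
assert math.isclose(corr_coef(x, [-xi for xi in x]), -1.0)        # anti-correlated
```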