[bootstrap] fixed english text
commit 155f6d7e54 (parent 025d4eb640)
@@ -3,8 +3,6 @@
\chapter{\tr{Bootstrap methods}{Bootstrap Methoden}}
\label{bootstrapchapter}

\selectlanguage{english}

Bootstrapping methods create distributions of statistical measures
by resampling a single sample. Bootstrapping offers several
advantages:
@@ -12,9 +10,9 @@ advantages:
\item Fewer assumptions (e.g. a measured sample does not need to be
  normally distributed).
\item Increased precision as compared to classical methods. %such as?
\item General applicability: the bootstrapping methods are very
  similar for different statistics and there is no need to specialize
  the method for specific statistical measures.
\end{itemize}

\begin{figure}[tp]
@@ -22,27 +20,26 @@ advantages:
\includegraphics[width=0.8\textwidth]{2012-10-29_16-41-39_523}\\[2ex]
\includegraphics[width=0.8\textwidth]{2012-10-29_16-29-35_312}
\titlecaption{\label{statisticalpopulationfig} Why can't we measure
properties of the full population but only draw samples?}{}
\end{figure}

Reminder: in statistics we are interested in properties of a
\enterm{statistical population} (\determ{Grundgesamtheit}), e.g. the
average length of all pickles (\figref{statisticalpopulationfig}). But
we cannot measure the lengths of all pickles in the
population. Rather, we draw samples (\enterm{simple random sample},
\enterm[SRS|see{simple random sample}]{SRS}, \determ{Stichprobe}). We
then estimate a statistical measure of interest (e.g. the average
length of the pickles) within this sample and hope that it is a good
approximation of the unknown and immeasurable true average length of
the population (\determ{Populationsparameter}). We apply statistical
methods to find out how precise this approximation is.

If we could draw a large number of \enterm{simple random samples} we
could calculate the statistical measure of interest for each sample
and estimate its probability distribution using a histogram. This
distribution is called the \enterm{sampling distribution}
(\determ{Stichprobenverteilung},
\subfigref{bootstrapsamplingdistributionfig}{a}).
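
A minimal sketch, assuming a standard normal population and freely
chosen sample counts, of how such a sampling distribution of the mean
could be simulated:
\begin{lstlisting}
% simulate the sampling distribution of the mean:
nsamples = 1000;          % number of simple random samples (assumed)
n = 100;                  % size of each sample (assumed)
means = zeros(nsamples, 1);
for i = 1:nsamples
    x = randn(n, 1);      % draw a SRS from a standard normal population
    means(i) = mean(x);   % statistical measure of interest
end
hist(means, 20);          % histogram approximates the sampling distribution
\end{lstlisting}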

\begin{figure}[tp]
@@ -67,16 +64,14 @@ Commonly, there will be only a single SRS. In such cases we make use
of certain assumptions (e.g. we assume a normal distribution) that
allow us to infer the precision of our estimation based on the
SRS. For example, the formula $\sigma/\sqrt{n}$ gives the standard
error of the mean, which is the standard deviation of the sampling
distribution of average values around the true mean of the population
(\subfigref{bootstrapsamplingdistributionfig}{b}).
%explicitly state that this is based on the assumption of a normal distribution?
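
For example, a sample of $n=100$ measurements drawn from a population
with standard deviation $\sigma=2$ has a standard error of the mean of
$2/\sqrt{100}=0.2$.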

Alternatively, we can use ``bootstrapping'' to generate new samples
from one set of measurements (resampling). From these bootstrapped
samples we compute the desired statistical measure and estimate their
distribution (\enterm{bootstrap distribution},
\subfigref{bootstrapsamplingdistributionfig}{c}). Interestingly, this
distribution is very similar to the sampling distribution regarding
its width. The only difference is that the bootstrapped values are
@@ -89,7 +84,7 @@ Bootstrapping methods create bootstrapped samples from a SRS by
resampling. The bootstrapped samples are used to estimate the sampling
distribution of a statistical measure. The bootstrapped samples have
the same size as the original sample and are created by randomly drawing with
replacement. That is, each value of the original sample can occur
once, multiple times, or not at all in a bootstrapped sample.
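
A minimal sketch of this resampling scheme, assuming the original
sample is stored in a vector \texttt{x}, that also estimates the
standard error of the mean from the bootstrap distribution:
\begin{lstlisting}
% bootstrap the standard error of the mean:
n = length(x);             % size of the original sample
nboot = 1000;              % number of bootstrapped samples (assumed)
bmeans = zeros(nboot, 1);
for i = 1:nboot
    inx = randi(n, n, 1);  % n indices drawn with replacement
    xb = x(inx);           % the bootstrapped sample
    bmeans(i) = mean(xb);  % its mean
end
bootsem = std(bmeans);     % standard deviation of the bootstrap
                           % distribution estimates the standard error
\end{lstlisting}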

@@ -107,10 +102,10 @@ of the statistical population.
error of the mean.}{The --- usually unknown --- sampling
distribution of the mean is distributed around the true mean of
the statistical population ($\mu=0$, red). The bootstrap
distribution of the means computed from many bootstrapped samples
has the same shape as the sampling distribution but is centered
around the mean of the SRS used for resampling. The standard
deviation of the bootstrap distribution (blue) is an estimator for
the standard error of the mean.}
\end{figure}

@@ -137,8 +132,8 @@ distribution is the standard error of the mean.

\section{Permutation tests}
Statistical tests ask for the probability that a measured value
originates from a null hypothesis. If this probability is smaller than
the desired significance level, the null hypothesis may be rejected.

Traditionally, such probabilities are taken from theoretical
@@ -148,36 +143,37 @@ data. An alternative approach is to calculate the probability density
of the null hypothesis directly from the data itself. To do this, we
need to resample the data according to the null hypothesis from the
SRS. By such permutation operations we destroy the feature of interest
while we conserve all other statistical properties of the data.

\begin{figure}[tp]
\includegraphics[width=1\textwidth]{permutecorrelation}
\titlecaption{\label{permutecorrelationfig}Permutation test for
correlations.}{Let the correlation coefficient of a dataset with
200 samples be $\rho=0.21$. The distribution of the null
hypothesis (yellow), obtained from the correlation coefficients of
permuted and therefore uncorrelated datasets, is centered around
zero. The measured correlation coefficient is larger than the
95\,\% percentile of the null hypothesis. The null hypothesis may
thus be rejected and the measured correlation is considered
statistically significant.}
\end{figure}

A good example for the application of a permutation test is the
statistical assessment of correlations. Given are measured pairs of
data points $(x_i, y_i)$. By calculating the correlation coefficient
we can quantify how strongly $y$ depends on $x$. The correlation
coefficient alone, however, does not tell whether the correlation is
significantly different from a random correlation. The null hypothesis
for such a situation would be that $y$ does not depend on $x$. In
order to perform a permutation test, we need to destroy the
correlation by permuting the $(x_i, y_i)$ pairs, i.e. we rearrange the
$x_i$ and $y_i$ values in a random fashion. Generating many sets of
random pairs and computing the resulting correlation coefficients
yields a distribution of correlation coefficients that arises purely
from uncorrelated data. By comparing the actually measured
correlation coefficient with this distribution we can directly assess
the significance of the correlation
(figure\,\ref{permutecorrelationfig}).
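
A minimal sketch of such a permutation test, assuming the measured
data pairs are stored in vectors \texttt{x} and \texttt{y}:
\begin{lstlisting}
% permutation test for the significance of a correlation:
c = corrcoef(x, y);           % correlation matrix of the data
rho = c(1, 2);                % measured correlation coefficient
nperm = 1000;                 % number of permutations (assumed)
rhos = zeros(nperm, 1);
for i = 1:nperm
    yp = y(randperm(length(y)));  % permuting y destroys the correlation
    c = corrcoef(x, yp);
    rhos(i) = c(1, 2);        % correlation of an uncorrelated dataset
end
rhos = sort(rhos);
q95 = rhos(ceil(0.95*nperm)); % 95% percentile of the null distribution
issignificant = rho > q95;    % reject the null hypothesis?
\end{lstlisting}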
\begin{exercise}{correlationsignificance.m}{correlationsignificance.out}
Estimate the statistical significance of a correlation coefficient.
@@ -190,10 +186,8 @@ Estimate the statistical significance of a correlation coefficient.
generating uncorrelated pairs. For this, permute the $x$- and $y$-values
with \matlabfun{randperm()} 1000 times and calculate the correlation
coefficient for each permutation.
\item Read out the 95\,\% percentile from the resulting distribution
of the null hypothesis and compare it with the correlation
coefficient computed from the original data.
\end{enumerate}
\end{exercise}
\selectlanguage{english}

@@ -15,7 +15,7 @@
\else
\newcommand{\stitle}{}
\fi
\header{{\bfseries\large Exercise 8\stitle}}{{\bfseries\large Statistics}}{{\bfseries\large December 2nd, 2019}}
\firstpagefooter{Prof. Dr. Jan Benda}{Phone: 29 74573}{Email:
jan.benda@uni-tuebingen.de}
\runningfooter{}{\thepage}{}