%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\chapter{\tr{Bootstrap methods}{Bootstrap Methoden}}
\label{bootstrapchapter}
\selectlanguage{english}

Bootstrapping methods are applied to create distributions of
statistical measures via resampling of a sample. Bootstrapping offers
several advantages:
\begin{itemize}
\item Fewer assumptions (e.g. a measured sample does not need to be
  normally distributed).
\item Often increased precision as compared to classical methods, in
  particular when the assumptions of the classical methods are not met.
  %such as?
\item General applicability: the bootstrapping methods are very
  similar for different statistics and there is no need to specialize
  the method depending on the investigated statistical measure.
\end{itemize}

\begin{figure}[tp]
  \includegraphics[width=0.8\textwidth]{2012-10-29_16-26-05_771}\\[2ex]
  \includegraphics[width=0.8\textwidth]{2012-10-29_16-41-39_523}\\[2ex]
  \includegraphics[width=0.8\textwidth]{2012-10-29_16-29-35_312}
  \titlecaption{\label{statisticalpopulationfig} Why can't we measure
    the statistical population but only draw samples?}{}
\end{figure}

Reminder: in statistics we are interested in properties of the
``statistical population'' (in German: \determ{Grundgesamtheit}),
e.g. the average length of all pickles
(\figref{statisticalpopulationfig}). But we cannot measure the lengths
of all pickles in the statistical population. Rather, we draw samples
(simple random sample, \enterm[SRS|see{simple random sample}]{SRS}, in
German: \determ{Stichprobe}). We then estimate a statistical measure
of interest (e.g. the average length of the pickles) within this
sample and hope that it is a good approximation of the unknown and
immeasurable true average length of the statistical population (the
population parameter, in German: \determ{Populationsparameter}). We
apply statistical methods to find out how precise this approximation
is.

If we could draw a large number of \textit{simple random samples} we
could calculate the statistical measure of interest for each sample
and estimate its probability distribution using a histogram. This
distribution is called the \enterm{sampling distribution} (German:
\determ{Stichprobenverteilung},
\subfigref{bootstrapsamplingdistributionfig}{a}).

\begin{figure}[tp]
  \includegraphics[height=0.2\textheight]{srs1}\\[2ex]
  \includegraphics[height=0.2\textheight]{srs2}\\[2ex]
  \includegraphics[height=0.2\textheight]{srs3}
  \titlecaption{\label{bootstrapsamplingdistributionfig}Bootstrapping
    the sampling distribution.}{(a) Simple random samples (SRS) are
    drawn from a statistical population with an unknown population
    parameter (e.g. the average $\mu$). The statistical measure
    (e.g. the sample mean $\bar x$ as an estimate of $\mu$) is
    calculated for each sample. The resulting values originate from
    the sampling distribution. Often only a single random sample is
    drawn! (b) By applying assumptions and theory one can infer the
    sampling distribution without actually measuring it. (c)
    Alternatively, one can generate many bootstrap samples from the
    same SRS (resampling) and use these to estimate the sampling
    distribution empirically. From Hesterberg et al. 2003, Bootstrap
    Methods and Permutation Tests.}
\end{figure}

Commonly, there will be only a single SRS. In such cases we make use
of certain assumptions (e.g. we assume a normal distribution) that
allow us to infer the precision of our estimate based on the SRS. For
example, the formula $\sigma/\sqrt{n}$ gives the standard error of the
mean, that is, the standard deviation of the mean values that we would
obtain from many SRS. These mean values are distributed around the
mean of the statistical population
(\subfigref{bootstrapsamplingdistributionfig}{b}).
%explicitly state that this is based on the assumption of a normal distribution?

Alternatively, we can use ``bootstrapping'' to generate new samples
from the one set of measurements (resampling).
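In symbols, resampling can be sketched as follows. The notation
($x_i$, $x_i^*$, $J_i$, $B$) is introduced here for illustration only
and is an assumption, not taken from the cited source. Given the
measured sample $x_1, \ldots, x_n$, a bootstrapped sample
$x_1^*, \ldots, x_n^*$ is obtained by drawing the indices $J_i$
independently and with equal probability,
\[
  x_i^* = x_{J_i}, \qquad P(J_i = k) = \frac{1}{n}, \qquad k = 1, \ldots, n,
\]
and this procedure is repeated $B$ times. Because the indices are
drawn with replacement, a given measurement is missing from a
particular bootstrapped sample with probability
\[
  \left(1 - \frac{1}{n}\right)^n \approx e^{-1} \approx 0.37
\]
for large $n$, while other measurements appear repeatedly.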
From these bootstrapped samples we calculate the desired statistical
measure and estimate its distribution (\enterm{bootstrap
  distribution}, \subfigref{bootstrapsamplingdistributionfig}{c}).
Interestingly, the width of this distribution is very similar to the
width of the sampling distribution. The main difference is that the
bootstrapped values are distributed around the measure of the original
sample and not around the one of the statistical population. We can
use the bootstrap distribution to draw conclusions regarding the
precision of our estimate (e.g. standard errors and confidence
intervals).

Bootstrapping methods create bootstrapped samples from an SRS by
resampling. The bootstrapped samples are used to estimate the sampling
distribution of a statistical measure. The bootstrapped samples have
the same size as the original sample and are created by randomly
drawing with replacement, that is, each value of the original sample
can occur once, multiple times, or not at all in a bootstrapped
sample.

\section{Bootstrap of the standard error}

Bootstrapping can be nicely illustrated by the example of the standard
error of the mean. The arithmetic mean is calculated for a simple
random sample. The standard error of the mean is the standard
deviation of the expected distribution of mean values around the mean
of the statistical population.

\begin{figure}[tp]
  \includegraphics[width=1\textwidth]{bootstrapsem}
  \titlecaption{\label{bootstrapsemfig}Bootstrapping the standard
    error of the mean.}{The sampling distribution of the mean, which
    is usually unknown, is distributed around the true mean of the
    statistical population ($\mu=0$, red). The bootstrap distribution
    of the means calculated for many bootstrapped samples has
    approximately the same shape as the sampling distribution but is
    centered around the mean of the SRS used for resampling.
    The standard deviation of the bootstrap distribution (blue) is
    thus an estimator for the standard error of the mean.}
\end{figure}

Via bootstrapping we create a distribution of the mean values
(\figref{bootstrapsemfig}). The standard deviation of this
distribution is an estimate of the standard error of the mean.

\pagebreak[4]
\begin{exercise}{bootstrapsem.m}{bootstrapsem.out}
  Create the distribution of mean values from bootstrapped samples
  resampled from a single SRS. Use this distribution to estimate the
  standard error of the mean.
  \begin{enumerate}
  \item Draw 1000 normally distributed random numbers and calculate
    the mean, the standard deviation, and the standard error
    ($\sigma/\sqrt{n}$).
  \item Resample the data 1000 times (draw randomly with replacement)
    and calculate the mean of each bootstrapped sample.
  \item Plot a histogram of the resulting distribution of means and
    calculate its mean and standard deviation. Compare them with the
    mean and the standard error obtained in the first step.
  \end{enumerate}
\end{exercise}

\section{Permutation tests}

Statistical tests ask how probable it is to obtain a value at least as
extreme as the measured one if the null hypothesis is true. If this
probability is smaller than the desired significance level, the null
hypothesis may be rejected.

Traditionally, such probabilities are taken from theoretical
distributions which are based on assumptions about the data. Thus the
applied statistical test has to be appropriate for the type of
data. An alternative approach is to estimate the distribution of the
measure under the null hypothesis directly from the data itself. To do
this, we need to resample the data from the SRS according to the null
hypothesis. By such permutation operations we destroy the feature of
interest while we conserve all other features of the data.

\begin{figure}[tp]
  \includegraphics[width=1\textwidth]{permutecorrelation}
  \titlecaption{\label{permutecorrelationfig}Permutation test for
    correlations.}{Let the correlation coefficient of a dataset with
    200 samples be $\rho=0.21$.
    The distribution of the null hypothesis, obtained from the
    correlation coefficients of permuted and thus uncorrelated
    datasets, is centered around zero (yellow). The measured
    correlation coefficient is larger than the 95th percentile of the
    null hypothesis distribution. The null hypothesis may thus be
    rejected and the measured correlation is statistically
    significant.}
\end{figure}

A good example for the application of a permutation test is the
statistical assessment of correlations. Given are measured pairs of
data points $(x_i, y_i)$. By calculating the correlation coefficient
we can quantify how strongly $y$ depends on $x$. The correlation
coefficient alone, however, does not tell whether it is statistically
significantly different from a random correlation. The null hypothesis
for such a situation would be that $y$ does not depend on $x$. In
order to perform a permutation test, we now destroy the correlation by
permuting the $(x_i, y_i)$ pairs, i.e. we randomly reassign the $y_i$
values to the $x_i$ values. By creating many sets of random pairs and
calculating the resulting correlation coefficients, we obtain a
distribution of correlation coefficients that result from randomness
alone. From this distribution we can directly read off the statistical
significance (\figref{permutecorrelationfig}).

\begin{exercise}{correlationsignificance.m}{correlationsignificance.out}
  Estimate the statistical significance of a correlation coefficient.
  \begin{enumerate}
  \item Create pairs of $(x_i, y_i)$ values. Randomly choose
    $x$-values and calculate the respective $y$-values according to
    $y_i = 0.2 \cdot x_i + u_i$ where $u_i$ is a random number drawn
    from a normal distribution.
  \item Calculate the correlation coefficient.
  \item Generate the distribution according to the null hypothesis by
    generating uncorrelated pairs. To this end, permute the $x$- and
    $y$-values with \matlabfun{randperm()} 1000 times and calculate
    the correlation coefficient for each permutation.
  \item Determine the 95th percentile of the resulting null hypothesis
    distribution and compare it with the correlation coefficient
    calculated for the original data.
  \end{enumerate}
\end{exercise}

\selectlanguage{english}
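
The permutation test for correlations described in this section can be
summarized in a few formulas. This is only a sketch, and the symbols
$\rho_b^*$, $\pi_b$, $B$, and $\hat p$ are notation assumed here for
illustration. The correlation coefficient of the measured pairs is
\[
  \rho = \frac{\sum_{i=1}^n (x_i - \bar x)(y_i - \bar y)}
              {\sqrt{\sum_{i=1}^n (x_i - \bar x)^2}\,
               \sqrt{\sum_{i=1}^n (y_i - \bar y)^2}} .
\]
For each of the repetitions $b = 1, \ldots, B$ we draw a random
permutation $\pi_b$ of the indices $1, \ldots, n$ and compute the
correlation coefficient $\rho_b^*$ in the same way from the shuffled
pairs $(x_i, y_{\pi_b(i)})$. The fraction of permutations that yield a
correlation at least as large as the measured one,
\[
  \hat p = \frac{1}{B}\,\#\{\, b : \rho_b^* \ge \rho \,\},
\]
estimates the one-sided significance. Comparing $\rho$ with the 95th
percentile of the $\rho_b^*$ corresponds to asking whether $\hat p$ is
below $0.05$.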