%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\chapter{Resampling methods}
\label{bootstrapchapter}
\exercisechapter{Resampling methods}

\entermde{Resampling methoden}{Resampling methods} are applied to
generate distributions of statistical measures via resampling of
existing samples. Resampling offers several advantages:
\begin{itemize}
\item Fewer assumptions (e.g. a measured sample does not need to be
  normally distributed).
\item Increased precision as compared to classical methods that rely
  on assumptions about the data, such as confidence intervals based
  on a normality assumption.
\item General applicability: the resampling methods are very similar
  for different statistics and there is no need to specialize the
  method to specific statistical measures.
\end{itemize}
Resampling methods can be used both for estimating the precision of
estimated statistics (e.g. standard error of the mean, confidence
intervals) and for testing for significance.

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\section{Bootstrapping}

\begin{figure}[tp]
  \includegraphics[width=0.8\textwidth]{2012-10-29_16-26-05_771}\\[2ex]
  \includegraphics[width=0.8\textwidth]{2012-10-29_16-41-39_523}\\[2ex]
  \includegraphics[width=0.8\textwidth]{2012-10-29_16-29-35_312}
  \titlecaption{\label{statisticalpopulationfig} Why can't we measure
    properties of the full population but only draw samples?}{}
\end{figure}

Reminder: in statistics we are interested in properties of a
\enterm{statistical population} (\determ{Grundgesamtheit}), e.g. the
average length of all pickles (\figref{statisticalpopulationfig}).
But we cannot measure the lengths of all pickles in the population.
Rather, we draw samples (\enterm{simple random sample}
\enterm[SRS|see{simple random sample}]{SRS}, \determ{Stichprobe}). We
then estimate a statistical measure of interest (e.g. the average
length of the pickles) within this sample and hope that it is a good
approximation of the unknown and immeasurable true average length of
the population (\entermde{Populationsparameter}{population
  parameter}). We apply statistical methods to find out how precise
this approximation is.

If we could draw a large number of simple random samples we could
calculate the statistical measure of interest for each sample and
estimate its probability distribution using a histogram. This
distribution is called the \enterm{sampling distribution}
(\determ{Stichprobenverteilung},
\subfigref{bootstrapsamplingdistributionfig}{a}).

\begin{figure}[tp]
  \includegraphics[height=0.2\textheight]{srs1}\\[2ex]
  \includegraphics[height=0.2\textheight]{srs2}\\[2ex]
  \includegraphics[height=0.2\textheight]{srs3}
  \titlecaption{\label{bootstrapsamplingdistributionfig}Bootstrapping
    the sampling distribution.}{(a) Simple random samples (SRS) are
    drawn from a statistical population with an unknown population
    parameter (e.g. the average $\mu$). The statistical measure (the
    estimate $\bar x$) is calculated for each sample. The measured
    values originate from the sampling distribution. Often only a
    single random sample is drawn! (b) By applying assumptions and
    theory one can infer the sampling distribution without actually
    measuring it. (c) Alternatively, one can generate many
    bootstrapped samples from the same SRS (resampling) and use these
    to estimate the sampling distribution empirically. From Hesterberg
    et al. 2003, Bootstrap Methods and Permutation Tests.}
\end{figure}
Commonly, there will be only a single SRS. In such cases we make use
of certain assumptions (e.g. we assume a normal distribution) that
allow us to infer the precision of our estimate from the SRS
alone. For example, the formula $\sigma/\sqrt{n}$ gives the standard
error of the mean, which is the standard deviation of the sampling
distribution of average values around the true mean of the population
(\subfigref{bootstrapsamplingdistributionfig}{b}).

Alternatively, we can use \enterm{bootstrapping}
(\determ[Bootstrap!Verfahren]{Bootstrapverfahren}) to generate new
samples from the one set of measurements
(\entermde{Resampling}{resampling}). From these bootstrapped samples
we compute the desired statistical measure and estimate its
distribution (\entermde{Bootstrap!Verteilung}{bootstrap distribution},
\subfigref{bootstrapsamplingdistributionfig}{c}). Interestingly, this
distribution is very similar to the sampling distribution regarding
its width. The only difference is that the bootstrapped values are
distributed around the measure of the original sample and not around
the one of the statistical population. We can use the bootstrap
distribution to draw conclusions about the precision of our estimate
(e.g. standard errors and confidence intervals).

Bootstrapping methods generate bootstrapped samples from an SRS by
resampling. The bootstrapped samples are used to estimate the
sampling distribution of a statistical measure. The bootstrapped
samples have the same size as the original sample and are generated
by randomly drawing with replacement. That is, each value of the
original sample can occur once, multiple times, or not at all in a
bootstrapped sample. This can be implemented by generating random
indices into the data set using the \code{randi()} function.
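For illustration, here is a minimal sketch of drawing a single
bootstrapped sample (assuming the original sample is stored in a
vector \varcode{x}; all variable names are just placeholders):
\begin{lstlisting}
% x: vector holding the original sample (assumed to be given)
n = length(x);           % size of the original sample
inx = randi(n, n, 1);    % n random indices between 1 and n, drawn with replacement
xb = x(inx);             % one bootstrapped sample of the same size as x
\end{lstlisting}
Repeating this many times and computing the statistical measure of
interest for each bootstrapped sample yields the bootstrap
distribution.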
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\subsection{Bootstrap the standard error}

Bootstrapping can be nicely illustrated with the example of the
\enterm{standard error} of the mean (\determ{Standardfehler}). The
arithmetic mean is calculated for a simple random sample. The
standard error of the mean is the standard deviation of the expected
distribution of mean values around the mean of the statistical
population.

\begin{figure}[tp]
  \includegraphics[width=1\textwidth]{bootstrapsem}
  \titlecaption{\label{bootstrapsemfig}Bootstrapping the standard
    error of the mean.}{The --- usually unknown --- sampling
    distribution of the mean is centered around the true mean of the
    statistical population ($\mu=0$, red). The bootstrap distribution
    of the means computed from many bootstrapped samples has the same
    shape as the sampling distribution but is centered around the
    mean of the SRS used for resampling. The standard deviation of
    the bootstrap distribution (blue) is an estimator for the
    standard error of the mean.}
\end{figure}

Via bootstrapping we generate a distribution of mean values
(\figref{bootstrapsemfig}) and the standard deviation of this
distribution is an estimate of the standard error of the sample mean.

\begin{exercise}{bootstrapsem.m}{bootstrapsem.out}
  Create the distribution of mean values from bootstrapped samples
  resampled from a single SRS. Use this distribution to estimate the
  standard error of the mean.
  \begin{enumerate}
  \item Draw 1000 normally distributed random numbers and calculate
    the mean, the standard deviation, and the standard error
    ($\sigma/\sqrt{n}$).
  \item Resample the data 1000 times (randomly draw values with
    replacement) and calculate the mean of each bootstrapped sample.
  \item Plot a histogram of the resulting distribution of mean values
    and calculate its mean and standard deviation. Compare them with
    the values obtained from the original sample and from the
    statistical population.
  \end{enumerate}
\end{exercise}

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\section{Permutation tests}

Statistical tests ask for the probability that a measured value
originates from a null hypothesis. If this probability is smaller
than the desired \entermde{Signifikanz}{significance level}, the
\entermde{Nullhypothese}{null hypothesis} can be
rejected. Traditionally, such probabilities are taken from
theoretical distributions that have been derived under certain
assumptions about the data, for example that the data are normally
distributed. Given some data, one thus has to find an appropriate
test that matches their properties.

An alternative approach is to calculate the probability density of
the null hypothesis directly from the data themselves. To do so, we
resample the data of the SRS in a way that implements the null
hypothesis. Such permutation operations destroy the feature of
interest while conserving all other statistical properties of the
data.

\subsection{Significance of a difference in the mean}

Often we would like to know whether two data sets differ in their
mean. Whether the ears of foxes in southern Europe are larger than
those of foxes in Scandinavia, whether a drug decreases blood
pressure in humans, whether a sensory stimulus increases the firing
rate of a neuron, etc. The \entermde{Nullhypothese}{null hypothesis}
is that they do not differ in their means, i.e. that both data sets
come from the same distribution. But even if the two data sets come
from the same distribution, their sample means may nevertheless
differ by chance. We need to figure out how these differences of the
means are distributed. Only if the measured difference between the
means is significantly larger than the differences obtained by chance
can we reject the null hypothesis and consider the two data sets to
differ significantly in their means.

We can easily estimate the distribution of the null hypothesis by
putting the data of both data sets into one big bag. By merging the
two data sets we assume that all the data values come from the same
distribution. We then randomly separate the data values into two new
data sets. These random data sets contain data from both original
data sets and thus come from the same distribution. From these random
data sets we compute the difference of their sample means. This
procedure is repeated many, say one thousand, times and each time we
get a value for the difference of means. The distribution of these
values is the distribution of the null hypothesis. It is the
distribution of differences of mean values that we get by chance even
though the two data sets come from the same distribution. For a
one-sided test that checks whether the measured difference of means
is significantly larger than zero at a significance level of 5\,\% we
compute the value of the 95\,\% percentile of this null
distribution. If the measured value is larger, we can reject the null
hypothesis and consider the two data sets to differ significantly in
their means.

By using the original data to estimate the null hypothesis, we make
no assumptions about the properties of the data. We do not need to
worry about the data being normally distributed. We do not need to
memorize which test to use in which situation. And we understand
better what we are testing, because we design the test ourselves.

Nowadays, computers are powerful enough to iterate even ten thousand
times over the data to compute the distribution of the null
hypothesis --- with only a few lines of code. This is why the
\entermde{Permutationstest}{permutation test} is getting quite
popular.
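A minimal sketch of such a permutation test, assuming the two data
sets are stored in the vectors \varcode{x} and \varcode{y} (all
variable names are just placeholders):
\begin{lstlisting}
% x, y: the two measured data sets (assumed to be given as vectors)
md = mean(x) - mean(y);               % measured difference of the sample means
xy = [x(:); y(:)];                    % lump both data sets into one column vector
mdiffs = zeros(1000, 1);              % differences of means under the null hypothesis
for p = 1:length(mdiffs)
    xyp = xy(randperm(length(xy)));   % shuffle the merged data
    mdiffs(p) = mean(xyp(1:length(x))) - mean(xyp(length(x)+1:end));
end
md95 = quantile(mdiffs, 0.95);        % 95% percentile of the null distribution
issignificant = md > md95;            % one-sided test at the 5% significance level
\end{lstlisting}
If the measured difference \varcode{md} exceeds the 95\,\% percentile
of the null distribution, the null hypothesis of equal means is
rejected.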
\begin{figure}[tp]
  \includegraphics[width=1\textwidth]{permuteaverage}
  \titlecaption{\label{permuteaverage}Permutation test for
    differences in means.}{We want to test whether two data sets
    $\left\{x_i\right\}$ (red) and $\left\{y_i\right\}$ (blue) come
    from different distributions by assessing the significance of the
    difference in their sample means. The data sets were generated
    with a difference in their population means of $d=0.7$. For
    generating the distribution of the null hypothesis, i.e. the
    distribution of differences in the means if the two data sets
    come from the same distribution, we randomly select the same
    number of samples from both data sets (top right). This is
    repeated many times and results in the desired distribution of
    differences of means (bottom). The measured difference is clearly
    beyond the 95\,\% percentile of this distribution and thus
    indicates a significant difference between the distributions of
    the two original data sets.}
\end{figure}

\begin{exercise}{meandiffsignificance.m}{meandiffsignificance.out}
  Estimate the statistical significance of a difference in the mean
  of two data sets.
  \vspace{-1ex}
  \begin{enumerate}
  \item Generate two independent data sets, $\left\{x_i\right\}$ and
    $\left\{y_i\right\}$, of $n=200$ samples each, by drawing random
    numbers from a normal distribution. Add 0.2 to all the $y_i$
    samples so that the population means differ by 0.2.
  \item Calculate the difference between the sample means of the two
    data sets.
  \item Estimate the distribution of the null hypothesis of no
    difference of the means by generating new data sets with the same
    number of samples randomly selected from both data sets. For
    this, lump the two data sets together into a single vector. Then
    permute the order of the elements in this vector using the
    function \varcode{randperm()}, split it into two data sets and
    calculate the difference of their means. Repeat this 1000 times.
  \item Read out the 95\,\% percentile from the resulting null
    distribution of the differences in the means using the
    \varcode{quantile()} function and compare it with the difference
    of means measured from the original data sets.
  \end{enumerate}
\end{exercise}

\subsection{Significance of correlations}

Another nice example for the application of a
\entermde{Permutationstest}{permutation test} is testing for
significant \entermde[correlation]{Korrelation}{correlations}
(\figref{permutecorrelationfig}). Given are measured pairs of data
points $(x_i, y_i)$. By calculating the
\entermde[correlation!correlation coefficient]{Korrelationskoeffizient}{correlation coefficient}
we can quantify how strongly $y$ depends on $x$. The correlation
coefficient alone, however, does not tell us whether the correlation
is significantly different from the non-zero correlations that arise
by chance even if there is no true correlation in the data. The
\entermde{Nullhypothese}{null hypothesis} in such a situation is that
$y$ does not depend on $x$. In order to perform a permutation test,
we destroy the correlation between the data pairs by permuting the
$(x_i, y_i)$ pairs, i.e. we rearrange the $x_i$ and $y_i$ values in a
random fashion. Generating many sets of random pairs and computing
the corresponding correlation coefficients yields a distribution of
correlation coefficients that result randomly from truly uncorrelated
data. By comparing the actually measured correlation coefficient with
this distribution we can directly assess the significance of the
correlation.
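Again a minimal sketch, assuming the measured pairs are stored in two
vectors \varcode{x} and \varcode{y} of equal length (variable names
are placeholders):
\begin{lstlisting}
% x, y: vectors of equal length holding the measured pairs (assumed to be given)
cm = corrcoef(x, y);                % 2x2 correlation matrix of the measured data
c = cm(1, 2);                       % measured correlation coefficient
cperm = zeros(1000, 1);             % correlation coefficients under the null hypothesis
for p = 1:length(cperm)
    yp = y(randperm(length(y)));    % shuffling y destroys the pairing with x
    cp = corrcoef(x, yp);           % correlation of x with the shuffled y
    cperm(p) = cp(1, 2);
end
c95 = quantile(cperm, 0.95);        % 95% percentile of the null distribution
issignificant = c > c95;            % one-sided test at the 5% significance level
\end{lstlisting}
Note that shuffling only the $y$-values already destroys any pairing
between $x$ and $y$; permuting both vectors works just as well.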
\begin{figure}[tp]
  \includegraphics[width=1\textwidth]{permutecorrelation}
  \titlecaption{\label{permutecorrelationfig}Permutation test for
    correlations.}{Let the correlation coefficient of a data set with
    200 samples be $\rho=0.21$ (top left). By shuffling the data
    pairs we destroy any true correlation (top right). The resulting
    distribution of the null hypothesis (bottom, yellow), obtained
    from the correlation coefficients of permuted and therefore
    uncorrelated data sets, is centered around zero. The measured
    correlation coefficient is larger than the 95\,\% percentile of
    the null distribution. The null hypothesis may thus be rejected
    and the measured correlation is considered statistically
    significant.}
\end{figure}

\begin{exercise}{correlationsignificance.m}{correlationsignificance.out}
  Estimate the statistical significance of a correlation coefficient.
  \begin{enumerate}
  \item Generate pairs of $(x_i, y_i)$ values. Randomly choose
    $x$-values and calculate the respective $y$-values according to
    $y_i = 0.2 \cdot x_i + u_i$ where $u_i$ is a random number drawn
    from a normal distribution.
  \item Calculate the correlation coefficient.
  \item Estimate the distribution of the null hypothesis by
    generating uncorrelated pairs. For this, permute the $x$- and
    $y$-values \matlabfun{randperm()} 1000 times and calculate the
    correlation coefficient for each permutation.
  \item Read out the 95\,\% percentile from the resulting
    distribution of the null hypothesis using the
    \varcode{quantile()} function and compare it with the correlation
    coefficient computed from the original data.
  \end{enumerate}
\end{exercise}

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\printsolutions