From b76b2d35cce208d129f86594243831d1a9acbf41 Mon Sep 17 00:00:00 2001 From: Jan Benda Date: Sun, 13 Dec 2020 09:08:05 +0100 Subject: [PATCH] [bootstrap] generalized intro --- bootstrap/lecture/bootstrap-chapter.tex | 11 +++++- bootstrap/lecture/bootstrap.tex | 51 ++++++++++++++----------- 2 files changed, 37 insertions(+), 25 deletions(-) diff --git a/bootstrap/lecture/bootstrap-chapter.tex b/bootstrap/lecture/bootstrap-chapter.tex index 78d7d3c..9009d78 100644 --- a/bootstrap/lecture/bootstrap-chapter.tex +++ b/bootstrap/lecture/bootstrap-chapter.tex @@ -23,7 +23,14 @@ This chapter easily covers two lectures: \item 1. Bootstrapping with a proper introduction of of confidence intervals \item 2. Permutation test with a proper introduction of statistical tests (distribution of nullhypothesis, significance, power, etc.) \end{itemize} - -Add jacknife methods to bootstrapping +ToDo: +\begin{itemize} +\item Add jackknife methods to bootstrapping +\item Add discussion of confidence intervals to descriptive statistics chapter +\item Have a separate chapter on statistical tests before. What is the + essence of a statistical test (null hypothesis distribution), power + analysis, and a few examples of existing functions for statistical + tests. +\end{itemize} \end{document} diff --git a/bootstrap/lecture/bootstrap.tex b/bootstrap/lecture/bootstrap.tex index 8f72366..f71b268 100644 --- a/bootstrap/lecture/bootstrap.tex +++ b/bootstrap/lecture/bootstrap.tex @@ -5,20 +5,24 @@ \exercisechapter{Resampling methods} -%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% -\section{Bootstrapping} - -Bootstrapping methods are applied to create distributions of -statistical measures via resampling of a sample. Bootstrapping offers several -advantages: +\entermde{Resampling methoden}{Resampling methods} are applied to +generate distributions of statistical measures via resampling of +existing samples.
Resampling offers several advantages: \begin{itemize} \item Fewer assumptions (e.g. a measured sample does not need to be normally distributed). \item Increased precision as compared to classical methods. %such as? -\item General applicability: the bootstrapping methods are very +\item General applicability: the resampling methods are very similar for different statistics and there is no need to specialize the method to specific statistic measures. \end{itemize} +Resampling methods can be used for both estimating the precision of +estimated statistics (e.g. standard error of the mean, confidence +intervals) and testing for significance. + + +%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% +\section{Bootstrapping} \begin{figure}[tp] \includegraphics[width=0.8\textwidth]{2012-10-29_16-26-05_771}\\[2ex] @@ -88,20 +92,20 @@ of the statistical population. We can use the bootstrap distribution to draw conclusion regarding the precision of our estimation (e.g. standard errors and confidence intervals). -Bootstrapping methods create bootstrapped samples from a SRS by +Bootstrapping methods generate bootstrapped samples from a SRS by resampling. The bootstrapped samples are used to estimate the sampling distribution of a statistical measure. The bootstrapped samples have -the same size as the original sample and are created by randomly +the same size as the original sample and are generated by randomly drawing with replacement. That is, each value of the original sample -can occur once, multiple time, or not at all in a bootstrapped +can occur once, multiple times, or not at all in a bootstrapped sample. This can be implemented by generating random indices into the data set using the \code{randi()} function.
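The resampling scheme described above, drawing random indices with replacement and computing the statistic on each bootstrapped sample, can be sketched in a few lines. This is a hedged Python stand-in for the MATLAB \code{randi()} approach used in the lecture; the helper name \code{bootstrap\_means} and the sample values are illustrative only:

```python
import random
import statistics

def bootstrap_means(data, nresamples=1000, rng=None):
    """Bootstrapped means: each resample has the same size as the
    original sample and is drawn with replacement via random indices
    (the Python analogue of MATLAB's randi())."""
    if rng is None:
        rng = random.Random(0)  # fixed seed for reproducibility
    n = len(data)
    means = []
    for _ in range(nresamples):
        # random indices into the data set, drawing with replacement
        resample = [data[rng.randrange(n)] for _ in range(n)]
        means.append(statistics.mean(resample))
    return means

# illustrative sample values (a simple random sample)
sample = [2.1, 3.4, 1.9, 4.2, 2.8, 3.1, 2.5, 3.9]
means = bootstrap_means(sample)
# the standard deviation of the bootstrapped means estimates
# the standard error of the mean
sem_boot = statistics.stdev(means)
```

For such a sample the bootstrapped estimate should come out close to the classical estimate $s/\sqrt{n}$.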
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% -\subsection{Bootstrap of the standard error} +\subsection{Bootstrap the standard error} -Bootstrapping can be nicely illustrated at the example of the +Bootstrapping can be nicely illustrated with the example of the \enterm{standard error} of the mean (\determ{Standardfehler}). The arithmetic mean is calculated for a simple random sample. The standard error of the mean is the standard deviation of the expected @@ -121,9 +125,9 @@ population. the standard error of the mean.} \end{figure} -Via bootstrapping we create a distribution of the mean values +Via bootstrapping we generate a distribution of mean values (\figref{bootstrapsemfig}) and the standard deviation of this -distribution is the standard error of the mean. +distribution is the standard error of the sample mean. \begin{exercise}{bootstrapsem.m}{bootstrapsem.out} Create the distribution of mean values from bootstrapped samples @@ -148,17 +152,18 @@ distribution is the standard error of the mean. Statistical tests ask for the probability of a measured value to originate from a null hypothesis. Is this probability smaller than the desired \entermde{Signifikanz}{significance level}, the -\entermde{Nullhypothese}{null hypothesis} may be rejected. +\entermde{Nullhypothese}{null hypothesis} can be rejected. Traditionally, such probabilities are taken from theoretical -distributions which are based on some assumptions about the data. For -example, the data should be normally distributed. Given some data one -has to find an appropriate test that matches the properties of the -data. An alternative approach is to calculate the probability density -of the null hypothesis directly from the data themselves. To do so, we -need to resample the data according to the null hypothesis from the -SRS. By such permutation operations we destroy the feature of interest -while conserving all other statistical properties of the data.
+distributions which have been derived under some assumptions about +the data. For example, the data should be normally distributed. Given +some data one has to find an appropriate test that matches the +properties of the data. An alternative approach is to calculate the +probability density of the null hypothesis directly from the data +themselves. To do so, we need to resample the data according to the +null hypothesis from the SRS. By such permutation operations we +destroy the feature of interest while conserving all other statistical +properties of the data. \subsection{Significance of a difference in the mean}
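The permutation operations described above can be sketched for the case of a difference in the mean of two groups. The following is a minimal Python illustration with made-up numbers, not the chapter's MATLAB exercise code; shuffling the pooled data destroys the group difference while conserving all other statistical properties:

```python
import random
import statistics

def permutation_test(x, y, nperm=1000, rng=None):
    """Estimate the null distribution of the difference in means by
    repeatedly shuffling the pooled data and re-splitting it into two
    groups of the original sizes."""
    if rng is None:
        rng = random.Random(0)  # fixed seed for reproducibility
    observed = statistics.mean(x) - statistics.mean(y)
    pooled = list(x) + list(y)
    nx = len(x)
    count = 0
    for _ in range(nperm):
        rng.shuffle(pooled)  # destroys the group difference
        diff = statistics.mean(pooled[:nx]) - statistics.mean(pooled[nx:])
        if abs(diff) >= abs(observed):
            count += 1
    # fraction of permuted differences at least as extreme as observed
    return observed, count / nperm

# illustrative data for two groups
x = [4.5, 5.1, 4.8, 5.6, 4.9, 5.3]
y = [3.9, 4.2, 4.0, 4.4, 3.8, 4.1]
observed, p = permutation_test(x, y)
```

If the returned two-sided p-value estimate is smaller than the desired significance level, the null hypothesis of equal means can be rejected.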