[bootstrap] generalized intro

Jan Benda 2020-12-13 09:08:05 +01:00
parent 3dd0660b21
commit b76b2d35cc
2 changed files with 37 additions and 25 deletions


@@ -23,7 +23,14 @@ This chapter easily covers two lectures:
\item 1. Bootstrapping with a proper introduction of confidence intervals
\item 2. Permutation test with a proper introduction of statistical tests (distribution of the null hypothesis, significance, power, etc.)
\end{itemize}
ToDo:
\begin{itemize}
\item Add jackknife methods to bootstrapping
\item Add discussion of confidence intervals to descriptive statistics chapter
\item Have a separate chapter on statistical tests before. What is the
  essence of a statistical test (null hypothesis distribution), power
  analysis, and a few examples of existing functions for statistical
  tests.
\end{itemize}
\end{document}


@@ -5,20 +5,24 @@
\exercisechapter{Resampling methods}
\entermde{Resampling methoden}{Resampling methods} are applied to
generate distributions of statistical measures via resampling of
existing samples. Resampling offers several advantages:
\begin{itemize}
\item Fewer assumptions (e.g. a measured sample does not need to be
  normally distributed).
\item Increased precision as compared to classical methods. %such as?
\item General applicability: the resampling methods are very
  similar for different statistics and there is no need to specialize
  the method to specific statistical measures.
\end{itemize}
Resampling methods can be used both for estimating the precision of
estimated statistics (e.g. standard error of the mean, confidence
intervals) and for testing for significance.

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\section{Bootstrapping}
\begin{figure}[tp]
  \includegraphics[width=0.8\textwidth]{2012-10-29_16-26-05_771}\\[2ex]
@@ -88,20 +92,20 @@ of the statistical population. We can use the bootstrap distribution
to draw conclusions regarding the precision of our estimation (e.g.
standard errors and confidence intervals).
Bootstrapping methods generate bootstrapped samples from a SRS by
resampling. The bootstrapped samples are used to estimate the sampling
distribution of a statistical measure. The bootstrapped samples have
the same size as the original sample and are generated by randomly
drawing with replacement. That is, each value of the original sample
can occur once, multiple times, or not at all in a bootstrapped
sample. This can be implemented by generating random indices into the
data set using the \code{randi()} function.
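
For illustration, drawing a single bootstrapped sample from a data
vector could look like the following minimal sketch (variable names
are chosen freely and a generic \code{lstlisting} environment is
assumed):
\begin{lstlisting}
x = randn(100, 1);        % a simple random sample (SRS)
n = length(x);
inds = randi(n, n, 1);    % n random indices into x, drawn with replacement
xb = x(inds);             % one bootstrapped sample of the same size as x
\end{lstlisting}
Because the indices are drawn with replacement, some values of
\code{x} appear several times in \code{xb} while others do not appear
at all.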
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\subsection{Bootstrap the standard error}
Bootstrapping can be nicely illustrated with the example of the
\enterm{standard error} of the mean (\determ{Standardfehler}). The
arithmetic mean is calculated for a simple random sample. The standard
error of the mean is the standard deviation of the expected
@@ -121,9 +125,9 @@ population.
the standard error of the mean.}
\end{figure}
Via bootstrapping we generate a distribution of mean values
(\figref{bootstrapsemfig}) and the standard deviation of this
distribution is the standard error of the sample mean.
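
The following lines sketch this procedure (the exercise below works it
out in detail; variable names are again chosen freely):
\begin{lstlisting}
nresamples = 1000;                  % number of bootstrapped samples
means = zeros(nresamples, 1);
for k = 1:nresamples
    inds = randi(length(x), length(x), 1);
    means(k) = mean(x(inds));       % mean of one bootstrapped sample
end
sembootstrap = std(means);          % bootstrap estimate of the standard error
\end{lstlisting}
The result can be compared to the classical estimate of the standard
error of the mean, \code{std(x)/sqrt(length(x))}.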
\begin{exercise}{bootstrapsem.m}{bootstrapsem.out}
  Create the distribution of mean values from bootstrapped samples
@@ -148,17 +152,18 @@ distribution is the standard error of the mean.
Statistical tests ask for the probability that a measured value
originates from a null hypothesis. If this probability is smaller than
the desired \entermde{Signifikanz}{significance level}, the
\entermde{Nullhypothese}{null hypothesis} can be rejected.

Traditionally, such probabilities are taken from theoretical
distributions which have been derived based on some assumptions about
the data. For example, the data should be normally distributed. Given
some data one has to find an appropriate test that matches the
properties of the data. An alternative approach is to calculate the
probability density of the null hypothesis directly from the data
themselves. To do so, we need to resample the data according to the
null hypothesis from the SRS. By such permutation operations we
destroy the feature of interest while conserving all other statistical
properties of the data.
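
As a hypothetical illustration of this idea, consider two groups of
measurements \code{x} and \code{y} and the null hypothesis that both
have the same mean. Resampling according to this null hypothesis can
be sketched by pooling the data and randomly reassigning them to two
groups of the original sizes (details follow in the next subsection;
names are chosen freely):
\begin{lstlisting}
pooled = [x(:); y(:)];                        % pool both groups
nx = length(x);
shuffled = pooled(randperm(length(pooled)));  % random permutation of the pool
xperm = shuffled(1:nx);                       % reassign to groups of the
yperm = shuffled(nx+1:end);                   % original sizes
dperm = mean(xperm) - mean(yperm);            % one difference under the null hypothesis
\end{lstlisting}
Repeating this many times yields the distribution of differences
expected under the null hypothesis, against which the actually
measured difference can be compared.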
\subsection{Significance of a difference in the mean}