[bootstrap] generalized intro

This commit is contained in:
Jan Benda 2020-12-13 09:08:05 +01:00
parent 3dd0660b21
commit b76b2d35cc
2 changed files with 37 additions and 25 deletions

This chapter easily covers two lectures:
\item 1. Bootstrapping with a proper introduction of confidence intervals
\item 2. Permutation tests with a proper introduction of statistical tests (null-hypothesis distribution, significance, power, etc.)
\end{itemize}
ToDo:
\begin{itemize}
\item Add jackknife methods to bootstrapping
\item Add discussion of confidence intervals to descriptive statistics chapter
\item Add a separate chapter on statistical tests beforehand: the
  essence of a statistical test (null-hypothesis distribution), power
  analysis, and a few examples of existing functions for statistical
  tests.
\end{itemize}
\end{document}

\exercisechapter{Resampling methods}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\entermde{Resampling-Methoden}{Resampling methods} are applied to
generate distributions of statistical measures via resampling of
existing samples. Resampling offers several advantages:
\begin{itemize}
\item Fewer assumptions (e.g. a measured sample does not need to be
normally distributed).
\item Increased precision as compared to classical methods. %such as?
\item General applicability: the resampling methods are very
  similar for different statistics and there is no need to specialize
  the method for specific statistical measures.
\end{itemize}
Resampling methods can be used both for estimating the precision of
estimated statistics (e.g. the standard error of the mean or
confidence intervals) and for testing for significance.
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\section{Bootstrapping}
\begin{figure}[tp]
\includegraphics[width=0.8\textwidth]{2012-10-29_16-26-05_771}\\[2ex]
of the statistical population. We can use the bootstrap distribution
to draw conclusions regarding the precision of our estimate (e.g.
standard errors and confidence intervals).
Bootstrapping methods generate bootstrapped samples from an SRS by
resampling. The bootstrapped samples are used to estimate the sampling
distribution of a statistical measure. The bootstrapped samples have
the same size as the original sample and are generated by randomly
drawing with replacement. That is, each value of the original sample
can occur once, multiple times, or not at all in a bootstrapped
sample. This can be implemented by generating random indices into the
data set using the \code{randi()} function.
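This indexing idea can be sketched as follows (a Python sketch rather
than the chapter's MATLAB code, where \code{randi()} would generate
the indices; the data values and variable names are made up for
illustration):

```python
import random

# a small, made-up sample
data = [2.1, 3.4, 1.9, 2.8, 3.0, 2.5]
n = len(data)

# draw n random indices into the data, uniformly and with replacement
# (the analogue of MATLAB's randi(n, n, 1))
indices = [random.randrange(n) for _ in range(n)]
bootstrap_sample = [data[i] for i in indices]

# the bootstrapped sample has the same size as the original sample;
# each original value may appear once, several times, or not at all
print(len(bootstrap_sample))  # → 6
```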
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\subsection{Bootstrap the standard error}
Bootstrapping can be nicely illustrated with the example of the
\enterm{standard error} of the mean (\determ{Standardfehler}). The
arithmetic mean is calculated for a simple random sample. The standard
error of the mean is the standard deviation of the expected
population.
the standard error of the mean.}
\end{figure}
Via bootstrapping we generate a distribution of mean values
(\figref{bootstrapsemfig}) and the standard deviation of this
distribution is the standard error of the sample mean.
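The procedure can be sketched in a few lines (Python rather than the
chapter's MATLAB; the simulated data, the number of resamplings, and
all variable names are made up for illustration). The standard
deviation of the bootstrapped means should come out close to the
classical estimate $s/\sqrt{n}$:

```python
import random
import statistics

random.seed(1)

# a made-up simple random sample
data = [random.gauss(0.0, 1.0) for _ in range(100)]
n = len(data)

# compute the mean of each of many bootstrapped samples
bootstrap_means = []
for _ in range(2000):
    sample = random.choices(data, k=n)  # resample with replacement
    bootstrap_means.append(statistics.fmean(sample))

# the standard deviation of the bootstrap distribution of the mean
# is the bootstrap estimate of the standard error of the mean
bootstrap_sem = statistics.stdev(bootstrap_means)

# classical estimate for comparison: s / sqrt(n)
classical_sem = statistics.stdev(data) / n ** 0.5
print(round(bootstrap_sem, 3), round(classical_sem, 3))
```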
\begin{exercise}{bootstrapsem.m}{bootstrapsem.out}
Create the distribution of mean values from bootstrapped samples
Statistical tests ask for the probability of a measured value to
originate from a null hypothesis. If this probability is smaller than
the desired \entermde{Signifikanz}{significance level}, the
\entermde{Nullhypothese}{null hypothesis} can be rejected.
Traditionally, such probabilities are taken from theoretical
distributions which have been derived based on some assumptions about
the data. For example, the data should be normally distributed. Given
some data one has to find an appropriate test that matches the
properties of the data. An alternative approach is to calculate the
probability density of the null hypothesis directly from the data
themselves. To do so, we need to resample the data according to the
null hypothesis from the SRS. By such permutation operations we
destroy the feature of interest while conserving all other statistical
properties of the data.
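The permutation idea for a difference in means can be sketched like
this (a Python sketch rather than the chapter's MATLAB; the two
groups are simulated and all names are made up). Shuffling the pooled
values destroys the group assignment, the feature of interest, while
leaving every data value intact:

```python
import random
import statistics

random.seed(2)

# two made-up groups whose means differ
group_a = [random.gauss(0.0, 1.0) for _ in range(100)]
group_b = [random.gauss(0.8, 1.0) for _ in range(100)]
observed = statistics.fmean(group_b) - statistics.fmean(group_a)

pooled = group_a + group_b
n_a = len(group_a)

# null distribution: differences of means after shuffling group labels
null_diffs = []
for _ in range(2000):
    random.shuffle(pooled)  # permute, i.e. destroy the group assignment
    diff = statistics.fmean(pooled[n_a:]) - statistics.fmean(pooled[:n_a])
    null_diffs.append(diff)

# p-value: fraction of permutations at least as extreme as observed
p = sum(abs(d) >= abs(observed) for d in null_diffs) / len(null_diffs)
print(p)
```

Here the difference in means is the feature that the permutation
destroys; the same recipe applies to other statistics by swapping out
the measure computed on the shuffled data.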
\subsection{Significance of a difference in the mean}