[bootstrap] generalized intro
This commit is contained in:
parent 3dd0660b21
commit b76b2d35cc
@@ -23,7 +23,14 @@ This chapter easily covers two lectures:
\item 1. Bootstrapping with a proper introduction of confidence intervals
\item 2. Permutation test with a proper introduction of statistical tests (distribution of the null hypothesis, significance, power, etc.)
\end{itemize}

ToDo:
\begin{itemize}
\item Add jackknife methods to bootstrapping
\item Add discussion of confidence intervals to the descriptive statistics chapter
\item Have a separate chapter on statistical tests beforehand, covering
the essence of a statistical test (null hypothesis distribution),
power analysis, and a few examples of existing functions for
statistical tests.
\end{itemize}
\end{document}
@@ -5,20 +5,24 @@
\exercisechapter{Resampling methods}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

\entermde{Resampling methoden}{Resampling methods} are applied to
generate distributions of statistical measures via resampling of
existing samples. Resampling offers several advantages:
\begin{itemize}
\item Fewer assumptions (e.g. a measured sample does not need to be
normally distributed).
\item Increased precision as compared to classical methods. %such as?
\item General applicability: the resampling methods are very
similar for different statistics and there is no need to specialize
the method to specific statistical measures.
\end{itemize}
Resampling methods can be used for both estimating the precision of
estimated statistics (e.g. standard error of the mean, confidence
intervals) and testing for significance.
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\section{Bootstrapping}
\begin{figure}[tp]
\includegraphics[width=0.8\textwidth]{2012-10-29_16-26-05_771}\\[2ex]
@@ -88,20 +92,20 @@ of the statistical population. We can use the bootstrap distribution
to draw conclusions regarding the precision of our estimation (e.g.
standard errors and confidence intervals).

Bootstrapping methods generate bootstrapped samples from an SRS by
resampling. The bootstrapped samples are used to estimate the sampling
distribution of a statistical measure. The bootstrapped samples have
the same size as the original sample and are generated by randomly
drawing with replacement. That is, each value of the original sample
can occur once, multiple times, or not at all in a bootstrapped
sample. This can be implemented by generating random indices into the
data set using the \code{randi()} function.
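For example, a single bootstrapped sample can be drawn from a data
vector \code{x} like this (a minimal sketch; the variable names are
placeholders and not taken from any particular script):
\begin{verbatim}
% x: vector holding the original sample (placeholder)
n = length(x);          % size of the original sample
inds = randi(n, n, 1);  % n random indices into x, drawn with replacement
xb = x(inds);           % one bootstrapped sample of the same size as x
\end{verbatim}
Each such \code{xb} is then fed into the statistic of interest, for
example \code{mean(xb)}.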
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\subsection{Bootstrap the standard error}

Bootstrapping can be nicely illustrated with the example of the
\enterm{standard error} of the mean (\determ{Standardfehler}). The
arithmetic mean is calculated for a simple random sample. The standard
error of the mean is the standard deviation of the expected
@@ -121,9 +125,9 @@ population.
the standard error of the mean.}
\end{figure}

Via bootstrapping we generate a distribution of mean values
(\figref{bootstrapsemfig}) and the standard deviation of this
distribution is the standard error of the sample mean.
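In code this boils down to a short loop (a minimal sketch; the data
are assumed to be in a vector \code{x}; see also the following
exercise):
\begin{verbatim}
% x: vector holding the original sample (placeholder)
nresamples = 1000;            % number of bootstrapped samples
n = length(x);
means = zeros(nresamples, 1);
for i = 1:nresamples
    xb = x(randi(n, n, 1));   % resample x with replacement
    means(i) = mean(xb);      % mean of this bootstrapped sample
end
sem = std(means);             % spread of the bootstrapped means
q = sort(means);
ci95 = [q(round(0.025*nresamples)), q(round(0.975*nresamples))];
\end{verbatim}
The standard deviation \code{sem} estimates the standard error of the
mean, and \code{ci95} is the corresponding 95\,\% percentile
confidence interval.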
\begin{exercise}{bootstrapsem.m}{bootstrapsem.out}
Create the distribution of mean values from bootstrapped samples
@@ -148,17 +152,18 @@ distribution is the standard error of the mean.
Statistical tests ask for the probability that a measured value
originates from a null hypothesis. If this probability is smaller than the
desired \entermde{Signifikanz}{significance level}, the
\entermde{Nullhypothese}{null hypothesis} can be rejected.

Traditionally, such probabilities are taken from theoretical
distributions which have been derived based on some assumptions about
the data. For example, the data should be normally distributed. Given
some data one has to find an appropriate test that matches the
properties of the data. An alternative approach is to calculate the
probability density of the null hypothesis directly from the data
themselves. To do so, we need to resample the data according to the
null hypothesis from the SRS. By such permutation operations we
destroy the feature of interest while conserving all other statistical
properties of the data.
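As an illustration, a permutation test for a correlation between two
paired samples could look like this (a minimal sketch; \code{x} and
\code{y} are placeholder vectors of paired measurements):
\begin{verbatim}
% x, y: paired measurements (placeholders)
cc = corrcoef(x, y);
rdata = cc(1, 2);                 % observed correlation coefficient
nperm = 1000;                     % number of permutations
rnull = zeros(nperm, 1);
for i = 1:nperm
    yp = y(randperm(length(y)));  % shuffling y destroys the correlation,
    cc = corrcoef(x, yp);         % but keeps both marginal distributions
    rnull(i) = cc(1, 2);
end
p = sum(rnull >= rdata) / nperm;  % one-sided p-value
\end{verbatim}
If \code{p} is smaller than the desired significance level, the null
hypothesis of uncorrelated data can be rejected.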
\subsection{Significance of a difference in the mean}