Jan Benda 2019-12-04 17:51:53 +01:00
commit cc332ee25d
11 changed files with 72 additions and 69 deletions


@ -15,7 +15,7 @@
\else \else
\newcommand{\stitle}{} \newcommand{\stitle}{}
\fi \fi
\header{{\bfseries\large Exercise 9\stitle}}{{\bfseries\large Bootstrap}}{{\bfseries\large November 20th, 2018}} \header{{\bfseries\large Exercise 9\stitle}}{{\bfseries\large Bootstrap}}{{\bfseries\large December 9th, 2019}}
\firstpagefooter{Prof. Dr. Jan Benda}{Phone: 29 74573}{Email: \firstpagefooter{Prof. Dr. Jan Benda}{Phone: 29 74573}{Email:
jan.benda@uni-tuebingen.de} jan.benda@uni-tuebingen.de}
\runningfooter{}{\thepage}{} \runningfooter{}{\thepage}{}
@ -86,7 +86,7 @@ jan.benda@uni-tuebingen.de}
\begin{questions} \begin{questions}
\question \qt{Bootstrap of the standard error of the mean} \question \qt{Bootstrap the standard error of the mean}
We want to compute the standard error of the mean of a data set by We want to compute the standard error of the mean of a data set by
means of the bootstrap method and compare the result with the formula means of the bootstrap method and compare the result with the formula
``standard deviation divided by the square-root of $n$''. ``standard deviation divided by the square-root of $n$''.
@ -119,23 +119,24 @@ means of the bootstrap method and compare the result with the formula
\question \qt{Student t-distribution} \question \qt{Student t-distribution}
The distribution of Student's t, $t=\bar x/(\sigma_x/\sqrt{m})$, the The distribution of Student's t, $t=\bar x/(\sigma_x/\sqrt{n})$, the
estimated mean of a data set divided by the estimated standard error estimated mean $\bar x$ of a data set of size $n$ divided by the
of the mean, is not a normal distribution but a Student-t distribution. estimated standard error of the mean $\sigma_x/\sqrt{n}$, where
We want to compute the Student-t distribution and compare it with the $\sigma_x$ is the estimated standard deviation, is not a normal
normal distribution. distribution but a Student-t distribution. We want to compute the
Student-t distribution and compare it with the normal distribution.
\begin{parts} \begin{parts}
\part Generate 100000 normally distributed random numbers. \part Generate 100000 normally distributed random numbers.
\part Draw from these data 1000 samples of size $n=3$, 5, 10, and 50. \part Draw from these data 1000 samples of size $n=3$, 5, 10, and
\part Compute the mean $\bar x$ of the samples and plot the 50. For each sample size $n$ ...
\part ... compute the mean $\bar x$ of the samples and plot the
probability density of these means. probability density of these means.
\part Compare the resulting probability densities with corresponding \part ... compare the resulting probability densities with corresponding
normal distributions. normal distributions.
\part Compute in addition $t=\bar x/(\sigma_x/\sqrt{n})$ (standard \part ... compute Student's $t=\bar x/(\sigma_x/\sqrt{n})$ and compare its
deviation of the samples $\sigma_x$) and compare their distribution distribution with the normal distribution with standard deviation of
with the normal distribution with standard deviation of one. Is $t$ one. Is $t$ normally distributed? Under which conditions is $t$
normally distributed? Under which conditions is $t$ normally normally distributed?
distributed?
\end{parts} \end{parts}
\newsolutionpage \newsolutionpage
\begin{solution} \begin{solution}
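The Student-t exercise in this hunk can be sketched as follows. This is a minimal illustration in Python using only the standard library, not the course's MATLAB solution. The helper names `student_t` and `tail_fraction`, the seed, and the cutoff of 1.96 (the two-sided 5\,\% limit of a standard normal distribution) are assumptions made for this example. Instead of plotting densities, it counts how often $|t|$ exceeds the cutoff for each sample size.

```python
import math
import random
import statistics


def student_t(sample):
    """t = mean / (standard deviation / sqrt(n)) of one sample."""
    n = len(sample)
    return statistics.fmean(sample) / (statistics.stdev(sample) / math.sqrt(n))


def tail_fraction(data, n, n_samples=1000, limit=1.96, rng=None):
    """Fraction of samples of size n whose |t| exceeds the limit."""
    rng = rng or random.Random()
    count = 0
    for _ in range(n_samples):
        # Draw one sample of size n from the pool of random numbers.
        sample = rng.sample(data, n)
        if abs(student_t(sample)) > limit:
            count += 1
    return count / n_samples


rng = random.Random(3)  # fixed seed, chosen arbitrarily
data = [rng.gauss(0.0, 1.0) for _ in range(100000)]
fractions = {n: tail_fraction(data, n, rng=rng) for n in (3, 5, 10, 50)}
for n, frac in fractions.items():
    # For a standard normal distribution this fraction would be about 0.05.
    print(f"n={n:2d}: fraction of |t| > 1.96 is {frac:.3f}")
```

For small $n$ the fraction should lie clearly above 5\,\%, reflecting the heavy tails of the Student-t distribution; it approaches the normal value as $n$ grows.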
@ -167,16 +168,16 @@ y = randn(n, 1) + a*x;
\part Compute and plot the probability density of these correlation \part Compute and plot the probability density of these correlation
coefficients. coefficients.
\part Is the correlation of the original data set significant? \part Is the correlation of the original data set significant?
\part What does significance of the correlation mean? \part What does ``significance of the correlation'' mean?
\part Vary the sample size \code{n} and compute in the same way the % \part Vary the sample size \code{n} and compute in the same way the
significance of the correlation. % significance of the correlation.
\end{parts} \end{parts}
\begin{solution} \begin{solution}
\lstinputlisting{correlationsignificance.m} \lstinputlisting{correlationsignificance.m}
\includegraphics[width=1\textwidth]{correlationsignificance} \includegraphics[width=1\textwidth]{correlationsignificance}
\end{solution} \end{solution}
\question \qt{Bootstrap of the correlation coefficient} \question \qt{Bootstrap the correlation coefficient}
The permutation test generates the distribution of the null hypothesis The permutation test generates the distribution of the null hypothesis
of uncorrelated data and we check whether the correlation coefficient of uncorrelated data and we check whether the correlation coefficient
of the data differs significantly from this of the data differs significantly from this
@ -184,7 +185,7 @@ distribution. Alternatively we can bootstrap the data while keeping
the pairs and determine the confidence interval of the correlation the pairs and determine the confidence interval of the correlation
coefficient of the data. If this differs significantly from a coefficient of the data. If this differs significantly from a
correlation coefficient of zero we can conclude that the correlation correlation coefficient of zero we can conclude that the correlation
coefficient of the data quantifies indeed a correlated data. coefficient of the data indeed quantifies correlated data.
We take the same data set that we have generated in exercise We take the same data set that we have generated in exercise
\ref{permutationtest} (\ref{permutationtestdata}). \ref{permutationtest} (\ref{permutationtestdata}).
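Bootstrapping the correlation coefficient while keeping the pairs could look like the sketch below. It is written in Python with the standard library only and is not the course's reference solution. The data size of 1000 points, the seed, the helper names, and the nearest-rank percentile rule are assumptions for illustration; the slope of 0.2 follows the data set described for the permutation test.

```python
import math
import random


def correlation(x, y):
    """Pearson correlation coefficient of two equally long lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def bootstrap_correlations(x, y, n_resamples=1000, rng=None):
    """Resample (x_i, y_i) pairs with replacement and correlate each resample."""
    rng = rng or random.Random()
    n = len(x)
    rs = []
    for _ in range(n_resamples):
        # The same random indices are used for x and y, so pairs stay intact.
        idx = [rng.randrange(n) for _ in range(n)]
        rs.append(correlation([x[i] for i in idx], [y[i] for i in idx]))
    return rs


def percentile(values, q):
    """Nearest-rank percentile (q in percent) of a list of numbers."""
    ordered = sorted(values)
    rank = max(1, math.ceil(q / 100.0 * len(ordered)))
    return ordered[rank - 1]


rng = random.Random(2)  # fixed seed, chosen arbitrarily
x = [rng.gauss(0.0, 1.0) for _ in range(1000)]
y = [0.2 * xi + rng.gauss(0.0, 1.0) for xi in x]
rs = bootstrap_correlations(x, y, rng=rng)
lower, upper = percentile(rs, 2.5), percentile(rs, 97.5)
print(f"r = {correlation(x, y):.3f}, 95% interval: [{lower:.3f}, {upper:.3f}]")
```

If the interval between `lower` and `upper` does not contain zero, the correlation differs significantly from zero at the 5\,\% level.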


@ -84,9 +84,11 @@ standard errors and confidence intervals).
Bootstrapping methods create bootstrapped samples from a SRS by Bootstrapping methods create bootstrapped samples from a SRS by
resampling. The bootstrapped samples are used to estimate the sampling resampling. The bootstrapped samples are used to estimate the sampling
distribution of a statistical measure. The bootstrapped samples have distribution of a statistical measure. The bootstrapped samples have
the same size as the original sample and are created by randomly drawing with the same size as the original sample and are created by randomly
replacement. That is, each value of the original sample can occur drawing with replacement. That is, each value of the original sample
once, multiple times, or not at all in a bootstrapped sample. can occur once, multiple times, or not at all in a bootstrapped
sample. This can be implemented by generating random indices into the
data set using the \code{randi()} function.
\section{Bootstrap of the standard error} \section{Bootstrap of the standard error}
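The resampling described above, drawing random indices into the data set with replacement, can be sketched as follows. The example uses Python's standard library, where `randrange()` stands in for the role \code{randi()} plays in the text. The function names, the data size of 200, and the seed are assumptions for illustration, not part of the course material.

```python
import math
import random
import statistics


def bootstrap_sem(data, n_resamples=1000, rng=None):
    """Estimate the standard error of the mean by bootstrapping."""
    rng = rng or random.Random()
    n = len(data)
    means = []
    for _ in range(n_resamples):
        # Random indices into the data set, drawn with replacement;
        # each resample has the same size as the original sample.
        indices = [rng.randrange(n) for _ in range(n)]
        resample = [data[i] for i in indices]
        means.append(statistics.fmean(resample))
    # The spread of the bootstrapped means estimates the standard error.
    return statistics.stdev(means)


def formula_sem(data):
    """Standard deviation divided by the square root of n."""
    return statistics.stdev(data) / math.sqrt(len(data))


rng = random.Random(1)  # fixed seed, chosen arbitrarily
data = [rng.gauss(0.0, 1.0) for _ in range(200)]
sem_boot = bootstrap_sem(data, rng=rng)
sem_formula = formula_sem(data)
print(f"bootstrap: {sem_boot:.3f}, formula: {sem_formula:.3f}")
```

The two estimates should agree closely, with the bootstrap value fluctuating slightly from run to run when no seed is fixed.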
@ -165,13 +167,13 @@ data points $(x_i, y_i)$. By calculating the correlation coefficient
we can quantify how strongly $y$ depends on $x$. The correlation we can quantify how strongly $y$ depends on $x$. The correlation
coefficient alone, however, does not tell whether the correlation is coefficient alone, however, does not tell whether the correlation is
significantly different from a random correlation. The null hypothesis significantly different from a random correlation. The null hypothesis
for such a situation would be that $y$ does not depend on $x$. In for such a situation is that $y$ does not depend on $x$. In
order to perform a permutation test, we need to destroy the order to perform a permutation test, we need to destroy the
correlation by permuting the $(x_i, y_i)$ pairs, i.e. we rearrange the correlation by permuting the $(x_i, y_i)$ pairs, i.e. we rearrange the
$x_i$ and $y_i$ values in a random fashion. Generating many sets of $x_i$ and $y_i$ values in a random fashion. Generating many sets of
random pairs and computing the resulting correlation coefficients, random pairs and computing the resulting correlation coefficients
yields a distribution of correlation coefficients that result yields a distribution of correlation coefficients that result
randomnly from uncorrelated data. By comparing the actually measured randomly from uncorrelated data. By comparing the actually measured
correlation coefficient with this distribution we can directly assess correlation coefficient with this distribution we can directly assess
the significance of the correlation the significance of the correlation
(figure\,\ref{permutecorrelationfig}). (figure\,\ref{permutecorrelationfig}).
@ -183,10 +185,10 @@ Estimate the statistical significance of a correlation coefficient.
and calculate the respective $y$-values according to $y_i =0.2 \cdot x_i + u_i$ and calculate the respective $y$-values according to $y_i =0.2 \cdot x_i + u_i$
where $u_i$ is a random number drawn from a normal distribution. where $u_i$ is a random number drawn from a normal distribution.
\item Calculate the correlation coefficient. \item Calculate the correlation coefficient.
\item Generate the distribution according to the null hypothesis by \item Generate the distribution of the null hypothesis by generating
generating uncorrelated pairs. For this permute $x$- and $y$-values uncorrelated pairs. For this permute $x$- and $y$-values
\matlabfun{randperm()} 1000 times and calculate for each \matlabfun{randperm()} 1000 times and calculate for each permutation
permutation the correlation coefficient. the correlation coefficient.
\item Read out the 95\,\% percentile from the resulting distribution \item Read out the 95\,\% percentile from the resulting distribution
of the null hypothesis and compare it with the correlation of the null hypothesis and compare it with the correlation
coefficient computed from the original data. coefficient computed from the original data.
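The steps of this exercise can be sketched as follows. This is a minimal Python version using only the standard library, where `shuffle()` takes the role of \matlabfun{randperm()}. The number of data points (1000), the seed, the helper names, and the nearest-rank percentile rule are assumptions made for this example.

```python
import math
import random


def correlation(x, y):
    """Pearson correlation coefficient of two equally long lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def permutation_null(x, y, n_perm=1000, rng=None):
    """Correlation coefficients of randomly re-paired x and y values."""
    rng = rng or random.Random()
    y_shuffled = list(y)
    null = []
    for _ in range(n_perm):
        # Shuffling y destroys the pairing and thus any correlation.
        rng.shuffle(y_shuffled)
        null.append(correlation(x, y_shuffled))
    return null


def percentile(values, q):
    """Nearest-rank percentile (q in percent) of a list of numbers."""
    ordered = sorted(values)
    rank = max(1, math.ceil(q / 100.0 * len(ordered)))
    return ordered[rank - 1]


rng = random.Random(0)  # fixed seed, chosen arbitrarily
x = [rng.gauss(0.0, 1.0) for _ in range(1000)]
y = [0.2 * xi + rng.gauss(0.0, 1.0) for xi in x]
r_data = correlation(x, y)
null = permutation_null(x, y, rng=rng)
threshold = percentile(null, 95.0)
print(f"r = {r_data:.3f}, 95% percentile of null distribution = {threshold:.3f}")
```

A measured correlation coefficient above `threshold` is unlikely (less than 5\,\% probability) to arise from uncorrelated data.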


@ -1,7 +1,7 @@
%!PS-Adobe-2.0 EPSF-2.0 %!PS-Adobe-2.0 EPSF-2.0
%%Title: pointprocessscetchA.tex %%Title: pointprocessscetchA.tex
%%Creator: gnuplot 4.6 patchlevel 4 %%Creator: gnuplot 4.6 patchlevel 4
%%CreationDate: Mon Dec 2 13:03:15 2019 %%CreationDate: Tue Dec 3 08:08:50 2019
%%DocumentFonts: %%DocumentFonts:
%%BoundingBox: 50 50 373 135 %%BoundingBox: 50 50 373 135
%%EndComments %%EndComments
@ -430,10 +430,10 @@ SDict begin [
/Title (pointprocessscetchA.tex) /Title (pointprocessscetchA.tex)
/Subject (gnuplot plot) /Subject (gnuplot plot)
/Creator (gnuplot 4.6 patchlevel 4) /Creator (gnuplot 4.6 patchlevel 4)
/Author (benda) /Author (jan)
% /Producer (gnuplot) % /Producer (gnuplot)
% /Keywords () % /Keywords ()
/CreationDate (Mon Dec 2 13:03:15 2019) /CreationDate (Tue Dec 3 08:08:50 2019)
/DOCINFO pdfmark /DOCINFO pdfmark
end end
} ifelse } ifelse


@ -1,7 +1,7 @@
%!PS-Adobe-2.0 EPSF-2.0 %!PS-Adobe-2.0 EPSF-2.0
%%Title: pointprocessscetchB.tex %%Title: pointprocessscetchB.tex
%%Creator: gnuplot 4.6 patchlevel 4 %%Creator: gnuplot 4.6 patchlevel 4
%%CreationDate: Mon Dec 2 13:03:15 2019 %%CreationDate: Tue Dec 3 08:08:50 2019
%%DocumentFonts: %%DocumentFonts:
%%BoundingBox: 50 50 373 237 %%BoundingBox: 50 50 373 237
%%EndComments %%EndComments
@ -430,10 +430,10 @@ SDict begin [
/Title (pointprocessscetchB.tex) /Title (pointprocessscetchB.tex)
/Subject (gnuplot plot) /Subject (gnuplot plot)
/Creator (gnuplot 4.6 patchlevel 4) /Creator (gnuplot 4.6 patchlevel 4)
/Author (benda) /Author (jan)
% /Producer (gnuplot) % /Producer (gnuplot)
% /Keywords () % /Keywords ()
/CreationDate (Mon Dec 2 13:03:15 2019) /CreationDate (Tue Dec 3 08:08:50 2019)
/DOCINFO pdfmark /DOCINFO pdfmark
end end
} ifelse } ifelse