Merge branch 'master' of https://whale.am28.uni-tuebingen.de/git/teaching/scientificComputing
This commit is contained in:
commit
cc332ee25d
@ -15,7 +15,7 @@
|
|||||||
\else
|
\else
|
||||||
\newcommand{\stitle}{}
|
\newcommand{\stitle}{}
|
||||||
\fi
|
\fi
|
||||||
\header{{\bfseries\large Exercise 9\stitle}}{{\bfseries\large Bootstrap}}{{\bfseries\large November 20th, 2018}}
|
\header{{\bfseries\large Exercise 9\stitle}}{{\bfseries\large Bootstrap}}{{\bfseries\large December 9th, 2019}}
|
||||||
\firstpagefooter{Prof. Dr. Jan Benda}{Phone: 29 74573}{Email:
|
\firstpagefooter{Prof. Dr. Jan Benda}{Phone: 29 74573}{Email:
|
||||||
jan.benda@uni-tuebingen.de}
|
jan.benda@uni-tuebingen.de}
|
||||||
\runningfooter{}{\thepage}{}
|
\runningfooter{}{\thepage}{}
|
||||||
@ -86,7 +86,7 @@ jan.benda@uni-tuebingen.de}
|
|||||||
|
|
||||||
\begin{questions}
|
\begin{questions}
|
||||||
|
|
||||||
\question \qt{Bootstrap of the standard error of the mean}
|
\question \qt{Bootstrap the standard error of the mean}
|
||||||
We want to compute the standard error of the mean of a data set by
|
We want to compute the standard error of the mean of a data set by
|
||||||
means of the bootstrap method and compare the result with the formula
|
means of the bootstrap method and compare the result with the formula
|
||||||
``standard deviation divided by the square-root of $n$''.
|
``standard deviation divided by the square-root of $n$''.
|
||||||
@ -119,23 +119,24 @@ means of the bootstrap method and compare the result with the formula
|
|||||||
|
|
||||||
|
|
||||||
\question \qt{Student t-distribution}
|
\question \qt{Student t-distribution}
|
||||||
The distribution of Student's t, $t=\bar x/(\sigma_x/\sqrt{m})$, the
|
The distribution of Student's t, $t=\bar x/(\sigma_x/\sqrt{n})$, the
|
||||||
estimated mean of a data set divided by the estimated standard error
|
estimated mean $\bar x$ of a data set of size $n$ divided by the
|
||||||
of the mean, is not a normal distribution but a Student-t distribution.
|
estimated standard error of the mean $\sigma_x/\sqrt{n}$, where
|
||||||
We want to compute the Student-t distribution and compare it with the
|
$\sigma_x$ is the estimated standard deviation, is not a normal
|
||||||
normal distribution.
|
distribution but a Student-t distribution. We want to compute the
|
||||||
|
Student-t distribution and compare it with the normal distribution.
|
||||||
\begin{parts}
|
\begin{parts}
|
||||||
\part Generate 100000 normally distributed random numbers.
|
\part Generate 100000 normally distributed random numbers.
|
||||||
\part Draw from these data 1000 samples of size $n=3$, 5, 10, and 50.
|
\part Draw from these data 1000 samples of size $n=3$, 5, 10, and
|
||||||
\part Compute the mean $\bar x$ of the samples and plot the
|
50. For each sample size $n$ ...
|
||||||
|
\part ... compute the mean $\bar x$ of the samples and plot the
|
||||||
probability density of these means.
|
probability density of these means.
|
||||||
\part Compare the resulting probability densities with corresponding
|
\part ... compare the resulting probability densities with corresponding
|
||||||
normal distributions.
|
normal distributions.
|
||||||
\part Compute in addition $t=\bar x/(\sigma_x/\sqrt{n})$ (standard
|
\part ... compute Student's $t=\bar x/(\sigma_x/\sqrt{n})$ and compare its
|
||||||
deviation of the samples $\sigma_x$) and compare their distribution
|
distribution with the normal distribution with standard deviation of
|
||||||
with the normal distribution with standard deviation of one. Is $t$
|
one. Is $t$ normally distributed? Under which conditions is $t$
|
||||||
normally distributed? Under which conditions is $t$ normally
|
normally distributed?
|
||||||
distributed?
|
|
||||||
\end{parts}
|
\end{parts}
|
||||||
\newsolutionpage
|
\newsolutionpage
|
||||||
\begin{solution}
|
\begin{solution}
|
||||||
@ -167,16 +168,16 @@ y = randn(n, 1) + a*x;
|
|||||||
\part Compute and plot the probability density of these correlation
|
\part Compute and plot the probability density of these correlation
|
||||||
coefficients.
|
coefficients.
|
||||||
\part Is the correlation of the original data set significant?
|
\part Is the correlation of the original data set significant?
|
||||||
\part What does significance of the correlation mean?
|
\part What does ``significance of the correlation'' mean?
|
||||||
\part Vary the sample size \code{n} and compute in the same way the
|
% \part Vary the sample size \code{n} and compute in the same way the
|
||||||
significance of the correlation.
|
% significance of the correlation.
|
||||||
\end{parts}
|
\end{parts}
|
||||||
\begin{solution}
|
\begin{solution}
|
||||||
\lstinputlisting{correlationsignificance.m}
|
\lstinputlisting{correlationsignificance.m}
|
||||||
\includegraphics[width=1\textwidth]{correlationsignificance}
|
\includegraphics[width=1\textwidth]{correlationsignificance}
|
||||||
\end{solution}
|
\end{solution}
|
||||||
|
|
||||||
\question \qt{Bootstrap of the correlation coefficient}
|
\question \qt{Bootstrap the correlation coefficient}
|
||||||
The permutation test generates the distribution of the null hypothesis
|
The permutation test generates the distribution of the null hypothesis
|
||||||
of uncorrelated data and we check whether the correlation coefficient
|
of uncorrelated data and we check whether the correlation coefficient
|
||||||
of the data differs significantly from this
|
of the data differs significantly from this
|
||||||
@ -184,7 +185,7 @@ distribution. Alternatively we can bootstrap the data while keeping
|
|||||||
the pairs and determine the confidence interval of the correlation
|
the pairs and determine the confidence interval of the correlation
|
||||||
coefficient of the data. If this differs significantly from a
|
coefficient of the data. If this differs significantly from a
|
||||||
correlation coefficient of zero we can conclude that the correlation
|
correlation coefficient of zero we can conclude that the correlation
|
||||||
coefficient of the data quantifies indeed a correlated data.
|
coefficient of the data indeed quantifies correlated data.
|
||||||
|
|
||||||
We take the same data set that we have generated in exercise
|
We take the same data set that we have generated in exercise
|
||||||
\ref{permutationtest} (\ref{permutationtestdata}).
|
\ref{permutationtest} (\ref{permutationtestdata}).
|
||||||
|
@ -84,9 +84,11 @@ standard errors and confidence intervals).
|
|||||||
Bootstrapping methods create bootstrapped samples from a SRS by
|
Bootstrapping methods create bootstrapped samples from a SRS by
|
||||||
resampling. The bootstrapped samples are used to estimate the sampling
|
resampling. The bootstrapped samples are used to estimate the sampling
|
||||||
distribution of a statistical measure. The bootstrapped samples have
|
distribution of a statistical measure. The bootstrapped samples have
|
||||||
the same size as the original sample and are created by randomly drawing with
|
the same size as the original sample and are created by randomly
|
||||||
replacement. That is, each value of the original sample can occur
|
drawing with replacement. That is, each value of the original sample
|
||||||
once, multiple time, or not at all in a bootstrapped sample.
|
can occur once, multiple time, or not at all in a bootstrapped
|
||||||
|
sample. This can be implemented by generating random indices into the
|
||||||
|
data set using the \code{randi()} function.
|
||||||
|
|
||||||
|
|
||||||
\section{Bootstrap of the standard error}
|
\section{Bootstrap of the standard error}
|
||||||
@ -165,13 +167,13 @@ data points $(x_i, y_i)$. By calculating the correlation coefficient
|
|||||||
we can quantify how strongly $y$ depends on $x$. The correlation
|
we can quantify how strongly $y$ depends on $x$. The correlation
|
||||||
coefficient alone, however, does not tell whether the correlation is
|
coefficient alone, however, does not tell whether the correlation is
|
||||||
significantly different from a random correlation. The null hypothesis
|
significantly different from a random correlation. The null hypothesis
|
||||||
for such a situation would be that $y$ does not depend on $x$. In
|
for such a situation is that $y$ does not depend on $x$. In
|
||||||
order to perform a permutation test, we need to destroy the
|
order to perform a permutation test, we need to destroy the
|
||||||
correlation by permuting the $(x_i, y_i)$ pairs, i.e. we rearrange the
|
correlation by permuting the $(x_i, y_i)$ pairs, i.e. we rearrange the
|
||||||
$x_i$ and $y_i$ values in a random fashion. Generating many sets of
|
$x_i$ and $y_i$ values in a random fashion. Generating many sets of
|
||||||
random pairs and computing the resulting correlation coefficients,
|
random pairs and computing the resulting correlation coefficients
|
||||||
yields a distribution of correlation coefficients that result
|
yields a distribution of correlation coefficients that result
|
||||||
randomnly from uncorrelated data. By comparing the actually measured
|
randomly from uncorrelated data. By comparing the actually measured
|
||||||
correlation coefficient with this distribution we can directly assess
|
correlation coefficient with this distribution we can directly assess
|
||||||
the significance of the correlation
|
the significance of the correlation
|
||||||
(figure\,\ref{permutecorrelationfig}).
|
(figure\,\ref{permutecorrelationfig}).
|
||||||
@ -183,10 +185,10 @@ Estimate the statistical significance of a correlation coefficient.
|
|||||||
and calculate the respective $y$-values according to $y_i =0.2 \cdot x_i + u_i$
|
and calculate the respective $y$-values according to $y_i =0.2 \cdot x_i + u_i$
|
||||||
where $u_i$ is a random number drawn from a normal distribution.
|
where $u_i$ is a random number drawn from a normal distribution.
|
||||||
\item Calculate the correlation coefficient.
|
\item Calculate the correlation coefficient.
|
||||||
\item Generate the distribution according to the null hypothesis by
|
\item Generate the distribution of the null hypothesis by generating
|
||||||
generating uncorrelated pairs. For this permute $x$- and $y$-values
|
uncorrelated pairs. For this permute $x$- and $y$-values
|
||||||
\matlabfun{randperm()} 1000 times and calculate for each
|
\matlabfun{randperm()} 1000 times and calculate for each permutation
|
||||||
permutation the correlation coefficient.
|
the correlation coefficient.
|
||||||
\item Read out the 95\,\% percentile from the resulting distribution
|
\item Read out the 95\,\% percentile from the resulting distribution
|
||||||
of the null hypothesis and compare it with the correlation
|
of the null hypothesis and compare it with the correlation
|
||||||
coefficient computed from the original data.
|
coefficient computed from the original data.
|
||||||
|
@ -1,7 +1,7 @@
|
|||||||
%!PS-Adobe-2.0 EPSF-2.0
|
%!PS-Adobe-2.0 EPSF-2.0
|
||||||
%%Title: pointprocessscetchA.tex
|
%%Title: pointprocessscetchA.tex
|
||||||
%%Creator: gnuplot 4.6 patchlevel 4
|
%%Creator: gnuplot 4.6 patchlevel 4
|
||||||
%%CreationDate: Mon Dec 2 13:03:15 2019
|
%%CreationDate: Tue Dec 3 08:08:50 2019
|
||||||
%%DocumentFonts:
|
%%DocumentFonts:
|
||||||
%%BoundingBox: 50 50 373 135
|
%%BoundingBox: 50 50 373 135
|
||||||
%%EndComments
|
%%EndComments
|
||||||
@ -430,10 +430,10 @@ SDict begin [
|
|||||||
/Title (pointprocessscetchA.tex)
|
/Title (pointprocessscetchA.tex)
|
||||||
/Subject (gnuplot plot)
|
/Subject (gnuplot plot)
|
||||||
/Creator (gnuplot 4.6 patchlevel 4)
|
/Creator (gnuplot 4.6 patchlevel 4)
|
||||||
/Author (benda)
|
/Author (jan)
|
||||||
% /Producer (gnuplot)
|
% /Producer (gnuplot)
|
||||||
% /Keywords ()
|
% /Keywords ()
|
||||||
/CreationDate (Mon Dec 2 13:03:15 2019)
|
/CreationDate (Tue Dec 3 08:08:50 2019)
|
||||||
/DOCINFO pdfmark
|
/DOCINFO pdfmark
|
||||||
end
|
end
|
||||||
} ifelse
|
} ifelse
|
||||||
|
Binary file not shown.
@ -1,7 +1,7 @@
|
|||||||
%!PS-Adobe-2.0 EPSF-2.0
|
%!PS-Adobe-2.0 EPSF-2.0
|
||||||
%%Title: pointprocessscetchB.tex
|
%%Title: pointprocessscetchB.tex
|
||||||
%%Creator: gnuplot 4.6 patchlevel 4
|
%%Creator: gnuplot 4.6 patchlevel 4
|
||||||
%%CreationDate: Mon Dec 2 13:03:15 2019
|
%%CreationDate: Tue Dec 3 08:08:50 2019
|
||||||
%%DocumentFonts:
|
%%DocumentFonts:
|
||||||
%%BoundingBox: 50 50 373 237
|
%%BoundingBox: 50 50 373 237
|
||||||
%%EndComments
|
%%EndComments
|
||||||
@ -430,10 +430,10 @@ SDict begin [
|
|||||||
/Title (pointprocessscetchB.tex)
|
/Title (pointprocessscetchB.tex)
|
||||||
/Subject (gnuplot plot)
|
/Subject (gnuplot plot)
|
||||||
/Creator (gnuplot 4.6 patchlevel 4)
|
/Creator (gnuplot 4.6 patchlevel 4)
|
||||||
/Author (benda)
|
/Author (jan)
|
||||||
% /Producer (gnuplot)
|
% /Producer (gnuplot)
|
||||||
% /Keywords ()
|
% /Keywords ()
|
||||||
/CreationDate (Mon Dec 2 13:03:15 2019)
|
/CreationDate (Tue Dec 3 08:08:50 2019)
|
||||||
/DOCINFO pdfmark
|
/DOCINFO pdfmark
|
||||||
end
|
end
|
||||||
} ifelse
|
} ifelse
|
||||||
|
Binary file not shown.
Reference in New Issue
Block a user