diff --git a/bootstrap/exercises/correlationbootstrap.m b/bootstrap/exercises/correlationbootstrap.m
new file mode 100644
index 0000000..5abb951
--- /dev/null
+++ b/bootstrap/exercises/correlationbootstrap.m
@@ -0,0 +1,37 @@
+%% (a) bootstrap:
+nperm = 1000;
+rb = zeros(nperm,1);
+for i=1:nperm
+    % indices for resampling the data:
+    inx = randi(length(x), length(x), 1);
+    % resampled data pairs:
+    xb=x(inx);
+    yb=y(inx);
+    rb(i) = corr(xb, yb);
+end
+
+%% (b) pdf of the correlation coefficients:
+[hb,bb] = hist(rb, 20 );
+hb = hb/sum(hb)/(bb(2)-bb(1));  % normalization
+
+%% (c) significance:
+rbq = quantile(rb, 0.05);
+fprintf('correlation coefficient at 5%% significance = %.2f\n', rbq );
+if rbq > 0.0
+    fprintf('--> correlation r=%.2f is significant\n', rd);
+else
+    fprintf('--> r=%.2f is not a significant correlation\n', rd);
+end
+
+%% plot:
+hold on;
+bar(b, h, 'facecolor', [0.5 0.5 0.5]);
+bar(bb, hb, 'facecolor', 'b');
+bar(bb(bb<=rbq), hb(bb<=rbq), 'facecolor', 'r');
+plot( [rd rd], [0 4], 'r', 'linewidth', 2 );
+xlim([-0.25 0.75])
+xlabel('Correlation coefficient');
+ylabel('Probability density');
+hold off;
+
+savefigpdf( gcf, 'correlationbootstrap.pdf', 12, 6 );
diff --git a/bootstrap/exercises/correlationbootstrap.pdf b/bootstrap/exercises/correlationbootstrap.pdf
new file mode 100644
index 0000000..68f35d1
Binary files /dev/null and b/bootstrap/exercises/correlationbootstrap.pdf differ
diff --git a/bootstrap/exercises/correlationsignificance.pdf b/bootstrap/exercises/correlationsignificance.pdf
index 9240e4f..1094f94 100644
Binary files a/bootstrap/exercises/correlationsignificance.pdf and b/bootstrap/exercises/correlationsignificance.pdf differ
diff --git a/bootstrap/exercises/exercises01.tex b/bootstrap/exercises/exercises01.tex
index 6366da8..b5065de 100644
--- a/bootstrap/exercises/exercises01.tex
+++ b/bootstrap/exercises/exercises01.tex
@@ -148,32 +148,56 @@ distributed?
 \continue
-\question \qt{Permutation test}
+\question \qt{Permutation test} \label{permutationtest}
 We want to compute the significance of a correlation by means of a permutation test.
 \begin{parts}
-\part Generate 1000 correlated pairs $x$, $y$ of random numbers according to:
+  \part \label{permutationtestdata} Generate 1000 correlated pairs
+  $x$, $y$ of random numbers according to:
 \begin{verbatim}
 n = 1000
 a = 0.2;
 x = randn(n, 1);
 y = randn(n, 1) + a*x;
 \end{verbatim}
-\part Generate a scatter plot of the two variables.
-\part Why is $y$ correlated with $x$?
-\part Compute the correlation coefficient between $x$ and $y$.
-\part What do you need to do in order to destroy the correlations between the $x$-$y$ pairs?
-\part Do exactly this 1000 times and compute each time the correlation coefficient.
-\part Compute the probability density of these correlation coefficients.
-\part Is the correlation of the original data set significant?
-\part What does significance of the correlation mean?
-\part Vary the sample size \code{n} and compute in the same way the
-significance of the correlation.
+  \part Generate a scatter plot of the two variables.
+  \part Why is $y$ correlated with $x$?
+  \part Compute the correlation coefficient between $x$ and $y$.
+  \part What do you need to do in order to destroy the correlations between the $x$-$y$ pairs?
+  \part Do exactly this 1000 times and compute each time the correlation coefficient.
+  \part Compute and plot the probability density of these correlation
+  coefficients.
+  \part Is the correlation of the original data set significant?
+  \part What does significance of the correlation mean?
+  \part Vary the sample size \code{n} and compute in the same way the
+  significance of the correlation.
 \end{parts}
 \begin{solution}
 \lstinputlisting{correlationsignificance.m}
 \includegraphics[width=1\textwidth]{correlationsignificance}
 \end{solution}
+\question \qt{Bootstrap of the correlation coefficient}
+The permutation test generates the distribution of the correlation
+coefficient under the null hypothesis of uncorrelated data, and we
+check whether the correlation coefficient of the data differs
+significantly from this distribution. Alternatively, we can
+bootstrap the data while keeping the pairs intact and determine the
+confidence interval of the correlation coefficient of the data. If
+this interval excludes zero, we can conclude that the data are
+indeed correlated.
+
+We take the same data set that we generated in exercise
+\ref{permutationtest} (\ref{permutationtestdata}).
+\begin{parts}
+  \part Bootstrap the correlation coefficient of the data 1000 times.
+  \part Compute and plot the probability density of these correlation
+  coefficients.
+  \part Is the correlation of the original data set significant?
+\end{parts}
+\begin{solution}
+  \lstinputlisting{correlationbootstrap.m}
+  \includegraphics[width=1\textwidth]{correlationbootstrap}
+\end{solution}
 \end{questions}
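
Note on the bootstrap exercise: the exercise text speaks of a confidence interval, while correlationbootstrap.m checks only the lower 5% quantile against zero. A two-sided percentile interval could be sketched as follows. This is a minimal sketch, not the committed solution. The names `nboot`, `alpha`, and `ci` and the choice of a 95% interval are assumptions made here for illustration; the data generation is copied from the permutation test exercise so that the snippet runs on its own, and `corr` and `quantile` are the same functions the script already uses.

```matlab
% Sketch only: two-sided percentile bootstrap confidence interval for
% the correlation coefficient. nboot, alpha and ci are illustrative
% names that do not appear in the committed script.
n = 1000;
a = 0.2;
x = randn(n, 1);
y = randn(n, 1) + a*x;

nboot = 1000;    % number of bootstrap resamples (assumed)
alpha = 0.05;    % assumed level, giving a 95% interval
rb = zeros(nboot, 1);
for k = 1:nboot
    % draw pair indices with replacement; using the same index
    % vector for x and y keeps the x-y pairs intact:
    inx = randi(n, n, 1);
    rb(k) = corr(x(inx), y(inx));
end

% lower and upper bound of the percentile interval:
ci = quantile(rb, [alpha/2, 1-alpha/2]);
fprintf('r = %.2f, %g%% bootstrap interval = [%.2f, %.2f]\n', ...
        corr(x, y), 100*(1-alpha), ci(1), ci(2));
if ci(1) > 0 || ci(2) < 0
    fprintf('--> interval excludes zero\n');
else
    fprintf('--> interval includes zero\n');
end
```

Unlike the one-sided check in the script, this variant treats positive and negative correlations symmetrically; which of the two is intended for the exercise is left to the author.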