diff --git a/bootstrap/lecture/bootstrap.tex b/bootstrap/lecture/bootstrap.tex
index f0fae62..79024fa 100644
--- a/bootstrap/lecture/bootstrap.tex
+++ b/bootstrap/lecture/bootstrap.tex
@@ -33,10 +33,11 @@ population. Rather, we draw samples (\enterm{simple random sample}
 then estimate a statistical measure of interest (e.g. the average
 length of the pickles) within this sample and hope that it is a good
 approximation of the unknown and immeasurable true average length of
-the population (\determ{Populationsparameter}). We apply statistical
-methods to find out how precise this approximation is.
+the population (\endeterm{Populationsparameter}{population
+  parameter}). We apply statistical methods to find out how precise
+this approximation is.
 
-If we could draw a large number of \enterm{simple random samples} we
+If we could draw a large number of simple random samples we
 could calculate the statistical measure of interest for each sample
 and estimate its probability distribution using a histogram. This
 distribution is called the \enterm{sampling distribution}
@@ -69,17 +70,18 @@ error of the mean which is the standard deviation of the sampling
 distribution of average values around the true mean of the population
 (\subfigref{bootstrapsamplingdistributionfig}{b}).
 
-Alternatively, we can use ``bootstrapping'' to generate new samples
-from one set of measurements (resampling). From these bootstrapped
-samples we compute the desired statistical measure and estimate their
-distribution (\enterm{bootstrap distribution},
-\subfigref{bootstrapsamplingdistributionfig}{c}). Interestingly, this
-distribution is very similar to the sampling distribution regarding
-its width. The only difference is that the bootstrapped values are
-distributed around the measure of the original sample and not the one
-of the statistical population. We can use the bootstrap distribution
-to draw conclusion regarding the precision of our estimation (e.g.
-standard errors and confidence intervals).
+Alternatively, we can use \enterm{bootstrapping}
+(\determ{Bootstrap-Verfahren}) to generate new samples from one set of
+measurements (\endeterm{Resampling}{resampling}). From these
+bootstrapped samples we compute the desired statistical measure and
+estimate their distribution (\endeterm{Bootstrapverteilung}{bootstrap
+  distribution}, \subfigref{bootstrapsamplingdistributionfig}{c}).
+Interestingly, this distribution is very similar to the sampling
+distribution regarding its width. The only difference is that the
+bootstrapped values are distributed around the measure of the original
+sample and not the one of the statistical population. We can use the
+bootstrap distribution to draw conclusions regarding the precision of
+our estimation (e.g. standard errors and confidence intervals).
 
 Bootstrapping methods create bootstrapped samples from a SRS by
 resampling. The bootstrapped samples are used to estimate the sampling
@@ -93,11 +95,12 @@ data set using the \code{randi()} function.
 
 \section{Bootstrap of the standard error}
 
-Bootstrapping can be nicely illustrated at the example of the standard
-error of the mean. The arithmetic mean is calculated for a simple
-random sample. The standard error of the mean is the standard
-deviation of the expected distribution of mean values around the mean
-of the statistical population.
+Bootstrapping can be nicely illustrated with the example of the
+\enterm{standard error} of the mean (\determ{Standardfehler}). The
+arithmetic mean is calculated for a simple random sample. The standard
+error of the mean is the standard deviation of the expected
+distribution of mean values around the mean of the statistical
+population.
 
 \begin{figure}[tp]
   \includegraphics[width=1\textwidth]{bootstrapsem}
@@ -135,9 +138,10 @@ distribution is the standard error of the mean.
 
 \section{Permutation tests}
 
-Statistical tests ask for the probability of a measured value
-to originate from a null hypothesis. Is this probability smaller than
-the desired significance level, the null hypothesis may be rejected.
+Statistical tests ask for the probability that a measured value
+originates from a null hypothesis. If this probability is smaller than
+the desired \endeterm{Signifikanz}{significance level}, the
+\endeterm{Nullhypothese}{null hypothesis} may be rejected.
 
 Traditionally, such probabilities are taken from theoretical
 distributions which are based on assumptions about the data. Thus the
@@ -161,22 +165,25 @@ while we conserve all other statistical properties of the data.
   statistically significant.}
 \end{figure}
 
-A good example for the application of a permutaion test is the
-statistical assessment of correlations. Given are measured pairs of
-data points $(x_i, y_i)$. By calculating the correlation coefficient
-we can quantify how strongly $y$ depends on $x$. The correlation
-coefficient alone, however, does not tell whether the correlation is
-significantly different from a random correlation. The null hypothesis
-for such a situation is that $y$ does not depend on $x$. In
-order to perform a permutation test, we need to destroy the
-correlation by permuting the $(x_i, y_i)$ pairs, i.e. we rearrange the
-$x_i$ and $y_i$ values in a random fashion. Generating many sets of
-random pairs and computing the resulting correlation coefficients
-yields a distribution of correlation coefficients that result
-randomly from uncorrelated data. By comparing the actually measured
-correlation coefficient with this distribution we can directly assess
-the significance of the correlation
-(figure\,\ref{permutecorrelationfig}).
+A good example of the application of a
+\endeterm{Permutationstest}{permutation test} is the statistical
+assessment of \endeterm[correlation]{Korrelation}{correlations}. We are
+given measured pairs of data points $(x_i, y_i)$. By calculating the
+\endeterm[correlation!correlation
+coefficient]{Korrelation!Korrelationskoeffizient}{correlation
+  coefficient} we can quantify how strongly $y$ depends on $x$. The
+correlation coefficient alone, however, does not tell whether the
+correlation is significantly different from a random correlation. The
+\endeterm{Nullhypothese}{null hypothesis} for such a situation is that
+$y$ does not depend on $x$. In order to perform a permutation test, we
+need to destroy the correlation by permuting the $(x_i, y_i)$ pairs,
+i.e. we rearrange the $x_i$ and $y_i$ values in a random
+fashion. Generating many sets of random pairs and computing the
+resulting correlation coefficients yields a distribution of
+correlation coefficients that result randomly from uncorrelated
+data. By comparing the actually measured correlation coefficient with
+this distribution we can directly assess the significance of the
+correlation (figure\,\ref{permutecorrelationfig}).
 
 \begin{exercise}{correlationsignificance.m}{correlationsignificance.out}
   Estimate the statistical significance of a correlation coefficient.
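Editor's note, not part of the patch: the resampling described in the bootstrap.tex changes above (drawing with replacement via \code{randi()} and taking the width of the bootstrap distribution as the standard error) can be summarized in a few lines of MATLAB. This is only a minimal sketch with made-up data and variable names; it assumes nothing beyond the built-in randi(), mean() and std() functions that the lecture already refers to.

% Illustrative sketch: bootstrapping the standard error of the mean
% from a single simple random sample x.
x = randn(100, 1);              % hypothetical SRS of n = 100 measurements
n = length(x);
nresamples = 1000;              % number of bootstrapped samples
bmeans = zeros(nresamples, 1);
for k = 1:nresamples
    xb = x(randi(n, n, 1));     % resample with replacement using randi()
    bmeans(k) = mean(xb);       % statistical measure of interest: the mean
end
sem_boot = std(bmeans);         % width of the bootstrap distribution
sem_formula = std(x)/sqrt(n);   % classical standard error, for comparison

The standard deviation of the bootstrapped means plays the role of the standard error because the bootstrap distribution approximates the sampling distribution of the mean, just centered on the sample mean instead of the population mean.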
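Similarly, the permutation test for the significance of a correlation described in the last hunk could look roughly like the following. This is a hedged sketch only, not the content of correlationsignificance.m (which is not shown in the diff); the data are made up, and corr() and randperm() are assumed to be available as in the lecture's MATLAB environment.

% Illustrative sketch: permutation test for the significance of a
% correlation coefficient.
n = 200;
x = randn(n, 1);
y = 0.2*x + randn(n, 1);        % made-up, weakly correlated data pairs
r = corr(x, y);                 % measured correlation coefficient
nperm = 1000;
rperm = zeros(nperm, 1);
for k = 1:nperm
    yp = y(randperm(n));        % permuting y destroys the correlation
    rperm(k) = corr(x, yp);     % correlation expected for uncorrelated data
end
% fraction of permuted correlations at least as extreme as the measured one
p = sum(abs(rperm) >= abs(r))/nperm;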
diff --git a/header.tex b/header.tex
index b83bb39..0b4d53c 100644
--- a/header.tex
+++ b/header.tex
@@ -212,9 +212,19 @@ %%%%% english, german, code and file terms: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
 \usepackage{ifthen}
+
+% \enterm[english index entry]{english term}
 \newcommand{\enterm}[2][]{\tr{\textit{#2}}{``#2''}\ifthenelse{\equal{#1}{}}{\tr{\protect\sindex[term]{#2}}{\protect\sindex[enterm]{#2}}}{\tr{\protect\sindex[term]{#1}}{\protect\sindex[enterm]{#1}}}}
+
+% \endeterm[english index entry]{german index entry}{english term}
+\newcommand{\endeterm}[3][]{\tr{\textit{#3}}{``#3''}\ifthenelse{\equal{#1}{}}{\tr{\protect\sindex[term]{#3}}{\protect\sindex[enterm]{#3}}}{\tr{\protect\sindex[term]{#1}}{\protect\sindex[enterm]{#1}}}\protect\sindex[determ]{#2}}
+
+% \determ[index entry]{german term}
 \newcommand{\determ}[2][]{\tr{``#2''}{\textit{#2}}\ifthenelse{\equal{#1}{}}{\tr{\protect\sindex[determ]{#2}}{\protect\sindex[term]{#2}}}{\tr{\protect\sindex[determ]{#1}}{\protect\sindex[term]{#1}}}}
+
+% \codeterm[index entry]{term}
 \newcommand{\codeterm}[2][]{\textit{#2}\ifthenelse{\equal{#1}{}}{\protect\sindex[term]{#2}}{\protect\sindex[term]{#1}}}
+
 \newcommand{\file}[1]{\texttt{#1}}
 
 % for escaping special characters into the index:
diff --git a/statistics/lecture/statistics.tex b/statistics/lecture/statistics.tex
index 3f59d46..6a6c0b4 100644
--- a/statistics/lecture/statistics.tex
+++ b/statistics/lecture/statistics.tex
@@ -455,7 +455,7 @@ bivariate or multivariate data sets where we have pairs or tuples of
 data values (e.g. size and weight of elephants) we want to analyze
 dependencies between the variables.
 
-The \enterm{correlation coefficient}
+The \enterm[correlation!correlation coefficient]{correlation coefficient}
 \begin{equation}
   \label{correlationcoefficient}
   r_{x,y} = \frac{Cov(x,y)}{\sigma_x \sigma_y} = \frac{\langle
   (x_i - \langle x \rangle)(y_i - \langle y \rangle) \rangle}{\sqrt{\langle
   (x_i - \langle x \rangle)^2 \rangle} \sqrt{\langle (y_i - \langle y \rangle)^2 \rangle}}
 \end{equation}
 quantifies linear relationships between two variables
 \matlabfun{corr()}. The correlation coefficient is the
-\determ{covariance} normalized by the standard deviations of the
+\enterm{covariance} normalized by the standard deviations of the
 single variables. Perfectly correlated variables result in a
 correlation coefficient of $+1$, anti-correlated or negatively
 correlated data in a correlation coefficient of $-1$ and un-correlated
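The statistics.tex hunk ends with the definition of the correlation coefficient in equation \eqref{correlationcoefficient} as the covariance normalized by the two standard deviations. A small MATLAB check of this relation, again only an illustrative sketch with made-up data and assuming the cov(), std() and corr() functions, could be:

% Illustrative sketch: the correlation coefficient equals the covariance
% normalized by the standard deviations of the two variables.
x = randn(500, 1);
y = 0.5*x + randn(500, 1);          % made-up bivariate data
cv = cov(x, y);                     % 2x2 covariance matrix of x and y
r_manual = cv(1, 2)/(std(x)*std(y));
r_corr = corr(x, y);                % should agree with r_manual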