New command \endeterm for English terms that also make an entry into the German index - not working yet

Jan Benda 2019-12-05 09:16:24 +01:00
parent 006fa998cc
commit 4d2bedd78c
3 changed files with 57 additions and 40 deletions


@@ -33,10 +33,11 @@ population. Rather, we draw samples (\enterm{simple random sample}
 then estimate a statistical measure of interest (e.g. the average
 length of the pickles) within this sample and hope that it is a good
 approximation of the unknown and immeasurable true average length of
-the population (\determ{Populationsparameter}). We apply statistical
-methods to find out how precise this approximation is.
+the population (\endeterm{Populationsparameter}{population
+parameter}). We apply statistical methods to find out how precise
+this approximation is.
 
-If we could draw a large number of \enterm{simple random samples} we
+If we could draw a large number of simple random samples we
 could calculate the statistical measure of interest for each sample
 and estimate its probability distribution using a histogram. This
 distribution is called the \enterm{sampling distribution}
@@ -69,17 +70,18 @@ error of the mean which is the standard deviation of the sampling
 distribution of average values around the true mean of the population
 (\subfigref{bootstrapsamplingdistributionfig}{b}).
 
-Alternatively, we can use ``bootstrapping'' to generate new samples
-from one set of measurements (resampling). From these bootstrapped
-samples we compute the desired statistical measure and estimate their
-distribution (\enterm{bootstrap distribution},
-\subfigref{bootstrapsamplingdistributionfig}{c}). Interestingly, this
-distribution is very similar to the sampling distribution regarding
-its width. The only difference is that the bootstrapped values are
-distributed around the measure of the original sample and not the one
-of the statistical population. We can use the bootstrap distribution
-to draw conclusion regarding the precision of our estimation (e.g.
-standard errors and confidence intervals).
+Alternatively, we can use \enterm{bootstrapping}
+(\determ{Bootstrap-Verfahren}) to generate new samples from one set of
+measurements (\endeterm{Resampling}{resampling}). From these
+bootstrapped samples we compute the desired statistical measure and
+estimate their distribution (\endeterm{Bootstrapverteilung}{bootstrap
+distribution}, \subfigref{bootstrapsamplingdistributionfig}{c}).
+Interestingly, this distribution is very similar to the sampling
+distribution regarding its width. The only difference is that the
+bootstrapped values are distributed around the measure of the original
+sample and not the one of the statistical population. We can use the
+bootstrap distribution to draw conclusions regarding the precision of
+our estimation (e.g. standard errors and confidence intervals).
 
 Bootstrapping methods create bootstrapped samples from a SRS by
 resampling. The bootstrapped samples are used to estimate the sampling
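The resampling step described here can be sketched in a few lines. The book's exercises use MATLAB's \code{randi()}; this is an illustrative Python equivalent (names and data are hypothetical) that draws random indices with replacement:

```python
import random
import statistics

def bootstrap_sample(data, rng):
    # draw len(data) random indices with replacement (the role of randi())
    n = len(data)
    return [data[rng.randrange(n)] for _ in range(n)]

rng = random.Random(42)
srs = [4.1, 3.8, 4.5, 4.0, 3.9, 4.2, 4.4, 3.7]   # one simple random sample
boot = bootstrap_sample(srs, rng)
print(statistics.mean(srs), statistics.mean(boot))
```

Each bootstrapped sample has the same size as the original sample, but some measurements appear multiple times and others not at all.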
@@ -93,11 +95,12 @@ data set using the \code{randi()} function.
 
 \section{Bootstrap of the standard error}
 
-Bootstrapping can be nicely illustrated at the example of the standard
-error of the mean. The arithmetic mean is calculated for a simple
-random sample. The standard error of the mean is the standard
-deviation of the expected distribution of mean values around the mean
-of the statistical population.
+Bootstrapping can be nicely illustrated with the example of the
+\enterm{standard error} of the mean (\determ{Standardfehler}). The
+arithmetic mean is calculated for a simple random sample. The standard
+error of the mean is the standard deviation of the expected
+distribution of mean values around the mean of the statistical
+population.
 
 \begin{figure}[tp]
   \includegraphics[width=1\textwidth]{bootstrapsem}
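The procedure can be sketched as follows (a minimal Python illustration with simulated data, assuming normally distributed measurements; the actual exercise is in MATLAB): the standard deviation of the means of many bootstrapped samples approximates the standard error $\sigma/\sqrt{n}$.

```python
import random
import statistics

def bootstrap_sem(data, nresamples=2000, seed=1):
    # standard deviation of the means of many bootstrapped samples
    rng = random.Random(seed)
    n = len(data)
    means = []
    for _ in range(nresamples):
        sample = [data[rng.randrange(n)] for _ in range(n)]
        means.append(statistics.mean(sample))
    return statistics.stdev(means)

rng = random.Random(0)
data = [rng.gauss(0.0, 1.0) for _ in range(100)]       # simulated SRS
sem_boot = bootstrap_sem(data)
sem_formula = statistics.stdev(data) / len(data) ** 0.5  # s / sqrt(n)
print(sem_boot, sem_formula)
```

The two estimates agree closely, which is the point of the figure: the bootstrap distribution of the mean has about the same width as the sampling distribution.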
@@ -135,9 +138,10 @@ distribution is the standard error of the mean.
 
 \section{Permutation tests}
 
-Statistical tests ask for the probability of a measured value
-to originate from a null hypothesis. Is this probability smaller than
-the desired significance level, the null hypothesis may be rejected.
+Statistical tests ask for the probability of a measured value to
+originate from a null hypothesis. If this probability is smaller than
+the desired \endeterm{Signifikanz}{significance level}, the
+\endeterm{Nullhypothese}{null hypothesis} may be rejected.
 
 Traditionally, such probabilities are taken from theoretical
 distributions which are based on assumptions about the data. Thus the
@@ -161,22 +165,25 @@ while we conserve all other statistical properties of the data.
   statistically significant.}
 \end{figure}
 
-A good example for the application of a permutaion test is the
-statistical assessment of correlations. Given are measured pairs of
-data points $(x_i, y_i)$. By calculating the correlation coefficient
-we can quantify how strongly $y$ depends on $x$. The correlation
-coefficient alone, however, does not tell whether the correlation is
-significantly different from a random correlation. The null hypothesis
-for such a situation is that $y$ does not depend on $x$. In
-order to perform a permutation test, we need to destroy the
-correlation by permuting the $(x_i, y_i)$ pairs, i.e. we rearrange the
-$x_i$ and $y_i$ values in a random fashion. Generating many sets of
-random pairs and computing the resulting correlation coefficients
-yields a distribution of correlation coefficients that result
-randomly from uncorrelated data. By comparing the actually measured
-correlation coefficient with this distribution we can directly assess
-the significance of the correlation
-(figure\,\ref{permutecorrelationfig}).
+A good example for the application of a
+\endeterm{Permutationstest}{permutation test} is the statistical
+assessment of \endeterm[correlation]{Korrelation}{correlations}. Given
+are measured pairs of data points $(x_i, y_i)$. By calculating the
+\endeterm[correlation!correlation
+coefficient]{Korrelation!Korrelationskoeffizient}{correlation
+coefficient} we can quantify how strongly $y$ depends on $x$. The
+correlation coefficient alone, however, does not tell whether the
+correlation is significantly different from a random correlation. The
+\endeterm[]{Nullhypothese}{null hypothesis} for such a situation is that
+$y$ does not depend on $x$. In order to perform a permutation test, we
+need to destroy the correlation by permuting the $(x_i, y_i)$ pairs,
+i.e. we rearrange the $x_i$ and $y_i$ values in a random
+fashion. Generating many sets of random pairs and computing the
+resulting correlation coefficients yields a distribution of
+correlation coefficients that result randomly from uncorrelated
+data. By comparing the actually measured correlation coefficient with
+this distribution we can directly assess the significance of the
+correlation (figure\,\ref{permutecorrelationfig}).
 
 \begin{exercise}{correlationsignificance.m}{correlationsignificance.out}
   Estimate the statistical significance of a correlation coefficient.
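The permutation test just described can be sketched in plain Python (the exercise implements it in MATLAB in \file{correlationsignificance.m}; the data here are simulated and all names are illustrative):

```python
import random

def corrcoef(x, y):
    # Pearson correlation: covariance normalized by the standard deviations
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / n
    sx = (sum((a - mx) ** 2 for a in x) / n) ** 0.5
    sy = (sum((b - my) ** 2 for b in y) / n) ** 0.5
    return cov / (sx * sy)

rng = random.Random(3)
x = [rng.gauss(0.0, 1.0) for _ in range(200)]
y = [0.5 * a + rng.gauss(0.0, 1.0) for a in x]   # correlated pairs

r_measured = corrcoef(x, y)

# destroy the correlation by shuffling the y values, many times
nperm = 1000
ys = y[:]
r_null = []
for _ in range(nperm):
    rng.shuffle(ys)
    r_null.append(corrcoef(x, ys))

# significance: fraction of null correlations at least as large as measured
p = sum(r >= r_measured for r in r_null) / nperm
print(r_measured, p)
```

The histogram of `r_null` is the null distribution of correlation coefficients from uncorrelated data; a measured coefficient far out in its tail is statistically significant.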


@@ -212,9 +212,19 @@
 
 %%%%% english, german, code and file terms: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
 \usepackage{ifthen}
+% \enterm[english index entry]{<english term>}
 \newcommand{\enterm}[2][]{\tr{\textit{#2}}{``#2''}\ifthenelse{\equal{#1}{}}{\tr{\protect\sindex[term]{#2}}{\protect\sindex[enterm]{#2}}}{\tr{\protect\sindex[term]{#1}}{\protect\sindex[enterm]{#1}}}}
+% \endeterm[english index entry]{<german index entry>}{<english term>}
+\newcommand{\endeterm}[3][]{\tr{\textit{#3}}{``#3''}\ifthenelse{\equal{#1}{}}{\tr{\protect\sindex[term]{#3}}{\protect\sindex[enterm]{#3}}}{\tr{\protect\sindex[term]{#1}}{\protect\sindex[enterm]{#1}}}\protect\sindex[determ]{#2}}
+% \determ[index entry]{<german term>}
 \newcommand{\determ}[2][]{\tr{``#2''}{\textit{#2}}\ifthenelse{\equal{#1}{}}{\tr{\protect\sindex[determ]{#2}}{\protect\sindex[term]{#2}}}{\tr{\protect\sindex[determ]{#1}}{\protect\sindex[term]{#1}}}}
+% \codeterm[index entry]{<code>}
 \newcommand{\codeterm}[2][]{\textit{#2}\ifthenelse{\equal{#1}{}}{\protect\sindex[term]{#2}}{\protect\sindex[term]{#1}}}
 \newcommand{\file}[1]{\texttt{#1}}
 
 % for escaping special characters into the index:
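To make the intended call signatures of these macros concrete, a hypothetical usage sketch (terms taken from this commit's text changes; the bracketed optional argument overrides the default index entry):

```latex
\enterm{sampling distribution}                        % english term + index
\determ{Standardfehler}                               % german term + index
\endeterm{Populationsparameter}{population parameter} % english term, both indexes
\endeterm[correlation]{Korrelation}{correlations}     % with explicit english entry
```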


@@ -455,7 +455,7 @@ bivariate or multivariate data sets where we have pairs or tuples of
 data values (e.g. size and weight of elephants) we want to analyze
 dependencies between the variables.
 
-The \enterm{correlation coefficient}
+The \enterm[correlation!correlation coefficient]{correlation coefficient}
 \begin{equation}
   \label{correlationcoefficient}
   r_{x,y} = \frac{Cov(x,y)}{\sigma_x \sigma_y} = \frac{\langle
 
@@ -465,7 +465,7 @@ The \enterm{correlation coefficient}
 \end{equation}
 quantifies linear relationships between two variables
 \matlabfun{corr()}. The correlation coefficient is the
-\determ{covariance} normalized by the standard deviations of the
+\enterm{covariance} normalized by the standard deviations of the
 single variables. Perfectly correlated variables result in a
 correlation coefficient of $+1$, anti-correlated or negatively
 correlated data in a correlation coefficient of $-1$ and un-correlated