\documentclass[addpoints,10pt]{exam}
\usepackage{url}
\usepackage{color}
\usepackage{hyperref}
\pagestyle{headandfoot}
\runningheadrule
\firstpageheadrule
\firstpageheader{Scientific Computing}{afternoon assignment day 02}{10/22/2014}
%\runningheader{Homework 01}{Page \thepage\ of \numpages}{23. October 2014}
\firstpagefooter{}{}{}
\runningfooter{}{}{}
\pointsinmargin
\bracketedpoints
%\printanswers
\shadedsolutions
\begin{document}
%%%%%%%%%%%%%%%%%%%%% Submission instructions %%%%%%%%%%%%%%%%%%%%%%%%%
\sffamily
%%%%%%%%%%%%%% Questions %%%%%%%%%%%%%%%%%%%%%%%%%
\begin{questions}
\question When the p-value is small, we reject the null
hypothesis. For example, if you want to test whether two means
differ, the null hypothesis is ``the means are equal''. If, e.g.,
$p\le 0.05$, we take this as sufficient evidence against the null
hypothesis and conclude that the means are not equal (which is
what you want to show).
In this exercise we will look at what kind of p-values to expect
when the null hypothesis is true. In our example, this is the case
if the true means of the two datasets are actually equal.
\begin{parts}
\part Think about how you expect the p-values to behave in that
situation.
\part Simulate the situation in which the means are equal by
repeating the following at least $1000$ times:
\begin{enumerate}
\item Generate two arrays {\tt x} and {\tt y}, each containing $10$
normally (Gaussian) distributed random numbers, using {\tt randn}.
By construction, the true means behind the random numbers are zero.
\item Perform a two-sample t-test ({\tt ttest2}) on {\tt x} and
{\tt y}. Store the p-value.
\end{enumerate}
\part Plot a histogram of the $1000$ p-values. What do you think
is the distribution of the p-values (i.e. if you repeated this
experiment many more times, what would the histogram look like)?
\part Given what you found, think about whether the following
strategy is statistically valid: You collect $10$ data points from
each group and perform a test. If the test is not significant, you
collect $10$ more and repeat the test. As soon as the test
indicates a significant difference, you stop; otherwise you keep
repeating the procedure until the test is significant.
\end{parts}
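\begin{solution}
A sketch of the reasoning behind the last two parts, not necessarily
the intended model answer. The symbols $H_0$, $T$, $F_0$, $P$,
$\alpha$ and $k$ are notation introduced here for illustration only.
Let $H_0$ denote the null hypothesis and $T$ the test statistic, and
assume that the distribution of $T$ under $H_0$ is continuous with
cumulative distribution function $F_0$. For the one-sided p-value
$P = 1 - F_0(T)$ (the two-sided case is analogous with $|T|$ in
place of $T$) and any $0 \le \alpha \le 1$,
\[
  \Pr(P \le \alpha \mid H_0)
  = \Pr\bigl(F_0(T) \ge 1 - \alpha \mid H_0\bigr)
  = \alpha ,
\]
because $F_0(T)$ is uniformly distributed on $[0,1]$ under $H_0$.
The p-values are therefore uniformly distributed as well: the
histogram should be roughly flat, with about $5\%$ of the simulated
p-values falling below $0.05$.

For the stopping strategy, a rough guide is the idealized case in
which the $k$ tests are independent:
\[
  \Pr(\mbox{at least one } p \le \alpha \mid H_0)
  = 1 - (1 - \alpha)^k ,
\]
which for $\alpha = 0.05$ and $k = 5$ is already about $0.23$. If
the repeated tests share data they are not independent, so the exact
value differs, but every additional test adds another chance of a
false positive and the overall error rate exceeds $\alpha$.
\end{solution}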
\end{questions}
\end{document}