q-values project

Fabian Sinz 2014-10-31 14:07:52 +01:00
parent 2171efaee6
commit 61cb4445c7
3 changed files with 1443 additions and 0 deletions

@ -0,0 +1,10 @@
latex:
	pdflatex *.tex > /dev/null
	pdflatex *.tex > /dev/null
clean:
	rm -rf *.log *.aux *.zip *.out auto
	rm -f `basename *.tex .tex`.pdf
zip: latex
	zip `basename *.tex .tex`.zip *.pdf *.dat *.mat

File diff suppressed because it is too large.

@ -0,0 +1,81 @@
\documentclass[addpoints,10pt]{exam}
\usepackage{url}
\usepackage{color}
\usepackage{hyperref}
\pagestyle{headandfoot}
\runningheadrule
\firstpageheadrule
\firstpageheader{Scientific Computing}{Project Assignment}{11/05/2014
-- 11/06/2014}
%\runningheader{Homework 01}{Page \thepage\ of \numpages}{23. October 2014}
\firstpagefooter{}{}{}
\runningfooter{}{}{}
\pointsinmargin
\bracketedpoints
%\printanswers
%\shadedsolutions
\begin{document}
%%%%%%%%%%%%%%%%%%%%% Submission instructions %%%%%%%%%%%%%%%%%%%%%%%%%
\sffamily
% \begin{flushright}
% \gradetable[h][questions]
% \end{flushright}
\begin{center}
\input{../disclaimer.tex}
\end{center}
%%%%%%%%%%%%%% Questions %%%%%%%%%%%%%%%%%%%%%%%%%
\begin{questions}
\question The p-value is the probability, given that $H_0$ is true, of
observing a result at least as extreme as the one you measured.
Thresholding it controls the probability
$$P(\mbox{result seems significant} \mid H_0 \mbox{ is true}).$$
This means that if your significance threshold is $\alpha=0.05$ and
you accept all tests with $p \le \alpha$ as significant, then in $5\%$
of all cases in which $H_0$ is true (there is no effect) your test
will appear significant (false positive).
The problem is that you do not know for how many of the
tests $H_0$ is actually true. What you would really like to know is:
Of all the tests that came out significant ($p\le\alpha$), how
many are false positives? This probability corresponds to
$$P(H_0 \mbox{ is true} \mid \mbox{result seems significant})$$ and is
called the {\em false discovery rate}. In general you cannot compute
it. However, if you have many p-values, then you can actually
estimate it. The analogue of the p-value for the false discovery
rate is called the ``q-value''.
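
To see why the two conditional probabilities can differ substantially,
it helps to relate them via Bayes' rule. Writing $S$ for ``result seems
significant'' (a shorthand introduced only for this remark):
$$P(H_0 \mid S) = \frac{P(S \mid H_0)\,P(H_0)}
  {P(S \mid H_0)\,P(H_0) + P(S \mid H_A)\,P(H_A)}.$$
As an illustration with assumed numbers (not taken from the data), let
$P(S \mid H_0)=0.05$, $P(H_0)=0.9$, $P(H_A)=0.1$ and $P(S \mid H_A)=0.8$.
Then
$$P(H_0 \mid S) = \frac{0.05\cdot 0.9}{0.05\cdot 0.9 + 0.8\cdot 0.1}
  = \frac{0.045}{0.125} = 0.36,$$
so about a third of the significant results would be false positives
even though each single test uses $\alpha=0.05$.
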
In the paper
{\em Storey, J. D., \& Tibshirani, R. (2003). Statistical
significance for genomewide studies. Proceedings of the National
Academy of Sciences of the United States of America, 100,
9440--9445. doi:10.1073/pnas.1530509100}
you can find an algorithm for computing q-values from p-values.
The attached data file {\tt p\_values.dat} contains p-values from
tests of whether individual neurons respond to a certain stimulus
condition.
\begin{parts}
\part Plot a histogram of the p-values.
\part Read and understand the paper by Storey and
Tibshirani. Visualize their method in your histogram.
\part Implement their method and convert each p-value to a
q-value.
\part Using your implementation, estimate the proportion of neurons
that show a true effect (i.e.\ $P(H_A)$).
\end{parts}
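\begin{solution}
A minimal sketch of the main steps, following one reading of Storey and
Tibshirani (2003) rather than a definitive implementation. The symbols
$m$, $\lambda$, $\hat\pi_0$ and $p_{(i)}$ are notation assumed here and
do not appear in the assignment text. Let $m$ be the number of p-values
and $p_{(1)} \le \dots \le p_{(m)}$ the sorted p-values. Under $H_0$ the
p-values are uniformly distributed, so the flat right-hand part of the
histogram estimates the proportion of true null hypotheses:
$$\hat\pi_0(\lambda) = \frac{\#\{p_i > \lambda\}}{m\,(1-\lambda)},
  \qquad 0 \le \lambda < 1.$$
The paper obtains $\hat\pi_0$ by fitting a cubic spline to
$\hat\pi_0(\lambda)$ and evaluating it at $\lambda=1$; using a single
fixed value such as $\lambda=0.5$ is a cruder alternative. The q-values
are then computed from the largest p-value downwards:
$$\hat q(p_{(m)}) = \hat\pi_0\, p_{(m)}, \qquad
  \hat q(p_{(i)}) = \min\left(\frac{\hat\pi_0\, m\, p_{(i)}}{i},\;
  \hat q(p_{(i+1)})\right), \quad i = m-1, \dots, 1.$$
The proportion of neurons with a true effect is estimated as
$\hat P(H_A) = 1 - \hat\pi_0$.
\end{solution}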
\end{questions}
\end{document}