
\documentclass[addpoints,10pt]{exam}
\usepackage{url}
\usepackage{color}
\usepackage{hyperref}

\pagestyle{headandfoot}
\runningheadrule
\firstpageheadrule
\firstpageheader{Scientific Computing}{Project Assignment}{11/05/2014
  -- 11/06/2014}
%\runningheader{Homework 01}{Page \thepage\ of \numpages}{23. October 2014}
\firstpagefooter{}{}{}
\runningfooter{}{}{}
\pointsinmargin
\bracketedpoints

%\printanswers
%\shadedsolutions


\begin{document}
%%%%%%%%%%%%%%%%%%%%% Submission instructions %%%%%%%%%%%%%%%%%%%%%%%%%
\sffamily
% \begin{flushright}
% \gradetable[h][questions]
% \end{flushright}

\begin{center}
  \input{../disclaimer.tex}
\end{center}

%%%%%%%%%%%%%% Questions %%%%%%%%%%%%%%%%%%%%%%%%%

\begin{questions}
  \question The p-value corresponds to the probability
  $$P(\mbox{result seems significant} \,|\, H_0 \mbox{ is true}).$$
  This means that if your significance threshold is $\alpha=0.05$ and
  you accept all tests with $p \le \alpha$ as significant, then in
  $5\%$ of all cases in which $H_0$ is true (there is no effect) your
  test will appear significant (false positive).
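
  As a rough illustration with hypothetical numbers (not taken from
  the attached data): if $H_0$ is true for $m_0 = 1000$ of your
  tests, the expected number of false positives at $\alpha = 0.05$ is
  $$\alpha \cdot m_0 = 0.05 \cdot 1000 = 50.$$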

  The problem is that you do not know for how many of the tests $H_0$
  is actually true. What you would really like to know is: Of all
  those tests that came out significant ($p\le\alpha$), how many are
  false positives? This probability corresponds to
  $$P(H_0 \mbox{ is true} \,|\, \mbox{result seems significant})$$ and is
  called the {\em false discovery rate}. In general you cannot compute
  it. However, if you have many p-values, you can estimate it. The
  analogue of the p-value for the false discovery rate is called the
  ``q-value''.
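
  To see how the two conditional probabilities are related, Bayes'
  rule can be applied. This is only a sketch: the symbol
  $\pi_0 = P(H_0 \mbox{ is true})$ is introduced here for illustration,
  and the second equality assumes that p-values are uniformly
  distributed when $H_0$ is true.
  $$P(H_0 \mbox{ is true} \,|\, p \le \alpha)
  = \frac{P(p \le \alpha \,|\, H_0 \mbox{ is true}) \, \pi_0}{P(p \le \alpha)}
  = \frac{\alpha \, \pi_0}{P(p \le \alpha)}.$$
  With many p-values, the denominator can be approximated by the
  fraction of all p-values that are at most $\alpha$.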

  In the paper

  {\em Storey, J. D., \& Tibshirani, R. (2003). Statistical
    significance for genomewide studies. Proceedings of the National
    Academy of Sciences of the United States of America, 100,
    9440--9445. doi:10.1073/pnas.1530509100}

  you can find an algorithm for computing q-values from p-values.
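
  As a hedged summary of the central quantities in that paper (the
  notation $m$, $\lambda$, $t$ and the hats are chosen here and may
  differ from the paper; consult it for the exact procedure, in
  particular for how $\lambda$ is handled): for p-values
  $p_1, \ldots, p_m$, the proportion of true null hypotheses is
  estimated from the flat right-hand part of the p-value distribution,
  $$\hat{\pi}_0(\lambda) = \frac{\#\{i : p_i > \lambda\}}{m \, (1 - \lambda)},$$
  the false discovery rate at a threshold $t$ is estimated as
  $$\widehat{\mbox{FDR}}(t) = \frac{\hat{\pi}_0 \, m \, t}{\#\{i : p_i \le t\}},$$
  and the q-value of $p_i$ is the smallest estimated false discovery
  rate over all thresholds that still call $p_i$ significant,
  $$\hat{q}(p_i) = \min_{t \ge p_i} \widehat{\mbox{FDR}}(t).$$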

  The attached data file {\tt p\_values.dat} contains p-values from
  tests of whether each of several neurons responds to a certain
  stimulus condition.

  \begin{parts}
    \part Plot a histogram of the p-values.
    \part Read and understand the paper by Storey and
    Tibshirani. Visualize their method in your histogram.
    \part Implement their method and convert each p-value to a
    q-value.
    \part Using your implementation, estimate the proportion of
    neurons that show a true effect (i.e.\ $P(H_A)$).
  \end{parts}

\end{questions}


\end{document}