scientificComputing/projects/project_PCA_natural_images/pca_natural_images.tex

\documentclass[addpoints,10pt]{exam}
\usepackage{url}
\usepackage{color}
\usepackage{hyperref}

\pagestyle{headandfoot}
\runningheadrule
\firstpageheadrule

\firstpageheader{Essential Statistics}{Homework 01 due 10/29/2014 23:59 am}{23. October 2014}
\runningheader{Homework 01}{Page \thepage\ of \numpages}{23. October 2014}
\firstpagefooter{}{}{}
\runningfooter{}{}{}
\pointsinmargin
\bracketedpoints

%\printanswers
\shadedsolutions


\begin{document}
%%%%%%%%%%%%%%%%%%%%% Submission instructions %%%%%%%%%%%%%%%%%%%%%%%%%
\sffamily
\begin{flushright}
\gradetable[h][questions]
\end{flushright}

\begin{center}
  \fbox{\parbox{0.985\linewidth}{ \small Please answer all questions
      in an electronic file (.txt, .doc are ok, but we prefer .pdf) and
      submit in ILIAS.

      Use complete and correct sentences unless otherwise
      noted. Please be succinct. Use your own words. Write down a
      concise reasoning, not just the result.  We expect you to do
      exercises on your own, but you are encouraged to discuss the
      exercises with your fellow students. If you blindly copy your
      results from others, you miss out on a chance to learn something
      new. Use all resources available to you, but always make sure
      that you truly understand why you give the answer you give.
    }}
\end{center}

%%%%%%%%%%%%%% Questions %%%%%%%%%%%%%%%%%%%%%%%%%

\begin{questions}
  \question {\bf Reading assignment: Do not submit answers to this
    question! }

  Read chapter 1. up to 2.4 (including) of Samuels/Wittmer/Schaffner.

  Pay special attention to the following questions.
  \begin{enumerate}
  \item What types of scientific evidence do the authors list? How
    strong are these evidences?
  \item What are the different types of data encountered in
    statistical analysis?
  \item What is a population? What is a random sample? What are
    sampling errors? What are nonsampling errors?
  \item What is a descriptive statistic?
  \item What property do robust statistics have?
  \end{enumerate}

  \question Install python and a suitable editor on your computer.
  \begin{parts}
    \part For installing python, I recommend the anaconda
    distribution: \url{http://continuum.io/downloads}. It does not
    matter whether you install python 2.7 or 3.4. I will use python
    3.4 syntax.

    \part As editor I recommend either sublime text (for people new to
    programming) or pycharm (for people with programming
    experience). I do not recommend to use a text editor that comes
    with your operating system (like word pad). Text processing
    programs like Mircosoft Word or Libre-Office {\bf won't work at
      all}. Programming needs a little more than just typing text and
    you will make your life unnecessarily hard by using an editor not
    suited for it.
    \part Find out how to run a python program on your operating
    system and how to install new python packages. Install the
    packages {\tt pandas} and {\tt seaborn}.
  \end{parts}

  \question To publish scientific results, you will usually need
  to use statistical methods. Some journals provide you with a brief
  description of how they expect you to apply statistical methods. One
  example can be found in the author guidelines of the journal
  Nature

  \begin{center}
    \url{http://www.nature.com/neuro/pdf/sm_checklist.pdf}
  \end{center}

  Please read the ‘checklist’ and answer the following questions:

  \begin{parts}
    \part[2] Why is it important that statistical methods are applied
    correctly?

    \begin{solution}
      When not applied correctly, the results of statistical methods
      might not support your hypothesis and can lead to false
      conclusions.
    \end{solution}

    \part[2] Name two common descriptive statistics and what you have
    to specify for them in nature.

    \begin{solution}
      \begin{itemize}
      \item A clearly defined number $n$ of data points should be
        specified. If the sample is small, plot points instead of
        using descriptive statistics. Errorbars should be clearly
        defined.
      \item measure of center: mean, median
      \item measure of variability: standard deviation, range
      \end{itemize}
    \end{solution}

    \part[3] Name one statistical test that you have heard of or
    used. If you were to apply any of them, what would you have to
    specify to follow the Nature guidelines?

    \begin{solution}
      {\bf Student's T-Test} for testing whether the mean of two
      populations is the same
      \begin{itemize}
      \item a clearly defined $n$ for the test
      \item a justification for the sample size used
      \item a clear description of the statistical method: since the
        t-test is very common, stating that a two independent sample
        t-test was used should be sufficient.
      \item Justify that the data meets the definition: the two
        populations should be normally distributed with the same
        variance; the data was sampled independently from the two
        populations being compared.
      \item Is the variance in the different groups different.
      \item was it one-sided or two-sided
      \end{itemize}
    \end{solution}

    \part[3] Why are you asked to justify each incidence in which
    you exclude some of the data that you collected? What could be a
    valid reason to exclude a data point?

    \begin{solution}
      Excluded data points might make a sample from a population not
      representative anymore, and can therefore alter the outcome and
      conclusions of a study. They might be excluded if there is a
      good reason to believe that they are not part of the population
      under investigation.
    \end{solution}

  \end{parts}

  \question {\bf Robust statistics} In 1888, P. Topinard published
  data on the brain weights of hundreds of French men and women. Here
  are ten brain weights (in Gramm) of female brains from the dataset
  \begin{center} [1125, 1027, 1112, 983, 1090, 1247, 1045, 983, 972, 1045]
  \end{center}

  Open a new file ``brain\_weight.py'' with you text editor to write
  the following python program (please hand in the plots and the program).
  \begin{parts}
    \part[2] Create a list that contains the above brain weights.
    \part[2] Create a new list that contains the following ten means:
    Each mean is computed from the original list after removing one
    element (hint use slicing and adding lists for that; we did this
    in the lecture already). {\bf Warning:} I {\em do not} expect you
    to use {\tt for}-loops. Only use them if you know them already. If
    you do use them, be prepared to explain your code to me to get
    credits for this task.
    \part[2] Create yet another list that does the same, only for the
    median.
    \part[2] Make a boxplot with the different means and medias (like
    in the lecture). To show the plot at the end of the program
    you need to put a {\tt plt.show()} at the end of the program. If
    you want to save the plot, put the command {\tt
      plt.gcf().savefig('YOUR\_NAME\_homework01.pdf')} before that. Label
    the y-axis by using the function {\tt plt.ylabel('FILL IN YOUR LABEL')}
    \part[2] What can you observe and what does that tell you about
    the robustness of the statistic?
  \end{parts}
  \begin{solution}
    \begin{verbatim}
import matplotlib.pyplot as plt
import seaborn as sns
import numpy as np

sns.set_context("paper", font_scale=1.5, rc={"lines.linewidth": 2.5})

w = [1125, 1027, 1112, 983, 1090, 1247, 1045, 983, 972, 1045]

brain_means = [ np.mean(w[1:]), np.mean(w[:1] + w[2:]), np.mean(w[:2] + w[3:]), \
                np.mean(w[:3] + w[4:]), np.mean(w[:4] + w[5:]), np.mean(w[:5] + w[6:]), \
                np.mean(w[:6] + w[7:]), np.mean(w[:7] + w[8:]), np.mean(w[:8] + w[9:]),\
                np.mean(w[:9]) ]
brain_medians = [ np.median(w[1:]), np.median(w[:1] + w[2:]), np.median(w[:2] + w[3:]), \
                np.median(w[:3] + w[4:]), np.median(w[:4] + w[5:]), np.median(w[:5] + w[6:]), \
                np.median(w[:6] + w[7:]), np.median(w[:7] + w[8:]), np.median(w[:8] + w[9:]),\
                np.median(w[:9]) ]

sns.boxplot([brain_means, brain_medians], names=['means', 'medians'])
plt.ylabel('brain weight [g]')
plt.gcf().savefig('fabian_sinz_homework01.pdf')
plt.show()
    \end{verbatim}
  \end{solution}


\end{questions}


\end{document}