\documentclass[addpoints,10pt]{exam}
\usepackage{url}
\usepackage{color}
\usepackage{hyperref}

\pagestyle{headandfoot}
\runningheadrule
\firstpageheadrule

\firstpageheader{Essential Statistics}{Homework 01 due 10/29/2014 23:59}{23. October 2014}
\runningheader{Homework 01}{Page \thepage\ of \numpages}{23. October 2014}
\firstpagefooter{}{}{}
\runningfooter{}{}{}
\pointsinmargin
\bracketedpoints

%\printanswers
\shadedsolutions


\begin{document}
%%%%%%%%%%%%%%%%%%%%% Submission instructions %%%%%%%%%%%%%%%%%%%%%%%%%
\sffamily
\begin{flushright}
\gradetable[h][questions]
\end{flushright}

\begin{center}
\fbox{\parbox{0.985\linewidth}{ \small Please answer all questions
    in an electronic file (.txt and .doc are acceptable, but we prefer
    .pdf) and submit it in ILIAS.

    Use complete and correct sentences unless otherwise
    noted. Please be succinct. Use your own words. Give your
    reasoning concisely, not just the result. We expect you to do
    the exercises on your own, but you are encouraged to discuss
    them with your fellow students. If you blindly copy your
    results from others, you miss out on a chance to learn something
    new. Use all resources available to you, but always make sure
    that you truly understand why you give the answer you give.
}}
\end{center}

%%%%%%%%%%%%%% Questions %%%%%%%%%%%%%%%%%%%%%%%%%

\begin{questions}
\question {\bf Reading assignment: Do not submit answers to this
  question!}

Read Chapter 1 through Section 2.4 (inclusive) of Samuels/Wittmer/Schaffner.

Pay special attention to the following questions.
\begin{enumerate}
\item What types of scientific evidence do the authors list? How
  strong is each type of evidence?
\item What are the different types of data encountered in
  statistical analysis?
\item What is a population? What is a random sample? What are
  sampling errors? What are nonsampling errors?
\item What is a descriptive statistic?
\item What property do robust statistics have?
\end{enumerate}

\question Install Python and a suitable editor on your computer.
\begin{parts}
  \part For installing Python, I recommend the Anaconda
  distribution: \url{http://continuum.io/downloads}. It does not
  matter whether you install Python 2.7 or 3.4. I will use Python
  3.4 syntax.

  \part As an editor, I recommend either Sublime Text (for people new
  to programming) or PyCharm (for people with programming
  experience). I do not recommend using a text editor that comes
  with your operating system (like WordPad). Word processors
  like Microsoft Word or LibreOffice {\bf won't work at
    all}. Programming needs a little more than just typing text, and
  you will make your life unnecessarily hard by using an editor not
  suited for it.
  \part Find out how to run a Python program on your operating
  system and how to install new Python packages. Install the
  packages {\tt pandas} and {\tt seaborn}.
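  \begin{solution}
    With the Anaconda distribution recommended above, one way to
    install the two packages is to run {\tt conda install pandas
      seaborn} on the command line; with a plain Python installation,
    {\tt pip install pandas seaborn} should work as well. The short
    Python 3 script below is a minimal sketch for checking whether the
    installation succeeded. The helper name {\tt is\_installed} is
    made up for illustration and is not part of any package.
\begin{verbatim}
import importlib.util


def is_installed(package_name):
    """Return True if Python can find an importable module with this name."""
    return importlib.util.find_spec(package_name) is not None


# Report the status of the two packages this homework asks for.
for name in ("pandas", "seaborn"):
    status = "found" if is_installed(name) else "missing"
    print(name, status)
\end{verbatim}
    If a package is reported as missing, the install command was most
    likely run for a different Python installation than the one that
    executes the script.
  \end{solution}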
\end{parts}

\question To publish scientific results, you will usually need
to use statistical methods. Some journals provide a brief
description of how they expect you to apply statistical methods. One
example can be found in the author guidelines of the journal
Nature:

\begin{center}
  \url{http://www.nature.com/neuro/pdf/sm_checklist.pdf}
\end{center}

Please read the `checklist' and answer the following questions:

\begin{parts}
  \part[2] Why is it important that statistical methods are applied
  correctly?

  \begin{solution}
    When statistical methods are not applied correctly, their
    results may not actually support your hypothesis and can lead
    to false conclusions.
  \end{solution}

  \part[2] Name two common descriptive statistics and state what you
  have to specify for them in Nature.

  \begin{solution}
    \begin{itemize}
    \item A clearly defined number $n$ of data points should be
      specified. If the sample is small, plot the individual points
      instead of using descriptive statistics. Error bars should be
      clearly defined.
    \item Measures of center: mean, median
    \item Measures of variability: standard deviation, range
    \end{itemize}
  \end{solution}

  \part[3] Name one statistical test that you have heard of or
  used. If you were to apply it, what would you have to
  specify to follow the Nature guidelines?

  \begin{solution}
    {\bf Student's t-test} for testing whether the means of two
    populations are the same:
    \begin{itemize}
    \item a clearly defined $n$ for the test
    \item a justification for the sample size used
    \item a clear description of the statistical method: since the
      t-test is very common, stating that a two-independent-sample
      t-test was used should be sufficient
    \item a justification that the data meet the test's assumptions:
      the two populations should be normally distributed with the
      same variance, and the data should be sampled independently
      from the two populations being compared
    \item whether the variance differs between the groups
    \item whether the test was one-sided or two-sided
    \end{itemize}
  \end{solution}

  \part[3] Why are you asked to justify each instance in which
  you exclude some of the data that you collected? What could be a
  valid reason to exclude a data point?

  \begin{solution}
    Excluding data points might make a sample no longer
    representative of its population and can therefore alter the
    outcome and conclusions of a study. Data points might be excluded
    if there is a good reason to believe that they are not part of
    the population under investigation.
  \end{solution}

\end{parts}

\question {\bf Robust statistics} In 1888, P. Topinard published
data on the brain weights of hundreds of French men and women. Here
are ten brain weights (in grams) of female brains from the dataset:
\begin{center} [1125, 1027, 1112, 983, 1090, 1247, 1045, 983, 972, 1045]
\end{center}

Open a new file ``brain\_weight.py'' with your text editor and write
the following Python program (please hand in the plots and the program).
\begin{parts}
  \part[2] Create a list that contains the above brain weights.
  \part[2] Create a new list that contains the following ten means:
  each mean is computed from the original list after removing one
  element (hint: use slicing and adding lists for that; we did this
  in the lecture already). {\bf Warning:} I {\em do not} expect you
  to use {\tt for}-loops. Only use them if you know them already. If
  you do use them, be prepared to explain your code to me to get
  credit for this task.
  \part[2] Create yet another list that does the same, only for the
  median.
  \part[2] Make a boxplot of the different means and medians (like
  in the lecture). To show the plot, put a {\tt plt.show()} at the
  end of the program. If
  you want to save the plot, put the command {\tt
    plt.gcf().savefig('YOUR\_NAME\_homework01.pdf')} before that. Label
  the y-axis using the function {\tt plt.ylabel('FILL IN YOUR LABEL')}.
  \part[2] What can you observe, and what does that tell you about
  the robustness of each statistic?
\end{parts}
\begin{solution}
\begin{verbatim}
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

sns.set_context("paper", font_scale=1.5, rc={"lines.linewidth": 2.5})

# brain weights in grams
w = [1125, 1027, 1112, 983, 1090, 1247, 1045, 983, 972, 1045]

# w[:i] + w[i+1:] is the original list without element i
brain_means = [
    np.mean(w[1:]), np.mean(w[:1] + w[2:]), np.mean(w[:2] + w[3:]),
    np.mean(w[:3] + w[4:]), np.mean(w[:4] + w[5:]), np.mean(w[:5] + w[6:]),
    np.mean(w[:6] + w[7:]), np.mean(w[:7] + w[8:]), np.mean(w[:8] + w[9:]),
    np.mean(w[:9]),
]
brain_medians = [
    np.median(w[1:]), np.median(w[:1] + w[2:]), np.median(w[:2] + w[3:]),
    np.median(w[:3] + w[4:]), np.median(w[:4] + w[5:]), np.median(w[:5] + w[6:]),
    np.median(w[:6] + w[7:]), np.median(w[:7] + w[8:]), np.median(w[:8] + w[9:]),
    np.median(w[:9]),
]

# Recent seaborn versions no longer accept the `names` argument of
# sns.boxplot; the column names of a DataFrame label the boxes instead.
sns.boxplot(data=pd.DataFrame({'means': brain_means, 'medians': brain_medians}))
plt.ylabel('brain weight [g]')
plt.gcf().savefig('fabian_sinz_homework01.pdf')
plt.show()
\end{verbatim}
\end{solution}
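\begin{solution}
  For the last part, the observation can be checked numerically with
  the sketch below. It is a minimal sketch rather than the expected
  answer: it uses a list comprehension (a loop, which the task does not
  require) and the standard library module {\tt statistics} instead of
  {\tt numpy}, and the helper name {\tt leave\_one\_out} is made up for
  illustration. For these ten weights, every leave-one-out median
  equals 1045\,g, while the leave-one-out means vary between roughly
  1042\,g and 1073\,g. Removing a single data point therefore changes
  the mean but not the median, so the median is the more robust of the
  two statistics.
\begin{verbatim}
from statistics import mean, median

# brain weights in grams
w = [1125, 1027, 1112, 983, 1090, 1247, 1045, 983, 972, 1045]


def leave_one_out(values, statistic):
    """Apply `statistic` to `values` with each element removed in turn."""
    return [statistic(values[:i] + values[i + 1:]) for i in range(len(values))]


brain_means = leave_one_out(w, mean)
brain_medians = leave_one_out(w, median)

# The spread (max - min) shows how much one data point can move a statistic.
print("spread of means:  ", max(brain_means) - min(brain_means))
print("spread of medians:", max(brain_medians) - min(brain_medians))
\end{verbatim}
\end{solution}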

\end{questions}

\end{document}