\documentclass[12pt,a4paper,pdftex]{exam}

\newcommand{\exercisetopic}{Statistics}
\newcommand{\exercisenum}{7}
\newcommand{\exercisedate}{December 8th, 2020}

\input{../../exercisesheader}

\firstpagefooter{Prof. Dr. Jan Benda}{}{jan.benda@uni-tuebingen.de}

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\begin{document}

\input{../../exercisestitle}

\ifprintanswers%
\else

\begin{itemize}
\item Convince yourself that each single line of your code really does
  what it should do! Test it with small examples directly on the
  command line.
\item Always try to break down your solution into small and meaningful
  functions. As soon as something similar is computed more than once, you
  should definitely put it into a function.
\item Initially test computationally expensive \code{for} loops, vectors,
  matrices, etc. with small numbers of repetitions and/or
  sizes. Once it is working, use large numbers of repetitions and/or sizes
  to get good statistics, i.e. smooth curves.
\item Use the help functions of \code{matlab} (\code{help command} or
  \code{doc command}) and the internet to figure out how specific
  \code{matlab} functions are used and what features they offer. In
  addition, the internet offers a lot of material and suggestions for
  any question you have regarding your code!
\item Work in groups! Nevertheless, everybody should write down their own solution.
\item Please upload your solution to the exercises to ILIAS as a zip-archive with the name
  ``statistics\_\{last name\}\_\{first name\}.zip''.
\end{itemize}

\fi

\begin{questions}

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\question \textbf{Read chapter 4 of the script on ``code style''!}

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\question \qt{Probabilities of a die}
The computer can roll dice with more than 6 faces!
\begin{parts}
  \part Simulate rolling a die with eight faces 10000 times by
  generating integer random numbers $x_i = 1, 2, \ldots, 8$.

  \part Compute the probability $P(5)$ of getting a five by counting the number of fives
  occurring in the data set.

  Does the result match your expectation?

  Check the probabilities $P(x_i)$ of the other numbers.

  Is the die a fair die?

  \part Store the computed probabilities $P(x_i)$ in a vector and use
  the \code{bar()} function for plotting the probabilities as a
  function of the corresponding face values.

  \part Compute a normalized histogram of the face values by means of
  the \code{hist()} and \code{bar()} functions.

  \part \extra Simulate a loaded die with the six showing up
  three times as often as each of the other numbers.

  Compute a normalized histogram of the face values from rolling the loaded die 10000 times.
\end{parts}
\begin{solution}
  \lstinputlisting{rollthedie.m}
  \lstinputlisting{diehist.m}
  \lstinputlisting{die1.m}
  \includegraphics[width=1\textwidth]{die1}
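
  For orientation, independent of the scripts listed above: a fair
  eight-sided die has $P(x_i) = 1/8 = 0.125$ for every face. With
  $n=10000$ rolls the counted estimates typically scatter around this
  value by about $\sqrt{P(1-P)/n} \approx 0.003$. Assuming that the
  loaded die also has eight faces (the exercise does not state this),
  the six gets the weight $3$ and each of the other seven faces the
  weight $1$, so that
  \[ P(6) = \frac{3}{7+3} = 0.3 \quad \text{and} \quad P(x_i) = \frac{1}{7+3} = 0.1 \;\; \text{for} \;\; x_i \ne 6 \; . \]
  The following is a minimal sketch of one possible way to simulate
  both dice. The variable names are chosen for illustration only, and
  drawing from ten outcomes with \code{randi()} is just one of several
  ways to load the die:
\begin{lstlisting}
n = 10000;                  % number of rolls
x = randi(8, n, 1);         % fair die: integers 1, 2, ..., 8
P = zeros(1, 8);
for k = 1:8
    P(k) = sum(x == k) / n; % relative frequency of face k
end

% loaded die (assumed to have eight faces): draw from 1..10
% and map the two extra outcomes 9 and 10 onto the six
y = randi(10, n, 1);
y(y > 8) = 6;
Pl = zeros(1, 8);
for k = 1:8
    Pl(k) = sum(y == k) / n;
end

subplot(1, 2, 1); bar(1:8, P);  title('fair die');
subplot(1, 2, 2); bar(1:8, Pl); title('loaded die');
\end{lstlisting}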
\end{solution}
|
|
|
|
|
|
\continue
|
|
|
|
|
|
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
|
|
\question \qt{Histogram of the normal distribution}
\vspace{-3ex}
\begin{parts}
  \part Generate a data set $X = (x_1, x_2, \ldots, x_n)$ of
  $n=10000$ normally distributed random numbers with mean $\mu=0$ and
  standard deviation $\sigma=1$ (\code{randn()} function).

  \part Compute from this data set the probability $P(0\le x<0.5)$.

  \part What happens to the probability of drawing a number from a
  specific range (e.g. $P(0\le x<a)$), if this range gets smaller and
  smaller, i.e. $a \to 0$?

  Write a script that illustrates this by plotting $P(0\le x<a)$
  as a function of $a$ (use $0 \le a \le 4$).

  \part \label{manualpdf} Compute and plot the probability density of
  the data set (the normalized histogram). First, define the positions
  of the bins (width of 0.5) in a vector. Then count, in a \code{for}
  loop over the bins, the number of data values falling into each
  bin. Finally, normalize the resulting histogram and plot it using
  the \code{bar()} function.

  \part \label{gaussianpdf} Draw into the same plot the normal
  distribution
  \[ p_g(x) = \frac{1}{\sqrt{2\pi\sigma^2}}e^{-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^2} \]
  for a comparison.

  \part Plot the probability density as in (\ref{manualpdf}) and
  (\ref{gaussianpdf}), but this time by means of the \code{hist()} and
  \code{bar()} functions.
\end{parts}
\begin{solution}
  \lstinputlisting{normhist.m}
  \includegraphics[width=1\textwidth]{normhist}
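
  As a cross-check for the counted values: for $\mu=0$ and $\sigma=1$
  the expected probability is
  \[ P(0\le x<0.5) = \int_0^{0.5} p_g(x) \, dx \approx 0.1915 \; , \]
  and for small $a$ the probability shrinks in proportion to the width
  of the range,
  \[ P(0\le x<a) \approx p_g(0) \cdot a = \frac{a}{\sqrt{2\pi}} \to 0 \quad \text{for} \quad a \to 0 \; . \]
  This is why the histogram is normalized to a density: with $n_k$
  counts in bin $k$, $N = \sum_k n_k$ counted values, and bin width
  $\Delta x$, the density is $p_k = n_k / (N \Delta x)$, so that
  $\sum_k p_k \Delta x = 1$. A minimal sketch of the manual histogram
  under these definitions (variable names and the range of the bins
  are assumptions made for illustration, not taken from the script above):
\begin{lstlisting}
n = 10000;
x = randn(n, 1);                     % standard normal data
binwidth = 0.5;
centers = -4:binwidth:4;             % positions (centers) of the bins
counts = zeros(size(centers));
for k = 1:length(centers)
    left = centers(k) - 0.5*binwidth;
    right = centers(k) + 0.5*binwidth;
    counts(k) = sum(x >= left & x < right);
end
% normalize such that the integral over all bins equals one:
density = counts / (sum(counts) * binwidth);
bar(centers, density);
hold on;
xx = -4:0.01:4;
pg = exp(-0.5*xx.^2) / sqrt(2*pi);   % normal density, mu=0, sigma=1
plot(xx, pg, 'r', 'LineWidth', 2);
hold off;
xlabel('x');
ylabel('probability density');
\end{lstlisting}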
\end{solution}
|
|
|
|
|
|
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
|
|
\question \qt{Probabilities of a normal distribution}
Which fraction of a normally distributed data set is contained in ranges
that are symmetric around the mean?
\begin{parts}
  \part Generate a data set $X = (x_1, x_2, \ldots, x_n)$ of
  $n=10000$ normally distributed numbers with mean $\mu=0$ and
  standard deviation $\sigma=1$ (\code{randn()} function).
  % \part Estimate and plot the probability density of this data set (normalized histogram).
  % For a comparison plot the normal distribution
  % \[ p_g(x) = \frac{1}{\sqrt{2\pi\sigma^2}}e^{-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^2} \]
  % into the same plot.

  \part \label{onesigma} How many data values are at most one standard deviation
  away from the mean?\\
  That is, how many data values $x_i$ satisfy $-\sigma < x_i < +\sigma$?\\
  Compute the probability $P_{\pm\sigma}$ of getting a value in this range
  by counting how many data points fall into this range.

  \part \label{probintegral} Compute the probability of
  $-\sigma < x_i < +\sigma$ by numerically integrating over the
  probability density of the normal distribution
  \[ P_{\pm\sigma}=\int_{x=\mu-\sigma}^{x=\mu+\sigma} p_g(x) \, dx \; .\]
  First check whether
  \[ \int_{-\infty}^{+\infty} p_g(x) \, dx = 1 \; . \]
  Why is this the case?

  \part What fraction of the data is contained in the intervals $\pm 2\sigma$
  and $\pm 3\sigma$?

  Compare the results with the corresponding integrals over the normal
  distribution.

  \part \label{givenfraction} Find out which intervals that are
  symmetric with respect to the mean contain 50\,\%, 90\,\%, 95\,\% and 99\,\%
  of the data by means of numerical integration of the normal
  distribution.

  % \part \extra Modify the code of questions \pref{onesigma} -- \pref{givenfraction} such
  % that it works for data sets with arbitrary mean and arbitrary standard deviation.\\
  % Check your code with different sets of random numbers.\\
  % How do you generate random numbers of a given mean and standard
  % deviation using the \code{randn()} function?
\end{parts}
\begin{solution}
  \lstinputlisting{normprobs.m}
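
  For comparison with the counted and integrated results, the known
  values for the normal distribution are
  \[ P_{\pm\sigma} \approx 0.6827 \; , \quad P_{\pm 2\sigma} \approx 0.9545 \; , \quad P_{\pm 3\sigma} \approx 0.9973 \; , \]
  and the symmetric intervals containing 50\,\%, 90\,\%, 95\,\% and
  99\,\% of the data are approximately $\pm 0.674\sigma$,
  $\pm 1.645\sigma$, $\pm 1.960\sigma$ and $\pm 2.576\sigma$. With only
  $n=10000$ data values the counted fractions deviate slightly from
  these numbers. A minimal sketch of how counting and numerical
  integration could be combined for $\mu=0$ and $\sigma=1$ (the
  trapezoidal rule, the grid sizes, and all variable names are
  assumptions made for illustration, not taken from the script above):
\begin{lstlisting}
n = 10000;
x = randn(n, 1);
for k = 1:3
    Pdata = sum(x > -k & x < k) / n;    % counted fraction of the data
    xx = linspace(-k, k, 2001);         % integration grid on [-k, k]
    pg = exp(-0.5*xx.^2) / sqrt(2*pi);  % normal density
    Pint = trapz(xx, pg);               % trapezoidal rule
    fprintf('+-%d sigma: data %.4f, integral %.4f\n', k, Pdata, Pint);
end

% intervals containing a given fraction: integrate outwards from
% the mean until the symmetric interval reaches that fraction
dx = 0.001;
xx = 0:dx:5;
pg = exp(-0.5*xx.^2) / sqrt(2*pi);
P = 2 * cumtrapz(xx, pg);               % P(-a < x < a) for a = xx
for frac = [0.5, 0.9, 0.95, 0.99]
    a = xx(find(P >= frac, 1));         % first a reaching the fraction
    fprintf('%2.0f%% of the data within +-%.3f sigma\n', 100*frac, a);
end
\end{lstlisting}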
\end{solution}

\end{questions}

\end{document}