[plotting] add exercise

commit 0d1f201bd8 (parent 2fbbf04e98)
@@ -2,7 +2,7 @@
 \usepackage[german]{babel}
 \usepackage{pslatex}
-\usepackage[mediumspace,mediumqspace,Gray]{SIunits} % \ohm, \micro
+\usepackage[mediumspace,mediumqspace]{SIunits} % \ohm, \micro
 \usepackage{xcolor}
 \usepackage{graphicx}
 \usepackage[breaklinks=true,bookmarks=true,bookmarksopen=true,pdfpagemode=UseNone,pdfstartview=FitH,colorlinks=true,citecolor=blue]{hyperref}
@@ -15,9 +15,9 @@
 \else
 \newcommand{\stitle}{}
 \fi
-\header{{\bfseries\large Exercise 12\stitle}}{{\bfseries\large Maximum likelihood}}{{\bfseries\large January 7th, 2019}}
-\firstpagefooter{Prof. Dr. Jan Benda}{Phone: 29 74573}{Email:
-jan.benda@uni-tuebingen.de}
+\header{{\bfseries\large Exercise 7\stitle}}{{\bfseries\large Plotting}}{{\bfseries\large November 20, 2019}}
+\firstpagefooter{Dr. Jan Grewe}{Phone: 29 74588}{Email:
+jan.grewe@uni-tuebingen.de}
 \runningfooter{}{\thepage}{}
 
 \setlength{\baselineskip}{15pt}
@@ -85,129 +85,49 @@ jan.benda@uni-tuebingen.de}
 
 \input{instructions}
 
 
 \begin{questions}
 
 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
-\question \qt{Maximum likelihood of the standard deviation}
-Let's compute the likelihood and the log-likelihood for the estimation
-of the standard deviation.
-\begin{parts}
-\part Draw $n=50$ random numbers from a normal distribution with
-mean $\mu=3$ and standard deviation $\sigma=2$.
-
-\part Plot the likelihood (computed as the product of probabilities)
-and the log-likelihood (sum of the logarithms of the probabilities)
-as a function of the standard deviation. Compare the position of the
-maxima with the standard deviation that you compute directly from
-the data.
-
-\part Increase $n$ to 1000. What happens to the likelihood, what
-happens to the log-likelihood? Why?
-\end{parts}
-\begin{solution}
-\lstinputlisting{mlestd.m}
-\includegraphics[width=1\textwidth]{mlestd}\\
-
-The more data the smaller the product of the probabilities ($\approx
-p^n$ with $0 \le p < 1$) and the smaller the sum of the logarithms
-of the probabilities ($\approx n\log p$, note that $\log p < 0$).
-
-The product eventually gets smaller than the precision of the
-floating point numbers support. Therefore for $n=1000$ the products
-becomes zero. Using the logarithm avoids this numerical problem.
-\end{solution}
+\question \qt{Graphical display of behavioral data.} In this task
+you will use the MATLAB plotting system to display different
+aspects of experimental data.
+
+In the accompanying zip file (``experiment.zip'') you find the
+results of a (hypothetical) behavioral experiment. Animals have
+been trained in a number of ``sessions'' in a two-alternative-forced
+choice task. The results of each session and each subject are stored
+in separate ``.mat'' files that are named according to the pattern
+\verb+Animal_{id}_Session_{id}.mat+. Each of these mat files contains
+three variables, \verb+performance, perf_std, trials, tasks+, which
+list the subject's average performance, the standard deviation, the
+number of completed trials, and the tasks, respectively.
+
+As a first step, familiarize yourself with the data structure: open a
+single file and understand what is stored.
+
+Solve the assignments below in separate functions. Create a script
+that controls the analysis. Try to use as little ``hardcode'' as
+possible.
+
+All plots must be properly labeled; use a font size of 10\,pt for
+axis labels and a font size of 8\,pt for tick labels and
+legends. Single-panel figures should have a paper size of 7.5 by
+7.5\,cm; multi-panel figures may use a width of 17.5\,cm and a height
+of 15\,cm. Save the figures in the pdf format.
+
+\begin{parts}
+\part{} Find out the ids of the used subjects (you will need this
+information later). Use the \verb+dir+ function to get the names of
+the mat files.
+
+\part{} Use the \verb+dir+ function to find out how many sessions
+have been recorded for each subject. Plot the results in a bar plot,
+adhere to the instructions above, and save the figure.
+
+\part{} Each animal has been tested in the same tasks. Create a
+figure that plots each subject's performance as a function of the
+session number. The figure is supposed to display the three tasks in
+three subplots. Within each subplot, plot the average performance and
+the performance error.
+
+\part{} Not every subject solved the same number of trials. Collect,
+for each subject and each task, the total number of trials and plot
+the data in the form of a stacked bar plot. That is, the figure shows
+a stacked bar for each subject.
+\end{parts}
+
+%\begin{solution}
+% \lstinputlisting{mlestd.m}
+% \includegraphics[width=1\textwidth]{mlestd}\\
+% \end{solution}
-
-%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
-\question \qt{Maximum-likelihood estimator of a line through the origin}
-In the lecture we derived the following equation for an
-maximum-likelihood estimate of the slope $\theta$ of a straight line
-through the origin fitted to $n$ pairs of data values $(x_i|y_i)$ with
-standard deviation $\sigma_i$:
-\[\theta = \frac{\sum_{i=1}^n \frac{x_i y_i}{\sigma_i^2}}{ \sum_{i=1}^n
-\frac{x_i^2}{\sigma_i^2}} \]
-\begin{parts}
-\part \label{mleslopefunc} Write a function that takes two vectors
-$x$ and $y$ containing the data pairs and returns the slope,
-computed according to this equation. For simplicity we assume
-$\sigma_i=\sigma$ for all $1 \le i \le n$. How does this simplify
-the equation for the slope?
-\begin{solution}
-\lstinputlisting{mleslope.m}
-\end{solution}
-
-\part Write a script that generates data pairs that scatter around a
-line through the origin with a given slope. Use the function from
-\pref{mleslopefunc} to compute the slope from the generated data.
-Compare the computed slope with the true slope that has been used to
-generate the data. Plot the data togehther with the line from which
-the data were generated and the maximum-likelihood fit.
-\begin{solution}
-\lstinputlisting{mlepropfit.m}
-\includegraphics[width=1\textwidth]{mlepropfit}
-\end{solution}
-
-\part \label{mleslopecomp} Vary the number of data pairs, the slope,
-as well as the variance of the data points around the true
-line. Under which conditions is the maximum-likelihood estimation of
-the slope closer to the true slope?
-
-\part To answer \pref{mleslopecomp} more precisely, generate for
-each condition let's say 1000 data sets and plot a histogram of the
-estimated slopes. How does the histogram, its mean and standard
-deviation relate to the true slope?
-\end{parts}
-\begin{solution}
-\lstinputlisting{mlepropest.m}
-\includegraphics[width=1\textwidth]{mlepropest}\\
-The estimated slopes are centered around the true slope. The
-standard deviation of the estimated slopes gets smaller for larger
-$n$ and less noise in the data.
-\end{solution}
-
-\continue
-%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
-\question \qt{Maximum-likelihood-estimation of a probability-density function}
-Many probability-density functions have parameters that cannot be
-computed directly from the data, like, for example, the mean of
-normally-distributed data. Such parameter need to be estimated by
-means of the maximum-likelihood from the data.
-
-Let us demonstrate this approach by means of data that are drawn from a
-gamma distribution,
-\begin{parts}
-\part Find out which \code{matlab} function computes the
-probability-density function of the gamma distribution.
-
-\part \label{gammaplot} Use this function to plot the
-probability-density function of the gamma distribution for various
-values of the (positive) ``shape'' parameter. Wet set the ``scale''
-parameter to one.
-
-\part Find out which \code{matlab} function generates random numbers
-that are distributed according to a gamma distribution. Generate
-with this function 50 random numbers using one of the values of the
-``shape'' parameter used in \pref{gammaplot}.
-
-\part Compute and plot a properly normalized histogram of these
-random numbers.
-
-\part Find out which \code{matlab} function fit a distribution to a
-vector of random numbers according to the maximum-likelihood method.
-How do you need to use this function in order to fit a gamma
-distribution to the data?
-
-\part Estimate with this function the parameter of the gamma
-distribution used to generate the data.
-
-\part Finally, plot the fitted gamma distribution on top of the
-normalized histogram of the data.
-\end{parts}
-\begin{solution}
-\lstinputlisting{mlepdffit.m}
-\includegraphics[width=1\textwidth]{mlepdffit}
-\end{solution}
-
 \end{questions}
 
 \end{document}
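Not part of the commit: the first two parts of the new exercise boil down to parsing the session file names. The exercise itself asks for MATLAB's `dir`; the sketch below only illustrates the parsing logic in Python (the language of the repository's generator script), with hypothetical example file names:

```python
import re
from collections import Counter

def sessions_per_subject(filenames):
    """Parse names of the form Animal_{id}_Session_{id}.mat and
    return a dict mapping subject id -> number of session files."""
    counts = Counter()
    for name in filenames:
        match = re.match(r"Animal_(\d+)_Session_(\d+)\.mat$", name)
        if match:
            counts[int(match.group(1))] += 1
    return dict(counts)

# hypothetical listing, as returned by e.g. os.listdir("experiment")
files = ["Animal_1_Session_1.mat", "Animal_1_Session_2.mat",
         "Animal_2_Session_1.mat"]
counts = sessions_per_subject(files)  # {1: 2, 2: 1}
```

The subject ids are the keys of the returned dict; the per-subject session counts feed directly into the bar plot of the second part.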
@@ -5,6 +5,11 @@
 Neuroethology lab \hfill --- \hfill Institute for Neurobiology \hfill --- \hfill \includegraphics[width=0.28\textwidth]{UT_WBMW_Black_RGB} \\
 \end{center}
 
+The exercises are meant for self-monitoring and revision of the
+lecture. You should try to solve them on your own. In contrast to
+previous exercises, the solutions cannot be saved in a single file.
+Combine the files into a single zip archive and submit it via ILIAS.
+Name the archive according to the pattern ``plotting\_\{surname\}.zip''.
 % \ifprintanswers%
 % \else
 
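One way to assemble the required archive (not part of the commit; a sketch using Python's standard library, where the folder containing the solution files is an assumption):

```python
import os
import zipfile

def make_submission(surname, folder="."):
    """Pack all .m and .pdf files from `folder` into an archive
    named according to the required pattern plotting_{surname}.zip."""
    archive = "plotting_%s.zip" % surname
    with zipfile.ZipFile(archive, "w") as z:
        for name in sorted(os.listdir(folder)):
            if name.endswith((".m", ".pdf")):
                z.write(os.path.join(folder, name), arcname=name)
    return archive
```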
@@ -1,8 +1,9 @@
 import numpy as np
-import matplotlib.pyplot as plt
 import scipy.io as scio
+import os
 from IPython import embed
 
 
 def boltzmann(x, y_max, slope, inflection):
     """
     The underlying Boltzmann function.
@@ -34,41 +35,51 @@ class Animal(object):
         self.__responsiveness = responsiveness
 
     def simulate(self, session_count=10, trials=20, task_difficulties=[]):
+        """
+        :param task_difficulties: gives a malus on the learning rate, range 0 - 1
+        """
         tasks = 1 if len(task_difficulties) == 0 else len(task_difficulties)
         if len(task_difficulties) == 0:
             task_difficulties = [0]
         avg_perf = np.zeros((session_count, tasks))
         err_perf = np.zeros((session_count, tasks))
-        trials_performed = np.zeros(session_count)
+        trials_performed = np.zeros((session_count, tasks))
         for i in range(session_count):
             for j in range(tasks):
-                base_performance = boltzmann(i, 1.0, self.__learning_rate/20, self.__delay)
-                penalty = base_performance * task_difficulties[j] * 0.5
-                base_perf = 50 + 50 * (base_performance - penalty)
+                learning_rate = self.__learning_rate * (1 - task_difficulties[j])
+                base_performance = boltzmann(i, 1.0, learning_rate, self.__delay) * 0.5 + 0.5
+                noise = np.random.randn(trials) * (self.__volatility * (1 - task_difficulties[j]))
+                performances = np.random.rand(trials) < (base_performance + noise)
                 trials_completed = np.random.rand(trials) < self.__responsiveness
-                performances = np.random.randn(trials) * self.__volatility * 100 + base_perf
-                avg_perf[i, j] = np.mean(performances[trials_completed])
-                err_perf[i, j] = np.std(performances[trials_completed])
-            trials_performed = np.sum(trials_completed)
+                trials_performed[i, j] = np.sum(trials_completed)
+                avg_perf[i, j] = np.sum(performances[trials_completed]) / trials_performed[i, j]
+                err_perf[i, j] = np.sqrt(trials_performed[i, j] * (avg_perf[i, j]/100) * (1 - avg_perf[i, j]))
         return avg_perf, err_perf, trials_performed
 
 
-if __name__ == "__main__":
-    session_count = 30
-    task_difficulties = [0, 0.3, 1.]
+def save_performance(avg_perf, err_perf, trials_completed, tasks, animal_id):
+    result_folder = "experiment"
+    for i in range(avg_perf.shape[0]):
+        performance = avg_perf[i, :]
+        error = err_perf[i, :]
+        trials = trials_completed[i, :]
+        scio.savemat(os.path.join(result_folder, "Animal_%i_Session_%i.mat" % (animal_id, i+1)),
+                     {"performance": performance, "perf_std": error, "trials": trials, "tasks": tasks})
+
+
+if __name__ == "__main__":
+    session_count = [25, 32, 40, 30]
+    task_difficulties = [0, 0.75, 0.95]
     delays = [5, 10, 12, 20]
-    learning_rates = np.array([5, 10, 2, 20])
+    learning_rates = np.array([0.25, 0.5, 1., 1.5])
-    volatilities = np.random.rand(4) * 0.5
+    volatilities = np.random.rand(4) * 0.25
-    responsivness = np.random.rand(4) * 0.5 + 0.5
+    responsivness = np.random.rand(4) * 0.25 + 0.75
 
     for i in range(len(delays)):
         d = delays[i]
         lr = learning_rates[i]
-        v = volatilities[i],
+        v = volatilities[i]
         r = responsivness[i]
         a = Animal(d, lr, v, r)
-        ap, ep, tp = a.simulate(session_count=session_count, task_difficulties=[0, 0.3, 0.6])
-        plt.plot(ap)
-        embed()
+        ap, ep, tp = a.simulate(session_count=session_count[i], task_difficulties=task_difficulties)
+        save_performance(ap, ep, tp, ['a', 'b', 'c'], i+1)
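The body of `boltzmann` lies outside the diff context, so its exact form is an assumption; the sketch below uses the common logistic form and shows how the revised `simulate` maps it onto a two-alternative task, where performance starts at chance level (0.5) and saturates near 1:

```python
import math

def boltzmann(x, y_max, slope, inflection):
    # assumed logistic form; the actual body is not shown in the diff
    return y_max / (1.0 + math.exp(-slope * (x - inflection)))

# simulate() squeezes the sigmoid into [0.5, 1]: chance level at the
# start of training, saturating performance after learning completes
base = [boltzmann(i, 1.0, 0.5, 10) * 0.5 + 0.5 for i in range(40)]
```

With this mapping, the task-difficulty malus on the learning rate flattens the sigmoid, so harder tasks approach saturation more slowly, which is what the generated session data reflect.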