diff --git a/plotting/exercises/exercises.tex b/plotting/exercises/exercises.tex
index ca715f5..f66c61c 100644
--- a/plotting/exercises/exercises.tex
+++ b/plotting/exercises/exercises.tex
@@ -2,7 +2,7 @@
 \usepackage[german]{babel}
 \usepackage{pslatex}
-\usepackage[mediumspace,mediumqspace,Gray]{SIunits} % \ohm, \micro
+\usepackage[mediumspace,mediumqspace]{SIunits} % \ohm, \micro
 \usepackage{xcolor}
 \usepackage{graphicx}
 \usepackage[breaklinks=true,bookmarks=true,bookmarksopen=true,pdfpagemode=UseNone,pdfstartview=FitH,colorlinks=true,citecolor=blue]{hyperref}
@@ -15,9 +15,9 @@
 \else
 \newcommand{\stitle}{}
 \fi
-\header{{\bfseries\large Exercise 12\stitle}}{{\bfseries\large Maximum likelihood}}{{\bfseries\large January 7th, 2019}}
-\firstpagefooter{Prof. Dr. Jan Benda}{Phone: 29 74573}{Email:
-jan.benda@uni-tuebingen.de}
+\header{{\bfseries\large Exercise 7\stitle}}{{\bfseries\large Plotting}}{{\bfseries\large November 20, 2019}}
+\firstpagefooter{Dr. Jan Grewe}{Phone: 29 74588}{Email:
+jan.grewe@uni-tuebingen.de}
 \runningfooter{}{\thepage}{}
 
 \setlength{\baselineskip}{15pt}
@@ -85,129 +85,49 @@ jan.benda@uni-tuebingen.de}
 \input{instructions}
-
 \begin{questions}
-  %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
-\question \qt{Maximum likelihood of the standard deviation}
-Let's compute the likelihood and the log-likelihood for the estimation
-of the standard deviation.
-\begin{parts}
-  \part Draw $n=50$ random numbers from a normal distribution with
-  mean $\mu=3$ and standard deviation $\sigma=2$.
-
-  \part Plot the likelihood (computed as the product of probabilities)
-  and the log-likelihood (sum of the logarithms of the probabilities)
-  as a function of the standard deviation. Compare the position of the
-  maxima with the standard deviation that you compute directly from
-  the data.
-
-  \part Increase $n$ to 1000. What happens to the likelihood, what
-  happens to the log-likelihood? Why?
-\end{parts}
-\begin{solution}
-  \lstinputlisting{mlestd.m}
-  \includegraphics[width=1\textwidth]{mlestd}\\
-
-  The more data the smaller the product of the probabilities ($\approx
-  p^n$ with $0 \le p < 1$) and the smaller the sum of the logarithms
-  of the probabilities ($\approx n\log p$, note that $\log p < 0$).
-
-  The product eventually gets smaller than the precision of the
-  floating point numbers support. Therefore for $n=1000$ the products
-  becomes zero. Using the logarithm avoids this numerical problem.
-\end{solution}
+  \question \qt{Graphical display of behavioral data.} In this task
+  you will use the MATLAB plotting system to display different
+  aspects of experimental data.
+
+  In the accompanying zip file (``experiment.zip'') you find the
+  results of a (hypothetical) behavioral experiment. Animals have
+  been trained in a number of ``sessions'' in a two-alternative
+  forced-choice task. The results of each session and each subject
+  are stored in separate ``.mat'' files that are named according to
+  the pattern \verb+Animal_{id}_Session_{id}.mat+. Each of these mat
+  files contains the four variables \verb+performance, perf_std, trials, tasks+,
+  which hold the subject's average performance, its standard
+  deviation, the number of completed trials, and the tasks,
+  respectively.
+
+  As a first step, familiarize yourself with the data structure: open
+  a single file and inspect what is stored.
+
+  Solve each of the assignments below in a separate function and
+  write a script that controls the analysis. Hard-code as little as
+  possible.
+
+  All plots must be properly labeled. Use a font size of 10\,pt for
+  axis labels and a font size of 8\,pt for tick labels and
+  legends. Single-panel figures should have a paper size of 7.5 by
+  7.5\,cm; multi-panel figures may use a width of 17.5\,cm and a
+  height of 15\,cm. Save the figures in the pdf format.
+
+  \begin{parts}
+    \part{} Find out the ids of the subjects used (you will need this
+    information later). Use the \verb+dir+ function to get the names
+    of the mat files.
+    \part{} Use the \verb+dir+ function to find out how many sessions
+    have been recorded for each subject. Plot the results in a bar
+    plot, adhering to the instructions above, and save the figure.
+    \part{} Each animal has been tested in the same tasks. Create a
+    figure that plots each subject's performance as a function of the
+    session number. The figure should display the three tasks in
+    three subplots. Within each subplot, plot the average performance
+    and the performance error.
+    \part{} Not every subject solved the same number of trials.
+    Collect, for each subject and each task, the total number of
+    trials and plot the data in the form of a stacked bar plot. That
+    is, the figure shows one stacked bar for each subject.
+  \end{parts}
+%\begin{solution}
+% \lstinputlisting{mlestd.m}
+% \includegraphics[width=1\textwidth]{mlestd}\\
+% \end{solution}
 
-%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
-\question \qt{Maximum-likelihood estimator of a line through the origin}
-In the lecture we derived the following equation for an
-maximum-likelihood estimate of the slope $\theta$ of a straight line
-through the origin fitted to $n$ pairs of data values $(x_i|y_i)$ with
-standard deviation $\sigma_i$:
-\[\theta = \frac{\sum_{i=1}^n \frac{x_i y_i}{\sigma_i^2}}{ \sum_{i=1}^n
-  \frac{x_i^2}{\sigma_i^2}} \]
-\begin{parts}
-  \part \label{mleslopefunc} Write a function that takes two vectors
-  $x$ and $y$ containing the data pairs and returns the slope,
-  computed according to this equation. For simplicity we assume
-  $\sigma_i=\sigma$ for all $1 \le i \le n$. How does this simplify
-  the equation for the slope?
-  \begin{solution}
-    \lstinputlisting{mleslope.m}
-  \end{solution}
-
-  \part Write a script that generates data pairs that scatter around a
-  line through the origin with a given slope. Use the function from
-  \pref{mleslopefunc} to compute the slope from the generated data.
-  Compare the computed slope with the true slope that has been used to
-  generate the data. Plot the data togehther with the line from which
-  the data were generated and the maximum-likelihood fit.
-  \begin{solution}
-    \lstinputlisting{mlepropfit.m}
-    \includegraphics[width=1\textwidth]{mlepropfit}
-  \end{solution}
-
-  \part \label{mleslopecomp} Vary the number of data pairs, the slope,
-  as well as the variance of the data points around the true
-  line. Under which conditions is the maximum-likelihood estimation of
-  the slope closer to the true slope?
-
-  \part To answer \pref{mleslopecomp} more precisely, generate for
-  each condition let's say 1000 data sets and plot a histogram of the
-  estimated slopes. How does the histogram, its mean and standard
-  deviation relate to the true slope?
-\end{parts}
-\begin{solution}
-  \lstinputlisting{mlepropest.m}
-  \includegraphics[width=1\textwidth]{mlepropest}\\
-  The estimated slopes are centered around the true slope. The
-  standard deviation of the estimated slopes gets smaller for larger
-  $n$ and less noise in the data.
-\end{solution}
-
-\continue
-%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
-\question \qt{Maximum-likelihood-estimation of a probability-density function}
-Many probability-density functions have parameters that cannot be
-computed directly from the data, like, for example, the mean of
-normally-distributed data. Such parameter need to be estimated by
-means of the maximum-likelihood from the data.
-
-Let us demonstrate this approach by means of data that are drawn from a
-gamma distribution,
-\begin{parts}
-  \part Find out which \code{matlab} function computes the
-  probability-density function of the gamma distribution.
-
-  \part \label{gammaplot} Use this function to plot the
-  probability-density function of the gamma distribution for various
-  values of the (positive) ``shape'' parameter. Wet set the ``scale''
-  parameter to one.
-
-  \part Find out which \code{matlab} function generates random numbers
-  that are distributed according to a gamma distribution. Generate
-  with this function 50 random numbers using one of the values of the
-  ``shape'' parameter used in \pref{gammaplot}.
-
-  \part Compute and plot a properly normalized histogram of these
-  random numbers.
-
-  \part Find out which \code{matlab} function fit a distribution to a
-  vector of random numbers according to the maximum-likelihood method.
-  How do you need to use this function in order to fit a gamma
-  distribution to the data?
-
-  \part Estimate with this function the parameter of the gamma
-  distribution used to generate the data.
-
-  \part Finally, plot the fitted gamma distribution on top of the
-  normalized histogram of the data.
-\end{parts}
-\begin{solution}
-  \lstinputlisting{mlepdffit.m}
-  \includegraphics[width=1\textwidth]{mlepdffit}
-\end{solution}
 \end{questions}
-\end{document}
\ No newline at end of file
+\end{document}
diff --git a/plotting/exercises/instructions.tex b/plotting/exercises/instructions.tex
index 7fe599d..37f737d 100644
--- a/plotting/exercises/instructions.tex
+++ b/plotting/exercises/instructions.tex
@@ -5,6 +5,11 @@ Neuroethology lab \hfill --- \hfill Institute for Neurobiology \hfill --- \hfill
 \includegraphics[width=0.28\textwidth]{UT_WBMW_Black_RGB} \\
 \end{center}
 
+The exercises are meant for self-monitoring and revision of the
+lecture. You should try to solve them on your own. In contrast to
+previous exercises, the solutions cannot be handed in as a single
+file. Combine your files into a single zip archive and submit it via
+ILIAS. Name the archive according to the pattern ``plotting\_\{surname\}.zip''.
 % \ifprintanswers%
 % \else
diff --git a/plotting/exercises/plotting_exercise.py b/plotting/exercises/plotting_exercise.py
index c076834..2e56d32 100644
--- a/plotting/exercises/plotting_exercise.py
+++ b/plotting/exercises/plotting_exercise.py
@@ -1,8 +1,9 @@
 import numpy as np
-import matplotlib.pyplot as plt
 import scipy.io as scio
+import os
 from IPython import embed
+
 
 def boltzmann(x, y_max, slope, inflection):
     """
     The underlying Boltzmann function.
@@ -34,41 +35,51 @@ class Animal(object):
         self.__responsiveness = responsiveness
 
     def simulate(self, session_count=10, trials=20, task_difficulties=[]):
+        """
+        :param task_difficulties: penalty applied to the learning rate
+            for each task, in the range 0 to 1 (0 means no penalty).
+        """
         tasks = 1 if len(task_difficulties) == 0 else len(task_difficulties)
         if len(task_difficulties) == 0:
             task_difficulties = [0]
         avg_perf = np.zeros((session_count, tasks))
         err_perf = np.zeros((session_count, tasks))
-        trials_performed = np.zeros(session_count)
+        trials_performed = np.zeros((session_count, tasks))
 
         for i in range(session_count):
             for j in range(tasks):
-                base_performance = boltzmann(i, 1.0, self.__learning_rate/20, self.__delay)
-                penalty = base_performance * task_difficulties[j] * 0.5
-                base_perf = 50 + 50 * (base_performance - penalty)
+                learning_rate = self.__learning_rate * (1 - task_difficulties[j])
+                base_performance = boltzmann(i, 1.0, learning_rate, self.__delay) * 0.5 + 0.5
+                noise = np.random.randn(trials) * (self.__volatility * (1 - task_difficulties[j]))
+                performances = np.random.rand(trials) < (base_performance + noise)
                 trials_completed = np.random.rand(trials) < self.__responsiveness
-                performances = np.random.randn(trials) * self.__volatility * 100 + base_perf
-                avg_perf[i, j] = np.mean(performances[trials_completed])
-                err_perf[i, j] = np.std(performances[trials_completed])
-                trials_performed = np.sum(trials_completed)
+                trials_performed[i, j] = np.sum(trials_completed)
+                avg_perf[i, j] = np.sum(performances[trials_completed]) / trials_performed[i, j]
+                # binomial standard deviation; avg_perf is already a fraction in [0, 1]
+                err_perf[i, j] = np.sqrt(trials_performed[i, j] * avg_perf[i, j] * (1 - avg_perf[i, j]))
+
         return avg_perf, err_perf, trials_performed
 
-if __name__ == "__main__":
+def save_performance(avg_perf, err_perf, trials_completed, tasks, animal_id):
+    result_folder = "experiment"
+    os.makedirs(result_folder, exist_ok=True)
+    for i in range(avg_perf.shape[0]):
+        performance = avg_perf[i, :]
+        error = err_perf[i, :]
+        trials = trials_completed[i, :]
+        scio.savemat(os.path.join(result_folder, "Animal_%i_Session_%i.mat" % (animal_id, i+1)),
+                     {"performance": performance, "perf_std": error, "trials": trials, "tasks": tasks})
 
-    session_count = 30
-    task_difficulties = [0, 0.3, 1.]
+
+if __name__ == "__main__":
+    session_count = [25, 32, 40, 30]
+    task_difficulties = [0, 0.75, 0.95]
     delays = [5, 10, 12, 20]
-    learning_rates = np.array([5, 10, 2, 20])
-    volatilities = np.random.rand(4) * 0.5
-    responsivness = np.random.rand(4) * 0.5 + 0.5
-
+    learning_rates = np.array([0.25, 0.5, 1., 1.5])
+    volatilities = np.random.rand(4) * 0.25
+    responsivness = np.random.rand(4) * 0.25 + 0.75
     for i in range(len(delays)):
         d = delays[i]
         lr = learning_rates[i]
-        v = volatilities[i],
+        v = volatilities[i]
         r = responsivness[i]
         a = Animal(d, lr, v, r)
-        ap, ep, tp = a.simulate(session_count=session_count, task_difficulties=[0, 0.3, 0.6])
-        plt.plot(ap)
-        embed()
\ No newline at end of file
+        ap, ep, tp = a.simulate(session_count=session_count[i], task_difficulties=task_difficulties)
+        save_performance(ap, ep, tp, ['a', 'b', 'c'], i+1)
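A note on the error measure in the patch: the `err_perf` assignment in `simulate` stores the binomial standard deviation sqrt(n * p * (1 - p)) of the number of correct responses, where n is the number of completed trials and p the per-trial success probability. A minimal standalone sketch of that formula (the helper name `binomial_std` is illustrative and not part of the patch):

```python
import numpy as np

def binomial_std(n, p):
    """Standard deviation of the number of successes in n independent
    trials with per-trial success probability p (binomial model)."""
    return np.sqrt(n * p * (1.0 - p))

# For 20 completed trials at chance level (p = 0.5), the spread of the
# number of correct trials is sqrt(20 * 0.5 * 0.5) = sqrt(5).
print(binomial_std(20, 0.5))
```

Dividing this by n would instead give the standard error of the performance *fraction*, in case the error bars should live on the same scale as `performance`.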