[likelihood] finished exercises

2018-12-17 22:57:39 +01:00 · 2018-12-17 22:57:39 +01:00 · deed303596
commit deed303596
parent 18ca54e94d
3 changed files with 133 additions and 92 deletions
--- a/likelihood/exercises/exercises01.tex
+++ b/likelihood/exercises/exercises01.tex
@@ -15,7 +15,7 @@
 \else
 \newcommand{\stitle}{}
 \fi
-\header{{\bfseries\large Exercise 12\stitle}}{{\bfseries\large Maximum Likelihood}}{{\bfseries\large January 7th, 2019}}
+\header{{\bfseries\large Exercise 12\stitle}}{{\bfseries\large Maximum likelihood}}{{\bfseries\large January 7th, 2019}}
 \firstpagefooter{Prof. Dr. Jan Benda}{Phone: 29 74573}{Email:
 jan.benda@uni-tuebingen.de}
 \runningfooter{}{\thepage}{}
@@ -93,14 +93,14 @@ jan.benda@uni-tuebingen.de}
 Let's compute the likelihood and the log-likelihood for the estimation
 of the standard deviation.
 \begin{parts}
-  \part Draw $n=50$ normaly distributed random numbers with mean
+  \part Draw $n=50$ random numbers from a normal distribution with
-  $\mu=3$ and standard deviation $\sigma=2$.
+  mean $\mu=3$ and standard deviation $\sigma=2$.
  \part Plot the likelihood (computed as the product of probabilities)
  and the log-likelihood (sum of the logarithms of the probabilities)
-  using the standard deviation as the parameter we want to estimate
+  as a function of the standard deviation. Compare the position of the
-  from the data. Compare the position of the maxima with the standard
+  maxima with the standard deviation that you compute directly from
-  deviation that you can compute from the data.
+  the data.
  \part Increase $n$ to 1000. What happens to the likelihood, what
  happens to the log-likelihood? Why?
@@ -111,75 +111,86 @@ of the standard deviation.
 \end{solution}
 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
-\question \qt{Maximum-Likelihood-Sch\"atzer einer Ursprungsgeraden} 
+\question \qt{Maximum-likelihood estimator of a line through the origin} 
-In der Vorlesung haben wir folgende Formel f\"ur die Maximum-Likelihood
+In the lecture we derived the following equation for the
-Absch\"atzung der Steigung $\theta$ einer Ursprungsgeraden durch $n$ Datenpunkte $(x_i|y_i)$ mit Standardabweichung $\sigma_i$ hergeleitet:
+maximum-likelihood estimate of the slope $\theta$ of a straight line
-\[\theta = \frac{\sum_{i=1}^n \frac{x_iy_i}{\sigma_i^2}}{ \sum_{i=1}^n
+through the origin fitted to $n$ pairs of data values $(x_i|y_i)$ with
 standard deviation $\sigma_i$:
 \[\theta = \frac{\sum_{i=1}^n \frac{x_i y_i}{\sigma_i^2}}{ \sum_{i=1}^n
  \frac{x_i^2}{\sigma_i^2}} \]
 \begin{parts}
-  \part \label{mleslopefunc} Schreibe eine Funktion, die in einem $x$ und einem
+  \part \label{mleslopefunc} Write a function that takes two vectors
-  $y$ Vektor die Datenpaare \"uberreicht bekommt und die Steigung der
+  $x$ and $y$ containing the data pairs and returns the slope,
-  Ursprungsgeraden, die die Likelihood maximiert, zur\"uckgibt
+  computed according to this equation. For simplicity we assume
-  ($\sigma=\text{const}$).
+  $\sigma_i=\sigma_j=\sigma$ for all $1 \le i \le n$ and $1 \le j \le
-
+  n$. How does this simplify the equation for the slope?
-  \part
+  \begin{solution}
-  Schreibe ein Skript, das Datenpaare erzeugt, die um eine
+    \lstinputlisting{mleslope.m}
-  Ursprungsgerade mit vorgegebener Steigung streuen. Berechne mit der
+  \end{solution}
-  Funktion aus \pref{mleslopefunc} die Steigung aus den Daten,
+
-  vergleiche mit der wahren Steigung, und plotte die urspr\"ungliche
+  \part Write a script that generates data pairs that scatter around a
-  sowie die gefittete Gerade zusammen mit den Daten.
+  line through the origin with a given slope. Use the function from
-
+  \pref{mleslopefunc} to compute the slope from the generated data.
-  \part
+  Compare the computed slope with the true slope that has been used to
-  Ver\"andere die Anzahl der Datenpunkte, die Steigung, sowie die
+  generate the data. Plot the data together with the line from which
-  Streuung der Daten um die Gerade.
+  the data were generated and the maximum-likelihood fit.
  \begin{solution}
    \lstinputlisting{mlepropfit.m}
    \includegraphics[width=1\textwidth]{mlepropfit}
  \end{solution}
  \part \label{mleslopecomp} Vary the number of data pairs, the slope,
  as well as the variance of the data points around the true
  line. Under which conditions is the maximum-likelihood estimate of
  the slope closer to the true slope?
  \part To answer \pref{mleslopecomp} more precisely, generate for
  each condition, say, 1000 data sets and plot a histogram of the
  estimated slopes. How do the histogram, its mean, and its standard
  deviation relate to the true slope?
 \end{parts}
 \begin{solution}
-  \lstinputlisting{mleslope.m}
+  \lstinputlisting{mlepropest.m}
-  \lstinputlisting{mlepropfit.m}
+  \includegraphics[width=1\textwidth]{mlepropest}
  \includegraphics[width=1\textwidth]{mlepropfit}
 \end{solution}
-
+\continue
 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
-\question \qt{Maximum-Likelihood-Sch\"atzer einer Wahrscheinlichkeitsdichtefunktion} 
+\question \qt{Maximum-likelihood estimation of a probability-density function}
-Verschiedene Wahrscheinlichkeitsdichtefunktionen haben Parameter, die
+Many probability-density functions have parameters that cannot be
-nicht so einfach wie der Mittelwert und die Standardabweichung einer
+computed directly from the data as easily as, for example, the mean of
-Normalverteilung direkt aus den Daten berechnet werden k\"onnen. Solche Parameter
+normally distributed data. Such parameters need to be estimated by
-m\"ussen dann aus den Daten mit der Maximum-Likelihood-Methode gefittet werden.
+means of the maximum-likelihood method from the data.
-
+
-Um dies zu veranschaulichen ziehen wir uns diesmal nicht normalverteilte Zufallszahlen, sondern Zufallszahlen aus der Gamma-Verteilung.
+Let us demonstrate this approach by means of data that are drawn from a
 gamma distribution.
 \begin{parts}
-  \part
+  \part Find out which \code{matlab} function computes the
-  Finde heraus welche \code{matlab} Funktion die
+  probability-density function of the gamma distribution.
-  Wahrscheinlichkeitsdichtefunktion (probability density function) der
+
-  Gamma-Verteilung berechnet.
+  \part \label{gammaplot} Use this function to plot the
-
+  probability-density function of the gamma distribution for various
-  \part
+  values of the (positive) ``shape'' parameter. We set the ``scale''
-  Plotte mit Hilfe dieser Funktion die  Wahrscheinlichkeitsdichtefunktion
+  parameter to one.
-  der Gamma-Verteilung f\"ur verschiedene Werte des (positiven) ``shape'' Parameters.
+
-  Den ``scale'' Parameter setzen wir auf Eins.
+  \part Find out which \code{matlab} function generates random numbers
-
+  that are distributed according to a gamma distribution. Generate
-  \part
+  with this function 50 random numbers using one of the values of the
-  Finde heraus mit welcher Funktion Gammaverteilte Zufallszahlen in
+  ``shape'' parameter used in \pref{gammaplot}.
-  \code{matlab} gezogen werden k\"onnen. Erzeuge mit dieser Funktion
+
-  50 Zufallszahlen mit einem der oben geplotteten ``shape'' Parameter.
+  \part Compute and plot a properly normalized histogram of these
-
+  random numbers.
-  \part
+
-  Berechne und plotte ein normiertes Histogramm dieser Zufallszahlen.
+  \part Find out which \code{matlab} function fits a distribution to a
-
+  vector of random numbers according to the maximum-likelihood method.
-  \part
+  How do you need to use this function in order to fit a gamma
-  Finde heraus mit welcher \code{matlab}-Funktion eine beliebige
+  distribution to the data?
-  Verteilung (``distribution'') an die Zufallszahlen nach der
+
-  Maximum-Likelihood Methode gefittet werden kann. Wie wird diese
+  \part Estimate with this function the parameters of the gamma
-  Funktion benutzt, um die Gammaverteilung an die Daten zu fitten?
+  distribution used to generate the data.
-
+
-  \part
+  \part Finally, plot the fitted gamma distribution on top of the
-  Bestimme mit dieser Funktion die Parameter der Gammaverteilung aus
+  normalized histogram of the data.
  den Zufallszahlen.
  \part
  Plotte anschlie{\ss}end die Gammaverteilung mit den gefitteten
  Parametern.
 \end{parts}
 \begin{solution}
  \lstinputlisting{mlepdffit.m}
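For the line-through-the-origin question in this file, the exercise asks how the slope equation simplifies when all $\sigma_i$ are equal. A short worked step, offered as a sketch rather than as the official solution: with $\sigma_i=\sigma$ the common factor $1/\sigma^2$ can be pulled out of both sums and cancels.

```latex
\[
  \theta
  = \frac{\sum_{i=1}^n \frac{x_i y_i}{\sigma^2}}{\sum_{i=1}^n \frac{x_i^2}{\sigma^2}}
  = \frac{\frac{1}{\sigma^2}\sum_{i=1}^n x_i y_i}{\frac{1}{\sigma^2}\sum_{i=1}^n x_i^2}
  = \frac{\sum_{i=1}^n x_i y_i}{\sum_{i=1}^n x_i^2}
\]
```

Under this assumption the estimate no longer depends on $\sigma$, so a function computing the slope needs only the vectors $x$ and $y$.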
--- a/likelihood/exercises/mlepropest.m
+++ b/likelihood/exercises/mlepropest.m
@@ -0,0 +1,25 @@
 m = 2.0;               % slope
 sigmas = [0.1, 1.0];   % standard deviations
 ns = [100, 1000];      % number of data pairs
 trials = 1000;         % number of data sets
 for i = 1:length(sigmas)
  sigma = sigmas(i);
  for j = 1:length(ns)
    n = ns(j);
    slopes = zeros(trials, 1);
    for k=1:trials
      % data pairs:
      x = 5.0*rand(n, 1);
      y = m*x + sigma*randn(n, 1);
      % fit:
      slopes(k) = mleslope(x, y);
    end
    subplot(2, 2, 2*(i-1)+j);
    bins = [1.9:0.005:2.1];
    hist(slopes, bins);
    title(sprintf('sigma=%g, n=%d', sigma, n));
  end
 end
 savefigpdf(gcf, 'mlepropest.pdf', 12, 7);
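The script above calls `mleslope(x, y)`, whose source (`mleslope.m`) is not part of this diff. The following is a minimal sketch of what such a function might compute, assuming a constant standard deviation so that the weights cancel. The name `mleslope_sketch` and the test values are hypothetical and chosen only for illustration; the actual `mleslope.m` may differ.

```matlab
% Hypothetical stand-in for mleslope.m (the real file is not shown in this diff).
% Assumes constant sigma, so that theta = sum(x.*y) / sum(x.^2).
mleslope_sketch = @(x, y) sum(x.*y) / sum(x.^2);

% noise-free check: points exactly on y = 2*x
x = [1.0; 2.0; 3.0];
theta_exact = mleslope_sketch(x, 2.0*x);

% noisy data generated the same way as in mlepropest.m
m = 2.0;       % true slope
sigma = 0.5;   % standard deviation of the noise (illustrative value)
n = 100;       % number of data pairs
x = 5.0*rand(n, 1);
y = m*x + sigma*randn(n, 1);
theta = mleslope_sketch(x, y);
fprintf('true slope %.2f, estimated slope %.3f\n', m, theta);
```

An anonymous function is used here only to keep the sketch in a single script; in the exercise the function lives in its own file.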
--- a/likelihood/exercises/mlestd.m
+++ b/likelihood/exercises/mlestd.m
@@ -1,30 +1,35 @@
 % draw random numbers:
 n = 50;
 mu = 3.0;
 sigma = 2.0;
-x = randn(n,1)*sigma+mu;
+ns = [50, 1000];
-fprintf('              mean of the data is %.2f\n', mean(x))
+for k = 1:length(ns)
-fprintf('standard deviation of the data is %.2f\n', std(x))
+    n = ns(k);
    % draw random numbers:
    x = randn(n,1)*sigma+mu;
    fprintf('              mean of the data is %.2f\n', mean(x))
    fprintf('standard deviation of the data is %.2f\n', std(x))
-% standard deviation as parameter:
+    % standard deviation as parameter:
-psigs = 1.0:0.01:3.0;
+    psigs = 1.0:0.01:3.0;
-% matrix with the probabilities for each x and psigs:
+    % matrix with the probabilities for each x and psigs:
-lms = zeros(length(x), length(psigs));
+    lms = zeros(length(x), length(psigs));
-for i=1:length(psigs)
+    for i=1:length(psigs)
-    psig = psigs(i);
+        psig = psigs(i);
-    p = exp(-0.5*((x-mu)/psig).^2.0)/sqrt(2.0*pi)/psig;
+        p = exp(-0.5*((x-mu)/psig).^2.0)/sqrt(2.0*pi)/psig;
-    lms(:,i) = p;
+        lms(:,i) = p;
-end
+    end
-lm = prod(lms, 1);          % likelihood
+    lm = prod(lms, 1);          % likelihood
-loglm = sum(log(lms), 1);   % log likelihood
+    loglm = sum(log(lms), 1);   % log likelihood
-% plot likelihood of standard deviation:
+    % plot likelihood of standard deviation:
-subplot(1, 2, 1);
+    subplot(2, 2, 2*k-1);
-plot(psigs, lm );
+    plot(psigs, lm );
-xlabel('standard deviation')
+    title(sprintf('likelihood n=%d', n));
-ylabel('likelihood')
+    xlabel('standard deviation')
-subplot(1, 2, 2);
+    ylabel('likelihood')
-plot(psigs, loglm);
+    subplot(2, 2, 2*k);
-xlabel('standard deviation')
+    plot(psigs, loglm);
-ylabel('log likelihood')
+    title(sprintf('log-likelihood n=%d', n));
    xlabel('standard deviation')
    ylabel('log likelihood')
 end
 savefigpdf(gcf, 'mlestd.pdf', 15, 5);
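The exercise asks what happens to the likelihood and the log-likelihood when $n$ is increased to 1000. The following sketch, which reuses the density expression from `mlestd.m` but is not part of the commit, illustrates the likely reason the likelihood panel collapses: every single probability density is well below one, so the product of 1000 of them underflows to zero in double precision, while the sum of the logarithms stays finite.

```matlab
% Sketch: likelihood versus log-likelihood for n = 1000 samples.
mu = 3.0;
sigma = 2.0;
n = 1000;
x = randn(n, 1)*sigma + mu;

% normal probability density of each sample at the true parameters:
p = exp(-0.5*((x-mu)/sigma).^2.0)/sqrt(2.0*pi)/sigma;

lik = prod(p);         % product of 1000 values, each below 0.2: underflows
loglik = sum(log(p));  % sum of logarithms: large negative but finite

fprintf('likelihood     = %g\n', lik);
fprintf('log-likelihood = %g\n', loglik);
```

Because the logarithm is monotonic, the maximum of the log-likelihood sits at the same parameter value as the maximum of the likelihood, which is why the log-likelihood is the numerically safer quantity to maximize.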