%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\section{Maximum Likelihood}

Let $p(x|\theta)$ (to be read as ``probability (density) of $x$ given
$\theta$'') be the probability (density) distribution of $x$ given the
parameters $\theta$. This could be the normal distribution
\begin{equation}
  \label{normpdfmean}
  p(x|\theta) = \frac{1}{\sqrt{2\pi \sigma^2}}e^{-\frac{(x-\theta)^2}{2\sigma^2}} \; .
\end{equation}

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\subsection{Example: the arithmetic mean}
Suppose that the measurements $x_1, x_2, \ldots x_n$ originate from a
normal distribution \eqnref{normpdfmean} and that we consider the mean
$\mu=\theta$ as its only parameter. Which value of $\theta$ maximizes
the likelihood of the data?

\begin{figure}[t]
  \includegraphics[width=1\textwidth]{mlemean}
  \titlecaption{\label{mlemeanfig} Maximum likelihood estimation of
    the mean.}{Top: The measured data (blue dots) together with three
    possible normal distributions with different means (arrows) from
    which the data could originate. Bottom left: the likelihood as a
    function of the mean $\theta$. It is maximal at $\theta = 2$.
    Bottom right: the corresponding log-likelihood. Taking the
    logarithm does not change the position of the maximum (arrow).}
\end{figure}

The log-likelihood \eqnref{loglikelihood} is
\begin{eqnarray*}
  \log {\cal L}(\theta|x_1,x_2, \ldots x_n)
  & = & \sum_{i=1}^n \log \frac{1}{\sqrt{2\pi \sigma^2}}e^{-\frac{(x_i-\theta)^2}{2\sigma^2}} \\
  & = & \sum_{i=1}^n - \log \sqrt{2\pi \sigma^2} -\frac{(x_i-\theta)^2}{2\sigma^2} \; .
\end{eqnarray*}
% FIXME do we need parentheses around the normal distribution in line one?
Since the logarithm is the inverse function of the exponential
($\log(e^x)=x$), taking the logarithm removes the exponential from the
normal distribution. To find the maximum of the log-likelihood, we
take the derivative with respect to $\theta$ and set it to zero:
\begin{eqnarray*}
  \frac{\text{d}}{\text{d}\theta} \log {\cal L}(\theta|x_1,x_2, \ldots x_n) & = & \sum_{i=1}^n - \frac{2(x_i-\theta)}{2\sigma^2} \;\; = \;\; 0 \\
  \Leftrightarrow \quad \sum_{i=1}^n x_i - \sum_{i=1}^n \theta & = & 0 \\
  \Leftrightarrow \quad n \theta & = & \sum_{i=1}^n x_i \\
  \Leftrightarrow \quad \theta & = & \frac{1}{n} \sum_{i=1}^n x_i \;\; = \;\; \bar x
\end{eqnarray*}
The maximum likelihood estimator is the arithmetic mean $\bar x$ of
the data. That is, taking the arithmetic mean of the data as $\theta$
maximizes the likelihood that the data originate from a normal
distribution with that mean (\figref{mlemeanfig}).

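This result can be checked numerically by evaluating the
log-likelihood on a grid of candidate means and locating its maximum.
A minimal Python sketch (the course exercises use MATLAB; sample
size, seed, and grid are illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(1)
sigma = 1.0
data = rng.normal(2.0, sigma, 50)   # n=50 samples, true mean 2

# evaluate the log-likelihood on a grid of candidate means theta
thetas = np.linspace(0.0, 4.0, 4001)
loglik = np.array([np.sum(-0.5*np.log(2.0*np.pi*sigma**2)
                          - (data - t)**2/(2.0*sigma**2))
                   for t in thetas])

# the maximizing theta coincides with the arithmetic mean
theta_mle = thetas[int(np.argmax(loglik))]
print(theta_mle, data.mean())
```

Up to the grid spacing, the grid maximum and the arithmetic mean of
the data agree, as the derivation above predicts.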
\begin{exercise}{mlemean.m}{mlemean.out}
  Draw $n=50$ random numbers from a normal distribution with a mean
  $\ne 0$ and a standard deviation $\ne 1$.

  Plot the likelihood (the product of the probabilities) and the
  log-likelihood (the sum of the logarithms of the probabilities) as
  functions of the mean parameter. Compare the positions of the
  maxima with the mean calculated from the data.
  \pagebreak[4]
\end{exercise}

\pagebreak[4]
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\section{Curve fitting as maximum-likelihood estimation}
In a curve fit, a function $f(x;\theta)$ is fitted to the data pairs
$(x_i|y_i)$ by adjusting its parameters $\theta$. If we assume that
the $y_i$ values are normally distributed around the function values
$f(x_i;\theta)$ with standard deviations $\sigma_i$, the
log-likelihood is
\begin{eqnarray*}
  \log {\cal L}(\theta|(x_1,y_1,\sigma_1), \ldots, (x_n,y_n,\sigma_n))
  & = & \sum_{i=1}^n \log \frac{1}{\sqrt{2\pi \sigma_i^2}}e^{-\frac{(y_i-f(x_i;\theta))^2}{2\sigma_i^2}} \\
  & = & \sum_{i=1}^n - \log \sqrt{2\pi \sigma_i^2} -\frac{(y_i-f(x_i;\theta))^2}{2\sigma_i^2}
\end{eqnarray*}
The only difference to the previous example is that the means of the
normal distributions are now given by the function values
$f(x_i;\theta)$.

The parameter $\theta$ should be chosen such that the log-likelihood
is maximal. The first term of the sum is independent of $\theta$ and
can thus be ignored when searching for the maximum:
\begin{eqnarray*}
  & = & - \frac{1}{2} \sum_{i=1}^n \left( \frac{y_i-f(x_i;\theta)}{\sigma_i} \right)^2
\end{eqnarray*}
Instead of searching for the maximum, we can also invert the sign of
the log-likelihood and search for the minimum. The factor $1/2$ in
front of the sum can be dropped as well, since it does not change the
position of the minimum:
\begin{equation}
  \label{chisqmin}
  \theta_{mle} = \text{argmin}_{\theta} \; \sum_{i=1}^n \left( \frac{y_i-f(x_i;\theta)}{\sigma_i} \right)^2 \;\; = \;\; \text{argmin}_{\theta} \; \chi^2
\end{equation}
The sum of the squared differences, normalized by the respective
standard deviations, is also called $\chi^2$. The parameter $\theta$
that minimizes the squared differences thus also maximizes the
probability that the data actually originate from the given
function. Minimizing $\chi^2$ therefore is a maximum likelihood
estimation.

From the derivation we also see that minimizing the squared distance
is a maximum-likelihood estimation only if the data are normally
distributed around the function. For other distributions the
log-likelihood \eqnref{loglikelihood} needs to be computed
accordingly and maximized.

\begin{figure}[t]
  \includegraphics[width=1\textwidth]{mlepropline}
  \titlecaption{\label{mleproplinefig} Maximum likelihood estimation
    of the slope of a line through the origin.}{}
\end{figure}

\subsection{Example: simple proportionality}
As the function we take a line through the origin
\[ f(x) = \theta x \]
with slope $\theta$. The $\chi^2$-sum then reads
\[ \chi^2 = \sum_{i=1}^n \left( \frac{y_i-\theta x_i}{\sigma_i} \right)^2 \; . \]
To find the minimum we again take the first derivative with respect
to $\theta$ and set it to zero:
\begin{eqnarray}
  \frac{\text{d}}{\text{d}\theta}\chi^2 & = & \frac{\text{d}}{\text{d}\theta} \sum_{i=1}^n \left( \frac{y_i-\theta x_i}{\sigma_i} \right)^2 \nonumber \\
  & = & \sum_{i=1}^n \frac{\text{d}}{\text{d}\theta} \left( \frac{y_i-\theta x_i}{\sigma_i} \right)^2 \nonumber \\
  & = & \sum_{i=1}^n - 2 x_i \frac{y_i-\theta x_i}{\sigma_i^2} \;\; = \;\; 0 \nonumber \\
  \Leftrightarrow \quad \theta \sum_{i=1}^n \frac{x_i^2}{\sigma_i^2} & = & \sum_{i=1}^n \frac{x_iy_i}{\sigma_i^2} \nonumber \\
  \Leftrightarrow \quad \theta & = & \frac{\sum_{i=1}^n \frac{x_iy_i}{\sigma_i^2}}{ \sum_{i=1}^n \frac{x_i^2}{\sigma_i^2}} \label{mleslope}
\end{eqnarray}
With this we have obtained an analytical expression for the slope
$\theta$ of the regression line (\figref{mleproplinefig}).

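The analytic slope formula \eqnref{mleslope} can be evaluated
directly from the data. A small Python sketch (illustrative, not the
course's MATLAB code; data, seed, and grid are made up) compares the
analytic estimate with a brute-force minimization of $\chi^2$:

```python
import numpy as np

rng = np.random.default_rng(2)
x = np.linspace(0.5, 10.0, 40)
sigma = 0.5*np.ones_like(x)               # measurement uncertainties
y = 1.8*x + rng.normal(0.0, sigma)        # data around a line with slope 1.8

# analytic maximum-likelihood slope: sum(x*y/sigma^2) / sum(x^2/sigma^2)
theta = np.sum(x*y/sigma**2) / np.sum(x**2/sigma**2)

# compare with a brute-force minimization of chi^2 on a grid
thetas = np.linspace(1.0, 3.0, 2001)
chisq = [np.sum(((y - t*x)/sigma)**2) for t in thetas]
theta_grid = thetas[int(np.argmin(chisq))]
print(theta, theta_grid)
```

Both estimates agree up to the grid spacing, and no iterative
optimization is needed for the analytic one.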
A gradient descent, as used in the previous chapter, is thus not
necessary for fitting the slope of a line through the origin. More
generally, this holds for fitting the coefficients of linearly
combined basis functions, such as the slope $m$ and the y-intercept
$b$ of the linear equation
\[ y = m \cdot x + b \]
or the coefficients $a_k$ of a polynomial
\[ y = \sum_{k=0}^N a_k x^k = a_0 + a_1x + a_2x^2 + a_3x^3 + \ldots \]
\matlabfun{polyfit()}.
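Because such models are linear in their coefficients, the fit reduces
to a direct linear-algebra solution. A hedged Python sketch using
`numpy.polyfit` in place of the MATLAB `polyfit()` mentioned above
(the cubic and its coefficients are invented for illustration):

```python
import numpy as np

rng = np.random.default_rng(6)
x = np.linspace(-2.0, 2.0, 30)
# noisy samples of the cubic 0.5 - x + 0.25*x^3
y = 0.5 - 1.0*x + 0.25*x**3 + rng.normal(0.0, 0.05, x.size)

# the coefficients enter the model linearly, so the least-squares
# fit is solved directly, without any gradient descent
coefs = np.polyfit(x, y, 3)   # highest power first: [a3, a2, a1, a0]
print(coefs)
```

The recovered coefficients lie close to the generating values
$[0.25, 0, -1, 0.5]$.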

Parameters that enter a function non-linearly can, in contrast, not
be calculated analytically from the data. Consider for example the
rate $\lambda$ of the exponential decay
\[ y = c \cdot e^{\lambda x} \quad , \quad c, \lambda \in \reZ \; . \]
In such cases one has to resort to numerical methods for optimizing
the cost function, such as the gradient descent
\matlabfun{lsqcurvefit()}.

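For the exponential decay, a numerical $\chi^2$ minimization can be
sketched in Python. As an assumption, a crude grid search stands in
for the gradient-based routines like \matlabfun{lsqcurvefit()}; the
true parameters, noise level, and grid are invented:

```python
import numpy as np

rng = np.random.default_rng(3)
x = np.linspace(0.0, 5.0, 50)
sigma = 0.05
# noisy samples of y = 2*exp(-0.5*x)
y = 2.0*np.exp(-0.5*x) + rng.normal(0.0, sigma, x.size)

# brute-force chi^2 minimization over a (c, lambda) grid; a real fit
# would use a gradient-based optimizer instead
cs = np.linspace(1.0, 3.0, 201)
lams = np.linspace(-1.0, 0.0, 201)
chisq = np.array([[np.sum(((y - c*np.exp(l*x))/sigma)**2) for l in lams]
                  for c in cs])
ic, il = np.unravel_index(np.argmin(chisq), chisq.shape)
print(cs[ic], lams[il])
```

The minimizing grid point lies close to the generating values
$c=2$ and $\lambda=-0.5$; unlike the linear cases above, there is no
closed-form solution to compare against.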
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\section{Fitting probability distributions}
Finally, let's consider the case in which we want to fit the
parameters of a probability density function (e.g. the shape
parameter of a \enterm{gamma distribution}) to a dataset.

A first guess could be to fit the probability density to a histogram
of the measured data by minimizing the squared difference. For
several reasons this is, however, not the method of choice: (i)
Probability densities can only be positive, so in particular for
small values the data cannot scatter symmetrically, as normally
distributed data would. (ii) The values of a normalized histogram are
not independent, because the histogram integrates to unity. The two
basic assumptions of normally distributed and independent samples,
which make the minimization of the squared difference
\eqnref{chisqmin} a maximum likelihood estimation, are thus
violated. (iii) The histogram strongly depends on the chosen bin size
(\figref{mlepdffig}).

\begin{figure}[t]
  \includegraphics[width=1\textwidth]{mlepdf}
  \titlecaption{\label{mlepdffig} Maximum likelihood estimation of a
    probability density.}{Left: the 100 data points drawn from a 2nd
    order gamma distribution. The maximum likelihood fit of the
    probability density function is shown in orange, the true pdf is
    shown in red. Right: the normalized histogram of the data
    together with the true (red) and the fitted probability density
    functions. The fit was done by minimizing the squared difference
    to the histogram.}
\end{figure}

With the example of estimating the mean value of a normal
distribution we have already discussed the direct approach to fit a
probability density to data: maximum likelihood. We simply search
for the parameters $\theta$ of the desired probability density
function that maximize the log-likelihood \eqnref{loglikelihood}. In
general this is a non-linear optimization problem that is solved
with numerical methods such as gradient descent \matlabfun{mle()}.

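For the gamma distribution this maximization can be sketched in
Python (a stand-in for the MATLAB \matlabfun{mle()} routine). As an
assumption, the two-parameter search is reduced to a one-dimensional
grid by profiling out the scale, for which the maximizing value at
fixed shape $a$ is $\bar x/a$:

```python
import numpy as np
from math import lgamma

rng = np.random.default_rng(4)
data = rng.gamma(shape=2.0, scale=1.0, size=1000)

def gamma_loglik(a, s, x):
    # sum of the log pdf of a gamma distribution with shape a, scale s
    return np.sum((a - 1.0)*np.log(x) - x/s) \
        - x.size*(a*np.log(s) + lgamma(a))

# profile likelihood: for fixed shape a the ML scale is mean(x)/a,
# so only a 1-d search over the shape parameter remains
shapes = np.linspace(0.5, 5.0, 451)
ll = [gamma_loglik(a, data.mean()/a, data) for a in shapes]
a_mle = shapes[int(np.argmax(ll))]
print(a_mle)
```

The recovered shape lies close to the generating value of 2.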
\begin{exercise}{mlegammafit.m}{mlegammafit.out}
  Create a sample of gamma-distributed random numbers and use the
  maximum likelihood method to estimate the parameters of the gamma
  distribution from the data.
  \pagebreak
\end{exercise}

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\section{Neural coding}
In sensory systems, certain aspects of the environment are encoded in
the activity of populations of neurons. One example of such a
population code is the tuning of neurons in the primary visual
cortex (V1) to the orientation of a visual stimulus: different
neurons respond best to different stimulus orientations.
Traditionally, such a tuning is measured by analyzing the neuronal
response strength (e.g. the firing rate) as a function of the
orientation of the visual stimulus and is summarized by the
so-called \enterm{tuning curve} (German \determ{Abstimmkurve},
figure~\ref{mlecodingfig}, top).

\begin{figure}[tp]
  \includegraphics[width=1\textwidth]{mlecoding}
  \titlecaption{\label{mlecodingfig} Maximum likelihood estimation of
    a stimulus parameter from neuronal activity.}{Top: Tuning curve
    of an individual neuron as a function of the stimulus orientation
    (a dark bar in front of a white background). The stimulus that
    evokes the strongest activity in this neuron is a vertically
    oriented bar (arrow, $\phi_i=90$\,\degree). The red area
    indicates the variability $p(r)$ of the neuronal activity $r$
    around the tuning curve. Center: In a population of neurons, each
    neuron may have a different preferred orientation (colored
    lines). A specific stimulus (the vertical bar) activates the
    individual neurons of the population in a specific way (dots).
    Bottom: The log-likelihood of these activities is maximized close
    to the true orientation of the stimulus.}
\end{figure}

The brain, however, is confronted with the inverse problem: given a
certain activity pattern of the neurons in the population, what was
the stimulus (the orientation of the bar)? In the sense of maximum
likelihood, a possible answer is: it was the stimulus for which the
observed activity pattern is most likely.

Let's stay with the example of orientation-tuned cells in V1. The
tuning $\Omega_i(\phi)$ of neuron $i$ to its preferred stimulus
orientation $\phi_i$ can be well described by a van Mises function
(the analogue of the Gaussian function on a cyclic x-axis)
(\figref{mlecodingfig}):
\[ \Omega_i(\phi) = c \cdot e^{\cos(2(\phi-\phi_i))} \quad , \quad c
\in \reZ \]
Here we approximate the activity of the neurons by a normal
distribution around the tuning curve with a standard deviation
$\sigma=\Omega/4$ proportional to $\Omega$, such that the
probability $p_i(r|\phi)$ of the $i$-th neuron showing the activity
$r$, given a stimulus of orientation $\phi$, is given by
\[ p_i(r|\phi) = \frac{1}{\sqrt{2\pi}\Omega_i(\phi)/4} e^{-\frac{1}{2}\left(\frac{r-\Omega_i(\phi)}{\Omega_i(\phi)/4}\right)^2} \; . \]
The log-likelihood of the stimulus orientation $\phi$ given the
activity pattern $r_1$, $r_2$, \ldots $r_n$ in the population is thus
\[ \log {\cal L}(\phi|r_1, r_2, \ldots r_n) = \sum_{i=1}^n \log p_i(r_i|\phi) \]
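This decoding scheme can be simulated end to end. A Python sketch
under invented assumptions (12 neurons with evenly spaced preferred
orientations, $c=1$, a true orientation of 60\,\degree, and the
normal noise model $\sigma=\Omega/4$ from above):

```python
import numpy as np

rng = np.random.default_rng(5)
prefs = np.linspace(0.0, np.pi, 12, endpoint=False)  # preferred phi_i

def tuning(phi, pref, c=1.0):
    # van Mises tuning curve Omega_i(phi)
    return c*np.exp(np.cos(2.0*(phi - pref)))

# simulated population response to a true orientation of 60 degrees
phi_true = np.deg2rad(60.0)
omega = tuning(phi_true, prefs)
r = omega + rng.normal(0.0, omega/4.0)   # sigma = Omega/4

def loglik(phi):
    # log-likelihood of the observed activities for a candidate phi
    om = tuning(phi, prefs)
    s = om/4.0
    return np.sum(-np.log(np.sqrt(2.0*np.pi)*s) - 0.5*((r - om)/s)**2)

phis = np.linspace(0.0, np.pi, 1801)
ll = np.array([loglik(p) for p in phis])
phi_mle = phis[int(np.argmax(ll))]
print(np.rad2deg(phi_mle))
```

The orientation maximizing the log-likelihood lies close to the true
60\,\degree, illustrating the maximum likelihood answer to the
brain's inverse problem.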
\selectlanguage{english}