fixed many index entries
@@ -33,7 +33,7 @@ fitting approaches. We will apply this method to find the combination
of slope and intercept that best describes the system.
\section{The error function --- mean squared error}
Before the optimization can be done we need to specify what is
considered an optimal fit. In our example we search the parameter
@@ -57,25 +57,23 @@ $\sum_{i=1}^N |y_i - y^{est}_i|$. The total error can only be small if
all deviations are indeed small, no matter whether they are above or below
the predicted line. Instead of the sum we could also ask for the
\emph{average}
\begin{equation}
\label{meanabserror}
f_{dist}(\{(x_i, y_i)\}|\{y^{est}_i\}) = \frac{1}{N} \sum_{i=1}^N |y_i - y^{est}_i|
\end{equation}
to be small. Commonly, the \enterm{mean squared distance} or the
\enterm[square error!mean]{mean square error} (\determ[quadratischer Fehler!mittlerer]{mittlerer quadratischer Fehler})
\begin{equation}
\label{meansquarederror}
f_{mse}(\{(x_i, y_i)\}|\{y^{est}_i\}) = \frac{1}{N} \sum_{i=1}^N (y_i - y^{est}_i)^2
\end{equation}
is used (\figref{leastsquareerrorfig}). Similar to the absolute
distance, the square of the error ($(y_i - y_i^{est})^2$) is always
positive, so error values do not cancel out. The square further punishes
large deviations.
\begin{exercise}{meanSquareError.m}{}\label{mseexercise}%
Implement a function \varcode{meanSquareError()} that calculates the
\emph{mean square distance} between a vector of observations ($y$)
and respective predictions ($y^{est}$).
\end{exercise}
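One possible sketch of such a function (not the only solution; it
assumes that \varcode{y} and \varcode{yest} are vectors of the same
length) could look like this:

\begin{lstlisting}
function mse = meanSquareError(y, yest)
% Mean squared error between observations y and predictions yest.
mse = mean((y - yest).^2);  % squared deviations, averaged over all data points
end
\end{lstlisting}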
@@ -84,18 +82,19 @@ large deviations.

\section{\tr{Objective function}{Zielfunktion}}
$f_{cost}(\{(x_i, y_i)\}|\{y^{est}_i\})$ is a so-called
\enterm{objective function} or \enterm{cost function}
(\determ{Kostenfunktion}). We aim to adapt the model parameters to
minimize the error (mean square error) and thus the \emph{objective
function}. In Chapter~\ref{maximumlikelihoodchapter} we will show
that the minimization of the mean square error is equivalent to
maximizing the likelihood that the observations originate from the
model (assuming a normal distribution of the data around the model
prediction).
\begin{figure}[t]
\includegraphics[width=1\textwidth]{linear_least_squares}
\titlecaption{Estimating the \emph{mean square error}.} {The
deviation (\enterm{error}, orange) between the prediction (red
line) and the observations (blue dots) is calculated for each data
point (left). Then the deviations are squared and the average is
calculated (right).}
@@ -119,11 +118,13 @@ Replacing $y^{est}$ with the linear equation (the model) in
That is, the mean square error is given by the pairs $(x_i, y_i)$ and the
parameters $m$ and $b$ of the linear equation. The optimization
process tries to optimize $m$ and $b$ such that the error is
minimized, the method of the \enterm[square error!least]{least square
error} (\determ[quadratischer Fehler!kleinster]{Methode der
kleinsten Quadrate}).
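Substituting the linear model $y^{est}_i = m x_i + b$ into
\eqnref{meansquarederror} makes the dependence of the error on the two
parameters explicit:

\[ f_{mse}(m, b) = \frac{1}{N} \sum_{i=1}^N \left( y_i - (m x_i + b) \right)^2 \]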
\begin{exercise}{lsqError.m}{}
Implement the objective function \varcode{lsqError()} that applies the
linear equation as a model.
\begin{itemize}
\item The function takes three arguments. The first is a 2-element
@@ -131,7 +132,7 @@ error, the method of the \enterm{least square error}.
\varcode{b}. The second is a vector of $x$-values; the third contains
the measurements for each value of $x$, the respective $y$-values.
\item The function returns the mean square error \eqnref{mseline}.
\item The function should call the function \varcode{meanSquareError()}
defined in the previous exercise to calculate the error.
\end{itemize}
\end{exercise}
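Under the conventions of this exercise (parameter vector first, then the
$x$- and $y$-data), one possible sketch of such an objective function is:

\begin{lstlisting}
function mse = lsqError(p, x, y)
% Objective function: mean squared error of the straight line
% y = p(1)*x + p(2) with respect to the measured y-values.
yest = p(1) .* x + p(2);         % predictions of the linear model
mse = meanSquareError(y, yest);  % error function from the previous exercise
end
\end{lstlisting}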
@@ -165,7 +166,7 @@ third dimension is used to indicate the error value
\varcode{y}). Implement a script \file{errorSurface.m} that
calculates the mean square error between data and a linear model and
illustrates the error surface using the \code{surf()} function
(consult the help to find out how to use \code{surf()}).
\end{exercise}
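One possible version of such a script is sketched below. The grid
ranges and step sizes are arbitrary choices for illustration, and the
measured data are assumed to be available in the vectors \varcode{x}
and \varcode{y}:

\begin{lstlisting}
% errorSurface.m : mean squared error as a function of slope and intercept
% assumes vectors x and y with the measured data are already defined
slopes = -5:0.25:5;
intercepts = -30:1:30;
errors = zeros(length(slopes), length(intercepts));
for i = 1:length(slopes)
    for j = 1:length(intercepts)
        errors(i, j) = lsqError([slopes(i), intercepts(j)], x, y);
    end
end
surf(intercepts, slopes, errors)  % error surface over the parameter plane
xlabel('intercept b')
ylabel('slope m')
zlabel('mean squared error')
\end{lstlisting}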
By looking at the error surface we can directly see the position of
@@ -257,7 +258,7 @@ way to the minimum of the objective function. The ball will always
follow the steepest slope. Thus we need to figure out the direction of
the steepest slope at the position of the ball.

The \entermde{Gradient}{gradient} (Box~\ref{partialderivativebox}) of the
objective function is the vector
\[ \nabla f_{cost}(m,b) = \left( \frac{\partial f(m,b)}{\partial m},
\frac{\partial f(m,b)}{\partial b} \right) \]
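The partial derivatives can be approximated numerically by difference
quotients. A minimal sketch along these lines (anticipating the
\varcode{lsqGradient()} function of the exercise below and assuming the
\varcode{lsqError()} objective function from above; an analytical
gradient would work just as well) could be:

\begin{lstlisting}
function gradient = lsqGradient(p, x, y)
% Gradient of the mean squared error with respect to the parameters
% p = [m, b], approximated by finite differences.
h = 1e-6;  % small step for the difference quotients
dm = (lsqError([p(1)+h, p(2)], x, y) - lsqError(p, x, y)) / h;
db = (lsqError([p(1), p(2)+h], x, y) - lsqError(p, x, y)) / h;
gradient = [dm, db];
end
\end{lstlisting}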
@@ -296,7 +297,7 @@ choose the opposite direction.
\end{figure}

\begin{exercise}{lsqGradient.m}{}\label{gradientexercise}%
Implement a function \varcode{lsqGradient()} that takes the set of
parameters $(m, b)$ of the linear equation as a two-element vector
and the $x$- and $y$-data as input arguments. The function should
return the gradient at that position.
@@ -316,8 +317,8 @@ choose the opposite direction.

Finally, we are able to implement the optimization itself. By now it
should be obvious why it is called the gradient descent method. All
ingredients are already there. We need: 1. the error function
(\varcode{meanSquareError()}), 2. the objective function
(\varcode{lsqError()}), and 3. the gradient (\varcode{lsqGradient()}). The
algorithm of the gradient descent is:
\begin{enumerate}