[regression] further improved the chapter
@@ -16,9 +16,11 @@
\include{regression}

\section{Improvements}

Adapt the function arguments to matlab's \code{polyfit}. That is:
first the data (x,y) and then the parameter vector. p(1) is the
slope, p(2) is the intercept.

\subsection{Linear fits}
\begin{itemize}
\item Polyfit is easy: unique solution!
\item Example for overfitting with a polyfit of high order (order
equal to the number of data points), as sketched below.
\end{itemize}
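
Such an overfitting example could be sketched with matlab's
\code{polyfit} and \code{polyval} (hypothetical data; for a straight
line \code{p(1)} is the slope and \code{p(2)} the intercept):
\begin{verbatim}
x = linspace(0.0, 10.0, 20);       % hypothetical x-values
y = 2.5*x + 1.0 + randn(size(x));  % noisy line: slope 2.5, intercept 1.0
p = polyfit(x, y, 1);              % linear fit: p(1) slope, p(2) intercept
yest = polyval(p, x);              % predictions of the fitted line
% overfitting: a polynomial of maximal order runs through every data point
pover = polyfit(x, y, length(x)-1);
\end{verbatim}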
\section{Fitting in practice}
@@ -34,11 +36,5 @@ Fit with matlab functions lsqcurvefit, polyfit
\item How to test the quality of a fit? Residuals. $\chi^2$ test. Runs test.
\end{itemize}
\end{document}
@@ -52,40 +52,43 @@ considered an optimal fit. In our example we search the parameter
combination that describes the relation of $x$ and $y$ best. What is
meant by this? Each input $x_i$ leads to a measured output $y_i$ and
for each $x_i$ there is a \emph{prediction} or \emph{estimation}
$y^{est}(x_i)$ of the output value by the model. At each $x_i$
estimation and measurement have a distance or error $y_i -
y^{est}(x_i)$. In our example the estimation is given by the equation
$y^{est}(x_i) = f(x_i;m,b)$. The best fitting model with parameters
$m$ and $b$ is the one that minimizes the distances between
observation $y_i$ and estimation $y^{est}(x_i)$
(\figref{leastsquareerrorfig}).

As a first guess we could simply minimize the sum $\sum_{i=1}^N y_i -
y^{est}(x_i)$. This approach, however, will not work since a minimal
sum can also be achieved if half of the measurements are above and the
other half below the predicted line. Positive and negative errors
would cancel out and then sum up to values close to zero. A better
approach is to sum over the absolute values of the distances:
$\sum_{i=1}^N |y_i - y^{est}(x_i)|$. This sum can only be small if all
deviations are indeed small no matter if they are above or below the
predicted line. Instead of the sum we could also take the average
\begin{equation}
\label{meanabserror}
f_{dist}(\{(x_i, y_i)\}|\{y^{est}(x_i)\}) = \frac{1}{N} \sum_{i=1}^N |y_i - y^{est}(x_i)|
\end{equation}
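
In matlab the mean absolute distance \eqref{meanabserror} could be
computed in a single line (a sketch; \code{y} and \code{yest} are
hypothetical vectors of measurements and corresponding predictions):
\begin{verbatim}
% mean absolute distance between measurements and predictions:
fdist = mean(abs(y - yest));
\end{verbatim}
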
Instead of the averaged absolute errors, the \enterm[mean squared
error]{mean squared error} (\determ[quadratischer
Fehler!mittlerer]{mittlerer quadratischer Fehler})
\begin{equation}
\label{meansquarederror}
f_{mse}(\{(x_i, y_i)\}|\{y^{est}(x_i)\}) = \frac{1}{N} \sum_{i=1}^N (y_i - y^{est}(x_i))^2
\end{equation}
is commonly used (\figref{leastsquareerrorfig}). Similar to the
absolute distance, the square of the errors, $(y_i - y^{est}(x_i))^2$,
is always positive and thus positive and negative error values do not
cancel each other out. In addition, the square punishes large
deviations more strongly than small ones. In
chapter~\ref{maximumlikelihoodchapter} we show that minimizing the
mean squared error is equivalent to maximizing the likelihood that the
observations originate from the model, if the data are normally
distributed around the model prediction.
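
The mean squared error \eqref{meansquarederror} is computed just as
easily (again a sketch with hypothetical vectors \code{y} and
\code{yest}):
\begin{verbatim}
% mean squared error between measurements and predictions:
fmse = mean((y - yest).^2);
\end{verbatim}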
\begin{exercise}{meanSquaredErrorLine.m}{}\label{mseexercise}%
Given a vector of observations \varcode{y} and a vector with the
@@ -98,20 +101,13 @@ deviations over small deviations.
\section{Objective function}

The mean squared error is a so-called \enterm{objective function} or
\enterm{cost function} (\determ{Kostenfunktion}). A cost function
assigns to a model prediction $\{y^{est}(x_i)\}$ for a given data set
$\{(x_i, y_i)\}$ a single scalar value that we want to minimize. Here
we aim to adapt the model parameters to minimize the mean squared
error \eqref{meansquarederror}. In general, the \enterm{cost function}
can be any function that describes the quality of the fit by mapping
the data and the predictions to a single scalar value.
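
For the straight-line model such a cost function could, for example,
be written as a matlab function handle (a sketch; \code{x} and
\code{y} are hypothetical data vectors):
\begin{verbatim}
% mean squared error of a straight line with slope p(1) and
% intercept p(2), given the data vectors x and y:
fcost = @(p) mean((y - (p(1)*x + p(2))).^2);
fcost([2.0, 1.0])   % cost for slope 2 and intercept 1
\end{verbatim}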
\begin{figure}[t]
\includegraphics[width=1\textwidth]{linear_least_squares}
@@ -123,41 +119,40 @@ minimizes the costs.
\label{leastsquareerrorfig}
\end{figure}

Replacing $y^{est}$ in the mean squared error \eqref{meansquarederror}
with our model, the straight line \eqref{straightline}, the cost
function reads
\begin{eqnarray}
f_{cost}(m,b|\{(x_i, y_i)\}) & = & \frac{1}{N} \sum_{i=1}^N (y_i - f(x_i;m,b))^2 \label{msefunc} \\
& = & \frac{1}{N} \sum_{i=1}^N (y_i - m x_i - b)^2 \label{mseline}
\end{eqnarray}
The optimization process tries to find the slope $m$ and the intercept
$b$ such that the cost function is minimized. With the mean squared
error as the cost function this optimization process is also called
the method of the \enterm{least square error} (\determ[quadratischer
Fehler!kleinster]{Methode der kleinsten Quadrate}).

\begin{exercise}{meanSquaredError.m}{}
Implement the objective function \eqref{mseline} as a function
\varcode{meanSquaredError()}. The function takes three
arguments. The first is a vector of $x$-values and the second
contains the measurements $y$ for each value of $x$. The third
argument is a 2-element vector that contains the values of the
parameters \varcode{m} and \varcode{b}. The function returns the
mean squared error.
\end{exercise}
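
A minimal sketch of such a function, with the argument order as
specified in the exercise, could look like this:
\begin{verbatim}
function mse = meanSquaredError(x, y, p)
% mean squared error between the data points (x, y) and a
% straight line with slope p(1) and intercept p(2):
mse = mean((y - (p(1)*x + p(2))).^2);
end
\end{verbatim}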
\section{Error surface}
For each combination of the two parameters $m$ and $b$ of the model we
can use \eqnref{mseline} to calculate the corresponding value of the
cost function. The cost function $f_{cost}(m,b|\{(x_i, y_i)\})$ is a
function $f_{cost}(m,b)$ that maps the parameter values $m$ and $b$ to
a scalar error value. The error values describe a landscape over the
$m$-$b$ plane, the error surface, that can be illustrated graphically
using a 3-d surface plot: $m$ and $b$ are plotted on the $x$- and
$y$-axis while the third dimension indicates the error value
(\figref{errorsurfacefig}).

\begin{figure}[t]
\includegraphics[width=0.75\textwidth]{error_surface}
@@ -176,8 +171,8 @@ the error value (\figref{errorsurfacefig}).
calculate the mean squared error between the data and straight lines
for a range of slopes and intercepts using the
\varcode{meanSquaredError()} function from the previous exercise.
Illustrate the error surface using the \code{surface()} function.
Consult the documentation to find out how to use \code{surface()}.
\end{exercise}
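
Computing and plotting the error surface could be sketched as follows
(the sampled ranges of slopes and intercepts are arbitrary choices;
\code{x} and \code{y} are hypothetical data vectors):
\begin{verbatim}
slopes = -5.0:0.25:5.0;        % sampled values of the slope m
intercepts = -30.0:1.0:30.0;   % sampled values of the intercept b
errors = zeros(length(slopes), length(intercepts));
for i = 1:length(slopes)
    for j = 1:length(intercepts)
        errors(i,j) = meanSquaredError(x, y, [slopes(i), intercepts(j)]);
    end
end
surface(intercepts, slopes, errors);  % error surface over the m-b plane
\end{verbatim}
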
By looking at the error surface we can directly see the position of
@@ -185,19 +180,21 @@ the minimum and thus estimate the optimal parameter combination. How
can we use the error surface to guide an automatic optimization
process?

The obvious approach would be to calculate the error surface for any
combination of slope and intercept values and then find the position
of the minimum using the \code{min} function. This approach, however,
has several disadvantages: (i) it is computationally very expensive to
calculate the error for each parameter combination. The number of
combinations increases exponentially with the number of free
parameters (also known as the ``curse of dimensionality''). (ii) the
accuracy with which the best parameters can be estimated is limited by
the resolution used to sample the parameter space. The coarser the
parameters are sampled, the less precise is the obtained position of
the minimum.
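
For illustration, reading off the minimum from the sampled error
surface of the previous sketch would look like this (\code{errors},
\code{slopes} and \code{intercepts} as defined above):
\begin{verbatim}
% position of the smallest error in the sampled m-b grid:
[minerror, idx] = min(errors(:));
[i, j] = ind2sub(size(errors), idx);
bestslope = slopes(i);
bestintercept = intercepts(j);
\end{verbatim}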

So we need a different approach. We want a procedure that finds the
minimum of the cost function with a minimal number of computations and
to arbitrary precision.

\begin{ibox}[t]{\label{differentialquotientbox}Difference quotient and derivative}
\includegraphics[width=0.33\textwidth]{derivative}
@@ -308,9 +305,9 @@ choose the opposite direction.
\begin{exercise}{meanSquaredGradient.m}{}\label{gradientexercise}%
Implement a function \varcode{meanSquaredGradient()} that takes the
$x$- and $y$-data and the set of parameters $(m, b)$ of a straight
line as a two-element vector as input arguments. The function should
return the gradient at the position $(m, b)$ as a vector with two
elements.
\end{exercise}
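
A sketch of such a function that approximates the two partial
derivatives by difference quotients (the step size \code{h} is an
arbitrary small number):
\begin{verbatim}
function gradmse = meanSquaredGradient(x, y, p)
% gradient of the mean squared error of a straight line
% with parameters p = [m, b], approximated by difference quotients:
h = 1e-5;   % small step in parameter space
mse = meanSquaredError(x, y, p);
dm = (meanSquaredError(x, y, [p(1)+h, p(2)]) - mse)/h;
db = (meanSquaredError(x, y, [p(1), p(2)+h]) - mse)/h;
gradmse = [dm, db];
end
\end{verbatim}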
@@ -359,7 +356,7 @@ distance between the red dots in \figref{gradientdescentfig}) is
large.
\begin{figure}[t]
\includegraphics[width=0.45\textwidth]{gradient_descent}
\titlecaption{Gradient descent.}{The algorithm starts at an
arbitrary position. At each point the gradient is estimated and
the position is updated as long as the length of the gradient is
@@ -376,7 +373,7 @@ large.
\item Plot the error values as a function of the iterations, the
number of optimization steps.
\item Plot the measured data together with the best fitting straight line.
\end{enumerate}\vspace{-4.5ex}
\end{exercise}
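
The core of the gradient descent loop could be sketched like this
(starting position, learning rate \code{eps}, and the threshold on the
length of the gradient are arbitrary choices):
\begin{verbatim}
p = [-2.0, 10.0];             % arbitrary starting values for m and b
eps = 0.01;                   % learning rate
gradmse = meanSquaredGradient(x, y, p);
while norm(gradmse) > 0.1     % stop when the gradient gets small
    p = p - eps*gradmse;      % step downhill against the gradient
    gradmse = meanSquaredGradient(x, y, p);
end
\end{verbatim}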