[regression] started to improve the chapter
This commit is contained in:
parent
5484a7136e
commit
357477e17f
% \determ[index entry]{<german term>}
% typeset the term in quotes and add it (or the optional argument) to
% the german index.
\newcommand{\determ}[2][]{\selectlanguage{german}``#2''\selectlanguage{english}\ifthenelse{\equal{#1}{}}{\protect\sindex[determ]{#2}}{\protect\sindex[determ]{#1}}}

\newcommand{\file}[1]{\texttt{#1}}
\chapter{Optimization and gradient descent}
\exercisechapter{Optimization and gradient descent}
Optimization problems arise in many different contexts. For example,
to understand the behavior of a given system, the system is probed
with a range of input signals and then the resulting responses are
measured. This input-output relation can be described by a model. Such
a model can be a simple function that maps the input signals to
corresponding responses, it can be a filter, or a system of
differential equations. In any case, the model has some parameters
that specify how input and output signals are related. Which
combination of parameter values is best suited to describe the
input-output relation? The process of finding the best parameter
values is an optimization problem. For a simple parameterized function
that maps input to output values, this is the special case of a
\enterm{curve fitting} problem, where the average distance between the
curve and the response values is minimized. One basic numerical method
used for such optimization problems is the so-called gradient descent,
which is introduced in this chapter.

\begin{figure}[t]
  \includegraphics[width=1\textwidth]{lin_regress}\hfill
  \caption{\ldots\ y-axis (right panel).}\label{linregressiondatafig}
\end{figure}

The data plotted in \figref{linregressiondatafig} suggest a linear
relation between input and output of the system. We thus assume that a
straight line
\[ y = f(x; m, b) = m \cdot x + b \]
is an appropriate model to describe the system. The line has two free
parameters, the slope $m$ and the $y$-intercept $b$. We need to find
values for the slope and the intercept that best describe the measured
data. In this chapter we use this example to illustrate the gradient
descent and how this method can be used to find the combination of
slope and intercept that best describes the system.
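As a quick sketch outside the chapter's MATLAB exercises, the
straight-line model with its two free parameters can be written in a
few lines of Python; the input values here are made up for
illustration:

```python
import numpy as np

def f(x, m, b):
    """Straight-line model y = f(x; m, b) = m*x + b."""
    return m * x + b

# made-up inputs; predictions for slope m = 2 and intercept b = 1
x = np.array([0.0, 1.0, 2.0, 3.0])
print(f(x, 2.0, 1.0))  # [1. 3. 5. 7.]
```

Fitting then means choosing the pair $(m, b)$ for which these
predictions come closest to the measured responses.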
\section{The error function --- mean squared error}

Before the optimization can be done we need to specify what exactly is
considered an optimal fit. In our example we search for the parameter
combination that describes the relation of $x$ and $y$ best. What is
meant by this? Each input $x_i$ leads to an output $y_i$ and for each
\ldots\ would cancel out and then sum up to values close to zero. A
better approach is to consider the absolute value of the distance,
$\sum_{i=1}^N |y_i - y^{est}_i|$. The total error can only be small if
all deviations are indeed small, no matter if they are above or below
the predicted line. Instead of the sum we could also ask for the
\emph{average}
\begin{equation}
  \label{meanabserror}
  f_{dist}(\{(x_i, y_i)\}|\{y^{est}_i\}) = \frac{1}{N} \sum_{i=1}^N |y_i - y^{est}_i|
\end{equation}
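The cancellation argument can be checked numerically. A small Python
sketch with made-up deviations (not the chapter's data) compares the
plain average of signed errors with the average absolute distance of
\eqref{meanabserror}:

```python
import numpy as np

# made-up deviations y_i - y_est_i, scattered above and below the line
errors = np.array([0.5, -0.6, 0.4, -0.3])

signed = np.mean(errors)            # positive and negative deviations cancel
absolute = np.mean(np.abs(errors))  # mean absolute distance

print(signed)    # close to zero, although no single deviation is small
print(absolute)  # about 0.45, reflecting the true scatter
```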
should be small. Commonly, the \enterm{mean squared distance} or
\enterm[square error!mean]{mean square error}
(\determ[quadratischer Fehler!mittlerer]{mittlerer quadratischer
  Fehler})
\begin{equation}
  \label{meansquarederror}
  f_{mse}(\{(x_i, y_i)\}|\{y^{est}_i\}) = \frac{1}{N} \sum_{i=1}^N (y_i - y^{est}_i)^2
\end{equation}
is used (\figref{leastsquareerrorfig}). Similar to the absolute
distance, the square of the error, $(y_i - y_i^{est})^2$, is always
positive and thus error values do not cancel out. The square further
punishes large deviations over small deviations.
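The mean squared error of \eqref{meansquarederror} translates directly
into code. Here is a Python sketch (the chapter's exercises themselves
use MATLAB files such as \file{meanSquareError.m}); the data values
are made up for illustration:

```python
import numpy as np

def mean_square_error(y, y_est):
    """Mean of the squared deviations (y_i - y_est_i)^2."""
    y, y_est = np.asarray(y, dtype=float), np.asarray(y_est, dtype=float)
    return np.mean((y - y_est) ** 2)

# a perfect prediction gives zero error
print(mean_square_error([1.0, 2.0, 3.0], [1.0, 2.0, 3.0]))  # 0.0
# a single large deviation dominates the error because of the square
print(mean_square_error([1.0, 2.0, 3.0], [1.1, 1.9, 5.0]))
```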
\begin{exercise}{meanSquareError.m}{}\label{mseexercise}%
  Implement a function \varcode{meanSquareError()} that calculates the