[regression] started to improve the chapter

Jan Benda 2019-12-10 17:57:06 +01:00
parent 5484a7136e
commit 357477e17f
2 changed files with 35 additions and 27 deletions


@@ -226,7 +226,7 @@
% \determ[index entry]{<german term>}
% typeset the term in quotes and add it (or the optional argument) to
% the german index.
\newcommand{\determ}[2][]{\selectlanguage{german}``#2''\selectlanguage{english}\ifthenelse{\equal{#1}{}}{\protect\sindex[determ]{#2}}{\protect\sindex[determ]{#1}}}
\newcommand{\file}[1]{\texttt{#1}}


@@ -1,16 +1,22 @@
\chapter{Optimization and gradient descent}
\exercisechapter{Optimization and gradient descent}
Optimization problems arise in many different contexts. For example,
to understand the behavior of a given system, the system is probed
with a range of input signals and the resulting responses are
measured. This input-output relation can be described by a model. Such
a model can be a simple function that maps the input signals to the
corresponding responses, a filter, or a system of differential
equations. In any case, the model has a few parameters that specify
how input and output signals are related. Which combination of
parameter values is best suited to describe the input-output relation?
The process of finding the best parameter values is an optimization
problem. For a simple parameterized function that maps input to output
values, this is the special case of a \enterm{curve fitting} problem,
where the average distance between the curve and the measured response
values is minimized. One basic numerical method for such optimization
problems is the so-called gradient descent, which is introduced in
this chapter.
\begin{figure}[t]
\includegraphics[width=1\textwidth]{lin_regress}\hfill
@@ -22,20 +28,21 @@ will be introduced in this chapter.
y-axis (right panel).}\label{linregressiondatafig}
\end{figure}
The data plotted in \figref{linregressiondatafig} suggest a linear
relation between input and output of the system. We thus assume that a
straight line
\[y = f(x; m, b) = m\cdot x + b \]
is an appropriate model to describe the system. The line has two free
parameters, the slope $m$ and the $y$-intercept $b$. We need to find
values for the slope and the intercept that best describe the measured
data. In this chapter we use this example to illustrate the gradient
descent method and show how it can be used to find the combination of
slope and intercept that best describes the system.
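
For a given choice of the slope and the intercept, the model
immediately yields a prediction for every input value. The following
minimal sketch illustrates this; the variable names and the example
values are chosen for illustration only and do not refer to the data
shown in \figref{linregressiondatafig}:
\begin{lstlisting}
% a few input values at which the system was probed (example values):
x = [0.5, 1.3, 2.1, 2.9, 3.8];
% a first guess for the slope and the intercept:
m = 0.75;
b = 1.0;
% predictions of the straight-line model y = m*x + b:
yest = m * x + b;
\end{lstlisting}
Different choices of \varcode{m} and \varcode{b} result in different
predictions \varcode{yest}, and we need a criterion that tells us
which choice describes the measured data best.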
\section{The error function --- mean squared error}
Before the optimization can be done we need to specify what exactly is
considered an optimal fit. In our example we search for the parameter
combination that describes the relation between $x$ and $y$ best. What
is meant by this? Each input $x_i$ leads to an output $y_i$ and for each
@@ -55,22 +62,23 @@ would cancel out and then sum up to values close to zero. A better
approach is to consider the absolute value of the distance
$\sum_{i=1}^N |y_i - y^{est}_i|$. The total error can only be small if
all deviations are indeed small no matter if they are above or below
the predicted line. Instead of the sum we could also require that the
\emph{average}
\begin{equation}
\label{meanabserror}
f_{dist}(\{(x_i, y_i)\}|\{y^{est}_i\}) = \frac{1}{N} \sum_{i=1}^N |y_i - y^{est}_i|
\end{equation}
should be small. Commonly, the \enterm{mean squared distance} or
\enterm[square error!mean]{mean square error} (\determ[quadratischer
Fehler!mittlerer]{mittlerer quadratischer Fehler})
\begin{equation}
\label{meansquarederror}
f_{mse}(\{(x_i, y_i)\}|\{y^{est}_i\}) = \frac{1}{N} \sum_{i=1}^N (y_i - y^{est}_i)^2
\end{equation}
is used (\figref{leastsquareerrorfig}). Similar to the absolute
distance, the square of the error, $(y_i - y_i^{est})^2$, is always
positive and thus error values do not cancel out. The square further
punishes large deviations more strongly than small ones.
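
To illustrate how such an error measure is evaluated, the following
minimal sketch computes the average absolute distance between some
measured values and the corresponding model predictions (the variable
names \varcode{y} and \varcode{yest} and the example values are chosen
for illustration only):
\begin{lstlisting}
% measured outputs and corresponding model predictions (example values):
y    = [2.1, 3.0, 3.4, 4.3, 4.9];
yest = [1.9, 2.8, 3.7, 4.1, 5.2];
% average absolute distance between data and prediction:
meanabsdist = mean(abs(y - yest));
\end{lstlisting}
Replacing the absolute value by the square yields the mean squared
error, which is the subject of the following exercise.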
\begin{exercise}{meanSquareError.m}{}\label{mseexercise}%
Implement a function \varcode{meanSquareError()} that calculates the