[regression] started to improve the chapter

Jan Benda 2019-12-10 17:57:06 +01:00
parent 5484a7136e
commit 357477e17f
2 changed files with 35 additions and 27 deletions

View File

@@ -226,7 +226,7 @@
 % \determ[index entry]{<german term>}
 % typeset the term in quotes and add it (or the optional argument) to
 % the german index.
-\newcommand{\determ}[2][]{``#2''\ifthenelse{\equal{#1}{}}{\protect\sindex[determ]{#2}}{\protect\sindex[determ]{#1}}}
+\newcommand{\determ}[2][]{\selectlanguage{german}``#2''\selectlanguage{english}\ifthenelse{\equal{#1}{}}{\protect\sindex[determ]{#2}}{\protect\sindex[determ]{#1}}}
 \newcommand{\file}[1]{\texttt{#1}}
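
For reference, a minimal usage sketch of the revised macro (the sentence fragment is hypothetical; the index entry is one that occurs later in this commit). The added \selectlanguage pair switches to German hyphenation for the quoted term and back to English afterwards:

    % hypothetical usage with the optional index-entry argument:
    ... minimizing the \determ[quadratischer Fehler!mittlerer]{mittlerer
    quadratischer Fehler} of a fit ...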

View File

@@ -1,16 +1,22 @@
 \chapter{Optimization and gradient descent}
 \exercisechapter{Optimization and gradient descent}
 
-To understand the behaviour of a given system sciences often probe the
-system with input signals and then try to explain the responses
-through a model. Typically the model has a few parameter that specify
-how input and output signals are related. The question arises which
-combination of paramters are best suited to describe the relation of
-in- and output. The process of finding the best paramter set is called
-optimization or also \enterm{curve fitting}. One rather generic
-approach to the problem is the so called gradient descent method which
-will be introduced in this chapter.
+Optimization problems arise in many different contexts. For example,
+to understand the behavior of a given system, the system is probed
+with a range of input signals and then the resulting responses are
+measured. This input-output relation can be described by a model. Such
+a model can be a simple function that maps the input signals to
+corresponding responses; it can be a filter, or a system of
+differential equations. In any case, the model has some parameters
+that specify how input and output signals are related. Which
+combination of parameter values is best suited to describe the
+input-output relation? The process of finding the best parameter
+values is an optimization problem. For a simple parameterized function
+that maps input to output values, this is a special case of a
+\enterm{curve fitting} problem, where the average distance between the
+curve and the response values is minimized. One basic numerical method
+for such optimization problems is the so-called gradient descent,
+which is introduced in this chapter.
 
 \begin{figure}[t]
 \includegraphics[width=1\textwidth]{lin_regress}\hfill
@@ -22,20 +28,21 @@ will be introduced in this chapter.
 y-axis (right panel).}\label{linregressiondatafig}
 \end{figure}
 
-The data plotted in \figref{linregressiondatafig} suggests a linear
-relation between input and output of the invesitagted system. We thus
-assume that the linear equation
-\[y = f(x; m, b) = m\cdot x + b \] is an appropriate model to describe the system.
-The linear equation has two free paramteter $m$ and $b$ which denote
-the slope and the y-intercept, respectively. In this chapter we will
-use this example to illustrate the methods behind several curve
-fitting approaches. We will apply this method to find the combination
-of slope and intercept that best describes the system.
+The data plotted in \figref{linregressiondatafig} suggest a linear
+relation between input and output of the system. We thus assume that a
+straight line
+\[y = f(x; m, b) = m\cdot x + b \]
+is an appropriate model to describe the system. The line has two free
+parameters, the slope $m$ and the $y$-intercept $b$. We need to find
+values for the slope and the intercept that best describe the measured
+data. In this chapter we use this example to illustrate the gradient
+descent method and how it can be used to find the combination of
+slope and intercept that best describes the system.
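
To make the model concrete, here is a minimal MATLAB sketch of how the straight line maps inputs to predicted outputs (parameter values and input range are made up for illustration; they are not taken from the chapter's data):

    m = 0.75;            % hypothetical slope
    b = -40.0;           % hypothetical intercept
    x = 0:1:120;         % hypothetical input values
    yest = m * x + b;    % predicted output y = f(x; m, b) for each input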
 
 \section{The error function --- mean squared error}
 
-Before the optimization can be done we need to specify what is
+Before the optimization can be done we need to specify what exactly is
 considered an optimal fit. In our example we search the parameter
 combination that describes the relation of $x$ and $y$ best. What is
 meant by this? Each input $x_i$ leads to an output $y_i$ and for each
@@ -55,22 +62,23 @@ would cancel out and then sum up to values close to zero. A better
 approach is to consider the absolute value of the distance
 $\sum_{i=1}^N |y_i - y^{est}_i|$. The total error can only be small if
 all deviations are indeed small no matter if they are above or below
-the prediced line. Instead of the sum we could also ask for the
+the predicted line. Instead of the sum we could also ask for the
 \emph{average}
 \begin{equation}
 \label{meanabserror}
 f_{dist}(\{(x_i, y_i)\}|\{y^{est}_i\}) = \frac{1}{N} \sum_{i=1}^N |y_i - y^{est}_i|
 \end{equation}
-should be small. Commonly, the \enterm{mean squared distance} oder
-\enterm[square error!mean]{mean square error} (\determ[quadratischer Fehler!mittlerer]{mittlerer quadratischer Fehler})
+should be small. Commonly, the \enterm{mean squared distance} or
+\enterm[square error!mean]{mean square error} (\determ[quadratischer
+Fehler!mittlerer]{mittlerer quadratischer Fehler})
 \begin{equation}
 \label{meansquarederror}
 f_{mse}(\{(x_i, y_i)\}|\{y^{est}_i\}) = \frac{1}{N} \sum_{i=1}^N (y_i - y^{est}_i)^2
 \end{equation}
 is used (\figref{leastsquareerrorfig}). Similar to the absolute
-distance, the square of the error($(y_i - y_i^{est})^2$) is always
-positive error values do not cancel out. The square further punishes
-large deviations.
+distance, the square of the error, $(y_i - y^{est}_i)^2$, is always
+positive and thus error values do not cancel out. The square further
+punishes large deviations more strongly than small ones.
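
Both distance measures translate directly into MATLAB. A minimal sketch, assuming column vectors y of measured outputs and yest of the corresponding model predictions (the variable names are our own, not prescribed by the exercise):

    fdist = mean(abs(y - yest));   % mean absolute distance
    fmse  = mean((y - yest).^2);   % mean squared error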
 \begin{exercise}{meanSquareError.m}{}\label{mseexercise}%
 Implement a function \varcode{meanSquareError()} that calculates the