[regression] started to improve the chapter
This commit is contained in:
parent 5484a7136e
commit 357477e17f
@@ -226,7 +226,7 @@
 % \determ[index entry]{<german term>}
 % typeset the term in quotes and add it (or the optional argument) to
 % the german index.
-\newcommand{\determ}[2][]{``#2''\ifthenelse{\equal{#1}{}}{\protect\sindex[determ]{#2}}{\protect\sindex[determ]{#1}}}
+\newcommand{\determ}[2][]{\selectlanguage{german}``#2''\selectlanguage{english}\ifthenelse{\equal{#1}{}}{\protect\sindex[determ]{#2}}{\protect\sindex[determ]{#1}}}
 
 \newcommand{\file}[1]{\texttt{#1}}
 
@@ -1,16 +1,22 @@
 \chapter{Optimization and gradient descent}
 \exercisechapter{Optimization and gradient descent}
 
-To understand the behaviour of a given system sciences often probe the
-system with input signals and then try to explain the responses
-through a model. Typically the model has a few parameter that specify
-how input and output signals are related. The question arises which
-combination of paramters are best suited to describe the relation of
-in- and output. The process of finding the best paramter set is called
-optimization or also \enterm{curve fitting}. One rather generic
-approach to the problem is the so called gradient descent method which
-will be introduced in this chapter.
+Optimization problems arise in many different contexts. For example,
+to understand the behavior of a given system, the system is probed
+with a range of input signals and then the resulting responses are
+measured. This input-output relation can be described by a model. Such
+a model can be a simple function that maps the input signals to
+corresponding responses, it can be a filter, or a system of
+differential equations. In any case, the model has some parameters that
+specify how input and output signals are related. Which combination
+of parameter values is best suited to describe the input-output
+relation? The process of finding the best parameter values is an
+optimization problem. For a simple parameterized function that maps
+input to output values, this is the special case of a \enterm{curve
+fitting} problem, where the average distance between the curve and
+the response values is minimized. One basic numerical method used for
+such optimization problems is the so-called gradient descent, which is
+introduced in this chapter.
 
 \begin{figure}[t]
 \includegraphics[width=1\textwidth]{lin_regress}\hfill
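The gradient descent idea named in the new introduction can be sketched in a few lines. This is a Python illustration only (the course exercises use MATLAB), and the cost function, start value, and learning rate are made-up examples, not taken from the chapter:

```python
# Minimal sketch of gradient descent: repeatedly step against the
# gradient of a cost function until we settle near a minimum.

def gradient_descent(grad, x0, learning_rate=0.1, steps=100):
    """Minimize a 1-D function given its gradient via fixed-size steps."""
    x = x0
    for _ in range(steps):
        x -= learning_rate * grad(x)
    return x

# example: minimize f(x) = (x - 3)^2, whose gradient is 2*(x - 3)
x_min = gradient_descent(lambda x: 2.0 * (x - 3.0), x0=0.0)
print(x_min)  # converges toward the minimum at x = 3
```

Each iteration moves the parameter a small step downhill; the same update rule, applied to the slope and intercept of the line fit below, is what the chapter develops in detail.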
@@ -22,20 +28,21 @@ will be introduced in this chapter.
 y-axis (right panel).}\label{linregressiondatafig}
 \end{figure}
 
-The data plotted in \figref{linregressiondatafig} suggests a linear
-relation between input and output of the invesitagted system. We thus
-assume that the linear equation
-\[y = f(x; m, b) = m\cdot x + b \] is an appropriate model to describe the system.
-The linear equation has two free paramteter $m$ and $b$ which denote
-the slope and the y-intercept, respectively. In this chapter we will
-use this example to illustrate the methods behind several curve
-fitting approaches. We will apply this method to find the combination
-of slope and intercept that best describes the system.
+The data plotted in \figref{linregressiondatafig} suggest a linear
+relation between input and output of the system. We thus assume that a
+straight line
+\[y = f(x; m, b) = m\cdot x + b \]
+is an appropriate model to describe the system. The line has two free
+parameters, the slope $m$ and the $y$-intercept $b$. We need to find
+values for the slope and the intercept that best describe the measured
+data. In this chapter we use this example to illustrate the gradient
+descent and how this method can be used to find the combination of
+slope and intercept that best describes the system.
 
 
 \section{The error function --- mean squared error}
 
-Before the optimization can be done we need to specify what is
+Before the optimization can be done we need to specify what exactly is
 considered an optimal fit. In our example we search the parameter
 combination that describes the relation of $x$ and $y$ best. What is
 meant by this? Each input $x_i$ leads to an output $y_i$ and for each
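The straight-line model $y = f(x; m, b) = m\cdot x + b$ from this hunk is simple enough to write out directly. A Python sketch (the data values and the candidate slope/intercept here are hypothetical, chosen only for illustration):

```python
def line(x, m, b):
    """Straight-line model y = f(x; m, b) = m*x + b
    with slope m and y-intercept b."""
    return m * x + b

# hypothetical inputs and one candidate parameter combination
xs = [0.0, 1.0, 2.0, 3.0]
m, b = 2.0, -1.0
y_est = [line(x, m, b) for x in xs]
print(y_est)  # [-1.0, 1.0, 3.0, 5.0]
```

Fitting means searching over $m$ and $b$ until the predicted `y_est` values come as close as possible to the measured outputs.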
@@ -55,22 +62,23 @@ would cancel out and then sum up to values close to zero. A better
 approach is to consider the absolute value of the distance
 $\sum_{i=1}^N |y_i - y^{est}_i|$. The total error can only be small if
 all deviations are indeed small no matter if they are above or below
-the prediced line. Instead of the sum we could also ask for the
+the predicted line. Instead of the sum we could also ask for the
 \emph{average}
 \begin{equation}
 \label{meanabserror}
 f_{dist}(\{(x_i, y_i)\}|\{y^{est}_i\}) = \frac{1}{N} \sum_{i=1}^N |y_i - y^{est}_i|
 \end{equation}
-should be small. Commonly, the \enterm{mean squared distance} oder
-\enterm[square error!mean]{mean square error} (\determ[quadratischer Fehler!mittlerer]{mittlerer quadratischer Fehler})
+should be small. Commonly, the \enterm{mean squared distance} or
+\enterm[square error!mean]{mean square error} (\determ[quadratischer
+Fehler!mittlerer]{mittlerer quadratischer Fehler})
 \begin{equation}
 \label{meansquarederror}
 f_{mse}(\{(x_i, y_i)\}|\{y^{est}_i\}) = \frac{1}{N} \sum_{i=1}^N (y_i - y^{est}_i)^2
 \end{equation}
 is used (\figref{leastsquareerrorfig}). Similar to the absolute
-distance, the square of the error($(y_i - y_i^{est})^2$) is always
-positive error values do not cancel out. The square further punishes
-large deviations.
+distance, the square of the error, $(y_i - y_i^{est})^2$, is always
+positive and thus error values do not cancel out. The square further
+punishes large deviations more strongly than small ones.
 
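The two error measures of equations \eqref{meanabserror} and \eqref{meansquarederror} translate directly into code. The exercise itself asks for a MATLAB \varcode{meanSquareError()}; this Python sketch only restates the formulas, with made-up data chosen to show how squaring punishes the one large deviation:

```python
def mean_abs_error(y, y_est):
    """Mean absolute distance: 1/N * sum_i |y_i - y_est_i|."""
    return sum(abs(yi - ei) for yi, ei in zip(y, y_est)) / len(y)

def mean_square_error(y, y_est):
    """Mean squared error: 1/N * sum_i (y_i - y_est_i)^2."""
    return sum((yi - ei) ** 2 for yi, ei in zip(y, y_est)) / len(y)

# made-up data with a single deviation of 2 in the last point
y = [1.0, 2.0, 3.0, 4.0]
y_est = [1.0, 2.0, 3.0, 2.0]
print(mean_abs_error(y, y_est))    # 0.5
print(mean_square_error(y, y_est)) # 1.0
```

The single deviation of 2 contributes 2 to the absolute measure but 4 to the squared measure, which is why the mean squared error weights large outliers more heavily.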
 \begin{exercise}{meanSquareError.m}{}\label{mseexercise}%
 Implement a function \varcode{meanSquareError()}, that calculates the