diff --git a/header.tex b/header.tex
index a9866c7..8a8f55f 100644
--- a/header.tex
+++ b/header.tex
@@ -226,7 +226,7 @@
 % \determ[index entry]{}
 % typeset the term in quotes and add it (or the optional argument) to
 % the german index.
-\newcommand{\determ}[2][]{``#2''\ifthenelse{\equal{#1}{}}{\protect\sindex[determ]{#2}}{\protect\sindex[determ]{#1}}}
+\newcommand{\determ}[2][]{\selectlanguage{german}``#2''\selectlanguage{english}\ifthenelse{\equal{#1}{}}{\protect\sindex[determ]{#2}}{\protect\sindex[determ]{#1}}}
 
 \newcommand{\file}[1]{\texttt{#1}}
 
diff --git a/regression/lecture/regression.tex b/regression/lecture/regression.tex
index 7e020ae..1111796 100644
--- a/regression/lecture/regression.tex
+++ b/regression/lecture/regression.tex
@@ -1,16 +1,22 @@
 \chapter{Optimization and gradient descent}
 \exercisechapter{Optimization and gradient descent}
 
-To understand the behaviour of a given system sciences often probe the
-system with input signals and then try to explain the responses
-through a model. Typically the model has a few parameter that specify
-how input and output signals are related. The question arises which
-combination of paramters are best suited to describe the relation of
-in- and output. The process of finding the best paramter set is called
-optimization or also \enterm{curve fitting}. One rather generic
-approach to the problem is the so called gradient descent method which
-will be introduced in this chapter.
-
+Optimization problems arise in many different contexts. For example,
+to understand the behavior of a given system, the system is probed
+with a range of input signals and the resulting responses are
+measured. This input-output relation can be described by a model.
+Such a model can be a simple function that maps the input signals to
+corresponding responses, a filter, or a system of differential
+equations. In any case, the model has some parameters that specify
+how input and output signals are related. Which combination of
+parameter values is best suited to describe the input-output
+relation? The process of finding the best parameter values is an
+optimization problem. For a simple parameterized function that maps
+input to output values, this is a special case of a \enterm{curve
+  fitting} problem, where the average distance between the curve and
+the response values is minimized. One basic numerical method for
+such optimization problems is the so-called gradient descent, which
+is introduced in this chapter.
 
 \begin{figure}[t]
   \includegraphics[width=1\textwidth]{lin_regress}\hfill
@@ -22,20 +28,21 @@ will be introduced in this chapter.
   y-axis (right panel).}\label{linregressiondatafig}
 \end{figure}
 
-The data plotted in \figref{linregressiondatafig} suggests a linear
-relation between input and output of the invesitagted system. We thus
-assume that the linear equation
-\[y = f(x; m, b) = m\cdot x + b \] is an appropriate model to describe the system.
-The linear equation has two free paramteter $m$ and $b$ which denote
-the slope and the y-intercept, respectively. In this chapter we will
-use this example to illustrate the methods behind several curve
-fitting approaches. We will apply this method to find the combination
-of slope and intercept that best describes the system.
+The data plotted in \figref{linregressiondatafig} suggest a linear
+relation between input and output of the system. We thus assume that a
+straight line
+\[y = f(x; m, b) = m\cdot x + b \]
+is an appropriate model to describe the system. The line has two free
+parameters, the slope $m$ and the $y$-intercept $b$. We need to find
+values for the slope and the intercept that best describe the measured
+data. In this chapter we use this example to illustrate gradient
+descent and how this method can be used to find the combination of
+slope and intercept that best describes the system.
 
 
 \section{The error function --- mean squared error}
 
-Before the optimization can be done we need to specify what is
+Before the optimization can be done we need to specify what exactly is
 considered an optimal fit. In our example we search the parameter
 combination that describe the relation of $x$ and $y$ best. What is
 meant by this? Each input $x_i$ leads to an output $y_i$ and for each
@@ -55,22 +62,23 @@ would cancel out and then sum up to values close to zero. A better
 approach is to consider the absolute value of the distance
 $\sum_{i=1}^N |y_i - y^{est}_i|$. The total error can only be small if
 all deviations are indeed small no matter if they are above or below
-the prediced line. Instead of the sum we could also ask for the
+the predicted line. Instead of the sum we could also ask for the
 \emph{average}
 \begin{equation}
   \label{meanabserror}
   f_{dist}(\{(x_i, y_i)\}|\{y^{est}_i\}) = \frac{1}{N} \sum_{i=1}^N |y_i - y^{est}_i|
 \end{equation}
-should be small. Commonly, the \enterm{mean squared distance} oder
-\enterm[square error!mean]{mean square error} (\determ[quadratischer Fehler!mittlerer]{mittlerer quadratischer Fehler})
+to be small. Commonly, the \enterm{mean squared distance} or
+\enterm[square error!mean]{mean square error} (\determ[quadratischer
+Fehler!mittlerer]{mittlerer quadratischer Fehler})
 \begin{equation}
   \label{meansquarederror}
   f_{mse}(\{(x_i, y_i)\}|\{y^{est}_i\}) = \frac{1}{N} \sum_{i=1}^N (y_i - y^{est}_i)^2
 \end{equation}
 is used (\figref{leastsquareerrorfig}). Similar to the absolute
-distance, the square of the error($(y_i - y_i^{est})^2$) is always
-positive error values do not cancel out. The square further punishes
-large deviations.
+distance, the square of the error, $(y_i - y_i^{est})^2$, is always
+positive and thus error values do not cancel out. In addition, the
+square punishes large deviations more than small ones.
 
 \begin{exercise}{meanSquareError.m}{}\label{mseexercise}%
   Implement a function \varcode{meanSquareError()}, that calculates the
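
A minimal MATLAB sketch of the meanSquareError() function requested in
the exercise (its full text is truncated above, so the signature,
taking the measured values y and the model predictions y_est, is an
assumption and not the reference solution):

    % meanSquareError.m: mean squared error between measured and
    % predicted values, following the definition of f_mse above.
    function mse = meanSquareError(y, y_est)
        mse = mean((y - y_est).^2);
    end

A usage sketch with the straight-line model of the chapter, on
simulated data with hypothetical parameter values:

    m = 0.75; b = -40.0;                   % assumed slope and intercept
    x = 120.0 * rand(100, 1);              % simulated input values
    y = m * x + b + 15.0 * randn(100, 1);  % simulated noisy responses
    y_est = m * x + b;                     % predictions of the line
    mse = meanSquareError(y, y_est)        % small for a good fit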