@@ -52,40 +52,43 @@ considered an optimal fit. In our example we search the parameter
combination that describes the relation of $x$ and $y$ best. What is
meant by this? Each input $x_i$ leads to a measured output $y_i$ and
for each $x_i$ there is a \emph{prediction} or \emph{estimation}
$y^{est}(x_i)$ of the output value by the model. At each $x_i$,
estimation and measurement have a distance or error $y_i -
y^{est}(x_i)$. In our example the estimation is given by the equation
$y^{est}(x_i) = f(x_i;m,b)$. The best fitting model with parameters
$m$ and $b$ is the one that minimizes the distances between
observation $y_i$ and estimation $y^{est}(x_i)$
(\figref{leastsquareerrorfig}).

As a first guess we could simply minimize the sum $\sum_{i=1}^N (y_i -
y^{est}(x_i))$. This approach, however, will not work, since a minimal
sum can also be achieved if half of the measurements are above and the
other half below the predicted line. Positive and negative errors
would cancel out and then sum up to values close to zero. A better
approach is to sum over the absolute values of the distances:
$\sum_{i=1}^N |y_i - y^{est}(x_i)|$. This sum can only be small if all
deviations are indeed small, no matter if they are above or below the
predicted line. Instead of the sum we could also take the average
\begin{equation}
  \label{meanabserror}
  f_{dist}(\{(x_i, y_i)\}|\{y^{est}(x_i)\}) = \frac{1}{N} \sum_{i=1}^N |y_i - y^{est}(x_i)|
\end{equation}
Instead of the averaged absolute errors, the \enterm[mean squared
error]{mean squared error} (\determ[quadratischer
Fehler!mittlerer]{mittlerer quadratischer Fehler})
\begin{equation}
  \label{meansquarederror}
  f_{mse}(\{(x_i, y_i)\}|\{y^{est}(x_i)\}) = \frac{1}{N} \sum_{i=1}^N (y_i - y^{est}(x_i))^2
\end{equation}
is commonly used (\figref{leastsquareerrorfig}). Similar to the
absolute distance, the square of the errors, $(y_i - y^{est}(x_i))^2$,
is always positive, and thus positive and negative error values do not
cancel each other out. In addition, the square punishes large
deviations more strongly than small ones. In
chapter~\ref{maximumlikelihoodchapter} we show that minimizing the
mean squared error is equivalent to maximizing the likelihood that the
observations originate from the model, if the data are normally
distributed around the model prediction.
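
The effect of these error measures can be illustrated with a few
lines of code. The following sketch uses made-up data scattered
around a known line (all names and values are just for illustration):
\begin{verbatim}
% hypothetical data around the line y = 2x + 1:
x = 0:0.1:5;
y = 2.0*x + 1.0 + randn(size(x));  % noisy measurements
yest = 2.0*x + 1.0;                % predictions of the model

sum(y - yest)        % signed errors almost cancel to zero
mean(abs(y - yest))  % mean absolute error stays clearly positive
mean((y - yest).^2)  % mean squared error, emphasizes large deviations
\end{verbatim}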

\begin{exercise}{meanSquaredErrorLine.m}{}\label{mseexercise}%
  Given a vector of observations \varcode{y} and a vector with the

@@ -98,20 +101,13 @@ deviations over small deviations.

\section{Objective function}

The mean squared error is a so-called \enterm{objective function} or
\enterm{cost function} (\determ{Kostenfunktion}). A cost function
assigns to a model prediction $\{y^{est}(x_i)\}$ for a given data set
$\{(x_i, y_i)\}$ a single scalar value that we want to minimize. Here
we aim to adapt the model parameters to minimize the mean squared
error \eqref{meansquarederror}. In general, the \enterm{cost function}
can be any function that describes the quality of the fit by mapping
the data and the predictions to a single scalar value.
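
As a sketch, such a mapping can be written as a function that takes
the measurements and the predictions and returns a single number.
Assuming the vectors \varcode{y} and \varcode{yest} from above, the
mean squared error is just one possible choice:
\begin{verbatim}
% a cost function maps data and predictions to a single scalar value:
fcost = @(y, yest) mean((y - yest).^2);  % here: mean squared error
fcost(y, yest)                           % a single scalar value
\end{verbatim}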

\begin{figure}[t]
  \includegraphics[width=1\textwidth]{linear_least_squares}

@@ -123,41 +119,40 @@ minimizes the costs.
  \label{leastsquareerrorfig}
\end{figure}

Replacing $y^{est}$ in the mean squared error \eqref{meansquarederror}
with our model, the straight line \eqref{straightline}, the cost
function reads
\begin{eqnarray}
  f_{cost}(m,b|\{(x_i, y_i)\}) & = & \frac{1}{N} \sum_{i=1}^N (y_i - f(x_i;m,b))^2 \label{msefunc} \\
  & = & \frac{1}{N} \sum_{i=1}^N (y_i - m x_i - b)^2 \label{mseline}
\end{eqnarray}
The optimization process tries to find the slope $m$ and the intercept
$b$ such that the cost function is minimized. With the mean squared
error as the cost function, this optimization process is also called
the method of the \enterm{least square error} (\determ[quadratischer
Fehler!kleinster]{Methode der kleinsten Quadrate}).

\begin{exercise}{meanSquaredError.m}{}
  Implement the objective function \eqref{mseline} as a function
  \varcode{meanSquaredError()}. The function takes three
  arguments. The first is a vector of $x$-values and the second
  contains the measurements $y$ for each value of $x$. The third
  argument is a 2-element vector that contains the values of the
  parameters \varcode{m} and \varcode{b}. The function returns the
  mean squared error.
\end{exercise}
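
A minimal implementation of this exercise could look like the
following sketch (the actual \varcode{meanSquaredError.m} may differ
in details):
\begin{verbatim}
function mse = meanSquaredError(x, y, parameter)
% Mean squared error between data pairs (x, y) and a straight
% line with slope parameter(1) and intercept parameter(2):
    mse = mean((y - (parameter(1)*x + parameter(2))).^2);
end
\end{verbatim}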

\section{Error surface}
For each combination of the two parameters $m$ and $b$ of the model we
can use \eqnref{mseline} to calculate the corresponding value of the
cost function. The cost function $f_{cost}(m,b|\{(x_i, y_i)\})$ is a
function $f_{cost}(m,b)$ that maps the parameter values $m$ and $b$
to a scalar error value. The error values describe a landscape over
the $m$-$b$ plane, the error surface, that can be illustrated
graphically using a 3-d surface plot. $m$ and $b$ are plotted on the
$x$- and $y$-axis while the third dimension indicates the error value
(\figref{errorsurfacefig}).

\begin{figure}[t]
  \includegraphics[width=0.75\textwidth]{error_surface}

@@ -176,8 +171,8 @@ the error value (\figref{errorsurfacefig}).
  calculate the mean squared error between the data and straight lines
  for a range of slopes and intercepts using the
  \varcode{meanSquaredError()} function from the previous exercise.
  Illustrate the error surface using the \code{surface()} function.
  Consult the documentation to find out how to use \code{surface()}.
\end{exercise}
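
A possible sketch of this exercise, assuming data vectors \varcode{x}
and \varcode{y} and the \varcode{meanSquaredError()} function from
above (the grid ranges are arbitrary choices):
\begin{verbatim}
% sample the cost function on a grid of slopes and intercepts:
slopes = -5.0:0.25:5.0;
intercepts = -30.0:1.0:30.0;
errorsurface = zeros(length(slopes), length(intercepts));
for i = 1:length(slopes)
    for j = 1:length(intercepts)
        errorsurface(i,j) = meanSquaredError(x, y, ...
            [slopes(i), intercepts(j)]);
    end
end
% rows of errorsurface vary with slope, columns with intercept:
[B, M] = meshgrid(intercepts, slopes);
surface(B, M, errorsurface);
xlabel('intercept b');
ylabel('slope m');
zlabel('mean squared error');
view(3);
\end{verbatim}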

By looking at the error surface we can directly see the position of

@@ -185,19 +180,21 @@ the minimum and thus estimate the optimal parameter combination. How
can we use the error surface to guide an automatic optimization
process?

The obvious approach would be to calculate the error surface for any
combination of slope and intercept values and then find the position
of the minimum using the \code{min} function. This approach, however,
has several disadvantages: (i) it is computationally very expensive
to calculate the error for each parameter combination. The number of
combinations increases exponentially with the number of free
parameters (also known as the ``curse of dimensionality''); sampling
each of only five parameters at $100$ values already requires
$100^5 = 10^{10}$ evaluations of the cost function. (ii) the accuracy
with which the best parameters can be estimated is limited by the
resolution used to sample the parameter space. The coarser the
parameters are sampled, the less precise is the obtained position of
the minimum.
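
For our two parameters, this brute-force search could be sketched
like this, based on the error surface computed above:
\begin{verbatim}
% find the grid position of the smallest error value:
[minerror, minindex] = min(errorsurface(:));
[i, j] = ind2sub(size(errorsurface), minindex);
bestslope = slopes(i);
bestintercept = intercepts(j);
\end{verbatim}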

So we need a different approach. We want a procedure that finds the
minimum of the cost function with a minimal number of computations and
to arbitrary precision.

\begin{ibox}[t]{\label{differentialquotientbox}Difference quotient and derivative}
  \includegraphics[width=0.33\textwidth]{derivative}

@@ -308,9 +305,9 @@ choose the opposite direction.

\begin{exercise}{meanSquaredGradient.m}{}\label{gradientexercise}%
  Implement a function \varcode{meanSquaredGradient()} that takes the
  $x$- and $y$-data and the set of parameters $(m, b)$ of a straight
  line as a two-element vector as input arguments. The function should
  return the gradient at the position $(m, b)$ as a vector with two
  elements.
\end{exercise}
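
One way to sketch such a function is to estimate the two partial
derivatives by difference quotients with a small step size, using the
\varcode{meanSquaredError()} function from above (the step size
\varcode{h} is an arbitrary small value; the actual
\varcode{meanSquaredGradient.m} may differ):
\begin{verbatim}
function gradmse = meanSquaredGradient(x, y, parameter)
% Estimate the gradient of the mean squared error at the
% position parameter = [m, b] via difference quotients:
    h = 1e-7;  % small step for the difference quotients
    mse = meanSquaredError(x, y, parameter);
    partialm = (meanSquaredError(x, y, parameter + [h, 0]) - mse)/h;
    partialb = (meanSquaredError(x, y, parameter + [0, h]) - mse)/h;
    gradmse = [partialm, partialb];
end
\end{verbatim}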

@@ -359,7 +356,7 @@ distance between the red dots in \figref{gradientdescentfig}) is
large.

\begin{figure}[t]
  \includegraphics[width=0.45\textwidth]{gradient_descent}
  \titlecaption{Gradient descent.}{The algorithm starts at an
    arbitrary position. At each point the gradient is estimated and
    the position is updated as long as the length of the gradient is

@@ -376,7 +373,7 @@ large.
  \item Plot the error values as a function of the iterations, the
    number of optimization steps.
  \item Plot the measured data together with the best fitting straight line.
  \end{enumerate}\vspace{-4.5ex}
\end{exercise}
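
Such a gradient descent could be sketched as follows, assuming the
data vectors \varcode{x} and \varcode{y} and the functions from the
previous exercises (start position, update factor \varcode{epsilon},
and threshold are arbitrary choices):
\begin{verbatim}
position = [-2.0, 10.0];  % arbitrary start values for m and b
epsilon = 0.01;           % factor scaling the parameter updates
errors = [];              % error value at each iteration
gradient = [1.0, 1.0];    % initial dummy value
while norm(gradient) > 0.1
    errors(end+1) = meanSquaredError(x, y, position);
    gradient = meanSquaredGradient(x, y, position);
    position = position - epsilon*gradient;
end
plot(errors);             % error values versus iteration number
\end{verbatim}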