diff --git a/regression/lecture/regression-chapter.tex b/regression/lecture/regression-chapter.tex index 45fdfda..8798b2a 100644 --- a/regression/lecture/regression-chapter.tex +++ b/regression/lecture/regression-chapter.tex @@ -23,20 +23,23 @@ \item Fig 8.2 right: this should be a chi-squared distribution with one degree of freedom! \end{itemize} -\subsection{Linear fits} +\subsection{New chapter: non-linear fits} \begin{itemize} -\item Polyfit is easy: unique solution! $c x^2$ is also a linear fit. -\item Example for overfitting with polyfit of a high order (=number of data points) -\end{itemize} - - -\subsection{Non-linear fits} -\begin{itemize} -\item Example that illustrates the Nebenminima Problem (with error surface) +\item Move 8.7 to this new chapter. +\item Example that illustrates the problem of local minima (with error + surface). Maybe fitting data generated from $1/x$ with + $\exp(\lambda x)$ induces local minima. -\item You need initial values for the parameter! +\item You need initial values for the parameters! -\item Example that fitting gets harder the more parameter you have. +\item Example that fitting gets harder the more parameters you have. -\item Try to fix as many parameter before doing the fit. +\item Try to fix as many parameters before doing the fit. \item How to test the quality of a fit? Residuals. $\chi^2$ test. Run-test. +\item Important box: summary of fit how-tos. +\end{itemize} + +\subsection{New chapter: linear fits --- generalized linear models} +\begin{itemize} +\item Polyfit is easy: unique solution! $c x^3$ is also a linear fit.
+\item Example of \emph{overfitting} with a polyfit of high order (= number of data points) \end{itemize} diff --git a/regression/lecture/regression.tex b/regression/lecture/regression.tex index bfff7bb..de9630c 100644 --- a/regression/lecture/regression.tex +++ b/regression/lecture/regression.tex @@ -576,34 +576,36 @@ our tiger data-set (\figref{powergradientdescentfig}): \section{Fitting non-linear functions to data} -The gradient descent is an important numerical method for solving +The gradient descent is a basic numerical method for solving optimization problems. It is used to find the global minimum of an objective function. -Curve fitting is a common application for the gradient descent method. -For the case of fitting straight lines to data pairs, the error -surface (using the mean squared error) has exactly one clearly defined -global minimum. In fact, the position of the minimum can be -analytically calculated as shown in the next chapter. For linear -fitting problems numerical methods like the gradient descent are not -needed. +Curve fitting is a specific optimization problem and a common +application for the gradient descent method. For the case of fitting +straight lines to data pairs, the error surface (using the mean +squared error) has exactly one clearly defined global minimum. In +fact, the position of the minimum can be calculated analytically, as +shown in the next chapter. For linear fitting problems, numerical +methods like the gradient descent are not needed. Fitting problems that involve nonlinear functions of the parameters, e.g. the power law \eqref{powerfunc} or the exponential function -$f(x;\lambda) = e^{\lambda x}$, do not have an analytical solution for -the least squares. To find the least squares for such functions -numerical methods such as the gradient descent have to be applied.
- -The suggested gradient descent algorithm is quite fragile and requires -manually tuned values for $\epsilon$ and the threshold for terminating -the iteration. The algorithm can be improved in multiple ways to -converge more robustly and faster. For example one could adapt the -step size to the length of the gradient. These numerical tricks have -already been implemented in pre-defined functions. Generic -optimization functions such as \mcode{fminsearch()} have been -implemented for arbitrary objective functions, while the more -specialized function \mcode{lsqcurvefit()} is specifically designed -for optimizations in the least square error sense. +$f(t;\tau) = e^{-t/\tau}$, in general do not have an analytical +solution for the least squares. To find the least squares for such +functions, numerical methods such as the gradient descent have to be +applied. + +The suggested gradient descent algorithm requires manually tuned +values for $\epsilon$ and the threshold for terminating the iteration. +The algorithm can be improved in multiple ways to converge more +robustly and faster. Most importantly, $\epsilon$ is made dependent +on the changes of the gradient from one iteration to the next. These +and other numerical tricks have already been implemented in +pre-defined functions. Generic optimization functions such as +\mcode{fminsearch()} have been implemented for arbitrary objective +functions, while the more specialized function \mcode{lsqcurvefit()} +is specifically designed for optimizations in the least square error +sense. \begin{exercise}{plotlsqcurvefitpower.m}{} Use the \matlab-function \varcode{lsqcurvefit()} instead of @@ -626,5 +628,62 @@ for optimizations in the least square error sense. \end{important} +\section{Evolution as an optimization problem} + +Evolution is a biological implementation of an optimization +algorithm. The objective function is an organism's fitness. This needs +to be maximized (which is the same as minimizing the negative fitness).
+The parameters of this optimization problem are the many genes on +the DNA. This is a very high-dimensional optimization problem. By +cross-over and mutations, a population of a species moves through the +high-dimensional parameter space. Selection processes make sure that +only organisms with higher fitness pass on their genes to the next +generations. In this way the algorithm is not directed towards higher +fitness, in contrast to the gradient descent method. Rather, some +neighborhood of the parameter space is randomly probed. That way it is +even possible to escape a local maximum and find a potentially better +maximum. For this reason, \enterm{genetic algorithms} try to mimic +evolution in the context of high-dimensional optimization problems, in +particular with discrete parameter values. In biological evolution, +however, the objective function is not fixed. It may +change in time with changing abiotic and biotic environmental +conditions, making this a very complex but also interesting +optimization problem. + +How should a neuron or neural network be designed? As a particular +aspect of the general evolution of a species, this is a fundamental +question in the neurosciences. Maintaining a neural system is +costly. By their mere presence, neurons incur costs. They need to be +built and maintained, they occupy space and consume +resources. Equipping a neuron with more ion channels also incurs +costs. And neural activity makes it more costly to maintain +concentration gradients of ions. This all boils down to the +consumption of more ATP, the currency of metabolism. On the other +hand, each neuron provides some useful function. In the end, neurons +make the organism behave in a sensible way to increase the overall +fitness of the organism.
On the level of neurons this means that they should +faithfully represent and process behaviorally relevant sensory +stimuli, make a sensible decision, store some important memory, or +initiate and control movements in a directed way. Unfortunately, there +is a tradeoff. Better neural function usually involves higher +costs. More ion channels reduce intrinsic noise, which usually +improves the precision of neural responses. Higher neuronal activity +improves the quality of the encoding of sensory stimuli. More neurons +are required for more complex computations. And so on. + +Understanding why a neuronal system is designed in some specific way +requires understanding these tradeoffs as well. The number of neurons, +the number and types of ion channels, the length of axons, the number +of synapses, the way neurons are connected, etc. are all parameters +that could be optimized. For the objective function, the function of +the neurons and its dependence on these parameters needs to be +quantified, for example by measures from information or detection +theory. From these benefits the costs need to be subtracted. And +then one is interested in finding the maximum of the resulting +objective function. Maximization (or minimization) problems are not +only a tool for data analysis; rather, they are at the core of many --- +not only biological or neuroscientific --- problems. + + %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% \printsolutions
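As an editorial aside to this patch: the rewritten section describes fitting $f(t;\tau) = e^{-t/\tau}$ by gradient descent on the mean squared error, with a manually tuned step size $\epsilon$ and a termination threshold, before recommending MATLAB's \mcode{fminsearch()} and \mcode{lsqcurvefit()}. A minimal sketch of that basic algorithm, written in Python rather than MATLAB (the function name `fit_tau` and all default values are my own choices, not from the lecture material):

```python
import math

def fit_tau(t, y, tau0=1.0, eps=0.1, tol=1e-9, max_iter=10000):
    """Fit f(t; tau) = exp(-t/tau) to data pairs (t, y) by plain
    gradient descent on the mean squared error (MSE)."""
    tau = tau0
    for _ in range(max_iter):
        # dMSE/dtau = (2/n) * sum (f - y) * df/dtau, with df/dtau = f*t/tau^2
        grad = 0.0
        for ti, yi in zip(t, y):
            f = math.exp(-ti / tau)
            grad += 2.0 * (f - yi) * f * ti / tau**2
        grad /= len(t)
        step = eps * grad
        tau -= step
        if abs(step) < tol:  # terminate once the update becomes tiny
            break
    return tau

# noise-free data generated with time constant tau = 2.0
ts = [0.5 * i for i in range(10)]
ys = [math.exp(-ti / 2.0) for ti in ts]
print(fit_tau(ts, ys))  # close to 2.0
```

This mirrors the fragility the text mentions: the fixed $\epsilon$ and threshold must be tuned by hand, which is why the chapter points to the pre-defined, more robust library functions instead.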