From c2e4d4e40cf24bc697f05989fe996070a544bd41 Mon Sep 17 00:00:00 2001
From: Jan Benda <jan.benda@uni-tuebingen.de>
Date: Sun, 20 Dec 2020 21:24:46 +0100
Subject: [PATCH] [regression] note on evolution

---
 regression/lecture/regression-chapter.tex |  23 ++---
 regression/lecture/regression.tex         | 103 +++++++++++++++++-----
 2 files changed, 94 insertions(+), 32 deletions(-)

diff --git a/regression/lecture/regression-chapter.tex b/regression/lecture/regression-chapter.tex
index 45fdfda..8798b2a 100644
--- a/regression/lecture/regression-chapter.tex
+++ b/regression/lecture/regression-chapter.tex
@@ -23,20 +23,23 @@
 \item Fig 8.2 right: this should be a chi-squared distribution with one degree of freedom!
 \end{itemize}
 
-\subsection{Linear fits}
+\subsection{New chapter: non-linear fits}
 \begin{itemize}
-\item Polyfit is easy: unique solution! $c x^2$ is also a linear fit.
-\item Example for overfitting with polyfit of a high order (=number of data points)
-\end{itemize}
-
-
-\subsection{Non-linear fits}
-\begin{itemize}
-\item Example that illustrates the Nebenminima Problem (with error surface)
+\item Move 8.7 to this new chapter.
+\item Example that illustrates the problem of local minima (with
+  error surface). Maybe data generated from $1/x$ and fitted with
+  $\exp(\lambda x)$ induce local minima; see the sketch after this
+  list.
 \item You need initial values for the parameter!
 \item Example that fitting gets harder the more parameter you have.
-\item Try to fix as many parameter before doing the fit.
+\item Try to fix as many parameters as possible before doing the fit.
 \item How to test the quality of a fit? Residuals. $\chi^2$ test. Run-test.
+\item Important box: summary of fit how-tos.
+\end{itemize}
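+
+A minimal sketch of how the local-minima example could be probed
+(data range and parameter grid are arbitrary choices for
+illustration, and whether the $1/x$ data really induce a secondary
+minimum still needs to be checked):
+\begin{lstlisting}
+% generate data from 1/x:
+x = linspace(0.5, 5.0, 50);
+y = 1.0 ./ x;
+
+% mean squared error of exp(lambda*x) as a function of lambda:
+lambdas = linspace(-5.0, 1.0, 200);
+mse = zeros(size(lambdas));
+for i = 1:length(lambdas)
+    mse(i) = mean((y - exp(lambdas(i)*x)).^2);
+end
+
+% plot the error curve and look for secondary minima:
+plot(lambdas, mse);
+xlabel('\lambda');
+ylabel('mean squared error');
+\end{lstlisting}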
+
+\subsection{New chapter: linear fits --- generalized linear models}
+\begin{itemize}
+\item Polyfit is easy: unique solution! $c x^3$ is also a linear fit.
+\item Example of \emph{overfitting} with a polyfit of high order
+  (order = number of data points); see the sketch after this list.
 \end{itemize}
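+
+A minimal sketch of such an overfitting demonstration (the data are
+made up for illustration; the polynomial order is chosen as the
+number of data points minus one, so that the polynomial passes
+through every data point):
+\begin{lstlisting}
+% noisy data around a straight line:
+x = linspace(0.0, 1.0, 10);
+y = 2.0*x + 0.2*randn(size(x));
+
+% fit a straight line and a 9th-order polynomial:
+plow = polyfit(x, y, 1);
+phigh = polyfit(x, y, 9);
+
+% evaluate and plot both fits on a fine grid:
+xx = linspace(0.0, 1.0, 200);
+plot(x, y, 'o', xx, polyval(plow, xx), xx, polyval(phigh, xx));
+\end{lstlisting}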
 
 
diff --git a/regression/lecture/regression.tex b/regression/lecture/regression.tex
index bfff7bb..de9630c 100644
--- a/regression/lecture/regression.tex
+++ b/regression/lecture/regression.tex
@@ -576,34 +576,36 @@ our tiger data-set (\figref{powergradientdescentfig}):
 
 \section{Fitting non-linear functions to data}
 
-The gradient descent is an important numerical method for solving
+The gradient descent is a basic numerical method for solving
 optimization problems. It is used to find the global minimum of an
 objective function.
 
-Curve fitting is a common application for the gradient descent method.
-For the case of fitting straight lines to data pairs, the error
-surface (using the mean squared error) has exactly one clearly defined
-global minimum. In fact, the position of the minimum can be
-analytically calculated as shown in the next chapter. For linear
-fitting problems numerical methods like the gradient descent are not
-needed.
+Curve fitting is a specific optimization problem and a common
+application of the gradient descent method.  For the case of fitting
+straight lines to data pairs, the error surface (using the mean
+squared error) has exactly one clearly defined global minimum. In
+fact, the position of the minimum can be calculated analytically, as
+shown in the next chapter. For linear fitting problems, numerical
+methods like the gradient descent are therefore not needed.
 
 Fitting problems that involve nonlinear functions of the parameters,
 e.g. the power law \eqref{powerfunc} or the exponential function
-$f(x;\lambda) = e^{\lambda x}$, do not have an analytical solution for
-the least squares. To find the least squares for such functions
-numerical methods such as the gradient descent have to be applied.
-
-The suggested gradient descent algorithm is quite fragile and requires
-manually tuned values for $\epsilon$ and the threshold for terminating
-the iteration.  The algorithm can be improved in multiple ways to
-converge more robustly and faster.  For example one could adapt the
-step size to the length of the gradient. These numerical tricks have
-already been implemented in pre-defined functions. Generic
-optimization functions such as \mcode{fminsearch()} have been
-implemented for arbitrary objective functions, while the more
-specialized function \mcode{lsqcurvefit()} is specifically designed
-for optimizations in the least square error sense.
+$f(t;\tau) = e^{-t/\tau}$, in general do not have an analytical
+solution of the least squares problem. To find the least squares
+solution for such functions, numerical methods such as the gradient
+descent have to be applied.
+
+The suggested gradient descent algorithm requires manually tuned
+values for $\epsilon$ and for the threshold that terminates the
+iteration.  The algorithm can be improved in multiple ways to
+converge faster and more robustly.  Most importantly, $\epsilon$ can
+be made dependent on how the gradient changes from one iteration to
+the next.  These and other numerical tricks have already been
+implemented in pre-defined functions.  Generic optimization functions
+such as \mcode{fminsearch()} work with arbitrary objective functions,
+while the more specialized function \mcode{lsqcurvefit()} is
+specifically designed for optimizations in the least squares sense.
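+
+As a minimal sketch of how such a pre-defined function can be used
+(the data and values are made up for illustration), the time constant
+$\tau$ of the exponential decay from above can be estimated with
+\mcode{fminsearch()}:
+\begin{lstlisting}
+% noisy data from an exponential decay with tau = 2:
+t = linspace(0.0, 10.0, 100);
+y = exp(-t/2.0) + 0.05*randn(size(t));
+
+% mean squared error as objective function of the parameter tau:
+objfun = @(tau) mean((y - exp(-t/tau)).^2);
+
+% find the tau minimizing the objective function, starting at tau = 1:
+tauest = fminsearch(objfun, 1.0);
+\end{lstlisting}
+\mcode{lsqcurvefit()} instead takes the model function, the initial
+parameter values, and the data directly, without an explicit
+objective function.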
 
 \begin{exercise}{plotlsqcurvefitpower.m}{}
   Use the \matlab-function \varcode{lsqcurvefit()} instead of
@@ -626,5 +628,62 @@ for optimizations in the least square error sense.
 \end{important}
 
 
+\section{Evolution as an optimization problem}
+
+Evolution is a biological implementation of an optimization
+algorithm. The objective function is an organism's fitness, which
+needs to be maximized (this is the same as minimizing the negative
+fitness). The parameters of this optimization problem are the many
+genes on the DNA, making this a very high-dimensional optimization
+problem. By cross-over and mutations a population of a species moves
+through this high-dimensional parameter space. Selection processes
+make sure that only organisms with higher fitness pass on their genes
+to the next generations. Unlike the gradient descent method, this
+algorithm is not directed towards higher fitness. Rather, some
+neighborhood of the parameter space is randomly probed. That way it
+is even possible to escape a local maximum and to find a potentially
+better one. For this reason, \enterm{genetic algorithms} mimic
+evolution for solving high-dimensional optimization problems, in
+particular ones with discrete parameter values. In biological
+evolution, however, the objective function is not fixed. It may
+change in time with changing abiotic and biotic environmental
+conditions, making this a very complex but also interesting
+optimization problem.
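+
+The following is a minimal sketch of such a genetic algorithm.  The
+fitness function, population size, and mutation strength are
+arbitrary choices for illustration, and cross-over is omitted in this
+one-dimensional example:
+\begin{lstlisting}
+% fitness function to be maximized (an arbitrary example with
+% several local maxima):
+fitness = @(x) exp(-0.1*x.^2) .* (1.0 + 0.5*cos(3.0*x));
+
+npop = 50;                    % population size
+pop = 10.0*randn(npop, 1);    % initial population of parameter values
+
+for generation = 1:100
+    % selection: keep the better half of the population:
+    [~, order] = sort(fitness(pop), 'descend');
+    parents = pop(order(1:npop/2));
+    % reproduction: each parent gets a randomly mutated offspring:
+    offspring = parents + 0.2*randn(size(parents));
+    pop = [parents; offspring];
+end
+
+% best parameter value found:
+[~, best] = max(fitness(pop));
+bestx = pop(best);
+\end{lstlisting}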
+
+How should a neuron or a neural network be designed? As a particular
+aspect of the general evolution of a species, this is a fundamental
+question in the neurosciences. Maintaining a neural system is
+costly. By their mere presence neurons incur costs. They need to be
+built and maintained, they occupy space and consume
+resources. Equipping a neuron with more ion channels adds further
+costs. And neural activity makes it more costly to maintain the
+concentration gradients of ions. All this boils down to the
+consumption of more ATP, the currency of metabolism. On the other
+hand, each neuron provides some useful function. In the end, neurons
+make the organism behave in some sensible way that increases its
+overall fitness. On the level of neurons this means that they should
+faithfully represent and process behaviorally relevant sensory
+stimuli, make sensible decisions, store important memories, or
+initiate and control movements in a directed way. Unfortunately,
+there is a tradeoff: better neural function usually comes at higher
+costs. More ion channels reduce intrinsic noise, which usually
+improves the precision of neural responses. Higher neuronal activity
+improves the quality of the encoding of sensory stimuli. More neurons
+are required for more complex computations. And so on.
+
+Understanding why a neuronal system is designed in some specific way
+requires understanding these tradeoffs as well. The number of
+neurons, the number and types of ion channels, the length of axons,
+the number of synapses, the way neurons are connected, etc. are all
+parameters that could be optimized. For the objective function, the
+benefit provided by the neurons, and its dependence on these
+parameters, needs to be quantified, for example by measures from
+information or detection theory. From these benefits the costs need
+to be subtracted, and one then searches for the maximum of the
+resulting objective function. Maximization (or minimization) problems
+are thus not only a tool for data analysis, rather they are at the
+core of many --- not only biological or neuroscientific --- problems.
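+
+As a minimal sketch of such a tradeoff between benefits and costs
+(both functions and all numbers are purely hypothetical and only
+serve to illustrate the idea), consider an encoding benefit that
+saturates with the number of ion channels and a metabolic cost that
+grows linearly with it:
+\begin{lstlisting}
+% hypothetical benefit: encoding quality saturating with the number
+% of ion channels n:
+benefit = @(n) n ./ (n + 100.0);
+% hypothetical metabolic cost, growing linearly with n:
+cost = @(n) 0.002*n;
+% objective function: benefit minus cost, to be maximized:
+objective = @(n) benefit(n) - cost(n);
+
+% evaluate the objective and find its maximum:
+n = 0:1000;
+[~, imax] = max(objective(n));
+noptimal = n(imax);
+\end{lstlisting}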
+
+
 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
 \printsolutions