[regression] note on evolution
parent 17bf940101
commit c2e4d4e40c

@@ -23,20 +23,23 @@
 \item Fig 8.2 right: this should be a chi-squared distribution with one degree of freedom!
 \end{itemize}

-\subsection{Linear fits}
+\subsection{New chapter: non-linear fits}
 \begin{itemize}
-\item Polyfit is easy: unique solution! $c x^2$ is also a linear fit.
-\item Example for overfitting with polyfit of a high order (=number of data points)
-\end{itemize}
-
-\subsection{Non-linear fits}
-\begin{itemize}
-\item Example that illustrates the Nebenminima problem (with error surface)
+\item Move 8.7 to this new chapter.
+\item Example that illustrates the Nebenminima problem (with error
+  surface). Maybe data generated from $1/x$ and fitted with
+  $\exp(\lambda x)$ induce local minima.
 \item You need initial values for the parameters!
 \item Example that fitting gets harder the more parameters you have.
-\item Try to fix as many parameter before doing the fit.
+\item Try to fix as many parameters as possible before doing the fit.
 \item How to test the quality of a fit? Residuals. $\chi^2$ test. Run-test.
+\item Important box: summary of fit how-tos.
 \end{itemize}

+\subsection{New chapter: linear fits --- generalized linear models}
+\begin{itemize}
+\item Polyfit is easy: unique solution! $c x^3$ is also a linear fit.
+\item Example for \emph{overfitting} with polyfit of a high order (=number of data points)
+\end{itemize}

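One way to check the local-minima claim in the Nebenminima item above is to scan the mean squared error of $\exp(\lambda x)$ predictions against data generated from $1/x$. A minimal MATLAB sketch; the x values and the parameter range are assumptions for illustration, not taken from the notes:

% error surface for fitting f(x;lambda) = exp(lambda*x)
% to data generated from 1/x (assumed example data)
x = 0.5:0.25:4.0;                 % assumed x values
y = 1.0 ./ x;                     % data generated from 1/x
lambdas = -4.0:0.01:1.0;          % assumed range of the single parameter
mse = zeros(size(lambdas));
for k = 1:length(lambdas)
    % mean squared error between data and exponential prediction:
    mse(k) = mean((y - exp(lambdas(k)*x)).^2);
end
plot(lambdas, mse);               % inspect the error surface for local minima
xlabel('\lambda');
ylabel('mean squared error');

Whether secondary minima actually show up depends on the chosen x range and on noise, which is presumably why the item says "maybe".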
@@ -576,34 +576,36 @@ our tiger data-set (\figref{powergradientdescentfig}):

 \section{Fitting non-linear functions to data}

-The gradient descent is an important numerical method for solving
+The gradient descent is a basic numerical method for solving
 optimization problems. It is used to find the global minimum of an
 objective function.

-Curve fitting is a common application for the gradient descent method.
-For the case of fitting straight lines to data pairs, the error
-surface (using the mean squared error) has exactly one clearly defined
-global minimum. In fact, the position of the minimum can be
-analytically calculated as shown in the next chapter. For linear
-fitting problems numerical methods like the gradient descent are not
-needed.
+Curve fitting is a specific optimization problem and a common
+application for the gradient descent method. For the case of fitting
+straight lines to data pairs, the error surface (using the mean
+squared error) has exactly one clearly defined global minimum. In
+fact, the position of the minimum can be calculated analytically as
+shown in the next chapter. For linear fitting problems numerical
+methods like the gradient descent are not needed.

 Fitting problems that involve nonlinear functions of the parameters,
 e.g. the power law \eqref{powerfunc} or the exponential function
-$f(x;\lambda) = e^{\lambda x}$, do not have an analytical solution for
-the least squares. To find the least squares for such functions
-numerical methods such as the gradient descent have to be applied.
+$f(t;\tau) = e^{-t/\tau}$, in general do not have an analytical
+solution for the least squares. To find the least squares for such
+functions numerical methods such as the gradient descent have to be
+applied.

-The suggested gradient descent algorithm is quite fragile and requires
-manually tuned values for $\epsilon$ and the threshold for terminating
-the iteration. The algorithm can be improved in multiple ways to
-converge more robustly and faster. For example one could adapt the
-step size to the length of the gradient. These numerical tricks have
-already been implemented in pre-defined functions. Generic
-optimization functions such as \mcode{fminsearch()} have been
-implemented for arbitrary objective functions, while the more
-specialized function \mcode{lsqcurvefit()} is specifically designed
-for optimizations in the least square error sense.
+The suggested gradient descent algorithm requires manually tuned
+values for $\epsilon$ and the threshold for terminating the iteration.
+The algorithm can be improved in multiple ways to converge more
+robustly and faster. Most importantly, $\epsilon$ is made dependent
+on the changes of the gradient from one iteration to the next. These
+and other numerical tricks have already been implemented in
+pre-defined functions. Generic optimization functions such as
+\mcode{fminsearch()} have been implemented for arbitrary objective
+functions, while the more specialized function \mcode{lsqcurvefit()}
+is specifically designed for optimizations in the least square error
+sense.

 \begin{exercise}{plotlsqcurvefitpower.m}{}
   Use the \matlab-function \varcode{lsqcurvefit()} instead of
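The step-size adaptation mentioned in the rewritten paragraph ($\epsilon$ made dependent on the changes of the gradient from one iteration to the next) can be sketched like this, here for the single parameter $\tau$ of $f(t;\tau) = e^{-t/\tau}$. The data, initial values, and the concrete adaptation rule are illustrative assumptions, not the code of the exercise script:

% gradient descent with a step size that adapts to changes of the
% gradient from one iteration to the next (sketch with assumed data)
t = 0.1:0.1:5.0;
y = exp(-t/2.0) + 0.05*randn(size(t));       % simulated data, tau = 2
mse = @(tau) mean((y - exp(-t/tau)).^2);     % objective function
tau = 1.0;                                   % initial parameter value
stepsize = 0.1;                              % the epsilon of the text
dtau = 1e-4;                                 % step for numerical gradient
gprev = 0.0;
for i = 1:1000
    g = (mse(tau + dtau) - mse(tau - dtau)) / (2.0*dtau);
    if g*gprev > 0.0
        stepsize = stepsize*1.2;   % same direction as before: speed up
    else
        stepsize = stepsize*0.5;   % gradient changed sign: slow down
    end
    tau = tau - stepsize*g;        % descend along the gradient
    if abs(stepsize*g) < 1e-7      % terminate on tiny parameter changes
        break;
    end
    gprev = g;
end

With the Optimization Toolbox the same fit is a single call to lsqcurvefit(), which takes the model function with the parameter as its first argument:

taufit = lsqcurvefit(@(p, t) exp(-t/p), 1.0, t, y);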
@@ -626,5 +628,62 @@ for optimizations in the least square error sense.
 \end{important}


+\section{Evolution as an optimization problem}
+
+Evolution is a biological implementation of an optimization
+algorithm. The objective function is an organism's fitness, which
+needs to be maximized (this is the same as minimizing the negative
+fitness). The parameters of this optimization problem are the many
+genes on the DNA, making this a very high-dimensional optimization
+problem. By cross-over and mutations a population of a species moves
+through this high-dimensional parameter space. Selection processes
+make sure that only organisms with higher fitness pass on their genes
+to the next generations. In contrast to the gradient descent method,
+this algorithm is not directed towards higher fitness. Rather, some
+neighborhood of the parameter space is randomly probed. That way it
+is even possible to escape a local maximum and find a potentially
+better one. For this reason, \enterm{genetic algorithms} mimic
+evolution for solving high-dimensional optimization problems, in
+particular ones with discrete parameter values. In biological
+evolution, however, the objective function is not fixed. It may
+change in time with changing abiotic and biotic environmental
+conditions, making this a very complex but also interesting
+optimization problem.
+
+How should a neuron or a neural network be designed? As a particular
+aspect of the evolution of a species, this is a fundamental question
+in the neurosciences. Maintaining a neural system is costly. By their
+mere presence neurons incur costs: they need to be built and
+maintained, they occupy space, and they consume resources. Equipping
+a neuron with more ion channels also costs, and neural activity makes
+it more costly to maintain the concentration gradients of ions. This
+all boils down to the consumption of more ATP, the currency of
+metabolism. On the other hand, each neuron provides some useful
+function. In the end, neurons make the organism behave in sensible
+ways that increase its overall fitness. On the level of neurons this
+means that they should faithfully represent and process behaviorally
+relevant sensory stimuli, make sensible decisions, store important
+memories, or initiate and control movements in a directed way.
+Unfortunately, there is a tradeoff: better neural function usually
+involves higher costs. More ion channels reduce intrinsic noise,
+which usually improves the precision of neural responses. Higher
+neuronal activity improves the quality of the encoding of sensory
+stimuli. More neurons are required for more complex computations. And
+so on.
+
+Understanding why a neuronal system is designed in some specific way
+requires understanding these tradeoffs as well. The number of
+neurons, the number and types of ion channels, the length of axons,
+the number of synapses, the way neurons are connected, etc. are all
+parameters that could be optimized. For the objective function, the
+function of the neurons needs to be quantified, for example by
+measures from information or detection theory, together with its
+dependence on these parameters. From these benefits the costs need to
+be subtracted, and then one is interested in finding the maximum of
+the resulting objective function. Maximization (or minimization)
+problems are thus not only a tool for data analysis; they are at the
+core of many --- and not only biological or neuroscientific ---
+problems.
+
+
 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
 \printsolutions
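The new section describes in words what a genetic algorithm does: mutate, recombine, select. A minimal MATLAB sketch over binary genomes; the population size, rates, and toy fitness function are all assumptions for illustration:

% minimal genetic algorithm maximizing a toy fitness function
ngenes = 20;                          % dimension of the parameter space
npop = 50;                            % population size
fitness = @(pop) sum(pop, 2);         % toy fitness: number of ones per genome
pop = rand(npop, ngenes) > 0.5;       % random initial population
for generation = 1:100
    % selection: only the fitter half passes on its genes
    [~, order] = sort(fitness(pop), 'descend');
    parents = pop(order(1:npop/2), :);
    % cross-over: recombine the genes of random parent pairs
    i = randi(npop/2, npop, 1);
    j = randi(npop/2, npop, 1);
    cut = randi(ngenes - 1);
    pop = [parents(i, 1:cut), parents(j, cut+1:end)];
    % mutation: flip a small, random fraction of the genes
    flips = rand(npop, ngenes) < 0.01;
    pop(flips) = ~pop(flips);
end
max(fitness(pop))                     % best fitness in the final population

In contrast to the gradient descent, no step here follows a gradient; random variation plus selection probes the neighborhood of the current population, which is what allows escapes from local maxima.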