%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\chapter{Stochastic simulations}
\label{stochasticsimulationschapter}
\exercisechapter{Stochastic simulations}
The real power of computers for data analysis is the possibility to
run simulations. Experimental data of almost unlimited sample sizes
can be simulated in no time. This allows us to explore basic concepts,
the ones we introduce in the following chapters and many more, with
well controlled data sets that are free of the confounding
peculiarities of real data sets. With simulated data we can also test
our own analysis functions. More importantly, by means of simulations
we can explore possible outcomes of our experiments before we even
start the experiment. We can even explore possible results for regimes
that we cannot test experimentally. How dynamical systems, like, for
example, predator-prey interactions or the activity of neurons or
whole brains, evolve in time is a central application for
simulations. The advent of computers in the second half of the
twentieth century pushed the exciting field of nonlinear dynamical
systems forward. Conceptually, many kinds of simulations are very
simple and can be implemented in a few lines of code.
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\section{Random numbers}
\begin{figure}[t]
\includegraphics{randomnumbers}
\titlecaption{\label{randomnumbersfig} Random numbers.} {Numerical
random number generators return sequences of numbers that are as
random as possible. The returned values approximate a given
probability distribution. Here, for example, a uniform
distribution between zero and one (top left). The serial
correlation (auto-correlation) of the returned sequences is zero
at any lag except for zero (bottom left). However, by setting a
seed, defining the state, or spawning the random number generator,
the exact same sequence can be generated (right).}
\end{figure}
At the heart of many simulations are random numbers that we get from
\enterm[random number generator]{random number generators}. These are
numerical algorithms that return sequences of numbers that appear to
be as random as possible (\figref{randomnumbersfig}). If we draw
random numbers using, for example, the \code{rand()} function, then
these numbers are indeed uniformly distributed and have a mean of one
half. Subsequent numbers are independent of each other, i.e. the
autocorrelation function is zero everywhere except at lag
zero. However, numerical random number generators have a period, after
which they repeat the exact same sequence of numbers. This
differentiates them from truly random numbers and hence they are
called \enterm[random number generator!pseudo]{pseudo random number
generators}. In rare cases this periodicity can cause problems in
simulations, whenever more random numbers than the period of the
random number generator are used. Luckily, nowadays the periods of
random number generators are very large, at least on the order of
$2^{64}$, that is about $10^{19}$, or even larger.
The pseudo randomness of numerical random number generators also has
an advantage: they allow us to repeat exactly the same sequence of
random numbers. After defining the state of the generator or setting a
\enterm{seed} with the \code{rng()} function, subsequent calls of the
random number generator produce a particular, reproducible sequence of
random numbers. This way we can not only precisely define the
statistics of the noise in our simulated data, but we can also repeat
an experiment with exactly the same sequence of noise values. This is
useful for plots that involve some random numbers but should look the
same whenever the script is run.
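For example, a minimal sketch of how seeding makes a random sequence
reproducible (the seed value 42 is arbitrary):
\begin{lstlisting}
rng(42);                 % seed the random number generator
x1 = rand(1, 5);         % draw five uniform random numbers
rng(42);                 % reset the generator to the same state
x2 = rand(1, 5);         % ... and get exactly the same numbers
disp(max(abs(x1 - x2)))  % zero: the sequences are identical
\end{lstlisting}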
\begin{exercise}{randomnumbers.m}{randomnumbers.out}
First, read the documentation of the \varcode{rand()} function and
check its output for some (small) input arguments.
Generate the same sequence of 10 uniformly distributed numbers three
times using the \code{rand()} and \code{rng()} functions.
Generate 10\,000 uniformly distributed random numbers and compute
the correlation coefficient between each number and the next one in
the sequence. This is the serial correlation at lag one.
\end{exercise}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\section{Univariate data}
A basic type of stochastic simulation is to draw random numbers from
a given distribution, like, for example, the normal distribution. This
simulates repeated measurements of some quantity (weight of tigers,
firing rate of neurons). Doing so, we must specify from which
probability distribution the data should originate and what the
parameters of that distribution are (mean, standard deviation, shape
parameters). How to illustrate and quantify univariate data, whether
actually measured or simulated as described in the following, is
covered in chapter~\ref{descriptivestatisticschapter}.
\subsection{Normally distributed data}
\begin{figure}[t]
\includegraphics{normaldata}
\titlecaption{\label{normaldatafig} Generating normally distributed
data.}{With the help of a computer the weight of 300 tigers can be
measured in no time using the \code{randn()} function (left). By
construction we then even know the population distribution (red
line, right), its mean (here 220\,kg) and standard deviation
(40\,kg) from which the simulated data values were drawn (yellow
histogram).}
\end{figure}
For drawing numbers from the normal distribution
\begin{equation}
p_{norm}(x) = \frac{1}{\sqrt{2\pi}}e^{-\frac{1}{2}x^2}
\end{equation}
with zero mean and unit standard deviation we use the \code{randn()}
function, which returns normally distributed numbers $\xi_i$. For
changing the standard deviation we multiply the returned values by the
required standard deviation $\sigma$. For changing the mean we add the
desired mean $\mu$ to the random numbers
(\figref{normaldatafig}):
\begin{equation}
x_i = \sigma \xi_i + \mu
\end{equation}
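In code, this scaling and shifting is a one-liner. A minimal sketch,
using the tiger weights of \figref{normaldatafig} as arbitrary example
values:
\begin{lstlisting}
mu = 220.0;                    % desired mean in kg
sigma = 40.0;                  % desired standard deviation in kg
x = sigma*randn(300, 1) + mu;  % 300 normally distributed weights
\end{lstlisting}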
\begin{exercise}{normaldata.m}{normaldata.out}
Write a little script that generates $n=100$ normally distributed
data values, simulating the weight of male Bengal tigers with mean
220\,kg and standard deviation 40\,kg. Check the actual mean and
standard deviation of the generated data. Do this, let's say, five
times using a for-loop. Then increase $n$ to 10\,000 and run the
code again. It is so simple to measure the weight of 10\,000 tigers
for getting a really good estimate of their mean weight, isn't it?
Finally, plot the first 1000 values of the last generated data set
of tiger weights as a function of their index.
\end{exercise}
\subsection{Other probability densities}
We can draw random numbers not only from normal
distributions. Functions are provided that let you draw random numbers
from almost any probability distribution. These distributions differ
in their shape and in the number of parameters they have. Most
probability distributions are parameterized by a location parameter
that usually coincides with their mean, and a scale parameter that
often coincides with their standard deviation. Some distributions have
additional shape parameters.
For example, the time intervals $t$ between randomly generated action
potentials (a so called Poisson spike train, see
chapter~\ref{pointprocesseschapter}) are exponentially distributed ---
as are the intervals between state transitions of ion channels, or the
intervals between radioactive decays. The exponential distribution is
only defined for positive time intervals:
\begin{equation}
\label{expdistribution}
p_{exp}(t) = \lambda e^{-\lambda t} \; , \quad t \ge 0, \; \lambda > 0
\end{equation}
A single rate parameter $\lambda$, measured in Hertz, parameterizes
the exponential distribution. It specifies how often per time unit the
event happens. Both the mean interval between the events and the
corresponding standard deviation equal the inverse rate $1/\lambda$.
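For drawing from the exponential distribution one can use the
\code{exprnd()} function of the Statistics Toolbox, which takes the
mean interval $1/\lambda$ as its first argument. Alternatively, and
without any toolbox, one can transform uniform random numbers, because
$-\ln(u)/\lambda$ is exponentially distributed if $u$ is uniformly
distributed between zero and one. A minimal sketch:
\begin{lstlisting}
lambda = 50.0;                  % rate of the events in Hertz
n = 200;                        % number of intervals to draw
t1 = exprnd(1.0/lambda, n, 1);  % Statistics Toolbox
t2 = -log(rand(n, 1))/lambda;   % inverse transform of uniform numbers
\end{lstlisting}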
\begin{exercise}{exprandom.m}{exprandom.out}
Draw $n=200$ random time intervals from an exponential distribution
with rate $\lambda=50$\,Hz. Calculate the mean and the standard
deviation of the random numbers and compare them with the expected
values.
\end{exercise}
\subsection{Uniformly distributed random numbers}
\begin{figure}[t]
Exponential distribution of ISIs.
\end{figure}
The \code{rand()} function returns uniformly distributed random
numbers between zero and one
\begin{equation}
p_{uni}(x) = \left\{\begin{array}{rcl} 1 & ; & 0 \le x < 1 \\
0 & ; & \text{else} \end{array}\right.
\end{equation}
For random numbers $x_i$ uniformly distributed between $x_{\rm min}$
and $x_{\rm max}$ we multiply by the desired range $x_{\rm max} -
x_{\rm min}$ and add $x_{\rm min}$ to the numbers $u_i$ returned by
\code{rand()}:
\begin{equation}
x_i = (x_{\rm max} - x_{\rm min}) u_i + x_{\rm min}
\end{equation}
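A minimal sketch with arbitrary range values:
\begin{lstlisting}
xmin = -1.0;                            % lower bound
xmax = 3.0;                             % upper bound
x = (xmax - xmin)*rand(200, 1) + xmin;  % uniform between xmin and xmax
\end{lstlisting}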
\begin{exercise}{unirandom.m}{unirandom.out}
Simulate the times of $n=10\,000$ action potentials occurring
randomly within a time interval of 100\,s using the \code{rand()}
function. Sort the times using \code{sort()}, compute the time
intervals between the action potentials (interspike intervals, ISIs)
using the \code{diff()} function, plot a histogram of the interspike
intervals in milliseconds, and compute the mean and standard
deviation of the interspike intervals, also in milliseconds. Compare
the mean ISI with the firing rate (number of action potentials per
time).
\end{exercise}
\subsection{Simulating probabilities}
Simulating events that happen with some probability $P$ is also
possible. That could be the probability of heads showing up when
flipping a coin, of an action potential occurring within some time
window, or of an ion channel opening. For this, draw a random number
$u_i$ from a uniform distribution between zero and one
(\code{rand()}). If the random number is lower than $P$, the event
happens; if it is larger, the event does not occur.
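A minimal sketch, with an arbitrary event probability:
\begin{lstlisting}
P = 0.3;                    % probability of the event
events = rand(1, 100) < P;  % true whenever the event happens
fprintf('%d of 100 trials resulted in the event\n', sum(events));
\end{lstlisting}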
\begin{exercise}{probability.m}{probability.out}
Let's flip a coin 20 times. The coin is biased and shows heads with a
probability of $P=0.6$. Count the number of heads.
\end{exercise}
The gamma distribution (\code{gamrnd()}) phenomenologically describes
various types of interspike interval distributions
(chapter~\ref{pointprocesseschapter}). It is parameterized by a shape
and a scale parameter.
\begin{exercise}{}{}
  Gamma distribution exercise.
\end{exercise}
\subsection{Random integers}
Uniformly distributed random integers are returned by the
\code{randi()} function, for example for simulating the roll of a die.
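A minimal sketch, simulating 100 rolls of a fair die:
\begin{lstlisting}
rolls = randi(6, 1, 100);  % 100 random integers between 1 and 6
hist(rolls, 1:6);          % each face shows up about equally often
\end{lstlisting}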
\section{Random walks}
A basic concept for stochastic models is the \enterm{random walk}. A
walker starts at some initial position $x_0$. At every iteration $n$
it takes a step $\delta_n$: $x_n = x_{n-1} + \delta_n$. The walker
either steps to the right ($\delta_n = +1$) with some probability $p$
or to the left ($\delta_n = -1$) with probability $q = 1-p$.
\begin{figure}[tb]
\includegraphics{randomwalkone}
\titlecaption{\label{randomwalkonefig} Random walk in one
dimension.}{Twelve random walks starting at $x_0=0$. The ensemble
average of the position of the random walkers marked by the blue
line and the corresponding standard deviation by the shaded
area. The standard deviation increases with the square root of the
number of steps taken. The top plot shows a symmetric random walk
where the probability of a step to the right equals the one for a
step to the left ($p=q=\frac{1}{2}$). The bottom plot shows a
random walk with drift. Here the probability of taking a step to
the right ($p=0.6$) is larger than the probability for a step to
the left ($q=0.4$).}
\end{figure}
For a \enterm[random walk!symmetric]{symmetric random walk}
(\figref{randomwalkonefig} top), the probabilities for a step to the
left or to the right are the same, $p = q = \frac{1}{2}$. The average
position of many walkers is then independent of the iteration step and
equals the initial position $x_0$. The variance of each step is
$\sigma_{\delta_n}^2 = p \cdot 1^2 + q \cdot(-1)^2 = 1$. Since each
step is independent of the steps before, the variances of all the
steps add up, such that the total variance of the walker position at
step $n$ equals $\sigma_{x_n}^2 = \sum_{i=1}^n \sigma_{\delta_i}^2 =
n$. The standard deviation of the walker positions grows with the
square root of the number of iteration steps $n$: $\sigma_{x_n} =
\sqrt{n}$.
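A minimal sketch of how such an ensemble of random walkers can be
simulated, with \code{cumsum()} accumulating the steps:
\begin{lstlisting}
p = 0.5;                     % probability of a step to the right
nsteps = 100;                % number of iterations
nwalkers = 1000;             % number of walkers in the ensemble
steps = 2*(rand(nsteps, nwalkers) < p) - 1;    % steps of +1 or -1
x = cumsum(steps, 1);        % positions of all walkers over iterations
sd = std(x, 0, 2);           % standard deviation over the ensemble
plot(1:nsteps, sd, 1:nsteps, sqrt(1:nsteps));  % compare with sqrt(n)
\end{lstlisting}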
\enterm[random walk!with drift]{Random walks with drift} are biased to
one direction (\figref{randomwalkonefig} bottom). The probability to
step to the left does not equal the probability to step to the right,
$p \ne q$. Consequently, the average step is no longer zero. Instead
it equals $\mu_{\delta_n} = p \cdot 1 + q \cdot (-1) = 2p - 1$. The
average position grows in proportion to the number of iterations:
$\mu_{x_n} = n(2p - 1)$. The variance of each step turns out to be
$\sigma_{\delta_n}^2 = 4pq$ (this indeed equals one for
$p=q=\frac{1}{2}$). Again, because of the independence of the steps,
the variance of the position at the $n$-th iteration, $\sigma_{x_n}^2
= 4pqn$, is proportional to $n$ and the standard deviation,
$\sigma_{x_n} = 2\sqrt{pqn}$, grows with the square root of $n$.
\begin{exercise}{}{}
Random walk exercise.
\end{exercise}
\begin{figure}[tb]
\includegraphics{randomwalkneuron}
\titlecaption{\label{randomwalkneuronfig} Random walk spiking neuron.}{.}
\end{figure}
The sub-threshold membrane potential of a neuron can be modeled as a
one-dimensional random walk. The thousands of excitatory and
inhibitory synaptic inputs of a principal neuron in the brain make the
membrane potential step randomly upwards or downwards. Once the
membrane potential hits a threshold, an action potential is generated
and the membrane potential is reset to the resting potential. The
threshold in this context is often called an ``absorbing
boundary''. The time it takes from the initial position to cross the
threshold is the ``first-passage time''. For neurons this is the
interspike interval. For random walks with a drift towards the
threshold, the first-passage times follow an inverse Gaussian
distribution (Gerstein and Mandelbrot 1964). For symmetric random
walks, the first-passage time density has a heavy tail, falling off
like $t^{-3/2}$, and its mean does not exist.
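A minimal sketch of such a random-walk neuron (the threshold value and
the drift are arbitrary choices):
\begin{lstlisting}
theta = 20.0;                    % threshold
p = 0.55;                        % probability of an upward step
v = 0.0;                         % membrane potential, starts at rest
spiketimes = [];                 % iterations of threshold crossings
for n = 1:10000
    v = v + 2*(rand() < p) - 1;  % step up or down
    if v >= theta                % threshold crossed: spike!
        spiketimes(end+1) = n;   % store the spike time
        v = 0.0;                 % reset to the resting potential
    end
end
isis = diff(spiketimes);         % interspike intervals
\end{lstlisting}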
\begin{exercise}{}{}
Random walk neuron exercise.
\end{exercise}
\begin{figure}[tb]
\includegraphics{randomwalktwo}
\titlecaption{\label{randomwalktwofig} Random walk in two
dimensions.}{Five hundred steps of five random walkers starting at
the origin (black dot) and taking steps of distance one either up,
down, to the left, or to the right with equal probability.}
\end{figure}
Random walks generalize to higher dimensions
(\figref{randomwalktwofig}). They are discrete models of, for example,
Brownian motion and of growth processes like crystal growth.
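A minimal sketch of a two-dimensional random walk taking unit steps in
one of the four cardinal directions:
\begin{lstlisting}
nsteps = 500;
dirs = randi(4, nsteps, 1);      % 1: right, 2: left, 3: up, 4: down
dx = (dirs == 1) - (dirs == 2);  % steps in x
dy = (dirs == 3) - (dirs == 4);  % steps in y
x = [0; cumsum(dx)];             % x-positions, starting at the origin
y = [0; cumsum(dy)];             % y-positions
plot(x, y);                      % the walker's trajectory
\end{lstlisting}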
\begin{exercise}{}{}
Random walk in 2D exercise.
\end{exercise}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\section{Bivariate data}
\subsection{Static nonlinearities}
\begin{figure}[t]
\includegraphics{staticnonlinearity}
\titlecaption{\label{staticnonlinearityfig} Generating data
fluctuating around a function.}{The conductance of the
mechanotransducer channels in hair cells of the inner ear is a
saturating function of the deflection of their hairs (left, red
line). Measured data will fluctuate around this function (blue
dots). Ideally the residuals (yellow histogram) are normally
distributed (right, red line).}
\end{figure}
As an example, consider mechanotransduction in the hair cells of the
inner ear (\figref{staticnonlinearityfig}): draw (and plot) data that
randomly scatter around a given function.
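A minimal sketch of such a simulation, using a Boltzmann function as
the static nonlinearity with arbitrary parameter values:
\begin{lstlisting}
x = 2.0*rand(100, 1) - 1.0;           % random hair deflections
gmax = 1.0;                           % maximum conductance
x0 = 0.0;                             % position of half activation
w = 0.2;                              % width of the transition
g = gmax./(1.0 + exp(-(x - x0)/w));   % Boltzmann function
y = g + 0.05*randn(size(g));          % add normally distributed noise
plot(x, y, 'o');                      % simulated measurements
\end{lstlisting}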
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\section{Dynamical systems}
\begin{itemize}
\item Iterated maps.
\item Euler forward, \code{odeint} (see the sketch after this list).
\item Introduce derivatives, which are also needed for fitting (move box from there to here).
\item Passive membrane.
\item Add passive membrane to mechanotransduction!
\item Integrate-and-fire.
\item FitzHugh-Nagumo.
\item Two coupled neurons? Predator-prey?
\end{itemize}
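As a minimal sketch of the Euler forward method, the integration of a
passive membrane, $\tau \frac{dV}{dt} = -V + E(t)$, driven by a noisy
input $E(t)$ (all parameter values are arbitrary):
\begin{lstlisting}
tau = 0.01;                  % membrane time constant in seconds
dt = 0.0001;                 % integration time step in seconds
time = (0:dt:1.0)';          % one second of simulated time
E = randn(length(time), 1);  % noisy input
v = zeros(length(time), 1);  % membrane potential, v(1) = 0
for k = 2:length(time)
    % Euler forward: v(t+dt) = v(t) + dt*dv/dt
    v(k) = v(k-1) + dt*(-v(k-1) + E(k-1))/tau;
end
plot(time, v);
\end{lstlisting}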
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\section{Summary}
With an outlook to other kinds of simulations (cellular automata, Monte Carlo, etc.).
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\printsolutions