ALMOST finished the methods section.

j-hartling
2026-05-15 16:45:51 +02:00
parent cbd0af7a5f
commit 155fb1eecf
46 changed files with 561 additions and 450 deletions

272
main.tex

@@ -79,18 +79,14 @@
\newcommand{\kf}{\omega} % Unspecific Gabor kernel frequency
\newcommand{\kp}{\phi} % Unspecific Gabor kernel phase
\newcommand{\kn}{n} % Unspecific Gabor kernel lobe number
% \newcommand{\ks}{s} % Unspecific Gabor kernel sign
\newcommand{\kwi}{\kw_i} % Specific Gabor kernel width
\newcommand{\kfi}{\kf_i} % Specific Gabor kernel frequency
\newcommand{\kpi}{\kp_i} % Specific Gabor kernel phase
\newcommand{\kni}{\kn_i} % Specific Gabor kernel lobe number
% \newcommand{\ksi}{\ks_i} % Specific Gabor kernel sign
% Math shorthands - Auxiliary kernel parameters:
\newcommand{\fsin}{f_{\text{sin}}} % Carrier frequency
\newcommand{\fwrh}{\text{FWRH}} % Gaussian full-width at relative height
\newcommand{\off}{\beta_0} % Offset for linear frequency approximation
\newcommand{\fdrm}{\text{FDRM}} % Gaussian full duration relative to maximum
\newcommand{\rh}{h_{\text{rel}}} % Relative Gaussian height for FDRM calculation
% Math shorthands - Thresholding nonlinearity:
\newcommand{\thr}{\Theta_i} % Step function threshold value
@@ -287,6 +283,20 @@ approximation by basic mathematical operations. We then elaborate on the key
mechanisms that drive the emergence of intensity-invariant song representations
within the auditory pathway.
% RIPPED FROM RESULTS, MAYBE INTEGRATE SOMEWHERE HERE:
% The robustness of song recognition is tied to the degree of intensity
% invariance of the finalized feature representation. Ideally, the values of each
% feature should depend only on the relative amplitude dynamics of the song
% pattern but not on the overall intensity of the song. In the grasshopper, the
% emergence of intensity-invariant representations along the song recognition
% pathway likely is a distributed process that involves different neuronal
% populations, which raises the question of what the essential computational
% mechanisms are that drive this process. Within the model pathway, we identified
% two key mechanisms that render the song representation more invariant to
% intensity variations. The two mechanisms each comprise a nonlinear signal
% transformation followed by a linear signal transformation but differ in the
% specific operations involved, as outlined in the following sections.
% SCRAPPED UNTIL FURTHER NOTICE:
% Multi-species, multi-individual communally inhabited environments\\
% - Temporal overlap: Simultaneous singing across individuals/species common\\
@@ -328,43 +338,38 @@ on PyPi.
\subsection{Functional model of the grasshopper song recognition pathway}
% Too long (no splitting, only pruning).
The anatomical organization of the grasshopper song recognition pathway can be
outlined as a feed-forward network of three consecutive neuronal
populations~(Fig.\,\ref{fig:pathway}a--c): peripheral auditory receptor neurons,
whose axons enter the ventral nerve cord (VNC) at the level of the metathoracic
ganglion; local interneurons that remain exclusively within the thoracic region
of the VNC; and ascending neurons projecting from the thoracic region towards
the supraesophageal ganglion (SEG), or central
brain~(\bcite{rehbein1974structure}; \bcite{rehbein1976auditory};
\bcite{eichendorf1980projections}). The input to the network originates at the
tympanal membrane, which acts as the acoustic receiver and is coupled to the
dendritic endings of the receptor neurons~(\bcite{gray1960fine}). The outputs
from the network converge in the SEG, which presumably harbors the neuronal
substrate for conspecific song recognition and response
initiation~(\bcite{ronacher1986routes}; \bcite{bauer1987separate};
\bcite{bhavsar2017brain}).
Functionally, the ascending neurons are the most diverse of the three neuronal
populations. Individual ascending neurons possess highly specific response
properties that contrast with the rather homogeneous response properties of the
preceding receptor neurons and local
interneurons~(\bcite{clemens2011efficient}), which indicates a transition from
a uniform population-wide processing stream to several parallel branches.
Accordingly, the model pathway is divided into two distinct
stages~(Fig.\,\ref{fig:pathway}d): The preprocessing stage incorporates the
processing steps at the levels of the tympanal membrane, the receptor neurons,
and the local interneurons, and operates on one-dimensional signal
representations~(Fig.\,\ref{fig:stages_pre}). The feature extraction stage
corresponds to the processing within the ascending neurons and further
downstream towards the SEG, and operates on high-dimensional signal
representations~(Fig.\,\ref{fig:stages_feat}). The details of each
physiological processing step and its functional approximation are described in
the following sections.
\begin{figure}[!ht]
\centering
\includegraphics[width=\textwidth]{figures/fig_auditory_pathway.pdf}
@@ -389,53 +394,54 @@ outlined in the following sections.
\subsubsection{Population-driven signal preprocessing}
Grasshoppers receive airborne sound waves via a tympanal organ on each side of
the body. The tympanal membrane acts as a mechanical resonance filter for
sound-induced vibrations~(\bcite{windmill2008time}; \bcite{malkin2014energy}).
Vibrations that fall within specific frequency bands are focused on different
membrane areas, while others are attenuated. This processing step can be
approximated by an initial bandpass filter~(Fig.\,\ref{fig:stages_pre}a)
applied to the acoustic input signal $\raw(t)$:
\begin{equation}
\filt(t)\,=\,\raw(t)\,*\,\bp, \qquad \fc\,=\,5\,\text{kHz},\,30\,\text{kHz}
\label{eq:bandpass}
\end{equation}
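A minimal Python sketch of the bandpass step in Eq.\,\ref{eq:bandpass}, assuming a Butterworth design applied forward and backward for zero phase shift. Filter type, filter order, the sampling rate, and the helper name `bandpass` are assumptions made for illustration only. The sketch checks that a tone inside the 5--30\,kHz band passes while a 500\,Hz tone is strongly attenuated.

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt


def bandpass(x, fs, f_low=5_000.0, f_high=30_000.0, order=4):
    """Bandpass-filter the acoustic input x(t), cf. Eq. (bandpass).

    Assumed here (not specified in the text): Butterworth design of the
    given order, applied forward and backward for zero phase shift.
    """
    sos = butter(order, [f_low, f_high], btype="bandpass", fs=fs, output="sos")
    return sosfiltfilt(sos, x)


def rms(v):
    """Root-mean-square amplitude of a signal."""
    return np.sqrt(np.mean(v ** 2))


fs = 120_000.0                              # hypothetical sampling rate in Hz
t = np.arange(int(0.1 * fs)) / fs           # 100 ms of signal
in_band = np.sin(2 * np.pi * 10_000.0 * t)  # 10 kHz tone, inside the passband
low_tone = np.sin(2 * np.pi * 500.0 * t)    # 500 Hz tone, below the lower cutoff

mid = slice(len(t) // 4, 3 * len(t) // 4)   # ignore filter edge effects
gain_in = rms(bandpass(in_band, fs)[mid]) / rms(in_band[mid])
gain_low = rms(bandpass(low_tone, fs)[mid]) / rms(low_tone[mid])
```

Zero-phase filtering keeps the filtered signal aligned with the input, which simplifies comparing the stages of the pathway sample by sample.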
The receptor neurons transduce the vibrations of the tympanal membrane into
sequences of action potentials. They thereby encode the amplitude modulation,
or envelope, of the signal~(\bcite{machens2001discrimination}), a process that
likely involves a rectifying nonlinearity~(\bcite{machens2001representation}).
The extraction of the signal envelope~(Fig.\,\ref{fig:stages_pre}b) can be
modeled as full-wave rectification followed by lowpass filtering of the
tympanal signal $\filt(t)$:
\begin{equation}
\env(t)\,=\,|\filt(t)|\,*\,\lp, \qquad \fc\,=\,250\,\text{Hz}
\label{eq:env}
\end{equation}
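The envelope extraction of Eq.\,\ref{eq:env} can be sketched in the same way. Assumed for illustration: a second-order Butterworth lowpass applied with zero phase, a hypothetical sampling rate, and a synthetic 10\,kHz tone with a slow 20\,Hz amplitude modulation as input; the helper name `envelope` is hypothetical. For a sinusoidal carrier, the lowpass output approximately follows the modulation, scaled by the mean of $|\sin|$, which is roughly $2/\pi$.

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt


def envelope(x_filt, fs, f_cut=250.0, order=2):
    """Full-wave rectification followed by lowpass filtering, cf. Eq. (env).

    Assumed here: Butterworth lowpass of the given order, zero-phase.
    """
    sos = butter(order, f_cut, btype="lowpass", fs=fs, output="sos")
    return sosfiltfilt(sos, np.abs(x_filt))


fs = 120_000.0                                  # hypothetical sampling rate in Hz
t = np.arange(int(0.2 * fs)) / fs               # 200 ms of signal
am = 1.0 + 0.5 * np.sin(2 * np.pi * 20.0 * t)   # slow 20 Hz amplitude modulation
x_filt = am * np.sin(2 * np.pi * 10_000.0 * t)  # modulated 10 kHz carrier
env = envelope(x_filt, fs)

mid = slice(len(t) // 4, 3 * len(t) // 4)       # ignore filter edge effects
# The envelope follows the modulation up to a constant factor (about 2/pi).
ratio = env[mid] / am[mid]
```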
Furthermore, the receptors exhibit a sigmoidal response curve over
logarithmically scaled stimulus intensities~(\bcite{suga1960peripheral};
\bcite{gollisch2002energy}). In the model pathway, logarithmic
compression~(Fig.\,\ref{fig:stages_pre}c) is achieved by conversion to the
decibel scale
\begin{equation}
\db(t)\,=\,20\,\cdot\,\dec \frac{\env(t)}{\dbref}, \qquad \dbref\,=\,1
\label{eq:log}
\end{equation}
relative to the common reference intensity $\dbref$. Both the receptor
neurons~(\bcite{romer1976informationsverarbeitung}; \bcite{gollisch2004input};
\bcite{fisch2012channel}) and, on a larger scale, the subsequent local
interneurons~(\bcite{hildebrandt2009origin}; \bcite{clemens2010intensity})
adapt their firing rates in response to sustained stimulus intensity, which
allows them to robustly encode faster amplitude modulations against a slowly
changing baseline intensity. Functionally, the adaptation mechanism resembles a
highpass filter~(Fig.\,\ref{fig:stages_pre}d) applied to the logarithmically
compressed envelope $\db(t)$:
\begin{equation}
\adapt(t)\,=\,\db(t)\,*\,\hp, \qquad \fc\,=\,10\,\text{Hz}
\label{eq:highpass}
\end{equation}
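The following sketch combines logarithmic compression (Eq.\,\ref{eq:log}) with the highpass adaptation (Eq.\,\ref{eq:highpass}). It assumes a first-order Butterworth highpass applied with zero phase, a hypothetical envelope sampling rate of 2\,kHz, a hypothetical test envelope, and a small floor against $\log(0)$; the helper names `to_db` and `adapt` are illustrative. It shows why the combination supports intensity invariance: scaling the envelope by a constant factor becomes a constant offset in decibels, which a highpass filter with zero gain at 0\,Hz removes.

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt


def to_db(env, ref=1.0, floor=1e-12):
    """Logarithmic compression 20 * log10(env / ref), cf. Eq. (log).

    The floor guarding against log(0) is an assumption of this sketch.
    """
    return 20.0 * np.log10(np.maximum(env, floor) / ref)


def adapt(db, fs, f_cut=10.0, order=1):
    """Highpass filter standing in for adaptation, cf. Eq. (highpass).

    Assumed here: Butterworth highpass of the given order, zero-phase.
    """
    sos = butter(order, f_cut, btype="highpass", fs=fs, output="sos")
    return sosfiltfilt(sos, db)


fs = 2_000.0                                          # hypothetical envelope sampling rate in Hz
t = np.arange(int(2.0 * fs)) / fs                     # 2 s of envelope
env = 1.0 + 0.8 * np.sin(2 * np.pi * 40.0 * t) ** 2   # hypothetical positive envelope

# Scaling the envelope by 100 shifts its dB representation by a constant 40 dB ...
offset = to_db(100.0 * env) - to_db(env)
# ... which the highpass removes, so both adapted envelopes coincide.
quiet = adapt(to_db(env), fs)
loud = adapt(to_db(100.0 * env), fs)
```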
This processing step concludes the preprocessing stage of the model pathway.
The resulting intensity-adapted envelope $\adapt(t)$ is then passed on from the
local interneurons to the ascending neurons, where it serves as the basis for
the following feature extraction stage.
\begin{figure}[!ht]
\centering
\includegraphics[width=\textwidth]{figures/fig_pre_stages.pdf}
@@ -453,59 +459,71 @@ following feature extraction stage.
\subsubsection{Feature extraction by individual neurons}
The ascending neurons extract and encode a number of different features of the
preprocessed signal, and hence represent the signal in a higher-dimensional
space than the preceding receptor neurons and local interneurons. Each
ascending neuron is assumed to scan the signal for a specific template pattern,
which can be thought of as a kernel with a particular structure and time scale.
This process, known as template matching, can be modeled as a convolution of
the intensity-adapted envelope $\adapt(t)$ with a kernel $k_i(t)$ specific to
the $i$-th ascending neuron:
\begin{equation}
c_i(t)\,=\,\adapt(t)\,*\,k_i(t)
= \infint \adapt(\tau)\,\cdot\,k_i(t\,-\,\tau)\,d\tau
\label{eq:conv}
\end{equation}
We use Gabor kernels as basis functions for creating different template
patterns. An arbitrary one-dimensional, real Gabor kernel is generated by
multiplying a Gaussian envelope with standard deviation (kernel width) $\kwi$
by a sinusoidal carrier with angular frequency $\kfi$ and phase $\kpi$:
\begin{equation}
k_i(t,\,\kwi,\,\kfi,\,\kpi)\,=\,e^{-\frac{t^{2}}{2{\kwi}^{2}}}\,\cdot\,\sin(\kfi\,t\,+\,\kpi), \qquad \kfi\,=\,2\pi f_{\text{sin}_i}
\label{eq:gabor}
\end{equation}
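As a sketch of Eqs.\,\ref{eq:conv} and~\ref{eq:gabor}, the Gabor kernel and a discrete approximation of the convolution integral can be written in Python as follows. Time step, kernel support, width, and frequency are hypothetical values, and the helper names `gabor_kernel` and `template_match` are illustrative. The phases $\kpi=\pi/2$ and $\kpi=0$ are used only to show the even (cosine) and odd (sine) cases; they are not meant to reproduce the values of Tab.\,\ref{tab:gabor_phases}.

```python
import numpy as np


def gabor_kernel(t, sigma, omega, phi):
    """Real 1-D Gabor kernel: Gaussian envelope times sinusoidal carrier, cf. Eq. (gabor).

    sigma is the kernel width, omega the angular carrier frequency
    (omega = 2 * pi * f_sin), and phi the carrier phase.
    """
    return np.exp(-t ** 2 / (2.0 * sigma ** 2)) * np.sin(omega * t + phi)


def template_match(a, kernel, dt):
    """Discrete approximation of c_i(t) = integral of a(tau) * k_i(t - tau) dtau, cf. Eq. (conv).

    mode="same" keeps c aligned with a when the kernel is sampled
    symmetrically around t = 0 with an odd number of samples.
    """
    return np.convolve(a, kernel, mode="same") * dt


dt = 0.5e-3                                    # hypothetical time step in s
t_k = np.arange(-50, 51) * dt                  # 101 samples, t = 0 at the center
sigma, omega = 4e-3, 2 * np.pi * 100.0         # hypothetical width (s) and frequency (rad/s)
k_even = gabor_kernel(t_k, sigma, omega, np.pi / 2)  # cosine carrier: mirror-symmetric
k_odd = gabor_kernel(t_k, sigma, omega, 0.0)         # sine carrier: point-symmetric

impulse = np.zeros(1000)
impulse[500] = 1.0
c = template_match(impulse, k_even, dt)        # response to a unit impulse at index 500
```

The impulse response reproduces the kernel (scaled by the time step) centered on the impulse, which confirms that the `"same"` mode keeps $c_i(t)$ aligned with $\adapt(t)$.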
Different combinations of $\kwi$ and $\kfi$ result in Gabor kernels with
different lobe numbers $\kni$, where the lobe number is the number of
half-periods of the carrier that fit under the Gaussian envelope without being
attenuated too strongly. The time window under the Gaussian envelope that
contains the relevant lobes of the kernel can be defined as the full duration of
the Gaussian at height $\rh$ relative to its maximum:
\begin{equation}
\fdrm(\kwi,\,\rh)\,=\,2\,\cdot\,\sqrt{-2\,\cdot\,\ln \rh}\cdot\,\kwi, \qquad \rh\,\in\,(0,\,1]
\label{eq:fdrm}
\end{equation}
% Yes, FDRM is a hideous acronym. Based on the common "full width at half
% maximum" (FWHM) and adjusted because "full duration at half maximum" (FDHM)
% is apparently preferred in a temporal context. Alternatively, "w_\text{gauss}"?
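With this, an appropriate carrier frequency $f_{\text{sin}_i}$, and hence
$\kfi\,=\,2\pi f_{\text{sin}_i}$, for obtaining a Gabor kernel with width $\kwi$
and desired lobe number $\kni$ can be approximated as
\begin{equation}
f_{\text{sin}_i}(\kni,\,\kwi,\,\rh)\,=\,\frac{0.5\,\cdot\,(\kni\,+\,\beta_0)}{\fdrm(\kwi,\,\rh)}, \qquad \kni\,\in\,\mathbb{Z},\enspace \kni\,\geq\,2
\label{eq:gabor_freq}
\end{equation}
Since each lobe corresponds to one half-period $1/(2f_{\text{sin}_i})$ of the
carrier, this places approximately $\kni\,+\,\beta_0$ half-periods within the
time window $\fdrm(\kwi,\,\rh)$.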
% \begin{equation}
% \kfi(\kni,\,\kwi,\,\rh)\,=\,\frac{0.5\,\cdot\,(\kni\,+\,\beta_0)}{2\,\cdot\,\sqrt{-2\,\cdot\,\ln \rh}\cdot\kwi}, \qquad \kni\,\geq\,2\enspace\forall\enspace \kni\,\in\,\mathbb{Z}
% \end{equation}
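The window of Eq.\,\ref{eq:fdrm} and the carrier frequency of Eq.\,\ref{eq:gabor_freq} can be computed as sketched below, reading the right-hand side of Eq.\,\ref{eq:gabor_freq} as an ordinary frequency in cycles per unit time, so that $\kfi = 2\pi f_{\text{sin}_i}$ as in Eq.\,\ref{eq:gabor}. Kernel width and $\rh$ are hypothetical values, $\beta_0=0.5$ is taken from the approximate value $\beta_0\approx0.5$ given in the text, and the helper names `fdrm` and `carrier_frequency` are illustrative.

```python
import numpy as np


def fdrm(sigma, h_rel):
    """Full duration of a Gaussian with std sigma at height h_rel of its maximum, cf. Eq. (fdrm)."""
    return 2.0 * np.sqrt(-2.0 * np.log(h_rel)) * sigma


def carrier_frequency(n_lobes, sigma, h_rel, beta0=0.5):
    """Carrier frequency f_sin (cycles per unit time) for n_lobes >= 2, cf. Eq. (gabor_freq).

    Chosen such that n_lobes + beta0 half-periods of the carrier span the
    FDRM window. Multiply by 2 * pi to obtain the angular frequency omega.
    """
    if int(n_lobes) != n_lobes or n_lobes < 2:
        raise ValueError("n_lobes must be an integer >= 2 (use omega = 0 for a single lobe)")
    return 0.5 * (n_lobes + beta0) / fdrm(sigma, h_rel)


sigma, h_rel = 4e-3, 0.05            # hypothetical kernel width (s) and relative height
window = fdrm(sigma, h_rel)          # about 19.6 ms for these values
f_sin = carrier_frequency(3, sigma, h_rel)
omega = 2.0 * np.pi * f_sin          # angular frequency for the Gabor carrier
```

At $t=\pm\fdrm/2$ the Gaussian envelope equals $\rh$ by construction, and $2\,f_{\text{sin}_i}\,\fdrm = \kni+\beta_0$ half-periods fill the window.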
The relationship between $\kfi$ and $\kni$ is approximately linear except for
small $\kni$. The offset term $\beta_0\approx0.5$ was added to balance the
amplitudes of the $\kni$ desired lobes of the kernel, which should be
maximized, against the amplitudes of the next-outer lobes, which should not
exceed the threshold value determined by $\rh$. Note that simple Gaussian
kernels with $\kni=1$ are obtained by setting the carrier frequency to
$\kfi=0$ and are hence not covered by Eq.\,\ref{eq:gabor_freq}.
Carrier phase $\kpi$ determines the position of the kernel lobes relative to
the kernel center. We restrict the Gabor kernels to be either even or odd
functions by setting $\kpi$ to one of only four specific phase
values~(Tab.\,\ref{tab:gabor_phases}). Even Gabor kernels are mirror-symmetric
and have an odd lobe number $\kni$, whereas odd Gabor kernels are
point-symmetric and have an even lobe number $\kni$. Both even and odd kernels
can have either positive or negative sign, which refers to the sign of the
kernel's central lobe (even kernels) or of the left one of the two central lobes
(odd kernels). These four major groups of Gabor kernels allow for the extraction
of different types of signal features, such as the presence of peaks (even,
$+$), troughs (even, $-$), onsets (odd, $+$), and offsets (odd, $-$) at various
time scales.
\FloatBarrier
\begin{table}[!ht]
\centering
\captionsetup{width=.45\textwidth}
\caption{Values of phase $\kp$ that are specific to the four major groups
of Gabor kernels.}
\begin{tabular}{|ccc|}
@@ -519,13 +537,10 @@ the left of the two central lobes (odd kernels).
\label{tab:gabor_phases}
\end{table}
\FloatBarrier
% Add kernel normalization here.
Following the convolutional template matching~(Fig.\,\ref{fig:stages_feat}a),
each kernel-specific response $c_i(t)$ is passed through a shifted Heaviside
step function $\nl$ with threshold value $\thr$ to obtain a binary
response~(Fig.\,\ref{fig:stages_feat}b):
\begin{equation}
b_i(t,\,\thr)\,=\,\begin{cases}
\;1, \quad c_i(t)\,>\,\thr\\
@@ -533,12 +548,12 @@ threshold value $\thr$ to obtain a binary response
\end{cases}
\label{eq:binary}
\end{equation}
The thresholding of $c_i(t)$ into $b_i(t)$ can be thought of as a
categorization into ``relevant'' and ``irrelevant'' response values.
% It is unclear whether such a thresholding nonlinearity is actually implemented
% either by the ascending neurons or at some point further downstream in the SEG.
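A minimal sketch of the thresholding in Eq.\,\ref{eq:binary}. The response values and the threshold are hypothetical, the helper name `binarize` is illustrative, and the output 0 for $c_i(t)\leq\thr$ follows the usual Heaviside convention.

```python
import numpy as np


def binarize(c, theta):
    """Shifted Heaviside step, cf. Eq. (binary): 1 where c > theta, 0 otherwise.

    The value 0 for c <= theta follows the standard Heaviside convention.
    """
    return np.where(c > theta, 1.0, 0.0)


c = np.array([-0.3, 0.1, 0.25, 0.6, 0.2])   # hypothetical kernel responses c_i(t)
b = binarize(c, theta=0.2)                  # -> [0, 0, 1, 1, 0]; 0.2 is not > 0.2
```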
Finally, the responses of the ascending neurons are assumed to be integrated
somewhere in the SEG~(\bcite{ronacher1986routes}; \bcite{bauer1987separate};
\bcite{bhavsar2017brain}). This processing step can be approximated as temporal
averaging of the binary responses $b_i(t)$ by a lowpass filter
\begin{equation}
@@ -752,26 +767,23 @@ array was moved as close to the grasshopper as possible without interrupting
its song production, which amounts to an approximate offset distance of 10\,cm
between the animal and the leading microphone. Care was taken to maintain a
stable position and height of the microphone array during recording. The
resulting recordings were then processed through the model pathway and analyzed
according to the procedure described in Section~\ref{sec:intensity_measures}.
\section{Results}
\subsection{Mechanisms driving the emergence of intensity invariance}
% Still missing the SNR analysis. Should be able to write around it for now.
It is not necessary to test each processing step along the model pathway for
intensity invariance. Instead, we can focus on those steps that involve
nonlinear transformations: a linear transformation passes any multiplicative
scaling of its input on unchanged to its output, so only the nonlinear steps can
potentially change the dependency on scale $\sca$ between the input and output
representations. Overall, there are three nonlinear transformations along the
model pathway: full-wave rectification during envelope extraction, logarithmic
compression, and the thresholding nonlinearity during feature extraction. In
the following, we analyze the effects of each of these transformations on the
intensity and SNR of the resulting representations, as well as their potential
contribution to intensity invariance.
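The argument that only the nonlinear steps can alter the dependency on scale can be checked numerically. The following Python sketch uses an arbitrary random signal, an arbitrary linear filter, and a hypothetical scale factor $a=5$; it only illustrates the algebraic properties of each transformation and does not reproduce the analyses of this section.

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.standard_normal(1000)      # arbitrary test signal
a = 5.0                            # hypothetical positive scale factor
h = rng.standard_normal(21)        # arbitrary linear filter (impulse response)


def linear(v):
    """Arbitrary linear transformation: convolution with h."""
    return np.convolve(v, h, mode="same")


# Linear filtering passes the scaling on unchanged: L(a x) = a L(x).
linear_keeps_scale = np.allclose(linear(a * x), a * linear(x))
# Full-wave rectification also keeps it, because |a x| = a |x| for a > 0.
rect_keeps_scale = np.allclose(np.abs(a * x), a * np.abs(x))
# Logarithmic compression turns the factor a into an additive offset of 20 log10(a) dB.
env = np.abs(x) + 1e-3
log_offset = 20.0 * np.log10(a * env) - 20.0 * np.log10(env)
# Thresholding at a fixed value generally changes which samples pass.
theta = 1.0
threshold_changes = bool(np.any((a * x > theta) != (x > theta)))
```

Of the three nonlinearities, rectification preserves the multiplicative scaling, logarithmic compression converts it into an offset that a subsequent highpass can remove, and thresholding makes the output depend on scale in a non-multiplicative way.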
\subsubsection{Full-wave rectification \& lowpass filtering}