ALMOST finished the methods section.

2026-05-15 16:45:51 +02:00
parent cbd0af7a5f
commit 155fb1eecf
46 changed files with 561 additions and 450 deletions
--- a/main.tex
+++ b/main.tex
@@ -79,18 +79,14 @@
 \newcommand{\kf}{\omega} % Unspecific Gabor kernel frequency
 \newcommand{\kp}{\phi} % Unspecific Gabor kernel phase
 \newcommand{\kn}{n} % Unspecific Gabor kernel lobe number
-% \newcommand{\ks}{s} % Unspecific Gabor kernel sign
 \newcommand{\kwi}{\kw_i} % Specific Gabor kernel width
 \newcommand{\kfi}{\kf_i} % Specific Gabor kernel frequency
 \newcommand{\kpi}{\kp_i} % Specific Gabor kernel phase
 \newcommand{\kni}{\kn_i} % Specific Gabor kernel lobe number
-% \newcommand{\ksi}{\ks_i} % Specific Gabor kernel sign

 % Math shorthands - Auxiliary kernel parameters:
-\newcommand{\fsin}{f_{\text{sin}}} % Carrier frequency
-\newcommand{\rh}{h_{\text{rel}}} % Relative Gaussian height for FWRH
-\newcommand{\fwrh}{\text{FWRH}} % Gaussian full-width at relative height
-\newcommand{\off}{\beta_0} % Offset for linear frequency approximation
+\newcommand{\fdrm}{\text{FDRM}} % Gaussian full duration relative to maximum
+\newcommand{\rh}{h_{\text{rel}}} % Relative Gaussian height for FDRM calculation

 % Math shorthands - Thresholding nonlinearity:
 \newcommand{\thr}{\Theta_i} % Step function threshold value
@@ -287,6 +283,20 @@ approximation by basic mathematical operations. We then elaborate on the key
 mechanisms that drive the emergence of intensity-invariant song representations
 within the auditory pathway.

+% RIPPED FROM RESULTS, MAYBE INTEGRATE SOMEWHERE HERE:
+% The robustness of song recognition is tied to the degree of intensity
+% invariance of the finalized feature representation. Ideally, the values of each
+% feature should depend only on the relative amplitude dynamics of the song
+% pattern but not on the overall intensity of the song. In the grasshopper, the
+% emergence of intensity-invariant representations along the song recognition
+% pathway likely is a distributed process that involves different neuronal
+% populations, which raises the question of what the essential computational
+% mechanisms are that drive this process. Within the model pathway, we identified
+% two key mechanisms that render the song representation more invariant to
+% intensity variations. The two mechanisms each comprise a nonlinear signal
+% transformation followed by a linear signal transformation but differ in the
+% specific operations involved, as outlined in the following sections.
+
 % SCRAPPED UNTIL FURTHER NOTICE:
 % Multi-species, multi-individual communally inhabited environments\\
 % - Temporal overlap: Simultaneous singing across individuals/species common\\
@@ -328,43 +338,38 @@ on PyPi.

 \subsection{Functional model of the grasshopper song recognition pathway}

-% Too long (no splitting, only pruning).
-The essence of constructing a functional model of a given system is to gain a
-sufficient understanding of the system's essential structural components and
-their presumed functional roles; and to then build a formal framework of
-manageable complexity around these two aspects. Anatomically, the organization
-of the grasshopper song recognition pathway can be outlined as a feed-forward
-network of three consecutive neuronal
-populations~(Fig.\,\mbox{\ref{fig:pathway}a-c}): Peripheral auditory receptor
-neurons, whose axons enter the ventral nerve cord at the level of the
-metathoracic ganglion; local interneurons that remain exclusively within the
-thoracic region of the ventral nerve cord; and ascending neurons projecting
-from the thoracic region towards the supraesophageal
-ganglion~(\bcite{rehbein1974structure}; \bcite{rehbein1976auditory};
+The anatomical organization of the grasshopper song recognition pathway can be
+outlined as a feed-forward network of three consecutive neuronal
+populations~(Fig.\,\ref{fig:pathway}a--c): peripheral auditory receptor neurons,
+whose axons enter the ventral nerve cord (VNC) at the level of the metathoracic
+ganglion; local interneurons that remain exclusively within the thoracic region
+of the VNC; and ascending neurons projecting from the thoracic region towards
+the supraesophageal ganglion (SEG), or central
+brain~(\bcite{rehbein1974structure}; \bcite{rehbein1976auditory};
 \bcite{eichendorf1980projections}). The input to the network originates at the
 tympanal membrane, which acts as an acoustic receiver and is coupled to the
 dendritic endings of the receptor neurons~(\bcite{gray1960fine}). The outputs
-from the network converge in the supraesophageal ganglion, which is presumed to
-harbor the neuronal substrate for conspecific song recognition and response
+from the network converge in the SEG, which presumably harbors the neuronal
+substrate for conspecific song recognition and response
 initiation~(\bcite{ronacher1986routes}; \bcite{bauer1987separate};
-\bcite{bhavsar2017brain}). Functionally, the ascending neurons are the most
-diverse of the three populations along the pathway. Individual ascending
-neurons possess highly specific response properties that contrast with the
-rather homogeneous response properties of the preceding receptor neurons and
-local interneurons~(\bcite{clemens2011efficient}), indicating a transition from
-a uniform population-wide processing stream into several parallel branches.
-Based on these anatomical and physiological considerations, the overall
-structure of the model pathway is divided into two distinct
-stages~(Fig.\,\ref{fig:pathway}d). The preprocessing stage incorporates the
-known physiological processing steps at the levels of the tympanal membrane,
-the receptor neurons, and the local interneurons; and operates on
-one-dimensional signal representations. The feature extraction stage
-corresponds to the processing within the ascending neurons and further
-downstream towards the supraesophageal ganglion; and operates on
-high-dimensional signal representations. The details of each physiological
-processing step and its functional approximation within the two stages are
-outlined in the following sections.
+\bcite{bhavsar2017brain}).

+Functionally, the ascending neurons are the most diverse of the three neuronal
+populations. Individual ascending neurons possess highly specific response
+properties that contrast with the rather homogeneous response properties of the
+preceding receptor neurons and local
+interneurons~(\bcite{clemens2011efficient}), which indicates a transition from
+a uniform population-wide processing stream into several parallel branches.
+Accordingly, the model pathway is divided into two distinct
+stages~(Fig.\,\ref{fig:pathway}d): The preprocessing stage incorporates the
+processing steps at the levels of the tympanal membrane, the receptor neurons,
+and the local interneurons, and operates on one-dimensional signal
+representations~(Fig.\,\ref{fig:stages_pre}). The feature extraction stage
+corresponds to the processing within the ascending neurons and further
+downstream towards the SEG, and operates on high-dimensional signal
+representations~(Fig.\,\ref{fig:stages_feat}). The details of each
+physiological processing step and its functional approximation are described in
+the following sections.
 \begin{figure}[!ht]
    \centering
    \includegraphics[width=\textwidth]{figures/fig_auditory_pathway.pdf}
@@ -389,53 +394,54 @@ outlined in the following sections.

 \subsubsection{Population-driven signal preprocessing}

-Grasshoppers receive airborne sound waves by a tympanal organ at either side of
+Grasshoppers receive airborne sound waves via a tympanal organ on each side of
 the body. The tympanal membrane acts as a mechanical resonance filter for
 sound-induced vibrations~(\bcite{windmill2008time}; \bcite{malkin2014energy}).
 Vibrations that fall within specific frequency bands are focused on different
 membrane areas, while others are attenuated. This processing step can be
-approximated by an initial bandpass filter
+approximated by an initial bandpass filter~(Fig.\,\ref{fig:stages_pre}a)
+applied to the acoustic input signal $\raw(t)$:
 \begin{equation}
    \filt(t)\,=\,\raw(t)\,*\,\bp, \qquad \fc\,=\,5\,\text{kHz},\,30\,\text{kHz}
    \label{eq:bandpass}
 \end{equation}
-applied to the acoustic input signal $\raw(t)$. The auditory receptor neurons
-transduce the vibrations of the tympanal membrane into sequences of action
-potentials. Thereby, they encode the amplitude modulation, or envelope, of the
-signal~(\bcite{machens2001discrimination}), which likely involves a rectifying
-nonlinearity~(\bcite{machens2001representation}). This can be modelled as
-full-wave rectification followed by lowpass filtering
+The receptor neurons transduce the vibrations of the tympanal membrane into
+sequences of action potentials. They thereby encode the amplitude modulation,
+or envelope, of the signal~(\bcite{machens2001discrimination}), which likely
+involves a rectifying nonlinearity~(\bcite{machens2001representation}). The
+extraction of the signal envelope~(Fig.\,\ref{fig:stages_pre}b) can be modeled
+as full-wave rectification followed by lowpass filtering of the tympanal signal
+$\filt(t)$:
 \begin{equation}
    \env(t)\,=\,|\filt(t)|\,*\,\lp, \qquad \fc\,=\,250\,\text{Hz}
    \label{eq:env}
 \end{equation}
-of the tympanal signal $\filt(t)$. Furthermore, the receptors exhibit a
-sigmoidal response curve over logarithmically compressed intensity
-levels~(\bcite{suga1960peripheral}; \bcite{gollisch2002energy}). In the model
-pathway, logarithmic compression is achieved by conversion to decibel scale
+Furthermore, the receptors exhibit a sigmoidal response curve over
+logarithmically compressed stimulus intensities~(\bcite{suga1960peripheral};
+\bcite{gollisch2002energy}). In the model pathway, logarithmic
+compression~(Fig.\,\ref{fig:stages_pre}c) is achieved by conversion to a decibel
+scale
 \begin{equation}
    \db(t)\,=\,20\,\cdot\,\dec \frac{\env(t)}{\dbref}, \qquad \dbref\,=\,1
    \label{eq:log}
 \end{equation}
-relative to the common reference intensity $\dbref$.
-Both the receptor neurons~(\bcite{romer1976informationsverarbeitung};
-\bcite{gollisch2004input}; \bcite{fisch2012channel}) and, on a larger scale,
-the subsequent local interneurons~(\bcite{hildebrandt2009origin};
-\bcite{clemens2010intensity}) adapt their firing rates in response to sustained
-stimulus intensity levels, which allows for the robust encoding of faster
-amplitude modulations against a slowly changing overall baseline intensity.
-Functionally, the adaptation mechanism resembles a highpass filter
+relative to the common reference intensity $\dbref$. Both the receptor
+neurons~(\bcite{romer1976informationsverarbeitung}; \bcite{gollisch2004input};
+\bcite{fisch2012channel}) and, on a larger scale, the subsequent local
+interneurons~(\bcite{hildebrandt2009origin}; \bcite{clemens2010intensity})
+adapt their firing rates in response to sustained stimulus intensities, which
+allows for the robust encoding of faster amplitude modulations against a slowly
+changing overall baseline intensity. Functionally, the adaptation mechanism
+resembles a highpass filter~(Fig.\,\ref{fig:stages_pre}d) over the
+logarithmically compressed envelope $\db(t)$:
 \begin{equation}
    \adapt(t)\,=\,\db(t)\,*\,\hp, \qquad \fc\,=\,10\,\text{Hz}
    \label{eq:highpass}
 \end{equation}
-over the logarithmically scaled envelope $\db(t)$. This processing step
-concludes the preprocessing stage of the model pathway. The resulting
-intensity-adapted envelope $\adapt(t)$ is then passed on from the local
-interneurons to the ascending neurons, where it serves as the basis for the
-following feature extraction stage.
-
-% Cite somewhere:
+This processing step concludes the preprocessing stage of the model pathway.
+The resulting intensity-adapted envelope $\adapt(t)$ is then passed on from the
+local interneurons to the ascending neurons, where it serves as the basis for
+the following feature extraction stage.
 \begin{figure}[!ht]
    \centering
    \includegraphics[width=\textwidth]{figures/fig_pre_stages.pdf}
@@ -453,59 +459,71 @@ following feature extraction stage.
 \subsubsection{Feature extraction by individual neurons}

 The ascending neurons extract and encode a number of different features of the
-preprocessed signal. As a population, they hence represent the signal in a
-higher-dimensional space than the preceding receptor neurons and local
-interneurons. Each ascending neuron is assumed to scan the signal for a
-specific template pattern, which can be thought of as a kernel of a particular
-structure and on a particular time scale. This process, known as template
-matching, can be modelled as a convolution
+preprocessed signal, and hence represent the signal in a higher-dimensional
+space than the preceding receptor neurons and local interneurons. Each
+ascending neuron is assumed to scan the signal for a specific template pattern,
+which can be thought of as a kernel of a particular structure and on a
+particular time scale. This process, known as template matching, can be
+modeled as a convolution of the intensity-adapted envelope $\adapt(t)$ with a
+kernel $k_i(t)$ specific to the $i$-th ascending neuron:
 \begin{equation}
    c_i(t)\,=\,\adapt(t)\,*\,k_i(t)
    = \infint \adapt(\tau)\,\cdot\,k_i(t\,-\,\tau)\,d\tau
    \label{eq:conv}
 \end{equation}
-of the intensity-adapted envelope $\adapt(t)$ with a kernel $k_i(t)$ per
-ascending neuron. We use Gabor kernels as basis functions for creating
-different template patterns. An arbitrary one-dimensional, real Gabor kernel is
-generated by multiplication of a Gaussian envelope and a sinusoidal carrier
+We use Gabor kernels as basis functions for creating different template
+patterns. An arbitrary one-dimensional, real Gabor kernel is generated by
+multiplication of a Gaussian envelope with standard deviation or kernel width
+$\kwi$ and a sinusoidal carrier with frequency $\kfi$ and phase $\kpi$:
 \begin{equation}
-    k_i(t,\,\kwi,\,\kfi,\,\kpi)\,=\,e^{-\frac{t^{2}}{2{\kwi}^{2}}}\,\cdot\,\sin(\kfi\,t\,+\,\kpi), \qquad \kfi\,=\,2\pi\fsin
+    k_i(t,\,\kwi,\,\kfi,\,\kpi)\,=\,e^{-\frac{t^{2}}{2{\kwi}^{2}}}\,\cdot\,\sin(\kfi\,t\,+\,\kpi), \qquad \kfi\,=\,2\pi f_{\text{sin}_i}
    \label{eq:gabor}
 \end{equation}
-with Gaussian standard deviation or kernel width $\kwi$, carrier frequency
-$\kfi$, and carrier phase $\kpi$. Different combinations of $\kw$ and $\kf$
-result in Gabor kernels with different lobe number $\kn$, which is the number
-of half-periods of the carrier that fit under the Gaussian envelope within
-reasonable limits of attenuation. The interval under the Gaussian envelope that
-contains the relevant lobes of the kernel can be defined as Gaussian full-width
-measured at relative peak height $\rh$
+Different combinations of $\kwi$ and $\kfi$ result in Gabor kernels with
+different lobe numbers $\kni$, where $\kni$ is the number of half-periods of the
+carrier that fit under the Gaussian envelope within reasonable limits of
+attenuation. The time window under the Gaussian envelope that contains the
+relevant lobes of the kernel can be defined as the Gaussian full duration at
+height $\rh$ relative to the maximum of the Gaussian:
 \begin{equation}
-    \fwrh(\kw,\,\rh)\,=\,2\,\cdot\,\sqrt{-2\,\cdot\,\ln \rh}\cdot\,\kw, \qquad \rh\,\in\,(0,\,1]
+    \fdrm(\kwi,\,\rh)\,=\,2\,\cdot\,\sqrt{-2\,\cdot\,\ln \rh}\cdot\,\kwi, \qquad \rh\,\in\,(0,\,1]
+    \label{eq:fdrm}
+\end{equation}
+% Yes, FDRM is a hideous acronym. Based on the common "full width at half
+% maximum" (FWHM) and adjusted because "full duration at half maximum" (FDHM)
+% is apparently preferred in a temporal context. Alternatively, "w_\text{gauss}"?
+With this, an appropriate carrier frequency $\kfi$ for obtaining a Gabor kernel
+with width $\kwi$ and desired lobe number $\kni$ can be approximated as
+\begin{equation}
+    \kfi(\kni,\,\kwi,\,\rh)\,=\,2\pi\,\cdot\,\frac{0.5\,\cdot\,(\kni\,+\,\beta_0)}{\fdrm(\kwi,\,\rh)}, \qquad \kni\,\in\,\mathbb{Z},\enspace \kni\,\geq\,2
+    \label{eq:gabor_freq}
 \end{equation}
-With this, an appropriate carrier frequency $\kf$ for obtaining a Gabor kernel
-with width $\kw$ and desired lobe number $\kn$ can be approximated as
 % \begin{equation}
-%     \kf(\kn,\,\fwrh)\,=\,\frac{0.5\,\cdot\,\kn\,+\,\off}{\fwrh}, \qquad \kn\,\geq\,2\enspace\forall\enspace \kn\,\in\,\mathbb{Z}
+%     \kfi(\kni,\,\kwi,\,\rh)\,=\,\frac{0.5\,\cdot\,(\kni\,+\,\beta_0)}{2\,\cdot\,\sqrt{-2\,\cdot\,\ln \rh}\cdot\kwi}, \qquad \kni\,\geq\,2\enspace\forall\enspace \kni\,\in\,\mathbb{Z}
 % \end{equation}
-\begin{equation}
-    \kf(\kn,\,\kw,\,\rh)\,=\,\frac{\kn\,+\,\off}{4\,\cdot\,\sqrt{-2\,\cdot\,\ln \rh}}, \qquad \kn\,\geq\,2\enspace\forall\enspace \kn\,\in\,\mathbb{Z}
-\end{equation}
-where $\off$ is a small positive offset to the near-linear relationship between
-$\kf$ and $\kn$ to balance the amplitude of the $\kn$ desired lobes of the
-kernel --- which should be maximized --- against the amplitude of the
-next-outer lobes, which should not exceed the threshold value determined by
-$\rh$. For $\kn=1$, carrier frequency $\kf$ is set to zero, which results in a
-simple Gaussian kernel. Carrier phase $\kp$ determines the position of the
-kernel lobes relative to the kernel center. By setting $\kp$ to one of only
-four specific phase values~(Tab.\,\ref{tab:gabor_phases}), we restrict the
-Gabor kernels to be either even functions~(mirror-symmetric, uneven $\kn$) or
-odd functions~(point-symmetric, even $\kn$) with either positive or negative
-sign, which refers to the sign of the kernel's central lobe (even kernels) or
-the left of the two central lobes (odd kernels).
+The relationship between $\kfi$ and $\kni$ is approximately linear except for
+small $\kni$. The offset term $\beta_0\approx0.5$ was added to balance the
+amplitudes of the $\kni$ desired lobes of the kernel --- which should be
+maximized --- against the amplitudes of the next-outer lobes, which should not
+exceed the threshold value determined by $\rh$. Note that simple Gaussian
+kernels with $\kni=1$ can be obtained by setting the carrier frequency to
+$\kfi=0$ and are hence not covered by Eq.\,\ref{eq:gabor_freq}.
+
+The carrier phase $\kpi$ determines the position of the kernel lobes relative to
+the kernel center. We restrict the Gabor kernels to be either even or odd
+functions by setting $\kpi$ to one of only four specific phase
+values~(Tab.\,\ref{tab:gabor_phases}). Even Gabor kernels are mirror-symmetric
+with an odd lobe number $\kni$, whereas odd Gabor kernels are point-symmetric with
+an even $\kni$. Both even and odd kernels can have either positive or negative sign,
+which refers to the sign of the kernel's central lobe (even kernels) or the
+left of the two central lobes (odd kernels). These four major groups of Gabor
+kernels allow for the extraction of different types of signal features, such as
+the presence of peaks (even, $+$), troughs (even, $-$), onsets (odd, $+$), and
+offsets (odd, $-$) at various time scales.
 \FloatBarrier
 \begin{table}[!ht]
    \centering
-    \captionsetup{width=.46\textwidth}
+    \captionsetup{width=.45\textwidth}
     \caption{Values of phase $\kpi$ that are specific for the four major groups
             of Gabor kernels.}
    \begin{tabular}{|ccc|}
@@ -519,13 +537,10 @@ the left of the two central lobes (odd kernels).
    \label{tab:gabor_phases}
 \end{table}
 \FloatBarrier
-These four major groups of Gabor kernels allow for the extraction of different
-types of signal features, such as the presence of peaks (even, $+$), troughs
-(even, $-$), onsets (odd, $+$), and offsets (odd, $-$) at various time scales.
-% Add kernel normalization here.
-Following the convolutional template matching, each kernel-specific response
-$c_i(t)$ is passed through a shifted Heaviside step-function $\nl$ with
-threshold value $\thr$ to obtain a binary response
+Following the convolutional template matching~(Fig.\,\ref{fig:stages_feat}a),
+each kernel-specific response $c_i(t)$ is passed through a shifted Heaviside
+step function $\nl$ with threshold value $\thr$ to obtain a binary
+response~(Fig.\,\ref{fig:stages_feat}b):
 \begin{equation}
    b_i(t,\,\thr)\,=\,\begin{cases}
        \;1, \quad c_i(t)\,>\,\thr\\
@@ -533,12 +548,12 @@ threshold value $\thr$ to obtain a binary response
    \end{cases}
    \label{eq:binary}
 \end{equation}
-which can be thought of as a categorization into "relevant" and "irrelevant"
-response values. In the grasshopper, these thresholding nonlinearities might
-either be part of the processing within the ascending neurons or take place
-further downstream~(SOURCE). Finally, the responses of the ascending neurons
-are assumed to be integrated somewhere in the supraesophageal
-ganglion~(\bcite{ronacher1986routes}; \bcite{bauer1987separate};
+The thresholding of $c_i(t)$ into $b_i(t)$ can be thought of as a
+categorization into ``relevant'' and ``irrelevant'' response values.
+% It is unclear whether such a thresholding nonlinearity is actually implemented
+% either by the ascending neurons or at some point further downstream in the SEG.
+Finally, the responses of the ascending neurons are assumed to be integrated
+somewhere in the SEG~(\bcite{ronacher1986routes}; \bcite{bauer1987separate};
 \bcite{bhavsar2017brain}). This processing step can be approximated as temporal
 averaging of the binary responses $b_i(t)$ by a lowpass filter
 \begin{equation}
@@ -752,26 +767,23 @@ array was moved as close to the grasshopper as possible without interrupting
 its song production, which amounts to an approximate offset distance of 10\,cm
 between the animal and the leading microphone. Care was taken to maintain a
 stable position and height of the microphone array during recording. The
-resulting recordings were then processed through the model pathway and analysed
+resulting recordings were then processed through the model pathway and analyzed
 according to the procedure described in Section~\ref{sec:intensity_measures}.

 \section{Results}

 \subsection{Mechanisms driving the emergence of intensity invariance}

-% Still missing the SNR analysis. Should be able to write around it for now.
-The robustness of song recognition is tied to the degree of intensity
-invariance of the finalized feature representation. Ideally, the values of each
-feature should depend only on the relative amplitude dynamics of the song
-pattern but not on the overall intensity of the song. In the grasshopper, the
-emergence of intensity-invariant representations along the song recognition
-pathway likely is a distributed process that involves different neuronal
-populations, which raises the question of what the essential computational
-mechanisms are that drive this process. Within the model pathway, we identified
-two key mechanisms that render the song representation more invariant to
-intensity variations. The two mechanisms each comprise a nonlinear signal
-transformation followed by a linear signal transformation but differ in the
-specific operations involved, as outlined in the following sections.
+It is not necessary to test each processing step along the model pathway for
+intensity invariance. Instead, we can focus on those steps that involve
+nonlinear transformations, since these are the only steps that can potentially
+change the dependency on scale $\sca$ between the input and output
+representations. Overall, there are three nonlinear transformations along the
+model pathway: full-wave rectification during envelope extraction, logarithmic
+compression, and the thresholding nonlinearity during feature extraction. In
+the following, we analyze the effects of each of these transformations on the
+intensity and SNR of the resulting representations as well as their potential
+contribution to intensity invariance.

 \subsubsection{Full-wave rectification \& lowpass filtering}