Kind of done drafting the discussion. Needs polishing.

2026-05-29 18:00:36 +02:00
parent 1878fb5eaf
commit dea5923dd7
2 changed files with 128 additions and 112 deletions
--- a/main.tex
+++ b/main.tex
@@ -258,14 +258,15 @@ substrate for conspecific song recognition and response
 initiation~(\bcite{ronacher1986routes}; \bcite{bauer1987separate};
 \bcite{bhavsar2017brain}).

-Functionally, the ascending neurons are the most diverse of the three neuronal
-populations. Around 15 to 20 ascending neurons have been identified in the
-grasshopper auditory system~(\bcite{stumpner1991auditory}). Individual
-ascending neurons possess highly specific response properties that contrast
-with the rather homogeneous response properties of the preceding receptor
-neurons and local interneurons~(\bcite{clemens2011efficient}), which indicates
-a transition from a uniform population-wide processing stream into several
-parallel branches. Accordingly, the model pathway is divided into two distinct
+Around 15 to 20 ascending neurons have been identified in the grasshopper
+auditory system~(\bcite{stumpner1991auditory}), whose functional
+characteristics are conserved even between species that are not closely
+related~(\bcite{neuhofer2008evolutionarily}). The population of ascending
+neurons possesses a diverse range of response properties that contrasts with
+the rather homogeneous responses of receptor neurons and local
+interneurons~(\bcite{clemens2011efficient}), which suggests a transition from a
+uniform population-wide processing stream into several parallel branches.
+Accordingly, the model pathway is divided into two distinct
 stages~(Fig.\,\ref{fig:pathway}d): The preprocessing stage incorporates the
 processing steps at the levels of the tympanal membrane, the receptor neurons,
 and the local interneurons; and operates on one-dimensional signal
@@ -275,6 +276,26 @@ downstream towards the SEG; and operates on high-dimensional signal
 representations~(Fig.\,\ref{fig:stages_feat}). The details of each
 physiological processing step and its functional approximation are described in
 the following sections.
+
+Around 15 to 20 ascending neurons have been identified in the grasshopper
+auditory system~(\bcite{stumpner1991auditory}), whose functional
+characteristics are conserved even between species that are not closely
+related~(\bcite{neuhofer2008evolutionarily}). The population of ascending
+neurons possesses a diverse range of response properties that contrasts with
+the rather homogeneous responses of receptor neurons and local
+interneurons~(\bcite{clemens2011efficient}), which suggests a transition from a
+uniform population-wide processing stream into several parallel branches.
+Accordingly, the model pathway is divided into two distinct
+stages~(Fig.\,\ref{fig:pathway}d): The preprocessing stage incorporates the
+processing steps at the levels of the tympanal membrane, the receptor neurons,
+and the local interneurons; and operates on one-dimensional signal
+representations~(Fig.\,\ref{fig:stages_pre}). The feature extraction stage
+corresponds to the processing within the ascending neurons and further
+downstream towards the SEG; and operates on high-dimensional signal
+representations~(Fig.\,\ref{fig:stages_feat}). The details of each
+physiological processing step and its functional approximation are described in
+the following sections.
+
 \begin{figure}[!ht]
    \centering
    \includegraphics[width=\textwidth]{figures/fig_auditory_pathway.pdf}
@@ -1549,6 +1570,7 @@ have not been subject to decades of study will likely not be suitable for this
 approach yet.

 \subsection{Feature representation, temporal averaging, and song design}
+\label{sec:constant_feat}

 The feature set is the final song representation along the model pathway and
 constitutes the basis for song recognition. Each feature $f_i(t)$ results from
@@ -1703,12 +1725,15 @@ lowpass filter --- the lower $\fc$, the higher the SNR of $\env(t)$ at a given
 $\sca$. However, $\fc$ must not be too low to avoid the attenuation of relevant
 amplitude dynamics of the song pattern. The saturation level of $\adapt$,
 unlike its saturation point, is independent of the SNR of $\env(t)$ because the
-influence of $\noc(t)$ is negligible for sufficiently large $\sca$. Both the
-saturation level and the saturation point of $\adapt(t)$ vary between different
-species and specific songs. These differences are likely rooted in the way in
-which logarithmic compression acts on the specific distribution of $\env(t)$,
-which is determined by $\fc$ and the structure and frequency spectrum of the
-rectified $\filt(t)$.
+influence of $\noc(t)$ is negligible for sufficiently large $\sca$. The output
+SNR of $\adapt(t)$ saturates at a comparatively low value of around 10. This might
+in part be a consequence of the logarithm, which compresses the range of higher
+intensities but also amplifies lower intensities, including the noise floor.
+Both the saturation level and the saturation point of $\adapt(t)$ vary between
+different species and individual songs. These differences are likely rooted in
+the way in which logarithmic compression acts on the specific distribution of
+$\env(t)$, which is determined by $\fc$ as well as the temporal structure and
+frequency spectrum of the rectified $\filt(t)$.

 Thresholding and temporal averaging render feature $f_i(t)$
 intensity-invariant for sufficiently large $\sca$. The trade-off between
@@ -1720,71 +1745,103 @@ invariance by the previous mechanism is neglected. The SNR of $f_i(t)$ is
 therefore determined solely by the pure-noise response of $f_i(t)$. The
 distribution $\pci$ of the pure-noise kernel response $c_i(t)$ is largely a
 normal distribution with mean $\mu\approx0$ for all kernels $k_i(t)$. The value
-of the pure-noise $f_i(t)$ is hence 0.5 for $\thr=0$ and decreases for larger
+of the pure-noise $f_i(t)$ is hence 0.5 for $\thr=0$ and decreases for higher
 $\thr$. If $\thr$ is set above the maximum of $c_i(t)$, the pure-noise feature
-value is 0, which results in an "unlimited" SNR of $f_i(t)$ at the cost of a
-higher saturation point. In this case, any non-zero feature value that is
-sustained for a sufficient duration could serve as indicator for the presence
-of $\soc(t)$ in addition to $\noc(t)$. This requires a fine evolutionary tuning
-of $\thr$ to the properties of both the species-specific song and the natural
+value is 0, which results in an "unlimited" SNR of $f_i(t)$. In this case, any
+non-zero feature value that is sustained for a sufficient duration could serve
+as an indicator for the presence of $\soc(t)$, although at the cost of a higher
+saturation point. The maximum of the pure-noise $c_i(t)$ is assumed to be very
+small due to the various SNR improvements along the pathway, so that the
+required increase in $\thr$ and hence in the saturation point of $f_i(t)$ is not
+expected to be substantial. However, exploiting the capacity of $f_i(t)$ for
+arbitrarily high SNR would certainly require a fine evolutionary tuning of
+$\thr$ to the properties of both the species-specific song and the natural
 noise in a certain habitat.

-
-It seems reasonable to assume that $\thr$ is one of the parameters along the
-pathway
-
-Physiologically, it is presumably easier to
-manipulate $\thr$ 
-
-
-It seems reasonable that $\thr$ is easier to
-manipulate in ev
-
-
-Furthermore, $\thr$ is presumably a parameter along
-the pathway that 
-
-
-$\thr$
-
-
-Furthermore, $\thr$ might be one of the parameters
-along the pathway 
-
-
-
-% However, the parameters that determine the SNR of $\adapt(t)$ are much less
-% understood and likely relate to properties of the signal, whereas the SNR of
-% $f(t)$ depends on the choice of $\Theta$ and can be more directly manipulated
-% by the system.
-
 \newpage
-\textbf{Thresh-LP: Implication for intensity invariance:}\\
-
- Suggests a relatively simple rule for optimal choice of threshold value $\thr$:\\
-$\rightarrow$ Find amplitude $c_i$ that maximizes absolute derivative of $c_i(t)$ over time\\
-$\rightarrow$ Optimal with respect to intensity invariance of $f_i(t)$, not necessarily for
-other criteria such as song-noise separation or diversity between features
-
- Nonlinear operations can be used to detach representations from graded physical
-stimulus (to fasciliate categorical behavioral decision-making?):\\
-1) Capture sufficiently precise amplitude information: $\env(t)$, $\adapt(t)$\\
-$\rightarrow$ Closely following the AM of the acoustic stimulus\\
-2) Quantify relevant stimulus properties on a graded scale: $c_i(t)$\\
-$\rightarrow$ More decorrelated representation, compared to prior stages\\
-3) Nonlinearity: Distinguish between "relevant vs irrelevant" values: $b_i(t)$\\
-$\rightarrow$ Trading a graded scale for two or more categorical states\\
-4) Represent stimulus properties under relevance constraint: $f_i(t)$\\
-$\rightarrow$ Graded again but highly decorrelated from the acoustic stimulus\\
-5) Categorical behavioral decision-making requires further nonlinearities\\
-$\rightarrow$ Parameters of a behavioral response may be graded (e.g. approach speed),
-initiation of one behavior over another is categorical (e.g. approach/stay)
-
 \subsection{Intensity invariance versus intensity invariance}

+Two consecutive mechanisms of intensity invariance do not necessarily add up to
+a stronger overall intensity invariance. If the first mechanism results in a
+lower saturation point than the second mechanism by itself, the saturation
+point of feature $f_i(t)$ will be determined solely by the first mechanism. In
+this case, the saturation level of $f_i(t)$ will conform to the intensity that
+$f_i(t)$ can reach for the given saturation point rather than the intrinsic
+saturation level of $f_i(t)$. Conversely, if the second mechanism results in a
+lower saturation point than the first mechanism, both the saturation point and
+the saturation level of $f_i(t)$ will be determined by the second mechanism.
+The saturation points of $f_i(t)$ across the set are distributed over a much
+wider range than those of the preceding kernel responses $c_i(t)$, which
+suggests that the interaction between the two mechanisms is specific to
+individual kernels $k_i(t)$. A number of $f_i(t)$ achieve a lower saturation
+point than the respective $c_i(t)$, while some $f_i(t)$ exhibit similar or only
+marginally lower saturation points. This raises the question of whether two
+consecutive mechanisms of intensity invariance are actually beneficial for the
+overall system.
+
+From a purely functional perspective, the answer could be that logarithmic
+compression and adaptation form a necessary preprocessing step towards a robust
+feature representation, even if thresholding and temporal averaging alone would
+be sufficient to render $f_i(t)$ intensity-invariant. This preprocessing likely
+improves the temporal regularity of the song pattern in $\adapt(t)$ and
+$c_i(t)$, which is required for constant $f_i(t)$ across the duration of a
+song~(Section\,\ref{sec:constant_feat}). It also ensures consistency of
+the distribution $\pci$ of $c_i(t)$ across songs of different intensity, which
+is essential for the generation of consistent species-specific $f_i(t)$ under a
+static $\thr$. From a physiological perspective, the answer is likely that
+neurons possess only a limited range of firing rates for encoding stimulus intensities
+that can range over several orders of magnitude. Sigmoidal tuning curves over
+logarithmically compressed stimulus intensities are a common property of
+sensory neurons across various modalities~(SOURCE?), and neurons of the
+grasshopper auditory system are no exception~(\bcite{suga1960peripheral};
+\bcite{gollisch2002energy}).
+
 \subsection{Implications for behavior in a natural acoustic environment}

-% RIPPED FROM INTRODUCTION:
+Most grasshoppers live in environments that are communally inhabited by
+numerous individuals from multiple species. Their acoustic environment is
+characterized by noise from various sources --- abiotic ones like wind and
+water, but also the songs of both hetero- and conspecifics. This limits the SNR
+that each individual can achieve for its own song, and hence the effectiveness
+of the intensity-invariant processing in the auditory system. Producing higher
+song intensities is not a viable solution to this problem, because these also
+contribute to the overall noise floor. A possible behavioral solution could be
+to produce songs in a "turn-taking" manner to avoid the temporal superposition
+of multiple songs into overly intense signals. This would also prevent the
+mutual distortion of the respective song patterns. Another solution could be to
+spatially separate from other nearby grasshoppers to spread the potential noise
+sources over a larger area. However, according to our analysis based on field
+recordings as well as previous work on the topic~(\bcite{lang2000acoustic}),
+reliable song recognition is limited to little more than 1\,m from the sender,
+so that a grasshopper also cannot afford to stay too far away from its
+conspecifics. A better solution may hence be to collectively produce songs at
+lower-than-possible intensities, which would reduce the overall noise floor for
+all nearby individuals. Importantly, the limitation of intensity invariance by
+SNR likely applies to all grasshoppers regardless of species, so that the
+behavioral strategies could be shared among the species that coexist in a given
+habitat.
+
+% Because the presumed restriction of song recognition
+% by means of the noise floor applies to all grasshoppers in a certain area,
+% these strategies may not be specific to some of the species at this location.
+% Instead, they must be shared by all grasshopper species that coexist within a
+% portion of a given habitat, which would provide an important implication for
+% the evolution of grasshopper songs in communities of multiple species.
+
+%%% RELICS OF INTRODUCTION %%%
+% - Nonlinear operations can be used to detach representations from graded physical
+% stimulus (to facilitate categorical behavioral decision-making?):\\
+% 1) Capture sufficiently precise amplitude information: $\env(t)$, $\adapt(t)$\\
+% $\rightarrow$ Closely following the AM of the acoustic stimulus\\
+% 2) Quantify relevant stimulus properties on a graded scale: $c_i(t)$\\
+% $\rightarrow$ More decorrelated representation, compared to prior stages\\
+% 3) Nonlinearity: Distinguish between "relevant vs irrelevant" values: $b_i(t)$\\
+% $\rightarrow$ Trading a graded scale for two or more categorical states\\
+% 4) Represent stimulus properties under relevance constraint: $f_i(t)$\\
+% $\rightarrow$ Graded again but highly decorrelated from the acoustic stimulus\\
+% 5) Categorical behavioral decision-making requires further nonlinearities\\
+% $\rightarrow$ Parameters of a behavioral response may be graded (e.g. approach speed),
+% initiation of one behavior over another is categorical (e.g. approach/stay)

 % Multi-species, multi-individual communally inhabited environments\\
 % - Temporal overlap: Simultaneous singing across individuals/species common\\
@@ -1802,53 +1859,12 @@ initiation of one behavior over another is categorical (e.g. approach/stay)
 % recognize the ones produced by conspecifics, and make appropriate behavioral
 % decisions based on context (sender identity, song type, mate/rival quality)

-% How can the auditory system of grasshoppers meet these challenges?\\
-% - What are the minimum functional processing steps required?\\
-% - Which known neuronal mechanisms can implement these steps?\\
-% - Which and how many stages along the auditory pathway contribute?\\
-% $\rightarrow$ What are the limitations of the system as a whole?
-
 % How can a human observer conceive a grasshopper's auditory percepts?\\
 % - How to investigate the workings of the auditory pathway as a whole?\\
 % - How to systematically test effects and interactions of processing parameters?\\
 % - How to integrate the available knowledge on anatomy, physiology, ethology?\\
 % $\rightarrow$ Abstract, simplify, formalize $\rightarrow$ Functional model framework

-\textbf{Differences between the model pathway and the previous framework:}
-In the first step, a bank of parallel linear-nonlinear feature detectors is
-applied to the input signal. Each feature detector consists of a convolutional
-filter and a subsequent sigmoidal nonlinearity. The outputs of these feature
-detectors are temporally averaged to obtain a single feature value per
-detector, which is then assigned a specific weight. The linear combination of
-weighted feature values results in a single preference value, that serves as
-predictor for the behavioral response of the animal to the presented input
-signal. Our model pathway adopts the general structure of the existing
-framework but modifies it in several key aspects. The convolutional filters,
-which have previously been fitted to behavioral data for each individual
-species~(\bcite{clemens2013computational}), are replaced by a larger, generic
-set of unfitted Gabor basis functions in order to cover a wide range of
-possible song features across different species. Gabor functions approximate
-the general structure of the filters used in the existing framework as well as
-the filter functions found in various auditory neurons~(\bcite{rokem2006spike};
-\bcite{clemens2011efficient}; \bcite{clemens2012nonlinear}). The fitted
-sigmoidal nonlinearities in the existing framework consistently exhibited very
-steep slopes and are therefore replaced by shifted Heaviside step-functions,
-which results in a binarization of the feature detector outputs. Another, more
-substantial modification is that the feature detector outputs are temporally
-averaged in a way that does not condense them into single feature values but
-retains their time-varying structure. This is in line with the fact that songs
-are no discrete units but part of a continuous acoustic stream that the
-auditory system has to process in real time. Moreover, a time-varying feature
-representation only stabilizes after a certain delay following the onset of a
-song, which emphasizes the temporal dynamics of evidence accumulation towards a
-final categorical decision. The most notable difference between our model
-pathway and the existing framework, however, lays in the addition of a
-physiologically inspired preprocessing stage, whose starting point corresponds
-to the initial reception of airborne sound waves. This allows the model to
-operate on unmodified recordings of natural grasshopper songs instead of
-condensed pulse train approximations, which widens its scope towards more
-realistic, ecologically relevant scenarios.
-
 \newpage
 \section{Appendix}