Kind of done drafting the discussion. Needs polishing.

2026-05-29 18:00:36 +02:00
parent 1878fb5eaf
commit dea5923dd7
2 changed files with 128 additions and 112 deletions
--- a/main.pdf
+++ b/main.pdf
--- a/main.tex
+++ b/main.tex
@@ -258,14 +258,15 @@ substrate for conspecific song recognition and response
 initiation~(\bcite{ronacher1986routes}; \bcite{bauer1987separate};
 \bcite{bhavsar2017brain}).
-Functionally, the ascending neurons are the most diverse of the three neuronal
+Around 15 to 20 ascending neurons have been identified in the grasshopper
-populations. Around 15 to 20 ascending neurons have been identified in the
+auditory system~(\bcite{stumpner1991auditory}), whose functional
-grasshopper auditory system~(\bcite{stumpner1991auditory}). Individual
+characteristics are conserved even between species that are not closely
-ascending neurons possess highly specific response properties that contrast
+related~(\bcite{neuhofer2008evolutionarily}). The population of ascending
-with the rather homogeneous response properties of the preceding receptor
+neurons possesses a diverse range of response properties that contrasts with
-neurons and local interneurons~(\bcite{clemens2011efficient}), which indicates
+the rather homogeneous responses of receptor neurons and local
-a transition from a uniform population-wide processing stream into several
+interneurons~(\bcite{clemens2011efficient}), which suggests a transition from a
-parallel branches. Accordingly, the model pathway is divided into two distinct
+uniform population-wide processing stream into several parallel branches.
 Accordingly, the model pathway is divided into two distinct
 stages~(Fig.\,\ref{fig:pathway}d): The preprocessing stage incorporates the
 processing steps at the levels of the tympanal membrane, the receptor neurons,
 and the local interneurons; and operates on one-dimensional signal
@@ -275,6 +276,26 @@ downstream towards the SEG; and operates on high-dimensional signal
 representations~(Fig.\,\ref{fig:stages_feat}). The details of each
 physiological processing step and its functional approximation are described in
 the following sections.
 Around 15 to 20 ascending neurons have been identified in the grasshopper
 auditory system~(\bcite{stumpner1991auditory}), whose functional
 characteristics are conserved even between species that are not closely
 related~(\bcite{neuhofer2008evolutionarily}). The population of ascending
 neurons possesses a diverse range of response properties that contrasts with
 the rather homogeneous responses of receptor neurons and local
 interneurons~(\bcite{clemens2011efficient}), which suggests a transition from a
 uniform population-wide processing stream into several parallel branches.
 Accordingly, the model pathway is divided into two distinct
 stages~(Fig.\,\ref{fig:pathway}d): The preprocessing stage incorporates the
 processing steps at the levels of the tympanal membrane, the receptor neurons,
 and the local interneurons; and operates on one-dimensional signal
 representations~(Fig.\,\ref{fig:stages_pre}). The feature extraction stage
 corresponds to the processing within the ascending neurons and further
 downstream towards the SEG; and operates on high-dimensional signal
 representations~(Fig.\,\ref{fig:stages_feat}). The details of each
 physiological processing step and its functional approximation are described in
 the following sections.
 \begin{figure}[!ht]
    \centering
    \includegraphics[width=\textwidth]{figures/fig_auditory_pathway.pdf}
@@ -1549,6 +1570,7 @@ have not been subject to decades of study will likely not be suitable for this
 approach yet.
 \subsection{Feature representation, temporal averaging, and song design}
 \label{sec:constant_feat}
 The feature set is the final song representation along the model pathway and
 constitutes the basis for song recognition. Each feature $f_i(t)$ results from
@@ -1703,12 +1725,15 @@ lowpass filter --- the lower $\fc$, the higher the SNR of $\env(t)$ at a given
 $\sca$. However, $\fc$ must not be too low to avoid the attenuation of relevant
 amplitude dynamics of the song pattern. The saturation level of $\adapt$,
 unlike its saturation point, is independent of the SNR of $\env(t)$ because the
-influence of $\noc(t)$ is negligible for sufficiently large $\sca$. Both the
+influence of $\noc(t)$ is negligible for sufficiently large $\sca$. The output
-saturation level and the saturation point of $\adapt(t)$ vary between different
+SNR of $\adapt(t)$ saturates at a comparatively low value of around 10. This might
-species and specific songs. These differences are likely rooted in the way in
+in part be a consequence of the logarithm, which compresses the range of higher
-which logarithmic compression acts on the specific distribution of $\env(t)$,
+intensities but also amplifies lower intensities, including the noise floor.
-which is determined by $\fc$ and the structure and frequency spectrum of the
+Both the saturation level and the saturation point of $\adapt(t)$ vary between
-rectified $\filt(t)$.
+different species and individual songs. These differences are likely rooted in
 the way in which logarithmic compression acts on the specific distribution of
 $\env(t)$, which is determined by $\fc$ as well as the temporal structure and
 frequency spectrum of the rectified $\filt(t)$.
 Thresholding and temporal averaging render feature $f_i(t)$
 intensity-invariant for sufficiently large $\sca$. The trade-off between
@@ -1720,71 +1745,103 @@ invariance by the previous mechanism is neglected. The SNR of $f_i(t)$ is
 therefore determined solely by the pure-noise response of $f_i(t)$. The
 distribution $\pci$ of the pure-noise kernel response $c_i(t)$ is largely a
 normal distribution with mean $\mu\approx0$ for all kernels $k_i(t)$. The value
-of the pure-noise $f_i(t)$ is hence 0.5 for $\thr=0$ and decreases for larger
+of the pure-noise $f_i(t)$ is hence 0.5 for $\thr=0$ and decreases for higher
 $\thr$. If $\thr$ is set above the maximum of $c_i(t)$, the pure-noise feature
-value is 0, which results in an "unlimited" SNR of $f_i(t)$ at the cost of a
+value is 0, which results in an ``unlimited'' SNR of $f_i(t)$. In this case, any
-higher saturation point. In this case, any non-zero feature value that is
+non-zero feature value that is sustained for a sufficient duration could serve
-sustained for a sufficient duration could serve as indicator for the presence
+as an indicator for the presence of $\soc(t)$, although at the cost of a higher
-of $\soc(t)$ in addition to $\noc(t)$. This requires a fine evolutionary tuning
+saturation point. The maximum of the pure-noise $c_i(t)$ is assumed to be very
-of $\thr$ to the properties of both the species-specific song and the natural
+small due to the various SNR improvements along the pathway, so that the
 required increase in $\thr$ and hence the saturation point of $f_i(t)$ is not
 expected to be substantial. However, exploiting the capacity of $f_i(t)$ for
 arbitrarily high SNR would certainly require a fine evolutionary tuning of
 $\thr$ to the properties of both the species-specific song and the natural
 noise in a certain habitat.
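 As a minimal sketch of this argument, assume that the pure-noise kernel
 response $c_i(t)$ is exactly normally distributed with zero mean and standard
 deviation $\sigma_i$, that the binary response $b_i(t)$ is 1 for
 $c_i(t)>\thr$ and 0 otherwise, and that temporal averaging preserves the mean
 of $b_i(t)$. The symbols $\sigma_i$ and $\Phi$ (the standard normal cumulative
 distribution function) are introduced here for illustration only. Under these
 assumptions, the expected pure-noise feature value is
 \begin{equation}
     \langle f_i \rangle
     = P\left(c_i > \thr\right)
     = 1 - \Phi\left(\frac{\thr}{\sigma_i}\right),
 \end{equation}
 which equals 0.5 for $\thr=0$ and decreases monotonically with increasing
 $\thr$, for example to about 0.023 for $\thr=2\sigma_i$. In this idealization,
 $\langle f_i \rangle$ only approaches 0 asymptotically, whereas for a bounded
 pure-noise $c_i(t)$ it reaches 0 as soon as $\thr$ exceeds the maximum of
 $c_i(t)$.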
 It seems reasonable to assume that $\thr$ is one of the parameters along the
 pathway
 Physiologically, it is presumably easier to
 manipulate $\thr$ 
 It seems reasonable that $\thr$ is easier to
 manipulate in ev
 Furthermore, $\thr$ is presumably a parameter along
 the pathway that 
 $\thr$
 Furthermore, $\thr$ might be one of the parameters
 along the pathway 
 % However, the parameters that determine the SNR of $\adapt(t)$ are much less
 % understood and likely relate to properties of the signal, whereas the SNR of
 % $f(t)$ depends on the choice of $\Theta$ and can be more directly manipulated
 % by the system.
 \newpage
 \textbf{Thresh-LP: Implication for intensity invariance:}\\
 - Suggests a relatively simple rule for optimal choice of threshold value $\thr$:\\
 $\rightarrow$ Find amplitude $c_i$ that maximizes absolute derivative of $c_i(t)$ over time\\
 $\rightarrow$ Optimal with respect to intensity invariance of $f_i(t)$, not necessarily for
 other criteria such as song-noise separation or diversity between features
 - Nonlinear operations can be used to detach representations from graded physical
 stimulus (to facilitate categorical behavioral decision-making?):\\
 1) Capture sufficiently precise amplitude information: $\env(t)$, $\adapt(t)$\\
 $\rightarrow$ Closely following the AM of the acoustic stimulus\\
 2) Quantify relevant stimulus properties on a graded scale: $c_i(t)$\\
 $\rightarrow$ More decorrelated representation, compared to prior stages\\
 3) Nonlinearity: Distinguish between "relevant vs irrelevant" values: $b_i(t)$\\
 $\rightarrow$ Trading a graded scale for two or more categorical states\\
 4) Represent stimulus properties under relevance constraint: $f_i(t)$\\
 $\rightarrow$ Graded again but highly decorrelated from the acoustic stimulus\\
 5) Categorical behavioral decision-making requires further nonlinearities\\
 $\rightarrow$ Parameters of a behavioral response may be graded (e.g. approach speed),
 initiation of one behavior over another is categorical (e.g. approach/stay)
 \subsection{Intensity invariance versus intensity invariance}
 Two consecutive mechanisms of intensity invariance do not necessarily add up to
 a stronger overall intensity invariance. If the first mechanism results in a
 lower saturation point than the second mechanism by itself, the saturation
 point of feature $f_i(t)$ will be determined solely by the first mechanism. In
 this case, the saturation level of $f_i(t)$ will conform to the intensity that
 $f_i(t)$ can reach for the given saturation point rather than the intrinsic
 saturation level of $f_i(t)$. Conversely, if the second mechanism results in a
 lower saturation point than the first mechanism, both the saturation point and
 the saturation level of $f_i(t)$ will be determined by the second mechanism.
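 This case distinction can be summarized compactly under some simplifying
 assumptions. The following notation is illustrative and not part of the model
 definition: Let $s_1$ denote the saturation point that results from the first
 mechanism and $s_2$ the saturation point of the second mechanism by itself,
 both expressed in terms of the input scale $\sca$. Let further $\ell_i(\sca)$
 denote the value that $f_i(t)$ would reach at a given $\sca$ under the second
 mechanism alone, which is assumed to be constant for $\sca \geq s_2$. If the
 output of the first mechanism is assumed to be perfectly constant beyond
 $s_1$, the saturation point $s_f$ and the saturation level of $f_i(t)$ follow
 as
 \begin{equation}
     s_f = \min(s_1, s_2)
     \quad \mathrm{and} \quad
     \ell_i(s_f),
 \end{equation}
 respectively. For $s_1 < s_2$, the saturation level $\ell_i(s_1)$ is the value
 that $f_i(t)$ has reached by the time the first mechanism saturates. For
 $s_2 \leq s_1$, it is the intrinsic saturation level $\ell_i(s_2)$.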
 The saturation points of $f_i(t)$ across the set are distributed over a much
 wider range than those of the preceding kernel responses $c_i(t)$, which
 suggests that the interaction between the two mechanisms is specific to
 individual kernels $k_i(t)$. A number of $f_i(t)$ achieve a lower saturation
 point than the respective $c_i(t)$, while some $f_i(t)$ exhibit similar or only
 marginally lower saturation points. This raises the question of whether two
 consecutive mechanisms of intensity invariance are actually beneficial for the
 overall system.
 From a purely functional perspective, the answer could be that logarithmic
 compression and adaptation is a necessary preprocessing step towards a robust
 feature representation, even if thresholding and temporal averaging alone would
 be sufficient to render $f_i(t)$ intensity-invariant. This preprocessing likely
 improves the temporal regularity of the song pattern in $\adapt(t)$ and
 $c_i(t)$, which is required for constant $f_i(t)$ across the duration of a
 song~(Section\,\ref{sec:constant_feat}). It also ensures consistency between
 the distribution $\pci$ of $c_i(t)$ across songs of different intensity, which
 is essential for the generation of consistent species-specific $f_i(t)$ under a
 static $\thr$. From a physiological perspective, the answer is likely that
 neurons possess only a limited firing rate for encoding stimulus intensities
 that can range over several orders of magnitude. Sigmoidal tuning curves over
 logarithmically compressed stimulus intensities are a common property of
 sensory neurons across various modalities~(SOURCE?), and neurons of the
 grasshopper auditory system are no exception~(\bcite{suga1960peripheral};
 \bcite{gollisch2002energy}).
 \subsection{Implications for behavior in a natural acoustic environment}
-% RIPPED FROM INTRODUCTION:
+Most grasshoppers live in environments that are communally inhabited by
 numerous individuals from multiple species. Their acoustic environment is
 characterized by noise from various sources --- abiotic ones like wind and
 water, but also the songs of both hetero- and conspecifics. This limits the SNR
 that each individual can achieve for its own song, and hence the effectiveness
 of the intensity-invariant processing in the auditory system. Producing higher
 song intensities is not a viable solution to this problem, because these also
 contribute to the overall noise floor. A possible behavioral solution could be
 to produce songs in a ``turn-taking'' manner to avoid the temporal superposition
 of multiple songs into overly intense signals. This would also prevent the
 mutual distortion of the respective song patterns. Another solution could be to
 spatially separate from other nearby grasshoppers to spread the potential noise
 sources over a larger area. However, according to our analysis based on field
 recordings as well as previous work on the topic~(\bcite{lang2000acoustic}),
 reliable song recognition is limited to little more than 1\,m from the sender,
 so that a grasshopper also cannot afford to stay too far away from its
 conspecifics. A better solution may hence be to collectively produce songs at
 lower-than-possible intensities, which would reduce the overall noise floor for
 all nearby individuals. Importantly, the limitation of intensity invariance by
 SNR likely applies to all grasshoppers regardless of species, so that the
 behavioral strategies could be shared among the species that coexist in a given
 habitat.
 % Because the presumed restriction of song recognition
 % by means of the noise floor applies to all grasshoppers in a certain area,
 % these strategies may not be specific to some of the species at this location.
 % Instead, they must be shared by all grasshopper species that coexist within a
 % portion of a given habitat, which would provide an important implication for
 % the evolution of grasshopper songs in communities of multiple species.
 %%% RELICS OF INTRODUCTION %%%
 % - Nonlinear operations can be used to detach representations from graded physical
 % stimulus (to facilitate categorical behavioral decision-making?):\\
 % 1) Capture sufficiently precise amplitude information: $\env(t)$, $\adapt(t)$\\
 % $\rightarrow$ Closely following the AM of the acoustic stimulus\\
 % 2) Quantify relevant stimulus properties on a graded scale: $c_i(t)$\\
 % $\rightarrow$ More decorrelated representation, compared to prior stages\\
 % 3) Nonlinearity: Distinguish between "relevant vs irrelevant" values: $b_i(t)$\\
 % $\rightarrow$ Trading a graded scale for two or more categorical states\\
 % 4) Represent stimulus properties under relevance constraint: $f_i(t)$\\
 % $\rightarrow$ Graded again but highly decorrelated from the acoustic stimulus\\
 % 5) Categorical behavioral decision-making requires further nonlinearities\\
 % $\rightarrow$ Parameters of a behavioral response may be graded (e.g. approach speed),
 % initiation of one behavior over another is categorical (e.g. approach/stay)
 % Multi-species, multi-individual communally inhabited environments\\
 % - Temporal overlap: Simultaneous singing across individuals/species common\\
@@ -1802,53 +1859,12 @@ initiation of one behavior over another is categorical (e.g. approach/stay)
 % recognize the ones produced by conspecifics, and make appropriate behavioral
 % decisions based on context (sender identity, song type, mate/rival quality)
 % How can the auditory system of grasshoppers meet these challenges?\\
 % - What are the minimum functional processing steps required?\\
 % - Which known neuronal mechanisms can implement these steps?\\
 % - Which and how many stages along the auditory pathway contribute?\\
 % $\rightarrow$ What are the limitations of the system as a whole?
 % How can a human observer conceive a grasshopper's auditory percepts?\\
 % - How to investigate the workings of the auditory pathway as a whole?\\
 % - How to systematically test effects and interactions of processing parameters?\\
 % - How to integrate the available knowledge on anatomy, physiology, ethology?\\
 % $\rightarrow$ Abstract, simplify, formalize $\rightarrow$ Functional model framework
 \textbf{Differences between the model pathway and the previous framework:}
 In the first step, a bank of parallel linear-nonlinear feature detectors is
 applied to the input signal. Each feature detector consists of a convolutional
 filter and a subsequent sigmoidal nonlinearity. The outputs of these feature
 detectors are temporally averaged to obtain a single feature value per
 detector, which is then assigned a specific weight. The linear combination of
 weighted feature values results in a single preference value that serves as a
 predictor for the behavioral response of the animal to the presented input
 signal. Our model pathway adopts the general structure of the existing
 framework but modifies it in several key aspects. The convolutional filters,
 which have previously been fitted to behavioral data for each individual
 species~(\bcite{clemens2013computational}), are replaced by a larger, generic
 set of unfitted Gabor basis functions in order to cover a wide range of
 possible song features across different species. Gabor functions approximate
 the general structure of the filters used in the existing framework as well as
 the filter functions found in various auditory neurons~(\bcite{rokem2006spike};
 \bcite{clemens2011efficient}; \bcite{clemens2012nonlinear}). The fitted
 sigmoidal nonlinearities in the existing framework consistently exhibited very
 steep slopes and are therefore replaced by shifted Heaviside step functions,
 which results in a binarization of the feature detector outputs. Another, more
 substantial modification is that the feature detector outputs are temporally
 averaged in a way that does not condense them into single feature values but
 retains their time-varying structure. This is in line with the fact that songs
 are not discrete units but part of a continuous acoustic stream that the
 auditory system has to process in real time. Moreover, a time-varying feature
 representation only stabilizes after a certain delay following the onset of a
 song, which emphasizes the temporal dynamics of evidence accumulation towards a
 final categorical decision. The most notable difference between our model
 pathway and the existing framework, however, lies in the addition of a
 physiologically inspired preprocessing stage, whose starting point corresponds
 to the initial reception of airborne sound waves. This allows the model to
 operate on unmodified recordings of natural grasshopper songs instead of
 condensed pulse train approximations, which widens its scope towards more
 realistic, ecologically relevant scenarios.
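 The structural difference between the two variants can be sketched in a few
 equations. The notation is chosen for illustration and is an assumption rather
 than a definition from either framework: $x(t)$ denotes the input to the
 feature detectors, $g_i$ a fitted sigmoidal nonlinearity, $w_i$ a feature
 weight, $T$ the duration of the presented signal, $H$ the Heaviside step
 function, and $h(t)$ a lowpass kernel used for temporal averaging. In the
 existing framework, the preference value $p$ would then result from
 \begin{equation}
     c_i(t) = (k_i * x)(t), \quad
     f_i = \frac{1}{T}\int_0^T g_i\left(c_i(t)\right)\,dt, \quad
     p = \sum_i w_i\,f_i,
 \end{equation}
 whereas the model pathway binarizes each kernel response and retains a
 time-varying feature:
 \begin{equation}
     b_i(t) = H\left(c_i(t) - \thr\right), \quad
     f_i(t) = (h * b_i)(t).
 \end{equation}
 In this reading, the single value $f_i$ of the existing framework corresponds
 to the limit of an averaging window that spans the entire signal.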
 \newpage
 \section{Appendix}