Kind of done drafting the discussion. Needs polishing.

This commit is contained in:
j-hartling
2026-05-29 18:00:36 +02:00
parent 1878fb5eaf
commit dea5923dd7
2 changed files with 128 additions and 112 deletions

240
main.tex

@@ -258,14 +258,15 @@ substrate for conspecific song recognition and response
initiation~(\bcite{ronacher1986routes}; \bcite{bauer1987separate};
\bcite{bhavsar2017brain}).
Functionally, the ascending neurons are the most diverse of the three neuronal
populations. Around 15 to 20 ascending neurons have been identified in the
grasshopper auditory system~(\bcite{stumpner1991auditory}). Individual
ascending neurons possess highly specific response properties that contrast
with the rather homogeneous response properties of the preceding receptor
neurons and local interneurons~(\bcite{clemens2011efficient}), which indicates
a transition from a uniform population-wide processing stream into several
parallel branches. Accordingly, the model pathway is divided into two distinct
Around 15 to 20 ascending neurons have been identified in the grasshopper
auditory system~(\bcite{stumpner1991auditory}), whose functional
characteristics are conserved even between species that are not closely
related~(\bcite{neuhofer2008evolutionarily}). The population of ascending
neurons possesses a diverse range of response properties that contrasts with
the rather homogeneous responses of receptor neurons and local
interneurons~(\bcite{clemens2011efficient}), which suggests a transition from a
uniform population-wide processing stream into several parallel branches.
Accordingly, the model pathway is divided into two distinct
stages~(Fig.\,\ref{fig:pathway}d): The preprocessing stage incorporates the
processing steps at the levels of the tympanal membrane, the receptor neurons,
and the local interneurons; and operates on one-dimensional signal
@@ -275,6 +276,26 @@ downstream towards the SEG; and operates on high-dimensional signal
representations~(Fig.\,\ref{fig:stages_feat}). The details of each
physiological processing step and its functional approximation are described in
the following sections.
Around 15 to 20 ascending neurons have been identified in the grasshopper
auditory system~(\bcite{stumpner1991auditory}), whose functional
characteristics are conserved even between species that are not closely
related~(\bcite{neuhofer2008evolutionarily}). The population of ascending
neurons possesses a diverse range of response properties that contrasts with
the rather homogeneous responses of receptor neurons and local
interneurons~(\bcite{clemens2011efficient}), which suggests a transition from a
uniform population-wide processing stream into several parallel branches.
Accordingly, the model pathway is divided into two distinct
stages~(Fig.\,\ref{fig:pathway}d): The preprocessing stage incorporates the
processing steps at the levels of the tympanal membrane, the receptor neurons,
and the local interneurons; and operates on one-dimensional signal
representations~(Fig.\,\ref{fig:stages_pre}). The feature extraction stage
corresponds to the processing within the ascending neurons and further
downstream towards the SEG; and operates on high-dimensional signal
representations~(Fig.\,\ref{fig:stages_feat}). The details of each
physiological processing step and its functional approximation are described in
the following sections.
\begin{figure}[!ht]
\centering
\includegraphics[width=\textwidth]{figures/fig_auditory_pathway.pdf}
@@ -1549,6 +1570,7 @@ have not been subject to decades of study will likely not be suitable for this
approach yet.
\subsection{Feature representation, temporal averaging, and song design}
\label{sec:constant_feat}
The feature set is the final song representation along the model pathway and
constitutes the basis for song recognition. Each feature $f_i(t)$ results from
@@ -1703,12 +1725,15 @@ lowpass filter --- the lower $\fc$, the higher the SNR of $\env(t)$ at a given
$\sca$. However, $\fc$ must not be too low to avoid the attenuation of relevant
amplitude dynamics of the song pattern. The saturation level of $\adapt$,
unlike its saturation point, is independent of the SNR of $\env(t)$ because the
influence of $\noc(t)$ is negligible for sufficiently large $\sca$. Both the
saturation level and the saturation point of $\adapt(t)$ vary between different
species and specific songs. These differences are likely rooted in the way in
which logarithmic compression acts on the specific distribution of $\env(t)$,
which is determined by $\fc$ and the structure and frequency spectrum of the
rectified $\filt(t)$.
influence of $\noc(t)$ is negligible for sufficiently large $\sca$. The output
SNR of $\adapt(t)$ saturates at a comparatively low value of around 10. This might
in part be a consequence of the logarithm, which compresses differences between
higher intensities but also amplifies lower intensities, including the noise floor.
Both the saturation level and the saturation point of $\adapt(t)$ vary between
different species and individual songs. These differences are likely rooted in
the way in which logarithmic compression acts on the specific distribution of
$\env(t)$, which is determined by $\fc$ as well as the temporal structure and
frequency spectrum of the rectified $\filt(t)$.
Thresholding and temporal averaging render feature $f_i(t)$
intensity-invariant for sufficiently large $\sca$. The trade-off between
@@ -1720,71 +1745,103 @@ invariance by the previous mechanism is neglected. The SNR of $f_i(t)$ is
therefore determined solely by the pure-noise response of $f_i(t)$. The
distribution $\pci$ of the pure-noise kernel response $c_i(t)$ is largely a
normal distribution with mean $\mu\approx0$ for all kernels $k_i(t)$. The value
of the pure-noise $f_i(t)$ is hence 0.5 for $\thr=0$ and decreases for larger
of the pure-noise $f_i(t)$ is hence 0.5 for $\thr=0$ and decreases for higher
$\thr$. If $\thr$ is set above the maximum of $c_i(t)$, the pure-noise feature
value is 0, which results in an ``unlimited'' SNR of $f_i(t)$ at the cost of a
higher saturation point. In this case, any non-zero feature value that is
sustained for a sufficient duration could serve as an indicator for the presence
of $\soc(t)$ in addition to $\noc(t)$. This requires a fine evolutionary tuning
of $\thr$ to the properties of both the species-specific song and the natural
value is 0, which results in an ``unlimited'' SNR of $f_i(t)$. In this case, any
non-zero feature value that is sustained for a sufficient duration could serve
as an indicator for the presence of $\soc(t)$, although at the cost of a higher
saturation point. The maximum of the pure-noise $c_i(t)$ is assumed to be very
small due to the various SNR improvements along the pathway, so that the
required increase in $\thr$ and hence the saturation point of $f_i(t)$ is not
expected to be substantial. However, exploiting the capacity of $f_i(t)$ for
arbitrarily high SNR would certainly require a fine evolutionary tuning of
$\thr$ to the properties of both the species-specific song and the natural
noise in a certain habitat.
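As a worked illustration of the relation between $\thr$ and the pure-noise
feature value, assume that the pure-noise kernel response $c_i(t)$ is
stationary and normally distributed with mean $\mu=0$ and standard deviation
$\sigma_i$ (a symbol introduced here for illustration only), and that temporal
averaging of the thresholded response approximates the fraction of time during
which $c_i(t)$ exceeds $\thr$. Under these assumptions, the expected pure-noise
feature value would be
\[
f_i \approx P\bigl(c_i > \thr\bigr) = 1 - \Phi\!\left(\frac{\thr}{\sigma_i}\right),
\]
where $\Phi$ denotes the cumulative distribution function of the standard
normal distribution. This yields $f_i\approx0.5$ for $\thr=0$, approximately
0.16 for $\thr=\sigma_i$, and approximately 0.02 for $\thr=2\sigma_i$. Because
an ideal normal distribution is unbounded, a pure-noise feature value of
exactly 0 relies on the finite maximum of the observed $c_i(t)$ rather than on
this idealized expression.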
It seems reasonable to assume that $\thr$ is one of the parameters along the
pathway
Physiologically, it is presumably easier to
manipulate $\thr$
It seems reasonable that $\thr$ is easier to
manipulate in ev
Furthermore, $\thr$ is presumably a parameter along
the pathway that
$\thr$
Furthermore, $\thr$ might be one of the parameters
along the pathway
% However, the parameters that determine the SNR of $\adapt(t)$ are much less
% understood and likely relate to properties of the signal, whereas the SNR of
% $f(t)$ depends on the choice of $\Theta$ and can be more directly manipulated
% by the system.
\newpage
\textbf{Thresh-LP: Implication for intensity invariance:}\\
- Suggests a relatively simple rule for optimal choice of threshold value $\thr$:\\
$\rightarrow$ Find amplitude $c_i$ that maximizes absolute derivative of $c_i(t)$ over time\\
$\rightarrow$ Optimal with respect to intensity invariance of $f_i(t)$, not necessarily for
other criteria such as song-noise separation or diversity between features
- Nonlinear operations can be used to detach representations from graded physical
stimulus (to facilitate categorical behavioral decision-making?):\\
1) Capture sufficiently precise amplitude information: $\env(t)$, $\adapt(t)$\\
$\rightarrow$ Closely following the AM of the acoustic stimulus\\
2) Quantify relevant stimulus properties on a graded scale: $c_i(t)$\\
$\rightarrow$ More decorrelated representation, compared to prior stages\\
3) Nonlinearity: Distinguish between ``relevant vs.\ irrelevant'' values: $b_i(t)$\\
$\rightarrow$ Trading a graded scale for two or more categorical states\\
4) Represent stimulus properties under relevance constraint: $f_i(t)$\\
$\rightarrow$ Graded again but highly decorrelated from the acoustic stimulus\\
5) Categorical behavioral decision-making requires further nonlinearities\\
$\rightarrow$ Parameters of a behavioral response may be graded (e.g. approach speed),
initiation of one behavior over another is categorical (e.g. approach/stay)
\subsection{Intensity invariance versus intensity invariance}
Two consecutive mechanisms of intensity invariance do not necessarily add up to
a stronger overall intensity invariance. If the first mechanism results in a
lower saturation point than the second mechanism by itself, the saturation
point of feature $f_i(t)$ will be determined solely by the first mechanism. In
this case, the saturation level of $f_i(t)$ will conform to the intensity that
$f_i(t)$ can reach for the given saturation point rather than the intrinsic
saturation level of $f_i(t)$. Conversely, if the second mechanism results in a
lower saturation point than the first mechanism, both the saturation point and
the saturation level of $f_i(t)$ will be determined by the second mechanism.
The saturation points of $f_i(t)$ across the set are distributed over a much
wider range than those of the preceding kernel responses $c_i(t)$, which
suggests that the interaction between the two mechanisms is specific to
individual kernels $k_i(t)$. A number of $f_i(t)$ achieve a lower saturation
point than the respective $c_i(t)$, while some $f_i(t)$ exhibit similar or only
marginally lower saturation points. This raises the question whether two
consecutive mechanisms of intensity invariance are actually beneficial for the
overall system.
From a purely functional perspective, the answer could be that logarithmic
compression and adaptation form a necessary preprocessing step towards a robust
feature representation, even if thresholding and temporal averaging alone would
be sufficient to render $f_i(t)$ intensity-invariant. This preprocessing likely
improves the temporal regularity of the song pattern in $\adapt(t)$ and
$c_i(t)$, which is required for constant $f_i(t)$ across the duration of a
song~(Section\,\ref{sec:constant_feat}). It also ensures consistency between
the distribution $\pci$ of $c_i(t)$ across songs of different intensity, which
is essential for the generation of consistent species-specific $f_i(t)$ under a
static $\thr$. From a physiological perspective, the answer is likely that
neurons possess only a limited firing rate for encoding stimulus intensities
that can range over several orders of magnitude. Sigmoidal tuning curves over
logarithmically compressed stimulus intensities are a common property of
sensory neurons across various modalities~(SOURCE?), and neurons of the
grasshopper auditory system are no exception~(\bcite{suga1960peripheral};
\bcite{gollisch2002energy}).
\subsection{Implications for behavior in a natural acoustic environment}
% RIPPED FROM INTRODUCTION:
Most grasshoppers live in environments that are communally inhabited by
numerous individuals from multiple species. Their acoustic environment is
characterized by noise from various sources --- abiotic ones like wind and
water, but also the songs of both hetero- and conspecifics. This limits the SNR
that each individual can achieve for its own song, and hence the effectiveness
of the intensity-invariant processing in the auditory system. Producing higher
song intensities is not a viable solution to this problem, because these also
contribute to the overall noise floor. A possible behavioral solution could be
to produce songs in a ``turn-taking'' manner to avoid the temporal superposition
of multiple songs into overly intense signals. This would also prevent the
mutual distortion of the respective song pattern. Another solution could be to
spatially separate from other nearby grasshoppers to spread the potential noise
sources over a larger area. However, according to our analysis based on field
recordings as well as previous work on the topic~(\bcite{lang2000acoustic}),
reliable song recognition is limited to little more than 1\,m from the sender,
so that a grasshopper also cannot afford to stay too far away from its
conspecifics. A better solution may hence be to collectively produce songs at
lower-than-possible intensities, which would reduce the overall noise floor for
all nearby individuals. Importantly, the limitation of intensity invariance by
SNR likely applies to all grasshoppers regardless of species, so that the
behavioral strategies could be shared among the species that coexist in a given
habitat.
% Because the presumed restriction of song recognition
% by means of the noise floor applies to all grasshoppers in a certain area,
% these strategies may not be specific to some of the species at this location.
% Instead, they must be shared by all grasshopper species that coexist within a
% portion of a given habitat, which would provide an important implication for
% the evolution of grasshopper songs in communities of multiple species.
%%% RELICS OF INTRODUCTION %%%
% - Nonlinear operations can be used to detach representations from graded physical
% stimulus (to facilitate categorical behavioral decision-making?):\\
% 1) Capture sufficiently precise amplitude information: $\env(t)$, $\adapt(t)$\\
% $\rightarrow$ Closely following the AM of the acoustic stimulus\\
% 2) Quantify relevant stimulus properties on a graded scale: $c_i(t)$\\
% $\rightarrow$ More decorrelated representation, compared to prior stages\\
% 3) Nonlinearity: Distinguish between "relevant vs irrelevant" values: $b_i(t)$\\
% $\rightarrow$ Trading a graded scale for two or more categorical states\\
% 4) Represent stimulus properties under relevance constraint: $f_i(t)$\\
% $\rightarrow$ Graded again but highly decorrelated from the acoustic stimulus\\
% 5) Categorical behavioral decision-making requires further nonlinearities\\
% $\rightarrow$ Parameters of a behavioral response may be graded (e.g. approach speed),
% initiation of one behavior over another is categorical (e.g. approach/stay)
% Multi-species, multi-individual communally inhabited environments\\
% - Temporal overlap: Simultaneous singing across individuals/species common\\
@@ -1802,53 +1859,12 @@ initiation of one behavior over another is categorical (e.g. approach/stay)
% recognize the ones produced by conspecifics, and make appropriate behavioral
% decisions based on context (sender identity, song type, mate/rival quality)
% How can the auditory system of grasshoppers meet these challenges?\\
% - What are the minimum functional processing steps required?\\
% - Which known neuronal mechanisms can implement these steps?\\
% - Which and how many stages along the auditory pathway contribute?\\
% $\rightarrow$ What are the limitations of the system as a whole?
% How can a human observer conceive a grasshopper's auditory percepts?\\
% - How to investigate the workings of the auditory pathway as a whole?\\
% - How to systematically test effects and interactions of processing parameters?\\
% - How to integrate the available knowledge on anatomy, physiology, ethology?\\
% $\rightarrow$ Abstract, simplify, formalize $\rightarrow$ Functional model framework
\textbf{Differences between the model pathway and the previous framework:}
In the first step of the existing framework, a bank of parallel linear-nonlinear feature detectors is
applied to the input signal. Each feature detector consists of a convolutional
filter and a subsequent sigmoidal nonlinearity. The outputs of these feature
detectors are temporally averaged to obtain a single feature value per
detector, which is then assigned a specific weight. The linear combination of
weighted feature values results in a single preference value that serves as
a predictor for the behavioral response of the animal to the presented input
signal. Our model pathway adopts the general structure of the existing
framework but modifies it in several key aspects. The convolutional filters,
which have previously been fitted to behavioral data for each individual
species~(\bcite{clemens2013computational}), are replaced by a larger, generic
set of unfitted Gabor basis functions in order to cover a wide range of
possible song features across different species. Gabor functions approximate
the general structure of the filters used in the existing framework as well as
the filter functions found in various auditory neurons~(\bcite{rokem2006spike};
\bcite{clemens2011efficient}; \bcite{clemens2012nonlinear}). The fitted
sigmoidal nonlinearities in the existing framework consistently exhibited very
steep slopes and are therefore replaced by shifted Heaviside step-functions,
which results in a binarization of the feature detector outputs. Another, more
substantial modification is that the feature detector outputs are temporally
averaged in a way that does not condense them into single feature values but
retains their time-varying structure. This is in line with the fact that songs
are not discrete units but part of a continuous acoustic stream that the
auditory system has to process in real time. Moreover, a time-varying feature
representation only stabilizes after a certain delay following the onset of a
song, which emphasizes the temporal dynamics of evidence accumulation towards a
final categorical decision. The most notable difference between our model
pathway and the existing framework, however, lies in the addition of a
physiologically inspired preprocessing stage, whose starting point corresponds
to the initial reception of airborne sound waves. This allows the model to
operate on unmodified recordings of natural grasshopper songs instead of
condensed pulse train approximations, which widens its scope towards more
realistic, ecologically relevant scenarios.
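The structure of the existing framework, as described above, can be sketched
formally. The notation is introduced here for illustration only and is not
taken from the original work: with an input signal $s(t)$ of duration $T$,
convolutional filters $h_i(t)$, sigmoidal nonlinearities $g_i$, and weights
$w_i$, the preference value $p$ could be written as
\[
p = \sum_i w_i \, \frac{1}{T} \int_0^T g_i\bigl( (h_i * s)(t) \bigr)\,\mathrm{d}t ,
\]
where $*$ denotes convolution. In this reading, the model pathway replaces
$h_i(t)$ by the Gabor kernels $k_i(t)$ and $g_i$ by a Heaviside step function
shifted by $\thr$, and substitutes the average over the entire duration $T$
with a temporal average that retains the time-varying structure of $f_i(t)$.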
\newpage
\section{Appendix}