Busy integrating more sources into discussion.
main.tex
@@ -167,8 +167,8 @@ Grasshopper songs are amplitude-modulated broad-band acoustic signals. They
consist of a series of noisy syllables and relatively quiet pauses, which form
a characteristic repetitive pattern~(\bcite{helversen1977stridulatory};
\bcite{stumpner1994song}). Song recognition depends on certain structural
parameters of this pattern --- such as the duration of syllables and
pauses~(\bcite{helversen1972gesang}), the slope of pulse
parameters of this pattern --- such as the ratio of syllable duration to pause
duration~(\bcite{helversen1972gesang}), the slope of pulse
onsets~(\bcite{helversen1993absolute}), and the accentuation of syllable onsets
relative to the preceding pause~(\bcite{balakrishnan2001song};
\bcite{helversen2004acoustic}) --- which are sufficiently conveyed by the
@@ -1000,26 +1000,23 @@ corresponding averaging interval $\tlp$:
f(t)\,\approx\,\int_{\Theta}^{+\infty} \pclp\,dc\,=\,P(c\,>\,\Theta,\,\tlp)
\label{eq:feat_prop}
\end{equation}
In a sense, $f(t)$ can be interpreted as some sort of duty cycle with respect
to $\Theta$. For example, a feature value of $f(t)=0.4$ means that $c(t)$
exceeds $\Theta$ for approximately 40\,\% of the time within $\tlp$ around $t$.
In the most extreme cases, $\Theta$ lies either above the maximum of $c(t)$ or
below the minimum of $c(t)$, which results in a minimum or maximum possible
feature value of $f(t)=0$~(Fig.\,\ref{fig:thresh-lp_single}d, left column) or
$f(t)=1$, respectively.
In a sense, $f(t)$ can be interpreted as some sort of duty cycle of $c(t)$ with
respect to $\Theta$. For example, a feature value of $f(t)=0.4$ means that
$c(t)$ exceeds $\Theta$ for approximately 40\,\% of the time within $\tlp$
around $t$. In the most extreme cases, $\Theta$ lies either above the maximum
of $c(t)$ or below the minimum of $c(t)$, which results in a minimum or maximum
possible feature value of $f(t)=0$~(Fig.\,\ref{fig:thresh-lp_single}d, left
column) or $f(t)=1$, respectively.
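As a rough worked example of this duty-cycle reading (a sketch under
assumptions that are not part of the model description: $c(t)$ is taken to be
strictly periodic with a hypothetical period $T_c$, to exceed $\Theta$ for a
hypothetical total duration $T_{\Theta}$ per cycle, and $\tlp$ is taken to span
many cycles), the feature value would reduce to the ratio of the two durations:
\[
f(t)\,\approx\,\frac{T_{\Theta}}{T_c}
\]
With the purely illustrative values $T_{\Theta}=4\,\mathrm{ms}$ and
$T_c=10\,\mathrm{ms}$, this gives $f(t)\approx 0.4$. The two limiting cases
correspond to $T_{\Theta}=0$ ($\Theta$ above the maximum of $c(t)$) and
$T_{\Theta}=T_c$ ($\Theta$ below the minimum of $c(t)$).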

Importantly, $f(t)$ retains information about neither the timing of individual
threshold crossings nor the precise values of $c(t)$ apart from their relation
to $\Theta$. Accordingly, for a given $\Theta$, different $\sca$ can still
result in similar $T_1$ segments (and hence similar feature values) depending
on the magnitude of the derivative of $c(t)$ in temporal proximity to time
points at which $c(t)$ crosses $\Theta$: The steeper the slope of $c(t)$, the
less $T_1$ changes with variations in $\sca$. The most reliable way of
exploiting this invariant property of $f(t)$ is to set $\Theta$ to a value near
0, because these values are least affected by different scales of $c(t)$. For
sufficiently large $\sca$, $f(t)$ then approaches the same constant $\mu_f$ in
both the noiseless and the noisy case~(Fig.\,\ref{fig:thresh-lp_single}e,
saturation regime).
to $\Theta$. Different $\sca$ can hence result in similar feature values by
producing similar $T_1$ segments. The most reliable way of exploiting this
invariant property of $f(t)$ is to set $\Theta$ to a value near 0, because
these values are least affected by different scales of $c(t)$. For sufficiently
large $\sca$, $f(t)$ then approaches the same constant $\mu_f$ in both the
noiseless and the noisy case~(Fig.\,\ref{fig:thresh-lp_single}e, saturation
regime).
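The role of a threshold near 0 can be illustrated with an idealized scaling
argument. This is a sketch under an assumption that goes beyond the text:
$c(t)$ is treated as if it were multiplied by a hypothetical positive factor
$a$, which need not match how $c(t)$ actually depends on $\sca$ in the model
pathway. In the notation of Eq.\,\ref{eq:feat_prop}:
\[
P(a\,c\,>\,\Theta,\,\tlp)\,=\,P\!\left(c\,>\,\frac{\Theta}{a},\,\tlp\right)
\;\longrightarrow\;P(c\,>\,0,\,\tlp)\qquad(a\,\to\,\infty)
\]
For $\Theta=0$, the left-hand side equals $P(c\,>\,0,\,\tlp)$ for every $a>0$,
so the feature value would not depend on the scale at all. For $\Theta\neq 0$,
the effective threshold $\Theta/a$ only approaches 0 as $a$ grows, so the same
value would be reached only in the limit of large $a$, provided that $c(t)$
spends a negligible amount of time exactly at 0.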

The saturation level of $f(t)$ is independent of the precise value of $\Theta$,
but the saturation point decreases with
@@ -1526,52 +1523,55 @@ natural song variation.
\newpage
\section{Discussion}

% Recap of main findings:
In the current study, we have established a physiologically inspired functional
model of the grasshopper song recognition pathway. The model pathway covers the
entire auditory processing stream, from the sound reception at the tympanal
membrane via peripheral receptor neurons and local interneurons up to the
generation of a high-dimensional feature representation at the level of the
ascending neurons and beyond in the SEG. Using this model pathway, we have
model of the grasshopper song recognition pathway, from the sound reception at
the tympanal membrane via peripheral receptor neurons and local interneurons
to the generation of a high-dimensional feature representation at the level of
the ascending neurons and beyond in the SEG. Using this model pathway, we have
identified two key computational mechanisms for the emergence of
intensity-invariant song representations. Each mechanism comprises a nonlinear
transformation and a subsequent linear transformation. The first mechanism
consists of logarithmic compression and adaptation, which takes place at the
level of the receptor neurons and local interneurons. The second mechanism
consists of thresholding and temporal averaging, which takes place either at
the level of the ascending neurons or further downstream in the SEG. Systematic
investigation of both mechanisms revealed a persistent trade-off between the
intensity invariance and the SNR of the song representations along the pathway.
In the following, we discuss the capabilities and limitations of our model
approach as well as the implications of our findings for the design of the
grasshopper auditory system, the evolution of species-specific grasshopper
songs, and the ethological relevance of intensity invariance in a natural
acoustic environment.
consists of logarithmic compression and adaptation by highpass filtering, which
takes place at the level of the receptor neurons and local interneurons. The
second mechanism consists of thresholding and temporal averaging by lowpass
filtering, which takes place either at the level of the ascending neurons or
further downstream in the SEG. Systematic investigation of both mechanisms
revealed a persistent trade-off between the intensity invariance and the SNR of
the song representations along the pathway. In the following, we briefly
reflect on the potential of functional modelling for research on sensory
systems. We then discuss the implications of our findings for the evolutionary
design of both the auditory system and the species-specific songs of
grasshoppers as well as the ethological relevance of intensity invariance in a
natural acoustic environment.

\subsection{Leveraging functional modelling to investigate sensory systems}

% Functional modelling is cool but bound to freeload on previous research:
Our understanding of sensory processing systems is based on the distributed
accumulation of anatomical, physiological, and ethological evidence. Functional
modelling provides a powerful tool to integrate the available fragments into a
coherent whole. It facilitates systematic, reproducible investigations of
relevant parameters such as scale $\sca$ or threshold value $\thr$. Moreover,
it allows addressing questions of broader scope by generalizing from concrete
relevant parameters, such as scale $\sca$ or threshold value $\thr$. It also
allows addressing questions of broader scope by generalizing from concrete
evidence. For instance, the interaction between the two mechanisms of intensity
invariance is most readily assessed if both mechanisms can be treated as consecutive
stages along the pathway --- where the output of the first stage relates
directly to the input of the second stage --- rather than separate entities.
The model pathway also provides a general basis for comparing song
Moreover, the model pathway provides a general basis for comparing song
representations across different species without the need for species-specific
models. However, the potential of functional modelling for research on sensory
systems depends entirely on the amount of available knowledge about the system.
The grasshopper song recognition pathway is a comparatively simple and very
well-understood system and is therefore a particularly suitable candidate for
functional modelling. Other sensory systems that are either more complex or
have not been subject to decades of study will likely not be suitable for this
approach yet.
models. However, the potential of functional modelling for research on a
sensory system depends entirely on the amount of available knowledge about the
system and its specific stimuli. The grasshopper song recognition pathway is a
comparatively simple, extensively researched and hence well-understood sensory
system and is therefore a particularly suitable candidate for functional
modelling. Other sensory systems that are either more complex or have not been
subject to decades of study will likely not be suitable for this approach yet.

\subsection{Feature representation, temporal averaging, and song design}
\subsection{Song design, temporal averaging, and feature representation}
\label{sec:constant_feat}

% Recap of feature theory and relevant parameters:
The feature set is the final song representation along the model pathway and
constitutes the basis for song recognition. Each feature $f_i(t)$ results from
the thresholding of the respective kernel response $c_i(t)$ by $\nl$ and the
@@ -1582,24 +1582,73 @@ the threshold value $\thr$ within the averaging interval $\tlp$ specified by
$\fc$. The value of $f_i(t)$ is hence determined by $\thr$ with respect to the
distribution $\pci$ of $c_i(t)$ and is restricted to the interval $[0,1]$.

% Feature representation and the constraint of repetitive song structure:
Different species-specific songs are represented by different combinations of
feature values, which should preferably be constant for the duration of a song
to enable reliable recognition. The fundamental requirement for a constant
$f_i(t)$ is that the time where $c_i(t)>\thr$ during $\tlp$ is the same for all
$t$, which is fulfilled if $\pci$ is stable across $t$. The most
straightforward way to achieve a stable $\pci$ is that $c_i(t)$ is periodic and
$\tlp$ is sufficiently long to average over multiple cycles of $c_i(t)$.
Song-evoked $c_i(t)$ are indeed approximately periodic, which is largely an
to facilitate recognition. The fundamental requirement for constant $f_i(t)$ is
that the time where $c_i(t)>\thr$ during $\tlp$ is the same for all $t$, which
is fulfilled if $\pci$ is stable across $t$. The most straightforward way to
achieve a stable $\pci$ is that $c_i(t)$ is periodic and $\tlp$ is sufficiently
long to average over multiple cycles of $c_i(t)$. Most song-evoked $c_i(t)$ are
indeed highly repetitive, albeit not perfectly periodic, which is largely an
inherited property of the song itself. Most grasshopper songs are produced by
stridulation, which refers to the pulling of the serrated stridulatory file on
the hindlegs across a resonating vein on the
forewings~(\bcite{helversen1977stridulatory}; \bcite{stumpner1994song};
\bcite{helversen1997recognition}). Every ``tooth'' that strikes the vein
generates a brief sound pulse; multiple pulses make up a syllable; and the
repetition of syllables and pauses results in a pattern with a high degree of
temporal regularity. Accordingly, a robust feature representation in the sense
of constant $f_i(t)$ is tightly linked to the mechanism of sound production and
the temporal structure of the generated song.
\bcite{helversen1997recognition}). Every ``peg'' that strikes the vein generates
a brief sound pulse; multiple pulses make up a syllable; and the repetition of
syllables and pauses results in a pattern with a high degree of temporal
regularity. A repetitive motor pattern during stridulation hence lays the basis
for constant $f_i(t)$.
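How many cycles $\tlp$ needs to cover can be estimated with a simple bound.
This is a sketch under assumptions that go beyond the text: $c_i(t)$ is taken
to be strictly periodic with a hypothetical period $T_c$, the temporal
averaging is idealized as a plain mean over a window of length
$\tlp\geq T_c$ (rather than the lowpass filter of the model pathway), and $D$
denotes the hypothetical fraction of each cycle during which $c_i(t)>\thr$.
Writing $\tlp=n\,T_c+r$ with an integer $n\geq 1$ and $0\leq r<T_c$, the $n$
complete cycles contribute exactly $n\,D\,T_c$ of supra-threshold time, while
the remaining partial cycle contributes some $x$ with $0\leq x\leq r$:
\[
f_i(t)\,=\,\frac{n\,D\,T_c+x}{\tlp}
\quad\Rightarrow\quad
\left|\,f_i(t)-D\,\right|\,=\,\frac{\left|\,x-D\,r\,\right|}{\tlp}
\,\leq\,\frac{r}{\tlp}\,<\,\frac{T_c}{\tlp}
\]
Under these assumptions, the bound on the fluctuation of $f_i(t)$ around $D$
decreases inversely with the number of cycles covered by $\tlp$; for example, a
window of at least ten cycles would keep $f_i(t)$ within 0.1 of $D$.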

The second requirement for constant $f_i(t)$ is a suitable averaging interval
$\tlp$. The minimum $\tlp$ should encompass at least a few cycles of $c_i(t)$
to ensure a stable $\pci$. The maximum $\tlp$ should not exceed the duration of
the song to avoid the inclusion of noise. The duration of species-specific
grasshopper songs can range from a few hundred milliseconds~(e.\,g.
\textit{Stethophyma grossum}) to well over a minute~(e.\,g. \textit{C.
mollis}), so that the optimal $\tlp$ likely differs between species. The longer
$\tlp$, the longer $f_i(t)$ takes to stabilize after the onset of the song,
which narrows the time window for reliable recognition.
\\What about \bcite{ronacher1998song}??
\\ $\rightarrow$ Answer might be \bcite{clemens2021sex}

If the basis for constant $f_i(t)$ is already laid

The basis for constant $f_i(t)$ is hence already

The basis for a robust feature representation in the sense
of constant $f_i(t)$ is hence already laid during the song production.

If the feature representation relies on a repetitive song pattern, one would
expect that grasshopper songs are evolutionarily constrained to include such a
pattern.

If constant $f_i(t)$ rely on a repetitive song pattern and are beneficial for
reliable song recognition, one would expect that repetitiveness is a common
design principle of species-specific grasshopper songs.

, and if constant
$f_i(t)$ are required for reliable song recognition, then one would expect that

grasshopper songs are evolutionarily constrained to have such a repetitive
structure.

This is true for many species-specific calling songs but less for
courtship songs, which tend to have a more complex structure~()

If constant $f_i(t)$ rely on a repetitive song pattern and are beneficial for
song recognition, then one would expect that grasshopper songs are
evolutionarily constrained to have such a repetitive temporal structure.

From an evolutionary perspective, one would then expect that grasshopper songs
are evolutionarily constrained to have a repetitive temporal structure in order
to elicit a robust feature representation.

Various grasshopper species, especially those with longer songs like \textit{C.
mollis}, \textit{G. rufus}, or \textit{O. rufipes}, tend to stridulate softly
@@ -1627,22 +1676,11 @@ song could result in similar $\pci$ despite their different temporal structure,
which would allow for consistent $f_i(t)$ across the entire song. However, it
appears more likely that only one part of the song encodes species identity,
while the other part serves a different purpose such as fitness
advertisement~(SOURCE?).
advertisement~(\bcite{stumpner1992recognition}).

Finally, the question remains how the choice of an appropriate averaging
interval $\tlp$ depends on the duration and temporal structure of a song. The
minimum $\tlp$ should encompass at least a few cycles of $c_i(t)$ to ensure a
stable $\pci$ and hence a constant $f_i(t)$. The maximum $\tlp$ should not
exceed the duration of a song to avoid the inclusion of behaviorally irrelevant
information. The longer $\tlp$, the longer $f_i(t)$ takes to stabilize after
the onset and before the offset of a song, which narrows the time window for
reliable recognition. The duration of species-specific grasshopper songs can
range from a few hundred milliseconds (e.\,g. \textit{Stethophyma grossum}) to
well over a minute (e.\,g. \textit{C. mollis}), so that the optimal $\tlp$ is
likely to differ between species.

\subsection{Sensory invariances in the grasshopper auditory system}
\subsection{Invariant processing in the grasshopper auditory system}

% Invariance in the general (systemic) sense:
The notion of invariance is fundamental for sensory processing systems.
Invariance, in the general sense, can be described as the property of a
transformation to maintain variation across certain meaningful input parameters
@@ -1651,46 +1689,52 @@ boils down to a selective input-output decorrelation that allows the system to
represent only those aspects of the stimulus that are behaviorally relevant to
the organism.

% "Easy" case - Throw away parameters that are not relevant:
The grasshopper auditory system has to deal with a number of sources of
non-informative song variation. For instance, the temporal structure of the
song pattern warps with temperature~(\bcite{skovmand1983song}). This also
affects certain structural parameters that are essential for song recognition,
mainly the duration of syllables and pauses. The auditory system can compensate
for this variation by reading out relative temporal relationships rather than
absolute time intervals~(\bcite{creutzig2009timescale};
\bcite{creutzig2010timescale}). The ratio of syllable duration to pause
duration is relatively constant across temperatures and has been shown to be
suitable for song recognition~(\bcite{helversen1972gesang}), so that there is
likely no need to retain any information about the absolute duration of
syllables and pauses.
song pattern warps with temperature~(\bcite{skovmand1983song}). The auditory
system can compensate for this time warping by reading out relative temporal
relationships, such as the ratio of syllable duration to pause duration, rather
than the absolute time intervals~(\bcite{creutzig2009timescale};
\bcite{creutzig2010timescale}). This allows for reliable song recognition
across different temperatures~(\bcite{helversen1972gesang}). Accordingly, the
auditory system likely does not retain any information about the precise
duration of syllables and pauses.

% Hard case - When a parameter is both relevant and irrelevant across functions:
The situation is more complex for variations in song intensity. Song intensity
at the receiver's position depends mostly on the distance to the sender and is
hence not a reliable cue to infer species identity. The auditory system should
therefore be invariant to intensity variations to recognize conspecific songs
therefore not a reliable cue to infer species identity. The auditory system
must hence be invariant to intensity variations to recognize conspecific songs
regardless of sender distance. However, song intensity --- specifically, the
interaural intensity difference --- is also required for directional hearing,
which is essential for phonotaxis~(\bcite{helversen1988interaural}). Conflicts
between song recognition and directional hearing are avoided in the auditory
system by distributing both functions across two parallel
interaural intensity difference --- is also a relevant cue for directional
hearing, which is essential for phonotaxis~(\bcite{helversen1988interaural}).
Interference between song recognition and directional hearing is avoided in the
auditory system by distributing both functions across two parallel
pathways~(\bcite{helversen1984parallel}; \bcite{ronacher1986routes}). This is
the main reason why our model pathway is focused entirely on song recognition
and has no capacity for directional hearing, no matter how relevant it may be
to the grasshopper.
and has no capacity for directional hearing, even though it is crucial to the
grasshopper's behavior.

Furthermore, ``invariance to variations in song intensity'' does not do justice
to the full extent of the problem. Intensity is a function of song amplitude
within a certain time frame. It can refer to the individual syllables and
pauses of the song pattern as well as the entire song --- the former is
relevant for song recognition, while the latter is not. Intensity invariance in
the current context can therefore be described as time scale-selective
sensitivity to the faster amplitude dynamics of the song pattern and
simultaneous insensitivity to slower, more sustained amplitude dynamics. In the
model pathway, this time scale selectivity is reflected by the cutoff frequency
$\fc$ of the highpass filter that underlies the adaptation of $\adapt(t)$: Most
$\fc$ are effective in removing the local offset of $\db(t)$ and render
$\adapt(t)$ intensity-invariant, but only sufficiently low $\fc$ will leave the
relevant amplitude dynamics of the song pattern intact.
% Hard case+ - When a parameter is both relevant and irrelevant within a function:
Song intensity is a function of the song amplitudes within a certain time
frame. ``Invariance to variations in song intensity'' is hence entirely a matter
of time scales. It can refer to intensity variations across different
songs~(longer time scales) or intensity variations across the syllables within
a song~(shorter time scales), but also to the intensity difference that
differentiates a syllable from a pause~(very short time scales). The time scale
of intensity invariance must therefore be sufficiently long to leave the
syllables and pauses of the song pattern intact. In the model pathway, this
time scale-selectivity is reflected by the cutoff frequency $\fc$ of the
highpass filter that underlies the adaptation of $\adapt(t)$: Most $\fc$ except
the lowest ones are effective in removing the local offset of $\db(t)$ and
render $\adapt(t)$ intensity-invariant, but only sufficiently low $\fc$
preserve the relevant amplitude dynamics of the song pattern. Intensity
invariance by thresholding and temporal averaging also has a relevant time
scale, which is determined by the averaging interval $\tlp$. However, this time
scale is not constrained by the need to preserve the temporal structure of the
song pattern but by the need to provide a suitable degree of temporal
integration across the song pattern~(Section\,\ref{sec:constant_feat}).
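The link between a cutoff frequency and the time scale it selects can be made
concrete with a back-of-the-envelope conversion. This is a sketch under an
assumption that is not stated in the text: the highpass filter is idealized as
a first-order filter, whose hypothetical time constant $\tau_{\mathrm{hp}}$
relates to its cutoff frequency as
\[
\tau_{\mathrm{hp}}\,=\,\frac{1}{2\pi\,\fc}
\]
With purely illustrative numbers, $\fc=1\,\mathrm{Hz}$ would correspond to
$\tau_{\mathrm{hp}}\approx 160\,\mathrm{ms}$ and $\fc=10\,\mathrm{Hz}$ to
$\tau_{\mathrm{hp}}\approx 16\,\mathrm{ms}$. Amplitude changes that are slow
compared to $\tau_{\mathrm{hp}}$ would be removed, while faster ones would
pass; the actual filter order and cutoff values of the model pathway may
differ.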

\subsection{Intensity invariance versus SNR}
@@ -1795,6 +1839,7 @@ logarithmically compressed stimulus intensities are a common property of
sensory neurons across various modalities~(SOURCE?), and neurons of the
grasshopper auditory system are no exception~(\bcite{suga1960peripheral};
\bcite{gollisch2002energy}).
\\$\rightarrow$ \bcite{ronacher2004neuronal}

\subsection{Implications for behavior in a natural acoustic environment}
@@ -1820,6 +1865,8 @@ all nearby individuals. Importantly, the limitation of intensity invariance by
SNR likely applies to all grasshoppers regardless of species, so that the
behavioral strategies could be shared among the species that coexist in a given
habitat.
\\ \bcite{kramer2018robustness}
\\ \bcite{einhaupl2011attractiveness}

% Because the presumed restriction of song recognition
% by means of the noise floor applies to all grasshoppers in a certain area,