Writing discussion.

j-hartling
2026-05-28 18:17:59 +02:00
parent 6cd56b82b0
commit 1878fb5eaf
2 changed files with 289 additions and 142 deletions

BIN
main.pdf

Binary file not shown.

431
main.tex

@@ -104,6 +104,7 @@
\newcommand{\nsig}{\sigma_{\eta}} % Noise component standard deviation \newcommand{\nsig}{\sigma_{\eta}} % Noise component standard deviation
\newcommand{\pc}{p(c,\,T)} % Probability density (general interval) \newcommand{\pc}{p(c,\,T)} % Probability density (general interval)
\newcommand{\pclp}{p(c,\,\tlp)} % Probability density (lowpass interval) \newcommand{\pclp}{p(c,\,\tlp)} % Probability density (lowpass interval)
\newcommand{\pci}{p(c_i,\,\tlp)} % Kernel-specific probability density (lowpass interval)
\newcommand{\muf}{\mu_{f_i}} % Average feature value \newcommand{\muf}{\mu_{f_i}} % Average feature value
\section{Introduction} \section{Introduction}
@@ -258,12 +259,13 @@ initiation~(\bcite{ronacher1986routes}; \bcite{bauer1987separate};
\bcite{bhavsar2017brain}). \bcite{bhavsar2017brain}).
Functionally, the ascending neurons are the most diverse of the three neuronal Functionally, the ascending neurons are the most diverse of the three neuronal
populations. Individual ascending neurons possess highly specific response populations. Around 15 to 20 ascending neurons have been identified in the
properties that contrast with the rather homogeneous response properties of the grasshopper auditory system~(\bcite{stumpner1991auditory}). Individual
preceding receptor neurons and local ascending neurons possess highly specific response properties that contrast
interneurons~(\bcite{clemens2011efficient}), which indicates a transition from with the rather homogeneous response properties of the preceding receptor
a uniform population-wide processing stream into several parallel branches. neurons and local interneurons~(\bcite{clemens2011efficient}), which indicates
Accordingly, the model pathway is divided into two distinct a transition from a uniform population-wide processing stream into several
parallel branches. Accordingly, the model pathway is divided into two distinct
stages~(Fig.\,\ref{fig:pathway}d): The preprocessing stage incorporates the stages~(Fig.\,\ref{fig:pathway}d): The preprocessing stage incorporates the
processing steps at the levels of the tympanal membrane, the receptor neurons, processing steps at the levels of the tympanal membrane, the receptor neurons,
and the local interneurons; and operates on one-dimensional signal and the local interneurons; and operates on one-dimensional signal
@@ -754,16 +756,15 @@ This effect is more pronounced for lower $\fc$ of the lowpass filter and is
presumably caused by the attenuation of high-frequency components in the presumably caused by the attenuation of high-frequency components in the
signal, which are more prominent in the noise component $\noc(t)$ than in the signal, which are more prominent in the noise component $\noc(t)$ than in the
song component $\soc(t)$. The effect also appears relatively consistent across song component $\soc(t)$. The effect also appears relatively consistent across
different species, although small variations exist~(Fig.\,\ref{fig:rect-lp}e) different species, although small variations exist~(Fig.\,\ref{fig:rect-lp}e
that are presumably based on different song structures and frequency spectra. and appendix Fig.\,\ref{fig:app_rect-lp}). In summary, the standard deviation
In summary, the standard deviation of $\env(t)$ has never been observed to of $\env(t)$ has never been observed to saturate for larger $\sca$ but rather
transition into a saturation regime for larger $\sca$ but rather continues to continues to increase proportionally to $\sca$ for all tested $\fc$, in both
increase proportionally to $\sca$ for all tested $\fc$, in both the noiseless the noiseless and the noisy case and across different species. Consequently,
and the noisy case and across different species. Consequently, the combination the combination of rectification and lowpass filtering does not contribute to
of rectification and lowpass filtering does not contribute to intensity intensity invariance. However, this transformation pair does improve the SNR of
invariance. However, this transformation pair does improve the SNR of $\env(t)$ $\env(t)$ relative to $\filt(t)$ and thus provides subsequent processing stages
relative to $\filt(t)$ and thus provides subsequent processing stages with a with a more robust input representation and higher input SNR.
more robust input representation and higher input SNR.
\begin{figure}[!ht] \begin{figure}[!ht]
\centering \centering
@@ -883,24 +884,23 @@ $\noc(t)$ masks $\soc(t)$ even after the intensity adaptation. Accordingly, the
effective intensity invariance of $\adapt(t)$ through logarithmic compression effective intensity invariance of $\adapt(t)$ through logarithmic compression
and adaptation is limited by the SNR of $\env(t)$: Songs that have already and adaptation is limited by the SNR of $\env(t)$: Songs that have already
sunk into the noise floor at the level of $\env(t)$ cannot be recovered by sunk into the noise floor at the level of $\env(t)$ cannot be recovered by
subsequent processing steps, which emphasizes the importance of the SNR subsequent processing steps. The general pattern of noise regime, transient
improvement by rectification and lowpass filtering during the previous regime, and saturation regime remains consistent across different
processing step~(Fig.\,\ref{fig:rect-lp}d). The general pattern of noise species~(Fig.\,\ref{fig:log-hp}e). However, the saturation point --- the $\sca$
regime, transient regime, and saturation regime remains consistent across value at which the SNR of $\adapt(t)$ starts to saturate --- and the saturation
different species~(Fig.\,\ref{fig:log-hp}e). However, the specific value of level --- the constant SNR of $\adapt(t)$ within the saturation regime --- vary
$\sca$ at which the saturation regime is reached (see appendix considerably between and within species~(appendix
Fig.\,\ref{fig:app_log-hp_saturation}) and the maximum SNR value of $\adapt(t)$ Figs.\,\ref{fig:app_log-hp_curves}+\ref{fig:app_log-hp_saturation}). For
within the saturation regime vary considerably between and within species. For
example, \textit{C. biguttulus} and \textit{C. mollis} display a noticeably example, \textit{C. biguttulus} and \textit{C. mollis} display a noticeably
lower maximum SNR of $\adapt(t)$ compared to other species. These differences lower saturation level compared to other species. These differences are not to
are not to be underestimated, since the SNR of $\adapt(t)$ within the be underestimated, since the saturation level of $\adapt(t)$ determines the
saturation regime determines the maximum input SNR for subsequent processing maximum input SNR for subsequent processing steps. In other words, the fact
steps. In other words, the fact that $\adapt(t)$ eventually reaches a that $\adapt(t)$ eventually reaches a saturation regime is, of course,
saturation regime is, of course, desirable in the context of intensity desirable in the context of intensity invariance, but it also means to pass up
invariance, but it also means to pass up on the higher SNR values that are on the higher SNR values that are achieved by $\env(t)$ for the same $\sca$ (up
achieved by $\env(t)$ for the same $\sca$ (up to several orders of magnitude, to several orders of magnitude, Fig.\,\ref{fig:log-hp}d). This trade-off
Fig.\,\ref{fig:log-hp}d). This trade-off between intensity invariance and SNR between intensity invariance and SNR is a recurring phenomenon that is further
is a recurring phenomenon that is further addressed in the following sections. addressed in the following sections.
\begin{figure}[!ht] \begin{figure}[!ht]
\centering \centering
@@ -1000,24 +1000,17 @@ sufficiently large $\sca$, $f(t)$ then approaches the same constant $\mu_f$ in
both the noiseless and the noisy case~(Fig.\,\ref{fig:thresh-lp_single}e, both the noiseless and the noisy case~(Fig.\,\ref{fig:thresh-lp_single}e,
saturation regime). saturation regime).
The value of $\mu_f$ in the saturation regime is independent of the precise The saturation level of $f(t)$ is independent of the precise value of $\Theta$,
value of $\Theta$, but the value of $\sca$ at which the saturation regime is but the saturation point decreases with
reached decreases with $\Theta$~(Fig.\,\ref{fig:thresh-lp_single}e). Therefore, $\Theta$~(Fig.\,\ref{fig:thresh-lp_single}e). Therefore, a threshold value of
a threshold value of $\Theta=0$ would be the optimal choice for achieving $\Theta=0$ would be the optimal choice for achieving intensity invariance at
intensity invariance at the lowest possible $\sca$. In stark contrast, the the lowest possible $\sca$. In stark contrast, the closer $\Theta$ is to 0, the
closer $\Theta$ is to 0, the higher $\mu_f$ in response to the pure noise higher $\mu_f$ in response to the pure noise component $\noc(t)$ and the lower
component $\noc(t)$ and the lower the resulting SNR of $f(t)$ between noise the resulting SNR of $f(t)$ between noise regime and saturation
regime and saturation regime~(Fig.\,\ref{fig:thresh-lp_single}b-d, left column, regime~(Fig.\,\ref{fig:thresh-lp_single}b-d, left column, and
and Fig.\,\ref{fig:thresh-lp_single}e). It is even possible to achieve an Fig.\,\ref{fig:thresh-lp_single}e). This trade-off between intensity invariance
"unlimited" SNR of $f(t)$ by setting $\Theta$ above the maximum of the and SNR has already been observed during the previous analysis on logarithmic
pure-noise $c(t)$, so that any $\mu_f>0$ indicates the presence of the song compression and adaptation~(Fig.\,\ref{fig:log-hp}d).
component $\soc(t)$ in input $\adapt(t)$ at the cost of requiring a higher
$\sca$ to reach the saturation regime. This trade-off between intensity
invariance and SNR has already been observed during the previous analysis on
logarithmic compression and adaptation~(Fig.\,\ref{fig:log-hp}d). However, the
parameters that determine the SNR of $\adapt(t)$ are much less understood and
likely relate to properties of the signal, whereas the SNR of $f(t)$ depends on
the choice of $\Theta$ and can be more directly manipulated by the system.
Finally, the effects of thresholding and temporal averaging must be seen in the Finally, the effects of thresholding and temporal averaging must be seen in the
context of the previous transformation pair of logarithmic compression and context of the previous transformation pair of logarithmic compression and
@@ -1102,11 +1095,11 @@ that the songs of each species are eventually represented by distinct points in
feature space. However, the species-specific trajectories cross each other at feature space. However, the species-specific trajectories cross each other at
numerous points, which means that the songs of two species --- each at a numerous points, which means that the songs of two species --- each at a
specific $\sca$ --- can result in the same combination of $\muf$. Furthermore, specific $\sca$ --- can result in the same combination of $\muf$. Furthermore,
the specific value of $\sca$ at which $\muf$ saturates depends on $f_i(t)$ and the specific saturation point of $f_i(t)$ depends on the species: For
the species: For \textit{C. mollis}, all $\muf$ saturate around the same \textit{C. mollis}, all $\muf$ saturate around the same $\sca$, while
$\sca$, while \textit{O. rufipes} exhibits considerable variation between the \textit{O. rufipes} exhibits considerable variation between the three $f_i(t)$.
three $f_i(t)$. The larger the variation in saturation points between $f_i(t)$, The larger the variation in saturation points between $f_i(t)$, the stronger
the stronger the curvature of the trajectory through feature space. the curvature of the trajectory through feature space.
In the noisy case, $\muf$ is non-zero even for the smallest In the noisy case, $\muf$ is non-zero even for the smallest
$\sca$~(Fig.\,\ref{fig:thresh-lp_species}c) because the addition of the noise $\sca$~(Fig.\,\ref{fig:thresh-lp_species}c) because the addition of the noise
@@ -1121,9 +1114,9 @@ previous analysis~(Fig.\,\ref{fig:thresh-lp_single}e). However, the
trajectories now move a much shorter distance through feature space for a trajectories now move a much shorter distance through feature space for a
similar range of $\sca$ due to the lower SNR of $f_i(t)$ between noise regime similar range of $\sca$ due to the lower SNR of $f_i(t)$ between noise regime
and saturation regime, which increases the likelihood of trajectories crossing and saturation regime, which increases the likelihood of trajectories crossing
each other. Finally, the values of $\sca$ at which $\muf$ saturate for a given each other. Finally, the saturation points of $f_i(t)$ for a given species are
species are slightly higher in the noisy case, but the variation between slightly higher in the noisy case, but the variation between $f_i(t)$ remains
$f_i(t)$ remains largely unchanged. largely unchanged.
In summary, even a comparatively small set of three features $f_i(t)$ can, in In summary, even a comparatively small set of three features $f_i(t)$ can, in
principle, represent different species-specific songs at distinct points in principle, represent different species-specific songs at distinct points in
@@ -1238,15 +1231,10 @@ broader and is not centered around the single saturation point based on the
median but rather shifted towards lower $\sca$. Care must be taken when median but rather shifted towards lower $\sca$. Care must be taken when
interpreting the height of either distribution due to the logarithmic scaling interpreting the height of either distribution due to the logarithmic scaling
of the underlying $\sca$ axis. Nevertheless, the overall pattern suggests that of the underlying $\sca$ axis. Nevertheless, the overall pattern suggests that
specific $f_i(t)$ can reach a saturation regime at lower $\sca$ than their the saturation points of specific $f_i(t)$ are indeed lower than those of their
$c_i(t)$ counterparts. Therefore, the effect of thresholding and temporal $c_i(t)$ counterparts. Therefore, the effect of thresholding and temporal
averaging on intensity invariance is not necessarily nullified by the previous averaging on intensity invariance is not necessarily nullified by the previous
logarithmic compression and adaptation, which means that both mechanisms can, logarithmic compression and adaptation.
in principle, work together towards an intensity-invariant song representation.
% Or does one simply overwrite the other? Can there even be a higher intensity
% invariance based on the sum of both effects? Or does one simply kick in for
% lower scales than the other and thus dictates the overall intensity
% invariance? Whatever, discussion material.
\begin{figure}[!ht] \begin{figure}[!ht]
\centering \centering
@@ -1313,7 +1301,7 @@ representation goes hand in hand with a substantial degree of redundancy and is
hardly expected to be present in the actual grasshopper auditory system. But hardly expected to be present in the actual grasshopper auditory system. But
the fact that the saturated $\muf$ are distributed symmetrically around 0.5 the fact that the saturated $\muf$ are distributed symmetrically around 0.5
provides concrete evidence that each $f_i(t)$ is able to reach its intrinsic provides concrete evidence that each $f_i(t)$ is able to reach its intrinsic
saturation value in the absence of logarithmic saturation level in the absence of logarithmic
compression~(Fig.\,\ref{fig:pipeline_short}c), which is otherwise prevented by compression~(Fig.\,\ref{fig:pipeline_short}c), which is otherwise prevented by
the capping of $\adapt(t)$, as seen during previous the capping of $\adapt(t)$, as seen during previous
analyses~(Fig.\,\ref{fig:thresh-lp_single}f and analyses~(Fig.\,\ref{fig:thresh-lp_single}f and
@@ -1327,8 +1315,8 @@ that it allows $f_i(t)$ to reach its intrinsic saturation value. If this
results in a wider range of $\muf$ across the feature set, it should be results in a wider range of $\muf$ across the feature set, it should be
beneficial for forming species-specific combinations. However, this depends on beneficial for forming species-specific combinations. However, this depends on
multiple different factors such as the choice of $k_i(t)$ and $\thr$ as well as multiple different factors such as the choice of $k_i(t)$ and $\thr$ as well as
the structure and distribution of the specific song and is hence not the structure and distribution of the specific song and is hence not guaranteed
guaranteed simply by disabling logarithmic compression. simply by disabling logarithmic compression.
\begin{figure}[!ht] \begin{figure}[!ht]
\centering \centering
@@ -1560,25 +1548,241 @@ functional modelling. Other sensory systems that are either more complex or
have not been subject to decades of study will likely not be suitable for this have not been subject to decades of study will likely not be suitable for this
approach yet. approach yet.
% \textbf{Song recognition pathway: Grasshopper vs. model:}\\ \subsection{Feature representation, temporal averaging, and song design}
% The model pathway includes a rather large number of Gabor kernels compared to
% the 15 to 20 ascending neurons in the grasshopper auditory
% system~(\bcite{stumpner1991auditory}).
\subsection{Interplay of song representation and song design} The feature set is the final song representation along the model pathway and
constitutes the basis for song recognition. Each feature $f_i(t)$ results from
the thresholding of the respective kernel response $c_i(t)$ by $\nl$ and the
subsequent temporal averaging of binary response $b_i(t)$ by a lowpass filter
with extremely low cutoff frequency $\fc$. At a given time point $t$, $f_i(t)$
approximately quantifies the proportion of time during which $c_i(t)$ exceeds
the threshold value $\thr$ within the averaging interval $\tlp$ specified by
$\fc$. The value of $f_i(t)$ is hence determined by $\thr$ with respect to the
distribution $\pci$ of $c_i(t)$ and is restricted to the interval $[0,1]$.
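This relationship can be written out as a short sketch, under two assumptions that are not stated in this section: that $\nl$ acts as a hard step, so that $b_i(t)=1$ if $c_i(t)>\thr$ and $b_i(t)=0$ otherwise, and that the lowpass filter is approximated by a plain average over the preceding interval $\tlp$:
\begin{equation}
f_i(t)\;\approx\;\frac{1}{\tlp}\int_{t-\tlp}^{t} b_i(\tau)\,\mathrm{d}\tau\;\approx\;\int_{\thr}^{\infty}\pci\,\mathrm{d}c_i
\end{equation}
The first expression is the fraction of $\tlp$ during which $c_i(t)$ exceeds $\thr$; the second expresses the same quantity as the probability mass of $\pci$ above $\thr$, which makes the restriction to $[0,1]$ explicit.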
\textbf{The role of repetitive songs for the feature representation:} Different species-specific songs are represented by different combinations of
Most grasshopper songs are produced by stridulation, which refers to the feature values, which should preferably be constant for the duration of a song
pulling of the serrated stridulatory file on the hindlegs across a resonating to enable reliable recognition. The fundamental requirement for a constant
vein on the forewings~(\bcite{helversen1977stridulatory}; $f_i(t)$ is that the time where $c_i(t)>\thr$ during $\tlp$ is the same for all
\bcite{stumpner1994song}; \bcite{helversen1997recognition}). Every "tooth" that $t$, which is fulfilled if $\pci$ is stable across $t$. The most
strikes the vein generates a brief sound pulse; multiple pulses make up a straightforward way to achieve a stable $\pci$ is that $c_i(t)$ is periodic and
syllable; and the repetition of syllables and pauses results in a $\tlp$ is sufficiently long to average over multiple cycles of $c_i(t)$.
characteristic amplitude-modulated waveform pattern. Song-evoked $c_i(t)$ are indeed approximately periodic, which is largely an
inherited property of the song itself. Most grasshopper songs are produced by
stridulation, which refers to the pulling of the serrated stridulatory file on
the hindlegs across a resonating vein on the
forewings~(\bcite{helversen1977stridulatory}; \bcite{stumpner1994song};
\bcite{helversen1997recognition}). Every "tooth" that strikes the vein
generates a brief sound pulse; multiple pulses make up a syllable; and the
repetition of syllables and pauses results in a pattern with a high degree of
temporal regularity. Accordingly, a robust feature representation in the sense
of constant $f_i(t)$ is tightly linked to the mechanism of sound production and
the temporal structure of the generated song.
\subsection{Intensity invariance versus SNR along the auditory pathway} Various grasshopper species, especially those with longer songs like \textit{C.
mollis}, \textit{G. rufus}, or \textit{O. rufipes}, tend to stridulate softly
at first and then continuously increase the amplitude of their song over time.
This slow "ramping" amplitude modulation makes the overall song less periodic
despite its temporal regularity. The "ramping" appears more pronounced in
$\env(t)$ compared to $\adapt(t)$, which suggests that the logarithmic
compression and adaptation during the preprocessing stage might be at least
partially beneficial for mitigating the effect of this amplitude modulation on
later representations. However, the adaptation of $\adapt(t)$ can only act on
certain time scales --- depending on the cutoff frequency of the underlying
highpass filter --- and is hence not able to compensate for "ramping" across
the entire duration of a song.
\subsection{Behavior in a natural acoustic environment} Certain grasshopper species like \textit{Chorthippus dorsatus} are known to
switch their stridulation pattern in the middle of a
song~(\bcite{stumpner1994song}). \textit{C. dorsatus} starts stridulating with
both hindlegs in synchrony and thereby generates a pronounced syllable-pause
pattern similar to that of \textit{P. parallelus}. For the last part of its
song, however, \textit{C. dorsatus} switches to an alternating leg movement,
which results in a more continuous but not entirely unstructured rattling
sound. It is unclear what this composite design means for the feature
representation of \textit{C. dorsatus} songs. In principle, both parts of the
song could result in similar $\pci$ despite their different temporal structure,
which would allow for consistent $f_i(t)$ across the entire song. However, it
appears more likely that only one part of the song encodes species identity,
while the other part serves a different purpose such as fitness
advertisement~(SOURCE?).
Finally, the question remains how the choice of an appropriate averaging
interval $\tlp$ depends on the duration and temporal structure of a song. The
minimum $\tlp$ should encompass at least a few cycles of $c_i(t)$ to ensure a
stable $\pci$ and hence a constant $f_i(t)$. The maximum $\tlp$ should not
exceed the duration of a song to avoid the inclusion of behaviorally irrelevant
information. The longer $\tlp$, the longer $f_i(t)$ takes to stabilize after
the onset and before the offset of a song, which narrows the time window for
reliable recognition. The duration of species-specific grasshopper songs can
range from a few hundred milliseconds (e.\,g.\ \textit{Stethophyma grossum}) to
well over a minute (e.\,g.\ \textit{C. mollis}), so that the optimal $\tlp$ is
likely to differ between species.
\subsection{Sensory invariances in the grasshopper auditory system}
The notion of invariance is fundamental for sensory processing systems.
Invariance, in the general sense, can be described as the property of a
transformation to maintain variation across certain meaningful input parameters
in its output while discarding variation across other input parameters. This
boils down to a selective input-output decorrelation that allows the system to
represent only those aspects of the stimulus that are behaviorally relevant to
the organism.
The grasshopper auditory system has to deal with a number of sources of
non-informative song variation. For instance, the temporal structure of the
song pattern warps with temperature~(\bcite{skovmand1983song}). This also
affects certain structural parameters that are essential for song recognition,
mainly the duration of syllables and pauses. The auditory system can compensate
for this variation by reading out relative temporal relationships rather than
absolute time intervals~(\bcite{creutzig2009timescale};
\bcite{creutzig2010timescale}). The ratio of syllable duration to pause
duration is relatively constant across temperatures and has been shown to be
suitable for song recognition~(\bcite{helversen1972gesang}), so that there is
likely no need to retain any information about the absolute duration of
syllables and pauses.
The situation is more complex for variations in song intensity. Song intensity
at the receiver's position depends mostly on the distance to the sender and is
hence not a reliable cue to infer species identity. The auditory system should
therefore be invariant to intensity variations to recognize conspecific songs
regardless of sender distance. However, song intensity --- specifically, the
interaural intensity difference --- is also required for directional hearing,
which is essential for phonotaxis~(\bcite{helversen1988interaural}). Conflicts
between song recognition and directional hearing are avoided in the auditory
system by distributing both functions across two parallel
pathways~(\bcite{helversen1984parallel}; \bcite{ronacher1986routes}). This is
the main reason why our model pathway is focused entirely on song recognition
and has no capacity for directional hearing, no matter how relevant it may be
to the grasshopper.
Furthermore, "invariance to variations in song intensity" does not do justice
to the full extent of the problem. Intensity is a function of song amplitude
within a certain time frame. It can refer to the individual syllables and
pauses of the song pattern as well as the entire song --- the former is
relevant for song recognition, while the latter is not. Intensity invariance in
the current context can therefore be described as time scale-selective
sensitivity to the faster amplitude dynamics of the song pattern and
simultaneous insensitivity to slower, more sustained amplitude dynamics. In the
model pathway, this time scale selectivity is reflected by the cutoff frequency
$\fc$ of the highpass filter that underlies the adaptation of $\adapt(t)$: Most
$\fc$ are effective in removing the local offset of $\db(t)$ and render
$\adapt(t)$ intensity-invariant, but only sufficiently low $\fc$ will leave the
relevant amplitude dynamics of the song pattern intact.
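A brief sketch of why the highpass filter removes the intensity offset in the noiseless case, under stated assumptions: scale $\sca$ multiplies the song before preprocessing, all steps up to $\env(t)$ are positively homogeneous (so that $\env(t)=\sca\,e_1(t)$, where $e_1(t)$ is a hypothetical symbol for the envelope at $\sca=1$), and the logarithmic compression has the form $a\log(\cdot)$ with some constant $a>0$:
\begin{equation}
\db(t)\;=\;a\log\bigl(\sca\,e_1(t)\bigr)\;=\;a\log\sca\;+\;a\log e_1(t)
\end{equation}
Scale $\sca$ thus enters $\db(t)$ only as an additive constant. Any highpass filter with zero gain at frequency 0 removes this constant, so that $\adapt(t)$ no longer depends on $\sca$, whereas the time-varying term $a\log e_1(t)$ passes only to the extent that its dynamics lie above $\fc$.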
\subsection{Intensity invariance versus SNR}
Each processing step along the model pathway is a transformation between input
representation and output representation. The intensity of the input is
characterized by scale $\sca$. The intensity of the output is characterized by
an appropriate intensity measure. If the transformation renders the output more
intensity-invariant, then the intensity measure will saturate for sufficiently
large $\sca$, which caps the output SNR to a constant value across these
$\sca$. Otherwise, the intensity measure and hence the output SNR will increase
monotonically with $\sca$. The trade-off between intensity invariance and SNR
refers to the principle that a transformation can either improve intensity
invariance or maintain SNR --- it cannot do both at the same time. This
principle is presumably not specific to the two mechanisms along the model
pathway but rather a general property of transformations that equalize between
different input intensities.
Logarithmic compression and adaptation by highpass filtering are capable of
equalizing a wide range of $\sca$. In the absence of noise component $\noc(t)$,
output $\adapt(t)$ is a perfectly intensity-invariant representation of song
component $\soc(t)$ across all $\sca>0$. However, the presence of $\noc(t)$
limits the effectiveness of this mechanism to sufficiently large $\sca$. This
means that intensity invariance and SNR interact at the input level, as well.
Specifically, the saturation point of $\adapt(t)$ is determined by the input
SNR of $\env(t)$, which in turn depends on the initial SNR of the sound signal
$\raw(t)$. This initial SNR is presumably improved by the bandpass filtering of
$\raw(t)$ into $\filt(t)$ at the tympanal membrane, which attenuates
frequencies outside the relevant range of grasshopper songs. The SNR is then
further improved by the rectification and lowpass filtering of $\filt(t)$ into
$\env(t)$. This improvement depends on the cutoff frequency $\fc$ of the
lowpass filter --- the lower $\fc$, the higher the SNR of $\env(t)$ at a given
$\sca$. However, $\fc$ must not be too low to avoid the attenuation of relevant
amplitude dynamics of the song pattern. The saturation level of $\adapt(t)$,
unlike its saturation point, is independent of the SNR of $\env(t)$ because the
influence of $\noc(t)$ is negligible for sufficiently large $\sca$. Both the
saturation level and the saturation point of $\adapt(t)$ vary between different
species and specific songs. These differences are likely rooted in the way in
which logarithmic compression acts on the specific distribution of $\env(t)$,
which is determined by $\fc$ and the structure and frequency spectrum of the
rectified $\filt(t)$.
Thresholding and temporal averaging render feature $f_i(t)$
intensity-invariant for sufficiently large $\sca$. The trade-off between
intensity invariance and SNR is mediated by threshold value $\thr$. A lower
$\thr$ ($\thr\to0$) improves intensity invariance by shifting the saturation
point towards lower $\sca$ but also decreases the SNR of $f_i(t)$. The
saturation level of $f_i(t)$ is independent of $\thr$ as long as the intensity
invariance by the previous mechanism is neglected. The SNR of $f_i(t)$ is
therefore determined solely by the pure-noise response of $f_i(t)$. The
distribution $\pci$ of the pure-noise kernel response $c_i(t)$ is largely a
normal distribution with mean $\mu\approx0$ for all kernels $k_i(t)$. The value
of the pure-noise $f_i(t)$ is hence 0.5 for $\thr=0$ and decreases for larger
$\thr$. If $\thr$ is set above the maximum of $c_i(t)$, the pure-noise feature
value is 0, which results in an "unlimited" SNR of $f_i(t)$ at the cost of a
higher saturation point. In this case, any non-zero feature value that is
sustained for a sufficient duration could serve as an indicator for the presence
of $\soc(t)$ in addition to $\noc(t)$. This requires a fine evolutionary tuning
of $\thr$ to the properties of both the species-specific song and the natural
noise in a certain habitat.
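The dependence of the pure-noise feature value on $\thr$ can be made explicit in a minimal sketch, assuming that $\pci$ of the pure-noise $c_i(t)$ is exactly normal with mean 0 and standard deviation $\sigma_{c_i}$ (a hypothetical symbol introduced here) and that $f_i(t)$ equals the probability mass of $\pci$ above $\thr$:
\begin{equation}
f_i\;\approx\;\int_{\thr}^{\infty}\frac{1}{\sqrt{2\pi}\,\sigma_{c_i}}\exp\!\left(-\frac{c_i^2}{2\sigma_{c_i}^2}\right)\mathrm{d}c_i\;=\;\frac{1}{2}\,\mathrm{erfc}\!\left(\frac{\thr}{\sqrt{2}\,\sigma_{c_i}}\right)
\end{equation}
For $\thr=0$, this gives $f_i=0.5$; for $\thr=\sigma_{c_i}$, about 0.16; and for $\thr=3\,\sigma_{c_i}$, about 0.001. Under an exactly normal distribution, the value approaches but never reaches 0, so a pure-noise feature value of exactly 0 relies on the finite maximum of the observed $c_i(t)$.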
It seems reasonable to assume that $\thr$ is one of the parameters along the
pathway
Physiologically, it is presumably easier to
manipulate $\thr$
It seems reasonable that $\thr$ is easier to
manipulate in ev
Furthermore, $\thr$ is presumably a parameter along
the pathway that
$\thr$
Furthermore, $\thr$ might be one of the parameters
along the pathway
% However, the parameters that determine the SNR of $\adapt(t)$ are much less
% understood and likely relate to properties of the signal, whereas the SNR of
% $f(t)$ depends on the choice of $\Theta$ and can be more directly manipulated
% by the system.
\newpage
\textbf{Thresh-LP: Implication for intensity invariance:}\\
- Suggests a relatively simple rule for optimal choice of threshold value $\thr$:\\
$\rightarrow$ Find amplitude $c_i$ that maximizes absolute derivative of $c_i(t)$ over time\\
$\rightarrow$ Optimal with respect to intensity invariance of $f_i(t)$, not necessarily for
other criteria such as song-noise separation or diversity between features
- Nonlinear operations can be used to detach representations from graded physical
stimulus (to facilitate categorical behavioral decision-making?):\\
1) Capture sufficiently precise amplitude information: $\env(t)$, $\adapt(t)$\\
$\rightarrow$ Closely following the AM of the acoustic stimulus\\
2) Quantify relevant stimulus properties on a graded scale: $c_i(t)$\\
$\rightarrow$ More decorrelated representation, compared to prior stages\\
3) Nonlinearity: Distinguish between "relevant vs irrelevant" values: $b_i(t)$\\
$\rightarrow$ Trading a graded scale for two or more categorical states\\
4) Represent stimulus properties under relevance constraint: $f_i(t)$\\
$\rightarrow$ Graded again but highly decorrelated from the acoustic stimulus\\
5) Categorical behavioral decision-making requires further nonlinearities\\
$\rightarrow$ Parameters of a behavioral response may be graded (e.g. approach speed),
initiation of one behavior over another is categorical (e.g. approach/stay)
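
\textbf{Sketch: Why steep flanks favor intensity invariance (assumption-based):}\\
The threshold rule above can be motivated by a minimal sketch. It rests on
illustrative assumptions rather than on definitions taken from the model:
$b_i(t)$ is assumed to switch state whenever $c_i(t)$ crosses $\thr$, and a
hypothetical intensity parameter $a$ is assumed to change the kernel response
smoothly, written $c_i(t;\,a)$. For a crossing time $t^{*}(a)$ defined by
$c_i(t^{*};\,a) = \thr$, implicit differentiation with respect to $a$ gives
\[
\frac{\partial c_i}{\partial a}
+ \frac{\partial c_i}{\partial t}\,\frac{\mathrm{d}t^{*}}{\mathrm{d}a} = 0
\quad\Longrightarrow\quad
\frac{\mathrm{d}t^{*}}{\mathrm{d}a}
= -\,\frac{\partial c_i/\partial a}{\partial c_i/\partial t}\,\bigg|_{t\,=\,t^{*}},
\]
provided that $\partial c_i/\partial t \neq 0$ at the crossing. Under these
assumptions, the crossing times shift least with intensity where
$|\partial c_i/\partial t|$ is large, and so do the durations of the states of
$b_i(t)$ that presumably enter $f_i(t)$ through the lowpass filter. In the
special case of a purely multiplicative scaling,
$c_i(t;\,a) = a\,c_0(t)$ with a hypothetical unscaled response $c_0(t)$, the
expression reduces to
\[
\frac{\mathrm{d}t^{*}}{\mathrm{d}a}
= -\,\frac{\thr}{a^{2}\,c_0'(t^{*})},
\]
where $c_0'$ denotes the time derivative of $c_0$, so the shift is inversely
proportional to the slope of the response at the threshold crossing.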
\subsection{Intensity invariance versus intensity invariance}
\subsection{Implications for behavior in a natural acoustic environment}
% RIPPED FROM INTRODUCTION:
@@ -1645,63 +1849,6 @@ operate on unmodified recordings of natural grasshopper songs instead of
condensed pulse train approximations, which widens its scope towards more
realistic, ecologically relevant scenarios.
\textbf{Excursion into time-warp invariance:}
For instance, the temporal structure of grasshopper songs warps with
temperature~(\bcite{skovmand1983song}). The auditory system can compensate for
this variability by reading out relative temporal relationships rather than
absolute time intervals~(\bcite{creutzig2009timescale};
\bcite{creutzig2010timescale}), as those remain relatively constant across
different temperatures~(\bcite{helversen1972gesang}).
\textbf{Definition of invariance (general, systemic):}\\
Invariance = Property of a system to maintain a stable output with respect to a
set of relevant input parameters (variation to be represented) but irrespective
of one or more other parameters (variation to be discarded)
$\rightarrow$ Selective input-output decorrelation
\textbf{Definition of intensity invariance (context of neurons and songs):}\\
Intensity invariance = Time scale-selective sensitivity to certain faster
amplitude dynamics (song waveform, small-scale AM) and simultaneous
insensitivity to slower, more sustained amplitude dynamics (transient baseline,
large-scale AM, current overall intensity level)\\
$\rightarrow$ Without time scale selectivity, any fully intensity-invariant
output will be a flat line
\textbf{Log-HP: Implication for intensity invariance:}\\
- Logarithmic scaling is essential for equalizing different song intensities\\
$\rightarrow$ Intensity information can be manipulated more easily when in form
of a signal offset in log-space than a multiplicative scale in linear space
- Capability to compensate for intensity variations, i.e. selective amplification
of output $\adapt(t)$ relative to input $\env(t)$, is limited by input SNR (Eq.\,\ref{eq:toy_snr}):\\
$\rightarrow$ Ability to equalize between different sufficiently large scales of $s(t)$\\
$\rightarrow$ Inability to recover $s(t)$ when initially masked by noise floor $\eta(t)$
- Logarithmic scaling emphasizes small amplitudes (song onsets, noise floor) \\
$\rightarrow$ Recurring trade-off: Equalizing signal intensity vs preserving initial SNR
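
\textbf{Sketch: Intensity as an offset in log-space (assumption-based):}\\
The points above can be illustrated with a short worked identity. As a
simplifying assumption, and not as the envelope definition used in the model,
let the input be $a\,s(t) + \eta(t)$ with a hypothetical intensity scale
$a > 0$ and with $s(t) > 0$ and $\eta(t) > 0$. Then
\[
\log\bigl(a\,s(t) + \eta(t)\bigr)
= \log a + \log s(t) + \log\!\left(1 + \frac{\eta(t)}{a\,s(t)}\right).
\]
For $a\,s(t) \gg \eta(t)$, the last term vanishes and the intensity enters only
as the constant offset $\log a$, which a highpass filter can remove. For
$a\,s(t) \ll \eta(t)$, the same expression approaches $\log \eta(t)$, so the
output no longer depends on $s(t)$ and the song cannot be recovered from
beneath the noise floor.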
\textbf{Thresh-LP: Implication for intensity invariance:}\\
- Role of song periodicity for feature representation!
- Suggests a relatively simple rule for optimal choice of threshold value $\thr$:\\
$\rightarrow$ Find amplitude $c_i$ that maximizes absolute derivative of $c_i(t)$ over time\\
$\rightarrow$ Optimal with respect to intensity invariance of $f_i(t)$, not necessarily for
other criteria such as song-noise separation or diversity between features
- Nonlinear operations can be used to detach representations from graded physical
stimulus (to facilitate categorical behavioral decision-making?):\\
1) Capture sufficiently precise amplitude information: $\env(t)$, $\adapt(t)$\\
$\rightarrow$ Closely following the AM of the acoustic stimulus\\
2) Quantify relevant stimulus properties on a graded scale: $c_i(t)$\\
$\rightarrow$ More decorrelated representation, compared to prior stages\\
3) Nonlinearity: Distinguish between ``relevant'' vs ``irrelevant'' values: $b_i(t)$\\
$\rightarrow$ Trading a graded scale for two or more categorical states\\
4) Represent stimulus properties under relevance constraint: $f_i(t)$\\
$\rightarrow$ Graded again but highly decorrelated from the acoustic stimulus\\
5) Categorical behavioral decision-making requires further nonlinearities\\
$\rightarrow$ Parameters of a behavioral response may be graded (e.g. approach speed),
initiation of one behavior over another is categorical (e.g. approach/stay)
\newpage
\section{Appendix}
@@ -1716,7 +1863,7 @@ initiation of one behavior over another is categorical (e.g. approach/stay)
$\noc(t)$ within the signal envelope $\env(t)$ over scale
$\sca$. Based on input $\raw(t)$ with $\sigma_{\eta}=1$
(corresponding to the analysis underlying
Fig.\,\ref{fig:rect-lp}), using 100 random realizations of
$\noc(t)$.}
\label{fig:app_env-sd}
\end{figure}% Referenced.