Writing discussion.

j-hartling
2026-05-28 18:17:59 +02:00
parent 6cd56b82b0
commit 1878fb5eaf
2 changed files with 289 additions and 142 deletions

431
main.tex

@@ -104,6 +104,7 @@
\newcommand{\nsig}{\sigma_{\eta}} % Noise component standard deviation
\newcommand{\pc}{p(c,\,T)} % Probability density (general interval)
\newcommand{\pclp}{p(c,\,\tlp)} % Probability density (lowpass interval)
\newcommand{\pci}{p(c_i,\,\tlp)} % Kernel-specific probability density (lowpass interval)
\newcommand{\muf}{\mu_{f_i}} % Average feature value
\section{Introduction}
@@ -258,12 +259,13 @@ initiation~(\bcite{ronacher1986routes}; \bcite{bauer1987separate};
\bcite{bhavsar2017brain}).
Functionally, the ascending neurons are the most diverse of the three neuronal
populations. Individual ascending neurons possess highly specific response
properties that contrast with the rather homogeneous response properties of the
preceding receptor neurons and local
interneurons~(\bcite{clemens2011efficient}), which indicates a transition from
a uniform population-wide processing stream into several parallel branches.
Accordingly, the model pathway is divided into two distinct
populations. Around 15 to 20 ascending neurons have been identified in the
grasshopper auditory system~(\bcite{stumpner1991auditory}). Individual
ascending neurons possess highly specific response properties that contrast
with the rather homogeneous response properties of the preceding receptor
neurons and local interneurons~(\bcite{clemens2011efficient}), which indicates
a transition from a uniform population-wide processing stream into several
parallel branches. Accordingly, the model pathway is divided into two distinct
stages~(Fig.\,\ref{fig:pathway}d): The preprocessing stage incorporates the
processing steps at the levels of the tympanal membrane, the receptor neurons,
and the local interneurons; and operates on one-dimensional signal
@@ -754,16 +756,15 @@ This effect is more pronounced for lower $\fc$ of the lowpass filter and is
presumably caused by the attenuation of high-frequency components in the
signal, which are more prominent in the noise component $\noc(t)$ than in the
song component $\soc(t)$. The effect also appears relatively consistent across
different species, although small variations exist~(Fig.\,\ref{fig:rect-lp}e)
that are presumably based on different song structures and frequency spectra.
In summary, the standard deviation of $\env(t)$ has never been observed to
transition into a saturation regime for larger $\sca$ but rather continues to
increase proportionally to $\sca$ for all tested $\fc$, in both the noiseless
and the noisy case and across different species. Consequently, the combination
of rectification and lowpass filtering does not contribute to intensity
invariance. However, this transformation pair does improve the SNR of $\env(t)$
relative to $\filt(t)$ and thus provides subsequent processing stages with a
more robust input representation and higher input SNR.
different species, although small variations exist~(Fig.\,\ref{fig:rect-lp}e
and appendix Fig.\,\ref{fig:app_rect-lp}). In summary, the standard deviation
of $\env(t)$ has never been observed to saturate for larger $\sca$ but rather
continues to increase proportionally to $\sca$ for all tested $\fc$, in both
the noiseless and the noisy case and across different species. Consequently,
the combination of rectification and lowpass filtering does not contribute to
intensity invariance. However, this transformation pair does improve the SNR of
$\env(t)$ relative to $\filt(t)$ and thus provides subsequent processing stages
with a more robust input representation and higher input SNR.
\begin{figure}[!ht]
\centering
@@ -883,24 +884,23 @@ $\noc(t)$ masks $\soc(t)$ even after the intensity adaptation. Accordingly, the
effective intensity invariance of $\adapt(t)$ through logarithmic compression
and adaptation is limited by the SNR of $\env(t)$: Songs that have already
sunk into the noise floor at the level of $\env(t)$ cannot be recovered by
subsequent processing steps, which emphasizes the importance of the SNR
improvement by rectification and lowpass filtering during the previous
processing step~(Fig.\,\ref{fig:rect-lp}d). The general pattern of noise
regime, transient regime, and saturation regime remains consistent across
different species~(Fig.\,\ref{fig:log-hp}e). However, the specific value of
$\sca$ at which the saturation regime is reached (see appendix
Fig.\,\ref{fig:app_log-hp_saturation}) and the maximum SNR value of $\adapt(t)$
within the saturation regime vary considerably between and within species. For
subsequent processing steps. The general pattern of noise regime, transient
regime, and saturation regime remains consistent across different
species~(Fig.\,\ref{fig:log-hp}e). However, the saturation point --- the $\sca$
value at which the SNR of $\adapt(t)$ starts to saturate --- and the saturation
level --- the constant SNR of $\adapt(t)$ within the saturation regime --- vary
considerably between and within species~(appendix
Figs.\,\ref{fig:app_log-hp_curves} and~\ref{fig:app_log-hp_saturation}). For
example, \textit{C. biguttulus} and \textit{C. mollis} display a noticeably
lower maximum SNR of $\adapt(t)$ compared to other species. These differences
are not to be underestimated, since the SNR of $\adapt(t)$ within the
saturation regime determines the maximum input SNR for subsequent processing
steps. In other words, the fact that $\adapt(t)$ eventually reaches a
saturation regime is, of course, desirable in the context of intensity
invariance, but it also means forgoing the higher SNR values that are
achieved by $\env(t)$ for the same $\sca$ (up to several orders of magnitude,
Fig.\,\ref{fig:log-hp}d). This trade-off between intensity invariance and SNR
is a recurring phenomenon that is further addressed in the following sections.
lower saturation level compared to other species. These differences are not to
be underestimated, since the saturation level of $\adapt(t)$ determines the
maximum input SNR for subsequent processing steps. In other words, the fact
that $\adapt(t)$ eventually reaches a saturation regime is, of course,
desirable in the context of intensity invariance, but it also means forgoing
the higher SNR values that are achieved by $\env(t)$ for the same $\sca$ (up
to several orders of magnitude, Fig.\,\ref{fig:log-hp}d). This trade-off
between intensity invariance and SNR is a recurring phenomenon that is further
addressed in the following sections.
\begin{figure}[!ht]
\centering
@@ -1000,24 +1000,17 @@ sufficiently large $\sca$, $f(t)$ then approaches the same constant $\mu_f$ in
both the noiseless and the noisy case~(Fig.\,\ref{fig:thresh-lp_single}e,
saturation regime).
The value of $\mu_f$ in the saturation regime is independent of the precise
value of $\Theta$, but the value of $\sca$ at which the saturation regime is
reached decreases with $\Theta$~(Fig.\,\ref{fig:thresh-lp_single}e). Therefore,
a threshold value of $\Theta=0$ would be the optimal choice for achieving
intensity invariance at the lowest possible $\sca$. In stark contrast, the
closer $\Theta$ is to 0, the higher $\mu_f$ in response to the pure noise
component $\noc(t)$ and the lower the resulting SNR of $f(t)$ between noise
regime and saturation regime~(Fig.\,\ref{fig:thresh-lp_single}b-d, left column,
and Fig.\,\ref{fig:thresh-lp_single}e). It is even possible to achieve an
"unlimited" SNR of $f(t)$ by setting $\Theta$ above the maximum of the
pure-noise $c(t)$, so that any $\mu_f>0$ indicates the presence of the song
component $\soc(t)$ in input $\adapt(t)$ at the cost of requiring a higher
$\sca$ to reach the saturation regime. This trade-off between intensity
invariance and SNR has already been observed during the previous analysis on
logarithmic compression and adaptation~(Fig.\,\ref{fig:log-hp}d). However, the
parameters that determine the SNR of $\adapt(t)$ are much less understood and
likely relate to properties of the signal, whereas the SNR of $f(t)$ depends on
the choice of $\Theta$ and can be more directly manipulated by the system.
The saturation level of $f(t)$ is independent of the precise value of $\Theta$,
but the saturation point decreases with
$\Theta$~(Fig.\,\ref{fig:thresh-lp_single}e). Therefore, a threshold value of
$\Theta=0$ would be the optimal choice for achieving intensity invariance at
the lowest possible $\sca$. In stark contrast, the closer $\Theta$ is to 0, the
higher $\mu_f$ in response to the pure noise component $\noc(t)$ and the lower
the resulting SNR of $f(t)$ between noise regime and saturation
regime~(Fig.\,\ref{fig:thresh-lp_single}b-d, left column, and
Fig.\,\ref{fig:thresh-lp_single}e). This trade-off between intensity invariance
and SNR has already been observed during the previous analysis on logarithmic
compression and adaptation~(Fig.\,\ref{fig:log-hp}d).
Finally, the effects of thresholding and temporal averaging must be seen in the
context of the previous transformation pair of logarithmic compression and
@@ -1102,11 +1095,11 @@ that the songs of each species are eventually represented by distinct points in
feature space. However, the species-specific trajectories cross each other at
numerous points, which means that the songs of two species --- each at a
specific $\sca$ --- can result in the same combination of $\muf$. Furthermore,
the specific value of $\sca$ at which $\muf$ saturates depends on $f_i(t)$ and
the species: For \textit{C. mollis}, all $\muf$ saturate around the same
$\sca$, while \textit{O. rufipes} exhibits considerable variation between the
three $f_i(t)$. The larger the variation in saturation points between $f_i(t)$,
the stronger the curvature of the trajectory through feature space.
the specific saturation point of $f_i(t)$ depends on the species: For
\textit{C. mollis}, all $\muf$ saturate around the same $\sca$, while
\textit{O. rufipes} exhibits considerable variation between the three $f_i(t)$.
The larger the variation in saturation points between $f_i(t)$, the stronger
the curvature of the trajectory through feature space.
In the noisy case, $\muf$ is non-zero even for the smallest
$\sca$~(Fig.\,\ref{fig:thresh-lp_species}c) because the addition of the noise
@@ -1121,9 +1114,9 @@ previous analysis~(Fig.\,\ref{fig:thresh-lp_single}e). However, the
trajectories now move a much shorter distance through feature space for a
similar range of $\sca$ due to the lower SNR of $f_i(t)$ between noise regime
and saturation regime, which increases the likelihood of trajectories crossing
each other. Finally, the values of $\sca$ at which $\muf$ saturate for a given
species are slightly higher in the noisy case, but the variation between
$f_i(t)$ remains largely unchanged.
each other. Finally, the saturation points of $f_i(t)$ for a given species are
slightly higher in the noisy case, but the variation between $f_i(t)$ remains
largely unchanged.
In summary, even a comparably small set of three features $f_i(t)$ can, in
principle, represent different species-specific songs at distinct points in
@@ -1238,15 +1231,10 @@ broader and is not centered around the single saturation point based on the
median but rather shifted towards lower $\sca$. Care must be taken when
interpreting the height of either distribution due to the logarithmic scaling
of the underlying $\sca$ axis. Nevertheless, the overall pattern suggests that
specific $f_i(t)$ can reach a saturation regime at lower $\sca$ than their
the saturation points of specific $f_i(t)$ are indeed lower than those of their
$c_i(t)$ counterparts. Therefore, the effect of thresholding and temporal
averaging on intensity invariance is not necessarily nullified by the previous
logarithmic compression and adaptation, which means that both mechanisms can,
in principle, work together towards an intensity-invariant song representation.
% Or does one simply overwrite the other? Can there even be a higher intensity
% invariance based on the sum of both effects? Or does one simply kick in for
% lower scales than the other and thus dictates the overall intensity
% invariance? Whatever, discussion material.
logarithmic compression and adaptation.
\begin{figure}[!ht]
\centering
@@ -1313,7 +1301,7 @@ representation goes hand in hand with a substantial degree of redundancy and is
hardly expected to be present in the actual grasshopper auditory system. But
the fact that the saturated $\muf$ are distributed symmetrically around 0.5
provides concrete evidence that each $f_i(t)$ is able to reach its intrinsic
saturation value in the absence of logarithmic
saturation level in the absence of logarithmic
compression~(Fig.\,\ref{fig:pipeline_short}c), which is otherwise prevented by
the capping of $\adapt(t)$, as seen during previous
analyses~(Fig.\,\ref{fig:thresh-lp_single}f and
@@ -1327,8 +1315,8 @@ that it allows $f_i(t)$ to reach its intrinsic saturation value. If this
results in a wider range of $\muf$ across the feature set, it should be
beneficial for forming species-specific combinations. However, this depends on
multiple different factors such as the choice of $k_i(t)$ and $\thr$ as well as
the structure and distribution of the specific song and is hence not
guaranteed simply by disabling logarithmic compression.
the structure and distribution of the specific song and is hence not guaranteed
simply by disabling logarithmic compression.
\begin{figure}[!ht]
\centering
@@ -1560,25 +1548,241 @@ functional modelling. Other sensory systems that are either more complex or
have not been subject to decades of study will likely not be suitable for this
approach yet.
% \textbf{Song recognition pathway: Grasshopper vs. model:}\\
% The model pathway includes a rather large number of Gabor kernels compared to
% the 15 to 20 ascending neurons in the grasshopper auditory
% system~(\bcite{stumpner1991auditory}).
\subsection{Feature representation, temporal averaging, and song design}
\subsection{Interplay of song representation and song design}
The feature set is the final song representation along the model pathway and
constitutes the basis for song recognition. Each feature $f_i(t)$ results from
the thresholding of the respective kernel response $c_i(t)$ by $\nl$ and the
subsequent temporal averaging of binary response $b_i(t)$ by a lowpass filter
with extremely low cutoff frequency $\fc$. At a given time point $t$, $f_i(t)$
approximately quantifies the proportion of time during which $c_i(t)$ exceeds
the threshold value $\thr$ within the averaging interval $\tlp$ specified by
$\fc$. The value of $f_i(t)$ is hence determined by $\thr$ with respect to the
distribution $\pci$ of $c_i(t)$ and is restricted to the interval $[0,1]$.
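The relation between $f_i(t)$, $\thr$, and $\pci$ can be made explicit in a short sketch. It assumes that the lowpass filter behaves approximately like a uniform average over the interval $\tlp$, which is a simplification of the actual filter. The Heaviside step function $H$ is introduced here for illustration only and stands in for the thresholding by $\nl$.

```latex
% Sketch only: assumes a uniform (boxcar) average over \tlp instead of the
% actual lowpass filter; H is a hypothetical stand-in for the nonlinearity.
\begin{equation}
    f_i(t) \;\approx\; \frac{1}{\tlp} \int_{t-\tlp}^{t}
    H\big(c_i(\tau) - \thr\big)\,\mathrm{d}\tau
    \;=\; \int_{\thr}^{\infty} \pci\,\mathrm{d}c_i
\end{equation}
```

Under this reading, $f_i(t)$ is the probability mass of $\pci$ above $\thr$, which is why it is bounded by $[0,1]$ and remains constant whenever $\pci$ is stable across $t$.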
\textbf{The role of repetitive songs for the feature representation:}
Most grasshopper songs are produced by stridulation, which refers to the
pulling of the serrated stridulatory file on the hindlegs across a resonating
vein on the forewings~(\bcite{helversen1977stridulatory};
\bcite{stumpner1994song}; \bcite{helversen1997recognition}). Every "tooth" that
strikes the vein generates a brief sound pulse; multiple pulses make up a
syllable; and the repetition of syllables and pauses results in a
characteristic amplitude-modulated waveform pattern.
Different species-specific songs are represented by different combinations of
feature values, which should preferably be constant for the duration of a song
to enable reliable recognition. The fundamental requirement for a constant
$f_i(t)$ is that the proportion of $\tlp$ during which $c_i(t)>\thr$ is the
same for all $t$, which is fulfilled if $\pci$ is stable across $t$. The most
straightforward way to achieve a stable $\pci$ is that $c_i(t)$ is periodic and
$\tlp$ is sufficiently long to average over multiple cycles of $c_i(t)$.
Song-evoked $c_i(t)$ are indeed approximately periodic, which is largely an
inherited property of the song itself. Most grasshopper songs are produced by
stridulation, which refers to the pulling of the serrated stridulatory file on
the hindlegs across a resonating vein on the
forewings~(\bcite{helversen1977stridulatory}; \bcite{stumpner1994song};
\bcite{helversen1997recognition}). Every "tooth" that strikes the vein
generates a brief sound pulse; multiple pulses make up a syllable; and the
repetition of syllables and pauses results in a pattern with a high degree of
temporal regularity. Accordingly, a robust feature representation in the sense
of constant $f_i(t)$ is tightly linked to the mechanism of sound production and
the temporal structure of the generated song.
\subsection{Intensity invariance versus SNR along the auditory pathway}
Various grasshopper species, especially those with longer songs like \textit{C.
mollis}, \textit{G. rufus}, or \textit{O. rufipes}, tend to stridulate softly
at first and then continuously increase the amplitude of their song over time.
This slow "ramping" amplitude modulation makes the overall song less periodic
despite its temporal regularity. The "ramping" appears more pronounced in
$\env(t)$ compared to $\adapt(t)$, which suggests that the logarithmic
compression and adaptation during the preprocessing stage might be at least
partially beneficial for mitigating the effect of this amplitude modulation on
later representations. However, the adaptation of $\adapt(t)$ can only act on
certain time scales --- depending on the cutoff frequency of the underlying
highpass filter --- and is hence not able to compensate for "ramping" across
the entire duration of a song.
\subsection{Behavior in a natural acoustic environment}
Certain grasshopper species like \textit{Chorthippus dorsatus} are known to
switch their stridulation pattern in the middle of a
song~(\bcite{stumpner1994song}). \textit{C. dorsatus} starts stridulating with
both hindlegs in synchrony and thereby generates a pronounced syllable-pause
pattern similar to that of \textit{P. parallelus}. For the last part of its
song, however, \textit{C. dorsatus} switches to an alternating leg movement,
which results in a more continuous but not entirely unstructured rattling
sound. It is unclear what this composite design means for the feature
representation of \textit{C. dorsatus} songs. In principle, both parts of the
song could result in similar $\pci$ despite their different temporal structure,
which would allow for consistent $f_i(t)$ across the entire song. However, it
appears more likely that only one part of the song encodes species identity,
while the other part serves a different purpose such as fitness
advertisement~(SOURCE?).
Finally, the question remains how the choice of an appropriate averaging
interval $\tlp$ depends on the duration and temporal structure of a song. The
minimum $\tlp$ should encompass at least a few cycles of $c_i(t)$ to ensure a
stable $\pci$ and hence a constant $f_i(t)$. The maximum $\tlp$ should not
exceed the duration of a song to avoid the inclusion of behaviorally irrelevant
information. The longer $\tlp$, the longer $f_i(t)$ takes to stabilize after
the onset and before the offset of a song, which narrows the time window for
reliable recognition. The duration of species-specific grasshopper songs can
range from a few hundred milliseconds (e.\,g. \textit{Stethophyma grossum}) to
well over a minute (e.\,g. \textit{C. mollis}), so that the optimal $\tlp$ is
likely to differ between species.
\subsection{Sensory invariances in the grasshopper auditory system}
The notion of invariance is fundamental for sensory processing systems.
Invariance, in the general sense, can be described as the property of a
transformation to maintain variation across certain meaningful input parameters
in its output while discarding variation across other input parameters. This
boils down to a selective input-output decorrelation that allows the system to
represent only those aspects of the stimulus that are behaviorally relevant to
the organism.
The grasshopper auditory system has to deal with a number of sources of
non-informative song variation. For instance, the temporal structure of the
song pattern warps with temperature~(\bcite{skovmand1983song}). This also
affects certain structural parameters that are essential for song recognition,
mainly the duration of syllables and pauses. The auditory system can compensate
for this variation by reading out relative temporal relationships rather than
absolute time intervals~(\bcite{creutzig2009timescale};
\bcite{creutzig2010timescale}). The ratio of syllable duration to pause
duration is relatively constant across temperatures and has been shown to be
suitable for song recognition~(\bcite{helversen1972gesang}), so that there is
likely no need to retain any information about the absolute duration of
syllables and pauses.
The situation is more complex for variations in song intensity. Song intensity
at the receiver's position depends mostly on the distance to the sender and is
hence not a reliable cue to infer species identity. The auditory system should
therefore be invariant to intensity variations to recognize conspecific songs
regardless of sender distance. However, song intensity --- specifically, the
interaural intensity difference --- is also required for directional hearing,
which is essential for phonotaxis~(\bcite{helversen1988interaural}). Conflicts
between song recognition and directional hearing are avoided in the auditory
system by distributing both functions across two parallel
pathways~(\bcite{helversen1984parallel}; \bcite{ronacher1986routes}). This is
the main reason why our model pathway is focused entirely on song recognition
and has no capacity for directional hearing, no matter how relevant it may be
to the grasshopper.
Furthermore, "invariance to variations in song intensity" does not do justice
to the full extent of the problem. Intensity is a function of song amplitude
within a certain time frame. It can refer to the individual syllables and
pauses of the song pattern as well as the entire song --- the former is
relevant for song recognition, while the latter is not. Intensity invariance in
the current context can therefore be described as time scale-selective
sensitivity to the faster amplitude dynamics of the song pattern and
simultaneous insensitivity to slower, more sustained amplitude dynamics. In the
model pathway, this time scale selectivity is reflected by the cutoff frequency
$\fc$ of the highpass filter that underlies the adaptation of $\adapt(t)$: Most
$\fc$ are effective in removing the local offset of $\db(t)$ and render
$\adapt(t)$ intensity-invariant, but only sufficiently low $\fc$ will leave the
relevant amplitude dynamics of the song pattern intact.
\subsection{Intensity invariance versus SNR}
Each processing step along the model pathway is a transformation between input
representation and output representation. The intensity of the input is
characterized by scale $\sca$. The intensity of the output is characterized by
an appropriate intensity measure. If the transformation renders the output more
intensity-invariant, then the intensity measure will saturate for sufficiently
large $\sca$, which caps the output SNR to a constant value across these
$\sca$. Otherwise, the intensity measure and hence the output SNR will increase
monotonically with $\sca$. The trade-off between intensity invariance and SNR
refers to the principle that a transformation can either improve intensity
invariance or maintain SNR --- it cannot do both at the same time. This
principle is presumably not specific to the two mechanisms along the model
pathway but rather a general property of transformations that equalize between
different input intensities.
Logarithmic compression and adaptation by highpass filtering is capable of
equalizing a wide range of $\sca$. In the absence of noise component $\noc(t)$,
output $\adapt(t)$ is a perfectly intensity-invariant representation of song
component $\soc(t)$ across all $\sca>0$. However, the presence of $\noc(t)$
limits the effectiveness of this mechanism to sufficiently large $\sca$. This
means that intensity invariance and SNR interact at the input level, as well.
Specifically, the saturation point of $\adapt(t)$ is determined by the input
SNR of $\env(t)$, which in turn depends on the initial SNR of the sound signal
$\raw(t)$. This initial SNR is presumably improved by the bandpass filtering of
$\raw(t)$ into $\filt(t)$ at the tympanal membrane, which attenuates
frequencies outside the relevant range of grasshopper songs. The SNR is then
further improved by the rectification and lowpass filtering of $\filt(t)$ into
$\env(t)$. This improvement depends on the cutoff frequency $\fc$ of the
lowpass filter --- the lower $\fc$, the higher the SNR of $\env(t)$ at a given
$\sca$. However, $\fc$ must not be too low to avoid the attenuation of relevant
amplitude dynamics of the song pattern. The saturation level of $\adapt(t)$,
unlike its saturation point, is independent of the SNR of $\env(t)$ because the
influence of $\noc(t)$ is negligible for sufficiently large $\sca$. Both the
saturation level and the saturation point of $\adapt(t)$ vary between different
species and specific songs. These differences are likely rooted in the way in
which logarithmic compression acts on the specific distribution of $\env(t)$,
which is determined by $\fc$ and the structure and frequency spectrum of the
rectified $\filt(t)$.
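The noiseless case can be illustrated with a short derivation. It is a sketch that assumes a generic logarithm (the base and any prefactor used in the actual model are left open) and writes $e_1(t)$ as a hypothetical symbol for the noiseless envelope at $\sca=1$. Because bandpass filtering, rectification, and lowpass filtering all scale proportionally with the input, the noiseless envelope at scale $\sca$ is $\sca\cdot e_1(t)$.

```latex
% Sketch only: e_1(t) is a hypothetical symbol for the noiseless envelope
% at unit scale; the logarithm base and prefactor are left unspecified.
\begin{equation}
    \log\big(\sca \cdot e_1(t)\big) \;=\; \log(\sca) \,+\, \log\big(e_1(t)\big)
\end{equation}
```

The scale appears only as the constant offset $\log(\sca)$, which the highpass filter removes, so that $\adapt(t)$ no longer depends on $\sca$. With $\noc(t)$ present, the envelope is no longer a pure multiple of $\sca$ and the logarithm cannot be split in this way, which is consistent with invariance being reached only for sufficiently large $\sca$.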
Thresholding and temporal averaging renders feature $f_i(t)$
intensity-invariant for sufficiently large $\sca$. The trade-off between
intensity invariance and SNR is mediated by threshold value $\thr$. A lower
$\thr$ ($\thr\to0$) improves intensity invariance by shifting the saturation
point towards lower $\sca$ but also decreases the SNR of $f_i(t)$. The
saturation level of $f_i(t)$ is independent of $\thr$ as long as the intensity
invariance by the previous mechanism is neglected. The SNR of $f_i(t)$ is
therefore determined solely by the pure-noise response of $f_i(t)$. The
distribution $\pci$ of the pure-noise kernel response $c_i(t)$ is largely a
normal distribution with mean $\mu\approx0$ for all kernels $k_i(t)$. The value
of the pure-noise $f_i(t)$ is hence 0.5 for $\thr=0$ and decreases for larger
$\thr$. If $\thr$ is set above the maximum of $c_i(t)$, the pure-noise feature
value is 0, which results in an "unlimited" SNR of $f_i(t)$ at the cost of a
higher saturation point. In this case, any non-zero feature value that is
sustained for a sufficient duration could serve as an indicator for the presence
of $\soc(t)$ in addition to $\noc(t)$. This requires a fine evolutionary tuning
of $\thr$ to the properties of both the species-specific song and the natural
noise in a certain habitat.
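The dependence of the pure-noise feature value on $\thr$ can be written out under an idealized assumption: if the pure-noise $c_i(t)$ were exactly normally distributed with zero mean and standard deviation $\sigma_{c_i}$ (a hypothetical symbol that does not appear elsewhere in the text), the feature value would equal the Gaussian tail probability above $\thr$.

```latex
% Sketch only: assumes an exactly Gaussian pure-noise kernel response with
% zero mean; \sigma_{c_i} is a hypothetical symbol for its standard deviation.
\begin{equation}
    f_i \;\approx\; \int_{\thr}^{\infty}
    \frac{1}{\sqrt{2\pi}\,\sigma_{c_i}}
    \exp\!\left(-\frac{c_i^2}{2\sigma_{c_i}^2}\right)\mathrm{d}c_i
    \;=\; \frac{1}{2}\,\mathrm{erfc}\!\left(\frac{\thr}{\sqrt{2}\,\sigma_{c_i}}\right)
\end{equation}
```

For $\thr=0$ this gives $f_i=0.5$, and the value falls off quickly for larger $\thr$ (roughly 0.023 at $\thr=2\sigma_{c_i}$ and 0.0013 at $\thr=3\sigma_{c_i}$). An ideal Gaussian has no finite maximum, so a pure-noise feature value of exactly 0 is only reached for a finite realization of $c_i(t)$.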
It seems reasonable to assume that $\thr$ is one of the parameters along the
pathway
Physiologically, it is presumably easier to
manipulate $\thr$
It seems reasonable that $\thr$ is easier to
manipulate in ev
Furthermore, $\thr$ is presumably a parameter along
the pathway that
$\thr$
Furthermore, $\thr$ might be one of the parameters
along the pathway
% However, the parameters that determine the SNR of $\adapt(t)$ are much less
% understood and likely relate to properties of the signal, whereas the SNR of
% $f(t)$ depends on the choice of $\Theta$ and can be more directly manipulated
% by the system.
\newpage
\textbf{Thresh-LP: Implication for intensity invariance:}\\
- Suggests a relatively simple rule for optimal choice of threshold value $\thr$:\\
$\rightarrow$ Find amplitude $c_i$ that maximizes absolute derivative of $c_i(t)$ over time\\
$\rightarrow$ Optimal with respect to intensity invariance of $f_i(t)$, not necessarily for
other criteria such as song-noise separation or diversity between features
- Nonlinear operations can be used to detach representations from graded physical
stimulus (to facilitate categorical behavioral decision-making?):\\
1) Capture sufficiently precise amplitude information: $\env(t)$, $\adapt(t)$\\
$\rightarrow$ Closely following the AM of the acoustic stimulus\\
2) Quantify relevant stimulus properties on a graded scale: $c_i(t)$\\
$\rightarrow$ More decorrelated representation, compared to prior stages\\
3) Nonlinearity: Distinguish between "relevant vs irrelevant" values: $b_i(t)$\\
$\rightarrow$ Trading a graded scale for two or more categorical states\\
4) Represent stimulus properties under relevance constraint: $f_i(t)$\\
$\rightarrow$ Graded again but highly decorrelated from the acoustic stimulus\\
5) Categorical behavioral decision-making requires further nonlinearities\\
$\rightarrow$ Parameters of a behavioral response may be graded (e.g. approach speed),
initiation of one behavior over another is categorical (e.g. approach/stay)
\subsection{Intensity invariance versus intensity invariance}
\subsection{Implications for behavior in a natural acoustic environment}
% RIPPED FROM INTRODUCTION:
@@ -1645,63 +1849,6 @@ operate on unmodified recordings of natural grasshopper songs instead of
condensed pulse train approximations, which widens its scope towards more
realistic, ecologically relevant scenarios.
\textbf{Excursion into time-warp invariance:}
For instance, the temporal structure of grasshopper songs warps with
temperature~(\bcite{skovmand1983song}). The auditory system can compensate for
this variability by reading out relative temporal relationships rather than
absolute time intervals~(\bcite{creutzig2009timescale};
\bcite{creutzig2010timescale}), as those remain relatively constant across
different temperatures~(\bcite{helversen1972gesang}).
\textbf{Definition of invariance (general, systemic):}\\
Invariance = Property of a system to maintain a stable output with respect to a
set of relevant input parameters (variation to be represented) but irrespective
of one or more other parameters (variation to be discarded)
$\rightarrow$ Selective input-output decorrelation
\textbf{Definition of intensity invariance (context of neurons and songs):}\\
Intensity invariance = Time scale-selective sensitivity to certain faster
amplitude dynamics (song waveform, small-scale AM) and simultaneous
insensitivity to slower, more sustained amplitude dynamics (transient baseline,
large-scale AM, current overall intensity level)\\
$\rightarrow$ Without time scale selectivity, any fully intensity-invariant
output will be a flat line
\textbf{Log-HP: Implication for intensity invariance:}\\
- Logarithmic scaling is essential for equalizing different song intensities\\
$\rightarrow$ Intensity information can be manipulated more easily when in form
of a signal offset in log-space than a multiplicative scale in linear space
- Capability to compensate for intensity variations, i.e. selective amplification
of output $\adapt(t)$ relative to input $\env(t)$, is limited by input SNR (Eq.\,\ref{eq:toy_snr}):\\
$\rightarrow$ Ability to equalize between different sufficiently large scales of $s(t)$\\
$\rightarrow$ Inability to recover $s(t)$ when initially masked by noise floor $\eta(t)$
- Logarithmic scaling emphasizes small amplitudes (song onsets, noise floor) \\
$\rightarrow$ Recurring trade-off: Equalizing signal intensity vs preserving initial SNR
\textbf{Thresh-LP: Implication for intensity invariance:}\\
- Role of song periodicity for feature representation!
- Suggests a relatively simple rule for optimal choice of threshold value $\thr$:\\
$\rightarrow$ Find amplitude $c_i$ that maximizes absolute derivative of $c_i(t)$ over time\\
$\rightarrow$ Optimal with respect to intensity invariance of $f_i(t)$, not necessarily for
other criteria such as song-noise separation or diversity between features
- Nonlinear operations can be used to detach representations from graded physical
stimulus (to facilitate categorical behavioral decision-making?):\\
1) Capture sufficiently precise amplitude information: $\env(t)$, $\adapt(t)$\\
$\rightarrow$ Closely following the AM of the acoustic stimulus\\
2) Quantify relevant stimulus properties on a graded scale: $c_i(t)$\\
$\rightarrow$ More decorrelated representation, compared to prior stages\\
3) Nonlinearity: Distinguish between "relevant vs irrelevant" values: $b_i(t)$\\
$\rightarrow$ Trading a graded scale for two or more categorical states\\
4) Represent stimulus properties under relevance constraint: $f_i(t)$\\
$\rightarrow$ Graded again but highly decorrelated from the acoustic stimulus\\
5) Categorical behavioral decision-making requires further nonlinearities\\
$\rightarrow$ Parameters of a behavioral response may be graded (e.g. approach speed),
initiation of one behavior over another is categorical (e.g. approach/stay)
\newpage
\section{Appendix}
@@ -1716,7 +1863,7 @@ initiation of one behavior over another is categorical (e.g. approach/stay)
$\noc(t)$ within the signal envelope $\env(t)$ over scale
$\sca$. Based on input $\raw(t)$ with $\sigma_{\eta}=1$
(corresponding to the analysis underlying
Fig.\,\ref{fig:rect-lp}), using random 100 realizations of
Fig.\,\ref{fig:rect-lp}), using 100 random realizations of
$\noc(t)$.}
\label{fig:app_env-sd}
\end{figure}% Referenced.