Progress with cleaning up "IntInv vs SNR".

j-hartling
2026-06-15 18:20:33 +02:00
parent 67690b97f7
commit 21e6ab4d64
2 changed files with 83 additions and 64 deletions

BIN
main.pdf

Binary file not shown.

147
main.tex

@@ -1595,7 +1595,7 @@ does not change substantially within $\tstat$.
% Constraints on the song structure:
% Also: Constant model features vs. actual grasshopper (calling) songs:
% (Also: Third revision and this section still doesn't sound good)
% (Also: Third revision and still far from done and good)
Grasshoppers sing by pulling the stridulatory file on the hindlegs across a
resonating vein on the forewings~(\bcite{helversen1977stridulatory};
\bcite{stumpner1994song}; \bcite{helversen1997recognition}). Different
@@ -1660,6 +1660,7 @@ as soon as $f_i(t)$ is within tolerance or wait for $f_i(t)$ to stabilize for
additional certainty.
\subsection{Invariant processing in the grasshopper auditory system}
\label{sec:general_inv}
% Invariance in the general (systemic) sense:
The notion of invariance is fundamental for sensory processing systems.
@@ -1710,45 +1711,64 @@ time scale-selectivity is reflected by the cutoff frequency $\fc$ of the
highpass filter that underlies the adaptation of $\adapt(t)$: Most $\fc$ except
the lowest ones are effective in removing the local offset of $\db(t)$ and
render $\adapt(t)$ intensity-invariant, but only sufficiently low $\fc$
preserve the relevant amplitude dynamics of the song pattern. Intensity
invariance by thresholding and temporal averaging also has a relevant time
scale, which is determined by the averaging interval $\tlp$. However, this time
scale is not constrained by the need to preserve the temporal structure of the
song pattern but by the need to provide a suitable degree of temporal
integration across the song pattern~(Section\,\ref{sec:constant_feat}).
preserve the relevant amplitude dynamics of the song pattern. The time scale of
intensity invariance by thresholding and temporal averaging is determined by
the averaging interval $\tlp$. However, unlike $\fc$, $\tlp$ is not constrained
by the need to preserve the song pattern but rather by the need to provide a
suitable degree of temporal integration~(Section\,\ref{sec:constant_feat}).
\subsection{Intensity invariance versus SNR}
\subsection{Intensity invariance versus SNR along the model pathway}
Each processing step along the model pathway is a transformation between input
representation and output representation. The intensity of the input is
characterized by scale $\sca$. The intensity of the output is characterized by
an appropriate intensity measure. If the transformation renders the output more
intensity-invariant, then the intensity measure will saturate for sufficiently
large $\sca$, which caps the output SNR to a constant value across these
$\sca$. Otherwise, the intensity measure and hence the output SNR will increase
monotonically with $\sca$. The trade-off between intensity invariance and SNR
refers to the principle that a transformation can either improve intensity
invariance or maintain SNR --- it cannot do both at the same time. This
principle is presumably not specific to the two mechanisms along the model
pathway but rather a general property of transformations that equalize between
different input intensities.
% % Establishing the principle trade-off (should maybe come later?):
% The output of a transformation is considered to be intensity-invariant if its
% intensity measure saturates for sufficiently large scales $\sca$, which in turn
% caps the output SNR to a constant value across these $\sca$. Otherwise, the
% output SNR will increase monotonically with $\sca$. The trade-off between
% intensity invariance and SNR refers to the principle that a transformation can
% either improve intensity invariance or maintain SNR --- it cannot do both at
% the same time. This principle is most likely not specific to the two mechanisms
% along the model pathway but rather a general property of transformations that
% equalize between different input intensities.
Logarithmic compression and adaptation by highpass filtering is capable of
equalizing a wide range of $\sca$. In the absence of noise component $\noc(t)$,
output $\adapt(t)$ is a perfectly intensity-invariant representation of song
component $\soc(t)$ across all $\sca>0$. However, the presence of $\noc(t)$
limits the effectiveness of this mechanism to sufficiently large $\sca$. This
means that intensity invariance and SNR interact at the input level, as well.
Specifically, the saturation point of $\adapt(t)$ is determined by the input
SNR of $\env(t)$, which in turn depends on the initial SNR of the sound signal
$\raw(t)$. This initial SNR is presumably improved by the bandpass filtering of
$\raw(t)$ into $\filt(t)$ at the tympanal membrane, which attenuates
frequencies outside the relevant range of grasshopper songs. The SNR is then
further improved by the rectification and lowpass filtering of $\filt(t)$ into
$\env(t)$. This improvement depends on the cutoff frequency $\fc$ of the
lowpass filter --- the lower $\fc$, the higher the SNR of $\env(t)$ at a given
$\sca$. However, $\fc$ must not be too low to avoid the attenuation of relevant
amplitude dynamics of the song pattern. The saturation level of $\adapt(t)$,
% Building a sufficient SNR "buffer":
A stridulating grasshopper generates a song with a specific initial intensity,
which is steadily attenuated as the song propagates through the
environment~(\bcite{michelsen1978sound}). A listening grasshopper receives a
sound signal $\raw(t)$, which is a mixture of the song component $\soc(t)$ with
scale $\sca$ and the environmental noise component $\noc(t)$. The greater the
distance between sender and receiver, the smaller $\sca$ and hence the lower
the SNR of $\raw(t)$ at the position of the receiver. The tympanal bandpass
filtering of $\raw(t)$ into $\filt(t)$ likely improves the SNR by attenuating
frequencies outside the relevant range of grasshopper songs. The SNR is further
improved by the rectification and lowpass filtering of $\filt(t)$ into
$\env(t)$. The lower the cutoff frequency $\fc$ of the lowpass filter, the
higher the SNR of $\env(t)$ at a given $\sca$, although $\fc$ must also be
sufficiently high to preserve the amplitude dynamics of the song pattern.
Overall, the first processing steps along the pathway are not designed to
achieve intensity invariance but rather to improve the SNR of the song
representation beyond the initial SNR of $\raw(t)$.
The first mechanism of intensity invariance consists of logarithmic compression
and adaptation of $\env(t)$ into $\adapt(t)$. In the absence of $\noc(t)$,
$\adapt(t)$ is a perfectly intensity-invariant representation of $\soc(t)$. In
the presence of $\noc(t)$, $\adapt(t)$ is intensity-invariant only for a
sufficiently high SNR of $\env(t)$. The preceding SNR improvements from
$\raw(t)$ to $\env(t)$ thus serve to improve the intensity invariance of
$\adapt(t)$ by shifting the saturation point towards lower $\sca$. However,
this effect is limited --- if the SNR of $\raw(t)$ at the receiver's position
does not allow for a sufficiently high SNR of $\env(t)$, $\adapt(t)$ will not
be intensity-invariant. The initial song intensity that the sender can achieve
therefore determines the distance at which $\adapt(t)$ is intensity-invariant
to the receiver.
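% Illustrative sketch only (not part of the model definition):
The dependence of the saturation point on the SNR of $\env(t)$ can be
illustrated with a simplified sketch. As an assumption made only for this
illustration, let the envelope be approximately additive,
$\env(t)\approx\sca\cdot\soc(t)+\noc(t)$, with $\soc(t)>0$ and with $\soc(t)$
and $\noc(t)$ standing in for the envelope-level song and noise components.
Omitting constant factors of the decibel conversion, logarithmic compression
then gives
\begin{equation}
\log\env(t) \approx \log\sca + \log\soc(t)
+ \log\left(1 + \frac{\noc(t)}{\sca\cdot\soc(t)}\right).
\end{equation}
Under this assumption, the first term is a constant offset that the highpass
filter can remove, the second term is independent of $\sca$, and the third
term vanishes only if $\sca\cdot\soc(t)$ is large relative to $\noc(t)$. For
small $\sca$, the third term dominates and $\adapt(t)$ remains dependent on
$\sca$.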
Assuming that intensity invariance of $\adapt(t)$ is required for reliable song
recognition,
This might be a reason why robustness to noise masking is an
attractive property of male calling songs~(\bcite{einhaupl2011attractiveness}).
The saturation level of $\adapt(t)$,
unlike its saturation point, is independent of the SNR of $\env(t)$ because the
influence of $\noc(t)$ is negligible for sufficiently large $\sca$. The output
SNR of $\adapt(t)$ saturates at a comparatively low value of around 10. This might
@@ -1798,35 +1818,34 @@ the saturation level of $f_i(t)$ will be determined by the second mechanism.
The saturation points of $f_i(t)$ across the set are distributed over a much
wider range than those of the preceding kernel responses $c_i(t)$, which
suggests that the interaction between the two mechanisms is specific to
individual kernels $k_i(t)$. A number of $f_i(t)$ achieve a lower saturation
point than the respective $c_i(t)$, while some $f_i(t)$ exhibit similar or only
marginally lower saturation points. This raises the question whether two
consecutive mechanisms of intensity invariance are actually beneficial for the
overall system.
individual kernels. A number of $f_i(t)$ achieve a lower saturation point than
the respective $c_i(t)$, whereas some $f_i(t)$ exhibit similar or only
marginally lower saturation points. In these cases, the question arises to what
extent two consecutive mechanisms of intensity invariance are actually
beneficial for the overall system.
Various grasshopper species, especially those with longer songs like \textit{C.
mollis}, \textit{G. rufus}, or \textit{O. rufipes}, tend to stridulate softly
at first and then continuously increase the amplitude of their song over time.
This slow ``ramping'' amplitude modulation makes the overall song less periodic
despite its temporal regularity. The ``ramping'' appears more pronounced in
$\env(t)$ compared to $\adapt(t)$, which suggests that the logarithmic
compression and adaptation during the preprocessing stage might be at least
partially beneficial for mitigating the effect of this amplitude modulation on
later representations. However, the adaptation of $\adapt(t)$ can only act on
certain time scales --- depending on the cutoff frequency of the underlying
highpass filter --- and is hence not able to compensate for ``ramping'' across
the entire duration of a song.
From a computational perspective, the answer could be that logarithmic
compression and adaptation is a necessary preprocessing step towards robust
$f_i(t)$ because it works towards a more consistent distribution $\pci$ of
$c_i(t)$. If $\pci$ is consistent between different songs of the same species,
a static threshold value $\thr$ is sufficient to generate a consistent
species-specific feature representation. If $\pci$ is consistent over the
course of a song, $f_i(t)$ is constant throughout the song, which extends the
time window for reliable recognition~(Section\,\ref{sec:constant_feat}).
From a purely functional perspective, the answer could be that logarithmic
compression and adaptation is a necessary preprocessing step towards a robust
feature representation, even if thresholding and temporal averaging alone would
be sufficient to render $f_i(t)$ intensity-invariant. This preprocessing likely
improves the temporal regularity of the song pattern in $\adapt(t)$ and
$c_i(t)$, which is required for constant $f_i(t)$ across the duration of a
song~(Section\,\ref{sec:constant_feat}). It also ensures consistency of
the distribution $\pci$ of $c_i(t)$ across songs of different intensity, which
is essential for the generation of consistent species-specific $f_i(t)$ under a
static $\thr$. From a physiological perspective, the answer is likely that
First, the preprocessing results in a more consistent
distribution $\pci$ of $c_i(t)$ between songs of different intensity and in
turn allows for the generation of consistent $f_i(t)$ under a static threshold
value $\thr$. Second, this preprocessing improves the temporal regularity of
the song pattern by mitigating the slow ``ramping'' amplitude modulation that is
common to many grasshopper songs.
This preprocessing likely improves the temporal regularity of the song pattern
in $\adapt(t)$ and $c_i(t)$, which is required for constant $f_i(t)$ across the
duration of a song~(Section\,\ref{sec:constant_feat}).
From a physiological perspective, the answer is likely that
neurons possess only a limited firing rate for encoding stimulus intensities
that can range over several orders of magnitude. Sigmoidal tuning curves over
logarithmically compressed stimulus intensities are a common property of