Progress with cleaning up "IntInv vs SNR".

2026-06-15 18:20:33 +02:00
parent 67690b97f7
commit 21e6ab4d64
2 changed files with 83 additions and 64 deletions
--- a/main.pdf
+++ b/main.pdf
--- a/main.tex
+++ b/main.tex
@@ -1595,7 +1595,7 @@ does not change substantially within $\tstat$.

 % Constraints on the song structure:
 % Also: Constant model features vs. actual grasshopper (calling) songs:
-% (Also: Third revision and this section still doesn't sound good)
+% (Also: Third revision and still far from done and good)
 Grasshoppers sing by pulling the stridulatory file on the hindlegs across a
 resonating vein on the forewings~(\bcite{helversen1977stridulatory};
 \bcite{stumpner1994song}; \bcite{helversen1997recognition}). Different
@@ -1660,6 +1660,7 @@ as soon as $f_i(t)$ is within tolerance or wait for $f_i(t)$ to stabilize for
 additional certainty.

 \subsection{Invariant processing in the grasshopper auditory system}
+\label{sec:general_inv}

 % Invariance in the general (systemic) sense:
 The notion of invariance is fundamental for sensory processing systems.
@@ -1710,45 +1711,64 @@ time scale-selectivity is reflected by the cutoff frequency $\fc$ of the
 highpass filter that underlies the adaptation of $\adapt(t)$: Most $\fc$ except
 the lowest ones are effective in removing the local offset of $\db(t)$ and
 render $\adapt(t)$ intensity-invariant, but only sufficiently low $\fc$
-preserve the relevant amplitude dynamics of the song pattern. Intensity
-invariance by thresholding and temporal averaging also has a relevant time
-scale, which is determined by the averaging interval $\tlp$. However, this time
-scale is not constrained by the need to preserve the temporal structure of the
-song pattern but to provide a suitable degree of temporal integration across
-the song pattern~(Section\,\ref{sec:constant_feat}).
+preserve the relevant amplitude dynamics of the song pattern. The time scale of
+intensity invariance by thresholding and temporal averaging is determined by
+the averaging interval $\tlp$. However, unlike $\fc$, $\tlp$ is constrained not
+by the need to preserve the song pattern but by the need to provide a suitable
+degree of temporal integration~(Section\,\ref{sec:constant_feat}).

-\subsection{Intensity invariance versus SNR}
+\subsection{Intensity invariance versus SNR along the model pathway}

-Each processing step along the model pathway is a transformation between input
-representation and output representation. The intensity of the input is
-characterized by scale $\sca$. The intensity of the output is characterized by
-an appropriate intensity measure. If the transformation renders the output more
-intensity-invariant, then the intensity measure will saturate for sufficiently
-large $\sca$, which caps the output SNR to a constant value across these
-$\sca$. Otherwise, the intensity measure and hence the output SNR will increase
-monotonically with $\sca$. The trade-off between intensity invariance and SNR
-refers to the principle that a transformation can either improve intensity
-invariance or maintain SNR --- it cannot do both at the same time. This
-principle is presumably not specific to the two mechanisms along the model
-pathway but rather a general property of transformations that equalize between
-different input intensities.
+% % Establishing the principle trade-off (should maybe come later?):
+% The output of a transformation is considered to be intensity-invariant if its
+% intensity measure saturates for sufficiently large scales $\sca$, which in turn
+% caps the output SNR to a constant value across these $\sca$. Otherwise, the
+% output SNR will increase monotonically with $\sca$. The trade-off between
+% intensity invariance and SNR refers to the principle that a transformation can
+% either improve intensity invariance or maintain SNR --- it cannot do both at
+% the same time. This principle is most likely not specific to the two mechanisms
+% along the model pathway but rather a general property of transformations that
+% equalize between different input intensities.

-Logarithmic compression and adaptation by highpass filtering is capable of
-equalizing a wide range of $\sca$. In the absence of noise component $\noc(t)$,
-output $\adapt(t)$ is a perfectly intensity-invariant representation of song
-component $\soc(t)$ across all $\sca>0$. However, the presence of $\noc(t)$
-limits the effectiveness of this mechanism to sufficiently large $\sca$. This
-means that intensity invariance and SNR interact at the input level, as well.
-Specifically, the saturation point of $\adapt(t)$ is determined by the input
-SNR of $\env(t)$, which in turn depends on the initial SNR of the sound signal
-$\raw(t)$. This initial SNR is presumably improved by the bandpass filtering of
-$\raw(t)$ into $\filt(t)$ at the tympanal membrane, which attenuates
-frequencies outside the relevant range of grasshopper songs. The SNR is then
-further improved by the rectification and lowpass filtering of $\filt(t)$ into
-$\env(t)$. This improvement depends on the cutoff frequency $\fc$ of the
-lowpass filter --- the lower $\fc$, the higher the SNR of $\env(t)$ at a given
-$\sca$. However, $\fc$ must not be too low to avoid the attenuation of relevant
-amplitude dynamics of the song pattern. The saturation level of $\adapt$,
+% Building a sufficient SNR "buffer":
+A stridulating grasshopper generates a song with a specific initial intensity,
+which is steadily attenuated as the song propagates through the
+environment~(\bcite{michelsen1978sound}). A listening grasshopper receives a
+sound signal $\raw(t)$, which is a mixture of the song component $\soc(t)$ with
+scale $\sca$ and the environmental noise component $\noc(t)$. The greater the
+distance between sender and receiver, the smaller $\sca$ and hence the lower
+the SNR of $\raw(t)$ at the position of the receiver. The tympanal bandpass
+filtering of $\raw(t)$ into $\filt(t)$ likely improves the SNR by attenuating
+frequencies outside the relevant range of grasshopper songs. The SNR is further
+improved by the rectification and lowpass filtering of $\filt(t)$ into
+$\env(t)$. The lower the cutoff frequency $\fc$ of the lowpass filter, the
+higher the SNR of $\env(t)$ at a given $\sca$, although $\fc$ must also be
+sufficiently high to preserve the amplitude dynamics of the song pattern.
+Overall, the first processing steps along the pathway are not designed to
+achieve intensity invariance but rather to improve the SNR of the song
+representation beyond the initial SNR of $\raw(t)$.
+
+The first mechanism of intensity invariance consists of logarithmic compression
+and adaptation of $\env(t)$ into $\adapt(t)$. In the absence of $\noc(t)$,
+$\adapt(t)$ is a perfectly intensity-invariant representation of $\soc(t)$. In
+the presence of $\noc(t)$, $\adapt(t)$ is intensity-invariant only for a
+sufficiently high SNR of $\env(t)$. The preceding SNR improvements from
+$\raw(t)$ to $\env(t)$ thus serve to improve the intensity invariance of
+$\adapt(t)$ by shifting the saturation point towards lower $\sca$. However,
+this effect is limited --- if the SNR of $\raw(t)$ at the receiver's position
+does not allow for a sufficiently high SNR of $\env(t)$, $\adapt(t)$ will not
+be intensity-invariant. The initial song intensity that the sender can achieve
+therefore determines the maximum distance up to which $\adapt(t)$ remains
+intensity-invariant for the receiver.
+
+Assuming that intensity invariance of $\adapt(t)$ is required for reliable song
+recognition, 
+
+
+This might be a reason why robustness to noise masking is an
+attractive property of male calling songs~(\bcite{einhaupl2011attractiveness}).
+
+The saturation level of $\adapt(t)$,
 unlike its saturation point, is independent of the SNR of $\env(t)$ because the
 influence of $\noc(t)$ is negligible for sufficiently large $\sca$. The output
 SNR of $\adapt(t)$ saturates at a comparably low value of around 10. This might
@@ -1798,35 +1818,34 @@ the saturation level of $f_i(t)$ will be determined by the second mechanism.
 The saturation points of $f_i(t)$ across the set are distributed over a much
 wider range than those of the preceeding kernel responses $c_i(t)$, which
 suggests that the interaction between the two mechanisms is specific to
-individual kernels $k_i(t)$. A number of $f_i(t)$ achieve a lower saturation
-point than the respective $c_i(t)$, while some $f_i(t)$ exhibit similar or only
-marginally lower saturation points. This raises the question whether two
-consecutive mechanisms of intensity invariance are actually beneficial for the
-overall system.
+individual kernels. A number of $f_i(t)$ achieve a lower saturation point than
+the respective $c_i(t)$, whereas some $f_i(t)$ exhibit similar or only
+marginally lower saturation points. In these cases, the question arises to what
+extent two consecutive mechanisms of intensity invariance are actually
+beneficial for the overall system.

-Various grasshopper species, especially those with longer songs like \textit{C.
-mollis}, \textit{G. rufus}, or \textit{O. rufipes}, tend to stridulate softly
-at first and then continuously increase the amplitude of their song over time.
-This slow "ramping" amplitude modulation makes the overall song less periodic
-despite its temporal regularity. The "ramping" appears more pronounced in
-$\env(t)$ compared to $\adapt(t)$, which suggests that the logarithmic
-compression and adaptation during the preprocessing stage might be at least
-partially beneficial for mitigating the effect of this amplitude modulation on
-later representations. However, the adaptation of $\adapt(t)$ can only act on
-certain time scales --- depending on the cutoff frequency of the underlying
-highpass filter --- and is hence not able to compensate for "ramping" across
-the entire duration of a song.
+From a computational perspective, the answer could be that logarithmic
+compression and adaptation is a necessary preprocessing step towards robust
+$f_i(t)$ because it works towards a more consistent distribution $\pci$ of
+$c_i(t)$. If $\pci$ is consistent between different songs of the same species,
+a static threshold value $\thr$ is sufficient to generate a consistent
+species-specific feature representation. If $\pci$ is consistent over the
+course of a song, $f_i(t)$ is constant throughout the song, which extends the
+time window for reliable recognition~(Section\,\ref{sec:constant_feat}).

-From a purely functional perspective, the answer could be that logarithmic
-compression and adaptation is a necessary preprocessing step towards a robust
-feature representation, even if thresholding and temporal averaging alone would
-be sufficient to render $f_i(t)$ intensity-invariant. This preprocessing likely
-improves the temporal regularity of the song pattern in $\adapt(t)$ and
-$c_i(t)$, which is required for constant $f_i(t)$ across the duration of a
-song~(Section\,\ref{sec:constant_feat}). It also ensures consistency between
-the distribution $\pci$ of $c_i(t)$ across songs of different intensity, which
-is essential for the generation of consistent species-specific $f_i(t)$ under a
-static $\thr$. From a physiological perspective, the answer is likely that
+
+First, the preprocessing results in a more consistent
+distribution $\pci$ of $c_i(t)$ between songs of different intensity and in
+turn allows for the generation of consistent $f_i(t)$ under a static threshold
+value $\thr$. Second, this preprocessing improves the temporal regularity of
+the song pattern by mitigating the slow ``ramping'' amplitude modulation that is
+common to many grasshopper songs.
+
+This preprocessing likely improves the temporal regularity of the song pattern
+in $\adapt(t)$ and $c_i(t)$, which is required for constant $f_i(t)$ across the
+duration of a song~(Section\,\ref{sec:constant_feat}).
+
+From a physiological perspective, the answer is likely that
 neurons possess only a limited firing rate for encoding stimulus intensities
 that can range over several orders of magnitude. Sigmoidal tuning curves over
 logarithmically compressed stimulus intensities are a common property of