Progress with cleaning up "IntInv vs SNR".

2026-06-15 18:20:33 +02:00
parent 67690b97f7
commit 21e6ab4d64
2 changed files with 83 additions and 64 deletions
--- a/main.pdf
+++ b/main.pdf
--- a/main.tex
+++ b/main.tex
@@ -1595,7 +1595,7 @@ does not change substantially within $\tstat$.
 % Constraints on the song structure:
 % Also: Constant model features vs. actual grasshopper (calling) songs:
-% (Also: Third revision and this section still doesn't sound good)
+% (Also: Third revision and still far from done and good)
 Grasshoppers sing by pulling the stridulatory file on the hindlegs across a
 resonating vein on the forewings~(\bcite{helversen1977stridulatory};
 \bcite{stumpner1994song}; \bcite{helversen1997recognition}). Different
@@ -1660,6 +1660,7 @@ as soon as $f_i(t)$ is within tolerance or wait for $f_i(t)$ to stabilize for
 additional certainty.
 \subsection{Invariant processing in the grasshopper auditory system}
 \label{sec:general_inv}
 % Invariance in the general (systemic) sense:
 The notion of invariance is fundamental for sensory processing systems.
@@ -1710,45 +1711,64 @@ time scale-selectivity is reflected by the cutoff frequency $\fc$ of the
 highpass filter that underlies the adaptation of $\adapt(t)$: Most $\fc$ except
 the lowest ones are effective in removing the local offset of $\db(t)$ and
 render $\adapt(t)$ intensity-invariant, but only sufficiently low $\fc$
-preserve the relevant amplitude dynamics of the song pattern. Intensity
+preserve the relevant amplitude dynamics of the song pattern. The time scale of
-invariance by thresholding and temporal averaging also has a relevant time
+intensity invariance by thresholding and temporal averaging is determined by
-scale, which is determined by the averaging interval $\tlp$. However, this time
+the averaging interval $\tlp$. However, unlike $\fc$, $\tlp$ is not constrained
-scale is not constrained by the need to preserve the temporal structure of the
+by the need to preserve the song pattern but rather to provide a suitable
-song pattern but to provide a suitable degree of temporal integration across
+degree of temporal integration~(Section\,\ref{sec:constant_feat}).
 the song pattern~(Section\,\ref{sec:constant_feat}).
-\subsection{Intensity invariance versus SNR}
+\subsection{Intensity invariance versus SNR along the model pathway}
-Each processing step along the model pathway is a transformation between input
+% % Establishing the principal trade-off (should maybe come later?):
-representation and output representation. The intensity of the input is
+% The output of a transformation is considered to be intensity-invariant if its
-characterized by scale $\sca$. The intensity of the output is characterized by
+% intensity measure saturates for sufficiently large scales $\sca$, which in turn
-an appropriate intensity measure. If the transformation renders the output more
+% caps the output SNR to a constant value across these $\sca$. Otherwise, the
-intensity-invariant, then the intensity measure will saturate for sufficiently
+% output SNR will increase monotonically with $\sca$. The trade-off between
-large $\sca$, which caps the output SNR to a constant value across these
+% intensity invariance and SNR refers to the principle that a transformation can
-$\sca$. Otherwise, the intensity measure and hence the output SNR will increase
+% either improve intensity invariance or maintain SNR --- it cannot do both at
-monotonically with $\sca$. The trade-off between intensity invariance and SNR
+% the same time. This principle is most likely not specific to the two mechanisms
-refers to the principle that a transformation can either improve intensity
+% along the model pathway but rather a general property of transformations that
-invariance or maintain SNR --- it cannot do both at the same time. This
+% equalize between different input intensities.
 principle is presumably not specific to the two mechanisms along the model
 pathway but rather a general property of transformations that equalize between
 different input intensities.
-Logarithmic compression and adaptation by highpass filtering is capable of
+% Building a sufficient SNR "buffer":
-equalizing a wide range of $\sca$. In the absence of noise component $\noc(t)$,
+A stridulating grasshopper generates a song with a specific initial intensity,
-output $\adapt(t)$ is a perfectly intensity-invariant representation of song
+which is steadily attenuated as the song propagates through the
-component $\soc(t)$ across all $\sca>0$. However, the presence of $\noc(t)$
+environment~(\bcite{michelsen1978sound}). A listening grasshopper receives a
-limits the effectiveness of this mechanism to sufficiently large $\sca$. This
+sound signal $\raw(t)$, which is a mixture of the song component $\soc(t)$ with
-means that intensity invariance and SNR interact at the input level, as well.
+scale $\sca$ and the environmental noise component $\noc(t)$. The greater the
-Specifically, the saturation point of $\adapt(t)$ is determined by the input
+distance between sender and receiver, the smaller $\sca$ and hence the lower
-SNR of $\env(t)$, which in turn depends on the initial SNR of the sound signal
+the SNR of $\raw(t)$ at the position of the receiver. The tympanal bandpass
-$\raw(t)$. This initial SNR is presumably improved by the bandpass filtering of
+filtering of $\raw(t)$ into $\filt(t)$ likely improves the SNR by attenuating
-$\raw(t)$ into $\filt(t)$ at the tympanal membrane, which attenuates
+frequencies outside the relevant range of grasshopper songs. The SNR is further
-frequencies outside the relevant range of grasshopper songs. The SNR is then
+improved by the rectification and lowpass filtering of $\filt(t)$ into
-further improved by the rectification and lowpass filtering of $\filt(t)$ into
+$\env(t)$. The lower the cutoff frequency $\fc$ of the lowpass filter, the
-$\env(t)$. This improvement depends on the cutoff frequency $\fc$ of the
+higher the SNR of $\env(t)$ at a given $\sca$, although $\fc$ must also be
-lowpass filter --- the lower $\fc$, the higher the SNR of $\env(t)$ at a given
+sufficiently high to preserve the amplitude dynamics of the song pattern.
-$\sca$. However, $\fc$ must not be too low to avoid the attenuation of relevant
+Overall, the first processing steps along the pathway are not designed to
-amplitude dynamics of the song pattern. The saturation level of $\adapt$,
+achieve intensity invariance but rather to improve the SNR of the song
 representation beyond the initial SNR of $\raw(t)$.
 The first mechanism of intensity invariance consists of logarithmic compression
 and adaptation of $\env(t)$ into $\adapt(t)$. In the absence of $\noc(t)$,
 $\adapt(t)$ is a perfectly intensity-invariant representation of $\soc(t)$. In
 the presence of $\noc(t)$, $\adapt(t)$ is intensity-invariant only for a
 sufficiently high SNR of $\env(t)$. The preceding SNR improvements from
 $\raw(t)$ to $\env(t)$ thus serve to improve the intensity invariance of
 $\adapt(t)$ by shifting the saturation point towards lower $\sca$. However,
 this effect is limited --- if the SNR of $\raw(t)$ at the receiver's position
 does not allow for a sufficiently high SNR of $\env(t)$, $\adapt(t)$ will not
 be intensity-invariant. The initial song intensity that the sender can achieve
 therefore determines the distance at which $\adapt(t)$ is intensity-invariant
 to the receiver.
 Assuming that intensity invariance of $\adapt(t)$ is required for reliable song
 recognition, this might be a reason why robustness to noise masking is an
 attractive property of male calling
 songs~(\bcite{einhaupl2011attractiveness}).
 The saturation level of $\adapt(t)$,
 unlike its saturation point, is independent of the SNR of $\env(t)$ because the
 influence of $\noc(t)$ is negligible for sufficiently large $\sca$. The output
 SNR of $\adapt(t)$ saturates at a comparatively low value of around 10. This might
@@ -1798,35 +1818,34 @@ the saturation level of $f_i(t)$ will be determined by the second mechanism.
 The saturation points of $f_i(t)$ across the set are distributed over a much
 wider range than those of the preceding kernel responses $c_i(t)$, which
 suggests that the interaction between the two mechanisms is specific to
-individual kernels $k_i(t)$. A number of $f_i(t)$ achieve a lower saturation
+individual kernels. A number of $f_i(t)$ achieve a lower saturation point than
-point than the respective $c_i(t)$, while some $f_i(t)$ exhibit similar or only
+the respective $c_i(t)$, whereas some $f_i(t)$ exhibit similar or only
-marginally lower saturation points. This raises the question whether two
+marginally lower saturation points. In these cases, the question arises to what
-consecutive mechanisms of intensity invariance are actually beneficial for the
+extent two consecutive mechanisms of intensity invariance are actually
-overall system.
+beneficial for the overall system.
-Various grasshopper species, especially those with longer songs like \textit{C.
+From a computational perspective, the answer could be that logarithmic
-mollis}, \textit{G. rufus}, or \textit{O. rufipes}, tend to stridulate softly
+compression and adaptation is a necessary preprocessing step towards robust
-at first and then continuously increase the amplitude of their song over time.
+$f_i(t)$ because it works towards a more consistent distribution $\pci$ of
-This slow "ramping" amplitude modulation makes the overall song less periodic
+$c_i(t)$. If $\pci$ is consistent between different songs of the same species,
-despite its temporal regularity. The "ramping" appears more pronounced in
+a static threshold value $\thr$ is sufficient to generate a consistent
-$\env(t)$ compared to $\adapt(t)$, which suggests that the logarithmic
+species-specific feature representation. If $\pci$ is consistent over the
-compression and adaptation during the preprocessing stage might be at least
+course of a song, $f_i(t)$ is constant throughout the song, which extends the
-partially beneficial for mitigating the effect of this amplitude modulation on
+time window for reliable recognition~(Section\,\ref{sec:constant_feat}).
 later representations. However, the adaptation of $\adapt(t)$ can only act on
 certain time scales --- depending on the cutoff frequency of the underlying
 highpass filter --- and is hence not able to compensate for "ramping" across
 the entire duration of a song.
-From a purely functional perspective, the answer could be that logarithmic
+
-compression and adaptation is a necessary preprocessing step towards a robust
+First, the preprocessing results in a more consistent
-feature representation, even if thresholding and temporal averaging alone would
+distribution $\pci$ of $c_i(t)$ between songs of different intensity and in
-be sufficient to render $f_i(t)$ intensity-invariant. This preprocessing likely
+turn allows for the generation of consistent $f_i(t)$ under a static threshold
-improves the temporal regularity of the song pattern in $\adapt(t)$ and
+value $\thr$. Second, this preprocessing improves the temporal regularity of
-$c_i(t)$, which is required for constant $f_i(t)$ across the duration of a
+the song pattern by mitigating the slow "ramping" amplitude modulation that is
-song~(Section\,\ref{sec:constant_feat}). It also ensures consistency between
+common to many grasshopper songs.
-the distribution $\pci$ of $c_i(t)$ across songs of different intensity, which
+
-is essential for the generation of consistent species-specific $f_i(t)$ under a
+This preprocessing likely improves the temporal regularity of the song pattern
-static $\thr$. From a physiological perspective, the answer is likely that
+in $\adapt(t)$ and $c_i(t)$, which is required for constant $f_i(t)$ across the
 duration of a song~(Section\,\ref{sec:constant_feat}).
 From a physiological perspective, the answer is likely that
 neurons possess only a limited firing rate for encoding stimulus intensities
 that can range over several orders of magnitude. Sigmoidal tuning curves over
 logarithmically compressed stimulus intensities are a common property of