Progress with cleaning up "IntInv vs SNR".
147
main.tex
@@ -1595,7 +1595,7 @@ does not change substantially within $\tstat$.
% Constraints on the song structure:
% Also: Constant model features vs. actual grasshopper (calling) songs:
% (Also: Third revision and this section still doesn't sound good)
% (Also: Third revision and still far from done and good)
Grasshoppers sing by pulling the stridulatory file on the hindlegs across a
resonating vein on the forewings~(\bcite{helversen1977stridulatory};
\bcite{stumpner1994song}; \bcite{helversen1997recognition}). Different
@@ -1660,6 +1660,7 @@ as soon as $f_i(t)$ is within tolerance or wait for $f_i(t)$ to stabilize for
additional certainty.

\subsection{Invariant processing in the grasshopper auditory system}
\label{sec:general_inv}

% Invariance in the general (systemic) sense:
The notion of invariance is fundamental for sensory processing systems.
@@ -1710,45 +1711,64 @@ time scale-selectivity is reflected by the cutoff frequency $\fc$ of the
highpass filter that underlies the adaptation of $\adapt(t)$: All but the lowest
$\fc$ effectively remove the local offset of $\db(t)$ and thus render
$\adapt(t)$ intensity-invariant, but only sufficiently low $\fc$
preserve the relevant amplitude dynamics of the song pattern. Intensity
invariance by thresholding and temporal averaging also has a relevant time
scale, which is determined by the averaging interval $\tlp$. However, this time
scale is not constrained by the need to preserve the temporal structure of the
song pattern but by the need to provide a suitable degree of temporal
integration across the song pattern~(Section\,\ref{sec:constant_feat}).
preserve the relevant amplitude dynamics of the song pattern. The time scale of
intensity invariance by thresholding and temporal averaging is determined by
the averaging interval $\tlp$. However, unlike $\fc$, $\tlp$ is not constrained
by the need to preserve the song pattern but rather by the need to provide a
suitable degree of temporal integration~(Section\,\ref{sec:constant_feat}).
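
To make the two competing requirements on $\fc$ concrete, the following Python sketch assumes that the adaptation is a first-order (one-pole) digital highpass filter. The filter form, the sampling rate \texttt{FS}, the song modulation frequency \texttt{F\_SONG}, and the helper names are hypothetical choices for illustration, not the model's actual parameters. For several cutoffs, the sketch reports two quantities:
\begin{itemize}
  \item the gain at \texttt{F\_SONG}, which indicates how well the song pattern is preserved;
  \item the time after which a constant offset in $\db(t)$ has decayed to 5\,\%, which indicates how quickly the offset is removed.
\end{itemize}

```python
import math


def highpass_gain(f, fc, fs):
    """Magnitude response |H| of a one-pole highpass at frequency f (Hz).

    Assumed filter (illustrative, not necessarily the model's):
        y[n] = a * (y[n-1] + x[n] - x[n-1]),  a = exp(-2*pi*fc/fs)
    """
    a = math.exp(-2.0 * math.pi * fc / fs)
    w = 2.0 * math.pi * f / fs
    z_inv = complex(math.cos(w), -math.sin(w))  # z^-1 on the unit circle
    # H(z) = a * (1 - z^-1) / (1 - a * z^-1)
    return a * abs(1.0 - z_inv) / abs(1.0 - a * z_inv)


def offset_settling_time(fc, residual=0.05):
    """Time (s) until a constant input offset has decayed to `residual`.

    Uses the time constant tau = 1 / (2*pi*fc) of the same filter.
    """
    tau = 1.0 / (2.0 * math.pi * fc)
    return tau * math.log(1.0 / residual)


FS = 1000.0     # sampling rate in Hz (hypothetical)
F_SONG = 10.0   # hypothetical modulation frequency of the song pattern in Hz

for fc in (0.1, 1.0, 5.0, 20.0):
    print(f"fc={fc:5.1f} Hz  gain at {F_SONG:.0f} Hz: {highpass_gain(F_SONG, fc, FS):.3f}"
          f"  offset settles in {offset_settling_time(fc):.2f} s")
```

In this sketch, raising $\fc$ shortens the time needed to remove an offset but lowers the gain at \texttt{F\_SONG}. This is the compromise described above.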

\subsection{Intensity invariance versus SNR}
\subsection{Intensity invariance versus SNR along the model pathway}

Each processing step along the model pathway is a transformation between an
input representation and an output representation. The intensity of the input
is characterized by the scale $\sca$, and the intensity of the output by an
appropriate intensity measure. If the transformation renders the output more
intensity-invariant, the intensity measure saturates for sufficiently large
$\sca$, which caps the output SNR at a constant value across these $\sca$.
Otherwise, the intensity measure, and hence the output SNR, increases
monotonically with $\sca$. The trade-off between intensity invariance and SNR
refers to the principle that a transformation can either improve intensity
invariance or maintain SNR, but not both at the same time. This principle is
presumably not specific to the two mechanisms along the model pathway but
rather a general property of transformations that equalize different input
intensities.
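
The principle can be illustrated with a minimal Python sketch. The sketch rests on the following hypothetical assumptions:
\begin{itemize}
  \item the song envelope consists of noise-free 10\,Hz pulses scaled by $\sca$;
  \item an additive, positive noise envelope is added to it;
  \item the intensity measure is the standard deviation of the highpass-filtered output;
  \item the output SNR is this measure divided by its value for $\sca=0$.
\end{itemize}
These choices and the helper names are assumptions, not the model's definitions. Highpass filtering without compression (\texttt{use\_log=False}) keeps the SNR growing with $\sca$. Logarithmic compression followed by highpass filtering (\texttt{use\_log=True}) makes it saturate.

```python
import math
import random

FS = 1000          # sampling rate in Hz (hypothetical)
N = 4 * FS         # 4 s of signal
PERIOD = FS // 10  # 10 Hz pulse pattern, 100 samples per period
FC_HP = 2.0        # highpass cutoff in Hz (hypothetical)


def lowpass(x, fc, fs):
    """One-pole lowpass y[n] = a*y[n-1] + (1-a)*x[n], started at x[0]."""
    a = math.exp(-2 * math.pi * fc / fs)
    y, acc = [], x[0]
    for v in x:
        acc = a * acc + (1 - a) * v
        y.append(acc)
    return y


def highpass(x, fc, fs):
    """One-pole highpass y[n] = a*(y[n-1] + x[n] - x[n-1]); removes constant offsets."""
    a = math.exp(-2 * math.pi * fc / fs)
    y, prev_x, prev_y = [], x[0], 0.0
    for v in x:
        prev_y = a * (prev_y + v - prev_x)
        prev_x = v
        y.append(prev_y)
    return y


def std(x):
    m = sum(x) / len(x)
    return math.sqrt(sum((v - m) ** 2 for v in x) / len(x))


rng = random.Random(0)
# Hypothetical song envelope: pulses alternating between 1.0 and 0.2.
song = [1.0 if i % PERIOD < PERIOD // 2 else 0.2 for i in range(N)]
# Hypothetical noise envelope: rectified, smoothed Gaussian noise (positive).
noise = lowpass([abs(rng.gauss(0.0, 1.0)) for _ in range(N)], 50.0, FS)


def output_measure(scale, use_log):
    """Std of the highpass output after the first second (intensity measure)."""
    env = [scale * s + n for s, n in zip(song, noise)]
    x = [math.log(e) for e in env] if use_log else env
    return std(highpass(x, FC_HP, FS)[FS:])


def snr(scale, use_log):
    """Output SNR relative to the noise-only condition (scale = 0)."""
    return output_measure(scale, use_log) / output_measure(0.0, use_log)


for scale in (1.0, 10.0, 100.0, 1000.0):
    print(f"scale={scale:7.1f}  linear+HP SNR={snr(scale, False):8.2f}"
          f"  log+HP SNR={snr(scale, True):5.2f}")
```

The SNR value at which the compressed pathway saturates depends on the assumed noise envelope. It is not meant to reproduce any value obtained with the model.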
% % Establishing the principle trade-off (should maybe come later?):
% The output of a transformation is considered to be intensity-invariant if its
% intensity measure saturates for sufficiently large scales $\sca$, which in turn
% caps the output SNR to a constant value across these $\sca$. Otherwise, the
% output SNR will increase monotonically with $\sca$. The trade-off between
% intensity invariance and SNR refers to the principle that a transformation can
% either improve intensity invariance or maintain SNR --- it cannot do both at
% the same time. This principle is most likely not specific to the two mechanisms
% along the model pathway but rather a general property of transformations that
% equalize between different input intensities.

Logarithmic compression and adaptation by highpass filtering is capable of
equalizing a wide range of $\sca$. In the absence of the noise component
$\noc(t)$, the output $\adapt(t)$ is a perfectly intensity-invariant
representation of the song component $\soc(t)$ across all $\sca>0$. However,
the presence of $\noc(t)$ limits the effectiveness of this mechanism to
sufficiently large $\sca$. This means that intensity invariance and SNR
interact at the input level as well. Specifically, the saturation point of
$\adapt(t)$ is determined by the input SNR of $\env(t)$, which in turn depends
on the initial SNR of the sound signal $\raw(t)$. This initial SNR is
presumably improved by the bandpass filtering of $\raw(t)$ into $\filt(t)$ at
the tympanal membrane, which attenuates frequencies outside the relevant range
of grasshopper songs. The SNR is then further improved by the rectification
and lowpass filtering of $\filt(t)$ into $\env(t)$. This improvement depends
on the cutoff frequency $\fc$ of the lowpass filter: the lower $\fc$, the
higher the SNR of $\env(t)$ at a given $\sca$. However, $\fc$ must not be so
low that it attenuates the relevant amplitude dynamics of the song pattern.
The saturation level of $\adapt(t)$,
% Building a sufficient SNR "buffer":
A stridulating grasshopper generates a song with a specific initial intensity,
which is steadily attenuated as the song propagates through the
environment~(\bcite{michelsen1978sound}). A listening grasshopper receives a
sound signal $\raw(t)$, which is a mixture of the song component $\soc(t)$ with
scale $\sca$ and the environmental noise component $\noc(t)$. The greater the
distance between sender and receiver, the smaller $\sca$ and hence the lower
the SNR of $\raw(t)$ at the position of the receiver. The tympanal bandpass
filtering of $\raw(t)$ into $\filt(t)$ likely improves the SNR by attenuating
frequencies outside the relevant range of grasshopper songs. The SNR is further
improved by the rectification and lowpass filtering of $\filt(t)$ into
$\env(t)$. The lower the cutoff frequency $\fc$ of the lowpass filter, the
higher the SNR of $\env(t)$ at a given $\sca$, although $\fc$ must also be
sufficiently high to preserve the amplitude dynamics of the song pattern.
Overall, the first processing steps along the pathway are not designed to
achieve intensity invariance but rather to improve the SNR of the song
representation beyond the initial SNR of $\raw(t)$.
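
The trade-off in the choice of the lowpass cutoff can be quantified for a simple case. The Python sketch below makes the following assumptions:
\begin{itemize}
  \item the lowpass filter is first-order (one-pole);
  \item the noise samples before rectification are white and independent Gaussian;
  \item the sampling rate \texttt{FS} and the modulation frequency \texttt{F\_MOD} take hypothetical values.
\end{itemize}
The white-noise assumption is only an approximation, since the tympanal bandpass filtering would make the noise colored. Under these assumptions, lowering the cutoff reduces the standard deviation of the envelope noise but also attenuates amplitude modulations at \texttt{F\_MOD}.

```python
import math

# Std of |g| for a unit-variance Gaussian g; rectifying i.i.d. samples keeps them i.i.d.
RECT_STD = math.sqrt(1.0 - 2.0 / math.pi)


def lowpass_noise_std_factor(fc, fs):
    """Steady-state std reduction of white noise by y[n] = a*y[n-1] + (1-a)*x[n].

    Output variance = input variance * (1 - a) / (1 + a), with a = exp(-2*pi*fc/fs).
    """
    a = math.exp(-2.0 * math.pi * fc / fs)
    return math.sqrt((1.0 - a) / (1.0 + a))


def lowpass_gain(f, fc, fs):
    """Magnitude response of the same one-pole lowpass at frequency f (Hz)."""
    a = math.exp(-2.0 * math.pi * fc / fs)
    w = 2.0 * math.pi * f / fs
    return (1.0 - a) / math.sqrt(1.0 - 2.0 * a * math.cos(w) + a * a)


FS = 20000.0   # sampling rate of the sound signal in Hz (hypothetical)
F_MOD = 100.0  # hypothetical fastest relevant amplitude modulation in Hz

for fc in (50.0, 200.0, 1000.0):
    # Envelope noise std for unit-variance input noise after rectification + lowpass.
    noise_std = RECT_STD * lowpass_noise_std_factor(fc, FS)
    print(f"fc={fc:6.0f} Hz  envelope noise std={noise_std:.3f}"
          f"  gain at {F_MOD:.0f} Hz={lowpass_gain(F_MOD, fc, FS):.3f}")
```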

The first mechanism of intensity invariance consists of logarithmic compression
and adaptation of $\env(t)$ into $\adapt(t)$. In the absence of $\noc(t)$,
$\adapt(t)$ is a perfectly intensity-invariant representation of $\soc(t)$. In
the presence of $\noc(t)$, $\adapt(t)$ is intensity-invariant only for a
sufficiently high SNR of $\env(t)$. The preceding SNR improvements from
$\raw(t)$ to $\env(t)$ thus serve to improve the intensity invariance of
$\adapt(t)$ by shifting the saturation point towards lower $\sca$. However,
this effect is limited: if the SNR of $\raw(t)$ at the receiver's position
does not allow for a sufficiently high SNR of $\env(t)$, $\adapt(t)$ will not
be intensity-invariant. The initial song intensity that the sender can achieve
therefore determines the maximum distance up to which the receiver's
$\adapt(t)$ remains intensity-invariant.
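
Under simplifying assumptions, the argument of this paragraph can be written as a short derivation. For illustration only, the derivation makes three assumptions:
\begin{itemize}
  \item $\db(t)$ is the natural logarithm of $\env(t)$; a different base or prefactor merely rescales all terms.
  \item $\env(t) \approx \sca\,e_s(t) + e_n(t)$, with a positive song envelope $e_s(t)$ and a non-negative noise envelope $e_n(t)$. This is an approximation, since the envelopes of summed signals do not add exactly.
  \item The highpass filter acts as a linear operator $\mathcal{H}$ that maps constant signals to zero.
\end{itemize}
The symbols $e_s$, $e_n$, and $\mathcal{H}$ are introduced here for the sketch and are not part of the model's notation.
\begin{align*}
  \db(t) &= \ln\bigl(\sca\,e_s(t) + e_n(t)\bigr)
          = \ln\sca + \ln e_s(t)
            + \ln\Bigl(1 + \frac{e_n(t)}{\sca\,e_s(t)}\Bigr),\\
  \adapt(t) &= \mathcal{H}[\db](t)
          = \mathcal{H}[\ln e_s](t)
            + \mathcal{H}\Bigl[\ln\Bigl(1 + \frac{e_n}{\sca\,e_s}\Bigr)\Bigr](t),
\end{align*}
where the constant term vanishes because $\mathcal{H}[\ln\sca] = 0$. Without noise ($e_n \equiv 0$), the last term vanishes as well, so $\adapt(t) = \mathcal{H}[\ln e_s](t)$ for every $\sca > 0$. With noise, $\ln(1 + x) \approx x$ for small $x$ implies that the residual term shrinks roughly as $e_n/(\sca\,e_s)$. It becomes negligible only once $\sca\,e_s(t) \gg e_n(t)$, which marks the saturation point. Reducing $e_n$ relative to $e_s$ before this stage, the presumed role of the preceding filtering, moves the saturation point towards lower $\sca$.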

Assuming that intensity invariance of $\adapt(t)$ is required for reliable song
recognition,

This might be a reason why robustness to noise masking is an
attractive property of male calling songs~(\bcite{einhaupl2011attractiveness}).

The saturation level of $\adapt(t)$,
unlike its saturation point, is independent of the SNR of $\env(t)$ because the
influence of $\noc(t)$ is negligible for sufficiently large $\sca$. The output
SNR of $\adapt(t)$ saturates at a comparatively low value of around 10. This might
@@ -1798,35 +1818,34 @@ the saturation level of $f_i(t)$ will be determined by the second mechanism.
The saturation points of $f_i(t)$ across the set are distributed over a much
wider range than those of the preceding kernel responses $c_i(t)$, which
suggests that the interaction between the two mechanisms is specific to
individual kernels $k_i(t)$. A number of $f_i(t)$ achieve a lower saturation
point than the respective $c_i(t)$, while some $f_i(t)$ exhibit similar or only
marginally lower saturation points. This raises the question of whether two
consecutive mechanisms of intensity invariance are actually beneficial for the
overall system.
individual kernels. A number of $f_i(t)$ achieve a lower saturation point than
the respective $c_i(t)$, whereas some $f_i(t)$ exhibit similar or only
marginally lower saturation points. In these cases, the question arises to what
extent two consecutive mechanisms of intensity invariance are actually
beneficial for the overall system.
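
A minimal Python sketch can show why the saturation points of $f_i(t)$ may differ between kernels. The sketch rests on the following assumptions:
\begin{itemize}
  \item $f_i(t)$ is the fraction of time within an averaging window, standing in for $\tlp$, during which the scaled kernel response exceeds $\thr$;
  \item the scale is applied directly to $c_i(t)$, which ignores noise and the first mechanism;
  \item there are two hypothetical response shapes with equal peak amplitude.
\end{itemize}
The helper names are illustrative. The square response reaches its large-scale limit as soon as its scaled amplitude exceeds $\thr$. The sine response has many small positive values and therefore needs a much larger scale.

```python
import math


def feature_value(c, scale, thr, window):
    """Temporal average of the thresholded, scaled kernel response.

    Assumed form (illustrative): the fraction of the last `window` samples
    in which scale * c exceeds thr, i.e. a boxcar average of a Heaviside step.
    """
    recent = c[-window:]
    return sum(1 for v in recent if scale * v > thr) / len(recent)


def saturation_point(c, thr, window, tol=0.01):
    """Smallest scale on a geometric grid at which the feature value reaches
    (1 - tol) times its large-scale limit."""
    limit = feature_value(c, 1e6, thr, window)  # proxy for scale -> infinity
    for k in range(80):
        scale = 1.25 ** k
        if feature_value(c, scale, thr, window) >= (1.0 - tol) * limit:
            return scale
    return None


PERIOD = 100   # samples per cycle of the hypothetical kernel responses
N = 1000       # 10 cycles; also used as the averaging window (stands in for tlp)

# Two hypothetical kernel responses c_i(t) with the same peak amplitude.
sine = [math.sin(2.0 * math.pi * i / PERIOD) for i in range(N)]
square = [1.0 if i % PERIOD < PERIOD // 2 else -1.0 for i in range(N)]

for name, c in (("sine", sine), ("square", square)):
    print(name, "saturates at scale", saturation_point(c, thr=1.0, window=N))
```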

Various grasshopper species, especially those with longer songs like
\textit{C.~mollis}, \textit{G.~rufus}, or \textit{O.~rufipes}, tend to
stridulate softly at first and then continuously increase the amplitude of
their song over time. This slow ``ramping'' amplitude modulation makes the
overall song less periodic despite its temporal regularity. The ``ramping''
appears more pronounced in $\env(t)$ than in $\adapt(t)$, which suggests that
the logarithmic compression and adaptation during the preprocessing stage might
at least partially mitigate the effect of this amplitude modulation on later
representations. However, the adaptation of $\adapt(t)$ can only act on
certain time scales, depending on the cutoff frequency of the underlying
highpass filter, and is hence unable to compensate for ``ramping'' across the
entire duration of a song.
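
The following Python sketch illustrates the difference between $\env(t)$ and $\adapt(t)$ for an idealized, noise-free song whose pulse amplitude ramps up linearly from 10\,\% to 100\,\% over 10\,s. It assumes natural-log compression followed by a one-pole highpass filter with a 1\,Hz cutoff as the adaptation. The pulse pattern, the ramp, and the cutoff are hypothetical. In this purely multiplicative case, the logarithm turns the ramp into an additive drift, so the pulse depth stays constant, and the highpass filter removes most of the drifting mean.

```python
import math

FS = 1000           # sampling rate in Hz (hypothetical)
N = 10 * FS         # 10 s song
PERIOD = FS // 10   # 10 Hz pulse pattern, 100 samples per period


def highpass(x, fc, fs):
    """One-pole highpass y[n] = a*(y[n-1] + x[n] - x[n-1]), a = exp(-2*pi*fc/fs)."""
    a = math.exp(-2 * math.pi * fc / fs)
    y, prev_x, prev_y = [], x[0], 0.0
    for v in x:
        prev_y = a * (prev_y + v - prev_x)
        prev_x = v
        y.append(prev_y)
    return y


def period_stats(x, start_s):
    """(peak-to-trough depth, mean) of x over one pulse period starting at start_s."""
    i0 = int(start_s * FS)
    seg = x[i0:i0 + PERIOD]
    return max(seg) - min(seg), sum(seg) / len(seg)


ramp = [0.1 + 0.9 * i / N for i in range(N)]              # slow amplitude increase
pattern = [1.0 if i % PERIOD < PERIOD // 2 else 0.2 for i in range(N)]
env = [g * p for g, p in zip(ramp, pattern)]              # noise-free envelope
log_env = [math.log(e) for e in env]                      # logarithmic compression
adapt = highpass(log_env, 1.0, FS)                        # adaptation, fc = 1 Hz

early, late = 1.0, 9.0   # seconds; t = 1 s is well past the filter transient
for name, x in (("env", env), ("log env", log_env), ("adapt", adapt)):
    (d0, m0), (d1, m1) = period_stats(x, early), period_stats(x, late)
    print(f"{name:8s} depth late/early = {d1 / d0:.2f}   mean shift = {m1 - m0:+.3f}")
```

The small mean shift that remains in \texttt{adapt} reflects the lag of the highpass filter. The filter leaves an offset roughly proportional to the slope of the drift times its time constant $1/(2\pi\fc)$, so the degree of compensation depends on $\fc$. Ramps that are not purely multiplicative are not captured by this sketch.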

From a computational perspective, the answer could be that logarithmic
compression and adaptation is a necessary preprocessing step towards robust
$f_i(t)$ because it makes the distribution $\pci$ of $c_i(t)$ more consistent.
If $\pci$ is consistent between different songs of the same species,
a static threshold value $\thr$ is sufficient to generate a consistent
species-specific feature representation. If $\pci$ is consistent over the
course of a song, $f_i(t)$ is constant throughout the song, which extends the
time window for reliable recognition~(Section\,\ref{sec:constant_feat}).
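
A minimal Python sketch can show why a consistent $\pci$ permits a static $\thr$. It makes the following hypothetical assumptions:
\begin{itemize}
  \item the input is a noise-free pulse envelope;
  \item natural-log compression followed by a one-pole highpass filter stands in for the preprocessing;
  \item the kernel $k_i(t)$ is zero-mean, namely the difference of two 10-sample boxcars;
  \item the fraction of time with $c_i(t) > \thr$ serves as a one-number summary of $\pci$.
\end{itemize}
With the preprocessing, this fraction is the same for every $\sca$, because the logarithm turns the scale into a constant offset that the highpass filter removes. Without the preprocessing, the same $\thr$ yields fractions that depend on $\sca$.

```python
import math

FS = 1000                            # sampling rate in Hz (hypothetical)
N = 3 * FS                           # 3 s of signal
PERIOD = FS // 10                    # 10 Hz pulse pattern
KERNEL = [0.1] * 10 + [-0.1] * 10    # hypothetical zero-mean kernel k_i
THR = 0.5                            # static threshold (same value for all scales)


def highpass(x, fc, fs):
    """One-pole highpass y[n] = a*(y[n-1] + x[n] - x[n-1]), started without offset."""
    a = math.exp(-2 * math.pi * fc / fs)
    y, prev_x, prev_y = [], x[0], 0.0
    for v in x:
        prev_y = a * (prev_y + v - prev_x)
        prev_x = v
        y.append(prev_y)
    return y


def apply_kernel(x, k):
    """c[n] = sum_j k[j] * x[n - j], only where the kernel fully overlaps x."""
    return [sum(kj * x[n - j] for j, kj in enumerate(k))
            for n in range(len(k) - 1, len(x))]


song = [1.0 if i % PERIOD < PERIOD // 2 else 0.2 for i in range(N)]


def fraction_above(scale, preprocess):
    """Fraction of time with c_i(t) > THR, a crude summary of p(c_i)."""
    env = [scale * s for s in song]
    x = highpass([math.log(e) for e in env], 1.0, FS) if preprocess else env
    c = apply_kernel(x, KERNEL)[FS:]   # discard the first second
    return sum(1 for v in c if v > THR) / len(c)


for scale in (0.5, 1.0, 10.0, 100.0):
    print(f"scale={scale:6.1f}  without preprocessing: {fraction_above(scale, False):.3f}"
          f"  with log + highpass: {fraction_above(scale, True):.3f}")
```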

From a purely functional perspective, the answer could be that logarithmic
compression and adaptation is a necessary preprocessing step towards a robust
feature representation, even if thresholding and temporal averaging alone were
sufficient to render $f_i(t)$ intensity-invariant. This preprocessing likely
improves the temporal regularity of the song pattern in $\adapt(t)$ and
$c_i(t)$, which is required for constant $f_i(t)$ across the duration of a
song~(Section\,\ref{sec:constant_feat}). It also makes the distribution $\pci$
of $c_i(t)$ consistent across songs of different intensity, which is essential
for generating consistent species-specific $f_i(t)$ under a static $\thr$. From
a physiological perspective, the answer is likely that

First, the preprocessing results in a more consistent
distribution $\pci$ of $c_i(t)$ between songs of different intensity and in
turn allows for the generation of consistent $f_i(t)$ under a static threshold
value $\thr$. Second, this preprocessing improves the temporal regularity of
the song pattern by mitigating the slow ``ramping'' amplitude modulation that is
common to many grasshopper songs.

This preprocessing likely improves the temporal regularity of the song pattern
in $\adapt(t)$ and $c_i(t)$, which is required for constant $f_i(t)$ across the
duration of a song~(Section\,\ref{sec:constant_feat}).

From a physiological perspective, the answer is likely that
neurons have only a limited range of firing rates for encoding stimulus
intensities that can span several orders of magnitude. Sigmoidal tuning curves
over logarithmically compressed stimulus intensities are a common property of