Progress with cleaning up "IntInv vs SNR".
147
main.tex
@@ -1595,7 +1595,7 @@ does not change substantially within $\tstat$.
|
|||||||
|
|
||||||
% Constraints on the song structure:
|
% Constraints on the song structure:
|
||||||
% Also: Constant model features vs. actual grasshopper (calling) songs:
|
% Also: Constant model features vs. actual grasshopper (calling) songs:
|
||||||
% (Also: Third revision and this section still doesn't sound good)
|
% (Also: Third revision and still far from done and good)
|
||||||
Grasshoppers sing by pulling the stridulatory file on the hindlegs across a
|
Grasshoppers sing by pulling the stridulatory file on the hindlegs across a
|
||||||
resonating vein on the forewings~(\bcite{helversen1977stridulatory};
|
resonating vein on the forewings~(\bcite{helversen1977stridulatory};
|
||||||
\bcite{stumpner1994song}; \bcite{helversen1997recognition}). Different
|
\bcite{stumpner1994song}; \bcite{helversen1997recognition}). Different
|
||||||
@@ -1660,6 +1660,7 @@ as soon as $f_i(t)$ is within tolerance or wait for $f_i(t)$ to stabilize for
|
|||||||
additional certainty.
|
additional certainty.
|
||||||
|
|
||||||
\subsection{Invariant processing in the grasshopper auditory system}
|
\subsection{Invariant processing in the grasshopper auditory system}
|
||||||
|
\label{sec:general_inv}
|
||||||
|
|
||||||
% Invariance in the general (systemic) sense:
|
% Invariance in the general (systemic) sense:
|
||||||
The notion of invariance is fundamental for sensory processing systems.
|
The notion of invariance is fundamental for sensory processing systems.
|
||||||
@@ -1710,45 +1711,64 @@ time scale-selectivity is reflected by the cutoff frequency $\fc$ of the
|
|||||||
highpass filter that underlies the adaptation of $\adapt(t)$: Most $\fc$ except
|
highpass filter that underlies the adaptation of $\adapt(t)$: Most $\fc$ except
|
||||||
the lowest ones are effective in removing the local offset of $\db(t)$ and
|
the lowest ones are effective in removing the local offset of $\db(t)$ and
|
||||||
render $\adapt(t)$ intensity-invariant, but only sufficiently low $\fc$
|
render $\adapt(t)$ intensity-invariant, but only sufficiently low $\fc$
|
||||||
preserve the relevant amplitude dynamics of the song pattern. Intensity
|
preserve the relevant amplitude dynamics of the song pattern. The time scale of
|
||||||
invariance by thresholding and temporal averaging also has a relevant time
|
intensity invariance by thresholding and temporal averaging is determined by
|
||||||
scale, which is determined by the averaging interval $\tlp$. However, this time
|
the averaging interval $\tlp$. However, unlike $\fc$, $\tlp$ is not constrained
|
||||||
scale is not constrained by the need to preserve the temporal structure of the
|
by the need to preserve the song pattern but rather to provide a suitable
|
||||||
song pattern but to provide a suitable degree of temporal integration across
|
degree of temporal integration~(Section\,\ref{sec:constant_feat}).
|
||||||
the song pattern~(Section\,\ref{sec:constant_feat}).
|
|
||||||
|
|
||||||
\subsection{Intensity invariance versus SNR}
|
\subsection{Intensity invariance versus SNR along the model pathway}
|
||||||
|
|
||||||
Each processing step along the model pathway is a transformation between input
|
% % Establishing the principal trade-off (should maybe come later?):
|
||||||
representation and output representation. The intensity of the input is
|
% The output of a transformation is considered to be intensity-invariant if its
|
||||||
characterized by scale $\sca$. The intensity of the output is characterized by
|
% intensity measure saturates for sufficiently large scales $\sca$, which in turn
|
||||||
an appropriate intensity measure. If the transformation renders the output more
|
% caps the output SNR to a constant value across these $\sca$. Otherwise, the
|
||||||
intensity-invariant, then the intensity measure will saturate for sufficiently
|
% output SNR will increase monotonically with $\sca$. The trade-off between
|
||||||
large $\sca$, which caps the output SNR to a constant value across these
|
% intensity invariance and SNR refers to the principle that a transformation can
|
||||||
$\sca$. Otherwise, the intensity measure and hence the output SNR will increase
|
% either improve intensity invariance or maintain SNR --- it cannot do both at
|
||||||
monotonically with $\sca$. The trade-off between intensity invariance and SNR
|
% the same time. This principle is most likely not specific to the two mechanisms
|
||||||
refers to the principle that a transformation can either improve intensity
|
% along the model pathway but rather a general property of transformations that
|
||||||
invariance or maintain SNR --- it cannot do both at the same time. This
|
% equalize between different input intensities.
|
||||||
principle is presumably not specific to the two mechanisms along the model
|
|
||||||
pathway but rather a general property of transformations that equalize between
|
|
||||||
different input intensities.
|
|
||||||
|
|
||||||
Logarithmic compression and adaptation by highpass filtering is capable of
|
% Building a sufficient SNR "buffer":
|
||||||
equalizing a wide range of $\sca$. In the absence of noise component $\noc(t)$,
|
A stridulating grasshopper generates a song with a specific initial intensity,
|
||||||
output $\adapt(t)$ is a perfectly intensity-invariant representation of song
|
which is steadily attenuated as the song propagates through the
|
||||||
component $\soc(t)$ across all $\sca>0$. However, the presence of $\noc(t)$
|
environment~(\bcite{michelsen1978sound}). A listening grasshopper receives a
|
||||||
limits the effectiveness of this mechanism to sufficiently large $\sca$. This
|
sound signal $\raw(t)$, which is a mixture of the song component $\soc(t)$ with
|
||||||
means that intensity invariance and SNR interact at the input level, as well.
|
scale $\sca$ and the environmental noise component $\noc(t)$. The greater the
|
||||||
Specifically, the saturation point of $\adapt(t)$ is determined by the input
|
distance between sender and receiver, the smaller $\sca$ and hence the lower
|
||||||
SNR of $\env(t)$, which in turn depends on the initial SNR of the sound signal
|
the SNR of $\raw(t)$ at the position of the receiver. The tympanal bandpass
|
||||||
$\raw(t)$. This initial SNR is presumably improved by the bandpass filtering of
|
filtering of $\raw(t)$ into $\filt(t)$ likely improves the SNR by attenuating
|
||||||
$\raw(t)$ into $\filt(t)$ at the tympanal membrane, which attenuates
|
frequencies outside the relevant range of grasshopper songs. The SNR is further
|
||||||
frequencies outside the relevant range of grasshopper songs. The SNR is then
|
improved by the rectification and lowpass filtering of $\filt(t)$ into
|
||||||
further improved by the rectification and lowpass filtering of $\filt(t)$ into
|
$\env(t)$. The lower the cutoff frequency $\fc$ of the lowpass filter, the
|
||||||
$\env(t)$. This improvement depends on the cutoff frequency $\fc$ of the
|
higher the SNR of $\env(t)$ at a given $\sca$, although $\fc$ must also be
|
||||||
lowpass filter --- the lower $\fc$, the higher the SNR of $\env(t)$ at a given
|
sufficiently high to preserve the amplitude dynamics of the song pattern.
|
||||||
$\sca$. However, $\fc$ must not be too low to avoid the attenuation of relevant
|
Overall, the first processing steps along the pathway are not designed to
|
||||||
amplitude dynamics of the song pattern. The saturation level of $\adapt$,
|
achieve intensity invariance but rather to improve the SNR of the song
|
||||||
|
representation beyond the initial SNR of $\raw(t)$.
|
||||||
|
|
||||||
|
The first mechanism of intensity invariance consists of logarithmic compression
|
||||||
|
and adaptation of $\env(t)$ into $\adapt(t)$. In the absence of $\noc(t)$,
|
||||||
|
$\adapt(t)$ is a perfectly intensity-invariant representation of $\soc(t)$. In
|
||||||
|
the presence of $\noc(t)$, $\adapt(t)$ is intensity-invariant only for a
|
||||||
|
sufficiently high SNR of $\env(t)$. The preceding SNR improvements from
|
||||||
|
$\raw(t)$ to $\env(t)$ thus serve to improve the intensity invariance of
|
||||||
|
$\adapt(t)$ by shifting the saturation point towards lower $\sca$. However,
|
||||||
|
this effect is limited --- if the SNR of $\raw(t)$ at the receiver's position
|
||||||
|
does not allow for a sufficiently high SNR of $\env(t)$, $\adapt(t)$ will not
|
||||||
|
be intensity-invariant. The initial song intensity that the sender can achieve
|
||||||
|
therefore determines the distance at which $\adapt(t)$ is intensity-invariant
|
||||||
|
to the receiver.
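
% Illustrative sketch, not part of the original commit. The additive mixture,
% the operator $\mathcal{E}$, the kernel $h(t)$, and the decibel form of
% $\db(t)$ are assumptions made for this example only.
The dependence of intensity invariance on the input SNR can be sketched under a
few assumptions. Suppose that $\raw(t)$ is an additive mixture, that
$\mathcal{E}[\cdot]$ is a hypothetical shorthand for the combined bandpass
filtering, rectification, and lowpass filtering, and that $\db(t)$ is the
decibel-transformed envelope. Because linear filtering and rectification
commute with multiplication by a positive factor, $\mathcal{E}[\sca\,x(t)] =
\sca\,\mathcal{E}[x(t)]$ for $\sca>0$, so that
\[
\raw(t) = \sca\,\soc(t) + \noc(t)
\quad\Rightarrow\quad
\env(t) = \sca\,\mathcal{E}\!\left[\soc(t) + \noc(t)/\sca\right],
\]
\[
\db(t) = 20\log_{10}\sca + 20\log_{10}\mathcal{E}\!\left[\soc(t) + \noc(t)/\sca\right].
\]
If the highpass filter behind the adaptation, here written as a hypothetical
kernel $h(t)$, has zero gain at zero frequency, the constant offset
$20\log_{10}\sca$ is removed:
\[
\adapt(t) = h(t) \ast 20\log_{10}\mathcal{E}\!\left[\soc(t) + \noc(t)/\sca\right].
\]
Without noise, the right-hand side does not contain $\sca$ at all. With noise,
$\sca$ enters only through $\noc(t)/\sca$, which becomes negligible for
sufficiently large $\sca$, in line with the saturation described above.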
|
||||||
|
|
||||||
|
Assuming that intensity invariance of $\adapt(t)$ is required for reliable song
|
||||||
|
recognition,
|
||||||
|
|
||||||
|
|
||||||
|
This might be a reason why robustness to noise masking is an
|
||||||
|
attractive property of male calling songs~(\bcite{einhaupl2011attractiveness}).
|
||||||
|
|
||||||
|
The saturation level of $\adapt$,
|
||||||
unlike its saturation point, is independent of the SNR of $\env(t)$ because the
|
unlike its saturation point, is independent of the SNR of $\env(t)$ because the
|
||||||
influence of $\noc(t)$ is negligible for sufficiently large $\sca$. The output
|
influence of $\noc(t)$ is negligible for sufficiently large $\sca$. The output
|
||||||
SNR of $\adapt(t)$ saturates at a comparatively low value of around 10. This might
|
SNR of $\adapt(t)$ saturates at a comparatively low value of around 10. This might
|
||||||
@@ -1798,35 +1818,34 @@ the saturation level of $f_i(t)$ will be determined by the second mechanism.
|
|||||||
The saturation points of $f_i(t)$ across the set are distributed over a much
|
The saturation points of $f_i(t)$ across the set are distributed over a much
|
||||||
wider range than those of the preceding kernel responses $c_i(t)$, which
|
wider range than those of the preceding kernel responses $c_i(t)$, which
|
||||||
suggests that the interaction between the two mechanisms is specific to
|
suggests that the interaction between the two mechanisms is specific to
|
||||||
individual kernels $k_i(t)$. A number of $f_i(t)$ achieve a lower saturation
|
individual kernels. A number of $f_i(t)$ achieve a lower saturation point than
|
||||||
point than the respective $c_i(t)$, while some $f_i(t)$ exhibit similar or only
|
the respective $c_i(t)$, whereas some $f_i(t)$ exhibit similar or only
|
||||||
marginally lower saturation points. This raises the question whether two
|
marginally lower saturation points. In these cases, the question arises to what
|
||||||
consecutive mechanisms of intensity invariance are actually beneficial for the
|
extent two consecutive mechanisms of intensity invariance are actually
|
||||||
overall system.
|
beneficial for the overall system.
|
||||||
|
|
||||||
Various grasshopper species, especially those with longer songs like \textit{C.
|
From a computational perspective, the answer could be that logarithmic
|
||||||
mollis}, \textit{G. rufus}, or \textit{O. rufipes}, tend to stridulate softly
|
compression and adaptation is a necessary preprocessing step towards robust
|
||||||
at first and then continuously increase the amplitude of their song over time.
|
$f_i(t)$ because it works towards a more consistent distribution $\pci$ of
|
||||||
This slow "ramping" amplitude modulation makes the overall song less periodic
|
$c_i(t)$. If $\pci$ is consistent between different songs of the same species,
|
||||||
despite its temporal regularity. The "ramping" appears more pronounced in
|
a static threshold value $\thr$ is sufficient to generate a consistent
|
||||||
$\env(t)$ compared to $\adapt(t)$, which suggests that the logarithmic
|
species-specific feature representation. If $\pci$ is consistent over the
|
||||||
compression and adaptation during the preprocessing stage might be at least
|
course of a song, $f_i(t)$ is constant throughout the song, which extends the
|
||||||
partially beneficial for mitigating the effect of this amplitude modulation on
|
time window for reliable recognition~(Section\,\ref{sec:constant_feat}).
|
||||||
later representations. However, the adaptation of $\adapt(t)$ can only act on
|
|
||||||
certain time scales --- depending on the cutoff frequency of the underlying
|
|
||||||
highpass filter --- and is hence not able to compensate for "ramping" across
|
|
||||||
the entire duration of a song.
|
|
||||||
|
|
||||||
From a purely functional perspective, the answer could be that logarithmic
|
|
||||||
compression and adaptation is a necessary preprocessing step towards a robust
|
First, the preprocessing results in a more consistent
|
||||||
feature representation, even if thresholding and temporal averaging alone would
|
distribution $\pci$ of $c_i(t)$ between songs of different intensity and in
|
||||||
be sufficient to render $f_i(t)$ intensity-invariant. This preprocessing likely
|
turn allows for the generation of consistent $f_i(t)$ under a static threshold
|
||||||
improves the temporal regularity of the song pattern in $\adapt(t)$ and
|
value $\thr$. Second, this preprocessing improves the temporal regularity of
|
||||||
$c_i(t)$, which is required for constant $f_i(t)$ across the duration of a
|
the song pattern by mitigating the slow "ramping" amplitude modulation that is
|
||||||
song~(Section\,\ref{sec:constant_feat}). It also ensures consistency between
|
common to many grasshopper songs.
|
||||||
the distribution $\pci$ of $c_i(t)$ across songs of different intensity, which
|
|
||||||
is essential for the generation of consistent species-specific $f_i(t)$ under a
|
This preprocessing likely improves the temporal regularity of the song pattern
|
||||||
static $\thr$. From a physiological perspective, the answer is likely that
|
in $\adapt(t)$ and $c_i(t)$, which is required for constant $f_i(t)$ across the
|
||||||
|
duration of a song~(Section\,\ref{sec:constant_feat}).
|
||||||
|
|
||||||
|
From a physiological perspective, the answer is likely that
|
||||||
neurons possess only a limited firing rate for encoding stimulus intensities
|
neurons possess only a limited firing rate for encoding stimulus intensities
|
||||||
that can range over several orders of magnitude. Sigmoidal tuning curves over
|
that can range over several orders of magnitude. Sigmoidal tuning curves over
|
||||||
logarithmically compressed stimulus intensities are a common property of
|
logarithmically compressed stimulus intensities are a common property of
|
||||||
|
|||||||