Writing discussion.
431
main.tex
@@ -104,6 +104,7 @@
\newcommand{\nsig}{\sigma_{\eta}} % Noise component standard deviation
\newcommand{\pc}{p(c,\,T)} % Probability density (general interval)
\newcommand{\pclp}{p(c,\,\tlp)} % Probability density (lowpass interval)
\newcommand{\pci}{p(c_i,\,\tlp)} % Kernel-specific probability density (lowpass interval)
\newcommand{\muf}{\mu_{f_i}} % Average feature value

\section{Introduction}
@@ -258,12 +259,13 @@ initiation~(\bcite{ronacher1986routes}; \bcite{bauer1987separate};
\bcite{bhavsar2017brain}).

Functionally, the ascending neurons are the most diverse of the three neuronal
populations. Around 15 to 20 ascending neurons have been identified in the
grasshopper auditory system~(\bcite{stumpner1991auditory}). Individual
ascending neurons possess highly specific response properties that contrast
with the rather homogeneous response properties of the preceding receptor
neurons and local interneurons~(\bcite{clemens2011efficient}), which indicates
a transition from a uniform population-wide processing stream into several
parallel branches. Accordingly, the model pathway is divided into two distinct
stages~(Fig.\,\ref{fig:pathway}d): The preprocessing stage incorporates the
processing steps at the levels of the tympanal membrane, the receptor neurons,
and the local interneurons; and operates on one-dimensional signal
@@ -754,16 +756,15 @@ This effect is more pronounced for lower $\fc$ of the lowpass filter and is
presumably caused by the attenuation of high-frequency components in the
signal, which are more prominent in the noise component $\noc(t)$ than in the
song component $\soc(t)$. The effect also appears relatively consistent across
different species, although small variations exist~(Fig.\,\ref{fig:rect-lp}e
and appendix Fig.\,\ref{fig:app_rect-lp}). In summary, the standard deviation
of $\env(t)$ has never been observed to saturate for larger $\sca$ but rather
continues to increase proportionally to $\sca$ for all tested $\fc$, in both
the noiseless and the noisy case and across different species. Consequently,
the combination of rectification and lowpass filtering does not contribute to
intensity invariance. However, this transformation pair does improve the SNR of
$\env(t)$ relative to $\filt(t)$ and thus provides subsequent processing stages
with a more robust input representation.
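
The proportional growth in the noiseless case follows from the fact that both
operations commute with a positive scale factor: rectification satisfies
$\max(\sca x,0)=\sca\max(x,0)$ for $\sca>0$, and a linear lowpass filter scales
its output by the same factor as its input. The Python sketch below illustrates
this; the half-wave rectifier, the first-order lowpass filter, and the
synthetic carrier with on/off modulation are assumptions chosen for
illustration, not the filters or stimuli of the model pathway.

```python
import math
import statistics

def rectify(x):
    """Half-wave rectification: negative values are set to zero."""
    return [v if v > 0.0 else 0.0 for v in x]

def lowpass(x, fc, fs):
    """First-order IIR lowpass filter with cutoff fc (Hz) at sampling rate fs (Hz)."""
    dt = 1.0 / fs
    tau = 1.0 / (2.0 * math.pi * fc)
    a = dt / (tau + dt)
    y, out = 0.0, []
    for v in x:
        y += a * (v - y)
        out.append(y)
    return out

def envelope_std(x, scale, fc=50.0, fs=20000.0):
    """Standard deviation of the envelope obtained from the scaled input."""
    env = lowpass(rectify([scale * v for v in x]), fc, fs)
    return statistics.pstdev(env)

fs = 20000.0
# Hypothetical song-like input: 5 kHz carrier, switched on and off at 10 Hz.
x = [math.sin(2 * math.pi * 5000 * n / fs)
     * (1.0 if math.sin(2 * math.pi * 10 * n / fs) > 0 else 0.0)
     for n in range(int(0.5 * fs))]
reference = envelope_std(x, 1.0)
for scale in (0.1, 1.0, 10.0):
    print(scale, envelope_std(x, scale) / reference)
```

Each printed ratio equals the applied scale up to floating-point rounding, so
the envelope never saturates; full-wave rectification behaves the same way
with respect to scaling.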

\begin{figure}[!ht]
\centering
@@ -883,24 +884,23 @@ $\noc(t)$ masks $\soc(t)$ even after the intensity adaptation. Accordingly, the
effective intensity invariance of $\adapt(t)$ through logarithmic compression
and adaptation is limited by the SNR of $\env(t)$: Songs that have already
sunk into the noise floor at the level of $\env(t)$ cannot be recovered by
subsequent processing steps. The general pattern of noise regime, transient
regime, and saturation regime remains consistent across different
species~(Fig.\,\ref{fig:log-hp}e). However, the saturation point (the $\sca$
value at which the SNR of $\adapt(t)$ starts to saturate) and the saturation
level (the constant SNR of $\adapt(t)$ within the saturation regime) vary
considerably between and within species~(appendix
Figs.\,\ref{fig:app_log-hp_curves} and \ref{fig:app_log-hp_saturation}). For
example, \textit{C. biguttulus} and \textit{C. mollis} display a noticeably
lower saturation level than other species. These differences should not be
underestimated, since the saturation level of $\adapt(t)$ determines the
maximum input SNR for subsequent processing steps. In other words, the fact
that $\adapt(t)$ eventually reaches a saturation regime is, of course,
desirable in the context of intensity invariance, but it also means forgoing
the higher SNR values that $\env(t)$ achieves for the same $\sca$ (up to
several orders of magnitude, Fig.\,\ref{fig:log-hp}d). This trade-off between
intensity invariance and SNR is a recurring phenomenon that is further
addressed in the following sections.

\begin{figure}[!ht]
\centering
@@ -1000,24 +1000,17 @@ sufficiently large $\sca$, $f(t)$ then approaches the same constant $\mu_f$ in
both the noiseless and the noisy case~(Fig.\,\ref{fig:thresh-lp_single}e,
saturation regime).

The saturation level of $f(t)$ is independent of the precise value of $\Theta$,
but the saturation point increases with
$\Theta$~(Fig.\,\ref{fig:thresh-lp_single}e). Therefore, a threshold value of
$\Theta=0$ would be the optimal choice for achieving intensity invariance at
the lowest possible $\sca$. In stark contrast, the closer $\Theta$ is to 0, the
higher $\mu_f$ in response to the pure noise component $\noc(t)$ and the lower
the resulting SNR of $f(t)$ between noise regime and saturation
regime~(Fig.\,\ref{fig:thresh-lp_single}b--d, left column, and
Fig.\,\ref{fig:thresh-lp_single}e). This trade-off between intensity invariance
and SNR has already been observed during the previous analysis of logarithmic
compression and adaptation~(Fig.\,\ref{fig:log-hp}d).

Finally, the effects of thresholding and temporal averaging must be seen in the
context of the previous transformation pair of logarithmic compression and
@@ -1102,11 +1095,11 @@ that the songs of each species are eventually represented by distinct points in
feature space. However, the species-specific trajectories cross each other at
numerous points, which means that the songs of two species (each at a
specific $\sca$) can result in the same combination of $\muf$. Furthermore,
the specific saturation point of $f_i(t)$ depends on the species: For
\textit{C. mollis}, all $\muf$ saturate around the same $\sca$, while
\textit{O. rufipes} exhibits considerable variation between the three $f_i(t)$.
The larger the variation in saturation points between $f_i(t)$, the stronger
the curvature of the trajectory through feature space.

In the noisy case, $\muf$ is non-zero even for the smallest
$\sca$~(Fig.\,\ref{fig:thresh-lp_species}c) because the addition of the noise
@@ -1121,9 +1114,9 @@ previous analysis~(Fig.\,\ref{fig:thresh-lp_single}e). However, the
trajectories now move a much shorter distance through feature space for a
similar range of $\sca$ due to the lower SNR of $f_i(t)$ between noise regime
and saturation regime, which increases the likelihood of trajectories crossing
each other. Finally, the saturation points of $f_i(t)$ for a given species are
slightly higher in the noisy case, but the variation between $f_i(t)$ remains
largely unchanged.

In summary, even a comparatively small set of three features $f_i(t)$ can, in
principle, represent different species-specific songs at distinct points in
@@ -1238,15 +1231,10 @@ broader and is not centered around the single saturation point based on the
median but rather shifted towards lower $\sca$. Care must be taken when
interpreting the height of either distribution due to the logarithmic scaling
of the underlying $\sca$ axis. Nevertheless, the overall pattern suggests that
the saturation points of specific $f_i(t)$ are indeed lower than those of their
$c_i(t)$ counterparts. Therefore, the effect of thresholding and temporal
averaging on intensity invariance is not necessarily nullified by the previous
logarithmic compression and adaptation.

\begin{figure}[!ht]
\centering
@@ -1313,7 +1301,7 @@ representation goes hand in hand with a substantial degree of redundancy and is
hardly expected to be present in the actual grasshopper auditory system. But
the fact that the saturated $\muf$ are distributed symmetrically around 0.5
provides concrete evidence that each $f_i(t)$ is able to reach its intrinsic
saturation level in the absence of logarithmic
compression~(Fig.\,\ref{fig:pipeline_short}c), which is otherwise prevented by
the capping of $\adapt(t)$, as seen during previous
analyses~(Fig.\,\ref{fig:thresh-lp_single}f and
@@ -1327,8 +1315,8 @@ that it allows $f_i(t)$ to reach its intrinsic saturation value. If this
results in a wider range of $\muf$ across the feature set, it should be
beneficial for forming species-specific combinations. However, this depends on
multiple factors such as the choice of $k_i(t)$ and $\thr$ as well as the
structure and distribution of the specific song and is hence not guaranteed
simply by disabling logarithmic compression.

\begin{figure}[!ht]
\centering
@@ -1560,25 +1548,241 @@ functional modelling. Other sensory systems that are either more complex or
have not been subject to decades of study will likely not be suitable for this
approach yet.

% \textbf{Song recognition pathway: Grasshopper vs. model:}\\
% The model pathway includes a rather large number of Gabor kernels compared to
% the 15 to 20 ascending neurons in the grasshopper auditory
% system~(\bcite{stumpner1991auditory}).

\subsection{Interplay of song representation and song design}
The feature set is the final song representation along the model pathway and
constitutes the basis for song recognition. Each feature $f_i(t)$ results from
the thresholding of the respective kernel response $c_i(t)$ by $\nl$ and the
subsequent temporal averaging of the binary response $b_i(t)$ by a lowpass
filter with an extremely low cutoff frequency $\fc$. At a given time point $t$,
$f_i(t)$ approximately quantifies the proportion of time during which $c_i(t)$
exceeds the threshold value $\thr$ within the averaging interval $\tlp$
specified by $\fc$. The value of $f_i(t)$ is hence determined by the position
of $\thr$ relative to the distribution $\pci$ of $c_i(t)$ and is restricted to
the interval $[0,1]$.
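
As a minimal sketch of this definition, the following Python code thresholds a
toy kernel response and averages the binary response with a lowpass filter. It
assumes a Heaviside step for $\nl$ and a first-order lowpass filter whose time
constant $1/(2\pi\fc)$ stands in for $\tlp$; both choices, the sinusoidal
stand-in for $c_i(t)$, and all parameter values are illustrative assumptions.

```python
import math

def binarize(c, theta):
    """Thresholding nonlinearity: b_i(t) = 1 where c_i(t) > theta, else 0."""
    return [1.0 if v > theta else 0.0 for v in c]

def lowpass(x, fc, fs):
    """First-order IIR lowpass filter with cutoff fc (Hz) at sampling rate fs (Hz)."""
    dt = 1.0 / fs
    tau = 1.0 / (2.0 * math.pi * fc)  # time constant, stands in for the averaging interval
    a = dt / (tau + dt)               # smoothing factor in (0, 1)
    y, out = 0.0, []
    for v in x:
        y += a * (v - y)              # convex update keeps y within [0, 1] for binary input
        out.append(y)
    return out

def feature(c, theta, fc, fs):
    """Feature f_i(t): temporally averaged binary response, bounded to [0, 1]."""
    return lowpass(binarize(c, theta), fc, fs)

fs = 1000.0
# Toy periodic kernel response at 10 Hz (hypothetical stand-in for a song-evoked c_i(t)).
c = [math.sin(2 * math.pi * 10 * n / fs) for n in range(int(20 * fs))]
print(feature(c, 0.0, 0.1, fs)[-1])  # close to 0.5
print(feature(c, 0.5, 0.1, fs)[-1])  # close to 1/3, the fraction of each cycle above 0.5
```

Raising the threshold from 0 to 0.5 lowers the settled feature value from about
one half to about one third, which is the share of each cycle during which the
sine exceeds 0.5.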

\textbf{The role of repetitive songs for the feature representation:}
Different species-specific songs are represented by different combinations of
feature values, which should preferably be constant for the duration of a song
to enable reliable recognition. The fundamental requirement for a constant
$f_i(t)$ is that the time during which $c_i(t)>\thr$ within $\tlp$ is the same
for all $t$, which is fulfilled if $\pci$ is stable across $t$. The most
straightforward way to achieve a stable $\pci$ is for $c_i(t)$ to be periodic
and for $\tlp$ to be sufficiently long to average over multiple cycles of
$c_i(t)$. Song-evoked $c_i(t)$ are indeed approximately periodic, which is
largely an inherited property of the song itself. Most grasshopper songs are
produced by stridulation, which refers to the pulling of the serrated
stridulatory file on the hindlegs across a resonating vein on the
forewings~(\bcite{helversen1977stridulatory}; \bcite{stumpner1994song};
\bcite{helversen1997recognition}). Every ``tooth'' that strikes the vein
generates a brief sound pulse; multiple pulses make up a syllable; and the
repetition of syllables and pauses results in a pattern with a high degree of
temporal regularity. Accordingly, a robust feature representation in the sense
of constant $f_i(t)$ is tightly linked to the mechanism of sound production and
the temporal structure of the generated song.
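
The argument that a periodic $c_i(t)$ yields a stable $\pci$ once $\tlp$ spans
several cycles can be checked numerically. The Python sketch below replaces the
lowpass filter by a rectangular (boxcar) window for simplicity and uses a pure
sine as a hypothetical stand-in for a song-evoked kernel response; both are
assumptions. It reports how much the fraction of above-threshold samples
varies across window positions.

```python
import math

def window_fractions(c, theta, length):
    """Fraction of samples above theta for every window position (boxcar average of b_i)."""
    prefix = [0]
    for v in c:
        prefix.append(prefix[-1] + (1 if v > theta else 0))
    return [(prefix[s + length] - prefix[s]) / length
            for s in range(len(c) - length + 1)]

def fraction_spread(c, theta, length):
    """Max minus min of the windowed fraction; 0 means a perfectly constant feature."""
    fractions = window_fractions(c, theta, length)
    return max(fractions) - min(fractions)

period = 100  # samples per cycle of the toy response
c = [math.sin(2 * math.pi * n / period) for n in range(3000)]
for length in (100, 150, 1050):  # 1, 1.5, and 10.5 cycles
    print(length, fraction_spread(c, 0.5, length))
```

A window of exactly one cycle gives a constant fraction. An incomplete cycle
makes the fraction depend on the window position, but that dependence shrinks
as more complete cycles are included.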

\subsection{Intensity invariance versus SNR along the auditory pathway}
Various grasshopper species, especially those with longer songs like
\textit{C. mollis}, \textit{G. rufus}, or \textit{O. rufipes}, tend to
stridulate softly at first and then continuously increase the amplitude of
their song over time. This slow ``ramping'' amplitude modulation makes the
overall song less periodic despite its temporal regularity. The ramping appears
more pronounced in $\env(t)$ than in $\adapt(t)$, which suggests that the
logarithmic compression and adaptation during the preprocessing stage might be
at least partially beneficial for mitigating the effect of this amplitude
modulation on later representations. However, the adaptation of $\adapt(t)$
can only act on certain time scales, depending on the cutoff frequency of the
underlying highpass filter, and is hence not able to compensate for ramping
across the entire duration of a song.
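
The time-scale limitation can be illustrated with the steady-state response of
a first-order highpass filter in log space: a constant level is removed
completely, whereas a linear ramp with slope $s$ (in dB per second) leaves a
residual offset of $s\tau$ with $\tau=1/(2\pi\fc)$. The Python sketch below
assumes such a first-order discrete highpass filter and a hypothetical ramp
rate; it illustrates the principle and is not the adaptation stage of the
model.

```python
import math

def highpass(x, fc, fs):
    """First-order IIR highpass filter with cutoff fc (Hz) at sampling rate fs (Hz)."""
    dt = 1.0 / fs
    tau = 1.0 / (2.0 * math.pi * fc)
    a = tau / (tau + dt)
    y = [0.0]
    for n in range(1, len(x)):
        y.append(a * (y[-1] + x[n] - x[n - 1]))
    return y

fs = 1000.0
t = [n / fs for n in range(int(30 * fs))]
constant = [40.0 for _ in t]      # constant intensity level in dB
slope = 2.0                       # hypothetical ramp rate in dB per second
ramp = [slope * ti for ti in t]   # slow ramping of the song in log space

for fc in (1.0, 0.1):
    tau = 1.0 / (2.0 * math.pi * fc)
    print(f"fc={fc} Hz: constant -> {highpass(constant, fc, fs)[-1]:.3f} dB, "
          f"ramp -> {highpass(ramp, fc, fs)[-1]:.3f} dB (slope * tau = {slope * tau:.3f} dB)")
```

With these values, the ramp leaves a residual of about 0.32\,dB at 1\,Hz and
about 3.2\,dB at 0.1\,Hz. Lowering the cutoff to preserve slower song dynamics
therefore also lets more of the ramp through.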

\subsection{Behavior in a natural acoustic environment}
Certain grasshopper species like \textit{Chorthippus dorsatus} are known to
switch their stridulation pattern in the middle of a
song~(\bcite{stumpner1994song}). \textit{C. dorsatus} starts stridulating with
both hindlegs in synchrony and thereby generates a pronounced syllable-pause
pattern similar to that of \textit{P. parallelus}. For the last part of its
song, however, \textit{C. dorsatus} switches to an alternating leg movement,
which results in a more continuous but not entirely unstructured rattling
sound. It is unclear what this composite design means for the feature
representation of \textit{C. dorsatus} songs. In principle, both parts of the
song could result in similar $\pci$ despite their different temporal structure,
which would allow for consistent $f_i(t)$ across the entire song. However, it
appears more likely that only one part of the song encodes species identity,
while the other part serves a different purpose such as fitness
advertisement~(SOURCE?).

Finally, the question remains how the choice of an appropriate averaging
interval $\tlp$ depends on the duration and temporal structure of a song. The
minimum $\tlp$ should encompass at least a few cycles of $c_i(t)$ to ensure a
stable $\pci$ and hence a constant $f_i(t)$. The maximum $\tlp$ should not
exceed the duration of a song to avoid the inclusion of behaviorally irrelevant
information. The longer $\tlp$, the longer $f_i(t)$ takes to stabilize after
the onset and before the offset of a song, which narrows the time window for
reliable recognition. The duration of species-specific grasshopper songs can
range from a few hundred milliseconds (e.g.\,\textit{Stethophyma grossum}) to
well over a minute (e.g.\,\textit{C. mollis}), so that the optimal $\tlp$ is
likely to differ between species.
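
For a rough sense of the constraint after song onset, one can assume that
$\tlp$ corresponds to the time constant $\tau=1/(2\pi\fc)$ of a first-order
lowpass filter, whose step response comes within a fraction $\epsilon$ of its
final value after $\tau\ln(1/\epsilon)$. The Python sketch below uses this
assumption with a hypothetical tolerance of 5\,\%, placeholder cutoff
frequencies, and illustrative song durations of 0.3\,s and 60\,s. It covers
only the onset side of the argument.

```python
import math

def settling_time(fc, tol=0.05):
    """Time (s) for a first-order lowpass step response to come within tol of its final value."""
    tau = 1.0 / (2.0 * math.pi * fc)  # assumed stand-in for the averaging interval
    return tau * math.log(1.0 / tol)

def usable_window(song_duration, fc, tol=0.05):
    """Part of a song (s) during which f_i(t) has settled after onset; 0 if it never settles."""
    return max(0.0, song_duration - settling_time(fc, tol))

for fc in (1.0, 0.1):
    for duration in (0.3, 60.0):  # illustrative short and long song durations in seconds
        print(f"fc={fc} Hz, song={duration} s: usable {usable_window(duration, fc):.2f} s")
```

Under these assumptions, a 1\,Hz cutoff needs about 0.48\,s to settle, which
already exceeds a 0.3\,s song. A 60\,s song, by contrast, still leaves about
55\,s for recognition even at 0.1\,Hz.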

\subsection{Sensory invariances in the grasshopper auditory system}

The notion of invariance is fundamental for sensory processing systems.
Invariance, in the general sense, can be described as the property of a
transformation to maintain variation across certain meaningful input parameters
in its output while discarding variation across other input parameters. This
amounts to a selective input-output decorrelation that allows the system to
represent only those aspects of the stimulus that are behaviorally relevant to
the organism.

The grasshopper auditory system has to deal with a number of sources of
non-informative song variation. For instance, the temporal structure of the
song pattern warps with temperature~(\bcite{skovmand1983song}). This also
affects certain structural parameters that are essential for song recognition,
mainly the duration of syllables and pauses. The auditory system can compensate
for this variation by reading out relative temporal relationships rather than
absolute time intervals~(\bcite{creutzig2009timescale};
\bcite{creutzig2010timescale}). The ratio of syllable duration to pause
duration is relatively constant across temperatures and has been shown to be
suitable for song recognition~(\bcite{helversen1972gesang}), so that there is
likely no need to retain any information about the absolute duration of
syllables and pauses.
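
The benefit of a relative measure can be made concrete with a short Python
sketch. It assumes, as a simplification, that temperature acts as a uniform
time warp that scales syllables and pauses by the same factor; the durations
and warp factors are hypothetical.

```python
def syllable_pause_ratio(syllable, pause):
    """Relative temporal measure: syllable duration divided by pause duration."""
    return syllable / pause

# Hypothetical syllable and pause durations (ms) at a reference temperature.
syllable_ms, pause_ms = 80.0, 20.0
for warp in (0.8, 1.0, 1.25):  # hypothetical temperature-dependent time-warp factors
    s, p = warp * syllable_ms, warp * pause_ms
    print(f"warp={warp}: syllable={s:.1f} ms, pause={p:.1f} ms, "
          f"ratio={syllable_pause_ratio(s, p):.2f}")
```

The absolute durations change with the warp factor, whereas the ratio stays at
4 for every factor.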

The situation is more complex for variations in song intensity. Song intensity
at the receiver's position depends mostly on the distance to the sender and is
hence not a reliable cue to infer species identity. The auditory system should
therefore be invariant to intensity variations to recognize conspecific songs
regardless of sender distance. However, song intensity (specifically, the
interaural intensity difference) is also required for directional hearing,
which is essential for phonotaxis~(\bcite{helversen1988interaural}). Conflicts
between song recognition and directional hearing are avoided in the auditory
system by distributing both functions across two parallel
pathways~(\bcite{helversen1984parallel}; \bcite{ronacher1986routes}). This is
the main reason why our model pathway is focused entirely on song recognition
and has no capacity for directional hearing, however relevant directional
hearing may be to the grasshopper.

Furthermore, ``invariance to variations in song intensity'' does not do justice
to the full extent of the problem. Intensity is a function of song amplitude
within a certain time frame. It can refer to the individual syllables and
pauses of the song pattern as well as to the entire song; the former is
relevant for song recognition, while the latter is not. Intensity invariance in
the current context can therefore be described as time scale-selective
sensitivity to the faster amplitude dynamics of the song pattern and
simultaneous insensitivity to slower, more sustained amplitude dynamics. In the
model pathway, this time scale selectivity is reflected by the cutoff frequency
$\fc$ of the highpass filter that underlies the adaptation of $\adapt(t)$: Most
$\fc$ are effective in removing the local offset of $\db(t)$ and render
$\adapt(t)$ intensity-invariant, but only sufficiently low $\fc$ will leave the
relevant amplitude dynamics of the song pattern intact.
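
This time scale selectivity can be sketched in Python with a first-order
highpass filter applied to a synthetic dB-scale envelope. Both a low and a high
cutoff remove the constant baseline, but only the low cutoff preserves the
faster song-pattern modulation. The filter order, the 60\,dB baseline, the
5\,Hz modulation of $\pm$6\,dB, and both cutoffs are hypothetical choices made
for illustration.

```python
import math
import statistics

def highpass(x, fc, fs):
    """First-order IIR highpass filter with cutoff fc (Hz) at sampling rate fs (Hz)."""
    dt = 1.0 / fs
    tau = 1.0 / (2.0 * math.pi * fc)
    a = tau / (tau + dt)
    y = [0.0]
    for n in range(1, len(x)):
        y.append(a * (y[-1] + x[n] - x[n - 1]))
    return y

fs = 1000.0
n_samples = int(10 * fs)
# Hypothetical log-compressed envelope: 60 dB baseline plus a 5 Hz pattern of +-6 dB.
db = [60.0 + 6.0 * math.sin(2 * math.pi * 5 * n / fs) for n in range(n_samples)]
tail = slice(n_samples // 2, None)  # discard the initial transient

def highpass_summary(fc):
    """Mean of the output (residual offset) and its modulation depth relative to the input."""
    out = highpass(db, fc, fs)
    return (statistics.fmean(out[tail]),
            statistics.pstdev(out[tail]) / statistics.pstdev(db[tail]))

for fc in (0.5, 50.0):
    print(fc, highpass_summary(fc))
```

In both cases the mean of the output is close to zero. The relative modulation
depth, however, stays near 1 at 0.5\,Hz and drops to roughly 0.1 at 50\,Hz.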

\subsection{Intensity invariance versus SNR}

Each processing step along the model pathway is a transformation between input
representation and output representation. The intensity of the input is
characterized by scale $\sca$. The intensity of the output is characterized by
an appropriate intensity measure. If the transformation renders the output more
intensity-invariant, then the intensity measure will saturate for sufficiently
large $\sca$, which caps the output SNR at a constant value across these
$\sca$. Otherwise, the intensity measure and hence the output SNR will increase
monotonically with $\sca$. The trade-off between intensity invariance and SNR
refers to the principle that a transformation can either improve intensity
invariance or maintain SNR, but it cannot do both at the same time. This
principle is presumably not specific to the two mechanisms along the model
pathway but rather a general property of transformations that equalize
different input intensities.

Logarithmic compression and adaptation by highpass filtering are capable of
equalizing a wide range of $\sca$. In the absence of the noise component
$\noc(t)$, output $\adapt(t)$ is a perfectly intensity-invariant representation
of song component $\soc(t)$ across all $\sca>0$. However, the presence of
$\noc(t)$ limits the effectiveness of this mechanism to sufficiently large
$\sca$. This means that intensity invariance and SNR interact at the input level
as well. Specifically, the saturation point of $\adapt(t)$ is determined by the
input SNR of $\env(t)$, which in turn depends on the initial SNR of the sound
signal $\raw(t)$. This initial SNR is presumably improved by the bandpass
filtering of $\raw(t)$ into $\filt(t)$ at the tympanal membrane, which
attenuates frequencies outside the relevant range of grasshopper songs. The SNR
is then further improved by the rectification and lowpass filtering of
$\filt(t)$ into $\env(t)$. This improvement depends on the cutoff frequency
$\fc$ of the lowpass filter: the lower $\fc$, the higher the SNR of $\env(t)$ at
a given $\sca$. However, $\fc$ must not be too low, to avoid attenuating the
relevant amplitude dynamics of the song pattern. The saturation level of
$\adapt(t)$, unlike its saturation point, is independent of the SNR of $\env(t)$
because the influence of $\noc(t)$ is negligible for sufficiently large $\sca$.
Both the saturation level and the saturation point of $\adapt(t)$ vary between
different species and specific songs. These differences are likely rooted in
the way in which logarithmic compression acts on the specific distribution of
$\env(t)$, which is determined by $\fc$ and the structure and frequency spectrum
of the rectified $\filt(t)$.
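
Both statements can be illustrated with a Python sketch. A logarithm turns the
scale $\sca$ into an additive offset, $\log(\sca x)=\log\sca+\log x$, which a
highpass filter removes. With noise, the output only approaches the noiseless
result once $\sca$ is large. Here, noise is represented very crudely as a
constant floor added to the envelope rather than as a random component. This
simplification, the synthetic song envelope, and the first-order highpass
filter are all assumptions made for illustration.

```python
import math
import statistics

def highpass(x, fc, fs):
    """First-order IIR highpass filter with cutoff fc (Hz) at sampling rate fs (Hz)."""
    dt = 1.0 / fs
    tau = 1.0 / (2.0 * math.pi * fc)
    a = tau / (tau + dt)
    y = [0.0]
    for n in range(1, len(x)):
        y.append(a * (y[-1] + x[n] - x[n - 1]))
    return y

def adapted_std(scale, noise_floor, fc=1.0, fs=1000.0):
    """Std of the highpass-filtered dB envelope for env(t) = scale * s(t) + noise_floor."""
    n_samples = int(5 * fs)
    # Hypothetical song envelope s(t) with a 5 Hz pattern, strictly positive.
    song = [1.0 + 0.8 * math.sin(2 * math.pi * 5 * n / fs) for n in range(n_samples)]
    db = [20.0 * math.log10(scale * s + noise_floor) for s in song]
    out = highpass(db, fc, fs)
    return statistics.pstdev(out[n_samples // 2:])  # skip the initial transient

# Noiseless case: the output is the same for every scale.
print([adapted_std(a, 0.0) for a in (0.01, 1.0, 100.0)])
# Crude constant noise floor: the output grows with scale and then saturates.
print([adapted_std(a, 1.0) for a in (0.01, 0.1, 1.0, 10.0, 100.0, 1000.0)])
```

The second list shows, in miniature, the pattern of noise regime, transient
regime, and saturation regime. The saturated value equals the noiseless one,
which is consistent with a saturation level that does not depend on the noise.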

Thresholding and temporal averaging render feature $f_i(t)$
intensity-invariant for sufficiently large $\sca$. The trade-off between
intensity invariance and SNR is mediated by the threshold value $\thr$. A lower
$\thr$ ($\thr\to0$) improves intensity invariance by shifting the saturation
point towards lower $\sca$ but also decreases the SNR of $f_i(t)$. The
saturation level of $f_i(t)$ is independent of $\thr$ as long as the intensity
invariance by the previous mechanism is neglected. The SNR of $f_i(t)$ is
therefore determined solely by the pure-noise response of $f_i(t)$. The
distribution $\pci$ of the pure-noise kernel response $c_i(t)$ is approximately
normal with mean $\mu\approx0$ for all kernels $k_i(t)$. The value of the
pure-noise $f_i(t)$ is hence 0.5 for $\thr=0$ and decreases for larger $\thr$.
If $\thr$ is set above the maximum of the pure-noise $c_i(t)$, the pure-noise
feature value is 0, which results in an ``unlimited'' SNR of $f_i(t)$ at the
cost of a higher saturation point. In this case, any non-zero feature value
that is sustained for a sufficient duration could serve as an indicator for the
presence of $\soc(t)$ in addition to $\noc(t)$. This requires a fine
evolutionary tuning of $\thr$ to the properties of both the species-specific
song and the natural noise in a certain habitat.
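
Under the approximation stated above, namely that the pure-noise $c_i(t)$ is
normally distributed with mean 0 and some standard deviation $\sigma$, the
pure-noise feature value for a long averaging interval is the tail probability
$P(c_i>\thr)=\tfrac{1}{2}\,\mathrm{erfc}\bigl(\thr/(\sqrt{2}\,\sigma)\bigr)$.
The Python sketch below evaluates this expression and compares it with a Monte
Carlo estimate. The choice $\sigma=1$ and the threshold values are arbitrary
and serve only as an illustration.

```python
import math
import random

def pure_noise_feature(theta, sigma=1.0, mu=0.0):
    """Expected f_i for Gaussian pure-noise c_i: P(c_i > theta)."""
    return 0.5 * math.erfc((theta - mu) / (sigma * math.sqrt(2.0)))

def empirical_feature(theta, n=100_000, sigma=1.0, seed=1):
    """Monte Carlo check: fraction of Gaussian samples above theta."""
    rng = random.Random(seed)
    return sum(1 for _ in range(n) if rng.gauss(0.0, sigma) > theta) / n

for theta in (0.0, 1.0, 2.0, 3.0):
    print(theta, pure_noise_feature(theta), empirical_feature(theta))
```

The feature value is exactly 0.5 at $\thr=0$ and decreases rapidly for larger
$\thr$. It becomes very small but never exactly zero for a true Gaussian, which
is why the text refers to the maximum of the observed pure-noise response.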

Furthermore, $\thr$ might be one of the parameters along the pathway that are
presumably easier to manipulate, both physiologically and evolutionarily:
Whereas the parameters that determine the SNR of $\adapt(t)$ are much less
understood and likely relate to properties of the signal, the SNR of $f_i(t)$
depends directly on the choice of $\thr$.

% However, the parameters that determine the SNR of $\adapt(t)$ are much less
% understood and likely relate to properties of the signal, whereas the SNR of
% $f(t)$ depends on the choice of $\Theta$ and can be more directly manipulated
% by the system.

\newpage
\textbf{Thresh-LP: Implication for intensity invariance:}\\

- Suggests a relatively simple rule for optimal choice of threshold value $\thr$:\\
$\rightarrow$ Find amplitude $c_i$ that maximizes absolute derivative of $c_i(t)$ over time\\
$\rightarrow$ Optimal with respect to intensity invariance of $f_i(t)$, not necessarily for
other criteria such as song-noise separation or diversity between features

- Nonlinear operations can be used to detach representations from the graded physical
stimulus (to facilitate categorical behavioral decision-making?):\\
1) Capture sufficiently precise amplitude information: $\env(t)$, $\adapt(t)$\\
$\rightarrow$ Closely following the AM of the acoustic stimulus\\
2) Quantify relevant stimulus properties on a graded scale: $c_i(t)$\\
$\rightarrow$ More decorrelated representation, compared to prior stages\\
3) Nonlinearity: Distinguish between ``relevant vs.\ irrelevant'' values: $b_i(t)$\\
$\rightarrow$ Trading a graded scale for two or more categorical states\\
4) Represent stimulus properties under relevance constraint: $f_i(t)$\\
$\rightarrow$ Graded again but highly decorrelated from the acoustic stimulus\\
5) Categorical behavioral decision-making requires further nonlinearities\\
$\rightarrow$ Parameters of a behavioral response may be graded (e.g.\ approach speed),
but initiation of one behavior over another is categorical (e.g.\ approach/stay)

\subsection{Intensity invariance versus intensity invariance}

\subsection{Implications for behavior in a natural acoustic environment}

% RIPPED FROM INTRODUCTION:
@@ -1645,63 +1849,6 @@ operate on unmodified recordings of natural grasshopper songs instead of
|
||||
condensed pulse train approximations, which widens its scope towards more
|
||||
realistic, ecologically relevant scenarios.
|
||||
|
||||
\textbf{Excursion into time-warp invariance:}\\
For instance, the temporal structure of grasshopper songs warps with
temperature~(\bcite{skovmand1983song}). The auditory system can compensate for
this variability by reading out relative temporal relationships rather than
absolute time intervals~(\bcite{creutzig2009timescale};
\bcite{creutzig2010timescale}), as those remain relatively constant across
different temperatures~(\bcite{helversen1972gesang}).
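
The compensation by relative timing can be illustrated with an idealized, uniform time warp. The Python sketch below assumes that temperature scales all intervals of a song by one common factor, which real songs obey only approximately; the interval values and helper names are hypothetical. Absolute intervals change with the warp, whereas intervals normalized to the total period do not.

```python
def warp(intervals, factor):
    """Idealized temperature effect: scale every interval by one factor."""
    return [d * factor for d in intervals]


def relative_intervals(intervals):
    """Normalize intervals by the period to obtain dimensionless cues."""
    period = sum(intervals)
    return [d / period for d in intervals]


# Usage: hypothetical syllable and pause durations in ms.
cool = [80.0, 20.0]
warm = warp(cool, 0.5)           # a faster song at higher temperature
print(warm)                      # [40.0, 10.0]
print(relative_intervals(cool))  # [0.8, 0.2]
print(relative_intervals(warm))  # [0.8, 0.2]
```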

\textbf{Definition of invariance (general, systemic):}\\
Invariance = Property of a system to produce an output that stably reflects a
set of relevant input parameters (variation to be represented) irrespective
of one or more other parameters (variation to be discarded)\\
$\rightarrow$ Selective input-output decorrelation

\textbf{Definition of intensity invariance (context of neurons and songs):}\\
Intensity invariance = Time-scale-selective sensitivity to certain faster
amplitude dynamics (song waveform, small-scale AM) and simultaneous
insensitivity to slower, more sustained amplitude dynamics (transient baseline,
large-scale AM, current overall intensity level)\\
$\rightarrow$ Without time-scale selectivity, any fully intensity-invariant
output will be a flat line
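
The role of time-scale selectivity can be illustrated with a first-order highpass filter, used here as an assumed stand-in for the adaptation stage; the helper names, the test signal, and the time constants $\tau$ are hypothetical. A slow 0.5\,Hz fluctuation plays the part of the sustained amplitude dynamics and a 50\,Hz modulation that of the song-related AM. With $\tau = 20$\,ms, most of the slow component is removed while the fast one passes; with a time constant far below the sampling interval, everything is removed and the output is the flat line described above.

```python
import math


def highpass(x, dt, tau):
    """First-order highpass: x minus an exponentially smoothed copy of x."""
    alpha = 1.0 - math.exp(-dt / tau)
    lp = x[0]  # start the smoother at the first sample
    y = []
    for v in x:
        lp += alpha * (v - lp)
        y.append(v - lp)
    return y


def rms(x):
    """Root-mean-square amplitude of a sequence."""
    return math.sqrt(sum(v * v for v in x) / len(x))


dt = 0.001
t = [k * dt for k in range(2000)]
slow = [2.0 * math.sin(2 * math.pi * 0.5 * s) for s in t]  # sustained dynamics
fast = [0.5 * math.sin(2 * math.pi * 50 * s) for s in t]   # song-like AM
x = [a + b for a, b in zip(slow, fast)]

selective = highpass(x, dt, tau=0.02)
everything = highpass(x, dt, tau=1e-6)
print(rms(x[1000:]), rms(fast[1000:]), rms(selective[1000:]))
print(max(abs(v) for v in everything))  # essentially zero: a flat line
```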

\textbf{Log-HP: Implications for intensity invariance:}\\
- Logarithmic scaling is essential for equalizing different song intensities\\
$\rightarrow$ Intensity information can be manipulated more easily in the form
of an additive signal offset in log space than as a multiplicative scale factor in linear space
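
The advantage of log space can be checked numerically: scaling an envelope by a factor $k$ adds the constant $\log k$ to its logarithm, so any operation that removes a constant offset equalizes both intensities. The Python sketch below uses subtraction of the mean as the crudest such operation (a stand-in for the highpass, not the model's filter); the envelope values and helper names are hypothetical.

```python
import math


def log_env(env):
    """Natural logarithm of a strictly positive envelope."""
    return [math.log(v) for v in env]


def demean(x):
    """Remove the constant offset (mean) of a sequence."""
    m = sum(x) / len(x)
    return [v - m for v in x]


env = [1.0, 2.0, 4.0, 8.0]
loud = [100.0 * v for v in env]  # same shape, 100x intensity
offset = [b - a for a, b in zip(log_env(env), log_env(loud))]
print(offset)                    # each entry is log(100), about 4.605
print(demean(log_env(env)))
print(demean(log_env(loud)))     # equal up to rounding
```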

- The capability to compensate for intensity variations, i.e.\ the selective amplification
of the output $\adapt(t)$ relative to the input $\env(t)$, is limited by the input SNR (Eq.\,\ref{eq:toy_snr}):\\
$\rightarrow$ Ability to equalize between different, sufficiently large scales of $s(t)$\\
$\rightarrow$ Inability to recover $s(t)$ when it is initially masked by the noise floor $\eta(t)$
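
The SNR limit can be sketched in the same way. Assuming, for simplicity, that the noise floor adds a constant level $\eta_0 = 1$ to the envelope (a hypothetical stand-in for $\eta(t)$), the demeaned log-envelope recovers the shape of a strong signal but collapses to a nearly flat trace when $s(t)$ lies far below the floor. With a fluctuating floor, this small residual modulation would be buried in the noise fluctuations, which is why subsequent amplification cannot restore it. The helper name and signal values are hypothetical.

```python
import math


def demeaned_log(env):
    """Log-transform an envelope and remove its mean (offset)."""
    logs = [math.log(v) for v in env]
    m = sum(logs) / len(logs)
    return [v - m for v in logs]


s = [1.0, 2.0, 4.0, 8.0]  # hypothetical signal envelope shape
eta0 = 1.0                # assumed constant noise floor

reference = demeaned_log(s)
strong = demeaned_log([1e4 * v + eta0 for v in s])
weak = demeaned_log([1e-4 * v + eta0 for v in s])

print(max(abs(a - b) for a, b in zip(reference, strong)))  # small: shape recovered
print(max(weak) - min(weak))                               # small: nearly flat
```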

- Logarithmic scaling emphasizes small amplitudes (song onsets, noise floor)\\
$\rightarrow$ Recurring trade-off: Equalizing signal intensity vs.\ preserving the initial SNR

\textbf{Thresh-LP: Implications for intensity invariance:}\\
- Role of song periodicity for feature representation!

- Suggests a relatively simple rule for the optimal choice of the threshold value $\thr$:\\
$\rightarrow$ Find the amplitude value of $c_i$ at which the absolute temporal derivative of $c_i(t)$ is maximal\\
$\rightarrow$ Optimal with respect to the intensity invariance of $f_i(t)$, but not necessarily with respect to
other criteria such as song-noise separation or diversity between features

- Nonlinear operations can be used to detach representations from the graded physical
stimulus (to facilitate categorical behavioral decision-making?):\\
1) Capture sufficiently precise amplitude information: $\env(t)$, $\adapt(t)$\\
$\rightarrow$ Closely following the AM of the acoustic stimulus\\
2) Quantify relevant stimulus properties on a graded scale: $c_i(t)$\\
$\rightarrow$ More decorrelated representation compared to prior stages\\
3) Nonlinearity: Distinguish between ``relevant'' and ``irrelevant'' values: $b_i(t)$\\
$\rightarrow$ Trading a graded scale for two or more categorical states\\
4) Represent stimulus properties under a relevance constraint: $f_i(t)$\\
$\rightarrow$ Graded again but highly decorrelated from the acoustic stimulus\\
5) Categorical behavioral decision-making requires further nonlinearities\\
$\rightarrow$ Parameters of a behavioral response may be graded (e.g.\ approach speed),
but the initiation of one behavior over another is categorical (e.g.\ approach/stay)

\newpage
\section{Appendix}

@@ -1716,7 +1863,7 @@ initiation of one behavior over another is categorical (e.g. approach/stay)
$\noc(t)$ within the signal envelope $\env(t)$ over scale
$\sca$. Based on input $\raw(t)$ with $\sigma_{\eta}=1$
(corresponding to the analysis underlying
Fig.\,\ref{fig:rect-lp}), using random 100 realizations of
Fig.\,\ref{fig:rect-lp}), using 100 random realizations of
$\noc(t)$.}
\label{fig:app_env-sd}
\end{figure}% Referenced.