Writing discussion.
431
main.tex
@@ -104,6 +104,7 @@
 \newcommand{\nsig}{\sigma_{\eta}} % Noise component standard deviation
 \newcommand{\pc}{p(c,\,T)} % Probability density (general interval)
 \newcommand{\pclp}{p(c,\,\tlp)} % Probability density (lowpass interval)
+\newcommand{\pci}{p(c_i,\,\tlp)} % Kernel-specific probability density (lowpass interval)
 \newcommand{\muf}{\mu_{f_i}} % Average feature value
 
 \section{Introduction}
@@ -258,12 +259,13 @@ initiation~(\bcite{ronacher1986routes}; \bcite{bauer1987separate};
 \bcite{bhavsar2017brain}).
 
 Functionally, the ascending neurons are the most diverse of the three neuronal
-populations. Individual ascending neurons possess highly specific response
-properties that contrast with the rather homogeneous response properties of the
-preceding receptor neurons and local
-interneurons~(\bcite{clemens2011efficient}), which indicates a transition from
-a uniform population-wide processing stream into several parallel branches.
-Accordingly, the model pathway is divided into two distinct
+populations. Around 15 to 20 ascending neurons have been identified in the
+grasshopper auditory system~(\bcite{stumpner1991auditory}). Individual
+ascending neurons possess highly specific response properties that contrast
+with the rather homogeneous response properties of the preceding receptor
+neurons and local interneurons~(\bcite{clemens2011efficient}), which indicates
+a transition from a uniform population-wide processing stream into several
+parallel branches. Accordingly, the model pathway is divided into two distinct
 stages~(Fig.\,\ref{fig:pathway}d): The preprocessing stage incorporates the
 processing steps at the levels of the tympanal membrane, the receptor neurons,
 and the local interneurons; and operates on one-dimensional signal
@@ -754,16 +756,15 @@ This effect is more pronounced for lower $\fc$ of the lowpass filter and is
 presumably caused by the attenuation of high-frequency components in the
 signal, which are more prominent in the noise component $\noc(t)$ than in the
 song component $\soc(t)$. The effect also appears relatively consistent across
-different species, although small variations exist~(Fig.\,\ref{fig:rect-lp}e)
-that are presumably based on different song structures and frequency spectra.
-In summary, the standard deviation of $\env(t)$ has never been observed to
-transition into a saturation regime for larger $\sca$ but rather continues to
-increase proportionally to $\sca$ for all tested $\fc$, in both the noiseless
-and the noisy case and across different species. Consequently, the combination
-of rectification and lowpass filtering does not contribute to intensity
-invariance. However, this transformation pair does improve the SNR of $\env(t)$
-relative to $\filt(t)$ and thus provides subsequent processing stages with a
-more robust input representation and higher input SNR.
+different species, although small variations exist~(Fig.\,\ref{fig:rect-lp}e
+and appendix Fig.\,\ref{fig:app_rect-lp}). In summary, the standard deviation
+of $\env(t)$ has never been observed to saturate for larger $\sca$ but rather
+continues to increase proportionally to $\sca$ for all tested $\fc$, in both
+the noiseless and the noisy case and across different species. Consequently,
+the combination of rectification and lowpass filtering does not contribute to
+intensity invariance. However, this transformation pair does improve the SNR of
+$\env(t)$ relative to $\filt(t)$ and thus provides subsequent processing stages
+with a more robust input representation and higher input SNR.
 
 \begin{figure}[!ht]
 \centering
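Why rectification and lowpass filtering cannot yield intensity invariance can be checked numerically: both operations are positively homogeneous, so multiplying the input by $\sca>0$ multiplies $\env(t)$, and therefore its standard deviation, by the same factor. The Python sketch below is a minimal illustration under assumptions that are not taken from the manuscript: full-wave rectification, a first-order RC lowpass filter with a 200 Hz cutoff, a hypothetical song made of a 4 kHz carrier gated by a 10 Hz syllable-pause pattern, and Gaussian noise as the noise component. It is not the model's implementation.

```python
import math
import random


def lowpass(x, fc, fs):
    """First-order (RC) lowpass filter; a stand-in, not the manuscript's filter."""
    dt = 1.0 / fs
    tau = 1.0 / (2.0 * math.pi * fc)
    a = dt / (tau + dt)
    y, out = 0.0, []
    for v in x:
        y += a * (v - y)
        out.append(y)
    return out


def envelope(x, fc, fs):
    """Full-wave rectification followed by lowpass filtering (toy version of env(t))."""
    return lowpass([abs(v) for v in x], fc, fs)


def std(x):
    """Population standard deviation."""
    m = sum(x) / len(x)
    return math.sqrt(sum((v - m) ** 2 for v in x) / len(x))


FS = 20_000   # sampling rate in Hz (assumed)
N = FS // 2   # 0.5 s of signal
# Hypothetical song: 4 kHz carrier gated by a 10 Hz syllable-pause pattern.
song = [
    math.sin(2 * math.pi * 4000 * i / FS) * (1.0 if (10 * i / FS) % 1.0 < 0.5 else 0.0)
    for i in range(N)
]
random.seed(1)
noise = [random.gauss(0.0, 0.5) for _ in range(N)]  # hypothetical noise component

scales = (0.01, 1.0, 100.0)
# std(env) / alpha for the noiseless and the noisy case.
clean = {a: std(envelope([a * s for s in song], 200.0, FS)) / a for a in scales}
noisy = {
    a: std(envelope([a * s + e for s, e in zip(song, noise)], 200.0, FS)) / a
    for a in scales
}
print(clean)  # equal up to rounding: std(env) grows in proportion to alpha
print(noisy)  # noise-dominated for small alpha, close to `clean` for large alpha
```

In the noiseless case the ratio `std(env) / alpha` is the same for every scale, so the standard deviation grows in proportion to $\sca$ without saturating. With noise, the ratio is dominated by the noise at small scales and approaches the noiseless value at large scales. Half-wave rectification would behave the same way, since it is positively homogeneous as well.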
@@ -883,24 +884,23 @@ $\noc(t)$ masks $\soc(t)$ even after the intensity adaptation. Accordingly, the
 effective intensity invariance of $\adapt(t)$ through logarithmic compression
 and adaptation is limited by the SNR of $\env(t)$: Songs that have already
 sunken into the noise floor at the level of $\env(t)$ cannot be recovered by
-subsequent processing steps, which emphasizes the importance of the SNR
-improvement by rectification and lowpass filtering during the previous
-processing step~(Fig.\,\ref{fig:rect-lp}d). The general pattern of noise
-regime, transient regime, and saturation regime remains consistent across
-different species~(Fig.\,\ref{fig:log-hp}e). However, the specific value of
-$\sca$ at which the saturation regime is reached (see appendix
-Fig.\,\ref{fig:app_log-hp_saturation}) and the maximum SNR value of $\adapt(t)$
-within the saturation regime vary considerably between and within species. For
+subsequent processing steps. The general pattern of noise regime, transient
+regime, and saturation regime remains consistent across different
+species~(Fig.\,\ref{fig:log-hp}e). However, the saturation point --- the $\sca$
+value at which the SNR of $\adapt(t)$ starts to saturate --- and the saturation
+level --- the constant SNR of $\adapt(t)$ within the saturation regime --- vary
+considerably between and within species~(appendix
+Figs.\,\ref{fig:app_log-hp_curves}+\ref{fig:app_log-hp_saturation}). For
 example, \textit{C. biguttulus} and \textit{C. mollis} display a noticeably
-lower maximum SNR of $\adapt(t)$ compared to other species. These differences
-are not to be underestimated, since the SNR of $\adapt(t)$ within the
-saturation regime determines the maximum input SNR for subsequent processing
-steps. In other words, the fact that $\adapt(t)$ eventually reaches a
-saturation regime is, of course, desirable in the context of intensity
-invariance, but it also means to pass up on the higher SNR values that are
-achieved by $\env(t)$ for the same $\sca$ (up to several orders of magnitude,
-Fig.\,\ref{fig:log-hp}d). This trade-off between intensity invariance and SNR
-is a recurring phenomenon that is further addressed in the following sections.
+lower saturation level compared to other species. These differences are not to
+be underestimated, since the saturation level of $\adapt(t)$ determines the
+maximum input SNR for subsequent processing steps. In other words, the fact
+that $\adapt(t)$ eventually reaches a saturation regime is, of course,
+desirable in the context of intensity invariance, but it also means forgoing
+the higher SNR values that are achieved by $\env(t)$ for the same $\sca$ (up
+to several orders of magnitude, Fig.\,\ref{fig:log-hp}d). This trade-off
+between intensity invariance and SNR is a recurring phenomenon that is further
+addressed in the following sections.
 
 \begin{figure}[!ht]
 \centering
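The noiseless part of this argument follows from the identity $20\log_{10}(\sca\,x)=20\log_{10}\sca+20\log_{10}x$: scaling the envelope only adds a constant offset in dB, which a highpass filter removes. The Python sketch below illustrates this and how noise restricts the invariance to large $\sca$. It assumes a first-order highpass filter with a 1 Hz cutoff that starts in steady state at the first sample, a hypothetical strictly positive square-wave song envelope, and, as a simplification, that song and noise envelopes simply add. As a stand-in for the SNR measure used in the manuscript, it reports the correlation between the noisy and the noiseless output.

```python
import math
import random


def highpass(x, fc, fs):
    """First-order highpass filter, initialized at the first sample (assumption)."""
    dt = 1.0 / fs
    tau = 1.0 / (2.0 * math.pi * fc)
    a = tau / (tau + dt)
    y, prev, out = 0.0, x[0], []
    for v in x:
        y = a * (y + v - prev)
        prev = v
        out.append(y)
    return out


def adapt(env, fc, fs):
    """Logarithmic compression to dB followed by highpass adaptation (toy adapt(t))."""
    return highpass([20.0 * math.log10(v) for v in env], fc, fs)


def corr(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


FS = 1_000   # envelope sampling rate in Hz (assumed)
FC = 1.0     # highpass cutoff in Hz (assumed)
N = 2 * FS   # 2 s
# Hypothetical song envelope: 10 Hz syllable-pause pattern, strictly positive.
song_env = [1.0 if (10 * i / FS) % 1.0 < 0.5 else 0.1 for i in range(N)]
random.seed(0)
# Hypothetical noise envelope: positive floor plus random fluctuations.
noise_env = [0.5 + 0.1 * abs(random.gauss(0.0, 1.0)) for _ in range(N)]

reference = adapt(song_env, FC, FS)

# Noiseless case: scaling adds 20*log10(alpha) dB, which the highpass removes.
max_dev = max(
    abs(p - q)
    for alpha in (0.01, 100.0)
    for p, q in zip(adapt([alpha * v for v in song_env], FC, FS), reference)
)

# Noisy case, assuming that song and noise envelopes add.
similarity = {
    alpha: corr(adapt([alpha * s + e for s, e in zip(song_env, noise_env)], FC, FS), reference)
    for alpha in (0.01, 1.0, 100.0)
}
print(max_dev)     # floating-point rounding level: invariant without noise
print(similarity)  # grows with alpha: noise limits invariance to large alpha
```

The deviation between scaled and unscaled outputs stays at the level of floating-point rounding, whereas the correlation with the noiseless output is low while the song is buried in the noise envelope and approaches 1 for large $\sca$.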
@@ -1000,24 +1000,17 @@ sufficiently large $\sca$, $f(t)$ then approaches the same constant $\mu_f$ in
 both the noiseless and the noisy case~(Fig.\,\ref{fig:thresh-lp_single}e,
 saturation regime).
 
-The value of $\mu_f$ in the saturation regime is independent of the precise
-value of $\Theta$, but the value of $\sca$ at which the saturation regime is
-reached decreses with $\Theta$~(Fig.\,\ref{fig:thresh-lp_single}e). Therefore,
-a threshold value of $\Theta=0$ would be the optimal choice for achieving
-intensity invariance at the lowest possible $\sca$. In stark contrast, the
-closer $\Theta$ is to 0, the higher $\mu_f$ in response to the pure noise
-component $\noc(t)$ and the lower the resulting SNR of $f(t)$ between noise
-regime and saturation regime~(Fig.\,\ref{fig:thresh-lp_single}b-d, left column,
-and Fig.\,\ref{fig:thresh-lp_single}e). It is even possible to achieve an
-"unlimited" SNR of $f(t)$ by setting $\Theta$ above the maximum of the
-pure-noise $c(t)$, so that any $\mu_f>0$ indicates the presence of the song
-component $\soc(t)$ in input $\adapt(t)$ at the cost of requiring a higher
-$\sca$ to reach the saturation regime. This trade-off between intensity
-invariance and SNR has already been observed during the previous analysis on
-logarithmic compression and adaptation~(Fig.\,\ref{fig:log-hp}d). However, the
-parameters that determine the SNR of $\adapt(t)$ are much less understood and
-likely relate to properties of the signal, whereas the SNR of $f(t)$ depends on
-the choice of $\Theta$ and can be more directly manipulated by the system.
+The saturation level of $f(t)$ is independent of the precise value of $\Theta$,
+but the saturation point increases with
+$\Theta$~(Fig.\,\ref{fig:thresh-lp_single}e). Therefore, a threshold value of
+$\Theta=0$ would be the optimal choice for achieving intensity invariance at
+the lowest possible $\sca$. In stark contrast, the closer $\Theta$ is to 0, the
+higher $\mu_f$ in response to the pure noise component $\noc(t)$ and the lower
+the resulting SNR of $f(t)$ between noise regime and saturation
+regime~(Fig.\,\ref{fig:thresh-lp_single}b-d, left column, and
+Fig.\,\ref{fig:thresh-lp_single}e). This trade-off between intensity invariance
+and SNR has already been observed during the previous analysis of logarithmic
+compression and adaptation~(Fig.\,\ref{fig:log-hp}d).
 
 Finally, the effects of thresholding and temporal averaging must be seen in the
 context of the previous transformation pair of logarithmic compression and
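The threshold trade-off can be reproduced with a hypothetical zero-mean periodic kernel response that equals 0.25 for 80 percent of each cycle and $-1$ otherwise, combined with standard normal pure noise. Two simplifications are assumed here and are not taken from the manuscript: $f$ is approximated by the proportion of samples above $\Theta$ (a boxcar average in place of the lowpass filter), and the SNR of $f$ is taken as the ratio between the saturated feature value and the pure-noise feature value. The sweep over $\sca$ is noiseless, so it isolates the saturation point.

```python
import math
import random

PERIOD = 100  # samples per cycle of the toy kernel response (hypothetical)


def song_response(n):
    """Hypothetical zero-mean periodic kernel response: 0.25 for 80 % of each cycle, -1 otherwise."""
    return [0.25 if i % PERIOD < 80 else -1.0 for i in range(n)]


def feature(c, theta):
    """Proportion of samples with c > theta (boxcar stand-in for thresholding + lowpass)."""
    return sum(1 for v in c if v > theta) / len(c)


N = 20_000
c_song = song_response(N)
random.seed(0)
c_noise = [random.gauss(0.0, 1.0) for _ in range(N)]  # pure-noise response, N(0, 1)

alphas = (0.5, 2.0, 8.0, 32.0)
thetas = (0.0, 1.0, 3.0, max(c_noise) + 0.1)  # last threshold exceeds every noise sample
results = {}
for theta in thetas:
    # Noiseless sweep over scales; the last entry is taken as the saturation level.
    curve = [feature([a * v for v in c_song], theta) for a in alphas]
    f_noise = feature(c_noise, theta)
    # Assumed SNR measure: saturation level relative to the pure-noise feature value.
    snr = curve[-1] / f_noise if f_noise > 0 else math.inf
    results[theta] = (curve, f_noise, snr)

for theta, (curve, f_noise, snr) in results.items():
    print(f"theta={theta:.2f} curve={curve} noise={f_noise:.4f} snr={snr:.1f}")
```

In this toy the feature jumps to its saturation level of 0.8 once $\sca$ exceeds $4\Theta$, so the saturation point grows with $\Theta$ while the saturation level does not. The pure-noise feature value drops from about 0.5 at $\Theta=0$ towards 0, and with $\Theta$ above every noise sample the ratio becomes unbounded.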
@@ -1102,11 +1095,11 @@ that the songs of each species are eventually represented by distinct points in
 feature space. However, the species-specific trajectories cross each other at
 numerous points, which means that the songs of two species --- each at a
 specific $\sca$ --- can result in the same combination of $\muf$. Furthermore,
-the specific value of $\sca$ at which $\muf$ saturates depends on $f_i(t)$ and
-the species: For \textit{C. mollis}, all $\muf$ saturate around the same
-$\sca$, while \textit{O. rufipes} exhibits considerable variation between the
-three $f_i(t)$. The larger the variation in saturation points between $f_i(t)$,
-the stronger the curvature of the trajectory through feature space.
+the specific saturation point of $f_i(t)$ depends on the species: For
+\textit{C. mollis}, all $\muf$ saturate around the same $\sca$, while
+\textit{O. rufipes} exhibits considerable variation between the three $f_i(t)$.
+The larger the variation in saturation points between $f_i(t)$, the stronger
+the curvature of the trajectory through feature space.
 
 In the noisy case, $\muf$ is non-zero even for the smallest
 $\sca$~(Fig.\,\ref{fig:thresh-lp_species}c) because the addition of the noise
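The link between differing saturation points and curved trajectories can be made concrete with a noiseless toy in which two features share a sinusoidal kernel response $c(t)=\sca\sin(\omega t)$ and differ only in their thresholds. For $\Theta\ge0$ the fraction of a cycle with $c(t)>\Theta$ is $(\pi-2\arcsin(\Theta/\sca))/(2\pi)$ if $\sca>\Theta$ and 0 otherwise, so each feature saturates towards 0.5 at a scale proportional to its threshold. The sketch measures how far the two-dimensional trajectory bends away from the straight line between its end points; the response shape and the thresholds are hypothetical.

```python
import math


def feature_sine(alpha, theta):
    """Fraction of a cycle in which alpha * sin(phase) exceeds theta (noiseless toy, theta >= 0)."""
    if alpha <= theta:
        return 0.0
    return (math.pi - 2.0 * math.asin(theta / alpha)) / (2.0 * math.pi)


ALPHAS = [10 ** (k / 100) for k in range(-100, 301)]  # log-spaced scales from 0.1 to 1000


def trajectory(theta_1, theta_2):
    """Path of the feature pair (f_1, f_2) through feature space as alpha increases."""
    return [(feature_sine(a, theta_1), feature_sine(a, theta_2)) for a in ALPHAS]


def max_bend(path):
    """Largest distance of any path point from the straight line joining the end points."""
    (x0, y0), (x1, y1) = path[0], path[-1]
    length = math.hypot(x1 - x0, y1 - y0)
    return max(abs((x1 - x0) * (y0 - y) - (x0 - x) * (y1 - y0)) / length for x, y in path)


for thetas in ((1.0, 1.0), (1.0, 2.0), (1.0, 4.0)):
    print(thetas, round(max_bend(trajectory(*thetas)), 3))
```

With equal thresholds both features saturate together and the trajectory runs along the diagonal, giving a bend of 0; the more the two saturation points differ, the larger the bend.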
@@ -1121,9 +1114,9 @@ previous analysis~(Fig.\,\ref{fig:thresh-lp_single}e). However, the
 trajectories now move a much shorter distance through feature space for a
 similar range of $\sca$ due to the lower SNR of $f_i(t)$ between noise regime
 and saturation regime, which increases the likelihood of trajectories crossing
-each other. Finally, the values of $\sca$ at which $\muf$ saturate for a given
-species are slightly higher in the noisy case, but the variation between
-$f_i(t)$ remains largely unchanged.
+each other. Finally, the saturation points of $f_i(t)$ for a given species are
+slightly higher in the noisy case, but the variation between $f_i(t)$ remains
+largely unchanged.
 
 In summary, even a comparatively small set of three features $f_i(t)$ can, in
 principle, represent different species-specific songs at distinct points in
@@ -1238,15 +1231,10 @@ broader and is not centered around the single saturation point based on the
 median but rather shifted towards lower $\sca$. Care must be taken when
 interpreting the height of either distribution due to the logarithmic scaling
 of the underlying $\sca$ axis. Nevertheless, the overall pattern suggests that
-specific $f_i(t)$ can reach a saturation regime at lower $\sca$ than their
+the saturation points of specific $f_i(t)$ are indeed lower than those of their
 $c_i(t)$ counterparts. Therefore, the effect of thresholding and temporal
 averaging on intensity invariance is not necessarily nullified by the previous
-logarithmic compression and adaptation, which means that both mechanisms can,
-in principle, work together towards an intensity-invariant song representation.
-% Or does one simply overwrite the other? Can there even be a higher intensity
-% invariance based on the sum of both effects? Or does one simply kick in for
-% lower scales than the other and thus dictates the overall intensity
-% invariance? Whatever, discussion material.
+logarithmic compression and adaptation.
 
 \begin{figure}[!ht]
 \centering
@@ -1313,7 +1301,7 @@ representation goes hand in hand with a substantial degree of redundancy and is
 hardly expected to be present in the actual grasshopper auditory system. But
 the fact that the saturated $\muf$ are distributed symmetrically around 0.5
 provides concrete evidence that each $f_i(t)$ is able to reach its intrinsic
-saturation value in the absence of logarithmic
+saturation level in the absence of logarithmic
 compression~(Fig.\,\ref{fig:pipeline_short}c), which is otherwise prevented by
 the capping of $\adapt(t)$, as seen during previous
 analyses~(Fig.\,\ref{fig:thresh-lp_single}f and
@@ -1327,8 +1315,8 @@ that it allows $f_i(t)$ to reach its intrinsic saturation value. If this
 results in a wider range of $\muf$ across the feature set, it should be
 beneficial for forming species-specific combinations. However, this depends on
 multiple different factors such as the choice of $k_i(t)$ and $\thr$ as well as
-the structure and distribution of the specific song and is hence not
-guaranteed simply by disabling logarithmic compression.
+the structure and distribution of the specific song and is hence not guaranteed
+simply by disabling logarithmic compression.
 
 \begin{figure}[!ht]
 \centering
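The discussion added in the last hunk of this commit argues that $f_i(t)$ stays constant only if the averaging interval $\tlp$ covers several cycles of an approximately periodic $c_i(t)$. A minimal sketch of that argument, assuming a sinusoidal toy kernel response with a period of 100 samples, a threshold of 0.5, and a boxcar moving average in place of the lowpass filter (all hypothetical), reports how much the resulting feature value fluctuates over time for different window lengths.

```python
import math

PERIOD = 100  # samples per cycle of the toy kernel response (hypothetical)


def binary_response(n, theta):
    """Threshold the toy kernel response c(t) = sin(2*pi*t/PERIOD) at theta."""
    return [1 if math.sin(2.0 * math.pi * (i % PERIOD) / PERIOD) > theta else 0 for i in range(n)]


def sliding_feature(b, window):
    """Boxcar moving average of b over `window` samples (stand-in for the lowpass filter)."""
    acc = sum(b[:window])
    out = [acc / window]
    for i in range(window, len(b)):
        acc += b[i] - b[i - window]
        out.append(acc / window)
    return out


b = binary_response(5_000, theta=0.5)
spread = {}
for window in (50, 250, 1_000, 1_050):  # 0.5, 2.5, 10 and 10.5 cycles
    f = sliding_feature(b, window)
    spread[window] = max(f) - min(f)  # temporal fluctuation of the feature value
print(spread)
```

A window of exactly ten cycles gives a perfectly constant feature value of 0.33, whereas windows that end mid-cycle leave a ripple of 33 samples divided by the window length, so the fluctuation shrinks in inverse proportion to $\tlp$.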
@@ -1560,25 +1548,241 @@ functional modelling. Other sensory systems that are either more complex or
 have not been subject to decades of study will likely not be suitable for this
 approach yet.
 
-% \textbf{Song recognition pathway: Grasshopper vs. model:}\\
-% The model pathway includes a rather large number of Gabor kernels compared to
-% the 15 to 20 ascending neurons in the grasshopper auditory
-% system~(\bcite{stumpner1991auditory}).
+\subsection{Feature representation, temporal averaging, and song design}
 
-\subsection{Interplay of song representation and song design}
+The feature set is the final song representation along the model pathway and
+constitutes the basis for song recognition. Each feature $f_i(t)$ results from
+the thresholding of the respective kernel response $c_i(t)$ by $\nl$ and the
+subsequent temporal averaging of the binary response $b_i(t)$ by a lowpass
+filter with an extremely low cutoff frequency $\fc$. At a given time point $t$,
+$f_i(t)$ approximately quantifies the proportion of time during which $c_i(t)$
+exceeds the threshold value $\thr$ within the averaging interval $\tlp$
+specified by $\fc$. The value of $f_i(t)$ is hence determined by $\thr$ with
+respect to the distribution $\pci$ of $c_i(t)$ and is restricted to the
+interval $[0,1]$.
 
-\textbf{The role of repetitive songs for the feature representation:}
-Most grasshopper songs are produced by stridulation, which refers to the
-pulling of the serrated stridulatory file on the hindlegs across a resonating
-vein on the forewings~(\bcite{helversen1977stridulatory};
-\bcite{stumpner1994song}; \bcite{helversen1997recognition}). Every "tooth" that
-strikes the vein generates a brief sound pulse; multiple pulses make up a
-syllable; and the repetition of syllables and pauses results in a
-characteristic amplitude-modulated waveform pattern.
+Different species-specific songs are represented by different combinations of
+feature values, which should preferably be constant for the duration of a song
+to enable reliable recognition. The fundamental requirement for a constant
+$f_i(t)$ is that the time during which $c_i(t)>\thr$ within $\tlp$ is the same
+for all $t$, which is fulfilled if $\pci$ is stable across $t$. The most
+straightforward way to achieve a stable $\pci$ is for $c_i(t)$ to be periodic
+and for $\tlp$ to be sufficiently long to average over multiple cycles of
+$c_i(t)$. Song-evoked $c_i(t)$ are indeed approximately periodic, which is
+largely an inherited property of the song itself. Most grasshopper songs are
+produced by stridulation, which refers to the pulling of the serrated
+stridulatory file on the hindlegs across a resonating vein on the
+forewings~(\bcite{helversen1977stridulatory}; \bcite{stumpner1994song};
+\bcite{helversen1997recognition}). Every ``tooth'' that strikes the vein
+generates a brief sound pulse; multiple pulses make up a syllable; and the
+repetition of syllables and pauses results in a pattern with a high degree of
+temporal regularity. Accordingly, a robust feature representation in the sense
+of constant $f_i(t)$ is tightly linked to the mechanism of sound production and
+the temporal structure of the generated song.
 
-\subsection{Intensity invariance versus SNR along the auditory pathway}
+Various grasshopper species, especially those with longer songs such as
+\textit{C. mollis}, \textit{G. rufus}, or \textit{O. rufipes}, tend to
+stridulate softly at first and then continuously increase the amplitude of
+their song over time. This slow ``ramping'' amplitude modulation makes the
+overall song less periodic despite its temporal regularity. The ``ramping''
+appears more pronounced in $\env(t)$ than in $\adapt(t)$, which suggests that
+the logarithmic compression and adaptation during the preprocessing stage might
+at least partially mitigate the effect of this amplitude modulation on later
+representations. However, the adaptation of $\adapt(t)$ can only act on certain
+time scales --- depending on the cutoff frequency of the underlying highpass
+filter --- and is hence not able to compensate for ``ramping'' across the
+entire duration of a song.
 
-\subsection{Behavior in a natural acoustic environment}
+Certain grasshopper species like \textit{Chorthippus dorsatus} are known to
+switch their stridulation pattern in the middle of a
+song~(\bcite{stumpner1994song}). \textit{C. dorsatus} starts stridulating with
+both hindlegs in synchrony and thereby generates a pronounced syllable-pause
+pattern similar to that of \textit{P. parallelus}. For the last part of its
+song, however, \textit{C. dorsatus} switches to an alternating leg movement,
+which results in a more continuous but not entirely unstructured rattling
+sound. It is unclear what this composite design means for the feature
+representation of \textit{C. dorsatus} songs. In principle, both parts of the
+song could result in similar $\pci$ despite their different temporal structure,
+which would allow for consistent $f_i(t)$ across the entire song. However, it
+appears more likely that only one part of the song encodes species identity,
+while the other part serves a different purpose such as fitness
+advertisement~(SOURCE?).
+
+Finally, the question remains how the choice of an appropriate averaging
+interval $\tlp$ depends on the duration and temporal structure of a song. The
+minimum $\tlp$ should encompass at least a few cycles of $c_i(t)$ to ensure a
+stable $\pci$ and hence a constant $f_i(t)$. The maximum $\tlp$ should not
+exceed the duration of a song to avoid the inclusion of behaviorally irrelevant
+information. The longer $\tlp$, the longer $f_i(t)$ takes to stabilize after
+the onset and before the offset of a song, which narrows the time window for
+reliable recognition. The duration of species-specific grasshopper songs can
+range from a few hundred milliseconds (e.g.\,\textit{Stethophyma grossum}) to
+well over a minute (e.g.\,\textit{C. mollis}), so that the optimal $\tlp$ is
+likely to differ between species.
+
+\subsection{Sensory invariances in the grasshopper auditory system}
+
+The notion of invariance is fundamental for sensory processing systems.
+Invariance, in the general sense, can be described as the property of a
+transformation to maintain variation across certain meaningful input parameters
+in its output while discarding variation across other input parameters. This
+boils down to a selective input-output decorrelation that allows the system to
+represent only those aspects of the stimulus that are behaviorally relevant to
+the organism.
+
+The grasshopper auditory system has to deal with a number of sources of
+non-informative song variation. For instance, the temporal structure of the
+song pattern warps with temperature~(\bcite{skovmand1983song}). This also
+affects certain structural parameters that are essential for song recognition,
+mainly the duration of syllables and pauses. The auditory system can compensate
+for this variation by reading out relative temporal relationships rather than
+absolute time intervals~(\bcite{creutzig2009timescale};
+\bcite{creutzig2010timescale}). The ratio of syllable duration to pause
+duration is relatively constant across temperatures and has been shown to be
+suitable for song recognition~(\bcite{helversen1972gesang}), so that there is
+likely no need to retain any information about the absolute duration of
+syllables and pauses.
+
+The situation is more complex for variations in song intensity. Song intensity
+at the receiver's position depends mostly on the distance to the sender and is
+hence not a reliable cue to infer species identity. The auditory system should
+therefore be invariant to intensity variations to recognize conspecific songs
+regardless of sender distance. However, song intensity --- specifically, the
+interaural intensity difference --- is also required for directional hearing,
+which is essential for phonotaxis~(\bcite{helversen1988interaural}). Conflicts
+between song recognition and directional hearing are avoided in the auditory
+system by distributing both functions across two parallel
+pathways~(\bcite{helversen1984parallel}; \bcite{ronacher1986routes}). This is
+the main reason why our model pathway is focused entirely on song recognition
+and has no capacity for directional hearing, no matter how relevant it may be
+to the grasshopper.
+
+Furthermore, ``invariance to variations in song intensity'' does not do justice
+to the full extent of the problem. Intensity is a function of song amplitude
+within a certain time frame. It can refer to the individual syllables and
+pauses of the song pattern as well as the entire song --- the former is
+relevant for song recognition, while the latter is not. Intensity invariance in
+the current context can therefore be described as time-scale-selective
+sensitivity to the faster amplitude dynamics of the song pattern and
+simultaneous insensitivity to slower, more sustained amplitude dynamics. In the
+model pathway, this time-scale selectivity is reflected by the cutoff frequency
+$\fc$ of the highpass filter that underlies the adaptation of $\adapt(t)$: Most
+$\fc$ are effective in removing the local offset of $\db(t)$ and render
+$\adapt(t)$ intensity-invariant, but only sufficiently low $\fc$ will leave the
+relevant amplitude dynamics of the song pattern intact.
+
+\subsection{Intensity invariance versus SNR}
+
+Each processing step along the model pathway is a transformation between input
+representation and output representation. The intensity of the input is
+characterized by scale $\sca$. The intensity of the output is characterized by
+an appropriate intensity measure. If the transformation renders the output more
+intensity-invariant, then the intensity measure will saturate for sufficiently
+large $\sca$, which caps the output SNR at a constant value across these
+$\sca$. Otherwise, the intensity measure and hence the output SNR will increase
+monotonically with $\sca$. The trade-off between intensity invariance and SNR
+refers to the principle that a transformation can either improve intensity
+invariance or maintain SNR --- it cannot do both at the same time. This
+principle is presumably not specific to the two mechanisms along the model
+pathway but rather a general property of transformations that equalize
+different input intensities.
+
Logarithmic compression followed by adaptation through highpass filtering can
equalize a wide range of scales $\sca$. In the absence of the noise component
$\noc(t)$, the output $\adapt(t)$ is a perfectly intensity-invariant
representation of the song component $\soc(t)$ for all $\sca>0$. The presence
of $\noc(t)$, however, restricts the effectiveness of this mechanism to
sufficiently large $\sca$. Intensity invariance and SNR therefore interact at
the input level as well. Specifically, the saturation point of $\adapt(t)$ is
determined by the SNR of its input $\env(t)$, which in turn depends on the
initial SNR of the sound signal $\raw(t)$. This initial SNR is presumably
improved by the bandpass filtering of $\raw(t)$ into $\filt(t)$ at the tympanal
membrane, which attenuates frequencies outside the relevant range of
grasshopper songs. The SNR is then further improved by the rectification and
lowpass filtering of $\filt(t)$ into $\env(t)$. This improvement depends on the
cutoff frequency $\fc$ of the lowpass filter: the lower $\fc$, the higher the
SNR of $\env(t)$ at a given $\sca$. However, $\fc$ must not be so low that the
filter attenuates the relevant amplitude dynamics of the song pattern. Unlike
its saturation point, the saturation level of $\adapt(t)$ is independent of the
SNR of $\env(t)$, because the influence of $\noc(t)$ becomes negligible for
sufficiently large $\sca$. Both the saturation level and the saturation point
of $\adapt(t)$ vary between species and between specific songs. These
differences likely arise from the way in which logarithmic compression acts on
the specific amplitude distribution of $\env(t)$, which is determined by $\fc$
and by the structure and frequency spectrum of the rectified $\filt(t)$.

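The two claims of the preceding paragraph (noise-limited equalization across $\sca$, and a lower $\fc$ reducing noise fluctuations in $\env(t)$) can be illustrated with toy signals. The following Python sketch is not the model used here: it assumes first-order RC filters as stand-ins for the lowpass and highpass stages, an additive envelope model ($\sca$ times a song envelope plus a noise envelope), and arbitrary toy parameters. The names \texttt{lowpass}, \texttt{highpass}, \texttt{log\_hp}, and \texttt{rms\_deviation} are hypothetical.

```python
import numpy as np


def lowpass(x, fs, fc):
    """First-order IIR lowpass (discretized RC filter) with cutoff fc in Hz."""
    rc, dt = 1.0 / (2.0 * np.pi * fc), 1.0 / fs
    alpha = dt / (rc + dt)
    y = np.empty_like(x)
    y[0] = x[0]
    for n in range(1, x.size):
        y[n] = y[n - 1] + alpha * (x[n] - y[n - 1])
    return y


def highpass(x, fs, fc):
    """First-order IIR highpass. The output depends only on the differences
    x[n] - x[n-1], so adding a constant to x leaves it unchanged."""
    rc, dt = 1.0 / (2.0 * np.pi * fc), 1.0 / fs
    alpha = rc / (rc + dt)
    y = np.zeros_like(x)
    for n in range(1, x.size):
        y[n] = alpha * (y[n - 1] + x[n] - x[n - 1])
    return y


def log_hp(env, fs, fc=2.0):
    """Logarithmic compression followed by highpass adaptation."""
    return highpass(np.log(env), fs, fc)


fs = 1000.0                      # toy sampling rate of the envelope (Hz)
t = np.arange(0.0, 2.0, 1.0 / fs)
rng = np.random.default_rng(0)

# Toy song envelope: 10 Hz on/off pattern, strictly positive (about 0.2 or 1.8).
song_env = 1.0 + 0.8 * np.sign(np.sin(2.0 * np.pi * 10.0 * t))
# Toy noise envelope: rectified, smoothed white noise (positive by construction).
noise_env = lowpass(np.abs(rng.normal(0.0, 1.0, t.size)), fs, 50.0)

reference = log_hp(song_env, fs)  # noise-free output, identical for any sca > 0


def rms_deviation(sca):
    """RMS difference between the noisy and the noise-free Log-HP output,
    assuming (for simplicity) that song and noise envelopes add linearly."""
    deviation = log_hp(sca * song_env + noise_env, fs) - reference
    return float(np.sqrt(np.mean(deviation ** 2)))


# 1) Without noise, the scale only adds the constant log(sca) before the highpass.
print("max |difference|, sca=0.01 vs 100:",
      np.max(np.abs(log_hp(0.01 * song_env, fs) - log_hp(100.0 * song_env, fs))))

# 2) With noise, invariance is only approached for sufficiently large sca.
for sca in (0.01, 0.1, 1.0, 10.0, 100.0):
    print(f"sca={sca:>6}: RMS deviation = {rms_deviation(sca):.3f}")

# 3) A lower envelope cutoff fc reduces fluctuations of the rectified noise
#    (but would also smooth out fast amplitude dynamics of the song).
white = rng.normal(0.0, 1.0, t.size)
for fc in (200.0, 20.0):
    print(f"fc={fc:>5} Hz: SD of rectified, lowpassed noise = "
          f"{lowpass(np.abs(white), fs, fc)[100:].std():.3f}")
```

In the noise-free case, the difference between the outputs for $\sca=0.01$ and $\sca=100$ stays at the level of floating-point rounding, whereas with noise the deviation from the noise-free output shrinks only as $\sca$ grows.
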
Thresholding and temporal averaging render the feature $f_i(t)$
intensity-invariant for sufficiently large $\sca$. Here, the trade-off between
intensity invariance and SNR is mediated by the threshold value $\thr$. A lower
$\thr$ ($\thr\to0$) improves intensity invariance by shifting the saturation
point towards lower $\sca$, but it also decreases the SNR of $f_i(t)$. The
saturation level of $f_i(t)$ is independent of $\thr$, as long as the intensity
invariance already provided by logarithmic compression and adaptation is
neglected. The SNR of $f_i(t)$ is therefore determined solely by its
pure-noise response. The distribution $\pci$ of the pure-noise kernel response
$c_i(t)$ is approximately normal with mean $\mu\approx0$ for all kernels
$k_i(t)$. Since $f_i(t)$ averages the thresholded response $b_i(t)$ over time,
its pure-noise value approximately corresponds to the fraction of time during
which $c_i(t)$ exceeds $\thr$. It is hence 0.5 for $\thr=0$ and decreases for
larger $\thr$. If $\thr$ is set above the maximum of $c_i(t)$, the pure-noise
feature value is 0, which results in an ``unlimited'' SNR of $f_i(t)$ at the
cost of a higher saturation point. In this case, any non-zero feature value
that is sustained for a sufficient duration could serve as an indicator for the
presence of $\soc(t)$ in addition to $\noc(t)$. This, however, requires fine
evolutionary tuning of $\thr$ to the properties of both the species-specific
song and the natural noise in a given habitat.

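The pure-noise feature value can be computed directly under the stated normality assumption. This Python sketch additionally assumes that $b_i(t)$ takes the values 0 and 1 and that $f_i(t)$ is a unit-gain temporal average of $b_i(t)$. The expected value is then $P(c_i>\thr)=\tfrac{1}{2}\,\mathrm{erfc}\big(\thr/(\sqrt{2}\,\sigma_c)\big)$, where $\sigma_c$ (a hypothetical symbol) is the standard deviation of $c_i(t)$. The function names are likewise hypothetical.

```python
import math
import random


def pure_noise_feature(thr, sigma=1.0):
    """Expected pure-noise feature value: probability that a zero-mean normal
    kernel response with standard deviation sigma exceeds thr."""
    return 0.5 * math.erfc(thr / (sigma * math.sqrt(2.0)))


def simulated_feature(thr, sigma=1.0, n=100_000, seed=1):
    """Monte Carlo estimate from independent normal samples. A real kernel
    response is temporally correlated, which affects the variance of the
    estimate but not its expected value."""
    rng = random.Random(seed)
    above = sum(rng.gauss(0.0, sigma) > thr for _ in range(n))
    return above / n


for thr in (0.0, 0.5, 1.0, 2.0, 3.0):
    print(f"thr={thr:.1f}: analytic={pure_noise_feature(thr):.4f}, "
          f"simulated={simulated_feature(thr):.4f}")
```

For a finite recording, a $\thr$ above the largest observed value of $c_i(t)$ yields a feature value of exactly 0, whereas the normal model only approaches 0 asymptotically.
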
Furthermore, $\thr$ might be one of the parameters along the pathway that are
comparatively easy to manipulate, both physiologically and over the course of
evolution.

% However, the parameters that determine the SNR of $\adapt(t)$ are much less
% understood and likely relate to properties of the signal, whereas the SNR of
% $f(t)$ depends on the choice of $\Theta$ and can be more directly manipulated
% by the system.

\newpage
\textbf{Thresh-LP: Implication for intensity invariance:}\\

- Suggests a relatively simple rule for the optimal choice of threshold value $\thr$:\\
$\rightarrow$ Find the amplitude $c_i$ that maximizes the absolute derivative of $c_i(t)$ over time\\
$\rightarrow$ Optimal with respect to the intensity invariance of $f_i(t)$, but not necessarily with respect to
other criteria such as song-noise separation or diversity between features

- Nonlinear operations can be used to detach representations from the graded physical
stimulus (to facilitate categorical behavioral decision-making?):\\
1) Capture sufficiently precise amplitude information: $\env(t)$, $\adapt(t)$\\
$\rightarrow$ Closely following the AM of the acoustic stimulus\\
2) Quantify relevant stimulus properties on a graded scale: $c_i(t)$\\
$\rightarrow$ More decorrelated representation, compared to prior stages\\
3) Nonlinearity: Distinguish between ``relevant vs.\ irrelevant'' values: $b_i(t)$\\
$\rightarrow$ Trading a graded scale for two or more categorical states\\
4) Represent stimulus properties under relevance constraint: $f_i(t)$\\
$\rightarrow$ Graded again but highly decorrelated from the acoustic stimulus\\
5) Categorical behavioral decision-making requires further nonlinearities\\
$\rightarrow$ Parameters of a behavioral response may be graded (e.g.\ approach speed), but the
initiation of one behavior over another is categorical (e.g.\ approach/stay)

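The threshold rule above leaves open how the amplitude with maximal absolute derivative is found. The Python sketch below assumes one possible reading: average $|\mathrm{d}c_i/\mathrm{d}t|$ within amplitude bins and take the centre of the bin with the largest average. The function name \texttt{threshold\_by\_max\_slope}, the bin count, and the sinusoidal test signal are hypothetical. A plausible rationale for the rule is that at a steep crossing, a small change in amplitude shifts the crossing time, and hence the time spent above $\thr$, only slightly.

```python
import numpy as np


def threshold_by_max_slope(c, dt, n_bins=50):
    """Hypothetical reading of the rule: return the amplitude at which c(t)
    changes fastest, estimated as the centre of the amplitude bin with the
    largest mean absolute temporal derivative |dc/dt|."""
    slope = np.abs(np.gradient(c, dt))
    edges = np.linspace(c.min(), c.max(), n_bins + 1)
    # Assign each sample to an amplitude bin; the maximum goes into the last bin.
    idx = np.clip(np.digitize(c, edges) - 1, 0, n_bins - 1)
    sums = np.bincount(idx, weights=slope, minlength=n_bins)
    counts = np.bincount(idx, minlength=n_bins)
    mean_slope = np.divide(sums, counts, out=np.zeros(n_bins), where=counts > 0)
    best = int(np.argmax(mean_slope))
    return 0.5 * (edges[best] + edges[best + 1])


# Toy kernel response: a 5 Hz sinusoid, steepest at its zero crossings.
dt = 1e-4
t = np.arange(0.0, 1.0, dt)
c = np.sin(2.0 * np.pi * 5.0 * t)
print("threshold:", threshold_by_max_slope(c, dt))
```

For a sinusoid the rule returns a threshold close to 0. For real kernel responses $c_i(t)$ the result would depend on the song and on the bin resolution.
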
\subsection{Intensity invariance versus intensity invariance}

\subsection{Implications for behavior in a natural acoustic environment}

% RIPPED FROM INTRODUCTION:
@@ -1645,63 +1849,6 @@ operate on unmodified recordings of natural grasshopper songs instead of
condensed pulse train approximations, which widens its scope towards more
realistic, ecologically relevant scenarios.

\textbf{Excursion into time-warp invariance:}
For instance, the temporal structure of grasshopper songs warps with
temperature~(\bcite{skovmand1983song}). The auditory system can compensate for
this variability by reading out relative temporal relationships rather than
absolute time intervals~(\bcite{creutzig2009timescale};
\bcite{creutzig2010timescale}), as those remain relatively constant across
different temperatures~(\bcite{helversen1972gesang}).

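The benefit of relative timing can be shown with a minimal Python sketch. It assumes a uniform time warp, in which every interval is scaled by the same factor. Real temperature effects need not be perfectly uniform across song elements. The durations, the warp factor, and the function names are hypothetical.

```python
import math

# Hypothetical syllable and pause durations (ms) of a toy song at a reference temperature.
syllables = [80.0, 85.0, 78.0]
pauses = [20.0, 18.0, 22.0]


def warp(durations, factor):
    """Uniform time warp: every interval is scaled by the same factor."""
    return [d * factor for d in durations]


def pause_syllable_ratios(syl, pau):
    """Relative timing cue: pause duration divided by the preceding syllable duration."""
    return [p / s for s, p in zip(syl, pau)]


# Toy 'warmer' song: all intervals shortened by 20 % (factor chosen arbitrarily).
warm_syllables, warm_pauses = warp(syllables, 0.8), warp(pauses, 0.8)

print("absolute syllable durations:", syllables, "->", warm_syllables)
print("pause/syllable ratios:",
      pause_syllable_ratios(syllables, pauses), "->",
      pause_syllable_ratios(warm_syllables, warm_pauses))
```

The absolute durations change with the warp, while the pause-to-syllable ratios stay the same up to floating-point rounding.
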
\textbf{Definition of invariance (general, systemic):}\\
Invariance = Property of a system to produce an output that reflects a set of
relevant input parameters (variation to be represented) but remains stable
irrespective of one or more other parameters (variation to be discarded)\\
$\rightarrow$ Selective input-output decorrelation

\textbf{Definition of intensity invariance (context of neurons and songs):}\\
Intensity invariance = Time-scale-selective sensitivity to certain faster
amplitude dynamics (song waveform, small-scale AM) and simultaneous
insensitivity to slower, more sustained amplitude dynamics (transient baseline,
large-scale AM, current overall intensity level)\\
$\rightarrow$ Without time-scale selectivity, any fully intensity-invariant
output would be a flat line

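The role of time-scale selectivity can be illustrated with a toy envelope that combines a fast, song-like AM with a slow rise in overall intensity. In this Python sketch, subtracting a moving average in log space serves as a hypothetical stand-in for the adaptation stage. The window length sets the time scale that separates "slow" from "fast". All signals and parameters are assumptions for illustration.

```python
import numpy as np

fs = 1000.0                                            # toy sampling rate (Hz)
t = np.arange(0.0, 4.0, 1.0 / fs)
fast_am = 1.0 + 0.5 * np.sin(2.0 * np.pi * 20.0 * t)   # fast, song-like AM (20 Hz)
slow_gain = np.exp(0.5 * t)                            # slow rise of overall intensity
env = slow_gain * fast_am


def moving_average(x, n):
    """Centred moving average over n samples (edges are zero-padded)."""
    return np.convolve(x, np.ones(n) / n, mode="same")


log_env = np.log(env)

# Time-scale-selective invariance: subtracting a 250 ms moving average in log
# space removes the slow gain but keeps the 20 Hz modulation.
adapted = log_env - moving_average(log_env, 250)

# Without time-scale selectivity (a one-sample "window"), everything is removed.
instant = log_env - moving_average(log_env, 1)

inner = slice(500, -500)                               # discard edge effects of the window
centred_am = np.log(fast_am) - np.log(fast_am).mean()
print("slow log-gain range:", np.ptp(np.log(slow_gain)))
print("max |adapted - centred log(fast_am)|:",
      np.max(np.abs(adapted[inner] - centred_am[inner])))
print("max |instant|:", np.max(np.abs(instant)))
```

Away from the edges, \texttt{adapted} closely tracks the centred log-AM even though the overall intensity changes by a factor of about $e^2$. The non-selective version \texttt{instant} is identically zero.
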
\textbf{Log-HP: Implication for intensity invariance:}\\
- Logarithmic scaling is essential for equalizing different song intensities\\
$\rightarrow$ Intensity information can be manipulated more easily in the form
of a signal offset in log space than as a multiplicative scale in linear space

- Capability to compensate for intensity variations, i.e.\ selective amplification
of output $\adapt(t)$ relative to input $\env(t)$, is limited by input SNR (Eq.\,\ref{eq:toy_snr}):\\
$\rightarrow$ Ability to equalize between different scales of $s(t)$, provided they are sufficiently large\\
$\rightarrow$ Inability to recover $s(t)$ when it is initially masked by the noise floor $\eta(t)$

- Logarithmic scaling emphasizes small amplitudes (song onsets, noise floor)\\
$\rightarrow$ Recurring trade-off: Equalizing signal intensity vs.\ preserving initial SNR

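The offset argument can be made explicit for a simplified version of the Log-HP stage. This sketch assumes an input of the form $\raw(t)=\sca\,\soc(t)+\noc(t)$. It writes $\tilde{s}(t)$ and $\tilde{\eta}(t)$ (hypothetical symbols) for the bandpass-filtered song and noise components, and uses the hypothetical operators $\mathcal{L}\{\cdot\}$ and $\mathcal{H}\{\cdot\}$ for linear lowpass and highpass filtering. $\mathcal{H}$ is assumed to have zero gain at 0\,Hz once transients have decayed. Since rectification satisfies $|\sca\,x|=\sca\,|x|$ for $\sca>0$, the noise-free case ($\noc(t)=0$) gives
\begin{align*}
\env(t) &= \mathcal{L}\big\{\lvert\sca\,\tilde{s}(t)\rvert\big\}
          = \sca\,\mathcal{L}\big\{\lvert\tilde{s}(t)\rvert\big\},\\
\log\env(t) &= \log\sca + \log\mathcal{L}\big\{\lvert\tilde{s}(t)\rvert\big\},\\
\adapt(t) &= \mathcal{H}\big\{\log\env(t)\big\}
           = \mathcal{H}\big\{\log\mathcal{L}\{\lvert\tilde{s}(t)\rvert\}\big\}.
\end{align*}
The multiplicative scale thus becomes an additive constant, which the highpass removes. With noise, $\env(t)=\mathcal{L}\{\lvert\sca\,\tilde{s}(t)+\tilde{\eta}(t)\rvert\}$ no longer factorizes, so $\log\sca$ cannot be split off. Moreover, $\frac{\mathrm{d}}{\mathrm{d}x}\log x=1/x$. Equal absolute amplitude differences are therefore weighted more strongly at small amplitudes, which is consistent with the emphasis on song onsets and the noise floor noted above.
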
\textbf{Thresh-LP: Implication for intensity invariance:}\\
- Role of song periodicity for feature representation!

\newpage
\section{Appendix}
@@ -1716,7 +1863,7 @@ initiation of one behavior over another is categorical (e.g. approach/stay)
$\noc(t)$ within the signal envelope $\env(t)$ over scale
$\sca$. Based on input $\raw(t)$ with $\sigma_{\eta}=1$
(corresponding to the analysis underlying
Fig.\,\ref{fig:rect-lp}), using 100 random realizations of
$\noc(t)$.}
\label{fig:app_env-sd}
\end{figure}% Referenced.