Kind of done drafting the discussion. Needs polishing.
240
main.tex
@@ -258,14 +258,15 @@ substrate for conspecific song recognition and response
|
|||||||
initiation~(\bcite{ronacher1986routes}; \bcite{bauer1987separate};
|
initiation~(\bcite{ronacher1986routes}; \bcite{bauer1987separate};
|
||||||
\bcite{bhavsar2017brain}).
|
\bcite{bhavsar2017brain}).
|
||||||
|
|
||||||
Functionally, the ascending neurons are the most diverse of the three neuronal
|
Around 15 to 20 ascending neurons have been identified in the grasshopper
|
||||||
populations. Around 15 to 20 ascending neurons have been identified in the
|
auditory system~(\bcite{stumpner1991auditory}), whose functional
|
||||||
grasshopper auditory system~(\bcite{stumpner1991auditory}). Individual
|
characteristics are conserved even between species that are not closely
|
||||||
ascending neurons possess highly specific response properties that contrast
|
related~(\bcite{neuhofer2008evolutionarily}). The population of ascending
|
||||||
with the rather homogeneous response properties of the preceding receptor
|
neurons possesses a diverse range of response properties that contrasts with
|
||||||
neurons and local interneurons~(\bcite{clemens2011efficient}), which indicates
|
the rather homogeneous responses of receptor neurons and local
|
||||||
a transition from a uniform population-wide processing stream into several
|
interneurons~(\bcite{clemens2011efficient}), which suggests a transition from a
|
||||||
parallel branches. Accordingly, the model pathway is divided into two distinct
|
uniform population-wide processing stream into several parallel branches.
|
||||||
|
Accordingly, the model pathway is divided into two distinct
|
||||||
stages~(Fig.\,\ref{fig:pathway}d): The preprocessing stage incorporates the
|
stages~(Fig.\,\ref{fig:pathway}d): The preprocessing stage incorporates the
|
||||||
processing steps at the levels of the tympanal membrane, the receptor neurons,
|
processing steps at the levels of the tympanal membrane, the receptor neurons,
|
||||||
and the local interneurons; and operates on one-dimensional signal
|
and the local interneurons; and operates on one-dimensional signal
|
||||||
@@ -275,6 +276,26 @@ downstream towards the SEG; and operates on high-dimensional signal
|
|||||||
representations~(Fig.\,\ref{fig:stages_feat}). The details of each
|
representations~(Fig.\,\ref{fig:stages_feat}). The details of each
|
||||||
physiological processing step and its functional approximation are described in
|
physiological processing step and its functional approximation are described in
|
||||||
the following sections.
|
the following sections.
|
||||||
|
|
||||||
|
Around 15 to 20 ascending neurons have been identified in the grasshopper
|
||||||
|
auditory system~(\bcite{stumpner1991auditory}), whose functional
|
||||||
|
characteristics are conserved even between species that are not closely
|
||||||
|
related~(\bcite{neuhofer2008evolutionarily}). The population of ascending
|
||||||
|
neurons possesses a diverse range of response properties that contrasts with
|
||||||
|
the rather homogeneous responses of receptor neurons and local
|
||||||
|
interneurons~(\bcite{clemens2011efficient}), which suggests a transition from a
|
||||||
|
uniform population-wide processing stream into several parallel branches.
|
||||||
|
Accordingly, the model pathway is divided into two distinct
|
||||||
|
stages~(Fig.\,\ref{fig:pathway}d): The preprocessing stage incorporates the
|
||||||
|
processing steps at the levels of the tympanal membrane, the receptor neurons,
|
||||||
|
and the local interneurons; and operates on one-dimensional signal
|
||||||
|
representations~(Fig.\,\ref{fig:stages_pre}). The feature extraction stage
|
||||||
|
corresponds to the processing within the ascending neurons and further
|
||||||
|
downstream towards the SEG; and operates on high-dimensional signal
|
||||||
|
representations~(Fig.\,\ref{fig:stages_feat}). The details of each
|
||||||
|
physiological processing step and its functional approximation are described in
|
||||||
|
the following sections.
|
||||||
|
|
||||||
\begin{figure}[!ht]
|
\begin{figure}[!ht]
|
||||||
\centering
|
\centering
|
||||||
\includegraphics[width=\textwidth]{figures/fig_auditory_pathway.pdf}
|
\includegraphics[width=\textwidth]{figures/fig_auditory_pathway.pdf}
|
||||||
@@ -1549,6 +1570,7 @@ have not been subject to decades of study will likely not be suitable for this
|
|||||||
approach yet.
|
approach yet.
|
||||||
|
|
||||||
\subsection{Feature representation, temporal averaging, and song design}
|
\subsection{Feature representation, temporal averaging, and song design}
|
||||||
|
\label{sec:constant_feat}
|
||||||
|
|
||||||
The feature set is the final song representation along the model pathway and
|
The feature set is the final song representation along the model pathway and
|
||||||
constitutes the basis for song recognition. Each feature $f_i(t)$ results from
|
constitutes the basis for song recognition. Each feature $f_i(t)$ results from
|
||||||
@@ -1703,12 +1725,15 @@ lowpass filter --- the lower $\fc$, the higher the SNR of $\env(t)$ at a given
|
|||||||
$\sca$. However, $\fc$ must not be too low to avoid the attenuation of relevant
|
$\sca$. However, $\fc$ must not be too low to avoid the attenuation of relevant
|
||||||
amplitude dynamics of the song pattern. The saturation level of $\adapt$,
|
amplitude dynamics of the song pattern. The saturation level of $\adapt$,
|
||||||
unlike its saturation point, is independent of the SNR of $\env(t)$ because the
|
unlike its saturation point, is independent of the SNR of $\env(t)$ because the
|
||||||
influence of $\noc(t)$ is negligible for sufficiently large $\sca$. Both the
|
influence of $\noc(t)$ is negligible for sufficiently large $\sca$. The output
|
||||||
saturation level and the saturation point of $\adapt(t)$ vary between different
|
SNR of $\adapt(t)$ saturates at a comparatively low value of around 10. This might
|
||||||
species and specific songs. These differences are likely rooted in the way in
|
in part be a consequence of the logarithm, which compresses higher
|
||||||
which logarithmic compression acts on the specific distribution of $\env(t)$,
|
intensities but also amplifies lower intensities, including the noise floor.
|
||||||
which is determined by $\fc$ and the structure and frequency spectrum of the
|
Both the saturation level and the saturation point of $\adapt(t)$ vary between
|
||||||
rectified $\filt(t)$.
|
different species and individual songs. These differences are likely rooted in
|
||||||
|
the way in which logarithmic compression acts on the specific distribution of
|
||||||
|
$\env(t)$, which is determined by $\fc$ as well as the temporal structure and
|
||||||
|
frequency spectrum of the rectified $\filt(t)$.
|
||||||
|
|
||||||
Thresholding and temporal averaging render feature $f_i(t)$
|
Thresholding and temporal averaging render feature $f_i(t)$
|
||||||
intensity-invariant for sufficiently large $\sca$. The trade-off between
|
intensity-invariant for sufficiently large $\sca$. The trade-off between
|
||||||
@@ -1720,71 +1745,103 @@ invariance by the previous mechanism is neglected. The SNR of $f_i(t)$ is
|
|||||||
therefore determined solely by the pure-noise response of $f_i(t)$. The
|
therefore determined solely by the pure-noise response of $f_i(t)$. The
|
||||||
distribution $\pci$ of the pure-noise kernel response $c_i(t)$ is largely a
|
distribution $\pci$ of the pure-noise kernel response $c_i(t)$ is largely a
|
||||||
normal distribution with mean $\mu\approx0$ for all kernels $k_i(t)$. The value
|
normal distribution with mean $\mu\approx0$ for all kernels $k_i(t)$. The value
|
||||||
of the pure-noise $f_i(t)$ is hence 0.5 for $\thr=0$ and decreases for larger
|
of the pure-noise $f_i(t)$ is hence 0.5 for $\thr=0$ and decreases for higher
|
||||||
$\thr$. If $\thr$ is set above the maximum of $c_i(t)$, the pure-noise feature
|
$\thr$. If $\thr$ is set above the maximum of $c_i(t)$, the pure-noise feature
|
||||||
value is 0, which results in an "unlimited" SNR of $f_i(t)$ at the cost of a
|
value is 0, which results in an "unlimited" SNR of $f_i(t)$. In this case, any
|
||||||
higher saturation point. In this case, any non-zero feature value that is
|
non-zero feature value that is sustained for a sufficient duration could serve
|
||||||
sustained for a sufficient duration could serve as an indicator for the presence
|
as an indicator for the presence of $\soc(t)$, although at the cost of a higher
|
||||||
of $\soc(t)$ in addition to $\noc(t)$. This requires a fine evolutionary tuning
|
saturation point. The maximum of the pure-noise $c_i(t)$ is assumed to be very
|
||||||
of $\thr$ to the properties of both the species-specific song and the natural
|
small due to the various SNR improvements along the pathway, so that the
|
||||||
|
required increase in $\thr$ and hence the saturation point of $f_i(t)$ is not
|
||||||
|
expected to be substantial. However, exploiting the capacity of $f_i(t)$ for
|
||||||
|
arbitrarily high SNR would certainly require a fine evolutionary tuning of
|
||||||
|
$\thr$ to the properties of both the species-specific song and the natural
|
||||||
noise in a certain habitat.
|
noise in a certain habitat.
|
||||||
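The dependence of the pure-noise feature value on $\thr$ described above can be illustrated with a minimal sketch. It assumes that the pure-noise kernel response $c_i(t)$ is exactly Gaussian with zero mean (the text only states "largely" normal with $\mu\approx0$), and that temporal averaging of the binarized response over a long window approaches the fraction of time spent above threshold. The function names, the unit standard deviation, and the sample count are hypothetical choices for illustration, not part of the model pathway.

```python
import math
import random


def expected_noise_feature(theta, sigma=1.0):
    """Probability that zero-mean Gaussian noise exceeds the threshold theta.

    Under the stated assumptions this is the long-run value of the
    thresholded and temporally averaged pure-noise response.
    """
    return 0.5 * math.erfc(theta / (sigma * math.sqrt(2.0)))


def simulated_noise_feature(theta, sigma=1.0, n=100_000, seed=0):
    """Monte Carlo estimate: fraction of noise samples above theta."""
    rng = random.Random(seed)
    above = sum(rng.gauss(0.0, sigma) > theta for _ in range(n))
    return above / n


if __name__ == "__main__":
    # Thresholds are given in units of the noise standard deviation.
    for theta in (0.0, 0.5, 1.0, 2.0, 4.0):
        print(theta, expected_noise_feature(theta), simulated_noise_feature(theta))
```

For a threshold of zero the expected value is 0.5, and it decreases monotonically towards 0 as the threshold is raised. For unbounded Gaussian noise it never reaches exactly 0, so a pure-noise feature value of 0 presupposes a bounded noise response in a finite recording.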
|
|
||||||
|
|
||||||
It seems reasonable to assume that $\thr$ is one of the parameters along the
|
|
||||||
pathway
|
|
||||||
|
|
||||||
Physiologically, it is presumably easier to
|
|
||||||
manipulate $\thr$
|
|
||||||
|
|
||||||
|
|
||||||
It seems reasonable that $\thr$ is easier to
|
|
||||||
manipulate in ev
|
|
||||||
|
|
||||||
|
|
||||||
Furthermore, $\thr$ is presumably a parameter along
|
|
||||||
the pathway that
|
|
||||||
|
|
||||||
|
|
||||||
$\thr$
|
|
||||||
|
|
||||||
|
|
||||||
Furthermore, $\thr$ might be one of the parameters
|
|
||||||
along the pathway
|
|
||||||
|
|
||||||
|
|
||||||
|
|
||||||
% However, the parameters that determine the SNR of $\adapt(t)$ are much less
|
|
||||||
% understood and likely relate to properties of the signal, whereas the SNR of
|
|
||||||
% $f(t)$ depends on the choice of $\Theta$ and can be more directly manipulated
|
|
||||||
% by the system.
|
|
||||||
|
|
||||||
\newpage
|
\newpage
|
||||||
\textbf{Thresh-LP: Implication for intensity invariance:}\\
|
|
||||||
|
|
||||||
- Suggests a relatively simple rule for optimal choice of threshold value $\thr$:\\
|
|
||||||
$\rightarrow$ Find amplitude $c_i$ that maximizes absolute derivative of $c_i(t)$ over time\\
|
|
||||||
$\rightarrow$ Optimal with respect to intensity invariance of $f_i(t)$, not necessarily for
|
|
||||||
other criteria such as song-noise separation or diversity between features
|
|
||||||
|
|
||||||
- Nonlinear operations can be used to detach representations from graded physical
|
|
||||||
stimulus (to facilitate categorical behavioral decision-making?):\\
|
|
||||||
1) Capture sufficiently precise amplitude information: $\env(t)$, $\adapt(t)$\\
|
|
||||||
$\rightarrow$ Closely following the AM of the acoustic stimulus\\
|
|
||||||
2) Quantify relevant stimulus properties on a graded scale: $c_i(t)$\\
|
|
||||||
$\rightarrow$ More decorrelated representation, compared to prior stages\\
|
|
||||||
3) Nonlinearity: Distinguish between "relevant vs irrelevant" values: $b_i(t)$\\
|
|
||||||
$\rightarrow$ Trading a graded scale for two or more categorical states\\
|
|
||||||
4) Represent stimulus properties under relevance constraint: $f_i(t)$\\
|
|
||||||
$\rightarrow$ Graded again but highly decorrelated from the acoustic stimulus\\
|
|
||||||
5) Categorical behavioral decision-making requires further nonlinearities\\
|
|
||||||
$\rightarrow$ Parameters of a behavioral response may be graded (e.g. approach speed),
|
|
||||||
initiation of one behavior over another is categorical (e.g. approach/stay)
|
|
||||||
|
|
||||||
\subsection{Intensity invariance versus intensity invariance}
|
\subsection{Intensity invariance versus intensity invariance}
|
||||||
|
|
||||||
|
Two consecutive mechanisms of intensity invariance do not necessarily add up to
|
||||||
|
a stronger overall intensity invariance. If the first mechanism results in a
|
||||||
|
lower saturation point than the second mechanism by itself, the saturation
|
||||||
|
point of feature $f_i(t)$ will be determined solely by the first mechanism. In
|
||||||
|
this case, the saturation level of $f_i(t)$ will conform to the intensity that
|
||||||
|
$f_i(t)$ can reach for the given saturation point rather than the intrinsic
|
||||||
|
saturation level of $f_i(t)$. Conversely, if the second mechanism results in a
|
||||||
|
lower saturation point than the first mechanism, both the saturation point and
|
||||||
|
the saturation level of $f_i(t)$ will be determined by the second mechanism.
|
||||||
|
The saturation points of $f_i(t)$ across the set are distributed over a much
|
||||||
|
wider range than those of the preceding kernel responses $c_i(t)$, which
|
||||||
|
suggests that the interaction between the two mechanisms is specific to
|
||||||
|
individual kernels $k_i(t)$. A number of $f_i(t)$ achieve a lower saturation
|
||||||
|
point than the respective $c_i(t)$, while some $f_i(t)$ exhibit similar or only
|
||||||
|
marginally lower saturation points. This raises the question whether two
|
||||||
|
consecutive mechanisms of intensity invariance are actually beneficial for the
|
||||||
|
overall system.
|
||||||
|
|
||||||
|
From a purely functional perspective, the answer could be that logarithmic
|
||||||
|
compression and adaptation is a necessary preprocessing step towards a robust
|
||||||
|
feature representation, even if thresholding and temporal averaging alone would
|
||||||
|
be sufficient to render $f_i(t)$ intensity-invariant. This preprocessing likely
|
||||||
|
improves the temporal regularity of the song pattern in $\adapt(t)$ and
|
||||||
|
$c_i(t)$, which is required for constant $f_i(t)$ across the duration of a
|
||||||
|
song~(Section\,\ref{sec:constant_feat}). It also ensures consistency between
|
||||||
|
the distribution $\pci$ of $c_i(t)$ across songs of different intensity, which
|
||||||
|
is essential for the generation of consistent species-specific $f_i(t)$ under a
|
||||||
|
static $\thr$. From a physiological perspective, the answer is likely that
|
||||||
|
neurons possess only a limited firing rate for encoding stimulus intensities
|
||||||
|
that can range over several orders of magnitude. Sigmoidal tuning curves over
|
||||||
|
logarithmically compressed stimulus intensities are a common property of
|
||||||
|
sensory neurons across various modalities~(SOURCE?), and neurons of the
|
||||||
|
grasshopper auditory system are no exception~(\bcite{suga1960peripheral};
|
||||||
|
\bcite{gollisch2002energy}).
|
||||||
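The claim that logarithmic compression and adaptation keep the distribution of the downstream signal consistent across song intensities can be illustrated with a small noise-free sketch. It relies on the identity $\log(s\,x) = \log s + \log x$: scaling the envelope by a factor $s$ only adds a constant offset after the logarithm, which a subsequent adaptation step can remove. Here adaptation is crudely approximated by subtracting the mean over the whole signal, which is an assumption made for illustration and not the adaptation mechanism of the model pathway. The function name and the example values are hypothetical.

```python
import math


def log_compress_and_adapt(envelope, scale):
    """Scale an envelope, take the logarithm, and remove the mean.

    Mean subtraction is a simplified stand-in for adaptation.
    All envelope values must be strictly positive.
    """
    logged = [math.log(scale * x) for x in envelope]
    mean = sum(logged) / len(logged)
    return [value - mean for value in logged]


if __name__ == "__main__":
    envelope = [0.2, 1.0, 0.5, 2.0, 0.1]
    quiet = log_compress_and_adapt(envelope, scale=1.0)
    loud = log_compress_and_adapt(envelope, scale=1000.0)
    # The two outputs agree up to floating-point error.
    print(max(abs(a - b) for a, b in zip(quiet, loud)))
```

In this idealized case the output is independent of the scale factor, so a static threshold applied downstream would act on the same value distribution at every intensity. With additive noise the identity no longer holds at low intensities, which is where the saturation point comes into play.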
|
|
||||||
\subsection{Implications for behavior in a natural acoustic environment}
|
\subsection{Implications for behavior in a natural acoustic environment}
|
||||||
|
|
||||||
% RIPPED FROM INTRODUCTION:
|
Most grasshoppers live in environments that are communally inhabited by
|
||||||
|
numerous individuals from multiple species. Their acoustic environment is
|
||||||
|
characterized by noise from various sources --- abiotic ones like wind and
|
||||||
|
water, but also the songs of both hetero- and conspecifics. This limits the SNR
|
||||||
|
that each individual can achieve for its own song, and hence the effectiveness
|
||||||
|
of the intensity-invariant processing in the auditory system. Producing higher
|
||||||
|
song intensities is not a viable solution to this problem, because these also
|
||||||
|
contribute to the overall noise floor. A possible behavioral solution could be
|
||||||
|
to produce songs in a "turn-taking" manner to avoid the temporal superposition
|
||||||
|
of multiple songs into overly intense signals. This would also prevent the
|
||||||
|
mutual distortion of the respective song pattern. Another solution could be to
|
||||||
|
spatially separate from other nearby grasshoppers to spread the potential noise
|
||||||
|
sources over a larger area. However, according to our analysis based on field
|
||||||
|
recordings as well as previous work on the topic~(\bcite{lang2000acoustic}),
|
||||||
|
reliable song recognition is limited to little more than 1\,m from the sender,
|
||||||
|
so that a grasshopper also cannot afford to stay too far away from its
|
||||||
|
conspecifics. A better solution may hence be to collectively produce songs at
|
||||||
|
lower-than-possible intensities, which would reduce the overall noise floor for
|
||||||
|
all nearby individuals. Importantly, the limitation of intensity invariance by
|
||||||
|
SNR likely applies to all grasshoppers regardless of species, so that the
|
||||||
|
behavioral strategies could be shared among the species that coexist in a given
|
||||||
|
habitat.
|
||||||
|
|
||||||
|
% Because the presumed restriction of song recognition
|
||||||
|
% by means of the noise floor applies to all grasshoppers in a certain area,
|
||||||
|
% these strategies may not be specific to some of the species at this location.
|
||||||
|
% Instead, they must be shared by all grasshopper species that coexist within a
|
||||||
|
% portion of a given habitat, which would provide an important implication for
|
||||||
|
% the evolution of grasshopper songs in communities of multiple species.
|
||||||
|
|
||||||
|
%%% RELICS OF INTRODUCTION %%%
|
||||||
|
% - Nonlinear operations can be used to detach representations from graded physical
|
||||||
|
% stimulus (to facilitate categorical behavioral decision-making?):\\
|
||||||
|
% 1) Capture sufficiently precise amplitude information: $\env(t)$, $\adapt(t)$\\
|
||||||
|
% $\rightarrow$ Closely following the AM of the acoustic stimulus\\
|
||||||
|
% 2) Quantify relevant stimulus properties on a graded scale: $c_i(t)$\\
|
||||||
|
% $\rightarrow$ More decorrelated representation, compared to prior stages\\
|
||||||
|
% 3) Nonlinearity: Distinguish between "relevant vs irrelevant" values: $b_i(t)$\\
|
||||||
|
% $\rightarrow$ Trading a graded scale for two or more categorical states\\
|
||||||
|
% 4) Represent stimulus properties under relevance constraint: $f_i(t)$\\
|
||||||
|
% $\rightarrow$ Graded again but highly decorrelated from the acoustic stimulus\\
|
||||||
|
% 5) Categorical behavioral decision-making requires further nonlinearities\\
|
||||||
|
% $\rightarrow$ Parameters of a behavioral response may be graded (e.g. approach speed),
|
||||||
|
% initiation of one behavior over another is categorical (e.g. approach/stay)
|
||||||
|
|
||||||
% Multi-species, multi-individual communally inhabited environments\\
|
% Multi-species, multi-individual communally inhabited environments\\
|
||||||
% - Temporal overlap: Simultaneous singing across individuals/species common\\
|
% - Temporal overlap: Simultaneous singing across individuals/species common\\
|
||||||
@@ -1802,53 +1859,12 @@ initiation of one behavior over another is categorical (e.g. approach/stay)
|
|||||||
% recognize the ones produced by conspecifics, and make appropriate behavioral
|
% recognize the ones produced by conspecifics, and make appropriate behavioral
|
||||||
% decisions based on context (sender identity, song type, mate/rival quality)
|
% decisions based on context (sender identity, song type, mate/rival quality)
|
||||||
|
|
||||||
% How can the auditory system of grasshoppers meet these challenges?\\
|
|
||||||
% - What are the minimum functional processing steps required?\\
|
|
||||||
% - Which known neuronal mechanisms can implement these steps?\\
|
|
||||||
% - Which and how many stages along the auditory pathway contribute?\\
|
|
||||||
% $\rightarrow$ What are the limitations of the system as a whole?
|
|
||||||
|
|
||||||
% How can a human observer conceive a grasshopper's auditory percepts?\\
|
% How can a human observer conceive a grasshopper's auditory percepts?\\
|
||||||
% - How to investigate the workings of the auditory pathway as a whole?\\
|
% - How to investigate the workings of the auditory pathway as a whole?\\
|
||||||
% - How to systematically test effects and interactions of processing parameters?\\
|
% - How to systematically test effects and interactions of processing parameters?\\
|
||||||
% - How to integrate the available knowledge on anatomy, physiology, ethology?\\
|
% - How to integrate the available knowledge on anatomy, physiology, ethology?\\
|
||||||
% $\rightarrow$ Abstract, simplify, formalize $\rightarrow$ Functional model framework
|
% $\rightarrow$ Abstract, simplify, formalize $\rightarrow$ Functional model framework
|
||||||
|
|
||||||
\textbf{Differences between the model pathway and the previous framework:}
|
|
||||||
In the first step, a bank of parallel linear-nonlinear feature detectors is
|
|
||||||
applied to the input signal. Each feature detector consists of a convolutional
|
|
||||||
filter and a subsequent sigmoidal nonlinearity. The outputs of these feature
|
|
||||||
detectors are temporally averaged to obtain a single feature value per
|
|
||||||
detector, which is then assigned a specific weight. The linear combination of
|
|
||||||
weighted feature values results in a single preference value that serves as a
|
|
||||||
predictor for the behavioral response of the animal to the presented input
|
|
||||||
signal. Our model pathway adopts the general structure of the existing
|
|
||||||
framework but modifies it in several key aspects. The convolutional filters,
|
|
||||||
which have previously been fitted to behavioral data for each individual
|
|
||||||
species~(\bcite{clemens2013computational}), are replaced by a larger, generic
|
|
||||||
set of unfitted Gabor basis functions in order to cover a wide range of
|
|
||||||
possible song features across different species. Gabor functions approximate
|
|
||||||
the general structure of the filters used in the existing framework as well as
|
|
||||||
the filter functions found in various auditory neurons~(\bcite{rokem2006spike};
|
|
||||||
\bcite{clemens2011efficient}; \bcite{clemens2012nonlinear}). The fitted
|
|
||||||
sigmoidal nonlinearities in the existing framework consistently exhibited very
|
|
||||||
steep slopes and are therefore replaced by shifted Heaviside step-functions,
|
|
||||||
which results in a binarization of the feature detector outputs. Another, more
|
|
||||||
substantial modification is that the feature detector outputs are temporally
|
|
||||||
averaged in a way that does not condense them into single feature values but
|
|
||||||
retains their time-varying structure. This is in line with the fact that songs
|
|
||||||
are not discrete units but part of a continuous acoustic stream that the
|
|
||||||
auditory system has to process in real time. Moreover, a time-varying feature
|
|
||||||
representation only stabilizes after a certain delay following the onset of a
|
|
||||||
song, which emphasizes the temporal dynamics of evidence accumulation towards a
|
|
||||||
final categorical decision. The most notable difference between our model
|
|
||||||
pathway and the existing framework, however, lies in the addition of a
|
|
||||||
physiologically inspired preprocessing stage, whose starting point corresponds
|
|
||||||
to the initial reception of airborne sound waves. This allows the model to
|
|
||||||
operate on unmodified recordings of natural grasshopper songs instead of
|
|
||||||
condensed pulse train approximations, which widens its scope towards more
|
|
||||||
realistic, ecologically relevant scenarios.
|
|
||||||
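The feature extraction described in this paragraph (Gabor kernel, shifted Heaviside step, temporal averaging that retains a time-varying output) can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation used in the manuscript: the temporal averaging is assumed to be a first-order lowpass filter, and all function names and parameter values (`sigma`, `freq`, `tau`, `dt`) are hypothetical.

```python
import math


def gabor_kernel(sigma, freq, dt, phase=0.0):
    """Gabor function: Gaussian envelope times a cosine carrier (+-3 sigma)."""
    half = int(round(3 * sigma / dt))
    return [
        math.exp(-0.5 * (i * dt / sigma) ** 2)
        * math.cos(2 * math.pi * freq * i * dt + phase)
        for i in range(-half, half + 1)
    ]


def convolve_same(signal, kernel):
    """Centered convolution with zero padding; output has the input length."""
    half = len(kernel) // 2
    out = []
    for n in range(len(signal)):
        acc = 0.0
        for j, k in enumerate(kernel):
            idx = n - (j - half)
            if 0 <= idx < len(signal):
                acc += k * signal[idx]
        out.append(acc)
    return out


def feature(signal, kernel, theta, tau, dt):
    """Kernel response c(t) -> binary b(t) -> lowpass-filtered feature f(t)."""
    c = convolve_same(signal, kernel)
    b = [1.0 if value > theta else 0.0 for value in c]  # shifted Heaviside step
    alpha = dt / (tau + dt)  # first-order lowpass (assumed averaging scheme)
    f, state = [], 0.0
    for value in b:
        state += alpha * (value - state)
        f.append(state)
    return f


if __name__ == "__main__":
    dt = 0.001
    signal = [math.sin(2 * math.pi * 50 * i * dt) for i in range(500)]
    kernel = gabor_kernel(sigma=0.01, freq=50.0, dt=dt)
    f = feature(signal, kernel, theta=0.0, tau=0.05, dt=dt)
    print(f[-1])
```

Because the lowpass filter starts from zero, the feature only settles after a delay on the order of `tau` following signal onset, which mirrors the stabilization delay mentioned in the text.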
|
|
||||||
\newpage
|
\newpage
|
||||||
\section{Appendix}
|
\section{Appendix}
|
||||||
|
|
||||||
|
|||||||