\documentclass[a4paper, 12pt]{article}

\usepackage[left=2.5cm,right=2.5cm,top=2cm,bottom=2cm,includeheadfoot]{geometry}
\usepackage[onehalfspacing]{setspace}
\usepackage{graphicx}
\usepackage{svg}
\usepackage{import}
\usepackage{float}
\usepackage{placeins}
\usepackage{parskip}
\usepackage{amsmath}
\usepackage{amssymb}
\usepackage{subcaption}
\usepackage[labelfont=bf, textfont=small]{caption}
\usepackage[separate-uncertainty=true, locale=DE]{siunitx}
\sisetup{output-exponent-marker=\ensuremath{\mathrm{e}}}
% \usepackage[capitalize]{cleveref}
% \crefname{figure}{Fig.}{Figs.}
% \crefname{equation}{Eq.}{Eqs.}
% \creflabelformat{equation}{#2#1#3}
\usepackage[
backend=biber,
style=authoryear,
pluralothers=true,
maxcitenames=1,
mincitenames=1
]{biblatex}
\addbibresource{cite.bib}

\title{Emergent intensity invariance in a physiologically inspired model of the grasshopper auditory system}
\author{Jona Hartling, Jan Benda}
\date{}

\begin{document}
\maketitle{}

% Text references and citations:
\newcommand{\bcite}[1]{\mbox{\cite{#1}}}
% \newcommand{\fref}[1]{\mbox{\cref{#1}}}
% \newcommand{\fref}[1]{\mbox{Fig.\,\ref{#1}}}
% \newcommand{\eref}[1]{\mbox{\cref{#1}}}
% \newcommand{\eref}[1]{\mbox{Eq.\,\ref{#1}}}

% Math shorthands - Standard symbols:
\newcommand{\dec}{\log_{10}} % Logarithm base 10
\newcommand{\infint}{\int_{-\infty}^{+\infty}} % Integral over the real line

% Math shorthands - Spectral filtering:
\newcommand{\bp}{h_{\text{BP}}(t)} % Bandpass filter function
\newcommand{\lp}{h_{\text{LP}}(t)} % Lowpass filter function
\newcommand{\hp}{h_{\text{HP}}(t)} % Highpass filter function
\newcommand{\fc}{f_{\text{cut}}} % Filter cutoff frequency
\newcommand{\tlp}{T_{\text{LP}}} % Lowpass filter averaging interval
\newcommand{\thp}{T_{\text{HP}}} % Highpass filter adaptation interval

% Math shorthands - Early representations:
\newcommand{\raw}{x} % Placeholder input signal
\newcommand{\filt}{\raw_{\text{filt}}} % Bandpass-filtered signal
\newcommand{\env}{\raw_{\text{env}}} % Signal envelope
\newcommand{\db}{\raw_{\text{dB}}} % Logarithmically scaled signal
\newcommand{\dbref}{\raw_{\text{ref}}} % Decibel reference intensity
\newcommand{\adapt}{\raw_{\text{adapt}}} % Adapted signal

% Math shorthands - Kernel parameters:
\newcommand{\kw}{\sigma} % Unspecific Gabor kernel width
\newcommand{\kf}{\omega} % Unspecific Gabor kernel frequency
\newcommand{\kp}{\phi} % Unspecific Gabor kernel phase
\newcommand{\kn}{n} % Unspecific Gabor kernel lobe number
\newcommand{\ks}{s} % Unspecific Gabor kernel sign
\newcommand{\kwi}{\kw_i} % Specific Gabor kernel width
\newcommand{\kfi}{\kf_i} % Specific Gabor kernel frequency
\newcommand{\kpi}{\kp_i} % Specific Gabor kernel phase
\newcommand{\kni}{\kn_i} % Specific Gabor kernel lobe number
\newcommand{\ksi}{\ks_i} % Specific Gabor kernel sign
\newcommand{\rh}{\text{RH}} % Relative Gaussian height for FWRH
\newcommand{\fwrh}{\text{FWRH}} % Gaussian full-width at relative height

% Math shorthands - Threshold nonlinearity:
\newcommand{\thr}{\Theta_i} % Step function threshold value
\newcommand{\nl}{H(c_i\,-\,\thr)} % Shifted Heaviside step function

% Math shorthands - Minor symbols and helpers:
\newcommand{\svar}{\sigma_{\text{s}}^{2}} % Song signal variance
\newcommand{\nvar}{\sigma_{\eta}^{2}} % Noise signal variance
\newcommand{\pc}{p(c_i,\,T)} % Probability density (general interval)
\newcommand{\pclp}{p(c_i,\,\tlp)} % Probability density (lowpass interval)

\section{Exploring a grasshopper's sensory world}

% Why functional models of sensory systems?
Our scientific understanding of sensory processing systems results from the
distributed accumulation of anatomical, physiological and ethological evidence.
There is no real alternative to this process; however, it leaves us with the
challenge of integrating the available fragments into a coherent whole in order
to address issues such as the interaction between individual system components,
the functional limitations of the system overall, or taxonomic comparisons
between systems that process the same sensory modality. Any unified framework
that captures the essential functional aspects of a given sensory system thus
has the potential to deepen our current understanding and facilitate systematic
investigations. However, building such a framework is a challenging task. It
requires a wealth of existing knowledge of the system and the signals it
operates on, a clearly defined scope, and careful reduction, abstraction, and
formalization of the underlying structures and mechanisms.

% Why the grasshopper auditory system?
% Why focus on song recognition among other auditory functions?
One sensory system about which extensive information has been gathered over the
years is the auditory system of grasshoppers~(\textit{Acrididae}). Grasshoppers
rely on their sense of hearing primarily for intraspecific communication, which
includes mate attraction~(\bcite{helversen1972gesang}) and
evaluation~(\bcite{stange2012grasshopper}), sender
localization~(\bcite{helversen1988interaural}), courtship
display~(\bcite{elsner1968neuromuskularen}), rival
deterrence~(\bcite{greenfield1993acoustic}), and loss-of-signal predator
alarm~(SOURCE). In accordance with this rich behavioral repertoire,
grasshoppers have evolved a variety of sound production mechanisms to generate
acoustic communication signals for different contexts and ranges using their
wings, hindlegs, or mandibles~(\bcite{otte1970comparative}). Among the most
conspicuous acoustic signals of grasshoppers are their species-specific calling
songs, which broadcast the presence of the singing individual --- mostly the
males of the species --- to potential mates within range. These songs are
usually more characteristic of a species than morphological
traits~(\bcite{tishechkin2016acoustic}; \bcite{tarasova2021eurasius}), which
can vary greatly within species~(\bcite{rowell1972variable};
\bcite{kohler2017morphological}). The reliance on songs to mediate reproduction
represents a strong evolutionary driving force that has resulted in a massive
species diversification~(\bcite{vedenina2011speciation};
\bcite{sevastianov2023evolution}), with over 6800 recognized grasshopper
species in the \textit{Acrididae} family~(\bcite{cigliano2024orthoptera}). It
is this diversity of species, and the crucial role of acoustic communication in
its emergence, that makes the grasshopper auditory system an intriguing
candidate for attempting to construct a functional model framework. As a
necessary reduction, the model we propose here focuses on the pathway
responsible for the recognition of species-specific calling songs, disregarding
other essential auditory functions such as directional
hearing~(\bcite{helversen1984parallel}; \bcite{ronacher1986routes};
\bcite{helversen1988interaural}).

% What are the signals the auditory system is supposed to recognize?
% Why is intensity invariance important for song recognition?
To understand the functional challenges faced by the grasshopper auditory
system, one has to understand the properties of the songs it is designed to
recognize. Grasshopper songs are amplitude-modulated broad-band acoustic
signals. Most songs are produced by stridulation, during which the animal pulls
the serrated stridulatory file on its hindlegs across a resonating vein on the
forewings~(\bcite{helversen1977stridulatory}; \bcite{stumpner1994song};
\bcite{helversen1997recognition}). Every tooth that strikes the vein generates
a brief pulse of sound. Multiple pulses make up a syllable; and the alternation
of syllables and relatively quiet pauses forms a characteristic, though noisy,
waveform pattern. Song recognition depends on certain temporal and structural
parameters of this pattern, such as the duration of syllables and
pauses~(\bcite{helversen1972gesang}), the slope of pulse
onsets~(\bcite{helversen1993absolute}), and the accentuation of syllable onsets
relative to the preceding pause~(\bcite{balakrishnan2001song};
\bcite{helversen2004acoustic}). The amplitude modulation of the song is
sufficient for recognition~(\bcite{helversen1997recognition}).

However, the essential recognition cues can vary considerably with external
physical factors, which requires the auditory system to be invariant to such
variations in order to reliably recognize songs under different conditions. For
instance, the temporal structure of grasshopper songs warps with
temperature~(\bcite{skovmand1983song}). The auditory system can compensate for
this variability by reading out relative temporal relationships rather than
absolute time intervals~(\bcite{creutzig2009timescale};
\bcite{creutzig2010timescale}), as those remain relatively constant across
different temperatures~(\bcite{helversen1972gesang}). Another, perhaps even
more fundamental external source of song variability lies in the attenuation of
sound intensity with increasing distance to the sender. Sound attenuation
depends on both the frequency content of the signal and the vegetation of the
habitat~(\bcite{michelsen1978sound}). For the receiving auditory system, this
has two major implications. First, the amplitude dynamics of the song pattern
are steadily degraded over distance, which limits the effective communication
range of grasshoppers to~\mbox{1\,-\,2\,m} in their typical grassland
habitats~(\bcite{lang2000acoustic}). Second, the overall intensity level of
songs at the receiver's position varies depending on the location of the
sender, which should ideally not affect the recognition of the song pattern.
This necessitates that the auditory system achieves a certain degree of
intensity invariance --- a time scale-selective sensitivity to faster amplitude
dynamics and simultaneous insensitivity to slower, more sustained amplitude
dynamics.

Intensity invariance in different auditory systems is often
associated with neuronal adaptation~(\bcite{benda2008spike};
\bcite{barbour2011intensity}; \bcite{ozeri2018fast}; more
general:~\bcite{benda2021neural}). In the grasshopper auditory system, a number
of neuron types along the processing chain exhibit spike-frequency adaptation
in response to sustained stimulus
intensities~(\bcite{romer1976informationsverarbeitung};
\bcite{gollisch2004input}; \bcite{hildebrandt2009origin};
\bcite{clemens2010intensity}; \bcite{fisch2012channel}) and thus likely
contribute to the emergence of intensity-invariant song representations. This
means that intensity invariance is not the result of a single processing step
but rather a gradual process, in which different neuronal populations
contribute to varying degrees~(\bcite{clemens2010intensity}) and by different
mechanisms~(\bcite{hildebrandt2009origin}). Approximating this process within a
functional model framework thus requires a considerable amount of
simplification. In this work, we demonstrate that even a small number of basic
physiologically inspired signal transformations --- specifically, pairs of
nonlinear and linear operations --- is sufficient to achieve a meaningful
degree of intensity invariance.

% How can song recognition be modelled functionally (feat. Jan Clemens & Co.)?
% How did we expand on the previous framework?
Invariance to non-informative song variations is crucial for reliable song
recognition; however, it is not sufficient on its own. In order to recognize a
conspecific song as such, the auditory system needs to extract sufficiently
informative features of the song pattern and then integrate the gathered
information into a final categorical percept. Previous authors have proposed a
functional model framework that describes this process --- feature extraction,
evidence accumulation, and categorical decision making --- in both
crickets~(\bcite{clemens2013computational}; \bcite{hennig2014time}) and
grasshoppers~(\bcite{clemens2013feature}; review on
both:~\bcite{ronacher2015computational}). Their framework provides a
comprehensible and biologically plausible account of the computational
mechanisms required for species-specific song recognition, which has served as
the inspiration for the development of the model pathway we propose here. The
existing framework relies on pulse trains as input signals, which were designed
to capture the essential structural properties of natural song
envelopes~(\bcite{clemens2013feature}). In the first step, a bank of parallel
linear-nonlinear feature detectors is applied to the input signal. Each feature
detector consists of a convolutional filter and a subsequent sigmoidal
nonlinearity. The outputs of these feature detectors are temporally averaged to
obtain a single feature value per detector, which is then assigned a specific
weight. The linear combination of weighted feature values results in a single
preference value that serves as a predictor of the behavioral response of the
animal to the presented input signal.

Our model pathway adopts the general
structure of the existing framework but modifies it in several key aspects. The
convolutional filters, which have previously been fitted to behavioral data for
each individual species~(\bcite{clemens2013computational}), are replaced by a
larger, generic set of unfitted Gabor basis functions in order to cover a wide
range of possible song features across different species. Gabor functions
approximate the general structure of the filters used in the existing framework
as well as the filter functions found in various auditory
neurons~(\bcite{rokem2006spike}; \bcite{clemens2011efficient};
\bcite{clemens2012nonlinear}). The fitted sigmoidal nonlinearities in the
existing framework consistently exhibited very steep slopes and are therefore
replaced by shifted Heaviside step functions, which results in a binarization
of the feature detector outputs. Another, more substantial modification is that
the feature detector outputs are temporally averaged in a way that does not
condense them into single feature values but retains their time-varying
structure. This is in line with the fact that songs are not discrete units but
part of a continuous acoustic stream that the auditory system has to process in
real time. Moreover, a time-varying feature representation only stabilizes
after a certain delay following the onset of a song, which emphasizes the
temporal dynamics of evidence accumulation towards a final categorical
decision. The most notable difference between our model pathway and the
existing framework, however, lies in the addition of a physiologically inspired
preprocessing stage, whose starting point corresponds to the initial reception
of airborne sound waves. This allows the model to operate on unmodified
recordings of natural grasshopper songs instead of condensed pulse train
approximations, which widens its scope towards more realistic, ecologically
relevant scenarios. For instance, we were able to investigate the contribution
of different processing stages to the emergence of intensity-invariant song
representations based on actual field recordings of songs at different
distances from the sender.

In the following, we outline the structure of the proposed model of the
grasshopper auditory pathway, from the initial reception of sound waves up to
the generation of a high-dimensional, time-varying feature representation that
is suitable for species-specific song recognition. We provide a side-by-side
account of the known physiological processing steps and their functional
approximation by basic mathematical operations. We then elaborate on two key
mechanisms that drive the emergence of intensity-invariant song representations
within the auditory pathway.

% SCRAPPED UNTIL FURTHER NOTICE:
% Multi-species, multi-individual communally inhabited environments\\
% - Temporal overlap: Simultaneous singing across individuals/species common\\
% - Frequency overlap: Little speciation into frequency bands (likely unused)\\
% - "Biotic noise": Hetero-/conspecifics ("Another one's songs are my noise")\\
% - "Abiotic noise": Wind, water, vegetation, anthropogenic\\
% - Effects of habitat structure on sound propagation (landscape - soundscape)\\
% $\rightarrow$ Sensory constraints imposed by the (acoustic) environment

% Cluster of auditory challenges (interlocking constraints $\rightarrow$ tight coupling):\\
% From continuous acoustic input, generate neuronal representations that...\\
% 1)...allow for the separation of relevant (song) events from ambient noise floor\\
% 2)...compensate for behaviorally non-informative song variability (invariances)\\
% 3)...carry sufficient information to characterize different song patterns,
% recognize the ones produced by conspecifics, and make appropriate behavioral
% decisions based on context (sender identity, song type, mate/rival quality)

% How can the auditory system of grasshoppers meet these challenges?\\
% - What are the minimum functional processing steps required?\\
% - Which known neuronal mechanisms can implement these steps?\\
% - Which and how many stages along the auditory pathway contribute?\\
% $\rightarrow$ What are the limitations of the system as a whole?

% How can a human observer conceive a grasshopper's auditory percepts?\\
% - How to investigate the workings of the auditory pathway as a whole?\\
% - How to systematically test effects and interactions of processing parameters?\\
% - How to integrate the available knowledge on anatomy, physiology, ethology?\\
% $\rightarrow$ Abstract, simplify, formalize $\rightarrow$ Functional model framework

\section{Developing a functional model of the\\grasshopper song recognition pathway}

Constructing a functional model of a given system requires a sufficient
understanding of the system's essential structural components and their
presumed functional roles, and a formal framework of manageable complexity
built around these two aspects. Anatomically, the organization
of the grasshopper song recognition pathway can be outlined as a feed-forward
network of three consecutive neuronal
populations~(Fig.\,\mbox{\ref{fig:pathway}a-c}): peripheral auditory receptor
neurons, whose axons enter the ventral nerve cord at the level of the
metathoracic ganglion; local interneurons that remain exclusively within the
thoracic region of the ventral nerve cord; and ascending neurons projecting
from the thoracic region towards the supraesophageal
ganglion~(\bcite{rehbein1974structure}; \bcite{rehbein1976auditory};
\bcite{eichendorf1980projections}). The input to the network originates at the
tympanal membrane, which acts as an acoustic receiver and is coupled to the
dendritic endings of the receptor neurons~(\bcite{gray1960fine}). The outputs
of the network converge in the supraesophageal ganglion, which is presumed to
harbor the neuronal substrate for conspecific song recognition and response
initiation~(\bcite{ronacher1986routes}; \bcite{bauer1987separate};
\bcite{bhavsar2017brain}). Functionally, the ascending neurons are the most
diverse of the three populations along the pathway. Individual ascending
neurons possess highly specific response properties that contrast with the
rather homogeneous response properties of the preceding receptor neurons and
local interneurons~(\bcite{clemens2011efficient}), indicating a transition from
a uniform population-wide processing stream into several parallel branches.
Based on these anatomical and physiological considerations, the overall
structure of the model pathway is divided into two distinct
stages~(Fig.\,\ref{fig:pathway}d). The preprocessing stage incorporates the
known physiological processing steps at the levels of the tympanal membrane,
the receptor neurons, and the local interneurons, and operates on
one-dimensional signal representations. The feature extraction stage
corresponds to the processing within the ascending neurons and further
downstream towards the supraesophageal ganglion, and operates on
high-dimensional signal representations. The details of each physiological
processing step and its functional approximation within the two stages are
outlined in the following sections.

\begin{figure}[!ht]
\centering
\def\svgwidth{\textwidth}
\import{figures/}{fig_auditory_pathway.pdf_tex}
\caption{\textbf{Schematic organization of the song recognition pathway in
grasshoppers compared to the structure of the functional
model pathway.}
\textbf{a}:~Simplified course of the pathway in the
grasshopper, from the tympanal membrane via receptor
neurons, local interneurons, and ascending neurons
towards the supraesophageal ganglion.
\textbf{b}:~Schematic of synaptic connections between
the three neuronal populations within the metathoracic
ganglion.
\textbf{c}:~Network representation of neuronal connectivity.
\textbf{d}:~Flow diagram of the different signal
representations and transformations along the model
pathway. All representations are time-varying. First half:
preprocessing stage (one-dimensional). Second half: feature
extraction stage (high-dimensional).
}
\label{fig:pathway}
\end{figure}

\subsection{Population-driven signal preprocessing}

Grasshoppers receive airborne sound waves through a tympanal organ on either
side of the body. The tympanal membrane acts as a mechanical resonance filter
for sound-induced vibrations~(\bcite{windmill2008time};
\bcite{malkin2014energy}). Vibrations that fall within specific frequency bands
are focused on different membrane areas, while others are attenuated. This
processing step can be approximated by an initial bandpass filter
\begin{equation}
\filt(t)\,=\,\raw(t)\,*\,\bp, \qquad \fc\,=\,5\,\text{kHz},\,30\,\text{kHz}
\label{eq:bandpass}
\end{equation}
applied to the acoustic input signal $\raw(t)$. The auditory receptor neurons
transduce the vibrations of the tympanal membrane into sequences of action
potentials. In doing so, they encode the amplitude modulation, or envelope, of
the signal~(\bcite{machens2001discrimination}), which likely involves a
rectifying nonlinearity~(\bcite{machens2001representation}). This can be
modelled as full-wave rectification followed by lowpass filtering
\begin{equation}
\env(t)\,=\,|\filt(t)|\,*\,\lp, \qquad \fc\,=\,500\,\text{Hz}
\label{eq:env}
\end{equation}
of the tympanal signal $\filt(t)$. Furthermore, the receptors exhibit a
sigmoidal response curve over logarithmically compressed intensity
levels~(\bcite{suga1960peripheral}; \bcite{gollisch2002energy}). In the model
pathway, logarithmic compression is achieved by conversion to decibel scale
\begin{equation}
\db(t)\,=\,10\,\cdot\,\dec \frac{\env(t)}{\dbref}, \qquad \dbref\,=\,\max[\env(t)]
\label{eq:log}
\end{equation}
relative to the maximum intensity $\dbref$ of the signal envelope $\env(t)$.
Both the receptor neurons~(\bcite{romer1976informationsverarbeitung};
\bcite{gollisch2004input}; \bcite{fisch2012channel}) and, on a larger scale,
the subsequent local interneurons~(\bcite{hildebrandt2009origin};
\bcite{clemens2010intensity}) adapt their firing rates in response to sustained
stimulus intensity levels, which allows for the robust encoding of faster
amplitude modulations against a slowly changing overall baseline intensity.
Functionally, the adaptation mechanism resembles a highpass filter
\begin{equation}
\adapt(t)\,=\,\db(t)\,*\,\hp, \qquad \fc\,=\,10\,\text{Hz}
\label{eq:highpass}
\end{equation}
over the logarithmically scaled envelope $\db(t)$. This processing step
concludes the preprocessing stage of the model pathway. The resulting
intensity-adapted envelope $\adapt(t)$ is then passed on from the local
interneurons to the ascending neurons, where it serves as the basis for the
following feature extraction stage.

\subsection{Feature extraction by individual neurons}

The ascending neurons extract and encode a number of different features of the
preprocessed signal. As a population, they hence represent the signal in a
higher-dimensional space than the preceding receptor neurons and local
interneurons. Each ascending neuron is assumed to scan the signal for a
specific template pattern, which can be thought of as a kernel of a particular
structure and on a particular time scale. This process, known as template
matching, can be modelled as a convolution
\begin{equation}
c_i(t)\,=\,\adapt(t)\,*\,k_i(t)
= \infint \adapt(\tau)\,\cdot\,k_i(t\,-\,\tau)\,d\tau
\label{eq:conv}
\end{equation}
of the intensity-adapted envelope $\adapt(t)$ with a kernel $k_i(t)$ per
ascending neuron. We use Gabor kernels as basis functions for creating
different template patterns. An arbitrary one-dimensional, real Gabor kernel is
generated by multiplying a Gaussian envelope with a sinusoidal carrier
\begin{equation}
k_i(t,\,\kwi,\,\kfi,\,\kpi)\,=\,e^{-\frac{t^{2}}{2{\kwi}^{2}}}\,\cdot\,\sin(\kfi\,t\,+\,\kpi), \qquad \kfi\,=\,2\pi f_{\text{sin}}
\label{eq:gabor}
\end{equation}
with Gaussian standard deviation or kernel width $\kwi$, carrier frequency
$\kfi$, and carrier phase $\kpi$. Different combinations of $\kw$, $\kf$, and
$\kp$ result in Gabor kernels with different lobe numbers $\kn$, where $\kn$ is
the number of half-periods of the carrier that fit under the Gaussian envelope
within reasonable limits of attenuation. These limits are a matter of
definition, since the Gaussian function never fully decays to zero. A suitable
measure is the Gaussian full-width at relative height, which can be calculated
as
\begin{equation}
\fwrh(\kw,\,\rh)\,=\,2\,\cdot\,\sqrt{-2\,\ln \rh}\,\cdot\,\kw, \qquad \rh\,\in\,(0,\,1]
\end{equation}
With this, an appropriate carrier frequency $\kf$ for obtaining a Gabor kernel
with width $\kw$ and a desired lobe number $\kn$ can be approximated as
\begin{equation}
\kf(\kn,\,\fwrh)\,=\,\frac{0.5\,\cdot\,\kn\,+\,0.5}{\fwrh}
\end{equation}
We restrict the Gabor kernels to be either even functions~(mirror-symmetric,
uneven $\kn$) or odd functions~(point-symmetric, even $\kn$). Under this
condition, the phase $\kp$ is related to the lobe number $\kn$ by
\begin{equation}
\kp(\kn,\,\ks)\,=\,\frac{\pi}{2}\,\cdot\,(1\,-\,\text{mod}[\kn,\,2]\,+\,\ks)
\label{eq:gabor_phase}
\end{equation}
which results in the specific phase values shown in
Table\,\mbox{\ref{tab:gabor_phase}}.
\FloatBarrier
\begin{table}[!ht]
\centering
\captionsetup{width=.55\textwidth}
\caption{Carrier phase values $\kp$ resulting from Eq.\,\ref{eq:gabor_phase}
for even and odd lobe numbers $\kn$ and sign $\ks$.}
\begin{tabular}{|ccc|}
\hline
sign $\ks$ & even $\kn$ & odd $\kn$\\
\hline
+1 & $\pi$ & $+\pi\,/\,2$\\
-1 & $0$ & $-\pi\,/\,2$\\
\hline
\end{tabular}
\label{tab:gabor_phase}
\end{table}
\FloatBarrier
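
The kernel construction can be sketched as follows. This is an illustrative
sketch: the time step, the truncation of the Gaussian at $\pm3\kw$, and the
choice $\rh=0.5$ (full-width at half maximum) are assumptions, and the phase is
chosen such that odd-lobed kernels are mirror-symmetric and even-lobed kernels
point-symmetric, as required above.

```python
# Sketch of Gabor kernel generation from width, lobe number, and sign;
# the time step dt, the +-3 sigma truncation, and RH = 0.5 are assumptions.
import numpy as np

def gabor_kernel(width, n, sign, dt=1e-4, rh=0.5):
    fwrh = 2.0 * np.sqrt(-2.0 * np.log(rh)) * width  # full-width at relative height
    f_sin = (0.5 * n + 0.5) / fwrh                   # carrier frequency for n lobes
    phi = 0.5 * np.pi * (1.0 - n % 2 + sign)         # even kernel for odd n, odd kernel for even n
    m = int(round(3.0 * width / dt))                 # samples per half-window
    t = np.linspace(-3.0 * width, 3.0 * width, 2 * m + 1)
    return np.exp(-t**2 / (2.0 * width**2)) * np.sin(2.0 * np.pi * f_sin * t + phi)

# Template matching is then a discrete convolution of the adapted envelope
# with each kernel, e.g. np.convolve(x_adapt, k, mode="same").
k_even = gabor_kernel(0.010, 1, -1)  # one lobe  -> mirror-symmetric kernel
k_odd = gabor_kernel(0.010, 2, -1)   # two lobes -> point-symmetric kernel
```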

\textbf{Stage-specific processing steps and functional approximations:}

Thresholding nonlinearity in ascending neurons (or further downstream)\\
- Binarization of AN response traces into ``relevant'' vs. ``irrelevant''\\
$\rightarrow$ Shifted Heaviside step-function $\nl$ (or steep sigmoid threshold?)
%
\begin{equation}
b_i(t,\,\thr)\,=\,\begin{cases}
\;1, \quad c_i(t)\,>\,\thr\\
\;0, \quad c_i(t)\,\leq\,\thr
\end{cases}
\label{eq:binary}
\end{equation}
%
Temporal averaging by neurons of the central brain\\
- Finalized set of slowly changing kernel-specific features (one per AN)\\
- Different species-specific song patterns are characterized by a distinct
combination of feature values $\rightarrow$ Clusters in high-dimensional
feature space\\
$\rightarrow$ Lowpass filter 1 Hz
%
\begin{equation}
f_i(t)\,=\,b_i(t)\,*\,\lp, \qquad \fc\,=\,1\,\text{Hz}
\label{eq:lowpass}
\end{equation}
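
These two steps can be sketched numerically. In the following illustrative
snippet, a boxcar moving average stands in for the 1\,Hz lowpass filter of
Eq.\,\ref{eq:lowpass}; this substitution, the sampling rate, and the toy kernel
output are assumptions.

```python
# Sketch of thresholding (Eq. binary) and temporal averaging (Eq. lowpass) of
# a single kernel output c_i(t); the boxcar average standing in for the 1 Hz
# lowpass filter is an assumption.
import numpy as np

def feature_trace(c, theta, fs, f_cut=1.0):
    b = (c > theta).astype(float)              # shifted Heaviside step
    win = int(fs / f_cut)                      # ~1 s window for a 1 Hz cutoff
    return np.convolve(b, np.ones(win) / win, mode="same")

# A toy kernel output that exceeds the threshold half of the time yields a
# slowly varying feature trace hovering around 0.5:
fs = 1000
t = np.arange(0, 5, 1 / fs)
f_i = feature_trace(np.sin(2 * np.pi * 5 * t), 0.0, fs)
```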
%
\section{Two mechanisms driving the emergence of intensity-invariant song representations}

\textbf{Definition of invariance (general, systemic):}\\
Invariance = Property of a system to maintain a stable output with respect to a
set of relevant input parameters (variation to be represented) but irrespective
of one or more other parameters (variation to be discarded)\\
$\rightarrow$ Selective input-output decorrelation

\textbf{Definition of intensity invariance (context of neurons and songs):}\\
Intensity invariance = Time scale-selective sensitivity to certain faster
amplitude dynamics (song waveform, small-scale AM) and simultaneous
insensitivity to slower, more sustained amplitude dynamics (transient baseline,
large-scale AM, current overall intensity level)\\
$\rightarrow$ Without time scale selectivity, any fully intensity-invariant
output will be a flat line

\subsection{Logarithmic scaling \& spike-frequency adaptation}

Envelope $\env(t)$ $\xrightarrow{\text{dB}}$ Logarithmic $\db(t)$ $\xrightarrow{\hp}$ Adapted $\adapt(t)$

- Rewrite the signal envelope $\env(t)$ (Eq.\,\ref{eq:env}) as a synthetic mixture:\\
1) Song signal $s(t)$ ($\svar=1$) with variable multiplicative scale $\alpha\geq0$\\
2) Fixed-scale additive noise $\eta(t)$ ($\nvar=1$)
%
\begin{equation}
\env(t)\,=\,\alpha\,\cdot\,s(t)\,+\,\eta(t),\qquad \env(t)\,>\,0\enspace\forall\enspace t\,\in\,\mathbb{R}
\label{eq:toy_env}
\end{equation}
%
- Signal-to-noise ratio (SNR): Ratio of variances of the synthetic mixture
$\env(t)$ with ($\alpha>0$) and without ($\alpha=0$) the song signal $s(t)$,
assuming $s(t)\perp\eta(t)$
%
\begin{equation}
\text{SNR}\,=\,\frac{\sigma_{s+\eta}^{2}}{\nvar}\,=\,\frac{\alpha^{2}\,\cdot\,\svar\,+\,\nvar}{\nvar}\,=\,\alpha^{2}\,+\,1
\label{eq:toy_snr}
\end{equation}
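
The relation in Eq.\,\ref{eq:toy_snr} is easily verified numerically; in the
following illustrative check, Gaussian white signals serve as surrogates for
$s(t)$ and $\eta(t)$ (an assumption, any uncorrelated unit-variance pair would
do).

```python
# Numerical check of SNR = alpha^2 + 1 for uncorrelated unit-variance song
# and noise surrogates (Gaussian white signals, an assumption).
import numpy as np

rng = np.random.default_rng(0)
s = rng.standard_normal(1_000_000)    # song surrogate, variance ~1
eta = rng.standard_normal(1_000_000)  # noise surrogate, variance ~1

def snr(alpha):
    return np.var(alpha * s + eta) / np.var(eta)

# Ratio of the empirical SNR to the prediction alpha^2 + 1 (should be ~1):
ratios = {a: snr(a) / (a**2 + 1) for a in (0.5, 1.0, 3.0)}
```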
|
|
%
\textbf{Logarithmic component:}\\
- Simplify decibel transformation (Eq.\,\ref{eq:log}) and apply to synthetic $\env(t)$\\
- Isolate scale $\alpha$ and reference $\dbref$ using logarithm product/quotient laws
%
\begin{equation}
\begin{split}
\db(t)\,&=\,\log \frac{\alpha\,\cdot\,s(t)\,+\,\eta(t)}{\dbref}\\
&=\,\log \frac{\alpha}{\dbref}\,+\,\log \big[s(t)\,+\,\frac{\eta(t)}{\alpha}\big]
\end{split}
\label{eq:toy_log}
\end{equation}
%
$\rightarrow$ In log-space, a multiplicative scaling factor becomes additive\\
$\rightarrow$ Allows for the separation of song signal $s(t)$ and its scale $\alpha$\\
$\rightarrow$ Introduces scaling of noise term $\eta(t)$ by the inverse of $\alpha$\\
$\rightarrow$ Normalization by $\dbref$ applies equally to all terms (no individual effects)
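
The rearrangement in Eq.\,\ref{eq:toy_log} is an exact identity, which a short sketch (assuming NumPy; toy values) confirms numerically:

```python
import numpy as np

# The multiplicative scale alpha becomes a pure additive offset in log-space
# (Eq. toy_log): log((alpha*s + eta)/ref) = log(alpha/ref) + log(s + eta/alpha).
rng = np.random.default_rng(1)
n = 1000
s = 2.0 + rng.random(n)    # positive toy song envelope
eta = 0.1 * rng.random(n)  # positive toy noise floor
ref = 1.0                  # reference amplitude (dbref)

alpha = 5.0
lhs = np.log((alpha * s + eta) / ref)
rhs = np.log(alpha / ref) + np.log(s + eta / alpha)
print(np.max(np.abs(lhs - rhs)))  # identical up to float rounding
```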

\textbf{Adaptation component:}\\
- Highpass filter over $\db(t)$ (Eq.\,\ref{eq:highpass}) can
be approximated as subtraction of the local signal offset within a suitable time
interval $\thp$ ($0 \ll \thp < \frac{1}{\fc}$)
%
\begin{equation}
\begin{split}
\adapt(t)\,\approx\,\db(t)\,-\,\log \frac{\alpha}{\dbref}\,=\,\log \big[s(t)\,+\,\frac{\eta(t)}{\alpha}\big]
\end{split}
\label{eq:toy_highpass}
\end{equation}
%
\textbf{Implication for intensity invariance:}\\
- Logarithmic scaling is essential for equalizing different song intensities\\
$\rightarrow$ Intensity information can be manipulated more easily in the form
of a signal offset in log-space than as a multiplicative scale in linear space

- Scale $\alpha$ can only be redistributed, not entirely eliminated from $\adapt(t)$\\
$\rightarrow$ Turns initial scaling of song $s(t)$ by $\alpha$ into scaling of noise $\eta(t)$ by $\frac{1}{\alpha}$

- Capability to compensate for intensity variations, i.e.\ selective amplification
of output $\adapt(t)$ relative to input $\env(t)$, is limited by input SNR (Eq.\,\ref{eq:toy_snr}):\\
$\alpha\gg1$: Attenuation of $\eta(t)$ term $\rightarrow$ $s(t)$ dominates $\adapt(t)$\\
$\alpha\approx1$: Negligible effect on $\eta(t)$ term $\rightarrow$ $\adapt(t)=\log[s(t)+\eta(t)]$\\
$\alpha\ll1$: Amplification of $\eta(t)$ term $\rightarrow$ $\eta(t)$ dominates $\adapt(t)$\\
$\rightarrow$ Ability to equalize between different sufficiently large scales of $s(t)$\\
$\rightarrow$ Inability to recover $s(t)$ when initially masked by noise floor $\eta(t)$

- Logarithmic scaling emphasizes small amplitudes (song onsets, noise floor)\\
$\rightarrow$ Recurring trade-off: Equalizing signal intensity vs preserving initial SNR
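
The SNR-limited equalization can be illustrated by a toy pipeline (a sketch assuming NumPy; the moving-average offset subtraction stands in for the highpass filter, and all parameter values are illustrative):

```python
import numpy as np

# Highpass adaptation approximated as subtracting the local mean of the
# log-envelope (Eq. toy_highpass): outputs for different large scales alpha
# nearly coincide, while a small alpha leaves the output noise-dominated.
rng = np.random.default_rng(2)
t = np.linspace(0, 1, 2000)
s = 1.5 + np.sin(2 * np.pi * 30 * t)   # positive toy song AM
eta = 0.05 * (1 + rng.random(t.size))  # small positive noise floor

def adapted(alpha, win=200):
    logenv = np.log(alpha * s + eta)
    kernel = np.ones(win) / win                       # local offset estimate
    hp = logenv - np.convolve(logenv, kernel, mode="same")
    return hp[win:-win]                               # drop filter edge artifacts

d_hi = np.max(np.abs(adapted(10.0) - adapted(100.0)))  # alpha >> 1: equalized
d_lo = np.max(np.abs(adapted(0.01) - adapted(10.0)))   # alpha << 1: noise wins
print(d_hi, d_lo)
```

Large scales are equalized almost perfectly, whereas a song initially buried in the noise floor cannot be recovered by the same operation.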

\subsection{Threshold nonlinearity \& temporal averaging}

Convolved $c_i(t)$ $\xrightarrow{\nl}$ Binary $b_i(t)$ $\xrightarrow{\lp}$ Feature $f_i(t)$

\textbf{Thresholding component:}\\
- Within an observed time interval $T$, $c_i(t)$ follows probability density $\pc$\\
- Within $T$, $c_i(t)$ exceeds threshold value $\thr$ for time $T_1$ ($T_1+T_0=T$)\\
- Threshold $\nl$ splits $\pc$ around $\thr$ into two complementary parts
%
\begin{equation}
\int_{\thr}^{+\infty} p(c_i,T)\,dc_i\,=\,1\,-\,\int_{-\infty}^{\thr} p(c_i,T)\,dc_i\,=\,\frac{T_1}{T}
\label{eq:pdf_split}
\end{equation}
%
$\rightarrow$ Semi-infinite integral over the right-sided portion of the split $\pc$ gives the ratio
of time $T_1$ where $c_i(t)>\thr$ to total time $T$, due to the normalization of $\pc$
%
\begin{equation}
\infint \pc\,dc_i\,=\,1
\label{eq:pdf}
\end{equation}
%
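
A quick check of Eq.\,\ref{eq:pdf_split} (a sketch assuming NumPy; a Gaussian trace stands in for $c_i(t)$, whose actual distribution depends on the template and stimulus):

```python
import numpy as np
from math import erf, sqrt

# The fraction of time a Gaussian trace c(t) spends above a threshold theta
# equals the tail integral of its probability density (Eq. pdf_split).
rng = np.random.default_rng(3)
c = rng.standard_normal(1_000_000)  # stand-in for convolution output c_i(t)
theta = 1.0

frac = np.mean(c > theta)                      # measured T1 / T
tail = 0.5 * (1.0 - erf(theta / sqrt(2.0)))    # analytic Gaussian tail integral
print(frac, tail)
```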
\textbf{Averaging component:}\\
- Lowpass filter over binary response $b_i(t)$ (Eq.\,\ref{eq:lowpass}) can be
approximated as temporal averaging over a suitable time interval $\tlp$ ($\tlp > \frac{1}{\fc}$)\\
- Within $\tlp$, $b_i(t)$ takes a value of 1 ($c_i(t)>\thr$) for time $T_1$ ($T_1+T_0=\tlp$)
%
\begin{equation}
f_i(t)\,\approx\,\frac{1}{\tlp} \int_{t}^{t\,+\,\tlp} b_i(\tau)\,d\tau\,=\,\frac{T_1}{\tlp}
\label{eq:feat_avg}
\end{equation}
%
$\rightarrow$ Temporal averaging over $b_i(t)\in\{0,1\}$ (Eq.\,\ref{eq:binary}) gives the
ratio of time $T_1$ where $c_i(t)>\thr$ to total averaging interval $\tlp$\\
$\rightarrow$ Feature $f_i(t)$ approximately represents the supra-threshold fraction of $\tlp$
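
Eq.\,\ref{eq:feat_avg} in a minimal sketch (assuming NumPy; a periodic binary trace with a known duty cycle stands in for $b_i(t)$):

```python
import numpy as np

# Moving average of a binary trace recovers its duty cycle (Eq. feat_avg):
# a trace that is 1 for 30% of each period yields f ~ 0.3 everywhere.
period, duty, n_periods = 100, 0.3, 50
b = np.tile((np.arange(period) < duty * period).astype(float), n_periods)

T_lp = 5 * period  # averaging window spans several periods
f = np.convolve(b, np.ones(T_lp) / T_lp, mode="valid")
print(f.min(), f.max())  # both equal the 0.3 duty cycle
```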

\textbf{Combined result:}\\
- Feature $f_i(t)$ can be linked to the distribution of $c_i(t)$ using Eqs.\,\ref{eq:pdf_split} \& \ref{eq:feat_avg}
%
\begin{equation}
f_i(t)\,\approx\,\int_{\thr}^{+\infty} \pclp\,dc_i\,=\,P(c_i\,>\,\thr,\,\tlp)
\label{eq:feat_prop}
\end{equation}
%
$\rightarrow$ Because the integral over a probability density is a cumulative
probability, the value of feature $f_i(t)$ (temporal compression of $b_i(t)$)
at every time point $t$ signifies the probability that convolution output
$c_i(t)$ exceeds the threshold value $\thr$ during the corresponding averaging
interval $\tlp$
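
Putting both stages together (a sketch assuming NumPy; again a Gaussian trace stands in for $c_i(t)$), the lowpassed binary response scatters around the tail probability of Eq.\,\ref{eq:feat_prop}:

```python
import numpy as np
from math import erf, sqrt

# Combined result (Eq. feat_prop): thresholding a Gaussian trace and lowpass
# averaging the binary response approximates the tail probability P(c > theta).
rng = np.random.default_rng(4)
c = rng.standard_normal(200_000)  # stand-in for convolution output c_i(t)
theta = 0.5
b = (c > theta).astype(float)     # binary response b_i(t)

T_lp = 20_000                     # averaging interval in samples
f = np.convolve(b, np.ones(T_lp) / T_lp, mode="valid")

p_tail = 0.5 * (1.0 - erf(theta / sqrt(2.0)))
print(f.mean(), p_tail)           # feature values center on P(c > theta)
```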

\textbf{Implication for intensity invariance:}\\
- Convolution output $c_i(t)$ quantifies temporal similarity between the amplitudes of
template waveform $k_i(t)$ and signal $\adapt(t)$ centered at time point $t$\\
$\rightarrow$ Based on amplitudes on a graded scale

- Feature $f_i(t)$ quantifies the probability that amplitudes of $c_i(t)$
exceed threshold value $\thr$ within interval $\tlp$ around time point $t$\\
$\rightarrow$ Based on binned amplitudes corresponding to one of two categorical states\\
$\rightarrow$ Deliberate loss of precise amplitude information\\
$\rightarrow$ Emphasis on temporal structure (ratio of $T_1$ over $\tlp$)

- Thresholding of $c_i(t)$ and subsequent temporal averaging of $b_i(t)$ to
obtain $f_i(t)$ constitutes a remapping of an amplitude-encoding quantity into a
duty cycle-encoding quantity, mediated by threshold function $\nl$

- Different scales of $c_i(t)$ can result in similar $T_1$ segments, depending
on the magnitude of the derivative of $c_i(t)$ near the time
points at which $c_i(t)$ crosses threshold value $\thr$\\
$\rightarrow$ The steeper the slope of $c_i(t)$, the less $T_1$ changes with scale variations\\
$\rightarrow$ If $T_1$ is invariant to scale variation in $c_i(t)$, then so is $f_i(t)$
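
The slope argument in a toy comparison (a sketch assuming NumPy; the two waveforms and the 20\% rescale are illustrative choices, not model parameters):

```python
import numpy as np

# Where c(t) crosses the threshold with a steep slope, the supra-threshold
# time T1 (and hence f_i) barely changes when c(t) is rescaled; shallow
# crossings make T1 scale-sensitive.
t = np.linspace(0, 1, 100_000, endpoint=False)
theta = 0.5

def t1(c):
    return np.mean(c > theta)  # fraction of time above threshold

sine = 1.0 + np.sin(2 * np.pi * 5 * t)                   # shallow crossings
steep = 1.0 + np.tanh(20 * np.sin(2 * np.pi * 5 * t))    # near-square, steep crossings

d_sine = abs(t1(1.2 * sine) - t1(sine))     # effect of a 20% rescale
d_steep = abs(t1(1.2 * steep) - t1(steep))
print(d_sine, d_steep)  # steep crossings -> much smaller change in T1
```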

- Suggests a relatively simple rule for the optimal choice of threshold value $\thr$:\\
$\rightarrow$ Find the amplitude $c_i$ that maximizes the absolute derivative of $c_i(t)$ over time\\
$\rightarrow$ Optimal with respect to intensity invariance of $f_i(t)$, not necessarily for
other criteria such as song-noise separation or diversity between features

- Nonlinear operations can be used to detach representations from the graded physical
stimulus (to facilitate categorical behavioral decision-making?):\\
1) Capture sufficiently precise amplitude information: $\env(t)$, $\adapt(t)$\\
$\rightarrow$ Closely following the AM of the acoustic stimulus\\
2) Quantify relevant stimulus properties on a graded scale: $c_i(t)$\\
$\rightarrow$ More decorrelated representation, compared to prior stages\\
3) Nonlinearity: Distinguish between ``relevant vs irrelevant'' values: $b_i(t)$\\
$\rightarrow$ Trading a graded scale for two or more categorical states\\
4) Represent stimulus properties under relevance constraint: $f_i(t)$\\
$\rightarrow$ Graded again but highly decorrelated from the acoustic stimulus\\
5) Categorical behavioral decision-making requires further nonlinearities\\
$\rightarrow$ Parameters of a behavioral response may be graded (e.g.\ approach speed), while
initiation of one behavior over another is categorical (e.g.\ approach/stay)

\section{Discriminating species-specific song\\patterns in feature space}

\section{Conclusions \& outlook}

\end{document}