\documentclass[a4paper, 12pt]{article}

\usepackage[left=2.5cm,right=2.5cm,top=2cm,bottom=2cm,includeheadfoot]{geometry}
\usepackage[onehalfspacing]{setspace}
\usepackage{graphicx}
\usepackage{svg}
\usepackage{import}
\usepackage{float}
\usepackage{placeins}
\usepackage{parskip}
\usepackage{amsmath}
\usepackage{amssymb}
\usepackage[separate-uncertainty=true, locale=DE]{siunitx}
\sisetup{output-exponent-marker=\ensuremath{\mathrm{e}}}
% \usepackage[capitalize]{cleveref}
% \crefname{figure}{Fig.}{Figs.}
% \crefname{equation}{Eq.}{Eqs.}
% \creflabelformat{equation}{#2#1#3}
\usepackage[
backend=biber,
style=authoryear,
pluralothers=true,
maxcitenames=1,
mincitenames=1
]{biblatex}
\addbibresource{cite.bib}

\title{Emergent intensity invariance in a physiologically inspired model of the grasshopper auditory system}
\author{Jona Hartling, Jan Benda}
\date{}

\begin{document}
\maketitle{}

% Text references and citations:
\newcommand{\bcite}[1]{\mbox{\cite{#1}}}
% \newcommand{\fref}[1]{\mbox{\cref{#1}}}
% \newcommand{\fref}[1]{\mbox{Fig.\,\ref{#1}}}
% \newcommand{\eref}[1]{\mbox{\cref{#1}}}
% \newcommand{\eref}[1]{\mbox{Eq.\,\ref{#1}}}

% Math shorthands - Standard symbols:
\newcommand{\dec}{\log_{10}} % Logarithm base 10
\newcommand{\infint}{\int_{-\infty}^{+\infty}} % Integral over the entire real line

% Math shorthands - Spectral filtering:
\newcommand{\bp}{h_{\text{BP}}(t)} % Bandpass filter function
\newcommand{\lp}{h_{\text{LP}}(t)} % Lowpass filter function
\newcommand{\hp}{h_{\text{HP}}(t)} % Highpass filter function
\newcommand{\fc}{f_{\text{cut}}} % Filter cutoff frequency
\newcommand{\tlp}{T_{\text{LP}}} % Lowpass filter averaging interval
\newcommand{\thp}{T_{\text{HP}}} % Highpass filter adaptation interval

% Math shorthands - Early representations:
\newcommand{\raw}{x} % Placeholder input signal
\newcommand{\filt}{\raw_{\text{filt}}} % Bandpass-filtered signal
\newcommand{\env}{\raw_{\text{env}}} % Signal envelope
\newcommand{\db}{\raw_{\text{dB}}} % Logarithmically scaled signal
\newcommand{\dbref}{\raw_{\text{ref}}} % Decibel reference intensity
\newcommand{\adapt}{\raw_{\text{adapt}}} % Adapted signal

% Math shorthands - Kernel parameters:
\newcommand{\ks}{\sigma_i} % Gabor kernel width
\newcommand{\kf}{f_i} % Gabor kernel frequency
\newcommand{\kp}{\phi_i} % Gabor kernel phase

% Math shorthands - Threshold nonlinearity:
\newcommand{\thr}{\Theta_i} % Step function threshold value
\newcommand{\nl}{H(c_i\,-\,\thr)} % Shifted Heaviside step function

% Math shorthands - Minor symbols and helpers:
\newcommand{\svar}{\sigma_{\text{s}}^{2}} % Song signal variance
\newcommand{\nvar}{\sigma_{\eta}^{2}} % Noise signal variance
\newcommand{\pc}{p(c_i,\,T)} % Probability density (general interval)
\newcommand{\pclp}{p(c_i,\,\tlp)} % Probability density (lowpass interval)

\section{Exploring a grasshopper's sensory world}

Our scientific understanding of sensory processing systems results from the
distributed accumulation of anatomical, physiological and ethological evidence.
This process is undoubtedly without alternative; however, it leaves us with the
challenge of integrating the available fragments into a coherent whole in order
to address issues such as the interaction between individual system components,
the functional limitations of the system overall, or taxonomic comparisons of
systems that process the same sensory modality. Any unified framework that
captures the essential functional aspects of a given sensory system thus has
the potential to deepen our current understanding and facilitate systematic
investigations. However, building such a framework is a challenging task. It
requires a wealth of existing knowledge of the system and the signals it
operates on, a clearly defined scope, and careful reduction, abstraction, and
formalization of the underlying anatomical structures and physiological
mechanisms.

One sensory system about which extensive information has been gathered over the
years is the auditory system of grasshoppers~(\textit{Acrididae}). Grasshoppers
rely on auditory processing primarily for intraspecific communication, which
includes mate attraction and evaluation~(\bcite{helversen1972gesang}), sender
localization~(\bcite{helversen1988interaural}), courtship display~(SOURCE),
rival deterrence~(\bcite{greenfield1993acoustic}), and loss-of-signal predator
alarm~(SOURCE). The different behavioral contexts are met with

Different acoustic signals are used for different behavioral
contexts and communication ranges

Depending on the behavioral context and the communication range,

Grasshoppers generate their most conspicuous acoustic signals
---~commonly referred to as ``songs''~--- by stridulation.

Different acoustic signals may be generated using different
body parts ---~wings, hindlegs, or mandibles~--- but the most conspicuous

The required acoustic signals for different contexts and ranges
may be generated using different body parts ---~wings, hindlegs, or
mandibles~--- but the most common sound production mechanism is stridulation,
during which the animal pulls the serrated stridulatory file on its hindlegs
across a resonating vein on the forewings. The resulting ``songs''

The reliance on acoustic communication signals represents a strong evolutionary
driving force that has resulted in a massive species diversification among
grasshoppers~(\bcite{vedenina2011speciation},
\bcite{sevastianov2023evolution}).

Grasshoppers produce their most conspicuous acoustic signals
---~commonly referred to as ``songs''~--- by stridulation, during which the
animal rubs the serrated stridulatory file on its hindleg across a resonating
vein on the forewing.

Among the several thousand recognized grasshopper
species~(\bcite{cigliano2018orthoptera}), diverse species-specific sound
repertoires and production mechanisms

Strong dependence on acoustic signals for ranged communication\\
- Diverse species-specific sound repertoires and production mechanisms\\
- Different contexts/ranges: Stridulatory, mandibular, wings, walking sounds\\
- Mate attraction/evaluation, rival deterrence, loss-of-signal predator alarm\\
$\rightarrow$ Elaborate acoustic behaviors co-depend on reliable auditory perception

Songs = Amplitude-modulated (AM) broad-band acoustic signals\\
- Generated by stridulatory movement of hindlegs against forewings\\
- Shorter time scales: Characteristic temporal waveform pattern\\
- Longer time scales: High degree of periodicity (pattern repetition)\\
- Sound propagation: Signal intensity varies strongly with distance to sender\\
- Ectothermy: Temporal structure warps with temperature\\
$\rightarrow$ Sensory constraints imposed by properties of the acoustic signal itself

Multi-species, multi-individual communally inhabited environments\\
- Temporal overlap: Simultaneous singing across individuals/species common\\
- Frequency overlap: No/hardly any niche speciation into frequency bands\\
- ``Biotic noise'': Hetero-/conspecifics (``Another one's songs are my noise'')\\
- ``Abiotic noise'': Wind, water, vegetation, anthropogenic\\
- Effects of habitat structure on sound propagation (landscape - soundscape)\\
$\rightarrow$ Sensory constraints imposed by the (acoustic) environment

Cluster of auditory challenges (interlocking constraints $\rightarrow$ tight coupling):\\
From continuous acoustic input, generate neuronal representations that...\\
1)...allow for the separation of relevant (song) events from ambient noise floor\\
2)...compensate for behaviorally non-informative song variability (invariances)\\
3)...carry sufficient information to characterize different song patterns,
recognize the ones produced by conspecifics, and make appropriate behavioral
decisions based on context (sender identity, song type, mate/rival quality)

How can the auditory system of grasshoppers meet these challenges?\\
- What are the minimum functional processing steps required?\\
- Which known neuronal mechanisms can implement these steps?\\
- Which and how many stages along the auditory pathway contribute?\\
$\rightarrow$ What are the limitations of the system as a whole?

How can a human observer conceive a grasshopper's auditory percepts?\\
- How to investigate the workings of the auditory pathway as a whole?\\
- How to systematically test effects and interactions of processing parameters?\\
- How to integrate the available knowledge on anatomy, physiology, ethology?\\
$\rightarrow$ Abstract, simplify, formalize $\rightarrow$ Functional model framework

\textbf{Precursor work for model construction (special thanks to authors):}

Linear-nonlinear modelling of behavioral responses to artificial songs\\
- Feature expansion as implemented in our model: Major contribution!\\
- Bank of linear filters, nonlinearity, temporal integration, feature weighting\\
$\rightarrow$ \cite{clemens2013computational} (crickets)\\
$\rightarrow$ \cite{clemens2013feature} (grasshoppers)\\
$\rightarrow$ \cite{ronacher2015computational}\\
\textbf{Own advancements/key differences}:\\
1) Used boxcar functions as artificial ``songs'' (focus on few key parameters)\\
$\rightarrow$ Now actual, variable songs (as naturalistic as possible)\\
2) Fitted filters to behavioral data\\
$\rightarrow$ More general, simpler, unfitted formalized Gabor filter bank

\section{Developing a functional model of\\the grasshopper auditory pathway}

% Either pick up in intro and/or discussion, or move entirely:
The grasshopper auditory system has been studied extensively over the past
decades, and a correspondingly large number of the neuron types involved have
been described~(\bcite{rehbein1974structure}; \bcite{kalmring1975afferent};
\bcite{rehbein1976auditory}; \bcite{eichendorf1980projections}). The functional
model we propose here focuses on the pathway responsible for song recognition
and assumes a strict feed-forward organization of three consecutive neuronal
populations: Peripheral auditory receptor neurons~\mbox{(1st order)}, local
interneurons of the metathoracic ganglion~\mbox{(2nd order)}, and ascending
neurons~\mbox{(3rd order)} projecting towards the supraesophageal ganglion.

Previous authors have reported a marked increase in response heterogeneity
within the population of ascending neurons compared to receptors and local
interneurons, whose members exhibit almost identical filter characteristics
within each population~(\bcite{clemens2011efficient}). Based on these findings,
the model pathway can be divided into two distinct
portions~(Fig.\,\ref{fig:pathway}c+d).
In the preprocessing portion, generated

The preprocessing portion comprises the tympanal membrane, receptors, and
local interneurons. The different signal representations

Due to the similar response properties within the involved

1) ``Pre-split portion'' of the auditory pathway:\\
Tympanal membrane $\rightarrow$ Receptor neurons $\rightarrow$ Local interneurons

Similar response/filter properties within receptor/interneuron populations (\cite{clemens2011efficient})\\
$\rightarrow$ One population-wide response trace per stage (no ``single-cell resolution'')

2) ``Post-split portion'' of the auditory pathway:\\
Ascending neurons (AN) $\rightarrow$ Central brain neurons

Diverse response/filter properties within AN population (\cite{clemens2011efficient})\\
- Pathway splitting into several parallel branches\\
- Expansion into a decorrelated higher-dimensional sound representation\\
$\rightarrow$ Individual neuron-specific response traces from this stage onwards

\begin{figure}[!ht]
\centering
\def\svgwidth{\textwidth}
\import{figures/}{fig_auditory_pathway.pdf_tex}
\caption[Grasshopper auditory system]{\textbf{The auditory system of
grasshoppers.}}
\label{fig:pathway}
\end{figure}
\FloatBarrier

\subsection{Population-driven signal pre-processing}

Grasshoppers receive airborne sound waves via a tympanal organ on each side of
the thorax~(Fig.\,\ref{fig:pathway}a). The tympanal membrane acts as a
mechanical resonance filter: Vibrations that fall within specific frequency
bands are focused on different membrane areas, while others are
attenuated~(\bcite{michelsen1971frequency}; \bcite{windmill2008time};
\bcite{malkin2014energy}). This processing step can be approximated by an
initial bandpass filter
\begin{equation}
\filt(t)\,=\,\raw(t)\,*\,\bp, \qquad \fc\,=\,5\,\text{kHz},\,30\,\text{kHz}
\label{eq:bandpass}
\end{equation}
applied to the acoustic input signal $\raw(t)$. The auditory receptor neurons
connect directly to the tympanal membrane~(Fig.\,\ref{fig:pathway}a). Besides
performing the mechano-electrical transduction, the receptor population
implements several known processing steps. First, the receptors extract the
signal envelope~(\bcite{machens2001discrimination}), which likely involves a
rectifying nonlinearity~(\bcite{machens2001representation}). This can be
modelled as full-wave rectification followed by lowpass filtering
\begin{equation}
\env(t)\,=\,|\filt(t)|\,*\,\lp, \qquad \fc\,=\,500\,\text{Hz}
\label{eq:env}
\end{equation}
of the tympanal signal $\filt(t)$. Furthermore, the receptors exhibit a
sigmoidal response curve over logarithmically compressed intensity
levels~(\bcite{suga1960peripheral}; \bcite{gollisch2002energy}). In the model,
logarithmic compression is achieved by conversion to a decibel scale
\begin{equation}
\db(t)\,=\,10\,\cdot\,\dec \frac{\env(t)}{\dbref}, \qquad \dbref\,=\,\max[\env(t)]
\label{eq:log}
\end{equation}
relative to the maximum intensity $\dbref$ of the signal envelope $\env(t)$.
Next, the axons of the receptor neurons project into the metathoracic ganglion,
where they synapse onto local interneurons~(Fig.\,\ref{fig:pathway}b). Both the
local interneurons~(\bcite{hildebrandt2009origin};
\bcite{clemens2010intensity}) and, to a lesser extent, the receptors
themselves~(\bcite{fisch2012channel}) display spike-frequency adaptation in
response to sustained stimulus intensity levels. This mechanism allows for the
robust encoding of faster amplitude modulations against a slowly changing
overall baseline intensity. Functionally, this processing step resembles a
highpass filter
\begin{equation}
\adapt(t)\,=\,\db(t)\,*\,\hp, \qquad \fc\,=\,10\,\text{Hz}
\label{eq:highpass}
\end{equation}
over the logarithmically scaled envelope $\db(t)$. The projections of the local
interneurons remain within the metathoracic ganglion and synapse onto a small
number of ascending neurons~(Fig.\,\ref{fig:pathway}b), which marks the
transition between the preprocessing stream and the parallel processing stream
of the model pathway.
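%

As a rough numerical illustration of Eq.\,\ref{eq:log} (a sketch with
arbitrarily chosen envelope values, not taken from recorded data), consider
envelope values at the full, one half, and one tenth of the reference
intensity $\dbref$:
\begin{equation}
\begin{split}
\env\,=\,\dbref \quad&\Rightarrow\quad \db\,=\,10\,\cdot\,\dec 1\,=\,0\,\text{dB}\\
\env\,=\,\tfrac{1}{2}\,\dbref \quad&\Rightarrow\quad \db\,=\,10\,\cdot\,\dec \tfrac{1}{2}\,\approx\,-3.0\,\text{dB}\\
\env\,=\,\tfrac{1}{10}\,\dbref \quad&\Rightarrow\quad \db\,=\,10\,\cdot\,\dec \tfrac{1}{10}\,=\,-10\,\text{dB}
\end{split}
\end{equation}
Because $\dbref\,=\,\max[\env(t)]$, all values of $\db(t)$ are non-positive.
If a hypothetical constant gain $g>0$ (introduced here for illustration only)
is applied to the envelope while $\dbref$ is held fixed, the gain appears as a
constant offset in $\db(t)$, which is the kind of sustained baseline that the
highpass filter in Eq.\,\ref{eq:highpass} is meant to remove:
\begin{equation}
10\,\cdot\,\dec \frac{g\,\cdot\,\env(t)}{\dbref}\,=\,\db(t)\,+\,10\,\cdot\,\dec g
\end{equation}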
\subsection{Feature extraction by individual neurons}

The small population of ascending neurons

\textbf{Stage-specific processing steps and functional approximations:}

Template matching by individual ANs\\
- Filter base (STA approximations): Set of Gabor kernels\\
- Gabor parameters: $\ks, \kp, \kf$ $\rightarrow$ Determine kernel sign and lobe number
%
\begin{equation}
k_i(t,\,\ks,\,\kf,\,\kp)\,=\,e^{-\frac{t^{2}}{2{\ks}^{2}}}\,\cdot\,\sin(2\pi\kf\,\cdot\,t\,+\,\phi_i)
\label{eq:gabor}
\end{equation}
%
$\rightarrow$ Separate convolution with each member of the kernel set
%
\begin{equation}
c_i(t)\,=\,\adapt(t)\,*\,k_i(t)
= \infint \adapt(\tau)\,\cdot\,k_i(t\,-\,\tau)\,d\tau
\label{eq:conv}
\end{equation}
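%

For orientation, the response of a single kernel to a constant input can be
worked out in closed form. This is a sketch that assumes the kernel is not
truncated in time; the constant $a$ is a hypothetical residual offset in
$\adapt(t)$ and not a model parameter. Splitting the sine in
Eq.\,\ref{eq:gabor} into an odd part, $\cos(\kp)\sin(2\pi\kf t)$, and an even
part, $\sin(\kp)\cos(2\pi\kf t)$, only the even part survives the integration
in Eq.\,\ref{eq:conv}. Kernels with $\kp=0$ (odd symmetry) therefore ignore a
constant offset entirely, and for any other phase the offset response decays
rapidly with the product $\ks\kf$. For $\adapt(t)=a$:
\begin{equation}
c_i(t)\,=\,a \infint k_i(\tau)\,d\tau\,=\,a\,\cdot\,\sin(\kp)\,\cdot\,\sqrt{2\pi}\,\ks\,\cdot\,e^{-2\pi^{2}{\ks}^{2}{\kf}^{2}}
\end{equation}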
%
Thresholding nonlinearity in ascending neurons (or further downstream)\\
- Binarization of AN response traces into ``relevant'' vs. ``irrelevant''\\
$\rightarrow$ Shifted Heaviside step-function $\nl$ (or steep sigmoid threshold?)
%
\begin{equation}
b_i(t,\,\thr)\,=\,\begin{cases}
\;1, \quad c_i(t)\,>\,\thr\\
\;0, \quad c_i(t)\,\leq\,\thr
\end{cases}
\label{eq:binary}
\end{equation}
%
Temporal averaging by neurons of the central brain\\
- Finalized set of slowly changing kernel-specific features (one per AN)\\
- Different species-specific song patterns are characterized by a distinct combination
of feature values $\rightarrow$ Clusters in high-dimensional feature space\\
$\rightarrow$ Lowpass filter 1 Hz
%
\begin{equation}
f_i(t)\,=\,b_i(t)\,*\,\lp, \qquad \fc\,=\,1\,\text{Hz}
\label{eq:lowpass}
\end{equation}
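%

To relate the cutoff frequencies of Eqs.\,\ref{eq:env}, \ref{eq:highpass} and
\ref{eq:lowpass} to time scales, one can assume first-order filters. This is an
assumption made here for illustration only, since the filter order is not
specified above, and the symbol $\tau_{\text{cut}}$ is likewise introduced only
for this sketch. The time constant of a first-order lowpass or highpass filter
is $\tau_{\text{cut}}\,=\,1/(2\pi\fc)$, which gives
\begin{equation}
\begin{split}
\fc\,=\,500\,\text{Hz} \quad&\Rightarrow\quad \tau_{\text{cut}}\,\approx\,0.32\,\text{ms}, \qquad 1/\fc\,=\,2\,\text{ms}\\
\fc\,=\,10\,\text{Hz} \quad&\Rightarrow\quad \tau_{\text{cut}}\,\approx\,16\,\text{ms}, \qquad 1/\fc\,=\,100\,\text{ms}\\
\fc\,=\,1\,\text{Hz} \quad&\Rightarrow\quad \tau_{\text{cut}}\,\approx\,159\,\text{ms}, \qquad 1/\fc\,=\,1\,\text{s}
\end{split}
\end{equation}
Under this assumption, the envelope extraction, the adaptation, and the feature
averaging operate on time scales that are each separated by more than an order
of magnitude.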
%
\section{Two mechanisms driving the emergence of intensity-invariant song representation}

\textbf{Definition of invariance (general, systemic):}\\
Invariance = Property of a system to maintain a stable output with respect to a
set of relevant input parameters (variation to be represented) but irrespective
of one or more other parameters (variation to be discarded)
$\rightarrow$ Selective input-output decorrelation

\textbf{Definition of intensity invariance (context of neurons and songs):}\\
Intensity invariance = Time scale-selective sensitivity to certain faster
amplitude dynamics (song waveform, small-scale AM) and simultaneous
insensitivity to slower, more sustained amplitude dynamics (transient baseline,
large-scale AM, current overall intensity level)\\
$\rightarrow$ Without time scale selectivity, any fully intensity-invariant
output will be a flat line

\subsection{Logarithmic scaling \& spike-frequency adaptation}

Envelope $\env(t)$ $\xrightarrow{\text{dB}}$ Logarithmic $\db(t)$ $\xrightarrow{\hp}$ Adapted $\adapt(t)$

- Rewrite signal envelope $\env(t)$ (Eq.\,\ref{eq:env}) as a synthetic mixture:\\
1) Song signal $s(t)$ ($\svar=1$) with variable multiplicative scale $\alpha\geq0$\\
2) Fixed-scale additive noise $\eta(t)$ ($\nvar=1$)
%
\begin{equation}
\env(t)\,=\,\alpha\,\cdot\,s(t)\,+\,\eta(t),\qquad \env(t)\,>\,0\enspace\forall\enspace t\,\in\,\mathbb{R}
\label{eq:toy_env}
\end{equation}
%
- Signal-to-noise ratio (SNR): Ratio of variances of synthetic mixture
$\env(t)$ with ($\alpha>0$) and without ($\alpha=0$) song signal $s(t)$, assuming $s(t)\perp\eta(t)$
%
\begin{equation}
\text{SNR}\,=\,\frac{\sigma_{s+\eta}^{2}}{\nvar}\,=\,\frac{\alpha^{2}\,\cdot\,\svar\,+\,\nvar}{\nvar}\,=\,\alpha^{2}\,+\,1
\label{eq:toy_snr}
\end{equation}
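%

To give a feel for the numbers in Eq.\,\ref{eq:toy_snr}, the following values
(chosen purely for illustration) list the SNR for a few song scales $\alpha$,
together with the equivalent decibel value $10\,\cdot\,\dec \text{SNR}$. Note
that with this definition the SNR cannot drop below 1, which corresponds to
pure noise ($\alpha=0$).
\begin{equation}
\begin{split}
\alpha\,=\,0 \quad&\Rightarrow\quad \text{SNR}\,=\,1 \quad (0\,\text{dB})\\
\alpha\,=\,1 \quad&\Rightarrow\quad \text{SNR}\,=\,2 \quad (\approx 3.0\,\text{dB})\\
\alpha\,=\,3 \quad&\Rightarrow\quad \text{SNR}\,=\,10 \quad (10\,\text{dB})\\
\alpha\,=\,10 \quad&\Rightarrow\quad \text{SNR}\,=\,101 \quad (\approx 20.0\,\text{dB})
\end{split}
\end{equation}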
%
\textbf{Logarithmic component:}\\
- Simplify decibel transformation (Eq.\,\ref{eq:log}) and apply to synthetic $\env(t)$\\
- Isolate scale $\alpha$ (for $\alpha>0$) and reference $\dbref$ using logarithm product/quotient laws
%
\begin{equation}
\begin{split}
\db(t)\,&=\,\log \frac{\alpha\,\cdot\,s(t)\,+\,\eta(t)}{\dbref}\\
&=\,\log \frac{\alpha}{\dbref}\,+\,\log \Big[s(t)\,+\,\frac{\eta(t)}{\alpha}\Big]
\end{split}
\label{eq:toy_log}
\end{equation}
%
$\rightarrow$ In log-space, a multiplicative scaling factor becomes additive\\
$\rightarrow$ Allows for the separation of song signal $s(t)$ and its scale $\alpha$\\
$\rightarrow$ Introduces scaling of noise term $\eta(t)$ by the inverse of $\alpha$\\
$\rightarrow$ Normalization by $\dbref$ applies equally to all terms (no individual effects)

\textbf{Adaptation component:}\\
- Highpass filter over $\db(t)$ (Eq.\,\ref{eq:highpass}) can
be approximated as subtraction of the local signal offset within a suitable time
interval $\thp$ ($0 \ll \thp < \frac{1}{\fc}$)
%
\begin{equation}
\begin{split}
\adapt(t)\,\approx\,\db(t)\,-\,\log \frac{\alpha}{\dbref}\,=\,\log \Big[s(t)\,+\,\frac{\eta(t)}{\alpha}\Big]
\end{split}
\label{eq:toy_highpass}
\end{equation}
%
\textbf{Implication for intensity invariance:}\\
- Logarithmic scaling is essential for equalizing different song intensities\\
$\rightarrow$ Intensity information can be manipulated more easily when in form
of a signal offset in log-space than a multiplicative scale in linear space

- Scale $\alpha$ can only be redistributed, not entirely eliminated from $\adapt(t)$\\
$\rightarrow$ Turn initial scaling of song $s(t)$ by $\alpha$ into scaling of noise $\eta(t)$ by $\frac{1}{\alpha}$

- Capability to compensate for intensity variations, i.e. selective amplification
of output $\adapt(t)$ relative to input $\env(t)$, is limited by input SNR (Eq.\,\ref{eq:toy_snr}):\\
$\alpha\gg1$: Attenuation of $\eta(t)$ term $\rightarrow$ $s(t)$ dominates $\adapt(t)$\\
$\alpha\approx1$: Negligible effect on $\eta(t)$ term $\rightarrow$ $\adapt(t)=\log[s(t)+\eta(t)]$\\
$\alpha\ll1$: Amplification of $\eta(t)$ term $\rightarrow$ $\eta(t)$ dominates $\adapt(t)$\\
$\rightarrow$ Ability to equalize between different sufficiently large scales of $s(t)$\\
$\rightarrow$ Inability to recover $s(t)$ when initially masked by noise floor $\eta(t)$

- Logarithmic scaling emphasizes small amplitudes (song onsets, noise floor) \\
$\rightarrow$ Recurring trade-off: Equalizing signal intensity vs preserving initial SNR
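%

The case $\alpha\gg1$ can be made more explicit with a first-order Taylor
expansion of Eq.\,\ref{eq:toy_highpass} around the noise-free signal. This is a
sketch that takes $\log$ to be the natural logarithm (any other base only adds
a constant factor to the second term) and that holds only where $s(t)>0$ and
$|\eta(t)|\ll\alpha\,\cdot\,s(t)$:
\begin{equation}
\adapt(t)\,\approx\,\log \Big[s(t)\,+\,\frac{\eta(t)}{\alpha}\Big]\,\approx\,\log s(t)\,+\,\frac{1}{\alpha}\,\cdot\,\frac{\eta(t)}{s(t)}
\end{equation}
Within this approximation, the residual noise contribution shrinks in
proportion to $\frac{1}{\alpha}$ and is largest where $s(t)$ is small, for
instance at song onsets, which is consistent with the trade-off noted above.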
\subsection{Threshold nonlinearity \& temporal averaging}

Convolved $c_i(t)$ $\xrightarrow{\nl}$ Binary $b_i(t)$ $\xrightarrow{\lp}$ Feature $f_i(t)$

\textbf{Thresholding component:}\\
- Within an observed time interval $T$, $c_i(t)$ follows probability density $\pc$\\
- Within $T$, $c_i(t)$ exceeds threshold value $\thr$ for time $T_1$ ($T_1+T_0=T$)\\
- Threshold $\nl$ splits $\pc$ around $\thr$ in two complementary parts
%
\begin{equation}
\int_{\thr}^{+\infty} p(c_i,T)\,dc_i\,=\,1\,-\,\int_{-\infty}^{\thr} p(c_i,T)\,dc_i\,=\,\frac{T_1}{T}
\label{eq:pdf_split}
\end{equation}
%
$\rightarrow$ Semi-infinite integral over right-sided portion of split $\pc$ gives ratio
of time $T_1$ where $c_i(t)>\thr$ to total time $T$ due to normalization of $\pc$
%
\begin{equation}
\infint \pc\,dc_i\,=\,1
\label{eq:pdf}
\end{equation}
%
\textbf{Averaging component:}\\
- Lowpass filter over binary response $b_i(t)$ (Eq.\,\ref{eq:lowpass}) can be
approximated as temporal averaging over a suitable time interval $\tlp$ ($\tlp > \frac{1}{\fc}$)\\
- Within $\tlp$, $b_i(t)$ takes a value of 1 ($c_i(t)>\thr$) for time $T_1$ ($T_1+T_0=\tlp$)
%
\begin{equation}
f_i(t)\,\approx\,\frac{1}{\tlp} \int_{t}^{t\,+\,\tlp} b_i(\tau)\,d\tau\,=\,\frac{T_1}{\tlp}
\label{eq:feat_avg}
\end{equation}
%
$\rightarrow$ Temporal averaging over $b_i(t)\in\{0,1\}$ (Eq.\,\ref{eq:binary}) gives
ratio of time $T_1$ where $c_i(t)>\thr$ to total averaging interval $\tlp$\\
$\rightarrow$ Feature $f_i(t)$ approximately represents supra-threshold fraction of $\tlp$

\textbf{Combined result:}\\
- Feature $f_i(t)$ can be linked to the distribution of $c_i(t)$ using Eqs.\,\ref{eq:pdf_split} \& \ref{eq:feat_avg}
%
\begin{equation}
f_i(t)\,\approx\,\int_{\thr}^{+\infty} \pclp\,dc_i\,=\,P(c_i\,>\,\thr,\,\tlp)
\label{eq:feat_prop}
\end{equation}
%
$\rightarrow$ Because the integral over a probability density is a cumulative
probability, the value of feature $f_i(t)$ (temporal compression of $b_i(t)$)
at every time point $t$ signifies the probability that convolution output
$c_i(t)$ exceeds the threshold value $\thr$ during the corresponding averaging
interval $\tlp$
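%

As a worked example of Eq.\,\ref{eq:feat_prop}, suppose that within $\tlp$ the
values of $c_i(t)$ are approximately Gaussian with mean $\mu_c$ and standard
deviation $\sigma_c$. Both symbols, and the Gaussian shape itself, are
assumptions introduced here for illustration only. The feature then follows
from the complementary error function:
\begin{equation}
f_i(t)\,\approx\,\frac{1}{2}\,\operatorname{erfc}\Big(\frac{\thr\,-\,\mu_c}{\sqrt{2}\,\sigma_c}\Big)
\end{equation}
Scaling $c_i(t)$ by a factor $\alpha>0$ replaces $\mu_c$ by $\alpha\mu_c$ and
$\sigma_c$ by $\alpha\sigma_c$, which has the same effect as dividing the
threshold by $\alpha$. Under these assumptions, the feature is therefore
exactly scale-invariant only for $\thr=0$ and becomes increasingly
scale-dependent the further the threshold lies from zero relative to
$\alpha\sigma_c$:
\begin{equation}
f_i(t)\,\approx\,\frac{1}{2}\,\operatorname{erfc}\Big(\frac{\thr/\alpha\,-\,\mu_c}{\sqrt{2}\,\sigma_c}\Big)
\end{equation}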
%

\textbf{Implication for intensity invariance:}\\
- Convolution output $c_i(t)$ quantifies temporal similarity between amplitudes of
template waveform $k_i(t)$ and signal $\adapt(t)$ centered at time point $t$\\
$\rightarrow$ Based on amplitudes on a graded scale

- Feature $f_i(t)$ quantifies the probability that amplitudes of $c_i(t)$
exceed threshold value $\thr$ within interval $\tlp$ around time point $t$\\
$\rightarrow$ Based on binned amplitudes corresponding to one of two categorical states
$\rightarrow$ Deliberate loss of precise amplitude information\\
$\rightarrow$ Emphasis on temporal structure (ratio of $T_1$ over $\tlp$)

- Thresholding of $c_i(t)$ and subsequent temporal averaging of $b_i(t)$ to
obtain $f_i(t)$ constitutes a remapping of an amplitude-encoding quantity into a
duty cycle-encoding quantity, mediated by threshold function $\nl$

- Different scales of $c_i(t)$ can result in similar $T_1$ segments depending
on the magnitude of the derivative of $c_i(t)$ in temporal proximity to time
points at which $c_i(t)$ crosses threshold value $\thr$\\
$\rightarrow$ The steeper the slope of $c_i(t)$, the less $T_1$ changes with scale variations\\
$\rightarrow$ If $T_1$ is invariant to scale variation in $c_i(t)$, then so is $f_i(t)$

- Suggests a relatively simple rule for optimal choice of threshold value $\thr$:\\
$\rightarrow$ Find amplitude $c_i$ that maximizes absolute derivative of $c_i(t)$ over time\\
$\rightarrow$ Optimal with respect to intensity invariance of $f_i(t)$, not necessarily for
other criteria such as song-noise separation or diversity between features
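%

The threshold rule above can be checked on a toy signal. As a sketch, assume a
purely sinusoidal convolution output $c_i(t)=\alpha A\sin(2\pi t/P)$, where the
amplitude $A$ and the period $P$ are hypothetical quantities introduced only
for this example, a threshold with $|\thr|\leq\alpha A$, and an averaging
interval $\tlp$ that spans a whole number of periods. The supra-threshold
fraction is then
\begin{equation}
f_i\,\approx\,\frac{T_1}{\tlp}\,=\,\frac{1}{2}\,-\,\frac{1}{\pi}\,\arcsin(u), \qquad u\,=\,\frac{\thr}{\alpha A}
\end{equation}
The sensitivity of the feature to the scale $\alpha$ follows by
differentiation. It vanishes for $\thr=0$, the zero crossing at which the slope
of the sinusoid is steepest, and diverges as $|u|\rightarrow1$, where the
threshold approaches the peaks and the slope goes to zero:
\begin{equation}
\frac{\partial f_i}{\partial \alpha}\,=\,\frac{1}{\pi\alpha}\,\cdot\,\frac{u}{\sqrt{1\,-\,u^{2}}}
\end{equation}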
%

- Nonlinear operations can be used to detach representations from graded physical
stimulus (to facilitate categorical behavioral decision-making?):\\
1) Capture sufficiently precise amplitude information: $\env(t)$, $\adapt(t)$\\
$\rightarrow$ Closely following the AM of the acoustic stimulus\\
2) Quantify relevant stimulus properties on a graded scale: $c_i(t)$\\
$\rightarrow$ More decorrelated representation, compared to prior stages\\
3) Nonlinearity: Distinguish between ``relevant vs irrelevant'' values: $b_i(t)$\\
$\rightarrow$ Trading a graded scale for two or more categorical states\\
4) Represent stimulus properties under relevance constraint: $f_i(t)$\\
$\rightarrow$ Graded again but highly decorrelated from the acoustic stimulus\\
5) Categorical behavioral decision-making requires further nonlinearities\\
$\rightarrow$ Parameters of a behavioral response may be graded (e.g. approach speed),
initiation of one behavior over another is categorical (e.g. approach/stay)

\section{Discriminating species-specific song\\patterns in feature space}

\section{Conclusions \& outlook}

\end{document}