
\documentclass[a4paper, 12pt]{article}

\usepackage[left=2.5cm,right=2.5cm,top=2cm,bottom=2cm,includeheadfoot]{geometry}
\usepackage[onehalfspacing]{setspace}
\usepackage{graphicx}
\usepackage{svg}
\usepackage{import}
\usepackage{float}
\usepackage{placeins}
\usepackage{parskip}
\usepackage{amsmath}
\usepackage{amssymb}
\usepackage[separate-uncertainty=true, locale=DE]{siunitx}
\sisetup{output-exponent-marker=\ensuremath{\mathrm{e}}}
% \usepackage[capitalize]{cleveref}
% \crefname{figure}{Fig.}{Figs.}
% \crefname{equation}{Eq.}{Eqs.}
% \creflabelformat{equation}{#2#1#3}
\usepackage[
    backend=biber,
    style=authoryear,
    pluralothers=true,
    maxcitenames=1,
    mincitenames=1
    ]{biblatex}
\addbibresource{cite.bib}

\title{Emergent intensity invariance in a physiologically inspired model of the grasshopper auditory system}
\author{Jona Hartling, Jan Benda}
\date{}

\begin{document}
\maketitle{}

% Text references and citations:
\newcommand{\bcite}[1]{\mbox{\cite{#1}}}
% \newcommand{\fref}[1]{\mbox{\cref{#1}}}
% \newcommand{\fref}[1]{\mbox{Fig.\,\ref{#1}}}
% \newcommand{\eref}[1]{\mbox{\cref{#1}}}
% \newcommand{\eref}[1]{\mbox{Eq.\,\ref{#1}}}

% Math shorthands - Standard symbols:
\newcommand{\dec}{\log_{10}} % Logarithm base 10
\newcommand{\infint}{\int_{-\infty}^{+\infty}} % Integral over the entire real line

% Math shorthands - Spectral filtering:
\newcommand{\bp}{h_{\text{BP}}(t)} % Bandpass filter function
\newcommand{\lp}{h_{\text{LP}}(t)} % Lowpass filter function
\newcommand{\hp}{h_{\text{HP}}(t)} % Highpass filter function
\newcommand{\fc}{f_{\text{cut}}} % Filter cutoff frequency
\newcommand{\tlp}{T_{\text{LP}}} % Lowpass filter averaging interval
\newcommand{\thp}{T_{\text{HP}}} % Highpass filter adaptation interval

% Math shorthands - Early representations:
\newcommand{\raw}{x} % Placeholder input signal
\newcommand{\filt}{\raw_{\text{filt}}} % Bandpass-filtered signal
\newcommand{\env}{\raw_{\text{env}}} % Signal envelope
\newcommand{\db}{\raw_{\text{dB}}} % Logarithmically scaled signal
\newcommand{\dbref}{\raw_{\text{ref}}} % Decibel reference intensity
\newcommand{\adapt}{\raw_{\text{adapt}}} % Adapted signal

% Math shorthands - Kernel parameters:
\newcommand{\ks}{\sigma_i} % Gabor kernel width
\newcommand{\kf}{f_i} % Gabor kernel frequency
\newcommand{\kp}{\phi_i} % Gabor kernel phase

% Math shorthands - Threshold nonlinearity:
\newcommand{\thr}{\Theta_i} % Step function threshold value
\newcommand{\nl}{H(c_i\,-\,\thr)} % Shifted Heaviside step function

% Math shorthands - Minor symbols and helpers:
\newcommand{\svar}{\sigma_{\text{s}}^{2}} % Song signal variance
\newcommand{\nvar}{\sigma_{\eta}^{2}} % Noise signal variance
\newcommand{\pc}{p(c_i,\,T)} % Probability density (general interval)
\newcommand{\pclp}{p(c_i,\,\tlp)} % Probability density (lowpass interval)

\section{Exploring a grasshopper's sensory world}

Our scientific understanding of sensory processing systems results from the
distributed accumulation of anatomical, physiological and ethological evidence.
This piecemeal process is unavoidable; however, it leaves us with the
challenge of integrating the available fragments into a coherent whole in order
to address issues such as the interaction between individual system components,
the functional limitations of the system overall, or taxonomic comparisons of
systems that process the same sensory modality. Any unified framework that
captures the essential functional aspects of a given sensory system thus has
the potential to deepen our current understanding and facilitate systematic
investigations. However, building such a framework is a challenging task. It
requires a wealth of existing knowledge of the system and the signals it
operates on, a clearly defined scope, and careful reduction, abstraction, and
formalization of the underlying anatomical structures and physiological
mechanisms.

One sensory system about which extensive information has been gathered over the
years is the auditory system of grasshoppers~(\textit{Acrididae}). Grasshoppers
rely on auditory processing primarily for intraspecific communication, which
includes mate attraction and evaluation~(\bcite{helversen1972gesang}), sender
localization~(\bcite{helversen1988interaural}), courtship display~(SOURCE),
rival deterrence~(\bcite{greenfield1993acoustic}), and loss-of-signal predator
alarm~(SOURCE). The different behavioral contexts are met with

Different acoustic signals are used for different behavioral
contexts and communication ranges

Depending on the behavioral context and the communication range,


Grasshoppers generate their most conspicuous acoustic signals
---~commonly referred to as ``songs''~--- by stridulation.


Different acoustic signals may be generated using different
body parts ---~wings, hindlegs, or mandibles~--- but the most conspicuous


The required acoustic signals for different contexts and ranges
may be generated using different body parts ---~wings, hindlegs, or
mandibles~--- but the most common sound production mechanism is stridulation,
during which the animal pulls the serrated stridulatory file on its hindlegs
across a resonating vein on the forewings. The resulting ``songs''


The reliance on acoustic communication signals represents a strong evolutionary
driving force that has resulted in massive species diversification among
grasshoppers~(\bcite{vedenina2011speciation},
\bcite{sevastianov2023evolution}).


Grasshoppers produce their most conspicuous acoustic signals
---~commonly referred to as ``songs''~--- by stridulation, during which the
animal rubs the serrated stridulatory file on its hindleg across a resonating
vein on the forewing.

Among the several thousand recognized grasshopper
species~(\bcite{cigliano2018orthoptera}), diverse species-specific sound
repertoires and production mechanisms


Strong dependence on acoustic signals for ranged communication\\
- Diverse species-specific sound repertoires and production mechanisms\\
- Different contexts/ranges: Stridulatory, mandibular, wings, walking sounds\\
- Mate attraction/evaluation, rival deterrence, loss-of-signal predator alarm\\
$\rightarrow$ Elaborate acoustic behaviors co-depend on reliable auditory perception

Songs = Amplitude-modulated (AM) broad-band acoustic signals\\
- Generated by stridulatory movement of hindlegs against forewings\\
- Shorter time scales: Characteristic temporal waveform pattern\\
- Longer time scales: High degree of periodicity (pattern repetition)\\
- Sound propagation: Signal intensity varies strongly with distance to sender\\
- Ectothermy: Temporal structure warps with temperature\\
$\rightarrow$ Sensory constraints imposed by properties of the acoustic signal itself

Multi-species, multi-individual communally inhabited environments\\
- Temporal overlap: Simultaneous singing across individuals/species common\\
- Frequency overlap: No/hardly any niche speciation into frequency bands\\
- ``Biotic noise'': Hetero-/conspecifics (``Another one's songs are my noise'')\\
- ``Abiotic noise'': Wind, water, vegetation, anthropogenic\\
- Effects of habitat structure on sound propagation (landscape - soundscape)\\
$\rightarrow$ Sensory constraints imposed by the (acoustic) environment

Cluster of auditory challenges (interlocking constraints $\rightarrow$ tight coupling):\\
From continuous acoustic input, generate neuronal representations that...\\
1)...allow for the separation of relevant (song) events from ambient noise floor\\
2)...compensate for behaviorally non-informative song variability (invariances)\\
3)...carry sufficient information to characterize different song patterns,
recognize the ones produced by conspecifics, and make appropriate behavioral
decisions based on context (sender identity, song type, mate/rival quality)

How can the auditory system of grasshoppers meet these challenges?\\
- What are the minimum functional processing steps required?\\
- Which known neuronal mechanisms can implement these steps?\\
- Which and how many stages along the auditory pathway contribute?\\
$\rightarrow$ What are the limitations of the system as a whole?

How can a human observer conceive a grasshopper's auditory percepts?\\
- How to investigate the workings of the auditory pathway as a whole?\\
- How to systematically test effects and interactions of processing parameters?\\
- How to integrate the available knowledge on anatomy, physiology, ethology?\\
$\rightarrow$ Abstract, simplify, formalize $\rightarrow$ Functional model framework

\textbf{Precursor work for model construction (special thanks to authors):}

Linear-nonlinear modelling of behavioral responses to artificial songs\\
- Feature expansion as implemented in our model: Major contribution!\\
- Bank of linear filters, nonlinearity, temporal integration, feature weighting\\
$\rightarrow$ \cite{clemens2013computational} (crickets)\\
$\rightarrow$ \cite{clemens2013feature} (grasshoppers)\\
$\rightarrow$ \cite{ronacher2015computational}\\
\textbf{Own advancements/key differences}:\\
1) Used boxcar functions as artificial ``songs'' (focus on few key parameters)\\
$\rightarrow$ Now actual, variable songs (as naturalistic as possible)\\
2) Fitted filters to behavioral data\\
$\rightarrow$ More general, simpler, unfitted formalized Gabor filter bank

\section{Developing a functional model of\\the grasshopper auditory pathway}

% Either pick up in intro and/or discussion, or move entirely:
The grasshopper auditory system has been studied extensively over the past
decades, and a correspondingly large number of neuron types has been
described~(\bcite{rehbein1974structure}; \bcite{kalmring1975afferent};
\bcite{rehbein1976auditory}; \bcite{eichendorf1980projections}). The functional
model we propose here focuses on the pathway responsible for song recognition
and assumes a strict feed-forward organization of three consecutive neuronal
populations: Peripheral auditory receptor neurons~\mbox{(1st order)}, local
interneurons of the metathoracic ganglion~\mbox{(2nd order)}, and ascending
neurons~\mbox{(3rd order)} projecting towards the supraesophageal ganglion.

Previous authors have reported a marked increase in response heterogeneity
within the population of ascending neurons compared to receptors and local
interneurons, whose members exhibit almost identical filter characteristics
within each population~(\bcite{clemens2011efficient}). Based on these findings, the model
pathway can be divided into two distinct portions~(Fig.\,\ref{fig:pathway}c+d).
In the preprocessing portion, generated

The preprocessing portion comprises the tympanal membrane, receptors, and
local interneurons. The different signal representations

Due to the similar response properties within the involved


1) ``Pre-split portion'' of the auditory pathway:\\
Tympanal membrane $\rightarrow$ Receptor neurons $\rightarrow$ Local interneurons

Similar response/filter properties within receptor/interneuron populations (\cite{clemens2011efficient})\\
$\rightarrow$ One population-wide response trace per stage (no ``single-cell resolution'')

2) ``Post-split portion'' of the auditory pathway:\\
Ascending neurons (AN) $\rightarrow$ Central brain neurons

Diverse response/filter properties within AN population (\cite{clemens2011efficient})\\
- Pathway splitting into several parallel branches\\
- Expansion into a decorrelated higher-dimensional sound representation\\
$\rightarrow$ Individual neuron-specific response traces from this stage onwards

\begin{figure}[!ht]
    \centering
    \def\svgwidth{\textwidth}
    \import{figures/}{fig_auditory_pathway.pdf_tex}
    \caption[Grasshopper auditory system]{\textbf{The auditory system of
    grasshoppers.}}
    \label{fig:pathway}
\end{figure}
\FloatBarrier

\subsection{Population-driven signal pre-processing}

Grasshoppers receive airborne sound waves by a tympanal organ at each side of
the thorax~(Fig.\,\ref{fig:pathway}a). The tympanal membrane acts as a
mechanical resonance filter: Vibrations that fall within specific frequency
bands are focused on different membrane areas, while others are
attenuated~(\bcite{michelsen1971frequency}; \bcite{windmill2008time};
\bcite{malkin2014energy}). This processing step can be approximated by an
initial bandpass filter
\begin{equation}
    \filt(t)\,=\,\raw(t)\,*\,\bp, \qquad \fc\,=\,5\,\text{kHz},\,30\,\text{kHz}
    \label{eq:bandpass}
\end{equation}
applied to the acoustic input signal $\raw(t)$. The auditory receptor neurons
connect directly to the tympanal membrane~(Fig.\,\ref{fig:pathway}a). Besides
performing the mechano-electrical transduction, the receptor population
implements several known processing steps. First, the receptors extract the
signal envelope~(\bcite{machens2001discrimination}), which likely involves a
rectifying nonlinearity~(\bcite{machens2001representation}). This can be
modelled as full-wave rectification followed by lowpass filtering
\begin{equation}
    \env(t)\,=\,|\filt(t)|\,*\,\lp, \qquad \fc\,=\,500\,\text{Hz}
    \label{eq:env}
\end{equation}
of the tympanal signal $\filt(t)$. Furthermore, the receptors exhibit a
sigmoidal response curve over logarithmically compressed intensity
levels~(\bcite{suga1960peripheral}; \bcite{gollisch2002energy}). In the model,
logarithmic compression is achieved by conversion to decibel scale
\begin{equation}
    \db(t)\,=\,10\,\cdot\,\dec \frac{\env(t)}{\dbref}, \qquad \dbref\,=\,\max[\env(t)]
    \label{eq:log}
\end{equation}
relative to the maximum intensity $\dbref$ of the signal envelope $\env(t)$.
Next, the axons of the receptor neurons project into the metathoracic ganglion,
where they synapse onto local interneurons~(Fig.\,\ref{fig:pathway}b). Both the
local interneurons~(\bcite{hildebrandt2009origin};
\bcite{clemens2010intensity}) and, to a lesser extent, the receptors
themselves~(\bcite{fisch2012channel}) display spike-frequency adaptation in
response to sustained stimulus intensity levels. This mechanism allows for the
robust encoding of faster amplitude modulations against a slowly changing
overall baseline intensity. Functionally, this processing step resembles a
highpass filter
\begin{equation}
    \adapt(t)\,=\,\db(t)\,*\,\hp, \qquad \fc\,=\,10\,\text{Hz}
    \label{eq:highpass}
\end{equation}
over the logarithmically scaled envelope $\db(t)$. The projections of the local
interneurons remain within the metathoracic ganglion and synapse onto a small
number of ascending neurons~(Fig.\,\ref{fig:pathway}b), which marks the
transition between the preprocessing stream and the parallel processing stream
of the model pathway.
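
For reference, the four preprocessing steps
(Eqs.\,\ref{eq:bandpass}--\ref{eq:highpass}) can be chained into a single
expression. This is only a compact restatement of the equations above, not an
additional model assumption; it mainly serves to make the order of linear
filtering and nonlinear operations explicit:
\begin{equation*}
    \adapt(t)\,=\,\Big[10\,\cdot\,\dec \frac{|\raw(t)\,*\,\bp|\,*\,\lp}{\dbref}\Big]\,*\,\hp
\end{equation*}
Because rectification and logarithmic compression are nonlinear, they do not
commute with the convolutions in general, so the order of the steps cannot be
rearranged freely.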

\subsection{Feature extraction by individual neurons}

The small population of ascending neurons


\textbf{Stage-specific processing steps and functional approximations:}

Template matching by individual ANs\\
- Filter base (STA approximations): Set of Gabor kernels\\
- Gabor parameters: $\ks, \kf, \kp$ $\rightarrow$ Determine kernel sign and lobe number
%
\begin{equation}
    k_i(t,\,\ks,\,\kf,\,\kp)\,=\,e^{-\frac{t^{2}}{2{\ks}^{2}}}\,\cdot\,\sin(2\pi\kf\,\cdot\,t\,+\,\phi_i)
    \label{eq:gabor}
\end{equation}
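As a brief illustration of how the phase $\kp$ sets kernel symmetry and sign
(the specific phase values are chosen for illustration only),
Eq.\,\ref{eq:gabor} reduces to the special cases listed here. The number of
lobes, in turn, grows with the product of $\ks$ and $\kf$, that is, with the
number of carrier cycles that fit under the Gaussian envelope.
\begin{align*}
    \kp\,=\,0:&\quad k_i(t)\,=\,e^{-\frac{t^{2}}{2{\ks}^{2}}}\,\cdot\,\sin(2\pi\kf\,\cdot\,t) &&\text{(odd symmetry)}\\
    \kp\,=\,\tfrac{\pi}{2}:&\quad k_i(t)\,=\,e^{-\frac{t^{2}}{2{\ks}^{2}}}\,\cdot\,\cos(2\pi\kf\,\cdot\,t) &&\text{(even symmetry, positive central lobe)}\\
    \kp\,\rightarrow\,\kp\,+\,\pi:&\quad k_i(t)\,\rightarrow\,-k_i(t) &&\text{(sign inversion)}
\end{align*}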
%
$\rightarrow$ Separate convolution with each member of the kernel set
%
\begin{equation}
    c_i(t)\,=\,\adapt(t)\,*\,k_i(t)
    = \infint \adapt(\tau)\,\cdot\,k_i(t\,-\,\tau)\,d\tau
    \label{eq:conv}
\end{equation}
%
Thresholding nonlinearity in ascending neurons (or further downstream)\\
- Binarization of AN response traces into ``relevant'' vs. ``irrelevant''\\
$\rightarrow$ Shifted Heaviside step-function $\nl$ (or steep sigmoid threshold?)
%
\begin{equation}
    b_i(t,\,\thr)\,=\,\begin{cases}
        \;1, \quad c_i(t)\,>\,\thr\\
        \;0, \quad c_i(t)\,\leq\,\thr
    \end{cases}
    \label{eq:binary}
\end{equation}
%
Temporal averaging by neurons of the central brain\\
- Finalized set of slowly changing kernel-specific features (one per AN)\\
- Different species-specific song patterns are characterized by a distinct combination
of feature values $\rightarrow$ Clusters in high-dimensional feature space\\
$\rightarrow$ Lowpass filter 1 Hz
%
\begin{equation}
    f_i(t)\,=\,b_i(t)\,*\,\lp, \qquad \fc\,=\,1\,\text{Hz}
    \label{eq:lowpass}
\end{equation}
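Taken together, Eqs.\,\ref{eq:conv}--\ref{eq:lowpass} can be condensed into one
expression per kernel. This is merely a restatement, using the step function
$H$ with the convention $H(0)\,=\,0$ to match Eq.\,\ref{eq:binary}:
\begin{equation*}
    f_i(t)\,=\,H\big(\adapt(t)\,*\,k_i(t)\,-\,\thr\big)\,*\,\lp
\end{equation*}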
%
\section{Two mechanisms driving the emergence of intensity-invariant song representation}

\textbf{Definition of invariance (general, systemic):}\\
Invariance = Property of a system to maintain a stable output with respect to a
set of relevant input parameters (variation to be represented) but irrespective
of one or more other parameters (variation to be discarded)
$\rightarrow$ Selective input-output decorrelation

\textbf{Definition of intensity invariance (context of neurons and songs):}\\
Intensity invariance = Time scale-selective sensitivity to certain faster
amplitude dynamics (song waveform, small-scale AM) and simultaneous
insensitivity to slower, more sustained amplitude dynamics (transient baseline,
large-scale AM, current overall intensity level)\\
$\rightarrow$ Without time scale selectivity, any fully intensity-invariant
output will be a flat line

\subsection{Logarithmic scaling \& spike-frequency adaptation}

Envelope $\env(t)$ $\xrightarrow{\text{dB}}$ Logarithmic $\db(t)$ $\xrightarrow{\hp}$ Adapted $\adapt(t)$

- Rewrite signal envelope $\env(t)$ (Eq.\,\ref{eq:env}) as a synthetic mixture:\\
1) Song signal $s(t)$ ($\svar=1$) with variable multiplicative scale $\alpha\geq0$\\
2) Fixed-scale additive noise $\eta(t)$ ($\nvar=1$)
%
\begin{equation}
    \env(t)\,=\,\alpha\,\cdot\,s(t)\,+\,\eta(t),\qquad \env(t)\,>\,0\enspace\forall\enspace t\,\in\,\mathbb{R}
    \label{eq:toy_env}
\end{equation}
%
- Signal-to-noise ratio (SNR): Ratio of variances of synthetic mixture
$\env(t)$ with ($\alpha>0$) and without ($\alpha=0$) song signal $s(t)$, assuming $s(t)\perp\eta(t)$
%
\begin{equation}
    \text{SNR}\,=\,\frac{\sigma_{s+\eta}^{2}}{\nvar}\,=\,\frac{\alpha^{2}\,\cdot\,\svar\,+\,\nvar}{\nvar}\,=\,\alpha^{2}\,+\,1
    \label{eq:toy_snr}
\end{equation}
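For orientation, a few values of Eq.\,\ref{eq:toy_snr} are listed here. The
scales $\alpha$ are chosen arbitrarily, and the conversion to decibel via
$10\,\cdot\,\dec$ is added for convenience only; it is not part of the
definition above. Note that, by this definition, the SNR never falls below 1.
\begin{align*}
    \alpha\,=\,0:&\quad \text{SNR}\,=\,1 &&(0\,\text{dB})\\
    \alpha\,=\,1:&\quad \text{SNR}\,=\,2 &&(\approx 3.01\,\text{dB})\\
    \alpha\,=\,3:&\quad \text{SNR}\,=\,10 &&(10\,\text{dB})
\end{align*}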
%
\textbf{Logarithmic component:}\\
- Simplify decibel transformation (Eq.\,\ref{eq:log}) and apply to synthetic $\env(t)$\\
- Isolate scale $\alpha$ and reference $\dbref$ using logarithm product/quotient laws
%
\begin{equation}
    \begin{split}
        \db(t)\,&=\,\log \frac{\alpha\,\cdot\,s(t)\,+\,\eta(t)}{\dbref}\\
        &=\,\log \frac{\alpha}{\dbref}\,+\,\log \big[s(t)\,+\,\frac{\eta(t)}{\alpha}\big]
    \end{split}
    \label{eq:toy_log}
\end{equation}
%
$\rightarrow$ In log-space, a multiplicative scaling factor becomes additive\\
$\rightarrow$ Allows for the separation of song signal $s(t)$ and its scale $\alpha$\\
$\rightarrow$ Introduces scaling of noise term $\eta(t)$ by the inverse of $\alpha$\\
$\rightarrow$ Normalization by $\dbref$ applies equally to all terms (no individual effects)
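
Written out step by step, and assuming $\alpha\,>\,0$ and $\dbref\,>\,0$ so
that all logarithms are defined, the factorization behind
Eq.\,\ref{eq:toy_log} reads as follows. For $\alpha\,=\,0$ the factorization
is not applicable, and only the noise term $\log[\eta(t)/\dbref]$ remains.
\begin{align*}
    \log \frac{\alpha\,\cdot\,s(t)\,+\,\eta(t)}{\dbref}\,
    &=\,\log \Big[\frac{\alpha}{\dbref}\,\cdot\,\Big(s(t)\,+\,\frac{\eta(t)}{\alpha}\Big)\Big]\\
    &=\,\log \alpha\,-\,\log \dbref\,+\,\log \Big[s(t)\,+\,\frac{\eta(t)}{\alpha}\Big]
\end{align*}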

\textbf{Adaptation component:}\\
- Highpass filter over $\db(t)$ (Eq.\,\ref{eq:highpass}) can
be approximated as subtraction of the local signal offset within a suitable time
interval $\thp$ ($0 \ll \thp < \frac{1}{\fc}$)
%
\begin{equation}
    \begin{split}
    \adapt(t)\,\approx\,\db(t)\,-\,\log \frac{\alpha}{\dbref}\,=\,\log \big[s(t)\,+\,\frac{\eta(t)}{\alpha}\big]
    \end{split}
    \label{eq:toy_highpass}
\end{equation}
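One possible reading of this approximation is the subtraction of a running
mean $\langle\,\cdot\,\rangle_{\thp}$, here under the additional assumption of
a causal rectangular averaging window of length $\thp$ (the window shape is not
specified by the model). The offset $\log \frac{\alpha}{\dbref}$ is constant
within the window and passes through the average unchanged, so that
Eq.\,\ref{eq:toy_highpass} holds up to the residual window average of
$\log [s(t)\,+\,\eta(t)/\alpha]$, which is neglected there:
\begin{align*}
    \adapt(t)\,&\approx\,\db(t)\,-\,\langle \db \rangle_{\thp}, \qquad
    \langle \db \rangle_{\thp}\,=\,\frac{1}{\thp} \int_{t\,-\,\thp}^{t} \db(\tau)\,d\tau\\
    \langle \db \rangle_{\thp}\,&=\,\log \frac{\alpha}{\dbref}\,+\,\Big\langle \log \Big[s(\tau)\,+\,\frac{\eta(\tau)}{\alpha}\Big] \Big\rangle_{\thp}
\end{align*}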
%
\textbf{Implication for intensity invariance:}\\
- Logarithmic scaling is essential for equalizing different song intensities\\
$\rightarrow$ Intensity information can be manipulated more easily when in form
of a signal offset in log-space than a multiplicative scale in linear space

- Scale $\alpha$ can only be redistributed, not entirely eliminated from $\adapt(t)$\\
$\rightarrow$ Turn initial scaling of song $s(t)$ by $\alpha$ into scaling of noise $\eta(t)$ by $\frac{1}{\alpha}$

- Capability to compensate for intensity variations, i.e. selective amplification
of output $\adapt(t)$ relative to input $\env(t)$, is limited by input SNR (Eq.\,\ref{eq:toy_snr}):\\
$\alpha\gg1$: Attenuation of $\eta(t)$ term $\rightarrow$ $s(t)$ dominates $\adapt(t)$\\
$\alpha\approx1$: Negligible effect on $\eta(t)$ term $\rightarrow$ $\adapt(t)=\log[s(t)+\eta(t)]$\\
$\alpha\ll1$: Amplification of $\eta(t)$ term $\rightarrow$ $\eta(t)$ dominates $\adapt(t)$\\
$\rightarrow$ Ability to equalize between different sufficiently large scales of $s(t)$\\
$\rightarrow$ Inability to recover $s(t)$ when initially masked by noise floor $\eta(t)$
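
The two limiting cases can be made explicit by a first-order expansion of the
logarithm, $\log(1+\epsilon)\,\approx\,\epsilon$ for $|\epsilon|\,\ll\,1$. This
sketch additionally assumes $s(t)\,>\,0$ with $|\eta(t)|\,\ll\,\alpha\,s(t)$ in
the first case, and $\eta(t)\,>\,0$ with $\alpha\,|s(t)|\,\ll\,\eta(t)$ in the
second case, so that the leading terms are defined:
\begin{align*}
    \alpha\,\gg\,1:\quad \log \Big[s(t)\,+\,\frac{\eta(t)}{\alpha}\Big]\,
    &=\,\log s(t)\,+\,\log \Big[1\,+\,\frac{\eta(t)}{\alpha\,s(t)}\Big]\\
    &\approx\,\log s(t)\,+\,\frac{\eta(t)}{\alpha\,s(t)}\\
    \alpha\,\ll\,1:\quad \log \Big[s(t)\,+\,\frac{\eta(t)}{\alpha}\Big]\,
    &=\,\log \frac{\eta(t)}{\alpha}\,+\,\log \Big[1\,+\,\frac{\alpha\,s(t)}{\eta(t)}\Big]\\
    &\approx\,\log \eta(t)\,-\,\log \alpha\,+\,\frac{\alpha\,s(t)}{\eta(t)}
\end{align*}
In the first case, the noise enters only as a correction of order $1/\alpha$.
In the second case, the song enters only as a correction of order $\alpha$,
while $-\log \alpha$ is a constant offset of the kind that the highpass filter
removes.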

- Logarithmic scaling emphasizes small amplitudes (song onsets, noise floor) \\
$\rightarrow$ Recurring trade-off: Equalizing signal intensity vs preserving initial SNR

\subsection{Threshold nonlinearity \& temporal averaging}

Convolved $c_i(t)$ $\xrightarrow{\nl}$ Binary $b_i(t)$ $\xrightarrow{\lp}$ Feature $f_i(t)$

\textbf{Thresholding component:}\\
- Within an observed time interval $T$, $c_i(t)$ follows probability density $\pc$\\
- Within $T$, $c_i(t)$ exceeds threshold value $\thr$ for time $T_1$ ($T_1+T_0=T$)\\
- Threshold $\nl$ splits $\pc$ around $\thr$ in two complementary parts
%
\begin{equation}
    \int_{\thr}^{+\infty} p(c_i,T)\,dc_i\,=\,1\,-\,\int_{-\infty}^{\thr} p(c_i,T)\,dc_i\,=\,\frac{T_1}{T}
    \label{eq:pdf_split}
\end{equation}
%
$\rightarrow$ Semi-infinite integral over right-sided portion of split $\pc$ gives ratio
of time $T_1$ where $c_i(t)>\thr$ to total time $T$ due to normalization of $\pc$
%
\begin{equation}
    \infint \pc\,dc_i\,=\,1
    \label{eq:pdf}
\end{equation}
%
\textbf{Averaging component:}\\
- Lowpass filter over binary response $b_i(t)$ (Eq.\,\ref{eq:lowpass}) can be
approximated as temporal averaging over a suitable time interval $\tlp$ ($\tlp > \frac{1}{\fc}$)\\
- Within $\tlp$, $b_i(t)$ takes a value of 1 ($c_i(t)>\thr$) for time $T_1$ ($T_1+T_0=\tlp$)
%
\begin{equation}
    f_i(t)\,\approx\,\frac{1}{\tlp} \int_{t}^{t\,+\,\tlp} b_i(\tau)\,d\tau\,=\,\frac{T_1}{\tlp}
    \label{eq:feat_avg}
\end{equation}
%
$\rightarrow$ Temporal averaging over $b_i(t)\in\{0,1\}$ (Eq.\,\ref{eq:binary}) gives
ratio of time $T_1$ where $c_i(t)>\thr$ to total averaging interval $\tlp$\\
$\rightarrow$ Feature $f_i(t)$ approximately represents supra-threshold fraction of $\tlp$

\textbf{Combined result:}\\
- Feature $f_i(t)$ can be linked to the distribution of $c_i(t)$ using Eqs.\,\ref{eq:pdf_split} \& \ref{eq:feat_avg}
%
\begin{equation}
    f_i(t)\,\approx\,\int_{\thr}^{+\infty} \pclp\,dc_i\,=\,P(c_i\,>\,\thr,\,\tlp)
    \label{eq:feat_prop}
\end{equation}
%
$\rightarrow$ Because the integral over a probability density is a cumulative
probability, the value of feature $f_i(t)$ (temporal compression of $b_i(t)$)
at every time point $t$ signifies the probability that convolution output
$c_i(t)$ exceeds the threshold value $\thr$ during the corresponding averaging
interval $\tlp$
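
As an illustrative special case, suppose that $\pclp$ is Gaussian with mean
$\mu_i$ and standard deviation $\sigma_{c,i}$. The Gaussian form and the
symbols $\mu_i$, $\sigma_{c,i}$ and $\gamma$ are assumptions introduced here,
not results of the model. Eq.\,\ref{eq:feat_prop} then evaluates to a
complementary error function, and rescaling $c_i(t)$ by a factor
$\gamma\,>\,0$ rescales both $\mu_i$ and $\sigma_{c,i}$:
\begin{align*}
    f_i(t)\,&\approx\,\frac{1}{2}\,\operatorname{erfc}\Big(\frac{\thr\,-\,\mu_i}{\sqrt{2}\,\sigma_{c,i}}\Big)\\
    c_i(t)\,\rightarrow\,\gamma\,\cdot\,c_i(t):\quad
    f_i(t)\,&\approx\,\frac{1}{2}\,\operatorname{erfc}\Big(\frac{1}{\sqrt{2}}\,\Big[\frac{\thr}{\gamma\,\sigma_{c,i}}\,-\,\frac{\mu_i}{\sigma_{c,i}}\Big]\Big)
\end{align*}
In this special case, the scale $\gamma$ affects $f_i(t)$ only through the
ratio $\thr/\gamma$.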

\textbf{Implication for intensity invariance:}\\
- Convolution output $c_i(t)$ quantifies temporal similarity between amplitudes of
template waveform $k_i(t)$ and signal $\adapt(t)$ centered at time point $t$\\
$\rightarrow$ Based on amplitudes on a graded scale

- Feature $f_i(t)$ quantifies the probability that amplitudes of $c_i(t)$
exceed threshold value $\thr$ within interval $\tlp$ around time point $t$\\
$\rightarrow$ Based on binned amplitudes corresponding to one of two categorical states\\
$\rightarrow$ Deliberate loss of precise amplitude information\\
$\rightarrow$ Emphasis on temporal structure (ratio of $T_1$ over $\tlp$)

- Thresholding of $c_i(t)$ and subsequent temporal averaging of $b_i(t)$ to
obtain $f_i(t)$ constitutes a remapping of an amplitude-encoding quantity into a
duty cycle-encoding quantity, mediated by threshold function $\nl$

- Different scales of $c_i(t)$ can result in similar $T_1$ segments depending
on the magnitude of the derivative of $c_i(t)$ in temporal proximity to time
points at which $c_i(t)$ crosses threshold value $\thr$\\
$\rightarrow$ The steeper the slope of $c_i(t)$, the less $T_1$ changes with scale variations\\
$\rightarrow$ If $T_1$ is invariant to scale variation in $c_i(t)$, then so is $f_i(t)$

- Suggests a relatively simple rule for optimal choice of threshold value $\thr$:\\
$\rightarrow$ Find amplitude $c_i$ that maximizes absolute derivative of $c_i(t)$ over time\\
$\rightarrow$ Optimal with respect to intensity invariance of $f_i(t)$, not necessarily for
other criteria such as song-noise separation or diversity between features
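
A toy example may illustrate this rule. Assume, purely for illustration, a
sinusoidal convolution output $c_i(t)\,=\,A\,\sin(2\pi\nu t)$ with amplitude
$A\,>\,0$, a threshold $|\thr|\,<\,A$, and an averaging interval $\tlp$
spanning an integer number of periods $1/\nu$ (the symbols $A$ and $\nu$ are
introduced here and are not part of the model). The supra-threshold fraction
and its sensitivity to the scale $A$ are then
\begin{align*}
    f_i\,&=\,\frac{T_1}{\tlp}\,=\,\frac{1}{2}\,-\,\frac{1}{\pi}\,\arcsin \frac{\thr}{A}\\
    \frac{\partial f_i}{\partial A}\,&=\,\frac{\thr}{\pi\,A\,\sqrt{A^{2}\,-\,{\thr}^{2}}}
\end{align*}
The sensitivity vanishes for $\thr\,=\,0$, i.e. at the zero crossings where
the slope of $c_i(t)$ is steepest, and diverges as $|\thr|$ approaches $A$,
where the slope of $c_i(t)$ goes to zero.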

- Nonlinear operations can be used to detach representations from graded physical
stimulus (to facilitate categorical behavioral decision-making?):\\
1) Capture sufficiently precise amplitude information: $\env(t)$, $\adapt(t)$\\
$\rightarrow$ Closely following the AM of the acoustic stimulus\\
2) Quantify relevant stimulus properties on a graded scale: $c_i(t)$\\
$\rightarrow$ More decorrelated representation, compared to prior stages\\
3) Nonlinearity: Distinguish between ``relevant vs irrelevant'' values: $b_i(t)$\\
$\rightarrow$ Trading a graded scale for two or more categorical states\\
4) Represent stimulus properties under relevance constraint: $f_i(t)$\\
$\rightarrow$ Graded again but highly decorrelated from the acoustic stimulus\\
5) Categorical behavioral decision-making requires further nonlinearities\\
$\rightarrow$ Parameters of a behavioral response may be graded (e.g. approach speed),
initiation of one behavior over another is categorical (e.g. approach/stay)

\section{Discriminating species-specific song\\patterns in feature space}

\section{Conclusions \& outlook}

\end{document}