\documentclass[a4paper, 12pt]{article}
\usepackage[left=2.5cm,right=2.5cm,top=2cm,bottom=2cm,includeheadfoot]{geometry}
\usepackage[onehalfspacing]{setspace}
\usepackage{graphicx}
\usepackage{svg}
\usepackage{import}
\usepackage{float}
\usepackage{placeins}
\usepackage{parskip}
\usepackage{amsmath}
\usepackage{amssymb}
\usepackage[separate-uncertainty=true, locale=DE]{siunitx}
\sisetup{output-exponent-marker=\ensuremath{\mathrm{e}}}
% \usepackage[capitalize]{cleveref}
% \crefname{figure}{Fig.}{Figs.}
% \crefname{equation}{Eq.}{Eqs.}
% \creflabelformat{equation}{#2#1#3}
\usepackage[
    backend=biber,
    style=authoryear,
    pluralothers=true,
    maxcitenames=1,
    mincitenames=1
]{biblatex}
\addbibresource{cite.bib}

\title{Emergent intensity invariance in a physiologically inspired model of the grasshopper auditory system}
\author{Jona Hartling, Jan Benda}
\date{}

\begin{document}
\maketitle{}

% Text references and citations:
\newcommand{\bcite}[1]{\mbox{\cite{#1}}}
% \newcommand{\fref}[1]{\mbox{\cref{#1}}}
% \newcommand{\fref}[1]{\mbox{Fig.\,\ref{#1}}}
% \newcommand{\eref}[1]{\mbox{\cref{#1}}}
% \newcommand{\eref}[1]{\mbox{Eq.\,\ref{#1}}}

% Math shorthands - Standard symbols:
\newcommand{\dec}{\log_{10}} % Logarithm base 10
\newcommand{\infint}{\int_{-\infty}^{+\infty}} % Integral over the real line

% Math shorthands - Spectral filtering:
\newcommand{\bp}{h_{\text{BP}}(t)} % Bandpass filter function
\newcommand{\lp}{h_{\text{LP}}(t)} % Lowpass filter function
\newcommand{\hp}{h_{\text{HP}}(t)} % Highpass filter function
\newcommand{\fc}{f_{\text{cut}}} % Filter cutoff frequency
\newcommand{\tlp}{T_{\text{LP}}} % Lowpass filter averaging interval
\newcommand{\thp}{T_{\text{HP}}} % Highpass filter adaptation interval

% Math shorthands - Early representations:
\newcommand{\raw}{x} % Placeholder input signal
\newcommand{\filt}{\raw_{\text{filt}}} % Bandpass-filtered signal
\newcommand{\env}{\raw_{\text{env}}} % Signal envelope
\newcommand{\db}{\raw_{\text{dB}}} % Logarithmically scaled signal
\newcommand{\dbref}{\raw_{\text{ref}}} % Decibel reference intensity
\newcommand{\adapt}{\raw_{\text{adapt}}} % Adapted signal

% Math shorthands - Kernel parameters:
\newcommand{\ks}{\sigma_i} % Gabor kernel width
\newcommand{\kf}{f_i} % Gabor kernel frequency
\newcommand{\kp}{\phi_i} % Gabor kernel phase

% Math shorthands - Threshold nonlinearity:
\newcommand{\thr}{\Theta_i} % Step function threshold value
\newcommand{\nl}{H(c_i\,-\,\thr)} % Shifted Heaviside step function

% Math shorthands - Minor symbols and helpers:
\newcommand{\svar}{\sigma_{\text{s}}^{2}} % Song signal variance
\newcommand{\nvar}{\sigma_{\eta}^{2}} % Noise signal variance
\newcommand{\pc}{p(c_i,\,T)} % Probability density (general interval)
\newcommand{\pclp}{p(c_i,\,\tlp)} % Probability density (lowpass interval)

\section{Exploring a grasshopper's sensory world}

% Why functional models of sensory systems?
Our scientific understanding of sensory processing systems results from the distributed accumulation of anatomical, physiological and ethological evidence. There is undoubtedly no alternative to this process; however, it leaves us with the challenge of integrating the available fragments into a coherent whole in order to address issues such as the interaction between individual system components, the functional limitations of the system overall, or taxonomic comparisons between systems that process the same sensory modality. Any unified framework that captures the essential functional aspects of a given sensory system thus has the potential to deepen our current understanding and facilitate systematic investigations. However, building such a framework is a challenging task. It requires a wealth of existing knowledge of the system and the signals it operates on, a clearly defined scope, and careful reduction, abstraction, and formalization of the underlying structures and mechanisms.

% Why the grasshopper auditory system?
% Why focus on song recognition among other auditory functions?
One sensory system about which extensive information has been gathered over the years is the auditory system of grasshoppers~(\textit{Acrididae}). Grasshoppers rely on their sense of hearing primarily for intraspecific communication, which includes mate attraction~(\bcite{helversen1972gesang}) and evaluation~(\bcite{stange2012grasshopper}), sender localization~(\bcite{helversen1988interaural}), courtship display~(\bcite{elsner1968neuromuskularen}), rival deterrence~(\bcite{greenfield1993acoustic}), and loss-of-signal predator alarm~(SOURCE). In accordance with this rich behavioral repertoire, grasshoppers have evolved a variety of sound production mechanisms to generate acoustic communication signals for different contexts and ranges using their wings, hindlegs, or mandibles~(\bcite{otte1970comparative}). Among the most conspicuous acoustic signals of grasshoppers are their species-specific calling songs, which broadcast the presence of the singing individual --- mostly the males of the species --- to potential mates within range. These songs are usually more characteristic of a species than morphological traits~(\bcite{tishechkin2016acoustic}; \bcite{tarasova2021eurasius}), which can vary greatly within species~(\bcite{rowell1972variable}; \bcite{kohler2017morphological}). The reliance on songs to mediate reproduction represents a strong evolutionary driving force that resulted in a massive species diversification~(\bcite{vedenina2011speciation}; \bcite{sevastianov2023evolution}), with over 6800 recognized grasshopper species in the \textit{Acrididae} family~(\bcite{cigliano2024orthoptera}). It is this diversity of species, and the crucial role of acoustic communication in its emergence, that makes the grasshopper auditory system an intriguing candidate for attempting to construct a functional model framework.
As a necessary reduction, the model we propose here focuses on the pathway responsible for the recognition of species-specific calling songs, disregarding other essential auditory functions such as directional hearing~(\bcite{helversen1984parallel}; \bcite{ronacher1986routes}; \bcite{helversen1988interaural}).

% What are the signals the auditory system is supposed to recognize?
% Why is intensity invariance important for song recognition?
% (Obviously, split this paragraph)
To understand the functional challenges faced by the grasshopper auditory system, one has to understand the properties of the songs it is designed to recognize. Grasshopper songs are amplitude-modulated broad-band acoustic signals. Most songs are produced by stridulation, during which the animal pulls the serrated stridulatory file on its hindlegs across a resonating vein on the forewings~(\bcite{helversen1977stridulatory}; \bcite{stumpner1994song}; \bcite{helversen1997recognition}). Every tooth that strikes the vein generates a brief pulse of sound. Multiple pulses make up a syllable, and the alternation of syllables and relatively quiet pauses forms a characteristic, though noisy, waveform pattern. Song recognition depends on certain temporal and structural parameters of this pattern, such as the duration of syllables and pauses~(\bcite{helversen1972gesang}), the slope of pulse onsets~(\bcite{helversen1993absolute}), and the accentuation of syllable onsets relative to the preceding pause~(\bcite{balakrishnan2001song}; \bcite{helversen2004acoustic}). The amplitude modulation, or envelope, of the song is sufficient for recognition~(\bcite{helversen1997recognition}). However, the essential recognition cues can vary considerably with external physical factors, which requires the auditory system to be invariant to such variations in order to reliably recognize songs under different conditions.
For instance, the temporal structure of grasshopper songs warps with temperature~(\bcite{skovmand1983song}). The auditory system can compensate for this variability by reading out relative temporal relationships rather than absolute time intervals~(\bcite{creutzig2009timescale}; \bcite{creutzig2010timescale}), as those remain relatively constant across different temperatures~(\bcite{helversen1972gesang}). Another, perhaps even more fundamental external source of song variability lies in the attenuation of sound intensity with increasing distance to the sender. Sound attenuation depends on both the frequency content of the signal and the vegetation of the habitat~(\bcite{michelsen1978sound}). For the receiving auditory system, this has two major implications. First, the amplitude dynamics of the song pattern are steadily degraded over distance, which limits the effective communication range of grasshoppers to~\mbox{1\,-\,2\,m} in their typical grassland habitats~(\bcite{lang2000acoustic}). Second, the overall intensity level of songs at the receiver's position varies depending on the location of the sender, which should ideally not affect the recognition of the song pattern. This necessitates that the auditory system achieves a certain degree of intensity invariance --- a time scale-selective sensitivity to faster amplitude dynamics and simultaneous insensitivity to slower, more sustained amplitude dynamics. Intensity invariance in different auditory systems is often associated with neuronal adaptation~(\bcite{benda2008spike}; \bcite{barbour2011intensity}; \bcite{ozeri2018fast}; more general:~\bcite{benda2021neural}).
In the grasshopper auditory system, a number of neuron types along the processing chain exhibit spike-frequency adaptation in response to sustained stimulus intensities~(\bcite{romer1976informationsverarbeitung}; \bcite{gollisch2002energy}; \bcite{hildebrandt2009origin}; \bcite{clemens2010intensity}) and thus likely contribute to the emergence of intensity-invariant song representations. This means that intensity invariance is not the result of a single processing step but rather a gradual process, in which different neuronal populations contribute to varying degrees~(\bcite{clemens2010intensity}) and by different mechanisms~(\bcite{hildebrandt2009origin}). Approximating this process within a functional model framework thus requires a considerable amount of simplification. In this work, we demonstrate that even a small number of basic physiologically inspired signal transformations --- specifically, pairs of nonlinear and linear operations --- is sufficient to achieve a meaningful degree of intensity invariance.

% How can song recognition be modelled functionally (feat. Jan Clemens & Co.)?
% How did we expand on the previous framework?
% (Still can't stand some of this paragraph's structure and wording...)
Invariance to non-informative song variations is crucial for reliable song recognition; however, it is not sufficient to this end. In order to recognize a conspecific song as such, the auditory system needs to extract sufficiently informative features of the song pattern and then integrate the gathered information into a final categorical percept. Previous authors have proposed a functional model framework that describes this process --- feature extraction, evidence accumulation, and categorical decision making --- in both crickets~(\bcite{clemens2013computational}; \bcite{hennig2014time}) and grasshoppers~(\bcite{clemens2013feature}; review on both:~\bcite{ronacher2015computational}).
Their framework provides a comprehensible and biologically plausible account of the computational mechanisms required for species-specific song recognition, which has served as the inspiration for the development of the model pathway we propose here. The existing framework relies on pulse trains as input signals, which were designed to capture the essential structural properties of natural song envelopes~(\bcite{clemens2013feature}). In the first step, a bank of parallel linear-nonlinear feature detectors is applied to the input signal. Each feature detector consists of a convolutional filter and a subsequent sigmoidal nonlinearity. The outputs of these feature detectors are temporally averaged to obtain a single feature value per detector, which is then assigned a specific weight. The linear combination of weighted feature values results in a single preference value that serves as a predictor for the behavioral response of the animal to the presented input signal. Our model pathway adopts the general structure of the existing framework but modifies it in several key aspects. The convolutional filters, which have previously been fitted to behavioral data for each individual species~(\bcite{clemens2013computational}), are replaced by a larger, generic set of unfitted Gabor basis functions in order to cover a wide range of possible song features across different species. Gabor functions approximate the general structure of the filters used in the existing framework as well as the filter functions found in various auditory neurons~(\bcite{rokem2006spike}; \bcite{clemens2011efficient}; \bcite{clemens2012nonlinear}). The fitted sigmoidal nonlinearities in the existing framework consistently exhibited very steep slopes and are therefore replaced by shifted Heaviside step-functions, which results in a binarization of the feature detector outputs.
Another, more substantial modification is that the feature detector outputs are temporally averaged in a way that does not condense them into single feature values but retains their time-varying structure. This is in line with the fact that songs are not discrete units but part of a continuous acoustic stream that the auditory system has to process in real time. Moreover, a time-varying feature representation only stabilizes after a certain delay following the onset of a song, which emphasizes the temporal dynamics of evidence accumulation towards a final categorical decision. The most notable difference between our model pathway and the existing framework, however, lies in the addition of a physiologically inspired preprocessing stage, whose starting point corresponds to the initial reception of airborne sound waves. This allows the model to operate on unmodified recordings of natural grasshopper songs instead of condensed pulse train approximations, which widens its scope towards more realistic, ecologically relevant scenarios. For instance, we were able to investigate the contribution of different processing stages to the emergence of intensity-invariant song representations based on actual field recordings of songs at different distances from the sender.

% Forgive me, it's friday.
In the following, we outline the structure of the proposed model of the grasshopper auditory pathway, from the initial reception of sound waves up to the generation of a high-dimensional, time-varying feature representation that is suitable for species-specific song recognition. We provide a side-by-side account of the known physiological processing steps and their functional approximation by basic mathematical operations. We then elaborate on two key mechanisms that drive the emergence of intensity-invariant song representations within the auditory pathway.
% SCRAPPED UNTIL FURTHER NOTICE: % Multi-species, multi-individual communally inhabited environments\\ % - Temporal overlap: Simultaneous singing across individuals/species common\\ % - Frequency overlap: Little speciation into frequency bands (likely unused)\\ % - "Biotic noise": Hetero-/conspecifics ("Another one's songs are my noise")\\ % - "Abiotic noise": Wind, water, vegetation, anthropogenic\\ % - Effects of habitat structure on sound propagation (landscape - soundscape)\\ % $\rightarrow$ Sensory constraints imposed by the (acoustic) environment % Cluster of auditory challenges (interlocking constraints $\rightarrow$ tight coupling):\\ % From continuous acoustic input, generate neuronal representations that...\\ % 1)...allow for the separation of relevant (song) events from ambient noise floor\\ % 2)...compensate for behaviorally non-informative song variability (invariances)\\ % 3)...carry sufficient information to characterize different song patterns, % recognize the ones produced by conspecifics, and make appropriate behavioral % decisions based on context (sender identity, song type, mate/rival quality) % How can the auditory system of grasshoppers meet these challenges?\\ % - What are the minimum functional processing steps required?\\ % - Which known neuronal mechanisms can implement these steps?\\ % - Which and how many stages along the auditory pathway contribute?\\ % $\rightarrow$ What are the limitations of the system as a whole? 
% How can a human observer conceive a grasshopper's auditory percepts?\\
% - How to investigate the workings of the auditory pathway as a whole?\\
% - How to systematically test effects and interactions of processing parameters?\\
% - How to integrate the available knowledge on anatomy, physiology, ethology?\\
% $\rightarrow$ Abstract, simplify, formalize $\rightarrow$ Functional model framework

\section{Developing a functional model of the\\grasshopper song recognition pathway}

In essence, the development of a functional model means building a formal framework of manageable complexity around a system's essential structural components and the functional roles they might fulfill.

Anatomically, the organization of the grasshopper song recognition pathway can be outlined as a hierarchical feed-forward network of three consecutive neuronal populations~(Fig.\,\ref{fig:pathway}a-c): Peripheral auditory receptor neurons, whose axons enter the ventral nerve cord at the level of the metathoracic ganglion; local interneurons that remain exclusively within the thoracic region of the ventral nerve cord; and ascending neurons projecting from the thoracic region towards the supraesophageal ganglion~(\bcite{rehbein1974structure}; \bcite{rehbein1976auditory}; \bcite{eichendorf1980projections}).
The input to the network originates from sound-induced vibrations of the tympanal membrane on each side of the thorax, which are transduced into electro-chemical signals by the receptor neurons. The output from the network converges somewhere in the supraesophageal ganglion, where the recognition of conspecific songs is presumed to take place~(\bcite{romer1985responses}; \bcite{ronacher1986routes}; \bcite{bauer1987separate}; \bcite{bhavsar2017brain}). Functionally, the ascending neuron population is characterized by a marked increase in response heterogeneity compared to the preceding receptor neurons and local interneurons, which exhibit relatively homogeneous response properties across their respective populations~(\bcite{clemens2011efficient}). Based on these considerations, the organisation of the model pathway~(Fig.\,\ref{fig:pathway}d) comprises two distinct overall stages:

1) ``Pre-split portion'' of the auditory pathway:\\
Tympanal membrane $\rightarrow$ Receptor neurons $\rightarrow$ Local interneurons

Similar response/filter properties within receptor/interneuron populations (\bcite{clemens2011efficient})\\
$\rightarrow$ One population-wide response trace per stage (no ``single-cell resolution'')

2) ``Post-split portion'' of the auditory pathway:\\
Ascending neurons (AN) $\rightarrow$ Central brain neurons

Diverse response/filter properties within AN population (\bcite{clemens2011efficient})\\
- Pathway splitting into several parallel branches\\
- Expansion into a decorrelated higher-dimensional sound representation\\
$\rightarrow$ Individual neuron-specific response traces from this stage onwards

\begin{figure}[!ht]
\centering
\def\svgwidth{\textwidth}
\import{figures/}{fig_auditory_pathway.pdf_tex}
\caption{\textbf{Schematic organisation of the song recognition pathway in grasshoppers compared to the structure of the model pathway.}
\textbf{a}:~Course of the pathway in the grasshopper, from the tympanal membrane via
receptor neurons (1st order), local interneurons (2nd order) of the metathoracic ganglion, and ascending neurons (3rd order) further towards the central brain.
\textbf{b}:~Connections between the three neuronal populations within the metathoracic ganglion.
\textbf{c}:~Network representation of neuronal connectivity.
\textbf{d}:~Flow diagram of the different signal representations (boxes) and transformations (arrows) along the model pathway. The pathway consists of a population-wide preprocessing stream followed by several parallel feature extraction streams.
}
\label{fig:pathway}
\end{figure}

\subsection{Population-driven signal preprocessing}

Grasshoppers receive airborne sound waves via a tympanal organ on each side of the thorax~(Fig.\,\ref{fig:pathway}a). The tympanal membrane acts as a mechanical resonance filter: Vibrations that fall within specific frequency bands are focused on different membrane areas, while others are attenuated~(\bcite{michelsen1971frequency}; \bcite{windmill2008time}; \bcite{malkin2014energy}). This processing step can be approximated by an initial bandpass filter
\begin{equation}
\filt(t)\,=\,\raw(t)\,*\,\bp, \qquad \fc\,=\,5\,\text{kHz},\,30\,\text{kHz}
\label{eq:bandpass}
\end{equation}
applied to the acoustic input signal $\raw(t)$. The auditory receptor neurons connect directly to the tympanal membrane~(Fig.\,\ref{fig:pathway}a). Besides performing the mechano-electrical transduction, the receptor population carries out several known processing steps. First, the receptors extract the signal envelope~(\bcite{machens2001discrimination}), which likely involves a rectifying nonlinearity~(\bcite{machens2001representation}). This can be modelled as full-wave rectification followed by lowpass filtering
\begin{equation}
\env(t)\,=\,|\filt(t)|\,*\,\lp, \qquad \fc\,=\,500\,\text{Hz}
\label{eq:env}
\end{equation}
of the tympanal signal $\filt(t)$.
Furthermore, the receptors exhibit a sigmoidal response curve over logarithmically compressed intensity levels~(\bcite{suga1960peripheral}; \bcite{gollisch2002energy}). In the model, logarithmic compression is achieved by conversion to decibel scale \begin{equation} \db(t)\,=\,10\,\cdot\,\dec \frac{\env(t)}{\dbref}, \qquad \dbref\,=\,\max[\env(t)] \label{eq:log} \end{equation} relative to the maximum intensity $\dbref$ of the signal envelope $\env(t)$. Next, the axons of the receptor neurons project into the metathoracic ganglion, where they synapse onto local interneurons~(Fig.\,\ref{fig:pathway}b). Both the local interneurons~(\bcite{hildebrandt2009origin}; \bcite{clemens2010intensity}) and, to a lesser extent, the receptors themselves~(\bcite{fisch2012channel}) display spike-frequency adaptation in response to sustained stimulus intensity levels. This mechanism allows for the robust encoding of faster amplitude modulations against a slowly changing overall baseline intensity. Functionally, this processing step resembles a highpass filter \begin{equation} \adapt(t)\,=\,\db(t)\,*\,\hp, \qquad \fc\,=\,10\,\text{Hz} \label{eq:highpass} \end{equation} over the logarithmically scaled envelope $\db(t)$. The projections of the local interneurons remain within the metathoracic ganglion and synapse onto a small number of ascending neurons~(Fig.\,\ref{fig:pathway}b), which marks the transition between the preprocessing stream and the parallel processing stream of the model pathway. 
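
As a brief worked illustration of how Eqs.\,\ref{eq:log} and \ref{eq:highpass} interact, consider a sketch under two simplifying assumptions that are not part of the model definition above: the envelope $\env(t)$ is scaled by a constant factor $\alpha>0$ in the absence of noise, and the reference $\dbref$ is held fixed instead of being set to the envelope maximum. The scale factor then turns into a constant offset on the decibel scale
\begin{equation*}
10\,\cdot\,\dec \frac{\alpha\,\cdot\,\env(t)}{\dbref}\,=\,10\,\cdot\,\dec \alpha\,+\,10\,\cdot\,\dec \frac{\env(t)}{\dbref},
\end{equation*}
so that $\alpha=0.1$ shifts the entire trace by $-10\,\text{dB}$ and $\alpha=0.01$ by $-20\,\text{dB}$ without changing its shape. An idealized highpass filter that removes constant offsets would therefore return the same $\adapt(t)$ for all three scales ($\alpha=1$, $0.1$, and $0.01$).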
\subsection{Feature extraction by individual neurons}

The small population of ascending neurons

\textbf{Stage-specific processing steps and functional approximations:}

Template matching by individual ANs\\
- Filter base (STA approximations): Set of Gabor kernels\\
- Gabor parameters: $\ks, \kp, \kf$ $\rightarrow$ Determine kernel sign and lobe number
%
\begin{equation}
k_i(t,\,\ks,\,\kf,\,\kp)\,=\,e^{-\frac{t^{2}}{2{\ks}^{2}}}\,\cdot\,\sin(2\pi\kf\,\cdot\,t\,+\,\kp)
\label{eq:gabor}
\end{equation}
%
$\rightarrow$ Separate convolution with each member of the kernel set
%
\begin{equation}
c_i(t)\,=\,\adapt(t)\,*\,k_i(t) = \infint \adapt(\tau)\,\cdot\,k_i(t\,-\,\tau)\,d\tau
\label{eq:conv}
\end{equation}
%
Thresholding nonlinearity in ascending neurons (or further downstream)\\
- Binarization of AN response traces into ``relevant'' vs. ``irrelevant''\\
$\rightarrow$ Shifted Heaviside step-function $\nl$ (or steep sigmoid threshold?)
%
\begin{equation}
b_i(t,\,\thr)\,=\,\begin{cases}
\;1, \quad c_i(t)\,>\,\thr\\
\;0, \quad c_i(t)\,\leq\,\thr
\end{cases}
\label{eq:binary}
\end{equation}
%
Temporal averaging by neurons of the central brain\\
- Finalized set of slowly changing kernel-specific features (one per AN)\\
- Different species-specific song patterns are characterized by a distinct combination of feature values $\rightarrow$ Clusters in high-dimensional feature space\\
$\rightarrow$ Lowpass filter 1 Hz
%
\begin{equation}
f_i(t)\,=\,b_i(t)\,*\,\lp, \qquad \fc\,=\,1\,\text{Hz}
\label{eq:lowpass}
\end{equation}
%
\section{Two mechanisms driving the emergence of intensity-invariant song representation}

\textbf{Definition of invariance (general, systemic):}\\
Invariance = Property of a system to maintain a stable output with respect to a set of relevant input parameters (variation to be represented) but irrespective of one or more other parameters (variation to be discarded)

$\rightarrow$ Selective input-output decorrelation

\textbf{Definition of intensity invariance
(context of neurons and songs):}\\
Intensity invariance = Time scale-selective sensitivity to certain faster amplitude dynamics (song waveform, small-scale AM) and simultaneous insensitivity to slower, more sustained amplitude dynamics (transient baseline, large-scale AM, current overall intensity level)\\
$\rightarrow$ Without time scale selectivity, any fully intensity-invariant output will be a flat line

\subsection{Logarithmic scaling \& spike-frequency adaptation}

Envelope $\env(t)$ $\xrightarrow{\text{dB}}$ Logarithmic $\db(t)$ $\xrightarrow{\hp}$ Adapted $\adapt(t)$

- Rewrite signal envelope $\env(t)$ (Eq.\,\ref{eq:env}) as a synthetic mixture:\\
1) Song signal $s(t)$ ($\svar=1$) with variable multiplicative scale $\alpha\geq0$\\
2) Fixed-scale additive noise $\eta(t)$ ($\nvar=1$)
%
\begin{equation}
\env(t)\,=\,\alpha\,\cdot\,s(t)\,+\,\eta(t),\qquad \env(t)\,>\,0\enspace\forall\enspace t\,\in\,\mathbb{R}
\label{eq:toy_env}
\end{equation}
%
- Signal-to-noise ratio (SNR): Ratio of variances of synthetic mixture $\env(t)$ with ($\alpha>0$) and without ($\alpha=0$) song signal $s(t)$, assuming $s(t)\perp\eta(t)$
%
\begin{equation}
\text{SNR}\,=\,\frac{\sigma_{s+\eta}^{2}}{\nvar}\,=\,\frac{\alpha^{2}\,\cdot\,\svar\,+\,\nvar}{\nvar}\,=\,\alpha^{2}\,+\,1
\label{eq:toy_snr}
\end{equation}
%
\textbf{Logarithmic component:}\\
- Simplify decibel transformation (Eq.\,\ref{eq:log}) and apply to synthetic $\env(t)$\\
- Isolate scale $\alpha$ and reference $\dbref$ using logarithm product/quotient laws (requires $\alpha>0$)
%
\begin{equation}
\begin{split}
\db(t)\,&=\,\log \frac{\alpha\,\cdot\,s(t)\,+\,\eta(t)}{\dbref}\\
&=\,\log \frac{\alpha}{\dbref}\,+\,\log \Big[s(t)\,+\,\frac{\eta(t)}{\alpha}\Big]
\end{split}
\label{eq:toy_log}
\end{equation}
%
$\rightarrow$ In log-space, a multiplicative scaling factor becomes additive\\
$\rightarrow$ Allows for the separation of song signal $s(t)$ and its scale $\alpha$\\
$\rightarrow$ Introduces scaling of noise term $\eta(t)$ by the inverse of
$\alpha$\\
$\rightarrow$ Normalization by $\dbref$ applies equally to all terms (no individual effects)

\textbf{Adaptation component:}\\
- Highpass filter over $\db(t)$ (Eq.\,\ref{eq:highpass}) can be approximated as subtraction of the local signal offset within a suitable time interval $\thp$ ($0 \ll \thp < \frac{1}{\fc}$)
%
\begin{equation}
\begin{split}
\adapt(t)\,\approx\,\db(t)\,-\,\log \frac{\alpha}{\dbref}\,=\,\log \Big[s(t)\,+\,\frac{\eta(t)}{\alpha}\Big]
\end{split}
\label{eq:toy_highpass}
\end{equation}
%
\textbf{Implication for intensity invariance:}\\
- Logarithmic scaling is essential for equalizing different song intensities\\
$\rightarrow$ Intensity information can be manipulated more easily in the form of a signal offset in log-space than as a multiplicative scale in linear space

- Scale $\alpha$ can only be redistributed, not entirely eliminated from $\adapt(t)$\\
$\rightarrow$ Turns initial scaling of song $s(t)$ by $\alpha$ into scaling of noise $\eta(t)$ by $\frac{1}{\alpha}$

- Capability to compensate for intensity variations, i.e.
selective amplification of output $\adapt(t)$ relative to input $\env(t)$, is limited by input SNR (Eq.\,\ref{eq:toy_snr}):\\
$\alpha\gg1$: Attenuation of $\eta(t)$ term $\rightarrow$ $s(t)$ dominates $\adapt(t)$\\
$\alpha\approx1$: Negligible effect on $\eta(t)$ term $\rightarrow$ $\adapt(t)=\log[s(t)+\eta(t)]$\\
$\alpha\ll1$: Amplification of $\eta(t)$ term $\rightarrow$ $\eta(t)$ dominates $\adapt(t)$\\
$\rightarrow$ Ability to equalize between different sufficiently large scales of $s(t)$\\
$\rightarrow$ Inability to recover $s(t)$ when initially masked by noise floor $\eta(t)$

- Logarithmic scaling emphasizes small amplitudes (song onsets, noise floor)\\
$\rightarrow$ Recurring trade-off: Equalizing signal intensity vs preserving initial SNR

\subsection{Threshold nonlinearity \& temporal averaging}

Convolved $c_i(t)$ $\xrightarrow{\nl}$ Binary $b_i(t)$ $\xrightarrow{\lp}$ Feature $f_i(t)$

\textbf{Thresholding component:}\\
- Within an observed time interval $T$, $c_i(t)$ follows probability density $\pc$\\
- Within $T$, $c_i(t)$ exceeds threshold value $\thr$ for time $T_1$ ($T_1+T_0=T$)\\
- Threshold $\nl$ splits $\pc$ around $\thr$ in two complementary parts
%
\begin{equation}
\int_{\thr}^{+\infty} \pc\,dc_i\,=\,1\,-\,\int_{-\infty}^{\thr} \pc\,dc_i\,=\,\frac{T_1}{T}
\label{eq:pdf_split}
\end{equation}
%
$\rightarrow$ Improper integral over right-sided portion of split $\pc$ gives ratio of time $T_1$ where $c_i(t)>\thr$ to total time $T$ due to normalization of $\pc$
%
\begin{equation}
\infint \pc\,dc_i\,=\,1
\label{eq:pdf}
\end{equation}
%
\textbf{Averaging component:}\\
- Lowpass filter over binary response $b_i(t)$ (Eq.\,\ref{eq:lowpass}) can be approximated as temporal averaging over a suitable time interval $\tlp$ ($\tlp > \frac{1}{\fc}$)\\
- Within $\tlp$, $b_i(t)$ takes a value of 1 ($c_i(t)>\thr$) for time $T_1$ ($T_1+T_0=\tlp$)
%
\begin{equation}
f_i(t)\,\approx\,\frac{1}{\tlp} \int_{t}^{t\,+\,\tlp}
b_i(\tau)\,d\tau\,=\,\frac{T_1}{\tlp}
\label{eq:feat_avg}
\end{equation}
%
$\rightarrow$ Temporal averaging over $b_i(t)\in\{0,1\}$ (Eq.\,\ref{eq:binary}) gives ratio of time $T_1$ where $c_i(t)>\thr$ to total averaging interval $\tlp$\\
$\rightarrow$ Feature $f_i(t)$ approximately represents supra-threshold fraction of $\tlp$

\textbf{Combined result:}\\
- Feature $f_i(t)$ can be linked to the distribution of $c_i(t)$ using Eqs.\,\ref{eq:pdf_split} \& \ref{eq:feat_avg}
%
\begin{equation}
f_i(t)\,\approx\,\int_{\thr}^{+\infty} \pclp\,dc_i\,=\,P(c_i\,>\,\thr,\,\tlp)
\label{eq:feat_prop}
\end{equation}
%
$\rightarrow$ Because the integral over a probability density is a cumulative probability, the value of feature $f_i(t)$ (temporal compression of $b_i(t)$) at every time point $t$ signifies the probability that convolution output $c_i(t)$ exceeds the threshold value $\thr$ during the corresponding averaging interval $\tlp$

\textbf{Implication for intensity invariance:}\\
- Convolution output $c_i(t)$ quantifies temporal similarity between amplitudes of template waveform $k_i(t)$ and signal $\adapt(t)$ centered at time point $t$\\
$\rightarrow$ Based on amplitudes on a graded scale

- Feature $f_i(t)$ quantifies the probability that amplitudes of $c_i(t)$ exceed threshold value $\thr$ within interval $\tlp$ around time point $t$\\
$\rightarrow$ Based on binned amplitudes corresponding to one of two categorical states

$\rightarrow$ Deliberate loss of precise amplitude information\\
$\rightarrow$ Emphasis on temporal structure (ratio of $T_1$ over $\tlp$)

- Thresholding of $c_i(t)$ and subsequent temporal averaging of $b_i(t)$ to obtain $f_i(t)$ constitutes a remapping of an amplitude-encoding quantity into a duty cycle-encoding quantity, mediated by threshold function $\nl$

- Different scales of $c_i(t)$ can result in similar $T_1$ segments depending on the magnitude of the derivative of $c_i(t)$ in temporal proximity to time points at which $c_i(t)$ crosses threshold
value $\thr$\\
$\rightarrow$ The steeper the slope of $c_i(t)$, the less $T_1$ changes with scale variations\\
$\rightarrow$ If $T_1$ is invariant to scale variation in $c_i(t)$, then so is $f_i(t)$

- Suggests a relatively simple rule for optimal choice of threshold value $\thr$:\\
$\rightarrow$ Find amplitude $c_i$ that maximizes absolute derivative of $c_i(t)$ over time\\
$\rightarrow$ Optimal with respect to intensity invariance of $f_i(t)$, not necessarily for other criteria such as song-noise separation or diversity between features

- Nonlinear operations can be used to detach representations from graded physical stimulus (to facilitate categorical behavioral decision-making?):\\
1) Capture sufficiently precise amplitude information: $\env(t)$, $\adapt(t)$\\
$\rightarrow$ Closely following the AM of the acoustic stimulus\\
2) Quantify relevant stimulus properties on a graded scale: $c_i(t)$\\
$\rightarrow$ More decorrelated representation, compared to prior stages\\
3) Nonlinearity: Distinguish between ``relevant vs irrelevant'' values: $b_i(t)$\\
$\rightarrow$ Trading a graded scale for two or more categorical states\\
4) Represent stimulus properties under relevance constraint: $f_i(t)$\\
$\rightarrow$ Graded again but highly decorrelated from the acoustic stimulus\\
5) Categorical behavioral decision-making requires further nonlinearities\\
$\rightarrow$ Parameters of a behavioral response may be graded (e.g. approach speed), initiation of one behavior over another is categorical (e.g. approach/stay)

\section{Discriminating species-specific song\\patterns in feature space}

\section{Conclusions \& outlook}

\end{document}