diff --git a/main.pdf b/main.pdf index 958ec77..e54edb2 100644 Binary files a/main.pdf and b/main.pdf differ diff --git a/main.tex b/main.tex index 3f04e56..28c51aa 100644 --- a/main.tex +++ b/main.tex @@ -258,14 +258,15 @@ substrate for conspecific song recognition and response initiation~(\bcite{ronacher1986routes}; \bcite{bauer1987separate}; \bcite{bhavsar2017brain}). -Functionally, the ascending neurons are the most diverse of the three neuronal -populations. Around 15 to 20 ascending neurons have been identified in the -grasshopper auditory system~(\bcite{stumpner1991auditory}). Individual -ascending neurons possess highly specific response properties that contrast -with the rather homogeneous response properties of the preceding receptor -neurons and local interneurons~(\bcite{clemens2011efficient}), which indicates -a transition from a uniform population-wide processing stream into several -parallel branches. Accordingly, the model pathway is divided into two distinct +Around 15 to 20 ascending neurons have been identified in the grasshopper +auditory system~(\bcite{stumpner1991auditory}), whose functional +characteristics are conserved even between species that are not closely +related~(\bcite{neuhofer2008evolutionarily}). The population of ascending +neurons possesses a diverse range of response properties that contrasts with +the rather homogeneous responses of receptor neurons and local +interneurons~(\bcite{clemens2011efficient}), which suggests a transition from a +uniform population-wide processing stream into several parallel branches. +Accordingly, the model pathway is divided into two distinct stages~(Fig.\,\ref{fig:pathway}d): The preprocessing stage incorporates the processing steps at the levels of the tympanal membrane, the receptor neurons, and the local interneurons; and operates on one-dimensional signal @@ -275,6 +276,26 @@ downstream towards the SEG; and operates on high-dimensional signal representations~(Fig.\,\ref{fig:stages_feat}). 
The details of each physiological processing step and its functional approximation are described in the following sections. + +Around 15 to 20 ascending neurons have been identified in the grasshopper +auditory system~(\bcite{stumpner1991auditory}), whose functional +characteristics are conserved even between species that are not closely +related~(\bcite{neuhofer2008evolutionarily}). The population of ascending +neurons possesses a diverse range of response properties that contrasts with +the rather homogeneous responses of receptor neurons and local +interneurons~(\bcite{clemens2011efficient}), which suggests a transition from a +uniform population-wide processing stream into several parallel branches. +Accordingly, the model pathway is divided into two distinct +stages~(Fig.\,\ref{fig:pathway}d): The preprocessing stage incorporates the +processing steps at the levels of the tympanal membrane, the receptor neurons, +and the local interneurons; and operates on one-dimensional signal +representations~(Fig.\,\ref{fig:stages_pre}). The feature extraction stage +corresponds to the processing within the ascending neurons and further +downstream towards the SEG; and operates on high-dimensional signal +representations~(Fig.\,\ref{fig:stages_feat}). The details of each +physiological processing step and its functional approximation are described in +the following sections. + \begin{figure}[!ht] \centering \includegraphics[width=\textwidth]{figures/fig_auditory_pathway.pdf} @@ -1549,6 +1570,7 @@ have not been subject to decades of study will likely not be suitable for this approach yet. \subsection{Feature representation, temporal averaging, and song design} +\label{sec:constant_feat} The feature set is the final song representation along the model pathway and constitutes the basis for song recognition. Each feature $f_i(t)$ results from @@ -1703,12 +1725,15 @@ lowpass filter --- the lower $\fc$, the higher the SNR of $\env(t)$ at a given $\sca$. 
However, $\fc$ must not be too low to avoid the attenuation of relevant amplitude dynamics of the song pattern. The saturation level of $\adapt$, unlike its saturation point, is independent of the SNR of $\env(t)$ because the -influence of $\noc(t)$ is negligible for sufficiently large $\sca$. Both the -saturation level and the saturation point of $\adapt(t)$ vary between different -species and specific songs. These differences are likely rooted in the way in -which logarithmic compression acts on the specific distribution of $\env(t)$, -which is determined by $\fc$ and the structure and frequency spectrum of the -rectified $\filt(t)$. +influence of $\noc(t)$ is negligible for sufficiently large $\sca$. The output +SNR of $\adapt(t)$ saturates at a comparatively low value of around 10. This might +in part be a consequence of the logarithm, which compresses different higher +intensities but also amplifies lower intensities, including the noise floor. +Both the saturation level and the saturation point of $\adapt(t)$ vary between +different species and individual songs. These differences are likely rooted in +the way in which logarithmic compression acts on the specific distribution of +$\env(t)$, which is determined by $\fc$ as well as the temporal structure and +frequency spectrum of the rectified $\filt(t)$. Thresholding and temporal averaging renders feature $f_i(t)$ intensity-invariant for sufficiently large $\sca$. The trade-off between @@ -1720,71 +1745,103 @@ invariance by the previous mechanism is neglected. The SNR of $f_i(t)$ is therefore determined solely by the pure-noise response of $f_i(t)$. The distribution $\pci$ of the pure-noise kernel response $c_i(t)$ is largely a normal distribution with mean $\mu\approx0$ for all kernels $k_i(t)$. The value -of the pure-noise $f_i(t)$ is hence 0.5 for $\thr=0$ and decreases for larger +of the pure-noise $f_i(t)$ is hence 0.5 for $\thr=0$ and decreases for higher $\thr$.
If $\thr$ is set above the maximum of $c_i(t)$, the pure-noise feature -value is 0, which results in an "unlimited" SNR of $f_i(t)$ at the cost of a -higher saturation point. In this case, any non-zero feature value that is -sustained for a sufficient duration could serve as indicator for the presence -of $\soc(t)$ in addition to $\noc(t)$. This requires a fine evolutionary tuning -of $\thr$ to the properties of both the species-specific song and the natural +value is 0, which results in an ``unlimited'' SNR of $f_i(t)$. In this case, any +non-zero feature value that is sustained for a sufficient duration could serve +as an indicator for the presence of $\soc(t)$, although at the cost of a higher +saturation point. The maximum of the pure-noise $c_i(t)$ is assumed to be very +small due to the various SNR improvements along the pathway, so that the +required increase in $\thr$ and hence the saturation point of $f_i(t)$ is not +expected to be substantial. However, exploiting the capacity of $f_i(t)$ for +arbitrarily high SNR would certainly require a fine evolutionary tuning of +$\thr$ to the properties of both the species-specific song and the natural noise in a certain habitat. - -It seems reasonable to assume that $\thr$ is one of the parameters along the -pathway - -Physiologically, it is presumably easier to -manipulate $\thr$ - - -It seems reasonable that $\thr$ is easier to -manipulate in ev - - -Furthermore, $\thr$ is presumably a parameter along -the pathway that - - -$\thr$ - - -Furthermore, $\thr$ might be one of the parameters -along the pathway - - - -% However, the parameters that determine the SNR of $\adapt(t)$ are much less -% understood and likely relate to properties of the signal, whereas the SNR of -% $f(t)$ depends on the choice of $\Theta$ and can be more directly manipulated -% by the system.
- \newpage -\textbf{Thresh-LP: Implication for intensity invariance:}\\ - -- Suggests a relatively simple rule for optimal choice of threshold value $\thr$:\\ -$\rightarrow$ Find amplitude $c_i$ that maximizes absolute derivative of $c_i(t)$ over time\\ -$\rightarrow$ Optimal with respect to intensity invariance of $f_i(t)$, not necessarily for -other criteria such as song-noise separation or diversity between features - -- Nonlinear operations can be used to detach representations from graded physical -stimulus (to fasciliate categorical behavioral decision-making?):\\ -1) Capture sufficiently precise amplitude information: $\env(t)$, $\adapt(t)$\\ -$\rightarrow$ Closely following the AM of the acoustic stimulus\\ -2) Quantify relevant stimulus properties on a graded scale: $c_i(t)$\\ -$\rightarrow$ More decorrelated representation, compared to prior stages\\ -3) Nonlinearity: Distinguish between "relevant vs irrelevant" values: $b_i(t)$\\ -$\rightarrow$ Trading a graded scale for two or more categorical states\\ -4) Represent stimulus properties under relevance constraint: $f_i(t)$\\ -$\rightarrow$ Graded again but highly decorrelated from the acoustic stimulus\\ -5) Categorical behavioral decision-making requires further nonlinearities\\ -$\rightarrow$ Parameters of a behavioral response may be graded (e.g. approach speed), -initiation of one behavior over another is categorical (e.g. approach/stay) - \subsection{Intensity invariance versus intensity invariance} +Two consecutive mechanisms of intensity invariance do not necessarily add up to +a stronger overall intensity invariance. If the first mechanism results in a +lower saturation point than the second mechanism by itself, the saturation +point of feature $f_i(t)$ will be determined solely by the first mechanism. In +this case, the saturation level of $f_i(t)$ will conform to the intensity that +$f_i(t)$ can reach for the given saturation point rather than the intrinsic +saturation level of $f_i(t)$. 
Conversely, if the second mechanism results in a +lower saturation point than the first mechanism, both the saturation point and +the saturation level of $f_i(t)$ will be determined by the second mechanism. +The saturation points of $f_i(t)$ across the set are distributed over a much +wider range than those of the preceding kernel responses $c_i(t)$, which +suggests that the interaction between the two mechanisms is specific to +individual kernels $k_i(t)$. A number of $f_i(t)$ achieve a lower saturation +point than the respective $c_i(t)$, while some $f_i(t)$ exhibit similar or only +marginally lower saturation points. This raises the question of whether two +consecutive mechanisms of intensity invariance are actually beneficial for the +overall system. + +From a purely functional perspective, the answer could be that logarithmic +compression and adaptation is a necessary preprocessing step towards a robust +feature representation, even if thresholding and temporal averaging alone would +be sufficient to render $f_i(t)$ intensity-invariant. This preprocessing likely +improves the temporal regularity of the song pattern in $\adapt(t)$ and +$c_i(t)$, which is required for constant $f_i(t)$ across the duration of a +song~(Section\,\ref{sec:constant_feat}). It also ensures consistency between +the distribution $\pci$ of $c_i(t)$ across songs of different intensity, which +is essential for the generation of consistent species-specific $f_i(t)$ under a +static $\thr$. From a physiological perspective, the answer is likely that +neurons possess only a limited firing rate for encoding stimulus intensities +that can range over several orders of magnitude. Sigmoidal tuning curves over +logarithmically compressed stimulus intensities are a common property of +sensory neurons across various modalities~(SOURCE?), and neurons of the +grasshopper auditory system are no exception~(\bcite{suga1960peripheral}; +\bcite{gollisch2002energy}).
+ \subsection{Implications for behavior in a natural acoustic environment} -% RIPPED FROM INTRODUCTION: +Most grasshoppers live in environments that are communally inhabited by +numerous individuals from multiple species. Their acoustic environment is +characterized by noise from various sources --- abiotic ones like wind and +water, but also the songs of both hetero- and conspecifics. This limits the SNR +that each individual can achieve for its own song, and hence the effectiveness +of the intensity-invariant processing in the auditory system. Producing higher +song intensities is not a viable solution to this problem, because these also +contribute to the overall noise floor. A possible behavioral solution could be +to produce songs in a ``turn-taking'' manner to avoid the temporal superposition +of multiple songs into overly intense signals. This would also prevent the +mutual distortion of the respective song patterns. Another solution could be to +spatially separate from other nearby grasshoppers to spread the potential noise +sources over a larger area. However, according to our analysis based on field +recordings as well as previous work on the topic~(\bcite{lang2000acoustic}), +reliable song recognition is limited to little more than 1\,m from the sender, +so that a grasshopper also cannot afford to stay too far away from its +conspecifics. A better solution may hence be to collectively produce songs at +lower-than-possible intensities, which would reduce the overall noise floor for +all nearby individuals. Importantly, the limitation of intensity invariance by +SNR likely applies to all grasshoppers regardless of species, so that the +behavioral strategies could be shared among the species that coexist in a given +habitat. + +% Because the presumed restriction of song recognition +% by means of the noise floor applies to all grasshoppers in a certain area, +% these strategies may not be specific to some of the species at this location.
+% Instead, they must be shared by all grasshopper species that coexist within a +% portion of a given habitat, which would provide an important implication for +% the evolution of grasshopper songs in communities of multiple species. + +%%% RELICS OF INTRODUCTION %%% +% - Nonlinear operations can be used to detach representations from graded physical +% stimulus (to facilitate categorical behavioral decision-making?):\\ +% 1) Capture sufficiently precise amplitude information: $\env(t)$, $\adapt(t)$\\ +% $\rightarrow$ Closely following the AM of the acoustic stimulus\\ +% 2) Quantify relevant stimulus properties on a graded scale: $c_i(t)$\\ +% $\rightarrow$ More decorrelated representation, compared to prior stages\\ +% 3) Nonlinearity: Distinguish between "relevant vs irrelevant" values: $b_i(t)$\\ +% $\rightarrow$ Trading a graded scale for two or more categorical states\\ +% 4) Represent stimulus properties under relevance constraint: $f_i(t)$\\ +% $\rightarrow$ Graded again but highly decorrelated from the acoustic stimulus\\ +% 5) Categorical behavioral decision-making requires further nonlinearities\\ +% $\rightarrow$ Parameters of a behavioral response may be graded (e.g. approach speed), +% initiation of one behavior over another is categorical (e.g. approach/stay) % Multi-species, multi-individual communally inhabited environments\\ % - Temporal overlap: Simultaneous singing across individuals/species common\\ @@ -1802,53 +1859,12 @@ initiation of one behavior over another is categorical (e.g.
approach/stay) % recognize the ones produced by conspecifics, and make appropriate behavioral % decisions based on context (sender identity, song type, mate/rival quality) -% How can the auditory system of grasshoppers meet these challenges?\\ -% - What are the minimum functional processing steps required?\\ -% - Which known neuronal mechanisms can implement these steps?\\ -% - Which and how many stages along the auditory pathway contribute?\\ -% $\rightarrow$ What are the limitations of the system as a whole? - % How can a human observer conceive a grasshopper's auditory percepts?\\ % - How to investigate the workings of the auditory pathway as a whole?\\ % - How to systematically test effects and interactions of processing parameters?\\ % - How to integrate the available knowledge on anatomy, physiology, ethology?\\ % $\rightarrow$ Abstract, simplify, formalize $\rightarrow$ Functional model framework -\textbf{Differences between the model pathway and the previous framework:} -In the first step, a bank of parallel linear-nonlinear feature detectors is -applied to the input signal. Each feature detector consists of a convolutional -filter and a subsequent sigmoidal nonlinearity. The outputs of these feature -detectors are temporally averaged to obtain a single feature value per -detector, which is then assigned a specific weight. The linear combination of -weighted feature values results in a single preference value, that serves as -predictor for the behavioral response of the animal to the presented input -signal. Our model pathway adopts the general structure of the existing -framework but modifies it in several key aspects. The convolutional filters, -which have previously been fitted to behavioral data for each individual -species~(\bcite{clemens2013computational}), are replaced by a larger, generic -set of unfitted Gabor basis functions in order to cover a wide range of -possible song features across different species. 
Gabor functions approximate -the general structure of the filters used in the existing framework as well as -the filter functions found in various auditory neurons~(\bcite{rokem2006spike}; -\bcite{clemens2011efficient}; \bcite{clemens2012nonlinear}). The fitted -sigmoidal nonlinearities in the existing framework consistently exhibited very -steep slopes and are therefore replaced by shifted Heaviside step-functions, -which results in a binarization of the feature detector outputs. Another, more -substantial modification is that the feature detector outputs are temporally -averaged in a way that does not condense them into single feature values but -retains their time-varying structure. This is in line with the fact that songs -are no discrete units but part of a continuous acoustic stream that the -auditory system has to process in real time. Moreover, a time-varying feature -representation only stabilizes after a certain delay following the onset of a -song, which emphasizes the temporal dynamics of evidence accumulation towards a -final categorical decision. The most notable difference between our model -pathway and the existing framework, however, lays in the addition of a -physiologically inspired preprocessing stage, whose starting point corresponds -to the initial reception of airborne sound waves. This allows the model to -operate on unmodified recordings of natural grasshopper songs instead of -condensed pulse train approximations, which widens its scope towards more -realistic, ecologically relevant scenarios. - \newpage \section{Appendix}