diff --git a/main.pdf b/main.pdf index 3a0ca0d..958ec77 100644 Binary files a/main.pdf and b/main.pdf differ diff --git a/main.tex b/main.tex index d65a287..3f04e56 100644 --- a/main.tex +++ b/main.tex @@ -104,6 +104,7 @@ \newcommand{\nsig}{\sigma_{\eta}} % Noise component standard deviation \newcommand{\pc}{p(c,\,T)} % Probability density (general interval) \newcommand{\pclp}{p(c,\,\tlp)} % Probability density (lowpass interval) +\newcommand{\pci}{p(c_i,\,\tlp)} % Kernel-specific probability density (lowpass interval) \newcommand{\muf}{\mu_{f_i}} % Average feature value \section{Introduction} @@ -258,12 +259,13 @@ initiation~(\bcite{ronacher1986routes}; \bcite{bauer1987separate}; \bcite{bhavsar2017brain}). Functionally, the ascending neurons are the most diverse of the three neuronal -populations. Individual ascending neurons possess highly specific response -properties that contrast with the rather homogeneous response properties of the -preceding receptor neurons and local -interneurons~(\bcite{clemens2011efficient}), which indicates a transition from -a uniform population-wide processing stream into several parallel branches. -Accordingly, the model pathway is divided into two distinct +populations. Around 15 to 20 ascending neurons have been identified in the +grasshopper auditory system~(\bcite{stumpner1991auditory}). Individual +ascending neurons possess highly specific response properties that contrast +with the rather homogeneous response properties of the preceding receptor +neurons and local interneurons~(\bcite{clemens2011efficient}), which indicates +a transition from a uniform population-wide processing stream into several +parallel branches. 
Accordingly, the model pathway is divided into two distinct stages~(Fig.\,\ref{fig:pathway}d): The preprocessing stage incorporates the processing steps at the levels of the tympanal membrane, the receptor neurons, and the local interneurons; and operates on one-dimensional signal @@ -754,16 +756,15 @@ This effect is more pronounced for lower $\fc$ of the lowpass filter and is presumably caused by the attenuation of high-frequency components in the signal, which are more prominent in the noise component $\noc(t)$ than in the song component $\soc(t)$. The effect also appears relatively consistent across -different species, although small variations exist~(Fig.\,\ref{fig:rect-lp}e) -that are presumably based on different song structures and frequency spectra. -In summary, the standard deviation of $\env(t)$ has never been observed to -transition into a saturation regime for larger $\sca$ but rather continues to -increase proportionally to $\sca$ for all tested $\fc$, in both the noiseless -and the noisy case and across different species. Consequently, the combination -of rectification and lowpass filtering does not contribute to intensity -invariance. However, this transformation pair does improve the SNR of $\env(t)$ -relative to $\filt(t)$ and thus provides subsequent processing stages with a -more robust input representation and higher input SNR. +different species, although small variations exist~(Fig.\,\ref{fig:rect-lp}e +and appendix Fig.\,\ref{fig:app_rect-lp}). In summary, the standard deviation +of $\env(t)$ has never been observed to saturate for larger $\sca$ but rather +continues to increase proportionally to $\sca$ for all tested $\fc$, in both +the noiseless and the noisy case and across different species. Consequently, +the combination of rectification and lowpass filtering does not contribute to +intensity invariance. 
However, this transformation pair does improve the SNR of +$\env(t)$ relative to $\filt(t)$ and thus provides subsequent processing stages +with a more robust input representation and higher input SNR. \begin{figure}[!ht] \centering @@ -883,24 +884,23 @@ $\noc(t)$ masks $\soc(t)$ even after the intensity adaptation. Accordingly, the effective intensity invariance of $\adapt(t)$ through logarithmic compression and adaptation is limited by the SNR of $\env(t)$: Songs that have already sunken into the noise floor at the level of $\env(t)$ cannot be recovered by -subsequent processing steps, which emphasizes the importance of the SNR -improvement by rectification and lowpass filtering during the previous -processing step~(Fig.\,\ref{fig:rect-lp}d). The general pattern of noise -regime, transient regime, and saturation regime remains consistent across -different species~(Fig.\,\ref{fig:log-hp}e). However, the specific value of -$\sca$ at which the saturation regime is reached (see appendix -Fig.\,\ref{fig:app_log-hp_saturation}) and the maximum SNR value of $\adapt(t)$ -within the saturation regime vary considerably between and within species. For +subsequent processing steps. The general pattern of noise regime, transient +regime, and saturation regime remains consistent across different +species~(Fig.\,\ref{fig:log-hp}e). However, the saturation point --- the $\sca$ +value at which the SNR of $\adapt(t)$ starts to saturate --- and the saturation +level --- the constant SNR of $\adapt(t)$ within the saturation regime --- vary +considerably between and within species~(appendix +Figs.\,\ref{fig:app_log-hp_curves}+\ref{fig:app_log-hp_saturation}). For example, \textit{C. biguttulus} and \textit{C. mollis} display a noticably -lower maximum SNR of $\adapt(t)$ compared to other species. These differences -are not to be underestimated, since the SNR of $\adapt(t)$ within the -saturation regime determines the maximum input SNR for subsequent processing -steps. 
In other words, the fact that $\adapt(t)$ eventually reaches a -saturation regime is, of course, desirable in the context of intensity -invariance, but it also means to pass up on the higher SNR values that are -achieved by $\env(t)$ for the same $\sca$ (up to several orders of magnitude, -Fig.\,\ref{fig:log-hp}d). This trade-off between intensity invariance and SNR -is a recurring phenomenon that is further addressed in the following sections. +lower saturation level compared to other species. These differences are not to +be underestimated, since the saturation level of $\adapt(t)$ determines the +maximum input SNR for subsequent processing steps. In other words, the fact +that $\adapt(t)$ eventually reaches a saturation regime is, of course, +desirable in the context of intensity invariance, but it also means to pass up +on the higher SNR values that are achieved by $\env(t)$ for the same $\sca$ (up +to several orders of magnitude, Fig.\,\ref{fig:log-hp}d). This trade-off +between intensity invariance and SNR is a recurring phenomenon that is further +addressed in the following sections. \begin{figure}[!ht] \centering @@ -1000,24 +1000,17 @@ sufficiently large $\sca$, $f(t)$ then approaches the same constant $\mu_f$ in both the noiseless and the noisy case~(Fig.\,\ref{fig:thresh-lp_single}e, saturation regime). -The value of $\mu_f$ in the saturation regime is independent of the precise -value of $\Theta$, but the value of $\sca$ at which the saturation regime is -reached decreses with $\Theta$~(Fig.\,\ref{fig:thresh-lp_single}e). Therefore, -a threshold value of $\Theta=0$ would be the optimal choice for achieving -intensity invariance at the lowest possible $\sca$. In stark contrast, the -closer $\Theta$ is to 0, the higher $\mu_f$ in response to the pure noise -component $\noc(t)$ and the lower the resulting SNR of $f(t)$ between noise -regime and saturation regime~(Fig.\,\ref{fig:thresh-lp_single}b-d, left column, -and Fig.\,\ref{fig:thresh-lp_single}e). 
It is even possible to achieve an -"unlimited" SNR of $f(t)$ by setting $\Theta$ above the maximum of the -pure-noise $c(t)$, so that any $\mu_f>0$ indicates the presence of the song -component $\soc(t)$ in input $\adapt(t)$ at the cost of requiring a higher -$\sca$ to reach the saturation regime. This trade-off between intensity -invariance and SNR has already been observed during the previous analysis on -logarithmic compression and adaptation~(Fig.\,\ref{fig:log-hp}d). However, the -parameters that determine the SNR of $\adapt(t)$ are much less understood and -likely relate to properties of the signal, whereas the SNR of $f(t)$ depends on -the choice of $\Theta$ and can be more directly manipulated by the system. +The saturation level of $f(t)$ is independent of the precise value of $\Theta$, +but the saturation point decreases with +$\Theta$~(Fig.\,\ref{fig:thresh-lp_single}e). Therefore, a threshold value of +$\Theta=0$ would be the optimal choice for achieving intensity invariance at +the lowest possible $\sca$. In stark contrast, the closer $\Theta$ is to 0, the +higher $\mu_f$ in response to the pure noise component $\noc(t)$ and the lower +the resulting SNR of $f(t)$ between noise regime and saturation +regime~(Fig.\,\ref{fig:thresh-lp_single}b-d, left column, and +Fig.\,\ref{fig:thresh-lp_single}e). This trade-off between intensity invariance +and SNR has already been observed during the previous analysis on logarithmic +compression and adaptation~(Fig.\,\ref{fig:log-hp}d). Finally, the effects of thresholding and temporal averaging must be seen in the context of the previous transformation pair of logarithmic compression and @@ -1102,11 +1095,11 @@ that the songs of each species are eventually represented by distinct points in feature space. However, the species-specific trajectories cross each other at numerous points, which means that the songs of two species --- each at a specific $\sca$ --- can result in the same combination of $\muf$. 
Furthermore, -the specific value of $\sca$ at which $\muf$ saturates depends on $f_i(t)$ and -the species: For \textit{C. mollis}, all $\muf$ saturate around the same -$\sca$, while \textit{O. rufipes} exhibits considerable variation between the -three $f_i(t)$. The larger the variation in saturation points between $f_i(t)$, -the stronger the curvature of the trajectory through feature space. +the specific saturation point of $f_i(t)$ depends on the species: For +\textit{C. mollis}, all $\muf$ saturate around the same $\sca$, while +\textit{O. rufipes} exhibits considerable variation between the three $f_i(t)$. +The larger the variation in saturation points between $f_i(t)$, the stronger +the curvature of the trajectory through feature space. In the noisy case, $\muf$ is non-zero even for the smallest $\sca$~(Fig.\,\ref{fig:thresh-lp_species}c) because the addition of the noise @@ -1121,9 +1114,9 @@ previous analysis~(Fig.\,\ref{fig:thresh-lp_single}e). However, the trajectories now move a much shorter distance through feature space for a similar range of $\sca$ due to the lower SNR of $f_i(t)$ between noise regime and saturation regime, which increases the likelihood of trajectories crossing -each other. Finally, the values of $\sca$ at which $\muf$ saturate for a given -species are slightly higher in the noisy case, but the variation between -$f_i(t)$ remains largely unchanged. +each other. Finally, the saturation points of $f_i(t)$ for a given species are +slightly higher in the noisy case, but the variation between $f_i(t)$ remains +largely unchanged. In summary, even a comparably small set of three features $f_i(t)$ can, in principle, represent different species-specific songs at distinct points in @@ -1238,15 +1231,10 @@ broader and is not centered around the single saturation point based on the median but rather shifted towards lower $\sca$. 
Care must be taken when interpreting the height of either distribution due to the logarithmic scaling of the underlying $\sca$ axis. Nevertheless, the overall pattern suggests that -specific $f_i(t)$ can reach a saturation regime at lower $\sca$ than their +the saturation points of specific $f_i(t)$ are indeed lower than those of their $c_i(t)$ counterparts. Therefore, the effect of thresholding and temporal averaging on intensity invariance is not necessarily nullified by the previous -logarithmic compression and adaptation, which means that both mechanisms can, -in principle, work together towards an intensity-invariant song representation. -% Or does one simply overwrite the other? Can there even be a higher intensity -% invariance based on the sum of both effects? Or does one simply kick in for -% lower scales than the other and thus dictates the overall intensity -% invariance? Whatever, discussion material. +logarithmic compression and adaptation. \begin{figure}[!ht] \centering @@ -1313,7 +1301,7 @@ representation goes hand in hand with a substantial degree of redundancy and is hardly expected to be present in the actual grasshopper auditory system. But the fact that the saturated $\muf$ are distributed symmetrically around 0.5 provides concrete evidence that each $f_i(t)$ is able to reach its intrinsic -saturation value in the absence of logarithmic +saturation level in the absence of logarithmic compression~(Fig.\,\ref{fig:pipeline_short}c), which is otherwise prevented by the capping of $\adapt(t)$, as seen during previous analyses~(Fig.\,\ref{fig:thresh-lp_single}f and @@ -1327,8 +1315,8 @@ that it allows $f_i(t)$ to reach its intrinsic saturation value. If this results in a wider range of $\muf$ across the feature set, it should be benefitial for forming species-specific combinations. 
However, this depends on multiple different factors such as the choice of $k_i(t)$ and $\thr$ as well as -the structure and distribution of the specific song and is hence not -guaranteed simply by disabling logarithmic compression. +the structure and distribution of the specific song and is hence not guaranteed +simply by disabling logarithmic compression. \begin{figure}[!ht] \centering @@ -1560,25 +1548,241 @@ functional modelling. Other sensory systems that are either more complex or have not been subject to decades of study will likely not be suitable for this approach yet. -% \textbf{Song recognition pathway: Grasshopper vs. model:}\\ -% The model pathway includes a rather large number of Gabor kernels compared to -% the 15 to 20 ascending neurons in the grasshopper auditory -% system~(\bcite{stumpner1991auditory}). +\subsection{Feature representation, temporal averaging, and song design} -\subsection{Interplay of song representation and song design} +The feature set is the final song representation along the model pathway and +constitutes the basis for song recognition. Each feature $f_i(t)$ results from +the thresholding of the respective kernel response $c_i(t)$ by $\nl$ and the +subsequent temporal averaging of binary response $b_i(t)$ by a lowpass filter +with extremely low cutoff frequency $\fc$. At a given time point $t$, $f_i(t)$ +approximately quantifies the proportion of time during which $c_i(t)$ exceeds +the threshold value $\thr$ within the averaging interval $\tlp$ specified by +$\fc$. The value of $f_i(t)$ is hence determined by $\thr$ with respect to the +distribution $\pci$ of $c_i(t)$ and is restricted to the interval $[0,1]$. 
-\textbf{The role of repetitive songs for the feature representation:} -Most grasshopper songs are produced by stridulation, which refers to the -pulling of the serrated stridulatory file on the hindlegs across a resonating -vein on the forewings~(\bcite{helversen1977stridulatory}; -\bcite{stumpner1994song}; \bcite{helversen1997recognition}). Every "tooth" that -strikes the vein generates a brief sound pulse; multiple pulses make up a -syllable; and the repetition of syllables and pauses results in a -characteristic amplitude-modulated waveform pattern. +Different species-specific songs are represented by different combinations of +feature values, which should preferably be constant for the duration of a song +to enable reliable recognition. The fundamental requirement for a constant +$f_i(t)$ is that the time where $c_i(t)>\thr$ during $\tlp$ is the same for all +$t$, which is fulfilled if $\pci$ is stable across $t$. The most +straightforward way to achieve a stable $\pci$ is that $c_i(t)$ is periodic and +$\tlp$ is sufficiently long to average over multiple cycles of $c_i(t)$. +Song-evoked $c_i(t)$ are indeed approximately periodic, which is largely an +inherited property of the song itself. Most grasshopper songs are produced by +stridulation, which refers to the pulling of the serrated stridulatory file on +the hindlegs across a resonating vein on the +forewings~(\bcite{helversen1977stridulatory}; \bcite{stumpner1994song}; +\bcite{helversen1997recognition}). Every "tooth" that strikes the vein +generates a brief sound pulse; multiple pulses make up a syllable; and the +repetition of syllables and pauses results in a pattern with a high degree of +temporal regularity. Accordingly, a robust feature representation in the sense +of constant $f_i(t)$ is tightly linked to the mechanism of sound production and +the temporal structure of the generated song. 
-\subsection{Intensity invariance versus SNR along the auditory pathway} +Various grasshopper species, especially those with longer songs like \textit{C. +mollis}, \textit{G. rufus}, or \textit{O. rufipes}, tend to stridulate softly +at first and then continuously increase the amplitude of their song over time. +This slow "ramping" amplitude modulation makes the overall song less periodic +despite its temporal regularity. The "ramping" appears more pronounced in +$\env(t)$ compared to $\adapt(t)$, which suggests that the logarithmic +compression and adaptation during the preprocessing stage might be at least +partially beneficial for mitigating the effect of this amplitude modulation on +later representations. However, the adaptation of $\adapt(t)$ can only act on +certain time scales --- depending on the cutoff frequency of the underlying +highpass filter --- and is hence not able to compensate for "ramping" across +the entire duration of a song. -\subsection{Behavior in a natural acoustic environment} +Certain grasshopper species like \textit{Chorthippus dorsatus} are known to +switch their stridulation pattern in the middle of a +song~(\bcite{stumpner1994song}). \textit{C. dorsatus} starts stridulating with +both hindlegs in synchrony and thereby generates a pronounced syllable-pause +pattern similar to that of \textit{P. parallelus}. For the last part of its +song, however, \textit{C. dorsatus} switches to an alternating leg movement, +which results in a more continuous but not entirely unstructured rattling +sound. It is unclear what this composite design means for the feature +representation of \textit{C. dorsatus} songs. In principle, both parts of the +song could result in similar $\pci$ despite their different temporal structure, +which would allow for consistent $f_i(t)$ across the entire song. 
However, it +appears more likely that only one part of the song encodes species identity, +while the other part serves a different purpose such as fitness +advertisement~(SOURCE?). + +Finally, the question remains how the choice of an appropriate averaging +interval $\tlp$ depends on the duration and temporal structure of a song. The +minimum $\tlp$ should encompass at least a few cycles of $c_i(t)$ to ensure a +stable $\pci$ and hence a constant $f_i(t)$. The maximum $\tlp$ should not +exceed the duration of a song to avoid the inclusion of behaviorally irrelevant +information. The longer $\tlp$, the longer $f_i(t)$ takes to stabilize after +the onset and before the offset of a song, which narrows the time window for +reliable recognition. The duration of species-specific grasshopper songs can +range from a few hundred milliseconds (e.\,g. \textit{Stethophyma grossum}) to +well over a minute (e.\,g. \textit{C. mollis}), so that the optimal $\tlp$ is +likely to differ between species. + +\subsection{Sensory invariances in the grasshopper auditory system} + +The notion of invariance is fundamental for sensory processing systems. +Invariance, in the general sense, can be described as the property of a +transformation to maintain variation across certain meaningful input parameters +in its output while discarding variation across other input parameters. This +boils down to a selective input-output decorrelation that allows the system to +represent only those aspects of the stimulus that are behaviorally relevant to +the organism. + +The grasshopper auditory system has to deal with a number of sources of +non-informative song variation. For instance, the temporal structure of the +song pattern warps with temperature~(\bcite{skovmand1983song}). This also +affects certain structural parameters that are essential for song recognition, +mainly the duration of syllables and pauses.
The auditory system can compensate +for this variation by reading out relative temporal relationships rather than +absolute time intervals~(\bcite{creutzig2009timescale}; +\bcite{creutzig2010timescale}). The ratio of syllable duration to pause +duration is relatively constant across temperatures and has been shown to be +suitable for song recognition~(\bcite{helversen1972gesang}), so that there is +likely no need to retain any information about the absolute duration of +syllables and pauses. + +The situation is more complex for variations in song intensity. Song intensity +at the receiver's position depends mostly on the distance to the sender and is +hence not a reliable cue to infer species identity. The auditory system should +therefore be invariant to intensity variations to recognize conspecific songs +regardless of sender distance. However, song intensity --- specifically, the +interaural intensity difference --- is also required for directional hearing, +which is essential for phonotaxis~(\bcite{helversen1988interaural}). Conflicts +between song recognition and directional hearing are avoided in the auditory +system by distributing both functions across two parallel +pathways~(\bcite{helversen1984parallel}; \bcite{ronacher1986routes}). This is +the main reason why our model pathway is focused entirely on song recognition +and has no capacity for directional hearing, no matter how relevant it may be +to the grasshopper. + +Furthermore, "invariance to variations in song intensity" does not do justice +to the full extent of the problem. Intensity is a function of song amplitude +within a certain time frame. It can refer to the individual syllables and +pauses of the song pattern as well as the entire song --- the former is +relevant for song recognition, while the latter is not. 
Intensity invariance in +the current context can therefore be described as time scale-selective +sensitivity to the faster amplitude dynamics of the song pattern and +simultaneous insensitivity to slower, more sustained amplitude dynamics. In the +model pathway, this time scale selectivity is reflected by the cutoff frequency +$\fc$ of the highpass filter that underlies the adaptation of $\adapt(t)$: Most +$\fc$ are effective in removing the local offset of $\db(t)$ and render +$\adapt(t)$ intensity-invariant, but only sufficiently low $\fc$ will leave the +relevant amplitude dynamics of the song pattern intact. + +\subsection{Intensity invariance versus SNR} + +Each processing step along the model pathway is a transformation between input +representation and output representation. The intensity of the input is +characterized by scale $\sca$. The intensity of the output is characterized by +an appropriate intensity measure. If the transformation renders the output more +intensity-invariant, then the intensity measure will saturate for sufficiently +large $\sca$, which caps the output SNR to a constant value across these +$\sca$. Otherwise, the intensity measure and hence the output SNR will increase +monotonically with $\sca$. The trade-off between intensity invariance and SNR +refers to the principle that a transformation can either improve intensity +invariance or maintain SNR --- it cannot do both at the same time. This +principle is presumably not specific to the two mechanisms along the model +pathway but rather a general property of transformations that equalize between +different input intensities. + +Logarithmic compression and adaptation by highpass filtering is capable of +equalizing a wide range of $\sca$. In the absence of noise component $\noc(t)$, +output $\adapt(t)$ is a perfectly intensity-invariant representation of song +component $\soc(t)$ across all $\sca>0$. 
However, the presence of $\noc(t)$ +limits the effectiveness of this mechanism to sufficiently large $\sca$. This +means that intensity invariance and SNR interact at the input level, as well. +Specifically, the saturation point of $\adapt(t)$ is determined by the input +SNR of $\env(t)$, which in turn depends on the initial SNR of the sound signal +$\raw(t)$. This initial SNR is presumably improved by the bandpass filtering of +$\raw(t)$ into $\filt(t)$ at the tympanal membrane, which attenuates +frequencies outside the relevant range of grasshopper songs. The SNR is then +further improved by the rectification and lowpass filtering of $\filt(t)$ into +$\env(t)$. This improvement depends on the cutoff frequency $\fc$ of the +lowpass filter --- the lower $\fc$, the higher the SNR of $\env(t)$ at a given +$\sca$. However, $\fc$ must not be too low, as this would attenuate relevant +amplitude dynamics of the song pattern. The saturation level of $\adapt(t)$, +unlike its saturation point, is independent of the SNR of $\env(t)$ because the +influence of $\noc(t)$ is negligible for sufficiently large $\sca$. Both the +saturation level and the saturation point of $\adapt(t)$ vary between different +species and specific songs. These differences are likely rooted in the way in +which logarithmic compression acts on the specific distribution of $\env(t)$, +which is determined by $\fc$ and the structure and frequency spectrum of the +rectified $\filt(t)$. + +Thresholding and temporal averaging renders feature $f_i(t)$ +intensity-invariant for sufficiently large $\sca$. The trade-off between +intensity invariance and SNR is mediated by threshold value $\thr$. A lower +$\thr$ ($\thr\to0$) improves intensity invariance by shifting the saturation +point towards lower $\sca$ but also decreases the SNR of $f_i(t)$. The +saturation level of $f_i(t)$ is independent of $\thr$ as long as the intensity +invariance by the previous mechanism is neglected.
The SNR of $f_i(t)$ is +therefore determined solely by the pure-noise response of $f_i(t)$. The +distribution $\pci$ of the pure-noise kernel response $c_i(t)$ is largely a +normal distribution with mean $\mu\approx0$ for all kernels $k_i(t)$. The value +of the pure-noise $f_i(t)$ is hence 0.5 for $\thr=0$ and decreases for larger +$\thr$. If $\thr$ is set above the maximum of $c_i(t)$, the pure-noise feature +value is 0, which results in an "unlimited" SNR of $f_i(t)$ at the cost of a +higher saturation point. In this case, any non-zero feature value that is +sustained for a sufficient duration could serve as indicator for the presence +of $\soc(t)$ in addition to $\noc(t)$. This requires a fine evolutionary tuning +of $\thr$ to the properties of both the species-specific song and the natural +noise in a certain habitat. + + +It seems reasonable to assume that $\thr$ is one of the parameters along the +pathway + +Physiologically, it is presumably easier to +manipulate $\thr$ + + +It seems reasonable that $\thr$ is easier to +manipulate in ev + + +Furthermore, $\thr$ is presumably a parameter along +the pathway that + + +$\thr$ + + +Furthermore, $\thr$ might be one of the parameters +along the pathway + + + +% However, the parameters that determine the SNR of $\adapt(t)$ are much less +% understood and likely relate to properties of the signal, whereas the SNR of +% $f(t)$ depends on the choice of $\Theta$ and can be more directly manipulated +% by the system. 
+ +\newpage +\textbf{Thresh-LP: Implication for intensity invariance:}\\ + +- Suggests a relatively simple rule for optimal choice of threshold value $\thr$:\\ +$\rightarrow$ Find amplitude $c_i$ that maximizes absolute derivative of $c_i(t)$ over time\\ +$\rightarrow$ Optimal with respect to intensity invariance of $f_i(t)$, not necessarily for +other criteria such as song-noise separation or diversity between features + +- Nonlinear operations can be used to detach representations from graded physical +stimulus (to facilitate categorical behavioral decision-making?):\\ +1) Capture sufficiently precise amplitude information: $\env(t)$, $\adapt(t)$\\ +$\rightarrow$ Closely following the AM of the acoustic stimulus\\ +2) Quantify relevant stimulus properties on a graded scale: $c_i(t)$\\ +$\rightarrow$ More decorrelated representation, compared to prior stages\\ +3) Nonlinearity: Distinguish between "relevant vs irrelevant" values: $b_i(t)$\\ +$\rightarrow$ Trading a graded scale for two or more categorical states\\ +4) Represent stimulus properties under relevance constraint: $f_i(t)$\\ +$\rightarrow$ Graded again but highly decorrelated from the acoustic stimulus\\ +5) Categorical behavioral decision-making requires further nonlinearities\\ +$\rightarrow$ Parameters of a behavioral response may be graded (e.g. approach speed), +initiation of one behavior over another is categorical (e.g. approach/stay) + +\subsection{Intensity invariance versus intensity invariance} + +\subsection{Implications for behavior in a natural acoustic environment} % RIPPED FROM INTRODUCTION: @@ -1645,63 +1849,6 @@ operate on unmodified recordings of natural grasshopper songs instead of condensed pulse train approximations, which widens its scope towards more realistic, ecologically relevant scenarios. -\textbf{Excursion into time-warp invariance:} -For instance, the temporal structure of grasshopper songs warps with -temperature~(\bcite{skovmand1983song}).
The auditory system can compensate for -this variability by reading out relative temporal relationships rather than -absolute time intervals~(\bcite{creutzig2009timescale}; -\bcite{creutzig2010timescale}), as those remain relatively constant across -different temperatures~(\bcite{helversen1972gesang}). - -\textbf{Definition of invariance (general, systemic):}\\ -Invariance = Property of a system to maintain a stable output with respect to a -set of relevant input parameters (variation to be represented) but irrespective -of one or more other parameters (variation to be discarded) -$\rightarrow$ Selective input-output decorrelation - -\textbf{Definition of intensity invariance (context of neurons and songs):}\\ -Intensity invariance = Time scale-selective sensitivity to certain faster -amplitude dynamics (song waveform, small-scale AM) and simultaneous -insensitivity to slower, more sustained amplitude dynamics (transient baseline, -large-scale AM, current overall intensity level)\\ -$\rightarrow$ Without time scale selectivity, any fully intensity-invariant -output will be a flat line - -\textbf{Log-HP: Implication for intensity invariance:}\\ -- Logarithmic scaling is essential for equalizing different song intensities\\ -$\rightarrow$ Intensity information can be manipulated more easily when in form -of a signal offset in log-space than a multiplicative scale in linear space - -- Capability to compensate for intensity variations, i.e. 
selective amplification -of output $\adapt(t)$ relative to input $\env(t)$, is limited by input SNR (Eq.\,\ref{eq:toy_snr}):\\ -$\rightarrow$ Ability to equalize between different sufficiently large scales of $s(t)$\\ -$\rightarrow$ Inability to recover $s(t)$ when initially masked by noise floor $\eta(t)$ - -- Logarithmic scaling emphasizes small amplitudes (song onsets, noise floor) \\ -$\rightarrow$ Recurring trade-off: Equalizing signal intensity vs preserving initial SNR - -\textbf{Thresh-LP: Implication for intensity invariance:}\\ -- Role of song periodicity for feature representation! - -- Suggests a relatively simple rule for optimal choice of threshold value $\thr$:\\ -$\rightarrow$ Find amplitude $c_i$ that maximizes absolute derivative of $c_i(t)$ over time\\ -$\rightarrow$ Optimal with respect to intensity invariance of $f_i(t)$, not necessarily for -other criteria such as song-noise separation or diversity between features - -- Nonlinear operations can be used to detach representations from graded physical -stimulus (to fasciliate categorical behavioral decision-making?):\\ -1) Capture sufficiently precise amplitude information: $\env(t)$, $\adapt(t)$\\ -$\rightarrow$ Closely following the AM of the acoustic stimulus\\ -2) Quantify relevant stimulus properties on a graded scale: $c_i(t)$\\ -$\rightarrow$ More decorrelated representation, compared to prior stages\\ -3) Nonlinearity: Distinguish between "relevant vs irrelevant" values: $b_i(t)$\\ -$\rightarrow$ Trading a graded scale for two or more categorical states\\ -4) Represent stimulus properties under relevance constraint: $f_i(t)$\\ -$\rightarrow$ Graded again but highly decorrelated from the acoustic stimulus\\ -5) Categorical behavioral decision-making requires further nonlinearities\\ -$\rightarrow$ Parameters of a behavioral response may be graded (e.g. approach speed), -initiation of one behavior over another is categorical (e.g. 
approach/stay) - \newpage \section{Appendix} @@ -1716,7 +1863,7 @@ initiation of one behavior over another is categorical (e.g. approach/stay) $\noc(t)$ within the signal envelope $\env(t)$ over scale $\sca$. Based on input $\raw(t)$ with $\sigma_{\eta}=1$ (corresponding to the analysis underlying - Fig.\,\ref{fig:rect-lp}), using random 100 realizations of + Fig.\,\ref{fig:rect-lp}), using 100 random realizations of $\noc(t)$.} \label{fig:app_env-sd} \end{figure}% Referenced.