Writing discussion.

2026-05-28 18:17:59 +02:00
parent 6cd56b82b0
commit 1878fb5eaf
2 changed files with 289 additions and 142 deletions
--- a/main.pdf
+++ b/main.pdf
--- a/main.tex
+++ b/main.tex
@@ -104,6 +104,7 @@
 \newcommand{\nsig}{\sigma_{\eta}} % Noise component standard deviation
 \newcommand{\pc}{p(c,\,T)} % Probability density (general interval)
 \newcommand{\pclp}{p(c,\,\tlp)} % Probability density (lowpass interval)
+\newcommand{\pci}{p(c_i,\,\tlp)} % Kernel-specific probability density (lowpass interval)
 \newcommand{\muf}{\mu_{f_i}} % Average feature value

 \section{Introduction}
@@ -258,12 +259,13 @@ initiation~(\bcite{ronacher1986routes}; \bcite{bauer1987separate};
 \bcite{bhavsar2017brain}).

 Functionally, the ascending neurons are the most diverse of the three neuronal
-populations. Individual ascending neurons possess highly specific response
-properties that contrast with the rather homogeneous response properties of the
-preceding receptor neurons and local
-interneurons~(\bcite{clemens2011efficient}), which indicates a transition from
-a uniform population-wide processing stream into several parallel branches.
-Accordingly, the model pathway is divided into two distinct
+populations. Around 15 to 20 ascending neurons have been identified in the
+grasshopper auditory system~(\bcite{stumpner1991auditory}). Individual
+ascending neurons possess highly specific response properties that contrast
+with the rather homogeneous response properties of the preceding receptor
+neurons and local interneurons~(\bcite{clemens2011efficient}), which indicates
+a transition from a uniform population-wide processing stream into several
+parallel branches. Accordingly, the model pathway is divided into two distinct
 stages~(Fig.\,\ref{fig:pathway}d): The preprocessing stage incorporates the
 processing steps at the levels of the tympanal membrane, the receptor neurons,
 and the local interneurons; and operates on one-dimensional signal
@@ -754,16 +756,15 @@ This effect is more pronounced for lower $\fc$ of the lowpass filter and is
 presumably caused by the attenuation of high-frequency components in the
 signal, which are more prominent in the noise component $\noc(t)$ than in the
 song component $\soc(t)$. The effect also appears relatively consistent across
-different species, although small variations exist~(Fig.\,\ref{fig:rect-lp}e)
-that are presumably based on different song structures and frequency spectra.
-In summary, the standard deviation of $\env(t)$ has never been observed to
-transition into a saturation regime for larger $\sca$ but rather continues to
-increase proportionally to $\sca$ for all tested $\fc$, in both the noiseless
-and the noisy case and across different species. Consequently, the combination
-of rectification and lowpass filtering does not contribute to intensity
-invariance. However, this transformation pair does improve the SNR of $\env(t)$
-relative to $\filt(t)$ and thus provides subsequent processing stages with a
-more robust input representation and higher input SNR.
+different species, although small variations exist~(Fig.\,\ref{fig:rect-lp}e
+and appendix Fig.\,\ref{fig:app_rect-lp}). In summary, the standard deviation
+of $\env(t)$ has never been observed to saturate for larger $\sca$ but rather
+continues to increase proportionally to $\sca$ for all tested $\fc$, in both
+the noiseless and the noisy case and across different species. Consequently,
+the combination of rectification and lowpass filtering does not contribute to
+intensity invariance. However, this transformation pair does improve the SNR of
+$\env(t)$ relative to $\filt(t)$ and thus provides subsequent processing stages
+with a more robust input representation and higher input SNR.

 \begin{figure}[!ht]
    \centering
@@ -883,24 +884,23 @@ $\noc(t)$ masks $\soc(t)$ even after the intensity adaptation. Accordingly, the
 effective intensity invariance of $\adapt(t)$ through logarithmic compression
 and adaptation is limited by the SNR of $\env(t)$: Songs that have already
 sunken into the noise floor at the level of $\env(t)$ cannot be recovered by
-subsequent processing steps, which emphasizes the importance of the SNR
-improvement by rectification and lowpass filtering during the previous
-processing step~(Fig.\,\ref{fig:rect-lp}d). The general pattern of noise
-regime, transient regime, and saturation regime remains consistent across
-different species~(Fig.\,\ref{fig:log-hp}e). However, the specific value of
-$\sca$ at which the saturation regime is reached (see appendix
-Fig.\,\ref{fig:app_log-hp_saturation}) and the maximum SNR value of $\adapt(t)$
-within the saturation regime vary considerably between and within species. For
+subsequent processing steps. The general pattern of noise regime, transient
+regime, and saturation regime remains consistent across different
+species~(Fig.\,\ref{fig:log-hp}e). However, the saturation point --- the $\sca$
+value at which the SNR of $\adapt(t)$ starts to saturate --- and the saturation
+level --- the constant SNR of $\adapt(t)$ within the saturation regime --- vary
+considerably between and within species~(appendix
+Figs.\,\ref{fig:app_log-hp_curves} and \ref{fig:app_log-hp_saturation}). For
 example, \textit{C. biguttulus} and \textit{C. mollis} display a noticeably
-lower maximum SNR of $\adapt(t)$ compared to other species. These differences
-are not to be underestimated, since the SNR of $\adapt(t)$ within the
-saturation regime determines the maximum input SNR for subsequent processing
-steps. In other words, the fact that $\adapt(t)$ eventually reaches a
-saturation regime is, of course, desirable in the context of intensity
-invariance, but it also means to pass up on the higher SNR values that are
-achieved by $\env(t)$ for the same $\sca$ (up to several orders of magnitude,
-Fig.\,\ref{fig:log-hp}d). This trade-off between intensity invariance and SNR
-is a recurring phenomenon that is further addressed in the following sections.
+lower saturation level compared to other species. These differences are not to
+be underestimated, since the saturation level of $\adapt(t)$ determines the
+maximum input SNR for subsequent processing steps. In other words, the fact
+that $\adapt(t)$ eventually reaches a saturation regime is, of course,
+desirable in the context of intensity invariance, but it also means forgoing
+the higher SNR values that are achieved by $\env(t)$ for the same $\sca$ (up
+to several orders of magnitude, Fig.\,\ref{fig:log-hp}d). This trade-off
+between intensity invariance and SNR is a recurring phenomenon that is further
+addressed in the following sections.

 \begin{figure}[!ht]
    \centering
@@ -1000,24 +1000,17 @@ sufficiently large $\sca$, $f(t)$ then approaches the same constant $\mu_f$ in
 both the noiseless and the noisy case~(Fig.\,\ref{fig:thresh-lp_single}e,
 saturation regime).

-The value of $\mu_f$ in the saturation regime is independent of the precise
-value of $\Theta$, but the value of $\sca$ at which the saturation regime is
-reached decreses with $\Theta$~(Fig.\,\ref{fig:thresh-lp_single}e). Therefore,
-a threshold value of $\Theta=0$ would be the optimal choice for achieving
-intensity invariance at the lowest possible $\sca$. In stark contrast, the
-closer $\Theta$ is to 0, the higher $\mu_f$ in response to the pure noise
-component $\noc(t)$ and the lower the resulting SNR of $f(t)$ between noise
-regime and saturation regime~(Fig.\,\ref{fig:thresh-lp_single}b-d, left column,
-and Fig.\,\ref{fig:thresh-lp_single}e). It is even possible to achieve an
-"unlimited" SNR of $f(t)$ by setting $\Theta$ above the maximum of the
-pure-noise $c(t)$, so that any $\mu_f>0$ indicates the presence of the song
-component $\soc(t)$ in input $\adapt(t)$ at the cost of requiring a higher
-$\sca$ to reach the saturation regime. This trade-off between intensity
-invariance and SNR has already been observed during the previous analysis on
-logarithmic compression and adaptation~(Fig.\,\ref{fig:log-hp}d). However, the
-parameters that determine the SNR of $\adapt(t)$ are much less understood and
-likely relate to properties of the signal, whereas the SNR of $f(t)$ depends on
-the choice of $\Theta$ and can be more directly manipulated by the system.
+The saturation level of $f(t)$ is independent of the precise value of $\Theta$,
+but the saturation point decreases with
+$\Theta$~(Fig.\,\ref{fig:thresh-lp_single}e). Therefore, a threshold value of
+$\Theta=0$ would be the optimal choice for achieving intensity invariance at
+the lowest possible $\sca$. In stark contrast, the closer $\Theta$ is to 0, the
+higher $\mu_f$ in response to the pure noise component $\noc(t)$ and the lower
+the resulting SNR of $f(t)$ between noise regime and saturation
+regime~(Fig.\,\ref{fig:thresh-lp_single}b-d, left column, and
+Fig.\,\ref{fig:thresh-lp_single}e). This trade-off between intensity invariance
+and SNR has already been observed during the previous analysis on logarithmic
+compression and adaptation~(Fig.\,\ref{fig:log-hp}d).

 Finally, the effects of thresholding and temporal averaging must be seen in the
 context of the previous transformation pair of logarithmic compression and
@@ -1102,11 +1095,11 @@ that the songs of each species are eventually represented by distinct points in
 feature space. However, the species-specific trajectories cross each other at
 numerous points, which means that the songs of two species --- each at a
 specific $\sca$ --- can result in the same combination of $\muf$. Furthermore,
-the specific value of $\sca$ at which $\muf$ saturates depends on $f_i(t)$ and
-the species: For \textit{C. mollis}, all $\muf$ saturate around the same
-$\sca$, while \textit{O. rufipes} exhibits considerable variation between the
-three $f_i(t)$. The larger the variation in saturation points between $f_i(t)$,
-the stronger the curvature of the trajectory through feature space.
+the specific saturation point of $f_i(t)$ depends on the species: For
+\textit{C. mollis}, all $\muf$ saturate around the same $\sca$, while
+\textit{O. rufipes} exhibits considerable variation between the three $f_i(t)$.
+The larger the variation in saturation points between $f_i(t)$, the stronger
+the curvature of the trajectory through feature space.

 In the noisy case, $\muf$ is non-zero even for the smallest
 $\sca$~(Fig.\,\ref{fig:thresh-lp_species}c) because the addition of the noise
@@ -1121,9 +1114,9 @@ previous analysis~(Fig.\,\ref{fig:thresh-lp_single}e). However, the
 trajectories now move a much shorter distance through feature space for a
 similar range of $\sca$ due to the lower SNR of $f_i(t)$ between noise regime
 and saturation regime, which increases the likelihood of trajectories crossing
-each other. Finally, the values of $\sca$ at which $\muf$ saturate for a given
-species are slightly higher in the noisy case, but the variation between
-$f_i(t)$ remains largely unchanged.
+each other. Finally, the saturation points of $f_i(t)$ for a given species are
+slightly higher in the noisy case, but the variation between $f_i(t)$ remains
+largely unchanged.

 In summary, even a comparably small set of three features $f_i(t)$ can, in
 principle, represent different species-specific songs at distinct points in
@@ -1238,15 +1231,10 @@ broader and is not centered around the single saturation point based on the
 median but rather shifted towards lower $\sca$. Care must be taken when
 interpreting the height of either distribution due to the logarithmic scaling
 of the underlying $\sca$ axis. Nevertheless, the overall pattern suggests that
-specific $f_i(t)$ can reach a saturation regime at lower $\sca$ than their
+the saturation points of specific $f_i(t)$ are indeed lower than those of their
 $c_i(t)$ counterparts. Therefore, the effect of thresholding and temporal
 averaging on intensity invariance is not necessarily nullified by the previous
-logarithmic compression and adaptation, which means that both mechanisms can,
-in principle, work together towards an intensity-invariant song representation.
-% Or does one simply overwrite the other? Can there even be a higher intensity
-% invariance based on the sum of both effects? Or does one simply kick in for
-% lower scales than the other and thus dictates the overall intensity
-% invariance? Whatever, discussion material.
+logarithmic compression and adaptation.

 \begin{figure}[!ht]
    \centering
@@ -1313,7 +1301,7 @@ representation goes hand in hand with a substantial degree of redundancy and is
 hardly expected to be present in the actual grasshopper auditory system. But
 the fact that the saturated $\muf$ are distributed symmetrically around 0.5
 provides concrete evidence that each $f_i(t)$ is able to reach its intrinsic
-saturation value in the absence of logarithmic
+saturation level in the absence of logarithmic
 compression~(Fig.\,\ref{fig:pipeline_short}c), which is otherwise prevented by
 the capping of $\adapt(t)$, as seen during previous
 analyses~(Fig.\,\ref{fig:thresh-lp_single}f and
@@ -1327,8 +1315,8 @@ that it allows $f_i(t)$ to reach its intrinsic saturation value. If this
 results in a wider range of $\muf$ across the feature set, it should be
 beneficial for forming species-specific combinations. However, this depends on
 multiple different factors such as the choice of $k_i(t)$ and $\thr$ as well as
-the structure and distribution of the specific song and is hence not
-guaranteed simply by disabling logarithmic compression.
+the structure and distribution of the specific song and is hence not guaranteed
+simply by disabling logarithmic compression.

 \begin{figure}[!ht]
    \centering
@@ -1560,25 +1548,241 @@ functional modelling. Other sensory systems that are either more complex or
 have not been subject to decades of study will likely not be suitable for this
 approach yet.

-% \textbf{Song recognition pathway: Grasshopper vs. model:}\\
-% The model pathway includes a rather large number of Gabor kernels compared to
-% the 15 to 20 ascending neurons in the grasshopper auditory
-% system~(\bcite{stumpner1991auditory}). 
+\subsection{Feature representation, temporal averaging, and song design}

-\subsection{Interplay of song representation and song design}
+The feature set is the final song representation along the model pathway and
+constitutes the basis for song recognition. Each feature $f_i(t)$ results from
+the thresholding of the respective kernel response $c_i(t)$ by $\nl$ and the
+subsequent temporal averaging of binary response $b_i(t)$ by a lowpass filter
+with extremely low cutoff frequency $\fc$. At a given time point $t$, $f_i(t)$
+approximately quantifies the proportion of time during which $c_i(t)$ exceeds
+the threshold value $\thr$ within the averaging interval $\tlp$ specified by
+$\fc$. The value of $f_i(t)$ is hence determined by $\thr$ with respect to the
+distribution $\pci$ of $c_i(t)$ and is restricted to the interval $[0,1]$.
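+
+As a rough sketch of this relationship, assuming an idealized rectangular
+averaging window of length $\tlp$ in place of the actual lowpass filter and
+binary values $b_i(t)\in\{0,1\}$ (both are simplifying assumptions, as is the
+placement of the window relative to $t$), the feature value can be
+approximated as
+\[
+    f_i(t)\,\approx\,\frac{1}{\tlp}\int_{t-\tlp}^{t} b_i(\tau)\,\mathrm{d}\tau
+    \,\approx\,\int_{\thr}^{\infty}\pci\,\mathrm{d}c_i,
+\]
+so that $f_i(t)$ corresponds to the probability mass of $\pci$ above $\thr$.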

-\textbf{The role of repetitive songs for the feature representation:}
-Most grasshopper songs are produced by stridulation, which refers to the
-pulling of the serrated stridulatory file on the hindlegs across a resonating
-vein on the forewings~(\bcite{helversen1977stridulatory};
-\bcite{stumpner1994song}; \bcite{helversen1997recognition}). Every "tooth" that
-strikes the vein generates a brief sound pulse; multiple pulses make up a
-syllable; and the repetition of syllables and pauses results in a
-characteristic amplitude-modulated waveform pattern.
+Different species-specific songs are represented by different combinations of
+feature values, which should preferably be constant for the duration of a song
+to enable reliable recognition. The fundamental requirement for a constant
+$f_i(t)$ is that the proportion of time during which $c_i(t)>\thr$ within
+$\tlp$ is the same for all $t$, which is fulfilled if $\pci$ is stable across
+$t$. The most
+straightforward way to achieve a stable $\pci$ is that $c_i(t)$ is periodic and
+$\tlp$ is sufficiently long to average over multiple cycles of $c_i(t)$.
+Song-evoked $c_i(t)$ are indeed approximately periodic, which is largely an
+inherited property of the song itself. Most grasshopper songs are produced by
+stridulation, which refers to the pulling of the serrated stridulatory file on
+the hindlegs across a resonating vein on the
+forewings~(\bcite{helversen1977stridulatory}; \bcite{stumpner1994song};
+\bcite{helversen1997recognition}). Every "tooth" that strikes the vein
+generates a brief sound pulse; multiple pulses make up a syllable; and the
+repetition of syllables and pauses results in a pattern with a high degree of
+temporal regularity. Accordingly, a robust feature representation in the sense
+of constant $f_i(t)$ is tightly linked to the mechanism of sound production and
+the temporal structure of the generated song.

-\subsection{Intensity invariance versus SNR along the auditory pathway}
+Various grasshopper species, especially those with longer songs like \textit{C.
+mollis}, \textit{G. rufus}, or \textit{O. rufipes}, tend to stridulate softly
+at first and then continuously increase the amplitude of their song over time.
+This slow "ramping" amplitude modulation makes the overall song less periodic
+despite its temporal regularity. The "ramping" appears more pronounced in
+$\env(t)$ than in $\adapt(t)$, which suggests that the logarithmic
+compression and adaptation during the preprocessing stage might be at least
+partially beneficial for mitigating the effect of this amplitude modulation on
+later representations. However, the adaptation of $\adapt(t)$ can only act on
+certain time scales --- depending on the cutoff frequency of the underlying
+highpass filter --- and is hence not able to compensate for "ramping" across
+the entire duration of a song.

-\subsection{Behavior in a natural acoustic environment}
+Certain grasshopper species like \textit{Chorthippus dorsatus} are known to
+switch their stridulation pattern in the middle of a
+song~(\bcite{stumpner1994song}). \textit{C. dorsatus} starts stridulating with
+both hindlegs in synchrony and thereby generates a pronounced syllable-pause
+pattern similar to that of \textit{P. parallelus}. For the last part of its
+song, however, \textit{C. dorsatus} switches to an alternating leg movement,
+which results in a more continuous but not entirely unstructured rattling
+sound. It is unclear what this composite design means for the feature
+representation of \textit{C. dorsatus} songs. In principle, both parts of the
+song could result in similar $\pci$ despite their different temporal structure,
+which would allow for consistent $f_i(t)$ across the entire song. However, it
+appears more likely that only one part of the song encodes species identity,
+while the other part serves a different purpose such as fitness
+advertisement~(SOURCE?).
+
+Finally, the question remains how the choice of an appropriate averaging
+interval $\tlp$ depends on the duration and temporal structure of a song. The
+minimum $\tlp$ should encompass at least a few cycles of $c_i(t)$ to ensure a
+stable $\pci$ and hence a constant $f_i(t)$. The maximum $\tlp$ should not
+exceed the duration of a song to avoid the inclusion of behaviorally irrelevant
+information. The longer $\tlp$, the longer $f_i(t)$ takes to stabilize after
+the onset and before the offset of a song, which narrows the time window for
+reliable recognition. The duration of species-specific grasshopper songs can
+range from a few hundred milliseconds (e.\,g. \textit{Stethophyma grossum}) to
+well over a minute (e.\,g. \textit{C. mollis}), so that the optimal $\tlp$ is
+likely to differ between species.
+
+\subsection{Sensory invariances in the grasshopper auditory system}
+
+The notion of invariance is fundamental for sensory processing systems.
+Invariance, in the general sense, can be described as the property of a
+transformation to maintain variation across certain meaningful input parameters
+in its output while discarding variation across other input parameters. This
+boils down to a selective input-output decorrelation that allows the system to
+represent only those aspects of the stimulus that are behaviorally relevant to
+the organism.
+
+The grasshopper auditory system has to deal with a number of sources of
+non-informative song variation. For instance, the temporal structure of the
+song pattern warps with temperature~(\bcite{skovmand1983song}). This also
+affects certain structural parameters that are essential for song recognition,
+mainly the duration of syllables and pauses. The auditory system can compensate
+for this variation by reading out relative temporal relationships rather than
+absolute time intervals~(\bcite{creutzig2009timescale};
+\bcite{creutzig2010timescale}). The ratio of syllable duration to pause
+duration is relatively constant across temperatures and has been shown to be
+suitable for song recognition~(\bcite{helversen1972gesang}), so that there is
+likely no need to retain any information about the absolute duration of
+syllables and pauses.
+
+The situation is more complex for variations in song intensity. Song intensity
+at the receiver's position depends mostly on the distance to the sender and is
+hence not a reliable cue to infer species identity. The auditory system should
+therefore be invariant to intensity variations to recognize conspecific songs
+regardless of sender distance. However, song intensity --- specifically, the
+interaural intensity difference --- is also required for directional hearing,
+which is essential for phonotaxis~(\bcite{helversen1988interaural}). Conflicts
+between song recognition and directional hearing are avoided in the auditory
+system by distributing both functions across two parallel
+pathways~(\bcite{helversen1984parallel}; \bcite{ronacher1986routes}). This is
+the main reason why our model pathway is focused entirely on song recognition
+and has no capacity for directional hearing, no matter how relevant it may be
+to the grasshopper.
+
+Furthermore, "invariance to variations in song intensity" does not do justice
+to the full extent of the problem. Intensity is a function of song amplitude
+within a certain time frame. It can refer to the individual syllables and
+pauses of the song pattern as well as the entire song --- the former is
+relevant for song recognition, while the latter is not. Intensity invariance in
+the current context can therefore be described as time scale-selective
+sensitivity to the faster amplitude dynamics of the song pattern and
+simultaneous insensitivity to slower, more sustained amplitude dynamics. In the
+model pathway, this time scale selectivity is reflected by the cutoff frequency
+$\fc$ of the highpass filter that underlies the adaptation of $\adapt(t)$: Most
+$\fc$ are effective in removing the local offset of $\db(t)$ and render
+$\adapt(t)$ intensity-invariant, but only sufficiently low $\fc$ will leave the
+relevant amplitude dynamics of the song pattern intact.
+
+\subsection{Intensity invariance versus SNR}
+
+Each processing step along the model pathway is a transformation between input
+representation and output representation. The intensity of the input is
+characterized by scale $\sca$. The intensity of the output is characterized by
+an appropriate intensity measure. If the transformation renders the output more
+intensity-invariant, then the intensity measure will saturate for sufficiently
+large $\sca$, which caps the output SNR to a constant value across these
+$\sca$. Otherwise, the intensity measure and hence the output SNR will increase
+monotonically with $\sca$. The trade-off between intensity invariance and SNR
+refers to the principle that a transformation can either improve intensity
+invariance or maintain SNR --- it cannot do both at the same time. This
+principle is presumably not specific to the two mechanisms along the model
+pathway but rather a general property of transformations that equalize between
+different input intensities.
+
+Logarithmic compression and adaptation by highpass filtering is capable of
+equalizing a wide range of $\sca$. In the absence of noise component $\noc(t)$,
+output $\adapt(t)$ is a perfectly intensity-invariant representation of song
+component $\soc(t)$ across all $\sca>0$. However, the presence of $\noc(t)$
+limits the effectiveness of this mechanism to sufficiently large $\sca$. This
+means that intensity invariance and SNR interact at the input level as well.
+Specifically, the saturation point of $\adapt(t)$ is determined by the input
+SNR of $\env(t)$, which in turn depends on the initial SNR of the sound signal
+$\raw(t)$. This initial SNR is presumably improved by the bandpass filtering of
+$\raw(t)$ into $\filt(t)$ at the tympanal membrane, which attenuates
+frequencies outside the relevant range of grasshopper songs. The SNR is then
+further improved by the rectification and lowpass filtering of $\filt(t)$ into
+$\env(t)$. This improvement depends on the cutoff frequency $\fc$ of the
+lowpass filter --- the lower $\fc$, the higher the SNR of $\env(t)$ at a given
+$\sca$. However, $\fc$ must not be too low to avoid the attenuation of relevant
+amplitude dynamics of the song pattern. The saturation level of $\adapt(t)$,
+unlike its saturation point, is independent of the SNR of $\env(t)$ because the
+influence of $\noc(t)$ is negligible for sufficiently large $\sca$. Both the
+saturation level and the saturation point of $\adapt(t)$ vary between different
+species and specific songs. These differences are likely rooted in the way in
+which logarithmic compression acts on the specific distribution of $\env(t)$,
+which is determined by $\fc$ and the structure and frequency spectrum of the
+rectified $\filt(t)$.
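+
+The equalizing effect in the noiseless case can be sketched as follows,
+assuming that the noiseless envelope can be written as $\env(t)=\sca\cdot u(t)$
+with a constant $\sca>0$, where $u(t)>0$ is a hypothetical placeholder for the
+envelope at unit scale that is introduced here for illustration only:
+\[
+    \log\bigl(\sca\cdot u(t)\bigr)\,=\,\log\sca\,+\,\log u(t).
+\]
+In log-space, the scale thus appears as a constant offset $\log\sca$, which is
+removed by the highpass filter, so that $\adapt(t)$ no longer depends on
+$\sca$. The base and any constant prefactor of the logarithm only rescale both
+terms and are left unspecified in this sketch. With $\noc(t)$ present, the
+envelope presumably no longer factorizes in this way for small $\sca$, which
+is consistent with the limitation described above.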
+
+Thresholding and temporal averaging renders feature $f_i(t)$
+intensity-invariant for sufficiently large $\sca$. The trade-off between
+intensity invariance and SNR is mediated by threshold value $\thr$. A lower
+$\thr$ ($\thr\to0$) improves intensity invariance by shifting the saturation
+point towards lower $\sca$ but also decreases the SNR of $f_i(t)$. The
+saturation level of $f_i(t)$ is independent of $\thr$ as long as the intensity
+invariance by the previous mechanism is neglected. The SNR of $f_i(t)$ is
+therefore determined solely by the pure-noise response of $f_i(t)$. The
+distribution $\pci$ of the pure-noise kernel response $c_i(t)$ is largely a
+normal distribution with mean $\mu\approx0$ for all kernels $k_i(t)$. The value
+of the pure-noise $f_i(t)$ is hence 0.5 for $\thr=0$ and decreases for larger
+$\thr$. If $\thr$ is set above the maximum of $c_i(t)$, the pure-noise feature
+value is 0, which results in an "unlimited" SNR of $f_i(t)$ at the cost of a
+higher saturation point. In this case, any non-zero feature value that is
+sustained for a sufficient duration could serve as an indicator for the presence
+of $\soc(t)$ in addition to $\noc(t)$. This requires a fine evolutionary tuning
+of $\thr$ to the properties of both the species-specific song and the natural
+noise in a certain habitat.
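+
+The dependence of the pure-noise feature value on $\thr$ can be illustrated
+under the simplifying assumption that the pure-noise $\pci$ is exactly normal
+with zero mean and standard deviation $\sigma_{c_i}$ (a symbol introduced here
+for illustration only). Taking the feature value as the proportion of time
+during which $c_i(t)$ exceeds $\thr$ gives
+\[
+    f_i\,\approx\,\int_{\thr}^{\infty}\pci\,\mathrm{d}c_i
+    \,=\,\frac{1}{2}\,\mathrm{erfc}\!\left(\frac{\thr}{\sqrt{2}\,\sigma_{c_i}}\right),
+\]
+which yields 0.5 for $\thr=0$ and decreases monotonically for larger $\thr$.
+In this idealized form, the value approaches 0 only asymptotically; it is
+exactly 0 only for a finite realization of $c_i(t)$ whose maximum lies below
+$\thr$.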
+
+
+It seems reasonable to assume that $\thr$ is one of the parameters along the
+pathway
+
+Physiologically, it is presumably easier to
+manipulate $\thr$ 
+
+
+It seems reasonable that $\thr$ is easier to
+manipulate in ev
+
+
+Furthermore, $\thr$ is presumably a parameter along
+the pathway that 
+
+
+$\thr$
+
+
+Furthermore, $\thr$ might be one of the parameters
+along the pathway 
+
+
+
+% However, the parameters that determine the SNR of $\adapt(t)$ are much less
+% understood and likely relate to properties of the signal, whereas the SNR of
+% $f(t)$ depends on the choice of $\Theta$ and can be more directly manipulated
+% by the system.
+
+\newpage
+\textbf{Thresh-LP: Implication for intensity invariance:}\\
+
+- Suggests a relatively simple rule for optimal choice of threshold value $\thr$:\\
+$\rightarrow$ Find amplitude $c_i$ that maximizes absolute derivative of $c_i(t)$ over time\\
+$\rightarrow$ Optimal with respect to intensity invariance of $f_i(t)$, not necessarily for
+other criteria such as song-noise separation or diversity between features
+
+- Nonlinear operations can be used to detach representations from graded physical
+stimulus (to facilitate categorical behavioral decision-making?):\\
+1) Capture sufficiently precise amplitude information: $\env(t)$, $\adapt(t)$\\
+$\rightarrow$ Closely following the AM of the acoustic stimulus\\
+2) Quantify relevant stimulus properties on a graded scale: $c_i(t)$\\
+$\rightarrow$ More decorrelated representation, compared to prior stages\\
+3) Nonlinearity: Distinguish between "relevant vs irrelevant" values: $b_i(t)$\\
+$\rightarrow$ Trading a graded scale for two or more categorical states\\
+4) Represent stimulus properties under relevance constraint: $f_i(t)$\\
+$\rightarrow$ Graded again but highly decorrelated from the acoustic stimulus\\
+5) Categorical behavioral decision-making requires further nonlinearities\\
+$\rightarrow$ Parameters of a behavioral response may be graded (e.g. approach speed),
+initiation of one behavior over another is categorical (e.g. approach/stay)
+
+\subsection{Intensity invariance versus intensity invariance}
+
+\subsection{Implications for behavior in a natural acoustic environment}

 % RIPPED FROM INTRODUCTION:

@@ -1645,63 +1849,6 @@ operate on unmodified recordings of natural grasshopper songs instead of
 condensed pulse train approximations, which widens its scope towards more
 realistic, ecologically relevant scenarios.

-\textbf{Excursion into time-warp invariance:}
-For instance, the temporal structure of grasshopper songs warps with
-temperature~(\bcite{skovmand1983song}). The auditory system can compensate for
-this variability by reading out relative temporal relationships rather than
-absolute time intervals~(\bcite{creutzig2009timescale};
-\bcite{creutzig2010timescale}), as those remain relatively constant across
-different temperatures~(\bcite{helversen1972gesang}).
-
-\textbf{Definition of invariance (general, systemic):}\\
-Invariance = Property of a system to maintain a stable output with respect to a
-set of relevant input parameters (variation to be represented) but irrespective
-of one or more other parameters (variation to be discarded)
-$\rightarrow$ Selective input-output decorrelation
-
-\textbf{Definition of intensity invariance (context of neurons and songs):}\\
-Intensity invariance = Time scale-selective sensitivity to certain faster
-amplitude dynamics (song waveform, small-scale AM) and simultaneous
-insensitivity to slower, more sustained amplitude dynamics (transient baseline,
-large-scale AM, current overall intensity level)\\
-$\rightarrow$ Without time scale selectivity, any fully intensity-invariant
-output will be a flat line
-
-\textbf{Log-HP: Implication for intensity invariance:}\\
- Logarithmic scaling is essential for equalizing different song intensities\\
-$\rightarrow$ Intensity information can be manipulated more easily when in form
-of a signal offset in log-space than a multiplicative scale in linear space
-
- Capability to compensate for intensity variations, i.e. selective amplification
-of output $\adapt(t)$ relative to input $\env(t)$, is limited by input SNR (Eq.\,\ref{eq:toy_snr}):\\
-$\rightarrow$ Ability to equalize between different sufficiently large scales of $s(t)$\\
-$\rightarrow$ Inability to recover $s(t)$ when initially masked by noise floor $\eta(t)$
-
- Logarithmic scaling emphasizes small amplitudes (song onsets, noise floor) \\
-$\rightarrow$ Recurring trade-off: Equalizing signal intensity vs preserving initial SNR
-
-\textbf{Thresh-LP: Implication for intensity invariance:}\\
- Role of song periodicity for feature representation!
-
- Suggests a relatively simple rule for optimal choice of threshold value $\thr$:\\
-$\rightarrow$ Find amplitude $c_i$ that maximizes absolute derivative of $c_i(t)$ over time\\
-$\rightarrow$ Optimal with respect to intensity invariance of $f_i(t)$, not necessarily for
-other criteria such as song-noise separation or diversity between features
-
- Nonlinear operations can be used to detach representations from graded physical
-stimulus (to fasciliate categorical behavioral decision-making?):\\
-1) Capture sufficiently precise amplitude information: $\env(t)$, $\adapt(t)$\\
-$\rightarrow$ Closely following the AM of the acoustic stimulus\\
-2) Quantify relevant stimulus properties on a graded scale: $c_i(t)$\\
-$\rightarrow$ More decorrelated representation, compared to prior stages\\
-3) Nonlinearity: Distinguish between "relevant vs irrelevant" values: $b_i(t)$\\
-$\rightarrow$ Trading a graded scale for two or more categorical states\\
-4) Represent stimulus properties under relevance constraint: $f_i(t)$\\
-$\rightarrow$ Graded again but highly decorrelated from the acoustic stimulus\\
-5) Categorical behavioral decision-making requires further nonlinearities\\
-$\rightarrow$ Parameters of a behavioral response may be graded (e.g. approach speed),
-initiation of one behavior over another is categorical (e.g. approach/stay)
-
 \newpage
 \section{Appendix}

@@ -1716,7 +1863,7 @@ initiation of one behavior over another is categorical (e.g. approach/stay)
                     $\noc(t)$ within the signal envelope $\env(t)$ over scale
                     $\sca$. Based on input $\raw(t)$ with $\sigma_{\eta}=1$
                     (corresponding to the analysis underlying
-                     Fig.\,\ref{fig:rect-lp}), using random 100 realizations of
+                     Fig.\,\ref{fig:rect-lp}), using 100 random realizations of
                     $\noc(t)$.}
    \label{fig:app_env-sd}
 \end{figure}% Referenced.