Writing discussion.

2026-05-28 18:17:59 +02:00
parent 6cd56b82b0
commit 1878fb5eaf
2 changed files with 289 additions and 142 deletions
--- a/main.pdf
+++ b/main.pdf
--- a/main.tex
+++ b/main.tex
@@ -104,6 +104,7 @@
 \newcommand{\nsig}{\sigma_{\eta}} % Noise component standard deviation
 \newcommand{\pc}{p(c,\,T)} % Probability density (general interval)
 \newcommand{\pclp}{p(c,\,\tlp)} % Probability density (lowpass interval)
 \newcommand{\pci}{p(c_i,\,\tlp)} % Kernel-specific probability density (lowpass interval)
 \newcommand{\muf}{\mu_{f_i}} % Average feature value
 \section{Introduction}
@@ -258,12 +259,13 @@ initiation~(\bcite{ronacher1986routes}; \bcite{bauer1987separate};
 \bcite{bhavsar2017brain}).
 Functionally, the ascending neurons are the most diverse of the three neuronal
-populations. Individual ascending neurons possess highly specific response
+populations. Around 15 to 20 ascending neurons have been identified in the
-properties that contrast with the rather homogeneous response properties of the
+grasshopper auditory system~(\bcite{stumpner1991auditory}). Individual
-preceding receptor neurons and local
+ascending neurons possess highly specific response properties that contrast
-interneurons~(\bcite{clemens2011efficient}), which indicates a transition from
+with the rather homogeneous response properties of the preceding receptor
-a uniform population-wide processing stream into several parallel branches.
+neurons and local interneurons~(\bcite{clemens2011efficient}), which indicates
-Accordingly, the model pathway is divided into two distinct
+a transition from a uniform population-wide processing stream into several
 parallel branches. Accordingly, the model pathway is divided into two distinct
 stages~(Fig.\,\ref{fig:pathway}d): The preprocessing stage incorporates the
 processing steps at the levels of the tympanal membrane, the receptor neurons,
 and the local interneurons; and operates on one-dimensional signal
@@ -754,16 +756,15 @@ This effect is more pronounced for lower $\fc$ of the lowpass filter and is
 presumably caused by the attenuation of high-frequency components in the
 signal, which are more prominent in the noise component $\noc(t)$ than in the
 song component $\soc(t)$. The effect also appears relatively consistent across
-different species, although small variations exist~(Fig.\,\ref{fig:rect-lp}e)
+different species, although small variations exist~(Fig.\,\ref{fig:rect-lp}e
-that are presumably based on different song structures and frequency spectra.
+and appendix Fig.\,\ref{fig:app_rect-lp}). In summary, the standard deviation
-In summary, the standard deviation of $\env(t)$ has never been observed to
+of $\env(t)$ has never been observed to saturate for larger $\sca$ but rather
-transition into a saturation regime for larger $\sca$ but rather continues to
+continues to increase proportionally to $\sca$ for all tested $\fc$, in both
-increase proportionally to $\sca$ for all tested $\fc$, in both the noiseless
+the noiseless and the noisy case and across different species. Consequently,
-and the noisy case and across different species. Consequently, the combination
+the combination of rectification and lowpass filtering does not contribute to
-of rectification and lowpass filtering does not contribute to intensity
+intensity invariance. However, this transformation pair does improve the SNR of
-invariance. However, this transformation pair does improve the SNR of $\env(t)$
+$\env(t)$ relative to $\filt(t)$ and thus provides subsequent processing stages
-relative to $\filt(t)$ and thus provides subsequent processing stages with a
+with a more robust input representation and higher input SNR.
 more robust input representation and higher input SNR.
 \begin{figure}[!ht]
    \centering
@@ -883,24 +884,23 @@ $\noc(t)$ masks $\soc(t)$ even after the intensity adaptation. Accordingly, the
 effective intensity invariance of $\adapt(t)$ through logarithmic compression
 and adaptation is limited by the SNR of $\env(t)$: Songs that have already
 sunk into the noise floor at the level of $\env(t)$ cannot be recovered by
-subsequent processing steps, which emphasizes the importance of the SNR
+subsequent processing steps. The general pattern of noise regime, transient
-improvement by rectification and lowpass filtering during the previous
+regime, and saturation regime remains consistent across different
-processing step~(Fig.\,\ref{fig:rect-lp}d). The general pattern of noise
+species~(Fig.\,\ref{fig:log-hp}e). However, the saturation point --- the $\sca$
-regime, transient regime, and saturation regime remains consistent across
+value at which the SNR of $\adapt(t)$ starts to saturate --- and the saturation
-different species~(Fig.\,\ref{fig:log-hp}e). However, the specific value of
+level --- the constant SNR of $\adapt(t)$ within the saturation regime --- vary
-$\sca$ at which the saturation regime is reached (see appendix
+considerably between and within species~(appendix
-Fig.\,\ref{fig:app_log-hp_saturation}) and the maximum SNR value of $\adapt(t)$
+Figs.\,\ref{fig:app_log-hp_curves}+\ref{fig:app_log-hp_saturation}). For
 within the saturation regime vary considerably between and within species. For
 example, \textit{C. biguttulus} and \textit{C. mollis} display a noticeably
-lower maximum SNR of $\adapt(t)$ compared to other species. These differences
+lower saturation level compared to other species. These differences are not to
-are not to be underestimated, since the SNR of $\adapt(t)$ within the
+be underestimated, since the saturation level of $\adapt(t)$ determines the
-saturation regime determines the maximum input SNR for subsequent processing
+maximum input SNR for subsequent processing steps. In other words, the fact
-steps. In other words, the fact that $\adapt(t)$ eventually reaches a
+that $\adapt(t)$ eventually reaches a saturation regime is, of course,
-saturation regime is, of course, desirable in the context of intensity
+desirable in the context of intensity invariance, but it also means forgoing
-invariance, but it also means to pass up on the higher SNR values that are
+the higher SNR values that are achieved by $\env(t)$ for the same $\sca$ (up
-achieved by $\env(t)$ for the same $\sca$ (up to several orders of magnitude,
+to several orders of magnitude, Fig.\,\ref{fig:log-hp}d). This trade-off
-Fig.\,\ref{fig:log-hp}d). This trade-off between intensity invariance and SNR
+between intensity invariance and SNR is a recurring phenomenon that is further
-is a recurring phenomenon that is further addressed in the following sections.
+addressed in the following sections.
 \begin{figure}[!ht]
    \centering
@@ -1000,24 +1000,17 @@ sufficiently large $\sca$, $f(t)$ then approaches the same constant $\mu_f$ in
 both the noiseless and the noisy case~(Fig.\,\ref{fig:thresh-lp_single}e,
 saturation regime).
-The value of $\mu_f$ in the saturation regime is independent of the precise
+The saturation level of $f(t)$ is independent of the precise value of $\Theta$,
-value of $\Theta$, but the value of $\sca$ at which the saturation regime is
+but the saturation point decreases with
-reached decreses with $\Theta$~(Fig.\,\ref{fig:thresh-lp_single}e). Therefore,
+$\Theta$~(Fig.\,\ref{fig:thresh-lp_single}e). Therefore, a threshold value of
-a threshold value of $\Theta=0$ would be the optimal choice for achieving
+$\Theta=0$ would be the optimal choice for achieving intensity invariance at
-intensity invariance at the lowest possible $\sca$. In stark contrast, the
+the lowest possible $\sca$. In stark contrast, the closer $\Theta$ is to 0, the
-closer $\Theta$ is to 0, the higher $\mu_f$ in response to the pure noise
+higher $\mu_f$ in response to the pure noise component $\noc(t)$ and the lower
-component $\noc(t)$ and the lower the resulting SNR of $f(t)$ between noise
+the resulting SNR of $f(t)$ between noise regime and saturation
-regime and saturation regime~(Fig.\,\ref{fig:thresh-lp_single}b-d, left column,
+regime~(Fig.\,\ref{fig:thresh-lp_single}b-d, left column, and
-and Fig.\,\ref{fig:thresh-lp_single}e). It is even possible to achieve an
+Fig.\,\ref{fig:thresh-lp_single}e). This trade-off between intensity invariance
-"unlimited" SNR of $f(t)$ by setting $\Theta$ above the maximum of the
+and SNR has already been observed during the previous analysis on logarithmic
-pure-noise $c(t)$, so that any $\mu_f>0$ indicates the presence of the song
+compression and adaptation~(Fig.\,\ref{fig:log-hp}d).
 component $\soc(t)$ in input $\adapt(t)$ at the cost of requiring a higher
 $\sca$ to reach the saturation regime. This trade-off between intensity
 invariance and SNR has already been observed during the previous analysis on
 logarithmic compression and adaptation~(Fig.\,\ref{fig:log-hp}d). However, the
 parameters that determine the SNR of $\adapt(t)$ are much less understood and
 likely relate to properties of the signal, whereas the SNR of $f(t)$ depends on
 the choice of $\Theta$ and can be more directly manipulated by the system.
 Finally, the effects of thresholding and temporal averaging must be seen in the
 context of the previous transformation pair of logarithmic compression and
@@ -1102,11 +1095,11 @@ that the songs of each species are eventually represented by distinct points in
 feature space. However, the species-specific trajectories cross each other at
 numerous points, which means that the songs of two species --- each at a
 specific $\sca$ --- can result in the same combination of $\muf$. Furthermore,
-the specific value of $\sca$ at which $\muf$ saturates depends on $f_i(t)$ and
+the specific saturation point of $f_i(t)$ depends on the species: For
-the species: For \textit{C. mollis}, all $\muf$ saturate around the same
+\textit{C. mollis}, all $\muf$ saturate around the same $\sca$, while
-$\sca$, while \textit{O. rufipes} exhibits considerable variation between the
+\textit{O. rufipes} exhibits considerable variation between the three $f_i(t)$.
-three $f_i(t)$. The larger the variation in saturation points between $f_i(t)$,
+The larger the variation in saturation points between $f_i(t)$, the stronger
-the stronger the curvature of the trajectory through feature space.
+the curvature of the trajectory through feature space.
 In the noisy case, $\muf$ is non-zero even for the smallest
 $\sca$~(Fig.\,\ref{fig:thresh-lp_species}c) because the addition of the noise
@@ -1121,9 +1114,9 @@ previous analysis~(Fig.\,\ref{fig:thresh-lp_single}e). However, the
 trajectories now move a much shorter distance through feature space for a
 similar range of $\sca$ due to the lower SNR of $f_i(t)$ between noise regime
 and saturation regime, which increases the likelihood of trajectories crossing
-each other. Finally, the values of $\sca$ at which $\muf$ saturate for a given
+each other. Finally, the saturation points of $f_i(t)$ for a given species are
-species are slightly higher in the noisy case, but the variation between
+slightly higher in the noisy case, but the variation between $f_i(t)$ remains
-$f_i(t)$ remains largely unchanged.
+largely unchanged.
 In summary, even a comparably small set of three features $f_i(t)$ can, in
 principle, represent different species-specific songs at distinct points in
@@ -1238,15 +1231,10 @@ broader and is not centered around the single saturation point based on the
 median but rather shifted towards lower $\sca$. Care must be taken when
 interpreting the height of either distribution due to the logarithmic scaling
 of the underlying $\sca$ axis. Nevertheless, the overall pattern suggests that
-specific $f_i(t)$ can reach a saturation regime at lower $\sca$ than their
+the saturation points of specific $f_i(t)$ are indeed lower than those of their
 $c_i(t)$ counterparts. Therefore, the effect of thresholding and temporal
 averaging on intensity invariance is not necessarily nullified by the previous
-logarithmic compression and adaptation, which means that both mechanisms can,
+logarithmic compression and adaptation.
 in principle, work together towards an intensity-invariant song representation.
 % Or does one simply overwrite the other? Can there even be a higher intensity
 % invariance based on the sum of both effects? Or does one simply kick in for
 % lower scales than the other and thus dictates the overall intensity
 % invariance? Whatever, discussion material.
 \begin{figure}[!ht]
    \centering
@@ -1313,7 +1301,7 @@ representation goes hand in hand with a substantial degree of redundancy and is
 hardly expected to be present in the actual grasshopper auditory system. But
 the fact that the saturated $\muf$ are distributed symmetrically around 0.5
 provides concrete evidence that each $f_i(t)$ is able to reach its intrinsic
-saturation value in the absence of logarithmic
+saturation level in the absence of logarithmic
 compression~(Fig.\,\ref{fig:pipeline_short}c), which is otherwise prevented by
 the capping of $\adapt(t)$, as seen during previous
 analyses~(Fig.\,\ref{fig:thresh-lp_single}f and
@@ -1327,8 +1315,8 @@ that it allows $f_i(t)$ to reach its intrinsic saturation value. If this
 results in a wider range of $\muf$ across the feature set, it should be
 beneficial for forming species-specific combinations. However, this depends on
 multiple different factors such as the choice of $k_i(t)$ and $\thr$ as well as
-the structure and distribution of the specific song and is hence not
+the structure and distribution of the specific song and is hence not guaranteed
-guaranteed simply by disabling logarithmic compression.
+simply by disabling logarithmic compression.
 \begin{figure}[!ht]
    \centering
@@ -1560,25 +1548,241 @@ functional modelling. Other sensory systems that are either more complex or
 have not been subject to decades of study will likely not be suitable for this
 approach yet.
-% \textbf{Song recognition pathway: Grasshopper vs. model:}\\
+\subsection{Feature representation, temporal averaging, and song design}
 % The model pathway includes a rather large number of Gabor kernels compared to
 % the 15 to 20 ascending neurons in the grasshopper auditory
 % system~(\bcite{stumpner1991auditory}). 
-\subsection{Interplay of song representation and song design}
+The feature set is the final song representation along the model pathway and
 constitutes the basis for song recognition. Each feature $f_i(t)$ results from
 the thresholding of the respective kernel response $c_i(t)$ by $\nl$ and the
 subsequent temporal averaging of binary response $b_i(t)$ by a lowpass filter
 with extremely low cutoff frequency $\fc$. At a given time point $t$, $f_i(t)$
 approximately quantifies the proportion of time during which $c_i(t)$ exceeds
 the threshold value $\thr$ within the averaging interval $\tlp$ specified by
 $\fc$. The value of $f_i(t)$ is hence determined by $\thr$ with respect to the
 distribution $\pci$ of $c_i(t)$ and is restricted to the interval $[0,1]$.
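 As a rough sketch of this relationship, and assuming an idealized rectangular
 averaging window of length $\tlp$ in place of the actual lowpass filter, the
 feature value can be written as
 \begin{equation}
     f_i(t)\,\approx\,\frac{1}{\tlp}\int_{t-\tlp}^{t} b_i(\tau)\,d\tau
     \,=\,\int_{\thr}^{\infty}\pci\,dc_i
 \end{equation}
 where $b_i(\tau)=1$ if $c_i(\tau)>\thr$ and $b_i(\tau)=0$ otherwise. The
 integration variable $\tau$ and the rectangular window are illustrative
 assumptions and not part of the model definition. Both integrals express the
 proportion of time above threshold and are therefore bounded by 0 and 1.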
-\textbf{The role of repetitive songs for the feature representation:}
+Different species-specific songs are represented by different combinations of
-Most grasshopper songs are produced by stridulation, which refers to the
+feature values, which should preferably be constant for the duration of a song
-pulling of the serrated stridulatory file on the hindlegs across a resonating
+to enable reliable recognition. The fundamental requirement for a constant
-vein on the forewings~(\bcite{helversen1977stridulatory};
+$f_i(t)$ is that the time during which $c_i(t)>\thr$ within $\tlp$ is the same for all
-\bcite{stumpner1994song}; \bcite{helversen1997recognition}). Every "tooth" that
+$t$, which is fulfilled if $\pci$ is stable across $t$. The most
-strikes the vein generates a brief sound pulse; multiple pulses make up a
+straightforward way to achieve a stable $\pci$ is that $c_i(t)$ is periodic and
-syllable; and the repetition of syllables and pauses results in a
+$\tlp$ is sufficiently long to average over multiple cycles of $c_i(t)$.
-characteristic amplitude-modulated waveform pattern.
+Song-evoked $c_i(t)$ are indeed approximately periodic, which is largely an
 inherited property of the song itself. Most grasshopper songs are produced by
 stridulation, which refers to the pulling of the serrated stridulatory file on
 the hindlegs across a resonating vein on the
 forewings~(\bcite{helversen1977stridulatory}; \bcite{stumpner1994song};
 \bcite{helversen1997recognition}). Every "tooth" that strikes the vein
 generates a brief sound pulse; multiple pulses make up a syllable; and the
 repetition of syllables and pauses results in a pattern with a high degree of
 temporal regularity. Accordingly, a robust feature representation in the sense
 of constant $f_i(t)$ is tightly linked to the mechanism of sound production and
 the temporal structure of the generated song.
-\subsection{Intensity invariance versus SNR along the auditory pathway}
+Various grasshopper species, especially those with longer songs like \textit{C.
 mollis}, \textit{G. rufus}, or \textit{O. rufipes}, tend to stridulate softly
 at first and then continuously increase the amplitude of their song over time.
 This slow "ramping" amplitude modulation makes the overall song less periodic
 despite its temporal regularity. The "ramping" appears more pronounced in
 $\env(t)$ compared to $\adapt(t)$, which suggests that the logarithmic
 compression and adaptation during the preprocessing stage might be at least
 partially beneficial for mitigating the effect of this amplitude modulation on
 later representations. However, the adaptation of $\adapt(t)$ can only act on
 certain time scales --- depending on the cutoff frequency of the underlying
 highpass filter --- and is hence not able to compensate for "ramping" across
 the entire duration of a song.
-\subsection{Behavior in a natural acoustic environment}
+Certain grasshopper species like \textit{Chorthippus dorsatus} are known to
 switch their stridulation pattern in the middle of a
 song~(\bcite{stumpner1994song}). \textit{C. dorsatus} starts stridulating with
 both hindlegs in synchrony and thereby generates a pronounced syllable-pause
 pattern similar to that of \textit{P. parallelus}. For the last part of its
 song, however, \textit{C. dorsatus} switches to an alternating leg movement,
 which results in a more continuous but not entirely unstructured rattling
 sound. It is unclear what this composite design means for the feature
 representation of \textit{C. dorsatus} songs. In principle, both parts of the
 song could result in similar $\pci$ despite their different temporal structure,
 which would allow for consistent $f_i(t)$ across the entire song. However, it
 appears more likely that only one part of the song encodes species identity,
 while the other part serves a different purpose such as fitness
 advertisement~(SOURCE?).
 Finally, the question remains how the choice of an appropriate averaging
 interval $\tlp$ depends on the duration and temporal structure of a song. The
 minimum $\tlp$ should encompass at least a few cycles of $c_i(t)$ to ensure a
 stable $\pci$ and hence a constant $f_i(t)$. The maximum $\tlp$ should not
 exceed the duration of a song to avoid the inclusion of behaviorally irrelevant
 information. The longer $\tlp$, the longer $f_i(t)$ takes to stabilize after
 the onset and before the offset of a song, which narrows the time window for
 reliable recognition. The duration of species-specific grasshopper songs can
 range from a few hundred milliseconds (e.\,g. \textit{Stethophyma grossum}) to
 well over a minute (e.\,g. \textit{C. mollis}), so that the optimal $\tlp$ is
 likely to differ between species.
 \subsection{Sensory invariances in the grasshopper auditory system}
 The notion of invariance is fundamental for sensory processing systems.
 Invariance, in the general sense, can be described as the property of a
 transformation to maintain variation across certain meaningful input parameters
 in its output while discarding variation across other input parameters. This
 boils down to a selective input-output decorrelation that allows the system to
 represent only those aspects of the stimulus that are behaviorally relevant to
 the organism.
 The grasshopper auditory system has to deal with a number of sources of
 non-informative song variation. For instance, the temporal structure of the
 song pattern warps with temperature~(\bcite{skovmand1983song}). This also
 affects certain structural parameters that are essential for song recognition,
 mainly the duration of syllables and pauses. The auditory system can compensate
 for this variation by reading out relative temporal relationships rather than
 absolute time intervals~(\bcite{creutzig2009timescale};
 \bcite{creutzig2010timescale}). The ratio of syllable duration to pause
 duration is relatively constant across temperatures and has been shown to be
 suitable for song recognition~(\bcite{helversen1972gesang}), so that there is
 likely no need to retain any information about the absolute duration of
 syllables and pauses.
 The situation is more complex for variations in song intensity. Song intensity
 at the receiver's position depends mostly on the distance to the sender and is
 hence not a reliable cue to infer species identity. The auditory system should
 therefore be invariant to intensity variations to recognize conspecific songs
 regardless of sender distance. However, song intensity --- specifically, the
 interaural intensity difference --- is also required for directional hearing,
 which is essential for phonotaxis~(\bcite{helversen1988interaural}). Conflicts
 between song recognition and directional hearing are avoided in the auditory
 system by distributing both functions across two parallel
 pathways~(\bcite{helversen1984parallel}; \bcite{ronacher1986routes}). This is
 the main reason why our model pathway is focused entirely on song recognition
 and has no capacity for directional hearing, no matter how relevant it may be
 to the grasshopper.
 Furthermore, "invariance to variations in song intensity" does not do justice
 to the full extent of the problem. Intensity is a function of song amplitude
 within a certain time frame. It can refer to the individual syllables and
 pauses of the song pattern as well as the entire song --- the former is
 relevant for song recognition, while the latter is not. Intensity invariance in
 the current context can therefore be described as time scale-selective
 sensitivity to the faster amplitude dynamics of the song pattern and
 simultaneous insensitivity to slower, more sustained amplitude dynamics. In the
 model pathway, this time scale selectivity is reflected by the cutoff frequency
 $\fc$ of the highpass filter that underlies the adaptation of $\adapt(t)$: Most
 $\fc$ are effective in removing the local offset of $\db(t)$ and render
 $\adapt(t)$ intensity-invariant, but only sufficiently low $\fc$ will leave the
 relevant amplitude dynamics of the song pattern intact.
 \subsection{Intensity invariance versus SNR}
 Each processing step along the model pathway is a transformation between input
 representation and output representation. The intensity of the input is
 characterized by scale $\sca$. The intensity of the output is characterized by
 an appropriate intensity measure. If the transformation renders the output more
 intensity-invariant, then the intensity measure will saturate for sufficiently
 large $\sca$, which caps the output SNR to a constant value across these
 $\sca$. Otherwise, the intensity measure and hence the output SNR will increase
 monotonically with $\sca$. The trade-off between intensity invariance and SNR
 refers to the principle that a transformation can either improve intensity
 invariance or maintain SNR --- it cannot do both at the same time. This
 principle is presumably not specific to the two mechanisms along the model
 pathway but rather a general property of transformations that equalize between
 different input intensities.
 Logarithmic compression and adaptation by highpass filtering are capable of
 equalizing a wide range of $\sca$. In the absence of noise component $\noc(t)$,
 output $\adapt(t)$ is a perfectly intensity-invariant representation of song
 component $\soc(t)$ across all $\sca>0$. However, the presence of $\noc(t)$
 limits the effectiveness of this mechanism to sufficiently large $\sca$. This
 means that intensity invariance and SNR interact at the input level as well.
 Specifically, the saturation point of $\adapt(t)$ is determined by the input
 SNR of $\env(t)$, which in turn depends on the initial SNR of the sound signal
 $\raw(t)$. This initial SNR is presumably improved by the bandpass filtering of
 $\raw(t)$ into $\filt(t)$ at the tympanal membrane, which attenuates
 frequencies outside the relevant range of grasshopper songs. The SNR is then
 further improved by the rectification and lowpass filtering of $\filt(t)$ into
 $\env(t)$. This improvement depends on the cutoff frequency $\fc$ of the
 lowpass filter --- the lower $\fc$, the higher the SNR of $\env(t)$ at a given
 $\sca$. However, $\fc$ must not be too low to avoid the attenuation of relevant
 amplitude dynamics of the song pattern. The saturation level of $\adapt(t)$,
 unlike its saturation point, is independent of the SNR of $\env(t)$ because the
 influence of $\noc(t)$ is negligible for sufficiently large $\sca$. Both the
 saturation level and the saturation point of $\adapt(t)$ vary between different
 species and specific songs. These differences are likely rooted in the way in
 which logarithmic compression acts on the specific distribution of $\env(t)$,
 which is determined by $\fc$ and the structure and frequency spectrum of the
 rectified $\filt(t)$.
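 The equalization across $\sca$ by logarithmic compression and adaptation, as
 well as its limitation by $\noc(t)$, can be sketched with a simple
 decomposition. Let $e_s(t)>0$ and $e_{\eta}(t)>0$ denote the contributions of
 $\soc(t)$ and $\noc(t)$ to $\env(t)$ (hypothetical symbols, introduced here
 for illustration only) and assume that both contributions are approximately
 additive:
 \begin{equation}
     \log\big(\sca\,e_s(t)+e_{\eta}(t)\big)\,=\,\log\sca\,+\,\log e_s(t)\,+\,
     \log\left(1+\frac{e_{\eta}(t)}{\sca\,e_s(t)}\right)
 \end{equation}
 The constant offset $\log\sca$ is removed by the highpass filter. The last
 term vanishes for $\sca\,e_s(t)\gg e_{\eta}(t)$, so that the output depends
 only on $\log e_s(t)$. For small $\sca$, in contrast, the last term dominates
 and the sum approaches $\log e_{\eta}(t)$, irrespective of $e_s(t)$.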
 Thresholding and temporal averaging render feature $f_i(t)$
 intensity-invariant for sufficiently large $\sca$. The trade-off between
 intensity invariance and SNR is mediated by threshold value $\thr$. A lower
 $\thr$ ($\thr\to0$) improves intensity invariance by shifting the saturation
 point towards lower $\sca$ but also decreases the SNR of $f_i(t)$. The
 saturation level of $f_i(t)$ is independent of $\thr$ as long as the intensity
 invariance by the previous mechanism is neglected. The SNR of $f_i(t)$ is
 therefore determined solely by the pure-noise response of $f_i(t)$. The
 distribution $\pci$ of the pure-noise kernel response $c_i(t)$ is largely a
 normal distribution with mean $\mu\approx0$ for all kernels $k_i(t)$. The value
 of the pure-noise $f_i(t)$ is hence 0.5 for $\thr=0$ and decreases for larger
 $\thr$. If $\thr$ is set above the maximum of $c_i(t)$, the pure-noise feature
 value is 0, which results in an "unlimited" SNR of $f_i(t)$ at the cost of a
 higher saturation point. In this case, any non-zero feature value that is
 sustained for a sufficient duration could serve as an indicator for the presence
 of $\soc(t)$ in addition to $\noc(t)$. This requires a fine evolutionary tuning
 of $\thr$ to the properties of both the species-specific song and the natural
 noise in a certain habitat.
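 The dependence of the pure-noise feature value on $\thr$ can be made explicit
 under the normality assumption stated above. As a sketch, if the pure-noise
 $c_i(t)$ is exactly normally distributed with mean 0 and standard deviation
 $\sigma_{c_i}$ (a hypothetical symbol, introduced here for illustration), the
 expected feature value is
 \begin{equation}
     f_i\,=\,1-\Phi\left(\frac{\thr}{\sigma_{c_i}}\right)\,=\,
     \frac{1}{2}\,\mathrm{erfc}\left(\frac{\thr}{\sqrt{2}\,\sigma_{c_i}}\right)
 \end{equation}
 where $\Phi$ is the standard normal cumulative distribution function. This
 gives $f_i=0.5$ for $\thr=0$, $f_i\approx0.16$ for $\thr=\sigma_{c_i}$, and
 $f_i\approx0.02$ for $\thr=2\,\sigma_{c_i}$. Because a normal distribution has
 unbounded support, $f_i$ reaches exactly 0 only for a finite realization of
 $c_i(t)$ whose maximum lies below $\thr$.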
 It seems reasonable to assume that $\thr$ is one of the parameters along the
 pathway
 Physiologically, it is presumably easier to
 manipulate $\thr$ 
 It seems reasonable that $\thr$ is easier to
 manipulate in ev
 Furthermore, $\thr$ is presumably a parameter along
 the pathway that 
 $\thr$
 Furthermore, $\thr$ might be one of the parameters
 along the pathway 
 % However, the parameters that determine the SNR of $\adapt(t)$ are much less
 % understood and likely relate to properties of the signal, whereas the SNR of
 % $f(t)$ depends on the choice of $\Theta$ and can be more directly manipulated
 % by the system.
 \newpage
 \textbf{Thresh-LP: Implication for intensity invariance:}\\
 - Suggests a relatively simple rule for optimal choice of threshold value $\thr$:\\
 $\rightarrow$ Find amplitude $c_i$ that maximizes absolute derivative of $c_i(t)$ over time\\
 $\rightarrow$ Optimal with respect to intensity invariance of $f_i(t)$, not necessarily for
 other criteria such as song-noise separation or diversity between features
 - Nonlinear operations can be used to detach representations from graded physical
 stimulus (to facilitate categorical behavioral decision-making?):\\
 1) Capture sufficiently precise amplitude information: $\env(t)$, $\adapt(t)$\\
 $\rightarrow$ Closely following the AM of the acoustic stimulus\\
 2) Quantify relevant stimulus properties on a graded scale: $c_i(t)$\\
 $\rightarrow$ More decorrelated representation, compared to prior stages\\
 3) Nonlinearity: Distinguish between "relevant vs irrelevant" values: $b_i(t)$\\
 $\rightarrow$ Trading a graded scale for two or more categorical states\\
 4) Represent stimulus properties under relevance constraint: $f_i(t)$\\
 $\rightarrow$ Graded again but highly decorrelated from the acoustic stimulus\\
 5) Categorical behavioral decision-making requires further nonlinearities\\
 $\rightarrow$ Parameters of a behavioral response may be graded (e.g. approach speed),
 initiation of one behavior over another is categorical (e.g. approach/stay)
 \subsection{Intensity invariance versus intensity invariance}
 \subsection{Implications for behavior in a natural acoustic environment}
 % RIPPED FROM INTRODUCTION:
@@ -1645,63 +1849,6 @@ operate on unmodified recordings of natural grasshopper songs instead of
 condensed pulse train approximations, which widens its scope towards more
 realistic, ecologically relevant scenarios.
 \textbf{Excursion into time-warp invariance:}
 For instance, the temporal structure of grasshopper songs warps with
 temperature~(\bcite{skovmand1983song}). The auditory system can compensate for
 this variability by reading out relative temporal relationships rather than
 absolute time intervals~(\bcite{creutzig2009timescale};
 \bcite{creutzig2010timescale}), as those remain relatively constant across
 different temperatures~(\bcite{helversen1972gesang}).
 \textbf{Definition of invariance (general, systemic):}\\
 Invariance = Property of a system to maintain a stable output with respect to a
 set of relevant input parameters (variation to be represented) but irrespective
 of one or more other parameters (variation to be discarded)
 $\rightarrow$ Selective input-output decorrelation
 \textbf{Definition of intensity invariance (context of neurons and songs):}\\
 Intensity invariance = Time scale-selective sensitivity to certain faster
 amplitude dynamics (song waveform, small-scale AM) and simultaneous
 insensitivity to slower, more sustained amplitude dynamics (transient baseline,
 large-scale AM, current overall intensity level)\\
 $\rightarrow$ Without time scale selectivity, any fully intensity-invariant
 output will be a flat line
 \textbf{Log-HP: Implication for intensity invariance:}\\
 - Logarithmic scaling is essential for equalizing different song intensities\\
 $\rightarrow$ Intensity information can be manipulated more easily when in form
 of a signal offset in log-space than a multiplicative scale in linear space
 - Capability to compensate for intensity variations, i.e. selective amplification
 of output $\adapt(t)$ relative to input $\env(t)$, is limited by input SNR (Eq.\,\ref{eq:toy_snr}):\\
 $\rightarrow$ Ability to equalize between different sufficiently large scales of $s(t)$\\
 $\rightarrow$ Inability to recover $s(t)$ when initially masked by noise floor $\eta(t)$
 - Logarithmic scaling emphasizes small amplitudes (song onsets, noise floor) \\
 $\rightarrow$ Recurring trade-off: Equalizing signal intensity vs preserving initial SNR
 \textbf{Thresh-LP: Implication for intensity invariance:}\\
 - Role of song periodicity for feature representation!
 - Suggests a relatively simple rule for optimal choice of threshold value $\thr$:\\
 $\rightarrow$ Find amplitude $c_i$ that maximizes absolute derivative of $c_i(t)$ over time\\
 $\rightarrow$ Optimal with respect to intensity invariance of $f_i(t)$, not necessarily for
 other criteria such as song-noise separation or diversity between features
 - Nonlinear operations can be used to detach representations from graded physical
 stimulus (to facilitate categorical behavioral decision-making?):\\
 1) Capture sufficiently precise amplitude information: $\env(t)$, $\adapt(t)$\\
 $\rightarrow$ Closely following the AM of the acoustic stimulus\\
 2) Quantify relevant stimulus properties on a graded scale: $c_i(t)$\\
 $\rightarrow$ More decorrelated representation, compared to prior stages\\
 3) Nonlinearity: Distinguish between "relevant vs irrelevant" values: $b_i(t)$\\
 $\rightarrow$ Trading a graded scale for two or more categorical states\\
 4) Represent stimulus properties under relevance constraint: $f_i(t)$\\
 $\rightarrow$ Graded again but highly decorrelated from the acoustic stimulus\\
 5) Categorical behavioral decision-making requires further nonlinearities\\
 $\rightarrow$ Parameters of a behavioral response may be graded (e.g. approach speed),
 initiation of one behavior over another is categorical (e.g. approach/stay)
 \newpage
 \section{Appendix}
@@ -1716,7 +1863,7 @@ initiation of one behavior over another is categorical (e.g. approach/stay)
                     $\noc(t)$ within the signal envelope $\env(t)$ over scale
                     $\sca$. Based on input $\raw(t)$ with $\sigma_{\eta}=1$
                     (corresponding to the analysis underlying
-                     Fig.\,\ref{fig:rect-lp}), using random 100 realizations of
+                     Fig.\,\ref{fig:rect-lp}), using 100 random realizations of
                     $\noc(t)$.}
    \label{fig:app_env-sd}
 \end{figure}% Referenced.