diff --git a/main.pdf b/main.pdf index 493383d..81040ed 100644 Binary files a/main.pdf and b/main.pdf differ diff --git a/main.tex b/main.tex index bf989ef..b74fd6c 100644 --- a/main.tex +++ b/main.tex @@ -1595,7 +1595,7 @@ does not change substantially within $\tstat$. % Constraints on the song structure: % Also: Constant model features vs. actual grasshopper (calling) songs: -% (Also: Third revision and this section still doesn't sound good) +% (Also: Third revision and still far from done and good) Grasshoppers sing by pulling the stridulatory file on the hindlegs across a resonating vein on the forewings~(\bcite{helversen1977stridulatory}; \bcite{stumpner1994song}; \bcite{helversen1997recognition}). Different @@ -1660,6 +1660,7 @@ as soon as $f_i(t)$ is within tolerance or wait for $f_i(t)$ to stabilize for additional certainty. \subsection{Invariant processing in the grasshopper auditory system} +\label{sec:general_inv} % Invariance in the general (systemic) sense: The notion of invariance is fundamental for sensory processing systems. @@ -1710,45 +1711,64 @@ time scale-selectivity is reflected by the cutoff frequency $\fc$ of the highpass filter that underlies the adaptation of $\adapt(t)$: Most $\fc$ except the lowest ones are effective in removing the local offset of $\db(t)$ and render $\adapt(t)$ intensity-invariant, but only sufficiently low $\fc$ -preserve the relevant amplitude dynamics of the song pattern. Intensity -invariance by thresholding and temporal averaging also has a relevant time -scale, which is determined by the averaging interval $\tlp$. However, this time -scale is not constrained by the need to preserve the temporal structure of the -song pattern but to provide a suitable degree of temporal integration across -the song pattern~(Section\,\ref{sec:constant_feat}). +preserve the relevant amplitude dynamics of the song pattern. 
The time scale of +intensity invariance by thresholding and temporal averaging is determined by +the averaging interval $\tlp$. However, unlike $\fc$, $\tlp$ is not constrained +by the need to preserve the song pattern but rather to provide a suitable +degree of temporal integration~(Section\,\ref{sec:constant_feat}). -\subsection{Intensity invariance versus SNR} +\subsection{Intensity invariance versus SNR along the model pathway} -Each processing step along the model pathway is a transformation between input -representation and output representation. The intensity of the input is -characterized by scale $\sca$. The intensity of the output is characterized by -an appropriate intensity measure. If the transformation renders the output more -intensity-invariant, then the intensity measure will saturate for sufficiently -large $\sca$, which caps the output SNR to a constant value across these -$\sca$. Otherwise, the intensity measure and hence the output SNR will increase -monotonically with $\sca$. The trade-off between intensity invariance and SNR -refers to the principle that a transformation can either improve intensity -invariance or maintain SNR --- it cannot do both at the same time. This -principle is presumably not specific to the two mechanisms along the model -pathway but rather a general property of transformations that equalize between -different input intensities. +% % Establishing the principle trade-off (should maybe come later?): +% The output of a transformation is considered to be intensity-invariant if its +% intensity measure saturates for sufficiently large scales $\sca$, which in turn +% caps the output SNR to a constant value across these $\sca$. Otherwise, the +% output SNR will increase monotonically with $\sca$. The trade-off between +% intensity invariance and SNR refers to the principle that a transformation can +% either improve intensity invariance or maintain SNR --- it cannot do both at +% the same time. 
This principle is most likely not specific to the two mechanisms +% along the model pathway but rather a general property of transformations that +% equalize between different input intensities. -Logarithmic compression and adaptation by highpass filtering is capable of -equalizing a wide range of $\sca$. In the absence of noise component $\noc(t)$, -output $\adapt(t)$ is a perfectly intensity-invariant representation of song -component $\soc(t)$ across all $\sca>0$. However, the presence of $\noc(t)$ -limits the effectiveness of this mechanism to sufficiently large $\sca$. This -means that intensity invariance and SNR interact at the input level, as well. -Specifically, the saturation point of $\adapt(t)$ is determined by the input -SNR of $\env(t)$, which in turn depends on the initial SNR of the sound signal -$\raw(t)$. This initial SNR is presumably improved by the bandpass filtering of -$\raw(t)$ into $\filt(t)$ at the tympanal membrane, which attenuates -frequencies outside the relevant range of grasshopper songs. The SNR is then -further improved by the rectification and lowpass filtering of $\filt(t)$ into -$\env(t)$. This improvement depends on the cutoff frequency $\fc$ of the -lowpass filter --- the lower $\fc$, the higher the SNR of $\env(t)$ at a given -$\sca$. However, $\fc$ must not be too low to avoid the attenuation of relevant -amplitude dynamics of the song pattern. The saturation level of $\adapt$, +% Building a sufficient SNR "buffer": +A stridulating grasshopper generates a song with a specific initial intensity, +which is steadily attenuated as the song propagates through the +environment~(\bcite{michelsen1978sound}). A listening grasshopper receives a +sound signal $\raw(t)$, which is a mixture of the song component $\soc(t)$ with +scale $\sca$ and the environmental noise component $\noc(t)$. The greater the +distance between sender and receiver, the smaller $\sca$ and hence the lower +the SNR of $\raw(t)$ at the position of the receiver. 
The tympanal bandpass +filtering of $\raw(t)$ into $\filt(t)$ likely improves the SNR by attenuating +frequencies outside the relevant range of grasshopper songs. The SNR is further +improved by the rectification and lowpass filtering of $\filt(t)$ into +$\env(t)$. The lower the cutoff frequency $\fc$ of the lowpass filter, the +higher the SNR of $\env(t)$ at a given $\sca$, although $\fc$ must also be +sufficiently high to preserve the amplitude dynamics of the song pattern. +Overall, the first processing steps along the pathway are not designed to +achieve intensity invariance but rather to improve the SNR of the song +representation beyond the initial SNR of $\raw(t)$. + +The first mechanism of intensity invariance consists of logarithmic compression +and adaptation of $\env(t)$ into $\adapt(t)$. In the absence of $\noc(t)$, +$\adapt(t)$ is a perfectly intensity-invariant representation of $\soc(t)$. In +the presence of $\noc(t)$, $\adapt(t)$ is intensity-invariant only for a +sufficiently high SNR of $\env(t)$. The preceding SNR improvements from +$\raw(t)$ to $\env(t)$ thus serve to improve the intensity invariance of +$\adapt(t)$ by shifting the saturation point towards lower $\sca$. However, +this effect is limited --- if the SNR of $\raw(t)$ at the receiver's position +does not allow for a sufficiently high SNR of $\env(t)$, $\adapt(t)$ will not +be intensity-invariant. The initial song intensity that the sender can achieve +therefore determines the distance at which $\adapt(t)$ is intensity-invariant +to the receiver. + +Assuming that intensity invariance of $\adapt(t)$ is required for reliable song +recognition, + + +This might be a reason why robustness to noise masking is an +attractive property of male calling songs~(\bcite{einhaupl2011attractiveness}). + +The saturation level of $\adapt$, unlike its saturation point, is independent of the SNR of $\env(t)$ because the influence of $\noc(t)$ is negligible for sufficiently large $\sca$. 
The output SNR of $\adapt(t)$ saturates at a comparably low value of around 10. This might @@ -1798,35 +1818,34 @@ the saturation level of $f_i(t)$ will be determined by the second mechanism. The saturation points of $f_i(t)$ across the set are distributed over a much wider range than those of the preceeding kernel responses $c_i(t)$, which suggests that the interaction between the two mechanisms is specific to -individual kernels $k_i(t)$. A number of $f_i(t)$ achieve a lower saturation -point than the respective $c_i(t)$, while some $f_i(t)$ exhibit similar or only -marginally lower saturation points. This raises the question whether two -consecutive mechanisms of intensity invariance are actually beneficial for the -overall system. +individual kernels. A number of $f_i(t)$ achieve a lower saturation point than +the respective $c_i(t)$, whereas some $f_i(t)$ exhibit similar or only +marginally lower saturation points. In these cases, the question arises to what +extent two consecutive mechanisms of intensity invariance are actually +beneficial for the overall system. -Various grasshopper species, especially those with longer songs like \textit{C. -mollis}, \textit{G. rufus}, or \textit{O. rufipes}, tend to stridulate softly -at first and then continuously increase the amplitude of their song over time. -This slow "ramping" amplitude modulation makes the overall song less periodic -despite its temporal regularity. The "ramping" appears more pronounced in -$\env(t)$ compared to $\adapt(t)$, which suggests that the logarithmic -compression and adaptation during the preprocessing stage might be at least -partially beneficial for mitigating the effect of this amplitude modulation on -later representations. However, the adaptation of $\adapt(t)$ can only act on -certain time scales --- depending on the cutoff frequency of the underlying -highpass filter --- and is hence not able to compensate for "ramping" across -the entire duration of a song. 
+From a computational perspective, the answer could be that logarithmic +compression and adaptation is a necessary preprocessing step towards robust +$f_i(t)$ because it works towards a more consistent distribution $\pci$ of +$c_i(t)$. If $\pci$ is consistent between different songs of the same species, +a static threshold value $\thr$ is sufficient to generate a consistent +species-specific feature representation. If $\pci$ is consistent over the +course of a song, $f_i(t)$ is constant throughout the song, which extends the +time window for reliable recognition~(Section\,\ref{sec:constant_feat}). -From a purely functional perspective, the answer could be that logarithmic -compression and adaptation is a necessary preprocessing step towards a robust -feature representation, even if thresholding and temporal averaging alone would -be sufficient to render $f_i(t)$ intensity-invariant. This preprocessing likely -improves the temporal regularity of the song pattern in $\adapt(t)$ and -$c_i(t)$, which is required for constant $f_i(t)$ across the duration of a -song~(Section\,\ref{sec:constant_feat}). It also ensures consistency between -the distribution $\pci$ of $c_i(t)$ across songs of different intensity, which -is essential for the generation of consistent species-specific $f_i(t)$ under a -static $\thr$. From a physiological perspective, the answer is likely that + +First, the preprocessing results in a more consistent +distribution $\pci$ of $c_i(t)$ between songs of different intensity and in +turn allows for the generation of consistent $f_i(t)$ under a static threshold +value $\thr$. Second, this preprocessing improves the temporal regularity of +the song pattern by mitigating the slow "ramping" amplitude modulation that is +common to many grasshopper songs. 
+ +This preprocessing likely improves the temporal regularity of the song pattern +in $\adapt(t)$ and $c_i(t)$, which is required for constant $f_i(t)$ across the +duration of a song~(Section\,\ref{sec:constant_feat}). + +From a physiological perspective, the answer is likely that neurons possess only a limited firing rate for encoding stimulus intensities that can range over several orders of magnitude. Sigmoidal tuning curves over logarithmically compressed stimulus intensities are a common property of