Wrote results for pipeline_full, pipeline_short, and feat_cross_species.

2026-05-07 18:15:00 +02:00
parent a48457d967
commit 4b4a04ab2a
14 changed files with 548 additions and 296 deletions
--- a/main.tex
+++ b/main.tex
@@ -105,6 +105,7 @@
 \newcommand{\nvar}{\sigma_{\eta}^{2}} % Noise component variance
 \newcommand{\pc}{p(c,\,T)} % Probability density (general interval)
 \newcommand{\pclp}{p(c,\,\tlp)} % Probability density (lowpass interval)
+\newcommand{\muf}{\mu_{f_i}} % Average feature value

 \section{Exploring a grasshopper's sensory world}

@@ -312,7 +313,9 @@ within the auditory pathway.
 % - How to integrate the available knowledge on anatomy, physiology, ethology?\\
 % $\rightarrow$ Abstract, simplify, formalize $\rightarrow$ Functional model framework

-\section{Developing a functional model of the\\grasshopper song recognition pathway}
+\section{Methods}
+
+\subsection{Functional model of the grasshopper song recognition pathway}

 % Too long (no splitting, only pruning).
 The essence of constructing a functional model of a given system is to gain a
@@ -373,7 +376,7 @@ outlined in the following sections.
    \label{fig:pathway}
 \end{figure}

-\subsection{Population-driven signal preprocessing}
+\subsubsection{Population-driven signal preprocessing}

 Grasshoppers receive airborne sound waves by a tympanal organ at either side of
 the body. The tympanal membrane acts as a mechanical resonance filter for
@@ -436,7 +439,7 @@ following feature extraction stage.
 \end{figure}
 \FloatBarrier

-\subsection{Feature extraction by individual neurons}
+\subsubsection{Feature extraction by individual neurons}

 The ascending neurons extract and encode a number of different features of the
 preprocessed signal. As a population, they hence represent the signal in a
@@ -555,7 +558,11 @@ can be read out by a simple linear classifier.
 \end{figure}
 \FloatBarrier

-\section{Mechanisms driving the emergence of\\intensity-invariant song representation}
+\subsubsection{Simulation-based analysis of the model pathway}
+
+\section{Results}
+
+\subsection{Mechanisms driving the emergence of intensity invariance}

 % Still missing the SNR analysis. Should be able to write around it for now.
 The robustness of song recognition is tied to the degree of intensity
@@ -571,7 +578,7 @@ intensity variations. The two mechanisms each comprise a nonlinear signal
 transformation followed by a linear signal transformation but differ in the
 specific operations involved, as outlined in the following sections.

-\subsection{Full-wave rectification \& lowpass filtering}
+\subsubsection{Full-wave rectification \& lowpass filtering}

 The first nonlinear transformation along the model pathway is the full-wave
 rectification of the tympanal signal $\filt(t)$ during the extraction of the
@@ -651,7 +658,7 @@ more robust input representation and higher input SNR.
 \end{figure}
 \FloatBarrier

-\subsection{Logarithmic compression \& spike-frequency adaptation}
+\subsubsection{Logarithmic compression \& spike-frequency adaptation}

 The second nonlinear transformation along the model pathway is the logarithmic
 compression of the signal envelope $\env(t)$ into $\db(t)$, Eq.\,\ref{eq:log},
@@ -794,7 +801,7 @@ is a recurring phenomenon that is further addressed in the following sections.
 \end{figure}
 \FloatBarrier

-\subsection{Thresholding nonlinearity \& temporal averaging}
+\subsubsection{Thresholding nonlinearity \& temporal averaging}

 The third nonlinear transformation along the model pathway is the thresholding
 nonlinearity $\nl$ that transforms each kernel response $c_i(t)$ into a binary
@@ -809,13 +816,13 @@ rescaled~(Fig.\,\ref{fig:thresh-lp_single}a) and convolved with kernel $k(t)$.
 The resulting kernel response $c(t)$ was passed through $H(c\,-\,\Theta)$ with
 three different threshold values
 $\Theta$~(Fig.\,\ref{fig:thresh-lp_single}b-d). Each resulting binary response
-$b(t)$ was transformed into $f(t)$, whose average feature value serves as a
-measure of intensity~(Fig.\,\ref{fig:thresh-lp_single}ef). The thresholding
-nonlinearity $H(c\,-\,\Theta)$ categorizes the values of $c(t)$ into "relevant"
-($c(t)>\Theta$, $b(t)=1$) and "irrelevant" ($c(t)\leq\Theta$, $b(t)=0$)
-response values. It thereby splits the probability density $\pc$ of $c(t)$
-within some observed time interval $T$ into two complementary parts around
-$\Theta$:
+$b(t)$ was transformed into $f(t)$, whose average feature value $\mu_f$ serves
+as a measure of intensity~(Fig.\,\ref{fig:thresh-lp_single}ef). The
+thresholding nonlinearity $H(c\,-\,\Theta)$ categorizes the values of $c(t)$
+into ``relevant'' ($c(t)>\Theta$, $b(t)=1$) and ``irrelevant'' ($c(t)\leq\Theta$,
+$b(t)=0$) response values. It thereby splits the probability density $\pc$ of
+$c(t)$ within some observed time interval $T$ into two complementary parts
+around $\Theta$:
 \begin{equation}
    \int_{\Theta}^{+\infty} \pc\,dc\,=\,1\,-\,\int_{-\infty}^{\Theta} \pc\,dc\,=\,\frac{T_1}{T}, \qquad \infint \pc\,dc\,=\,1
    \label{eq:pdf_split}
@@ -856,45 +863,45 @@ points at which $c(t)$ crosses $\Theta$: The steeper the slope of $c(t)$, the
 less $T_1$ changes with variations in $\sca$. The most reliable way of
 exploiting this invariant property of $f(t)$ is to set $\Theta$ to a value near
 0, because these values are least affected by different scales of $c(t)$. For
-sufficiently large $\sca$, $f(t)$ then approaches the same constant value in
+sufficiently large $\sca$, $f(t)$ then approaches the same constant $\mu_f$ in
 both the noiseless and the noisy case~(Fig.\,\ref{fig:thresh-lp_single}e,
 saturation regime).

-The value of $f(t)$ in the saturation regime is independent of the precise
+The value of $\mu_f$ in the saturation regime is independent of the precise
 value of $\Theta$, but the value of $\sca$ at which the saturation regime is
 reached decreases with $\Theta$~(Fig.\,\ref{fig:thresh-lp_single}e). Therefore,
 a threshold value of $\Theta=0$ would be the optimal choice for achieving
 intensity invariance at the lowest possible $\sca$. In stark contrast, the
-closer $\Theta$ is to 0, the higher the pure-noise response of $f(t)$ and the
-lower the resulting SNR of $f(t)$ between noise regime and saturation
-regime~(Fig.\,\ref{fig:thresh-lp_single}b-d, left column, and
-Fig.\,\ref{fig:thresh-lp_single}e). It is even possible to achieve an
+closer $\Theta$ is to 0, the higher $\mu_f$ in response to the pure noise
+component $\noc(t)$ and the lower the resulting SNR of $f(t)$ between noise
+regime and saturation regime~(Fig.\,\ref{fig:thresh-lp_single}b-d, left column,
+and Fig.\,\ref{fig:thresh-lp_single}e). It is even possible to achieve an
 ``unlimited'' SNR of $f(t)$ by setting $\Theta$ above the maximum of the
-pure-noise $c(t)$, so that any value of $f(t)$ greater than 0 indicates the
-presence of the song component $\soc(t)$ in input $\adapt(t)$ at the cost of
-requiring a higher $\sca$ to reach the saturation regime. This trade-off
-between intensity invariance and SNR has already been observed during the
-previous analysis on logarithmic compression and
-adaptation~(Fig.\,\ref{fig:log-hp}d). However, the parameters that determine
-the SNR of $\adapt(t)$ are much less understood and likely relate to properties
-of the signal, whereas the SNR of $f(t)$ depends on the choice of $\Theta$ and
-can be more directly manipulated by the system.
+pure-noise $c(t)$, so that any $\mu_f>0$ indicates the presence of the song
+component $\soc(t)$ in input $\adapt(t)$ at the cost of requiring a higher
+$\sca$ to reach the saturation regime. This trade-off between intensity
+invariance and SNR has already been observed during the previous analysis on
+logarithmic compression and adaptation~(Fig.\,\ref{fig:log-hp}d). However, the
+parameters that determine the SNR of $\adapt(t)$ are much less understood and
+likely relate to properties of the signal, whereas the SNR of $f(t)$ depends on
+the choice of $\Theta$ and can be more directly manipulated by the system.
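+
+As a minimal sketch of why the saturated $\mu_f$ does not depend on $\Theta$,
+consider the noiseless case and assume that $\mu_f$ equals the fraction
+$T_1/T$ from Eq.\,\ref{eq:pdf_split}. Because convolution is linear, rescaling
+the input by $\sca$ rescales the kernel response to $\sca\,c_1(t)$, where
+$c_1(t)$ denotes the response at $\sca=1$ and $p_1(c,\,T)$ its probability
+density (both symbols are introduced here for illustration only). For
+$\Theta>0$, substituting $u=c/\sca$ yields
+\begin{equation}
+    \mu_f(\sca)\,=\,\int_{\Theta}^{+\infty} \frac{1}{\sca}\,p_1\!\left(\frac{c}{\sca},\,T\right)dc\,=\,\int_{\Theta/\sca}^{+\infty} p_1(u,\,T)\,du\;\to\;\int_{0}^{+\infty} p_1(u,\,T)\,du \qquad (\sca\to\infty)
+\end{equation}
+Under these assumptions, $\Theta$ enters only through the lower bound
+$\Theta/\sca$: The limit, which is the fraction of time with $c_1(t)>0$, is the
+same for every $\Theta>0$, and a smaller $\Theta$ brings the bound close to 0
+at a smaller $\sca$.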

 Finally, the effects of thresholding and temporal averaging must be seen in the
 context of the previous transformation pair of logarithmic compression and
-adaptation. 
-
-Finally, the question remains whether the intensity-invariant output $\adapt(t)$
-of the previous transformation pair allows feature 
-
-Finally, the output $\adapt(t)$ of the previous transformation
-pair~(Fig.\,\ref{fig:log-hp}cd) can be related to the input $\adapt(t)$ of the
-current transformation pair by plotting the values of $f(t)$ over the standard
-deviation of input $\adapt(t)$ instead of
-$\sca$~(Fig.\,\ref{fig:thresh-lp_single}f). This is relevant because, unlike
-$\sca$, the standard deviation of $\adapt(t)$ is capped to a maximum value of
-around 10\,dB by the previous transformation pair~(Fig.\,\ref{fig:log-hp}cd)
-
+adaptation: In the current analysis, the input $\adapt(t)$ can be rescaled by
+arbitrarily large $\sca$, while in the full pathway, the current input
+$\adapt(t)$ is the output $\adapt(t)$ of the previous transformation pair and
+is hence capped to a maximum standard deviation of around
+10\,dB~(Fig.\,\ref{fig:log-hp}cd). This can be illustrated by plotting $\mu_f$
+not over $\sca$~(Fig.\,\ref{fig:thresh-lp_single}e) but over the standard
+deviation of input $\adapt(t)$ instead~(Fig.\,\ref{fig:thresh-lp_single}f). It
+becomes apparent that $\mu_f$ saturates only for standard deviations of
+$\adapt(t)$ that would already be capped. Accordingly, $f(t)$ never reaches the
+saturation regime as determined by the current transformation pair but rather
+adheres to the saturation regime determined by the previous transformation
+pair. In this case, the saturated $\mu_f$ is not independent of $\Theta$
+anymore. The consequences of this interaction between the two mechanisms of
+intensity invariance are further explored in a later section.

 \begin{figure}[!ht]
    \centering
@@ -934,6 +941,72 @@ around 10\,dB by the previous transformation pair~(Fig.\,\ref{fig:log-hp}cd)
 \end{figure}
 \FloatBarrier

+\subsection{Intensity invariance of species-specific feature representations}
+
+Having established both the meaning of the feature value and the mechanism of
+intensity invariance by thresholding and temporal averaging, the question
+remains how this mechanism acts on a set of features $f_i(t)$ based on
+different species-specific songs~(Fig.\,\ref{fig:thresh-lp_species}a). The
+previous analysis was repeated with three different kernels $k_i(t)$ using a
+single kernel-specific threshold value $\thr$; and the resulting average
+feature values $\muf$ were plotted over
+$\sca$~(Fig.\,\ref{fig:thresh-lp_species}bc). Additionally, 2D feature spaces
+spanned by each pair of $f_i(t)$ were plotted to investigate the separability
+of species-specific songs based on the feature representation as a function of
+$\sca$~(Fig.\,\ref{fig:thresh-lp_species}de). Each species-specific combination
+of $\muf$ follows a trajectory through feature space that develops with $\sca$.
+These trajectories correspond to the transient regime between the constant
+(noise) regime and the saturation regime, which are only visible as the start
+and end points of the trajectories, respectively. The horizontal dashes in the
+colorbars indicate the range of $\sca$ that corresponds to the transient regime
+across $f_i(t)$ for each species.
+
+In the noiseless case, each $\muf$ is 0 for small $\sca$ across all
+species~(Fig.\,\ref{fig:thresh-lp_species}b) because $c_i(t)$ never exceeds
+$\thr$. Accordingly, each trajectory starts at the origin of the feature
+space~(Fig.\,\ref{fig:thresh-lp_species}d). For larger $\sca$, all $\muf$
+saturate at individual values whose combination differs between species, so
+that the songs of each species are eventually represented by distinct points in
+feature space. However, the species-specific trajectories cross each other at
+numerous points, which means that the songs of two species --- each at a
+specific $\sca$ --- can result in the same combination of $\muf$. Furthermore,
+the specific value of $\sca$ at which $\muf$ saturates depends on $f_i(t)$ and
+the species: For \textit{C. mollis}, all $\muf$ saturate around the same
+$\sca$, while \textit{O. rufipes} exhibits considerable variation between the
+three $f_i(t)$. The larger the variation in saturation points between $f_i(t)$,
+the stronger the curvature of the trajectory through feature space.
+
+In the noisy case, $\muf$ is non-zero even for the smallest
+$\sca$~(Fig.\,\ref{fig:thresh-lp_species}c) because the addition of the noise
+component $\noc(t)$ to input $\adapt(t)$ drives $c_i(t)$ above $\thr$
+regardless of the song component $\soc(t)$. The starting value of $\muf$ is the
+same across all $f_i(t)$ and species by construction of the specific $\thr$. In
+consequence, the trajectories through feature space do not start at the origin
+but rather at approximately the same point along the
+diagonal~(Fig.\,\ref{fig:thresh-lp_species}e). For larger $\sca$, all $\muf$
+saturate at the same values as in the noiseless case, as expected from the
+previous analysis~(Fig.\,\ref{fig:thresh-lp_single}e). However, the
+trajectories now move a much shorter distance through feature space for a
+similar range of $\sca$ due to the lower SNR of $f_i(t)$ between noise regime
+and saturation regime, which increases the likelihood of trajectories crossing
+each other. Finally, the values of $\sca$ at which $\muf$ saturate for a given
+species are slightly higher in the noisy case, but the variation between
+$f_i(t)$ remains largely unchanged.
+
+In summary, even a comparably small set of three features $f_i(t)$ can, in
+principle, represent different species-specific songs at distinct points in
+feature space, regardless of the presence of noise. However, this only holds
+for sufficiently large $\sca$ that allow $f_i(t)$ to reach a saturation regime.
+During the transient regime, the species-specific combination of $\muf$ can
+very well be the same for two or more different species at specific $\sca$,
+although this may be alleviated by the inclusion of additional $f_i(t)$.
+Overall, the results of this analysis suggest that $\thr$ should be chosen in
+favor of a higher SNR ($\thr$ just above pure-noise $c_i(t)$) rather than a
+lower saturation point ($\thr\to0$): First, this reduces the density of
+trajectories through feature space, and second, the capping of $\adapt(t)$ by
+the previous transformation pair likely renders the saturation point of
+$f_i(t)$ less relevant.
+
 \begin{figure}[!ht]
    \centering
    \includegraphics[width=\textwidth]{figures/fig_invariance_thresh_lp_species.pdf}
@@ -968,28 +1041,81 @@ around 10\,dB by the previous transformation pair~(Fig.\,\ref{fig:log-hp}cd)
    \label{fig:thresh-lp_species}
 \end{figure}
 \FloatBarrier
-% \caption{\textbf{Rectification and lowpass filtering improves SNR
-%                  but does not contribute to intensity invariance.}
-%                  Input $\raw(t)$ consists of song component $\soc(t)$ scaled by
-%                  $\sca$ with optional noise component $\noc(t)$ and is
-%                  successively transformed into tympanal signal $\filt(t)$ and
-%                  envelope $\env(t)$. Different line styles indicate different
-%                  cutoff frequencies $\fc$ of the lowpass filter extracting
-%                  $\env(t)$.
-%                  \textbf{Top}:~Example representations of $\filt(t)$ and
-%                  $\env(t)$ for different $\sca$.
-%                  \textbf{a}:~Noiseless case.
-%                  \textbf{b}:~Noisy case.
-%                  \textbf{Bottom}:~Intensity metrics over a range of $\sca$.
-%                  \textbf{c}:~Noiseless case: Standard deviations of $\filt(t)$
-%                  and $\env(t)$.
-%                  \textbf{d}:~Noisy case: Ratios of standard deviations of
-%                  $\filt(t)$ and $\env(t)$ to the respective reference standard
-%                  deviation for input $\raw(t)=\noc(t)$.
-%                  \textbf{e}:~Ratios of standard deviations of $\env(t)$ as in
-%                  \textbf{b} for different species (averaged over songs and
-%                  recordings, see appendix Fig.\,\ref{fig:app_rect-lp}).
-%                 }
+
+\subsection{Intensity invariance along the full model pathway}
+
+Through the previous analyses, we established two mechanisms of intensity
+invariance: Logarithmic compression and adaptation as well as thresholding and
+temporal averaging. While each transformation pair by itself can provide some
+level of invariance, certain results suggest that the first mechanism may
+actually limit or even nullify the effect of the second mechanism. In the
+following sections, we investigate the combined effect of both mechanisms along
+the full model pathway~(Fig.\,\ref{fig:pipeline_full}) and explore the
+consequences of disabling the first mechanism by skipping the logarithmic
+compression step~(Fig.\,\ref{fig:pipeline_short}).
+
+\subsubsection{Including logarithmic compression}
+
+For this analysis, input $\raw(t)$ --- including both song component $\soc(t)$
+and noise component $\noc(t)$ --- was rescaled and processed throughout all
+steps of the model pathway~(Fig.\,\ref{fig:pipeline_full}a) up to the feature
+set $f_i(t)$. As before, the standard deviation was used as intensity metric
+for each resulting representation except $b_i(t)$ and $f_i(t)$. For $f_i(t)$,
+the average feature value $\muf$ was used, while $b_i(t)$ was omitted from the
+analysis. Plotting each intensity metric over
+$\sca$~(Fig.\,\ref{fig:pipeline_full}b) reinforces many of the previous
+observations. For ease of visualization, the kernel-specific curves for
+$c_i(t)$ and $f_i(t)$ were summarized by their median. Representations prior to
+logarithmic compression --- $\filt(t)$ and $\env(t)$ --- show a linear increase
+of the intensity metric for larger $\sca$ on a double-logarithmic scale.
+Representations after logarithmic compression --- $\db(t)$, $\adapt(t)$, and
+$c_i(t)$ --- are the first to reach a saturation regime and do so at
+approximately the same $\sca$ because they are separated only by linear
+transformations. Feature set $f_i(t)$ reaches a saturation regime, as well. But
+contrary to previous results, the saturation point of $f_i(t)$ appears below
+that of $c_i(t)$, which suggests that the second mechanism of thresholding and
+temporal averaging can indeed improve intensity invariance beyond the first
+mechanism of logarithmic compression and adaptation. The difference in
+saturation points is best illustrated based on the ratio of each intensity
+metric to the respective pure-noise reference
+value~(Fig.\,\ref{fig:pipeline_full}d). However, compressing $f_i(t)$ into a
+median across $k_i(t)$ conceals many kernel-specific details. It is therefore
+necessary to consider the development of each $f_i(t)$ over $\sca$
+separately~(Fig.\,\ref{fig:pipeline_full}c). Indeed, all 40 $f_i(t)$ in the set
+reach a saturation regime for sufficiently large $\sca$. The saturated $\muf$
+are distributed over a range of values --- which is the prerequisite for
+forming species-specific combinations --- but are limited to a rather small
+subset of possible values between 0 and 1. Based on previous
+results~(Fig.\,\ref{fig:thresh-lp_single}f), this is likely due to the capping
+of $\adapt(t)$ that prevents $f_i(t)$ from reaching its intrinsic saturation
+value; but this cannot be confirmed until the following
+analysis~(Fig.\,\ref{fig:pipeline_short}). Looking at the kernel-specific SNR
+values of $c_i(t)$ over $\sca$~(Fig.\,\ref{fig:pipeline_full}e) and $f_i(t)$
+over $\sca$~(Fig.\,\ref{fig:pipeline_full}f) reveals a high degree of variation
+between different $k_i(t)$. Certain $f_i(t)$ achieve much higher SNR values
+than $c_i(t)$ for the same $\sca$ due to the former's capacity for arbitrarily
+low pure-noise responses ($\muf\to0$) and hence arbitrarily high SNR values.
+Finally, the question remains whether the suspected improvement of intensity
+invariance by $f_i(t)$ beyond $c_i(t)$ holds at the level of individual
+$k_i(t)$. The single saturation points based on the median across $k_i(t)$ for
+$c_i(t)$ and $f_i(t)$ are expanded into distributions of kernel-specific
+saturation points~(Fig.\,\ref{fig:pipeline_full}g). For $c_i(t)$, the
+distribution is rather narrow and corresponds well to the single saturation
+point based on the median. For $f_i(t)$, however, the distribution is much
+broader and is not centered around the single saturation point based on the
+median but rather shifted towards lower $\sca$. Care must be taken when
+interpreting the height of either distribution due to the logarithmic scaling
+of the underlying $\sca$ axis. Nevertheless, the overall pattern suggests that
+specific $f_i(t)$ can reach a saturation regime at lower $\sca$ than their
+$c_i(t)$ counterparts. Therefore, the effect of thresholding and temporal
+averaging on intensity invariance is not necessarily nullified by the previous
+logarithmic compression and adaptation, which means that both mechanisms can,
+in principle, work together towards an intensity-invariant song representation.
+% Or does one simply overwrite the other? Can there even be a higher intensity
+% invariance based on the sum of both effects? Or does one simply kick in for
+% lower scales than the other and thus dictates the overall intensity
+% invariance? Whatever, discussion material.
+
 \begin{figure}[!ht]
    \centering
    \includegraphics[width=\textwidth]{figures/fig_invariance_full_Omocestus_rufipes.pdf}
@@ -1028,6 +1154,50 @@ around 10\,dB by the previous transformation pair~(Fig.\,\ref{fig:log-hp}cd)
 \end{figure}
 \FloatBarrier

+\subsubsection{Excluding logarithmic compression}
+
+The previous analysis was repeated in exactly the same way, except that the
+logarithmic compression of $\env(t)$, Eq.\,\ref{eq:log}, was skipped in order
+to disable the first mechanism of intensity invariance. Consequently,
+$\adapt(t)$ is merely a highpass filtered version of $\env(t)$; and $\db(t)$ is
+missing entirely~(Fig.\,\ref{fig:pipeline_short}a). As expected, all
+representations prior to the thresholding nonlinearity $\nl$ --- $\filt(t)$,
+$\env(t)$, $\adapt(t)$, and $c_i(t)$ --- show a linear increase of the
+intensity metric for larger $\sca$, while $f_i(t)$ is the only representation
+to reach a saturation regime~(Fig.\,\ref{fig:pipeline_short}bd). The
+saturated $\muf$ are distributed over a much broader range of values than in
+the previous analysis~(Fig.\,\ref{fig:pipeline_short}c). Intriguingly, the
+distribution of $\muf$ is symmetric around a value of 0.5. This is relevant
+because every kernel $k^+(t)$ in the underlying kernel set has a counterpart of
+opposite sign that is otherwise identical, so that $k^+(t)=-k^-(t)$. The
+responses of $k^+(t)$ and $k^-(t)$ to the same input $\adapt(t)$ are also
+inverted because convolution is a linear operation: $c^+(t)=-c^-(t)$. The
+distributions of $c^+(t)$ and $c^-(t)$ are hence inverted to each other, as
+well: $p(c^+)=p(-c^-)$. Based on Eq.\,\ref{eq:feat_prop}, transforming $c^+(t)$
+and $c^-(t)$ further using the same $\Theta$ thus results in two complementary
+features $f^+(t)$ and $f^-(t)$ that are symmetric around 0.5, so that
+$f^+(t)=1-f^-(t)$. Of course, this symmetry throughout the feature
+representation goes hand in hand with a substantial degree of redundancy and is
+hardly expected to be present in the actual grasshopper auditory system. But
+the fact that the saturated $\muf$ are distributed symmetrically around 0.5
+provides concrete evidence that each $f_i(t)$ is able to reach its intrinsic
+saturation value in the absence of logarithmic
+compression~(Fig.\,\ref{fig:pipeline_short}c), which is otherwise prevented by
+the capping of $\adapt(t)$, as seen during previous
+analyses~(Fig.\,\ref{fig:thresh-lp_single}f and
+Fig.\,\ref{fig:pipeline_full}c). Otherwise, there appear to be no major
+differences in the development of $f_i(t)$ over $\sca$ compared to the previous
+analysis, neither in the kernel-specific SNR
+values~(Fig.\,\ref{fig:pipeline_short}e) nor in the distribution of
+kernel-specific saturation points~(Fig.\,\ref{fig:pipeline_short}f). Overall,
+the most substantial consequence of skipping the logarithmic compression is
+that it allows $f_i(t)$ to reach its intrinsic saturation value. If this
+results in a wider range of $\muf$ across the feature set, it should be
+beneficial for forming species-specific combinations. However, this depends on
+multiple different factors such as the choice of $k_i(t)$ and $\thr$ as well as
+the structure and distribution of the specific song and is hence not
+guaranteed simply by disabling logarithmic compression.
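+
+The symmetry argument can be made explicit in a short sketch, assuming that
+$\mu_f$ equals the fraction of time during which the kernel response exceeds
+$\Theta$ (Eq.\,\ref{eq:pdf_split}), that both kernels share the same
+$\Theta\geq0$, and writing $\mu_{f^+}$ and $\mu_{f^-}$ (notation introduced
+here for illustration) for the average values of $f^+(t)$ and $f^-(t)$. With
+$c^-(t)=-c^+(t)$, the condition $c^-(t)>\Theta$ is equivalent to
+$c^+(t)<-\Theta$, so that
+\begin{equation}
+    \mu_{f^+}\,+\,\mu_{f^-}\,=\,\int_{\Theta}^{+\infty} p(c^+)\,dc^+\,+\,\int_{-\infty}^{-\Theta} p(c^+)\,dc^+\,=\,1\,-\,\int_{-\Theta}^{\Theta} p(c^+)\,dc^+
+\end{equation}
+Under these assumptions, the two averages are exactly complementary for
+$\Theta=0$ and approximately complementary whenever little probability mass of
+$c^+(t)$ lies within $[-\Theta,\,\Theta]$, as is the case once $c^+(t)$ has
+grown large relative to $\Theta$.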
+
 \begin{figure}[!ht]
    \centering
    \includegraphics[width=\textwidth]{figures/fig_invariance_short_Omocestus_rufipes.pdf}
@@ -1065,6 +1235,61 @@ around 10\,dB by the previous transformation pair~(Fig.\,\ref{fig:log-hp}cd)
 \end{figure}
 \FloatBarrier

+\subsubsection{Field data}
+
+\begin{figure}[!ht]
+    \centering
+    \includegraphics[width=\textwidth]{figures/fig_invariance_field.pdf}
+    \caption{\textbf{Step-wise emergence of intensity-invariant song
+                     representation along the model pathway.}
+                     }
+    \label{fig:pipeline_field}
+\end{figure}
+\FloatBarrier
+
+\subsection{Interspecific and intraspecific feature variability}
+
+In the final analysis of the current study, we investigated the variability of
+songs in the feature representation between different species and within the
+same species~(Fig.\,\ref{fig:feat_cross_species}). Naturally, a feature
+representation that is both consistent across different songs of the same
+species and sufficiently different between songs of different species is a
+fundamental prerequisite for species-specific song recognition. The data used
+in this analysis corresponds to the saturated $\muf$ of each $f_i(t)$ from the
+previous analysis of the full model pathway~(Fig.\,\ref{fig:pipeline_full}c),
+using different songs of \textit{O. rufipes} for the intraspecific comparisons
+and single songs from a number of species for the interspecific comparisons
+(also shown in Fig.\,\ref{fig:thresh-lp_species}a). Accordingly, each song is
+represented by 40 values of $\muf$ based on the same set of $f_i(t)$. For each
+comparison, $\muf$ from one song was plotted against $\muf$ from the other
+song, so that each dot within a subplot corresponds to a single feature
+$f_i(t)$. For the intraspecific
+comparisons~(Fig.\,\ref{fig:feat_cross_species}, upper triangular), the pairs
+of $\muf$ are distributed closely around the diagonal, with a minimum
+correlation coefficient of $\rho=0.85$, a maximum of $\rho=0.99$, and a median
+of $\rho=0.92$. A given $f_i(t)$ thus tends to have a similar $\muf$ across
+different songs of the same species. In contrast, the pairs of $\muf$ for the
+interspecific comparisons~(Fig.\,\ref{fig:feat_cross_species}, lower
+triangular) are distributed in a variety of different ways, most in broader
+clouds (e.g. \textit{C. biguttulus} vs. \textit{C. mollis}) but some more
+narrowly around the diagonal (e.g. \textit{P. parallelus} vs. \textit{C.
+dispar}). The correlation coefficients $\rho$ vary widely between different
+interspecific comparisons, with a minimum of $\rho=-0.1$, a maximum of
+$\rho=0.92$, and a median of $\rho=0.53$. A given $f_i(t)$ therefore tends to
+have a less similar $\muf$ across different species than within the same
+species, although certain exceptions exist~(Fig.\,\ref{fig:feat_cross_species},
+lower right). Accordingly, the feature representation that is generated by the
+model pathway is, in principle, suitable for the distinction between different
+species-specific songs. However, even the songs of the same species vary
+considerably in various aspects, depending on a multitude of external and
+internal factors, and this variability cannot be fully captured based on a
+limited number of songs. The results of the current analysis are hence to be
+treated as a proof-of-concept that paves the way towards more comprehensive
+investigations on the details of song representation in feature space,
+including the effects of different parameters of the model pathway as well as
+the inclusion of additional songs and species to reflect the complexity of
+natural song variation.
+
 \begin{figure}[!ht]
    \centering
    \includegraphics[width=\textwidth]{figures/fig_features_cross_species.pdf}
@@ -1086,7 +1311,7 @@ around 10\,dB by the previous transformation pair~(Fig.\,\ref{fig:log-hp}cd)
                     \textbf{Upper triangular}:~Intraspecific comparisons
                     between different songs of a single species (\textit{O.
                     rufipes}).
-                     \textbf{Lower left}:~Distribution of correlation
+                     \textbf{Lower right}:~Distribution of correlation
                     coefficients $\rho$ for each interspecific and
                     intraspecific comparison. Dots indicate single $\rho$
                     values.
@@ -1095,16 +1320,6 @@ around 10\,dB by the previous transformation pair~(Fig.\,\ref{fig:log-hp}cd)
 \end{figure}
 \FloatBarrier

-\begin{figure}[!ht]
-    \centering
-    \includegraphics[width=\textwidth]{figures/fig_invariance_field.pdf}
-    \caption{\textbf{Step-wise emergence of intensity invariant song
-                     representation along the model pathway.}
-                     }
-    \label{fig:pipeline_field}
-\end{figure}
-\FloatBarrier
-
 \section{Conclusions \& outlook}

 \textbf{Song recognition pathway: Grasshopper vs. model:}\\