Wrote results for pipeline_full, pipeline_short, and feat_cross_species.

j-hartling
2026-05-07 18:15:00 +02:00
parent a48457d967
commit 4b4a04ab2a
14 changed files with 548 additions and 296 deletions

365
main.tex

@@ -105,6 +105,7 @@
\newcommand{\nvar}{\sigma_{\eta}^{2}} % Noise component variance
\newcommand{\pc}{p(c,\,T)} % Probability density (general interval)
\newcommand{\pclp}{p(c,\,\tlp)} % Probability density (lowpass interval)
\newcommand{\muf}{\mu_{f_i}} % Average feature value
\section{Exploring a grasshopper's sensory world}
@@ -312,7 +313,9 @@ within the auditory pathway.
% - How to integrate the available knowledge on anatomy, physiology, ethology?\\
% $\rightarrow$ Abstract, simplify, formalize $\rightarrow$ Functional model framework
\section{Developing a functional model of the\\grasshopper song recognition pathway}
\section{Methods}
\subsection{Functional model of the grasshopper song recognition pathway}
% Too long (no splitting, only pruning).
The essence of constructing a functional model of a given system is to gain a
@@ -373,7 +376,7 @@ outlined in the following sections.
\label{fig:pathway}
\end{figure}
\subsection{Population-driven signal preprocessing}
\subsubsection{Population-driven signal preprocessing}
Grasshoppers receive airborne sound waves by a tympanal organ at either side of
the body. The tympanal membrane acts as a mechanical resonance filter for
@@ -436,7 +439,7 @@ following feature extraction stage.
\end{figure}
\FloatBarrier
\subsection{Feature extraction by individual neurons}
\subsubsection{Feature extraction by individual neurons}
The ascending neurons extract and encode a number of different features of the
preprocessed signal. As a population, they hence represent the signal in a
@@ -555,7 +558,11 @@ can be read out by a simple linear classifier.
\end{figure}
\FloatBarrier
\section{Mechanisms driving the emergence of\\intensity-invariant song representation}
\subsubsection{Simulation-based analysis of the model pathway}
\section{Results}
\subsection{Mechanisms driving the emergence of intensity invariance}
% Still missing the SNR analysis. Should be able to write around it for now.
The robustness of song recognition is tied to the degree of intensity
@@ -571,7 +578,7 @@ intensity variations. The two mechanisms each comprise a nonlinear signal
transformation followed by a linear signal transformation but differ in the
specific operations involved, as outlined in the following sections.
\subsection{Full-wave rectification \& lowpass filtering}
\subsubsection{Full-wave rectification \& lowpass filtering}
The first nonlinear transformation along the model pathway is the full-wave
rectification of the tympanal signal $\filt(t)$ during the extraction of the
@@ -651,7 +658,7 @@ more robust input representation and higher input SNR.
\end{figure}
\FloatBarrier
\subsection{Logarithmic compression \& spike-frequency adaptation}
\subsubsection{Logarithmic compression \& spike-frequency adaptation}
The second nonlinear transformation along the model pathway is the logarithmic
compression of the signal envelope $\env(t)$ into $\db(t)$, Eq.\,\ref{eq:log},
@@ -794,7 +801,7 @@ is a recurring phenomenon that is further addressed in the following sections.
\end{figure}
\FloatBarrier
\subsection{Thresholding nonlinearity \& temporal averaging}
\subsubsection{Thresholding nonlinearity \& temporal averaging}
The third nonlinear transformation along the model pathway is the thresholding
nonlinearity $\nl$ that transforms each kernel response $c_i(t)$ into a binary
@@ -809,13 +816,13 @@ rescaled~(Fig.\,\ref{fig:thresh-lp_single}a) and convolved with kernel $k(t)$.
The resulting kernel response $c(t)$ was passed through $H(c\,-\,\Theta)$ with
three different threshold values
$\Theta$~(Fig.\,\ref{fig:thresh-lp_single}b-d). Each resulting binary response
$b(t)$ was transformed into $f(t)$, whose average feature value serves as a
measure of intensity~(Fig.\,\ref{fig:thresh-lp_single}ef). The thresholding
nonlinearity $H(c\,-\,\Theta)$ categorizes the values of $c(t)$ into "relevant"
($c(t)>\Theta$, $b(t)=1$) and "irrelevant" ($c(t)\leq\Theta$, $b(t)=0$)
response values. It thereby splits the probability density $\pc$ of $c(t)$
within some observed time interval $T$ into two complementary parts around
$\Theta$:
$b(t)$ was transformed into $f(t)$, whose average feature value $\mu_f$ serves
as a measure of intensity~(Fig.\,\ref{fig:thresh-lp_single}ef). The
thresholding nonlinearity $H(c\,-\,\Theta)$ categorizes the values of $c(t)$
into ``relevant'' ($c(t)>\Theta$, $b(t)=1$) and ``irrelevant'' ($c(t)\leq\Theta$,
$b(t)=0$) response values. It thereby splits the probability density $\pc$ of
$c(t)$ within some observed time interval $T$ into two complementary parts
around $\Theta$:
\begin{equation}
\int_{\Theta}^{+\infty} \pc\,dc\,=\,1\,-\,\int_{-\infty}^{\Theta} \pc\,dc\,=\,\frac{T_1}{T}, \qquad \infint \pc\,dc\,=\,1
\label{eq:pdf_split}
@@ -856,45 +863,45 @@ points at which $c(t)$ crosses $\Theta$: The steeper the slope of $c(t)$, the
less $T_1$ changes with variations in $\sca$. The most reliable way of
exploiting this invariant property of $f(t)$ is to set $\Theta$ to a value near
0, because these values are least affected by different scales of $c(t)$. For
sufficiently large $\sca$, $f(t)$ then approaches the same constant value in
sufficiently large $\sca$, $f(t)$ then approaches the same constant $\mu_f$ in
both the noiseless and the noisy case~(Fig.\,\ref{fig:thresh-lp_single}e,
saturation regime).
The value of $f(t)$ in the saturation regime is independent of the precise
The value of $\mu_f$ in the saturation regime is independent of the precise
value of $\Theta$, but the value of $\sca$ at which the saturation regime is
reached decreases with $\Theta$~(Fig.\,\ref{fig:thresh-lp_single}e). Therefore,
a threshold value of $\Theta=0$ would be the optimal choice for achieving
intensity invariance at the lowest possible $\sca$. In stark contrast, the
closer $\Theta$ is to 0, the higher the pure-noise response of $f(t)$ and the
lower the resulting SNR of $f(t)$ between noise regime and saturation
regime~(Fig.\,\ref{fig:thresh-lp_single}b-d, left column, and
Fig.\,\ref{fig:thresh-lp_single}e). It is even possible to achieve an
closer $\Theta$ is to 0, the higher $\mu_f$ in response to the pure noise
component $\noc(t)$ and the lower the resulting SNR of $f(t)$ between noise
regime and saturation regime~(Fig.\,\ref{fig:thresh-lp_single}b-d, left column,
and Fig.\,\ref{fig:thresh-lp_single}e). It is even possible to achieve an
``unlimited'' SNR of $f(t)$ by setting $\Theta$ above the maximum of the
pure-noise $c(t)$, so that any value of $f(t)$ greater than 0 indicates the
presence of the song component $\soc(t)$ in input $\adapt(t)$ at the cost of
requiring a higher $\sca$ to reach the saturation regime. This trade-off
between intensity invariance and SNR has already been observed during the
previous analysis on logarithmic compression and
adaptation~(Fig.\,\ref{fig:log-hp}d). However, the parameters that determine
the SNR of $\adapt(t)$ are much less understood and likely relate to properties
of the signal, whereas the SNR of $f(t)$ depends on the choice of $\Theta$ and
can be more directly manipulated by the system.
pure-noise $c(t)$, so that any $\mu_f>0$ indicates the presence of the song
component $\soc(t)$ in input $\adapt(t)$ at the cost of requiring a higher
$\sca$ to reach the saturation regime. This trade-off between intensity
invariance and SNR has already been observed during the previous analysis on
logarithmic compression and adaptation~(Fig.\,\ref{fig:log-hp}d). However, the
parameters that determine the SNR of $\adapt(t)$ are much less understood and
likely relate to properties of the signal, whereas the SNR of $f(t)$ depends on
the choice of $\Theta$ and can be more directly manipulated by the system.
Finally, the effects of thresholding and temporal averaging must be seen in the
context of the previous transformation pair of logarithmic compression and
adaptation.
Finally, the question remains whether the intensity-invariant output $\adapt(t)$
of the previous transformation pair allows feature
Finally, the output $\adapt(t)$ of the previous transformation
pair~(Fig.\,\ref{fig:log-hp}cd) can be related to the input $\adapt(t)$ of the
current transformation pair by plotting the values of $f(t)$ over the standard
deviation of input $\adapt(t)$ instead of
$\sca$~(Fig.\,\ref{fig:thresh-lp_single}f). This is relevant because, unlike
$\sca$, the standard deviation of $\adapt(t)$ is capped to a maximum value of
around 10\,dB by the previous transformation pair~(Fig.\,\ref{fig:log-hp}cd)
adaptation: In the current analysis, the input $\adapt(t)$ can be rescaled by
arbitrarily large $\sca$, while in the full pathway, the current input
$\adapt(t)$ is the output $\adapt(t)$ of the previous transformation pair and
is hence capped to a maximum standard deviation of around
10\,dB~(Fig.\,\ref{fig:log-hp}cd). This can be illustrated by plotting $\mu_f$
not over $\sca$~(Fig.\,\ref{fig:thresh-lp_single}e) but over the standard
deviation of input $\adapt(t)$ instead~(Fig.\,\ref{fig:thresh-lp_single}f). It
becomes apparent that $\mu_f$ saturates only for standard deviations of
$\adapt(t)$ that would already be capped. Accordingly, $f(t)$ never reaches the
saturation regime as determined by the current transformation pair but rather
adheres to the saturation regime determined by the previous transformation
pair. In this case, the saturated $\mu_f$ is not independent of $\Theta$
anymore. The consequences of this interaction between the two mechanisms of
intensity invariance are further explored in a later section.
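The dependence of the saturated $\mu_f$ on $\Theta$ can be sketched for the
noiseless case under two simplifying assumptions that are not stated explicitly
above: that $\mu_f$ equals the fraction $T_1/T$ from Eq.\,\ref{eq:pdf_split},
and that rescaling the input by $\sca$ rescales the kernel response linearly,
$c(t)=\sca\,c_0(t)$, where $c_0(t)$ is the response at unit scale with
probability density $p_0(c)$ (both symbols are introduced here for illustration
only). For $\Theta>0$ and $\sca>0$, the condition $c(t)>\Theta$ is then
equivalent to $c_0(t)>\Theta/\sca$, so that
\[
	\mu_f(\sca)\,=\,\int_{\Theta/\sca}^{+\infty} p_0(c)\,dc
	\,\longrightarrow\,\int_{0}^{+\infty} p_0(c)\,dc \qquad (\sca\to\infty).
\]
The limit does not contain $\Theta$, which is consistent with a saturated
$\mu_f$ that is independent of the threshold, and a smaller $\Theta$ brings the
lower integration limit $\Theta/\sca$ close to 0 at a smaller $\sca$. If the
effective scale is instead bounded from above by some value $A$ (a hypothetical
stand-in for the capping of $\adapt(t)$), the lower limit cannot fall below
$\Theta/A>0$, and the largest attainable $\mu_f$ remains a function of $\Theta$.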
\begin{figure}[!ht]
\centering
@@ -934,6 +941,72 @@ around 10\,dB by the previous transformation pair~(Fig.\,\ref{fig:log-hp}cd)
\end{figure}
\FloatBarrier
\subsection{Intensity invariance of species-specific feature representations}
Having established both the meaning of the feature value and the mechanism of
intensity invariance by thresholding and temporal averaging, the question
remains how this mechanism acts on a set of features $f_i(t)$ based on
different species-specific songs~(Fig.\,\ref{fig:thresh-lp_species}a). The
previous analysis was repeated with three different kernels $k_i(t)$ using a
single kernel-specific threshold value $\thr$; and the resulting average
feature values $\muf$ were plotted over
$\sca$~(Fig.\,\ref{fig:thresh-lp_species}bc). Additionally, 2D feature spaces
spanned by each pair of $f_i(t)$ were plotted to investigate the separability
of species-specific songs based on the feature representation as a function of
$\sca$~(Fig.\,\ref{fig:thresh-lp_species}de). Each species-specific combination
of $\muf$ follows a trajectory through feature space that develops with $\sca$.
These trajectories correspond to the transient regime between the constant
(noise) regime and the saturation regime, which are only visible as the start
and end points of the trajectories, respectively. The horizontal dashes in the
colorbars indicate the range of $\sca$ that corresponds to the transient regime
across $f_i(t)$ for each species.
In the noiseless case, each $\muf$ is 0 for small $\sca$ across all
species~(Fig.\,\ref{fig:thresh-lp_species}b) because $c_i(t)$ never exceeds
$\thr$. Accordingly, each trajectory starts at the origin of the feature
space~(Fig.\,\ref{fig:thresh-lp_species}d). For larger $\sca$, all $\muf$
saturate at individual values whose combination differs between species, so
that the songs of each species are eventually represented by distinct points in
feature space. However, the species-specific trajectories cross each other at
numerous points, which means that the songs of two species --- each at a
specific $\sca$ --- can result in the same combination of $\muf$. Furthermore,
the specific value of $\sca$ at which $\muf$ saturates depends on $f_i(t)$ and
the species: For \textit{C. mollis}, all $\muf$ saturate around the same
$\sca$, while \textit{O. rufipes} exhibits considerable variation between the
three $f_i(t)$. The larger the variation in saturation points between $f_i(t)$,
the stronger the curvature of the trajectory through feature space.
In the noisy case, $\muf$ is non-zero even for the smallest
$\sca$~(Fig.\,\ref{fig:thresh-lp_species}c) because the addition of the noise
component $\noc(t)$ to input $\adapt(t)$ drives $c_i(t)$ above $\thr$
regardless of the song component $\soc(t)$. The starting value of $\muf$ is the
same across all $f_i(t)$ and species by construction of the specific $\thr$. In
consequence, the trajectories through feature space do not start at the origin
but rather at approximately the same point along the
diagonal~(Fig.\,\ref{fig:thresh-lp_species}e). For larger $\sca$, all $\muf$
saturate at the same values as in the noiseless case, as expected from the
previous analysis~(Fig.\,\ref{fig:thresh-lp_single}e). However, the
trajectories now move a much shorter distance through feature space for a
similar range of $\sca$ due to the lower SNR of $f_i(t)$ between noise regime
and saturation regime, which increases the likelihood of trajectories crossing
each other. Finally, the values of $\sca$ at which $\muf$ saturate for a given
species are slightly higher in the noisy case, but the variation between
$f_i(t)$ remains largely unchanged.
In summary, even a comparably small set of three features $f_i(t)$ can, in
principle, represent different species-specific songs at distinct points in
feature space, regardless of the presence of noise. However, this only holds
for sufficiently large $\sca$ that allow $f_i(t)$ to reach a saturation regime.
During the transient regime, the species-specific combination of $\muf$ can
very well be the same for two or more different species at specific $\sca$,
although this may be alleviated by the inclusion of additional $f_i(t)$.
Overall, the results of this analysis suggest that $\thr$ should rather be
chosen in favor of a higher SNR ($\thr$ just above pure-noise $c_i(t)$) than a
lower saturation point ($\thr\to0$): first, because this reduces the density of
trajectories through feature space, and second, because the capping of
$\adapt(t)$ by the previous transformation pair likely renders the saturation
point of $f_i(t)$ less relevant.
\begin{figure}[!ht]
\centering
\includegraphics[width=\textwidth]{figures/fig_invariance_thresh_lp_species.pdf}
@@ -968,28 +1041,81 @@ around 10\,dB by the previous transformation pair~(Fig.\,\ref{fig:log-hp}cd)
\label{fig:thresh-lp_species}
\end{figure}
\FloatBarrier
% \caption{\textbf{Rectification and lowpass filtering improves SNR
% but does not contribute to intensity invariance.}
% Input $\raw(t)$ consists of song component $\soc(t)$ scaled by
% $\sca$ with optional noise component $\noc(t)$ and is
% successively transformed into tympanal signal $\filt(t)$ and
% envelope $\env(t)$. Different line styles indicate different
% cutoff frequencies $\fc$ of the lowpass filter extracting
% $\env(t)$.
% \textbf{Top}:~Example representations of $\filt(t)$ and
% $\env(t)$ for different $\sca$.
% \textbf{a}:~Noiseless case.
% \textbf{b}:~Noisy case.
% \textbf{Bottom}:~Intensity metrics over a range of $\sca$.
% \textbf{c}:~Noiseless case: Standard deviations of $\filt(t)$
% and $\env(t)$.
% \textbf{d}:~Noisy case: Ratios of standard deviations of
% $\filt(t)$ and $\env(t)$ to the respective reference standard
% deviation for input $\raw(t)=\noc(t)$.
% \textbf{e}:~Ratios of standard deviations of $\env(t)$ as in
% \textbf{b} for different species (averaged over songs and
% recordings, see appendix Fig.\,\ref{fig:app_rect-lp}).
% }
\subsection{Intensity invariance along the full model pathway}
Through the previous analyses, we could establish two mechanisms of intensity
invariance: Logarithmic compression and adaptation as well as thresholding and
temporal averaging. While each transformation pair by itself can provide some
level of invariance, certain results suggest that the first mechanism may
actually limit or even nullify the effect of the second mechanism. In the
following sections, we investigate the combined effect of both mechanisms along
the full model pathway~(Fig.\,\ref{fig:pipeline_full}) and explore the
consequences of disabling the first mechanism by skipping the logarithmic
compression step~(Fig.\,\ref{fig:pipeline_short}).
\subsubsection{Including logarithmic compression}
For this analysis, input $\raw(t)$ --- including both song component $\soc(t)$
and noise component $\noc(t)$ --- was rescaled and processed throughout all
steps of the model pathway~(Fig.\,\ref{fig:pipeline_full}a) up to the feature
set $f_i(t)$. As before, the standard deviation was used as intensity metric
for each resulting representation except $b_i(t)$ and $f_i(t)$. For $f_i(t)$,
the average feature value $\muf$ was used, while $b_i(t)$ was omitted from the
analysis. Plotting each intensity metric over
$\sca$~(Fig.\,\ref{fig:pipeline_full}b) reinforces many of the previous
observations. For ease of visualization, the kernel-specific curves for
$c_i(t)$ and $f_i(t)$ were summarized by their median. Representations prior to
logarithmic compression --- $\filt(t)$ and $\env(t)$ --- show a linear increase
of the intensity metric for larger $\sca$ on a double-logarithmic scale.
Representations after logarithmic compression --- $\db(t)$, $\adapt(t)$, and
$c_i(t)$ --- are the first to reach a saturation regime and do so at
approximately the same $\sca$ because they are separated only by linear
transformations. Feature set $f_i(t)$ reaches a saturation regime, as well. But
contrary to previous results, the saturation point of $f_i(t)$ appears below
that of $c_i(t)$, which suggests that the second mechanism of thresholding and
temporal averaging can indeed improve intensity invariance beyond the first
mechanism of logarithmic compression and adaptation. The difference in
saturation points is best illustrated based on the ratio of each intensity
metric to the respective pure-noise reference
value~(Fig.\,\ref{fig:pipeline_full}d). However, compressing $f_i(t)$ into a
median across $k_i(t)$ conceals many kernel-specific details. It is therefore
necessary to consider the development of each $f_i(t)$ over $\sca$
separately~(Fig.\,\ref{fig:pipeline_full}c). Indeed, all 40 $f_i(t)$ in the set
reach a saturation regime for sufficiently large $\sca$. The saturated $\muf$
are distributed over a range of values --- which is the prerequisite for
forming species-specific combinations --- but are limited to a rather small
subset of possible values between 0 and 1. Based on previous
results~(Fig.\,\ref{fig:thresh-lp_single}f), this is likely due to the capping
of $\adapt(t)$ that prevents $f_i(t)$ from reaching its intrinsic saturation
value; but this cannot be confirmed until the following
analysis~(Fig.\,\ref{fig:pipeline_short}). Looking at the kernel-specific SNR
values of $c_i(t)$ over $\sca$~(Fig.\,\ref{fig:pipeline_full}e) and $f_i(t)$
over $\sca$~(Fig.\,\ref{fig:pipeline_full}f) reveals a high degree of variation
between different $k_i(t)$. Certain $f_i(t)$ achieve much higher SNR values
than $c_i(t)$ for the same $\sca$ due to the former's capacity for arbitrarily
low pure-noise responses ($\muf\to0$) and hence arbitrarily high SNR values.
Finally, the question remains whether the suspected improvement of intensity
invariance by $f_i(t)$ beyond $c_i(t)$ holds at the level of individual
$k_i(t)$. The single saturation points based on the median across $k_i(t)$ for
$c_i(t)$ and $f_i(t)$ are expanded into distributions of kernel-specific
saturation points~(Fig.\,\ref{fig:pipeline_full}g). For $c_i(t)$, the
distribution is rather narrow and corresponds well to the single saturation
point based on the median. For $f_i(t)$, however, the distribution is much
broader and is not centered around the single saturation point based on the
median but rather shifted towards lower $\sca$. Care must be taken when
interpreting the height of either distribution due to the logarithmic scaling
of the underlying $\sca$ axis. Nevertheless, the overall pattern suggests that
specific $f_i(t)$ can reach a saturation regime at lower $\sca$ than their
$c_i(t)$ counterparts. Therefore, the effect of thresholding and temporal
averaging on intensity invariance is not necessarily nullified by the previous
logarithmic compression and adaptation, which means that both mechanisms can,
in principle, work together towards an intensity-invariant song representation.
% Or does one simply overwrite the other? Can there even be a higher intensity
% invariance based on the sum of both effects? Or does one simply kick in for
% lower scales than the other and thus dictates the overall intensity
% invariance? Whatever, discussion material.
\begin{figure}[!ht]
\centering
\includegraphics[width=\textwidth]{figures/fig_invariance_full_Omocestus_rufipes.pdf}
@@ -1028,6 +1154,50 @@ around 10\,dB by the previous transformation pair~(Fig.\,\ref{fig:log-hp}cd)
\end{figure}
\FloatBarrier
\subsubsection{Excluding logarithmic compression}
The previous analysis was repeated in exactly the same way, except
that the logarithmic compression of $\env(t)$, Eq.\,\ref{eq:log}, was skipped
in order to disable the first mechanism of intensity invariance. Consequently,
$\adapt(t)$ is merely a highpass filtered version of $\env(t)$; and $\db(t)$ is
missing entirely~(Fig.\,\ref{fig:pipeline_short}a). As expected, all
representations prior to the thresholding nonlinearity $\nl$ --- $\filt(t)$,
$\env(t)$, $\adapt(t)$, and $c_i(t)$ --- show a linear increase of the
intensity metric for larger $\sca$, while $f_i(t)$ is the only representation
to reach a saturation regime~(Fig.\,\ref{fig:pipeline_short}bd). The
saturated $\muf$ are distributed over a much broader range of values than in
the previous analysis~(Fig.\,\ref{fig:pipeline_short}c). Intriguingly, the
distribution of $\muf$ is symmetric around a value of 0.5. This is relevant
because every kernel $k^+(t)$ in the underlying kernel set has a counterpart of
opposite sign that is otherwise identical, so that $k^+(t)=-k^-(t)$. The
responses of $k^+(t)$ and $k^-(t)$ to the same input $\adapt(t)$ are also
inverted because convolution is a linear operation: $c^+(t)=-c^-(t)$. The
distributions of $c^+(t)$ and $c^-(t)$ are hence inverted to each other, as
well: $p(c^+)=p(-c^-)$. Based on Eq.\,\ref{eq:feat_prop}, transforming $c^+(t)$
and $c^-(t)$ further using the same $\Theta$ thus results in two complementary
features $f^+(t)$ and $f^-(t)$ that are symmetric around 0.5, so that
$f^+(t)=1-f^-(t)$. Of course, this symmetry throughout the feature
representation goes hand in hand with a substantial degree of redundancy and is
hardly expected to be present in the actual grasshopper auditory system. But
the fact that the saturated $\muf$ are distributed symmetrically around 0.5
provides concrete evidence that each $f_i(t)$ is able to reach its intrinsic
saturation value in the absence of logarithmic
compression~(Fig.\,\ref{fig:pipeline_short}c), which is otherwise prevented by
the capping of $\adapt(t)$, as seen during previous
analyses~(Fig.\,\ref{fig:thresh-lp_single}f and
Fig.\,\ref{fig:pipeline_full}c). Otherwise, there appear to be no major
differences in the development of $f_i(t)$ over $\sca$ compared to the previous
analysis, neither on the kernel-specific SNR
values~(Fig.\,\ref{fig:pipeline_short}e) nor on the distribution of
kernel-specific saturation points~(Fig.\,\ref{fig:pipeline_short}f). Overall,
the most substantial consequence of skipping the logarithmic compression is
that it allows $f_i(t)$ to reach its intrinsic saturation value. If this
results in a wider range of $\muf$ across the feature set, it should be
beneficial for forming species-specific combinations. However, this depends on
multiple different factors such as the choice of $k_i(t)$ and $\thr$ as well as
the structure and distribution of the specific song and is hence not
guaranteed simply by disabling logarithmic compression.
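The symmetry around 0.5 can be made explicit with a short sketch, assuming that
the average feature value equals the fraction of time for which the kernel
response exceeds $\Theta$, as in Eq.\,\ref{eq:pdf_split}. The symbols
$\mu_{f^+}$, $\mu_{f^-}$, and $p^{+}(c)$ (the probability density of $c^+(t)$)
are introduced here for illustration only. Because $c^-(t)=-c^+(t)$, the
condition $c^-(t)>\Theta$ is equivalent to $c^+(t)<-\Theta$, so that
\[
	\mu_{f^+}\,=\,\int_{\Theta}^{+\infty} p^{+}(c)\,dc, \qquad
	\mu_{f^-}\,=\,\int_{-\infty}^{-\Theta} p^{+}(c)\,dc .
\]
For $\Theta\geq0$, the two averages therefore sum to
\[
	\mu_{f^+}\,+\,\mu_{f^-}\,=\,1\,-\,\int_{-\Theta}^{\Theta} p^{+}(c)\,dc .
\]
Under these assumptions, the relation $\mu_{f^+}=1-\mu_{f^-}$ is exact for
$\Theta=0$ and holds approximately whenever $c^+(t)$ spends a negligible
fraction of time within $[-\Theta,\,\Theta]$, which is expected for large
$\sca$ because $c_i(t)$ grows linearly with $\sca$ in the absence of
logarithmic compression.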
\begin{figure}[!ht]
\centering
\includegraphics[width=\textwidth]{figures/fig_invariance_short_Omocestus_rufipes.pdf}
@@ -1065,6 +1235,61 @@ around 10\,dB by the previous transformation pair~(Fig.\,\ref{fig:log-hp}cd)
\end{figure}
\FloatBarrier
\subsubsection{Field data}
\begin{figure}[!ht]
\centering
\includegraphics[width=\textwidth]{figures/fig_invariance_field.pdf}
\caption{\textbf{Step-wise emergence of intensity-invariant song
representation along the model pathway.}
}
\label{fig:pipeline_field}
\end{figure}
\FloatBarrier
\subsection{Interspecific and intraspecific feature variability}
In the final analysis of the current study, we investigated the variability of
songs in the feature representation between different species and within the
same species~(Fig.\,\ref{fig:feat_cross_species}). Naturally, a feature
representation that is both consistent across different songs of the same
species and sufficiently different between songs of different species is a
fundamental prerequisite for species-specific song recognition. The data used
in this analysis corresponds to the saturated $\muf$ of each $f_i(t)$ from the
previous analysis of the full model pathway~(Fig.\,\ref{fig:pipeline_full}c),
using different songs of \textit{O. rufipes} for the intraspecific comparisons
and single songs from a number of species for the interspecific comparisons
(also shown in Fig.\,\ref{fig:thresh-lp_species}a). Accordingly, each song is
represented by 40 values of $\muf$ based on the same set of $f_i(t)$. For each
comparison, $\muf$ from one song was plotted against $\muf$ from the other
song, so that each dot within a subplot corresponds to a single feature
$f_i(t)$. For the intraspecific
comparisons~(Fig.\,\ref{fig:feat_cross_species}, upper triangular), the pairs
of $\muf$ are distributed closely around the diagonal, with a minimum
correlation coefficient of $\rho=0.85$, a maximum of $\rho=0.99$, and a median
of $\rho=0.92$. A given $f_i(t)$ thus tends to have a similar $\muf$ across
different songs of the same species. In contrast, the pairs of $\muf$ for the
interspecific comparisons~(Fig.\,\ref{fig:feat_cross_species}, lower
triangular) are distributed in a variety of different ways, most in broader
clouds (e.g. \textit{C. biguttulus} vs. \textit{C. mollis}) but some more
narrowly around the diagonal (e.g. \textit{P. parallelus} vs. \textit{C.
dispar}). The correlation coefficients $\rho$ vary widely between different
interspecific comparisons, with a minimum of $\rho=-0.1$, a maximum of
$\rho=0.92$, and a median of $\rho=0.53$. A given $f_i(t)$ therefore tends to
have a less similar $\muf$ across different species than within the same
species, although certain exceptions exist~(Fig.\,\ref{fig:feat_cross_species},
lower right). Accordingly, the feature representation that is generated by the
model pathway is, in principle, suitable for the distinction between different
species-specific songs. However, even the songs of the same species are subject
to considerable variability in various aspects and depending on a multitude of
external and internal factors, which cannot be fully captured based on a
limited number of songs. The results of the current analysis are hence to be
treated as a proof-of-concept that paves the way towards more comprehensive
investigations on the details of song representation in feature space,
including the effects of different parameters of the model pathway as well as
the inclusion of additional songs and species to reflect the complexity of
natural song variation.
\begin{figure}[!ht]
\centering
\includegraphics[width=\textwidth]{figures/fig_features_cross_species.pdf}
@@ -1086,7 +1311,7 @@ around 10\,dB by the previous transformation pair~(Fig.\,\ref{fig:log-hp}cd)
\textbf{Upper triangular}:~Intraspecific comparisons
between different songs of a single species (\textit{O.
rufipes}).
\textbf{Lower left}:~Distribution of correlation
\textbf{Lower right}:~Distribution of correlation
coefficients $\rho$ for each interspecific and
intraspecific comparison. Dots indicate single $\rho$
values.
@@ -1095,16 +1320,6 @@ around 10\,dB by the previous transformation pair~(Fig.\,\ref{fig:log-hp}cd)
\end{figure}
\FloatBarrier
\begin{figure}[!ht]
\centering
\includegraphics[width=\textwidth]{figures/fig_invariance_field.pdf}
\caption{\textbf{Step-wise emergence of intensity invariant song
representation along the model pathway.}
}
\label{fig:pipeline_field}
\end{figure}
\FloatBarrier
\section{Conclusions \& outlook}
\textbf{Song recognition pathway: Grasshopper vs. model:}\\