Wrote results rect-lp and log-hp :)

Finished some more figure captions.
j-hartling
2026-05-04 19:50:04 +02:00
parent 69f172ff2c
commit 16014c02a0
15 changed files with 1376 additions and 1232 deletions

336
main.tex

@@ -33,7 +33,7 @@
%\bibstyle
%\citation
\title{Emergent intensity invariance in a physiologically inspired model of the grasshopper auditory system}
\title{Emergent intensity invariance vs. signal-to-noise ratio at three consecutive processing stages along the grasshopper song recognition pathway}
\author{Jona Hartling, Jan Benda}
\date{}
@@ -403,7 +403,7 @@ pathway, logarithmic compression is achieved by conversion to decibel scale
\db(t)\,=\,20\,\cdot\,\dec \frac{\env(t)}{\dbref}, \qquad \dbref\,=\,1
\label{eq:log}
\end{equation}
relative to the maximum intensity $\dbref$ of the signal envelope $\env(t)$.
relative to the common reference intensity $\dbref$.
Both the receptor neurons~(\bcite{romer1976informationsverarbeitung};
\bcite{gollisch2004input}; \bcite{fisch2012channel}) and, on a larger scale,
the subsequent local interneurons~(\bcite{hildebrandt2009origin};
@@ -555,7 +555,7 @@ can be read out by a simple linear classifier.
\end{figure}
\FloatBarrier
\section{Two mechanisms driving the emergence of intensity-invariant song representation}
\section{Mechanisms driving the emergence of\\intensity-invariant song representation}
% Still missing the SNR analysis. Should be able to write around it for now.
The robustness of song recognition is tied to the degree of intensity
@@ -573,6 +573,54 @@ specific operations involved, as outlined in the following sections.
\subsection{Full-wave rectification \& lowpass filtering}
The first nonlinear transformation along the model pathway is the full-wave
rectification of the tympanal signal $\filt(t)$ during the extraction of the
signal envelope (Eq.\,\ref{eq:env}). Rectification transforms the distribution
of $\filt(t)$ from an approximately zero-centered distribution with both
positive and negative values into a strictly non-negative distribution. Signal
envelope $\env(t)$ is then obtained by lowpass filtering the rectified
$\filt(t)$. The effects of this transformation pair on SNR and potential
intensity invariance were analyzed by rescaling and processing the input signal
$\raw(t)$ and comparing standard deviations between the resulting $\filt(t)$
and $\env(t)$, once for the noiseless case~(Fig.\,\ref{fig:rect-lp}a) and once
for the noisy case~(Fig.\,\ref{fig:rect-lp}b). In addition, the cutoff
frequency $\fc$ of the lowpass filter was varied to investigate the influence
of different filter bandwidths. In the noiseless case, the standard deviations
of $\filt(t)$ and $\env(t)$ are each reduced compared to the input $\raw(t)$ by
a multiplicative factor. These factors are constant across all $\sca$, which
results in a downward shift of the respective curve on a double-logarithmic
scale, away from the diagonal~(Fig.\,\ref{fig:rect-lp}c). For $\filt(t)$, the
reduction is a consequence of the bandpass filtering~(Eq.\,\ref{eq:bandpass})
of $\raw(t)$. For $\env(t)$, the standard deviation is further reduced compared
to $\filt(t)$. Rectification contributes much less to this reduction than
lowpass filtering. The degree of reduction by lowpass filtering depends on the
cutoff frequency $\fc$, with lower $\fc$ (narrow bandwidth) resulting in a
stronger reduction. In the noisy case, the standard deviations of $\filt(t)$
and $\env(t)$ can be related to the respective pure-noise reference standard
deviation~(Fig.\,\ref{fig:rect-lp}d). This causes each curve to start with a
constant regime of SNR values near 1 for smaller $\sca$, which reflects the
dominance of the noise component $\noc(t)$ over the song component $\soc(t)$ in
the input $\raw(t)$. For larger $\sca$, all curves transition into a regime of
linearly increasing SNR on a double-logarithmic scale. For $\filt(t)$, the
linear part of the curve deviates only slightly from the diagonal. For
$\env(t)$, however, the transition occurs at lower $\sca$ compared to
$\filt(t)$, and the linear part of the curve is shifted leftward away from the
diagonal, which means that higher SNR values are achieved for the same $\sca$.
This effect is more pronounced for lower $\fc$ of the lowpass filter and is
presumably caused by the attenuation of high-frequency components in the
signal, which are more prominent in the noise component $\noc(t)$ than in the
song component $\soc(t)$. The effect also appears relatively consistent across
different species, although small variations based on different song structures
and distributions exist~(Fig.\,\ref{fig:rect-lp}e). In summary, the standard
deviation of $\env(t)$ has never been observed to transition into a saturation
regime for larger $\sca$ but rather continues to increase proportionally to
$\sca$ for all tested $\fc$, in both the noiseless and the noisy case and
across different species. Consequently, the combination of rectification and
lowpass filtering does not contribute to intensity invariance. However, this
transformation pair does improve the SNR of $\env(t)$ relative to $\filt(t)$
and thus provides subsequent processing stages with a more robust input
representation and higher input SNR.
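The proportional scaling described above can be motivated by a short
plausibility argument. This is only a sketch under idealizing assumptions: the
bandpass and lowpass filters are taken to be linear and time-invariant, and
song and noise are taken to be uncorrelated. The symbols $x(t)$,
$h_{\text{lp}}(t)$, $\sigma_{s}$, and $\sigma_{\eta}$ are introduced here for
illustration only. Both rectification and linear filtering are positively
homogeneous, so that for any signal $x(t)$ and any $\sca>0$
\begin{equation*}
	h_{\text{lp}}(t)\,*\,\left|\,\sca\,\cdot\,x(t)\,\right|\,=\,\sca\,\cdot\,\left(h_{\text{lp}}(t)\,*\,\left|\,x(t)\,\right|\right)
\end{equation*}
where $h_{\text{lp}}(t)$ denotes the impulse response of the lowpass filter and
$*$ denotes convolution. Under these assumptions, the standard deviation of
$\env(t)$ in the noiseless case is proportional to $\sca$ for every $\fc$, and
no saturation regime can arise. In the noisy case, if $\sigma_{s}$ and
$\sigma_{\eta}$ denote the standard deviations of the bandpass-filtered song
and noise components, the ratio of the standard deviation of $\filt(t)$ to its
pure-noise reference would be
\begin{equation*}
	\frac{\sqrt{\sca^{2}\,\cdot\,\sigma_{s}^{2}\,+\,\sigma_{\eta}^{2}}}{\sigma_{\eta}}\,=\,\sqrt{1\,+\,\sca^{2}\,\cdot\,\frac{\sigma_{s}^{2}}{\sigma_{\eta}^{2}}}
\end{equation*}
which approaches 1 for small $\sca$ and grows proportionally to $\sca$ for
large $\sca$, in line with the two regimes described above.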
\begin{figure}[!ht]
\centering
\includegraphics[width=\textwidth]{figures/fig_invariance_rect_lp.pdf}
@@ -605,73 +653,113 @@ specific operations involved, as outlined in the following sections.
\subsection{Logarithmic compression \& spike-frequency adaptation}
The first notable emergence of intensity invariance along the model pathway
occurs during the transformation of the signal envelope $\env(t)$ into the
logarithmically scaled envelope $\db(t)$ and then into the intensity-adapted
envelope $\adapt(t)$. In order to disentangle the interplay of logarithmic
compression and adaptation, $\env(t)$ can be rewritten as a synthetic mixture
The second nonlinear transformation along the model pathway is the logarithmic
compression of the signal envelope $\env(t)$ into $\db(t)$, Eq.\,\ref{eq:log},
which is then followed by the highpass filtering of $\db(t)$,
Eq.\,\ref{eq:highpass}, to obtain the intensity-adapted envelope $\adapt(t)$.
The interplay of this transformation pair was analyzed by rescaling and
processing the input signal $\filt(t)$ and comparing standard deviations
between the resulting $\env(t)$, $\db(t)$, and $\adapt(t)$. It is necessary to
use $\filt(t)$ as input for this analysis instead of $\env(t)$, because
$\env(t)$ results from a nonlinear transformation and hence cannot be
synthesized as an additive mixture of song component $\soc(t)$ and noise
component $\noc(t)$. % <-- Sentence may be methods section material.
However, it is much easier to conceive a mathematical description of the
effects of logarithmic compression and adaptation if $\env(t)$ itself is
assumed to be composed of $\soc(t)$ and $\noc(t)$. In the noiseless
case~(Fig.\,\ref{fig:log-hp}a), $\env(t)$ takes the form of
\begin{equation}
\env(t)\,=\,\sca\,\cdot\,\soc(t), \qquad \env(t)\,>\,0\enspace\forall\enspace t\,\in\,\mathbb{R}
\label{eq:toy_env_pure}
\end{equation}
The standard deviation of $\env(t)$ increases linearly with $\sca$ on a
double-logarithmic scale and is slightly reduced~(Fig.\,\ref{fig:log-hp}c)
compared to the input $\filt(t)$, which is consistent with the results of the
previous analysis~(Fig.\,\ref{fig:rect-lp}c). By conversion of $\env(t)$ to
decibel scale, $\sca$ turns from a multiplicative scale in linear space into an
additive term, or offset, in logarithmic space:
\begin{equation}
\db(t)\,=\,20\,\cdot\,\dec \left[\,\sca\,\cdot\,s(t)\,\right]\,=\,20\,\cdot\,\left[\dec \sca\,+\,\dec s(t)\right], \qquad \sca\,>\,0
\label{eq:toy_log_pure}
\end{equation}
The highpass filtering of $\db(t)$ can be approximated as a subtraction of the
local signal offset within a suitable time interval $0 \ll \thp <
\frac{1}{\fc}$:
\begin{equation}
\begin{split}
\adapt(t)\,\approx\,\db(t)\,-\,20\,\cdot\,\dec \sca\,=\,20\,\cdot\,\dec s(t)
\end{split}
\label{eq:toy_highpass_pure}
\end{equation}
This eliminates $\sca$ from $\adapt(t)$ and thus renders it perfectly
intensity-invariant, with a constant standard deviation of around 10\,dB across
all $\sca>0$~(Fig.\,\ref{fig:log-hp}c). In contrast, in the noisy
case~(Fig.\,\ref{fig:log-hp}b), $\env(t)$ takes the form of
\begin{equation}
\env(t)\,=\,\sca\,\cdot\,\soc(t)\,+\,\noc(t), \qquad \env(t)\,>\,0\enspace\forall\enspace t\,\in\,\mathbb{R}
\label{eq:toy_env}
\label{eq:toy_env_noise}
\end{equation}
of a song component $\soc(t)$ with variable multiplicative scale $\sca\geq0$
and a fixed-scale noise component $\noc(t)$. Both $\soc(t)$ and $\noc(t)$ are
assumed to have unit variance. By conversion of $\env(t)$ to decibel
scale~(Eq.\,\ref{eq:log}), $\sca$ turns from a multiplicative scale in linear
space into an additive term, or offset, in logarithmic space
Similar to the previous analysis~(Fig.\,\ref{fig:rect-lp}d), the ratio of the
standard deviation of $\env(t)$ to its pure-noise reference standard deviation
on a double-logarithmic scale follows a constant regime for small $\sca$ and a
linearly increasing regime for larger $\sca$~(Fig.\,\ref{fig:log-hp}d). Decibel
conversion of $\env(t)$
% \begin{equation}
% \begin{split}
% \db(t)\,&=\,\dec \frac{\alpha\,\cdot\,s(t)\,+\,\eta(t)}{\dbref}\\
% &=\,\dec \frac{\alpha}{\dbref}\,+\,\dec \left[s(t)\,+\,\frac{\eta(t)}{\alpha}\right], \qquad \sca\,>\,0
% \db(t)\,&=\,20\,\cdot\,\dec \left[\,\sca\,\cdot\,s(t)\,+\,\eta(t)\,\right]\\
% &=\,20\,\cdot\,\left(\dec \sca\,+\,\dec \left[s(t)\,+\,\frac{\eta(t)}{\sca}\right]\right), \qquad \sca\,>\,0
% \end{split}
% \label{eq:toy_log}
% \label{eq:toy_log_noise}
% \end{equation}
\begin{equation}
\begin{split}
\db(t)\,&=\,20\,\cdot\,\dec \left[\,\sca\,\cdot\,s(t)\,+\,\eta(t)\,\right]\\
&=\,20\,\cdot\,\left(\dec \sca\,+\,\dec \left[s(t)\,+\,\frac{\eta(t)}{\sca}\right]\right), \qquad \sca\,>\,0
\end{split}
\label{eq:toy_log}
\db(t)\,=\,20\,\cdot\,\left(\dec \sca\,+\,\dec \left[s(t)\,+\,\frac{\eta(t)}{\sca}\right]\right), \qquad \sca\,>\,0
\label{eq:toy_log_noise}
\end{equation}
allows for the separation of $\sca$ from $\soc(t)$ but introduces a scaling of
$\noc(t)$ by the inverse of $\sca$, which remains present even after the offset
subtraction:
\begin{equation}
\begin{split}
\adapt(t)\,\approx\,20\,\cdot\,\dec\left[s(t)\,+\,\frac{\eta(t)}{\sca}\right]
\end{split}
\label{eq:toy_highpass_noise}
\end{equation}
which allows for its separation from $\soc(t)$ but introduces a scaling of
$\noc(t)$ by the inverse of $\sca$. The subsequent
highpass filtering~(Eq.\,\ref{eq:highpass}) of $\db(t)$ can then be
approximated as a subtraction of the local offset within a suitable time
interval $0 \ll \thp < \frac{1}{\fc}$:
% \begin{equation}
% \begin{split}
% \adapt(t)\,\approx\,\db(t)\,-\,\dec \frac{\sca}{\dbref}\,=\,\dec\left[s(t)\,+\,\frac{\eta(t)}{\sca}\right], \qquad \sca\,>\,0
% \adapt(t)\,\approx\,\db(t)\,-\,20\,\cdot\,\dec \sca\,=\,20\,\cdot\,\dec\left[s(t)\,+\,\frac{\eta(t)}{\sca}\right]
% \end{split}
% \label{eq:toy_highpass}
% \label{eq:toy_highpass_noise}
% \end{equation}
\begin{equation}
\begin{split}
\adapt(t)\,\approx\,\db(t)\,-\,20\,\cdot\,\dec \sca\,=\,20\,\cdot\,\dec\left[s(t)\,+\,\frac{\eta(t)}{\sca}\right], \qquad \sca\,>\,0
\end{split}
\label{eq:toy_highpass}
\end{equation}
This means that $\sca$ cannot be entirely eliminated from $\adapt(t)$, only
redistributed between $\soc(t)$ and $\noc(t)$. In consequence, if $\sca$ is
sufficiently large ($\sca\gg1$), $\noc(t)$ is attenuated to the point of being
negligible, so that $\adapt(t)$ represents $\soc(t)$ in a scale-free manner. If
$\soc(t)$ and $\noc(t)$ are at similar scales ($\sca\approx1$), $\adapt(t)$
largely resembles $\db(t)$. However, if $\sca$ is sufficiently small
($\sca\ll1$), $\noc(t)$ masks $\soc(t)$ even after the intensity adaptation.
Therefore, the effective intensity invariance of $\adapt(t)$ relative to
$\env(t)$ is limited by the initial scaling of $\soc(t)$ relative to $\noc(t)$;
that is, the signal-to-noise ratio (SNR) of $\env(t)$ with ($\sca>0$) and
without ($\sca=0$) song component $\soc(t)$
\begin{equation}
\text{SNR}(\sca)\,=\,\frac{\xvar}{\nvar}\,=\,\frac{\alpha^{2}\,\cdot\,\svar\,+\,\nvar}{\nvar}\,=\,\alpha^{2}\,+\,1, \qquad \svar\,=\,\nvar\,=\,1
\label{eq:toy_snr}
\end{equation}
which depends quadratically on $\sca$ if $\soc(t)\perp\noc(t)$. Overall, the
combination of logarithmic compression and adaptation allows for the
equalization of different sufficiently large song scales, which is essential
for intensity-invariant song representation. However, this mechanism is unable
to recover songs that have already sunk below the noise floor, which
emphasizes the importance of a sufficiently high SNR at the initial reception of
the signal for reliable song recognition.
This means that, in the noisy case, $\sca$ cannot be entirely eliminated from
$\adapt(t)$, only redistributed between $\soc(t)$ and $\noc(t)$. If $\sca$ is
sufficiently large ($\sca\gg1$, saturation regime), $\noc(t)$ is attenuated to
the point of being negligible, so that $\adapt(t)$ is a scale-free
representation of $\soc(t)$. If $\sca\,\cdot\,\soc(t)$ and $\noc(t)$ are at similar scales
($\sca\approx1$, transient regime), $\adapt(t)$ largely resembles $\db(t)$.
Finally, if $\sca$ is sufficiently small ($0<\sca\ll1$, noise regime),
$\noc(t)$ masks $\soc(t)$ even after the intensity adaptation. Accordingly, the
effective intensity invariance of $\adapt(t)$ through logarithmic compression
and adaptation is limited by the SNR of $\env(t)$: Songs that have already
sunk into the noise floor at the level of $\env(t)$ cannot be recovered by
subsequent processing steps, which emphasizes the importance of the SNR
improvement by rectification and lowpass filtering during the previous
processing step~(Fig.\,\ref{fig:rect-lp}d). The general pattern of noise
regime, transient regime, and saturation regime remains consistent across
different species~(Fig.\,\ref{fig:log-hp}e). However, the specific value of
$\sca$ at which the saturation regime is reached (see appendix
Fig.\,\ref{fig:app_log-hp_saturation}) and the maximum SNR value of $\adapt(t)$
within the saturation regime vary considerably between and within species. For
example, \textit{C. biguttulus} and \textit{C. mollis} display a noticeably
lower maximum SNR of $\adapt(t)$ compared to other species. These differences
are not to be underestimated, since the SNR of $\adapt(t)$ within the
saturation regime determines the maximum input SNR for subsequent processing
steps. In other words, the fact that $\adapt(t)$ eventually reaches a
saturation regime is desirable in the context of intensity invariance, but it
also means forgoing the higher SNR values that are achieved by $\env(t)$ for
the same $\sca$ (up to several orders of magnitude,
Fig.\,\ref{fig:log-hp}d). This trade-off between intensity invariance and SNR,
together with its consequences further downstream along the pathway, is
addressed in the following sections.
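The approach to the saturation regime can be made explicit with a first-order
expansion of Eq.\,\ref{eq:toy_highpass_noise}. This is a sketch that assumes
$\soc(t)>0$ and $|\noc(t)|\ll\sca\,\cdot\,\soc(t)$, with $\ln$ denoting the
natural logarithm:
\begin{equation*}
	\begin{split}
		20\,\cdot\,\dec\left[s(t)\,+\,\frac{\eta(t)}{\sca}\right]\,&=\,20\,\cdot\,\dec s(t)\,+\,20\,\cdot\,\dec\left[1\,+\,\frac{\eta(t)}{\sca\,\cdot\,s(t)}\right]\\
		&\approx\,20\,\cdot\,\dec s(t)\,+\,\frac{20}{\ln 10}\,\cdot\,\frac{\eta(t)}{\sca\,\cdot\,s(t)}
	\end{split}
\end{equation*}
Under these assumptions, the noise-dependent term decays in proportion to
$1/\sca$, whereas the song-dependent term does not depend on $\sca$. This is
consistent with the saturation of $\adapt(t)$ for $\sca\gg1$.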
\begin{figure}[!ht]
\centering
@@ -744,30 +832,6 @@ the signal for reliable song recognition.
\end{figure}
\FloatBarrier
% \caption{\textbf{Rectification and lowpass filtering improves SNR
% but does not contribute to intensity invariance.}
% Input $\raw(t)$ consists of song component $\soc(t)$ scaled by
% $\sca$ with optional noise component $\noc(t)$ and is
% successively transformed into tympanal signal $\filt(t)$ and
% envelope $\env(t)$. Different line styles indicate different
% cutoff frequencies $\fc$ of the lowpass filter extracting
% $\env(t)$.
% \textbf{Top}:~Example representations of $\filt(t)$ and
% $\env(t)$ for different $\sca$.
% \textbf{a}:~Noiseless case.
% \textbf{b}:~Noisy case.
% \textbf{Bottom}:~Intensity metrics over a range of $\sca$.
% \textbf{c}:~Noiseless case: Standard deviations of $\filt(t)$
% and $\env(t)$.
% \textbf{d}:~Noisy case: Ratios of standard deviations of
% $\filt(t)$ and $\env(t)$ to the respective reference standard
% deviation for input $\raw(t)=\noc(t)$.
% \textbf{e}:~Ratios of standard deviations of $\env(t)$ as in
% \textbf{b} for different species (averaged over songs and
% recordings, see appendix Fig.\,\ref{fig:app_rect-lp}).
% }
\begin{figure}[!ht]
\centering
\includegraphics[width=\textwidth]{figures/fig_invariance_thresh_lp_species.pdf}
@@ -796,17 +860,67 @@ the signal for reliable song recognition.
curve span of the norm across all three $\mu_{f_i}$ per
species.
\textbf{d}:~Noiseless case.
\textbf{e}:~Noisy case. Shaded areas
\textbf{e}:~Noisy case. Shaded areas indicate the average
minimum $\mu_{f_i}$ across all species-specific trajectories.
}
\label{fig:thresh-lp_species}
\end{figure}
\FloatBarrier
% \caption{\textbf{Rectification and lowpass filtering improves SNR
% but does not contribute to intensity invariance.}
% Input $\raw(t)$ consists of song component $\soc(t)$ scaled by
% $\sca$ with optional noise component $\noc(t)$ and is
% successively transformed into tympanal signal $\filt(t)$ and
% envelope $\env(t)$. Different line styles indicate different
% cutoff frequencies $\fc$ of the lowpass filter extracting
% $\env(t)$.
% \textbf{Top}:~Example representations of $\filt(t)$ and
% $\env(t)$ for different $\sca$.
% \textbf{a}:~Noiseless case.
% \textbf{b}:~Noisy case.
% \textbf{Bottom}:~Intensity metrics over a range of $\sca$.
% \textbf{c}:~Noiseless case: Standard deviations of $\filt(t)$
% and $\env(t)$.
% \textbf{d}:~Noisy case: Ratios of standard deviations of
% $\filt(t)$ and $\env(t)$ to the respective reference standard
% deviation for input $\raw(t)=\noc(t)$.
% \textbf{e}:~Ratios of standard deviations of $\env(t)$ as in
% \textbf{b} for different species (averaged over songs and
% recordings, see appendix Fig.\,\ref{fig:app_rect-lp}).
% }
\begin{figure}[!ht]
\centering
\includegraphics[width=\textwidth]{figures/fig_invariance_full_Omocestus_rufipes.pdf}
\caption{\textbf{Step-wise emergence of intensity invariant song
representation along the model pathway.}
\caption{\textbf{Step-wise emergence of intensity-invariant song
representation along the full model pathway.}
Input $\raw(t)$ consists of song component $\soc(t)$
scaled by $\sca$ with added noise component $\noc(t)$ and
is processed up to the feature set $f_i(t)$. Different
color shades indicate different types of Gabor kernels
with specific lobe number $\kn$ and either $+$ or $-$
sign, sorted (dark to light) first by increasing $\kn$ and
then by sign~($1\,\leq\,\kn\,\leq\,4$; first $+$, then $-$
for each $\kn$; five kernel widths $\kw$ of 1, 2, 4, 8,
and $16\,$ms per type; 8 types, 40 kernels in total).
\textbf{a}:~Example representations of $\filt(t)$,
$\env(t)$, $\db(t)$, $\adapt(t)$, $c_i(t)$, and $f_i(t)$
for different $\sca$.
\textbf{b}:~Intensity metrics over $\sca$. For $c_i(t)$
and $f_i(t)$, the median over kernels is shown. Dots
indicate $95\,\%$ curve span for $\db(t)$, $\adapt(t)$,
$c_i(t)$, and $f_i(t)$.
\textbf{c}:~Average value $\mu_{f_i}$ of each feature
$f_i(t)$ over $\sca$.
\textbf{d}:~Ratios of intensity metrics to the respective
reference value for input $\raw(t)=\noc(t)$. For $c_i(t)$
and $f_i(t)$, the median over kernel-specific ratios is
shown.
\textbf{e}:~Ratios of standard deviation $\sigma_{c_i}$ of
each $c_i(t)$.
\textbf{f}:~Ratios of $\mu_{f_i}$.
\textbf{g}:~Distributions of kernel-specific $\sca$ that
correspond to $95\,\%$ curve span for $c_i(t)$ and
$f_i(t)$. Dots indicate the values from \textbf{b}.
}
\label{fig:pipeline_full}
\end{figure}
@@ -816,7 +930,34 @@ the signal for reliable song recognition.
\centering
\includegraphics[width=\textwidth]{figures/fig_invariance_short_Omocestus_rufipes.pdf}
\caption{\textbf{Step-wise emergence of intensity-invariant song
representation along the model pathway.}
representation along the model pathway without logarithmic
compression.}
Input $\raw(t)$ consists of song component $\soc(t)$
scaled by $\sca$ with added noise component $\noc(t)$ and
is processed up to the feature set $f_i(t)$, skipping
$\db(t)$. Different color shades indicate different types
of Gabor kernels with specific lobe number $\kn$ and
either $+$ or $-$ sign, sorted (dark to light) first by
increasing $\kn$ and then by
sign~($1\,\leq\,\kn\,\leq\,4$; first $+$, then $-$ for
each $\kn$; five kernel widths $\kw$ of 1, 2, 4, 8, and
$16\,$ms per type; 8 types, 40 kernels in total).
\textbf{a}:~Example representations of $\filt(t)$,
$\env(t)$, $\adapt(t)$, $c_i(t)$, and $f_i(t)$ for
different $\sca$.
\textbf{b}:~Intensity metrics over $\sca$. For $c_i(t)$
and $f_i(t)$, the median over kernels is shown. Dots
indicate $95\,\%$ curve span for $f_i(t)$.
\textbf{c}:~Average value $\mu_{f_i}$ of each feature
$f_i(t)$ over $\sca$.
\textbf{d}:~Ratios of intensity metrics to the respective
reference value for input $\raw(t)=\noc(t)$. For $c_i(t)$
and $f_i(t)$, the median over kernel-specific ratios is
shown.
\textbf{e}:~Ratios of $\mu_{f_i}$.
\textbf{f}:~Distribution of kernel-specific $\sca$ that
correspond to $95\,\%$ curve span for $f_i(t)$. Dots
indicate the value from \textbf{b}.
}
\label{fig:pipeline_short}
\end{figure}
@@ -825,7 +966,28 @@ the signal for reliable song recognition.
\begin{figure}[!ht]
\centering
\includegraphics[width=\textwidth]{figures/fig_features_cross_species.pdf}
\caption{\textbf{Inter- and intraspecific feature variability.}
\caption{\textbf{Interspecific and intraspecific feature variability.}
Average value $\mu_{f_i}$ of each feature $f_i(t)$ against
its counterpart from a second feature set based on a
different input $\raw(t)$. Each dot within a subplot
represents a single feature $f_i(t)$. Different color
shades indicate different types of Gabor kernels with
specific lobe number $\kn$ and either $+$ or $-$ sign,
sorted (dark to light) first by increasing $\kn$ and then
by sign~($1\,\leq\,\kn\,\leq\,4$; first $+$, then $-$ for
each $\kn$; five kernel widths $\kw$ of 1, 2, 4, 8, and
$16\,$ms per type; 8 types, 40 kernels in total). Data is
based on the analysis underlying
Fig.\,\ref{fig:pipeline_full}.
\textbf{Lower triangular}:~Interspecific comparisons
between single songs of different species.
\textbf{Upper triangular}:~Intraspecific comparisons
between different songs of a single species (\textit{O.
rufipes}).
\textbf{Lower left}:~Distribution of correlation
coefficients $\rho$ for each interspecific and
intraspecific comparison. Dots indicate single $\rho$
values.
}
\label{fig:feat_cross_species}
\end{figure}