Wrote results for pipeline_full, pipeline_short, and feat_cross_species.
365
main.tex
@@ -105,6 +105,7 @@
 \newcommand{\nvar}{\sigma_{\eta}^{2}} % Noise component variance
 \newcommand{\pc}{p(c,\,T)} % Probability density (general interval)
 \newcommand{\pclp}{p(c,\,\tlp)} % Probability density (lowpass interval)
+\newcommand{\muf}{\mu_{f_i}} % Average feature value

 \section{Exploring a grasshopper's sensory world}

@@ -312,7 +313,9 @@ within the auditory pathway.
 % - How to integrate the available knowledge on anatomy, physiology, ethology?\\
 % $\rightarrow$ Abstract, simplify, formalize $\rightarrow$ Functional model framework

-\section{Developing a functional model of the\\grasshopper song recognition pathway}
+\section{Methods}
+
+\subsection{Functional model of the grasshopper song recognition pathway}

 % Too long (no splitting, only pruning).
 The essence of constructing a functional model of a given system is to gain a
@@ -373,7 +376,7 @@ outlined in the following sections.
 \label{fig:pathway}
 \end{figure}

-\subsection{Population-driven signal preprocessing}
+\subsubsection{Population-driven signal preprocessing}

 Grasshoppers receive airborne sound waves by a tympanal organ at either side of
 the body. The tympanal membrane acts as a mechanical resonance filter for
@@ -436,7 +439,7 @@ following feature extraction stage.
 \end{figure}
 \FloatBarrier

-\subsection{Feature extraction by individual neurons}
+\subsubsection{Feature extraction by individual neurons}

 The ascending neurons extract and encode a number of different features of the
 preprocessed signal. As a population, they hence represent the signal in a
@@ -555,7 +558,11 @@ can be read out by a simple linear classifier.
 \end{figure}
 \FloatBarrier

-\section{Mechanisms driving the emergence of\\intensity-invariant song representation}
+\subsubsection{Simulation-based analysis of the model pathway}
+
+\section{Results}
+
+\subsection{Mechanisms driving the emergence of intensity invariance}

 % Still missing the SNR analysis. Should be able to write around it for now.
 The robustness of song recognition is tied to the degree of intensity
@@ -571,7 +578,7 @@ intensity variations. The two mechanisms each comprise a nonlinear signal
 transformation followed by a linear signal transformation but differ in the
 specific operations involved, as outlined in the following sections.

-\subsection{Full-wave rectification \& lowpass filtering}
+\subsubsection{Full-wave rectification \& lowpass filtering}

 The first nonlinear transformation along the model pathway is the full-wave
 rectification of the tympanal signal $\filt(t)$ during the extraction of the
@@ -651,7 +658,7 @@ more robust input representation and higher input SNR.
 \end{figure}
 \FloatBarrier

-\subsection{Logarithmic compression \& spike-frequency adaptation}
+\subsubsection{Logarithmic compression \& spike-frequency adaptation}

 The second nonlinear transformation along the model pathway is the logarithmic
 compression of the signal envelope $\env(t)$ into $\db(t)$, Eq.\,\ref{eq:log},
@@ -794,7 +801,7 @@ is a recurring phenomenon that is further addressed in the following sections.
 \end{figure}
 \FloatBarrier

-\subsection{Thresholding nonlinearity \& temporal averaging}
+\subsubsection{Thresholding nonlinearity \& temporal averaging}

 The third nonlinear transformation along the model pathway is the thresholding
 nonlinearity $\nl$ that transforms each kernel response $c_i(t)$ into a binary
@@ -809,13 +816,13 @@ rescaled~(Fig.\,\ref{fig:thresh-lp_single}a) and convolved with kernel $k(t)$.
 The resulting kernel response $c(t)$ was passed through $H(c\,-\,\Theta)$ with
 three different threshold values
 $\Theta$~(Fig.\,\ref{fig:thresh-lp_single}b-d). Each resulting binary response
-$b(t)$ was transformed into $f(t)$, whose average feature value serves as a
-measure of intensity~(Fig.\,\ref{fig:thresh-lp_single}ef). The thresholding
-nonlinearity $H(c\,-\,\Theta)$ categorizes the values of $c(t)$ into "relevant"
-($c(t)>\Theta$, $b(t)=1$) and "irrelevant" ($c(t)\leq\Theta$, $b(t)=0$)
-response values. It thereby splits the probability density $\pc$ of $c(t)$
-within some observed time interval $T$ into two complementary parts around
-$\Theta$:
+$b(t)$ was transformed into $f(t)$, whose average feature value $\mu_f$ serves
+as a measure of intensity~(Fig.\,\ref{fig:thresh-lp_single}ef). The
+thresholding nonlinearity $H(c\,-\,\Theta)$ categorizes the values of $c(t)$
+into ``relevant'' ($c(t)>\Theta$, $b(t)=1$) and ``irrelevant'' ($c(t)\leq\Theta$,
+$b(t)=0$) response values. It thereby splits the probability density $\pc$ of
+$c(t)$ within some observed time interval $T$ into two complementary parts
+around $\Theta$:
 \begin{equation}
 \int_{\Theta}^{+\infty} \pc\,dc\,=\,1\,-\,\int_{-\infty}^{\Theta} \pc\,dc\,=\,\frac{T_1}{T}, \qquad \infint \pc\,dc\,=\,1
 \label{eq:pdf_split}
@@ -856,45 +863,45 @@ points at which $c(t)$ crosses $\Theta$: The steeper the slope of $c(t)$, the
 less $T_1$ changes with variations in $\sca$. The most reliable way of
 exploiting this invariant property of $f(t)$ is to set $\Theta$ to a value near
 0, because these values are least affected by different scales of $c(t)$. For
-sufficiently large $\sca$, $f(t)$ then approaches the same constant value in
+sufficiently large $\sca$, $f(t)$ then approaches the same constant $\mu_f$ in
 both the noiseless and the noisy case~(Fig.\,\ref{fig:thresh-lp_single}e,
 saturation regime).

-The value of $f(t)$ in the saturation regime is independent of the precise
+The value of $\mu_f$ in the saturation regime is independent of the precise
 value of $\Theta$, but the value of $\sca$ at which the saturation regime is
 reached decreases as $\Theta$ approaches 0~(Fig.\,\ref{fig:thresh-lp_single}e). Therefore,
 a threshold value of $\Theta=0$ would be the optimal choice for achieving
 intensity invariance at the lowest possible $\sca$. In stark contrast, the
-closer $\Theta$ is to 0, the higher the pure-noise response of $f(t)$ and the
-lower the resulting SNR of $f(t)$ between noise regime and saturation
-regime~(Fig.\,\ref{fig:thresh-lp_single}b-d, left column, and
-Fig.\,\ref{fig:thresh-lp_single}e). It is even possible to achieve an
+closer $\Theta$ is to 0, the higher $\mu_f$ in response to the pure noise
+component $\noc(t)$ and the lower the resulting SNR of $f(t)$ between noise
+regime and saturation regime~(Fig.\,\ref{fig:thresh-lp_single}b-d, left column,
+and Fig.\,\ref{fig:thresh-lp_single}e). It is even possible to achieve an
 ``unlimited'' SNR of $f(t)$ by setting $\Theta$ above the maximum of the
-pure-noise $c(t)$, so that any value of $f(t)$ greater than 0 indicates the
-presence of the song component $\soc(t)$ in input $\adapt(t)$ at the cost of
-requiring a higher $\sca$ to reach the saturation regime. This trade-off
-between intensity invariance and SNR has already been observed during the
-previous analysis on logarithmic compression and
-adaptation~(Fig.\,\ref{fig:log-hp}d). However, the parameters that determine
-the SNR of $\adapt(t)$ are much less understood and likely relate to properties
-of the signal, whereas the SNR of $f(t)$ depends on the choice of $\Theta$ and
-can be more directly manipulated by the system.
+pure-noise $c(t)$, so that any $\mu_f>0$ indicates the presence of the song
+component $\soc(t)$ in input $\adapt(t)$ at the cost of requiring a higher
+$\sca$ to reach the saturation regime. This trade-off between intensity
+invariance and SNR has already been observed during the previous analysis on
+logarithmic compression and adaptation~(Fig.\,\ref{fig:log-hp}d). However, the
+parameters that determine the SNR of $\adapt(t)$ are much less understood and
+likely relate to properties of the signal, whereas the SNR of $f(t)$ depends on
+the choice of $\Theta$ and can be more directly manipulated by the system.

 Finally, the effects of thresholding and temporal averaging must be seen in the
 context of the previous transformation pair of logarithmic compression and
-adaptation.
-
-Finally, the question remains whether the intensity-invariant output $\adapt(t)$
-of the previous transformation pair allows feature
-
-Finally, the output $\adapt(t)$ of the previous transformation
-pair~(Fig.\,\ref{fig:log-hp}cd) can be related to the input $\adapt(t)$ of the
-current transformation pair by plotting the values of $f(t)$ over the standard
-deviation of input $\adapt(t)$ instead of
-$\sca$~(Fig.\,\ref{fig:thresh-lp_single}f). This is relevant because, unlike
-$\sca$, the standard deviation of $\adapt(t)$ is capped to a maximum value of
-around 10\,dB by the previous transformation pair~(Fig.\,\ref{fig:log-hp}cd)
-
+adaptation: In the current analysis, the input $\adapt(t)$ can be rescaled by
+arbitrarily large $\sca$, while in the full pathway, the current input
+$\adapt(t)$ is the output $\adapt(t)$ of the previous transformation pair and
+is hence capped to a maximum standard deviation of around
+10\,dB~(Fig.\,\ref{fig:log-hp}cd). This can be illustrated by plotting $\mu_f$
+not over $\sca$~(Fig.\,\ref{fig:thresh-lp_single}e) but over the standard
+deviation of input $\adapt(t)$ instead~(Fig.\,\ref{fig:thresh-lp_single}f). It
+becomes apparent that $\mu_f$ saturates only for standard deviations of
+$\adapt(t)$ that would already be capped. Accordingly, $f(t)$ never reaches the
+saturation regime as determined by the current transformation pair but rather
+adheres to the saturation regime determined by the previous transformation
+pair. In this case, the saturated $\mu_f$ is not independent of $\Theta$
+anymore. The consequences of this interaction between the two mechanisms of
+intensity invariance are further explored in a later section.

 \begin{figure}[!ht]
 \centering
@@ -934,6 +941,72 @@ around 10\,dB by the previous transformation pair~(Fig.\,\ref{fig:log-hp}cd)
 \end{figure}
 \FloatBarrier

+\subsection{Intensity invariance of species-specific feature representations}
+
+Having established both the meaning of the feature value and the mechanism of
+intensity invariance by thresholding and temporal averaging, the question
+remains how this mechanism acts on a set of features $f_i(t)$ based on
+different species-specific songs~(Fig.\,\ref{fig:thresh-lp_species}a). The
+previous analysis was repeated with three different kernels $k_i(t)$ using a
+single kernel-specific threshold value $\thr$, and the resulting average
+feature values $\muf$ were plotted over
+$\sca$~(Fig.\,\ref{fig:thresh-lp_species}bc). Additionally, 2D feature spaces
+spanned by each pair of $f_i(t)$ were plotted to investigate the separability
+of species-specific songs based on the feature representation as a function of
+$\sca$~(Fig.\,\ref{fig:thresh-lp_species}de). Each species-specific combination
+of $\muf$ follows a trajectory through feature space that develops with $\sca$.
+These trajectories correspond to the transient regime between the constant
+(noise) regime and the saturation regime, which are only visible as the start
+and end points of the trajectories, respectively. The horizontal dashes in the
+colorbars indicate the range of $\sca$ that corresponds to the transient regime
+across $f_i(t)$ for each species.
+
+In the noiseless case, each $\muf$ is 0 for small $\sca$ across all
+species~(Fig.\,\ref{fig:thresh-lp_species}b) because $c_i(t)$ never exceeds
+$\thr$. Accordingly, each trajectory starts at the origin of the feature
+space~(Fig.\,\ref{fig:thresh-lp_species}d). For larger $\sca$, all $\muf$
+saturate at individual values whose combination differs between species, so
+that the songs of each species are eventually represented by distinct points in
+feature space. However, the species-specific trajectories cross each other at
+numerous points, which means that the songs of two species --- each at a
+specific $\sca$ --- can result in the same combination of $\muf$. Furthermore,
+the specific value of $\sca$ at which $\muf$ saturates depends on $f_i(t)$ and
+the species: For \textit{C. mollis}, all $\muf$ saturate around the same
+$\sca$, while \textit{O. rufipes} exhibits considerable variation between the
+three $f_i(t)$. The larger the variation in saturation points between $f_i(t)$,
+the stronger the curvature of the trajectory through feature space.
+
+In the noisy case, $\muf$ is non-zero even for the smallest
+$\sca$~(Fig.\,\ref{fig:thresh-lp_species}c) because the addition of the noise
+component $\noc(t)$ to input $\adapt(t)$ drives $c_i(t)$ above $\thr$
+regardless of the song component $\soc(t)$. The starting value of $\muf$ is the
+same across all $f_i(t)$ and species by construction of the specific $\thr$. In
+consequence, the trajectories through feature space do not start at the origin
+but rather at approximately the same point along the
+diagonal~(Fig.\,\ref{fig:thresh-lp_species}e). For larger $\sca$, all $\muf$
+saturate at the same values as in the noiseless case, as expected from the
+previous analysis~(Fig.\,\ref{fig:thresh-lp_single}e). However, the
+trajectories now move a much shorter distance through feature space for a
+similar range of $\sca$ due to the lower SNR of $f_i(t)$ between noise regime
+and saturation regime, which increases the likelihood of trajectories crossing
+each other. Finally, the values of $\sca$ at which $\muf$ saturate for a given
+species are slightly higher in the noisy case, but the variation between
+$f_i(t)$ remains largely unchanged.
+
+In summary, even a comparatively small set of three features $f_i(t)$ can, in
+principle, represent different species-specific songs at distinct points in
+feature space, regardless of the presence of noise. However, this only holds
+for sufficiently large $\sca$ that allow $f_i(t)$ to reach a saturation regime.
+During the transient regime, the species-specific combination of $\muf$ can
+very well be the same for two or more different species at specific $\sca$,
+although this may be alleviated by the inclusion of additional $f_i(t)$.
+Overall, the results of this analysis suggest that $\thr$ should rather be
+chosen in favor of a higher SNR ($\thr$ just above pure-noise $c_i(t)$) than a
+lower saturation point ($\thr\to0$): first, because this reduces the density of
+trajectories through feature space, and second, because the capping of
+$\adapt(t)$ by the previous transformation pair likely renders the saturation
+point of $f_i(t)$ less relevant.
+
 \begin{figure}[!ht]
 \centering
 \includegraphics[width=\textwidth]{figures/fig_invariance_thresh_lp_species.pdf}
@@ -968,28 +1041,81 @@ around 10\,dB by the previous transformation pair~(Fig.\,\ref{fig:log-hp}cd)
 \label{fig:thresh-lp_species}
 \end{figure}
 \FloatBarrier
-% \caption{\textbf{Rectification and lowpass filtering improves SNR
-% but does not contribute to intensity invariance.}
-% Input $\raw(t)$ consists of song component $\soc(t)$ scaled by
-% $\sca$ with optional noise component $\noc(t)$ and is
-% successively transformed into tympanal signal $\filt(t)$ and
-% envelope $\env(t)$. Different line styles indicate different
-% cutoff frequencies $\fc$ of the lowpass filter extracting
-% $\env(t)$.
-% \textbf{Top}:~Example representations of $\filt(t)$ and
-% $\env(t)$ for different $\sca$.
-% \textbf{a}:~Noiseless case.
-% \textbf{b}:~Noisy case.
-% \textbf{Bottom}:~Intensity metrics over a range of $\sca$.
-% \textbf{c}:~Noiseless case: Standard deviations of $\filt(t)$
-% and $\env(t)$.
-% \textbf{d}:~Noisy case: Ratios of standard deviations of
-% $\filt(t)$ and $\env(t)$ to the respective reference standard
-% deviation for input $\raw(t)=\noc(t)$.
-% \textbf{e}:~Ratios of standard deviations of $\env(t)$ as in
-% \textbf{b} for different species (averaged over songs and
-% recordings, see appendix Fig.\,\ref{fig:app_rect-lp}).
-% }
+
+\subsection{Intensity invariance along the full model pathway}
+
+Through the previous analyses, we could establish two mechanisms of intensity
+invariance: Logarithmic compression and adaptation as well as thresholding and
+temporal averaging. While each transformation pair by itself can provide some
+level of invariance, certain results suggest that the first mechanism may
+actually limit or even nullify the effect of the second mechanism. In the
+following sections, we investigate the combined effect of both mechanisms along
+the full model pathway~(Fig.\,\ref{fig:pipeline_full}) and explore the
+consequences of disabling the first mechanism by skipping the logarithmic
+compression step~(Fig.\,\ref{fig:pipeline_short}).
+
+\subsubsection{Including logarithmic compression}
+
+For this analysis, input $\raw(t)$ --- including both song component $\soc(t)$
+and noise component $\noc(t)$ --- was rescaled and processed throughout all
+steps of the model pathway~(Fig.\,\ref{fig:pipeline_full}a) up to the feature
+set $f_i(t)$. As before, the standard deviation was used as intensity metric
+for each resulting representation except $b_i(t)$ and $f_i(t)$. For $f_i(t)$,
+the average feature value $\muf$ was used, while $b_i(t)$ was omitted from the
+analysis. Plotting each intensity metric over
+$\sca$~(Fig.\,\ref{fig:pipeline_full}b) reinforces many of the previous
+observations. For ease of visualization, the kernel-specific curves for
+$c_i(t)$ and $f_i(t)$ were summarized by their median. Representations prior to
+logarithmic compression --- $\filt(t)$ and $\env(t)$ --- show a linear increase
+of the intensity metric for larger $\sca$ on a double-logarithmic scale.
+Representations after logarithmic compression --- $\db(t)$, $\adapt(t)$, and
+$c_i(t)$ --- are the first to reach a saturation regime and do so at
+approximately the same $\sca$ because they are separated only by linear
+transformations. Feature set $f_i(t)$ reaches a saturation regime, as well. But
+contrary to previous results, the saturation point of $f_i(t)$ appears below
+that of $c_i(t)$, which suggests that the second mechanism of thresholding and
+temporal averaging can indeed improve intensity invariance beyond the first
+mechanism of logarithmic compression and adaptation. The difference in
+saturation points is best illustrated based on the ratio of each intensity
+metric to the respective pure-noise reference
+value~(Fig.\,\ref{fig:pipeline_full}d). However, compressing $f_i(t)$ into a
+median across $k_i(t)$ conceals many kernel-specific details. It is therefore
+necessary to consider the development of each $f_i(t)$ over $\sca$
+separately~(Fig.\,\ref{fig:pipeline_full}c). Indeed, all 40 $f_i(t)$ in the set
+reach a saturation regime for sufficiently large $\sca$. The saturated $\muf$
+are distributed over a range of values --- which is the prerequisite for
+forming species-specific combinations --- but are limited to a rather small
+subset of possible values between 0 and 1. Based on previous
+results~(Fig.\,\ref{fig:thresh-lp_single}f), this is likely due to the capping
+of $\adapt(t)$ that prevents $f_i(t)$ from reaching its intrinsic saturation
+value; but this cannot be confirmed until the following
+analysis~(Fig.\,\ref{fig:pipeline_short}). Looking at the kernel-specific SNR
+values of $c_i(t)$ over $\sca$~(Fig.\,\ref{fig:pipeline_full}e) and $f_i(t)$
+over $\sca$~(Fig.\,\ref{fig:pipeline_full}f) reveals a high degree of variation
+between different $k_i(t)$. Certain $f_i(t)$ achieve much higher SNR values
+than $c_i(t)$ for the same $\sca$ due to the former's capacity for arbitrarily
+low pure-noise responses ($\muf\to0$) and hence arbitrarily high SNR values.
+Finally, the question remains whether the suspected improvement of intensity
+invariance by $f_i(t)$ beyond $c_i(t)$ holds at the level of individual
+$k_i(t)$. The single saturation points based on the median across $k_i(t)$ for
+$c_i(t)$ and $f_i(t)$ are expanded into distributions of kernel-specific
+saturation points~(Fig.\,\ref{fig:pipeline_full}g). For $c_i(t)$, the
+distribution is rather narrow and corresponds well to the single saturation
+point based on the median. For $f_i(t)$, however, the distribution is much
+broader and is not centered around the single saturation point based on the
+median but rather shifted towards lower $\sca$. Care must be taken when
+interpreting the height of either distribution due to the logarithmic scaling
+of the underlying $\sca$ axis. Nevertheless, the overall pattern suggests that
+specific $f_i(t)$ can reach a saturation regime at lower $\sca$ than their
+$c_i(t)$ counterparts. Therefore, the effect of thresholding and temporal
+averaging on intensity invariance is not necessarily nullified by the previous
+logarithmic compression and adaptation, which means that both mechanisms can,
+in principle, work together towards an intensity-invariant song representation.
+% Or does one simply overwrite the other? Can there even be a higher intensity
+% invariance based on the sum of both effects? Or does one simply kick in for
+% lower scales than the other and thus dictates the overall intensity
+% invariance? Whatever, discussion material.
+
 \begin{figure}[!ht]
 \centering
 \includegraphics[width=\textwidth]{figures/fig_invariance_full_Omocestus_rufipes.pdf}
@@ -1028,6 +1154,50 @@ around 10\,dB by the previous transformation pair~(Fig.\,\ref{fig:log-hp}cd)
 \end{figure}
 \FloatBarrier

+\subsubsection{Excluding logarithmic compression}
+
+The previous analysis was repeated in exactly the same way as before, except
+that the logarithmic compression of $\env(t)$, Eq.\,\ref{eq:log}, was skipped
+in order to disable the first mechanism of intensity invariance. Consequently,
+$\adapt(t)$ is merely a highpass filtered version of $\env(t)$; and $\db(t)$ is
+missing entirely~(Fig.\,\ref{fig:pipeline_short}a). As expected, all
+representations prior to the thresholding nonlinearity $\nl$ --- $\filt(t)$,
+$\env(t)$, $\adapt(t)$, and $c_i(t)$ --- show a linear increase of the
+intensity metric for larger $\sca$, while $f_i(t)$ is the only representation
+to reach a saturation regime~(Fig.\,\ref{fig:pipeline_short}bd). The
+saturated $\muf$ are distributed over a much broader range of values than in
+the previous analysis~(Fig.\,\ref{fig:pipeline_short}c). Intriguingly, the
+distribution of $\muf$ is symmetric around a value of 0.5. This is relevant
+because every kernel $k^+(t)$ in the underlying kernel set has a counterpart of
+opposite sign that is otherwise identical, so that $k^+(t)=-k^-(t)$. The
+responses of $k^+(t)$ and $k^-(t)$ to the same input $\adapt(t)$ are also
+inverted because convolution is a linear operation: $c^+(t)=-c^-(t)$. The
+distributions of $c^+(t)$ and $c^-(t)$ are hence inverted to each other, as
+well: $p(c^+)=p(-c^-)$. Based on Eq.\,\ref{eq:feat_prop}, transforming $c^+(t)$
+and $c^-(t)$ further using the same $\Theta$ thus results in two complementary
+features $f^+(t)$ and $f^-(t)$ that are symmetric around 0.5, so that
+$f^+(t)=1-f^-(t)$. Of course, this symmetry throughout the feature
+representation goes hand in hand with a substantial degree of redundancy and is
+hardly expected to be present in the actual grasshopper auditory system. But
+the fact that the saturated $\muf$ are distributed symmetrically around 0.5
+provides concrete evidence that each $f_i(t)$ is able to reach its intrinsic
+saturation value in the absence of logarithmic
+compression~(Fig.\,\ref{fig:pipeline_short}c), which is otherwise prevented by
+the capping of $\adapt(t)$, as seen during previous
+analyses~(Fig.\,\ref{fig:thresh-lp_single}f and
+Fig.\,\ref{fig:pipeline_full}c). Otherwise, there appear to be no major
+differences in the development of $f_i(t)$ over $\sca$ compared to the previous
+analysis, neither on the kernel-specific SNR
+values~(Fig.\,\ref{fig:pipeline_short}e) nor on the distribution of
+kernel-specific saturation points~(Fig.\,\ref{fig:pipeline_short}f). Overall,
+the most substantial consequence of skipping the logarithmic compression is
+that it allows $f_i(t)$ to reach its intrinsic saturation value. If this
+results in a wider range of $\muf$ across the feature set, it should be
+beneficial for forming species-specific combinations. However, this depends on
+multiple different factors such as the choice of $k_i(t)$ and $\thr$ as well as
+the structure and distribution of the specific song and is hence not
+guaranteed simply by disabling logarithmic compression.
+
 \begin{figure}[!ht]
 \centering
 \includegraphics[width=\textwidth]{figures/fig_invariance_short_Omocestus_rufipes.pdf}
@@ -1065,6 +1235,61 @@ around 10\,dB by the previous transformation pair~(Fig.\,\ref{fig:log-hp}cd)
 \end{figure}
 \FloatBarrier

+\subsubsection{Field data}
+
+\begin{figure}[!ht]
+\centering
+\includegraphics[width=\textwidth]{figures/fig_invariance_field.pdf}
+\caption{\textbf{Step-wise emergence of intensity-invariant song
+representation along the model pathway.}
+}
+\label{fig:pipeline_field}
+\end{figure}
+\FloatBarrier
+
+\subsection{Interspecific and intraspecific feature variability}
+
+In the final analysis of the current study, we investigated the variability of
+songs in the feature representation between different species and within the
+same species~(Fig.\,\ref{fig:feat_cross_species}). Naturally, a feature
+representation that is both consistent across different songs of the same
+species and sufficiently different between songs of different species is a
+fundamental prerequisite for species-specific song recognition. The data used
+in this analysis corresponds to the saturated $\muf$ of each $f_i(t)$ from the
+previous analysis of the full model pathway~(Fig.\,\ref{fig:pipeline_full}c),
+using different songs of \textit{O. rufipes} for the intraspecific comparisons
+and single songs from a number of species for the interspecific comparisons
+(also shown in Fig.\,\ref{fig:thresh-lp_species}a). Accordingly, each song is
+represented by 40 values of $\muf$ based on the same set of $f_i(t)$. For each
+comparison, $\muf$ from one song was plotted against $\muf$ from the other
+song, so that each dot within a subplot corresponds to a single feature
+$f_i(t)$. For the intraspecific
+comparisons~(Fig.\,\ref{fig:feat_cross_species}, upper triangular), the pairs
+of $\muf$ are distributed closely around the diagonal, with a minimum
+correlation coefficient of $\rho=0.85$, a maximum of $\rho=0.99$, and a median
+of $\rho=0.92$. A given $f_i(t)$ thus tends to have a similar $\muf$ across
+different songs of the same species. In contrast, the pairs of $\muf$ for the
+interspecific comparisons~(Fig.\,\ref{fig:feat_cross_species}, lower
+triangular) are distributed in a variety of different ways, most in broader
+clouds (e.g. \textit{C. biguttulus} vs. \textit{C. mollis}) but some more
+narrowly around the diagonal (e.g. \textit{P. parallelus} vs. \textit{C.
+dispar}). The correlation coefficients $\rho$ vary widely between different
+interspecific comparisons, with a minimum of $\rho=-0.1$, a maximum of
+$\rho=0.92$, and a median of $\rho=0.53$. A given $f_i(t)$ therefore tends to
+have a less similar $\muf$ across different species than within the same
+species, although certain exceptions exist~(Fig.\,\ref{fig:feat_cross_species},
+lower right). Accordingly, the feature representation that is generated by the
+model pathway is, in principle, suitable for the distinction between different
+species-specific songs. However, even the songs of the same species are subject
+to considerable variability in various aspects and depending on a multitude of
+external and internal factors, which cannot be fully captured based on a
+limited number of songs. The results of the current analysis are hence to be
+treated as a proof-of-concept that paves the way towards more comprehensive
+investigations on the details of song representation in feature space,
+including the effects of different parameters of the model pathway as well as
+the inclusion of additional songs and species to reflect the complexity of
+natural song variation.
+
 \begin{figure}[!ht]
 \centering
 \includegraphics[width=\textwidth]{figures/fig_features_cross_species.pdf}
@@ -1086,7 +1311,7 @@ around 10\,dB by the previous transformation pair~(Fig.\,\ref{fig:log-hp}cd)
 \textbf{Upper triangular}:~Intraspecific comparisons
 between different songs of a single species (\textit{O.
 rufipes}).
-\textbf{Lower left}:~Distribution of correlation
+\textbf{Lower right}:~Distribution of correlation
 coefficients $\rho$ for each interspecific and
 intraspecific comparison. Dots indicate single $\rho$
 values.
@@ -1095,16 +1320,6 @@ around 10\,dB by the previous transformation pair~(Fig.\,\ref{fig:log-hp}cd)
 \end{figure}
 \FloatBarrier

-\begin{figure}[!ht]
-\centering
-\includegraphics[width=\textwidth]{figures/fig_invariance_field.pdf}
-\caption{\textbf{Step-wise emergence of intensity invariant song
-representation along the model pathway.}
-}
-\label{fig:pipeline_field}
-\end{figure}
-\FloatBarrier
-
 \section{Conclusions \& outlook}

 \textbf{Song recognition pathway: Grasshopper vs. model:}\\
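
% Note on the hunk at -856,45 +863,45 (saturation regime of thresholding and
% temporal averaging). The limit argument in that paragraph can be written out
% as a short worked equation. This is only a sketch under stated assumptions,
% not part of the commit: $c_{s}(t)$ and $c_{\eta}(t)$ are hypothetical symbols
% for the kernel responses to the song component $\soc(t)$ and the noise
% component $\noc(t)$; the kernel response is assumed to be linear in the
% rescaled input, so that $c(t)=\sca\,c_{s}(t)+c_{\eta}(t)$; $\mu_f$ is assumed
% to equal the fraction $T_1/T$ of Eq.\,\ref{eq:pdf_split}; and $P[\,\cdot\,]$
% denotes the fraction of time within $T$ during which a condition holds.
\begin{equation}
\mu_f\,=\,\frac{T_1}{T}\,=\,P\big[\sca\,c_{s}(t)+c_{\eta}(t)>\Theta\big]\,=\,P\Big[c_{s}(t)>\frac{\Theta-c_{\eta}(t)}{\sca}\Big]\;\longrightarrow\;P\big[c_{s}(t)>0\big] \qquad (\sca\to\infty)
\end{equation}
% For bounded $\Theta$ and $c_{\eta}(t)$, and provided that $c_{s}(t)$ spends no
% finite fraction of time at exactly 0, the limit depends neither on $\Theta$
% nor on the noise, which is consistent with the same saturated $\mu_f$ in the
% noiseless and the noisy case. In the noiseless case, $\Theta=0$ makes the
% comparison value $\Theta/\sca$ equal to 0 for every $\sca$, so the limit is
% reached at the smallest $\sca$.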
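
% Note on the hunk at -1028,6 +1154,50 (symmetry of the saturated $\muf$ around
% 0.5). The symmetry argument can be checked with a short worked equation. Again
% a sketch under stated assumptions, not part of the commit: $p^{+}(c)$ and
% $p^{-}(c)$ are hypothetical symbols for the probability densities of $c^+(t)$
% and $c^-(t)$ within $T$, with $p^{-}(c)=p^{+}(-c)$ because $c^+(t)=-c^-(t)$;
% $\mu_{f}^{\pm}$ is assumed to be the upper-tail integral of
% Eq.\,\ref{eq:pdf_split}; and $\Theta\geq0$.
\begin{equation}
\mu_{f}^{+}+\mu_{f}^{-}\,=\,\int_{\Theta}^{+\infty} p^{+}(c)\,dc\,+\,\int_{-\infty}^{-\Theta} p^{+}(c)\,dc\,=\,1\,-\,\int_{-\Theta}^{\Theta} p^{+}(c)\,dc
\end{equation}
% Under these assumptions the sum equals 1 exactly for $\Theta=0$ and approaches
% 1 whenever the probability mass of $c^+(t)$ between $-\Theta$ and $\Theta$
% becomes negligible, as in the saturation regime at large $\sca$. Read this
% way, the symmetry is a statement about the average feature values
% $\mu_{f}^{+}$ and $\mu_{f}^{-}$ in saturation rather than about $f^+(t)$ and
% $f^-(t)$ at every time point.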