Wrote results rect-lp and log-hp :)
Finished some more figure captions.
336
main.tex
@@ -33,7 +33,7 @@
%\bibstyle
%\citation

\title{Emergent intensity invariance in a physiologically inspired model of the grasshopper auditory system}
\title{Emergent intensity invariance vs. signal-to-noise ratio at three consecutive processing stages along the grasshopper song recognition pathway}
\author{Jona Hartling, Jan Benda}
\date{}

@@ -403,7 +403,7 @@ pathway, logarithmic compression is achieved by conversion to decibel scale
\db(t)\,=\,20\,\cdot\,\dec \frac{\env(t)}{\dbref}, \qquad \dbref\,=\,1
\label{eq:log}
\end{equation}
relative to the maximum intensity $\dbref$ of the signal envelope $\env(t)$.
relative to the common reference intensity $\dbref$.
Both the receptor neurons~(\bcite{romer1976informationsverarbeitung};
\bcite{gollisch2004input}; \bcite{fisch2012channel}) and, on a larger scale,
the subsequent local interneurons~(\bcite{hildebrandt2009origin};
@@ -555,7 +555,7 @@ can be read out by a simple linear classifier.
\end{figure}
\FloatBarrier

\section{Two mechanisms driving the emergence of intensity-invariant song representation}
\section{Mechanisms driving the emergence of\\intensity-invariant song representation}

% Still missing the SNR analysis. Should be able to write around it for now.
The robustness of song recognition is tied to the degree of intensity
@@ -573,6 +573,54 @@ specific operations involved, as outlined in the following sections.

\subsection{Full-wave rectification \& lowpass filtering}

The first nonlinear transformation along the model pathway is the full-wave
rectification of the tympanal signal $\filt(t)$ during the extraction of the
signal envelope (Eq.\,\ref{eq:env}). Rectification transforms the distribution
of $\filt(t)$ from an approximately zero-centered distribution with both
positive and negative values into a strictly non-negative distribution. The
signal envelope $\env(t)$ is then obtained by lowpass filtering the rectified
$\filt(t)$. The effects of this transformation pair on SNR and potential
intensity invariance were analyzed by rescaling and processing the input signal
$\raw(t)$ and comparing standard deviations between the resulting $\filt(t)$
and $\env(t)$, once for the noiseless case~(Fig.\,\ref{fig:rect-lp}a) and once
for the noisy case~(Fig.\,\ref{fig:rect-lp}b). In addition, the cutoff
frequency $\fc$ of the lowpass filter was varied to investigate the influence
of different filter bandwidths. In the noiseless case, the standard deviations
of $\filt(t)$ and $\env(t)$ are each reduced compared to the input $\raw(t)$ by
a multiplicative factor. These factors are constant across all $\sca$, which
results in a downward shift of the respective curve on a double-logarithmic
scale, away from the diagonal~(Fig.\,\ref{fig:rect-lp}c). For $\filt(t)$, the
reduction is a consequence of the bandpass filtering~(Eq.\,\ref{eq:bandpass})
of $\raw(t)$. For $\env(t)$, the standard deviation is further reduced compared
to $\filt(t)$. Rectification contributes much less to this reduction than
lowpass filtering. The degree of reduction by lowpass filtering depends on the
cutoff frequency $\fc$, with lower $\fc$ (narrower bandwidth) resulting in a
stronger reduction. In the noisy case, the standard deviations of $\filt(t)$
and $\env(t)$ are expressed relative to the respective pure-noise reference
standard deviation, which yields a measure of
SNR~(Fig.\,\ref{fig:rect-lp}d). Each curve starts with a
constant regime of SNR values near 1 for smaller $\sca$, which reflects the
dominance of the noise component $\noc(t)$ over the song component $\soc(t)$ in
the input $\raw(t)$. For larger $\sca$, all curves transition into a regime of
linearly increasing SNR on a double-logarithmic scale. For $\filt(t)$, the
linear part of the curve deviates only slightly from the diagonal. For
$\env(t)$, however, the transition occurs at lower $\sca$ compared to
$\filt(t)$, and the linear part of the curve is shifted leftward away from the
diagonal, which means that higher SNR values are achieved for the same $\sca$.
This effect is more pronounced for lower $\fc$ of the lowpass filter and is
presumably caused by the attenuation of high-frequency components in the
signal, which are more prominent in the noise component $\noc(t)$ than in the
song component $\soc(t)$. The effect also appears relatively consistent across
different species, although small variations based on different song structures
and distributions exist~(Fig.\,\ref{fig:rect-lp}e). In summary, the standard
deviation of $\env(t)$ was never observed to transition into a saturation
regime for larger $\sca$ but rather continues to increase proportionally to
$\sca$ for all tested $\fc$, in both the noiseless and the noisy case and
across different species. Consequently, the combination of rectification and
lowpass filtering does not contribute to intensity invariance. However, this
transformation pair does improve the SNR of $\env(t)$ relative to $\filt(t)$
and thus provides subsequent processing stages with a more robust input
representation and higher input SNR.
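
As a rough sketch of why the noisy curves show a flat regime followed by a
linear one, consider a generic additive mixture
$x(t)\,=\,\sca\,\cdot\,\soc(t)\,+\,\noc(t)$. The symbols $x(t)$, $\sigma_{s}$,
and $\sigma_{\eta}$ are introduced here for illustration only, under the
simplifying assumption that $\soc(t)$ and $\noc(t)$ are uncorrelated with
standard deviations $\sigma_{s}$ and $\sigma_{\eta}$, respectively. The ratio
of the standard deviation of $x(t)$ to its pure-noise reference ($\sca\,=\,0$)
is then
\begin{equation*}
\frac{\sigma_{x}(\sca)}{\sigma_{x}(0)}\,=\,\frac{\sqrt{\sca^{2}\,\sigma_{s}^{2}\,+\,\sigma_{\eta}^{2}}}{\sigma_{\eta}}\,=\,\sqrt{1\,+\,\sca^{2}\,\frac{\sigma_{s}^{2}}{\sigma_{\eta}^{2}}}\enspace\approx\enspace
\begin{cases}
1, & \sca\,\ll\,\sigma_{\eta}/\sigma_{s}\\
\sca\,\cdot\,\sigma_{s}/\sigma_{\eta}, & \sca\,\gg\,\sigma_{\eta}/\sigma_{s}
\end{cases}
\end{equation*}
which is flat for small $\sca$ and has unit slope on a double-logarithmic
scale for large $\sca$. This holds strictly only for linear stages such as
$\raw(t)$ and $\filt(t)$; for the nonlinearly derived $\env(t)$, it should be
read as a qualitative guide at best.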

\begin{figure}[!ht]
\centering
\includegraphics[width=\textwidth]{figures/fig_invariance_rect_lp.pdf}
@@ -605,73 +653,113 @@ specific operations involved, as outlined in the following sections.

\subsection{Logarithmic compression \& spike-frequency adaptation}

The first notable emergence of intensity invariance along the model pathway
occurs during the transformation of the signal envelope $\env(t)$ into the
logarithmically scaled envelope $\db(t)$ and then into the intensity-adapted
envelope $\adapt(t)$. In order to disentangle the interplay of logarithmic
compression and adaptation, $\env(t)$ can be rewritten as a synthetic mixture
The second nonlinear transformation along the model pathway is the logarithmic
compression of the signal envelope $\env(t)$ into
$\db(t)$~(Eq.\,\ref{eq:log}), which is followed by the highpass filtering of
$\db(t)$~(Eq.\,\ref{eq:highpass}) to obtain the intensity-adapted envelope
$\adapt(t)$.
The interplay of this transformation pair was analyzed by rescaling and
processing the input signal $\filt(t)$ and comparing standard deviations
between the resulting $\env(t)$, $\db(t)$, and $\adapt(t)$. It is necessary to
use $\filt(t)$ as input for this analysis instead of $\env(t)$, because
$\env(t)$ results from a nonlinear transformation and hence cannot be
synthesized as an additive mixture of song component $\soc(t)$ and noise
component $\noc(t)$. % <-- Sentence may be methods section material.
However, it is much easier to formulate a mathematical description of the
effects of logarithmic compression and adaptation if $\env(t)$ itself is
assumed to be composed of $\soc(t)$ and $\noc(t)$. In the noiseless
case~(Fig.\,\ref{fig:log-hp}a), $\env(t)$ takes the form of
\begin{equation}
\env(t)\,=\,\sca\,\cdot\,\soc(t), \qquad \env(t)\,>\,0\enspace\forall\enspace t\,\in\,\mathbb{R}
\label{eq:toy_env_pure}
\end{equation}
The standard deviation of $\env(t)$ increases linearly with $\sca$ on a
double-logarithmic scale and is slightly reduced~(Fig.\,\ref{fig:log-hp}c)
compared to the input $\filt(t)$, which is consistent with the results of the
previous analysis~(Fig.\,\ref{fig:rect-lp}c). By conversion of $\env(t)$ to
decibel scale, $\sca$ turns from a multiplicative scale in linear space into an
additive term, or offset, in logarithmic space:
\begin{equation}
\db(t)\,=\,20\,\cdot\,\dec \left[\,\sca\,\cdot\,s(t)\,\right]\,=\,20\,\cdot\,\left[\dec \sca\,+\,\dec s(t)\right], \qquad \sca\,>\,0
\label{eq:toy_log_pure}
\end{equation}
The highpass filtering of $\db(t)$ can be approximated as a subtraction of the
local signal offset within a suitable time interval $0 \ll \thp <
\frac{1}{\fc}$:
\begin{equation}
\begin{split}
\adapt(t)\,\approx\,\db(t)\,-\,20\,\cdot\,\dec \sca\,=\,20\,\cdot\,\dec s(t)
\end{split}
\label{eq:toy_highpass_pure}
\end{equation}
This eliminates $\sca$ from $\adapt(t)$ and thus renders it perfectly
intensity-invariant, with a constant standard deviation of around 10\,dB across
all $\sca>0$~(Fig.\,\ref{fig:log-hp}c). In contrast, in the noisy
case~(Fig.\,\ref{fig:log-hp}b), $\env(t)$ takes the form of
\begin{equation}
\env(t)\,=\,\sca\,\cdot\,\soc(t)\,+\,\noc(t), \qquad \env(t)\,>\,0\enspace\forall\enspace t\,\in\,\mathbb{R}
\label{eq:toy_env}
\label{eq:toy_env_noise}
\end{equation}
of a song component $\soc(t)$ with variable multiplicative scale $\sca\geq0$
and a fixed-scale noise component $\noc(t)$. Both $\soc(t)$ and $\noc(t)$ are
assumed to have unit variance. By conversion of $\env(t)$ to decibel
scale~(Eq.\,\ref{eq:log}), $\sca$ turns from a multiplicative scale in linear
space into an additive term, or offset, in logarithmic space
Similar to the previous analysis~(Fig.\,\ref{fig:rect-lp}d), the ratio of the
standard deviation of $\env(t)$ to its pure-noise reference standard deviation
on a double-logarithmic scale follows a constant regime for small $\sca$ and a
linearly increasing regime for larger $\sca$~(Fig.\,\ref{fig:log-hp}d). Decibel
conversion of $\env(t)$
% \begin{equation}
% \begin{split}
% \db(t)\,&=\,\dec \frac{\alpha\,\cdot\,s(t)\,+\,\eta(t)}{\dbref}\\
% &=\,\dec \frac{\alpha}{\dbref}\,+\,\dec \left[s(t)\,+\,\frac{\eta(t)}{\alpha}\right], \qquad \sca\,>\,0
% \db(t)\,&=\,20\,\cdot\,\dec \left[\,\sca\,\cdot\,s(t)\,+\,\eta(t)\,\right]\\
% &=\,20\,\cdot\,\left(\dec \sca\,+\,\dec \left[s(t)\,+\,\frac{\eta(t)}{\sca}\right]\right), \qquad \sca\,>\,0
% \end{split}
% \label{eq:toy_log}
% \label{eq:toy_log_noise}
% \end{equation}
\begin{equation}
\begin{split}
\db(t)\,&=\,20\,\cdot\,\dec \left[\,\sca\,\cdot\,s(t)\,+\,\eta(t)\,\right]\\
&=\,20\,\cdot\,\left(\dec \sca\,+\,\dec \left[s(t)\,+\,\frac{\eta(t)}{\sca}\right]\right), \qquad \sca\,>\,0
\end{split}
\label{eq:toy_log}
\db(t)\,=\,20\,\cdot\,\left(\dec \sca\,+\,\dec \left[s(t)\,+\,\frac{\eta(t)}{\sca}\right]\right), \qquad \sca\,>\,0
\label{eq:toy_log_noise}
\end{equation}
allows for the separation of $\sca$ from $\soc(t)$ but introduces a scaling of
$\noc(t)$ by the inverse of $\sca$, which remains present even after the offset
subtraction:
\begin{equation}
\begin{split}
\adapt(t)\,\approx\,20\,\cdot\,\dec\left[s(t)\,+\,\frac{\eta(t)}{\sca}\right]
\end{split}
\label{eq:toy_highpass_noise}
\end{equation}
which allows for its separation from $\soc(t)$ but introduces a scaling of
$\noc(t)$ by the inverse of $\sca$. The subsequent
highpass filtering~(Eq.\,\ref{eq:highpass}) of $\db(t)$ can then be
approximated as a subtraction of the local offset within a suitable time
interval $0 \ll \thp < \frac{1}{\fc}$:
% \begin{equation}
% \begin{split}
% \adapt(t)\,\approx\,\db(t)\,-\,\dec \frac{\sca}{\dbref}\,=\,\dec\left[s(t)\,+\,\frac{\eta(t)}{\sca}\right], \qquad \sca\,>\,0
% \adapt(t)\,\approx\,\db(t)\,-\,20\,\cdot\,\dec \sca\,=\,20\,\cdot\,\dec\left[s(t)\,+\,\frac{\eta(t)}{\sca}\right]
% \end{split}
% \label{eq:toy_highpass}
% \label{eq:toy_highpass_noise}
% \end{equation}
\begin{equation}
\begin{split}
\adapt(t)\,\approx\,\db(t)\,-\,20\,\cdot\,\dec \sca\,=\,20\,\cdot\,\dec\left[s(t)\,+\,\frac{\eta(t)}{\sca}\right], \qquad \sca\,>\,0
\end{split}
\label{eq:toy_highpass}
\end{equation}
This means that $\sca$ cannot be entirely eliminated from $\adapt(t)$, only
redistributed between $\soc(t)$ and $\noc(t)$. In consequence, if $\sca$ is
sufficiently large ($\sca\gg1$), $\noc(t)$ is attenuated to the point of being
negligible, so that $\adapt(t)$ represents $\soc(t)$ in a scale-free manner. If
$\soc(t)$ and $\noc(t)$ are at similar scales ($\sca\approx1$), $\adapt(t)$
largely resembles $\db(t)$. However, if $\sca$ is sufficiently small
($\sca\ll1$), $\noc(t)$ masks $\soc(t)$ even after the intensity adaptation.
Therefore, the effective intensity invariance of $\adapt(t)$ relative to
$\env(t)$ is limited by the initial scaling of $\soc(t)$ relative to $\noc(t)$;
that is, the signal-to-noise ratio (SNR) of $\env(t)$ with ($\sca>0$) and
without ($\sca=0$) song component $\soc(t)$
\begin{equation}
\text{SNR}(\sca)\,=\,\frac{\xvar}{\nvar}\,=\,\frac{\alpha^{2}\,\cdot\,\svar\,+\,\nvar}{\nvar}\,=\,\alpha^{2}\,+\,1, \qquad \svar\,=\,\nvar\,=\,1
\label{eq:toy_snr}
\end{equation}
which depends quadratically on $\sca$ if $\soc(t)\perp\noc(t)$. Overall, the
combination of logarithmic compression and adaptation allows for the
equalization of different sufficiently large song scales, which is essential
for intensity-invariant song representation. However, this mechanism is unable
to recover songs that have already sunk below the noise floor, which
emphasizes the importance of a sufficiently high SNR at the initial reception of
the signal for reliable song recognition.
This means that, in the noisy case, $\sca$ cannot be entirely eliminated from
$\adapt(t)$, only redistributed between $\soc(t)$ and $\noc(t)$. If $\sca$ is
sufficiently large ($\sca\gg1$, saturation regime), $\noc(t)$ is attenuated to
the point of being negligible, so that $\adapt(t)$ is a scale-free
representation of $\soc(t)$. If the scaled song component
$\sca\,\cdot\,\soc(t)$ and $\noc(t)$ are at similar scales
($\sca\approx1$, transient regime), $\adapt(t)$ largely resembles $\db(t)$.
Finally, if $\sca$ is sufficiently small ($0<\sca\ll1$, noise regime),
$\noc(t)$ masks $\soc(t)$ even after the intensity adaptation. Accordingly, the
effective intensity invariance of $\adapt(t)$ through logarithmic compression
and adaptation is limited by the SNR of $\env(t)$: Songs that have already
sunk into the noise floor at the level of $\env(t)$ cannot be recovered by
subsequent processing steps, which emphasizes the importance of the SNR
improvement by rectification and lowpass filtering during the previous
processing step~(Fig.\,\ref{fig:rect-lp}d). The general pattern of noise
regime, transient regime, and saturation regime remains consistent across
different species~(Fig.\,\ref{fig:log-hp}e). However, the specific value of
$\sca$ at which the saturation regime is reached (see appendix
Fig.\,\ref{fig:app_log-hp_saturation}) and the maximum SNR value of $\adapt(t)$
within the saturation regime vary considerably between and within species. For
example, \textit{C. biguttulus} and \textit{C. mollis} display a noticeably
lower maximum SNR of $\adapt(t)$ compared to other species. These differences
are not to be underestimated, since the SNR of $\adapt(t)$ within the
saturation regime determines the maximum input SNR for subsequent processing
steps. In other words, the fact that $\adapt(t)$ eventually reaches a
saturation regime is desirable in the context of intensity invariance, but it
also means forgoing the higher SNR values that are achieved by $\env(t)$ for
the same $\sca$ (up to several orders of magnitude,
Fig.\,\ref{fig:log-hp}d). This trade-off between intensity invariance and SNR,
and its consequences further downstream along the pathway, are addressed in the
following sections.
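
To make the offset argument of Eq.\,\ref{eq:toy_log_pure} concrete, the
following worked example assumes that $\dec$ denotes the base-10 logarithm
(an assumption, since its definition is not restated in this section).
Rescaling the noiseless envelope by a factor $k\,>\,0$, a symbol introduced
here for illustration only, shifts $\db(t)$ by a constant:
\begin{equation*}
20\,\cdot\,\dec\left[\,k\,\cdot\,\sca\,\cdot\,\soc(t)\,\right]\,-\,20\,\cdot\,\dec\left[\,\sca\,\cdot\,\soc(t)\,\right]\,=\,20\,\cdot\,\dec k
\end{equation*}
Doubling the song scale ($k\,=\,2$) hence adds approximately $6\,$dB, and a
tenfold increase ($k\,=\,10$) adds exactly $20\,$dB, at every time point.
Because a constant offset leaves deviations from the mean untouched, it does
not change the standard deviation of $\db(t)$ in the noiseless case and is
removed by the subsequent offset subtraction.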

\begin{figure}[!ht]
\centering
@@ -744,30 +832,6 @@ the signal for reliable song recognition.
\end{figure}
\FloatBarrier


% \caption{\textbf{Rectification and lowpass filtering improves SNR
% but does not contribute to intensity invariance.}
% Input $\raw(t)$ consists of song component $\soc(t)$ scaled by
% $\sca$ with optional noise component $\noc(t)$ and is
% successively transformed into tympanal signal $\filt(t)$ and
% envelope $\env(t)$. Different line styles indicate different
% cutoff frequencies $\fc$ of the lowpass filter extracting
% $\env(t)$.
% \textbf{Top}:~Example representations of $\filt(t)$ and
% $\env(t)$ for different $\sca$.
% \textbf{a}:~Noiseless case.
% \textbf{b}:~Noisy case.
% \textbf{Bottom}:~Intensity metrics over a range of $\sca$.
% \textbf{c}:~Noiseless case: Standard deviations of $\filt(t)$
% and $\env(t)$.
% \textbf{d}:~Noisy case: Ratios of standard deviations of
% $\filt(t)$ and $\env(t)$ to the respective reference standard
% deviation for input $\raw(t)=\noc(t)$.
% \textbf{e}:~Ratios of standard deviations of $\env(t)$ as in
% \textbf{b} for different species (averaged over songs and
% recordings, see appendix Fig.\,\ref{fig:app_rect-lp}).
% }

\begin{figure}[!ht]
\centering
\includegraphics[width=\textwidth]{figures/fig_invariance_thresh_lp_species.pdf}
@@ -796,17 +860,67 @@ the signal for reliable song recognition.
curve span of the norm across all three $\mu_{f_i}$ per
species.
\textbf{d}:~Noiseless case.
\textbf{e}:~Noisy case. Shaded areas
\textbf{e}:~Noisy case. Shaded areas indicate the average
minimum $\mu_{f_i}$ across all species-specific trajectories.
}
\label{fig:thresh-lp_species}
\end{figure}
\FloatBarrier

% \caption{\textbf{Rectification and lowpass filtering improves SNR
% but does not contribute to intensity invariance.}
% Input $\raw(t)$ consists of song component $\soc(t)$ scaled by
% $\sca$ with optional noise component $\noc(t)$ and is
% successively transformed into tympanal signal $\filt(t)$ and
% envelope $\env(t)$. Different line styles indicate different
% cutoff frequencies $\fc$ of the lowpass filter extracting
% $\env(t)$.
% \textbf{Top}:~Example representations of $\filt(t)$ and
% $\env(t)$ for different $\sca$.
% \textbf{a}:~Noiseless case.
% \textbf{b}:~Noisy case.
% \textbf{Bottom}:~Intensity metrics over a range of $\sca$.
% \textbf{c}:~Noiseless case: Standard deviations of $\filt(t)$
% and $\env(t)$.
% \textbf{d}:~Noisy case: Ratios of standard deviations of
% $\filt(t)$ and $\env(t)$ to the respective reference standard
% deviation for input $\raw(t)=\noc(t)$.
% \textbf{e}:~Ratios of standard deviations of $\env(t)$ as in
% \textbf{b} for different species (averaged over songs and
% recordings, see appendix Fig.\,\ref{fig:app_rect-lp}).
% }
\begin{figure}[!ht]
\centering
\includegraphics[width=\textwidth]{figures/fig_invariance_full_Omocestus_rufipes.pdf}
\caption{\textbf{Step-wise emergence of intensity invariant song
representation along the model pathway.}
\caption{\textbf{Step-wise emergence of intensity-invariant song
representation along the full model pathway.}
Input $\raw(t)$ consists of song component $\soc(t)$
scaled by $\sca$ with added noise component $\noc(t)$ and
is processed up to the feature set $f_i(t)$. Different
color shades indicate different types of Gabor kernels
with specific lobe number $\kn$ and either $+$ or $-$
sign, sorted (dark to light) first by increasing $\kn$ and
then by sign~($1\,\leq\,\kn\,\leq\,4$; first $+$, then $-$
for each $\kn$; five kernel widths $\kw$ of 1, 2, 4, 8,
and $16\,$ms per type; 8 types, 40 kernels in total).
\textbf{a}:~Example representations of $\filt(t)$,
$\env(t)$, $\db(t)$, $\adapt(t)$, $c_i(t)$, and $f_i(t)$
for different $\sca$.
\textbf{b}:~Intensity metrics over $\sca$. For $c_i(t)$
and $f_i(t)$, the median over kernels is shown. Dots
indicate $95\,\%$ curve span for $\db(t)$, $\adapt(t)$,
$c_i(t)$, and $f_i(t)$.
\textbf{c}:~Average value $\mu_{f_i}$ of each feature
$f_i(t)$ over $\sca$.
\textbf{d}:~Ratios of intensity metrics to the respective
reference value for input $\raw(t)=\noc(t)$. For $c_i(t)$
and $f_i(t)$, the median over kernel-specific ratios is
shown.
\textbf{e}:~Ratios of standard deviation $\sigma_{c_i}$ of
each $c_i(t)$.
\textbf{f}:~Ratios of $\mu_{f_i}$.
\textbf{g}:~Distributions of kernel-specific $\sca$ that
correspond to $95\,\%$ curve span for $c_i(t)$ and
$f_i(t)$. Dots indicate the values from \textbf{b}.
}
\label{fig:pipeline_full}
\end{figure}
@@ -816,7 +930,34 @@ the signal for reliable song recognition.
\centering
\includegraphics[width=\textwidth]{figures/fig_invariance_short_Omocestus_rufipes.pdf}
\caption{\textbf{Step-wise emergence of intensity-invariant song
representation along the model pathway.}
representation along the model pathway without logarithmic
compression.}
Input $\raw(t)$ consists of song component $\soc(t)$
scaled by $\sca$ with added noise component $\noc(t)$ and
is processed up to the feature set $f_i(t)$, skipping
$\db(t)$. Different color shades indicate different types
of Gabor kernels with specific lobe number $\kn$ and
either $+$ or $-$ sign, sorted (dark to light) first by
increasing $\kn$ and then by
sign~($1\,\leq\,\kn\,\leq\,4$; first $+$, then $-$ for
each $\kn$; five kernel widths $\kw$ of 1, 2, 4, 8, and
$16\,$ms per type; 8 types, 40 kernels in total).
\textbf{a}:~Example representations of $\filt(t)$,
$\env(t)$, $\adapt(t)$, $c_i(t)$, and $f_i(t)$ for
different $\sca$.
\textbf{b}:~Intensity metrics over $\sca$. For $c_i(t)$
and $f_i(t)$, the median over kernels is shown. Dots
indicate $95\,\%$ curve span for $f_i(t)$.
\textbf{c}:~Average value $\mu_{f_i}$ of each feature
$f_i(t)$ over $\sca$.
\textbf{d}:~Ratios of intensity metrics to the respective
reference value for input $\raw(t)=\noc(t)$. For $c_i(t)$
and $f_i(t)$, the median over kernel-specific ratios is
shown.
\textbf{e}:~Ratios of $\mu_{f_i}$.
\textbf{f}:~Distribution of kernel-specific $\sca$ that
correspond to $95\,\%$ curve span for $f_i(t)$. Dots
indicate the value from \textbf{b}.
}
\label{fig:pipeline_short}
\end{figure}
@@ -825,7 +966,28 @@ the signal for reliable song recognition.
\begin{figure}[!ht]
\centering
\includegraphics[width=\textwidth]{figures/fig_features_cross_species.pdf}
\caption{\textbf{Inter- and intraspecific feature variability.}
\caption{\textbf{Interspecific and intraspecific feature variability.}
Average value $\mu_{f_i}$ of each feature $f_i(t)$ against
its counterpart from a second feature set based on a
different input $\raw(t)$. Each dot within a subplot
represents a single feature $f_i(t)$. Different color
shades indicate different types of Gabor kernels with
specific lobe number $\kn$ and either $+$ or $-$ sign,
sorted (dark to light) first by increasing $\kn$ and then
by sign~($1\,\leq\,\kn\,\leq\,4$; first $+$, then $-$ for
each $\kn$; five kernel widths $\kw$ of 1, 2, 4, 8, and
$16\,$ms per type; 8 types, 40 kernels in total). Data is
based on the analysis underlying
Fig.\,\ref{fig:pipeline_full}.
\textbf{Lower triangular}:~Interspecific comparisons
between single songs of different species.
\textbf{Upper triangular}:~Intraspecific comparisons
between different songs of a single species (\textit{O.
rufipes}).
\textbf{Lower left}:~Distribution of correlation
coefficients $\rho$ for each interspecific and
intraspecific comparison. Dots indicate single $\rho$
values.
}
\label{fig:feat_cross_species}
\end{figure}
