Added newly processed species to fig_features_cross_species.pdf.
Wrote more of the results.
183
main.tex
@@ -103,8 +103,8 @@
|
||||
\newcommand{\xvar}{\sigma_{x}^{2}} % Variance of synthetic mixture
|
||||
\newcommand{\svar}{\sigma_{\text{s}}^{2}} % Song component variance
|
||||
\newcommand{\nvar}{\sigma_{\eta}^{2}} % Noise component variance
|
||||
\newcommand{\pc}{p(c_i,\,T)} % Probability density (general interval)
|
||||
\newcommand{\pclp}{p(c_i,\,\tlp)} % Probability density (lowpass interval)
|
||||
\newcommand{\pc}{p(c,\,T)} % Probability density (general interval)
|
||||
\newcommand{\pclp}{p(c,\,\tlp)} % Probability density (lowpass interval)
|
||||
|
||||
\section{Exploring a grasshopper's sensory world}
|
||||
|
||||
@@ -758,8 +758,7 @@ saturation regime is, of course, desirable in the context of intensity
|
||||
invariance, but it also means forgoing the higher SNR values that are
|
||||
achieved by $\env(t)$ for the same $\sca$ (up to several orders of magnitude,
|
||||
Fig.\,\ref{fig:log-hp}d). This trade-off between intensity invariance and SNR
|
||||
--- and the consequences it has further downstream along the pathway --- are
|
||||
adressed in the following sections.
|
||||
is a recurring phenomenon that is further addressed in the following sections.
|
||||
|
||||
\begin{figure}[!ht]
|
||||
\centering
|
||||
@@ -797,6 +796,92 @@ adressed in the following sections.
|
||||
|
||||
\subsection{Thresholding nonlinearity \& temporal averaging}
|
||||
|
||||
The third nonlinear transformation along the model pathway is the thresholding
nonlinearity $\nl$ that transforms each kernel response $c_i(t)$ into a binary
response $b_i(t)$, Eq.\,\ref{eq:binary}. This transformation takes place
after the convolutional filtering of $\adapt(t)$ with kernel $k_i(t)$,
Eq.\,\ref{eq:conv}, and is followed by the temporal averaging of $b_i(t)$ into
the feature set $f_i(t)$ by a lowpass filter, Eq.\,\ref{eq:lowpass}. The
effects of thresholding and temporal averaging are best illustrated based on a
single kernel~(Fig.\,\ref{fig:thresh-lp_single}) instead of the full set. For
this analysis, input $\adapt(t)$ was
rescaled~(Fig.\,\ref{fig:thresh-lp_single}a) and convolved with kernel $k(t)$.
The resulting kernel response $c(t)$ was passed through $H(c\,-\,\Theta)$ with
three different threshold values
$\Theta$~(Fig.\,\ref{fig:thresh-lp_single}b-d). Each resulting binary response
$b(t)$ was transformed into $f(t)$, whose average feature value serves as a
measure of intensity~(Fig.\,\ref{fig:thresh-lp_single}ef). The thresholding
nonlinearity $H(c\,-\,\Theta)$ categorizes the values of $c(t)$ into ``relevant''
($c(t)>\Theta$, $b(t)=1$) and ``irrelevant'' ($c(t)\leq\Theta$, $b(t)=0$)
response values. It thereby splits the probability density $\pc$ of $c(t)$
within some observed time interval $T$ into two complementary parts around
$\Theta$:
\begin{equation}
\int_{\Theta}^{+\infty} \pc\,dc\,=\,1\,-\,\int_{-\infty}^{\Theta} \pc\,dc\,=\,\frac{T_1}{T}, \qquad \infint \pc\,dc\,=\,1
\label{eq:pdf_split}
\end{equation}
The right-sided part of the split $\pc$ corresponds to time $T_1$ where
$c(t)>\Theta$, while the left-sided part corresponds to time $T_0=T-T_1$ where
$c(t)\leq\Theta$. The integral over the right-sided part of $\pc$
represents the ratio of time $T_1$ to total time $T$ because the integral of a
probability density over its entire domain is normalized to 1. The lowpass
filtering of $b(t)$ can be approximated as temporal averaging over a suitable
time interval $\tlp>\frac{1}{\fc}$ in order to express $f(t)$ as a similar
temporal ratio
\begin{equation}
f(t)\,\approx\,\frac{1}{\tlp} \int_{t}^{t\,+\,\tlp} b(\tau)\,d\tau\,=\,\frac{T_1}{\tlp}, \qquad b(t)\,\in\,\{0,\,1\}
\label{eq:feat_avg}
\end{equation}
of time $T_1$ during which $b(t)$ is 1 within the averaging interval $\tlp$.
Therefore, the value of $f(t)$ at every time point $t$ approximately signifies
the probability that $c(t)$ exceeds $\Theta$ during the
corresponding averaging interval $\tlp$:
\begin{equation}
f(t)\,\approx\,\int_{\Theta}^{+\infty} \pclp\,dc\,=\,P(c\,>\,\Theta,\,\tlp)
\label{eq:feat_prop}
\end{equation}
In a sense, $f(t)$ can be interpreted as a duty cycle with respect
to $\Theta$. For example, a feature value of $f(t)=0.4$ means that $c(t)$
exceeds $\Theta$ for approximately 40\,\% of the time within $\tlp$ around $t$.
In the most extreme cases, $\Theta$ lies either above the maximum of $c(t)$ or
below the minimum of $c(t)$, which results in the minimum or maximum possible
feature value of $f(t)=0$~(Fig.\,\ref{fig:thresh-lp_single}d, left column) or
$f(t)=1$, respectively.
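As a worked illustration of this duty-cycle reading (a hypothetical example
that is not part of the analysis in Fig.\,\ref{fig:thresh-lp_single}), assume
a purely sinusoidal kernel response $c(t)=A\,\sin(2\pi t/P)$, where the
amplitude $A>0$ and the period $P\ll\tlp$ are symbols introduced here for
illustration only. For $|\Theta|\leq A$, $c(t)$ exceeds $\Theta$ during a
fraction
\begin{equation}
f\,\approx\,\frac{1}{2}\,-\,\frac{1}{\pi}\,\arcsin\!\left(\frac{\Theta}{A}\right)
\end{equation}
of each period. Under these assumptions, $\Theta=0$ gives $f\approx0.5$ for
any $A$, $\Theta=A/2$ gives $f\approx1/3$, and $\Theta\geq A$ gives $f=0$.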
|
||||
|
||||
Importantly, $f(t)$ retains neither information about the timing of individual
threshold crossings nor about the precise values of $c(t)$ apart from their
relation to $\Theta$. Accordingly, for a given $\Theta$, different $\sca$ can
still result in similar $T_1$ segments (and hence similar feature values)
depending on the magnitude of the derivative of $c(t)$ in temporal proximity to
time points at which $c(t)$ crosses $\Theta$: The steeper the slope of $c(t)$,
the less $T_1$ changes with variations in $\sca$. The most reliable way of
exploiting this invariant property of $f(t)$ is to set $\Theta$ to a value near
0, because threshold crossings near 0 are least affected by different scales of
$c(t)$. For sufficiently large $\sca$, $f(t)$ then approaches the same constant
value in both the noiseless and the noisy case~(Fig.\,\ref{fig:thresh-lp_single}e,
saturation regime).
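A first-order sketch of this argument can be given under the simplifying
assumption (not made in the analysis itself) that a change in intensity
rescales the kernel response by a hypothetical factor $a>0$, so that
$c(t)=a\,g(t)$ with a fixed shape $g(t)$. A threshold crossing at time $t^{*}$
then satisfies $a\,g(t^{*})=\Theta$, and differentiating this condition with
respect to $a$ gives
\begin{equation}
\frac{dt^{*}}{da}\,=\,-\frac{g(t^{*})}{a\,\dot{g}(t^{*})}\,=\,-\frac{\Theta}{a\,\dot{c}(t^{*})}
\end{equation}
where $\dot{c}(t^{*})=a\,\dot{g}(t^{*})\neq0$ is the slope of $c(t)$ at the
crossing. Under this assumption, the crossing times (and hence $T_1$) shift
less the steeper the slope is, and do not shift at all for $\Theta=0$.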
|
||||
|
||||
The value of $f(t)$ in the saturation regime is independent of the precise
value of $\Theta$, but the value of $\sca$ at which the saturation regime is
reached decreases as $\Theta$ approaches 0~(Fig.\,\ref{fig:thresh-lp_single}e).
Therefore, a threshold value of $\Theta=0$ would be the optimal choice for
achieving intensity invariance at the lowest possible $\sca$. However, the
closer $\Theta$ is to 0, the higher the pure-noise response of $f(t)$ and the
lower the resulting SNR of $f(t)$ between noise regime and saturation
regime~(Fig.\,\ref{fig:thresh-lp_single}b-d, left column, and
Fig.\,\ref{fig:thresh-lp_single}e). It is even possible to achieve an
``unlimited'' SNR of $f(t)$ by setting $\Theta$ above the maximum of the
pure-noise $c(t)$, so that any value of $f(t)$ greater than 0 indicates the
presence of the song component $\soc(t)$ in input $\adapt(t)$. This comes at
the cost of requiring a higher $\sca$ to reach the saturation regime. This
trade-off between intensity invariance and SNR has already been observed during
the previous analysis of logarithmic compression and
adaptation~(Fig.\,\ref{fig:log-hp}d). However, the parameters that determine
the SNR of $\adapt(t)$ are much less understood and likely relate to properties
of the signal, whereas the SNR of $f(t)$ depends on the choice of $\Theta$ and
can be more directly manipulated by the system.
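To illustrate the dependence of the pure-noise response on $\Theta$ with a
hypothetical example, assume that the pure-noise kernel response is zero-mean
Gaussian with standard deviation $\sigma_c$ (both the Gaussian shape and the
symbol $\sigma_c$ are assumptions introduced here for illustration only).
Following Eq.\,\ref{eq:feat_prop}, the pure-noise feature value is then
\begin{equation}
f\,\approx\,\int_{\Theta}^{+\infty} \frac{1}{\sqrt{2\pi}\,\sigma_c}\,\exp\!\left(-\frac{c^{2}}{2\sigma_c^{2}}\right)dc\,=\,\frac{1}{2}\,\operatorname{erfc}\!\left(\frac{\Theta}{\sqrt{2}\,\sigma_c}\right)
\end{equation}
which gives $f\approx0.5$ for $\Theta=0$, $f\approx0.16$ for
$\Theta=\sigma_c$, and $f\approx0.023$ for $\Theta=2\sigma_c$. Because a true
Gaussian is unbounded, this sketch only approaches but never reaches a
pure-noise response of $f=0$, which requires a bounded pure-noise $c(t)$.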
|
||||
|
||||
Finally,
|
||||
|
||||
\begin{figure}[!ht]
|
||||
\centering
|
||||
\includegraphics[width=\textwidth]{figures/fig_invariance_thresh_lp_single.pdf}
|
||||
@@ -1003,96 +1088,6 @@ adressed in the following sections.
|
||||
\end{figure}
|
||||
\FloatBarrier
|
||||
|
||||
The second key mechanism for the emergence of intensity invariance along the
|
||||
model pathway takes place during the transformation of the kernel responses
|
||||
$c_i(t)$ over the binary responses $b_i(t)$ into the finalized features
|
||||
$f_i(t)$. Kernel response $c_i(t)$ quantifies the degree of similarity between
|
||||
kernel $k_i(t)$ and the preprocessed signal $\adapt(t)$. The thresholding
|
||||
nonlinearity $\nl$ categorizes the value of $c_i(t)$ at every time point $t$
|
||||
into "relevant" ($c_i(t)>\thr$, $b_i(t)=1$) and "irrelevant" ($c_i(t)\leq\thr$,
|
||||
$b_i(t)=0$) response values
|
||||
|
||||
By passing $c_i(t)$ through the thresholding
|
||||
nonlinearity $\nl$, its amplitude values are binned
|
||||
into one of two categories~(Eq.\,\ref{eq:binary}).
|
||||
|
||||
: $c_i(t)>\thr$
|
||||
|
||||
|
||||
|
||||
|
||||
This mechanism is mediated by the thresholding nonlinearity $\nl$. By
|
||||
passing $c_i(t)$ through the thresholding nonlinearity~(Eq.\,\ref{eq:binary}),
|
||||
its probability density $\pc$ within some observed time interval $T$ is split
|
||||
around threshold value $\thr$ into two complementary parts:
|
||||
\begin{equation}
|
||||
\int_{\thr}^{+\infty} \pc\,dc_i\,=\,1\,-\,\int_{-\infty}^{\thr} \pc\,dc_i\,=\,\frac{T_1}{T}, \qquad \infint \pc\,dc_i\,=\,1
|
||||
\label{eq:pdf_split}
|
||||
\end{equation}
|
||||
The right-sided part of the split $\pc$ corresponds to time $T_1$ where
|
||||
$c_i(t)>\thr$, while the left-sided part corresponds to time $T_0=T-T_1$ where
|
||||
$c_i(t)\leq\thr$. The semi-definite integral over the right-sided part of $\pc$
|
||||
represents the ratio of time $T_1$ to total time $T$ because the indefinite
|
||||
integral of a probability density is normalized to 1. Following the
|
||||
thresholding nonlinearity, the resulting binary responses $b_i(t)$ are
|
||||
lowpass-filtered~(Eq.\,\ref{eq:lowpass}) to obtain $f_i(t)$, which can be
|
||||
approximated as temporal averaging over a suitable time interval
|
||||
$\tlp>\frac{1}{\fc}$
|
||||
\begin{equation}
|
||||
f_i(t)\,\approx\,\frac{1}{\tlp} \int_{t}^{t\,+\,\tlp} b_i(\tau)\,d\tau\,=\,\frac{T_1}{\tlp}, \qquad b_i(t)\,\in\,\{0,\,1\}
|
||||
\label{eq:feat_avg}
|
||||
\end{equation}
|
||||
Feature $f_i(t)$
|
||||
|
||||
If the lowpass
|
||||
filter~(Eq.\,\ref{eq:lowpass}) over $b_i(t)$ is approximated as temporal
|
||||
averaging over a suitable time interval $\tlp>\frac{1}{\fc}$, then $f_i(t)$ can
|
||||
be linked to a similar temporal ratio
|
||||
% \begin{equation}
|
||||
% f_i(t)\,\approx\,\frac{1}{\tlp} \int_{t}^{t\,+\,\tlp} b_i(\tau)\,d\tau\,=\,\frac{T_1}{\tlp}, \qquad b_i(t)\,\in\,\{0,\,1\}
|
||||
% \label{eq:feat_avg}
|
||||
% \end{equation}
|
||||
of time $T_1$ during which $b_i(t)$ is 1 within the total averaging interval
|
||||
$\tlp$. Therefore, the value of $f_i(t)$ at every time point $t$ approximately
|
||||
signifies the cumulative probability that $c_i(t)$ exceeds $\thr$ during the
|
||||
corresponding averaging interval $\tlp$:
|
||||
\begin{equation}
|
||||
f_i(t)\,\approx\,\int_{\thr}^{+\infty} \pclp\,dc_i\,=\,P(c_i\,>\,\thr,\,\tlp)
|
||||
\label{eq:feat_prop}
|
||||
\end{equation}
|
||||
In a sense, $f_i(t)$ resembles a duty cycle of some sort, which quantifies
|
||||
purely temporal relations in the structure of $c_i(t)$ with no regard for
|
||||
precise amplitude values apart from their relation to $\thr$.
|
||||
|
||||
Accordingly, a substantial amount of information about the degree of similarity
|
||||
between signal $\adapt(t)$ and kernel $k_i(t)$ that is contained in $c_i(t)$ is
|
||||
lost during its transformation into $f_i(t)$. Instead, $f_i(t)$ only retains
|
||||
information about the temporal relation of $c_i(t)$ relative to $\thr$
|
||||
|
||||
|
||||
This loss of amplitude information is the key to the intensity
|
||||
invariance of $f_i(t)$: For a given $\thr$, different scales of $c_i(t)$ can
|
||||
still result in similar $T_1$ segments depending on the magnitude of the
|
||||
derivative of $c_i(t)$ in temporal proximity to time points at which $c_i(t)$
|
||||
crosses $\thr$. The steeper the slope of $c_i(t)$ around the threshold
|
||||
crossings, the less $T_1$ changes with scale variations.
|
||||
|
||||
|
||||
|
||||
In a sense, $f_i(t)$ resembles a duty
|
||||
cycle of some sort, as it quantifies purely temporal relations in the structure
|
||||
of $c_i(t)$ with no regard for precise amplitude values apart from their
|
||||
relation to $\thr$. This near-complete loss of amplitude information is the key
|
||||
to the intensity invariance of $f_i(t)$: For a given $\thr$, different scales
|
||||
of $c_i(t)$ can still result in similar $T_1$ segments depending on the
|
||||
magnitude of the derivative of $c_i(t)$ in temporal proximity to time points at
|
||||
which $c_i(t)$ crosses $\thr$. The steeper the slope of $c_i(t)$ around the
|
||||
threshold crossings, the less $T_1$ changes with scale variations.
|
||||
|
||||
|
||||
|
||||
\section{Discriminating species-specific song\\patterns in feature space}
|
||||
|
||||
\section{Conclusions \& outlook}
|
||||
|
||||
\textbf{Song recognition pathway: Grasshopper vs. model:}\\
|
||||
|
||||