Added newly processed species to fig_features_cross_species.pdf.

Wrote more of the results.
j-hartling
2026-05-05 14:44:57 +02:00
parent 16014c02a0
commit 05e808ba30
10 changed files with 270 additions and 274 deletions

183
main.tex

@@ -103,8 +103,8 @@
\newcommand{\xvar}{\sigma_{x}^{2}} % Variance of synthetic mixture
\newcommand{\svar}{\sigma_{\text{s}}^{2}} % Song component variance
\newcommand{\nvar}{\sigma_{\eta}^{2}} % Noise component variance
\newcommand{\pc}{p(c_i,\,T)} % Probability density (general interval)
\newcommand{\pclp}{p(c_i,\,\tlp)} % Probability density (lowpass interval)
\newcommand{\pc}{p(c,\,T)} % Probability density (general interval)
\newcommand{\pclp}{p(c,\,\tlp)} % Probability density (lowpass interval)
\section{Exploring a grasshopper's sensory world}
@@ -758,8 +758,7 @@ saturation regime is, of course, desirable in the context of intensity
invariance, but it also means to pass up on the higher SNR values that are
achieved by $\env(t)$ for the same $\sca$ (up to several orders of magnitude,
Fig.\,\ref{fig:log-hp}d). This trade-off between intensity invariance and SNR
--- and the consequences it has further downstream along the pathway --- are
adressed in the following sections.
is a recurring phenomenon that is further addressed in the following sections.
\begin{figure}[!ht]
\centering
@@ -797,6 +796,92 @@ adressed in the following sections.
\subsection{Thresholding nonlinearity \& temporal averaging}
The third nonlinear transformation along the model pathway is the thresholding
nonlinearity $\nl$ that transforms each kernel response $c_i(t)$ into a binary
response $b_i(t)$, Eq.\,\ref{eq:binary}. This transformation takes place
after the convolutional filtering of $\adapt(t)$ with kernel $k_i(t)$,
Eq.\,\ref{eq:conv}, and is followed by the temporal averaging of $b_i(t)$ into
the feature set $f_i(t)$ by a lowpass filter, Eq.\,\ref{eq:lowpass}. The
effects of thresholding and temporal averaging are best illustrated based on a
single kernel~(Fig.\,\ref{fig:thresh-lp_single}) instead of the full set. For
this analysis, input $\adapt(t)$ was
rescaled~(Fig.\,\ref{fig:thresh-lp_single}a) and convolved with kernel $k(t)$.
The resulting kernel response $c(t)$ was passed through $H(c\,-\,\Theta)$ with
three different threshold values
$\Theta$~(Fig.\,\ref{fig:thresh-lp_single}b--d). Each resulting binary response
$b(t)$ was transformed into $f(t)$, whose average feature value serves as a
measure of intensity~(Fig.\,\ref{fig:thresh-lp_single}e,\,f). The thresholding
nonlinearity $H(c\,-\,\Theta)$ categorizes the values of $c(t)$ into ``relevant''
($c(t)>\Theta$, $b(t)=1$) and ``irrelevant'' ($c(t)\leq\Theta$, $b(t)=0$)
response values. It thereby splits the probability density $\pc$ of $c(t)$
within some observed time interval $T$ into two complementary parts around
$\Theta$:
\begin{equation}
\int_{\Theta}^{+\infty} \pc\,dc\,=\,1\,-\,\int_{-\infty}^{\Theta} \pc\,dc\,=\,\frac{T_1}{T}, \qquad \infint \pc\,dc\,=\,1
\label{eq:pdf_split}
\end{equation}
The right-sided part of the split $\pc$ corresponds to time $T_1$ where
$c(t)>\Theta$, while the left-sided part corresponds to time $T_0=T-T_1$ where
$c(t)\leq\Theta$. The improper integral over the right-sided part of $\pc$
represents the ratio of time $T_1$ to total time $T$ because the integral of a
probability density over its entire domain is normalized to 1. The lowpass filtering of
$b(t)$ can be approximated as temporal averaging over a suitable time interval
$\tlp>\frac{1}{\fc}$ in order to express $f(t)$ as a similar temporal ratio
\begin{equation}
f(t)\,\approx\,\frac{1}{\tlp} \int_{t}^{t\,+\,\tlp} b(\tau)\,d\tau\,=\,\frac{T_1}{\tlp}, \qquad b(t)\,\in\,\{0,\,1\}
\label{eq:feat_avg}
\end{equation}
of time $T_1$ during which $b(t)$ is 1 within the averaging interval $\tlp$.
Therefore, the value of $f(t)$ at every time point $t$ approximately signifies
the cumulative probability that $c(t)$ exceeds $\Theta$ during the
corresponding averaging interval $\tlp$:
\begin{equation}
f(t)\,\approx\,\int_{\Theta}^{+\infty} \pclp\,dc\,=\,P(c\,>\,\Theta,\,\tlp)
\label{eq:feat_prop}
\end{equation}
In a sense, $f(t)$ can be interpreted as some sort of duty cycle with respect
to $\Theta$. For example, a feature value of $f(t)=0.4$ means that $c(t)$
exceeds $\Theta$ for approximately 40\,\% of the time within $\tlp$ around $t$.
In the most extreme cases, $\Theta$ lies either above the maximum of $c(t)$ or
below the minimum of $c(t)$, which results in a minimum or maximum possible
feature value of $f(t)=0$~(Fig.\,\ref{fig:thresh-lp_single}d, left column) or
$f(t)=1$, respectively.
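As a minimal worked example, and purely as an illustrative assumption that is
not part of the model pathway, consider a sinusoidal kernel response
$c(t)=A\sin(2\pi t/P)$ with hypothetical amplitude $A>0$ and period $P$,
averaged over a whole number of periods. For $|\Theta|\leq A$, $c(t)$ exceeds
$\Theta$ for a fraction
% Illustrative assumption: sinusoidal c(t); A and P are hypothetical symbols.
\[
f\,\approx\,\frac{1}{2}\,-\,\frac{1}{\pi}\arcsin\!\left(\frac{\Theta}{A}\right)
\]
of each period. Under this assumption, a threshold of $\Theta\approx0.31\,A$
yields $f\approx0.4$, whereas $\Theta=0$ yields $f=0.5$ for any $A$ and
$\Theta>A$ yields $f=0$.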
Importantly, $f(t)$ retains information neither about the timing of individual
threshold crossings nor about the precise values of $c(t)$ apart from their relation
to $\Theta$. Accordingly, for a given $\Theta$, different $\sca$ can still
result in similar $T_1$ segments (and hence similar feature values) depending
on the magnitude of the derivative of $c(t)$ in temporal proximity to time
points at which $c(t)$ crosses $\Theta$: The steeper the slope of $c(t)$, the
less $T_1$ changes with variations in $\sca$. The most reliable way of
exploiting this invariant property of $f(t)$ is to set $\Theta$ to a value near
0, because these values are least affected by different scales of $c(t)$. For
sufficiently large $\sca$, $f(t)$ then approaches the same constant value in
both the noiseless and the noisy case~(Fig.\,\ref{fig:thresh-lp_single}e,
saturation regime).
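The role of slope and threshold can be sketched to first order under a
simplifying assumption that is not taken from the model itself: suppose the
noiseless kernel response scales proportionally with a hypothetical factor
$\alpha>0$, so that $c(t)=\alpha\,u(t)$ for a fixed waveform $u(t)$. A threshold
crossing at time $t^{*}$ with $u'(t^{*})\neq0$ satisfies
$\alpha\,u(t^{*})=\Theta$, and implicit differentiation with respect to
$\alpha$ gives
% Illustrative assumption: c(t) = alpha * u(t); alpha, u and t^* are hypothetical symbols.
\[
\frac{dt^{*}}{d\alpha}\,=\,-\frac{u(t^{*})}{\alpha\,u'(t^{*})}\,=\,-\frac{\Theta}{\alpha\,c'(t^{*})}
\]
where $c'(t^{*})=\alpha\,u'(t^{*})$ is the slope of $c(t)$ at the crossing. In
this sketch, each crossing time, and hence $T_1$, shifts less with scale the
steeper the slope is, and does not shift at all for $\Theta=0$.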
The value of $f(t)$ in the saturation regime is independent of the precise
value of $\Theta$, but the value of $\sca$ at which the saturation regime is
reached decreases as $\Theta$ approaches 0~(Fig.\,\ref{fig:thresh-lp_single}e). Therefore,
a threshold value of $\Theta=0$ would be the optimal choice for achieving
intensity invariance at the lowest possible $\sca$. However, the
closer $\Theta$ is to 0, the higher the pure-noise response of $f(t)$ and the
lower the resulting SNR of $f(t)$ between noise regime and saturation
regime~(Fig.\,\ref{fig:thresh-lp_single}b--d, left column, and
Fig.\,\ref{fig:thresh-lp_single}e). It is even possible to achieve an
``unlimited'' SNR of $f(t)$ by setting $\Theta$ above the maximum of the
pure-noise $c(t)$, so that any value of $f(t)$ greater than 0 indicates the
presence of the song component $\soc(t)$ in input $\adapt(t)$ at the cost of
requiring a higher $\sca$ to reach the saturation regime. This trade-off
between intensity invariance and SNR has already been observed during the
previous analysis on logarithmic compression and
adaptation~(Fig.\,\ref{fig:log-hp}d). However, the parameters that determine
the SNR of $\adapt(t)$ are much less understood and likely relate to properties
of the signal, whereas the SNR of $f(t)$ depends on the choice of $\Theta$ and
can be more directly manipulated by the system.
Finally,
\begin{figure}[!ht]
\centering
\includegraphics[width=\textwidth]{figures/fig_invariance_thresh_lp_single.pdf}
@@ -1003,96 +1088,6 @@ adressed in the following sections.
\end{figure}
\FloatBarrier
The second key mechanism for the emergence of intensity invariance along the
model pathway takes place during the transformation of the kernel responses
$c_i(t)$ over the binary responses $b_i(t)$ into the finalized features
$f_i(t)$. Kernel response $c_i(t)$ quantifies the degree of similarity between
kernel $k_i(t)$ and the preprocessed signal $\adapt(t)$. The thresholding
nonlinearity $\nl$ categorizes the value of $c_i(t)$ at every time point $t$
into "relevant" ($c_i(t)>\thr$, $b_i(t)=1$) and "irrelevant" ($c_i(t)\leq\thr$,
$b_i(t)=0$) response values
By passing $c_i(t)$ through the thresholding
nonlinearity $\nl$, its amplitude values are binned
into one of two categories~(Eq.\,\ref{eq:binary}).
: $c_i(t)>\thr$
This mechanism is mediated by the thresholding nonlinearity $\nl$. By
passing $c_i(t)$ through the thresholding nonlinearity~(Eq.\,\ref{eq:binary}),
its probability density $\pc$ within some observed time interval $T$ is split
around threshold value $\thr$ into two complementary parts:
\begin{equation}
\int_{\thr}^{+\infty} \pc\,dc_i\,=\,1\,-\,\int_{-\infty}^{\thr} \pc\,dc_i\,=\,\frac{T_1}{T}, \qquad \infint \pc\,dc_i\,=\,1
\label{eq:pdf_split}
\end{equation}
The right-sided part of the split $\pc$ corresponds to time $T_1$ where
$c_i(t)>\thr$, while the left-sided part corresponds to time $T_0=T-T_1$ where
$c_i(t)\leq\thr$. The semi-definite integral over the right-sided part of $\pc$
represents the ratio of time $T_1$ to total time $T$ because the indefinite
integral of a probability density is normalized to 1. Following the
thresholding nonlinearity, the resulting binary responses $b_i(t)$ are
lowpass-filtered~(Eq.\,\ref{eq:lowpass}) to obtain $f_i(t)$, which can be
approximated as temporal averaging over a suitable time interval
$\tlp>\frac{1}{\fc}$
\begin{equation}
f_i(t)\,\approx\,\frac{1}{\tlp} \int_{t}^{t\,+\,\tlp} b_i(\tau)\,d\tau\,=\,\frac{T_1}{\tlp}, \qquad b_i(t)\,\in\,\{0,\,1\}
\label{eq:feat_avg}
\end{equation}
Feature $f_i(t)$
If the lowpass
filter~(Eq.\,\ref{eq:lowpass}) over $b_i(t)$ is approximated as temporal
averaging over a suitable time interval $\tlp>\frac{1}{\fc}$, then $f_i(t)$ can
be linked to a similar temporal ratio
% \begin{equation}
% f_i(t)\,\approx\,\frac{1}{\tlp} \int_{t}^{t\,+\,\tlp} b_i(\tau)\,d\tau\,=\,\frac{T_1}{\tlp}, \qquad b_i(t)\,\in\,\{0,\,1\}
% \label{eq:feat_avg}
% \end{equation}
of time $T_1$ during which $b_i(t)$ is 1 within the total averaging interval
$\tlp$. Therefore, the value of $f_i(t)$ at every time point $t$ approximately
signifies the cumulative probability that $c_i(t)$ exceeds $\thr$ during the
corresponding averaging interval $\tlp$:
\begin{equation}
f_i(t)\,\approx\,\int_{\thr}^{+\infty} \pclp\,dc_i\,=\,P(c_i\,>\,\thr,\,\tlp)
\label{eq:feat_prop}
\end{equation}
In a sense, $f_i(t)$ resembles a duty cycle of some sort, which quantifies
purely temporal relations in the structure of $c_i(t)$ with no regard for
precise amplitude values apart from their relation to $\thr$.
Accordingly, a substantial amount of information about the degree of similarity
between signal $\adapt(t)$ and kernel $k_i(t)$ that is contained in $c_i(t)$ is
lost during its transformation into $f_i(t)$. Instead, $f_i(t)$ only retains
information about the temporal relation of $c_i(t)$ relative to $\thr$
This loss of amplitude information is the key to the intensity
invariance of $f_i(t)$: For a given $\thr$, different scales of $c_i(t)$ can
still result in similar $T_1$ segments depending on the magnitude of the
derivative of $c_i(t)$ in temporal proximity to time points at which $c_i(t)$
crosses $\thr$. The steeper the slope of $c_i(t)$ around the threshold
crossings, the less $T_1$ changes with scale variations.
In a sense, $f_i(t)$ resembles a duty
cycle of some sort, as it quantifies purely temporal relations in the structure
of $c_i(t)$ with no regard for precise amplitude values apart from their
relation to $\thr$. This near-complete loss of amplitude information is the key
to the intensity invariance of $f_i(t)$: For a given $\thr$, different scales
of $c_i(t)$ can still result in similar $T_1$ segments depending on the
magnitude of the derivative of $c_i(t)$ in temporal proximity to time points at
which $c_i(t)$ crosses $\thr$. The steeper the slope of $c_i(t)$ around the
threshold crossings, the less $T_1$ changes with scale variations.
\section{Discriminating species-specific song\\patterns in feature space}
\section{Conclusions \& outlook}
\textbf{Song recognition pathway: Grasshopper vs. model:}\\