Managed 1st half of results text, began with 2nd half.

This commit is contained in:
j-hartling
2026-02-25 16:53:49 +01:00
parent c700e1723c
commit cc701a09f8
8 changed files with 216 additions and 2423 deletions

main.tex

@@ -29,6 +29,9 @@
mincitenames=1
]{biblatex}
\addbibresource{cite.bib}
%\bibdata
%\bibstyle
%\citation
\title{Emergent intensity invariance in a physiologically inspired model of the grasshopper auditory system}
\author{Jona Hartling, Jan Benda}
@@ -82,7 +85,7 @@
\newcommand{\fwrh}{\text{FWRH}} % Gaussian full-width at relative height
\newcommand{\off}{\beta_0} % Offset for linear frequency approximation
% Math shorthands - Threshold nonlinearity:
% Math shorthands - Thresholding nonlinearity:
\newcommand{\thr}{\Theta_i} % Step function threshold value
\newcommand{\nl}{H(c_i\,-\,\thr)} % Shifted Heaviside step function
@@ -90,6 +93,7 @@
\newcommand{\soc}{s} % Song component of synthetic mixture
\newcommand{\noc}{\eta} % Noise component of synthetic mixture
\newcommand{\sca}{\alpha} % Multiplicative scale of song component
\newcommand{\xvar}{\sigma_{x}^{2}} % Variance of synthetic mixture
\newcommand{\svar}{\sigma_{\text{s}}^{2}} % Song component variance
\newcommand{\nvar}{\sigma_{\eta}^{2}} % Noise component variance
\newcommand{\pc}{p(c_i,\,T)} % Probability density (general interval)
@@ -509,7 +513,7 @@ threshold value $\thr$ to obtain a binary response
\label{eq:binary}
\end{equation}
which can be thought of as a categorization into ``relevant'' and ``irrelevant''
response values. In the grasshopper, these threshold nonlinearities might
response values. In the grasshopper, these thresholding nonlinearities might
either be part of the processing within the ascending neurons or take place
further downstream~(SOURCE). Finally, the responses of the ascending neurons
are assumed to be integrated somewhere in the supraesophageal
@@ -543,47 +547,38 @@ can be read out by a simple linear classifier.
\end{figure}
\FloatBarrier
\section{Two mechanisms driving the emergence of intensity-invariant song representations}
\section{Two mechanisms driving the emergence of intensity-invariant song representation}
% Still missing the SNR analysis. Should be able to write around it for now.
The robustness of song recognition is tied to the degree of intensity
invariance of the finalized feature representation. Ideally, the values of each
feature should depend only on the relative amplitude dynamics of the song
pattern but not on the overall intensity level of the song. In the grasshopper,
the emergence of intensity-invariant representations along the song recognition
pattern but not on the overall intensity of the song. In the grasshopper, the
emergence of intensity-invariant representations along the song recognition
pathway is likely a distributed process that involves different neuronal
populations, which raises the question of what the essential computational
mechanisms are that drive this process. Within the model pathway, we identified
two key mechanisms that render the song representation more invariant to
variations in baseline intensity. The two mechanisms each comprise a nonlinear
signal transformation followed by a linear signal transformation but differ in
the specific operations and the neural substrate involved, as outlined in the
following sections.
intensity variations. The two mechanisms each comprise a nonlinear signal
transformation followed by a linear signal transformation but differ in the
specific operations involved, as outlined in the following sections.
\subsection{Logarithmic compression \& spike-frequency adaptation}
The first emergence of intensity invariance along the model pathway occurs
during the preprocessing stage, in the transition from the signal envelope
$\env(t)$ to the logarithmically scaled envelope $\db(t)$ and then to the
intensity-adapted envelope $\adapt(t)$. In order to disentangle the interplay
of logarithmic compression and adaptation, we can rewrite
$\env(t)$~(Eq.\,\ref{eq:env}) as synthetic mixture
The first notable emergence of intensity invariance along the model pathway
occurs during the transformation of the signal envelope $\env(t)$ into the
logarithmically scaled envelope $\db(t)$ and then into the intensity-adapted
envelope $\adapt(t)$. In order to disentangle the interplay of logarithmic
compression and adaptation, $\env(t)$ can be rewritten as a synthetic mixture
\begin{equation}
\env(t)\,=\,\sca\,\cdot\,\soc(t)\,+\,\noc(t), \qquad \env(t)\,>\,0\enspace\forall\enspace t\,\in\,\mathbb{R}
\label{eq:toy_env}
\end{equation}
of a song component $\soc(t)$ with variable multiplicative scale $\sca\geq0$
and a fixed-scale noise component $\noc(t)$. Both $\soc(t)$ and $\noc(t)$ are
assumed to have unit variance~($\svar=\nvar=1$). If $\soc(t)$ and $\noc(t)$ are
uncorrelated~($\soc(t)\perp\noc(t)$), the signal-to-noise ratio (SNR) of the
synthetic $\env(t)$ with ($\sca>0$) and without ($\sca=0$) song component
$\soc(t)$ is given by
\begin{equation}
\text{SNR}\,=\,\frac{\sigma_{s+\eta}^{2}}{\nvar}\,=\,\frac{\alpha^{2}\,\cdot\,\svar\,+\,\nvar}{\nvar}\,=\,\alpha^{2}\,+\,1
\label{eq:toy_snr}
\end{equation}
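As a quick numerical sanity check of Eq.~\ref{eq:toy_snr}, the following Python sketch draws $\soc(t)$ and $\noc(t)$ as independent standard Gaussian samples. This is a hypothetical choice; any uncorrelated pair with unit variance would do. It then compares the empirical variance ratio with $\sca^{2}+1$. The positivity constraint on $\env(t)$ is ignored here because adding a constant offset does not change any variance. Small deviations come from the finite-sample covariance between the two components.

```python
import random

def variance(xs):
    """Population variance of a sequence of floats."""
    mean = sum(xs) / len(xs)
    return sum((x - mean) ** 2 for x in xs) / len(xs)

def empirical_snr(alpha, n=100_000, seed=0):
    """Variance of alpha*s + eta relative to the variance of eta alone.

    s and eta are drawn as independent standard Gaussian samples
    (hypothetical stand-ins for the song and noise components), so they
    are uncorrelated and have unit variance.
    """
    rng = random.Random(seed)
    s = [rng.gauss(0.0, 1.0) for _ in range(n)]
    eta = [rng.gauss(0.0, 1.0) for _ in range(n)]
    mixture = [alpha * si + ei for si, ei in zip(s, eta)]
    return variance(mixture) / variance(eta)

for alpha in (0.0, 0.5, 1.0, 2.0, 4.0):
    print(f"alpha={alpha:.1f}: empirical SNR={empirical_snr(alpha):.3f}, "
          f"alpha^2 + 1={alpha ** 2 + 1:.3f}")
```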
When simplifying the decibel transformation~(Eq.\,\ref{eq:log}), the logarithmically
scaled envelope $\db(t)$ can be expressed as a sum of two logarithmic terms
assumed to have unit variance. Converting $\env(t)$ to the decibel
scale~(Eq.\,\ref{eq:log}) turns $\sca$ from a multiplicative scale in linear
space into an additive term, or offset, in logarithmic space
\begin{equation}
\begin{split}
\db(t)\,&=\,\log \frac{\alpha\,\cdot\,s(t)\,+\,\eta(t)}{\dbref}\\
@@ -591,99 +586,90 @@ scaled envelope $\db(t)$ can be expressed as a sum of two logarithmic terms
\end{split}
\label{eq:toy_log}
\end{equation}
\textbf{Logarithmic component:}\\
- Simplify decibel transformation (Eq.\,\ref{eq:log}) and apply to synthetic $\env(t)$\\
- Isolate scale $\alpha$ and reference $\dbref$ using logarithm product/quotient laws
$\rightarrow$ In log-space, a multiplicative scaling factor becomes additive\\
$\rightarrow$ Allows for the separation of song signal $s(t)$ and its scale $\alpha$\\
$\rightarrow$ Introduces scaling of noise term $\eta(t)$ by the inverse of $\alpha$\\
$\rightarrow$ Normalization by $\dbref$ applies equally to all terms (no individual effects)
\textbf{Adaptation component:}\\
- Highpass filter over $\db(t)$ (Eq.\,\ref{eq:highpass}) can
be approximated as subtraction of the local signal offset within a suitable time
interval $\thp$ ($0 \ll \thp < \frac{1}{\fc}$)
%
which separates $\sca$ from $\soc(t)$ but scales $\noc(t)$ by the inverse of
$\sca$. The subsequent highpass filtering~(Eq.\,\ref{eq:highpass}) of $\db(t)$
can then be approximated as a subtraction of the local offset within a
suitable time interval $0 \ll \thp < \frac{1}{\fc}$:
\begin{equation}
\begin{split}
\adapt(t)\,\approx\,\db(t)\,-\,\log \frac{\alpha}{\dbref}\,=\,\log\left[s(t)\,+\,\frac{\eta(t)}{\alpha}\right]
\end{split}
\label{eq:toy_highpass}
\end{equation}
%
\textbf{Implication for intensity invariance:}\\
- Logarithmic scaling is essential for equalizing different song intensities\\
$\rightarrow$ Intensity information can be manipulated more easily when in form
of a signal offset in log-space than a multiplicative scale in linear space
- Scale $\alpha$ can only be redistributed, not entirely eliminated from $\adapt(t)$\\
$\rightarrow$ Turn initial scaling of song $s(t)$ by $\alpha$ into scaling of noise $\eta(t)$ by $\frac{1}{\alpha}$
- Capability to compensate for intensity variations, i.e. selective amplification
of output $\adapt(t)$ relative to input $\env(t)$, is limited by input SNR (Eq.\,\ref{eq:toy_snr}):\\
$\alpha\gg1$: Attenuation of $\eta(t)$ term $\rightarrow$ $s(t)$ dominates $\adapt(t)$\\
$\alpha\approx1$ Negligible effect on $\eta(t)$ term $\rightarrow$ $\adapt(t)=\log[s(t)+\eta(t)]$\\
$\alpha\ll1$: Amplification of $\eta(t)$ term $\rightarrow$ $\eta(t)$ dominates $\adapt(t)$\\
$\rightarrow$ Ability to equalize between different sufficiently large scales of $s(t)$\\
$\rightarrow$ Inability to recover $s(t)$ when initially masked by noise floor $\eta(t)$
- Logarithmic scaling emphasizes small amplitudes (song onsets, noise floor) \\
$\rightarrow$ Recurring trade-off: Equalizing signal intensity vs preserving initial SNR
\subsection{Threshold nonlinearity \& temporal averaging}
Convolved $c_i(t)$ $\xrightarrow{\nl}$ Binary $b_i(t)$ $\xrightarrow{\lp}$ Feature $f_i(t)$
\textbf{Thresholding component:}\\
- Within an observed time interval $T$, $c_i(t)$ follows probability density $\pc$\\
- Within $T$, $c_i(t)$ exceeds threshold value $\thr$ for time $T_1$ ($T_1+T_0=T$)\\
- Threshold $\nl$ splits $\pc$ around $\thr$ in two complementary parts
%
This means that $\sca$ cannot be entirely eliminated from $\adapt(t)$, only
redistributed between $\soc(t)$ and $\noc(t)$. Consequently, if $\sca$ is
sufficiently large ($\sca\gg1$), $\noc(t)$ is attenuated to the point of being
negligible, so that $\adapt(t)$ represents $\soc(t)$ in a scale-free manner. If
$\soc(t)$ and $\noc(t)$ are at similar scales ($\sca\approx1$), the noise term
remains largely unchanged, so that $\adapt(t)\approx\log[\soc(t)+\noc(t)]$.
However, if $\sca$ is sufficiently small ($\sca\ll1$), $\noc(t)$ is amplified
and masks $\soc(t)$ even after the intensity adaptation. Therefore, the
effective intensity invariance of $\adapt(t)$ relative to $\env(t)$ is limited
by the initial scaling of $\soc(t)$ relative to $\noc(t)$; that is, by the
signal-to-noise ratio (SNR) of $\env(t)$ with ($\sca>0$) and without
($\sca=0$) song component $\soc(t)$
\begin{equation}
\int_{\thr}^{+\infty} p(c_i,T)\,dc_i\,=\,1\,-\,\int_{-\infty}^{\thr} p(c_i,T)\,dc_i\,=\,\frac{T_1}{T}
\text{SNR}(\sca)\,=\,\frac{\xvar}{\nvar}\,=\,\frac{\alpha^{2}\,\cdot\,\svar\,+\,\nvar}{\nvar}\,=\,\alpha^{2}\,+\,1, \qquad \svar\,=\,\nvar\,=\,1
\label{eq:toy_snr}
\end{equation}
which depends quadratically on $\sca$ if $\soc(t)$ and $\noc(t)$ are
uncorrelated~($\soc(t)\perp\noc(t)$). In summary, the combination of
logarithmic compression and adaptation allows different, sufficiently large song
scales to be equalized, which is essential for an intensity-invariant song
representation. However, this mechanism cannot recover songs that have already
sunk below the noise floor. This emphasizes the importance of a sufficiently
high SNR at the initial reception of the signal for reliable song recognition.
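The redistribution of $\sca$ between $\soc(t)$ and $\noc(t)$ can be illustrated with a short Python sketch. The sketch rests on several assumptions. It uses a hypothetical positive song envelope with five modulation cycles per window and a uniformly distributed noise floor of fixed scale. It simplifies the decibel transformation to a natural logarithm with $\dbref=1$. It also replaces the highpass filter by subtracting the mean of the log-scaled signal over a single window. For each $\sca$, the sketch reports the RMS deviation of the adapted mixture from the adapted song alone.

```python
import math
import random

def adapted_envelope(alpha, song, noise):
    """Log-compress alpha*song + noise, then subtract the window mean.

    Assumptions: the decibel scaling is simplified to a natural log with
    reference 1, and the highpass filter is replaced by subtracting the
    mean of the log signal over the whole window (a single offset).
    """
    log_env = [math.log(alpha * s + n) for s, n in zip(song, noise)]
    offset = sum(log_env) / len(log_env)
    return [x - offset for x in log_env]

def rms_deviation(alpha, song, noise):
    """RMS distance between the adapted mixture and the adapted song alone."""
    adapted = adapted_envelope(alpha, song, noise)
    song_only = adapted_envelope(1.0, song, [0.0] * len(song))
    squared = sum((a - b) ** 2 for a, b in zip(adapted, song_only))
    return math.sqrt(squared / len(song))

# Hypothetical positive song envelope (5 cycles per window) and noise floor.
n = 2000
song = [1.5 + math.sin(2 * math.pi * 5 * i / n) for i in range(n)]
rng = random.Random(1)
noise = [rng.uniform(0.5, 1.5) for _ in range(n)]

for alpha in (100.0, 10.0, 1.0, 0.1, 0.01):
    print(f"alpha={alpha:>6}: RMS deviation from scale-free song = "
          f"{rms_deviation(alpha, song, noise):.4f}")
```

The deviation shrinks monotonically as $\sca$ grows. For $\sca\gg1$, the adapted mixture approaches the scale-free song. For $\sca\ll1$, it is dominated by the noise term, which the offset subtraction cannot undo.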
\subsection{Thresholding nonlinearity \& temporal averaging}
The second key mechanism for the emergence of intensity invariance along the
model pathway takes place during the transformation of the kernel responses
$c_i(t)$ via the binary responses $b_i(t)$ into the finalized features
$f_i(t)$ and is mediated by the thresholding nonlinearity $\nl$. Passing
$c_i(t)$ through the thresholding nonlinearity~(Eq.\,\ref{eq:binary}) splits
its probability density $\pc$ within some observed time interval $T$ around the
threshold value $\thr$ into two complementary parts:
\begin{equation}
\int_{\thr}^{+\infty} \pc\,dc_i\,=\,1\,-\,\int_{-\infty}^{\thr} \pc\,dc_i\,=\,\frac{T_1}{T}, \qquad \infint \pc\,dc_i\,=\,1
\label{eq:pdf_split}
\end{equation}
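For a concrete instance of Eq.~\ref{eq:pdf_split}, consider a hypothetical pure sinusoid $c_i(t)=\sin(2\pi t)$ observed over one period $T=1$. Its amplitude density is the arcsine density $p(c)=1/(\pi\sqrt{1-c^{2}})$ on $(-1,1)$, so the right-sided integral evaluates to $\frac{1}{2}-\frac{\arcsin\thr}{\pi}$. The Python sketch below compares this value with the fraction of densely sampled time points at which the sinusoid exceeds the threshold.

```python
import math

def time_fraction_above(threshold, n=100_000):
    """Fraction T1/T of uniformly spaced samples of sin(2*pi*t), t in [0, 1),
    that lie above the threshold."""
    above = sum(1 for i in range(n) if math.sin(2 * math.pi * i / n) > threshold)
    return above / n

def density_tail(threshold):
    """Integral of the arcsine density 1/(pi*sqrt(1 - c^2)) from threshold to 1."""
    return 0.5 - math.asin(threshold) / math.pi

for theta in (-0.5, 0.0, 0.5, 0.9):
    print(f"theta={theta:+.1f}: T1/T={time_fraction_above(theta):.4f}, "
          f"integral={density_tail(theta):.4f}")
```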
%
$\rightarrow$ Semi-definite integral over right-sided portion of split $\pc$ gives ratio
of time $T_1$ where $c_i(t)>\thr$ to total time $T$ due to normalization of $\pc$
%
Due to the normalization of $\pc$, the semi-infinite integral over the
right-sided part of the split $\pc$ equals the fraction $T_1/T$ of the total
time $T$ during which $c_i(t)$ exceeds $\thr$. If the subsequent lowpass
filter~(Eq.\,\ref{eq:lowpass}) over $b_i(t)$ is approximated as temporal
averaging over a suitable time interval $\tlp>\frac{1}{\fc}$
\begin{equation}
\infint \pc\,dc_i\,=\,1
\label{eq:pdf}
\end{equation}
%
\textbf{Averaging component:}\\
- Lowpass filter over binary response $b_i(t)$ (Eq.\,\ref{eq:lowpass}) can be
approximated as temporal averaging over a suitable time interval $\tlp$ ($\tlp > \frac{1}{\fc}$)\\
- Within $\tlp$, $b_i(t)$ takes a value of 1 ($c_i(t)>\thr$) for time $T_1$ ($T_1+T_0=\tlp$)
%
\begin{equation}
f_i(t)\,\approx\,\frac{1}{\tlp} \int_{t}^{t\,+\,\tlp} b_i(\tau)\,d\tau\,=\,\frac{T_1}{\tlp}
f_i(t)\,\approx\,\frac{1}{\tlp} \int_{t}^{t\,+\,\tlp} b_i(\tau)\,d\tau\,=\,\frac{T_1}{\tlp}, \qquad b_i(t)\,\in\,\{0,\,1\}
\label{eq:feat_avg}
\end{equation}
%
$\rightarrow$ Temporal averaging over $b_i(t)\in[0,1]$ (Eq.\,\ref{eq:binary}) gives
ratio of time $T_1$ where $c_i(t)>\thr$ to total averaging interval $\tlp$\\
$\rightarrow$ Feature $f_i(t)$ approximately represents supra-threshold fraction of $\tlp$
\textbf{Combined result:}\\
- Feature $f_i(t)$ can be linked to the distribution of $c_i(t)$ using Eqs.\,\ref{eq:pdf_split} \& \ref{eq:feat_avg}
%
the feature $f_i(t)$ likewise represents the fraction $T_1/\tlp$ of the
averaging interval $\tlp$ during which $b_i(t)$ equals 1. Since $b_i(t)$ equals 1
wherever $c_i(t)>\thr$, $f_i(t)$ relates to the probability density of $c_i(t)$ by
\begin{equation}
f_i(t)\,\approx\,\int_{\thr}^{+\infty} \pclp\,dc_i\,=\,P(c_i\,>\,\thr,\,\tlp)
\label{eq:feat_prop}
\end{equation}
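The approximation in Eq.~\ref{eq:feat_prop} can be checked with a minimal simulation. The Python sketch below thresholds a hypothetical 10\,Hz sinusoid at $\thr=0.5$. It then passes the binary response through a first-order lowpass filter with a time constant much longer than the signal period. This first-order filter is an assumption standing in for Eq.~\ref{eq:lowpass}. Finally, the sketch compares the settled filter output with two references: a boxcar average over whole periods and the exceedance probability $\frac{1}{2}-\frac{\arcsin\thr}{\pi}=\frac{1}{3}$ of a sinusoid.

```python
import math

def simulate_feature(threshold=0.5, freq=10.0, tau=1.0, dt=1e-4, duration=10.0):
    """Threshold a sinusoid, then lowpass-filter and box-average the binary response.

    The first-order (exponential) lowpass is an illustrative assumption.
    Returns (lowpass output at the end, boxcar average over the last second).
    """
    steps = int(round(duration / dt))
    feature = 0.0
    binary = []
    for k in range(steps):
        c = math.sin(2 * math.pi * freq * k * dt)    # kernel response c_i(t)
        b = 1.0 if c > threshold else 0.0            # binary response b_i(t)
        feature += (dt / tau) * (b - feature)        # forward-Euler lowpass step
        binary.append(b)
    window = int(round(1.0 / dt))                    # 1 s = 10 whole periods
    box_average = sum(binary[-window:]) / window
    return feature, box_average

lowpass, box = simulate_feature()
expected = 0.5 - math.asin(0.5) / math.pi            # = 1/3 for a sinusoid
print(f"lowpass={lowpass:.3f}, box average={box:.4f}, P(c > theta)={expected:.4f}")
```

The boxcar average matches the exceedance probability almost exactly. The lowpass output fluctuates around it with a small residual ripple that shrinks as the time constant grows relative to the signal period.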
%
$\rightarrow$ Because the integral over a probability density is a cumulative
probability, the value of feature $f_i(t)$ (temporal compression of $b_i(t)$)
at every time point $t$ signifies the probability that convolution output
$c_i(t)$ exceeds the threshold value $\thr$ during the corresponding averaging
interval $\tlp$
Therefore, the value of $f_i(t)$ at every time point $t$ approximately equals
the probability that $c_i(t)$ exceeds $\thr$ during the corresponding
averaging interval $\tlp$. Accordingly, the combination of thresholding
nonlinearity and temporal averaging remaps the amplitude-encoding quantity
$c_i(t)$, which encodes the temporal similarity between signal $\adapt(t)$ and
kernel $k_i(t)$, into the duty cycle-encoding quantity $f_i(t)$. It does so by
binning graded amplitude values into one of two categorical states. This
deliberate loss of precise amplitude information is the key to the intensity
invariance of the finalized features. Rescaling $c_i(t)$ shifts the time points
at which it crosses $\thr$ by an amount that is, to first order, inversely
proportional to the slope of $c_i(t)$ at these crossings. Different scales of
$c_i(t)$ can therefore result in similar durations $T_1$, as long as $c_i(t)$
rises and falls steeply around $\thr$.
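To illustrate why steep threshold crossings favor intensity invariance, the following Python sketch compares two hypothetical kernel responses: a sinusoid and a near-rectangular response ($\tanh$ of a scaled sinusoid). Both are rescaled by factors 1, 1.5, and 2 and thresholded at $\thr=0.5$. For each case, the sketch reports the supra-threshold fraction of one period, which corresponds to $f_i(t)$ under the averaging approximation of Eq.~\ref{eq:feat_avg}. The waveforms and parameter values are illustrative assumptions, not model outputs.

```python
import math

def duty_cycle(waveform, scale, threshold=0.5, n=100_000):
    """Fraction of one period during which scale * waveform(t) exceeds threshold."""
    above = sum(1 for i in range(n) if scale * waveform(i / n) > threshold)
    return above / n

def shallow(t):
    """Sinusoidal response with gentle threshold crossings (hypothetical)."""
    return math.sin(2 * math.pi * t)

def steep(t, gain=20.0):
    """Near-rectangular response with steep threshold crossings (hypothetical)."""
    return math.tanh(gain * math.sin(2 * math.pi * t))

for name, waveform in (("shallow", shallow), ("steep", steep)):
    duties = [duty_cycle(waveform, scale) for scale in (1.0, 1.5, 2.0)]
    print(name, ", ".join(f"{d:.3f}" for d in duties))
```

The duty cycle of the steep response barely changes with scale, whereas that of the sinusoid grows noticeably. This matches the dependence on the slope of $c_i(t)$ at the threshold crossings. The effect only holds while the rescaled response still crosses $\thr$; a response scaled entirely below $\thr$ yields $f_i(t)=0$.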
\textbf{Implication for intensity invariance:}\\
- Convolution output $c_i(t)$ quantifies temporal similarity between amplitudes of
@@ -743,6 +729,21 @@ large-scale AM, current overall intensity level)\\
$\rightarrow$ Without time scale selectivity, any fully intensity-invariant
output will be a flat line
\textbf{Log-HP: Implication for intensity invariance:}\\
- Logarithmic scaling is essential for equalizing different song intensities\\
$\rightarrow$ Intensity information can be manipulated more easily when in form
of a signal offset in log-space than a multiplicative scale in linear space
- Capability to compensate for intensity variations, i.e. selective amplification
of output $\adapt(t)$ relative to input $\env(t)$, is limited by input SNR (Eq.\,\ref{eq:toy_snr}):\\
$\rightarrow$ Ability to equalize between different sufficiently large scales of $s(t)$\\
$\rightarrow$ Inability to recover $s(t)$ when initially masked by noise floor $\eta(t)$
- Logarithmic scaling emphasizes small amplitudes (song onsets, noise floor) \\
$\rightarrow$ Recurring trade-off: Equalizing signal intensity vs preserving initial SNR
The model pathway includes a rather large number of Gabor kernels compared to
the 15 to 20 ascending neurons in the grasshopper auditory
system~(\bcite{stumpner1991auditory}).