Began writing results :)

j-hartling
2026-02-23 16:48:53 +01:00
parent 1ea2081eab
commit c700e1723c
10 changed files with 197 additions and 118 deletions

main.tex

@@ -86,9 +86,12 @@
\newcommand{\thr}{\Theta_i} % Step function threshold value
\newcommand{\nl}{H(c_i\,-\,\thr)} % Shifted Heaviside step function
% Math shorthands - Minor symbols and helpers:
% Math shorthands - Intensity invariance analysis:
\newcommand{\soc}{s} % Song component of synthetic mixture
\newcommand{\noc}{\eta} % Noise component of synthetic mixture
\newcommand{\sca}{\alpha} % Multiplicative scale of song component
\newcommand{\svar}{\sigma_{\text{s}}^{2}} % Song component variance
\newcommand{\nvar}{\sigma_{\eta}^{2}} % Noise component variance
\newcommand{\pc}{p(c_i,\,T)} % Probability density (general interval)
\newcommand{\pclp}{p(c_i,\,\tlp)} % Probability density (lowpass interval)
@@ -387,7 +390,7 @@ sigmoidal response curve over logarithmically compressed intensity
levels~(\bcite{suga1960peripheral}; \bcite{gollisch2002energy}). In the model
pathway, logarithmic compression is achieved by conversion to decibel scale
\begin{equation}
\db(t)\,=\,10\,\cdot\,\dec \frac{\env(t)}{\dbref}, \qquad \dbref\,=\,\max\big[\env(t)\big]
\label{eq:log}
\end{equation}
relative to the maximum intensity $\dbref$ of the signal envelope $\env(t)$.
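As a minimal sketch of Eq.\,\ref{eq:log}, the Python snippet below converts a sampled envelope to decibels relative to its own maximum. The function name \texttt{to\_db} and the list-based representation are hypothetical, and a strictly positive envelope is assumed. Because $\dbref$ is taken from the envelope itself, multiplying the entire envelope by a constant leaves $\db(t)$ unchanged, which the last line illustrates.

```python
import math

def to_db(env):
    """Decibel conversion relative to the envelope maximum (sketch of Eq. log).

    Assumes a non-empty, strictly positive envelope given as a list of floats.
    """
    ref = max(env)  # reference: maximum intensity of the envelope
    return [10.0 * math.log10(e / ref) for e in env]

env = [0.5, 1.0, 2.0, 4.0]
db = to_db(env)  # approximately [-9.03, -6.02, -3.01, 0.0]
db_scaled = to_db([3.0 * e for e in env])  # same envelope, three times louder
```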
@@ -492,12 +495,12 @@ the left of the two central lobes (odd kernels).
\label{tab:gabor_phases}
\end{table}
\FloatBarrier
These four groups of Gabor kernels allow for the extraction of different types
of signal features, such as the presence of peaks (even, $+$), troughs (even,
$-$), onsets (odd, $+$), and offsets (odd, $-$) at various time scales.
These four major groups of Gabor kernels allow for the extraction of different
types of signal features, such as the presence of peaks (even, $+$), troughs
(even, $-$), onsets (odd, $+$), and offsets (odd, $-$) at various time scales.
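To make the onset/offset distinction concrete, the Python sketch below convolves a rectangular pulse with a single odd Gabor kernel. The parameterization (a Gaussian envelope times a sine, with width \texttt{sigma} and frequency \texttt{freq} in samples) and the sign convention (positive central lobe to the left of the kernel center) are assumptions for illustration, not the kernel definition used in the model. Under these assumptions, the response is positive at the pulse onset and negative at its offset, so the odd, $+$ kernel signals onsets and its sign-inverted odd, $-$ counterpart signals offsets.

```python
import math

def odd_gabor(sigma, freq):
    """Hypothetical odd Gabor kernel: Gaussian envelope times a sine.

    The minus sign places the positive central lobe left of the kernel
    center (assumed sign convention for an 'odd, +' kernel).
    """
    half = int(math.ceil(3 * sigma))
    return [-math.exp(-u * u / (2 * sigma ** 2)) * math.sin(2 * math.pi * freq * u)
            for u in range(-half, half + 1)]

def convolve_same(x, k):
    """Discrete convolution y[i] = sum_u k[u] * x[i - u], same length as x."""
    half = len(k) // 2
    y = []
    for i in range(len(x)):
        acc = 0.0
        for j, kj in enumerate(k):
            idx = i - (j - half)  # kernel index j corresponds to lag u = j - half
            if 0 <= idx < len(x):
                acc += kj * x[idx]
        y.append(acc)
    return y

# Rectangular pulse: onset at sample 100, last high sample at 199
pulse = [0.0] * 100 + [1.0] * 100 + [0.0] * 100
resp = convolve_same(pulse, odd_gabor(sigma=5.0, freq=0.05))

onset = max(range(len(resp)), key=resp.__getitem__)   # strongest positive response
offset = min(range(len(resp)), key=resp.__getitem__)  # strongest negative response
```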
Following the convolutional template matching, each kernel-specific response
$c_i(t)$ is passed through a shifted Heaviside step function $\nl$ with
threshold value $\thr$ to obtain a binary response
\begin{equation}
b_i(t,\,\thr)\,=\,\begin{cases}
\;1, \quad c_i(t)\,>\,\thr\\
@@ -528,6 +531,10 @@ can be read out by a simple linear classifier.
\includegraphics[width=\textwidth]{figures/fig_feat_stages.pdf}
\caption{\textbf{Representations of a song of \textit{O. rufipes} during
the feature extraction stage.}
Different colors indicate Gabor kernels with different
lobe number $\kn$ and sign, with lighter colors for higher
$\kn$~($1\,\leq\,\kn\,\leq\,4$; both $+$ and $-$ per $\kn$;
two kernel widths $\kw$ of $4\,$ms and $32\,$ms per sign).
\textbf{a}:~Kernel-specific filter responses.
\textbf{b}:~Binary responses.
\textbf{c}:~Finalized features.
@@ -536,55 +543,62 @@ can be read out by a simple linear classifier.
\end{figure}
\FloatBarrier
\section{Two mechanisms driving the emergence of intensity-invariant song representations}
% Still missing the SNR analysis. Should be able to write around it for now.
The robustness of song recognition is tied to the degree of intensity
invariance of the finalized feature representation. Ideally, the values of each
feature should depend only on the relative amplitude dynamics of the song
pattern but not on the overall intensity level of the song. In the grasshopper,
the emergence of intensity-invariant representations along the song recognition
pathway is likely a distributed process that involves different neuronal
populations, which raises the question of which computational mechanisms are
essential for driving this process. Within the model pathway, we identified
two key mechanisms that render the song representation more invariant to
variations in baseline intensity. Each mechanism comprises a nonlinear signal
transformation followed by a linear signal transformation, but the two differ
in the specific operations and the neural substrate involved, as outlined in
the following sections.
\subsection{Logarithmic scaling \& spike-frequency adaptation}
%
The first emergence of intensity invariance along the model pathway occurs
during the preprocessing stage, in the transition from the signal envelope
$\env(t)$ to the logarithmically scaled envelope $\db(t)$ and then to the
intensity-adapted envelope $\adapt(t)$. To disentangle the interplay of
logarithmic compression and adaptation, we rewrite
$\env(t)$~(Eq.\,\ref{eq:env}) as a synthetic mixture
\begin{equation}
\env(t)\,=\,\sca\,\cdot\,\soc(t)\,+\,\noc(t), \qquad \env(t)\,>\,0\enspace\forall\enspace t\,\in\,\mathbb{R}
\label{eq:toy_env}
\end{equation}
%
%
of a song component $\soc(t)$ with variable multiplicative scale $\sca\geq0$
and a fixed-scale noise component $\noc(t)$. Both $\soc(t)$ and $\noc(t)$ are
assumed to have unit variance~($\svar=\nvar=1$). If $\soc(t)$ and $\noc(t)$ are
uncorrelated~($\soc(t)\perp\noc(t)$), the signal-to-noise ratio (SNR), defined
as the ratio of the variances of the synthetic $\env(t)$ with ($\sca>0$) and
without ($\sca=0$) the song component $\soc(t)$, is given by
\begin{equation}
\text{SNR}\,=\,\frac{\sigma_{\sca\soc+\noc}^{2}}{\nvar}\,=\,\frac{\sca^{2}\,\cdot\,\svar\,+\,\nvar}{\nvar}\,=\,\sca^{2}\,+\,1
\label{eq:toy_snr}
\end{equation}
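The variance ratio in Eq.\,\ref{eq:toy_snr} can be checked by simulation. The sketch below assumes independent, zero-mean, unit-variance Gaussian samples for $\soc(t)$ and $\noc(t)$ (the model's actual song and noise signals are not reproduced here); a constant offset that keeps $\env(t)$ positive would change neither variance and is therefore omitted. The helper names \texttt{variance} and \texttt{empirical\_snr} are hypothetical.

```python
import random

def variance(xs):
    """Population variance of a list of floats."""
    mean = sum(xs) / len(xs)
    return sum((x - mean) ** 2 for x in xs) / len(xs)

def empirical_snr(alpha, n=50_000, seed=1):
    """Variance of alpha*s + eta divided by the variance of eta alone.

    Assumes s and eta are independent, zero-mean, unit-variance Gaussian
    samples; a constant offset (e.g. to keep the envelope positive) would
    not change either variance and is omitted.
    """
    rng = random.Random(seed)
    s = [rng.gauss(0.0, 1.0) for _ in range(n)]
    eta = [rng.gauss(0.0, 1.0) for _ in range(n)]
    mixture = [alpha * si + ei for si, ei in zip(s, eta)]
    return variance(mixture) / variance(eta)

for alpha in (0.0, 1.0, 2.0, 3.0):
    # simulated ratio next to the analytical prediction alpha^2 + 1
    print(f"alpha={alpha}: simulated {empirical_snr(alpha):.3f}, predicted {alpha ** 2 + 1:.1f}")
```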
%
%
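Up to a constant factor, which is irrelevant for the following argument, the
decibel transformation~(Eq.\,\ref{eq:log}) applied to the synthetic $\env(t)$
yields a logarithmically scaled envelope $\db(t)$ that, for $\sca>0$, can be
expressed as a sum of two logarithmic terms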
\begin{equation}
\begin{split}
\db(t)\,&=\,\log \frac{\sca\,\cdot\,\soc(t)\,+\,\noc(t)}{\dbref}\\
&=\,\log \frac{\sca}{\dbref}\,+\,\log \left[\soc(t)\,+\,\frac{\noc(t)}{\sca}\right]
\end{split}
\label{eq:toy_log}
\end{equation}
%
\textbf{Logarithmic component:}\\
- Simplify decibel transformation (Eq.\,\ref{eq:log}) and apply to synthetic $\env(t)$\\
- Isolate scale $\alpha$ and reference $\dbref$ using logarithm product/quotient laws\\
$\rightarrow$ In log-space, a multiplicative scaling factor becomes additive\\
$\rightarrow$ Allows for the separation of song signal $s(t)$ and its scale $\alpha$\\
$\rightarrow$ Introduces scaling of noise term $\eta(t)$ by the inverse of $\alpha$\\
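The logarithm laws behind Eq.\,\ref{eq:toy_log} can be verified numerically. The sketch below uses the natural logarithm (the simplified transformation without the decibel factor) and arbitrary positive sample values for $\soc(t)$, $\noc(t)$, and $\dbref$; the helper names are hypothetical. Both sides of the equation agree, and the deviation of the second term from the noise-free $\log\soc(t)$ shrinks as $\sca$ grows, because the noise enters only as $\noc(t)/\sca$.

```python
import math

def log_mixture(alpha, s, eta, ref):
    """Left-hand side of the equation: log((alpha*s + eta) / ref)."""
    return math.log((alpha * s + eta) / ref)

def log_split(alpha, s, eta, ref):
    """Right-hand side: additive scale term plus song term with noise scaled by 1/alpha."""
    return math.log(alpha / ref) + math.log(s + eta / alpha)

s, eta, ref = 0.8, 0.3, 5.0  # arbitrary positive sample values (assumed)
for alpha in (0.5, 2.0, 20.0):
    lhs = log_mixture(alpha, s, eta, ref)
    rhs = log_split(alpha, s, eta, ref)
    noise_effect = math.log(s + eta / alpha) - math.log(s)  # shrinks as alpha grows
    print(f"alpha={alpha:5.1f}  lhs={lhs:+.6f}  rhs={rhs:+.6f}  noise effect={noise_effect:.4f}")
```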
@@ -597,7 +611,7 @@ interval $\thp$ ($0 \ll \thp < \frac{1}{\fc}$)
%
\begin{equation}
\begin{split}
\adapt(t)\,\approx\,\db(t)\,-\,\log \frac{\sca}{\dbref}\,=\,\log\left[\soc(t)\,+\,\frac{\noc(t)}{\sca}\right]
\end{split}
\label{eq:toy_highpass}
\end{equation}
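The approximation in Eq.\,\ref{eq:toy_highpass} rests on the highpass filter $\hp$ removing the temporally constant offset $\log(\sca/\dbref)$. The Python sketch below substitutes a first-order discrete highpass filter with an arbitrary coefficient for the model's adaptation filter (an assumption; the exact filter is not part of this excerpt), together with a hypothetical sinusoidal song component and positive uniform noise. Without noise, the adapted outputs for different $\sca$ coincide; with noise, they approach the noise-free output as $\sca$ increases.

```python
import math
import random

def highpass(x, a=0.95):
    """Assumed first-order discrete highpass: y[n] = a * (y[n-1] + x[n] - x[n-1]).

    A constant offset in x cancels in the differences x[n] - x[n-1].
    """
    y = [0.0]
    for n in range(1, len(x)):
        y.append(a * (y[-1] + x[n] - x[n - 1]))
    return y

def adapted(alpha, song, noise):
    """Log-scale the mixture alpha*song + noise (ref = its maximum), then highpass."""
    env = [alpha * s + e for s, e in zip(song, noise)]
    ref = max(env)
    return highpass([math.log(e / ref) for e in env])

def rms_diff(u, v):
    """Root-mean-square difference between two equally long sequences."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)) / len(u))

rng = random.Random(0)
song = [1.0 + 0.5 * math.sin(2 * math.pi * n / 50) for n in range(1000)]  # positive toy song
silence = [0.0] * len(song)
noise = [rng.uniform(0.0, 0.5) for _ in song]  # positive noise keeps env > 0

clean = adapted(1.0, song, silence)
# Without noise, the scale alpha only shifts the log-envelope and is removed.
print(rms_diff(clean, adapted(10.0, song, silence)))
# With noise, the residual shrinks as alpha grows (noise enters as eta / alpha).
for alpha in (1.0, 10.0, 100.0):
    print(alpha, rms_diff(clean, adapted(alpha, song, noise)))
```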
@@ -715,6 +729,20 @@ initiation of one behavior over another is categorical (e.g. approach/stay)
\section{Conclusions \& outlook}
\textbf{Definition of invariance (general, systemic):}\\
Invariance = Property of a system to produce an output that reliably reflects a
set of relevant input parameters (variation to be represented) while remaining
unaffected by one or more other parameters (variation to be discarded)\\
$\rightarrow$ Selective input-output decorrelation\\
\textbf{Definition of intensity invariance (context of neurons and songs):}\\
Intensity invariance = Time-scale-selective sensitivity to certain faster
amplitude dynamics (song waveform, small-scale AM) and simultaneous
insensitivity to slower, more sustained amplitude dynamics (transient baseline,
large-scale AM, current overall intensity level)\\
$\rightarrow$ Without time-scale selectivity, any fully intensity-invariant
output would be a flat line
The model pathway includes a rather large number of Gabor kernels compared to
the 15 to 20 ascending neurons in the grasshopper auditory
system~(\bcite{stumpner1991auditory}).