Managed 1st half of results text, began with 2nd half.

2026-02-25 16:53:49 +01:00
parent c700e1723c
commit cc701a09f8
8 changed files with 216 additions and 2423 deletions
--- a/main.tex
+++ b/main.tex
@@ -29,6 +29,9 @@
    mincitenames=1
    ]{biblatex}
 \addbibresource{cite.bib}
+%\bibdata
+%\bibstyle
+%\citation

 \title{Emergent intensity invariance in a physiologically inspired model of the grasshopper auditory system}
 \author{Jona Hartling, Jan Benda}
@@ -82,7 +85,7 @@
 \newcommand{\fwrh}{\text{FWRH}} % Gaussian full-width at relative height
 \newcommand{\off}{\beta_0} % Offset for linear frequency approximation

-% Math shorthands - Threshold nonlinearity:
+% Math shorthands - Thresholding nonlinearity:
 \newcommand{\thr}{\Theta_i} % Step function threshold value
 \newcommand{\nl}{H(c_i\,-\,\thr)} % Shifted Heaviside step function

@@ -90,6 +93,7 @@
 \newcommand{\soc}{s} % Song component of synthetic mixture
 \newcommand{\noc}{\eta} % Noise component of synthetic mixture
 \newcommand{\sca}{\alpha} % Multiplicative scale of song component
+\newcommand{\xvar}{\sigma_{x}^{2}} % Variance of synthetic mixture
 \newcommand{\svar}{\sigma_{\text{s}}^{2}} % Song component variance
 \newcommand{\nvar}{\sigma_{\eta}^{2}} % Noise component variance
 \newcommand{\pc}{p(c_i,\,T)} % Probability density (general interval)
@@ -509,7 +513,7 @@ threshold value $\thr$ to obtain a binary response
    \label{eq:binary}
 \end{equation}
 which can be thought of as a categorization into "relevant" and "irrelevant"
-response values. In the grasshopper, these threshold nonlinearities might
+response values. In the grasshopper, these thresholding nonlinearities might
 either be part of the processing within the ascending neurons or take place
 further downstream~(SOURCE). Finally, the responses of the ascending neurons
 are assumed to be integrated somewhere in the supraesophageal
@@ -543,47 +547,38 @@ can be read out by a simple linear classifier.
 \end{figure}
 \FloatBarrier

-\section{Two mechanisms driving the emergence of intensity-invariant song representations}
+\section{Two mechanisms driving the emergence of intensity-invariant song representation}

 % Still missing the SNR analysis. Should be able to write around it for now.
 The robustness of song recognition is tied to the degree of intensity
 invariance of the finalized feature representation. Ideally, the values of each
 feature should depend only on the relative amplitude dynamics of the song
-pattern but not on the overall intensity level of the song. In the grasshopper,
-the emergence of intensity-invariant representations along the song recognition
+pattern but not on the overall intensity of the song. In the grasshopper, the
+emergence of intensity-invariant representations along the song recognition
 pathway likely is a distributed process that involves different neuronal
 populations, which raises the question of what the essential computational
 mechanisms are that drive this process. Within the model pathway, we identified
 two key mechanisms that render the song representation more invariant to
-variations in baseline intensity. The two mechanisms each comprise a nonlinear
-signal transformation followed by a linear signal transformation but differ in
-the specific operations and the neural substrate involved, as outlined in the
-following sections.
+intensity variations. The two mechanisms each comprise a nonlinear signal
+transformation followed by a linear signal transformation but differ in the
+specific operations involved, as outlined in the following sections.

 \subsection{Logarithmic compression \& spike-frequency adaptation}

-The first emergence of intensity invariance along the model pathway occurs
-during the preprocessing stage, in the transition from the signal envelope
-$\env(t)$ to the logarithmically scaled envelope $\db(t)$ and then to the
-intensity-adapted envelope $\adapt(t)$. In order to disentangle the interplay
-of logarithmic compression and adaptation, we can rewrite
-$\env(t)$~(Eq.\,\ref{eq:env}) as synthetic mixture
+The first notable emergence of intensity invariance along the model pathway
+occurs during the transformation of the signal envelope $\env(t)$ into the
+logarithmically scaled envelope $\db(t)$ and then into the intensity-adapted
+envelope $\adapt(t)$. In order to disentangle the interplay of logarithmic
+compression and adaptation, $\env(t)$ can be rewritten as a synthetic mixture
 \begin{equation}
    \env(t)\,=\,\sca\,\cdot\,\soc(t)\,+\,\noc(t), \qquad \env(t)\,>\,0\enspace\forall\enspace t\,\in\,\mathbb{R}
    \label{eq:toy_env}
 \end{equation}
 of a song component $\soc(t)$ with variable multiplicative scale $\sca\geq0$
 and a fixed-scale noise component $\noc(t)$. Both $\soc(t)$ and $\noc(t)$ are
-assumed to have unit variance~($\svar=\nvar=1$). If $\soc(t)$ and $\noc(t)$ are
-uncorrelated~($\soc(t)\perp\noc(t)$), the signal-to-noise ratio (SNR) of the
-synthetic $\env(t)$ with ($\sca>0$) and without ($\sca=0$) song component
-$\soc(t)$ is given by
-\begin{equation}
-    \text{SNR}\,=\,\frac{\sigma_{s+\eta}^{2}}{\nvar}\,=\,\frac{\alpha^{2}\,\cdot\,\svar\,+\,\nvar}{\nvar}\,=\,\alpha^{2}\,+\,1
-    \label{eq:toy_snr}
-\end{equation}
-When simplifying the decibel transformation~(Eq.\,\ref{eq:log}), the logarithmically
-scaled envelope $\db(t)$ can be expressed as a sum of two logarithmic terms
+assumed to have unit variance. By conversion of $\env(t)$ to decibel
+scale~(Eq.\,\ref{eq:log}), $\sca$ turns from a multiplicative scale in linear
+space into an additive term, or offset, in logarithmic space
 \begin{equation}
    \begin{split}
        \db(t)\,&=\,\log \frac{\alpha\,\cdot\,s(t)\,+\,\eta(t)}{\dbref}\\
@@ -591,99 +586,90 @@ scaled envelope $\db(t)$ can be expressed as a sum of two logarithmic terms
    \end{split}
    \label{eq:toy_log}
 \end{equation}
-
-
-
-
-\textbf{Logarithmic component:}\\
- Simplify decibel transformation (Eq.\,\ref{eq:log}) and apply to synthetic $\env(t)$\\
- Isolate scale $\alpha$ and reference $\dbref$ using logarithm product/quotient laws
-
-$\rightarrow$ In log-space, a multiplicative scaling factor becomes additive\\
-$\rightarrow$ Allows for the separation of song signal $s(t)$ and its scale $\alpha$\\
-$\rightarrow$ Introduces scaling of noise term $\eta(t)$ by the inverse of $\alpha$\\
-$\rightarrow$ Normalization by $\dbref$ applies equally to all terms (no individual effects)
-
-\textbf{Adaptation component:}\\
- Highpass filter over $\db(t)$ (Eq.\,\ref{eq:highpass}) can
-be approximated as subtraction of the local signal offset within a suitable time
-interval $\thp$ ($0 \ll \thp < \frac{1}{\fc}$)
-%
+which allows for its separation from $\soc(t)$ but introduces a scaling of
+$\noc(t)$ by the inverse of $\sca$. The subsequent
+highpass-filtering~(Eq.\,\ref{eq:highpass}) of $\db(t)$ can then be
+approximated as a subtraction of the local offset within a suitable time
+interval $0 \ll \thp < \frac{1}{\fc}$:
 \begin{equation}
    \begin{split}
    \adapt(t)\,\approx\,\db(t)\,-\,\log \frac{\alpha}{\dbref}\,=\,\log\left[s(t)\,+\,\frac{\eta(t)}{\alpha}\right]
    \end{split}
    \label{eq:toy_highpass}
 \end{equation}
-%
-\textbf{Implication for intensity invariance:}\\
- Logarithmic scaling is essential for equalizing different song intensities\\
-$\rightarrow$ Intensity information can be manipulated more easily when in form
-of a signal offset in log-space than a multiplicative scale in linear space
-
- Scale $\alpha$ can only be redistributed, not entirely eliminated from $\adapt(t)$\\
-$\rightarrow$ Turn initial scaling of song $s(t)$ by $\alpha$ into scaling of noise $\eta(t)$ by $\frac{1}{\alpha}$
-
- Capability to compensate for intensity variations, i.e. selective amplification
-of output $\adapt(t)$ relative to input $\env(t)$, is limited by input SNR (Eq.\,\ref{eq:toy_snr}):\\
-$\alpha\gg1$: Attenuation of $\eta(t)$ term $\rightarrow$ $s(t)$ dominates $\adapt(t)$\\
-$\alpha\approx1$ Negligible effect on $\eta(t)$ term $\rightarrow$ $\adapt(t)=\log[s(t)+\eta(t)]$\\
-$\alpha\ll1$: Amplification of $\eta(t)$ term $\rightarrow$ $\eta(t)$ dominates $\adapt(t)$\\
-$\rightarrow$ Ability to equalize between different sufficiently large scales of $s(t)$\\
-$\rightarrow$ Inability to recover $s(t)$ when initially masked by noise floor $\eta(t)$
-
- Logarithmic scaling emphasizes small amplitudes (song onsets, noise floor) \\
-$\rightarrow$ Recurring trade-off: Equalizing signal intensity vs preserving initial SNR
-
-\subsection{Threshold nonlinearity \& temporal averaging}
-
-Convolved $c_i(t)$ $\xrightarrow{\nl}$ Binary $b_i(t)$ $\xrightarrow{\lp}$ Feature $f_i(t)$
-
-\textbf{Thresholding component:}\\
- Within an observed time interval $T$, $c_i(t)$ follows probability density $\pc$\\
- Within $T$, $c_i(t)$ exceeds threshold value $\thr$ for time $T_1$ ($T_1+T_0=T$)\\
- Threshold $\nl$ splits $\pc$ around $\thr$ in two complementary parts
-%
+This means that $\sca$ cannot be entirely eliminated from $\adapt(t)$, only
+redistributed between $\soc(t)$ and $\noc(t)$. In consequence, if $\sca$ is
+sufficiently large ($\sca\gg1$), $\noc(t)$ is attenuated to the point of being
+negligible, so that $\adapt(t)$ represents $\soc(t)$ in a scale-free manner. If
+$\soc(t)$ and $\noc(t)$ are at similar scales ($\sca\approx1$), $\adapt(t)$
+largely resembles $\db(t)$. However, if $\sca$ is sufficiently small
+($\sca\ll1$), $\noc(t)$ masks $\soc(t)$ even after the intensity adaptation.
+Therefore, the effective intensity invariance of $\adapt(t)$ relative to
+$\env(t)$ is limited by the initial scaling of $\soc(t)$ relative to $\noc(t)$;
+that is, by the signal-to-noise ratio (SNR), here the ratio of the variances of
+$\env(t)$ with ($\sca>0$) and without ($\sca=0$) song component $\soc(t)$
 \begin{equation}
-    \int_{\thr}^{+\infty} p(c_i,T)\,dc_i\,=\,1\,-\,\int_{-\infty}^{\thr} p(c_i,T)\,dc_i\,=\,\frac{T_1}{T}
+    \text{SNR}(\sca)\,=\,\frac{\xvar}{\nvar}\,=\,\frac{\alpha^{2}\,\cdot\,\svar\,+\,\nvar}{\nvar}\,=\,\alpha^{2}\,+\,1, \qquad \svar\,=\,\nvar\,=\,1
+    \label{eq:toy_snr}
+\end{equation}
+which depends quadratically on $\sca$ if $\soc(t)$ and $\noc(t)$ are
+uncorrelated~($\soc(t)\perp\noc(t)$). In summary, the combination of
+logarithmic compression and adaptation allows for the equalization of different
+sufficiently large song scales, which is essential for intensity-invariant song
+representation. However, this mechanism is unable to recover songs that have
+already sunk below the noise floor, which emphasizes the importance of a
+sufficiently high SNR at the initial reception of the signal for reliable song
+recognition.
+
+\subsection{Thresholding nonlinearity \& temporal averaging}
+
+The second key mechanism for the emergence of intensity invariance along the
+model pathway takes place during the transformation of the kernel responses
+$c_i(t)$ via the binary responses $b_i(t)$ into the finalized features
+$f_i(t)$. This mechanism is mediated by the thresholding nonlinearity $\nl$. By
+passing $c_i(t)$ through the thresholding nonlinearity~(Eq.\,\ref{eq:binary}),
+its probability density within some observed time interval $T$ is split around
+threshold value $\thr$ into two complementary parts:
+\begin{equation}
+    \int_{\thr}^{+\infty} \pc\,dc_i\,=\,1\,-\,\int_{-\infty}^{\thr} \pc\,dc_i\,=\,\frac{T_1}{T}, \qquad \infint \pc\,dc_i\,=\,1
    \label{eq:pdf_split}
 \end{equation}
-%
-$\rightarrow$ Semi-definite integral over right-sided portion of split $\pc$ gives ratio
-of time $T_1$ where $c_i(t)>\thr$ to total time $T$ due to normalization of $\pc$
-%
+Due to the normalization of $\pc$, the improper integral over the
+right-sided part of the split $\pc$ is the ratio of time $T_1$ during which
+$c_i(t)$ exceeds $\thr$ to the total time $T$. If the subsequent lowpass
+filter~(Eq.\,\ref{eq:lowpass}) over $b_i(t)$ is approximated as temporal
+averaging over a suitable time interval
+$\tlp>\frac{1}{\fc}$
 \begin{equation}
-    \infint \pc\,dc_i\,=\,1
-    \label{eq:pdf}
-\end{equation}
-%
-\textbf{Averaging component:}\\
- Lowpass filter over binary response $b_i(t)$ (Eq.\,\ref{eq:lowpass}) can be
-approximated as temporal averaging over a suitable time interval $\tlp$ ($\tlp > \frac{1}{\fc}$)\\
- Within $\tlp$, $b_i(t)$ takes a value of 1 ($c_i(t)>\thr$) for time $T_1$ ($T_1+T_0=\tlp$)
-%
-\begin{equation}
-    f_i(t)\,\approx\,\frac{1}{\tlp} \int_{t}^{t\,+\,\tlp} b_i(\tau)\,d\tau\,=\,\frac{T_1}{\tlp}
+    f_i(t)\,\approx\,\frac{1}{\tlp} \int_{t}^{t\,+\,\tlp} b_i(\tau)\,d\tau\,=\,\frac{T_1}{\tlp}, \qquad b_i(t)\,\in\,\{0,\,1\}
    \label{eq:feat_avg}
 \end{equation}
-%
-$\rightarrow$ Temporal averaging over $b_i(t)\in[0,1]$ (Eq.\,\ref{eq:binary}) gives
-ratio of time $T_1$ where $c_i(t)>\thr$ to total averaging interval $\tlp$\\
-$\rightarrow$ Feature $f_i(t)$ approximately represents supra-threshold fraction of $\tlp$
-
-\textbf{Combined result:}\\
- Feature $f_i(t)$ can be linked to the distribution of $c_i(t)$ using Eqs.\,\ref{eq:pdf_split} \& \ref{eq:feat_avg}
-%
+feature $f_i(t)$ likewise represents the ratio of time $T_1$ during which
+$b_i(t)$ is 1 to the total averaging interval $\tlp$. Since $b_i(t)$ is 1
+where $c_i(t)>\thr$, $f_i(t)$ relates to the probability density of $c_i(t)$ by
 \begin{equation}
    f_i(t)\,\approx\,\int_{\thr}^{+\infty} \pclp\,dc_i\,=\,P(c_i\,>\,\thr,\,\tlp)
    \label{eq:feat_prop}
 \end{equation}
-%
-$\rightarrow$ Because the integral over a probability density is a cumulative
-probability, the value of feature $f_i(t)$ (temporal compression of $b_i(t)$)
-at every time point $t$ signifies the probability that convolution output
-$c_i(t)$ exceeds the threshold value $\thr$ during the corresponding averaging
-interval $\tlp$ 
+Therefore, the value of $f_i(t)$ at every time point $t$ approximately
+signifies the cumulative probability that $c_i(t)$ exceeds $\thr$ during the
+corresponding averaging interval $\tlp$. Accordingly, the combination of
+thresholding nonlinearity and temporal averaging constitutes a remapping of a
+quantity that encodes temporal similarity between signal $\adapt(t)$ and kernel
+$k_i(t)$ into a quantity that encodes a duty cycle with respect to $\thr$.
+
+Put differently, the combination of
+thresholding nonlinearity and temporal averaging constitutes a remapping of the
+amplitude-encoding quantity $c_i(t)$ into the duty cycle-encoding quantity
+$f_i(t)$ by binning graded amplitude values into one of two categorical states.
+This deliberate loss of precise amplitude information is the key to intensity
+invariance of the finalized features, as different scales of $c_i(t)$ can
+result in similar $T_1$ segments depending on the magnitude of the derivative
+of $c_i(t)$ in temporal proximity to time points at which $c_i(t)$ crosses
+$\thr$.
+
+
+

 \textbf{Implication for intensity invariance:}\\
 - Convolution output $c_i(t)$ quantifies temporal similarity between amplitudes of
@@ -743,6 +729,21 @@ large-scale AM, current overall intensity level)\\
 $\rightarrow$ Without time scale selectivity, any fully intensity-invariant
 output will be a flat line

+
+\textbf{Log-HP: Implication for intensity invariance:}\\
+- Logarithmic scaling is essential for equalizing different song intensities\\
+$\rightarrow$ Intensity information can be manipulated more easily when in form
+of a signal offset in log-space than a multiplicative scale in linear space
+
+- Capability to compensate for intensity variations, i.e. selective amplification
+of output $\adapt(t)$ relative to input $\env(t)$, is limited by input SNR (Eq.\,\ref{eq:toy_snr}):\\
+$\rightarrow$ Ability to equalize between different sufficiently large scales of $s(t)$\\
+$\rightarrow$ Inability to recover $s(t)$ when initially masked by noise floor $\eta(t)$
+
+- Logarithmic scaling emphasizes small amplitudes (song onsets, noise floor) \\
+$\rightarrow$ Recurring trade-off: Equalizing signal intensity vs preserving initial SNR
+
+
 The model pathway includes a rather large number of Gabor kernels compared to
 the 15 to 20 ascending neurons in the grasshopper auditory
 system~(\bcite{stumpner1991auditory}).