Syncing to home.

2026-02-27 16:10:14 +01:00
parent cc701a09f8
commit 1f61a4c70e
7 changed files with 179 additions and 157 deletions
--- a/main.tex
+++ b/main.tex
@@ -612,109 +612,114 @@ without ($\sca=0$) song component $\soc(t)$
    \text{SNR}(\sca)\,=\,\frac{\xvar}{\nvar}\,=\,\frac{\alpha^{2}\,\cdot\,\svar\,+\,\nvar}{\nvar}\,=\,\alpha^{2}\,+\,1, \qquad \svar\,=\,\nvar\,=\,1
    \label{eq:toy_snr}
 \end{equation}
-which depends quadratically on $\sca$ if $\soc(t)$ and $\noc(t)$ are
-uncorrelated~($\soc(t)\perp\noc(t)$). In summary, the combination of
-logarithmic compression and adaptation allows for the equalization of different
-sufficiently large song scales, which is essential for intensity-invariant song
-representation. However, this mechanism is unable to recover songs that have
-already sunken below the noise floor, which emphasizes the importance of a
-sufficiently high SNR at the intial reception of the signal for reliable song
-recognition.
+which depends quadratically on $\sca$ if $\soc(t)\perp\noc(t)$. Overall, the
+combination of logarithmic compression and adaptation allows for the
+equalization of different sufficiently large song scales, which is essential
+for intensity-invariant song representation. However, this mechanism is unable
+to recover songs that have already sunk below the noise floor, which
+emphasizes the importance of a sufficiently high SNR at the initial reception of
+the signal for reliable song recognition.

 \subsection{Thresholding nonlinearity \& temporal averaging}

 The second key mechanism for the emergence of intensity invariance along the
 model pathway takes place during the transformation of the kernel responses
 $c_i(t)$ over the binary responses $b_i(t)$ into the finalized features
-$f_i(t)$. This mechanism is mediated by the thresholding nonlinearity $\nl$. By
+$f_i(t)$. Kernel response $c_i(t)$ quantifies the degree of similarity between
+kernel $k_i(t)$ and the preprocessed signal $\adapt(t)$. The thresholding
+nonlinearity $\nl$ categorizes the value of $c_i(t)$ at every time point $t$
+into ``relevant'' ($c_i(t)>\thr$, $b_i(t)=1$) and ``irrelevant'' ($c_i(t)\leq\thr$,
+$b_i(t)=0$) response values.
+
+By passing $c_i(t)$ through the thresholding
+nonlinearity $\nl$, its amplitude values are binned
+into one of two categories~(Eq.\,\ref{eq:binary}).
+
+: $c_i(t)>\thr$
+
+
+
+
+This mechanism is mediated by the thresholding nonlinearity $\nl$. By
 passing $c_i(t)$ through the thresholding nonlinearity~(Eq.\,\ref{eq:binary}),
-its probability density within some observed time interval $T$ is split around
-threshold value $\thr$ into two complementary parts:
+its probability density $\pc$ within some observed time interval $T$ is split
+around threshold value $\thr$ into two complementary parts:
 \begin{equation}
    \int_{\thr}^{+\infty} \pc\,dc_i\,=\,1\,-\,\int_{-\infty}^{\thr} \pc\,dc_i\,=\,\frac{T_1}{T}, \qquad \infint \pc\,dc_i\,=\,1
    \label{eq:pdf_split}
 \end{equation}
-Due to the normalization of $\pc$, the semi-definite integral over the
-right-sided part of the split $\pc$ is the ratio of time $T_1$ during which
-$c_i(t)$ exceeds $\thr$ within the total time $T$. If the subsequent lowpass
-filter~(Eq.\,\ref{eq:lowpass}) over $b_i(t)$ is approximated as temporal
-averaging over a suitable time interval
+The right-sided part of the split $\pc$ corresponds to time $T_1$ where
+$c_i(t)>\thr$, while the left-sided part corresponds to time $T_0=T-T_1$ where
+$c_i(t)\leq\thr$. The one-sided integral over the right-sided part of $\pc$
+represents the ratio of time $T_1$ to total time $T$ because the integral of a
+probability density over its entire domain is normalized to 1. Following the
+thresholding nonlinearity, the resulting binary responses $b_i(t)$ are
+lowpass-filtered~(Eq.\,\ref{eq:lowpass}) to obtain $f_i(t)$, which can be
+approximated as temporal averaging over a suitable time interval
 $\tlp>\frac{1}{\fc}$
 \begin{equation}
    f_i(t)\,\approx\,\frac{1}{\tlp} \int_{t}^{t\,+\,\tlp} b_i(\tau)\,d\tau\,=\,\frac{T_1}{\tlp}, \qquad b_i(t)\,\in\,\{0,\,1\}
    \label{eq:feat_avg}
 \end{equation}
-feature $f_i(t)$ likewise represents a ratio of time $T_1$ during which
-$b_i(t)$ is 1 within the total averaging interval $\tlp$. Since $b_i(t)$ is 1
-where $c_i(t)>\thr$, $f_i(t)$ relates to the probability density of $c_i(t)$ by
+Feature $f_i(t)$ 
+
+If the lowpass
+filter~(Eq.\,\ref{eq:lowpass}) over $b_i(t)$ is approximated as temporal
+averaging over a suitable time interval $\tlp>\frac{1}{\fc}$, then $f_i(t)$ can
+be linked to a similar temporal ratio
+% \begin{equation}
+%     f_i(t)\,\approx\,\frac{1}{\tlp} \int_{t}^{t\,+\,\tlp} b_i(\tau)\,d\tau\,=\,\frac{T_1}{\tlp}, \qquad b_i(t)\,\in\,\{0,\,1\}
+%     \label{eq:feat_avg}
+% \end{equation}
+of time $T_1$ during which $b_i(t)$ is 1 within the total averaging interval
+$\tlp$. Therefore, the value of $f_i(t)$ at every time point $t$ approximates
+the probability that $c_i(t)$ exceeds $\thr$ during the
+corresponding averaging interval $\tlp$:
 \begin{equation}
    f_i(t)\,\approx\,\int_{\thr}^{+\infty} \pclp\,dc_i\,=\,P(c_i\,>\,\thr,\,\tlp)
    \label{eq:feat_prop}
 \end{equation}
-Therefore, the value of $f_i(t)$ at every time point $t$ approximately
-signifies the cumulative probability that $c_i(t)$ exceeds $\thr$ during the
-corresponding averaging interval $\tlp$. Accordingly, the combination of
-thresholding nonlinearity and temporal averaging constitutes a remapping of a
-quantity that encodes temporal similarity between signal $\adapt(t)$ and kernel
-$k_i(t)$ into a quantity that encodes a duty cycle with respect to $\thr$.
+In this sense, $f_i(t)$ resembles a duty cycle, which quantifies
+purely temporal relations in the structure of $c_i(t)$ with no regard for
+precise amplitude values apart from their relation to $\thr$.

-Accordingly, the combination of
-thresholding nonlinearity and temporal averaging constitutes a remapping of the
-amplitude-encoding quantity $c_i(t)$ into the duty cycle-encoding quantity
-$f_i(t)$ by binning graded amplitude values into one of two categorical states.
-This deliberate loss of precise amplitude information is the key to intensity
-invariance of the finalized features, as different scales of $c_i(t)$ can
-result in similar $T_1$ segments depending on the magnitude of the derivative
-of $c_i(t)$ in temporal proximity to time points at which $c_i(t)$ crosses
-$\thr$.
+Accordingly, a substantial amount of information about the degree of similarity
+between signal $\adapt(t)$ and kernel $k_i(t)$ that is contained in $c_i(t)$ is
+lost during its transformation into $f_i(t)$. Instead, $f_i(t)$ only retains
+information about the temporal relation of $c_i(t)$ relative to $\thr$.
+
+
+This loss of amplitude information is the key to the intensity
+invariance of $f_i(t)$: For a given $\thr$, different scales of $c_i(t)$ can
+still result in similar $T_1$ segments depending on the magnitude of the
+derivative of $c_i(t)$ in temporal proximity to time points at which $c_i(t)$
+crosses $\thr$. The steeper the slope of $c_i(t)$ around the threshold
+crossings, the less $T_1$ changes with scale variations.



+In this sense, $f_i(t)$ resembles a duty
+cycle, as it quantifies purely temporal relations in the structure
+of $c_i(t)$ with no regard for precise amplitude values apart from their
+relation to $\thr$. This near-complete loss of amplitude information is the key
+to the intensity invariance of $f_i(t)$: For a given $\thr$, different scales
+of $c_i(t)$ can still result in similar $T_1$ segments depending on the
+magnitude of the derivative of $c_i(t)$ in temporal proximity to time points at
+which $c_i(t)$ crosses $\thr$. The steeper the slope of $c_i(t)$ around the
+threshold crossings, the less $T_1$ changes with scale variations.

-\textbf{Implication for intensity invariance:}\\
- Convolution output $c_i(t)$ quantifies temporal similarity between amplitudes of
-template waveform $k_i(t)$ and signal $\adapt(t)$ centered at time point $t$\\
-$\rightarrow$ Based on amplitudes on a graded scale

- Feature $f_i(t)$ quantifies the probability that amplitudes of $c_i(t)$
-exceed threshold value $\thr$ within interval $\tlp$ around time point $t$\\
-$\rightarrow$ Based on binned amplitudes corresponding to one of two categorical states
-$\rightarrow$ Deliberate loss of precise amplitude information\\
-$\rightarrow$ Emphasis on temporal structure (ratio of $T_1$ over $\tlp$)
-
- Thresholding of $c_i(t)$ and subsequent temporal averaging of $b_i(t)$ to
-obtain $f_i(t)$ constitutes a remapping of an amplitude-encoding quantity into a
-duty cycle-encoding quantity, mediated by threshold function $\nl$
-
- Different scales of $c_i(t)$ can result in similar $T_1$ segments depending
-on the magnitude of the derivative of $c_i(t)$ in temporal proximity to time
-points at which $c_i(t)$ crosses threshold value $\thr$\\
-$\rightarrow$ The steeper the slope of $c_i(t)$, the less $T_1$ changes with scale variations\\
-$\rightarrow$ If $T_1$ is invariant to scale variation in $c_i(t)$, then so is $f_i(t)$
-
- Suggests a relatively simple rule for optimal choice of threshold value $\thr$:\\
-$\rightarrow$ Find amplitude $c_i$ that maximizes absolute derivative of $c_i(t)$ over time\\
-$\rightarrow$ Optimal with respect to intensity invariance of $f_i(t)$, not necessarily for
-other criteria such as song-noise separation or diversity between features
-
- Nonlinear operations can be used to detach representations from graded physical
-stimulus (to fasciliate categorical behavioral decision-making?):\\
-1) Capture sufficiently precise amplitude information: $\env(t)$, $\adapt(t)$\\
-$\rightarrow$ Closely following the AM of the acoustic stimulus\\
-2) Quantify relevant stimulus properties on a graded scale: $c_i(t)$\\
-$\rightarrow$ More decorrelated representation, compared to prior stages\\
-3) Nonlinearity: Distinguish between "relevant vs irrelevant" values: $b_i(t)$\\
-$\rightarrow$ Trading a graded scale for two or more categorical states\\
-4) Represent stimulus properties under relevance constraint: $f_i(t)$\\
-$\rightarrow$ Graded again but highly decorrelated from the acoustic stimulus\\
-5) Categorical behavioral decision-making requires further nonlinearities\\
-$\rightarrow$ Parameters of a behavioral response may be graded (e.g. approach speed),
-initiation of one behavior over another is categorical (e.g. approach/stay)

 \section{Discriminating species-specific song\\patterns in feature space}

 \section{Conclusions \& outlook}

+\textbf{Song recognition pathway: Grasshopper vs. model:}\\
+The model pathway includes a rather large number of Gabor kernels compared to
+the 15 to 20 ascending neurons in the grasshopper auditory
+system~(\bcite{stumpner1991auditory}). 
+
+
 \textbf{Definition of invariance (general, systemic):}\\
 Invariance = Property of a system to maintain a stable output with respect to a
 set of relevant input parameters (variation to be represented) but irrespective
@@ -743,9 +748,26 @@ $\rightarrow$ Inability to recover $s(t)$ when initially masked by noise floor $
 - Logarithmic scaling emphasizes small amplitudes (song onsets, noise floor) \\
 $\rightarrow$ Recurring trade-off: Equalizing signal intensity vs preserving initial SNR

+\textbf{Thresh-LP: Implication for intensity invariance:}\\
+- Role of song periodicity for feature representation!

-The model pathway includes a rather large number of Gabor kernels compared to
-the 15 to 20 ascending neurons in the grasshopper auditory
-system~(\bcite{stumpner1991auditory}). 
+- Suggests a relatively simple rule for optimal choice of threshold value $\thr$:\\
+$\rightarrow$ Find amplitude $c_i$ that maximizes absolute derivative of $c_i(t)$ over time\\
+$\rightarrow$ Optimal with respect to intensity invariance of $f_i(t)$, not necessarily for
+other criteria such as song-noise separation or diversity between features
+
+- Nonlinear operations can be used to detach representations from graded physical
+stimulus (to facilitate categorical behavioral decision-making?):\\
+1) Capture sufficiently precise amplitude information: $\env(t)$, $\adapt(t)$\\
+$\rightarrow$ Closely following the AM of the acoustic stimulus\\
+2) Quantify relevant stimulus properties on a graded scale: $c_i(t)$\\
+$\rightarrow$ More decorrelated representation, compared to prior stages\\
+3) Nonlinearity: Distinguish between ``relevant vs irrelevant'' values: $b_i(t)$\\
+$\rightarrow$ Trading a graded scale for two or more categorical states\\
+4) Represent stimulus properties under relevance constraint: $f_i(t)$\\
+$\rightarrow$ Graded again but highly decorrelated from the acoustic stimulus\\
+5) Categorical behavioral decision-making requires further nonlinearities\\
+$\rightarrow$ Parameters of a behavioral response may be graded (e.g. approach speed),
+initiation of one behavior over another is categorical (e.g. approach/stay)

 \end{document}