Syncing to home.

This commit is contained in:
j-hartling
2026-02-27 16:10:14 +01:00
parent cc701a09f8
commit 1f61a4c70e
7 changed files with 179 additions and 157 deletions

172
main.tex

@@ -612,109 +612,114 @@ without ($\sca=0$) song component $\soc(t)$
\text{SNR}(\sca)\,=\,\frac{\xvar}{\nvar}\,=\,\frac{\alpha^{2}\,\cdot\,\svar\,+\,\nvar}{\nvar}\,=\,\alpha^{2}\,+\,1, \qquad \svar\,=\,\nvar\,=\,1
\label{eq:toy_snr}
\end{equation}
which depends quadratically on $\sca$ if $\soc(t)$ and $\noc(t)$ are
uncorrelated~($\soc(t)\perp\noc(t)$). In summary, the combination of
logarithmic compression and adaptation allows for the equalization of different
sufficiently large song scales, which is essential for intensity-invariant song
representation. However, this mechanism is unable to recover songs that have
already sunk below the noise floor, which emphasizes the importance of a
sufficiently high SNR at the initial reception of the signal for reliable song
recognition.
which depends quadratically on $\sca$ if $\soc(t)\perp\noc(t)$. Overall, the
combination of logarithmic compression and adaptation allows for the
equalization of different sufficiently large song scales, which is essential
for intensity-invariant song representation. However, this mechanism is unable
to recover songs that have already sunk below the noise floor, which
emphasizes the importance of a sufficiently high SNR at the initial reception of
the signal for reliable song recognition.
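The quadratic dependence follows from the variance of a sum. As a short sketch,
assuming the mixed signal has the form $x(t)=\sca\cdot\soc(t)+\noc(t)$ (the
symbol $x(t)$ and the covariance notation are assumptions introduced here for
illustration only):
\begin{equation}
\xvar\,=\,\sca^{2}\,\cdot\,\svar\,+\,\nvar\,+\,2\,\sca\,\cdot\,\mathrm{Cov}(\soc,\,\noc)
\end{equation}
The covariance term vanishes for uncorrelated components, which leaves the
expression in Eq.\,\ref{eq:toy_snr}. With unit variances, $\sca=3$ gives
$\text{SNR}=10$, whereas $\sca=0$ gives $\text{SNR}=1$ (noise only).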
\subsection{Thresholding nonlinearity \& temporal averaging}
The second key mechanism for the emergence of intensity invariance along the
model pathway takes place during the transformation of the kernel responses
$c_i(t)$ via the binary responses $b_i(t)$ into the finalized features
$f_i(t)$. This mechanism is mediated by the thresholding nonlinearity $\nl$. By
$f_i(t)$. Kernel response $c_i(t)$ quantifies the degree of similarity between
kernel $k_i(t)$ and the preprocessed signal $\adapt(t)$. The thresholding
nonlinearity $\nl$ categorizes the value of $c_i(t)$ at every time point $t$
into ``relevant'' ($c_i(t)>\thr$, $b_i(t)=1$) and ``irrelevant'' ($c_i(t)\leq\thr$,
$b_i(t)=0$) response values.
By passing $c_i(t)$ through the thresholding
nonlinearity $\nl$, its amplitude values are binned
into one of two categories~(Eq.\,\ref{eq:binary}).
: $c_i(t)>\thr$
This mechanism is mediated by the thresholding nonlinearity $\nl$. By
passing $c_i(t)$ through the thresholding nonlinearity~(Eq.\,\ref{eq:binary}),
its probability density within some observed time interval $T$ is split around
threshold value $\thr$ into two complementary parts:
its probability density $\pc$ within some observed time interval $T$ is split
around threshold value $\thr$ into two complementary parts:
\begin{equation}
\int_{\thr}^{+\infty} \pc\,dc_i\,=\,1\,-\,\int_{-\infty}^{\thr} \pc\,dc_i\,=\,\frac{T_1}{T}, \qquad \infint \pc\,dc_i\,=\,1
\label{eq:pdf_split}
\end{equation}
Due to the normalization of $\pc$, the improper integral over the
right-sided part of the split $\pc$ is the ratio of time $T_1$ during which
$c_i(t)$ exceeds $\thr$ to the total time $T$. If the subsequent lowpass
filter~(Eq.\,\ref{eq:lowpass}) over $b_i(t)$ is approximated as temporal
averaging over a suitable time interval
The right-sided part of the split $\pc$ corresponds to time $T_1$ where
$c_i(t)>\thr$, while the left-sided part corresponds to time $T_0=T-T_1$ where
$c_i(t)\leq\thr$. The improper integral over the right-sided part of $\pc$
represents the ratio of time $T_1$ to total time $T$ because the integral of a
probability density over its entire domain is normalized to 1. Following the
thresholding nonlinearity, the resulting binary responses $b_i(t)$ are
lowpass-filtered~(Eq.\,\ref{eq:lowpass}) to obtain $f_i(t)$, which can be
approximated as temporal averaging over a suitable time interval
$\tlp>\frac{1}{\fc}$
\begin{equation}
f_i(t)\,\approx\,\frac{1}{\tlp} \int_{t}^{t\,+\,\tlp} b_i(\tau)\,d\tau\,=\,\frac{T_1}{\tlp}, \qquad b_i(t)\,\in\,\{0,\,1\}
\label{eq:feat_avg}
\end{equation}
feature $f_i(t)$ likewise represents a ratio of time $T_1$ during which
$b_i(t)$ is 1 within the total averaging interval $\tlp$. Since $b_i(t)$ is 1
where $c_i(t)>\thr$, $f_i(t)$ relates to the probability density of $c_i(t)$ by
Feature $f_i(t)$
If the lowpass
filter~(Eq.\,\ref{eq:lowpass}) over $b_i(t)$ is approximated as temporal
averaging over a suitable time interval $\tlp>\frac{1}{\fc}$, then $f_i(t)$ can
be linked to a similar temporal ratio
% \begin{equation}
% f_i(t)\,\approx\,\frac{1}{\tlp} \int_{t}^{t\,+\,\tlp} b_i(\tau)\,d\tau\,=\,\frac{T_1}{\tlp}, \qquad b_i(t)\,\in\,\{0,\,1\}
% \label{eq:feat_avg}
% \end{equation}
of time $T_1$ during which $b_i(t)$ is 1 within the total averaging interval
$\tlp$. Therefore, the value of $f_i(t)$ at every time point $t$ approximately
signifies the cumulative probability that $c_i(t)$ exceeds $\thr$ during the
corresponding averaging interval $\tlp$:
\begin{equation}
f_i(t)\,\approx\,\int_{\thr}^{+\infty} \pclp\,dc_i\,=\,P(c_i\,>\,\thr,\,\tlp)
\label{eq:feat_prop}
\end{equation}
Therefore, the value of $f_i(t)$ at every time point $t$ approximately
signifies the cumulative probability that $c_i(t)$ exceeds $\thr$ during the
corresponding averaging interval $\tlp$. Accordingly, the combination of
thresholding nonlinearity and temporal averaging constitutes a remapping of a
quantity that encodes temporal similarity between signal $\adapt(t)$ and kernel
$k_i(t)$ into a quantity that encodes a duty cycle with respect to $\thr$.
In a sense, $f_i(t)$ resembles a duty cycle, which quantifies
purely temporal relations in the structure of $c_i(t)$ with no regard for
precise amplitude values apart from their relation to $\thr$.
Accordingly, the combination of
thresholding nonlinearity and temporal averaging constitutes a remapping of the
amplitude-encoding quantity $c_i(t)$ into the duty cycle-encoding quantity
$f_i(t)$ by binning graded amplitude values into one of two categorical states.
This deliberate loss of precise amplitude information is the key to intensity
invariance of the finalized features, as different scales of $c_i(t)$ can
result in similar $T_1$ segments depending on the magnitude of the derivative
of $c_i(t)$ in temporal proximity to time points at which $c_i(t)$ crosses
$\thr$.
Accordingly, a substantial amount of information about the degree of similarity
between signal $\adapt(t)$ and kernel $k_i(t)$ that is contained in $c_i(t)$ is
lost during its transformation into $f_i(t)$. Instead, $f_i(t)$ only retains
information about the temporal relation of $c_i(t)$ relative to $\thr$
This loss of amplitude information is the key to the intensity
invariance of $f_i(t)$: For a given $\thr$, different scales of $c_i(t)$ can
still result in similar $T_1$ segments depending on the magnitude of the
derivative of $c_i(t)$ in temporal proximity to time points at which $c_i(t)$
crosses $\thr$. The steeper the slope of $c_i(t)$ around the threshold
crossings, the less $T_1$ changes with scale variations.
In a sense, $f_i(t)$ resembles a duty
cycle, as it quantifies purely temporal relations in the structure
of $c_i(t)$ with no regard for precise amplitude values apart from their
relation to $\thr$. This near-complete loss of amplitude information is the key
to the intensity invariance of $f_i(t)$: For a given $\thr$, different scales
of $c_i(t)$ can still result in similar $T_1$ segments depending on the
magnitude of the derivative of $c_i(t)$ in temporal proximity to time points at
which $c_i(t)$ crosses $\thr$. The steeper the slope of $c_i(t)$ around the
threshold crossings, the less $T_1$ changes with scale variations.
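As a hedged illustration of this slope argument, consider a purely sinusoidal
kernel response $c_i(t)=a\,\sin(\omega t)$ with scale $a>0$ and threshold
$0\leq\thr<a$. The sinusoidal shape and the symbols $a$ and $\omega$ are
assumptions made here for tractability, not properties of the model. Over one
period $T=2\pi/\omega$, $c_i(t)$ exceeds $\thr$ for phases between
$\arcsin(\thr/a)$ and $\pi-\arcsin(\thr/a)$, so that
\begin{equation}
\frac{T_1}{T}\,=\,\frac{1}{2}\,-\,\frac{1}{\pi}\,\arcsin\left(\frac{\thr}{a}\right), \qquad \frac{\partial}{\partial a}\,\frac{T_1}{T}\,=\,\frac{\thr}{\pi\,a\,\sqrt{a^{2}\,-\,\thr^{2}}}
\end{equation}
For $\thr=0$, where the sinusoid is steepest, the duty cycle equals
$\frac{1}{2}$ for every scale $a$ and is hence fully scale-invariant. As $\thr$
approaches the peak value $a$, where the slope vanishes, the sensitivity of
$T_1$ to scale variations grows without bound.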
\textbf{Implication for intensity invariance:}\\
- Convolution output $c_i(t)$ quantifies temporal similarity between amplitudes of
template waveform $k_i(t)$ and signal $\adapt(t)$ centered at time point $t$\\
$\rightarrow$ Based on amplitudes on a graded scale
- Feature $f_i(t)$ quantifies the probability that amplitudes of $c_i(t)$
exceed threshold value $\thr$ within interval $\tlp$ around time point $t$\\
$\rightarrow$ Based on binned amplitudes corresponding to one of two categorical states
$\rightarrow$ Deliberate loss of precise amplitude information\\
$\rightarrow$ Emphasis on temporal structure (ratio of $T_1$ over $\tlp$)
- Thresholding of $c_i(t)$ and subsequent temporal averaging of $b_i(t)$ to
obtain $f_i(t)$ constitutes a remapping of an amplitude-encoding quantity into a
duty cycle-encoding quantity, mediated by threshold function $\nl$
- Different scales of $c_i(t)$ can result in similar $T_1$ segments depending
on the magnitude of the derivative of $c_i(t)$ in temporal proximity to time
points at which $c_i(t)$ crosses threshold value $\thr$\\
$\rightarrow$ The steeper the slope of $c_i(t)$, the less $T_1$ changes with scale variations\\
$\rightarrow$ If $T_1$ is invariant to scale variation in $c_i(t)$, then so is $f_i(t)$
- Suggests a relatively simple rule for optimal choice of threshold value $\thr$:\\
$\rightarrow$ Find amplitude $c_i$ that maximizes absolute derivative of $c_i(t)$ over time\\
$\rightarrow$ Optimal with respect to intensity invariance of $f_i(t)$, not necessarily for
other criteria such as song-noise separation or diversity between features
- Nonlinear operations can be used to detach representations from graded physical
stimulus (to facilitate categorical behavioral decision-making?):\\
1) Capture sufficiently precise amplitude information: $\env(t)$, $\adapt(t)$\\
$\rightarrow$ Closely following the AM of the acoustic stimulus\\
2) Quantify relevant stimulus properties on a graded scale: $c_i(t)$\\
$\rightarrow$ More decorrelated representation, compared to prior stages\\
3) Nonlinearity: Distinguish between ``relevant vs.\ irrelevant'' values: $b_i(t)$\\
$\rightarrow$ Trading a graded scale for two or more categorical states\\
4) Represent stimulus properties under relevance constraint: $f_i(t)$\\
$\rightarrow$ Graded again but highly decorrelated from the acoustic stimulus\\
5) Categorical behavioral decision-making requires further nonlinearities\\
$\rightarrow$ Parameters of a behavioral response may be graded (e.g. approach speed),
initiation of one behavior over another is categorical (e.g. approach/stay)
\section{Discriminating species-specific song\\patterns in feature space}
\section{Conclusions \& outlook}
\textbf{Song recognition pathway: Grasshopper vs. model:}\\
The model pathway includes a rather large number of Gabor kernels compared to
the 15 to 20 ascending neurons in the grasshopper auditory
system~(\bcite{stumpner1991auditory}).
\textbf{Definition of invariance (general, systemic):}\\
Invariance = Property of a system to maintain a stable output with respect to a
set of relevant input parameters (variation to be represented) but irrespective
@@ -743,9 +748,26 @@ $\rightarrow$ Inability to recover $s(t)$ when initially masked by noise floor $
- Logarithmic scaling emphasizes small amplitudes (song onsets, noise floor) \\
$\rightarrow$ Recurring trade-off: Equalizing signal intensity vs preserving initial SNR
\textbf{Thresh-LP: Implication for intensity invariance:}\\
- Role of song periodicity for feature representation!
The model pathway includes a rather large number of Gabor kernels compared to
the 15 to 20 ascending neurons in the grasshopper auditory
system~(\bcite{stumpner1991auditory}).
- Suggests a relatively simple rule for optimal choice of threshold value $\thr$:\\
$\rightarrow$ Find amplitude $c_i$ that maximizes absolute derivative of $c_i(t)$ over time\\
$\rightarrow$ Optimal with respect to intensity invariance of $f_i(t)$, not necessarily for
other criteria such as song-noise separation or diversity between features
- Nonlinear operations can be used to detach representations from graded physical
stimulus (to facilitate categorical behavioral decision-making?):\\
1) Capture sufficiently precise amplitude information: $\env(t)$, $\adapt(t)$\\
$\rightarrow$ Closely following the AM of the acoustic stimulus\\
2) Quantify relevant stimulus properties on a graded scale: $c_i(t)$\\
$\rightarrow$ More decorrelated representation, compared to prior stages\\
3) Nonlinearity: Distinguish between ``relevant vs.\ irrelevant'' values: $b_i(t)$\\
$\rightarrow$ Trading a graded scale for two or more categorical states\\
4) Represent stimulus properties under relevance constraint: $f_i(t)$\\
$\rightarrow$ Graded again but highly decorrelated from the acoustic stimulus\\
5) Categorical behavioral decision-making requires further nonlinearities\\
$\rightarrow$ Parameters of a behavioral response may be graded (e.g. approach speed),
initiation of one behavior over another is categorical (e.g. approach/stay)
\end{document}