Syncing to home.
172
main.tex
@@ -612,109 +612,114 @@ without ($\sca=0$) song component $\soc(t)$
|
||||
\text{SNR}(\sca)\,=\,\frac{\xvar}{\nvar}\,=\,\frac{\alpha^{2}\,\cdot\,\svar\,+\,\nvar}{\nvar}\,=\,\alpha^{2}\,+\,1, \qquad \svar\,=\,\nvar\,=\,1
|
||||
\label{eq:toy_snr}
|
||||
\end{equation}
|
||||
which depends quadratically on $\sca$ if $\soc(t)$ and $\noc(t)$ are
|
||||
uncorrelated~($\soc(t)\perp\noc(t)$). In summary, the combination of
|
||||
logarithmic compression and adaptation allows for the equalization of different
|
||||
sufficiently large song scales, which is essential for intensity-invariant song
|
||||
representation. However, this mechanism is unable to recover songs that have
|
||||
already sunk below the noise floor, which emphasizes the importance of a
|
||||
sufficiently high SNR at the initial reception of the signal for reliable song
|
||||
recognition.
|
||||
which depends quadratically on $\sca$ if $\soc(t)\perp\noc(t)$. Overall, the
|
||||
combination of logarithmic compression and adaptation allows for the
|
||||
equalization of different sufficiently large song scales, which is essential
|
||||
for intensity-invariant song representation. However, this mechanism is unable
|
||||
to recover songs that have already sunk below the noise floor, which
|
||||
emphasizes the importance of a sufficiently high SNR at the initial reception of
|
||||
the signal for reliable song recognition.
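
As a brief sketch of the step behind Eq.\,\ref{eq:toy_snr}, assuming that the
mixture is the additive sum $\alpha\cdot\soc(t)+\noc(t)$ (as the numerator of
Eq.\,\ref{eq:toy_snr} implies), its variance expands to
\[
\xvar\,=\,\alpha^{2}\,\cdot\,\svar\,+\,\nvar\,+\,2\,\alpha\,\cdot\,\mathrm{Cov}(\soc,\,\noc),
\]
where the covariance term vanishes for uncorrelated $\soc(t)$ and $\noc(t)$.
With $\svar=\nvar=1$, a scale of $\alpha=3$ then yields $\text{SNR}=10$,
whereas $\alpha=0.1$ yields $\text{SNR}=1.01$, which is barely distinguishable
from the value of 1 obtained without song component.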
|
||||
|
||||
\subsection{Thresholding nonlinearity \& temporal averaging}
|
||||
|
||||
The second key mechanism for the emergence of intensity invariance along the
|
||||
model pathway takes place during the transformation of the kernel responses
|
||||
$c_i(t)$ over the binary responses $b_i(t)$ into the finalized features
|
||||
$f_i(t)$. This mechanism is mediated by the thresholding nonlinearity $\nl$. By
|
||||
$f_i(t)$. Kernel response $c_i(t)$ quantifies the degree of similarity between
|
||||
kernel $k_i(t)$ and the preprocessed signal $\adapt(t)$. The thresholding
|
||||
nonlinearity $\nl$ categorizes the value of $c_i(t)$ at every time point $t$
|
||||
into ``relevant'' ($c_i(t)>\thr$, $b_i(t)=1$) and ``irrelevant'' ($c_i(t)\leq\thr$,
|
||||
$b_i(t)=0$) response values.
|
||||
|
||||
By passing $c_i(t)$ through the thresholding
|
||||
nonlinearity $\nl$, its amplitude values are binned
|
||||
into one of two categories~(Eq.\,\ref{eq:binary}).
|
||||
|
||||
: $c_i(t)>\thr$
|
||||
|
||||
|
||||
|
||||
|
||||
This mechanism is mediated by the thresholding nonlinearity $\nl$. By
|
||||
passing $c_i(t)$ through the thresholding nonlinearity~(Eq.\,\ref{eq:binary}),
|
||||
its probability density within some observed time interval $T$ is split around
|
||||
threshold value $\thr$ into two complementary parts:
|
||||
its probability density $\pc$ within some observed time interval $T$ is split
|
||||
around threshold value $\thr$ into two complementary parts:
|
||||
\begin{equation}
|
||||
\int_{\thr}^{+\infty} \pc\,dc_i\,=\,1\,-\,\int_{-\infty}^{\thr} \pc\,dc_i\,=\,\frac{T_1}{T}, \qquad \infint \pc\,dc_i\,=\,1
|
||||
\label{eq:pdf_split}
|
||||
\end{equation}
|
||||
Due to the normalization of $\pc$, the improper integral over the
|
||||
right-sided part of the split $\pc$ is the ratio of time $T_1$ during which
|
||||
$c_i(t)$ exceeds $\thr$ within the total time $T$. If the subsequent lowpass
|
||||
filter~(Eq.\,\ref{eq:lowpass}) over $b_i(t)$ is approximated as temporal
|
||||
averaging over a suitable time interval
|
||||
The right-sided part of the split $\pc$ corresponds to time $T_1$ where
|
||||
$c_i(t)>\thr$, while the left-sided part corresponds to time $T_0=T-T_1$ where
|
||||
$c_i(t)\leq\thr$. The improper integral over the right-sided part of $\pc$
represents the ratio of time $T_1$ to total time $T$ because the integral of
$\pc$ over its entire domain is normalized to 1. Following the
|
||||
thresholding nonlinearity, the resulting binary responses $b_i(t)$ are
|
||||
lowpass-filtered~(Eq.\,\ref{eq:lowpass}) to obtain $f_i(t)$, which can be
|
||||
approximated as temporal averaging over a suitable time interval
|
||||
$\tlp>\frac{1}{\fc}$
|
||||
\begin{equation}
|
||||
f_i(t)\,\approx\,\frac{1}{\tlp} \int_{t}^{t\,+\,\tlp} b_i(\tau)\,d\tau\,=\,\frac{T_1}{\tlp}, \qquad b_i(t)\,\in\,\{0,\,1\}
|
||||
\label{eq:feat_avg}
|
||||
\end{equation}
|
||||
feature $f_i(t)$ likewise represents a ratio of time $T_1$ during which
|
||||
$b_i(t)$ is 1 within the total averaging interval $\tlp$. Since $b_i(t)$ is 1
|
||||
where $c_i(t)>\thr$, $f_i(t)$ relates to the probability density of $c_i(t)$ by
|
||||
Feature $f_i(t)$
|
||||
|
||||
If the lowpass
|
||||
filter~(Eq.\,\ref{eq:lowpass}) over $b_i(t)$ is approximated as temporal
|
||||
averaging over a suitable time interval $\tlp>\frac{1}{\fc}$, then $f_i(t)$ can
|
||||
be linked to a similar temporal ratio
|
||||
% \begin{equation}
|
||||
% f_i(t)\,\approx\,\frac{1}{\tlp} \int_{t}^{t\,+\,\tlp} b_i(\tau)\,d\tau\,=\,\frac{T_1}{\tlp}, \qquad b_i(t)\,\in\,\{0,\,1\}
|
||||
% \label{eq:feat_avg}
|
||||
% \end{equation}
|
||||
of time $T_1$ during which $b_i(t)$ is 1 within the total averaging interval
|
||||
$\tlp$. Therefore, the value of $f_i(t)$ at every time point $t$ approximately
|
||||
signifies the cumulative probability that $c_i(t)$ exceeds $\thr$ during the
|
||||
corresponding averaging interval $\tlp$:
|
||||
\begin{equation}
|
||||
f_i(t)\,\approx\,\int_{\thr}^{+\infty} \pclp\,dc_i\,=\,P(c_i\,>\,\thr,\,\tlp)
|
||||
\label{eq:feat_prop}
|
||||
\end{equation}
|
||||
Therefore, the value of $f_i(t)$ at every time point $t$ approximately
|
||||
signifies the cumulative probability that $c_i(t)$ exceeds $\thr$ during the
|
||||
corresponding averaging interval $\tlp$. Accordingly, the combination of
|
||||
thresholding nonlinearity and temporal averaging constitutes a remapping of a
|
||||
quantity that encodes temporal similarity between signal $\adapt(t)$ and kernel
|
||||
$k_i(t)$ into a quantity that encodes a duty cycle with respect to $\thr$.
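
To illustrate Eq.\,\ref{eq:feat_prop} under an assumption that is not part of
the model itself, suppose that the values of $c_i(t)$ within $\tlp$ are
approximately Gaussian with zero mean and standard deviation $\sigma$, and that
$\thr>0$. With $\Phi$ denoting the standard normal cumulative distribution
function, the feature value after scaling $c_i(t)$ by a factor $\alpha>0$ would
then be
\[
f_i(t)\,\approx\,P(\alpha\,\cdot\,c_i\,>\,\thr)\,=\,1\,-\,\Phi\!\left(\frac{\thr}{\alpha\,\cdot\,\sigma}\right).
\]
For $\thr=\sigma$, this gives $f_i(t)\approx0.16$ at $\alpha=1$ and
$f_i(t)\approx0.31$ at $\alpha=2$, approaching the upper bound of 0.5 for large
$\alpha$. In this hypothetical case, $f_i(t)$ depends only on the ratio of
$\thr$ to the scaled spread $\alpha\cdot\sigma$.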
|
||||
In a sense, $f_i(t)$ resembles a duty cycle, which quantifies
|
||||
purely temporal relations in the structure of $c_i(t)$ with no regard for
|
||||
precise amplitude values apart from their relation to $\thr$.
|
||||
|
||||
Accordingly, the combination of
|
||||
thresholding nonlinearity and temporal averaging constitutes a remapping of the
|
||||
amplitude-encoding quantity $c_i(t)$ into the duty cycle-encoding quantity
|
||||
$f_i(t)$ by binning graded amplitude values into one of two categorical states.
|
||||
This deliberate loss of precise amplitude information is the key to intensity
|
||||
invariance of the finalized features, as different scales of $c_i(t)$ can
|
||||
result in similar $T_1$ segments depending on the magnitude of the derivative
|
||||
of $c_i(t)$ in temporal proximity to time points at which $c_i(t)$ crosses
|
||||
$\thr$.
|
||||
Accordingly, a substantial amount of information about the degree of similarity
|
||||
between signal $\adapt(t)$ and kernel $k_i(t)$ that is contained in $c_i(t)$ is
|
||||
lost during its transformation into $f_i(t)$. Instead, $f_i(t)$ only retains
|
||||
information about the temporal relation of $c_i(t)$ relative to $\thr$
|
||||
|
||||
|
||||
This loss of amplitude information is the key to the intensity
|
||||
invariance of $f_i(t)$: For a given $\thr$, different scales of $c_i(t)$ can
|
||||
still result in similar $T_1$ segments depending on the magnitude of the
|
||||
derivative of $c_i(t)$ in temporal proximity to time points at which $c_i(t)$
|
||||
crosses $\thr$. The steeper the slope of $c_i(t)$ around the threshold
|
||||
crossings, the less $T_1$ changes with scale variations.
|
||||
|
||||
|
||||
|
||||
In a sense, $f_i(t)$ resembles a duty
cycle, as it quantifies purely temporal relations in the structure
|
||||
of $c_i(t)$ with no regard for precise amplitude values apart from their
|
||||
relation to $\thr$. This near-complete loss of amplitude information is the key
|
||||
to the intensity invariance of $f_i(t)$: For a given $\thr$, different scales
|
||||
of $c_i(t)$ can still result in similar $T_1$ segments depending on the
|
||||
magnitude of the derivative of $c_i(t)$ in temporal proximity to time points at
|
||||
which $c_i(t)$ crosses $\thr$. The steeper the slope of $c_i(t)$ around the
|
||||
threshold crossings, the less $T_1$ changes with scale variations.
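
A minimal worked example, assuming a purely sinusoidal kernel response
$c_i(t)=A\sin(\omega t)$ with hypothetical amplitude $A$ and angular frequency
$\omega$ and a threshold $0<\thr<A$, makes this dependence explicit. The
response exceeds $\thr$ for a fraction
\[
\frac{T_1}{T}\,=\,\frac{1}{2}\,-\,\frac{1}{\pi}\,\arcsin\!\left(\frac{\thr}{A}\right)
\]
of each period, and the slope at the threshold crossings has magnitude
$|\dot{c}_i|=\omega\sqrt{A^{2}-\thr^{2}}$. The sensitivity of the duty cycle to
the scale $A$ is therefore
\[
\frac{\partial}{\partial A}\,\frac{T_1}{T}\,=\,\frac{\thr}{\pi\,A\,\sqrt{A^{2}-\thr^{2}}}\,=\,\frac{\omega\,\thr}{\pi\,A\,|\dot{c}_i|},
\]
which is inversely proportional to the slope at the crossings. For instance,
$A=2\,\thr$ gives $T_1/T=1/3$ and $A=4\,\thr$ gives $T_1/T\approx0.42$, while
any further increase in $A$ can raise the ratio only up to $1/2$.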
|
||||
|
||||
\textbf{Implication for intensity invariance:}\\
|
||||
- Convolution output $c_i(t)$ quantifies temporal similarity between amplitudes of
|
||||
template waveform $k_i(t)$ and signal $\adapt(t)$ centered at time point $t$\\
|
||||
$\rightarrow$ Based on amplitudes on a graded scale
|
||||
|
||||
- Feature $f_i(t)$ quantifies the probability that amplitudes of $c_i(t)$
|
||||
exceed threshold value $\thr$ within interval $\tlp$ around time point $t$\\
|
||||
$\rightarrow$ Based on binned amplitudes corresponding to one of two categorical states
|
||||
$\rightarrow$ Deliberate loss of precise amplitude information\\
|
||||
$\rightarrow$ Emphasis on temporal structure (ratio of $T_1$ over $\tlp$)
|
||||
|
||||
- Thresholding of $c_i(t)$ and subsequent temporal averaging of $b_i(t)$ to
|
||||
obtain $f_i(t)$ constitutes a remapping of an amplitude-encoding quantity into a
|
||||
duty cycle-encoding quantity, mediated by threshold function $\nl$
|
||||
|
||||
- Different scales of $c_i(t)$ can result in similar $T_1$ segments depending
|
||||
on the magnitude of the derivative of $c_i(t)$ in temporal proximity to time
|
||||
points at which $c_i(t)$ crosses threshold value $\thr$\\
|
||||
$\rightarrow$ The steeper the slope of $c_i(t)$, the less $T_1$ changes with scale variations\\
|
||||
$\rightarrow$ If $T_1$ is invariant to scale variation in $c_i(t)$, then so is $f_i(t)$
|
||||
|
||||
- Suggests a relatively simple rule for optimal choice of threshold value $\thr$:\\
|
||||
$\rightarrow$ Find amplitude $c_i$ that maximizes absolute derivative of $c_i(t)$ over time\\
|
||||
$\rightarrow$ Optimal with respect to intensity invariance of $f_i(t)$, not necessarily for
|
||||
other criteria such as song-noise separation or diversity between features
|
||||
|
||||
- Nonlinear operations can be used to detach representations from graded physical
|
||||
stimulus (to facilitate categorical behavioral decision-making?):\\
|
||||
1) Capture sufficiently precise amplitude information: $\env(t)$, $\adapt(t)$\\
|
||||
$\rightarrow$ Closely following the AM of the acoustic stimulus\\
|
||||
2) Quantify relevant stimulus properties on a graded scale: $c_i(t)$\\
|
||||
$\rightarrow$ More decorrelated representation, compared to prior stages\\
|
||||
3) Nonlinearity: Distinguish between ``relevant vs.\ irrelevant'' values: $b_i(t)$\\
|
||||
$\rightarrow$ Trading a graded scale for two or more categorical states\\
|
||||
4) Represent stimulus properties under relevance constraint: $f_i(t)$\\
|
||||
$\rightarrow$ Graded again but highly decorrelated from the acoustic stimulus\\
|
||||
5) Categorical behavioral decision-making requires further nonlinearities\\
|
||||
$\rightarrow$ Parameters of a behavioral response may be graded (e.g. approach speed),
|
||||
initiation of one behavior over another is categorical (e.g. approach/stay)
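
As a hedged illustration of the threshold rule noted above, consider an
idealized sinusoidal response $c_i(t)=A\sin(\omega t)$, where $A$ and $\omega$
are assumptions made for illustration only. The slope of $c_i(t)$ at a crossing
of level $\thr$ has magnitude
\[
|\dot{c}_i|\,=\,\omega\,\sqrt{A^{2}-\thr^{2}},
\]
which is maximal for $\thr=0$. At this threshold, $c_i(t)>\thr$ holds for
exactly half of each period for every $A>0$, so that $T_1/T=1/2$ and $f_i(t)$
would be fully invariant to scale in this idealized case. As stated in the
notes, this concerns intensity invariance only and says nothing about
song-noise separation.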
|
||||
|
||||
\section{Discriminating species-specific song\\patterns in feature space}
|
||||
|
||||
\section{Conclusions \& outlook}
|
||||
|
||||
\textbf{Song recognition pathway: Grasshopper vs. model:}\\
|
||||
The model pathway includes a rather large number of Gabor kernels compared to
|
||||
the 15 to 20 ascending neurons in the grasshopper auditory
|
||||
system~(\bcite{stumpner1991auditory}).
|
||||
|
||||
|
||||
\textbf{Definition of invariance (general, systemic):}\\
|
||||
Invariance = Property of a system to maintain a stable output with respect to a
|
||||
set of relevant input parameters (variation to be represented) but irrespective
|
||||
@@ -743,9 +748,26 @@ $\rightarrow$ Inability to recover $s(t)$ when initially masked by noise floor $
|
||||
- Logarithmic scaling emphasizes small amplitudes (song onsets, noise floor) \\
|
||||
$\rightarrow$ Recurring trade-off: Equalizing signal intensity vs preserving initial SNR
|
||||
|
||||
\textbf{Thresh-LP: Implication for intensity invariance:}\\
|
||||
- Role of song periodicity for feature representation!
|
||||
|
||||
The model pathway includes a rather large number of Gabor kernels compared to
|
||||
the 15 to 20 ascending neurons in the grasshopper auditory
|
||||
system~(\bcite{stumpner1991auditory}).
|
||||
- Suggests a relatively simple rule for optimal choice of threshold value $\thr$:\\
|
||||
$\rightarrow$ Find amplitude $c_i$ that maximizes absolute derivative of $c_i(t)$ over time\\
|
||||
$\rightarrow$ Optimal with respect to intensity invariance of $f_i(t)$, not necessarily for
|
||||
other criteria such as song-noise separation or diversity between features
|
||||
|
||||
- Nonlinear operations can be used to detach representations from graded physical
|
||||
stimulus (to facilitate categorical behavioral decision-making?):\\
|
||||
1) Capture sufficiently precise amplitude information: $\env(t)$, $\adapt(t)$\\
|
||||
$\rightarrow$ Closely following the AM of the acoustic stimulus\\
|
||||
2) Quantify relevant stimulus properties on a graded scale: $c_i(t)$\\
|
||||
$\rightarrow$ More decorrelated representation, compared to prior stages\\
|
||||
3) Nonlinearity: Distinguish between ``relevant vs.\ irrelevant'' values: $b_i(t)$\\
|
||||
$\rightarrow$ Trading a graded scale for two or more categorical states\\
|
||||
4) Represent stimulus properties under relevance constraint: $f_i(t)$\\
|
||||
$\rightarrow$ Graded again but highly decorrelated from the acoustic stimulus\\
|
||||
5) Categorical behavioral decision-making requires further nonlinearities\\
|
||||
$\rightarrow$ Parameters of a behavioral response may be graded (e.g. approach speed),
|
||||
initiation of one behavior over another is categorical (e.g. approach/stay)
|
||||
|
||||
\end{document}
|
||||