Began writing results :)
114
main.tex
@@ -86,9 +86,12 @@
\newcommand{\thr}{\Theta_i} % Step function threshold value
\newcommand{\nl}{H(c_i\,-\,\thr)} % Shifted Heaviside step function

% Math shorthands - Minor symbols and helpers:
\newcommand{\svar}{\sigma_{\text{s}}^{2}} % Song signal variance
\newcommand{\nvar}{\sigma_{\eta}^{2}} % Noise signal variance
% Math shorthands - Intensity invariance analysis:
\newcommand{\soc}{s} % Song component of synthetic mixture
\newcommand{\noc}{\eta} % Noise component of synthetic mixture
\newcommand{\sca}{\alpha} % Multiplicative scale of song component
\newcommand{\svar}{\sigma_{\text{s}}^{2}} % Song component variance
\newcommand{\nvar}{\sigma_{\eta}^{2}} % Noise component variance
\newcommand{\pc}{p(c_i,\,T)} % Probability density (general interval)
\newcommand{\pclp}{p(c_i,\,\tlp)} % Probability density (lowpass interval)

@@ -387,7 +390,7 @@ sigmoidal response curve over logarithmically compressed intensity
levels~(\bcite{suga1960peripheral}; \bcite{gollisch2002energy}). In the model
pathway, logarithmic compression is achieved by conversion to decibel scale
\begin{equation}
\db(t)\,=\,10\,\cdot\,\dec \frac{\env(t)}{\dbref}, \qquad \dbref\,=\,\max[\env(t)]
\db(t)\,=\,10\,\cdot\,\dec \frac{\env(t)}{\dbref}, \qquad \dbref\,=\,\max\big[\env(t)\big]
\label{eq:log}
\end{equation}
relative to the maximum intensity $\dbref$ of the signal envelope $\env(t)$.
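As a brief worked example of Eq.\,\ref{eq:log}, and assuming that $\dec$ denotes the base-10 logarithm (an assumption about the macro, which is defined elsewhere), an envelope value at the reference level maps to $0\,$dB, and halving the envelope lowers $\db(t)$ by roughly $3\,$dB:
\begin{equation*}
\env(t)\,=\,\dbref\;\Rightarrow\;\db(t)\,=\,10\,\cdot\,\log_{10}(1)\,=\,0, \qquad \env(t)\,=\,\tfrac{1}{2}\,\dbref\;\Rightarrow\;\db(t)\,=\,10\,\cdot\,\log_{10}\tfrac{1}{2}\,\approx\,-3.01
\end{equation*}
Because $\dbref$ is the maximum of $\env(t)$, all values of $\db(t)$ are non-positive under this assumption.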
@@ -492,12 +495,12 @@ the left of the two central lobes (odd kernels).
\label{tab:gabor_phases}
\end{table}
\FloatBarrier
These four groups of Gabor kernels allow for the extraction of different types
of signal features, such as the presence of peaks (even, $+$), troughs (even,
$-$), onsets (odd, $+$), and offsets (odd, $-$) at various time scales.
These four major groups of Gabor kernels allow for the extraction of different
types of signal features, such as the presence of peaks (even, $+$), troughs
(even, $-$), onsets (odd, $+$), and offsets (odd, $-$) at various time scales.
Following the convolutional template matching, each kernel-specific response
$c_i(t)$ is passed through a shifted Heaviside step function $\nl$ with threshold
value $\thr$ to obtain a binary response
$c_i(t)$ is passed through a shifted Heaviside step function $\nl$ with
threshold value $\thr$ to obtain a binary response
\begin{equation}
b_i(t,\,\thr)\,=\,\begin{cases}
\;1, \quad c_i(t)\,>\,\thr\\
@@ -528,6 +531,10 @@ can be read out by a simple linear classifier.
\includegraphics[width=\textwidth]{figures/fig_feat_stages.pdf}
\caption{\textbf{Representations of a song of \textit{O. rufipes} during
the feature extraction stage.}
Different colors indicate Gabor kernels with different
lobe numbers $\kn$ and signs, with lighter colors for higher
$\kn$~($1\,\leq\,\kn\,\leq\,4$; both $+$ and $-$ per $\kn$;
two kernel widths $\kw$ of $4\,$ms and $32\,$ms per sign).
\textbf{a}:~Kernel-specific filter responses.
\textbf{b}:~Binary responses.
\textbf{c}:~Finalized features.
@@ -536,55 +543,62 @@ can be read out by a simple linear classifier.
\end{figure}
\FloatBarrier

\section{Two mechanisms driving the emergence of intensity-invariant song representation}
\section{Two mechanisms driving the emergence of intensity-invariant song representations}

\textbf{Definition of invariance (general, systemic):}\\
Invariance = Property of a system to maintain a stable output with respect to a
set of relevant input parameters (variation to be represented) but irrespective
of one or more other parameters (variation to be discarded)
$\rightarrow$ Selective input-output decorrelation
% Still missing the SNR analysis. Should be able to write around it for now.
The robustness of song recognition is tied to the degree of intensity
invariance of the finalized feature representation. Ideally, the values of each
feature should depend only on the relative amplitude dynamics of the song
pattern but not on the overall intensity level of the song. In the grasshopper,
the emergence of intensity-invariant representations along the song recognition
pathway is likely a distributed process that involves different neuronal
populations, which raises the question of which essential computational
mechanisms drive this process. Within the model pathway, we identified
two key mechanisms that render the song representation more invariant to
variations in baseline intensity. Both mechanisms comprise a nonlinear
signal transformation followed by a linear signal transformation but differ in
the specific operations and the neural substrate involved, as outlined in the
following sections.

\textbf{Definition of intensity invariance (context of neurons and songs):}\\
Intensity invariance = Time scale-selective sensitivity to certain faster
amplitude dynamics (song waveform, small-scale AM) and simultaneous
insensitivity to slower, more sustained amplitude dynamics (transient baseline,
large-scale AM, current overall intensity level)\\
$\rightarrow$ Without time scale selectivity, any fully intensity-invariant
output will be a flat line
\subsection{Logarithmic compression \& spike-frequency adaptation}

\subsection{Logarithmic scaling \& spike-frequency adaptation}

Envelope $\env(t)$ $\xrightarrow{\text{dB}}$ Logarithmic $\db(t)$ $\xrightarrow{\hp}$ Adapted $\adapt(t)$

- Rewrite signal envelope $\env(t)$ (Eq.\,\ref{eq:env}) as a synthetic mixture:\\
1) Song signal $s(t)$ ($\svar=1$) with variable multiplicative scale $\alpha\geq0$\\
2) Fixed-scale additive noise $\eta(t)$ ($\nvar=1$)
%
The first emergence of intensity invariance along the model pathway occurs
during the preprocessing stage, in the transition from the signal envelope
$\env(t)$ to the logarithmically scaled envelope $\db(t)$ and then to the
intensity-adapted envelope $\adapt(t)$. In order to disentangle the interplay
of logarithmic compression and adaptation, we can rewrite
$\env(t)$~(Eq.\,\ref{eq:env}) as a synthetic mixture
\begin{equation}
\env(t)\,=\,\alpha\,\cdot\,s(t)\,+\,\eta(t),\qquad \env(t)\,>\,0\enspace\forall\enspace t\,\in\,\mathbb{R}
\env(t)\,=\,\sca\,\cdot\,\soc(t)\,+\,\noc(t), \qquad \env(t)\,>\,0\enspace\forall\enspace t\,\in\,\mathbb{R}
\label{eq:toy_env}
\end{equation}
%
- Signal-to-noise ratio (SNR): Ratio of variances of synthetic mixture
$\env(t)$ with ($\alpha>0$) and without ($\alpha=0$) song signal $s(t)$, assuming $s(t)\perp\eta(t)$
%
of a song component $\soc(t)$ with variable multiplicative scale $\sca\geq0$
and a fixed-scale noise component $\noc(t)$. Both $\soc(t)$ and $\noc(t)$ are
assumed to have unit variance~($\svar=\nvar=1$). If $\soc(t)$ and $\noc(t)$ are
uncorrelated~($\soc(t)\perp\noc(t)$), the signal-to-noise ratio (SNR), defined
as the ratio of the variances of the synthetic $\env(t)$ with ($\sca>0$) and
without ($\sca=0$) the song component $\soc(t)$, is given by
\begin{equation}
\text{SNR}\,=\,\frac{\sigma_{\alpha s+\eta}^{2}}{\nvar}\,=\,\frac{\alpha^{2}\,\cdot\,\svar\,+\,\nvar}{\nvar}\,=\,\alpha^{2}\,+\,1
\label{eq:toy_snr}
\end{equation}
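The middle step of Eq.\,\ref{eq:toy_snr} follows from the variance of a weighted sum. As a short sketch, writing $\mathrm{Var}[\cdot]$ and $\mathrm{Cov}[\cdot,\cdot]$ for variance and covariance (notation introduced here for illustration only):
\begin{equation*}
\mathrm{Var}\big[\alpha\,\cdot\,s(t)\,+\,\eta(t)\big]\,=\,\alpha^{2}\,\cdot\,\svar\,+\,2\,\alpha\,\cdot\,\mathrm{Cov}\big[s(t),\,\eta(t)\big]\,+\,\nvar\,=\,\alpha^{2}\,\cdot\,\svar\,+\,\nvar
\end{equation*}
where the covariance term vanishes because $s(t)$ and $\eta(t)$ are uncorrelated. With unit variances, arbitrary example scales of $\alpha=0$, $\alpha=1$, and $\alpha=3$ give SNR values of $1$, $2$, and $10$, respectively, so that the SNR equals $1$ when the song component is absent.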
%
\textbf{Logarithmic component:}\\
- Simplify decibel transformation (Eq.\,\ref{eq:log}) and apply to synthetic $\env(t)$\\
- Isolate scale $\alpha$ and reference $\dbref$ using logarithm product/quotient laws
%
After simplifying the decibel transformation~(Eq.\,\ref{eq:log}) and applying it
to the synthetic mixture, the logarithmically scaled envelope $\db(t)$ can be
expressed as a sum of two logarithmic terms
\begin{equation}
\begin{split}
\db(t)\,&=\,\log \frac{\alpha\,\cdot\,s(t)\,+\,\eta(t)}{\dbref}\\
&=\,\log \frac{\alpha}{\dbref}\,+\,\log \big[s(t)\,+\,\frac{\eta(t)}{\alpha}\big]
&=\,\log \frac{\alpha}{\dbref}\,+\,\log \left[s(t)\,+\,\frac{\eta(t)}{\alpha}\right]
\end{split}
\label{eq:toy_log}
\end{equation}
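The step between the two lines of Eq.\,\ref{eq:toy_log} can be spelled out as follows, assuming $\alpha>0$ so that the scale can be factored out of the mixture before the logarithm product law is applied:
\begin{equation*}
\begin{split}
\log \frac{\alpha\,\cdot\,s(t)\,+\,\eta(t)}{\dbref}\,&=\,\log \left(\frac{\alpha}{\dbref}\,\cdot\,\left[s(t)\,+\,\frac{\eta(t)}{\alpha}\right]\right)\\
&=\,\log \frac{\alpha}{\dbref}\,+\,\log \left[s(t)\,+\,\frac{\eta(t)}{\alpha}\right]
\end{split}
\end{equation*}
For a fixed scale $\alpha$ and reference $\dbref$, the first term is constant in time, while the second term carries the time-varying song pattern together with the rescaled noise.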
%




\textbf{Logarithmic component:}\\
- Simplify decibel transformation (Eq.\,\ref{eq:log}) and apply to synthetic $\env(t)$\\
- Isolate scale $\alpha$ and reference $\dbref$ using logarithm product/quotient laws

$\rightarrow$ In log-space, a multiplicative scaling factor becomes additive\\
$\rightarrow$ Allows for the separation of song signal $s(t)$ and its scale $\alpha$\\
$\rightarrow$ Introduces scaling of noise term $\eta(t)$ by the inverse of $\alpha$\\
@@ -597,7 +611,7 @@ interval $\thp$ ($0 \ll \thp < \frac{1}{\fc}$)
%
\begin{equation}
\begin{split}
\adapt(t)\,\approx\,\db(t)\,-\,\log \frac{\alpha}{\dbref}\,=\,\log \big[s(t)\,+\,\frac{\eta(t)}{\alpha}\big]
\adapt(t)\,\approx\,\db(t)\,-\,\log \frac{\alpha}{\dbref}\,=\,\log\left[s(t)\,+\,\frac{\eta(t)}{\alpha}\right]
\end{split}
\label{eq:toy_highpass}
\end{equation}
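Two limiting cases of Eq.\,\ref{eq:toy_highpass} illustrate the role of the scale. They are sketched here under the additional assumption that $s(t)>0$ and $\eta(t)>0$, so that each logarithm is defined on its own:
\begin{equation*}
\alpha\,\gg\,\frac{\eta(t)}{s(t)}:\;\adapt(t)\,\approx\,\log s(t), \qquad \alpha\,\ll\,\frac{\eta(t)}{s(t)}:\;\adapt(t)\,\approx\,\log \eta(t)\,-\,\log \alpha
\end{equation*}
In the first case, the adapted envelope no longer depends on $\alpha$. In the second case, it is dominated by the noise term rescaled by the inverse of $\alpha$.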
@@ -715,6 +729,20 @@ initiation of one behavior over another is categorical (e.g. approach/stay)

\section{Conclusions \& outlook}

\textbf{Definition of invariance (general, systemic):}\\
Invariance = Property of a system to maintain a stable output with respect to a
set of relevant input parameters (variation to be represented) but irrespective
of one or more other parameters (variation to be discarded)
$\rightarrow$ Selective input-output decorrelation

\textbf{Definition of intensity invariance (context of neurons and songs):}\\
Intensity invariance = Time scale-selective sensitivity to certain faster
amplitude dynamics (song waveform, small-scale AM) and simultaneous
insensitivity to slower, more sustained amplitude dynamics (transient baseline,
large-scale AM, current overall intensity level)\\
$\rightarrow$ Without time scale selectivity, any fully intensity-invariant
output will be a flat line

The model pathway includes a rather large number of Gabor kernels compared to
the 15 to 20 ascending neurons in the grasshopper auditory
system~(\bcite{stumpner1991auditory}).
