Managed 1st half of results text, began with 2nd half.
205
main.tex
@@ -29,6 +29,9 @@
|
||||
mincitenames=1
|
||||
]{biblatex}
|
||||
\addbibresource{cite.bib}
|
||||
%\bibdata
|
||||
%\bibstyle
|
||||
%\citation
|
||||
|
||||
\title{Emergent intensity invariance in a physiologically inspired model of the grasshopper auditory system}
|
||||
\author{Jona Hartling, Jan Benda}
|
||||
@@ -82,7 +85,7 @@
|
||||
\newcommand{\fwrh}{\text{FWRH}} % Gaussian full-width at relative height
|
||||
\newcommand{\off}{\beta_0} % Offset for linear frequency approximation
|
||||
|
||||
% Math shorthands - Threshold nonlinearity:
|
||||
% Math shorthands - Thresholding nonlinearity:
|
||||
\newcommand{\thr}{\Theta_i} % Step function threshold value
|
||||
\newcommand{\nl}{H(c_i\,-\,\thr)} % Shifted Heaviside step function
|
||||
|
||||
@@ -90,6 +93,7 @@
|
||||
\newcommand{\soc}{s} % Song component of synthetic mixture
|
||||
\newcommand{\noc}{\eta} % Noise component of synthetic mixture
|
||||
\newcommand{\sca}{\alpha} % Multiplicative scale of song component
|
||||
\newcommand{\xvar}{\sigma_{x}^{2}} % Variance of synthetic mixture
|
||||
\newcommand{\svar}{\sigma_{\text{s}}^{2}} % Song component variance
|
||||
\newcommand{\nvar}{\sigma_{\eta}^{2}} % Noise component variance
|
||||
\newcommand{\pc}{p(c_i,\,T)} % Probability density (general interval)
|
||||
@@ -509,7 +513,7 @@ threshold value $\thr$ to obtain a binary response
|
||||
\label{eq:binary}
|
||||
\end{equation}
|
||||
which can be thought of as a categorization into ``relevant'' and ``irrelevant''
|
||||
response values. In the grasshopper, these threshold nonlinearities might
|
||||
response values. In the grasshopper, these thresholding nonlinearities might
|
||||
either be part of the processing within the ascending neurons or take place
|
||||
further downstream~(SOURCE). Finally, the responses of the ascending neurons
|
||||
are assumed to be integrated somewhere in the supraesophageal
|
||||
@@ -543,47 +547,38 @@ can be read out by a simple linear classifier.
|
||||
\end{figure}
|
||||
\FloatBarrier
|
||||
|
||||
\section{Two mechanisms driving the emergence of intensity-invariant song representations}
|
||||
\section{Two mechanisms driving the emergence of intensity-invariant song representation}
|
||||
|
||||
% Still missing the SNR analysis. Should be able to write around it for now.
|
||||
The robustness of song recognition is tied to the degree of intensity
|
||||
invariance of the finalized feature representation. Ideally, the values of each
|
||||
feature should depend only on the relative amplitude dynamics of the song
|
||||
pattern but not on the overall intensity level of the song. In the grasshopper,
|
||||
the emergence of intensity-invariant representations along the song recognition
|
||||
pattern but not on the overall intensity of the song. In the grasshopper, the
|
||||
emergence of intensity-invariant representations along the song recognition
|
||||
pathway is likely a distributed process that involves different neuronal
|
||||
populations, which raises the question of what the essential computational
|
||||
mechanisms are that drive this process. Within the model pathway, we identified
|
||||
two key mechanisms that render the song representation more invariant to
|
||||
variations in baseline intensity. The two mechanisms each comprise a nonlinear
|
||||
signal transformation followed by a linear signal transformation but differ in
|
||||
the specific operations and the neural substrate involved, as outlined in the
|
||||
following sections.
|
||||
intensity variations. The two mechanisms each comprise a nonlinear signal
|
||||
transformation followed by a linear signal transformation but differ in the
|
||||
specific operations involved, as outlined in the following sections.
|
||||
|
||||
\subsection{Logarithmic compression \& spike-frequency adaptation}
|
||||
|
||||
The first emergence of intensity invariance along the model pathway occurs
|
||||
during the preprocessing stage, in the transition from the signal envelope
|
||||
$\env(t)$ to the logarithmically scaled envelope $\db(t)$ and then to the
|
||||
intensity-adapted envelope $\adapt(t)$. In order to disentangle the interplay
|
||||
of logarithmic compression and adaptation, we can rewrite
|
||||
$\env(t)$~(Eq.\,\ref{eq:env}) as synthetic mixture
|
||||
The first notable emergence of intensity invariance along the model pathway
|
||||
occurs during the transformation of the signal envelope $\env(t)$ into the
|
||||
logarithmically scaled envelope $\db(t)$ and then into the intensity-adapted
|
||||
envelope $\adapt(t)$. In order to disentangle the interplay of logarithmic
|
||||
compression and adaptation, $\env(t)$ can be rewritten as a synthetic mixture
|
||||
\begin{equation}
|
||||
\env(t)\,=\,\sca\,\cdot\,\soc(t)\,+\,\noc(t), \qquad \env(t)\,>\,0\enspace\forall\enspace t\,\in\,\mathbb{R}
|
||||
\label{eq:toy_env}
|
||||
\end{equation}
|
||||
of a song component $\soc(t)$ with variable multiplicative scale $\sca\geq0$
|
||||
and a fixed-scale noise component $\noc(t)$. Both $\soc(t)$ and $\noc(t)$ are
|
||||
assumed to have unit variance~($\svar=\nvar=1$). If $\soc(t)$ and $\noc(t)$ are
|
||||
uncorrelated~($\soc(t)\perp\noc(t)$), the signal-to-noise ratio (SNR) of the
|
||||
synthetic $\env(t)$ with ($\sca>0$) and without ($\sca=0$) song component
|
||||
$\soc(t)$ is given by
|
||||
\begin{equation}
|
||||
\text{SNR}\,=\,\frac{\sigma_{s+\eta}^{2}}{\nvar}\,=\,\frac{\alpha^{2}\,\cdot\,\svar\,+\,\nvar}{\nvar}\,=\,\alpha^{2}\,+\,1
|
||||
\label{eq:toy_snr}
|
||||
\end{equation}
|
||||
When simplifying the decibel transformation~(Eq.\,\ref{eq:log}), the logarithmically
|
||||
scaled envelope $\db(t)$ can be expressed as a sum of two logarithmic terms
|
||||
assumed to have unit variance. By conversion of $\env(t)$ to decibel
|
||||
scale~(Eq.\,\ref{eq:log}), $\sca$ turns from a multiplicative scale in linear
|
||||
space into an additive term, or offset, in logarithmic space
|
||||
\begin{equation}
|
||||
\begin{split}
|
||||
\db(t)\,&=\,\log \frac{\alpha\,\cdot\,s(t)\,+\,\eta(t)}{\dbref}\\
|
||||
@@ -591,99 +586,90 @@ scaled envelope $\db(t)$ can be expressed as a sum of two logarithmic terms
|
||||
\end{split}
|
||||
\label{eq:toy_log}
|
||||
\end{equation}
|
||||
|
||||
|
||||
|
||||
|
||||
\textbf{Logarithmic component:}\\
|
||||
- Simplify decibel transformation (Eq.\,\ref{eq:log}) and apply to synthetic $\env(t)$\\
|
||||
- Isolate scale $\alpha$ and reference $\dbref$ using logarithm product/quotient laws
|
||||
|
||||
$\rightarrow$ In log-space, a multiplicative scaling factor becomes additive\\
|
||||
$\rightarrow$ Allows for the separation of song signal $s(t)$ and its scale $\alpha$\\
|
||||
$\rightarrow$ Introduces scaling of noise term $\eta(t)$ by the inverse of $\alpha$\\
|
||||
$\rightarrow$ Normalization by $\dbref$ applies equally to all terms (no individual effects)
|
||||
|
||||
\textbf{Adaptation component:}\\
|
||||
- Highpass filter over $\db(t)$ (Eq.\,\ref{eq:highpass}) can
|
||||
be approximated as subtraction of the local signal offset within a suitable time
|
||||
interval $\thp$ ($0 \ll \thp < \frac{1}{\fc}$)
|
||||
%
|
||||
which allows for its separation from $\soc(t)$ but introduces a scaling of
|
||||
$\noc(t)$ by the inverse of $\sca$. The subsequent
|
||||
highpass-filtering~(Eq.\,\ref{eq:highpass}) of $\db(t)$ can then be
|
||||
approximated as a subtraction of the local offset within a suitable time
|
||||
interval $0 \ll \thp < \frac{1}{\fc}$:
|
||||
\begin{equation}
|
||||
\begin{split}
|
||||
\adapt(t)\,\approx\,\db(t)\,-\,\log \frac{\alpha}{\dbref}\,=\,\log\left[s(t)\,+\,\frac{\eta(t)}{\alpha}\right]
|
||||
\end{split}
|
||||
\label{eq:toy_highpass}
|
||||
\end{equation}
|
||||
%
|
||||
\textbf{Implication for intensity invariance:}\\
|
||||
- Logarithmic scaling is essential for equalizing different song intensities\\
|
||||
$\rightarrow$ Intensity information can be manipulated more easily when in form
|
||||
of a signal offset in log-space than a multiplicative scale in linear space
|
||||
|
||||
- Scale $\alpha$ can only be redistributed, not entirely eliminated from $\adapt(t)$\\
|
||||
$\rightarrow$ Turn initial scaling of song $s(t)$ by $\alpha$ into scaling of noise $\eta(t)$ by $\frac{1}{\alpha}$
|
||||
|
||||
- Capability to compensate for intensity variations, i.e. selective amplification
|
||||
of output $\adapt(t)$ relative to input $\env(t)$, is limited by input SNR (Eq.\,\ref{eq:toy_snr}):\\
|
||||
$\alpha\gg1$: Attenuation of $\eta(t)$ term $\rightarrow$ $s(t)$ dominates $\adapt(t)$\\
|
||||
$\alpha\approx1$ Negligible effect on $\eta(t)$ term $\rightarrow$ $\adapt(t)=\log[s(t)+\eta(t)]$\\
|
||||
$\alpha\ll1$: Amplification of $\eta(t)$ term $\rightarrow$ $\eta(t)$ dominates $\adapt(t)$\\
|
||||
$\rightarrow$ Ability to equalize between different sufficiently large scales of $s(t)$\\
|
||||
$\rightarrow$ Inability to recover $s(t)$ when initially masked by noise floor $\eta(t)$
|
||||
|
||||
- Logarithmic scaling emphasizes small amplitudes (song onsets, noise floor) \\
|
||||
$\rightarrow$ Recurring trade-off: Equalizing signal intensity vs preserving initial SNR
|
||||
|
||||
\subsection{Threshold nonlinearity \& temporal averaging}
|
||||
|
||||
Convolved $c_i(t)$ $\xrightarrow{\nl}$ Binary $b_i(t)$ $\xrightarrow{\lp}$ Feature $f_i(t)$
|
||||
|
||||
\textbf{Thresholding component:}\\
|
||||
- Within an observed time interval $T$, $c_i(t)$ follows probability density $\pc$\\
|
||||
- Within $T$, $c_i(t)$ exceeds threshold value $\thr$ for time $T_1$ ($T_1+T_0=T$)\\
|
||||
- Threshold $\nl$ splits $\pc$ around $\thr$ in two complementary parts
|
||||
%
|
||||
This means that $\sca$ cannot be entirely eliminated from $\adapt(t)$, only
|
||||
redistributed between $\soc(t)$ and $\noc(t)$. In consequence, if $\sca$ is
|
||||
sufficiently large ($\sca\gg1$), $\noc(t)$ is attenuated to the point of being
|
||||
negligible, so that $\adapt(t)$ represents $\soc(t)$ in a scale-free manner. If
|
||||
$\soc(t)$ and $\noc(t)$ are at similar scales ($\sca\approx1$), $\adapt(t)$
|
||||
largely resembles $\db(t)$. However, if $\sca$ is sufficiently small
|
||||
($\sca\ll1$), $\noc(t)$ masks $\soc(t)$ even after the intensity adaptation.
|
||||
Therefore, the effective intensity invariance of $\adapt(t)$ relative to
|
||||
$\env(t)$ is limited by the initial scaling of $\soc(t)$ relative to $\noc(t)$;
|
||||
that is, the signal-to-noise ratio (SNR) of $\env(t)$ with ($\sca>0$) and
|
||||
without ($\sca=0$) song component $\soc(t)$
|
||||
\begin{equation}
|
||||
\int_{\thr}^{+\infty} p(c_i,T)\,dc_i\,=\,1\,-\,\int_{-\infty}^{\thr} p(c_i,T)\,dc_i\,=\,\frac{T_1}{T}
|
||||
\text{SNR}(\sca)\,=\,\frac{\xvar}{\nvar}\,=\,\frac{\alpha^{2}\,\cdot\,\svar\,+\,\nvar}{\nvar}\,=\,\alpha^{2}\,+\,1, \qquad \svar\,=\,\nvar\,=\,1
|
||||
\label{eq:toy_snr}
|
||||
\end{equation}
|
||||
which depends quadratically on $\sca$ if $\soc(t)$ and $\noc(t)$ are
|
||||
uncorrelated~($\soc(t)\perp\noc(t)$). In summary, the combination of
|
||||
logarithmic compression and adaptation allows for the equalization of different
|
||||
sufficiently large song scales, which is essential for intensity-invariant song
|
||||
representation. However, this mechanism is unable to recover songs that have
|
||||
already sunk below the noise floor, which emphasizes the importance of a
|
||||
sufficiently high SNR at the initial reception of the signal for reliable song
|
||||
recognition.
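
% Illustrative worked example (added sketch; the values of \sca are arbitrary
% and not taken from the simulations).
As a brief numerical illustration of Eqs.\,\ref{eq:toy_highpass} and
\ref{eq:toy_snr}, consider three arbitrarily chosen scales under the
unit-variance assumption:
\begin{equation*}
    \begin{array}{c|c|c}
    \sca & \text{SNR}(\sca)\,=\,\sca^{2}\,+\,1 & \text{weight of }\noc(t)\text{ in }\adapt(t),\ 1/\sca\\
    \hline
    0.1 & 1.01 & 10\\
    1 & 2 & 1\\
    10 & 101 & 0.1
    \end{array}
\end{equation*}
Raising $\sca$ from 1 to 10 reduces the weight of $\noc(t)$ tenfold, so that
$\adapt(t)$ is dominated by the $\soc(t)$ term and further increases of $\sca$
change it only marginally. At $\sca=0.1$, in contrast, the weight of $\noc(t)$
is ten times that of $\soc(t)$ and the SNR is barely above its minimum of 1.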
|
||||
|
||||
\subsection{Thresholding nonlinearity \& temporal averaging}
|
||||
|
||||
The second key mechanism for the emergence of intensity invariance along the
|
||||
model pathway takes place during the transformation of the kernel responses
|
||||
$c_i(t)$ via the binary responses $b_i(t)$ into the finalized features
|
||||
$f_i(t)$. This mechanism is mediated by the thresholding nonlinearity $\nl$. By
|
||||
passing $c_i(t)$ through the thresholding nonlinearity~(Eq.\,\ref{eq:binary}),
|
||||
its probability density within some observed time interval $T$ is split around
|
||||
threshold value $\thr$ into two complementary parts:
|
||||
\begin{equation}
|
||||
\int_{\thr}^{+\infty} \pc\,dc_i\,=\,1\,-\,\int_{-\infty}^{\thr} \pc\,dc_i\,=\,\frac{T_1}{T}, \qquad \infint \pc\,dc_i\,=\,1
|
||||
\label{eq:pdf_split}
|
||||
\end{equation}
|
||||
%
|
||||
$\rightarrow$ Semi-definite integral over right-sided portion of split $\pc$ gives ratio
|
||||
of time $T_1$ where $c_i(t)>\thr$ to total time $T$ due to normalization of $\pc$
|
||||
%
|
||||
Due to the normalization of $\pc$, the improper integral over the
|
||||
right-sided part of the split $\pc$ is the ratio of time $T_1$ during which
|
||||
$c_i(t)$ exceeds $\thr$ to the total time $T$. If the subsequent lowpass
|
||||
filter~(Eq.\,\ref{eq:lowpass}) over $b_i(t)$ is approximated as temporal
|
||||
averaging over a suitable time interval
|
||||
$\tlp>\frac{1}{\fc}$
|
||||
\begin{equation}
|
||||
\infint \pc\,dc_i\,=\,1
|
||||
\label{eq:pdf}
|
||||
\end{equation}
|
||||
%
|
||||
\textbf{Averaging component:}\\
|
||||
- Lowpass filter over binary response $b_i(t)$ (Eq.\,\ref{eq:lowpass}) can be
|
||||
approximated as temporal averaging over a suitable time interval $\tlp$ ($\tlp > \frac{1}{\fc}$)\\
|
||||
- Within $\tlp$, $b_i(t)$ takes a value of 1 ($c_i(t)>\thr$) for time $T_1$ ($T_1+T_0=\tlp$)
|
||||
%
|
||||
\begin{equation}
|
||||
f_i(t)\,\approx\,\frac{1}{\tlp} \int_{t}^{t\,+\,\tlp} b_i(\tau)\,d\tau\,=\,\frac{T_1}{\tlp}
|
||||
f_i(t)\,\approx\,\frac{1}{\tlp} \int_{t}^{t\,+\,\tlp} b_i(\tau)\,d\tau\,=\,\frac{T_1}{\tlp}, \qquad b_i(t)\,\in\,\{0,\,1\}
|
||||
\label{eq:feat_avg}
|
||||
\end{equation}
|
||||
%
|
||||
$\rightarrow$ Temporal averaging over $b_i(t)\in[0,1]$ (Eq.\,\ref{eq:binary}) gives
|
||||
ratio of time $T_1$ where $c_i(t)>\thr$ to total averaging interval $\tlp$\\
|
||||
$\rightarrow$ Feature $f_i(t)$ approximately represents supra-threshold fraction of $\tlp$
|
||||
|
||||
\textbf{Combined result:}\\
|
||||
- Feature $f_i(t)$ can be linked to the distribution of $c_i(t)$ using Eqs.\,\ref{eq:pdf_split} \& \ref{eq:feat_avg}
|
||||
%
|
||||
feature $f_i(t)$ likewise represents a ratio of time $T_1$ during which
|
||||
$b_i(t)$ is 1 to the total averaging interval $\tlp$. Since $b_i(t)$ is 1
|
||||
where $c_i(t)>\thr$, $f_i(t)$ relates to the probability density of $c_i(t)$ by
|
||||
\begin{equation}
|
||||
f_i(t)\,\approx\,\int_{\thr}^{+\infty} \pclp\,dc_i\,=\,P(c_i\,>\,\thr,\,\tlp)
|
||||
\label{eq:feat_prop}
|
||||
\end{equation}
|
||||
%
|
||||
$\rightarrow$ Because the integral over a probability density is a cumulative
|
||||
probability, the value of feature $f_i(t)$ (temporal compression of $b_i(t)$)
|
||||
at every time point $t$ signifies the probability that convolution output
|
||||
$c_i(t)$ exceeds the threshold value $\thr$ during the corresponding averaging
|
||||
interval $\tlp$
|
||||
Therefore, the value of $f_i(t)$ at every time point $t$ approximately
|
||||
signifies the probability that $c_i(t)$ exceeds $\thr$ during the
|
||||
corresponding averaging interval $\tlp$. Accordingly, the combination of
|
||||
thresholding nonlinearity and temporal averaging constitutes a remapping of a
|
||||
quantity that encodes temporal similarity between signal $\adapt(t)$ and kernel
|
||||
$k_i(t)$ into a quantity that encodes a duty cycle with respect to $\thr$.
|
||||
|
||||
Accordingly, the combination of
|
||||
thresholding nonlinearity and temporal averaging constitutes a remapping of the
|
||||
amplitude-encoding quantity $c_i(t)$ into the duty cycle-encoding quantity
|
||||
$f_i(t)$ by binning graded amplitude values into one of two categorical states.
|
||||
This deliberate loss of precise amplitude information is the key to intensity
|
||||
invariance of the finalized features, as different scales of $c_i(t)$ can
|
||||
result in similar $T_1$ segments depending on the magnitude of the derivative
|
||||
of $c_i(t)$ in temporal proximity to time points at which $c_i(t)$ crosses
|
||||
$\thr$.
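
% Illustrative worked example (added sketch; the sinusoidal form of c_i(t) and
% the symbols A and P are assumptions for illustration only).
As a minimal illustration, suppose that the kernel response is a sinusoid,
$c_i(t)=A\,\sin(2\pi t/P)$, with hypothetical amplitude $A$ and period $P$, and
that $\tlp$ spans an integer number of periods. For $0\leq\thr<A$, the
supra-threshold fraction of the averaging interval follows from
Eq.\,\ref{eq:feat_avg} as
\begin{equation*}
    f_i\,\approx\,\frac{T_1}{\tlp}\,=\,\frac{1}{2}\,-\,\frac{1}{\pi}\,\arcsin\frac{\thr}{A}
\end{equation*}
and $f_i=0$ for $A\leq\thr$. With $\thr=0$, $f_i=\frac{1}{2}$ for every $A>0$,
which is full invariance to the scale of $c_i(t)$. With $\thr>0$, doubling $A$
from $2\,\thr$ to $4\,\thr$ raises $f_i$ only from $\frac{1}{3}$ to about
$0.42$, and $f_i$ saturates at $\frac{1}{2}$ for $A\gg\thr$. The larger $A$,
the steeper $c_i(t)$ crosses $\thr$ and the less the crossing times shift with
further scaling.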
|
||||
|
||||
|
||||
|
||||
|
||||
\textbf{Implication for intensity invariance:}\\
|
||||
- Convolution output $c_i(t)$ quantifies temporal similarity between amplitudes of
|
||||
@@ -743,6 +729,21 @@ large-scale AM, current overall intensity level)\\
|
||||
$\rightarrow$ Without time scale selectivity, any fully intensity-invariant
|
||||
output will be a flat line
|
||||
|
||||
|
||||
\textbf{Log-HP: Implication for intensity invariance:}\\
|
||||
- Logarithmic scaling is essential for equalizing different song intensities\\
|
||||
$\rightarrow$ Intensity information can be manipulated more easily when in form
|
||||
of a signal offset in log-space than a multiplicative scale in linear space
|
||||
|
||||
- Capability to compensate for intensity variations, i.e. selective amplification
|
||||
of output $\adapt(t)$ relative to input $\env(t)$, is limited by input SNR (Eq.\,\ref{eq:toy_snr}):\\
|
||||
$\rightarrow$ Ability to equalize between different sufficiently large scales of $s(t)$\\
|
||||
$\rightarrow$ Inability to recover $s(t)$ when initially masked by noise floor $\eta(t)$
|
||||
|
||||
- Logarithmic scaling emphasizes small amplitudes (song onsets, noise floor) \\
|
||||
$\rightarrow$ Recurring trade-off: Equalizing signal intensity vs preserving initial SNR
|
||||
|
||||
|
||||
The model pathway includes a rather large number of Gabor kernels compared to
|
||||
the 15 to 20 ascending neurons in the grasshopper auditory
|
||||
system~(\bcite{stumpner1991auditory}).
|
||||
|
||||