Wrote field data methods. Added some very general text at the beginning of the methods section.
65
main.tex
@@ -283,7 +283,7 @@ grasshopper auditory pathway, from the initial reception of sound waves up to
 the generation of a high-dimensional, time-varying feature representation that
 is suitable for species-specific song recognition. We provide a side-by-side
 account of the known physiological processing steps and their functional
-approximation by basic mathematical operations. We then elaborate on two key
+approximation by basic mathematical operations. We then elaborate on the key
 mechanisms that drive the emergence of intensity-invariant song representations
 within the auditory pathway.
 
@@ -317,6 +317,14 @@ within the auditory pathway.
 % $\rightarrow$ Abstract, simplify, formalize $\rightarrow$ Functional model framework
 
 \section{Methods}
+% This may not quite fit here, but it is the most general part of the
+% methods and applies throughout the whole section, so I put it here for now.
+All modeling, data analysis, and data visualization were performed in
+Python~3.12.3 except for the pathway overview~(Fig.\,\ref{fig:pathway}), which
+was assembled in Inkscape~1.2. The code base for the model pathway is available
+as the \textit{thunderhopper} package, version 1.0, on PyPI. All audio data were
+inspected and edited with the help of the \textit{audian} package, version 2.4,
+on PyPI.
 
 \subsection{Functional model of the grasshopper song recognition pathway}
 
@@ -617,7 +625,7 @@ mollis}~(Tab.\,\ref{tab:species_list}).
 \label{tab:species_list}
 \end{table}
 
-\subsubsection{Generating synthetic input signals}
+\subsubsection{Generation of synthetic input signals}
 
 Different processing steps along the model pathway were tested for intensity
 invariance by generating synthetic input signals $x(t)$ of varying intensity,
@@ -668,6 +676,7 @@ according to either Eq.\,\ref{eq:noiseless} in the noiseless case or
 Eq.\,\ref{eq:noisy} in the noisy case.
 
 \subsubsection{Quantifying signal intensity across representations}
+\label{sec:intensity_measures}
 
 All intensity measures were calculated over a manually labeled segment within
 each song. Segments always excluded the first and last few syllables to allow
@@ -726,6 +735,26 @@ such, the ratio of intensity measures is referred to as SNR in the following.
 
 \subsection{Field data-based analysis of the model pathway}
 
+Field recordings were made in a meadow in the vicinity of the University of
+Tübingen, Germany, during the day in August~2024. All recordings were made
+using a custom hand-held microphone array that was assembled from eight
+omnidirectional AV-TEFE TCM141 condenser microphones. The microphones were
+arranged in a linear configuration with a spacing of 30\,cm between adjacent
+microphones and oriented in the same direction along the axis of the array. All
+microphones were connected to a custom 8-channel amplification and
+digitization system based on a Teensy 4.1 microcontroller with real-time clock
+and microSD card storage. Recordings were written to the microSD card
+in~\textit{.wav}~format with a sampling rate of 96\,kHz and an amplitude scale
+in arbitrary units. The microphone array was held at a height of approximately
+30\,cm above the ground, which was slightly above the height of most
+surrounding vegetation and at the same height as the singing grasshopper. The
+array was moved as close to the grasshopper as possible without interrupting
+its song production, which amounted to an approximate offset distance of 10\,cm
+between the animal and the leading microphone. Care was taken to maintain a
+stable position and height of the microphone array during recording. The
+resulting recordings were then processed through the model pathway and analyzed
+according to the procedure described in Section~\ref{sec:intensity_measures}.
+
 \section{Results}
 
 \subsection{Mechanisms driving the emergence of intensity invariance}
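Note on the hunk above: the array geometry in the added field-data methods (eight microphones, 30 cm spacing, roughly 10 cm offset to the leading microphone) implies the sender distances of 10 cm to 220 cm that the Results text refers to. The following is only a minimal sketch of that arithmetic and of a per-channel standard deviation over a labeled segment. It is not taken from the thunderhopper package; the function names, the segment times, and the synthetic 1/d test signal are all assumptions made for illustration. A real recording could be loaded with, for example, `scipy.io.wavfile.read` instead of the synthetic stand-in.

```python
import numpy as np

# Array geometry taken from the methods text: eight microphones, 30 cm spacing,
# roughly 10 cm between the animal and the leading microphone.
N_MICS = 8
SPACING_CM = 30.0
OFFSET_CM = 10.0


def microphone_distances(n_mics=N_MICS, spacing_cm=SPACING_CM, offset_cm=OFFSET_CM):
    """Distance d (cm) between the sender and each microphone of a linear array."""
    return offset_cm + spacing_cm * np.arange(n_mics)


def segment_std(recording, rate, t_start, t_stop):
    """Standard deviation per channel within a labeled segment (times in seconds).

    recording: array of shape (samples, channels). Hypothetical helper.
    """
    start, stop = int(round(t_start * rate)), int(round(t_stop * rate))
    return recording[start:stop].std(axis=0)


distances = microphone_distances()

# Synthetic stand-in for an 8-channel recording at 96 kHz (NOT field data):
# one sine carrier whose amplitude falls off as 1/d, which is an idealized
# free-field assumption.
rate = 96_000
t = np.arange(rate) / rate
carrier = np.sin(2 * np.pi * 5_000 * t)
recording = carrier[:, None] / distances[None, :]

# Segment boundaries are arbitrary here; the manuscript uses manual labels.
sigma = segment_std(recording, rate, t_start=0.2, t_stop=0.8)

# Slope of log10(sigma) over log10(1/d); equals 1 under the 1/d assumption.
slope = np.polyfit(np.log10(1.0 / distances), np.log10(sigma), 1)[0]
print(distances)
print(round(slope, 3))
```

With real recordings, a slope that departs from 1 on the double-logarithmic axes would point to deviations from the idealized 1/d falloff, for example through vegetation or ground effects, or to saturation of the representation under test.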
@@ -809,7 +838,7 @@ more robust input representation and higher input SNR.
 $\env(t)$ for different $\sca$.
 \textbf{a}:~Noiseless case.
 \textbf{b}:~Noisy case.
-\textbf{Bottom}:~Intensity metrics over a range of $\sca$.
+\textbf{Bottom}:~Intensity measures over a range of $\sca$.
 \textbf{c}:~Noiseless case: Standard deviations $\sigma_x$ of
 $\filt(t)$ and $\env(t)$.
 \textbf{d}:~Noisy case: Ratios of $\sigma_x$ of $\filt(t)$ and
@@ -948,7 +977,7 @@ is a recurring phenomenon that is further addressed in the following sections.
 $\db(t)$, and $\adapt(t)$ for different $\sca$.
 \textbf{a}:~Noiseless case.
 \textbf{b}:~Noisy case.
-\textbf{Bottom}:~Intensity metrics over a range of $\sca$.
+\textbf{Bottom}:~Intensity measures over a range of $\sca$.
 \textbf{c}:~Noiseless case: Standard deviations $\sigma_x$
 of $\env(t)$, $\db(t)$, and $\adapt(t)$.
 \textbf{d}:~Noisy case: Ratios of $\sigma_x$ of $\env(t)$,
@@ -1225,15 +1254,15 @@ compression step~(Fig.\,\ref{fig:pipeline_short}).
 For this analysis, input $\raw(t)$ --- including both song component $\soc(t)$
 and noise component $\noc(t)$ --- was rescaled and processed throughout all
 steps of the model pathway~(Fig.\,\ref{fig:pipeline_full}a) up to the feature
-set $f_i(t)$. As before, the standard deviation was used as intensity metric
+set $f_i(t)$. As before, the standard deviation was used as intensity measure
 for each resulting representation except $b_i(t)$ and $f_i(t)$. For $f_i(t)$,
 the average feature value $\muf$ was used, while $b_i(t)$ was omitted from the
-analysis. Plotting each intensity metric over
+analysis. Plotting each intensity measure over
 $\sca$~(Fig.\,\ref{fig:pipeline_full}b) reinforces many of the previous
 observations. For ease of visualization, the kernel-specific curves for
 $c_i(t)$ and $f_i(t)$ were summarized by their median. Representations prior to
 logarithmic compression --- $\filt(t)$ and $\env(t)$ --- show a linear increase
-of the intensity metric for larger $\sca$ on a double-logarithmic scale.
+of the intensity measure for larger $\sca$ on a double-logarithmic scale.
 Representations after logarithmic compression --- $\db(t)$, $\adapt(t)$, and
 $c_i(t)$ --- are the first to reach a saturation regime and do so at
 approximately the same $\sca$ because they are separated only by linear
@@ -1243,7 +1272,7 @@ that of $c_i(t)$, which suggests that the second mechanism of thresholding and
 temporal averaging can indeed improve intensity invariance beyond the first
 mechanism of logarithmic compression and adaptation. The difference in
 saturation points is best illustrated based on the ratio of each intensity
-metric to the respective pure-noise reference
+measure to the respective pure-noise reference
 value~(Fig.\,\ref{fig:pipeline_full}d). However, compressing $f_i(t)$ into a
 median across $k_i(t)$ conceals many kernel-specific details. It is therefore
 necessary to consider the development of each $f_i(t)$ over $\sca$
@@ -1299,13 +1328,13 @@ in principle, work together towards an intensity-invariant song representation.
 \textbf{a}:~Example representations of $\filt(t)$,
 $\env(t)$, $\db(t)$, $\adapt(t)$, $c_i(t)$, and $f_i(t)$
 for different $\sca$.
-\textbf{b}:~Intensity metrics over $\sca$. For $c_i(t)$
+\textbf{b}:~Intensity measures over $\sca$. For $c_i(t)$
 and $f_i(t)$, the median over kernels is shown. Dots
 indicate $95\,\%$ curve span for $\db(t)$, $\adapt(t)$,
 $c_i(t)$, and $f_i(t)$.
 \textbf{c}:~Average value $\mu_{f_i}$ of each feature
 $f_i(t)$ over $\sca$.
-\textbf{d}:~Ratios of intensity metrics to the respective
+\textbf{d}:~Ratios of intensity measures to the respective
 reference value for input $\raw(t)=\noc(t)$. For $c_i(t)$
 and $f_i(t)$, the median over kernel-specific ratios is
 shown.
@@ -1329,7 +1358,7 @@ $\adapt(t)$ is merely a highpass filtered version of $\env(t)$; and $\db(t)$ is
 missing entirely~(Fig.\,\ref{fig:pipeline_short}a). As expected, all
 representations prior to the thresholding nonlinearity $\nl$ --- $\filt(t)$,
 $\env(t)$, $\adapt(t)$, and $c_i(t)$ --- show a linear increase of the
-intensity metric for larger $\sca$, while $f_i(t)$ is the only representation
+intensity measure for larger $\sca$, while $f_i(t)$ is the only representation
 to reach a saturation regime~(Fig.\,\ref{fig:pipeline_short}bd). The
 saturated $\muf$ are distributed over a much broader range of values than in
 the previous analysis~(Fig.\,\ref{fig:pipeline_short}c). Intriguingly, the
@@ -1382,12 +1411,12 @@ guaranteed simply by disabling logarithmic compression.
 \textbf{a}:~Example representations of $\filt(t)$,
 $\env(t)$, $\adapt(t)$, $c_i(t)$, and $f_i(t)$ for
 different $\sca$.
-\textbf{b}:~Intensity metrics over $\sca$. For $c_i(t)$
+\textbf{b}:~Intensity measures over $\sca$. For $c_i(t)$
 and $f_i(t)$, the median over kernels is shown. Dots
 indicate $95\,\%$ curve span for $f_i(t)$.
 \textbf{c}:~Average value $\mu_{f_i}$ of each feature
 $f_i(t)$ over $\sca$.
-\textbf{d}:~Ratios of intensity metrics to the respective
+\textbf{d}:~Ratios of intensity measures to the respective
 reference value for input $\raw(t)=\noc(t)$. For $c_i(t)$
 and $f_i(t)$, the median over kernel-specific ratios is
 shown.
@@ -1416,14 +1445,14 @@ $d$ from the sender, ranging from $10\,$cm to $220\,$cm with intervals of
 $30\,$cm between microphones. The precise value of $\sca$ that corresponds to a
 given $d$ cannot be determined in a straightforward manner, but $\sca$ is
 expected to be inversely proportional to $d$ based on the inverse-square law of
-sound propagation. All intensity metrics and ratios thereof were hence plotted
+sound propagation. All intensity measures and ratios thereof were hence plotted
 over $1/d$ on a double-logarithmic scale, which is comparable to previous
 analyses insofar as a decade on the $1/d$ axis corresponds to a decade on
 the $\sca$ axis. To complicate matters further, the $1/d$ axis is sampled too
 sparsely to determine saturation points as before based on the $95\,\%$ curve
 span. Instead, one has to rely on the slope of the curve to assess if, and at
 which $1/d$, a given representation reaches a saturation regime. Bearing these
-limitations in mind, the intensity metrics of each representation over
+limitations in mind, the intensity measures of each representation over
 $1/d$~(Fig.\,\ref{fig:pipeline_field}b) follow a pattern that is consistent
 with the results of the previous simulation-based
 analysis~(Fig.\,\ref{fig:pipeline_full}b): The standard deviations of
@@ -1439,7 +1468,7 @@ $d=10\,$cm corresponds to a value of $\sca$ between 10 and 20 based on
 comparison with the simulation-based analysis~(Fig.\,\ref{fig:pipeline_full}b).
 The saturated $\muf$ are distributed over a comparably narrow range of values,
 which could in part be a property of the songs of \textit{P. parallelus}~(see
-also Fig.\,\ref{fig:thresh-lp_species}bc). The ratios of each intensity metric
+also Fig.\,\ref{fig:thresh-lp_species}bc). The ratios of each intensity measure
 to the respective pure-noise reference value are not aligned across
 representations~(Fig.\,\ref{fig:pipeline_field}d) or
 kernels~(Fig.\,\ref{fig:pipeline_field}ef) but serve to consolidate the
@@ -1468,11 +1497,11 @@ distances~(Fig.\,\ref{fig:pipeline_field}a, bottom row).
 \textbf{a}:~$\filt(t)$, $\env(t)$, $\db(t)$, $\adapt(t)$,
 $c_i(t)$, and $f_i(t)$ at each $d$. A noise segment from
 the same recording is shown for reference.
-\textbf{b}:~Intensity metrics over $d$. For $c_i(t)$
+\textbf{b}:~Intensity measures over $d$. For $c_i(t)$
 and $f_i(t)$, the median over kernels is shown.
 \textbf{c}:~Average value $\mu_{f_i}$ of each feature
 $f_i(t)$ over $d$.
-\textbf{d}:~Ratios of intensity metrics to the respective
+\textbf{d}:~Ratios of intensity measures to the respective
 value obtained from the noise reference. For $c_i(t)$ and
 $f_i(t)$, the median over kernel-specific ratios is shown.
 \textbf{e}:~Ratios of standard deviation $\sigma_{c_i}$ of