Wrote field data methods. Added some very general information at the beginning of the methods section.

j-hartling
2026-05-14 14:37:02 +02:00
parent 688f153bef
commit cbd0af7a5f
9 changed files with 172 additions and 164 deletions


@@ -283,7 +283,7 @@ grasshopper auditory pathway, from the initial reception of sound waves up to
the generation of a high-dimensional, time-varying feature representation that
is suitable for species-specific song recognition. We provide a side-by-side
account of the known physiological processing steps and their functional
approximation by basic mathematical operations. We then elaborate on two key
approximation by basic mathematical operations. We then elaborate on the key
mechanisms that drive the emergence of intensity-invariant song representations
within the auditory pathway.
@@ -317,6 +317,14 @@ within the auditory pathway.
% $\rightarrow$ Abstract, simplify, formalize $\rightarrow$ Functional model framework
\section{Methods}
% This maybe does not quite fit here, but it is the most general part of the
% methods and applies throughout the whole section, so I put it here for now.
All modeling, data analysis, and data visualization were performed in
Python~3.12.3 except for the pathway overview~(Fig.\,\ref{fig:pathway}), which
was assembled in Inkscape~1.2. The code base for the model pathway is available
as the \textit{thunderhopper} package, version 1.0, on PyPI. All audio data were
inspected and edited with the \textit{audian} package, version 2.4, which is
also available on PyPI.
\subsection{Functional model of the grasshopper song recognition pathway}
@@ -617,7 +625,7 @@ mollis}~(Tab.\,\ref{tab:species_list}).
\label{tab:species_list}
\end{table}
\subsubsection{Generating synthetic input signals}
\subsubsection{Generation of synthetic input signals}
Different processing steps along the model pathway were tested for intensity
invariance by generating synthetic input signals $x(t)$ of varying intensity,
@@ -668,6 +676,7 @@ according to either Eq.\,\ref{eq:noiseless} in the noiseless case or
Eq.\,\ref{eq:noisy} in the noisy case.
\subsubsection{Quantifying signal intensity across representations}
\label{sec:intensity_measures}
All intensity measures were calculated over a manually labeled segment within
each song. Segments always excluded the first and last few syllables to allow
@@ -726,6 +735,26 @@ such, the ratio of intensity measures is referred to as SNR in the following.
\subsection{Field data-based analysis of the model pathway}
Field recordings were made in a meadow in the vicinity of the University of
Tübingen, Germany, during the daytime in August~2024. All recordings were made
using a custom hand-held microphone array that was assembled from eight
omnidirectional AV-TEFE TCM141 condenser microphones. The microphones were
arranged in a linear configuration with a spacing of 30\,cm between adjacent
microphones and oriented in the same direction along the axis of the array. All
microphones were connected to a custom 8-channel amplification and
digitization system based on a Teensy 4.1 microcontroller with real-time clock
and microSD card storage. Recordings were written to the microSD card
in~\textit{.wav}~format with a sampling rate of 96\,kHz and an amplitude scale
in arbitrary units. The microphone array was held at a height of approximately
30\,cm above the ground, which was slightly above the height of most
surrounding vegetation and at the same height as the singing grasshopper. The
array was moved as close to the grasshopper as possible without interrupting
its song production, which resulted in an offset distance of approximately 10\,cm
between the animal and the leading microphone. Care was taken to maintain a
stable position and height of the microphone array during recording. The
resulting recordings were then processed through the model pathway and analyzed
according to the procedure described in Section~\ref{sec:intensity_measures}.
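
The array geometry described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not code from the \textit{thunderhopper} package: the function names are hypothetical, and the $1/d$ amplitude scaling is an idealized free-field assumption rather than a measured property of the recordings.

```python
# Hypothetical helpers for illustration only (not part of thunderhopper).
N_MICS = 8          # number of microphones in the linear array
SPACING_CM = 30.0   # spacing between adjacent microphones
OFFSET_CM = 10.0    # approximate distance from animal to leading microphone


def mic_distances_cm(n_mics=N_MICS, spacing=SPACING_CM, offset=OFFSET_CM):
    """Distance of each microphone from the singing animal, in cm."""
    return [offset + i * spacing for i in range(n_mics)]


def relative_amplitude(distances):
    """Amplitude relative to the leading microphone.

    Assumes idealized 1/d decay of sound pressure amplitude; vegetation,
    ground reflections, and microphone directionality are ignored.
    """
    d0 = distances[0]
    return [d0 / d for d in distances]


distances = mic_distances_cm()
rel_amp = relative_amplitude(distances)
for d, a in zip(distances, rel_amp):
    print(f"d = {d:5.0f} cm   1/d = {1.0 / d:.4f} 1/cm   relative amplitude = {a:.3f}")
```

With the stated offset of 10\,cm and spacing of 30\,cm, the eight microphones cover distances from 10\,cm to 220\,cm, which is a little more than one decade on a $1/d$ axis.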
\section{Results}
\subsection{Mechanisms driving the emergence of intensity invariance}
@@ -809,7 +838,7 @@ more robust input representation and higher input SNR.
$\env(t)$ for different $\sca$.
\textbf{a}:~Noiseless case.
\textbf{b}:~Noisy case.
\textbf{Bottom}:~Intensity metrics over a range of $\sca$.
\textbf{Bottom}:~Intensity measures over a range of $\sca$.
\textbf{c}:~Noiseless case: Standard deviations $\sigma_x$ of
$\filt(t)$ and $\env(t)$.
\textbf{d}:~Noisy case: Ratios of $\sigma_x$ of $\filt(t)$ and
@@ -948,7 +977,7 @@ is a recurring phenomenon that is further addressed in the following sections.
$\db(t)$, and $\adapt(t)$ for different $\sca$.
\textbf{a}:~Noiseless case.
\textbf{b}:~Noisy case.
\textbf{Bottom}:~Intensity metrics over a range of $\sca$.
\textbf{Bottom}:~Intensity measures over a range of $\sca$.
\textbf{c}:~Noiseless case: Standard deviations $\sigma_x$
of $\env(t)$, $\db(t)$, and $\adapt(t)$.
\textbf{d}:~Noisy case: Ratios of $\sigma_x$ of $\env(t)$,
@@ -1225,15 +1254,15 @@ compression step~(Fig.\,\ref{fig:pipeline_short}).
For this analysis, input $\raw(t)$ --- including both song component $\soc(t)$
and noise component $\noc(t)$ --- was rescaled and processed throughout all
steps of the model pathway~(Fig.\,\ref{fig:pipeline_full}a) up to the feature
set $f_i(t)$. As before, the standard deviation was used as intensity metric
set $f_i(t)$. As before, the standard deviation was used as intensity measure
for each resulting representation except $b_i(t)$ and $f_i(t)$. For $f_i(t)$,
the average feature value $\muf$ was used, while $b_i(t)$ was omitted from the
analysis. Plotting each intensity metric over
analysis. Plotting each intensity measure over
$\sca$~(Fig.\,\ref{fig:pipeline_full}b) reinforces many of the previous
observations. For ease of visualization, the kernel-specific curves for
$c_i(t)$ and $f_i(t)$ were summarized by their median. Representations prior to
logarithmic compression --- $\filt(t)$ and $\env(t)$ --- show a linear increase
of the intensity metric for larger $\sca$ on a double-logarithmic scale.
of the intensity measure for larger $\sca$ on a double-logarithmic scale.
Representations after logarithmic compression --- $\db(t)$, $\adapt(t)$, and
$c_i(t)$ --- are the first to reach a saturation regime and do so at
approximately the same $\sca$ because they are separated only by linear
@@ -1243,7 +1272,7 @@ that of $c_i(t)$, which suggests that the second mechanism of thresholding and
temporal averaging can indeed improve intensity invariance beyond the first
mechanism of logarithmic compression and adaptation. The difference in
saturation points is best illustrated based on the ratio of each intensity
metric to the respective pure-noise reference
measure to the respective pure-noise reference
value~(Fig.\,\ref{fig:pipeline_full}d). However, compressing $f_i(t)$ into a
median across $k_i(t)$ conceals many kernel-specific details. It is therefore
necessary to consider the development of each $f_i(t)$ over $\sca$
@@ -1299,13 +1328,13 @@ in principle, work together towards an intensity-invariant song representation.
\textbf{a}:~Example representations of $\filt(t)$,
$\env(t)$, $\db(t)$, $\adapt(t)$, $c_i(t)$, and $f_i(t)$
for different $\sca$.
\textbf{b}:~Intensity metrics over $\sca$. For $c_i(t)$
\textbf{b}:~Intensity measures over $\sca$. For $c_i(t)$
and $f_i(t)$, the median over kernels is shown. Dots
indicate $95\,\%$ curve span for $\db(t)$, $\adapt(t)$,
$c_i(t)$, and $f_i(t)$.
\textbf{c}:~Average value $\mu_{f_i}$ of each feature
$f_i(t)$ over $\sca$.
\textbf{d}:~Ratios of intensity metrics to the respective
\textbf{d}:~Ratios of intensity measures to the respective
reference value for input $\raw(t)=\noc(t)$. For $c_i(t)$
and $f_i(t)$, the median over kernel-specific ratios is
shown.
@@ -1329,7 +1358,7 @@ $\adapt(t)$ is merely a highpass filtered version of $\env(t)$; and $\db(t)$ is
missing entirely~(Fig.\,\ref{fig:pipeline_short}a). As expected, all
representations prior to the thresholding nonlinearity $\nl$ --- $\filt(t)$,
$\env(t)$, $\adapt(t)$, and $c_i(t)$ --- show a linear increase of the
intensity metric for larger $\sca$, while $f_i(t)$ is the only representation
intensity measure for larger $\sca$, while $f_i(t)$ is the only representation
to reach a saturation regime~(Fig.\,\ref{fig:pipeline_short}bd). The
saturated $\muf$ are distributed over a much broader range of values than in
the previous analysis~(Fig.\,\ref{fig:pipeline_short}c). Intriguingly, the
@@ -1382,12 +1411,12 @@ guaranteed simply by disabling logarithmic compression.
\textbf{a}:~Example representations of $\filt(t)$,
$\env(t)$, $\adapt(t)$, $c_i(t)$, and $f_i(t)$ for
different $\sca$.
\textbf{b}:~Intensity metrics over $\sca$. For $c_i(t)$
\textbf{b}:~Intensity measures over $\sca$. For $c_i(t)$
and $f_i(t)$, the median over kernels is shown. Dots
indicate $95\,\%$ curve span for $f_i(t)$.
\textbf{c}:~Average value $\mu_{f_i}$ of each feature
$f_i(t)$ over $\sca$.
\textbf{d}:~Ratios of intensity metrics to the respective
\textbf{d}:~Ratios of intensity measures to the respective
reference value for input $\raw(t)=\noc(t)$. For $c_i(t)$
and $f_i(t)$, the median over kernel-specific ratios is
shown.
@@ -1416,14 +1445,14 @@ $d$ from the sender, ranging from $10\,$cm to $220\,$cm with intervals of
$30\,$cm between microphones. The precise value of $\sca$ that corresponds to a
given $d$ cannot be determined in a straightforward manner, but $\sca$ is
expected to be inversely proportional to $d$ based on the inverse-square law of
sound propagation. All intensity metrics and ratios thereof were hence plotted
sound propagation. All intensity measures and ratios thereof were hence plotted
over $1/d$ on a double-logarithmic scale, which is comparable to previous
analyses insofar as a decade on the $1/d$ axis corresponds to a decade on
the $\sca$ axis. To complicate matters further, the $1/d$ axis is sampled too
sparsely to determine saturation points as before based on the $95\,\%$ curve
span. Instead, one has to rely on the slope of the curve to assess if, and at
which $1/d$, a given representation reaches a saturation regime. Bearing these
limitations in mind, the intensity metrics of each representation over
limitations in mind, the intensity measures of each representation over
$1/d$~(Fig.\,\ref{fig:pipeline_field}b) follow a pattern that is consistent
with the results of the previous simulation-based
analysis~(Fig.\,\ref{fig:pipeline_full}b): The standard deviations of
@@ -1439,7 +1468,7 @@ $d=10\,$cm corresponds to a value of $\sca$ between 10 and 20 based on
comparison with the simulation-based analysis~(Fig.\,\ref{fig:pipeline_full}b).
The saturated $\muf$ are distributed over a comparably narrow range of values,
which could in part be a property of the songs of \textit{P. parallelus}~(see
also Fig.\,\ref{fig:thresh-lp_species}bc). The ratios of each intensity metric
also Fig.\,\ref{fig:thresh-lp_species}bc). The ratios of each intensity measure
to the respective pure-noise reference value are not aligned across
representations~(Fig.\,\ref{fig:pipeline_field}d) or
kernels~(Fig.\,\ref{fig:pipeline_field}ef) but serve to consolidate the
@@ -1468,11 +1497,11 @@ distances~(Fig.\,\ref{fig:pipeline_field}a, bottom row).
\textbf{a}:~$\filt(t)$, $\env(t)$, $\db(t)$, $\adapt(t)$,
$c_i(t)$, and $f_i(t)$ at each $d$. A noise segment from
the same recording is shown for reference.
\textbf{b}:~Intensity metrics over $d$. For $c_i(t)$
\textbf{b}:~Intensity measures over $d$. For $c_i(t)$
and $f_i(t)$, the median over kernels is shown.
\textbf{c}:~Average value $\mu_{f_i}$ of each feature
$f_i(t)$ over $d$.
\textbf{d}:~Ratios of intensity metrics to the respective
\textbf{d}:~Ratios of intensity measures to the respective
value obtained from the noise reference. For $c_i(t)$ and
$f_i(t)$, the median over kernel-specific ratios is shown.
\textbf{e}:~Ratios of standard deviation $\sigma_{c_i}$ of