
% ToDos
%
% Discussion: finish; add section on natural sensory scenes or just link to our other manuscript?
% RK: Bennett (1971) might have something on amplitudes of different species.
%
% Introduction: on tracking animals in natural habitats
%
% add: references to Figures 11, 12, 13
%
% We might want to spell out that both EOD and movement activity are representative for other days we recorded? We would need to check that, though.


% DATA TODOS
%
% MAYBE
% what is the fraction of fish we captured while swimming up/downwards? did we miss many?
% -> check x and y distribution of fish tracks to see if it is skewed towards the outer curve
%
% are Apteronotus females more stationary than males?
%
% movement speeds: t-test differences between species, up and down speeds


\documentclass[11pt]{article}

\title{Random tracking title}

\author{Till Raab, ..., ..., Jan Benda}

%%%%% page style --------------------------------
\usepackage[left=20mm,right=20mm,top=20mm,bottom=20mm]{geometry}
\pagestyle{myheadings}
\usepackage{lineno}
\linenumbers
\usepackage{setspace}

% section style ---------------------------------
\usepackage[sf,bf,it,big,clearempty]{titlesec}
\setcounter{secnumdepth}{-1}

% units -----------------------------------------
\usepackage[mediumqspace,Gray,squaren]{SIunits}      % \ohm, \micro

% -----------------------------------------------
\usepackage[english]{babel}
\usepackage{pslatex}   % nice font for pdf file
\usepackage{xcolor}
\usepackage{graphicx}
% \usepackage[font={sf,doublespacing}]{caption}
\usepackage[T1]{fontenc}
\usepackage[utf8]{inputenc}
\usepackage{textcomp}
\usepackage{array}
\usepackage{amsmath}


% figures --------------------------------------
% captions:
\usepackage[format=plain,singlelinecheck=off,labelfont=bf,font={small,sf,doublespacing}]{caption}

% put caption on separate float:
\newcommand{\breakfloat}{\end{figure}\begin{figure}[t]}

% references to panels of a figure within the caption:
\newcommand{\figitem}[1]{\textsf{\bfseries\uppercase{#1}}}
% references to panels of a figure within the text:
\newcommand{\panel}[1]{\textsf{\uppercase{#1}}}
% references to figures:
\newcommand{\fref}[1]{\textup{\ref{#1}}}
\newcommand{\subfref}[2]{\textup{\ref{#1}}\,\panel{#2}}
% references to figures in normal text:
\newcommand{\fig}{Fig.}
\newcommand{\Fig}{Figure}
\newcommand{\figs}{Figs.}
\newcommand{\Figs}{Figures}
\newcommand{\figref}[1]{\fig~\fref{#1}}
\newcommand{\Figref}[1]{\Fig~\fref{#1}}
\newcommand{\figsref}[1]{\figs~\fref{#1}}
\newcommand{\Figsref}[1]{\Figs~\fref{#1}}
\newcommand{\subfigref}[2]{\fig~\subfref{#1}{#2}}
\newcommand{\Subfigref}[2]{\Fig~\subfref{#1}{#2}}
\newcommand{\subfigsref}[2]{\figs~\subfref{#1}{#2}}
\newcommand{\Subfigsref}[2]{\Figs~\subfref{#1}{#2}}
% references to figures within brackets:
\newcommand{\figb}{Fig.}
\newcommand{\figsb}{Figs.}
\newcommand{\figrefb}[1]{\figb~\fref{#1}}
\newcommand{\figsrefb}[1]{\figsb~\fref{#1}}
\newcommand{\subfigrefb}[2]{\figb~\subfref{#1}{#2}}
\newcommand{\subfigsrefb}[2]{\figsb~\subfref{#1}{#2}}

% bibliography ----------------------------------
%\usepackage[round,colon]{natbib}
%\renewcommand{\bibsection}{\section{References}}
%\setlength{\bibsep}{0pt}
%\setlength{\bibhang}{1.5em}
%\bibliographystyle{jneurosci}

\usepackage[breaklinks=true,colorlinks=true,citecolor=blue!30!black,urlcolor=blue!30!black,linkcolor=blue!30!black]{hyperref}

% notes -----------------------------------------
\usepackage{ifthen}

\newcommand{\note}[2][]{{\itshape[\textbf{\ifthenelse{\equal{#1}{}}{}{#1: }#2]}}}
\newcommand{\notejb}[1]{\note[JB]{#1}}
\newcommand{\notejh}[1]{\note[JH]{#1}}
\newcommand{\red}[1]{\textcolor{green!70!black}{#1}}


% -----------------------------------------------
\hyphenation{mo-da-li-ties court-ship}

% -----------------------------------------------

\begin{document}

\maketitle

\begin{abstract}
	work in progress
\end{abstract}

\section{Introduction}
	work in progress

\pagebreak
\section{Materials and Methods}

\subsection{Field site}
The study site was a small stream, the Rio Rubiano, in the Colombian part of the tropical grassland plain “Los Llanos” near San Martin, Province Meta. Recordings were made in an easily accessible part of the Rio Rubiano near the Finca Altamira (3°76’52.70”N, 73°67’53.41”W), which also served as accommodation. The riverbed consists of rocks ranging from a few up to 50\,cm in diameter, and the riverbank consists mainly of soil, rocks and the roots of the surrounding vegetation. The recording equipment was installed where the river is approximately 9\,m wide and about 20\,cm deep (distance between water surface and the stone layer on the riverbed). The temperature of the clear water of the Rio Rubiano fluctuated daily between 23 and 27\,°C, and its conductivity ranged from 2\,${\mu}$S/cm to 7\,${\mu}$S/cm. Data acquisition started in April 2016, i.e. at the start of the rainy season.

\subsection{Field monitoring system}

The recording system used to obtain our data is similar to the one used by [Henninger et al. (2018)] in the Republic of Panamá. It consists of a custom-built 64-channel electrode and amplifier system (npi electronics GmbH, Tamm, Germany) powered by 12\,V car batteries. Signals picked up by the electrodes (low-noise headstages embedded in epoxy resin; 1\,$\times$ gain, 10 $\times$ 5 $\times$ 5\,mm) were amplified by the main amplifier (100\,$\times$ gain, 1st order high-pass filter 100\,Hz, low-pass 10\,kHz) before being digitized at 20\,kHz per channel with 16-bit amplitude resolution using a custom-built computer with two analog-to-digital converter cards (PCI-6259, National Instruments, Austin, Texas, USA). Data acquisition and storage for offline analysis were managed by custom software written in C++ (\url{https://github.com/bendalab/fishgrid}). The 64 electrodes, the maximum supported by the system, were mounted on 8 PVC tubes and arranged in an 8 by 8 grid (50\,cm spacing) covering an area of 350\,$\times$\,350\,cm. All 64 electrodes were used throughout the whole recording period. Each PVC tube, equipped with 8 electrodes, was tied to a rope crossing the river. The resulting structure allowed small shifts in electrode distance but was resilient to rapidly changing environmental conditions such as rising water levels after heavy rainfall.


\subsection{Extraction of EOD frequencies}

Different species of wave-type weakly electric fish, and even individuals within the same species, differ in the frequency of their electric organ discharge (EOD). We use fast Fourier transforms of 2\,s data snippets with 95\,\% overlap to evaluate the frequency composition of each recording channel, i.e. each recording electrode, consistently throughout our recordings. Since the EODs of wave-type weakly electric fish are composed of a fundamental discharge frequency and its harmonics, we detect peaks in the sum of the power spectra and assign them to groups of fundamental and harmonic frequencies. The fundamental EOD frequencies, their powers in the power spectra of the different electrodes, and the respective times of detection are stored for further analysis.
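
As a worked example of the resulting resolution, assume that the spectra are computed directly on the 20\,kHz recordings without zero-padding (an assumption; the symbols $f_s$, $T$, $N$, $\Delta t$ and $\Delta f$ are introduced here for illustration only). A snippet of $T = 2$\,s then contains
\[
  N = f_s\,T = 20\,\text{kHz} \times 2\,\text{s} = 40\,000 \text{ samples},
\]
and with 95\,\% overlap successive snippets are shifted by
\[
  \Delta t = (1 - 0.95)\,T = 0.1\,\text{s} \quad (2000 \text{ samples}),
\]
so that each channel would yield ten spectra per second with a frequency resolution of
\[
  \Delta f = \frac{1}{T} = 0.5\,\text{Hz}.
\]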

\subsection{Tracking of individual EODs}
%In order to track individual EOD frequency traces for each individual recorded we developed an algorithm based on Python3 using two independent signal variables to reliable assign EOD frequency traces.

To derive individual EOD frequency traces from the extracted EOD frequencies we considered various tracking techniques and finally developed a reliable tracking algorithm based on the comparison of EOD frequency and electric field structure.

\subsubsection{$\Delta$-EODf (Electric organ discharge frequency difference)}

\begin{figure}[h!]
  \centerline{\includegraphics[width=1.\linewidth]{3}}
  \caption{\label{dEODf} Spectrogram and part of the EOD frequency traces of multiple fish, including two EOD frequency traces that cross in the course of an EOD frequency rise. The single EOD signals marked in green and yellow are candidates for connection to the EOD signal marked in black. The yellow and green bars represent the respective EOD frequency errors $\Delta f_0$ and $\Delta f_1$ used by the tracking algorithm.}
\end{figure}

The EOD frequency of wave-type weakly electric fish is one of the most stable oscillating signals known in natural systems and remains stable over long periods of time. It is therefore the obvious choice as the main tracking criterion. However, tracking individual EOD frequencies over long periods of time poses further challenges. When wave-type weakly electric fish produce communication signals, such as EOD frequency rises, the EOD frequency traces of different individuals frequently cross each other, sometimes multiple times within a short time window. This occurs especially in larger groups of fish. These crossings cause algorithmic problems. In particular, the detection of both peaks in the power spectrum often fails at the very moment of a crossing, resulting in missing data points for one of the EOD frequency traces. As a result, tracking based on EOD frequency comparison alone is at chance level during these crossing events.

\subsubsection{$\Delta$-F (Field difference)}

\begin{figure}[h!]
  \centerline{\includegraphics[width=1.\linewidth]{9}}
  \caption{\label{dField} Electric field properties derived from the EOD frequency power on the different electrodes for the potential partners during tracking, corresponding to \figref{dEODf}. The field properties on the left (green and yellow signals in \figref{dEODf}) are compared to the field properties on the right (black signal in \figref{dEODf}). The top center field difference results from subtracting the field properties of the green and black signals in \figref{dEODf}, the bottom center one from subtracting those of the yellow and black signals. $\Delta field_0$ and $\Delta field_1$ represent the mean-square error of the respective field differences.}
\end{figure}

To address tracking errors arising from crossing EOD frequency traces, e.g. during EOD frequency rises, we use the individual electric field properties as a second tracking parameter. Because of our multi-electrode recording setup we are able to estimate the strength of each individual EOD signal at multiple locations within the electrode grid by extracting the power at the respective EOD frequency from the power spectrum of each electrode. These two-dimensional representations of the electric fields differ between individuals depending on their location within the electrode grid. After normalizing the individual electric fields to eliminate the impact of absolute field strength, the resulting field proportions serve as a second tracking parameter: the difference between two field proportions is quantified as their mean-square error.
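
Written out, and assuming for illustration that each field is normalized to its maximum (the exact normalization is not specified above, so this choice, like the symbols $P_{i,k}$, $\hat{P}_{i,k}$ and $K$, is an assumption), the field difference between two signals $i$ and $j$ could be sketched as
\[
  \hat{P}_{i,k} = \frac{P_{i,k}}{\max_{k'} P_{i,k'}}, \qquad
  \Delta field_{ij} = \frac{1}{K} \sum_{k=1}^{K} \left( \hat{P}_{i,k} - \hat{P}_{j,k} \right)^2 ,
\]
where $P_{i,k}$ denotes the power of signal $i$ at its EOD frequency on electrode $k$ and $K = 64$ is the number of electrodes. Two signals with identical field proportions then give $\Delta field_{ij} = 0$, irrespective of their absolute field strengths.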

\subsubsection{Error values composed from $\Delta$-EODf and $\Delta$-Field}

The simple comparison of EOD frequency difference and field structure difference is not sufficient to determine the likelihood of two signals originating from the same individual. Therefore, relative EOD frequency errors and relative field errors, both ranging between 0 and 1, are calculated.

\begin{figure}[h!]
  \centerline{\includegraphics[width=1.\linewidth]{10}}
  \caption{\label{rel_errors} Determination of relative field and EOD frequency errors. Top: All possible field errors of signals within a time window of 30\,s (3 $\times$ compare range) with a maximum EOD frequency difference of 10\,Hz were collected as the distribution of possible field errors. Its cumulative histogram is displayed in red. The relative field errors for both $\Delta field_0$ and $\Delta field_1$ are defined as the proportion of possible field errors smaller than the respective field error. Smaller field errors result in smaller relative field errors and, thus, increase the likelihood of two signals belonging to one identity. Bottom: Relative frequency errors are calculated based on a Boltzmann function. EOD frequency differences above 1\,Hz already represent a special event within an EOD frequency trace and, thus, result in the maximum relative frequency error.}
\end{figure}

\subsubsection{Frequency error determination}

With respect to EOD frequency, we assume that all EOD frequency differences above 1\,Hz are equally unlikely to originate from the same individual; they therefore result in the maximum relative EOD frequency error. EOD frequency differences below 1\,Hz are mapped onto smaller relative EOD frequency errors using a Boltzmann function: the smaller the actual EOD frequency difference, the smaller the relative EOD frequency error.
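
A Boltzmann function of this kind could, for example, take the form
\[
  \varepsilon_f(\Delta f) = \frac{1}{1 + \exp\left(-\frac{\Delta f - f_{1/2}}{s}\right)} ,
\]
where $\Delta f$ is the absolute EOD frequency difference between two signals, $f_{1/2}$ the difference at which the relative error reaches 0.5, and $s$ the slope parameter. The symbols $\varepsilon_f$, $f_{1/2}$ and $s$ as well as the following parameter values are illustrative assumptions and are not taken from the algorithm itself. They would have to be chosen such that $\varepsilon_f \approx 1$ for $\Delta f \geq 1$\,Hz; for instance, $f_{1/2} = 0.5$\,Hz and $s = 0.1$\,Hz give $\varepsilon_f(1\,\text{Hz}) = 1/(1 + e^{-5}) \approx 0.993$.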

\subsubsection{Field error determination}

The absolute field structure errors depend strongly on the number of electrodes used in the recording setup. Therefore, to obtain a relative field structure error we first estimate the distribution of possible field structure errors in a 30\,s window around the data points of interest. Possible field structure errors are defined as those between signals with an EOD frequency difference of less than 10\,Hz. The relative field structure error of two field structures is then the proportion of values in this distribution that are smaller than their field structure error.
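
In formula form, with $E$ denoting the set of possible field structure errors collected in the 30\,s window (notation introduced here for illustration), the relative field structure error of a candidate pair with field structure error $\Delta field$ reads
\[
  \varepsilon_{field} = \frac{\left| \{ e \in E : e < \Delta field \} \right|}{|E|} .
\]
For example, if $E$ contained 200 values of which 30 were smaller than $\Delta field$, the relative field structure error would be $30/200 = 0.15$.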

\subsubsection{Total error definition}

The total error between two signals is calculated using a cost function that evaluates both the relative EOD frequency error and the relative field structure error. Since frequency changes of several Hz within EOD frequency traces are possible when fish emit communication signals such as EOD frequency rises, whereas rapid spatial changes are comparatively uncommon (see Results), we use the cost function displayed in equation 1 to estimate the total error between different detected EOD signals.

\subsubsection{Assign temporal EOD frequency traces}

To enable the analysis of recordings of unlimited duration, the actual tracking algorithm has two stages. First, we assign so-called temporal identities to EOD signals detected within a 30\,s window. To this end, we calculate the total error for every possible connection within this window. In addition to the maximum EOD frequency difference of 10\,Hz, possible EOD signal pairs are limited by a maximum compare range of 10\,s, i.e. two signals can only be connected if they are at most 10\,s apart. Temporal identities are then assembled from these error values, starting with the smallest total error, representing the best connection, and proceeding to the largest total error, representing the worst connection. Connections that would conflict with already existing temporal identities are not made, since the existing connections are based on smaller total errors and are therefore more likely to be correct. The resulting temporal identities are based on the best possible connections, but only the central 10\,s of the 30\,s window represent valid connections, since connections within the first and last 10\,s could not take into account all possible connections within $\pm$ compare range.
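
More formally, and using notation introduced here only for illustration ($t_i$ and $f_i$ for the detection time and EOD frequency of signal $i$, $\varepsilon_{ij}$ for the total error between signals $i$ and $j$), the set of candidate connections within one window can be written as
\[
  C = \left\{ (i, j) \; : \; 0 < t_j - t_i \leq 10\,\text{s}, \;\; |f_j - f_i| \leq 10\,\text{Hz} \right\} .
\]
The pairs in $C$ are processed in order of increasing $\varepsilon_{ij}$, and a pair is connected only if this does not conflict with a connection made earlier. This corresponds to a greedy assignment: it favors the individually best connections and does not necessarily minimize the summed error over all connections.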

\begin{figure}[h!]
  \centerline{\includegraphics[width=1.\linewidth]{1}}
  \caption{\label{tmp_idents} ---}
\end{figure}

\begin{figure}[h!]
  \centerline{\includegraphics[width=1.\linewidth]{4}}
  \caption{\label{error_matrix} Presorting of EOD signals into temporal identities. A-C: Detected origin signals within a 30\,s time window can be assigned to target signals with a maximum time lag of 10\,s. Signals are assigned to each other based on their respective error values. The resulting temporal identity traces are formed based on the lowest possible error values. D: Error values of possible origin signals and target signals sorted by their temporal occurrence. E: Error values of possible origin signals and target signals sorted by their temporal identity. For clarity, only every second identity is displayed.}
\end{figure}

\subsubsection{Running connection}

The central 10\,s of the temporal identities, which contain valid connections, are joined with the already tracked real identities, again based on the total errors between the respective EOD signals. Total errors between signals of already assigned real identities and signals within the central 10\,s of the temporal identities are computed and connections are, again, made in order of increasing total error. Temporal identities that cannot be assigned to an already existing real identity form a new real identity. The window used to identify temporal identities is then shifted by the compare range, i.e. 10\,s, and the identification of temporal identities and their assignment to already tracked real identities continues until the end of the recording is reached.
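
As a sketch of the resulting schedule (with $n = 0, 1, 2, \ldots$ as window index and $t = 0$ at the start of the recording, both introduced here for illustration), the $n$-th window and its valid central part are
\[
  W_n = [\,10n,\; 10n + 30\,]\,\text{s}, \qquad V_n = [\,10n + 10,\; 10n + 20\,]\,\text{s} .
\]
Since $V_{n+1}$ starts where $V_n$ ends, the valid parts of successive windows tile the recording without gaps or overlap. The first 10\,s of the first window and the last 10\,s of the final window are not covered by any valid part; how these edges are handled is not covered by this sketch.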

\begin{figure}[h!]
  \centerline{\includegraphics[width=1.\linewidth]{2}}
  \caption{\label{running_connection} Running connection. A: Temporal identities assigned via lowest error value connections. The black and gray bars together indicate the whole data snippet within which temporal identities have been assigned. Connections within the gray areas are considered unreliable, since signals within the compare range but outside the gray area could also connect to those signals but are neglected in this particular temporal identity assignment. Connections within the black bar area are valid, since every signal that could possibly be connected to those signals lies within the temporal assignment range. B: Already assigned identities, i.e. potential connection partners for the temporal identities to be assigned. C: The yellow and green signals (already discussed in \figref{dEODf} and \figref{dField}) represent the origin signals of the respective identities that best fit a signal of one temporal identity, indicated in black. The yellow-to-black error is lower than the green-to-black error. D: Signals of temporal identities are assigned to signals of already assigned identities based on the lowest error value between them.}
\end{figure}

\pagebreak
\section{Results}

\subsection{$\Delta$-EODf and $\Delta$-Field of same-identity vs. different-identity signals}

To assess the utility of the tracking parameters and, thus, of our tracking algorithm, we performed post-hoc analyses on the assigned EOD frequency traces after visual inspection. Within the compare range of 10\,s, EOD frequency errors within subjects are significantly lower than EOD frequency errors between subjects (\figrefb{EODf_error_shift}). However, EOD frequency rises cause the within subject and between subject EOD frequency error distributions to overlap.

\begin{figure}[h!]
  \centerline{\includegraphics[width=1.\linewidth]{7}}
  \caption{\label{EODf_error_shift} Random caption}
\end{figure}

Within subject comparisons also resulted in significantly lower field errors than between subject comparisons. However, the distributions of within subject and between subject field errors did not differ as much as the respective distributions of EOD frequency errors. Interestingly, within subject field errors remained consistently lower than between subject field errors throughout the 10\,s compare window.

\begin{figure}[h!]
  \centerline{\includegraphics[width=1.\linewidth]{6}}
  \caption{\label{Field_error_shift} Random caption}
\end{figure}

Since both EOD frequency errors and field errors within subjects are largely independent of time lag, we could exclude time as a factor from the tracking algorithm.


\subsection{ROC analysis}

To further validate the tracking parameters we carried out a receiver operating characteristic (ROC) analysis evaluating the discriminability of within and between subject field errors, EOD frequency errors and total errors. The area under the ROC curve (AUC) decreased for all errors with increasing time lag, but remained at a very high level throughout. Interestingly, while the AUC values for field error and EOD frequency error dropped to a base level after a certain time lag, the AUC value for the combined total error stayed comparatively high. This allows us to dispense with time lag as an additional parameter in the calculation of the total error.
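
For reference, the AUC has a direct probabilistic reading. If $\varepsilon_{\text{within}}$ and $\varepsilon_{\text{between}}$ denote the error of a randomly drawn within subject pair and between subject pair, respectively (notation introduced here for illustration), then, with lower errors indicating the same individual,
\[
  \text{AUC} = P\left( \varepsilon_{\text{within}} < \varepsilon_{\text{between}} \right) + \tfrac{1}{2}\, P\left( \varepsilon_{\text{within}} = \varepsilon_{\text{between}} \right) .
\]
An AUC of 1 thus corresponds to perfectly separated error distributions and an AUC of 0.5 to chance level.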

\begin{figure}[h!]
  \centerline{\includegraphics[width=1.\linewidth]{12}}
  \caption{\label{ROC} Random caption}
\end{figure}

\end{document}