
\chapter{Debugging}

When we write a program from scratch we almost always make
mistakes. Accordingly, a substantial amount of time goes into
finding and fixing errors. This process is called
\codeterm{debugging}. Don't be frustrated when a self-written program
does not work as intended and produces errors. It is quite exceptional
for a program to work on the first try; if it appears to, that
should, in fact, leave you suspicious.

In this chapter we will talk about typical mistakes, how to read and
understand error messages, how to actually debug your program code and
some hints that help to minimize errors.

\section{Types of errors and error messages}

There are a number of different classes of programming errors.

\paragraph{\codeterm{Syntax error}:}
The most common and easiest to fix type of error. A syntax error
violates the rules (spelling and grammar) of the programming
language. For example, every opening parenthesis must be matched by a
closing one, and every \keyword{for} loop has to be closed by an
\keyword{end}. Usually, the respective error messages are clear, and
the editor will point out and highlight most \codeterm{syntax error}s.

\begin{lstlisting}[label=syntaxerror, caption={Unbalanced parenthesis error.}]
>> mean(random_numbers
                      |
Error: Expression or statement is incorrect--possibly unbalanced (, {, or [.

Did you mean:
>>   mean(random_numbers)
\end{lstlisting}


\paragraph{\codeterm{Indexing error}:}
\paragraph{\codeterm{Assignment error}:}
\paragraph{Name error:}
\paragraph{Arithmetic error:}
\paragraph{Logical error:}
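A logical error produces no error message at all: the code runs, but
the result is not what was intended. The following is a minimal sketch
with made-up values (the variable names \varcode{weights} and
\varcode{values} are illustrative assumptions). It confuses the matrix
product \code{*} with the element-wise product \code{.*}. For square
matrices of equal size both operations are valid, so \matlab{} does not
complain.

\begin{lstlisting}[caption={Sketch of a logical error: matrix product instead of element-wise product.}]
% made-up example data
weights = [1 2; 3 4];
values = [10 20; 30 40];

% intended: scale each value by its weight (element-wise product)
right = weights .* values;   % [10 40; 90 160]

% logical error: the matrix product runs without any error message
wrong = weights * values;    % [70 100; 150 220]

disp(right);
disp(wrong);
\end{lstlisting}

Only a comparison with a result that was worked out by hand reveals
such a mistake.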


\section{Avoiding errors}
It would be great if we could just sit down, write a program, run it
and be done. Most likely this will not happen. Rather, we will make
mistakes and have to debug the code. There are a few guidelines that
help to reduce the number of errors.

\subsection{Keep it small and simple}

\shortquote{Debugging time increases as a square of the program's
  size.}{Chris Wenham}

Break down your programming problems into small parts (functions) that
do exactly one thing. This has already been discussed in the context
of writing scripts and functions. In part, this is simply a matter of
not feeling overwhelmed by 1000 lines of code. Further, with each task
that you incorporate into the same script, the probability of naming
conflicts (same or similar names for variables) increases. It is also
hard to remember the meaning of a variable that was defined at the
beginning of the script.


\shortquote{Everyone knows that debugging is twice as hard as writing
  a program in the first place. So if you're as clever as you can be
  when you write it, how will you ever debug it?}{Brian Kernighan}

Many tasks within an analysis can be squashed into a single line of
code. This saves some space in the file, reduces the effort of coming up
with variable names and simply looks so much more competent than a
collection of very simple lines. Consider
listing~\ref{easyvscomplicated}. Both parts of the listing solve the
same problem, but the second one breaks the task down into a sequence of
easy-to-understand commands. Finding logical and also syntactic errors is
much easier in the second case. The first version is perfectly fine,
but it requires a deep understanding of the applied
functions and of the task at hand.

\begin{lstlisting}[label=easyvscomplicated, caption={Converting a series of spike times into the firing rate as a function of time. Many tasks can be solved with a single line of code. But is this readable?}]
% the one-liner
rate = conv(full(sparse(1, round(spike_times/dt), 1, 1, length(time))), kernel, 'same');

% easier to read
rate = zeros(size(time));
spike_indices = round(spike_times/dt);
rate(spike_indices) = 1;
rate = conv(rate, kernel, 'same');
\end{lstlisting}

The preferred way depends on several considerations. (i)
How deep is your personal understanding of the programming language?
(ii) What about the programming skills of your target audience or
of other people who may depend on your code? (iii) Is one solution
faster than the other, or does it use fewer resources? (iv) How much
do you have to invest in developing the most elegant solution
relative to its importance in the project? The decision is up to you.
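Whether the two variants of listing~\ref{easyvscomplicated} really
give the same result, and whether one of them is faster, can be
checked directly. The following is a minimal sketch under stated
assumptions: the time step, the spike times and the rectangular kernel
are made up for illustration and are not part of the original
example.

\begin{lstlisting}[caption={Sketch: comparing result and run time of the two variants on made-up data.}]
% made-up example data (assumptions for illustration only)
dt = 0.001;                             % time step in seconds
time = (1:1000) * dt;                   % one second of time points
spike_times = [0.1, 0.25, 0.4, 0.75];   % spike times in seconds
kernel = ones(1, 50) / (50 * dt);       % rectangular kernel, 50 ms wide

% variant 1: the one-liner
tic;
rate_a = conv(full(sparse(1, round(spike_times/dt), 1, 1, length(time))), kernel, 'same');
time_a = toc;

% variant 2: the step-by-step version
tic;
rate_b = zeros(size(time));
spike_indices = round(spike_times/dt);
rate_b(spike_indices) = 1;
rate_b = conv(rate_b, kernel, 'same');
time_b = toc;

% compare results and run times
max_difference = max(abs(rate_a - rate_b));
fprintf('largest difference: %g\n', max_difference);
fprintf('one-liner: %g s, step-by-step: %g s\n', time_a, time_b);
\end{lstlisting}

A single run of \code{tic} and \code{toc} is only a rough estimate of
the run time. Note also that the two variants agree only as long as no
two spikes fall into the same time bin: \code{sparse()} sums values
with identical indices, whereas the assignment in the second variant
sets the bin to one.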

\subsection{Read error messages carefully and call programs from the command line}


\section{Error messages}


\begin{ibox}[tp]{\label{stacktracebox}Stacktrace or Stack Traceback}


\end{ibox}


It helps enormously if scripts and functions that belong together are
stored in the same folder on disk. It is therefore a good idea to
create a separate folder for each analysis and to put the
corresponding \codeterm{m-files} into it. Deeper nesting into further
subfolders is usually not necessary. \matlab{} creates a ``MATLAB''
folder in your \file{Documents} (Linux) or \file{My Documents}
(Windows) folder. This folder is a good choice as the root directory
for your own work. Of course, any other location can be chosen as
well. In the example in \figref{fileorganizationfig}, a separate
subfolder is created within this folder for each project, and within
it a further subfolder for each problem or analysis. These subfolders
contain both the required \codeterm{m-files} and the results of the
analysis (figures, data files). Two more things are worth
noting. First, the project folder contains a script (analysis.m) that
is meant to call all analyses. Second, next to the project folders
there is a \file{functions} folder that holds functions needed in
more than one project or analysis.

\begin{figure}[tp]
  \includegraphics[width=0.75\textwidth]{no_bug}
  \titlecaption{\label{fileorganizationfig} Possible organization of
    program code in the file system.}{ For each project, subfolders
    are created for the individual analyses. At the level of the
    project there could be a script (here ``analysis.m'') that
    triggers all analyses in the subfolders.}
\end{figure}
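How a project-level \file{analysis.m} might call the individual
analyses can be sketched as follows. This is only an illustration
under stated assumptions: the subfolder names and the file name
\file{main.m} are hypothetical, and the shared \file{functions} folder
is assumed to sit next to the project folder. Missing folders or
scripts are skipped, so the sketch also runs in an empty directory.

\begin{lstlisting}[caption={Sketch of a project-level script that calls all analyses (hypothetical names).}]
% analysis.m -- hypothetical project-level script
project_dir = pwd;

% make shared functions available, if the folder exists
functions_dir = fullfile(project_dir, '..', 'functions');
if exist(functions_dir, 'dir')
    addpath(functions_dir);
end

% hypothetical analysis subfolders, each with a script main.m
analyses = {'isi_analysis', 'rate_analysis'};
for k = 1:numel(analyses)
    script_file = fullfile(project_dir, analyses{k}, 'main.m');
    if exist(script_file, 'file')
        run(script_file);
    else
        fprintf('skipping missing script %s\n', script_file);
    end
end
\end{lstlisting}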


\section{Naming functions and scripts}

\matlab{} looks up functions and scripts exclusively by their
name. Names are case sensitive. The two files \file{test\_funktion.m}
and \file{Test\_Funktion.m} can therefore name two different
functions. Varying a name in this way is, of course, not sensible. It
carries no information about the difference between the two
functions. Moreover, the name says almost nothing about the purpose of
the function.

Finding names is not always easy; sometimes it is even the hardest
part of programming! Finding expressive names is worth the effort,
though. Expressive means that the purpose can be inferred from the
name.

\begin{important}[Naming functions and scripts]
  The names of functions and scripts should tell as much as possible
  about what they do or what they are for (\file{firingrates.m}
  instead of \file{uebung.m}). Good names for functions and scripts
  are the best documentation.
\end{important}

In names, \matlab{} forbids spaces, special characters and
umlauts. Names must not start with a digit either. Apart from that,
it imposes no further rules on naming. However, the names of the
functions predefined in \matlab{} follow certain patterns:
\begin{itemize}
\item Names are always written in lower case.
\item Abbreviations are used frequently (e.g. \code{xcorr()}
  for the cross-correlation or \code{repmat()} for ``repeat matrix'').
\item Functions that convert between formats are always named
  according to the pattern ``format2format'' (e.g. \code{num2str()}
  for the conversion ``number to string'', which turns a numeric
  value into text).
\end{itemize}
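The naming patterns listed above can be tried out directly. The
following is a minimal sketch with made-up values; the variable names
are illustrative assumptions.

\begin{lstlisting}[caption={Sketch: built-in functions that follow the naming patterns.}]
spike_count = 42;

% ``format2format'' pattern: number to string, string to double
label = ['spike count: ', num2str(spike_count)];
threshold = str2double('3.5');

% abbreviation: repmat() stands for ``repeat matrix''
tiled = repmat([1 2], 2, 1);   % [1 2; 1 2]

disp(label);
\end{lstlisting}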


\begin{important}[Naming variables]
  The names of variables should tell as much as possible about their
  content (\varcode{spike\_count} instead of \varcode{x}). Good names
  for variables are the best documentation.
\end{important}


\begin{lstlisting}[label=chaoticcode, caption={Cluttered implementation of the random walk.}]

\end{lstlisting}

\pagebreak[4]

\begin{lstlisting}[label=cleancode, caption={Clearly structured implementation of the random walk.}]
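num_runs = 10;       % number of independent random walks
max_steps = 1000;    % number of steps per walk
positions = zeros(max_steps, num_runs);  % every walk starts at position 0

for run = 1:num_runs
    for step = 2:max_steps
        x = randn(1);
        if x < 0
            positions(step, run) = positions(step-1, run) + 1;
        else
            % also covers x == 0, which would otherwise reset the position to 0
            positions(step, run) = positions(step-1, run) - 1;
        end
    end
end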
\end{lstlisting}


% \begin{exercise}{logicalVector.m}{logicalVector.out}
%   Create a vector \varcode{x} with the values 0--10.
%   \begin{enumerate}
%   \item Execute: \varcode{y = x < 5}
%   \item Print the content of \varcode{y} to the screen.
%   \item What is the data type of \varcode{y}?
%   \item Return all elements of \varcode{x} that are less than 5.
%   \end{enumerate}
%   \pagebreak[4]
% \end{exercise}