\chapter{Debugging}
\centerline{\includegraphics[width=0.7\textwidth]{xkcd_debugger}\rotatebox{90}{\footnotesize\url{www.xkcd.com}}}\vspace{4ex}

When writing a program from scratch we almost always make
mistakes. Accordingly, a substantial amount of time is invested
in finding and fixing errors. This process is called
\codeterm{debugging}. Don't be frustrated if a self-written program
does not work as intended and produces errors. It is quite exceptional
for a program to work on the first try and, in fact, this
should leave you suspicious.

In this chapter we will talk about typical mistakes, how to read and
understand error messages, how to actually debug your program code, and
some hints that help to minimize errors.

\section{Types of errors and error messages}

There are a number of different classes of programming errors and it
is good to know the common ones. Some programming errors lead to
corrupted syntax or invalid operations, and \matlab{} will
\codeterm{throw} an error. Throwing an error ends the execution of a
program and an error message is shown in the command window. With
such messages \matlab{} tries to explain what went wrong and to
provide a hint on the possible cause.

Bugs that terminate the execution may be annoying, but they
are generally easier to find and fix than logical errors that stay
hidden while the results of, e.g., an analysis seem to be correct.

\begin{important}[Try --- catch]
  There are ways to \codeterm{catch} errors during \codeterm{runtime}
  (i.e. when the program is executed) and handle them in the program.

\begin{lstlisting}[label=trycatch, caption={Try catch clause}]
try
    y = function_that_throws_an_error(x);
catch
    % any error ends up here and is silently replaced by a default value
    y = 0;
end
\end{lstlisting}

  This way of handling errors may seem rather convenient but is
  risky. Catching an error in the \codeterm{catch} clause keeps your
  command line clean but may obscure logical errors! Take care when
  using the \codeterm{try-catch clause}.
\end{important}

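A less risky variant keeps the information about the error instead of
discarding it. The following is only a minimal sketch: the variable
names, the warning identifier \varcode{debugging:fallback}, and the
choice of \varcode{NaN} as default value are assumptions made for this
illustration.

\begin{lstlisting}[caption={Catching an error without hiding it (sketch).}]
x = [1 2 3];
try
    y = x(5);            % fails: x has only three elements
catch err
    % err is an MException object that describes what went wrong
    warning('debugging:fallback', ...
            'Could not compute y (%s). Using NaN instead.', err.message);
    y = NaN;             % a default value that is easy to spot later
end
\end{lstlisting}

If the program cannot sensibly continue, calling \code{rethrow(err)}
inside the \codeterm{catch} clause passes the error on instead of
replacing it by a default value.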
\subsection{\codeterm{Syntax error}}
Syntax errors are the most common type of error and the easiest to
fix. A syntax error violates the rules (spelling and grammar) of the
programming language. For example, every opening parenthesis must be
matched by a closing one and every \keyword{for} loop has to be closed
by an \keyword{end}. Usually, the respective error messages are clear
and the editor will point out and highlight most \codeterm{syntax error}s.

\begin{lstlisting}[label=syntaxerror, caption={Unbalanced parenthesis error.}]
>> mean(random_numbers
                      |
Error: Expression or statement is incorrect--possibly unbalanced (, {, or [.

Did you mean:
>> mean(random_numbers)
\end{lstlisting}

\subsection{\codeterm{Indexing error}}
Second on the list of common errors are indexing errors. Usually
\matlab{} gives rather precise information about the cause, once you
know what the messages mean. Consider the following code.

\begin{lstlisting}[label=indexerror, caption={Indexing errors.}]
>> my_array = (1:100);
>> % first try: index 0
>> my_array(0)
Subscript indices must either be real positive integers or logicals.

>> % second try: negative index
>> my_array(-1)
Subscript indices must either be real positive integers or logicals.

>> % third try: a floating point number
>> my_array(5.7)
Subscript indices must either be real positive integers or logicals.

>> % fourth try: a character
>> my_array('z')
Index exceeds matrix dimensions.

>> % fifth try: another character
>> my_array('A')
ans =
    65  % no error, but why?
\end{lstlisting}

The first two indexing attempts in listing \ref{indexerror} are rather
clear. We are trying to access elements with indices that are
invalid. Remember, indices in \matlab{} start with 1. Negative numbers
and zero are not permitted. In the third attempt we index using a
floating point number. This fails because indices have to be integer
values. Using a character as an index (fourth attempt) leads to a
different error message that says that the index exceeds the matrix
dimensions. This indicates that we are trying to read data beyond the
end of our variable \varcode{my\_array}, which has 100 elements.
One could have expected that the character is an invalid index, but
apparently it is valid and simply too large. The fifth attempt finally
succeeds. But why? \matlab{} implicitly converts the \codeterm{char}
to a number and uses this number to address the element in
\varcode{my\_array}. The character \varcode{'A'} has the ASCII code 65
and thus the 65th element of \varcode{my\_array} is returned.

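The implicit conversion can be made visible with \code{double()}. The
following sketch also shows one possible way of checking an index
before using it. The variable names are made up for this illustration.

\begin{lstlisting}[caption={Characters as indices and a defensive index check (sketch).}]
my_array = (1:100);

code_A = double('A');       % numeric code of the character 'A'
code_z = double('z');       % numeric code of the character 'z'
element = my_array(code_A); % the same as my_array('A')
z_fits = code_z <= length(my_array); % false: 'z' points beyond the end

% check an index before using it
index = 5.7;
is_valid = isnumeric(index) && index == round(index) && ...
           index >= 1 && index <= length(my_array);
if ~is_valid
    disp('index is not a valid position in my_array');
end
\end{lstlisting}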
\subsection{\codeterm{Assignment error}}
This error occurs when we want to write data into a vector and the
number of values does not match the number of elements we are trying
to fill.

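A minimal sketch of such a mismatch, with variable names chosen only
for this illustration. The exact wording of the error message depends
on the \matlab{} version, which is why the message is displayed rather
than quoted here.

\begin{lstlisting}[caption={Assignment with mismatching sizes (sketch).}]
my_vector = zeros(1, 10);
new_values = [1 2 3];

try
    my_vector(1:5) = new_values;  % five slots on the left, three values on the right
catch err
    disp(err.message);
end

my_vector(1:3) = new_values;      % sizes match: this works
my_vector(4:5) = 7;               % a scalar is copied into every addressed slot
\end{lstlisting}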
\subsection{Name error}
\subsection{Arithmetic error}
\section{Logical error}
Sometimes a program runs smoothly and terminates without any
error. This, however, does not necessarily mean that the program is
correct. We may have made a \codeterm{logical error}. Logical errors
are hard to find: \matlab{} has no chance of detecting them and cannot
help us fix the bugs originating from them. We are on our own, but
there are a few strategies that should help us.

\begin{enumerate}
\item Be sceptical, especially when a program executes without any
  complaint on the first try.
\item Clean code: Structure your code so that you can easily read
  it. Comment, but only where necessary. Correctly indent your
  code. Use descriptive variable and function names.
\item Keep it simple (below).
\item Read error messages and try to understand what \matlab{} wants
  to tell you.
\item Use scripts and functions and call them from the command
  line. \matlab{} can then provide you with more information and
  will point to the line where the error happens.
\item If you still find yourself in trouble: Apply debugging
  strategies to find and fix bugs (below).
\end{enumerate}

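One way of being sceptical is to write expectations about intermediate
results into the code with \code{assert()}. This turns a hidden logical
error into an error message. The data and the variable names in this
sketch are invented for the purpose of illustration.

\begin{lstlisting}[caption={Sanity checks with assert (sketch).}]
% hypothetical data: spike counts of five trials
spike_counts = [12 15 9 11 14];
trial_duration = 2.0;          % seconds
firing_rates = spike_counts / trial_duration;

% each assert throws an error as soon as its condition is false
assert(all(firing_rates >= 0), 'Firing rates must not be negative!');
assert(length(firing_rates) == length(spike_counts), ...
       'Expected exactly one firing rate per trial!');
\end{lstlisting}

Such checks do not prove that a program is correct, but they document
the assumptions and stop the program close to the place where an
assumption is violated.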
\subsection{Avoiding errors}
It would be great if we could just sit down, write a program, run it,
and be done. Most likely this will not happen. Rather, we will make
mistakes and have to debug the code. There are a few guidelines that
help to reduce the number of errors.

\subsection{The KISS principle: `Keep it small and simple' or `Keep it simple and stupid'}

\shortquote{Debugging time increases as a square of the program's
  size.}{Chris Wenham}

Break down your programming problems into small parts (functions) that
do exactly one thing. This has already been discussed in the context
of writing scripts and functions. In part, this is simply a matter of
not feeling overwhelmed by 1000 lines of code. Further, with each task
that you incorporate into the same script the probability of naming
conflicts (same or similar names for variables) increases. Remembering
the meaning of a variable that was defined at the beginning of a long
script is simply hard.

\shortquote{Everyone knows that debugging is twice as hard as writing
  a program in the first place. So if you're as clever as you can be
  when you write it, how will you ever debug it?}{Brian Kernighan}

Many tasks within an analysis can be squashed into a single line of
code. This saves some space in the file, reduces the effort of coming
up with variable names, and simply looks so much more competent than a
collection of very simple lines. Consider the following listing
(listing~\ref{easyvscomplicated}). Both parts of the listing solve the
same problem, but the second one breaks the task down into a sequence of
easy-to-understand commands. Finding logical and also syntactic errors
is much easier in the second case. The first version is perfectly fine,
but it requires a deep understanding of the applied functions and also
of the task at hand.

\begin{lstlisting}[label=easyvscomplicated, caption={Converting a series of spike times into the firing rate as a function of time. Many tasks can be solved with a single line of code. But is this readable?}]
% the one-liner
rate = conv(full(sparse(1, round(spike_times/dt), 1, 1, length(time))), kernel, 'same');

% easier to read
rate = zeros(size(time));               % one bin per time step
spike_indices = round(spike_times/dt);  % convert spike times to bin indices
rate(spike_indices) = 1;                % mark the bins that contain a spike
rate = conv(rate, kernel, 'same');      % smooth the spike train with the kernel
\end{lstlisting}

The preferred way depends on several considerations. (i) How deep is
your personal understanding of the programming language? (ii) What
about the programming skills of your target audience or of other people
who may depend on your code? (iii) Is one solution faster, or does it
use fewer resources than the other? (iv) How much do you have to invest
into the development of the most elegant solution relative to its
importance in the project? The decision is up to you.

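Whichever version you prefer, it is worth checking that both really
produce the same result. The following sketch runs the two versions of
listing~\ref{easyvscomplicated} on a small test case. The test data,
the Gaussian kernel, and all parameter values are assumptions made
only for this check.

\begin{lstlisting}[caption={Comparing the one-liner with the step-by-step version (sketch).}]
% hypothetical test data
dt = 0.001;                                % time step in seconds
time = dt:dt:1.0;                          % time axis
spike_times = [0.1, 0.25, 0.4, 0.75];      % spike times in seconds
kernel_time = -0.05:dt:0.05;
kernel = exp(-0.5*(kernel_time/0.01).^2);  % Gaussian with 10 ms standard deviation
kernel = kernel / (sum(kernel)*dt);        % normalize to unit area

% the one-liner
rate_a = conv(full(sparse(1, round(spike_times/dt), 1, 1, length(time))), kernel, 'same');

% the step-by-step version
rate_b = zeros(size(time));
spike_indices = round(spike_times/dt);
rate_b(spike_indices) = 1;
rate_b = conv(rate_b, kernel, 'same');

max_difference = max(abs(rate_a - rate_b));
disp(max_difference);
\end{lstlisting}

Note that the two versions only agree as long as no two spikes fall
into the same time bin: \code{sparse()} adds up values that share the
same index, whereas the assignment in the second version just sets the
bin to one. This is exactly the kind of subtle difference that is easy
to overlook in a one-liner.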
\subsection{Read error messages carefully and call programs from the command line}

\section{Error messages}
\begin{ibox}[tp]{\label{stacktracebox}Stacktrace or Stack Traceback}
\end{ibox}

It helps a great deal if scripts and functions that belong together
are stored in the same folder on the hard drive. It is therefore a
good idea to create a separate folder for each analysis and to store
the corresponding \codeterm{m-files} in it. Deeper nesting into
further subfolders is usually not necessary. \matlab{} creates a
``MATLAB'' folder in your \file{Documents} (Linux) or \file{Eigene
  Dokumente} (Windows) folder. It is convenient to use this folder as
the root directory for your own work, but of course any other location
can be chosen as well. In the example in \figref{fileorganizationfig},
a separate subfolder is created inside this folder for each project,
which in turn contains a further subfolder for each problem or
analysis. These subfolders contain both the required
\codeterm{m-files} and the results of the analysis (figures, data
files). Two more things are worth noting. The project folder contains
a script (analysis.m) that is meant to call all analyses. In addition,
next to the project folders there is a \file{functions} folder that
holds functions needed in more than one project or analysis.

\begin{figure}[tp]
  \includegraphics[width=0.75\textwidth]{no_bug}
  \titlecaption{\label{fileorganizationfig} Possible organization of
    program code in the file system.}{For each project, subfolders
    are created for the individual analyses. At the project level
    there could be a script (here ``analysis.m'') that triggers all
    analyses in the subfolders.}
\end{figure}

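Functions in the shared \file{functions} folder are only found if
\matlab{} knows about this folder. A project-level script could take
care of this with \code{addpath()}. This is only a sketch, assuming
that the script is run from within the project folder and that the
\file{functions} folder sits next to it, as described above.

\begin{lstlisting}[caption={Making a shared functions folder available (sketch).}]
% assumed layout: the functions folder is located next to the project folder
functions_folder = fullfile('..', 'functions');

if exist(functions_folder, 'dir') == 7
    addpath(functions_folder);   % shared functions can now be called by name
else
    warning('Shared functions folder "%s" not found.', functions_folder);
end
\end{lstlisting}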
\section{Naming functions and scripts}

\matlab{} finds functions and scripts solely by their name. Names are
case sensitive. The two files \file{test\_funktion.m} and
\file{Test\_Funktion.m} can therefore name two different
functions. Varying a name in this way is, of course, not sensible: it
carries no information about the difference between the two
functions. Moreover, the name says next to nothing about the purpose
of the function.

Finding names is not always easy; sometimes it is even the most
difficult aspect of programming! Finding expressive names is worth the
effort, though. Expressive means that the purpose can be inferred from
the name.

\begin{important}[Naming functions and scripts]
  The names of functions and scripts should tell as much as possible
  about how they work or what they are for (\file{firingrates.m}
  instead of \file{uebung.m}, German for ``exercise''). Good names for
  functions and scripts are the best documentation.
\end{important}

\matlab{} does not allow spaces, special characters, or umlauts in
names. Names also must not start with a digit. Beyond that, it does
not impose any rules on naming. The names of the functions predefined
in \matlab{}, however, follow certain patterns:
\begin{itemize}
\item Names are always written in lower case.
\item Abbreviations are popular (e.g. \code{xcorr()} for
  cross-correlation or \code{repmat()} for ``repeat matrix'').
\item Functions that convert between formats are always named
  according to the pattern ``format2format'' (e.g. \code{num2str()}
  for the conversion ``number to string'', which turns a numeric
  value into text).
\end{itemize}

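Whether a name obeys these rules can be checked with
\code{isvarname()}, which tests for valid variable names. The
candidate names in this sketch are invented examples.

\begin{lstlisting}[caption={Checking candidate names with isvarname (sketch).}]
candidates = {'spike_count', 'firingrates', '2nd_try', 'spike count', 'rate-1'};
is_valid = cellfun(@isvarname, candidates);

for k = 1:length(candidates)
    fprintf('%-12s  valid: %d\n', candidates{k}, is_valid(k));
end
\end{lstlisting}

Only the first two candidates are accepted. The third starts with a
digit, the fourth contains a space, and the fifth contains a special
character.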
\begin{important}[Naming variables]
  The names of variables should tell as much as possible about their
  content (\varcode{spike\_count} instead of \varcode{x}). Good names
  for variables are the best documentation.
\end{important}

\begin{lstlisting}[label=chaoticcode, caption={Messy implementation of the random walk.}]

\end{lstlisting}

\pagebreak[4]

\begin{lstlisting}[label=cleancode, caption={Clear implementation of the random walk.}]
num_runs = 10;
max_steps = 1000;
positions = zeros(max_steps, num_runs);  % every walk starts at position 0

for run = 1:num_runs
    for step = 2:max_steps
        x = randn(1);
        if x < 0
            positions(step, run) = positions(step-1, run) + 1;
        else  % also covers x == 0, so no step is ever skipped
            positions(step, run) = positions(step-1, run) - 1;
        end
    end
end
\end{lstlisting}

% \begin{exercise}{logicalVector.m}{logicalVector.out}
%  Create a vector \varcode{x} with the values 0--10.
%  \begin{enumerate}
%   \item Execute: \varcode{y = x < 5}
%   \item Print the content of \varcode{y} to the screen.
%   \item What is the data type of \varcode{y}?
%   \item Return all elements of \varcode{x} that are smaller than 5.
%  \end{enumerate}
%  \pagebreak[4]
% \end{exercise}