\chapter{Debugging} \centerline{\includegraphics[width=0.7\textwidth]{xkcd_debugger}\rotatebox{90}{\footnotesize\url{www.xkcd.com}}}\vspace{4ex} When writing a program from scratch, we almost always make mistakes. Accordingly, a substantial amount of time goes into finding and fixing errors. This process is called \codeterm{debugging}. Don't be frustrated when a program you have written does not work as intended and produces errors. It is quite exceptional for a program to work on the first try; in fact, this should make you suspicious. In this chapter we will talk about typical mistakes, how to read and understand error messages, how to actually debug your program code, and some hints that help to minimize errors. \section{Types of errors and error messages} There are a number of different classes of programming errors, and it is good to know the common ones. Some errors lead to corrupted syntax or invalid operations, and \matlab{} will \codeterm{throw} an error. Throwing an error ends the execution of the program, and an error message is shown in the command window. With such messages \matlab{} tries to explain what went wrong and to provide a hint on the possible cause. Bugs that terminate the execution may be annoying, but they are generally easier to find and fix than logical errors, which stay hidden and leave the results of, e.g., an analysis seemingly correct. \begin{important}[Try --- catch] There are ways to \codeterm{catch} errors during \codeterm{runtime} (i.e. while the program is executed) and to handle them in the program.
\begin{lstlisting}[label=trycatch, caption={Try catch clause}]
try
    y = function_that_throws_an_error(x);
catch
    y = 0;
end
\end{lstlisting}
This way of handling errors may seem rather convenient but is risky. Catching an error thrown by a function in the \codeterm{catch} clause keeps your command window clean but may obscure logical errors!
Take care when using the \codeterm{try-catch clause}. \end{important} \subsection{\codeterm{Syntax error}} Syntax errors are the most common type of error and the easiest to fix. A syntax error violates the rules (spelling and grammar) of the programming language. For example, every opening parenthesis must be matched by a closing one, and every \keyword{for} loop has to be closed by an \keyword{end}. Usually, the respective error messages are clear, and the editor will point out and highlight most \codeterm{syntax error}s.
\begin{lstlisting}[label=syntaxerror, caption={Unbalanced parenthesis error.}]
>> mean(random_numbers
                      |
Error: Expression or statement is incorrect--possibly unbalanced (, {, or [.

Did you mean:
>> mean(random_numbers)
\end{lstlisting}
\subsection{\codeterm{Indexing error}} Second on the list of common errors are indexing errors. Once you know what they mean, \matlab{}'s messages usually give rather precise information about the cause. Consider the following code.
\begin{lstlisting}[label=indexerror, caption={Indexing errors.}]
>> my_array = (1:100);
>> % first try: index 0
>> my_array(0)
Subscript indices must either be real positive integers or logicals.
>> % second try: negative index
>> my_array(-1)
Subscript indices must either be real positive integers or logicals.
>> % third try: a floating point number
>> my_array(5.7)
Subscript indices must either be real positive integers or logicals.
>> % fourth try: a character
>> my_array('z')
Index exceeds matrix dimensions.
>> % fifth try: another character
>> my_array('A')
ans = 65
\end{lstlisting}
The first two indexing attempts in listing~\ref{indexerror} are rather clear: we are trying to access elements with invalid indices. Remember, indices in \matlab{} start with 1; zero and negative numbers are not permitted. In the third attempt we index using a floating point number. This fails because indices have to be integer values.
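To understand the fourth and fifth attempts in listing~\ref{indexerror}, it helps to look at how \matlab{} represents characters. The following minimal sketch converts the two characters into numbers with \code{double()} and compares them with the number of elements of \varcode{my\_array}; the variable names \varcode{code\_z}, \varcode{code\_A} and \varcode{num\_elements} are chosen only for this illustration.

\begin{lstlisting}[caption={Characters used as indices are converted to their numeric codes.}]
my_array = (1:100);
% characters are stored as numeric character codes;
% double() reveals the number that is used as an index
code_z = double('z')            % 122: larger than the 100 elements of my_array
code_A = double('A')            % 65: a valid index
num_elements = numel(my_array)  % 100
% indexing with 'A' addresses the same element as indexing with 65
isequal(my_array('A'), my_array(65))
\end{lstlisting}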
Using a character as an index (fourth attempt) leads to a different error message, which says that the index exceeds the matrix dimensions. This indicates that we are trying to read data beyond the end of our variable \varcode{my\_array}, which has only 100 elements. One might have expected a character to be an invalid index, but apparently it is valid, just too large. The fifth attempt finally succeeds. But why? \matlab{} implicitly converts the \codeterm{char} into a number, its character code (122 for \varcode{'z'}, 65 for \varcode{'A'}), and uses this number to address an element of \varcode{my\_array}. \subsection{\codeterm{Assignment error}} This error occurs when we try to write data into a vector, for example when the number of values on the right-hand side of an assignment does not match the number of elements addressed on the left-hand side. \paragraph{Name error:} \paragraph{Arithmetic error:} \section{Logical error} Sometimes a program runs smoothly and terminates without any error. This, however, does not necessarily mean that the program is correct. We may have made a \codeterm{logical error}. Logical errors are hard to find: \matlab{} has no way of detecting them and cannot help us fix the bugs that originate from them. We are on our own, but there are a few strategies that should help us. \begin{enumerate} \item Be sceptical, especially when a program executes without any complaint on the first try. \item Clean code: Structure your code so that you can easily read it. Comment, but only where necessary. Indent your code correctly. Use descriptive variable and function names. \item Keep it simple (see below). \item Read error messages and try to understand what \matlab{} wants to tell you. \item Use scripts and functions and call them from the command line. \matlab{} can then provide you with more information and will point to the line where the error happens. \item If you still find yourself in trouble, apply debugging strategies to find and fix bugs (see below). \end{enumerate} \subsection{Avoiding errors} It would be great if we could just sit down, write a program, run it and be done. Most likely this will not happen.
Rather, we will make mistakes and have to debug the code. There are a few guidelines that help to reduce the number of errors. \subsection{The KISS principle: `Keep it small and simple' or `Keep it simple, stupid'} \shortquote{Debugging time increases as a square of the program's size.}{Chris Wenham} Break down your programming problems into small parts (functions) that do exactly one thing. This has already been discussed in the context of writing scripts and functions. Partly, this is just about not being overwhelmed by 1000 lines of code. Further, with each task that you incorporate into the same script, the probability of naming conflicts (same or similar names for variables) increases. Remembering the meaning of a variable that was defined at the beginning of a long script is just hard. \shortquote{Everyone knows that debugging is twice as hard as writing a program in the first place. So if you're as clever as you can be when you write it, how will you ever debug it?}{Brian Kernighan} Many tasks within an analysis can be squashed into a single line of code. This saves some space in the file, reduces the effort of coming up with variable names, and simply looks so much more competent than a collection of very simple lines. Consider listing~\ref{easyvscomplicated}. Both parts of the listing solve essentially the same problem, but the second one breaks the task down into a sequence of easy-to-understand commands. Finding logical and also syntactic errors is much easier in the second case. The first version is perfectly fine, but it requires a deep understanding of the applied functions and also of the task at hand. \begin{lstlisting}[label=easyvscomplicated, caption={Converting a series of spike times into the firing rate as a function of time. Many tasks can be solved with a single line of code.
But is this readable?}]
% the one-liner
rate = conv(full(sparse(1, round(spike_times/dt), 1, 1, length(time))), kernel, 'same');

% easier to read
rate = zeros(size(time));
spike_indices = round(spike_times/dt);
rate(spike_indices) = 1;  % caution: unlike sparse(), several spikes in the same bin count only once
rate = conv(rate, kernel, 'same');
\end{lstlisting}
The preferred way depends on several considerations: (i) How deep is your personal understanding of the programming language? (ii) What about the programming skills of your target audience or of other people who may depend on your code? (iii) Is one solution faster, or does it use fewer resources than the other? (iv) How much do you have to invest in developing the most elegant solution, relative to its importance in the project? The decision is up to you. \subsection{Read error messages carefully and call programs from the command line.} \section{Error messages} \begin{ibox}[tp]{\label{stacktracebox}Stacktrace or Stack Traceback} \end{ibox} It helps enormously if scripts and functions that belong together are located in the same folder on the hard disk. It is therefore a good idea to create a separate folder for each analysis and to store the corresponding \codeterm{m-files} in it. Deeper nesting into further subfolders is usually not necessary. \matlab{} creates a ``MATLAB'' folder in your own \file{Documents} (Linux) or \file{Eigene Dokumente} (Windows) folder. It makes sense to use this folder as the root directory for your own work. Of course, any other location can be chosen as well. In the example in \figref{fileorganizationfig}, a separate subfolder is created within this folder for each project, and within each project folder a further subfolder is created for each problem or analysis. These subfolders contain both the required \codeterm{m-files} and the results of the analysis (figures, data files). Two further things are worth noting.
In the project folder there is a script (analysis.m) that is meant to run all analyses. Furthermore, next to the project folders there is a \file{functions} folder containing functions that are needed in more than one project or analysis. \begin{figure}[tp] \includegraphics[width=0.75\textwidth]{no_bug} \titlecaption{\label{fileorganizationfig} Possible organization of program code in the file system.}{ For each project, subfolders are created for the individual analyses. At the project level there could be a script (here ``analysis.m'') that starts all analyses in the subfolders.} \end{figure} \section{Naming functions and scripts} \matlab{} finds functions and scripts exclusively by their name, and upper and lower case matter: the two files \file{test\_funktion.m} and \file{Test\_Funktion.m} can name two different functions. Of course, this kind of variation of a name does not make sense. It carries no information about the difference between the two functions, and the name says next to nothing about the purpose of the function. Finding names is not always easy; sometimes it is even the hardest aspect of programming! But finding expressive names pays off. Expressive means that the name allows conclusions about the purpose. \begin{important}[Naming functions and scripts] The names of functions and scripts should say as much as possible about how they work or what their purpose is (\file{firingrates.m} instead of \file{uebung.m}). Good names for functions and scripts are the best documentation. \end{important} \matlab{} does not allow spaces, special characters, or umlauts in names. Names must also not start with a digit. Beyond that, \matlab{} imposes no further rules on naming.
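Whether a name is valid, i.e. starts with a letter and contains only letters, digits and underscores, can be checked with the built-in function \code{isvarname()}, which returns true (1) for valid names and false (0) otherwise. The same rules apply to the names of functions and scripts (without the \file{.m} extension). The example names below are made up for illustration; note that \code{isvarname()} also rejects keywords such as \keyword{for}.

\begin{lstlisting}[caption={Checking names with isvarname().}]
% isvarname() returns logical 1 (true) for valid names
isvarname('spike_count')   % 1: letters, digits and underscores only
isvarname('2nd_trial')     % 0: starts with a digit
isvarname('spike count')   % 0: contains a space
isvarname('firing-rate')   % 0: '-' is a special character
isvarname('for')           % 0: for is a keyword
\end{lstlisting}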
The names of \matlab{}'s built-in functions do, however, follow certain patterns: \begin{itemize} \item Names are usually written in lowercase. \item Abbreviations are used frequently (e.g. \code{xcorr()} for the cross-correlation or \code{repmat()} for ``repeat matrix''). \item Functions that convert between formats are typically named according to the pattern ``format2format'' (e.g. \code{num2str()} for the conversion ``number to string'', i.e. converting a numeric value into text). \end{itemize} \begin{important}[Naming variables] The names of variables should say as much as possible about their content (\varcode{spike\_count} instead of \varcode{x}). Good names for variables are the best documentation. \end{important}
\begin{lstlisting}[label=chaoticcode, caption={Confusing implementation of the random walk.}]
\end{lstlisting}
\pagebreak[4]
\begin{lstlisting}[label=cleancode, caption={Clean implementation of the random walk.}]
num_runs = 10;
max_steps = 1000;
positions = zeros(max_steps, num_runs);

for run = 1:num_runs
    for step = 2:max_steps
        x = randn(1);
        if x < 0
            positions(step, run) = positions(step-1, run) + 1;
        else
            positions(step, run) = positions(step-1, run) - 1;
        end
    end
end
\end{lstlisting}
% \begin{exercise}{logicalVector.m}{logicalVector.out}
%   Create a vector \varcode{x} with the values 0--10.
%   \begin{enumerate}
%   \item Execute: \varcode{y = x < 5}
%   \item Print the content of \varcode{y} to the screen.
%   \item What is the data type of \varcode{y}?
%   \item Return all elements of \varcode{x} that are smaller than 5.
%   \end{enumerate}
%   \pagebreak[4]
% \end{exercise}
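In the spirit of being sceptical about a program that runs without complaint, properties that a result must have can be checked with \code{assert()}, which throws an error with the given message whenever its condition is false. The following sketch repeats the random walk from listing~\ref{cleancode} in compact form so that it runs on its own, and then checks the size of \varcode{positions}, the starting positions, and the step sizes. Which checks make sense always depends on the problem at hand; these three are only examples.

\begin{lstlisting}[caption={Sanity checks for the random walk using assert().}]
num_runs = 10;
max_steps = 1000;
positions = zeros(max_steps, num_runs);
for run = 1:num_runs
    for step = 2:max_steps
        if randn(1) < 0
            positions(step, run) = positions(step-1, run) + 1;
        else
            positions(step, run) = positions(step-1, run) - 1;
        end
    end
end

% sanity checks: assert() throws an error with the given message
% if its condition is false
assert(isequal(size(positions), [max_steps, num_runs]), ...
       'positions has an unexpected size');
assert(all(positions(1, :) == 0), 'every walk must start at position 0');
step_sizes = abs(diff(positions));  % diff() works along the rows, i.e. along the steps
assert(all(step_sizes(:) == 1), 'every step must be +1 or -1');
\end{lstlisting}

If one of the conditions is violated, the program stops with an error message pointing to the failed check, instead of silently producing a wrong result.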