\chapter{Debugging}

\centerline{\includegraphics[width=0.7\textwidth]{xkcd_debugger}\rotatebox{90}{\footnotesize\url{www.xkcd.com}}}\vspace{4ex}

When writing a program from scratch we almost always make mistakes. Accordingly, a substantial amount of time is spent on finding and fixing errors. This process is called \codeterm{debugging}. Don't be frustrated if a self-written program does not work as intended and produces errors. It is quite exceptional for a program to work on the first try; in fact, this should make you suspicious.

In this chapter we will talk about typical mistakes, how to read and understand error messages, how to actually debug your program code, and some hints that help to minimize errors.

\section{Types of errors and error messages}

There are a number of different classes of programming errors and it is good to know the common ones. Some of your programming errors will lead to violations of the syntax or to invalid operations that cause \matlab{} to \codeterm{throw} an error. Throwing an error ends the execution of a program and an error message is shown in the command window. With such messages \matlab{} tries to explain what went wrong and to provide a hint on the possible cause.

Bugs that terminate the execution may be annoying but are generally easier to find and to fix than logical errors, which stay hidden while the results of, e.g., an analysis are seemingly correct.

\begin{important}[Try --- catch]
There are ways to \codeterm{catch} errors during \codeterm{runtime} (i.e. when the program is executed) and handle them in the program.

\begin{lstlisting}[label=trycatch, caption={Try catch clause}]
try
    y = function_that_throws_an_error(x);
catch
    y = 0;
end
\end{lstlisting}

This way of solving errors may seem rather convenient but is risky.
Having a function throw an error and catching it in the \codeterm{catch} clause will keep your command line clean but may obscure logical errors! Take care when using the \codeterm{try-catch clause}.
\end{important}

\subsection{\codeterm{Syntax errors}}\label{syntax_error}
Syntax errors are the most common type of error and the easiest to fix. A syntax error violates the rules (spelling and grammar) of the programming language. For example, every opening parenthesis must be matched by a closing one and every \code{for} loop has to be closed by an \code{end}. Usually, the respective error messages are clear and the editor will point out and highlight most \codeterm{syntax error}s.

\begin{lstlisting}[label=syntaxerror, caption={Unbalanced parenthesis error.}]
>> mean(random_numbers
                      |
Error: Expression or statement is incorrect--possibly unbalanced (, {, or [.

Did you mean:
>> mean(random_numbers)
\end{lstlisting}

\subsection{\codeterm{Indexing error}}\label{index_error}
Second on the list of common errors are the indexing errors. Usually \matlab{} gives rather precise information about the cause, once you know what the messages mean. Consider the following code.

\begin{lstlisting}[label=indexerror, caption={Indexing errors.}]
>> my_array = (1:100);
>> % first try: index 0
>> my_array(0)
Subscript indices must either be real positive integers or logicals.

>> % second try: negative index
>> my_array(-1)
Subscript indices must either be real positive integers or logicals.

>> % third try: a floating point number
>> my_array(5.7)
Subscript indices must either be real positive integers or logicals.

>> % fourth try: a character
>> my_array('z')
Index exceeds matrix dimensions.

>> % fifth try: another character
>> my_array('A')

ans =
    65 % wtf ?!?
\end{lstlisting}

The first two indexing attempts in listing \ref{indexerror} are rather clear. We are trying to access elements with indices that are invalid. Remember, indices in \matlab{} start with 1. Negative numbers and zero are not permitted.
In the third attempt we index using a floating point number. This fails because indices have to be integer values. Using a character as an index (fourth attempt) leads to a different error message that says that the index exceeds the matrix dimensions. This indicates that we are trying to read data beyond the end of our variable \varcode{my\_array}, which has 100 elements. One could have expected that the character is an invalid index, but apparently it is valid and simply too large. The fifth attempt finally succeeds. But why? \matlab{} implicitly converts the \codeterm{char} to a number and uses this number to address the element in \varcode{my\_array}. The character \code{'A'} has the ASCII code 65 and thus the 65th element of \varcode{my\_array} is returned. The character \code{'z'} of the fourth attempt has the ASCII code 122, which exceeds the 100 elements of the array.

\subsection{\codeterm{Assignment error}}
Closely related to the indexing error, this error occurs when we want to write data into a variable that does not fit into it. Listing \ref{assignmenterror} shows the simple case for 1-d data but, of course, it extends to n-dimensional data. The data that is to be filled into a matrix has to fit in all dimensions. The first assignment fails because \varcode{b} has 11 elements while \code{a(1:10)} addresses only 10. The assignment \code{a(100:110) = b} works because \matlab{} automatically extends the matrix if you assign values to a range outside its bounds.

\begin{lstlisting}[label=assignmenterror, caption={Assignment errors.}]
>> a = zeros(1, 100);
>> b = 0:10;

>> a(1:10) = b;
In an assignment A(:) = B, the number of elements in A and B must be the same.

>> a(100:110) = b;
>> size(a)

ans =
     1   110
\end{lstlisting}

\subsection{\codeterm{Dimension mismatch error}}
Similarly, some arithmetic operations are only valid if the variables fulfill some size constraints. Consider the following commands (listing\,\ref{dimensionmismatch}). The first one (line 3) fails because we are trying to do an elementwise addition of two vectors that have different lengths.
The matrix multiplication in line 6 also fails since for this operation to succeed the inner matrix dimensions must agree (for more information on the matrix multiplication see box\,\ref{matrixmultiplication} in chapter\,\ref{programming}). The elementwise multiplication issued in line 10 fails for the same reason as the addition we tried earlier. Sometimes, however, things apparently work but the result may be surprising. The last operation in listing\,\ref{dimensionmismatch} does not throw an error, but instead of the expected elementwise multiplication it returns a $100 \times 10$ matrix containing all pairwise products.

\begin{lstlisting}[label=dimensionmismatch, caption={Some arithmetic operations impose size constraints; violating them leads to dimension mismatch errors.}]
>> a = randn(100, 1);
>> b = randn(10, 1);
>> a + b
Matrix dimensions must agree.

>> a * b % The matrix multiplication!
Error using *
Inner matrix dimensions must agree.

>> a .* b
Matrix dimensions must agree.

>> c = a .* b'; % works but the result may not be what you expected!
>> size(c)

ans =
   100    10
\end{lstlisting}

\section{Logical error}
Sometimes a program runs smoothly and terminates without any complaint. This, however, does not necessarily mean that the program is correct. We may have made a \codeterm{logical error}. Logical errors are hard to find: \matlab{} has no chance of detecting such an error and cannot help us fix bugs originating from it. We are on our own but there are a few strategies that should help us.

\begin{enumerate}
\item Be sceptical: especially when a program executes without any complaint on the first try.
\item Clean code: Structure your code so that you can easily read it. Comment, but only where necessary. Correctly indent your code. Use descriptive variable and function names.
\item Keep it simple.
\item Use scripts and functions and call them from the command line. \matlab{} can then provide you with more information. It will then point to the line where the error happens.
\item If you still find yourself in trouble: Apply debugging strategies to find and fix bugs (below).
\end{enumerate}

\subsection{Avoiding errors --- Keep it small and simple}
It would be great if we could just sit down, write a program, run it, and be done with the task. Most likely this will not happen. Rather, we will make mistakes and have to debug the code. There are a few guidelines that help to reduce the number of errors.

\shortquote{Debugging time increases as a square of the program's size.}{Chris Wenham}

\shortquote{Everyone knows that debugging is twice as hard as writing a program in the first place. So if you're as clever as you can be when you write it, how will you ever debug it?}{Brian Kernighan}

Break down your programming problems into small parts (functions) that do exactly one thing and are thus easily testable. This has already been discussed in the context of writing scripts and functions. In part, this is just a matter of not feeling overwhelmed by 1000 lines of code. Further, with each task that you incorporate into the same script the probability of naming conflicts (same or similar names for variables) increases. Remembering the meaning of a certain variable that was defined at the beginning of the script is simply hard.

Many tasks within an analysis can be squashed into a single line of code. This saves some space in the file, reduces the effort of coming up with variable names and simply looks so much more competent than a collection of very simple lines. Consider the following listing (listing~\ref{easyvscomplicated}). Both parts of the listing solve the same problem but the second one breaks the task down into a sequence of easy-to-understand commands. Finding logical and also syntactic errors is much easier in the second case. The first version is perfectly fine but it requires a deep understanding of the applied functions and also of the task at hand.
\begin{lstlisting}[label=easyvscomplicated, caption={Converting a series of spike times into the firing rate as a function of time. Many tasks can be solved with a single line of code. But is this readable?}]
% the one-liner
rate = conv(full(sparse(1, round(spike_times/dt), 1, 1, length(time))), kernel, 'same');

% easier to read
rate = zeros(size(time));
spike_indices = round(spike_times/dt);
rate(spike_indices) = 1;
rate = conv(rate, kernel, 'same');
\end{lstlisting}

The preferred way depends on several considerations. (i) How deep is your personal understanding of the programming language? (ii) What about the programming skills of your target audience or other people that may depend on your code? (iii) Is one solution faster, or does it use fewer resources, than the other? (iv) How much do you have to invest into the development of the most elegant solution relative to its importance in the project? The decision is yours.

\section{Debugging strategies}
If you find yourself in trouble you can apply a few strategies to solve the problem.

\begin{enumerate}
\item Lean back and take a breath.
\item Read the error messages and identify the line or command where the error happens. Unfortunately, the position that breaks is not always the line or command that really introduced the bug. In some instances the actual error hides a few lines above.
\item No idea what the error message is trying to say? Google it!
\item Read the program line by line and understand what each line is doing.
\item Use \code{disp} to print out relevant information on the command line and compare the output with your expectations. Do this step by step and start at the beginning.
\item Use the \matlab{} debugger to stop execution of the code at a specific line and proceed step by step. Be sceptical and test all steps for correctness.
\item Call for help and explain the program to someone else. When you do this, start at the beginning and walk through the program line by line.
Often it is not necessary that the other person is a programmer or exactly understands what is going on. Often, it is your own reflection on the problem and the chosen approach that helps to find the bug. (This strategy is also known as \codeterm{Rubber duck debugging}.)
\end{enumerate}

\subsection{Debugger}
The \matlab{} editor (figure\,\ref{editor_debugger}) supports interactive debugging. Once you save an m-file in the editor and it passes the syntax check, i.e. the little box in the upper right corner of the editor window is green or orange, you can set one or several \codeterm{breakpoint}s. When the program is executed by calling it from the command line it will be stopped at the line with the breakpoint. In the editor this is indicated by a green arrow. The command line will change to indicate that we are now stopped in debug mode (listing\,\ref{debuggerlisting}).

\begin{figure}
  \centering
  \includegraphics[width=\linewidth]{editor_debugger.png}
  \caption{Screenshot of the \matlab{} m-file editor. Once a file is saved and passes the syntax check (the indicator in the top-right corner of the editor window turns green or orange), a breakpoint can be set. Breakpoints can be set either using the dropdown menu on top or by clicking the line number on the left margin. An active breakpoint is indicated by a red dot. The line at which the program execution was stopped is indicated by the green arrow.}\label{editor_debugger}
\end{figure}

\begin{lstlisting}[label=debuggerlisting, caption={Command line when the program execution was stopped in the debugger.}]
>> simplerandomwalk
6    for run = 1:num_runs
K>>
\end{lstlisting}

When stopped in the debugger we can view and change the state of the program at this point. We can also issue commands to try out the next steps. Beware, however: variables can be altered or even deleted in debug mode, which might affect the execution of the remaining code.
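
The following listing is a minimal sketch of such an inspection. It assumes that execution is stopped at the breakpoint shown in listing\,\ref{debuggerlisting} and that the script has already defined \varcode{num\_runs}. The commands are standard \matlab{} functions; the value assigned in the last line is made up for illustration and the output is omitted.

\begin{lstlisting}[label=debuggerinspect, caption={Inspecting and changing the program state while stopped in the debugger (illustrative sketch).}]
K>> whos              % list all variables of the current workspace
K>> disp(num_runs)    % print the current value of a variable
K>> dbstack           % show the chain of function calls that led here
K>> num_runs = 2;     % hypothetical value: this changes the remaining run!
\end{lstlisting}

The last command shows why care is needed: the loop that starts in line 6 would now run with the altered value.
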
The toolbar of the editor now offers a new set of tools for debugging:
\begin{enumerate}
\item \textbf{Continue} --- simply move on until the program terminates or the execution reaches the next breakpoint.
\item \textbf{Step} --- Execute the next command and stop.
\item \textbf{Step in} --- If the next command is a function call, step into it and stop at the first command.
\item \textbf{Step out} --- If execution is stopped inside a called function, run the rest of this function and stop right after it has returned to the caller.
\item \textbf{Run to cursor} --- Execute all statements up to the current cursor position.
\item \textbf{Quit debugging} --- Immediately stop the debugging session and stop the further code execution.
\end{enumerate}

The debugger offers some more (advanced) features but the functionality offered by the basic tools is often enough to debug a program.
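
The debugging tools can also be controlled from the command line, which is handy when working without the editor. The following listing sketches the command-line counterparts of the toolbar buttons listed above. It reuses the script name \varcode{simplerandomwalk} and line number 6 from listing\,\ref{debuggerlisting} as an example. The lines are not meant as one continuous session and no output is shown.

\begin{lstlisting}[label=debuggercommands, caption={Command-line counterparts of the debugging tools (illustrative sketch).}]
% each line can be used on its own, this is not one continuous session
>> dbstop in simplerandomwalk at 6  % set a breakpoint at line 6
>> dbstop if error                  % stop where a run-time error occurs
>> dbstatus                         % list all breakpoints
>> dbclear all                      % remove all breakpoints

K>> dbstep      % Step: execute the next line
K>> dbstep in   % Step in: enter the function called in the next line
K>> dbstep out  % Step out: finish the current function, stop in the caller
K>> dbcont      % Continue: run until the next breakpoint or the end
K>> dbquit      % Quit debugging
\end{lstlisting}

Setting \code{dbstop if error} before running a script is often a convenient start: the program then stops in debug mode at the line that throws the error, with all variables still available for inspection.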