\chapter{Debugging}
|
|
|
|
\centerline{\includegraphics[width=0.7\textwidth]{xkcd_debugger}\rotatebox{90}{\footnotesize\url{www.xkcd.com}}}\vspace{4ex}
|
|
|
|
|
|
When writing a program from scratch we almost always make
mistakes. Accordingly, a substantial amount of time is invested
into finding and fixing errors. This process is called
\entermde{Debugging}{debugging}. Don't be frustrated when a
self-written program does not work as intended and produces errors. It
is quite exceptional for a program to work on the first
try; if it appears to, this should in fact leave you suspicious.
|
|
|
|
In this chapter we will talk about typical mistakes, how to read and
|
|
understand error messages, how to actually debug your program code and
|
|
some hints that help to minimize errors.
|
|
|
|
\section{Types of errors and error messages}
|
|
|
|
There are a number of different classes of programming errors and it
is good to know the common ones. Some of your programming errors
will lead to violations of the syntax or to invalid operations that
cause \matlab{} to \code{throw} an error. Throwing an error
ends the execution of a program and an error message is
shown in the command window. With such messages \matlab{} tries to
explain what went wrong and to provide a hint on the possible cause.
|
|
|
|
Bugs that lead to the termination of the execution may be annoying but
are generally easier to find and to fix than logical errors, which stay
hidden while the results of, e.g., an analysis seem correct.
|
|
|
|
\begin{important}[Try --- catch]
|
|
There are ways to \code{catch} errors during \enterm{runtime}
|
|
(\determ{Laufzeit}, i.e. when the program is executed) and handle
|
|
them in the program.
|
|
|
|
\begin{lstlisting}[label=trycatch, caption={Try catch clause}]
|
|
try
|
|
y = function_that_throws_an_error(x);
|
|
catch
|
|
y = 0;
|
|
end
|
|
\end{lstlisting}
|
|
|
|
This way of handling errors may seem rather convenient but is
risky. Letting a function throw an error and catching it in the
\code{catch} clause will keep your command line clean but may obscure
logical errors! Take care when using the \code{try}-\code{catch}
clause.
|
|
\end{important}
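
A less risky variant keeps the error visible instead of silently
replacing the result. The following listing is a minimal sketch of this
idea, not a recommended general pattern. The variable \varcode{err} and
the fallback value are chosen here for illustration only.

\begin{lstlisting}[caption={Sketch: catching an error and reporting it.}]
x = [1 2 3];
try
    y = x(5);       % invalid: x has only three elements
catch err           % err holds information about the caught error
    % report what went wrong instead of hiding it
    warning('Falling back to y = 0: %s', err.message);
    y = 0;
end
\end{lstlisting}

The program continues with the fallback value, but the warning on the
command line still tells you that something went wrong.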
|
|
|
|
|
|
\subsection{Syntax errors}\label{syntax_error}
|
|
Syntax errors are the most common type of error and the easiest to fix. A
\entermde[error!syntax]{Fehler!Syntax@Syntax\texttildelow}{syntax error} violates the
rules (spelling and grammar) of the programming language. For example,
every opening parenthesis must be matched by a closing one and every
\code{for} loop has to be closed by an \code{end}. Usually, the
respective error messages are clear and the editor will point out and
highlight most syntax errors.
|
|
|
|
\begin{pagelisting}[label=syntaxerror, caption={Unbalanced parenthesis error.}]
|
|
>> mean(random_numbers
|
|
|
|
|
Error: Expression or statement is incorrect--possibly unbalanced (, {, or [.
|
|
|
|
Did you mean:
|
|
>> mean(random_numbers)
|
|
\end{pagelisting}
|
|
|
|
\subsection{Indexing error}\label{index_error}
|
|
Second on the list of common errors are the
\entermde[error!indexing]{Fehler!Index@Index\texttildelow}{indexing errors}. Usually
\matlab{} gives rather precise information about the cause, once you
know what the messages mean. Consider the following code.
|
|
|
|
\begin{lstlisting}[label=indexerror, caption={Indexing errors.}]
|
|
>> my_array = (1:100);
|
|
>> % first try: index 0
|
|
>> my_array(0)
|
|
Subscript indices must either be real positive integers or logicals.
|
|
|
|
>> % second try: negative index
|
|
>> my_array(-1)
|
|
Subscript indices must either be real positive integers or logicals.
|
|
|
|
>> % third try: a floating point number
|
|
>> my_array(5.7)
|
|
Subscript indices must either be real positive integers or logicals.
|
|
|
|
>> % fourth try: a character
|
|
>> my_array('z')
|
|
Index exceeds matrix dimensions.
|
|
|
|
>> % fifth try: another character
|
|
>> my_array('A')
|
|
ans =
|
|
65 % wtf ?!?
|
|
\end{lstlisting}
|
|
|
|
The first two indexing attempts in listing \ref{indexerror} are rather
clear. We are trying to access elements with indices that are
invalid. Remember, indices in \matlab{} start with 1. Negative numbers
and zero are not permitted. In the third attempt we index using a
floating point number. This fails because indices have to be integer
values. Using a character as an index (fourth attempt) leads to a
different error message that says that the index exceeds the matrix
dimensions. This indicates that we are trying to read data beyond the
length of our variable \varcode{my\_array}, which has 100 elements.
One could have expected that the character is an invalid index, but
apparently it is valid and simply too large. The fifth attempt finally
succeeds. But why? \matlab{} implicitly converts the character to a
number and uses this number to address the element in
\varcode{my\_array}. The character \varcode{'A'} has the ASCII code 65
and thus the 65th element of \varcode{my\_array} is returned.
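
The implicit conversion can be made visible with \code{double()}. The
following lines are a small illustration; the variable names are chosen
here and do not appear elsewhere in this chapter.

\begin{lstlisting}[caption={Sketch: characters are converted to numbers when used as indices.}]
code_A = double('A');   % 65, a valid index into 100 elements
code_z = double('z');   % 122, larger than the 100 elements of my_array

index = 'A';
if ischar(index)
    % a character index is almost certainly not what we wanted
    disp('The index is a character, check your code!');
end
\end{lstlisting}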
|
|
|
|
\subsection{Assignment error}
|
|
Related to the indexing error, an
\entermde[error!assignment]{Fehler!Zuweisungs@Zuweisungs\texttildelow}{assignment
  error} occurs when we want to write data into a variable that does
not fit into it. Listing \ref{assignmenterror} shows the simple case
for 1-d data but, of course, it extends to n-dimensional data. The
data that is to be filled into a matrix has to fit in all
dimensions. The command in line 7 works because \matlab{}
automatically extends the matrix if you assign values to a range
outside its bounds.
|
|
|
|
\begin{pagelisting}[label=assignmenterror, caption={Assignment errors.}]
>> a = zeros(1, 100);
>> b = 0:10;

>> a(1:10) = b;
In an assignment  A(:) = B, the number of elements in A and B must be the same.

>> a(100:110) = b;
>> size(a)
ans =
     1   110
\end{pagelisting}
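
One way to avoid this error is to derive the index range from the
number of elements that are to be assigned. The following lines are a
minimal sketch; the variable \varcode{target} is introduced here for
illustration only.

\begin{lstlisting}[caption={Sketch: matching the index range to the data.}]
a = zeros(1, 100);
b = 0:10;               % 11 elements, not 10!

target = 1:numel(b);    % index range with as many elements as b
a(target) = b;          % fits, a keeps its size of 1 x 100
\end{lstlisting}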
|
|
|
|
\subsection{Dimension mismatch error}
|
|
Similarly, some arithmetic operations are only valid if the variables
fulfill some size constraints. Consider the following commands
(listing\,\ref{dimensionmismatch}). The first one (line 3) fails
because we are trying to add two vectors of different lengths
elementwise. The matrix multiplication in line 6 also fails since for
this operation to succeed the inner matrix dimensions must agree (for
more information on the matrix multiplication see
box\,\ref{matrixmultiplication} in chapter\,\ref{programming}). The
elementwise multiplication issued in line 10 fails for the same reason
as the addition we tried earlier. Sometimes, however, things
apparently work but the result may be surprising. The last operation
in listing\,\ref{dimensionmismatch} does not throw an error, but the
result is not the expected elementwise multiplication of two
vectors. Instead, recent versions of \matlab{} expand the column vector
\varcode{a} and the row vector \varcode{b'} into a matrix with 100 rows
and 10 columns.
|
|
|
|
% XXX Some arithmetic operations make size constraints, violating them leads to dimension mismatch errors.
|
|
\begin{pagelisting}[label=dimensionmismatch, caption={Dimension mismatch errors.}]
|
|
>> a = randn(100, 1);
|
|
>> b = randn(10, 1);
|
|
>> a + b
|
|
Matrix dimensions must agree.
|
|
|
|
>> a * b % The matrix multiplication!
|
|
Error using *
|
|
Inner matrix dimensions must agree.
|
|
|
|
>> a .* b
|
|
Matrix dimensions must agree.
|
|
|
|
>> c = a .* b'; % works but the result may not be what you expected!
|
|
>> size(c)
|
|
ans =
|
|
100 10
|
|
\end{pagelisting}
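
If a surprising result like the last one would go unnoticed, the sizes
can be compared explicitly before the operation. The following listing
is a minimal sketch of such a check; the variable \varcode{same\_size}
is introduced here for illustration.

\begin{lstlisting}[caption={Sketch: checking sizes before an elementwise operation.}]
a = randn(100, 1);
b = randn(10, 1);

same_size = isequal(size(a), size(b));
if same_size
    c = a .* b;
else
    % report the sizes instead of computing something unintended
    fprintf('a is %dx%d, b is %dx%d\n', size(a), size(b));
end
\end{lstlisting}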
|
|
|
|
|
|
|
|
\section{Logical error}
|
|
Sometimes a program runs smoothly and terminates without any
complaint. This, however, does not necessarily mean that the program
is correct. We may have made a
\entermde[error!logical]{Fehler!logischer}{logical error}. Logical
errors are hard to find. \matlab{} has no chance to detect such errors
since they neither violate the syntax nor cause an error to be
thrown. Thus, we are on our own to find and fix the bug. There are a
few strategies that we can employ to solve the task.
|
|
|
|
\begin{enumerate}
|
|
\item Be sceptical: especially when a program executes without any
|
|
complaint on the first try.
|
|
\item Clean code: Structure your code so that you can easily read
  it. Comment, but only where necessary. Correctly indent your
  code. Use descriptive variable and function names.
|
|
\item Keep it simple.
|
|
\item Test your code by writing \entermde[unit test]{Modultest}{unit tests} that test every
|
|
aspect of your program (\ref{unittests}).
|
|
\item Use scripts and functions and call them from the command
|
|
line. \matlab{} can then provide you with more information. It will
|
|
then point to the line where the error happens.
|
|
\item If you still find yourself in trouble: Apply debugging
|
|
strategies to find and fix bugs (\ref{debugging}).
|
|
\end{enumerate}
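
A small illustration of a logical error: suppose we want the average
of all values in a matrix. Both lines in the following sketch run
without any complaint, but only the second one computes what we
intended. The variable names are chosen here for illustration.

\begin{lstlisting}[caption={Sketch: a logical error that does not throw an error.}]
data = [1 2 3; 4 5 6];

wrong = mean(data);      % averages each column: [2.5 3.5 4.5]
right = mean(data(:));   % averages all elements: 3.5
\end{lstlisting}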
|
|
|
|
|
|
\section{Avoiding errors}
|
|
It would be great if we could just sit down, write a program, run it,
and be done with the task. Most likely this will not happen. Rather,
we will make mistakes and have to debug the code. There are a few
guidelines that help to reduce the number of errors.
|
|
|
|
|
|
\subsection{Keep it small and simple}
|
|
|
|
\shortquote{Debugging time increases as a square of the program's
|
|
size.}{Chris Wenham}
|
|
|
|
\shortquote{Everyone knows that debugging is twice as hard as writing
|
|
a program in the first place. So if you're as clever as you can be
|
|
when you write it, how will you ever debug it?}{Brian Kernighan}
|
|
|
|
Break down your programming problems into small parts (functions) that
do exactly one thing and are thus easily testable. This has already
been discussed in the context of writing scripts and functions. In
part this is just a matter of not feeling overwhelmed by 1000 lines of
code. Further, with each task that you incorporate into the same
script the probability of naming conflicts (same or similar names for
variables) increases. Remembering the meaning of a certain variable
that was defined at the beginning of the script is simply hard.
|
|
|
|
Many tasks within an analysis can be squashed into a single line of
|
|
code. This saves some space in the file, reduces the effort of coming
|
|
up with variable names and simply looks so much more competent than a
|
|
collection of very simple lines. Consider the following listing
|
|
(listing~\ref{easyvscomplicated}). Both parts of the listing solve the
|
|
same problem but the second one breaks the task down to a sequence of
|
|
easy-to-understand commands. Finding logical and also syntactic errors
|
|
is much easier in the second case. The first version is perfectly fine
|
|
but it requires a deep understanding of the applied functions and also
|
|
the task at hand.
|
|
|
|
% XXX Converting a series of spike times into the firing rate as a function of time. Many tasks can be solved with a single line of code. But is this readable?
|
|
\begin{pagelisting}[label=easyvscomplicated, caption={One-liner versus readable code.}]
|
|
% the one-liner
|
|
rate = conv(full(sparse(1, round(spike_times/dt), 1, 1, length(time))), kernel, 'same');
|
|
|
|
% easier to read
|
|
rate = zeros(size(time));
|
|
spike_indices = round(spike_times/dt);
|
|
rate(spike_indices) = 1;
|
|
rate = conv(rate, kernel, 'same');
|
|
\end{pagelisting}
|
|
|
|
The preferred way depends on several considerations. (i) How deep is
your personal understanding of the programming language? (ii) What
about the programming skills of your target audience or other people
who may depend on your code? (iii) Is one solution faster, or does it use
fewer resources than the other? (iv) How much do you have to invest
into the development of the most elegant solution relative to its
importance in the project? The decision is yours.
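
A further advantage of the step-by-step version is that intermediate
results can be checked. The following listing is a sketch with made-up
example data (time step, spike times and a simple box kernel are
assumptions for illustration, not data from this chapter).

\begin{lstlisting}[caption={Sketch: checking intermediate results of the readable version.}]
% hypothetical example data
dt = 0.001;                       % time step in seconds
time = (0:999) * dt;              % 1000 time points
spike_times = [0.1, 0.25, 0.5];   % spike times in seconds
kernel = ones(1, 5) / 5;          % simple box kernel

spike_indices = round(spike_times / dt);
% every index must be a valid position on the time axis
assert(all(spike_indices >= 1 & spike_indices <= length(time)), ...
       'spike time outside of the time axis');

rate = zeros(size(time));
rate(spike_indices) = 1;
% no spike should get lost on the way
assert(sum(rate) == length(spike_times));
rate = conv(rate, kernel, 'same');
\end{lstlisting}

Such checks are hard to place inside the one-liner.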
|
|
|
|
|
|
\subsection{Unit tests}\label{unittests}
|
|
|
|
The idea of unit tests is to write small programs that test \emph{all}
functions of a program by comparing the program's results with
expectations. The pure lore of test-driven development requires that
the tests are written \textbf{before} the actual program is
written. In part, the tests put the \enterm{functional
  specification}, the agreement between customer and programmer, into
code. This helps to guarantee that the delivered program works as
specified. In the scientific context, we tend to be a little more
relaxed: we write unit tests where we think they are helpful and often
test only the obvious things. Writing \emph{complete} test suites that
lead to full \enterm{test coverage} is a lot of work and is often
considered a waste of time. The first claim is true, the second,
however, may be doubted. Suppose you change a tiny bit of an
existing program to adjust it to the current needs. How will you be
able to tell that it is still valid for the previous purpose? Of
course you could try it out and be satisfied if it terminates without
an error, but, remember, there may be logical errors hiding behind the
facade of a working program.
|
|
|
|
Writing unit tests costs time, but provides the means to guarantee
|
|
validity.
|
|
|
|
\subsubsection{Unit testing in \matlab{}}
|
|
|
|
\matlab{} offers a unit testing framework in which small scripts are
written that test the features of the program. We will follow the
example given in the \matlab{} help and assume that there is a
function \varcode{rightTriangle()} (listing\,\ref{trianglelisting}).
|
|
|
|
% XXX Slightly more readable version of the example given in the \matlab{} help system. Note: The variable name for the angles have been capitalized in order to not override the matlab defined functions \code{alpha, beta,} and \code{gamma}.
|
|
\begin{pagelisting}[label=trianglelisting, caption={Example function for unit testing.}]
function angles = rightTriangle(length_a, length_b)
    ALPHA = atand(length_a / length_b);
    BETA = atand(length_b / length_a);
    hypotenuse = length_a / sind(ALPHA);
    GAMMA = asind(hypotenuse * sind(ALPHA) / length_a);

    angles = [ALPHA BETA GAMMA];
end
\end{pagelisting}
|
|
|
|
This function expects two input arguments, the lengths of the
sides $a$ and $b$, and assumes a right angle between them. From this
information it calculates and returns the angles $\alpha, \beta,$ and
$\gamma$.
|
|
|
|
Let's test this function. To do so, create a script in the current
folder that obeys the following rules.
|
|
\begin{enumerate}
|
|
\item The name of the script file must start or end with the word
|
|
'test', which is case-insensitive.
|
|
\item Each unit test should be placed in a separate section/cell of the script.
|
|
\item After the \mcode{\%\%} that defines the cell, a name for the
|
|
particular unit test may be given.
|
|
\end{enumerate}
|
|
|
|
Further, there are a few things that are different in tests compared to normal scripts.
\begin{enumerate}
\item The code that appears before the first section forms the
  so-called \emph{shared variables section}. Variables defined there
  are available to all tests within this script.
\item In the \emph{shared variables section}, one can define
  preconditions necessary for the tests. If these preconditions are
  not met, the remaining tests will not be run and the test will be
  considered failed and incomplete.
\item When a script is run as a test, all variables that need to be
  accessible in all tests have to be defined in the \emph{shared
    variables section}.
\item Variables defined in other workspaces are not accessible to the
  tests.
\end{enumerate}
|
|
|
|
The test script for the \varcode{rightTriangle()} function
(listing\,\ref{trianglelisting}) may look like
listing\,\ref{testscript}.
|
|
|
|
\begin{pagelisting}[label=testscript, caption={Unit test for the \varcode{rightTriangle()} function stored in an m-file testRightTriangle.m}]
tolerance = 1e-10;

% preconditions
angles = rightTriangle(7, 9);
assert(angles(3) == 90, 'Fundamental problem: rightTriangle is not producing a right triangle')

%% Test 1: sum of angles
angles = rightTriangle(7, 7);
assert(abs(sum(angles) - 180) <= tolerance)

angles = rightTriangle(2, 2 * sqrt(3));
assert(abs(sum(angles) - 180) <= tolerance)

angles = rightTriangle(1, 150);
assert(abs(sum(angles) - 180) <= tolerance)

%% Test: isosceles triangles
angles = rightTriangle(4, 4);
assert(abs(angles(1) - 45) <= tolerance)
assert(angles(1) == angles(2))

%% Test: 30-60-90 triangle
angles = rightTriangle(2, 2 * sqrt(3));
assert(abs(angles(1) - 30) <= tolerance)
assert(abs(angles(2) - 60) <= tolerance)
assert(abs(angles(3) - 90) <= tolerance)

%% Test: Small angle approx
angles = rightTriangle(1, 1500);
smallAngle = (pi / 180) * angles(1); % radians
approx = sin(smallAngle);
assert(abs(approx - smallAngle) <= tolerance, 'Problem with small angle approximation')
\end{pagelisting}
|
|
|
|
In a test script we can execute any code. The actual test of whether
the results match our predictions is done using the
\code{assert()} function. This function expects a
boolean value and raises an error if it is not true. In the
context of a test, this error does not terminate the program. In
the tests above, the argument to \code{assert()} is always a boolean expression
that evaluates to \code{true} or \code{false}. Before the first unit
test (``Test 1: sum of angles'', which starts in line 7 of
listing\,\ref{testscript}) a precondition is defined. The test assumes
that the $\gamma$ angle must always be 90$^\circ$ since we aim for a
right triangle. If this is not true, the further tests will not be
executed. We further define a \varcode{tolerance} variable that is
used when comparing double values (Why might the test on equality of
double values be tricky?).
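
A hint towards the answer: floating point numbers are stored with
limited precision, so two computations that should give the same
number may differ by a tiny amount. The following lines are a short
illustration with values chosen here as an example.

\begin{lstlisting}[caption={Sketch: comparing double values with a tolerance.}]
a = 0.1 + 0.2;
b = 0.3;

exactly_equal = (a == b);                 % false!
tolerance = 1e-10;
nearly_equal = abs(a - b) <= tolerance;   % true
\end{lstlisting}

Note the \code{abs()} in the comparison: without it, any result that is
too small would also pass the test.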
|
|
|
|
\begin{pagelisting}[label=runtestlisting, caption={Run the test!}]
|
|
result = runtests('testRightTriangle')
|
|
\end{pagelisting}
|
|
|
|
During the run, \matlab{} prints error messages to the command
line. A summary of the test results is stored in the
\varcode{result} variable and can be displayed using the function
\code[table()]{table(result)}.
|
|
|
|
\begin{pagelisting}[label=testresults, caption={The test results.}, basicstyle=\ttfamily\scriptsize]
|
|
table(result)
|
|
ans =
|
|
4x6 table
|
|
|
|
Name Passed Failed Incomplete Duration Details
|
|
_________________________________ ______ ______ ___________ ________ ____________
|
|
|
|
'testR.../Test_SumOfAngles' true false false 0.011566 [1x1 struct]
|
|
'testR.../Test_IsoscelesTriangles' true false false 0.004893 [1x1 struct]
|
|
'testR.../Test_30_60_90Triangle' true false false 0.005057 [1x1 struct]
|
|
'testR.../Test_SmallAngleApprox' true false false 0.0049 [1x1 struct]
|
|
\end{pagelisting}
|
|
|
|
So far so good, all tests pass and our function appears to do what it
is supposed to do. But tests are only as good as the programmer who
designed them. The attentive reader may have noticed that the tests
only check a few conditions. But what if we passed something other than
a numeric value as the length of the sides $a$ and $b$? Or a negative
number, or zero?
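
To get an impression of what happens for such inputs, the following
sketch repeats the relevant lines of the function body from
listing\,\ref{trianglelisting} with a side of length zero. The guard
stored in \varcode{valid} is one possible check, introduced here for
illustration, and not part of the original function.

\begin{lstlisting}[caption={Sketch: an untested edge case.}]
length_a = 0;    % a degenerate ``triangle''
length_b = 5;

ALPHA = atand(length_a / length_b);    % 0 degrees
hypotenuse = length_a / sind(ALPHA);   % 0 / 0 gives NaN
GAMMA = asind(hypotenuse * sind(ALPHA) / length_a);
disp(isnan(GAMMA))                     % no error, but not a number

% a possible guard that could be tested before the computation
valid = isnumeric(length_a) && isscalar(length_a) && length_a > 0;
\end{lstlisting}

The function does not throw an error for this input. It silently
returns a result that is not a number, which is exactly the kind of
behavior a more complete test suite should uncover.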
|
|
|
|
|
|
|
|
\section{Debugging strategies}\label{debugging}
|
|
|
|
If you still find yourself in trouble you can apply a few strategies
|
|
that help to solve the problem.
|
|
|
|
\begin{enumerate}
|
|
\item Lean back and take a breath.
|
|
\item Read the error messages and identify the line or command where
|
|
the error happens. Unfortunately, the position that breaks is not
|
|
always the line or command that really introduced the bug. In some
|
|
instances the actual error hides a few lines above.
|
|
\item No idea what the error message is trying to say? Google it!
|
|
\item Read the program line by line and understand what each line is
|
|
doing.
|
|
\item Use \code{disp()} to print out relevant information on the command
|
|
line and compare the output with your expectations. Do this step by
|
|
step and start at the beginning.
|
|
\item Use the \matlab{} debugger to stop execution of the code at a
|
|
specific line and proceed step by step. Be sceptical and test all
|
|
steps for correctness.
|
|
\item Call for help and explain the program to someone else. When you
  do this, start at the beginning and walk through the program line by
  line. Often, it is not necessary that the other person is a
  programmer or understands exactly what is going on. Rather, it is
  your own reflection on the problem and the chosen approach that helps
  you find the bug (this strategy is also known as \enterm{Rubber
    ducking}).
|
|
\end{enumerate}
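
Printing intermediate values with \code{disp()} may look like the
following sketch. The loop and its variables are a made-up example and
stand for any computation whose state you want to follow.

\begin{lstlisting}[caption={Sketch: following the state of a loop with \varcode{disp()}.}]
values = [2, 4, 6, 8];
total = 0;
for k = 1:length(values)
    total = total + values(k);
    % print the state and compare it with what you expect
    disp(['k = ', num2str(k), ', total = ', num2str(total)]);
end
\end{lstlisting}

Remember to remove such output once the bug is found.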
|
|
|
|
|
|
\subsection{Debugger}
|
|
|
|
The \matlab{} editor (figure\,\ref{editor_debugger}) supports
|
|
interactive debugging. Once you save an m-file in the editor and it
|
|
passes the syntax check, i.e. the little box in the upper right corner
|
|
of the editor window is green or orange, you can set one or several
|
|
\entermde[break point]{Haltepunkt}{break points}. When the program is
|
|
executed by calling it from the command line it will be stopped at the
|
|
line with the breakpoint. In the editor this is indicated by a green
|
|
arrow. The command line will change to indicate that we are now
|
|
stopped in debug mode (listing\,\ref{debuggerlisting}).
|
|
|
|
\begin{figure}
|
|
\centering
|
|
\includegraphics[width=\linewidth]{editor_debugger.png}
|
|
\titlecaption{\label{editor_debugger} Setting
|
|
breakpoints.}{Screenshot of the \matlab{} m-file editor. Once a
|
|
file is saved and passes the syntax check (the indicator in the
|
|
top-right corner of the editor window turns green or orange), a
|
|
breakpoint can be set. Breakpoints can be set either using the
|
|
dropdown menu on top or by clicking the line number on the left
|
|
margin. An active breakpoint is indicated by a red dot. The line
|
|
at which the program execution was stopped is indicated by the
|
|
green arrow.}
|
|
\end{figure}
|
|
|
|
|
|
\begin{pagelisting}[label=debuggerlisting, caption={Command line when the program execution was stopped in the debugger.}]
|
|
>> simplerandomwalk
|
|
6 for run = 1:num_runs
|
|
K>>
|
|
\end{pagelisting}
|
|
|
|
When stopped in the debugger we can view and change the state of the
program at this point. We can also issue commands to try the next
steps etc. Beware, however: variables can be altered or
even deleted, which might affect the execution of the remaining code.
|
|
|
|
The toolbar of the editor now offers a new set of tools for debugging:
|
|
\begin{enumerate}
|
|
\item \textbf{Continue} --- simply move on until the program terminates or the
|
|
execution reaches the next breakpoint.
|
|
\item \textbf{Step} --- Execute the next command and stop.
|
|
\item \textbf{Step in} --- If the next command is a
|
|
function call, step into it and stop at the first command.
|
|
\item \textbf{Step out} --- Suppose you entered a function with
|
|
\emph{step in}. \emph{Step out} will continue with the execution and
|
|
stop once you are back in the calling function.
|
|
\item \textbf{Run to cursor} --- Execute all statements up to the
|
|
current cursor position.
|
|
\item \textbf{Quit debugging} --- Immediately stop the debugging
|
|
session and stop further code execution.
|
|
\end{enumerate}
|
|
|
|
The debugger offers some more (advanced) features but the
|
|
functionality offered by the basic tools is often enough to debug a
|
|
program.
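
The basic tools are also available as commands on the command line.
The following session is a sketch that uses the script and line number
from listing\,\ref{debuggerlisting}. The commands are meant to be typed
interactively, one at a time.

\begin{lstlisting}[caption={Sketch: controlling the debugger from the command line.}]
>> dbstop in simplerandomwalk at 6   % set a breakpoint at line 6
>> simplerandomwalk                  % run, execution stops at line 6
K>> dbstep                           % execute the next line and stop
K>> dbstep in                        % step into a called function
K>> dbstep out                       % run until back in the caller
K>> dbcont                           % continue to the next breakpoint
K>> dbquit                           % quit debugging
>> dbclear all                       % remove all breakpoints
\end{lstlisting}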
|
|
|
|
|
|
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
|
|
%\printsolutions
|