first iteration of the debugging chapter

2017-10-23 17:50:38 +02:00 · 2017-10-23 17:50:38 +02:00 · b2aa5335b3
commit b2aa5335b3
parent 667eec03de
2 changed files with 151 additions and 139 deletions
--- a/debugging/lecture/debugging.tex
+++ b/debugging/lecture/debugging.tex
@@ -18,21 +18,20 @@ some hints that help to minimize errors.
 \section{Types of errors and error messages}

 There are a number of different classes of programming errors and it
-is good to know the common ones. When we make a programming error
-there are some that will lead to corrupted syntax, or invalid
-operations and \matlab{} will \codeterm{throw} an error. Throwing an
-error ends the execution of a program and there will be an error
-messages shown in the command window. With such messages \matlab{}
-tries to explain what went wrong and provide a hint on the possible
-cause.
+is good to know the common ones. Some of your programming errors will
+lead to violations of the syntax or to invalid operations that will
+cause \matlab{} to \codeterm{throw} an error. Throwing an error ends
+the execution of the program, and an error message is shown in the
+command window. With such messages \matlab{} tries to explain what
+went wrong and to provide a hint on the possible cause.

 Bugs that lead to the termination of the execution may be annoying but
-are generally easier to find and fix than logical errors that stay
+are generally easier to find and to fix than logical errors, which stay
 hidden while the results of, e.g., an analysis seem to be correct.

 \begin{important}[Try --- catch]
-There are ways to \codeterm{catch} errors during \codeterm{runtime}
-(i.e. when the program is executed) and handle them in the program.
+  There are ways to \codeterm{catch} errors during \codeterm{runtime}
+  (i.e. when the program is executed) and handle them in the program.

 \begin{lstlisting}[label=trycatch, caption={Try catch clause}]
  try
@@ -50,12 +49,12 @@ obscure logical errors! Take care when using the \codeterm{try-catch
 \end{important}
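+
+A minimal sketch of such a clause follows (the variable names are purely
+illustrative). The error object provided by \code{catch} can be
+inspected, so that the error is reported instead of being silently
+swallowed.
+
+\begin{lstlisting}[label=trycatchsketch, caption={Inspecting a caught error (illustrative sketch).}]
+x = [1 2 3];
+try
+    y = x(5);          % fails: index exceeds the number of elements
+catch err              % err is an MException object
+    disp(['Caught error: ' err.message])
+    y = NaN;           % explicit fallback value
+end
+\end{lstlisting}
+
+Whether a fallback value such as \varcode{NaN} is appropriate depends
+on the program.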


-\subsection{\codeterm{Syntax error}}
+\subsection{\codeterm{Syntax errors}}\label{syntax_error}
 Syntax errors are the most common type of error and the easiest to fix. A syntax error
 violates the rules (spelling and grammar) of the programming
 language. For example, every opening parenthesis must be matched by a
-closing one or every \keyword{for} loop has to be closed by an
-\keyword{end}. Usually, the respective error messages are clear and
+closing one, and every \code{for} loop has to be closed by an
+\code{end}. Usually, the respective error messages are clear and
 the editor will point out and highlight most \codeterm{syntax error}s.

 \begin{lstlisting}[label=syntaxerror, caption={Unbalanced parenthesis error.}]
@@ -67,7 +66,7 @@ Did you mean:
 >>   mean(random_numbers)
 \end{lstlisting}

-\subsection{\codeterm{Indexing error}}
+\subsection{\codeterm{Indexing error}}\label{index_error}
 Indexing errors are second on the list of common errors. Usually
 \matlab{} gives rather precise information about the cause, once you
 know how to read its error messages. Consider the following code.
@@ -104,19 +103,72 @@ floating point number. This fails because indices have to be 'integer'
 values. Using a character as an index (fourth attempt) leads to a
 different error message, which says that the index exceeds the matrix
 dimensions. This indicates that we are trying to read data beyond the
-length of our variable \codevar{my\_array} which has 100 elements.
+end of our variable \varcode{my\_array}, which has 100 elements.
 One might have expected the character to be an invalid index, but
 apparently it is valid, just too large. The fifth attempt finally
 succeeds. But why? \matlab{} implicitly converts the \codeterm{char}
 to a number and uses this number to address the element in
 \varcode{my\_array}. The \codeterm{char} used here has the ASCII code 65 and
-thus the 65th element of \varcode{my_array} is returned.
+thus the 65th element of \varcode{my\_array} is returned.
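+
+The implicit conversion can be checked directly. The following is a
+small sketch that assumes \varcode{my\_array} holds 100 random numbers.
+The actual content of the array does not matter here.
+
+\begin{lstlisting}[label=charindex, caption={A character used as an index (illustrative sketch).}]
+my_array = rand(100, 1);
+double('A')                             % ASCII code of 'A': 65
+isequal(my_array('A'), my_array(65))    % true, 'A' is used as index 65
+\end{lstlisting}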

 \subsection{\codeterm{Assignment error}}
-This error occurs when we want to write data into a vector.
+Related to the indexing error, this error occurs when we try to write
+data into a variable and the data do not fit into the addressed
+elements. Listing\,\ref{assignmenterror} shows the simple case of 1-d
+data but, of course, the same applies to n-dimensional data: the data
+that are to be filled into a matrix have to fit in all dimensions. The
+assignment in line 4 fails because \varcode{b} has 11 elements while
+\varcode{a(1:10)} addresses only 10. The command in line 7 works
+because \matlab{} automatically extends the matrix if you assign
+values to a range outside its bounds.
+
+\begin{lstlisting}[label=assignmenterror, caption={Assignment errors.}]
+>> a = zeros(1, 100);
+>> b = 0:10;
+
+>> a(1:10) = b;
+     In an assignment  A(:) = B, the number of elements in A and B must be the same.
+
+>> a(100:110) = b;
+>> size(a)
+ans =
+     1   110
+\end{lstlisting}
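+
+One way to avoid the error is to address exactly as many elements as
+there are values. Another is to check the sizes explicitly before
+assigning. The following is a sketch; the index range \varcode{idx} is
+only an example.
+
+\begin{lstlisting}[label=assignmentcheck, caption={Avoiding assignment errors (illustrative sketch).}]
+a = zeros(1, 100);
+b = 0:10;                   % 11 elements
+a(1:numel(b)) = b;          % address exactly numel(b) elements
+
+idx = 90:100;               % 11 target indices
+if numel(idx) == numel(b)
+    a(idx) = b;
+else
+    error('Cannot assign %d values to %d elements.', numel(b), numel(idx));
+end
+size(a)                     % still 1 x 100, a was not extended
+\end{lstlisting}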
+
+\subsection{\codeterm{Dimension mismatch error}}
+Similarly, some arithmetic operations are only valid if the variables
+fulfill some size constraints. Consider the following commands
+(listing\,\ref{dimensionmismatch}). The first one (line 3) fails
+because we are trying to do al elementwise add on two vectors that
+have different lengths, respectively sizes. The matrix multiplication
+in line 6 also fails since for this operations to succeed the inner
+matrix dimensions must agree (for more information on the
+matrixmultiplication see box\,\ref{matrixmultiplication} in
+chapter\,\ref{programming}). The elementwise multiplication issued in
+line 10 fails for the same reason as the addition we tried
+earlier. Sometimes, however, things apparently work but the result may
+be surprising. The last operation in listing\,\ref{dimensionmismatch}
+does not throw an error but the result is something else than the
+expected elementwise multiplication.
+
+\begin{lstlisting}[label=dimensionmismatch, caption={Some arithmetic operations impose size constraints; violating them leads to dimension mismatch errors.}]
+  >> a = randn(100, 1);
+  >> b = randn(10, 1);
+  >> a + b
+  Matrix dimensions must agree.
+
+  >> a * b      % The matrix multiplication!
+  Error using  *
+  Inner matrix dimensions must agree.
+
+  >> a .* b
+  Matrix dimensions must agree.
+
+  >> c = a .* b';  % works but the result may not be what you expected!
+  >> size(c)
+  ans =
+       100    10
+\end{lstlisting}
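+
+To catch such surprises early, the size assumptions can be made
+explicit before the operation. The following sketch uses \code{size}
+and \code{isequal}. Taking only the first ten elements of \varcode{a}
+is purely illustrative.
+
+\begin{lstlisting}[label=sizecheck, caption={Checking sizes before an elementwise operation (illustrative sketch).}]
+a = randn(100, 1);
+b = randn(10, 1);
+if ~isequal(size(a), size(b))
+    fprintf('Size mismatch: a is %dx%d, b is %dx%d\n', size(a), size(b));
+end
+c = a(1:10) .* b;           % both 10 x 1, so c is 10 x 1
+size(c)
+\end{lstlisting}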
+

-\subsection{Name error}
-\subsection{Arithmetic error}

 \section{Logical error}
 Sometimes a program runs smoothly and terminates without any
@@ -132,9 +184,7 @@ there are a few strategies that should help us.
 \item Clean code: Structure your code so that you can easily read
  it. Comment, but only where necessary. Correctly indent your
  code. Use descriptive variable and function names.
-\item Keep it simple (below).
-\item Read error messages, try to understand what \matlab{} wants to
-  tell.
+\item Keep it simple.
 \item Use scripts and functions and call them from the command
  line. \matlab{} can then provide you with more information, e.g. it
  points to the line where the error happens.
@@ -143,17 +193,20 @@ there are a few strategies that should help us.
 \end{enumerate}


-\subsection{Avoiding errors}
+\subsection{Avoiding errors --- Keep it small and simple}
+
 It would be great if we could just sit down, write a program, run it
 and be done. Most likely this will not happen. Rather, we will make
 mistakes and have to debug the code. There are a few guidelines that
 help to reduce the number of errors.

-\subsection{The Kiss  principle: 'Keep it small and simple' or 'simple and stupid'}
-
 \shortquote{Debugging time increases as a square of the program's
  size.}{Chris Wenham}

+\shortquote{Everyone knows that debugging is twice as hard as writing
+  a program in the first place. So if you're as clever as you can be
+  when you write it, how will you ever debug it?}{Brian Kernighan}
+
 Break down your programming problems into small parts (functions) that
 do exactly one thing. This has already been discussed in the context
 of writing scripts and functions. In part this is just a matter of
@@ -163,11 +216,6 @@ conflicts (same or similar names for variables) increases. Remembering
 the meaning of a certain variable that was defined in the beginning of
 the script is just hard.

-
-\shortquote{Everyone knows that debugging is twice as hard as writing
-  a program in the first place. So if you're as clever as you can be
-  when you write it, how will you ever debug it?}{Brian Kernighan}
-
 Many tasks within an analysis can be squashed into a single line of
 code. This saves some space in the file, reduces the effort of coming
 up with variable names and simply looks so much more competent than a
@@ -198,125 +246,89 @@ less resources than the other? (iv) How much do you have to invest
 into the development of the most elegant solution relative to its
 importance in the project? The decision is up to you.
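+
+The following illustration uses a hypothetical task: the mean of all
+values lying more than two standard deviations above the mean. It
+compares a compact one-liner with the same computation split into
+named steps.
+
+\begin{lstlisting}[label=onelinervssteps, caption={One-liner versus step-by-step code (illustrative sketch).}]
+data = randn(1000, 1);
+% compact, but hard to inspect when something goes wrong
+m1 = mean(data(data > mean(data) + 2 * std(data)));
+% the same computation in named steps, each of which can be checked
+threshold = mean(data) + 2 * std(data);
+outliers = data(data > threshold);
+m2 = mean(outliers);
+\end{lstlisting}
+
+Both versions compute the same value. In the second one, however, each
+intermediate result can be inspected, e.g. in the debugger.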

-\subsection{Read error messages carefully and call programs from the command line.}
+
+\section{Debugging strategies}
+
+If you find yourself in trouble, you can apply a few strategies to
+solve the problem.
+
+\begin{enumerate}
+\item Lean back and take a breath.
+\item Read the error messages and identify the position in the code
+  where the error happens. Unfortunately, this is not always the line
+  or command that really introduced the bug. In some instances the
+  actual error hides a few lines above.
+\item No idea what the error message is trying to say? Google it!
+\item Read the program line by line and understand what each line is
+  doing.
+\item Use \code{disp} to print out relevant information on the command
+  line and compare the output with your expectations.  Do this step by
+  step and start at the beginning.
+\item Use the \matlab{} debugger to stop execution of the code at a
+  specific line and proceed step by step. Be sceptical and test all
+  steps for correctness.
+\item Call for help and explain the program to someone else. When you
+  do this, start at the beginning and walk through the code line by
+  line. The other person does not need to be a programmer or to
+  understand exactly what is going on. Often it is your own
+  reflection on the problem and on the chosen approach that helps
+  to find the bug. (This strategy is also known as
+  \codeterm{rubber duck debugging}.)
+\end{enumerate}
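+
+The following is a minimal sketch of the \code{disp} strategy. It uses
+a hypothetical random walk of a few steps. The temporary output shows
+the state after each step and can be removed once the bug is found.
+
+\begin{lstlisting}[label=dispdebugging, caption={Printing intermediate results with \code{disp} (illustrative sketch).}]
+num_steps = 5;
+position = 0;
+for step = 1:num_steps
+    position = position + sign(randn(1));    % move by -1 or +1
+    % temporary debug output, compare it with your expectations
+    disp(['step ' num2str(step) ': position = ' num2str(position)])
+end
+\end{lstlisting}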


+\subsection{Debugger}

-\section{Error messages}
+The \matlab{} editor (figure\,\ref{editor_debugger}) supports
+interactive debugging. Once you save an m-file in the editor and it
+passes the syntax check, i.e. the little box in the upper right corner
+of the editor window is green or orange, you can set one or several
+\codeterm{breakpoint}s. When the program is executed by calling it
+from the command line, it stops at the line with the
+breakpoint. In the editor this is indicated by a green arrow. The
+command line prompt changes as well (to \code{K>>}) to indicate that
+we are now in debug mode (listing\,\ref{debuggerlisting}).

-
-\begin{ibox}[tp]{\label{stacktracebox}Stacktrace or Stack Traceback}
-
-  
-\end{ibox}
-
-
-It helps enormously if scripts and functions that belong together are
-located in the same folder on the hard disk. It is therefore a good
-idea to create a separate folder for each analysis and to store the
-corresponding \codeterm{m-files} in it. A deeper nesting into further
-subfolders is usually not necessary. \matlab{} creates a ``MATLAB''
-folder in the user's \file{Documents} (Linux) or \file{Eigene Dokumente}
-(Windows) folder. It is a good idea to use this folder as the root
-directory for your own work. Of course, any other location can be
-chosen as well. In the example in \figref{fileorganizationfig}, a
-separate subfolder is created within this folder for each project, in
-which in turn a further subfolder is created for each problem or
-analysis. These subfolders contain both the required \codeterm{m-files}
-and the results of the analysis (figures, data files). Two more things
-are worth noting. The project folder contains a script (analysis.m)
-that is meant to call all analyses. Furthermore, parallel to the
-project folders there is a \file{functions} folder containing
-functions that are needed in more than one project or analysis.
-
-\begin{figure}[tp]
-  \includegraphics[width=0.75\textwidth]{no_bug}
-  \titlecaption{\label{fileorganizationfig} Possible organization of
-    program code in the file system.}{ For each project, subfolders
-    are created for the individual analyses. At the project level
-    there could be a script (here ``analysis.m'') that triggers all
-    analyses in the subfolders.}
+\begin{figure}
+  \centering
+  \includegraphics[width=0.9\linewidth]{editor_debugger.png}
+  \caption{Screenshot of the \matlab{} m-file editor. Once a file is
+    saved and passes the syntax check (green indicator in the top-right
+    corner of the editor window), a breakpoint can be set. Breakpoints
+    can be set either using the dropdown menu on top or by clicking
+    the line number in the left margin. An active breakpoint is
+    indicated by a red dot.}\label{editor_debugger}
 \end{figure}


-\Section{Naming of functions and scripts}
-
-\matlab{} looks up functions and scripts exclusively by their
-name. Upper and lower case matter. The two files
-\file{test\_funktion.m} and \file{Test\_Funktion.m} can name two
-different functions. This kind of variation of the name is, of
-course, not sensible. It carries no information about the difference
-between the two functions. Nor does the name say anything about the
-purpose of the function.
-
-Naming is not always easy; sometimes it is even the most difficult
-aspect of programming! Finding expressive names pays off, though.
-Expressive means that the name should allow conclusions about the
-purpose.
-
-\begin{important}[Naming functions and scripts]
-  The names of functions and scripts should tell as much as possible
-  about how they work or about their purpose (\file{firingrates.m}
-  instead of \file{uebung.m}). Good names for functions and scripts are
-  the best documentation.
-\end{important}
-
-\matlab{} forbids spaces, special characters and umlauts in names.
-Names must also not start with digits. \matlab{} imposes no further
-rules on naming. However, the names of the functions predefined in
-\matlab{} follow certain patterns:
-\begin{itemize}
-\item Names are always written in lower case.
-\item Abbreviations are frequently used (e.g. \code{xcorr()}
-  for the cross-correlation or \code{repmat()} for ``repeat matrix'')
-\item Functions that convert between formats are always named after
-  the pattern ``format2format'' (e.g. \code{num2str()} for the
-  conversion ``number to string'', i.e. the conversion of a numeric
-  value into a text).
-\end{itemize}
-
-
-\begin{important}[Naming variables]
-  The names of variables should tell as much as possible about their
-  content (\varcode{spike\_count} instead of \varcode{x}). Good names
-  for variables are the best documentation.
-\end{important}
-
- 
-\begin{lstlisting}[label=chaoticcode, caption={Cluttered implementation of the random walk.}]
-
+\begin{lstlisting}[label=debuggerlisting, caption={Command line when the program execution was stopped in the debugger.}]
+>> simplerandomwalk
+6   for run = 1:num_runs
+K>>
 \end{lstlisting}

-\pagebreak[4]
+When stopped in the debugger, we can view and change the state of the
+program at this point, try out the next steps, etc. Beware, however,
+that variables can be altered or even deleted in the process, which
+might affect the execution of the remaining code.
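+
+Breakpoints can also be set from the command line with \code{dbstop}.
+The following sketch assumes that the script \file{simplerandomwalk.m}
+from listing\,\ref{debuggerlisting} is on the \matlab{} path.
+
+\begin{lstlisting}[label=dbstoplisting, caption={Setting breakpoints from the command line (illustrative sketch).}]
+if exist('simplerandomwalk', 'file') == 2
+    dbstop in simplerandomwalk at 6   % same as clicking line number 6
+end
+dbstop if error     % pause wherever an uncaught error is thrown
+dbstatus            % list all active breakpoints and conditions
+% dbclear all removes all breakpoints again
+\end{lstlisting}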

-\begin{lstlisting}[label=cleancode, caption={Clearly structured implementation of the random walk.}]
-num_runs = 10;
-max_steps = 1000;
-positions = zeros(max_steps, num_runs);
+The toolbar of the editor now offers a new set of tools for debugging:
+\begin{enumerate}
+\item \textbf{Continue} --- Simply move on until the program terminates or the
+  execution reaches the next breakpoint.
+\item \textbf{Step} --- Execute the next command (without entering
+  called functions) and stop.
+\item \textbf{Step in} --- If the next command is a function call,
+  step into the function and stop at its first command.
+\item \textbf{Step out} --- Execute the remaining commands of the
+  current function and stop after it returns to the calling code.
+\item \textbf{Run to cursor} --- Execute all statements up to the
+  current cursor position.
+\item \textbf{Quit debugging} --- Immediately end the debugging
+  session and stop any further code execution.
+\end{enumerate}
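+
+Most of these buttons have command-line counterparts that can be typed
+at the \code{K>>} prompt (\emph{Run to cursor} is not covered here):
+
+\begin{lstlisting}[label=dbcommands, caption={Debugger commands at the \code{K>>} prompt.}]
+K>> dbstep        % Step: execute the next line
+K>> dbstep in     % Step in: enter the function called in the next line
+K>> dbstep out    % Step out: run the rest of the current function
+K>> dbcont        % Continue: run to the next breakpoint or the end
+K>> dbquit        % Quit debugging
+K>> whos          % list the variables of the current workspace
+\end{lstlisting}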

-for run = 1:num_runs
-    for step = 2:max_steps
-        x = randn(1);
-        if x < 0
-            positions(step, run) = positions(step-1, run) + 1;
-        elseif x > 0
-            positions(step, run) = positions(step-1, run) - 1;
-        end
-    end
-end
-\end{lstlisting}
+The debugger offers more (advanced) features, but the basic tools are
+often sufficient to debug the code.


-% \begin{exercise}{logicalVector.m}{logicalVector.out}
-%   Create a vector \varcode{x} with the values 0--10.
-%   \begin{enumerate}
-%   \item Execute: \varcode{y = x < 5}
-%   \item Print the content of \varcode{y} on the screen.
-%   \item What is the data type of \varcode{y}?
-%   \item Return all elements of \varcode{x} that are smaller than 5.
-%   \end{enumerate}
-%   \pagebreak[4]
-% \end{exercise}
--- a/debugging/lecture/figures/editor_debugger.png
+++ b/debugging/lecture/figures/editor_debugger.png