\chapter{Code style}

\shortquote{Any code of your own that you haven't looked at for six or
  more months might as well have been written by someone
  else.}{Eagleson's law}

Cultivating a good code style is not just a matter of good taste. It
is a key ingredient for the readability and maintainability of code
and, in the end, facilitates the reproducibility of scientific
results. Programs should be written and structured in a way that
helps outsiders as well as the author, a few weeks or months after
the code was written, to understand the program's rationale. Clean
code pays off for the original author as well as for others who are
supposed to use the code.

Clean code addresses several issues:
\begin{enumerate}
\item The program's structure.
\item Naming of scripts and functions.
\item Naming of variables and constants.
\item Use of indentation and empty lines to define blocks.
\item Use of comments and inline documentation.
\item Delegation of repeated code to functions and dedicated
  subroutines.
\end{enumerate}
\section{Organization of programs on the file system}

While introducing scripts and functions we suggested a typical program
layout (box\,\ref{whenscriptsbox}). The idea is to create a single
entry point: one script controls the rest of the program by calling
functions that work on the data and by managing the
results. Applying this structure makes it easy to understand the flow
of the program but two questions remain: (i) how to organize the files
on the file system and (ii) how to name them such that the controlling
script is easily identified among the other \entermde[m-file]{m-File}{m-files}.

Upon installation \matlab{} creates a folder called \file{MATLAB} in
the user space (Windows: My files, Linux: Documents, MacOS:
Documents). Since this folder is already appended to the \matlab{} search
path (Box~\ref{matlabpathbox}), it is easiest to stick to it for the
moment. Of course, any other location can be specified as well. Generally,
it is of great advantage to store related scripts and functions within
the same folder on the hard drive. An easy approach is to create a
project-specific folder structure that contains sub-folders for each
task (analysis) and to store all related \entermde[m-file]{m-File}{m-files}
there (screenshot \ref{fileorganizationfig}). In these task-related folders
one may consider creating a further sub-folder to store results
(created figures, result data). On the project level a single script
(\file{analysis.m}) controls the whole process. In parallel to the
project folder we suggest creating an additional folder for functions
that are or may be relevant across different projects.

Within such a structure it is quite likely that programs in different
projects share the same name (e.g. a \varcode{load\_data.m}
function). Usually this will not lead to conflicts because \matlab{}
always starts its search for matching functions in the
current folder (more information on the \matlab{} search path in
Box~\ref{matlabpathbox}).
\begin{figure}[tp]
  \includegraphics[width=0.75\textwidth]{program_organization}
  \titlecaption{\label{fileorganizationfig} Possible folder structure
    for maintaining program code on the file system.}{For each project
    one maintains an individual folder in which analyses or tasks may
    be structured in sub-folders. Within each analysis a \file{main.m}
    script is the entry point for the analyses. On the project level
    there could be a single script that triggers and controls all
    analyses and tasks in the sub-folders. Functions that are of
    general interest across projects are best kept in a dedicated
    folder outside the project sub-structure.}
\end{figure}
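As a rough sketch of such a project-level control script, the
following \file{analysis.m} runs the \file{main.m} script of each
analysis sub-folder. The sub-folder names are hypothetical and only
serve as an illustration. The sketch assumes that it is started from
within the project folder.
\begin{lstlisting}[caption={Sketch of a project-level control script with hypothetical sub-folders.}]
% analysis.m: project-level control script (sketch)
% The sub-folder names below are hypothetical examples.
analyses = {'firing_rates', 'isi_statistics'};

for k = 1:length(analyses)
    main_script = fullfile(analyses{k}, 'main.m');
    if exist(main_script, 'file')
        fprintf('running %s\n', main_script);
        run(main_script);   % executes the script from within its folder
    else
        fprintf('skipping %s (not found)\n', main_script);
    end
end
\end{lstlisting}
Note that scripts started via \code{run()} share the workspace with
the calling script. A \file{main.m} that clears the workspace would
also delete \varcode{analyses} and \varcode{k}.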
\begin{ibox}[tp]{\label{matlabpathbox}\matlab{} search path}
  The \entermde{Suchpfad}{search path} defines where \matlab{} looks
  for scripts and functions. When calling a function from the command
  line \matlab{} needs to figure out which function is addressed and
  starts looking for it in the current folder. If this fails it will
  crawl all locations listed in the search path (see figure). The
  \entermde{Suchpfad}{search path} is basically a list of
  folders. \matlab{} will go through this list from beginning to end and
  the search will stop on the first match. This implies that the order
  in the search path may affect which version of functions that share
  the same name is used. Note: \matlab{} does not perform a recursive
  search. That is, a function that resides in a sub-folder that is not
  explicitly listed in the \entermde{Suchpfad}{search path} will not be found.

  \vspace{2ex}
  \includegraphics[width=0.9\textwidth]{search_path}
  \vspace{1.5ex}

  The search path can be managed from the command line by using the
  functions \code{addpath()} or \code{userpath()}. Alternatively, the
  \matlab{} UI offers a graphical tool for adding or removing paths, or
  for changing the order of entries.

  The current working directory can be changed via the UI or from the
  command line using the command \code{cd} (for change directory). The
  current path is shown in the current directory text field of the UI
  or can be requested using the command \code{pwd} (for print working
  directory). The function \code{which()} shows the full path of the
  function that is actually used. For example, finding out which \code{mean()}
  function is used gives a result similar to:
  \begin{lstlisting}[label=useofwhich, caption={Use of 'which'}]
>> which('mean')
/Applications/MATLAB2018b.app/toolbox/matlab/datafun/mean.m
  \end{lstlisting}
\end{ibox}
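The commands mentioned in the box can be combined as in the following
minimal sketch. The folder name \file{shared\_functions} is a
hypothetical example for a folder holding functions used across
projects. The sketch further assumes a \matlab{} version in which
\code{userpath()} returns a single folder name.
\begin{lstlisting}[caption={Sketch: inspecting and extending the search path.}]
pwd                        % show the current working directory

% Hypothetical folder for functions shared across projects:
shared_folder = fullfile(userpath, 'shared_functions');
if exist(shared_folder, 'dir')
    addpath(shared_folder);   % put the folder at the front of the search path
end

which('mean')              % the function that is actually used
which('mean', '-all')      % all functions of that name on the search path
\end{lstlisting}
Checking for the folder first avoids the warning that
\code{addpath()} issues for a non-existing folder.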
\section{Naming things}
The dictum of good code style is: ``Program code must be readable.''
Expressive names are extraordinarily important in this respect. Even
if it is tricky to find expressive names that are not overly long,
naming should be taken seriously.

\matlab{} has a few rules about names: Names must not start with a
digit, and they must not contain blanks or special characters such as
German umlauts. Otherwise one is free to use whatever suits. The
names of pre-defined functions shipped with \matlab{} follow several
patterns:
\begin{itemize}
\item Names are usually lowercase.
\item Names are often abbreviations (e.g. \code{xcorr()}
  stands for ``cross-correlation'', \code{repmat()} for ``repeat matrix'').
\item Functions that convert between formats are named according to
  the pattern ``format2format'' (e.g. \code{num2str()} for ``number to string'' conversion).
\end{itemize}

There are other common patterns such as \emph{camelCase}, in which
the first character of each word in a compound name, except for the
first word, is capitalized. Other
conventions use the underscore to separate the individual words
(\emph{snake\_case}). A function that counts the number of action
potentials could be named \file{spikeCount.m} or
\file{spike\_count.m}.

The same naming rules apply for scripts and functions as well as
variables and constants.
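Whether a name obeys these rules can be checked with the function
\code{isvarname()}, as in the following small sketch with made-up names.
\begin{lstlisting}[caption={Sketch: checking the validity of names.}]
isvarname('spike_count')   % true: letters, digits, and underscores only
isvarname('2nd_trial')     % false: starts with a digit
isvarname('spike count')   % false: contains a blank
\end{lstlisting}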
\subsection{Naming scripts and functions}
\matlab{} will search the search path (Box~\ref{matlabpathbox})
exclusively by name. This search is case-sensitive which implies that
the files \file{test\_function.m} and \file{Test\_function.m} are two
different things. Choosing such names is obviously
nonsensical because the tiny difference in the name contains no cue
about the difference between the two versions and the function names
themselves tell close to nothing about the purpose. Finding good names
is not trivial. Sometimes it is harder than the programming
itself. Choosing \emph{expressive names} that provide information about a
function's purpose, however, pays off!
\begin{important}[Naming scripts and functions]
  Names of functions and scripts should be expressive in the sense
  that the name provides information about the function's purpose
  (\file{estimate\_firingrate.m} tells much more than
  \file{exercise1.m}). Choosing a good name replaces large parts of
  the documentation.
\end{important}
\subsection{Naming variables and constants}

While the names of scripts and functions describe the purpose, names
of variables describe the stored content. A variable storing the
average number of action potentials could be called\\
\varcode{average\_spike\_count}. If this variable is meant to store
multiple spike counts the plural form would be appropriate\\
(\varcode{average\_spike\_counts}).

The control variables used in the head of a \code{for} loop are often
simply named \varcode{i}, \varcode{j} or \varcode{k}. This somewhat
contradicts the previously made statements but since it is a very
common pattern the meaning of such variables in the context of the
loop is quite obvious. This should, however, be the only exception to
the general rule of expressive naming.
\begin{important}[Naming of variables]
  The names of variables should be expressive. That is, the name
  itself should tell about the content of the variable. The name
  \varcode{spike\_count} tells much more about the stored information
  than \varcode{x}. Choosing a good variable name replaces additional
  comments.
\end{important}
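The effect of expressive names can be illustrated with a small sketch
that performs the same computation twice. The data values are made up
for the purpose of illustration.
\begin{lstlisting}[caption={Sketch: the same computation with poor and with expressive names.}]
% hard to understand without additional comments:
x = [12 15 9];
y = 0.5;
z = x / y;

% the names tell what is stored and computed:
spike_counts = [12 15 9];     % made-up spike counts of three trials
trial_duration = 0.5;         % seconds
firing_rates = spike_counts / trial_duration;   % Hertz
\end{lstlisting}
Both versions compute the same numbers, but only the second one
reveals what the numbers mean.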
\section{Code style}
Readability of program code depends strongly on whether or not a
consistent code style is applied. A program that is only randomly
indented or that contains lots of empty lines is very hard to read and
to comprehend. Even though the \matlab{} language (like many others)
does not enforce indentation, indentation is very powerful for
defining coherent blocks. The \matlab{} editor supports this by an
auto-indentation mechanism. A selected section of the code can be
automatically indented by pressing \keycode{Ctrl-I}.

Interspersing empty lines is very helpful to separate regions in the
code that belong together. Too many empty lines, however, lead to
hard-to-read code because the code may then require more space than
the screen offers and one loses the overview.

The following two listings show basically the same implementation of a
random walk\footnote{A random walk is a simple simulation of Brownian
  motion. In each simulation step an agent takes a step into a
  randomly chosen direction.}, once in a rather chaotic version
(listing \ref{chaoticcode}) and then in a cleaner way (listing
\ref{cleancode}).
\begin{pagelisting}[label=chaoticcode, caption={Chaotic implementation of the random-walk.}]
num_runs = 10; max_steps = 1000;

positions = zeros(max_steps, num_runs);

for run = 1:num_runs


for step = 2:max_steps

x = randn(1);
if x<0
positions(step, run)= positions(step-1, run)+1;

elseif x>0
positions(step,run)=positions(step-1,run)-1;
end
end
end
\end{pagelisting}
\begin{pagelisting}[label=cleancode, caption={Clean implementation of the random-walk.}]
num_runs = 10;
max_steps = 1000;
positions = zeros(max_steps, num_runs);

for run = 1:num_runs
    for step = 2:max_steps
        x = randn(1);
        if x < 0
            positions(step, run) = positions(step-1, run) + 1;
        elseif x > 0
            positions(step, run) = positions(step-1, run) - 1;
        end
    end
end
\end{pagelisting}
\section{Using comments}

It is common to provide extra information about the meaning of program
code by adding comments. In \matlab{} comments are indicated by the
percent character \code{\%}. Anything that follows the percent
character in a line is ignored and considered a comment. When used
sparingly, comments can be immensely helpful. Comments
are short sentences that describe the meaning of the (following) lines
in the program code. During the initial implementation of a function
they can be used to guide the development but they have the tendency to
blow up the code and decrease readability. By choosing expressive
variable and function names, most lines should be self-explanatory.

For example, stating the obvious does not really help and should be
avoided:\\ \varcode{ x = x + 2; \% add two to x}\\
\begin{important}[Using comments]
  \begin{itemize}
  \item Comments describe the rationale of the respective code block.
  \item Comments are good and helpful, but they must be true!
  \item A wrong comment is worse than a non-existent one!
  \item Comments must be maintained just like the code. Otherwise they
    may become wrong and worse than meaningless!
  \end{itemize}
  \widequote{Good code is its own best documentation. As you're about to add
    a comment, ask yourself, ``How can I improve the code so that this
    comment isn't needed?'' Improve the code and then document it to
    make it even clearer.}{Steve McConnell}
\end{important}
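The difference between a comment that merely repeats the code and one
that gives the rationale may be sketched as follows. The spike times
and the refractory period are made-up values.
\begin{lstlisting}[caption={Sketch: a superfluous comment and a comment stating the rationale.}]
spike_times = [0.012 0.045 0.051 0.120];   % seconds, made-up data

% superfluous, the comment only repeats the code:
% compute the differences of spike_times
intervals = diff(spike_times);

% helpful, the comment explains why this is done:
% intervals shorter than the refractory period hint at detection errors
refractory_period = 0.002;                 % seconds, assumed value
num_violations = sum(intervals < refractory_period);
\end{lstlisting}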
\section{Documenting functions}
All pre-defined \matlab{} functions begin with a comment block that
describes the purpose of the function, the required and optional
arguments, and the values returned by the function. Using the
\code{help} command one can display these comments and learn how to
use the function properly. Self-written functions can and should be
documented in a similar way. Listing~\ref{localfunctions} shows a
well-documented function.
\begin{important}[Documenting functions]
  Functions must be properly documented, otherwise a user (including
  the author) must read and understand the function code, which is
  a waste of time!
  \begin{itemize}
  \item Describe the purpose of the function in a few sentences.
  \item Note the function head to illustrate the order of the arguments.
  \item For each argument state the purpose, the expected data type
    (number, vector, matrix, etc.) and, if applicable, the unit in
    which a provided number must be given (e.g. seconds if a time is
    expected).
  \item Do the same for all return values.
  \end{itemize}
\end{important}
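A comment block that follows these recommendations could look like the
one in the following sketch. The function \code{firing\_rate()} is a
hypothetical example and not part of \matlab{}. It would be stored in a
file \file{firing\_rate.m}.
\begin{lstlisting}[caption={Sketch of a documented function (hypothetical example).}]
function rate = firing_rate(spike_times, duration)
% FIRING_RATE  Average firing rate of a spike train.
%
%   rate = firing_rate(spike_times, duration)
%
% Arguments:
%   spike_times: vector with the times of the spikes in seconds.
%   duration:    duration of the recording in seconds (scalar).
%
% Returns:
%   rate: the average firing rate in Hertz (scalar).
    rate = length(spike_times) / duration;
end
\end{lstlisting}
Calling \code{help firing\_rate} on the command line then displays
the comment block.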
\section{Delegating tasks in functions}
Comments and empty lines are used to organize code into logical blocks
and to briefly explain what they do. Whenever one feels tempted to do
this, one could also consider delegating the respective task to a
function. In most cases this is preferable.

Not delegating tasks leads to very long \entermde[m-file]{m-File}{m-files}
which can be confusing. Such code is sometimes called ``spaghetti
code''. When this happens, it is high time to think about delegating
tasks to functions.
\begin{important}[Delegating to functions]
  When should one consider delegating tasks to specific functions?
  \begin{itemize}
  \item Whenever one needs more than two indentation levels to
    organize the code.
  \item Whenever the same lines of code are repeated more than once.
  \item Whenever one is tempted to use copy-and-paste.
  \end{itemize}
\end{important}
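As an illustration, the inner loop of the random walk
(listing \ref{cleancode}) could be delegated to a function, as in the
following sketch. The function name \code{random\_walk()} is an
assumption made for this example. Defining the function at the end of
a script file assumes \matlab{} R2016b or newer; otherwise the function
goes into its own file \file{random\_walk.m}. In contrast to the
original listing, the sketch uses a plain \code{else} branch.
\begin{lstlisting}[caption={Sketch: delegating a single random walk to a function.}]
num_runs = 10;
max_steps = 1000;
positions = zeros(max_steps, num_runs);

for run = 1:num_runs
    positions(:, run) = random_walk(max_steps);
end

function position = random_walk(max_steps)
% RANDOM_WALK  Simulate a single one-dimensional random walk.
%
% Arguments:
%   max_steps: the number of simulation steps (scalar).
%
% Returns:
%   position: column vector with the position in each step.
    position = zeros(max_steps, 1);
    for step = 2:max_steps
        if randn(1) < 0
            position(step) = position(step-1) + 1;
        else
            position(step) = position(step-1) - 1;
        end
    end
end
\end{lstlisting}
Neither the script nor the function now needs more than two
indentation levels.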
\subsection{Local and nested functions}
Generally, functions live in their own \entermde[m-file]{m-File}{m-files} that
have the same name as the function itself. Delegating tasks to
functions thus leads to a large set of \entermde[m-file]{m-File}{m-files}
which increases complexity and may lead to confusion. If the delegated
functionality is used in multiple instances, it is still advisable to
do so. On the other hand, when the delegated functionality is only
used within the context of another function, \matlab{} allows defining
\entermde[function!local]{Funktion!lokale}{local functions} and
\entermde[function!nested]{Funktion!verschachtelte}{nested functions}
within the same file. Listing \ref{localfunctions} shows an example of
a local function definition.

\pageinputlisting[label=localfunctions, caption={Example for local
  functions.}]{calculateSines.m}

\emph{Local functions} live in the same \entermde{m-File}{m-file} as
the main function and are only available in this context. Each local
function has its own \enterm{scope}, that is, the local function
cannot access (read or write) variables of the calling
function. Interaction with the local function requires passing all
required arguments and taking care of the return values of the
function.

\emph{Nested functions} are different in this respect. They are
defined within the body of the parent function (between the keywords
\code{function} and \code{end}) and have full access to all variables
defined in the parent function. Working with (in particular changing) the
parent's variables is handy on the one hand, but also risky. One
should take care when defining nested functions.
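The following minimal sketch illustrates how a nested function
changes a variable of its parent function. The function names
\code{count\_events()} and \code{add\_event()} are hypothetical and
chosen for this example only.
\begin{lstlisting}[caption={Sketch of a nested function that modifies a variable of the parent function.}]
function total = count_events(events)
% COUNT_EVENTS  Count the positive entries of a vector.
    total = 0;
    for k = 1:length(events)
        add_event(events(k));
    end

    function add_event(event)
        % nested function: 'total' is the variable of the parent function
        if event > 0
            total = total + 1;
        end
    end
end
\end{lstlisting}
Calling \code{count\_events([1 -1 2 0])} returns 2. A local function
would instead have to receive \varcode{total} as an argument and
return the updated value.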
\section{Specifics when using scripts}
A problem similar to the one with nested functions arises when using scripts
(instead of functions). All variables that are defined within a script
become available in the global \enterm{workspace}
(\determ{Arbeitsbereich}). There is the risk of name conflicts, that
is, a called sub-script redefines or uses the same variable name and
may \emph{silently} change its content. The user will not be notified
about this change and the calling script may expect a completely
different content. Bugs that are based on such mistakes are hard to
find since the program itself looks perfectly fine.

To avoid such issues one should design scripts in a way that they
perform their tasks independently of other scripts and functions.

A common use case for a script could be to control the analyses made
on many datasets and to collect the results. A good script is still
not too long and is thus easy to comprehend. Another advantage of
small task-related scripts is that they can be directly executed by
either calling them from the command line or pressing \keycode{F5} in
the editor. Should a script fail, there will be a proper error message that
provides important information to track and fix the bug.
\begin{important}[Structuring scripts]
  \begin{itemize}
  \item Similar to functions, scripts should solve one task and should
    not be too long.

  \item Scripts should work independently of existing variables in the
    global workspace.

  \item Often it is advisable to start a script by deleting
    variables (\code{clear}) from the workspace and most of the time
    it is also good to close all open figures (\code{close all}). Be
    careful if the respective script has been called by another one.

  \item Clean up the workspace at the end of a script. Delete
    (\code{clear}) all variables that are no longer needed.

  \item Consider writing functions instead of scripts.
  \end{itemize}
\end{important}
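A script that follows these recommendations might be structured like
the following sketch. The script name and the data values are made up
for the purpose of illustration.
\begin{lstlisting}[caption={Sketch of a small, self-contained script.}]
% analyze_firing_rates.m (hypothetical script name)
clear          % start with an empty workspace
close all      % close figures left over from previous runs

spike_counts = [12 15 9];     % made-up spike counts of three trials
trial_duration = 0.5;         % seconds
firing_rates = spike_counts / trial_duration;
mean_rate = mean(firing_rates);

% remove variables that are no longer needed:
clear spike_counts trial_duration
\end{lstlisting}
Since the script starts with \code{clear}, it should not be called
from another script that still needs its own variables.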
\section{Summary}

Program code must be readable. Names of variables, functions and
scripts should be expressive and describe their purpose (scripts and
functions) or their content (variables). Cultivating a personalized
code style is perfectly fine as long as it is consistent. Many
programming languages or communities have their own traditions. It is
advisable to adhere to these.

Repeated tasks should (to be read as must) be delegated to
functions. In cases in which a function is only locally applied and
not of more global interest across projects, consider defining it as
\entermde[function!local]{Funktion!lokale}{local function} or
\entermde[function!nested]{Funktion!verschachtelte}{nested
  function}. Taking care to increase readability and comprehensibility
pays off, even for the author!\footnote{Reading tip: Robert
  C. Martin: \textit{Clean Code: A Handbook of Agile Software
    Craftsmanship}, Prentice Hall}

\shortquote{Programs must be written for people to read, and only
  incidentally for machines to execute.}{Abelson / Sussman}

\shortquote{Any fool can write code that a computer can
  understand. Good programmers write code that humans can
  understand.}{Martin Fowler}

\shortquote{First, solve the problem. Then, write the code.}{John
  Johnson}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%\printsolutions