\chapter{Code style}

\shortquote{Any code of your own that you haven't looked at for six or more months might as well have been written by someone else.}{Eagleson's law}

Cultivating a good code style is not just a matter of good taste. It is a key ingredient for the readability and maintainability of code and, in the end, facilitates the reproducibility of scientific results. Programs should be written and structured in a way that helps outsiders, as well as the author a few weeks or months after the code was written, to understand the program's rationale. Clean code pays off for the original author as well as for others who are supposed to use the code.

Clean code addresses several issues:
\begin{enumerate}
\item The program's structure.
\item Naming of scripts and functions.
\item Naming of variables and constants.
\item Application of indentation and empty lines to define blocks.
\item Use of comments and inline documentation.
\item Delegation of repeated code to functions and dedicated subroutines.
\end{enumerate}

\section{Organization of programs on the file system}

While introducing scripts and functions we suggested a typical program layout (box\,\ref{whenscriptsbox}). The idea is to create a single entry point by having one script that controls the rest of the program by calling functions that work on the data and manage the results. Applying this structure makes it easy to understand the flow of the program, but two questions remain: (i) how to organize the files on the file system and (ii) how to name them so that the controlling script is easily identified among the other \entermde[m-file]{m-File}{m-files}.

Upon installation \matlab{} creates a folder called \file{MATLAB} in the user space (Windows: My files, Linux: Documents, macOS: Documents). Since this folder is already part of the \matlab{} search path (box~\ref{matlabpathbox}), it is easiest to stick to it for the moment. Of course, any other location can be specified as well.
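
How a further location could be added to the search path can be sketched as follows. The folder name \file{shared\_functions} is a hypothetical example and not part of a standard installation. The change holds for the current session only, unless it is stored with \code{savepath()}.

\begin{lstlisting}[caption={Sketch: adding a folder to the search path.}]
% Display the user folder that MATLAB puts on the search path.
disp(userpath);

% Hypothetical folder for functions shared across projects.
% The name 'shared_functions' is an assumption for illustration.
shared_folder = fullfile(userpath, 'shared_functions');

% Only add the folder if it actually exists.
if exist(shared_folder, 'dir')
    addpath(shared_folder);
end
\end{lstlisting}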

Generally it is of great advantage to store related scripts and functions within the same folder on the hard drive. An easy approach is to create a project-specific folder structure that contains sub-folders for each task (analysis) and to store all related \entermde[m-file]{m-File}{m-files} there (figure~\ref{fileorganizationfig}). In these task-related folders one may consider creating a further sub-folder to store results (created figures, result data). On the project level a single script (\file{analysis.m}) controls the whole process. In parallel to the project folder we suggest creating an additional folder for functions that are or may be relevant across different projects.

Within such a structure it is quite likely that programs in different projects share the same name (e.g. a \varcode{load\_data.m} function). Usually this will not lead to conflicts, because \matlab{} always starts its search for a matching function in the current folder (more information on the \matlab{} search path in box~\ref{matlabpathbox}).

\begin{figure}[tp]
  \includegraphics[width=0.75\textwidth]{program_organization}
  \titlecaption{\label{fileorganizationfig} Possible folder structure for maintaining program code on the file system.}{For each project one maintains an individual folder in which analyses or tasks may be structured in sub-folders. Within each analysis a \file{main.m} script is the entry point for the analysis. On the project level there could be a single script that triggers and controls all analyses and tasks in the sub-folders. Functions that are of general interest across projects are best kept in a dedicated folder outside the project sub-structure.}
\end{figure}

\begin{ibox}[tp]{\label{matlabpathbox}\matlab{} search path}
The \entermde{Suchpfad}{search path} defines where \matlab{} looks for scripts and functions. When a function is called from the command line, \matlab{} needs to figure out which function is addressed and starts looking for it in the current folder.
If this fails, it goes through all locations listed in the search path (see figure). The \entermde{Suchpfad}{search path} is basically a list of folders. \matlab{} goes through this list from beginning to end and stops at the first match. This implies that the order of the entries in the search path may affect which version of several functions sharing the same name is used. Note: \matlab{} does not perform a recursive search. That is, a function that resides in a sub-folder that is not explicitly listed in the \entermde{Suchpfad}{search path} will not be found.

\vspace{2ex}
\includegraphics[width=0.9\textwidth]{search_path}
\vspace{1.5ex}

The search path can be managed from the command line by using the functions \code{addpath()} or \code{userpath()}. Alternatively, the \matlab{} UI offers a graphical tool for adding or removing paths and for changing the order of entries.

The current working directory can be changed via the UI or from the command line using the command \code{cd} (for change directory). The current path is shown in the current directory text field of the UI or can be requested using the command \code{pwd} (for present working directory).

The function \code{which()} shows the full path of the function that is actually used. For example, finding out which \code{mean()} function is used gives a result similar to:
\begin{lstlisting}[label=useofwhich, caption={Use of 'which'}]
>> which('mean')
/Applications/MATLAB2018b.app/toolbox/matlab/datafun/mean.m
\end{lstlisting}
\end{ibox}

\section{Naming things}

The dictum of good code style is: ``Program code must be readable.'' Expressive names are extraordinarily important in this respect. Even if it is tricky to find expressive names that are not overly long, naming should be taken seriously.

\matlab{} has a few rules about names: names must start with a letter and may contain only letters, digits, and underscores. In particular, they must not start with a number and must not contain blanks or other special characters such as German umlauts. Otherwise one is free to use whatever suits.
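
Whether a name is valid can be checked with the built-in function \code{isvarname()}. The following lines are a minimal sketch; the tested names are made-up examples.

\begin{lstlisting}[caption={Sketch: checking names for validity.}]
% isvarname() returns true (1) for valid and false (0) for invalid names.
valid_name   = isvarname('spike_count');  % 1: letters and underscore
leading_num  = isvarname('2nd_trial');    % 0: starts with a digit
with_blank   = isvarname('spike count');  % 0: contains a blank
with_special = isvarname('spike-count');  % 0: contains a special character
\end{lstlisting}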

The names of pre-defined functions shipped with \matlab{} follow several patterns:
\begin{itemize}
\item Names are usually lowercase.
\item Names are often abbreviations (e.g. \code{xcorr()} stands for ``cross-correlation'' and \code{repmat()} for ``repeat matrix'').
\item Functions that convert between formats are named according to the pattern ``format2format'' (e.g. \code{num2str()} for ``number to string'' conversion).
\end{itemize}

There are other common patterns, such as \emph{camelCase}, in which the first character of each word in a compound name, except the first word, is capitalized. Other conventions use the underscore to separate the individual words (\emph{snake\_case}). A function that counts the number of action potentials could be named \file{spikeCount.m} or \file{spike\_count.m}. The same naming rules apply to scripts and functions as well as to variables and constants.

\subsection{Naming scripts and functions}

\matlab{} will search the search path (box~\ref{matlabpathbox}) exclusively by name. This search is case-sensitive, which implies that the files \file{test\_function.m} and \file{Test\_function.m} are two different things. Choosing such names is obviously nonsensical: the tiny difference in the names contains no cue about the difference between the two versions, and the names themselves tell close to nothing about the purpose of the functions.

Finding good names is not trivial. Sometimes it is harder than the programming itself. Choosing \emph{expressive names} that provide information about a function's purpose, however, pays off!

\begin{important}[Naming scripts and functions]
Names of functions and scripts should be expressive in the sense that the name provides information about the function's purpose (\file{estimate\_firingrate.m} tells much more than \file{exercise1.m}). Choosing a good name replaces large parts of the documentation.
\end{important}

\subsection{Naming variables and constants}

While the names of scripts and functions describe their purpose, the names of variables describe the stored content. A variable storing the average number of action potentials could be called\\
\varcode{average\_spike\_count}. If this variable is meant to store multiple spike counts, the plural form would be appropriate\\
(\varcode{average\_spike\_counts}).

The control variables used in the head of a \code{for} loop are often simply named \varcode{i}, \varcode{j} or \varcode{k}. This somewhat contradicts the previous statements, but since it is a very common pattern, the meaning of such variables in the context of the loop is quite obvious. This should, however, be the only exception to the general rule of expressive naming.

\begin{important}[Naming of variables]
The names of variables should be expressive. That is, the name itself should tell about the content of the variable. The name \varcode{spike\_count} tells much more about the stored information than \varcode{x}. Choosing a good variable name replaces additional comments.
\end{important}

\section{Code style}

Readability of program code depends strongly on whether or not a consistent code style is applied. A program that is only randomly indented or that contains lots of empty lines is very hard to read and to comprehend. Even though the \matlab{} language (like many others) does not enforce indentation, indentation is very powerful for defining coherent blocks. The \matlab{} editor supports this with an auto-indentation mechanism. A selected section of the code can be automatically indented by pressing \keycode{Ctrl-I}.

Interspersing empty lines is very helpful to separate regions in the code that belong together. Too many empty lines, however, make the code hard to read: it may then take up more space than fits on the screen, so that the overview is lost.

The following two listings show basically the same implementation of a random walk\footnote{A random walk is a simple simulation of Brownian motion. In each simulation step an agent takes a step into a randomly chosen direction.}, once in a rather chaotic version (listing~\ref{chaoticcode}) and then in a cleaner way (listing~\ref{cleancode}).

\begin{pagelisting}[label=chaoticcode, caption={Chaotic implementation of the random-walk.}]
num_runs = 10; max_steps = 1000;
positions = zeros(max_steps, num_runs);
for run = 1:num_runs
for step = 2:max_steps
      x = randn(1);
 if x<0
positions(step, run)= positions(step-1, run)+1;

   elseif x>0
positions(step,run)=positions(step-1,run)-1;
  end
end
    end
\end{pagelisting}

\begin{pagelisting}[label=cleancode, caption={Clean implementation of the random-walk.}]
num_runs = 10;
max_steps = 1000;
positions = zeros(max_steps, num_runs);

for run = 1:num_runs
    for step = 2:max_steps
        x = randn(1);
        if x < 0
            positions(step, run) = positions(step-1, run) + 1;
        elseif x > 0
            positions(step, run) = positions(step-1, run) - 1;
        end
    end
end
\end{pagelisting}

\section{Using comments}

It is common to provide extra information about the meaning of program code by adding comments. In \matlab{} comments are indicated by the percent character \code{\%}. Anything that follows the percent character in a line is ignored and considered a comment. When used sparingly, comments can be immensely helpful. Comments are short sentences that describe the meaning of the (following) lines in the program code. During the initial implementation of a function they can be used to guide the development, but they tend to blow up the code and decrease readability. By choosing expressive variable and function names, most lines should be self-explanatory.

For example, stating the obvious does not really help and should be avoided:\\
\varcode{ x = x + 2; \% add two to x}\\

\begin{important}[Using comments]
\begin{itemize}
\item Comments describe the rationale of the respective code block.
\item Comments are good and helpful, but they must be true!
\item A wrong comment is worse than a non-existent one!
\item Comments must be maintained just like the code. Otherwise they may become wrong and worse than meaningless!
\end{itemize}
\widequote{Good code is its own best documentation. As you're about to add a comment, ask yourself, ``How can I improve the code so that this comment isn't needed?'' Improve the code and then document it to make it even clearer.}{Steve McConnell}
\end{important}

\section{Documenting functions}

All pre-defined \matlab{} functions begin with a comment block that describes the purpose of the function, the required and optional arguments, and the values returned by the function. Using the \code{help} command one can display these comments and learn how to use the function properly. Self-written functions can and should be documented in a similar way. Listing~\ref{localfunctions} shows a well documented function.

\begin{important}[Documenting functions]
Functions must be properly documented. Otherwise a user (including the author) must read and understand the function code, which is a waste of time!
\begin{itemize}
\item Describe the purpose of the function in a few sentences.
\item State the function head to illustrate the order of the arguments.
\item For each argument state the purpose, the expected data type (number, vector, matrix, etc.) and, if applicable, the unit in which a provided number must be given (e.g. seconds if a time is expected).
\item Do the same for all return values.
\end{itemize}
\end{important}

\section{Delegating tasks in functions}

Comments and empty lines are used to organize code into logical blocks and to briefly explain what they do. Whenever one feels tempted to do this, one could also consider delegating the respective task to a function. In most cases this is preferable.

Not delegating tasks leads to very long \entermde[m-file]{m-File}{m-files}, which can be confusing.
Such code is sometimes called ``spaghetti code''. When this happens, it is high time to think about delegating tasks to functions.

\begin{important}[Delegating to functions]
When should one consider delegating tasks to specific functions?
\begin{itemize}
\item Whenever one needs more than two indentation levels to organize the code.
\item Whenever the same lines of code are repeated more than once.
\item Whenever one is tempted to use copy-and-paste.
\end{itemize}
\end{important}

\subsection{Local and nested functions}

Generally, functions live in their own \entermde[m-file]{m-File}{m-files} that have the same name as the function itself. Delegating tasks to functions thus leads to a large set of \entermde[m-file]{m-File}{m-files}, which increases complexity and may lead to confusion. If the delegated functionality is used in multiple places, it is still advisable to do so. If, on the other hand, the delegated functionality is only used within the context of another function, \matlab{} allows defining \entermde[function!local]{Funktion!lokale}{local functions} and \entermde[function!nested]{Funktion!verschachtelte}{nested functions} within the same file. Listing~\ref{localfunctions} shows an example of a local function definition.

\pageinputlisting[label=localfunctions, caption={Example for local functions.}]{calculateSines.m}

\emph{Local functions} live in the same \entermde{m-File}{m-file} as the main function and are only available in this context. Each local function has its own \enterm{scope}, that is, the local function cannot access (read or write) variables of the calling function. Interaction with the local function requires passing all required arguments and taking care of the return values of the function.

\emph{Nested functions} are different in this respect. They are defined within the body of the parent function (between the keywords \code{function} and \code{end}) and have full access to all variables defined in the parent function.
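
The shared access of nested functions can be sketched as follows. The function names \code{sum\_with\_nested()} and \code{add\_value()} are hypothetical and chosen for illustration only; the code is assumed to be stored in a file \file{sum\_with\_nested.m}.

\begin{lstlisting}[caption={Sketch of a nested function that changes a variable of its parent.}]
function total = sum_with_nested(values)
% SUM_WITH_NESTED  Hypothetical example: sums the elements of values.
    total = 0;                 % variable of the parent function
    for k = 1:length(values)
        add_value(values(k));  % the nested function changes 'total'
    end

    function add_value(v)
        % Nested function: reads and writes the parent's variable.
        total = total + v;
    end
end
\end{lstlisting}

Calling \code{sum\_with\_nested([1 2 3])} returns 6: the nested function changes \varcode{total} directly, without receiving it as an argument or returning it.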
Working with (in particular changing) the parent's variables is handy on the one hand, but also risky on the other. One should take care when defining nested functions.

\section{Specifics when using scripts}

A problem similar to that of nested functions arises when using scripts (instead of functions). All variables that are defined within a script become available in the global \enterm{workspace} (\determ{Arbeitsbereich}). There is the risk of name conflicts, that is, a called sub-script redefines or uses the same variable name and may \emph{silently} change its content. The user will not be notified about this change, and the calling script may expect a completely different content. Bugs that are based on such mistakes are hard to find since the program itself looks perfectly fine.

To avoid such issues one should design scripts in a way that they perform their tasks independently of other scripts and functions. A common use case for a script could be to control the analyses made on many datasets and to collect the results. A good script is still not too long and is thus easy to comprehend.

Another advantage of small task-related scripts is that they can be executed directly, either by calling them from the command line or by pressing \keycode{F5} in the editor. Should a script fail, there will be a proper error message that provides important information to track down and fix the bug.

\begin{important}[Structuring scripts]
\begin{itemize}
\item Similar to functions, a script should solve one task and should not be too long.
\item Scripts should work independently of existing variables in the global workspace.
\item Often it is advisable to start a script by deleting variables (\code{clear}) from the workspace, and most of the time it is also good to close all open figures (\code{close all}). Be careful if the respective script has been called by another one.
\item Clean up the workspace at the end of a script. Delete (\code{clear}) all variables that are no longer needed.
\item Consider writing functions instead of scripts.
\end{itemize}
\end{important}

\section{Summary}

Program code must be readable. Names of variables, functions and scripts should be expressive and describe their purpose (scripts and functions) or their content (variables). Cultivating a personalized code style is perfectly fine as long as it is consistent. Many programming languages or communities have their own traditions. It is advisable to adhere to these.

Repeated tasks should (to be read as must) be delegated to functions. If a function is only applied locally and is not of more global interest across projects, consider defining it as a \entermde[function!local]{Funktion!lokale}{local function} or \entermde[function!nested]{Funktion!verschachtelte}{nested function}. Taking care to increase readability and comprehensibility pays off, even to the author!\footnote{Reading tip: Robert C. Martin: \textit{Clean Code: A Handbook of Agile Software Craftsmanship}, Prentice Hall}

\shortquote{Programs must be written for people to read, and only incidentally for machines to execute.}{Abelson / Sussman}

\shortquote{Any fool can write code that a computer can understand. Good programmers write code that humans can understand.}{Martin Fowler}

\shortquote{First, solve the problem. Then, write the code.}{John Johnson}

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%\printsolutions