\chapter{Graphical representation of scientific data}
The ability to present scientific data adequately is one of the core
competences needed to do science. We need to present data in
a meaningful way that fosters understanding of the data and the
results.
\begin{figure}[hb!]
\includegraphics[width=0.9\columnwidth]{convincing}
\titlecaption{The consequences of bad plots may be
severe.}{\url{www.xkcd.com}}\label{xkcdplotting}
\end{figure}
\section{What makes a good plot?}
A plot should enable the interested reader to get a grasp of the
data, to understand the performed analysis, and to critically assess
the presented results. The most important rule is the correct and
complete annotation of the plots. This starts with axis labels and
units and extends to legends. Incomplete annotation can have
terrible consequences (\figref{xkcdplotting}).
The principle of \emph{ink minimization} may be used as a guiding
principle for appealing plots. It requires that the ratio of the amount
of ink spent on the data to that spent on other parts of the plot
should be strongly in favor of the data. Ornamental or otherwise
unnecessary gimmicks should not be used in scientific contexts. An
exception can be made if the particular figure was designed for
didactic purposes and sometimes for presentations.
\begin{important}[Correct labeling of plots]
A data plot must be sufficiently labeled:
\begin{itemize}
\item Every axis must have a label and the correct unit, if it has
one.\\ (e.g. \code[xlabel()]{xlabel('Speed [m/s]')}).
\item When more than one line is plotted, they have to be labeled
using the figure legend or similar (\matlabfun{legend()}).
\item If using subplots that show similar information on the axes,
they should be scaled to show the same ranges to ease comparison
between plots (e.g. \code[xlim()]{xlim([0 100])}).\\ If one
chooses to ignore this rule one should explicitly state this in
the figure caption and/or the descriptions in the text.
\item Labels must be large enough to be readable. In particular,
when using the figure in a presentation use large enough fonts.
\end{itemize}
\end{important}
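As a minimal sketch of these rules, the following lines label both
axes, add a legend, and fix the axis range. The data, the variable
names, and the label texts are made up for illustration only.
\begin{lstlisting}[caption={Sketch of a sufficiently labeled plot (made-up data).}]
% Made-up example data: two speed traces over time.
t = 0:0.1:10;                       % time in seconds
v1 = 2 * t;                         % speed of runner A in m/s
v2 = 1.5 * t + 3;                   % speed of runner B in m/s

plot(t, v1, 'displayname', 'runner A');
hold on;
plot(t, v2, 'displayname', 'runner B');
hold off;

xlabel('Time [s]');                 % every axis gets a label and a unit
ylabel('Speed [m/s]');
xlim([0 10]);                       % explicit range eases comparison
legend('show');                     % uses the display names set above
set(gca, 'fontsize', 12);           % large enough to be readable
\end{lstlisting}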
\section{Things that should be avoided.}
When plotting scientific data we should take great care to avoid
suggestive or misleading presentations. Unnecessary additions and
fancy graphical effects make a plot frivolous and also violate the
\emph{ink minimization principle}. Illustrations in comic style
(\figref{comicexamplefig}) are not suited for scientific data in most
instances. For presentations or didactic purposes, however, using a
comic style may be helpful to indicate that the figure is a mere
sketch and the exact position of the data points is of no importance.
\begin{figure}[t]
\includegraphics[width=0.7\columnwidth]{outlier}\vspace{-3ex}
\titlecaption{Comic-like illustration.}{Obviously not suited to
present scientific data. In didactic or illustrative contexts such
illustrations can be helpful to focus on the important
aspects.}\label{comicexamplefig}
\end{figure}
The following figures show examples of misleading or suggestive
presentations of data. Several of the effects have been exaggerated to
make the point. Applied with a little more subtlety, these methods are
employed to nudge the viewer's perception in the desired direction. You can find
more examples on \url{https://en.wikipedia.org/wiki/Misleading_graph}.
\begin{figure}[p]
\includegraphics[width=0.35\textwidth]{misleading_pie}
\hspace{0.05\textwidth}
\includegraphics[width=0.35\textwidth]{sample_pie}
\titlecaption{Perspective distortion influences the perceived
size.}{By changing the perspective of the 3-D illustration the
highlighted segment \textbf{C} gains more weight than it should
have. In the left graph segments \textbf{A} and \textbf{C} appear
very similar. The 2-D plot on the right-hand side shows that this
is an
illusion. \url{https://en.wikipedia.org/wiki/Misleading_graph}}\label{misleadingpiefig}
\end{figure}
\begin{figure}[p]
\includegraphics[width=0.9\textwidth]{plot_scaling.pdf}
\titlecaption{Choosing the figure format and scaling of the axes
influences the perceived strength of a correlation.}{All subplots
show the same data. By choosing a certain figure size we can
emphasize or reduce the perceived strength of the correlation
in the data. Technically all three plots are correct.
}\label{misleadingscalingfig}
\end{figure}
\begin{figure}[p]
\begin{minipage}[t]{0.3\textwidth}
\includegraphics[width=0.8\textwidth]{improperly_scaled_graph}
\end{minipage}
\begin{minipage}[t]{0.3\textwidth}
\includegraphics[width=0.8\textwidth]{comparison_properly_improperly_graph}
\end{minipage}
\begin{minipage}[t]{0.3\textwidth}
\includegraphics[width=0.7\textwidth]{properly_scaled_graph}
\end{minipage}
\titlecaption{Scaling of markers and symbols.} {In these graphs
symbols have been used to illustrate the measurements made in two
categories. The measured value for category \textbf{B} is actually
three times the measured value for category \textbf{A}. In the
left graph the symbol for category \textbf{B} has been scaled to
triple height while maintaining the proportions. This appears just
fair and correct but has the effect that the covered area
increases not 3-fold but 9-fold (center plot). The
plot on the right shows how it could have been done correctly.
\url{https://en.wikipedia.org/wiki/Misleading_graph}}\label{misleadingsymbolsfig}
\end{figure}
By using perspective effects in 3-D plots the perceived size can be
distorted into the desired direction. While the plot is correct in a
strict sense it is rather suggestive
(\figref{misleadingpiefig}). Similarly, the choice of figure size and
proportions can lead to different interpretations of the
data. Stretching the y-extent of a graph leads to a stronger
impression of the correlation in the data. Compressing this axis will
lead to a much weaker perceived correlation
(\figref{misleadingscalingfig}). When using symbols to illustrate a
quantity we have to take care not to exaggerate differences due to
symbol scaling (\figref{misleadingsymbolsfig}).
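The size of the symbol-scaling effect follows from a short
calculation. As a sketch, assume a symbol of width $w$ and height $h$
whose covered area $A$ is proportional to $w \cdot h$ (the symbols $w$,
$h$, $s$, and $A$ are introduced here for illustration only). Scaling
both dimensions by the same factor $s$ yields
\[ A' \propto (s\,w) \cdot (s\,h) = s^2 \, w \cdot h , \qquad \mbox{i.e.} \quad A' = s^2 A . \]
For $s = 3$ the area grows 9-fold. If the area is supposed to represent
the 3-fold value, the linear scaling factor has to be
$s = \sqrt{3} \approx 1.73$.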
\section{The \matlab{} plotting system}
Plotting data in \matlab{} is rather straightforward for simple line
plots. By calling \code[plot()]{plot(x, y)} a simple line plot will be
created. This figure, however, is missing any annotations like axis
labeling, a legend, etc. There are two options to edit the plot: (i)
the graphical user interface (GUI) or (ii) the command line. Both ways have
their right to exist, each with its own pros and cons. The GUI
way of editing plots is ideal for experimenting, whereas the command line
approach is best suited for automation and for achieving a consistent
layout across figures and graphs.
\begin{figure}
\begin{minipage}[t]{0.6\textwidth}
\includegraphics[height=0.29\textheight]{plot_editor}
\end{minipage}
\begin{minipage}[t]{0.3\textwidth}
\includegraphics[height=0.29\textheight]{property_editor}
\end{minipage}
\titlecaption{The graphical plot-editor.}{From the menu ``Tools
$\rightarrow$ Edit Plot'' one can select the editor. Using the
mouse you can select different parts of the current plot (axes,
lines, the figure background, etc.) and the interface will change
to allow modifying the properties. Some properties are not offered
directly but hide behind the \emph{More Properties} button which
will open the \emph{Property Editor}.}\label{ploteditorfig}
\end{figure}
\vspace{1ex} While it is very convenient to edit a figure using the
GUI (\figref{ploteditorfig}), it is hard to re-create the exact same
plot later on or transfer the changes done to one figure to
another. \matlab{} figures consist of several graphical objects:
\begin{enumerate}
\item \enterm[figure]{Figure}: This object represents the whole
drawing area. It holds properties like the background color, the size of
the figure/paper, the placement of the axes on the paper, etc.
\item \enterm[axes]{Axes}: The coordinate system for plotting the
data. Defines properties like the scaling of the axes, the labeling,
line widths, etc.
\item \enterm[lines]{Lines}: The drawn data lines. Holds properties
like line width and color, the name associated with the line, marker
size and many more.
\item \enterm[annotations]{Annotations}: Annotations like text boxes
and/or arrows that can be used to highlight points or segments.
\item \enterm[legends]{Legends}: Legends of the data plot. One can
define the style of the legend, its placement in the plot, etc.
\end{enumerate}
Each of these objects offers a number of settings. Some of them can be
directly manipulated in the plot editor, others are available via the
property editor.
\subsection{Avoiding manual editing of figures}
All properties that can be manipulated with the graphical interfaces
can also be edited on the command line, or the respective commands can
be included in a script or function. Creating the plot from inside a
script or function has the advantage that one can apply the same
settings to several figures, re-create the figure automatically when
the data has changed, or create the same kind of plot for a
number of datasets.
\begin{important}[Why manual editing should be avoided.]
At first glance the manual editing of a figure using common tools
like CorelDRAW, Illustrator, etc.\,appears so much more
convenient and less complex. This, however, is not entirely
true. What if the figure has to be re-drawn or updated? Then the
editing work starts all over again. Moreover, there is a great risk
associated with this approach: axes are shifted, fonts have not been
embedded into the final document, annotations have been copy-pasted
between figures and are not valid. All of these mistakes can be
found in publications and then require an erratum, which is not
desirable. Even if it appears more cumbersome in the beginning, one
should always try to create publication-ready figures directly from
the data analysis tool using scripts or functions to properly
lay out the plot.
\end{important}
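One way to follow this advice is to collect all layout settings in a
single function. The following is only a sketch: the function name
\varcode{plottrace}, its arguments, and the chosen settings are
hypothetical and not part of \matlab{} or of the scripts used in this
chapter.
\begin{lstlisting}[caption={Sketch of a hypothetical helper function that applies the same layout to every plot.}]
function fig = plottrace(time, voltage, filename)
% PLOTTRACE  Hypothetical helper: plots one recording with a fixed
% layout and saves it as a pdf file. All names are made up.
    fig = figure('visible', 'off');       % do not draw on screen
    plot(time, voltage, 'color', 'k', 'linewidth', 1.0);
    xlabel('Time [ms]');
    ylabel('Voltage [mV]');
    set(gca, 'box', 'off', 'tickdir', 'out', 'fontsize', 10);
    saveas(fig, filename, 'pdf');         % same format for every figure
end
\end{lstlisting}
Calling \code{plottrace(t, v, 'trace1.pdf')} for each recording then
yields figures with an identical layout, and a change of the layout
has to be made in one place only.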
\subsection{Simple plotting}
Creating a simple line plot is rather easy. Assuming there exists a
variable \varcode{y} in the \codeterm{Workspace} that contains the
measurement data it is enough to call \code[plot()]{plot(y)}. At the
first call of this function a new window will be opened and the data
will be plotted as a line plot. If you repeatedly call this
function the current plot will be replaced unless the \code[hold]{hold
on} command was issued before. If it was, the current plot is held
and a second line will be added to it. Calling \code[hold]{hold off}
will release the plot and any subsequent plotting will replace the
previous plot.
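The effect of \varcode{hold} can be tried out with a few lines. This
is a minimal sketch with made-up random data; the variable names are
chosen for illustration only.
\begin{lstlisting}[caption={Sketch of holding and releasing a plot (made-up data).}]
y1 = randn(100, 1);     % made-up measurement data
y2 = randn(100, 1) + 5;

plot(y1);               % opens a figure window and draws the first line
hold on;                % keep the existing line
plot(y2);               % a second line is added to the same axes
hold off;               % release the plot
plot(y1 + y2);          % replaces both previous lines
\end{lstlisting}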
In our previous call to \varcode{plot} we have provided just a single
variable containing the y-values of the plot. The x-values are
automatically substituted by the indices of the elements in
\varcode{y}, i.e.\ they run from 1 to the number of elements with a
constant step size of 1. This
automatic scaling is probably not desired and thus, we need to provide
the missing information ourselves. The respective call will expand to
\code[plot()]{plot(x, y)}. The x-axis will be scaled from the minimum in
\varcode{x} to the maximum of \varcode{x} and by default the data will be
plotted as a solid blue line with a width of 0.5\,pt. A
second line that is added to the figure will be plotted in the next
color of the color order using
the same standard settings. The order of the used colors is defined by
the \code{ColorOrder} property of the axes, which can be adjusted to personal taste
or need. Table\,\ref{plotlinestyles} shows some predefined values that
can be chosen for the line style, the marker, or the color. For
additional options consult the help.
\begin{table}[tp]
\titlecaption{Predefined line styles (left), colors (center) and
marker symbols (right).}{}\label{plotlinestyles}
\begin{tabular}[t]{lc} \hline
\textbf{line styles} & \textbf{abbreviation} \erh \\\hline solid
& '\verb|-|' \erb \\ dashed & '\verb|--|' \\ dotted &
'\verb|:|' \\ dash-dotted & '\verb|-.|' \\\hline
\end{tabular}
\hfill
\begin{tabular}[t]{lc} \hline
\textbf{color} & \textbf{abbreviation} \erh \\ \hline red & 'r' \erb
\\ green & 'g' \\ blue & 'b' \\ cyan & 'c' \\ magenta & 'm'
\\ yellow & 'y' \\ black & 'k' \\ \hline
\end{tabular}
\hfill
\begin{tabular}[t]{lc} \hline
\textbf{marker symbols} & \textbf{abbreviation} \erh \\ \hline circle &
'o' \erb \\ star & '*' \\ plus & '+' \\ cross & 'x' \\ diamond &
'd' \\ pentagram & 'p' \\ hexagram & 'h' \\ square & 's'
\\ triangle & '\^{}' \\ inverted triangle & 'v' \\ triangle left
& '$<$'\\ triangle right & '$>$'\\\hline
\end{tabular}
\end{table}
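The abbreviations listed in table\,\ref{plotlinestyles} can be
combined into a single format string that is passed to
\varcode{plot} right after the data. The following sketch uses
made-up data to show three such combinations.
\begin{lstlisting}[caption={Sketch of combining color, line style, and marker abbreviations (made-up data).}]
x = 0:0.25:2*pi;                 % made-up example data
plot(x, sin(x), 'r--o');         % red, dashed line, circle markers
hold on;
plot(x, cos(x), 'k:s');          % black, dotted line, square markers
plot(x, sin(2 * x), 'g-^');      % green, solid line, triangle markers
hold off;
\end{lstlisting}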
\subsection{Changing properties of a line plot}
The properties of line plots can be changed by passing more arguments
to the \varcode{plot} function. The command shown in
listing\,\ref{settinglineprops} creates a line plot using the dotted
line style, sets the line width to 1.5\,pt, chooses a red line color,
and uses star marker symbols. Finally, the name of the
curve is set to \emph{plot 1} which will be displayed in a legend, if
chosen.
\begin{lstlisting}[label=settinglineprops, caption={Setting line properties when calling \varcode{plot}.}]
x = 0:0.1:2*pi;
y = sin(x);
% dotted red line, 1.5pt wide, star markers, named 'plot 1'
plot(x, y, 'color', 'r', 'linestyle', ':', 'marker', '*', ...
     'linewidth', 1.5, 'displayname', 'plot 1')
\end{lstlisting}
\begin{important}[Choosing the right color.]
Choosing the perfect color goes a little bit beyond personal
taste. When creating a colored plot you may want to consider the
following points:
\begin{itemize}
\item A substantial fraction (about 9\%) of the male population cannot
distinguish between red and green.
\item Can you distinguish the colors in a black-and-white or
gray-scale print?
\item Color figures in publications often cost extra money.
\end{itemize}
\end{important}
\subsection{Changing the axis properties}
The first thing a data plot needs are axis labels with a correct
unit. By calling the functions \code[xlabel]{xlabel('Time [ms]')} and
\code[ylabel]{ylabel('Voltage [mV]')} these can be set. By default the
axes will be scaled to show the whole data range. The extremes will be
selected as the closest integer for small values or the next full
multiple of tens, hundreds, thousands, etc.\ depending on the maximum
value. If these defaults do not match our needs, the limits of the
axes can be explicitly set with the functions \code[xlim()]{xlim()}
and \code[ylim()]{ylim()}. To do this, the functions expect a single
argument, that is a vector containing the minimum and maximum
value. Table\,\ref{plotaxisprops} lists some of the commonly adjusted
properties of an axis. These properties can be set using the
\code[set()]{set()} function. The \code{set} function expects as a
first argument a \enterm{handle} of the affected axis. An axis handle
of the current plot is returned by the \code[gca]{gca} function (gca
stands for ``get current axis''). The following arguments passed to
\code{set} are pairs of the property name and the desired value. It is
possible to set any number of properties using a single call to
\code{set}. See listing\,\ref{niceplotlisting} (lines 20 and 21) for
an example (these commands could be joined into a single call to
\code{set} but have been split for better readability).
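The following lines sketch how axis limits and axis properties can be
set from the command line. The data and the particular property values
are made up for illustration; they are not the settings used in
listing\,\ref{niceplotlisting}.
\begin{lstlisting}[caption={Sketch of setting axis limits and axis properties (made-up data).}]
x = 0:0.1:10;                            % made-up example data
plot(x, exp(-0.5 * x));
xlabel('Time [ms]');
ylabel('Voltage [mV]');
xlim([0 10]);                            % vector with minimum and maximum
ylim([0 1]);

ax = gca;                                % handle of the current axes
set(ax, 'box', 'off', 'tickdir', 'out'); % several pairs in a single call
set(ax, 'xtick', 0:2:10, 'fontsize', 11);
disp(get(ax, 'xlim'));                   % properties can be read back
\end{lstlisting}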
\begin{table}[tp]
\titlecaption{Incomplete list of axis properties.}{For a complete
list consult the help system or open the property editor when an
axis is selected (\figref{ploteditorfig}). If there is a default
value of a property it will be listed first.}\label{plotaxisprops}
\begin{tabular*}{1\textwidth}{lp{5.8cm}p{5.5cm}} \hline
\textbf{property} & \textbf{Description} & \textbf{options} \erh
\\ \hline \code{Box} & Defines whether the axes are drawn on all
sides. & $\{'on'|'off'\}$ \erb\\
\code{Color} & Background color of the drawing area, not the whole figure. & Any RGB triplet or
predefined color name. \\
\code{FontName} & Name of the font used for labeling. & Installed fonts. \\
\code{FontSize} & Size of the font used for labels. & Any scalar value.\\
\code{FontUnits} & Unit in which the font size is given. & $\{'points' | 'centimeters' | 'inches',
...\}$\\
\code{FontWeight} & Bold or normal font. & $\{'normal' | 'bold'\}$\\
\code{TickDir} & Direction of the axis ticks. & $\{'in' | 'out'\}$\\
\code{TickLength} & Length of the ticks. & 2-element vector (length in 2-D and 3-D views).\\
\code{X-, Y-, ZDir} & Direction of axis scaling. Zero bottom/left, or not? & $\{'normal' | 'reverse'\}$\\
\code{X-, Y-, ZGrid} & Defines whether grid lines for the respective axes should be plotted. &
$\{'off'|'on'\}$ \\
\code{X-, Y-, ZScale} & Linear or logarithmic scaling? & $\{'linear' | 'log'\}$\\
\code{X-, Y-, ZTick} & Position of the tick marks. & Vector of positions.\\
\code{X-, Y-, ZTickLabel} & Labels that should be used to label the ticks. & Vector of numbers or a cell-array of strings.\\ \hline
\end{tabular*}
\end{table}
\subsection{Changing figure properties}
\begin{table}[tp]
\titlecaption{Incomplete list of available figure properties.}{For a
complete reference consult the \matlab{} help or select the
property editor while having the figure background selected
(\figref{ploteditorfig}).}\label{plotfigureprops}
\begin{tabular*}{1\textwidth}{lp{6.6cm}p{5.7cm}} \hline
\textbf{property} & \textbf{description} & \textbf{options}
\erh \\
\hline \code{Color} & Background color of the figure, not the drawing area. & Any RGB triplet or predefined color name. \erb
\\ \code{PaperPosition} & Position and size of the printed figure on the paper. & 4-element vector \varcode{[left bottom width height]}. \\
\code{PaperSize} & Size of the paper. & 2-element vector defining width and height.\\
\code{PaperUnits} & Unit in which size and position are given. & $\{'inches' | 'centimeters' |
'normalized' | 'points'\}$\\
\code{Visible} & Defines whether the plot should actually be drawn on screen. Useful when plots should not be displayed but directly saved to file. & $\{'on' | 'off'\}$\\ \hline
\end{tabular*}
\end{table}
Like axes, figures have several properties that can be adjusted to
the current needs, most notably the paper (figure) size and the
placement of the axes on the paper. Table\,\ref{plotfigureprops} lists
commonly used properties. For a complete reference check the help. To
change the properties, we again use the \code{set()} function. The
first argument is now a handle to the current figure, not the current
axis as before. Analogously to the \code{gca} command there is a
\code{gcf} (``get current figure'') command with which the handle can
be retrieved.
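A minimal sketch of setting figure properties is shown below. The
paper size of 15\,cm by 10\,cm is an arbitrary choice made for
illustration.
\begin{lstlisting}[caption={Sketch of setting figure properties (arbitrary paper size).}]
fig = gcf;                               % handle of the current figure
set(fig, 'paperunits', 'centimeters');   % unit for size and position
set(fig, 'papersize', [15 10]);          % width and height of the paper
set(fig, 'paperposition', [0 0 15 10]);  % [left bottom width height]
set(fig, 'color', 'white');              % background color of the figure
\end{lstlisting}
The units are set first because \matlab{} interprets the numbers
passed for size and position in the units that are active at that
moment.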
The script shown in listing\,\ref{niceplotlisting} exemplifies
several features of the plotting system and automatically generates
and saves figure\,\ref{spikedetectionfig}. With every execution of this
script exactly the same plot will be created. If we decide to plot a
different recording, the format will stay exactly the same, just the
data changes. Of special interest are lines 22 and 23, which set
the size of the figure and position the axes on the paper. Line 26
finally saves the figure in the 'pdf' format to file. When calling the
function \code{saveas()} the first argument is the current figure
handle, the second the file name, and the last one defines the
output format (box\,\ref{graphicsformatbox}).
\begin{figure}[t]
\includegraphics{spike_detection} \titlecaption{Automatically
created plot.}{This plot has been created using the code in
listing\,\ref{niceplotlisting}.}\label{spikedetectionfig}
\end{figure}
\begin{ibox}[t]{\label{graphicsformatbox}File formats for digital artwork.}
There are two fundamentally different types of formats for digital artwork:
\begin{enumerate}
\item \enterm{Bitmaps}
\item \enterm{Vector graphics}
\end{enumerate}
When using bitmaps a color value is given for each pixel of the
stored figure. Bitmaps have a fixed resolution (e.g.\,300\,dpi,
dots per inch) and are very useful for photographs. In
contrast, vector graphics store descriptions of the graphic in terms
of so-called primitives (lines, circles, polygons, etc.). The main
advantage of a vector graphic is that it can be scaled without a
loss of quality.
\begin{minipage}[t]{0.38\textwidth}
\mbox{}\\[-2ex]
\includegraphics[width=0.85\textwidth]{VectorBitmap.pdf}
\rotatebox{90}{\footnotesize by Darth Stabro at en.wikipedia.org}
\end{minipage}
\hfill
\begin{minipage}[t]{0.5\textwidth}
Formats supported by \matlab{} \footnote{more information can be
found in the documentation of \code{saveas()}}:\\[2ex]
\begin{tabular}{|l|c|l|}
\hline \textbf{format} & \textbf{type} & \code{saveas()} \textbf{argument}
\erh \\ \hline pdf & vector & \varcode{'pdf'} \erb \\ eps &
vector & \varcode{'eps'}, \varcode{'epsc'} \\ SVG & vector &
\varcode{'svg'} \\ PS & vector & \varcode{'ps'}, \varcode{'psc'}
\\ jpg & bitmap & \varcode{'jpeg'} \\ tif & bitmap &
\varcode{'tiff'}, \varcode{'tiffn'} \\ png & bitmap &
\varcode{'png'} \\ bmp & bitmap & \varcode{'bmp'} \\ \hline
\end{tabular}
\end{minipage}
It is often sensible to store data plots generated by \matlab{}
using a vector graphics format. When in doubt they can usually be
easily converted to a bitmap format. Converting a bitmap to a
vector graphic is not possible without a loss in quality. Storing a
plot that contains a very large set of graphical elements (e.g.\,a
raster-plot showing thousands of action potentials) may, on the
other hand, lead to very large files that can be hard to
handle. Saving such a plot using a bitmap format may be more
efficient.
\end{ibox}
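As a short sketch of box\,\ref{graphicsformatbox}, the same figure
can be stored once as a vector graphic and once as a bitmap. The data
and the file names are made up for illustration.
\begin{lstlisting}[caption={Sketch of saving a figure in a vector and a bitmap format (made-up data and file names).}]
x = 0:0.01:1;                         % made-up example data
fig = figure('visible', 'off');       % plot is not drawn on screen
plot(x, x.^2);
xlabel('x');
ylabel('y');

saveas(fig, 'example.pdf', 'pdf');    % vector graphic, scales without loss
saveas(fig, 'example.png', 'png');    % bitmap with a fixed resolution
close(fig);
\end{lstlisting}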
\lstinputlisting[caption={Script for creating the plot shown in
\figref{spikedetectionfig}.},
label=niceplotlisting]{automatic_plot.m}
\section{Plot examples}
So far we have introduced the standard line plots. Besides these, there
are many more options to display scientific data. MathWorks shows
various examples and the respective code on their website
\url{http://www.mathworks.de/discovery/gallery.html}.
For some types of plots we present examples in the following sections.
\subsection{Line plot, subplots}
A very common scenario is to combine several plots in the same
figure. To do this we create so-called subplots
(figures\,\ref{regularsubplotsfig},\,\ref{irregularsubplotsfig}). The
\code[subplot()]{subplot()} command allows placing multiple axes onto
a single sheet of paper. Generally, \varcode{subplot} expects three arguments
defining the number of rows, the number of columns, and the currently active
plot. The currently active plot number starts with 1 and goes up to
$rows \cdot columns$ (numbers in the subplots in
figures\,\ref{regularsubplotsfig}, \ref{irregularsubplotsfig}).
\begin{figure}[t]
\includegraphics[width=0.5\linewidth]{regular_subplot}
\titlecaption{Subplots placed on a regular grid.}{By default all
subplots have the same size. See
listing\,\ref{regularsubplotlisting}. Subplot labeling has been
created using the \code[text()]{text()} annotation function (see
also below).}\label{regularsubplotsfig}
\end{figure}
\lstinputlisting[caption={Script for creating subplots in a regular
grid \figref{regularsubplotsfig}.}, label=regularsubplotlisting,
basicstyle=\ttfamily\scriptsize]{regular_subplot.m}
By default, all subplots have the same size. If something else is
desired, e.g.\ one subplot should span a whole row while two others
are smaller and should be placed side by side in the same row, the
third argument of \varcode{subplot} can be a vector of the cell numbers that
should be joined. These have, of course, to be adjacent cells
(\figref{irregularsubplotsfig},
listing\,\ref{irregularsubplotslisting}).
\begin{figure}[ht]
\includegraphics[width=0.5\linewidth]{irregular_subplot}
\titlecaption{Subplots of different size.}{The third argument of
\varcode{subplot} may be a vector of cells that should be joined
into the same subplot. See
listing\,\ref{irregularsubplotslisting}}\label{irregularsubplotsfig}
\end{figure}
Not all cells of the grid, defined by the number of rows and
columns, need to be used in a plot. If you want to create something
more elaborate, or have more spacing between the subplots, you can
create a grid with larger numbers of columns and rows, and specify the
used cells of the grid by passing a vector as the third argument to
\varcode{subplot}.
\lstinputlisting[caption={Script for creating subplots of different
sizes \figref{irregularsubplotsfig}.},
label=irregularsubplotslisting,
basicstyle=\ttfamily\scriptsize]{irregular_subplot.m}
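Stripped down to the bare \varcode{subplot} calls, joining cells of
the grid can be sketched as follows. The data are made up and the
layout (one wide plot on top, two small ones below) is just one
possible choice.
\begin{lstlisting}[caption={Sketch of joining grid cells with \varcode{subplot} (made-up data).}]
x = 0:0.1:10;                 % made-up example data

subplot(2, 2, [1 2]);         % cells 1 and 2: spans the whole top row
plot(x, sin(x));
xlim([0 10]);

subplot(2, 2, 3);             % bottom left
plot(x, cos(x));
xlim([0 10]);

subplot(2, 2, 4);             % bottom right
plot(x, sin(x) .* cos(x));
xlim([0 10]);                 % same range in all subplots
\end{lstlisting}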
\subsection{Show estimation errors}
Repeated measurements of a quantity almost always yield
varying results. Neuronal activity, for example, is notoriously
noisy. The responses of a neuron to repeated stimulation with the same
stimulus may share common features but are different each time. This
is the reason we calculate measures that describe the variability of
the data, such as the standard deviation, and thus need a way to
illustrate it in plots of scientific data. Providing an estimate of
the error gives the reader the chance to assess the reliability of
the data and to get a feeling for the possible significance of a
difference in the average values.
\matlab{} offers several ways to plot the average and the error. We
will introduce two possible ways.
\begin{itemize}
\item The \code[errorbar()]{errorbar} function (figure\,\ref{errorbarplot} A, B).
\item Using the \code[fill()]{fill} function to draw an area showing
the spread of the data (figure\,\ref{errorbarplot} C).
\end{itemize}
\subsubsection{Errorbar}
Using the \code[errorbar()]{errorbar} function is rather
straightforward. In its simplest form, it expects three arguments: the x-
and y-values plus the error (line 5 in listing \ref{errorbarlisting},
note that we provide additional optional arguments to set the
marker). This form is obviously only suited for symmetric
distributions. In case the values are not symmetrically distributed,
separate errors for negative and positive deflections from the mean are
more apt. Accordingly, four arguments are needed (line 12 in listing
\ref{errorbarlisting}). The first two arguments are the same, the next
two represent the lower and upper deflections.
By default the \code{errorbar} function does not draw a marker. In the
examples shown here we provide extra arguments to define that a circle
is used for that purpose. The line connecting the average values can
be removed by passing additional arguments. The properties of the
errorbars themselves (linestyle, linewidth, capsize, etc.) can be
changed by taking the return argument of \code{errorbar} and changing
its properties. See the \matlab{} help for more information.
\begin{figure}[ht]
\includegraphics[width=0.9\linewidth]{errorbars}
\titlecaption{Adding error bars to a line plot}{\textbf{A}
symmetrical error around the mean (e.g.\ using the standard
deviation). \textbf{B} Errorbars of an asymmetrical distribution
of the data (note: the average value is now the median and the
errors are the lower and upper quartiles). \textbf{C} A shaded
area is used to illustrate the spread of the data. See
listing\,\ref{errorbarlisting}}\label{errorbarplot}
\end{figure}
\lstinputlisting[caption={Illustrating estimation errors. Script that
creates \figref{errorbarplot}.},
label=errorbarlisting, firstline=13, lastline=29,
basicstyle=\ttfamily\scriptsize]{errorbarplot.m}
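Reduced to the essential calls, the two forms of \varcode{errorbar}
can be sketched as follows. All numbers and variable names are made
up for illustration and are not the data shown in
\figref{errorbarplot}.
\begin{lstlisting}[caption={Sketch of symmetric and asymmetric error bars (made-up data).}]
x = 1:5;                                  % made-up example data
m = [2.0 2.5 3.1 3.0 4.2];                % averages
sd = [0.3 0.4 0.2 0.5 0.3];               % symmetric error
lower = [0.2 0.3 0.1 0.4 0.2];            % deflection below the average
upper = [0.5 0.6 0.4 0.7 0.5];            % deflection above the average

subplot(1, 2, 1);
errorbar(x, m, sd, 'o-');                 % same error in both directions
subplot(1, 2, 2);
h = errorbar(x, m, lower, upper, 'o-');   % lower error first, then upper
set(h, 'linewidth', 1.5);                 % change properties via the handle
\end{lstlisting}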
\subsubsection{Fill}
For a few years now it has become fashionable to illustrate the error not
using errorbars but by drawing a shaded area around the mean. Besides
their appeal there is also a real argument in favor of using error
areas instead of errorbars: In case you have a lot of data points with
respective errorbars such that they would merge in the figure, it is
cleaner and probably easier to read if one uses an error
area instead. To achieve an illustration as shown in
figure\,\ref{errorbarplot} C, we use the \code{fill} command in
combination with a standard line plot. The original purpose of
\code{fill} is to draw a filled polygon. We hence have to provide it
with the vertex points of the polygon. For each x-value we now have
two y-values (average minus error and average plus error). Further, we
want the vertices to be connected in a defined order. One can achieve
this by going back and forth on the x-axis; we append a reversed
version of the x-values to the original x-values using the \code{cat}
command, and the reversal is done using the \code{fliplr} command (line 3 in
listing \ref{errorbarlisting2}; depending on the layout of your data
you may need to concatenate along a different dimension of the data and
use \code{flipud} instead). The y-coordinates of the polygon vertices
are concatenated in a similar way (line 4). In the example shown here
we accept the polygon object that is returned by \code{fill} (variable \varcode{p}) and
use it to change a few properties of the polygon. The \emph{FaceAlpha}
property defines the transparency (or rather the opaqueness) of the
area. The provided alpha value is a number between 0 and 1 with zero
leading to invisibility and a value of one to complete
opaqueness. Finally, we use the normal plot command to draw a line
connecting the average values.
\lstinputlisting[caption={Illustrating estimation errors. Script that
creates \figref{errorbarplot}.}, label=errorbarlisting2,
firstline=30,
basicstyle=\ttfamily\scriptsize]{errorbarplot.m}
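The construction of the polygon can also be looked at in isolation.
The following sketch assumes that the data are stored in row vectors;
the data and the variable names are made up for illustration.
\begin{lstlisting}[caption={Sketch of constructing an error area with \varcode{fill} (made-up data).}]
x = 0:0.5:10;                             % made-up data (row vectors)
m = sin(x);                               % average
err = 0.2 + 0.05 * x;                     % error around the average

px = cat(2, x, fliplr(x));                % forth and back along the x-axis
py = cat(2, m + err, fliplr(m - err));    % upper bound, then lower reversed
p = fill(px, py, 'r');                    % draw the filled polygon
set(p, 'facealpha', 0.3, 'edgecolor', 'none');
hold on;
plot(x, m, 'r', 'linewidth', 1.5);        % line connecting the averages
hold off;
\end{lstlisting}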
\subsection{Annotations, text}
Sometimes we want to highlight certain parts of a plot or simply add an
annotation that does not fit or belong to the legend. In these cases
we can use the \code[text()]{text()} or
\code[annotation()]{annotation()} function to add this information to
the plot. While \varcode{text} simply prints out the given text string
at the defined position (for example in
listing\,\ref{regularsubplotlisting}), the \varcode{annotation}
function allows adding some more advanced highlights like arrows,
lines, ellipses, or rectangles. Figure\,\ref{annotationsplot} shows
some examples, the respective code can be found in
listing\,\ref{annotationsplotlisting}. For more options consult the
documentation.
\begin{figure}[ht]
\includegraphics[width=0.5\linewidth]{annotations}
\titlecaption{Annotations in a plot.}{See
listing\,\ref{annotationsplotlisting}}\label{annotationsplot}
\end{figure}
\lstinputlisting[caption={Adding annotations to figures. Script that
creates \figref{annotationsplot}.},
label=annotationsplotlisting,
basicstyle=\ttfamily\scriptsize]{annotations.m}
\begin{important}[Positions in data or figure coordinates.]
A very confusing pitfall is that \varcode{text} and
\varcode{annotation} use different coordinate systems. While \varcode{text}
expects the positions to be in data coordinates, i.e.\,within the limits
of the x- and y-axis, \varcode{annotation} requires the positions to
be given in normalized figure coordinates. Normalized means that the
width and height of the figure are expressed by numbers in the range
0 to 1. The bottom/left corner then has the coordinates $(0,0)$ and
the top/right corner the coordinates $(1,1)$.
Why different coordinate systems? Using data coordinates is
convenient for annotations within a plot, but what about an arrow
that should be drawn between two subplots?
\end{important}
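The difference between the two coordinate systems can be sketched
with a few lines. The data, the text string, and all positions are
made up for illustration.
\begin{lstlisting}[caption={Sketch of positions in data and in normalized figure coordinates (made-up data).}]
x = 0:0.1:10;                              % made-up example data
plot(x, sin(x));
xlim([0 10]);
ylim([-1.5 1.5]);

% text: position in data coordinates (x = 1.6, y = 1.2)
text(1.6, 1.2, 'first maximum');

% annotation: positions in normalized figure coordinates (0 to 1);
% the two vectors give the x- and y-values of start and end point
annotation('arrow', [0.5 0.6], [0.9 0.8]);
\end{lstlisting}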
\section{Summary}
A good plot of scientific data displays the data completely and
soberly without too many distractions. Misleading or suggestive
plots, as may result from perspective presentations or inappropriate
scaling of axes or symbols, should be avoided.
\noindent When combining several line plots within the same figure one should
consider adapting color \textbf{and} line style (solid, dashed,
dotted, etc.) to make them distinguishable even in black-and-white
prints. Combinations of red and green are not a good choice since they
cannot be distinguished by people with red-green blindness.
\vspace{2ex}
Key ingredients for a good data plot:
\begin{itemize}
\item Clarity.
\item Complete labeling.
\item Plotted lines and curves must be distinguishable.
\item No suggestive or misleading presentation.
\item The right balance of line width, font size and size of the figure.
\item Error bars wherever they are appropriate.
\end{itemize}