% Source file: scientificComputing/plotting/lecture/plotting.tex (last changed 2020-12-13; repository archived on 2021-05-17)
\chapter{Graphical representation of scientific data}
The ability to present scientific data adequately is one of the core
competences needed to do science. We need to present data in a
meaningful way that supports understanding of the data and the
results without introducing biases.
\begin{figure}[hb!]
\includegraphics[width=0.9\columnwidth]{convincing}
\titlecaption{The consequences of bad plots may be
severe.}{\url{www.xkcd.com}}\label{xkcdplotting}
\end{figure}
\section{The \matlab{} plotting system}
Plotting data in \matlab{} is straightforward for simple line
plots. Calling \code[plot()]{plot(x, y)} creates a simple line
plot. The resulting figure, however, is missing any annotations like axis
labeling, a legend, etc. There are two options to edit the plot: (i)
the graphical user interface (GUI) or (ii) the command line. Both
approaches have their pros and cons. The GUI is ideal for
experimenting, whereas the command line approach is best suited for
automation and for achieving a consistent layout across figures and
graphs in a paper or thesis.
\begin{figure}
\begin{minipage}[t]{0.6\textwidth}
\includegraphics[height=0.29\textheight]{plot_editor}
\end{minipage}
\begin{minipage}[t]{0.3\textwidth}
\includegraphics[height=0.29\textheight]{property_editor}
\end{minipage}
\titlecaption{The graphical plot-editor.}{From the menu ``Tools
$\rightarrow$ Edit Plot'' one can select the editor. Using the
mouse you can select different parts of the current plot (axes,
lines, the figure background, etc.) and the interface will change
to allow modifying the properties. Some properties are not offered
directly but hide behind the \emph{More Properties} button which
will open the \emph{Property Editor}.}\label{ploteditorfig}
\end{figure}
\vspace{1ex} While it is very convenient to edit a figure using the
GUI (\figref{ploteditorfig}), it is hard to re-create the exact same
plot later on or to transfer the changes made to one figure to
another. \matlab{} figures consist of several graphical objects:
\begin{enumerate}
\item \enterm[figure]{Figure}: This object represents the whole
  drawing area. It holds properties like the background color, the size of
  the figure/paper, the placement of the axes on the paper, etc.
\item \enterm[axes]{Axes}: The coordinate system for plotting the
  data. Defines properties like the scaling of the axes, the labeling,
  line widths, etc.
\item \enterm[lines]{Lines}: The drawn data lines. They hold properties
  like line width and color, the name associated with the line, marker
  size and many more.
\item \enterm[annotations]{Annotations}: Annotations like text boxes
  or arrows that can be used to highlight points or segments.
\item \enterm[legends]{Legends}: Legends of the data plot. One can
  define the style of the legend, its placement in the plot, etc.
\end{enumerate}
Each of these objects offers a number of settings. Some of them can be
manipulated directly in the plot editor, others are available via the
property editor.
\subsection{Avoiding manual editing of figures}
All properties that can be manipulated with the graphical interfaces
can also be edited from the command line, or the respective commands
can be included in a script or function. Creating the plot from inside
a script or function has the advantage that one can apply the same
settings to several figures, re-create the figure automatically when
the data have changed, or create the same kind of plot for a number of
datasets.
\begin{important}[Why manual editing should be avoided.]
At first glance, manual editing of a figure using tools such as
Inkscape, CorelDRAW, Illustrator, etc.\ appears much more
convenient and less complex than coding everything into the analysis
scripts. This, however, is not entirely true. What if the figure has
to be redrawn or updated because, for example, you got more data?
Then the editing work starts all over again. In addition, there is a
great risk associated with the manual editing approach: axes may get
shifted, fonts may not be embedded into the final document, or
annotations may be copy-pasted between figures and no longer be
valid. All of these mistakes can be found in publications and then
require an erratum, which is not desirable. Even if it appears more
cumbersome in the beginning, one should always try to generate
publication-ready figures directly from the data analysis tool, using
scripts or functions to properly lay out and annotate the plot.
\end{important}
\subsection{Simple plotting}
Creating a simple line plot is rather easy. Assuming there exists a
variable \varcode{y} in the \entermde{Arbeitsbereich}{workspace} that
contains the measurement data, it is enough to call
\code[plot()]{plot(y)}. At the first call of this function a new
\enterm{figure} will be opened and the data will be drawn as a
line plot. If you repeatedly call this function the current plot will
be replaced unless the \code[hold]{hold on} command was issued
before. If it was, the current plot is held and a second line will be
added to it. Calling \code[hold]{hold off} will release the plot and
any subsequent plotting will replace the previous plot.
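
A minimal sketch of this behavior, assuming made-up data (the
variable names \varcode{y1} and \varcode{y2} are chosen only for
illustration):
\begin{pagelisting}[caption={Sketch: replacing and holding plots.}]
y1 = sin(0:0.1:2*pi);   % made-up example data
y2 = cos(0:0.1:2*pi);

plot(y1)      % opens a new figure if none exists and draws y1
plot(y2)      % replaces the previously drawn line
hold on       % keep what is currently drawn
plot(y1)      % adds a second line to the same axes
hold off      % subsequent plot calls replace the content again
\end{pagelisting}
After the last command the axes still contain both lines. Only the
next call of \varcode{plot} would clear them.
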
In our previous call to \varcode{plot} we provided just a single
variable containing the y-values of the plot. In this case the
x-values are substituted automatically: the data points are placed at
the indices 1 to the number of elements in \varcode{y}, i.e.\ with a
constant step size of 1. This automatic x-axis is usually not what we
want and thus we need to provide the missing information ourselves in
a second variable that contains the respective \varcode{x} values. The
lengths of \varcode{x} and \varcode{y} must be the same, otherwise the
call of the \varcode{plot} function will raise an error. The respective
call expands to \code[plot()]{plot(x, y)}. The x-axis will now be
scaled from the minimum to the maximum of \varcode{x}. By default,
the data are drawn as a solid blue line with a line width of
0.5\,pt. A second plot that is added to the figure will be drawn in a
different color (orange-red in current \matlab{} versions) using the
same settings. The order of the colors is defined by the
\varcode{ColorOrder} property of the axes, which can be adjusted to
personal taste or need. Table\,\ref{plotlinestyles} shows some
predefined values that can be chosen for the line style, the marker,
or the color. For additional options consult the help.
\begin{table}[htp]
\titlecaption{Predefined line styles (left), colors (center) and
marker symbols (right).}{}\label{plotlinestyles}
\begin{tabular}[t]{lc} \hline
\textbf{line styles} & \textbf{abbreviation} \erh \\\hline
solid & '\verb|-|' \erb \\
dashed & '\verb|--|' \\
dotted & '\verb|:|' \\
dash-dotted & '\verb|-.|' \\\hline
\end{tabular}
\hfill
\begin{tabular}[t]{lc} \hline
\textbf{color} & \textbf{abbreviation} \erh \\ \hline red & 'r' \erb
\\ green & 'g' \\ blue & 'b' \\ cyan & 'c' \\ magenta & 'm'
\\ yellow & 'y' \\ black & 'k' \\ \hline
\end{tabular}
\hfill
\begin{tabular}[t]{lc} \hline
\textbf{marker symbols} & \textbf{abbreviation} \erh \\ \hline circle &
'o' \erb \\ star & '*' \\ plus & '+' \\ cross & 'x' \\ diamond &
'd' \\ pentagram & 'p' \\ hexagram & 'h' \\ square & 's'
\\ triangle & '\^{}' \\ inverted triangle & 'v' \\ triangle left
& '$<$'\\ triangle right & '$>$'\\\hline
\end{tabular}
\end{table}
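
The abbreviations listed in table\,\ref{plotlinestyles} can be
combined into a single format string that is passed as the third
argument to \varcode{plot}. The following sketch uses made-up data to
illustrate this:
\begin{pagelisting}[caption={Sketch: combining color, line style, and marker abbreviations.}]
x = 0:0.1:2*pi;           % x-values, same length as y
y = sin(x);               % made-up data
plot(x, y, 'r--o')        % red, dashed line with circle markers
hold on
plot(x, cos(x), 'k:')     % black, dotted line without markers
hold off
\end{pagelisting}
The order of the abbreviations within the format string does not
matter.
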
The following listing shows a simple line plot with axis labeling and a title:
\pageinputlisting[caption={A simple plot showing a sinewave.},
label=simpleplotlisting]{simple_plot.m}
\subsection{Changing properties of a line plot}
The properties of line plots can be changed by passing more arguments
to the \varcode{plot} function. The command shown in
listing\,\ref{settinglineprops} creates a line plot that uses the
dotted line style, a line width of 1.5\,pt, a red line color, and
star marker symbols. Finally, the name of the curve is set to
\emph{plot 1}, which will be displayed in the legend if one is shown.
\begin{pagelisting}[label=settinglineprops, caption={Setting line properties when calling \varcode{plot}.}]
x = 0:0.1:2*pi;
y = sin(x);
plot(x, y, 'color', 'r', 'linestyle', ':', 'marker', '*', ...
     'linewidth', 1.5, 'displayname', 'plot 1')
\end{pagelisting}
\begin{important}[Choosing the right color.]
Choosing the perfect color goes a little bit beyond personal
taste. When creating a colored plot you may want to consider the
following points:
\begin{itemize}
\item A substantial fraction (about 9\%) of the male population
  cannot distinguish between red and green.
\item Can you distinguish the colors in a black-and-white or
  grayscale print?
\item Color figures in publications sometimes cost extra money.
\end{itemize}
\end{important}
\subsection{Changing the axes properties}
The first thing a plot needs are axis labels with correct units. These
can be set by calling the functions
\code[xlabel()]{xlabel('Time [ms]')} and
\code[ylabel()]{ylabel('Voltage [mV]')}. By default the
axes will be scaled to show the full extent of the data. The extremes
will be selected as the closest integer for small values or the next
full multiple of tens, hundreds, thousands, etc.\ depending on the
maximum value. If these defaults do not match your needs, the limits
of the axes can be set explicitly with the functions
\code[xlim()]{xlim()} and \code[ylim()]{ylim()}. Both functions expect
a single argument, a 2-element vector containing the minimum and
maximum value. Table\,\ref{plotaxisprops} lists some of the commonly
adjusted properties of an axis. To set these properties, we need the
axes object. It can be retrieved using the \code{gca()} function (gca
stands for ``get current axes''), or it can be stored in a variable
when the axes are created explicitly (\varcode{ax = axes();}). Note
that the return value of \varcode{plot} is the line object, not the
axes. Changing the properties of the axes object will update the plot
(listing\,\ref{niceplotlisting}).
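
As a minimal sketch, assuming made-up data and arbitrarily chosen
property values, the labels, limits, and a few axes properties could
be set like this (the object notation requires \matlab{} R2014b or
newer):
\begin{pagelisting}[caption={Sketch: setting labels, limits, and axes properties.}]
x = 0:0.1:10;             % made-up time axis in ms
y = sin(x);               % made-up voltage trace in mV
plot(x, y)
xlabel('Time [ms]')
ylabel('Voltage [mV]')
xlim([0, 10])             % 2-element vector: [minimum, maximum]
ylim([-1.5, 1.5])

ax = gca();               % the current axes object
ax.Box = 'off';           % do not draw the axes on all sides
ax.TickDir = 'out';       % ticks point away from the drawing area
ax.FontSize = 10;         % font size used for labeling
ax.XTick = 0:2:10;        % positions of the tick marks on the x-axis
\end{pagelisting}
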
\begin{table}[tp]
\titlecaption{Incomplete list of axis properties.}{For a complete
list consult the help system or open the property editor when an
axis is selected (\figref{ploteditorfig}). If there is a default
value of a property it will be listed first.}\label{plotaxisprops}
\begin{tabular*}{1\textwidth}{lp{5.8cm}p{5.5cm}} \hline
\textbf{property} & \textbf{description} & \textbf{options} \erh \\ \hline
\code{Box} & Defines whether the axes are drawn on all sides. & $\{'on'|'off'\}$ \erb\\
\code{Color} & Background color of the drawing area, not the whole figure. & RGB triplet or predefined color name. \\
\code{FontName} & Name of the font used for labeling. & Installed fonts. \\
\code{FontSize} & Size of the font used for labels. & Any scalar value.\\
\code{FontUnits} & Unit in which the font size is given. & $\{'points' | 'centimeters' | 'inches', ...\}$\\
\code{FontWeight} & Bold or normal font. & $\{'normal' | 'bold'\}$\\
\code{TickDir} & Direction of the axis ticks. & $\{'in' | 'out'\}$\\
\code{TickLength} & Length of the ticks. & 2-element vector (lengths for 2D and 3D views).\\
\code{X-, Y-, ZDir} & Direction of axis scaling. Zero bottom/left, or not? & $\{'normal' | 'reverse'\}$\\
\code{X-, Y-, ZGrid} & Defines whether grid lines for the respective axes should be plotted. & $\{'off'|'on'\}$ \\
\code{X-, Y-, ZScale} & Linear or logarithmic scaling? & $\{'linear' | 'log'\}$\\
\code{X-, Y-, ZTick} & Position of the tick marks. & Vector of positions.\\
\code{X-, Y-, ZTickLabel} & Labels that should be used to label the ticks. & Vector of numbers or a cell-array of strings.\\ \hline
\end{tabular*}
\end{table}
\subsection{Changing figure properties}
\begin{table}[tp]
\titlecaption{Incomplete list of available figure properties.}{For a
complete reference consult the \matlab{} help or select the
property editor while having the figure background selected
(\figref{ploteditorfig}).}\label{plotfigureprops}
\begin{tabular*}{1\textwidth}{lp{6.6cm}p{5.7cm}} \hline
\textbf{property} & \textbf{description} & \textbf{options} \erh \\ \hline
\code{Color} & Background color of the figure, not the drawing area. & RGB triplet or predefined color name. \erb \\
\code{PaperPosition} & Position and size of the plot on the paper. & 4-element vector containing the position of the bottom-left corner, the width, and the height. \\
\code{PaperSize} & Size of the paper. & 2-element vector defining width and height.\\
\code{PaperUnits} & Unit in which size and position are given. & $\{'inches' | 'centimeters' | 'normalized' | 'points'\}$\\
\code{Visible} & Defines whether the plot should actually be drawn on screen. Useful when plots should not be displayed but directly saved to file. & $\{'on' | 'off'\}$\\ \hline
\end{tabular*}
\end{table}
Like the axes, the figure also has several properties that can be
adjusted to the current needs, most notably the paper (figure) size
and the placement of the plot on the
paper. Table\,\ref{plotfigureprops} lists commonly used
properties. For a complete reference check the help. To change the
figure's appearance, we need to change the properties of the figure
object, which can be stored when the figure is created (\code[figure()]{fig
  = figure();}) or retrieved using the \code{gcf()} (``get current figure'')
command.
The script shown in listing\,\ref{niceplotlisting} exemplifies
several features of the plotting system and automatically generates
and saves figure\,\ref{spikedetectionfig}. Every execution of this
script creates exactly the same plot. If we decide to plot a
different recording, the format will stay exactly the same, just the
data changes. Of special interest are lines 35 through 37, which
set the size of the figure and position the axes on the paper. Lines
24 through 27 control the font used for labeling inside the axes. The
axes object holds the default \varcode{FontSize}, and via multipliers
applied to this default one can control the size of the title (line
26) or the axes labels (line 27). Finally, line 40 saves the figure to
a file in the 'pdf' format. When calling the function \code{saveas()} the first
argument is the current figure handle, the second the file name, and
the last one defines the output format (box\,\ref{graphicsformatbox}).
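
The essential steps for sizing and saving a figure can be sketched as
follows. This is not the script of listing\,\ref{niceplotlisting} but
a reduced example with made-up data; the paper size and the file name
\varcode{sine\_example.pdf} are arbitrary assumptions.
\begin{pagelisting}[caption={Sketch: setting the paper size and saving a figure.}]
fig = figure();
x = 0:0.1:2*pi;                        % made-up data
plot(x, sin(x))
xlabel('x')
ylabel('sin(x)')

fig.PaperUnits = 'centimeters';        % set the unit first
fig.PaperSize = [10, 8];               % width and height of the paper
fig.PaperPosition = [0, 0, 10, 8];     % [left, bottom, width, height]
saveas(fig, 'sine_example.pdf', 'pdf') % hypothetical file name
\end{pagelisting}
The unit is set before size and position because numbers assigned to
\varcode{PaperSize} and \varcode{PaperPosition} are interpreted in the
unit that is active at that moment.
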
\begin{figure}[t]
\includegraphics{spike_detection} \titlecaption{Automatically
created plot.}{This plot has been created using the code in
listing\,\ref{niceplotlisting}.}\label{spikedetectionfig}
\end{figure}
\begin{ibox}[tp]{\label{graphicsformatbox}File formats for digital artwork.}
There are two fundamentally different types of formats for digital artwork:
\begin{enumerate}
\item \enterm[bitmap]{Bitmaps} (\determ{Rastergrafik})
\item \enterm[vector graphics]{Vector graphics} (\determ{Vektorgrafik})
\end{enumerate}
When using bitmaps a color value is given for each pixel of the
stored figure. Bitmaps have a fixed resolution (e.g.\,300\,dpi,
dots per inch) and are very useful for photographs. In
contrast, vector graphics store descriptions of the graphic in terms
of so-called primitives (lines, circles, polygons, etc.). The main
advantage of a vector graphic is that it can be scaled without a
loss of quality.
\begin{minipage}[t]{0.38\textwidth}
\mbox{}\\[-2ex]
\includegraphics[width=0.85\textwidth]{VectorBitmap.pdf}
\rotatebox{90}{\footnotesize by Darth Stabro at en.wikipedia.org}
\end{minipage}
\hfill
\begin{minipage}[t]{0.5\textwidth}
Formats supported by \matlab{} \footnote{more information can be
found in the documentation of \code{saveas()}}:\\[2ex]
\begin{tabular}{|l|c|l|}
\hline \textbf{format} & \textbf{type} & \code{saveas()} \textbf{argument}
\erh \\ \hline pdf & vector & \varcode{'pdf'} \erb \\ eps &
vector & \varcode{'eps'}, \varcode{'epsc'} \\ SVG & vector &
\varcode{'svg'} \\ PS & vector & \varcode{'ps'}, \varcode{'psc'}
\\ jpg & bitmap & \varcode{'jpeg'} \\ tif & bitmap &
\varcode{'tiff'}, \varcode{'tiffn'} \\ png & bitmap &
\varcode{'png'} \\ bmp & bitmap & \varcode{'bmp'} \\ \hline
\end{tabular}
\end{minipage}
It is advisable to store data plots generated by \matlab{}
using a vector graphics format. If in doubt, they can usually be
converted easily to a bitmap format. Converting a bitmap into a
vector graphic, in contrast, is not possible without a loss in quality. Storing a
plot that contains very large sets of graphical elements (e.g.\,a
raster-plot showing thousands of action potentials) may, on the
other hand, lead to very large files that can be hard to
handle. Saving such plots using a bitmap format may be more
efficient.
\end{ibox}
\pageinputlisting[caption={Script for creating the plot shown in
\figref{spikedetectionfig}.},
label=niceplotlisting]{automatic_plot.m}
\begin{ibox}[t]{\label{handlevsobjectbox}The wind of change.}
The way figure or axis properties are adapted changed with \matlab{}
version \emph{R2014b}. In earlier versions
properties were read and set using the functions
\code[get()]{get} and \code[set()]{set}. The first argument these
functions expect is a valid figure or axis \emph{handle} as
returned by functions such as \code{figure()} and \code{axes()}, or as
retrieved using \code{gcf()} or \code{gca()} for the
current figure or axis, respectively. Subsequent arguments
passed to \code{set()} are pairs of a property's name and the desired
value.
\begin{lstlisting}[caption={Using set to change figure and axis properties.}]
frequency = 5; % frequency of the sine wave in Hz
time = 0.01:0.01:1.0; % the time axis in seconds
signal = sin(2 * pi * time * frequency);
plot(time, signal)
axes_handle = gca(); % get current axes
figure_handle = gcf(); % get current figure
% XLabel and YLabel hold text objects, the label text is their 'String' property
set(get(axes_handle, 'XLabel'), 'String', 'time [s]');
set(get(axes_handle, 'YLabel'), 'String', 'amplitude');
% set the unit first, otherwise the size is interpreted in the previous unit
set(figure_handle, 'PaperUnits', 'centimeters', 'PaperSize', [5.5, 5.5], ...
    'PaperPosition', [0, 0, 5.5, 5.5]);
\end{lstlisting}
With newer versions the handles returned by \code{gcf()} and
\code{gca()} are ``objects'' and setting properties became much
easier. This object notation is used throughout this chapter. For
backward compatibility \code{set()} and \code{get()} still work in
current versions of \matlab{}.
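
For comparison, the same kind of settings in object notation could
look like the following sketch (R2014b and newer; the variable names
\varcode{ax} and \varcode{fig} are chosen for illustration):
\begin{lstlisting}[caption={Sketch: changing figure and axis properties using object notation.}]
frequency = 5;              % frequency of the sine wave in Hz
time = 0.01:0.01:1.0;       % the time axis in seconds
signal = sin(2 * pi * time * frequency);
plot(time, signal)
ax = gca();                 % current axes object
fig = gcf();                % current figure object
ax.XLabel.String = 'time [s]';
ax.YLabel.String = 'amplitude';
fig.PaperUnits = 'centimeters';
fig.PaperSize = [5.5, 5.5];
fig.PaperPosition = [0, 0, 5.5, 5.5];
\end{lstlisting}
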
\end{ibox}
\section{Plot examples}
So far we have introduced the standard line plots. Besides these, there
are many more options to display scientific data. MathWorks shows
various examples and the respective code on their website
\url{http://www.mathworks.de/discovery/gallery.html}.
For some types of plots we present examples in the following sections.
\subsection{Scatter}
For displaying events or pairs of x-y coordinates the standard line
plot is not optimal. Rather, we use \code{scatter()} for this
purpose. For example, we have a number of measurements of a system's
response to a certain stimulus intensity. There is no dependency
between the data points, so connecting them with a line would be
nonsensical (figure\,\ref{scatterplotfig}\,A). In contrast to
\code{plot()} we need to provide both x- and y-coordinates in order to
draw the data. In the example we also provide further arguments to set
the size and color of the dots and to specify that they are filled
(listing\,\ref{scatterlisting1}).
\pageinputlisting[caption={Creating a scatter plot with red filled dots.},
label=scatterlisting1, firstline=9, lastline=9]{scatterplot.m}
We could have used \code{plot()} for this purpose, setting a marker
and the line style to ``none'', to draw an equivalent
plot. Scatter, however, offers some more advanced features that allow
adding two more dimensions to the plot
(figure\,\ref{scatterplotfig}\,B,\,C). For each dot one can define an
individual size and color. In this example the size argument is simply
a vector of the same size as the data that contains the numbers from 1 to
the length of \varcode{x} (line 1 in listing\,\ref{scatterlisting2}). To
manipulate the color we need to specify a length(x)-by-3 matrix. For
each dot we provide an individual color (i.e.\ the RGB triplet in each
row of the color matrix, lines 2-4 in listing\,\ref{scatterlisting2}).
\pageinputlisting[caption={Creating a scatter plot with size and color
variations. The RGB triplets define the respective color intensity
in a range 0:1. Here, we modify only the red color channel.},
label=scatterlisting2, linerange={15-15, 21-23}]{scatterplot.m}
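
A self-contained sketch of both variants, assuming made-up data. The
dot size of 15 and the scaling of the sizes by 10 are arbitrary
choices and not taken from \varcode{scatterplot.m}:
\begin{pagelisting}[caption={Sketch: scatter plots with uniform and with individual dot sizes and colors.}]
x = 1:20;                                  % made-up stimulus intensities
y = x + 2 * randn(size(x));                % made-up responses
scatter(x, y, 15, 'r', 'filled')           % size 15, red, filled dots

sizes = 10 * (1:length(x));                % individual size for each dot
colors = zeros(length(x), 3);              % one RGB triplet per row
colors(:, 1) = linspace(0, 1, length(x));  % vary only the red channel
figure()
scatter(x, y, sizes, colors, 'filled')
\end{pagelisting}
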
\begin{figure}[t]
\includegraphics{scatterplot}
\titlecaption{Scatterplots.}{Scatterplots are used to draw
datapoints where there is no direct dependency between the
individual measurements (like time). Scatter offers several
advantages over the standard plot command. One can vary the size
and/or the color of each dot.}\label{scatterplotfig}
\end{figure}
\subsection{Subplots}
A very common scenario is to combine several plots in the same
figure. To do this we create so-called subplots
(figures\,\ref{regularsubplotsfig},\,\ref{irregularsubplotsfig}). The
\code[subplot()]{subplot()} command allows placing multiple axes onto
a single sheet of paper. Generally, \code{subplot()} expects three
arguments defining the number of rows, the number of columns, and the
currently active plot. The number of the currently active plot starts
with 1 and goes up to $rows \cdot columns$ (numbers in the subplots in
figures\,\ref{regularsubplotsfig}, \ref{irregularsubplotsfig}).
\begin{figure}[t]
\includegraphics[width=0.5\linewidth]{regular_subplot}
\titlecaption{Subplots placed on a regular grid.}{By default all
subplots have the same size. See
listing\,\ref{regularsubplotlisting}. Subplot labeling has been
created using the \code[text()]{text()} annotation function (see
also below).}\label{regularsubplotsfig}
\end{figure}
\pageinputlisting[caption={Script for creating subplots in a regular
grid \figref{regularsubplotsfig}.}, label=regularsubplotlisting,
basicstyle=\ttfamily\scriptsize]{regular_subplot.m}
By default, all subplots have the same size. If something else is
desired, e.g.\ one subplot should span a whole row while two others
are smaller and should be placed side by side in the same row, the
third argument of \code{subplot()} can be a vector of the cell numbers
that should be joined. These have, of course, to be adjacent numbers
(\figref{irregularsubplotsfig},
listing\,\ref{irregularsubplotslisting}).
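
A minimal sketch of such a layout on a 2-by-2 grid, assuming made-up
data: the upper subplot spans the whole first row, the lower row
holds two smaller subplots.
\begin{pagelisting}[caption={Sketch: joining grid cells into a larger subplot.}]
x = 0:0.1:2*pi;            % made-up data
subplot(2, 2, [1, 2])      % upper row: cells 1 and 2 are joined
plot(x, sin(x))
subplot(2, 2, 3)           % lower left cell
plot(x, cos(x))
subplot(2, 2, 4)           % lower right cell
plot(x, sin(2 * x))
\end{pagelisting}
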
\begin{figure}[ht]
\includegraphics[width=0.5\linewidth]{irregular_subplot}
\titlecaption{Subplots of different size.}{The third argument of
  \varcode{subplot} may be a vector of cells that should be joined
  into the same subplot. See
  listing\,\ref{irregularsubplotslisting}.}\label{irregularsubplotsfig}
\end{figure}
Not all cells of the grid, defined by the number of rows and
columns, need to be used in a plot. If you want to create something
more elaborate, or have more spacing between the subplots, you can
create a grid with larger numbers of columns and rows and specify the
used cells of the grid by passing a vector as the third argument to
\code{subplot()}.
\pageinputlisting[caption={Script for creating subplots of different
sizes \figref{irregularsubplotsfig}.},
label=irregularsubplotslisting,
basicstyle=\ttfamily\scriptsize]{irregular_subplot.m}
\subsection{Show estimation errors}
Repeated measurements of a quantity almost always yield
varying results. Neuronal activity, for example, is notoriously
noisy. The responses of a neuron to repeated stimulation with the same
stimulus may share common features but are different each time. This
is the reason we calculate measures that describe the variability of
the data, such as the standard deviation, and thus need a way to
illustrate it in plots of scientific data. Providing an estimate of
the error gives the reader the chance to assess the reliability of
the data and to get a feeling for the possible significance of a
difference in the average values.
\matlab{} offers several ways to plot the average and the error. We
will introduce two possible ways.
\begin{itemize}
\item The \code[errorbar()]{errorbar} function (figure\,\ref{errorbarplot} A, B).
\item Using the \code[fill()]{fill} function to draw an area showing
the spread of the data (figure\,\ref{errorbarplot} C).
\end{itemize}
\subsubsection{Errorbar}
Using the \code[errorbar()]{errorbar} function is rather
straightforward. In its simplest form, it expects three arguments: the x-
and y-values plus the error (line 5 in listing \ref{errorbarlisting},
note that we provide additional optional arguments to set the
marker). This form is obviously only suited for symmetric
distributions. In case the values are not symmetrically distributed,
separate errors for negative and positive deflections from the average are
more apt. Accordingly, four arguments are needed (line 12 in listing
\ref{errorbarlisting}). The first two arguments are the same, the next
two represent the negative (lower) and positive (upper) deflections,
in this order.
By default the \code{errorbar()} function does not draw a marker. In the
examples shown here we provide extra arguments to define that a circle
is used for that purpose. The line connecting the average values can
be removed by passing additional arguments. The properties of the
errorbars themselves (linestyle, linewidth, capsize, etc.) can be
changed by taking the return argument of \code{errorbar()} and changing
its properties. See the \matlab{} help for more information.
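
Both forms can be sketched with made-up data as follows. The variable
names and error values are assumptions chosen for illustration, and
the \varcode{CapSize} property is only available in more recent
\matlab{} versions.
\begin{pagelisting}[caption={Sketch: symmetric and asymmetric errorbars.}]
x = 1:10;
y = 0.5 * x;                           % made-up average values
err = 0.5 * ones(size(x));             % symmetric error, e.g. standard deviation
subplot(1, 2, 1)
errorbar(x, y, err, 'o')               % circle markers, no connecting line

err_low = 0.25 * ones(size(x));        % deflection below the average
err_high = 1.0 * ones(size(x));        % deflection above the average
subplot(1, 2, 2)
eb = errorbar(x, y, err_low, err_high, 'o-');
eb.LineWidth = 1.0;                    % change properties via the returned object
eb.CapSize = 4;                        % length of the caps at the errorbar ends
\end{pagelisting}
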
\begin{figure}[ht]
\includegraphics[width=0.9\linewidth]{errorbars}
\titlecaption{Indicating the estimation error in plots.}{\textbf{A}
  Symmetrical error around the mean (e.g.\ using the standard
  deviation). \textbf{B} Errorbars of an asymmetrical distribution
  of the data (note: the average value is now the median and the
  errors are the lower and upper quartiles). \textbf{C} A shaded
  area is used to illustrate the spread of the data. See
  listing\,\ref{errorbarlisting} for A and B and listing\,\ref{errorbarlisting2} for C.}\label{errorbarplot}
\end{figure}
\pageinputlisting[caption={Illustrating estimation errors using error bars. Script that
  creates \figref{errorbarplot}\,A, B.},
label=errorbarlisting, firstline=13, lastline=31,
basicstyle=\ttfamily\scriptsize]{errorbarplot.m}
\subsubsection{Fill}
In recent years it has become popular to illustrate the error not
with errorbars but by drawing a shaded area around the mean. Besides
looking appealing, there is also a real argument in favor of using error
areas instead of errorbars: if you have many data points whose
errorbars would merge in the figure, an error area is
cleaner and probably easier to read. To achieve an illustration as shown in
figure\,\ref{errorbarplot} C, we use the \code{fill()} command in
combination with a standard line plot. The original purpose of
\code{fill()} is to draw a filled polygon. We hence have to provide it
with the vertex points of the polygon. For each x-value we now have
two y-values (average minus error and average plus error). Further, we
want the vertices to be connected in a defined order. One can achieve
this by going back and forth on the x-axis; we append a reversed
version of the x-values to the original x-values using \code{cat()} and
\code{fliplr()} for concatenation and inversion, respectively (line 3 in
listing \ref{errorbarlisting2}; depending on the layout of your data
you may need to concatenate along a different dimension of the data and
use \code{flipud()} instead). The y-coordinates of the polygon vertices
are concatenated in a similar way (line 4). In the example shown here
we store the polygon object that is returned by \code{fill()} (variable \varcode{p}) and
use it to change a few properties of the polygon. The \emph{FaceAlpha}
property defines the transparency (or rather the opacity) of the
area. The provided alpha value is a number between 0 and 1, with zero
leading to invisibility and a value of one to complete
opacity. Finally, we use the normal plot command to draw a line
connecting the average values (line 12).
\pageinputlisting[caption={Illustrating estimation errors using a shaded area. Script that
creates \figref{errorbarplot} C.}, label=errorbarlisting2,
firstline=33,
basicstyle=\ttfamily\scriptsize]{errorbarplot.m}
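
The construction of the polygon can be sketched as follows, assuming
made-up data stored in row vectors (the names \varcode{avg},
\varcode{err}, \varcode{px}, and \varcode{py} are chosen for
illustration and are not taken from \varcode{errorbarplot.m}):
\begin{pagelisting}[caption={Sketch: drawing an error area with \varcode{fill}.}]
x = 0:0.1:2*pi;                             % row vector of x-values
avg = sin(x);                               % made-up average
err = 0.2 * ones(size(x));                  % made-up error
px = cat(2, x, fliplr(x));                  % forth and back along the x-axis
py = cat(2, avg + err, fliplr(avg - err));  % upper bound, then reversed lower bound
p = fill(px, py, 'r');                      % filled polygon
p.FaceAlpha = 0.3;                          % mostly transparent area
p.EdgeColor = 'none';                       % no outline around the area
hold on
plot(x, avg, 'r')                           % average on top of the area
hold off
\end{pagelisting}
For data stored in column vectors one would concatenate along the
first dimension and use \code{flipud()} instead.
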
\subsection{Annotations, text}
The functions \code[text()]{text()} and \code[annotation()]{annotation()} are
used for highlighting certain parts of a plot or simply for adding an
annotation that does not fit or does not belong into the legend.
While \code{text()} simply prints the given text string at the
defined position (for an example see
listing\,\ref{regularsubplotlisting}), the \code{annotation()}
function allows adding some more advanced highlights like arrows,
lines, ellipses, or rectangles. Figure\,\ref{annotationsplot} shows
some examples, the respective code can be found in
listing\,\ref{annotationsplotlisting}. For more options consult the
\matlab{} help.
\begin{figure}[ht]
\includegraphics[width=0.5\linewidth]{annotations}
\titlecaption{Annotations in a plot.}{See
listing\,\ref{annotationsplotlisting}}\label{annotationsplot}
\end{figure}
\pageinputlisting[caption={Adding annotations to figures. Script that
creates \figref{annotationsplot}.},
label=annotationsplotlisting,
basicstyle=\ttfamily\scriptsize]{annotations.m}
\begin{important}[Positions in data or figure coordinates.]
A very confusing pitfall is the use of different coordinate systems
by \varcode{text()} and \varcode{annotation()}. While \varcode{text()}
expects the positions to be in data coordinates, i.e.\,in the limits
of the x- and y-axis, \varcode{annotation()} requires the positions to
be given in normalized figure coordinates. Normalized means that the
width and height of the figure are expressed by numbers in the range
0 to 1. The bottom/left corner then has the coordinates $(0,0)$ and
the top/right corner the coordinates $(1,1)$.
Why different coordinate systems? Using data coordinates is
convenient for annotations within a plot, but what about an arrow
that should be drawn between two subplots?
\end{important}
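
The difference between the two coordinate systems can be sketched
with made-up data. All positions below are arbitrary example values:
\begin{pagelisting}[caption={Sketch: data coordinates versus normalized figure coordinates.}]
x = 0:0.1:2*pi;                              % made-up data
plot(x, sin(x))
text(pi/2, 1.0, 'maximum')                   % data coordinates: x = pi/2, y = 1
annotation('arrow', [0.5, 0.3], [0.5, 0.8])  % figure coordinates: [x_start, x_end], [y_start, y_end]
annotation('textbox', [0.5, 0.4, 0.3, 0.1], ...
           'String', 'figure coordinates', 'LineStyle', 'none')
\end{pagelisting}
Changing the axis limits moves the text created with \code{text()}
along with the data, while the arrow and the text box stay at the
same place in the figure.
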
\subsection{Animations and movies}
A picture is worth a thousand words and sometimes creating animations
or movies is worth many pictures. They can help understanding complex
or time-dependent developments and may add some variety to a
presentation. The following example shows how a movie can be created
and saved to file. A similar mechanism is available to produce
animations that are supposed to be shown within \matlab{} but for this
we point to the documentation of the \code[movie()]{movie()}
command. The underlying principle is the same, however. The code shown
in listing\,\ref{animationlisting} creates an animation of a
Lissajous figure. The basic steps are:
\begin{enumerate}
\item Create a figure and set some basic properties (lines 7--10).
\item Create a \code[VideoWriter()]{VideoWriter} object that, in this
  example, takes the filename and the profile, here the MPEG-4 compression
  profile, as arguments (line 12). For more options see the
  documentation.
\item We can set the desired framerate and the quality of the video
  (lines 13, 14). Quality is a value between 0 and 100, where 100 is
  the best quality but leads to the largest files. The framerate
  defines how quickly the individual frames will be switched. In our
  example, we create 500 frames and the video framerate is
  25\,Hz. That is, the movie will have a duration of
  $500/25 = 20$\,seconds.
\item Open the destination file (line 16). Opening means that the file
  is created and opened for writing. This also implies that it has to
  be closed after the whole process (line 31).
\item For each frame of the video, we plot the appropriate data (we
  use \code{scatter()} for this purpose, line 20) and ``grab''
  the frame (line 28). Grabbing is similar to making a screenshot of
  the figure. The \code{drawnow()} command (line 27) is used to
  stop the execution of the for loop until the drawing process is
  finished.
\item Write the frame to file (line 29).
\item Finally, close the file (line 31).
\end{enumerate}
\pageinputlisting[caption={Making animations and saving them as a
movie.}, label=animationlisting, firstline=16, lastline=36,
basicstyle=\ttfamily\scriptsize]{movie_example.m}
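
The steps above can be condensed into the following sketch. It is not
the script \varcode{movie\_example.m}: the file name and the number of
frames are made up, and the \varcode{'Motion JPEG AVI'} profile is
used instead of MPEG-4 because, to our knowledge, the MPEG-4 profile
is not available on all platforms.
\begin{pagelisting}[caption={Sketch: writing an animation to a video file.}]
fig = figure();
writer = VideoWriter('lissajous_sketch.avi', 'Motion JPEG AVI'); % hypothetical file name
writer.FrameRate = 25;             % frames per second
writer.Quality = 90;               % 0 (worst) to 100 (best)
open(writer);                      % create the file and open it for writing

t = linspace(0, 2*pi, 100);        % one frame per time point
for i = 1:length(t)
    scatter(sin(2 * t(1:i)), cos(3 * t(1:i)), 10, 'filled');
    xlim([-1.1, 1.1]);
    ylim([-1.1, 1.1]);
    drawnow;                       % finish drawing before grabbing
    frame = getframe(fig);         % "screenshot" of the figure
    writeVideo(writer, frame);     % append the frame to the file
end
close(writer);                     % 100 frames at 25 Hz: 4 seconds of video
\end{pagelisting}
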
\section{What makes a good plot?}
A plot should enable the interested reader to get a grasp of the
data, to understand the performed analysis, and to critically assess
the presented results. The most important rule is the correct and
complete annotation of the plots. This starts with axis labels and
units and extends to legends. Incomplete annotation can have
terrible consequences (\figref{xkcdplotting}).
The principle of \emph{ink minimization} may be used as a guiding
principle for appealing plots. It requires that the ratio of the ink
spent on the data to the ink spent on other parts of the plot
should be strongly in favor of the data. Ornamental or otherwise
unnecessary gimmicks should not be used in scientific contexts. An
exception can be made if the particular figure was designed for
didactic purposes and sometimes for presentations.
\begin{important}[Correct labeling of plots]
A data plot must be sufficiently labeled:
\begin{itemize}
\item Every axis must have a label and the correct unit, if it has
one\\ (e.g. \code[xlabel()]{xlabel('Speed [m/s]')}).
\item When more than one line is plotted, they have to be labeled
using a figure legend or similar (\matlabfun{legend()}).
\item If using subplots that show similar information on the axes,
they should be scaled to show the same ranges to ease comparison
between plots (e.g. \code[xlim()]{xlim([0 100])}).\\ If one
chooses to ignore this rule one should explicitly state this in
the figure caption and/or the descriptions in the text.
\item Labels must be large enough to be readable. In particular,
when using the figure in a presentation use large enough fonts.
\end{itemize}
\end{important}
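
The labeling rules above could be applied on the command line roughly
as in the following minimal sketch. The data, the variable names, the
axis limits, and the font size are made up for illustration only and
need to be adapted to the actual data.

\begin{lstlisting}
% Hypothetical example data: speed of two animals in two trials.
time = 0:0.1:10;                      % time in seconds
speed_a1 = 2.0 + sin(time);           % animal A, trial 1, in m/s
speed_b1 = 1.5 + 0.5 * cos(time);     % animal B, trial 1, in m/s
speed_a2 = 2.5 + 0.5 * sin(2 * time); % animal A, trial 2, in m/s
speed_b2 = 1.0 + 0.5 * cos(time);     % animal B, trial 2, in m/s

figure();

% Left subplot: trial 1.
subplot(1, 2, 1);
plot(time, speed_a1);
hold on;
plot(time, speed_b1);
hold off;
xlabel('Time [s]', 'FontSize', 12);   % every axis gets label and unit
ylabel('Speed [m/s]', 'FontSize', 12);
legend('animal A', 'animal B');       % more than one line: add a legend
xlim([0 10]);                         % same ranges in both subplots
ylim([0 4]);

% Right subplot: trial 2, same quantities and same axis ranges.
subplot(1, 2, 2);
plot(time, speed_a2);
hold on;
plot(time, speed_b2);
hold off;
xlabel('Time [s]', 'FontSize', 12);
ylabel('Speed [m/s]', 'FontSize', 12);
legend('animal A', 'animal B');
xlim([0 10]);
ylim([0 4]);
\end{lstlisting}

Setting the limits explicitly in both subplots, instead of relying on
the automatic scaling, is what makes the two panels directly comparable.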
\section{Things that should be avoided}
When plotting scientific data we should take great care to avoid
suggestive or misleading presentations. Unnecessary additions and
fancy graphical effects make a plot frivolous and also violate the
\emph{ink minimization principle}. Illustrations in comic style
(\figref{comicexamplefig}) are not suited for scientific data in most
instances. For presentations or didactic purposes, however, using a
comic style may be helpful to indicate that the figure is a mere
sketch and the exact position of the data points is of no importance.
\begin{figure}[t]
\includegraphics[width=0.7\columnwidth]{outlier}\vspace{-3ex}
\titlecaption{Comic-like illustration.}{Obviously not suited to
present scientific data. In didactic or illustrative contexts such
illustrations can be helpful to focus on the important
aspects.}\label{comicexamplefig}
\end{figure}
The following figures show examples of misleading or suggestive
presentations of data. Several of the effects have been exaggerated to
make the point. Applied with a little more subtlety, these methods are
employed to nudge the viewer's perception in the desired direction. You
can find more examples at \url{https://en.wikipedia.org/wiki/Misleading_graph}.
\begin{figure}[p]
\includegraphics[width=0.35\textwidth]{misleading_pie}
\hspace{0.05\textwidth}
\includegraphics[width=0.35\textwidth]{sample_pie}
\titlecaption{Perspective distortion influences the perceived
size.}{By changing the perspective of the 3-D illustration the
highlighted segment \textbf{C} gains more weight than it should
have. In the left graph segments \textbf{A} and \textbf{C} appear
very similar. The 2-D plot on the right-hand side shows that this
is an
illusion. \url{https://en.wikipedia.org/wiki/Misleading_graph}}\label{misleadingpiefig}
\end{figure}
\begin{figure}[p]
\includegraphics[width=0.9\textwidth]{plot_scaling.pdf}
\titlecaption{Choosing the figure format and scaling of the axes
influences the perceived strength of a correlation.}{All subplots
show the same data. By choosing a certain figure size we can
emphasize or reduce the perceived strength of the correlation
in the data. Technically all three plots are correct.
}\label{misleadingscalingfig}
\end{figure}
\begin{figure}[p]
\begin{minipage}[t]{0.3\textwidth}
\includegraphics[width=0.8\textwidth]{improperly_scaled_graph}
\end{minipage}
\begin{minipage}[t]{0.3\textwidth}
\includegraphics[width=0.8\textwidth]{comparison_properly_improperly_graph}
\end{minipage}
\begin{minipage}[t]{0.3\textwidth}
\includegraphics[width=0.7\textwidth]{properly_scaled_graph}
\end{minipage}
\titlecaption{Scaling of markers and symbols.} {In these graphs
symbols have been used to illustrate the measurements made in two
categories. The measured value for category \textbf{B} is actually
three times the measured value for category \textbf{A}. In the
left graph the symbol for category \textbf{B} has been scaled to
triple height while maintaining the proportions. This appears
fair and correct but has the effect that the covered area
increases not 3-fold but 9-fold (center plot). The
plot on the right shows how it could have been done correctly.
\url{https://en.wikipedia.org/wiki/Misleading_graph}}\label{misleadingsymbolsfig}
\end{figure}
By using perspective effects in 3-D plots the perceived size can be
distorted in the desired direction. While the plot is correct in a
strict sense it is rather suggestive
(\figref{misleadingpiefig}). Similarly, the choice of figure size and
proportions can lead to different interpretations of the
data. Stretching the y-extent of a graph leads to a stronger
impression of the correlation in the data. Compressing this axis will
lead to a much weaker perceived correlation
(\figref{misleadingscalingfig}). When using symbols to illustrate a
quantity we have to take care not to exaggerate differences through
symbol scaling (\figref{misleadingsymbolsfig}).
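
The 9-fold increase in \figref{misleadingsymbolsfig} follows from
simple geometry. As a brief worked example, assume a symbol of width
$w$ and height $h$ whose linear dimensions are both scaled by a factor
$s$ (the symbols $w$, $h$, $s$, and $A$ are introduced here for
illustration only). The covered area is then
\[ A(s) = (s\,w)\,(s\,h) = s^2\,w\,h = s^2 A(1) \; . \]
For $s = 3$ the area grows by the factor $s^2 = 9$. If the area of the
symbol is meant to represent a value that is three times as large, one
option is to scale the symbol by $s = \sqrt{3} \approx 1.73$
instead, so that $A(s) = 3\,A(1)$.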
\section{Summary}
A good plot of scientific data displays the data completely and
seriously without too many distractions. Misleading or suggestive
plots, as may result from perspective presentations or inappropriate
scaling of axes and symbols, should be avoided.
\noindent When combining several line plots within the same figure one should
consider adapting color \textbf{and} line style (solid, dashed,
dotted, etc.) to make them distinguishable even in black-and-white
prints. Combinations of red and green are not a good choice since they
cannot be distinguished by people with red-green color blindness.
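
A minimal sketch of this advice, assuming three arbitrary example
curves, combines a different color with a different line style for
each line. The curves, the handle names, and the chosen colors are
illustrative assumptions, not a recommended palette.

\begin{lstlisting}
% Three example curves that share the same axes.
x = linspace(0, 2 * pi, 200);

figure();
h_sin = plot(x, sin(x), 'k-', 'LineWidth', 1.5);       % solid, black
hold on;
h_cos = plot(x, cos(x), 'b--', 'LineWidth', 1.5);      % dashed, blue
h_sin2 = plot(x, sin(2 * x), 'm:', 'LineWidth', 1.5);  % dotted, magenta
hold off;

xlabel('Phase [rad]');
ylabel('Amplitude [a.u.]');
legend('sin(x)', 'cos(x)', 'sin(2x)');
\end{lstlisting}

Because every curve has its own line style, the lines remain
distinguishable when the figure is printed in grayscale.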
\vspace{2ex}
Key ingredients for a good data plot:
\begin{itemize}
\item Clarity.
\item Complete labeling.
\item Plotted lines and curves must be distinguishable.
\item No suggestive or misleading presentation.
\item The right balance of line width, font size and size of the
figure. This may depend on the purpose; for presentations slightly
thicker lines help.
\item Error bars wherever they are appropriate.
\end{itemize}
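
Several of these ingredients can be combined in a few lines. The
following sketch assumes hypothetical measurements (ten trials at each
of five stimulus intensities) and shows the mean response with the
standard deviation as error bars. All names, units, and numbers are
invented for illustration.

\begin{lstlisting}
% Hypothetical data: ten trials (rows) for five intensities (columns).
intensity = [1 2 3 4 5];                         % stimulus intensity in mV
trials = repmat(10 * intensity, 10, 1) + 5 * randn(10, 5);

mean_rate = mean(trials, 1);     % mean over trials for each intensity
sd_rate = std(trials, 0, 1);     % standard deviation over trials

figure();
h = errorbar(intensity, mean_rate, sd_rate, 'ko-');
set(h, 'LineWidth', 1.5);        % slightly thicker line
set(gca, 'FontSize', 14);        % larger font for tick labels
xlabel('Stimulus intensity [mV]');
ylabel('Firing rate [Hz]');
xlim([0 6]);
\end{lstlisting}

What the error bars represent (here the standard deviation) should be
stated in the figure caption.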
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%\printsolutions