\chapter{Graphical Representation of Scientific Data}
The ability to present scientific data adequately is one of the
core competences needed to do science. We need to present data in
a meaningful way that fosters understanding of the data and the
results.
\begin{figure}[hb!]
\includegraphics[width=0.9\columnwidth]{convincing}
\titlecaption{The consequences of bad plots may be
severe.}{\url{www.xkcd.com}}\label{xkcdplotting}
\end{figure}
\section{What makes a good plot?}
A plot should enable the interested reader to get a grasp of the
data, to understand the performed analysis, and to critically assess
the presented results. The most important rule is the correct and
complete annotation of the plots. This starts with axis labels and
units and extends to legends. Incomplete annotation can have
terrible consequences (\figref{xkcdplotting}).

The principle of \emph{ink minimization} may be used as a guiding
principle for appealing plots. It requires that the ratio of the amount
of ink spent on the data to that spent on other parts of the plot
should be strongly in favor of the data. Ornamental or otherwise
unnecessary gimmicks should not be used in scientific contexts. An
exception can be made if the particular figure was designed for
didactic purposes and sometimes for presentations.
\begin{important}[Correct labeling of plots]
A data plot must be sufficiently labeled:
\begin{itemize}
  \item Every axis must have a label and the correct unit, if it has
    one\\ (e.g., \code[xlabel()]{xlabel('Speed [m/s]')}).
  \item When more than one line is plotted, they have to be labeled
    using the figure legend or similar (\matlabfun{legend()}).
  \item If using subplots that show similar information on the axes,
    they should be scaled to show the same ranges to ease comparison
    between plots (e.g., \code[xlim()]{xlim([0 100])}).\\ If one
    chooses to ignore this rule one should explicitly state this in
    the figure caption and/or the descriptions in the text.
\item Labels must be large enough to be readable. In particular,
when using the figure in a presentation use large enough fonts.
\end{itemize}
\end{important}
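
The labeling rules above can be sketched in a few lines of code. This is
only a minimal illustration: the data, the variable names, and the
chosen axis limits are made up and not taken from a real recording.

\begin{lstlisting}[caption={Sketch of a sufficiently labeled figure with two subplots (made-up data).}]
t = 0:0.1:100;                % hypothetical time axis in seconds
slow = 2 + sin(0.1 * t);      % made-up speed traces in m/s
fast = 4 + sin(0.2 * t);

subplot(2, 1, 1);
plot(t, slow, 'displayname', 'animal 1');
hold on;
plot(t, fast, 'displayname', 'animal 2');
hold off;
ylabel('Speed [m/s]');        % label with unit
xlim([0 100]);
legend('show');               % two lines, so a legend is needed

subplot(2, 1, 2);
plot(t, fast - slow, 'displayname', 'difference');
xlabel('Time [s]');
ylabel('Speed difference [m/s]');
xlim([0 100]);                % same x-range as the upper panel
legend('show');
\end{lstlisting}
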
\section{Things that should be avoided.}
When plotting scientific data we should take great care to avoid
suggestive or misleading presentations. Unnecessary additions and
fancy graphical effects make a plot frivolous and also violate the
\emph{ink minimization principle}. Illustrations in comic style
(\figref{comicexamplefig}) are not suited for scientific data in most
instances. For presentations or didactic purposes, however, using a
comic style may be helpful to indicate that the figure is a mere
sketch and the exact position of the data points is of no importance.
\begin{figure}[t]
\includegraphics[width=0.7\columnwidth]{outlier}\vspace{-3ex}
\titlecaption{Comic-like illustration.}{Obviously not suited to
present scientific data. In didactic or illustrative contexts they
can be helpful to focus on the important
aspects.}\label{comicexamplefig}
\end{figure}
The following figures show examples of misleading or suggestive
presentations of data. Several of the effects have been exaggerated to
make the point. In more subtle forms these methods are employed to
nudge the viewer's perception in the desired direction. You can find
more examples on \url{https://en.wikipedia.org/wiki/Misleading_graph}.
\begin{figure}[p]
\includegraphics[width=0.35\textwidth]{misleading_pie}
\hspace{0.05\textwidth}
\includegraphics[width=0.35\textwidth]{sample_pie}
  \titlecaption{Perspective distortion influences the perceived
size.}{By changing the perspective of the 3-D illustration the
highlighted segment \textbf{C} gains more weight than it should
have. In the left graph segments \textbf{A} and \textbf{C} appear
very similar. The 2-D plot on the right-hand side shows that this
is an
illusion. \url{https://en.wikipedia.org/wiki/Misleading_graph}}\label{misleadingpiefig}
\end{figure}
\begin{figure}[p]
\begin{minipage}[t]{0.3\textwidth}
\includegraphics[width=0.9\textwidth]{line_graph1}
\end{minipage}
\begin{minipage}[t]{0.3\textwidth}
\includegraphics[width=0.9\textwidth]{line_graph1_3}
\end{minipage}
\begin{minipage}[t]{0.3\textwidth}
\includegraphics[width=0.9\textwidth]{line_graph1_4}
\end{minipage}
  \titlecaption{Choosing the figure format influences the perceived
    strength of a correlation.}{All three subplots show the same data.
    By choosing a certain figure size we can emphasize or reduce
    the perceived strength of the correlation in the data. Technically
    all three plots are correct.
\url{https://en.wikipedia.org/wiki/Misleading_graph}}\label{misleadingscalingfig}
\end{figure}
\begin{figure}[p]
\begin{minipage}[t]{0.3\textwidth}
\includegraphics[width=0.8\textwidth]{improperly_scaled_graph}
\end{minipage}
\begin{minipage}[t]{0.3\textwidth}
\includegraphics[width=0.8\textwidth]{comparison_properly_improperly_graph}
\end{minipage}
\begin{minipage}[t]{0.3\textwidth}
\includegraphics[width=0.7\textwidth]{properly_scaled_graph}
\end{minipage}
  \titlecaption{Scaling of markers and symbols.}{In these graphs
    symbols have been used to illustrate the measurements made in two
    categories. The measured value for category \textbf{B} is actually
    three times the measured value for category \textbf{A}. In the
    left graph the symbol for category \textbf{B} has been scaled to
    triple height while maintaining the proportions. This appears
    fair and correct but has the effect that the covered area
    is increased not 3-fold but 9-fold (center plot). The
    plot on the right shows how it could have been done correctly.
\url{https://en.wikipedia.org/wiki/Misleading_graph}}\label{misleadingsymbolsfig}
\end{figure}
By using perspective effects in 3-D plots the perceived size can be
distorted in the desired direction. While the plot is correct in a
strict sense it is rather suggestive
(\figref{misleadingpiefig}). Similarly, the choice of figure size and
proportions can lead to different interpretations of the
data. Stretching the y-extent of a graph leads to a stronger
impression of the correlation in the data. Compressing this axis will
lead to a much weaker perceived correlation
(\figref{misleadingscalingfig}). When using symbols to illustrate a
quantity we have to take care not to exaggerate differences through
symbol scaling (\figref{misleadingsymbolsfig}).
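
The effect of symbol scaling can be made explicit with a short
calculation. Assume, purely for illustration, a symbol of width $w$
and height $h$ with area $A_1 = w\,h$ that is scaled by a factor $k$
in both directions:
\[
  A_k = (k\,w)\,(k\,h) = k^2\,w\,h = k^2 A_1 .
\]
For $k = 3$ the covered area grows 9-fold although the represented
value only tripled. If the covered area is meant to represent the
value, width and height would have to be scaled by $\sqrt{k}$ instead,
that is by $\sqrt{3} \approx 1.73$ in this example.
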
\section{The \matlab{} plotting system}
Plotting data in \matlab{} is rather straightforward for simple line
plots. By calling \code[plot()]{plot(x, y)} a simple line plot will be
created. This figure, however, is missing any annotations like axis
labeling, a legend, etc. There are two options to edit the plot: (i)
the graphical user interface (GUI) or (ii) the command line. Both ways
have their pros and cons. The GUI
way of editing plots is ideal for experimenting, whereas the command line
approach is best suited for automation and for achieving a consistent
layout across figures and graphs.
\begin{figure}
\begin{minipage}[t]{0.6\textwidth}
\includegraphics[height=0.29\textheight]{plot_editor}
\end{minipage}
\begin{minipage}[t]{0.3\textwidth}
\includegraphics[height=0.29\textheight]{property_editor}
\end{minipage}
\titlecaption{The graphical plot-editor.}{From the menu ``Tools
$\rightarrow$ Edit Plot'' one can select the editor. Using the
mouse you can select different parts of the current plot (axes,
lines, the figure background, etc.) and the interface will change
to allow modifying the properties. Some properties are not offered
directly but hide behind the \emph{More Properties} button which
will open the \emph{Property Editor}.}\label{ploteditorfig}
\end{figure}
\vspace{1ex} While it is very convenient to edit a figure using the
GUI (\figref{ploteditorfig}), it is hard to re-create the exact same
plot later on or to transfer the changes made to one figure to
another. \matlab{} figures consist of several graphical objects:
\begin{enumerate}
\item \enterm[figure]{Figure}: This object represents the whole
  drawing area. It holds properties like the background color, the size of
  the figure/paper, the placement of the axes on the paper, etc.
\item \enterm[axes]{Axes}: The coordinate system for plotting the
  data. Defines properties like the scaling of the axes, the labeling,
  line widths, etc.
\item \enterm[lines]{Lines}: The drawn data lines. Holds properties
  like line width and color, the name associated with the line, marker
  size and many more.
\item \enterm[annotations]{Annotations}: Annotations like text boxes
  and/or arrows that can be used to highlight points or segments.
\item \enterm[legends]{Legends}: Legends of the data plot. One can
  define the style of the legend, its placement in the plot, etc.
\end{enumerate}
Each of these objects offers a number of settings. Some of them can be
directly manipulated in the plot editor, others are available via the
property editor.
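
The same objects can also be reached from code. The following lines
are a minimal sketch of this object hierarchy; the variable names
\varcode{fig}, \varcode{ax}, and \varcode{l} as well as the plotted
data are chosen for illustration only.

\begin{lstlisting}[caption={Sketch: figure, axes, and line objects accessed from code.}]
fig = figure();                  % the figure object
ax = axes();                     % an axes object inside the figure
l = plot(ax, 0:10, (0:10).^2);   % plot returns the line object

set(fig, 'color', 'w');          % a figure property
set(ax, 'box', 'off');           % an axes property
set(l, 'linewidth', 2);          % a line property
w = get(l, 'linewidth');         % properties can be read back with get
\end{lstlisting}
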
\subsection{Avoiding manual editing of figures}
All properties that can be manipulated with the graphical interfaces
can also be edited from the command line, or the respective commands can
be included in a script or function. Creating the plot from inside a
script or function has the advantage that one can apply the same
settings to several figures, re-create the figure automatically when
the data has changed, or create the same kind of plot for a
number of datasets.
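
As a sketch of this idea, the layout can be collected in a small
function. The function name \varcode{plotRecording}, its arguments,
and the chosen settings are hypothetical and only meant to
illustrate the principle.

\begin{lstlisting}[caption={Sketch of a hypothetical helper function (file plotRecording.m) that applies one fixed layout.}]
function plotRecording(time, voltage, name)
% Hypothetical helper: plots one recording with a fixed layout.
    figure();
    plot(time, voltage, 'color', 'k', 'linewidth', 1);
    xlabel('Time [ms]');
    ylabel('Voltage [mV]');
    title(name);
    set(gca, 'box', 'off', 'tickdir', 'out', 'fontsize', 11);
end
\end{lstlisting}

Calling the function in a loop then yields identically formatted
figures for each dataset. Random numbers stand in for real
measurements here.

\begin{lstlisting}[caption={Sketch: applying the same layout to several (placeholder) datasets.}]
time = 0:0.1:100;
for i = 1:3
    voltage = randn(size(time));   % placeholder for real data
    plotRecording(time, voltage, sprintf('recording %i', i));
end
\end{lstlisting}
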
\begin{important}[Why manual editing should be avoided.]
  On first glance the manual editing of a figure using common tools
  like CorelDRAW, Illustrator, etc.\ appears so much more
  convenient and less complex. This, however, is not entirely
  true. What if the figure has to be re-drawn or updated? Then the
  editing work starts all over again. Moreover, there is a great risk
  associated with this approach: axes are shifted, fonts have not been
  embedded into the final document, annotations have been copied and
  pasted between figures and are no longer valid. All of these mistakes
  can be found in publications and then require an erratum, which is not
  desirable. Even if it appears more cumbersome in the beginning, one
  should always try to create publication-ready figures directly from
  the data analysis tool using scripts or functions to properly
  lay out the plot.
\end{important}
\subsection{Simple plotting}
Creating a simple line plot is rather easy. Assuming there exists a
variable \varcode{y} in the \codeterm{Workspace} that contains the
measurement data, it is enough to call \code[plot()]{plot(y)}. At the
first call of this function a new window will be opened and the data
will be plotted as a line plot. If you repeatedly call this
function the current plot will be replaced unless the
\code[hold]{hold on} command was issued before. If it was, the current
plot is held and a second line will be added to it. Calling
\code[hold]{hold off} will release the plot and any subsequent
plotting will replace the previous plot.
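
The following lines sketch this behavior. The sine and cosine data
are chosen for illustration only.

\begin{lstlisting}[caption={Sketch: effect of hold on and hold off.}]
y1 = sin(0:0.1:2*pi);
y2 = cos(0:0.1:2*pi);

plot(y1);     % opens a figure window and draws the first line
plot(y2);     % replaces the first line
hold on;
plot(y1);     % is added, the figure now shows two lines
hold off;
plot(y2);     % replaces both lines again
\end{lstlisting}
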
In our previous call to \varcode{plot} we have provided just a single
variable containing the y-values of the plot. The x-values
are then substituted automatically with the indices of the elements in
\varcode{y}, i.e.\ a constant step size of 1 is assumed. This
automatic scaling is probably not desired and thus we need to provide
the missing information ourselves. The respective call will expand to
\code[plot()]{plot(x, y)}. The x-axis will be scaled from the minimum in
\varcode{x} to the maximum of \varcode{x} and by default the data will be
plotted as a line plot with a solid blue line of width 0.5\,pt. A
second plot that is added to the figure will be plotted in red using
the same standard settings. The order of the used colors is defined by
the \code{ColorOrder} property of the axes, which can be adjusted to
personal taste or need. Table\,\ref{plotlinestyles} shows some predefined
values that can be chosen for the line style, the marker, or the color.
For additional options consult the help.
\begin{table}[tp]
\titlecaption{Predefined line styles (left), colors (center) and
marker symbols (right).}{}\label{plotlinestyles}
\begin{tabular}[t]{lc} \hline
    \textbf{line styles} & \textbf{abbreviation} \erh \\\hline
    solid & '\verb|-|' \erb \\
    dashed & '\verb|--|' \\
    dotted & '\verb|:|' \\
    dash-dotted & '\verb|-.|' \\\hline
\end{tabular}
\hfill
\begin{tabular}[t]{lc} \hline
\textbf{color} & \textbf{abbreviation} \erh \\ \hline red & 'r' \erb
\\ green & 'g' \\ blue & 'b' \\ cyan & 'c' \\ magenta & 'm'
\\ yellow & 'y' \\ black & 'k' \\ \hline
\end{tabular}
\hfill
\begin{tabular}[t]{lc} \hline
\textbf{marker symbols} & \textbf{abbreviation} \erh \\ \hline circle &
'o' \erb \\ star & '*' \\ plus & '+' \\ cross & 'x' \\ diamond &
'd' \\ pentagram & 'p' \\ hexagram & 'h' \\ square & 's'
\\ triangle & '\^{}' \\ inverted triangle & 'v' \\ triangle left
& '$<$'\\ triangle right & '$>$'\\\hline
\end{tabular}
\end{table}
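
The abbreviations for color, line style, and marker can be combined
into a single format string that is passed as the third argument to
\varcode{plot}. The following lines are a minimal sketch with made-up
data.

\begin{lstlisting}[caption={Sketch: combining color, line style, and marker abbreviations.}]
x = 0:0.25:2*pi;
plot(x, sin(x), 'r--o');     % red, dashed line, circle markers
hold on;
plot(x, cos(x), 'k:s');      % black, dotted line, square markers
plot(x, sin(2 * x), 'b-');   % blue, solid line, no markers
hold off;
\end{lstlisting}
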
\subsection{Changing properties of a line plot}
The properties of line plots can be changed by passing more arguments
to the \varcode{plot} function. The command shown in
listing\,\ref{settinglineprops} creates a line plot using the dotted
line style, a line width of 1.5\,pt, a red line color,
and star marker symbols. Finally, the name of the
curve is set to 'plot 1', which will be displayed in a legend, if
one is shown.
\begin{lstlisting}[label=settinglineprops, caption={Setting line properties when calling \varcode{plot}.}]
x = 0:0.1:2*pi;
y = sin(x);
plot(x, y, 'color', 'r', 'linestyle', ':', 'marker', '*', ...
     'linewidth', 1.5, 'displayname', 'plot 1')
\end{lstlisting}
\begin{important}[Choosing the right color.]
Choosing the perfect color goes a little bit beyond personal
taste. When creating a colored plot you may want to consider the
following:
\begin{itemize}
  \item A substantial fraction (about 9\%) of the male population
    cannot distinguish between red and green.
  \item Can you distinguish the colors in a black-and-white or
    grayscale print?
\item Color figures in publications often cost extra money.
\end{itemize}
\end{important}
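
One way to address these points is to vary color \emph{and} line
style. The sketch below combines blue with an orange tone, a pairing
that is often recommended for readers with red-green color vision
deficiency. The particular RGB values are an assumption made for this
example.

\begin{lstlisting}[caption={Sketch: lines that differ in color and line style.}]
x = 0:0.1:2*pi;
plot(x, sin(x), 'color', 'b', 'linestyle', '-', ...
     'linewidth', 1.5, 'displayname', 'sine');
hold on;
% orange as RGB triplet, values assumed for illustration
plot(x, cos(x), 'color', [0.9 0.6 0.0], 'linestyle', '--', ...
     'linewidth', 1.5, 'displayname', 'cosine');
hold off;
legend('show');
\end{lstlisting}
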
\subsection{Changing the axis properties}
The first thing a data plot needs are axis labels with a correct
unit. These can be set by calling the functions
\code[xlabel]{xlabel('Time [ms]')} and
\code[ylabel]{ylabel('Voltage [mV]')}. By default the
axes will be scaled to show the whole data range. The extremes will be
selected as the closest integer for small values or the next full
multiple of tens, hundreds, thousands, etc.\ depending on the maximum
value. If these defaults do not match our needs the limits of the axes
can be explicitly set with the functions \code[xlim()]{xlim()} and
\code[ylim()]{ylim()}. These functions expect a
single argument that is a vector containing the minimum and maximum
value. Table\,\ref{plotaxisprops} lists some of the commonly adjusted
properties of an axis. These properties can be set using the
\code[set()]{set()} function. The \code{set} function expects as a
first argument a \enterm{handle} of the affected axis. An axis handle
of the current plot is returned by the \code[gca]{gca} function (gca
stands for ``get current axes''). The following arguments passed to
\code{set} are pairs of the property name and the desired value. It is
possible to set any number of properties using a single call to
\code{set}. See listing\,\ref{niceplotlisting} (lines 20 and 21) for an
example.
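
A minimal sketch of these calls is shown below. The voltage trace and
all chosen limits, tick positions, and font sizes are made up for
illustration.

\begin{lstlisting}[caption={Sketch: setting labels, limits, and axis properties.}]
t = 0:0.1:100;                        % hypothetical time axis in ms
v = -65 + 5 * sin(2 * pi * t / 20);   % made-up voltage trace in mV

plot(t, v);
xlabel('Time [ms]');
ylabel('Voltage [mV]');
xlim([0 100]);                        % vector of minimum and maximum
ylim([-80 -50]);
% several property/value pairs in a single call to set
set(gca, 'box', 'off', 'tickdir', 'out', ...
    'xtick', 0:25:100, 'fontsize', 11);
\end{lstlisting}
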
\begin{table}[tp]
\titlecaption{Incomplete list of axis properties.}{For a complete
list consult the help system or open the property editor when an
axis is selected (\figref{ploteditorfig}). If there is a default
value of a property it will be listed first.}\label{plotaxisprops}
\begin{tabular*}{1\textwidth}{lp{5.8cm}p{5.5cm}} \hline
    \textbf{property} & \textbf{description} & \textbf{options} \erh \\ \hline
    \code{Box} & Defines whether the axes are drawn on all sides. & $\{'on'|'off'\}$ \erb\\
    \code{Color} & Background color of the drawing area, not the whole figure. & RGB triplet or predefined color name. \\
    \code{FontName} & Name of the font used for labeling. & Installed fonts. \\
    \code{FontSize} & Font size used for labels. & Any scalar value.\\
    \code{FontUnits} & Unit in which the font size is given. & $\{'points' | 'centimeters' | 'inches', ...\}$\\
    \code{FontWeight} & Bold or normal font. & $\{'normal' | 'bold'\}$\\
    \code{TickDir} & Direction of the axis ticks. & $\{'in' | 'out'\}$\\
    \code{TickLength} & Length of the ticks. & Two-element vector (2-D and 3-D views).\\
    \code{X-, Y-, ZDir} & Direction of axis scaling. Zero bottom/left, or not? & $\{'normal' | 'reverse'\}$\\
    \code{X-, Y-, ZGrid} & Defines whether grid lines for the respective axes should be plotted. & $\{'off'|'on'\}$ \\
    \code{X-, Y-, ZScale} & Linear or logarithmic scaling? & $\{'linear' | 'log'\}$\\
    \code{X-, Y-, ZTick} & Position of the tick marks. & Vector of positions.\\
    \code{X-, Y-, ZTickLabel} & Labels that should be used to label the ticks. & Vector of numbers or a cell-array of strings.\\ \hline
\end{tabular*}
\end{table}
\subsection{Changing the figure properties}
\begin{table}[tp]
  \titlecaption{Incomplete list of available figure properties.}{For a complete reference consult the \matlab{} help or select the property editor while having the figure background selected
(\figref{ploteditorfig}).}\label{plotfigureprops}
\begin{tabular*}{1\textwidth}{lp{6.6cm}p{5.7cm}} \hline
\textbf{property} & \textbf{description} & \textbf{options}
\erh \\
    \hline \code{Color} & Background color of the figure, not the drawing area. & RGB triplet or predefined color name. \erb
    \\ \code{PaperPosition} & Position of the axes on the paper. & 4-element vector \varcode{[left bottom width height]}. \\
    \code{PaperSize} & Size of the paper. & 2-element vector defining width and height.\\
    \code{PaperUnits} & Unit in which size and position are given. & $\{'inches' | 'centimeters' |
'normalized' | 'points'\}$\\
\code{Visible} & Defines whether the plot should actually be drawn on screen. Useful when plots should not be displayed but directly saved to file. & $\{'on' | 'off'\}$\\ \hline
\end{tabular*}
\end{table}
Like the axes, the whole figure also has several properties that can be
adjusted to the current needs, most notably the paper (figure) size
and the placement of the axes on the
paper. Table\,\ref{plotfigureprops} lists commonly used ones. For a
complete reference check the help. To change the properties, we again
use the \code{set()} function. The first argument is now a handle to
the current figure, not the current axis as before. Analogously to the
\code{gca} command there is a \code{gcf} (``get current figure'')
command with which the handle can be retrieved.
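
As a minimal sketch, the paper size of the current figure could be set
as follows. The size of 15\,cm by 10\,cm is an arbitrary value chosen
for illustration.

\begin{lstlisting}[caption={Sketch: setting figure properties via the figure handle.}]
fig = gcf();   % handle of the current figure
% set the units first, sizes and positions are interpreted in them
set(fig, 'paperunits', 'centimeters', ...
    'papersize', [15 10], ...           % width and height
    'paperposition', [0 0 15 10], ...   % left, bottom, width, height
    'color', 'w');                      % white figure background
\end{lstlisting}
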
The script shown in listing\,\ref{niceplotlisting} exemplifies
several features of the plotting system and automatically generates
and saves figure\,\ref{spikedetectionfig}. Every execution of this
script creates exactly the same plot. If we decide to plot a
different recording, the format will stay exactly the same, just the
data changes. Of special interest are lines 22 and 23, which set
the size of the figure, and line 26, which saves the figure in the 'pdf'
format to file. When calling the function \code{saveas()} the first
argument is the current figure handle, the second the file name, and
the last one defines the output format (box\,\ref{graphicsformatbox}).
\begin{figure}[t]
\includegraphics{spike_detection} \titlecaption{Automatically
created plot.}{This plot has been created using the code in
listing\,\ref{niceplotlisting}.}\label{spikedetectionfig}
\end{figure}
\begin{ibox}[t]{\label{graphicsformatbox}File formats for digital artwork.}
There are two fundamentally different types of formats for digital artwork:
\begin{enumerate}
\item \enterm{Bitmaps}
\item \enterm{Vector graphics}
\end{enumerate}
  When using bitmaps a color is given for each pixel of the stored
  figure. Bitmaps have a fixed resolution (e.g.\,300\,dpi, i.e.\ dots
  per inch) and are very useful for photographs. In contrast,
  vector graphics store descriptions of the graphic in terms of
  so-called primitives (lines, circles, polygons, etc.). The main
  advantage of a vector graphic is that it can be scaled without a
  loss of quality.
\begin{minipage}[t]{0.38\textwidth}
\mbox{}\\[-2ex]
\includegraphics[width=0.85\textwidth]{VectorBitmap.pdf}
\rotatebox{90}{\footnotesize by Darth Stabro at en.wikipedia.org}
\end{minipage}
\hfill
\begin{minipage}[t]{0.5\textwidth}
Formats supported by \matlab{} \footnote{more information can be
found in the documentation of \code{saveas()}}:\\[2ex]
\begin{tabular}{|l|c|l|}
\hline \textbf{format} & \textbf{type} & \code{saveas()} \textbf{argument}
\erh \\ \hline pdf & vector & \varcode{'pdf'} \erb \\ eps &
vector & \varcode{'eps'}, \varcode{'epsc'} \\ SVG & vector &
\varcode{'svg'} \\ PS & vector & \varcode{'ps'}, \varcode{'psc'}
\\ jpg & bitmap & \varcode{'jpeg'} \\ tif & bitmap &
\varcode{'tiff'}, \varcode{'tiffn'} \\ png & bitmap &
\varcode{'png'} \\ bmp & bitmap & \varcode{'bmp'} \\ \hline
\end{tabular}
\end{minipage}
  It is often advisable to store data plots generated by \matlab{}
  in a vector graphics format. When in doubt they can usually be
  easily converted to a bitmap format. Converting a bitmap into a
  vector graphic, in contrast, is not possible without a loss in quality. Storing a
plot that contains a very large set of graphical elements (e.g.\,a
raster-plot showing thousands of action potentials) may, on the
other hand, lead to very large files that can be hard to
handle. Saving such a plot using a bitmap format may be more
efficient.
\end{ibox}
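
The following sketch saves one and the same figure in a vector and
in a bitmap format. The file names \varcode{sine.pdf} and
\varcode{sine.png} and the plotted data are chosen for illustration
only.

\begin{lstlisting}[caption={Sketch: saving a figure as vector graphic and as bitmap.}]
x = 0:0.1:2*pi;
fig = figure('visible', 'off');   % do not draw the figure on screen
plot(x, sin(x));
xlabel('x');
ylabel('sin(x)');
saveas(fig, 'sine.pdf', 'pdf');   % vector graphic
saveas(fig, 'sine.png', 'png');   % bitmap
close(fig);
\end{lstlisting}
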
\lstinputlisting[caption={Script for creating the plot in
\figref{spikedetectionfig}.},
label=niceplotlisting]{automatic_plot.m}
Besides the standard line plots there are many more options to display
scientific data. MathWorks shows various examples and the respective
code on their website
\url{http://www.mathworks.de/discovery/gallery.html}.
\section{Conclusion}
A good plot of scientific data displays the data completely and
seriously without too many distractions. Misleading or suggestive
plots, as may result from perspective presentations or inappropriate
scaling of axes or symbols, should be avoided.

\noindent When combining several line plots within the same figure one should
consider adapting color \textbf{and} line style (solid, dashed,
dotted, etc.) to make them distinguishable even in black-and-white
prints. Combinations of red and green are not a good choice since they
cannot be distinguished by people with red-green blindness.
\vspace{2ex}
Key ingredients for a good data plot:
\begin{itemize}
\item Clearness.
\item Complete labeling.
\item Plotted lines and curves must be distinguishable.
\item No suggestive or misleading presentation.
\item The right balance of line width, font size and size of the figure.
\item Error bars wherever they are appropriate.
\end{itemize}