[linalg] added solutions to exercise

Jan Benda 2019-01-08 14:47:04 +01:00
parent 512255dfec
commit be35befc5b
19 changed files with 366 additions and 14 deletions


@@ -0,0 +1,19 @@
n = 1000;
x = 2.0*randn(n, 1);
z = 3.0*randn(n, 1);
rs = [-1.0:0.25:1.0];
for k = 1:length(rs)
    r = rs(k);
    % mix x and z so that y depends linearly on x with weight r:
    y = r*x + sqrt(1.0-r^2)*z;
    % print r, the covariance matrix, and the correlation matrix:
    r
    cv = cov([x y])
    corrcoef([x y])
    % scatter plot annotated with the covariance:
    subplot(3, 3, k)
    scatter(x, y, 'filled', 'MarkerEdgeColor', 'white')
    title(sprintf('r=%g', r))
    xlim([-10, 10])
    ylim([-20, 20])
    text(-8, -15, sprintf('cov(x,y)=%.2f', cv(1, 2)))
end
savefigpdf(gcf, 'covariance.pdf', 20, 20);
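%
% Sketch (not part of the original solution): compare the sample estimates
% with the values expected from the construction y = r*x + sqrt(1-r^2)*z.
% Assumes that x and z are independent with standard deviations sx and sz.
% The names sx, sz, covxy, rho, and cc are introduced here for illustration.
% Note that the correlation coefficient equals r only if sx == sz.
sx = 2.0;
sz = 3.0;
r = 0.5;
n = 100000;    % more samples than above, to get tighter estimates
x = sx*randn(n, 1);
z = sz*randn(n, 1);
y = r*x + sqrt(1.0-r^2)*z;
covxy = r*sx^2;                                % expected cov(x,y)
rho = r*sx/sqrt(r^2*sx^2 + (1.0-r^2)*sz^2);    % expected correlation
cv = cov([x y]);
cc = corrcoef([x y]);
fprintf('cov(x,y):  expected %.3f, estimated %.3f\n', covxy, cv(1, 2));
fprintf('corr(x,y): expected %.3f, estimated %.3f\n', rho, cc(1, 2));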

Binary file not shown.


@@ -100,14 +100,22 @@ jan.benda@uni-tuebingen.de}
\texttt{corrcoef}). What do these matrices look like for different
values of $r$? How do the values of the matrices change if you generate
$x$ and $z$ with larger variances?
\begin{solution}
\lstinputlisting{covariance.m}
\includegraphics[width=0.8\textwidth]{covariance}
\end{solution}
\part Do the same analysis (scatter plot, covariance, and correlation coefficient)
for \[ y = x^2 + 0.5 \cdot z \]
Are $x$ and $y$ really independent?
\begin{solution}
\lstinputlisting{nonlinearcorrelation.m}
\includegraphics[width=0.8\textwidth]{nonlinearcorrelation}
\end{solution}
\end{parts}
\question \qt{Principal component analysis in 2D\vspace{-3ex}}
\begin{parts}
\part Generate $n=1000$ pairs $(x,y)$ of Gaussian distributed random numbers such
that all $x$ values have zero mean, half of the $y$ values have mean $+d$
and the other half mean $-d$, with $d \ge0$.
\part Plot scatter plots of the pairs $(x,y)$ for $d=0$, 1, 2, 3, 4 and 5.
@@ -115,11 +123,16 @@ jan.benda@uni-tuebingen.de}
\part Apply PCA on the data and plot a histogram of the data projected onto
the PCA axis with the largest eigenvalue.
What do you observe?
\begin{solution}
\lstinputlisting{pca2d.m}
\includegraphics[width=0.8\textwidth]{pca2d-2}
\end{solution}
\end{parts}
\newsolutionpage
\question \qt{Principal component analysis in 3D\vspace{-3ex}}
\begin{parts}
\part Generate $n=1000$ triplets $(x,y,z)$ of Gaussian distributed random numbers such
that all $x$ values have zero mean, half of the $y$ and $z$ values have mean $+d$
and the other half mean $-d$, with $d \ge0$.
\part Plot 3D scatter plots of the triplets $(x,y,z)$ for $d=0$, 1, 2, 3, 4 and 5.
@@ -127,15 +140,19 @@ jan.benda@uni-tuebingen.de}
\part Apply PCA on the data and plot a histogram of the data projected onto
the PCA axis with the largest eigenvalue.
What do you observe?
\begin{solution}
\lstinputlisting{pca3d.m}
\includegraphics[width=0.8\textwidth]{pca3d-2}
\end{solution}
\end{parts}
\continuepage
\question \qt{Spike sorting}
Extracellular recordings often pick up action potentials originating
from more than a single neuron. In case the waveforms of the action
potentials differ between the neurons one could assign each action
potential to the neuron it originated from. This process is called
``spike sorting''. Here we explore this method on a simulated
recording that contains action potentials from two different
neurons.
\begin{parts}
@@ -149,42 +166,77 @@ jan.benda@uni-tuebingen.de}
time vector (\texttt{waveformt}) are also contained in the file.
\part Plot the voltage trace and mark the peaks of the detected
action potentials (using \texttt{spiketimes}). Zoom into the plot
and check whether you can differentiate between two different
waveforms of action potentials. How do they differ?
\part Cut out the waveform of each action potential (5\,ms before
and after the peak). Plot all these snippets in a single
plot. Can you differentiate the two action potential waveforms?
\begin{solution}
\mbox{}\\[-3ex]\hspace*{5em}
\includegraphics[width=0.8\textwidth]{spikesorting1}
\end{solution}
\newsolutionpage
\part Apply PCA on the waveform snippets. That is, compute the
eigenvalues and eigenvectors of their covariance matrix, which is
an $n \times n$ matrix, with $n$ being the number of data points
contained in a single waveform snippet. Plot the sorted
eigenvalues (the ``eigenvalue spectrum''). How many eigenvalues
are clearly larger than zero?
\begin{solution}
\mbox{}\\[-3ex]\hspace*{5em}
\includegraphics[width=0.8\textwidth]{spikesorting3}
\end{solution}
\part Plot the two eigenvectors (``features'') with the two
largest eigenvalues as a function of time.
\begin{solution}
\mbox{}\\[-3ex]\hspace*{5em}
\includegraphics[width=0.8\textwidth]{spikesorting2}
\end{solution}
\part Project the waveform snippets onto these two eigenvectors
and display them with a scatter plot. What do you observe? Can you
imagine how to separate two ``clouds'' of data points
(``clusters'')?
\newsolutionpage
\part Think about a very simple way to separate the two
clusters. Generate a vector whose elements label the action
potentials, e.g., one that contains '1' for all snippets belonging to
one cluster and '2' for the waveforms of the other
cluster. Use this vector to mark the two clusters in the previous
plot with two different colors.
\begin{solution}
\mbox{}\\[-3ex]\hspace*{5em}
\includegraphics[width=0.8\textwidth]{spikesorting4}
\end{solution}
\part Plot the waveform snippets of each cluster together with the
true waveform obtained from the data file. Do they match?
\begin{solution}
\mbox{}\\[-3ex]\hspace*{5em}
\includegraphics[width=0.8\textwidth]{spikesorting5}
\end{solution}
\newsolutionpage
\part Mark the action potentials in the recording according to
their cluster identity.
\begin{solution}
\mbox{}\\[-3ex]\hspace*{5em}
\includegraphics[width=0.8\textwidth]{spikesorting6}
\end{solution}
\part Compute interspike-interval histograms of all the (unsorted)
action potentials, and of each of the two neurons. What do they
tell you?
\begin{solution}
\mbox{}\\[-3ex]\hspace*{5em}
\includegraphics[width=0.8\textwidth]{spikesorting7}
\lstinputlisting{spikesorting.m}
\end{solution}
\end{parts}
\end{questions}

Binary file not shown.


@@ -0,0 +1,9 @@
n = 1000;
x = randn(n, 1);
z = randn(n, 1);
y = x.^2 + 0.5*z;
scatter(x, y)
cov([x y])
r = corrcoef([x y])
text(-2, 8, sprintf('r=%.2f', r(1, 2)))
savefigpdf(gcf, 'nonlinearcorrelation.pdf', 15, 10);
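%
% Sketch (not part of the original solution): x and y are nearly
% uncorrelated, yet y clearly depends on x. Correlating y with x.^2
% instead of x reveals the dependence. For standard normal x and z the
% expected value of corr(x^2, y) is sqrt(8/9), about 0.94.
% The names rlin and rsq are introduced here for illustration.
n = 100000;    % more samples than above, to get tighter estimates
x = randn(n, 1);
z = randn(n, 1);
y = x.^2 + 0.5*z;
rlin = corrcoef([x y]);      % linear correlation between x and y
rsq = corrcoef([x.^2 y]);    % correlation between x^2 and y
fprintf('corr(x,y)=%.2f, corr(x^2,y)=%.2f\n', rlin(1, 2), rsq(1, 2));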

Binary file not shown.



@@ -0,0 +1,39 @@
n = 1000;
ds = [0:5];
for k = 1:length(ds)
    d = ds(k);
    % generate data:
    x = randn(n, 1);
    y = randn(n, 1);
    y(1:n/2) = y(1:n/2) - d;
    y(1+n/2:end) = y(1+n/2:end) + d;
    % scatter plot of data:
    subplot(2, 2, 1)
    scatter(x, y, 'filled', 'MarkerEdgeColor', 'white')
    title(sprintf('d=%g', d))
    xlabel('x')
    ylabel('y')
    % histogram of data:
    subplot(2, 2, 3)
    hist(x, 20)
    xlabel('x')
    % pca:
    cv = cov([x y]);
    [ev en] = eig(cv);
    [en inx] = sort(diag(en), 'descend');
    nc = [x y]*ev(:,inx);
    % scatter in new coordinates:
    subplot(2, 2, 2)
    scatter(nc(:, 1), nc(:, 2), 'filled', 'MarkerEdgeColor', 'white')
    title(sprintf('d=%g', d))
    xlabel('pca 1')
    ylabel('pca 2')
    % histogram of data projected onto the first pca axis:
    subplot(2, 2, 4)
    hist(nc(:, 1), 20)
    xlabel('pca 1')
    if d == 2
        savefigpdf(gcf, 'pca2d-2.pdf', 15, 15);
    end
    pause(1.0)
end
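%
% Sketch (not part of the original solution): a quick check that in PCA
% coordinates the covariance matrix is diagonal, with the sorted
% eigenvalues on its diagonal. The data are regenerated for d = 2 so that
% the check runs on its own; the names cvnc and offdiag are introduced
% here for illustration.
n = 1000;
d = 2;
x = randn(n, 1);
y = randn(n, 1);
y(1:n/2) = y(1:n/2) - d;
y(1+n/2:end) = y(1+n/2:end) + d;
cv = cov([x y]);
[ev, en] = eig(cv);
[en, inx] = sort(diag(en), 'descend');
nc = [x y]*ev(:,inx);
cvnc = cov(nc);                      % covariance in PCA coordinates
offdiag = cvnc - diag(diag(cvnc));   % should vanish up to rounding errors
fprintf('largest off-diagonal element: %g\n', max(abs(offdiag(:))));
fprintf('eigenvalues:  %g %g\n', en(1), en(2));
fprintf('diag of cov:  %g %g\n', cvnc(1, 1), cvnc(2, 2));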

Binary file not shown.


@@ -0,0 +1,44 @@
n = 1000;
ds = [0:5];
for k = 1:length(ds)
    d = ds(k);
    % generate data:
    x = randn(n, 1);
    y = randn(n, 1);
    z = randn(n, 1);
    y(1:n/2) = y(1:n/2) - d;
    y(1+n/2:end) = y(1+n/2:end) + d;
    z(1:n/2) = z(1:n/2) - d;
    z(1+n/2:end) = z(1+n/2:end) + d;
    % scatter plot of data:
    subplot(2, 2, 1)
    scatter3(x, y, z, 'filled', 'MarkerEdgeColor', 'white')
    title(sprintf('d=%g', d))
    xlabel('x')
    ylabel('y')
    zlabel('z')
    % histogram of data:
    subplot(2, 2, 3)
    hist(x, 20)
    xlabel('x')
    % pca:
    cv = cov([x y z]);
    [ev en] = eig(cv);
    [en inx] = sort(diag(en), 'descend');
    nc = [x y z]*ev(:,inx);
    % scatter in new coordinates:
    subplot(2, 2, 2)
    scatter3(nc(:, 1), nc(:, 2), nc(:, 3), 'filled', 'MarkerEdgeColor', 'white')
    title(sprintf('d=%g', d))
    xlabel('pca 1')
    ylabel('pca 2')
    zlabel('pca 3')
    % histogram of data projected onto the first pca axis:
    subplot(2, 2, 4)
    hist(nc(:, 1), 20)
    xlabel('pca 1')
    if d == 2
        savefigpdf(gcf, 'pca3d-2.pdf', 15, 15);
    end
    pause(1.0)
end


@@ -0,0 +1,28 @@
function savefigpdf(fig, name, width, height)
% Saves figure fig as a pdf file called name, with appropriately set
% page size (width and height in centimeters) and fonts
    % default width:
    if nargin < 3
        width = 11.7;
    end
    % default height:
    if nargin < 4
        height = 9.0;
    end
    % paper:
    set(fig, 'PaperUnits', 'centimeters');
    set(fig, 'PaperSize', [width height]);
    set(fig, 'PaperPosition', [0.0 0.0 width height]);
    set(fig, 'Color', 'white')
    % font:
    set(findall(fig, 'type', 'axes'), 'FontSize', 12)
    set(findall(fig, 'type', 'text'), 'FontSize', 12)
    % save:
    saveas(fig, name, 'pdf')
end


@@ -0,0 +1,161 @@
% load data into time, voltage and spiketimes
x = load('extdata');
time = x.time;
voltage = x.voltage;
spiketimes = x.spiketimes;
waveformt = x.waveformt;
waveform1 = x.waveform1;
waveform2 = x.waveform2;
% indices into voltage trace of spike times:
dt = time(2) - time(1);
tinx = round(spiketimes/dt) + 1;
% plot voltage trace with detected spikes:
figure(1);
plot(time, voltage, '-b')
hold on
scatter(time(tinx), voltage(tinx), 'r', 'filled');
xlabel('time [s]');
ylabel('voltage');
xlim([0.1, 0.4]) % zoom in
hold off
% spike waveform snippets:
snippetwindow = 0.005; % seconds (5 ms before and after the peak)
w = ceil(snippetwindow/dt);
vs = zeros(length(tinx), 2*w);
for k=1:length(tinx)
    vs(k,:) = voltage(tinx(k)-w:tinx(k)+w-1);
end
% time axis for snippets:
ts = time(1:size(vs,2));
ts = ts - ts(w+1); % time zero at the spike peak, which is sample w+1 of each snippet
% plot all snippets:
figure(2);
hold on
plot(1000.0*ts, vs, '-b');
title('spike snippets')
xlabel('time [ms]');
ylabel('voltage');
hold off
savefigpdf(gcf, 'spikesorting1.pdf', 12, 6);
% pca:
cv = cov(vs);
[ev , ed] = eig(cv);
[d, dinx] = sort(diag(ed), 'descend');
% features:
figure(2)
subplot(1, 2, 1);
plot(1000.0*ts, ev(:,dinx(1)), 'r', 'LineWidth', 2);
xlabel('time [ms]');
ylabel('eigenvector 1');
subplot(1, 2, 2);
plot(1000.0*ts, ev(:,dinx(2)), 'g', 'LineWidth', 2);
xlabel('time [ms]');
ylabel('eigenvector 2');
savefigpdf(gcf, 'spikesorting2.pdf', 12, 6);
% plot covariance matrix:
figure(3);
subplot(1, 2, 1);
imagesc(cv);
xlabel('time bin');
ylabel('time bin');
title('covariance matrix');
caxis([-0.1 0.1])
% spectrum of eigenvalues:
subplot(1, 2, 2);
scatter(1:length(d), d, 'b', 'filled');
title('spectrum');
xlabel('index');
ylabel('eigenvalue');
savefigpdf(gcf, 'spikesorting3.pdf', 12, 6);
% project onto eigenvectors:
nx = vs * ev(:,dinx(1));
ny = vs * ev(:,dinx(2));
% clustering (two clusters):
%kx = kmeans([nx, ny], 2);
% nx smaller or greater a threshold:
kthresh = 1.6;
kx = ones(size(nx));
kx(nx<kthresh) = 2;
% plot pca coordinates:
figure(4)
scatter(nx(kx==1), ny(kx==1), 'r', 'filled');
hold on;
scatter(nx(kx==2), ny(kx==2), 'g', 'filled');
hold off;
xlabel('projection onto eigenvector 1');
ylabel('projection onto eigenvector 2');
savefigpdf(gcf, 'spikesorting4.pdf', 12, 10);
% show sorted spike waveforms:
figure(5)
subplot(1, 2, 1);
hold on
plot(1000.0*ts, vs(kx==1,:), '-r');
plot(1000.0*waveformt, waveform2, '-k', 'LineWidth', 2);
xlim([1000.0*ts(1) 1000.0*ts(end)])
xlabel('time [ms]');
ylabel('waveform 1');
hold off
subplot(1, 2, 2);
hold on
plot(1000.0*ts, vs(kx==2,:), '-g');
plot(1000.0*waveformt, waveform1, '-k', 'LineWidth', 2);
xlim([1000.0*ts(1) 1000.0*ts(end)])
xlabel('time [ms]');
ylabel('waveform 2');
hold off
savefigpdf(gcf, 'spikesorting5.pdf', 12, 6);
% mark neurons in recording:
figure(1);
hold on;
scatter(time(tinx(kx==1)), voltage(tinx(kx==1)), 'r', 'filled');
scatter(time(tinx(kx==2)), voltage(tinx(kx==2)), 'g', 'filled');
hold off;
savefigpdf(gcf, 'spikesorting6.pdf', 12, 6);
% ISIs:
figure(7);
subplot(1, 3, 1)
allisis = diff(spiketimes);
bins = [0:0.005:0.2];
[h, b] = hist(allisis, bins);
bar(1000.0*b, h/sum(h)/mean(diff(b)))
title('all spikes')
xlabel('ISI [ms]')
xlim([0, 200.0])
subplot(1, 3, 2)
spikes1 = time(tinx(kx==1));
isis1 = diff(spikes1);
[h, b] = hist(isis1, bins);
bar(1000.0*b, h/sum(h)/mean(diff(b)))
title('neuron 1')
xlabel('ISI [ms]')
xlim([0, 200.0])
subplot(1, 3, 3)
spikes2 = time(tinx(kx==2));
isis2 = diff(spikes2);
[h, b] = hist(isis2, bins);
bar(1000.0*b, h/sum(h)/mean(diff(b)))
title('neuron 2')
xlabel('ISI [ms]')
xlim([0, 200.0])
savefigpdf(gcf, 'spikesorting7.pdf', 12, 6);

Binary file not shown.
