Using Multivariate Gaussians for Kernel-based Classification: Some tests in Matlab.

We first read the data into Matlab, plot it, and compute the mean vector and covariance matrix for each class (sitting, standing and walking), using a hand-picked range of samples per class as training set:

>> S = bigreadnplot('a0_data.txt',1000); plot(S);

>> plot(S(500:1000,:))

>> mu1 = mean(S(1:300,:))

mu1 =

58.9500 71.2633

>> mu2 = mean(S(400:700,:))

mu2 =

70.3920 57.7874

>> mu3 = mean(S(800:1000,:))

mu3 =

67.1244 56.9055

>> si1 = cov(S(1:300,:))

si1 =

0.6430 -0.0002
-0.0002 0.2481

>> si2 = cov(S(400:700,:))

si2 =

0.3391 0.0203
0.0203 0.7680

>> si3 = cov(S(800:1000,:))

si3 =

121.5995 71.4218
71.4218 59.4160
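
The same statistics can be collected in a loop instead of one command per class. This is only a minimal sketch: S here is a synthetic stand-in for the real data (which comes from a0_data.txt), and the names t, ranges and the cell arrays mu and si are illustrative assumptions, not part of the session above.

```matlab
% Synthetic stand-in for S (1000 samples, two sensor columns).
t = (1:1000)';
S = [60 + sin(t), 70 + cos(t)];

% Training ranges for sitting, standing and walking, as used above.
ranges = {1:300, 400:700, 800:1000};

mu = cell(1, numel(ranges));
si = cell(1, numel(ranges));
for c = 1:numel(ranges)
    mu{c} = mean(S(ranges{c}, :));   % 1x2 mean vector of class c
    si{c} = cov(S(ranges{c}, :));    % 2x2 covariance matrix of class c
end
```

With the real S, mu{1} and si{1} would correspond to mu1 and si1 above, and so on.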
 

We then evaluate the multivariate Gaussian of each class (built from the mean and covariance of its training set) at every sample in S (the dataset), and assign each sample to the class whose Gaussian gives the highest value:

>> for i=1:length(S), [k, m(i)] = max( [ multivargauss(mu1,si1,S(i,:)) multivargauss(mu2,si2,S(i,:)) multivargauss(mu3,si3,S(i,:))]); end;
>> subplot(2,1,1)
>> plot(m)
>> axis([0 5000 0 4])
>> subplot(2,1,2)
>> plot(S)
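
The decision rule used above can be sketched in a self-contained form. This version compares log-densities instead of densities, which is an assumption of the sketch and not what the session above does; it picks the same class, but avoids underflow for samples that lie far from every mean. The class statistics are copied from the output above, while the test points in X and the helper name loggauss are hypothetical.

```matlab
% Class statistics copied from the session above.
mu = {[58.9500 71.2633], [70.3920 57.7874], [67.1244 56.9055]};
si = {[0.6430 -0.0002; -0.0002 0.2481], ...
      [0.3391  0.0203;  0.0203 0.7680], ...
      [121.5995 71.4218; 71.4218 59.4160]};

% Log of the multivariate Gaussian density at row vector v.
loggauss = @(m, s, v) -0.5 * ((v - m) / s * (v - m)') ...
                      - 0.5 * log(det(s)) - 0.5 * numel(m) * log(2*pi);

% Hypothetical test points (not taken from a0_data.txt).
X = [59 71; 70 58; 40 45];

labels = zeros(size(X, 1), 1);
for i = 1:size(X, 1)
    % one score per class; the largest score wins
    scores = cellfun(@(m, s) loggauss(m, s, X(i, :)), mu, si);
    [~, labels(i)] = max(scores);
end
disp(labels')
```

The first two points sit close to mu1 and mu2 and are assigned to those classes; the third is far from all three means and falls to the wide third Gaussian.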

To get a more stable recognition, one can take the mean over a sliding window of 10 samples, which gives a clearer picture:

>> for i=1:length(m)-10, n(i) = mean(m(i:i+9));end;
>> plot(n)
>> axis([0 5000 0 4])
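
Averaging class labels produces in-between values (the mean of labels 1 and 3 is 2, for instance), which is fine for plotting but not for a final decision. One possible alternative, shown here only as a sketch, is a sliding majority vote using mode. The label sequence m and the window length w below are made up for illustration.

```matlab
% Hypothetical label sequence with a few isolated misclassifications.
m = [1 1 1 3 1 1 1 2 2 2 1 2 2 2 3 3 3 3];
w = 5;                              % window length (an assumed value)

n = zeros(1, length(m) - w + 1);
for i = 1:length(n)
    n(i) = mode(m(i:i+w-1));        % most frequent label in the window
end
disp(n)
```

The output stays within the label set {1, 2, 3}, and the isolated outliers in m are voted away.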


 

Note that this works better than simply picking the class whose mean is nearest in Manhattan (city block) distance:

>> for i=1:length(S), m1(i) = abs(S(i,1)-mu1(1))+abs(S(i,2)-mu1(2)) ; end
>> for i=1:length(S), m2(i) = abs(S(i,1)-mu2(1))+abs(S(i,2)-mu2(2)) ; end
>> for i=1:length(S), m3(i) = abs(S(i,1)-mu3(1))+abs(S(i,2)-mu3(2)) ; end
>> for i=1:length(S), [k, m(i)] = min([ m1(i) m2(i) m3(i) ]); end
>> subplot(2,1,1)
>> plot(m)
>> axis([0 5000 0 4])

>> subplot(2,1,2)
>> plot(S)
 

 

We can also compare the two, again after averaging over 10 samples:

>> for i=1:length(m)-10, o(i) = mean(m(i:i+9));end;
>> subplot(2,1,1)

>> plot([o; n]')
 

We can do the same for the Euclidean distance, which gives us:

>> for i=1:length(S), m1(i) = euclid( S(i,:), mu1 ); end
>> for i=1:length(S), m2(i) = euclid( S(i,:), mu2 ); end
>> for i=1:length(S), m3(i) = euclid( S(i,:), mu3 ); end
>> for i=1:length(S), [k, m(i)] = min([ m1(i) m2(i) m3(i) ]); end
>> for i=1:length(m)-10, p(i) = mean(m(i:i+9));end;
>> subplot(2,1,1)
>> plot([n; o; p]')
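
Both nearest-mean classifiers can be written without a loop over the samples. The following is a sketch under assumptions: the class means are copied from the output above, the samples in X are hypothetical, and bsxfun is used so that the subtraction also works in Matlab versions without implicit expansion.

```matlab
% Class means copied from the session above, one row per class.
mu = [58.9500 71.2633; 70.3920 57.7874; 67.1244 56.9055];

% Hypothetical samples (not taken from a0_data.txt).
X = [59 71; 70 58; 66 56];

D1 = zeros(size(X, 1), size(mu, 1));   % Manhattan distances
D2 = zeros(size(X, 1), size(mu, 1));   % Euclidean distances
for c = 1:size(mu, 1)
    d = bsxfun(@minus, X, mu(c, :));   % differences to the mean of class c
    D1(:, c) = sum(abs(d), 2);
    D2(:, c) = sqrt(sum(d.^2, 2));
end

% Index of the nearest class mean for every sample.
[~, lab1] = min(D1, [], 2);
[~, lab2] = min(D2, [], 2);
disp([lab1 lab2])
```

Neither distance uses the covariance matrices, so the very different spread of the third class is ignored by both.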
 

 

Let's then plot each Gaussian:

>> for i=1:100, for j=1:100, m1(i,j) = multivargauss(mu1,si1,[i j]); end; end;
>> surfc([65:80],[55:80],m1(55:80,65:80))
 

 

>> for i=1:100, for j=1:100, m2(i,j) = multivargauss(mu2,si2,[i j]); end; end;
>> surfc([55:80],[65:80],m2(65:80,55:80))
 

>> for i=1:100, for j=1:100, m3(i,j) = multivargauss(mu3,si3,[i j]); end; end;
>> surfc([30:90],[30:100],m3(30:100,30:90))
 

Note that the third Gaussian is much flatter (its peak is far lower) and far more stretched out than the other two, as one would expect from the much larger entries of si3!
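
The difference in height can be checked directly, since the value of a multivariate Gaussian at its own mean is 1/sqrt((2*pi)^d * det(si)). A short sketch, using the covariance matrices printed above; the helper name peak is an assumption of this example.

```matlab
% Covariance matrices copied from the output above.
si1 = [0.6430 -0.0002; -0.0002 0.2481];
si2 = [0.3391  0.0203;  0.0203 0.7680];
si3 = [121.5995 71.4218; 71.4218 59.4160];

% Density at the mean: 1 / sqrt((2*pi)^d * det(si)).
peak = @(s) 1 / sqrt((2*pi)^size(s, 1) * det(s));

peaks = [peak(si1) peak(si2) peak(si3)];
disp(peaks)                   % roughly 0.40, 0.31 and 0.0035
disp(peaks(1) / peaks(3))     % ratio of the first peak to the third
```

The third peak is about two orders of magnitude lower than the other two, which is why it looks so flat on the plots.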

The helper functions used above are:

function x = multivargauss(mu,si,v)

%MULTIVARGAUSS Given the covariance matrix si and the mean vector mu of a
% dataset, return the value of the multivariate Gaussian at row vector v.
% x = multivargauss(mu,si,v)
%
% Kristof Van Laerhoven


% (v - mu)/si solves the linear system instead of forming inv(si) explicitly
x = exp(-0.5*(v - mu)/si*(v - mu)') / sqrt((2*pi)^length(mu)*det(si));
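
When many samples have to be evaluated, the same formula can be applied to all rows of a matrix at once. This is a sketch only, not part of the original function: mu and si are copied from the first class above, the rows of X are hypothetical samples, and the variable names d, q and p are illustrative.

```matlab
mu = [58.9500 71.2633];
si = [0.6430 -0.0002; -0.0002 0.2481];

% Hypothetical samples, one per row.
X = [59 71; 58 72; 60 70];

d = bsxfun(@minus, X, mu);          % differences to the mean, one row per sample
q = sum((d / si) .* d, 2);          % quadratic form (v-mu)*inv(si)*(v-mu)' per row
p = exp(-0.5 * q) / sqrt((2*pi)^numel(mu) * det(si));
disp(p')
```

Each entry of p should match what multivargauss(mu,si,X(i,:)) returns for the corresponding row.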
 
function [S] = bigreadnplot(filename, interval)

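%BIGREADNPLOT Read a big sensor data file, plot it and put it in a matrix.
% [S] = bigreadnplot(filename,interval) reads the data into matrix S
% in chunks of interval samples, plotting each chunk as it is read.
%
% Kristof Van Laerhoven

% ask for a file if none was given (pathname avoids shadowing the built-in path)
if ~exist('filename','var'),
     [filename, pathname] = uigetfile('*.txt','Pick a file:');
     filename = fullfile(pathname, filename);
end;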

i = 1;
S = [];
T = [1];

while (~isempty(T)),
     T = readsens(filename,i,i+interval);
     S = [S; T];
     i = i + interval;
     if ~isempty(T),
          plot(T);
     end
     drawnow;
end

end

 
function x = euclid(u,v)

%EUCLID Calculates the Euclidean distance between the vectors u, v.
% x = euclid(u,v)
%
% Kristof Van Laerhoven

x = 0;
for i=1:length(u),
     x = x + (u(i)-v(i))^2;
end;
x = sqrt(x);