An Overview of Different Clustering Algorithms

 


Clustering

 

MacQueen defines a cluster as: a set of similar objects.

 

   

Remarks

Several remarks:

  1. Initialization: Most of the time random initialization is adequate, but other techniques tend to give better results.

  2. Distance: (See these notes for different metrics.) Different metrics have different effects on the algorithm's performance. Usually the Euclidean distance is used, but this is not always the best choice.

  3. Online/Offline: Most algorithms have both online and offline versions. Usually the online version is described here, since it is fairly easy to generalise it to the offline version.

There is also a separate page with an overview of different algorithms (most of them briefly explained here) with additional plots, implemented in VisualBasic; the source code (by Steven Lowette) is available there.

 

   

K-Means (MacQueen, 1967)

 

The algorithm:

  1. Choose K as the number of clusters

  2. Initialise the codebook vectors of the K clusters (randomly, for instance)

  3. For every new sample vector:

  4. Compute the distance between the new vector and every cluster's codebook vector

  5. Recompute the closest codebook vector with the new vector, using a learning rate that decreases in time.
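
As an illustration, here is a minimal sketch of this online variant in Python/NumPy; the 1/(t+1) learning-rate schedule and the initialisation from randomly picked samples are assumptions, not part of MacQueen's original formulation:

    import numpy as np

    def online_kmeans(samples, k, seed=0):
        """Online K-means: move the closest codebook vector towards each new sample."""
        rng = np.random.default_rng(seed)
        # Initialise the K codebook vectors, e.g. with randomly picked samples.
        codebooks = samples[rng.choice(len(samples), k, replace=False)].astype(float)
        for t, x in enumerate(samples):
            # Distance between the new vector and every cluster's codebook vector.
            distances = np.linalg.norm(codebooks - x, axis=1)
            winner = np.argmin(distances)
            # Learning rate that decreases in time (assumed schedule).
            lr = 1.0 / (t + 1)
            codebooks[winner] += lr * (x - codebooks[winner])
        return codebooks

    # Example: 200 two-dimensional samples, grouped into 3 clusters.
    data = np.random.default_rng(1).normal(size=(200, 2))
    print(online_kmeans(data, k=3))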

 

   

Voronoi Based Adaptive K-means Clustering (Thomas Schreiber, 1991)

 

 

   

Maximum-Cut, Minimum-Cut, Mean-Split (Wu and Witten, 1985), Median-Cut (Paul Heckbert, 1982), Optimal-Cut/Variance Based Clustering (Wan, Wong, Prusinkiewicz, 1988)

 

Minimum cut problem: Given a connected graph G=(V,E), a capacity c:E->R+, and two nodes s and t, find a minimum s-t cut.

Ford-Fulkerson Labeling Algorithm

Median-Cut

(In the context of colour quantization:) The concept behind the median cut algorithm is to use each of the colours in the synthesized colour map to represent an equal number of pixels in the original image. The algorithm repeatedly subdivides colour space into smaller and smaller rectangular boxes. We start with one box which tightly encloses the colours of all NI*NJ pixels from the original image. The number of different colours in this first box depends on the colour resolution used. Experimental results show that 15 bits per colour (the resolution of the histogram) is sufficient in most cases.

Iteration step: split a box. The box is "shrunk" to fit tightly around the points (colours) it encloses, by finding the minimum and maximum values of each of the colour coordinates. Next we use "adaptive partitioning" (Bentley's terminology) to decide which way to split the box: the enclosed points are sorted along the longest dimension of the box, and segregated into two halves at the median point. Approximately equal numbers of points will fall on each side of the cutting plane.
The above step is recursively applied until K boxes are generated. If, at some point in the subdivision, we attempt to split a box containing only one point (repeated many times, perhaps), the spare box (which would have gone unused) can be reassigned to split the largest box we can find. After K boxes are generated, the representative for each box is computed by averaging the colours contained in each. The list of representatives is the colour map Y.
The sorting used in the iteration step can be done conveniently and efficiently with a radix list sort [13], since the colour coordinates are small integers, generally in the range [0,255]. Splitting each box will therefore take time proportional to the number of different colours enclosed. Generating the colour map will take O(N*log(K)) time, where N is the number of different colours in the first box.
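
To make the procedure concrete, here is a compact Python/NumPy sketch of the median-cut idea. It operates directly on the list of pixel colours instead of on a 15-bit histogram, and it always splits the largest remaining box; both are simplifications of Heckbert's method:

    import numpy as np

    def median_cut(pixels, k):
        """Median-cut colour quantization: return k representative colours (the colour map)."""
        boxes = [pixels.astype(float)]            # start with one box holding all pixel colours
        while len(boxes) < k:
            boxes.sort(key=len, reverse=True)     # simplification: always split the largest box
            box = boxes.pop(0)
            if len(box) < 2:
                boxes.append(box)
                break                             # nothing left to split
            # "Shrink": find the longest colour dimension of the box.
            longest = np.argmax(box.max(axis=0) - box.min(axis=0))
            # Sort along that dimension and segregate the points at the median.
            box = box[box[:, longest].argsort()]
            median = len(box) // 2
            boxes += [box[:median], box[median:]]
        # The representative of each box is the average of the colours it contains.
        return np.array([box.mean(axis=0) for box in boxes])

    # Example: reduce 10000 random RGB colours to a 16-entry colour map.
    image_pixels = np.random.default_rng(0).integers(0, 256, size=(10000, 3))
    print(median_cut(image_pixels, k=16))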
 

Optimal-Cut or Variance Based

 

   

Pairwise Clustering

 

The algorithm:

  1. Initially, each vector from the dataset is a cluster.

  2. At each step, find the most similar pair of clusters: the pair that gives the minimal value of the distance function.

  3. Merge these two clusters into one cluster.

  4. Stop when the desired number of clusters is reached.

This approach is popular for reducing the number of colours in an image.
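
A brute-force sketch in Python/NumPy; the notes do not fix a particular cluster-to-cluster distance, so the Euclidean distance between cluster centroids is used here as an assumption:

    import numpy as np

    def pairwise_clustering(samples, k):
        """Agglomerative clustering: repeatedly merge the most similar pair of clusters."""
        # Initially every vector is its own cluster (stored as a list of member indices).
        clusters = [[i] for i in range(len(samples))]
        while len(clusters) > k:
            centroids = np.array([samples[c].mean(axis=0) for c in clusters])
            # Find the pair of clusters with the minimal value of the distance function.
            best = (np.inf, 0, 1)
            for i in range(len(clusters)):
                for j in range(i + 1, len(clusters)):
                    d = np.linalg.norm(centroids[i] - centroids[j])
                    if d < best[0]:
                        best = (d, i, j)
            _, i, j = best
            # Merge the two clusters into one cluster.
            clusters[i] += clusters[j]
            del clusters[j]
        return clusters

    data = np.random.default_rng(2).normal(size=(50, 3))
    print(pairwise_clustering(data, k=4))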

 

   

Minimum Cost Spanning Tree Clustering Algorithms

 

A Minimum Spanning Tree is: a minimum-weight tree in a weighted graph which contains all of the graph's vertices.

So the problem is: Given a connected graph G=(Vertices, Edges) and a weight d: Edges -> R+,  find a minimum spanning tree.

Prim's Algorithm

Builds up a single partial minimum spanning tree, at each step adding the shortest edge that connects a vertex not yet in the tree to the current partial minimum spanning tree.

The time required by Prim's algorithm is O(|Vertices|^2). This is reduced to O(|Edges| log |Vertices|) if a heap is used to keep the vertices not yet in the tree, ordered by the weight of their cheapest connection to the partial tree.


Kruskal's Algorithm

Maintains a set of partial minimum spanning trees (called a forest), and repeatedly adds the shortest edge in the graph whose endpoints lie in different partial minimum spanning trees.

The time required by Kruskal's algorithm is O(|Edges|log|Vertices|).
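
As a sketch of how a minimum spanning tree yields a clustering: build the tree with Kruskal's algorithm and delete its K-1 longest edges, leaving K connected components. The Python/NumPy code below assumes the complete Euclidean graph over the samples:

    import numpy as np

    def mst_clustering(samples, k):
        """Cluster by removing the k-1 longest edges of the Euclidean minimum spanning tree."""
        n = len(samples)
        # All edges of the complete graph, weighted by Euclidean distance.
        edges = [(np.linalg.norm(samples[i] - samples[j]), i, j)
                 for i in range(n) for j in range(i + 1, n)]
        edges.sort()                            # Kruskal: consider the shortest edges first
        parent = list(range(n))                 # union-find over the partial trees (the forest)

        def find(v):
            while parent[v] != v:
                parent[v] = parent[parent[v]]   # path compression
                v = parent[v]
            return v

        mst = []
        for w, i, j in edges:
            ri, rj = find(i), find(j)
            if ri != rj:                        # endpoints lie in different partial trees
                parent[ri] = rj
                mst.append((w, i, j))
        # Keep all but the k-1 heaviest tree edges; the remaining forest has k components.
        mst.sort()
        parent = list(range(n))
        for w, i, j in mst[:len(mst) - (k - 1)]:
            parent[find(i)] = find(j)
        return [find(v) for v in range(n)]      # component label per sample

    data = np.random.default_rng(3).normal(size=(30, 2))
    print(mst_clustering(data, k=3))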


 

   

Sammon's Mapping (Sammon, 1969)

 

The algorithm:

  1. Choose a dimension of the output map (usually 2)

  2. Initialise the codebook vectors on the output map (randomly, for instance)

  3. For all sample vectors:

  4. Calculate the relative pairwise error of each codebook vector-pair between spaces

  5. Update the codebook vectors using gradient descent (minimizing the error)

  6. Repeat until an error threshold is reached
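
A rough Python/NumPy sketch that minimises Sammon's stress with plain gradient descent; Sammon's original paper uses a pseudo-Newton update, and the step size and fixed iteration count below are assumptions:

    import numpy as np

    def sammon_mapping(samples, out_dim=2, iters=500, lr=0.3, seed=0):
        """Map samples to out_dim dimensions by minimising Sammon's stress."""
        rng = np.random.default_rng(seed)
        n = len(samples)
        # Pairwise distances in the input space (epsilon avoids division by zero).
        D = np.linalg.norm(samples[:, None] - samples[None, :], axis=-1) + 1e-12
        c = D[np.triu_indices(n, 1)].sum()
        # Initialise the points on the output map randomly.
        Y = rng.normal(scale=1e-2, size=(n, out_dim))
        for _ in range(iters):
            diff = Y[:, None] - Y[None, :]
            d = np.linalg.norm(diff, axis=-1) + 1e-12
            # Gradient of the relative pairwise error with respect to each output point.
            coeff = (D - d) / (D * d)
            np.fill_diagonal(coeff, 0.0)
            grad = (-2.0 / c) * (coeff[:, :, None] * diff).sum(axis=1)
            Y -= lr * grad
        stress = ((D - d) ** 2 / D)[np.triu_indices(n, 1)].sum() / c
        return Y, stress

    data = np.random.default_rng(4).normal(size=(60, 5))
    Y, stress = sammon_mapping(data)
    print(stress)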

 

   

Sequential Leader (Hartigan, 1975)

 

The algorithm:

  1. Choose a cluster threshold value

  2. For every new sample vector:

  3. Compute the distance between the new vector and every cluster's codebook vector

  4. If the distance between the closest codebook vector and the new vector is smaller than the chosen threshold, then recompute the closest codebook vector with the new vector

  5. Otherwise, make a new cluster with the new vector as its codebook vector.
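
A one-pass sketch in Python/NumPy; updating the winning codebook vector as the running mean of the vectors assigned to it is an assumption (the notes only say it is recomputed with the new vector):

    import numpy as np

    def sequential_leader(samples, threshold):
        """Sequential leader clustering: a single pass, clusters are created on the fly."""
        codebooks, counts = [], []
        for x in samples:
            if codebooks:
                distances = [np.linalg.norm(x - c) for c in codebooks]
                winner = int(np.argmin(distances))
                if distances[winner] < threshold:
                    # Recompute the closest codebook vector with the new vector
                    # (here: running mean of all vectors assigned to it so far).
                    counts[winner] += 1
                    codebooks[winner] += (x - codebooks[winner]) / counts[winner]
                    continue
            # Otherwise make a new cluster with the new vector as its codebook vector.
            codebooks.append(np.array(x, dtype=float))
            counts.append(1)
        return codebooks

    data = np.random.default_rng(5).normal(size=(100, 2))
    print(len(sequential_leader(data, threshold=1.5)))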

 

   

Self-Organizing Map (Kohonen, 1982)

 

The algorithm:

  1. Choose the dimension and size of the map

  2. For every new sample vector:

  3. Compute the distance between the new vector and every cluster's codebook vector

  4. Recompute all codebook vectors with the new vector, using both a distance radius on the map and a learning rate that decrease in time
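
A small Python/NumPy sketch of the basic update on a rectangular grid; the Gaussian neighbourhood and the exponential decay schedules for radius and learning rate are common choices, assumed here rather than taken from the notes:

    import numpy as np

    def som(samples, rows=8, cols=8, iters=2000, seed=0):
        """Self-Organizing Map: codebook vectors live on a rows x cols grid."""
        rng = np.random.default_rng(seed)
        codebooks = rng.normal(size=(rows, cols, samples.shape[1]))
        # Grid coordinates of every map unit, used for the neighbourhood radius.
        grid = np.stack(np.meshgrid(np.arange(rows), np.arange(cols), indexing="ij"), axis=-1)
        for t in range(iters):
            x = samples[rng.integers(len(samples))]
            # Distance between the new vector and every unit's codebook vector.
            dists = np.linalg.norm(codebooks - x, axis=-1)
            winner = np.unravel_index(np.argmin(dists), dists.shape)
            # Both the radius on the map and the learning rate decrease in time.
            frac = t / iters
            radius = max(rows, cols) / 2.0 * np.exp(-3.0 * frac)
            lr = 0.5 * np.exp(-3.0 * frac)
            grid_dist = np.linalg.norm(grid - np.array(winner), axis=-1)
            h = np.exp(-(grid_dist ** 2) / (2.0 * radius ** 2))
            # Recompute all codebook vectors, weighted by their map distance to the winner.
            codebooks += lr * h[..., None] * (x - codebooks)
        return codebooks

    data = np.random.default_rng(6).normal(size=(500, 3))
    print(som(data).shape)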

 

   

Neural Gas, Growing Neural Gas (Fritzke), Growing Cell (Fritzke), Growing Grid

Neural Gas

The algorithm:

  1. Choose a number of clusters K

  2. For every new sample vector:

  3. Compute the distance between the new vector and every cluster's codebook vector

  4. Sort the clusters according to those distances

  5. Recompute all codebook vectors with the new vector, using the order after sorting
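
A Python/NumPy sketch of the rank-based update; the exponentially decaying neighbourhood range and learning rate below are common choices and are assumptions here:

    import numpy as np

    def neural_gas(samples, k, iters=2000, seed=0):
        """Neural gas: every codebook vector is updated according to its distance rank."""
        rng = np.random.default_rng(seed)
        codebooks = samples[rng.choice(len(samples), k, replace=False)].astype(float)
        for t in range(iters):
            x = samples[rng.integers(len(samples))]
            # Sort the clusters according to their distance to the new vector.
            dists = np.linalg.norm(codebooks - x, axis=1)
            ranks = np.argsort(np.argsort(dists))      # 0 for the closest, k-1 for the farthest
            # Decaying neighbourhood range and learning rate (assumed schedules).
            frac = t / iters
            lam = (k / 2.0) * (0.01 / (k / 2.0)) ** frac
            lr = 0.5 * (0.005 / 0.5) ** frac
            # Recompute all codebook vectors, using the order after sorting.
            codebooks += lr * np.exp(-ranks / lam)[:, None] * (x - codebooks)
        return codebooks

    data = np.random.default_rng(7).normal(size=(400, 2))
    print(neural_gas(data, k=10))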

Growing Neural Gas

The algorithm:

  1. Start with 2 initialized clusters, and an empty connection set

  2. Choose a maximum age threshold for the connections

  3. For every new sample vector:

  4. Compute the distance between the new vector and every cluster's codebook vector

  5. Find the nearest and second-nearest clusters

  6. Connect them (using the connection set)

  7. Set the age to 0 (refresh)

  8. Recompute the closest cluster's codebook vector, and also all connected clusters

  9. Increment the age of all the connections emanating from the closest cluster

  10. Delete the connections that are older than the maximum age threshold

On top of this, new clusters are added at regular intervals by looking at the maximum accumulated errors:

  1. Search the cluster with the highest accumulated error

  2. Search its connected cluster with the highest accumulated error

  3. Delete the connection between these two

  4. Make a new cluster with an interpolation between the two clusters' codebooks as a codebook

  5. Connect the new cluster to the two others
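
A condensed Python/NumPy sketch that combines the main loop with the periodic insertion step; all numeric constants (insertion interval, learning rates, error decay) are assumptions, not taken from the notes:

    import numpy as np

    def growing_neural_gas(samples, max_clusters=30, max_age=50, insert_every=100,
                           eps_b=0.05, eps_n=0.006, alpha=0.5, decay=0.995,
                           iters=5000, seed=0):
        """Growing Neural Gas sketch: clusters plus an aging connection set."""
        rng = np.random.default_rng(seed)
        # Start with 2 initialised clusters and an empty connection set.
        nodes = [samples[rng.integers(len(samples))].astype(float) for _ in range(2)]
        errors = [0.0, 0.0]
        edges = {}                                   # (i, j) with i < j  ->  age
        for t in range(1, iters + 1):
            x = samples[rng.integers(len(samples))]
            d = np.array([np.linalg.norm(x - w) for w in nodes])
            s1, s2 = np.argsort(d)[:2]               # nearest and second-nearest clusters
            errors[s1] += d[s1] ** 2                 # accumulate the winner's error
            # Recompute the closest cluster's codebook vector and those of all connected
            # clusters, and increment the age of the winner's connections.
            nodes[s1] += eps_b * (x - nodes[s1])
            for (i, j) in list(edges):
                if s1 in (i, j):
                    other = j if i == s1 else i
                    nodes[other] += eps_n * (x - nodes[other])
                    edges[(i, j)] += 1
            # Connect the nearest and second-nearest clusters (refresh: age 0).
            edges[(min(s1, s2), max(s1, s2))] = 0
            # Delete connections older than the maximum age threshold.
            edges = {e: a for e, a in edges.items() if a <= max_age}
            # Periodically add a new cluster near the largest accumulated errors.
            if t % insert_every == 0 and len(nodes) < max_clusters:
                q = int(np.argmax(errors))
                neighbours = [j if i == q else i for (i, j) in edges if q in (i, j)]
                if neighbours:
                    f = max(neighbours, key=lambda n: errors[n])
                    nodes.append(0.5 * (nodes[q] + nodes[f]))   # interpolate the two codebooks
                    errors[q] *= alpha
                    errors[f] *= alpha
                    errors.append(errors[q])
                    r = len(nodes) - 1
                    edges.pop((min(q, f), max(q, f)), None)     # delete the old connection
                    edges[(min(q, r), max(q, r))] = 0           # connect the new cluster
                    edges[(min(f, r), max(f, r))] = 0           # to the two others
            errors = [e * decay for e in errors]
        return np.array(nodes)

    data = np.random.default_rng(8).normal(size=(1000, 2))
    print(growing_neural_gas(data).shape)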

Growing Cell

The algorithm:

  1. Choose a network dimension K

  2. Create K+1 initial clusters and connect them to each other so that they form a K-dimensional simplex (for K=2: three fully connected clusters)

  3. For every new sample vector:

  4. Compute the distance between the new vector and every cluster's codebook vector

  5. Find the nearest cluster

  6. Recompute the closest cluster's codebook vector, and also all connected clusters

On top of this, new clusters are added by looking at the maximum accumulated errors, as before.

Growing Grid

The algorithm:

  1. Choose the dimension and size of the map

  2. For every new sample vector:

  3. Compute the distance between the new vector and every cluster's codebook vector

  4. Recompute all codebook vectors with the new vector, using both a distance radius on the map and a learning rate that decrease in time

On top of this, new clusters are added by looking at the maximum accumulated errors.

 

   

LBG (Linde, Buzo & Gray) / Generalised Lloyd (Lloyd)

 

The algorithm:

  1. Choose K as the number of clusters

  2. Repeat, passing over all the sample vectors:

  3. Assign every sample vector to its closest cluster, forming each cluster's Voronoi set

  4. Recompute every cluster's codebook vector as the mean of the sample vectors in its Voronoi set

  5. Stop if none of the clusters' codebook vectors change
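
A direct Python/NumPy sketch of the batch iteration; the initialisation from randomly picked samples is an assumption, and the splitting-based codebook initialisation often associated with LBG is omitted here:

    import numpy as np

    def lbg(samples, k, seed=0):
        """Generalised Lloyd / LBG iteration: alternate Voronoi partition and mean update."""
        rng = np.random.default_rng(seed)
        codebooks = samples[rng.choice(len(samples), k, replace=False)].astype(float)
        while True:
            # Voronoi sets: assign every sample vector to its closest codebook vector.
            dists = np.linalg.norm(samples[:, None] - codebooks[None, :], axis=-1)
            assignment = np.argmin(dists, axis=1)
            # Recompute every cluster's codebook vector as the mean of its Voronoi set.
            new_codebooks = np.array([
                samples[assignment == i].mean(axis=0) if np.any(assignment == i) else codebooks[i]
                for i in range(k)])
            # Stop if none of the clusters' codebook vectors change.
            if np.allclose(new_codebooks, codebooks):
                return new_codebooks, assignment
            codebooks = new_codebooks

    data = np.random.default_rng(9).normal(size=(300, 2))
    centres, labels = lbg(data, k=4)
    print(centres)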

 

   

References

S. P. Lloyd. Least squares quantization in PCM. Technical note, Bell Laboratories, 1957. Published in 1982 in IEEE Transactions on Information Theory, IT-28:129-137.

MacQueen, J.B. (1967), "Some Methods for Classification and Analysis of Multivariate Observations," Proceedings of the Fifth Berkeley Symposium on Mathematical Statistics and Probability, 1, 281-297.

John W. Sammon, Jr. A nonlinear mapping for data structure analysis. IEEE Transactions on Computers, C-18(5):401-409, May 1969.

Hartigan, J.A. (1975), Clustering Algorithms, New York: John Wiley & Sons, Inc.

Y. Linde, A. Buzo, and R. M. Gray. An algorithm for vector quantizer design. IEEE Transactions on Communications, COM-28:84-95, 1980.

B. Fritzke. Growing cell structures - a self-organizing network for unsupervised and supervised learning. Neural Networks, 7(9):1441-1460, 1994.

B. Fritzke. Fast learning with incremental RBF networks. Neural Processing Letters, 1(1):2-5, 1994.

Schreiber, Thomas, "A Voronoi Diagram Based Adaptive K-means-type Clustering Algorithm for Multidimensional Weighted Data," in Computational Geometry - Methods, Algorithms, and Applications, Proceedings of the International Workshop on Computational Geometry CG '91, Springer-Verlag, March 1991.


 

   
 

Compiled by Kristof Van Laerhoven.