An Overview of different Clustering Algorithms


Clustering 

MacQueen defines a cluster as: a set of similar objects


Remarks 

Several remarks:
Also, see this page for an overview of different algorithms (most briefly explained here) with additional plots, implemented in VisualBasic: the source code (by Steven Lowette) is available here.


KMeans (MacQueen, 1967) 

The algorithm:


Voronoi Based Adaptive Kmeans Clustering (Thomas Schreiber, 1991) 



MaximumCut, MinimumCut, MeanSplit (Wu and Witten, 1985), MedianCut (Paul Heckbert, 1982), OptimalCut/Variance Based Clustering (Wan, Wong, Prusinkiewicz , 1988) 

Minimum cut problem: Given a connected graph G=(V,E), a capacity c:E>R+, and two nodes s and t, find a minimum st cut. FordFulkerson Labeling Algorithm MedianCut MedianCut
(in quantizing of colours:) The concept behind the median cut algorithm is
to use each of the colours in the synthesized colour map to represent an
equal number of pixels in the original image. This algorithm repeatedly
subdivides colour space into smaller and smaller rectangular boxes. We start
with one box which tightly encloses the colours of all NI*NJ pixels from the
original image. The number of different colours in this first box is
dependent on the colour resolution used. Experimental results show that 15
bits per colour (the resolution of the histogram) is sufficient in most
cases. Iteration step: split a box. The box is "shrunk" to fit tightly
around the points (colours) it encloses, by finding the minimum and maximum
values of each of the colour coordinates. Next we use "adaptive
partitioning" (Bentley's terminology) to decide which way to split the box.
The enclosed points are sorted along the longest dimension of the box, and
segregated into two halves at the median point. Approximately equal numbers
of points will fall on each side of the cutting plane. OptimalCut or Variance Based


Pairwise Clustering 

The algorithm:
Popular in reducing colours in an image.


Minimum Cost Spanning Tree Clustering Algorithms 

A Minimum Spanning Tree is: A minimumweight tree in a weighted graph which contains all of the graph's vertices So the problem is: Given a connected graph G=(Vertices, Edges) and a weight d: Edges > R+, find a minimum spanning tree. Prim's Algorithm Builds upon a single partial minimum spanning tree, at each step adding an edge connecting the vertex nearest to but not already in the current partial minimum spanning tree. The time required by Prim's algorithm is O(Vertices^{2}). It will be reduced to O(EdgeslogVertices) if the heap is used to keep {v in V\S_{i} : L(v) < infinity}. Kruskal's Algorithm Maintains a set of partial minimum spanning trees (called forest), and repeatedly adds the shortest edge in the graph whose vertices are in different partial minimum spanning trees. The time required by Kruskal's algorithm is O(EdgeslogVertices).


Sammon's Mapping (Sammon, 1969) 

The algorithm:


Sequential Leader (Hartigan, 1975) 

The algorithm:


SelfOrganizing Map (Kohonen, 1982) 

The algorithm:


Neural Gas, Growing Neural Gas (Fritzke), Growing Cell (Fritzke), Growing Grid 

Neural Gas The algorithm:
Growing Neural Gas The algorithm:
on top of this, new clusters are being added by observing the maximum accumulated errors:
Growing Cell The algorithm:
on top of this, new clusters are being added by observing the maximum accumulated errors, as before. Growing Grid The algorithm:
on top of this, new clusters are being added by observing the maximum accumulated errors.


LBG(Linde, Buzo & Gray) / generalized Lloyd (Lloyd) 

The algorithm:


References 

S. P. Lloyd. Least squares quantization in pcm. technical note, Bell
Laboratories, 1957. published in 1982 in IEEE Transactions on Information
Theory IT28: 129 137. MacQueen, J.B. (1967), "Some Methods for Classification and Analysis of Multivariate Observations," Proceedings of the Fifth Berkeley Symposium on Mathematical Statistics and Probability, 1, 281 297. John W. Sammon, Jr. A nonlinear mapping for data structure analysis. IEEE Transactions on Computers, C18(5):401409, May 1969. Hartigan, J.A. (1975), Clustering Algorithms, New York: John Wiley & Sons, Inc. Y. Linde, A. Buzo, and R. M. Gray. An algorithm for vector quantizer design. IEEE Transactions on Communication, COM28:95, 1980. B. Fritzke. Growing cell structures  a selforganizing network for unsupervised and supervised learning. Neural Networks, 7(9):1460, 1994. B. Fritzke. Fast learning with incremental RBF networks. Neural Processing Letters, 1(1):5, 1994. Schreiber, Thomas, "A Voronoi Diagram Based Adaptive Kmeanstype
Clustering Algorithm for Multidimensional Weighted Data," in Computational
GeometryMethods, Algorithms, and Applications, Proceedings of
International Workshop on Computational Geometry CG '91, SpringerVerlag,
March 1991. 

Compiled by Kristof Van Laerhoven.