k-means empty cluster

Check out this example of how empty clusters can happen: http://www.ceng.metu.edu.tr/~tcan/ceng465_f1314/Schedule/KMeansEmpty.html It basically means either 1) a random tremor in the force, or 2) the number of clusters k is wrong. You should iterate over a few different values for k and pick the best. If during your iterating you should encounter an empty cluster, place a random data point into that cluster and carry on. I hope this helped on your homework assignment last year.


Handling empty clusters is not part of the k-means algorithm but might result in better clusters quality. Talking about convergence, it is never exactly but only heuristically guaranteed and hence the criterion for convergence is extended by including a maximum number of iterations.

Regarding the strategy to tackle down this problem, I would say randomly assigning some data point to it is not very clever since we might be affecting the clusters quality since the distance to its currently assigned center is large or small. An heuristic for this case would be to choose the farthest point from the biggest cluster and move that the empty cluster, then do so until there are no empty clusters.

Tags:

K Means