Affinity Propagation preferences initialization

No, there is no flaw. AP does not use distances, but requires you to specify a similarity. I don't know the scikit implementation so well, but according to what I read, it uses negative squared Euclidean distances by default to compute the similarity matrix. If you set the input preference to the minimal Euclidean distance, you get a positive value, while all similarities are negative. So this will typically result in as many clusters as you have samples (note: the higher the input preference, the more clusters). I'd rather suggest to set the input preference to the minimal negative squared distance, i.e. -1 times the square of the largest distance in the data set. This will give you a much smaller number of clusters, but not necessarily one single cluster. I don't know whether the preferenceRange() function exists also in the scikit implementation. There is Matlab code on the AP homepage and it is also implemented in the R package 'apcluster' that I am maintaining. This function allows for determining meaningful bounds for the input preference parameter. I hope that helps.

Affinity Propagation preferences initialization

Tags:

Machine Learning

Cluster Analysis

Unsupervised Learning

Scikit Learn

Related

Recent Posts