Vector Space Model: Cosine Similarity vs Euclidean Distance

I'll answer the questions in reverse order. For your second question: cosine similarity and Euclidean distance are two different ways to measure how similar two vectors are. The former measures the angle between the vectors as seen from the origin, ignoring their lengths, while the latter measures the straight-line distance between the points at the tips of the vectors. You can use either in isolation, combine them, or look at one of the many other ways to determine similarity. See these slides from a Michael Collins lecture for more info.
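To make the difference concrete, here is a minimal sketch in plain Python. The function names and the two example vectors are my own illustration, not something from your question; the vectors stand in for term-count vectors of two documents.

```python
import math

def dot(u, v):
    # Sum of element-wise products.
    return sum(a * b for a, b in zip(u, v))

def norm(u):
    # L2 norm, i.e. the length of the vector.
    return math.sqrt(dot(u, u))

def cosine_similarity(u, v):
    # Cosine of the angle between u and v; vector length cancels out.
    return dot(u, v) / (norm(u) * norm(v))

def euclidean_distance(u, v):
    # Straight-line distance between the tips of u and v.
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

# Hypothetical term-count vectors: doc_b has the same direction as doc_a
# but is twice as long (e.g. the same text repeated twice).
doc_a = [3.0, 1.0, 0.0]
doc_b = [6.0, 2.0, 0.0]

print(cosine_similarity(doc_a, doc_b))   # approximately 1: same direction
print(euclidean_distance(doc_a, doc_b))  # clearly non-zero: different lengths
```

The two measures disagree here: cosine similarity treats the documents as essentially identical, while Euclidean distance reports them as apart because one vector is longer.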

Your first question isn't very clear, but you should be able to use either measure to find a distance between two vectors regardless of whether you're comparing documents or your "models" (which would more traditionally be described as clusters, where the model is the sum of all clusters).


One informal but rather intuitive way to think about this is to consider the two components of a vector: direction and magnitude.

Direction is the "preference" / "style" / "sentiment" / "latent variable" of the vector, while magnitude is how strongly it points in that direction.

When classifying documents we'd like to categorize them by their overall sentiment, so we use the angular distance, which depends only on direction.

Euclidean distance is susceptible to documents being clustered by their L2-norm (magnitude) instead of direction. I.e., vectors with quite different directions can end up clustered together because their distances from the origin are similar, while vectors with the same direction but very different lengths end up far apart.
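A small sketch of that effect, under made-up assumptions: the two axes are hypothetical "sports" and "politics" term counts, and the three document vectors are invented for illustration.

```python
import math

def cosine_similarity(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def euclidean_distance(u, v):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def unit(u):
    # Scale u to length 1, keeping only its direction.
    length = math.sqrt(sum(a * a for a in u))
    return [a / length for a in u]

# Hypothetical axes: [sports terms, politics terms].
short_sports = [1.0, 0.0]    # short document about sports
short_politics = [0.0, 1.0]  # short document about politics
long_sports = [10.0, 0.0]    # long document about sports

# Euclidean distance puts the two short documents closer together,
# even though they point in different directions.
d_short_pair = euclidean_distance(short_sports, short_politics)
d_same_topic = euclidean_distance(short_sports, long_sports)
print(d_short_pair < d_same_topic)  # True

# Cosine similarity groups by direction instead.
s_short_pair = cosine_similarity(short_sports, short_politics)
s_same_topic = cosine_similarity(short_sports, long_sports)
print(s_same_topic > s_short_pair)  # True

# Normalizing to unit length first removes the magnitude effect
# from Euclidean distance as well.
d_normalized = euclidean_distance(unit(short_sports), unit(long_sports))
print(d_normalized)  # 0.0
```

If you want to keep using Euclidean distance (for example because your clustering algorithm requires it), normalizing each vector to unit length beforehand is one common way to make it sensitive to direction only.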