Cosine similarity built-in function in MATLAB

Short version by calculating the similarity with pdist:

S2 = squareform(1-pdist(S1,'cosine')) + eye(size(S1,1));

Explanation:

pdist(S1,'cosine') calculates the cosine distance between every pair of rows in S1. Cosine distance is defined as one minus the cosine similarity, so the similarity for every pair is 1 - pdist(S1,'cosine').

We can turn that into a square matrix where element (i,j) corresponds to the similarity between rows i and j with squareform(1-pdist(S1,'cosine')).

Finally, we have to set the main diagonal to 1: the similarity of a row with itself is 1, but pdist only computes the pairs of distinct rows, so squareform leaves zeros on the diagonal.
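As a quick sanity check, here is a minimal sketch of the one-liner on a small made-up matrix. The values in S1 are only an illustration, and pdist and squareform require the Statistics and Machine Learning Toolbox.

```matlab
% Hypothetical 3-by-2 example data; each row is one vector to compare.
S1 = [1 0; 0 1; 1 1];

% Pairwise cosine distances, returned in the order (1,2), (1,3), (2,3).
D = pdist(S1, 'cosine');

% Convert distances to similarities, expand to a square matrix,
% and fill the diagonal (which squareform leaves at 0) with ones.
S2 = squareform(1 - D) + eye(size(S1,1));

% Rows 1 and 2 are orthogonal, so S2(1,2) should be 0.
% Row 3 is at 45 degrees to both, so S2(1,3) and S2(2,3)
% should be close to 1/sqrt(2).
disp(S2)
```

For this input the off-diagonal entries should match the geometry described in the comments, up to floating-point rounding.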


Your code loops over all rows, and for each row loops over (about) half the rows, computing the dot product for each unique combination of rows:

n_row = size(S1,1);
norm_r = sqrt(sum(abs(S1).^2,2)); % row-wise 2-norms; same as vecnorm(S1,2,2) (R2017b+) or norm(S1,2,'rows') in Octave
S2 = zeros(n_row,n_row);
for i = 1:n_row
  for j = i:n_row
    S2(i,j) = dot(S1(i,:), S1(j,:)) / (norm_r(i) * norm_r(j));
    S2(j,i) = S2(i,j);
  end
end

(I've taken the liberty of completing your code so that it actually runs. Note the preallocation of S2 before the loop; this saves a lot of time!)

If you note that the dot product is a matrix product of a row vector with a column vector, you can see that the above, without the normalization step, is identical (for real-valued S1) to

S2 = S1 * S1.';

This runs much faster than the explicit loop, even if it is (maybe?) not able to use the symmetry. The normalization divides row i by norm_r(i) and column j by norm_r(j), so element (i,j) ends up divided by norm_r(i)*norm_r(j). Here I multiply the two vectors to produce a square matrix to normalize with:

S2 = (S1 * S1.') ./ (norm_r * norm_r.');
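Putting the pieces together, the vectorized version can be sketched as a short script. The example data and the final diagonal clean-up are assumptions added for illustration and are not part of the original code. Note also that an all-zero row has norm 0, so its entries come out as NaN (0/0).

```matlab
% Hypothetical example data; real-valued, with no all-zero rows.
S1 = [1 0; 0 1; 1 1];

% Column vector holding the 2-norm of each row.
norm_r = sqrt(sum(abs(S1).^2, 2));

% All pairwise dot products at once, then divide element (i,j)
% by norm_r(i)*norm_r(j) via the outer product of the norms.
S2 = (S1 * S1.') ./ (norm_r * norm_r.');

% Optional clean-up (an assumption, not in the original answer):
% rounding can leave diagonal entries slightly off 1, so set them exactly.
n = size(S1,1);
S2(1:n+1:end) = 1;

disp(S2)
```

Apart from rounding differences, this should give the same matrix as the double loop above, and it needs no toolbox.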