How do recommendation systems work?

GroupLens Research at the University of Minnesota studies recommender systems and generously shares their research and datasets.

Their research expands a bit each year and now considers specifics like online communities, social collaborative filtering, and the UI challenges in presenting complex data.


This is such a commercially important application that Netflix introduced a $1 million prize for improving their recommendations by 10%.

After a couple of years people are getting close (I think they're up around 9% now) but it's hard for many, many reasons. Probably the biggest factor or the biggest initial improvement in the Netflix Prize was the use of a statistical technique called singular value decomposition.

I highly recommend you read If You Liked This, You’re Sure to Love That for an in-depth discussion of the Netflix Prize in particular and recommendation systems in general.

Basically though the principle of Amazon and so on is the same: they look for patterns. If someone bought the Star Wars Trilogy well there's a better than even chance they like Buffy the Vampire Slayer more than the average customer (purely made up example).


At it's most basic, most recommendation systems work by saying one of two things.

User-based recommendations:
If User A likes Items 1,2,3,4, and 5,
And User B likes Items 1,2,3, and 4
Then User B is quite likely to also like Item 5

Item-based recommendations:
If Users who purchase item 1 are also disproportionately likely to purchase item 2
And User A purchased item 1
Then User A will probably be interested in item 2

And here's a brain dump of algorithms you ought to know:
- Set similarity (Jaccard index & Tanimoto coefficient)
- n-Dimensional Euclidean distance
- k-means algorithm
- Support Vector Machines


The O'Reilly book "Programming Collective Intelligence" has a nice chapter showing how it works. Very readable.

The code examples are all written in Python, but that's not a big problem.