1 vs 1 vote: calculate ratings (Flickchart.com)

This might not be exactly what Flickchart is doing, but you could use a variant of the Elo rating system used in chess (and other sports), since these matchups are essentially fights/games that the movies win or lose.

Basically, all movies start off with 0 wins/losses and the same rating, and every win is worth a certain number of points, with the loser dropping by the same amount. You usually have an average swing of around 20 (but any number will do), and winning against a movie with the same rating as yours gives exactly that 20. Winning against a weaker movie will give maybe around 10 points, while winning against a stronger movie might give you 30. It works the other way around for losses: lose to a stronger movie and you only drop about 10 points, but lose to a weaker movie and you drop about 30.

The specifics of the algorithm are in the Wikipedia link.
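
A minimal sketch of the standard Elo update in Python. K is the maximum number of points exchanged in one matchup, so the "average" gain described above is K/2; the K = 40 and 400-point scale used here are just illustrative chess-style choices, not anything Flickchart is known to use:

    def expected_score(rating_a, rating_b):
        """Probability that A beats B under the Elo model."""
        return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))

    def elo_update(winner_rating, loser_rating, k=40):
        """Return the new (winner, loser) ratings after one matchup."""
        expected_win = expected_score(winner_rating, loser_rating)
        delta = k * (1.0 - expected_win)  # surprising wins swing the most
        return winner_rating + delta, loser_rating - delta

    print(elo_update(1500, 1500))   # equal ratings: winner gains the average 20
    print(elo_update(1400, 1600))   # upset of a stronger movie: winner gains ~30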


As for Flickchart, I've been playing around with it a little bit, and I think the rating system is pretty unsophisticated. In pseudo-code (where rank 1 is the top of your chart), my guess is that it looks something like this:

if rank(loser) == null and rank(winner) == null
    insert loser at position estimated from global rank
    insert winner at position estimated from global rank
else if rank(winner) == null or rank(winner) > rank(loser)
    advance winner to loser's position and demote loser and all following by 1

Why do I think this? First, I'm completely convinced that their Bayesian priors are not based on any careful mining of my previous choices. They seem to have no way to guess that, because I like Return of the Jedi, I probably like The Empire Strikes Back. In fact, they can't even figure out that, because I've seen Home Alone 2, I may have seen Home Alone 1. After hundreds of ratings, the choice hasn't come up.

Second, if you look at the above code you might find a little bug, which you will definitely notice on the site: sometimes you will make a choice and the winner will slide down by one position. This seems to happen only when the loser wasn't previously ranked. My guess is that the loser is being inserted above the winner.

Other than that, you will notice that rankings do not change at all unless a lower ranked movie beats a higher ranked movie directly. I don't think any real scores are being kept: the site seems to be entirely memoryless except for the ordinal rank of each movie and your most recent rating.
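
For concreteness, here is that guess as a runnable Python sketch. The chart is just an ordered list (index 0 = favorite), and estimate_position is a hypothetical stand-in for whatever "position estimated from global rank" actually means:

    def record_vote(chart, winner, loser, estimate_position):
        """Update an ordered chart (index 0 = favorite) after one vote."""
        if winner not in chart and loser not in chart:
            chart.insert(estimate_position(loser, chart), loser)
            chart.insert(estimate_position(winner, chart), winner)
        elif loser not in chart:
            # A new loser just gets dropped in at its estimated position,
            # possibly above the winner -- the "winner slides down" glitch.
            chart.insert(estimate_position(loser, chart), loser)
        elif winner not in chart or chart.index(winner) > chart.index(loser):
            # Advance the winner to the loser's position; the loser and
            # everything below it are demoted by one.
            if winner in chart:
                chart.remove(winner)
            chart.insert(chart.index(loser), winner)
        # Otherwise: nothing changes. No scores are kept, only ordinal rank.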


How are the ratings calculated? How do you decide which film takes first place in the ranking? You have to consider how often an item wins and how good the beaten items are.

What you want is a weighted rating, also called a Bayesian estimate.

I think IMDB's Top 250 movies list is a better starting point for building a ranking website. Some movies have 300,000+ votes while others have fewer than 50,000. IMDB uses a Bayesian estimate to rank movies against one another without unfairly weighting popular movies. The algorithm is given at the bottom of the page:

weighted rating (WR) = (v ÷ (v+m)) × R + (m ÷ (v+m)) × C where:

  • R = average for the movie (mean) = (Rating)
  • v = number of votes for the movie = (votes)
  • m = minimum votes required to be listed in the Top 250 (currently 3000)
  • C = the mean vote across the whole report (currently 6.9)

For the Top 250, only votes from regular voters are considered.

I don't know how IMDB chose 3000 as their minimum vote count. They could have chosen 1000 or 10000, and the list would have been more or less the same. Maybe they're using "average number of votes after 6 weeks in the box office", or maybe they're using trial and error.

In any case, it doesn't really matter. The formula above is pretty much the standard for normalizing votes on ranking websites, and I'm almost certain Flickchart uses something similar in the background.

The formula works so well because it "pulls" ratings toward the mean: ratings above the mean are pulled down slightly, and ratings below the mean are pulled up slightly. The strength of the pull, however, shrinks as the number of votes grows, so movies with few votes are pulled toward the mean much more aggressively than movies with lots of votes. Here are two data points to demonstrate the property:

Rank  Movie                        Votes       Avg Rating   Weighted Rating
----  -----                        -----       ----------   ---------------
219   La Strada                    15,000+     8.2          8.0
221   Pirates of the Caribbean 2   210,000+    8.0          8.0

Both movies' ratings are pulled down toward the mean, but the pull on La Strada is more dramatic because its average is based on far fewer votes and is therefore less representative than the rating for PotC.
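
In Python the calculation is just a couple of lines, and plugging in the rounded figures from the table (with IMDB's published m = 3000 and C = 6.9) reproduces the pull:

    def weighted_rating(avg_rating, votes, m=3000, mean_vote=6.9):
        """IMDB-style Bayesian estimate: blend the movie's own average with
        the overall mean, weighted by how many votes the movie has."""
        return (votes / (votes + m)) * avg_rating + (m / (votes + m)) * mean_vote

    print(round(weighted_rating(8.2, 15000), 2))    # La Strada  -> ~7.98
    print(round(weighted_rating(8.0, 210000), 2))   # Pirates 2  -> ~7.98

With 15,000 votes, La Strada's own average carries only about 83% of the weight; with 210,000 votes, Pirates' average carries over 98%, which is why the bigger movie barely moves.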


For your specific case, you have two items in a "fight". You should probably design your table as follows:

Items
-----
ItemID (pk)
FightsWon (int)
FightsEngaged (int)

The average rating is FightsWon / FightsEngaged. The weighted rating is calculated using the formula above.

When a user chooses a winner in a fight, increase the winning item's FightsWon field by 1 and increase both items' FightsEngaged fields by 1.
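
Here is a sketch of that update and the resulting ranking, using SQLite purely for illustration. The m = 100 minimum-fights value is a placeholder you would tune for your own data, and site_mean = 0.5 is a reasonable prior for head-to-head fights, since every fight produces exactly one win and one loss:

    import sqlite3

    conn = sqlite3.connect(":memory:")
    conn.execute("""CREATE TABLE Items (
        ItemID        INTEGER PRIMARY KEY,
        FightsWon     INTEGER NOT NULL DEFAULT 0,
        FightsEngaged INTEGER NOT NULL DEFAULT 0
    )""")

    def record_fight(winner_id, loser_id):
        # Winner gets a win; both items get credit for having fought.
        conn.execute("UPDATE Items SET FightsWon = FightsWon + 1, "
                     "FightsEngaged = FightsEngaged + 1 WHERE ItemID = ?",
                     (winner_id,))
        conn.execute("UPDATE Items SET FightsEngaged = FightsEngaged + 1 "
                     "WHERE ItemID = ?", (loser_id,))
        conn.commit()

    def ranking(m=100, site_mean=0.5):
        # Weighted rating per the formula above, with R = FightsWon/FightsEngaged
        # and v = FightsEngaged.
        return conn.execute("""
            SELECT ItemID,
                   (FightsEngaged * 1.0 / (FightsEngaged + ?)) *
                       (FightsWon * 1.0 / FightsEngaged)
                   + (? * 1.0 / (FightsEngaged + ?)) * ? AS wr
            FROM Items
            WHERE FightsEngaged > 0
            ORDER BY wr DESC""", (m, m, m, site_mean)).fetchall()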

Hope this helps! - Juliet


I've been toying with the problem of ranking items by means of pairwise comparison for some time myself, and wanted to take the time to describe the ideas I've come up with so far.

For now I'm simply sorting by <fights won> / <total fights>, highest first. This works fine if you're the only one voting, or if there are a lot of people voting. Otherwise it can quickly become inaccurate.

One problem here is how to choose which two items should fight. One thing that does seem to work well (subjectively) is to let the item with the fewest fights so far fight against a random item. This leads to a relatively uniform number of fights across the items (which helps accuracy), at the cost of possibly being boring for the voter(s): they will often be comparing the newest item against something else. To alleviate that, you can take the n items with the lowest fight count and choose one of those at random as the first contender.
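
Something like this is enough for that selection rule; fight counts live in a plain dict, and n is just a tuning knob for how many of the least-fought items are eligible as the first contender:

    import random

    def pick_matchup(fight_counts, n=5):
        """fight_counts maps item -> number of fights so far. The first
        contender comes from the n items with the fewest fights, the
        second is any other random item."""
        by_fights = sorted(fight_counts, key=fight_counts.get)
        first = random.choice(by_fights[:n])
        second = random.choice([item for item in by_fights if item != first])
        return first, second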

You mentioned that you want to make victories against strong opponents count more than against weak ones. As mentioned in other posts above, rating systems used for chess and the like (Elo, Glicko) may work. Personally I would love to use Microsoft's TrueSkill, as it seems to be the most accurate and also provides a good way to pick two items to pit against each other -- the ones with the highest draw-probability as calculated by TrueSkill. But alas, my math understanding is not good enough to really understand and implement the details of the system, and it may be subject to licensing fees anyway...

Collective Choice: Competitive Ranking Systems has a nice overview of a few different rating systems if you need more information/inspiration.

Other than rating systems, you could also try various simple ladder systems. One example:

  1. Randomize the list of items, so they are ranked 1 to n
  2. Pick two items at random and let them fight
  3. If the winner is ranked above the loser: Do nothing
  4. If the loser is ranked above the winner:
    • If the loser is directly above the winner: Swap them
    • Else: Move the winner up the ladder x% toward the loser of the fight.
  5. Goto 2

This is relatively unstable in the beginning, but should improve over time. It never ceases to fluctuate though.
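
A sketch of that ladder in Python, where x is the fraction of the gap the winner jumps (0.5 here is an arbitrary default, i.e. halfway toward the loser) and decide_winner is whatever callback asks the voter to pick between two items:

    import random

    def ladder_step(ladder, decide_winner, x=0.5):
        """One iteration: ladder is a list ordered best (index 0) to worst."""
        a, b = random.sample(ladder, 2)
        winner = decide_winner(a, b)
        loser = b if winner == a else a
        wi, li = ladder.index(winner), ladder.index(loser)
        if wi < li:
            return                    # winner already ranked above the loser: do nothing
        if wi == li + 1:
            ladder[li], ladder[wi] = ladder[wi], ladder[li]   # adjacent: swap
        else:
            # Move the winner up x% of the way toward the loser (at least one spot).
            new_pos = wi - max(1, int((wi - li) * x))
            ladder.insert(new_pos, ladder.pop(wi))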

Hope I could help at least a little.