MongoDB, MapReduce and sorting

As noted in discussion on the original question:

  • Map/Reduce with inline output currently cannot use an explicit sort key (see SERVER-3973). Possible workarounds include relying on the emitted key order (see @crodas's answer); outputting to a collection and querying that collection with sort order; or sorting the results in your application using something like usort().

  • OP's preference is for inline results rather than creating/deleting temporary collections.

  • The Aggregation Framework in MongoDB 2.2 (currently a production release candidate) would provide a suitable solution.

Here's an example of a similar query to the original Map/Reduce, but instead using the Aggregation Framework:

db.products.aggregate(
  { $match: { category_id: 20 }},
  { $group : {
     _id : "$product_id",
     'popularity' : { $sum : "$popularity" },
  }},
  { $sort: { 'popularity': -1 }}
)

.. and sample output:

{
    "result" : [
        {
            "_id" : 50,
            "popularity" : 139
        },
        {
            "_id" : 150,
            "popularity" : 99
        },
        {
            "_id" : 123,
            "popularity" : 55
        }
    ],
    "ok" : 1
}

First of all, Mongo map/reduce are not designed to be used in as a query tool (as it is in CouchDB), it is design for you to run background tasks. I use it at work to analyze traffic data.

What you are doing wrong however is that you're applying the sort() to your input, but it is useless because when the map() stage is done the intermediate documents are sorted by each keys. Because your key is a document, it is being sort by product_id, popularity.

This is how I generated my dataset

function generate_dummy_data() {
    for (i=2; i < 1000000; i++) { 
        db.foobar.save({
          _id: i, 
         category_id: parseInt(Math.random() * 30), 
         popularity:    parseInt(Math.random() * 50)
        }) 
    }
}

And this my map/reduce task:

var data = db.runCommand({
  'mapreduce': 'foobar',
  'map': function() {
    emit({
      sorting: this.popularity * -1,
      product_id: this._id,
      popularity: this.popularity,
    }, 1);
  },
  'reduce': function(key, values) {
    var sum = 0;
    values.forEach(function(v) {
      sum += v;
    });

    return sum;  
  },
  'query': {category_id: 20},
  'out': {inline: 1},
});

And this is the end result (very long to paste it here):

http://cesarodas.com/results.txt

This works because now we're sorting by sorting, product_id, popularity. You can play with the sorting how ever you like just remember that the final sorting is by key regardless of you how your input is sorted.

Anyway as I said before you should avoid doing queries with Map/Reduce it was designed for background processing. If I were you I would design my data in such a way I could access it with simple queries, there is always a trade-off in this case complex insert/updates to have simple queries (that's how I see MongoDB).