Why is MapReduce in CouchDB called "incremental"?

The page you linked explains it.

The view (which is the whole point of MapReduce in CouchDB) can be updated by re-indexing only the documents that have changed since the last index update. That's the incremental part.

This can be achieved by requiring the reduce function to be referentially transparent, which means that it always returns the same output for a given input.

The reduce function must also be commutative and associative for the array value input, which means that running the reducer over the output of that same reducer gives the same result. The wiki page expresses it as:

f(Key, Values) == f(Key, [ f(Key, Values) ] )
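
As a concrete illustration (mine, not from the wiki page), a summing reducer satisfies that property: summing a list gives the same answer as summing the list of its sum.

```javascript
// A summing reducer, written with CouchDB's reduce signature. Re-running it
// over its own output changes nothing, which is exactly the property above.
function sumReduce(keys, values, rereduce) {
  return values.reduce(function (a, b) { return a + b; }, 0);
}

var vals = [1, 2, 3];
sumReduce(null, vals);                    // 6
sumReduce(null, [sumReduce(null, vals)]); // also 6
```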

Rereduce is where you take the output from several reducer calls and run that through the reducer again. This is sometimes required because CouchDB sends values through the reducer in batches, so not all the keys that need to be reduced will necessarily be sent through in one shot.
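
For example, a counting reducer can't simply count again when it is fed its own output, because by then the values are already partial counts. Here's a minimal sketch using CouchDB's function(keys, values, rereduce) signature and the sum() helper it provides to JavaScript reduce functions (the function name is just for illustration; in a design document it would be an anonymous function):

```javascript
// First pass: `values` are the emitted values for a batch of rows, so count them.
// Rereduce pass: `values` are partial counts from earlier reduce calls, so sum them.
function countReduce(keys, values, rereduce) {
  if (rereduce) {
    return sum(values); // combine partial counts from previous batches
  }
  return values.length; // count the rows in this batch
}
```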


Just to add slightly to what user1087981 said, the reduce functionality is incremental because of the way the reduce process is performed by CouchDB.

CouchDB uses the B-Tree that it creates from the view function, and in essence it performs the reduce calculations in clumps of values. Here's a very simple mockup of a B-Tree from the O'Reilly Guide showing the leaf nodes for the example in the section you quoted from.

[Image: Reduce B-Tree]

So, why is this incremental? Well, the final reduce is only performed at query time, and all the intermediate reduce calculations are stored in the B-Tree view index. So, let's say you add a new document to your DB with another "fr" value. The calculations for the 1st, 2nd and 4th leaf nodes above don't need to be redone. The new "fr" value is added, and the reduce value is recalculated only for that 3rd leaf node.
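
To make that concrete, here's a sketch of the kind of view that would produce per-key counts like the "fr" example; the language field and the names used below are placeholders of mine, not taken from the guide:

```javascript
// Map function for a hypothetical "by_language" view: one row per document,
// keyed by its language code (e.g. "fr"), with a value of 1.
function mapByLanguage(doc) {
  if (doc.language) {
    emit(doc.language, 1);
  }
}
```

Pair that map with CouchDB's built-in _count reduce (or _sum over the emitted 1s) and you get per-language totals; saving another "fr" document only forces the stored reduction for the leaf node holding the "fr" rows to be recomputed.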

Then at query time the final (rereduce=true) calculation is performed on the stored values, and the final value is returned. You can see that this incremental nature of reduce means the recalculation time depends only on the new values being added, not on the size of the existing data set.
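
For instance, asking the view for its reduced, grouped values at query time is just a GET with group=true; the database, design document and view names below ("mydb", "example", "by_language") are placeholders:

```javascript
// Query the reduced view over HTTP; group=true asks CouchDB to reduce per key.
async function queryByLanguage() {
  const res = await fetch(
    "http://localhost:5984/mydb/_design/example/_view/by_language?group=true"
  );
  const { rows } = await res.json();
  // Each row is one reduced group, e.g. { key: "fr", value: 3 }
  return rows;
}
```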

Having no side effects is another important part of this process. If, for example, your reduce function relied on some other state being maintained as you walked through all the values, that might work for the very first run, but when a new value is added and an incremental reduce calculation is made, it wouldn't have that same state available to it, and so it would produce an incorrect result. This is why reduce functions need to be side-effect free, or as user1087981 puts it, "referentially transparent".
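
To illustrate (my own example, not from the original answer), compare a reducer that leans on outside state with a side-effect-free one:

```javascript
// NOT referentially transparent: the result depends on state kept outside the
// function. A full rebuild might appear to work, but an incremental update
// re-runs the reducer against a single changed node without that accumulated
// state, so the stored values come out wrong.
var runningTotal = 0;
function badReduce(keys, values, rereduce) {
  runningTotal += values.length; // side effect: mutates external state
  return runningTotal;
}

// Side-effect free: depends only on its arguments, so re-running it over any
// node at any time gives the same answer. (sum() is CouchDB's built-in helper.)
function goodReduce(keys, values, rereduce) {
  return rereduce ? sum(values) : values.length;
}
```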