Suitability of MongoDB for hierarchial type queries

Note that this question was also asked on the google group. See http://groups.google.com/group/mongodb-user/browse_thread/thread/5cd5edd549813148 for that disucssion.

One option is to use an array key. You can store the hierarchy as an array of values (for example ['US','CA','Los Angeles']). Then you can query against records based on individual elements in that array key For example: First, store some documents with the array value representing the hierarchy

> db.hierarchical.save({ location: ['US','CA','LA'], name: 'foo'} ) 
> db.hierarchical.save({ location: ['US','CA','SF'], name: 'bar'} ) 
> db.hierarchical.save({ location: ['US','MA','BOS'], name: 'baz'} ) 

Make sure we have an index on the location field so we can perform fast queries against its values

> db.hierarchical.ensureIndex({'location':1}) 

Find all records in California

> db.hierarchical.find({location: 'CA'}) 
{ "_id" : ObjectId("4d9f69cbf88aea89d1492c55"), "location" : [ "US", "CA", "LA" ], "name" : "foo" } 
{ "_id" : ObjectId("4d9f69dcf88aea89d1492c56"), "location" : [ "US", "CA", "SF" ], "name" : "bar" } 

Find all records in Massachusetts

> db.hierarchical.find({location: 'MA'}) 
{ "_id" : ObjectId("4d9f6a21f88aea89d1492c5a"), "location" : [ "US", "MA", "BOS" ], "name" : "baz" } 

Find all records in the US

> db.hierarchical.find({location: 'US'}) 
{ "_id" : ObjectId("4d9f69cbf88aea89d1492c55"), "location" : [ "US", "CA", "LA" ], "name" : "foo" } 
{ "_id" : ObjectId("4d9f69dcf88aea89d1492c56"), "location" : [ "US", "CA", "SF" ], "name" : "bar" } 
{ "_id" : ObjectId("4d9f6a21f88aea89d1492c5a"), "location" : [ "US", "MA", "BOS" ], "name" : "baz" } 

Note that in this model, your values in the array would need to be unique. So for example, if you had 'springfield' in different states, then you would need to do some extra work to differentiate.

> db.hierarchical.save({location:['US','MA','Springfield'], name: 'one' }) 
> db.hierarchical.save({location:['US','IL','Springfield'], name: 'two' }) 
> db.hierarchical.find({location: 'Springfield'}) 
{ "_id" : ObjectId("4d9f6b7cf88aea89d1492c5b"), "location" : [ "US", "MA", "Springfield"], "name" : "one" } 
{ "_id" : ObjectId("4d9f6b86f88aea89d1492c5c"), "location" : [ "US", "IL", "Springfield"], "name" : "two" } 

You can overcome this by using the $all operator and specifying more levels of the hierarchy. For example:

> db.hierarchical.find({location: { $all : ['US','MA','Springfield']} }) 
{ "_id" : ObjectId("4d9f6b7cf88aea89d1492c5b"), "location" : [ "US", "MA", "Springfield"], "name" : "one" } 
> db.hierarchical.find({location: { $all : ['US','IL','Springfield']} }) 
{ "_id" : ObjectId("4d9f6b86f88aea89d1492c5c"), "location" : [ "US", "IL", "Springfield"], "name" : "two" } 

I realize this question was asked nearly a year ago, but since then MongoDB has an officially supported solution for this problem, and I just used their solution. Refer to their documentation here: https://docs.mongodb.com/manual/tutorial/model-tree-structures-with-materialized-paths/

The concept relating closest to your question is named "partial path."

While it may feel a bit heavy to embed ancestor data; this approach is the most suitable way to solve your problem in MongoDB. The only pitfall to this, that I've experienced so far, is that if you're storing all of this in a single document you can hit the, as of this time, 16MB document size limit when working with enough data (although, I can only see this happening if you're using this structure to track user referrals [which could reach millions] rather than US cities [which is upwards of 26,000 according to the latest US Census]).


References:

http://www.mongodb.org/display/DOCS/Schema+Design

http://www.census.gov/geo/www/gazetteer/places2k.html


Modifications:

Replaced link: http://www.mongodb.org/display/DOCS/Trees+in+MongoDB