How can graph databases scale horizontally, if at all?

Neo4j supports sharding and is trying to tackle with sharding problems. Please take a look at http://jim.webber.name/2011/02/16/3b8f4b3d-c884-4fba-ae6b-7b75a191fa22.aspx


GoldenOrb was a concept that aimed to create a horizontally scalable Graph Database. It was released as open source, but the project appears to be dead now (link to GitHub is offline). It was based on Hadoop.

Even though, such model cannot be considered yet a fully capable graph database, given the amount of information that needs to be shared among nodes is too large and complex for certain use cases of graph databases. Evolution of computing, tiered caching layered architectures, could allow it to be considered fully scalable and a de-fact Graph database.

So overall, the answer to this date is "no", not fully.

The original web site hosting the project is this: http://goldenorbos.org


Replication can be useful for any kind of database - it's just creating multiple copies of data so you can serve more queries than a single server can handle.

Sharding is a little more complex, but actually not too different from key/value or document stores, because internally edges have to be represented as simple lists.

While it is true that finding independent subgraphs is impossible in most cases, it isn't actually necessary. As long as the node processing the query is able to get data from other nodes, having the data available locally is just a performance optimisation.

Once you have that set up, you have a lot of options for optimising performance based on the type of graph you are working with - for example in a social graph you might use location to select the node for a user because you know that most connections are local.

I'm not aware of any existing graph databases that have sharding built in, probably because the problem is much harder to solve for the general case and the small size of edge data means you need a really large graph to exceed the capacity of a single server.