Change management for graph databases?

Objectivity/DB is an object-oriented/graph database that has a feature called "Schema Evolution". This feature allows you to create your schema, load data, change the schema, and load more data. You can change the schema as many times as you'd like. We've had customers that have deployed operational systems and changed their schema hundreds of times without having to reload data.

The Schema Evolution feature uses the concept of schema "shapes": each shape is stored in the schema catalog, and each object has a shape id. When an object is read from disk, the shape id is used to look up the schema shape in the catalog. Then, if that shape is not the "latest" shape for the object's type, the actual object data is "evolved" on the fly to match the newest shape for that type. This means operational systems do not have to reload petabyte-scale databases just because someone wants an extra attribute.
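To make the read-time evolution idea concrete, here is a minimal in-memory sketch in Java. It is not Objectivity/DB's API or implementation: the class `ShapeCatalogSketch`, its `StoredObject` type, and the rule "copy surviving attributes, fill defaults for new ones, drop removed ones" are all assumptions made purely for illustration.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Toy model of shape-based schema evolution; every name here is hypothetical. */
final class ShapeCatalogSketch {

    /** A stored object remembers the id of the shape it was written with. */
    static final class StoredObject {
        final int shapeId;
        final Map<String, Object> data;

        StoredObject(int shapeId, Map<String, Object> data) {
            this.shapeId = shapeId;
            this.data = data;
        }
    }

    // The list index doubles as the shape id; each shape maps attribute name -> default value.
    private final List<Map<String, Object>> shapes = new ArrayList<>();

    /** Registers a new shape, which becomes the latest one, and returns its id. */
    int addShape(Map<String, Object> attributeDefaults) {
        shapes.add(new LinkedHashMap<>(attributeDefaults));
        return shapes.size() - 1;
    }

    /** Returns the object unchanged if it is current, otherwise a copy evolved to the latest shape. */
    StoredObject read(StoredObject stored) {
        int latestId = shapes.size() - 1;
        if (stored.shapeId == latestId) {
            return stored;
        }
        Map<String, Object> evolved = new LinkedHashMap<>();
        for (Map.Entry<String, Object> attr : shapes.get(latestId).entrySet()) {
            // Keep values for surviving attributes and fill defaults for new ones;
            // attributes absent from the latest shape are dropped.
            evolved.put(attr.getKey(), stored.data.getOrDefault(attr.getKey(), attr.getValue()));
        }
        return new StoredObject(latestId, evolved);
    }

    public static void main(String[] args) {
        ShapeCatalogSketch catalog = new ShapeCatalogSketch();
        Map<String, Object> v0 = new LinkedHashMap<>();
        v0.put("name", "");
        int first = catalog.addShape(v0);

        // Written while the first shape was current.
        Map<String, Object> data = new LinkedHashMap<>();
        data.put("name", "Ada");
        StoredObject old = new StoredObject(first, data);

        // The schema changes later: an "email" attribute is added.
        Map<String, Object> v1 = new LinkedHashMap<>(v0);
        v1.put("email", "unknown");
        catalog.addShape(v1);

        // No reload: the old object is evolved when it is read.
        System.out.println(catalog.read(old).data);
    }
}
```

The point of the sketch is that the stored bytes never have to be rewritten in bulk: old objects keep their original shape id, and the cost of a schema change is paid per object, at read time.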

Many types of schema changes are allowed, such as adding, removing, and re-typing attributes, but a few are not allowed because they would be functionally destructive to the data and/or schema.

Disclaimer: I am employed by Objectivity, Inc.


Pramod Sadalage and Martin Fowler's influential 2003 article on Evolutionary Database Design had a big impact on how I approach managing schema changes in a database. I went on to use DbDeploy and DbDeploy.net in the Java and .NET ecosystems, and now use ActiveRecord migrations. If you find Liquibase interesting, I recommend taking a look at these tools.

The Neo4j.rb documentation discusses these kinds of migrations against Neo4j.

I personally haven't used a tool to manage migrations in Neo4j, but I've written migration scripts that have done things like rename properties, change edge labels, or create indexes. As an example use case, here's a snippet from a Gremlin Groovy script I used to remap some foreign keys stored in a Neo4j graph and update an index:

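try {
  projects.each { node ->
    // Look up the replacement key for this project node
    def old_id = node.ref_id
    def new_id = old_to_new_ids[old_id]
    // Drop the stale index entry before changing the property
    index.remove('project', old_id, node)
    node.ref_id = new_id
    // Re-index the node under its new key
    index.put('project', new_id, node)
  }
} catch (Throwable e) {
  println(e)
} finally {
  g.shutdown()
}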

As of Neo4j version 1.8, there is a PropertyContainer that can be used for graph metadata. It would be simple to use this container to update a 'schema_version' property. The code would look something like:

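EmbeddedGraphDatabase db = new EmbeddedGraphDatabase(dbFilename);
Transaction tx = db.beginTx();
try {
    // Graph-level properties are not attached to any single node or relationship
    PropertyContainer properties = db.getNodeManager().getGraphProperties();
    properties.setProperty("schema_version", 3);
    tx.success();
} finally {
    // finish() commits only if success() was called; otherwise it rolls back
    tx.finish();
}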
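Once a 'schema_version' property exists, a migration runner can use it to decide which scripts still need to be applied. The following is a rough sketch of that idea, not a Neo4j API: a plain `Map` stands in for the graph properties container so the example runs on its own, and `SchemaVersionRunner`, `migrate`, and the numbered `Runnable` steps are hypothetical names.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

/** Sketch of a version-gated migration runner; the Map stands in for graph properties. */
final class SchemaVersionRunner {
    static final String VERSION_KEY = "schema_version";

    /**
     * Runs, in ascending order, every migration whose number is greater than
     * the stored schema version, recording the new version after each step.
     * Returns the version the store ends up at.
     */
    static int migrate(Map<String, Object> graphProperties, TreeMap<Integer, Runnable> migrations) {
        // A store with no version yet is treated as version 0 (an assumption of this sketch).
        int current = (Integer) graphProperties.getOrDefault(VERSION_KEY, 0);
        for (Map.Entry<Integer, Runnable> step : migrations.tailMap(current, false).entrySet()) {
            step.getValue().run();
            // Against a real database, the step and this update would share one transaction.
            graphProperties.put(VERSION_KEY, step.getKey());
            current = step.getKey();
        }
        return current;
    }

    public static void main(String[] args) {
        Map<String, Object> graphProperties = new HashMap<>();
        graphProperties.put(VERSION_KEY, 1); // migration 1 was applied earlier

        TreeMap<Integer, Runnable> migrations = new TreeMap<>();
        migrations.put(1, () -> System.out.println("1: create project index"));
        migrations.put(2, () -> System.out.println("2: rename a property"));
        migrations.put(3, () -> System.out.println("3: remap ref_id values"));

        // Only steps 2 and 3 run; a second call would be a no-op.
        System.out.println("now at version " + migrate(graphProperties, migrations));
    }
}
```

Recording the version after every step, rather than once at the end, means a failure partway through leaves the store at the last migration that completed, so a rerun picks up from there.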

Personally, I would be more interested in something based on the TinkerPop APIs. I believe they are supported by multiple databases, which is what they were designed for. I'd prefer to be able to define my vertex labels, edge labels, properties, indexes, etc. directly, rather than trying to align with a (great) technology that is designed for relational databases.


Liquigraph exists now. Although it is still quite new, the author is very receptive to feedback and is actively working on the project.