Versioning Database Persisted Objects, How would you?

Think carefully about the requirements for revisions. Once your code-base has pervasive history tracking built into the operational system it will get very complex. Insurance underwriting systems are particularly bad for this, with schemas often running in excess of 1000 tables. Queries also tend to be quite complex and this can lead to performance issues.

If the historical state is really only required for reporting, consider implementing a 'current state' transactional system with a data warehouse structure hanging off the back for tracking history. Slowly Changing Dimensions are a much simpler structure for tracking historical state than trying to embed an ad-hoc history tracking mechanism directly into your operational system.

Also, Changed Data Capture is simpler for a 'current state' system with changes being done to the records in place - the primary keys of the records don't change so you don't have to match records holding different versions of the same entity together. An effective CDC mechanism will make an incremental warehouse load process fairly lightweight and possible to run quite frequently. If you don't need up-to-the minute tracking of historical state (almost, but not quite, and oxymoron) this can be an effective solution with a much simpler code base than a full history tracking mechanism built directly into the application.


An alternative to strict versioning is to split the data into 2 tables: current and history.

The current table has all the live data and has the benefits of all the performance that you build in. Any changes first write the current data into the associated "history" table along with a date marker which says when it changed.


A technique I've used for this in that past has been to have a concept of "generations" in the database, each change increments the current generation number for the database - if you use subversion, think revisions. Each record has 2 generation numbers associated with it (2 extra columns on the tables) - the generation that the record starts being valid for, and the generation the it stops being valid for. If the data is currently valid, the second number would be NULL or some other generic marker.

So to insert into the database:

  1. increment the generation number
  2. insert the data
  3. tag the lifetime of that data with valid from, and a valid to of NULL

If you're updating some data:

  1. mark all data that's about to be modified as valid to the current generation number
  2. increment the generation number
  3. insert the new data with the current generation number

deleting is just a matter of marking the data as terminating at the current generation.

To get a particular version of the data, find what generation you're after and look for data valid between those generation versions.

Example:

Create a person.

|Name|D.O.B  |Telephone|From|To  |
|Fred|1 april|555-29384|1   |NULL|

Update tel no.

|Name|D.O.B  |Telephone|From|To  |
|Fred|1 april|555-29384|1   |1   |
|Fred|1 april|555-43534|2   |NULL|

Delete fred:

|Name|D.O.B  |Telephone|From|To  |
|Fred|1 april|555-29384|1   |1   |
|Fred|1 april|555-43534|2   |2   |