Delta Lake rollback

Here is a brute-force solution. It is not ideal, but since overwriting a large partitioned data set can be expensive, this simple approach may be helpful.

If you do not need to keep the updates made after the desired rollback time, simply remove all version files in _delta_log that are later than the rollback time. Data files that are no longer referenced can be cleaned up later using vacuum.
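A minimal sketch of the log pruning described above, assuming the table sits on a local or mounted filesystem and that you have already looked up the last version to keep (for example from the table history). The name prune_log_after and its parameters are hypothetical and not part of Delta Lake. It relies on the log layout in which commit, checkpoint and checksum files start with a 20-digit zero-padded version, and it also drops _last_checkpoint when that pointer refers to a pruned version, on the assumption that readers then fall back to listing the log. It defaults to a dry run so you can inspect the plan first.

```python
import json
import re
from pathlib import Path

# Commit (.json), checkpoint (.checkpoint.parquet) and checksum (.crc) files
# all begin with a 20-digit zero-padded version number.
_VERSIONED = re.compile(r"^(\d{20})\.")


def prune_log_after(log_dir, keep_version, dry_run=True):
    """Return the names of _delta_log entries newer than keep_version.

    With dry_run=False the entries are also deleted. Hypothetical helper,
    not a Delta Lake API.
    """
    log_dir = Path(log_dir)
    doomed = []
    for entry in sorted(log_dir.iterdir()):
        match = _VERSIONED.match(entry.name)
        if entry.is_file() and match and int(match.group(1)) > keep_version:
            doomed.append(entry)

    # _last_checkpoint is a small JSON pointer to the newest checkpoint.
    # Assumption: removing it is safe because readers then list the log.
    pointer = log_dir / "_last_checkpoint"
    if pointer.exists():
        if json.loads(pointer.read_text()).get("version", -1) > keep_version:
            doomed.append(pointer)

    if not dry_run:
        for entry in doomed:
            entry.unlink()
    return [entry.name for entry in doomed]
```

Calling prune_log_after("/path/to/delta-table/_delta_log", 5) only reports what would be removed; pass dry_run=False to delete. Make sure no writers are active while the log is being edited, and back up _delta_log first.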

Another solution, which preserves the full history, is to 1) call deltaTable.delete to remove all rows, and 2) copy all log files up to the rollback point sequentially (with increasing version numbers) so that they follow the log file written by the delete. This mimics re-creating the Delta table up to the rollback date. But it is surely not pretty.
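The copy step could be sketched roughly as follows. This is only an illustration of the file renumbering, not something verified against a real table: it assumes a local or mounted filesystem, that the delete has already been committed and is the latest version, that the JSON commit files for versions 0 through the rollback version still exist (log cleanup may have removed old ones), and that no other writer is active. The name replay_commits is hypothetical.

```python
import re
import shutil
from pathlib import Path

# Plain commit files only: 20-digit zero-padded version plus ".json".
_COMMIT = re.compile(r"^(\d{20})\.json$")


def replay_commits(log_dir, rollback_version, dry_run=True):
    """Plan (and optionally perform) copying commits 0..rollback_version
    to fresh version numbers after the latest commit.

    Returns a list of (source_name, destination_name) pairs.
    Hypothetical helper, not a Delta Lake API.
    """
    if rollback_version < 0:
        raise ValueError("rollback_version must be >= 0")
    log_dir = Path(log_dir)

    versions = set()
    for entry in log_dir.iterdir():
        match = _COMMIT.match(entry.name)
        if match:
            versions.add(int(match.group(1)))

    # Every commit up to the rollback point must still be present.
    missing = [v for v in range(rollback_version + 1) if v not in versions]
    if missing:
        raise FileNotFoundError(f"commit files missing for versions {missing}")

    latest = max(versions)  # assumed to be the delete commit
    plan = [
        (f"{v:020d}.json", f"{latest + 1 + v:020d}.json")
        for v in range(rollback_version + 1)
    ]
    if not dry_run:
        for src, dst in plan:
            shutil.copyfile(log_dir / src, log_dir / dst)
    return plan
```

As with the pruning approach, run it as a dry run first and keep a backup of _delta_log.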


As of Delta Lake 1.2.0, you can roll back to an earlier version of your Delta table using the RESTORE command. This is a much simpler way to use time travel to roll back your tables.

Scala:

import io.delta.tables._

val deltaTable = DeltaTable.forPath(spark, "/path/to/delta-table")

deltaTable.restoreToVersion(0)

Python:

from delta.tables import *

deltaTable = DeltaTable.forPath(spark, "/path/to/delta-table")

deltaTable.restoreToVersion(0)

SQL:

RESTORE TABLE delta.`/path/to/delta-table` TO VERSION AS OF 0

You can also use restoreToTimestamp if you would rather restore to a point in time than to a version number. Read the documentation for more details.
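For the timestamp variant, a small Python sketch might look like this. restoreToTimestamp is the DeltaTable method mentioned above and takes a timestamp string; the wrapper restore_to_datetime is a hypothetical convenience for passing a datetime, and the table path is the same placeholder used in the snippets above.

```python
from datetime import datetime


def restore_to_datetime(delta_table, when):
    """Restore delta_table to its state as of the datetime `when`.

    Hypothetical wrapper: formats the datetime as 'yyyy-MM-dd HH:mm:ss'
    and passes it to DeltaTable.restoreToTimestamp.
    """
    timestamp = when.strftime("%Y-%m-%d %H:%M:%S")
    return delta_table.restoreToTimestamp(timestamp)


# Usage, assuming an active SparkSession named `spark` with Delta configured:
#
#   from delta.tables import DeltaTable
#   deltaTable = DeltaTable.forPath(spark, "/path/to/delta-table")
#   restore_to_datetime(deltaTable, datetime(2022, 1, 1, 0, 0, 0))
```

The timestamp has to fall within the history that the table still retains, otherwise the restore fails.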