Save entire process for continuation after reboot

The best and simplest solution is to change your program so that it saves its state to a file and reuses that file to restore the process.
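As a rough illustration of that idea, here is a minimal sketch, assuming the job's state fits in a small JSON-serializable dictionary. The names `save_state`, `load_state` and `run`, the `step`/`accumulator` fields, and the checkpoint interval are all hypothetical; a real program would store whatever it needs to resume.

```python
import json
import os
import tempfile


def save_state(state, path):
    """Write the state atomically: dump to a temp file, then rename it over path."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    with os.fdopen(fd, "w") as f:
        json.dump(state, f)
        f.flush()
        os.fsync(f.fileno())  # make sure the data is on disk before renaming
    os.replace(tmp_path, path)  # an interrupted write never leaves a half-written file at path


def load_state(path, default):
    """Return the saved state, or `default` if no checkpoint file exists yet."""
    try:
        with open(path) as f:
            return json.load(f)
    except FileNotFoundError:
        return default


def run(path, total_steps, checkpoint_every=1000):
    """Toy long-running job: sums the step numbers, checkpointing as it goes."""
    state = load_state(path, {"step": 0, "accumulator": 0})
    while state["step"] < total_steps:
        state["accumulator"] += state["step"]  # stand-in for the real work
        state["step"] += 1
        if state["step"] % checkpoint_every == 0:
            save_state(state, path)
    save_state(state, path)
    return state
```

Calling `run("job.json", 1_000_000)` again after a reboot would pick up from the last checkpoint rather than from step 0, so at most `checkpoint_every` steps are repeated.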

Based on the Wikipedia page about application snapshots, there are multiple alternatives:

  1. CryoPID is one option, but it seems to be unmaintained.
  2. Linux checkpoint/restart seems to be a good choice, but your kernel needs to have CONFIG_CHECKPOINT_RESTORE enabled.
  3. CRIU is probably the most up-to-date project and probably your best shot, but it also depends on some specific kernel options which your distribution probably hasn't set.
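To find out whether your kernel was built with the option mentioned above, you can inspect its configuration. The following is a minimal sketch, assuming the config is available at `/proc/config.gz` or `/boot/config-<release>`. These are common locations, but not every distribution provides them. The helper names `option_enabled` and `read_kernel_config` are made up for this example.

```python
import gzip
import os
import platform


def option_enabled(config_text, option):
    """Return True if `option` is set to y or m in the given kernel config text."""
    for line in config_text.splitlines():
        line = line.strip()
        if line.startswith("#"):
            # Disabled options appear as "# CONFIG_FOO is not set".
            continue
        name, sep, value = line.partition("=")
        if sep and name == option:
            return value in ("y", "m")
    return False


def read_kernel_config():
    """Return the running kernel's config text, or None if it cannot be found."""
    # Assumed locations; adjust for your distribution.
    candidates = ["/proc/config.gz", "/boot/config-" + platform.release()]
    for path in candidates:
        if not os.path.exists(path):
            continue
        opener = gzip.open if path.endswith(".gz") else open
        with opener(path, "rt") as f:
            return f.read()
    return None
```

A typical check would be `text = read_kernel_config()` followed by `option_enabled(text, "CONFIG_CHECKPOINT_RESTORE")` when `text` is not `None`.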

It is already too late for this, but another, more hands-on approach is to start your process in a dedicated VM and simply suspend and restore the whole virtual machine. Depending on your hypervisor, you can also move the machine between different hosts.

For the future, think about where you run your long-running processes, how to parallelize them, and how to handle problems such as a full disk or the process getting killed.


A fairly "cheap" way to do this would be to do the processing in a VM (e.g., with VirtualBox). Before you shut down, suspend the VM and save its state. After booting, restore the VM and its state.

This does have the disadvantage of requiring you to kill and restart the job. But if it is actually going to run for several months, then a nine-day difference becomes trivial (a 5% increase over 6 months).
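With VirtualBox, the suspend-and-restore cycle can be scripted through the `VBoxManage` command-line tool. Below is a minimal sketch, assuming `VBoxManage` is on your `PATH`. The VM name `batch-job` is a placeholder, and the helper functions are invented for this example.

```python
import subprocess


def savestate_command(vm_name):
    """Command that saves the VM's state to disk and stops it."""
    return ["VBoxManage", "controlvm", vm_name, "savestate"]


def resume_command(vm_name):
    """Command that starts the VM again without a GUI window; a saved state is resumed."""
    return ["VBoxManage", "startvm", vm_name, "--type", "headless"]


def run_command(cmd):
    """Run the command and raise CalledProcessError if it fails."""
    subprocess.run(cmd, check=True)


# Before shutting down the host (placeholder VM name):
#   run_command(savestate_command("batch-job"))
# After the host has booted again:
#   run_command(resume_command("batch-job"))
```

Starting a VM that has a saved state continues from that state, so the job inside the guest carries on where it was suspended.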


Edit: I just realized that Ulrich already mentioned this in unnumbered item 4 on his list.

I would still encourage you to consider this as an option, especially since none of the alternatives seem like a robust solution. Each has a reason why it may not work.

I suppose the best thing to do would be to try one of those and, if it doesn't work, restart the job in a VM.


Take a peek at the tool CryoPID.

From the home page: "CryoPID allows you to capture the state of a running process in Linux and save it to a file. This file can then be used to resume the process later on, either after a reboot or even on another machine."

Tags:

Process

Reboot