Tricks to make an AWS spot instance "persistent"?

Update: times have changed

EC2 Spot Instance requests can now be configured to stop, rather than terminate, a spot instance when it is outbid or interrupted by any other capacity-related event.

See Interruption Behavior in the EC2 Developer Guide. Certain classes of instances can also hibernate, with the appropriate agent installed.

Note that this new feature does not guarantee that instances will continue to run, but only that they will restart with their previous EBS volumes, private IP, Elastic IP, and instance ID all intact.
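With boto3, this behavior is selected through the `InstanceMarketOptions` parameter of the EC2 client's `run_instances` call. The minimal sketch below only builds the keyword arguments so that it runs without credentials. The AMI ID and instance type are placeholders, not values from this answer. The AMI is assumed to be EBS-backed, since an instance-store instance cannot be stopped.

```python
def spot_stop_params(image_id: str, instance_type: str) -> dict:
    """Build run_instances() kwargs for a spot instance that stops, rather than terminates, on interruption."""
    return {
        "ImageId": image_id,  # placeholder AMI; assumed EBS-backed so it can be stopped
        "InstanceType": instance_type,
        "MinCount": 1,
        "MaxCount": 1,
        "InstanceMarketOptions": {
            "MarketType": "spot",
            "SpotOptions": {
                # With run_instances, "stop" (or "hibernate") goes together with a persistent request
                "SpotInstanceType": "persistent",
                "InstanceInterruptionBehavior": "stop",
            },
        },
    }


if __name__ == "__main__":
    params = spot_stop_params("ami-12345678", "m5.large")  # placeholder values
    print(params["InstanceMarketOptions"]["SpotOptions"])
    # To actually launch (requires boto3 and AWS credentials):
    #   import boto3
    #   boto3.client("ec2").run_instances(**params)
```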

Previous answer follows:


Spot instances cannot be persistent, but spot requests can.

Persistent Spot Requests: When you specify a Spot bid request as "persistent", you ensure that it is automatically resubmitted after its instance is terminated—by you or by Amazon EC2—until you cancel the bid request. This enables you to automate launching Spot instances any time the Spot price is below your maximum price.

http://aws.amazon.com/ec2/spot-instances/#4
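For reference, a persistent request corresponds to `Type="persistent"` in boto3's `request_spot_instances` call. This is a sketch that only builds the arguments. The AMI ID, instance type and maximum price are placeholders. The request keeps being resubmitted until it is cancelled (for example with `cancel_spot_instance_requests`), and cancelling the request does not by itself terminate an instance that is already running.

```python
def persistent_spot_request(image_id: str, instance_type: str, max_price: str) -> dict:
    """Build request_spot_instances() kwargs for a persistent spot request."""
    return {
        "SpotPrice": max_price,  # maximum hourly price in USD, passed as a string
        "InstanceCount": 1,
        "Type": "persistent",  # resubmitted after each interruption until cancelled
        "LaunchSpecification": {
            "ImageId": image_id,  # placeholder AMI
            "InstanceType": instance_type,
        },
    }


if __name__ == "__main__":
    req = persistent_spot_request("ami-12345678", "m1.small", "0.05")  # placeholder values
    print(req["Type"], req["SpotPrice"])
    # With boto3 and credentials:
    #   boto3.client("ec2").request_spot_instances(**req)
```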

A persistent request keeps the machines running any time the price is within range. As for the rest of it, consider what your spot instances are doing that makes you think persisting their disks is the way to go. Think "cloud." Think "ephemeral." Spot instances are intended to be ephemeral machines that start up, fetch work, do work, and commit work; if they go away, the work is still out there, waiting for the next instance to fetch it again, complete it, and commit it. You "can" use them with EBS and keep the volume, but if you do, those instances cannot be restarted (as you have noticed).

If your AMI uses the instance store, and your instances keep everything that needs to persist somewhere external (in S3, for example), then you don't need to hack around the AWS architecture. You can sit back and watch your machines fire up when the price is right, do their work, and shut down again when the price goes out of range. And there's no bit rot, because every boot is a shiny clean system.
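The fetch/do/commit pattern described above relies on the queue, not the worker, owning the work. A managed queue such as Amazon SQS provides this through its visibility timeout. The toy class below is an in-memory illustration of that idea, not an AWS API. A fetched job is only leased. If the worker never commits it (say, its spot instance was reclaimed), the lease expires and the next worker picks the job up again.

```python
import time
from collections import deque


class LeaseQueue:
    """Toy work queue: a fetched job is leased, not removed, until it is committed."""

    def __init__(self, lease_seconds: float):
        self.lease_seconds = lease_seconds
        self.pending = deque()
        self.leased = {}  # job -> lease expiry time

    def put(self, job):
        self.pending.append(job)

    def fetch(self, now=None):
        """Lease the next job, or return None if nothing is available."""
        now = time.monotonic() if now is None else now
        # Jobs whose worker vanished without committing go back into the queue
        for job, expiry in list(self.leased.items()):
            if expiry <= now:
                del self.leased[job]
                self.pending.append(job)
        if not self.pending:
            return None
        job = self.pending.popleft()
        self.leased[job] = now + self.lease_seconds
        return job

    def commit(self, job):
        """Mark a job as finished; only now is it really gone."""
        self.leased.pop(job, None)


if __name__ == "__main__":
    q = LeaseQueue(lease_seconds=60)
    q.put("render-frame-1")
    job = q.fetch(now=0)  # worker A leases the job, then its instance disappears
    print(q.fetch(now=30))  # None: the lease is still held
    job = q.fetch(now=61)  # lease expired; worker B gets the same job
    q.commit(job)  # worker B finishes and commits
    print(q.fetch(now=200))  # None: nothing left to do
```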

Or, your instance(s) could mount NFS shares exported by a machine that's always on.

Or this: https://serverfault.com/questions/448043/auto-attach-ebs-volume-to-a-new-spot-instance
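The linked approach has each new spot instance find and attach a pre-existing EBS volume at boot. A sketch of the selection step, assuming the volume carries a tag you chose yourself (the `role` key and `worker-data` value here are hypothetical): it scans a `describe_volumes` response for an unattached volume in the instance's own Availability Zone (an EBS volume can only be attached to an instance in the same zone), then builds the arguments for `attach_volume`. The `/dev/sdf` device name is also an assumption.

```python
from typing import Optional


def pick_volume(describe_volumes_response: dict, availability_zone: str,
                tag_key: str, tag_value: str) -> Optional[str]:
    """Return the VolumeId of an unattached, correctly tagged volume in this AZ, or None."""
    for vol in describe_volumes_response.get("Volumes", []):
        tags = {t["Key"]: t["Value"] for t in vol.get("Tags", [])}
        if (vol["State"] == "available"  # not attached to another instance
                and vol["AvailabilityZone"] == availability_zone
                and tags.get(tag_key) == tag_value):
            return vol["VolumeId"]
    return None


def attach_params(volume_id: str, instance_id: str, device: str = "/dev/sdf") -> dict:
    """Build attach_volume() kwargs; the default device name is an assumption."""
    return {"VolumeId": volume_id, "InstanceId": instance_id, "Device": device}


if __name__ == "__main__":
    # Shaped like an EC2 describe_volumes response; the values are made up
    resp = {"Volumes": [
        {"VolumeId": "vol-aaa", "State": "available", "AvailabilityZone": "us-east-1a",
         "Tags": [{"Key": "role", "Value": "worker-data"}]},
    ]}
    vol = pick_volume(resp, "us-east-1a", "role", "worker-data")
    if vol is not None:
        print(attach_params(vol, "i-0123456789abcdef0"))
    # With boto3: ec2.attach_volume(**attach_params(vol, my_instance_id))
```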


(Thanks to Ethan Barron for some of the original ideas. This is a version with some corrections and clarifications.)

[1]. Create a new spot instance. Deactivate "Delete on Termination" for the root device. Make a note of the architecture (x86_64) and the kernel ID.

[2]. SSH into your new instance and create a test file, something that should still be there on the replacement instance. Do NOT terminate the instance yet.

[3]. Create a snapshot of the instance's root volume while the instance is still running (on rare occasions this can leave the file system inconsistent, so limit writes to the boot volume). Note the ID of that snapshot.

[4]. Now exit the SSH connection and terminate the instance.

[5]. Create an AMI from the snapshot created in step 3. AWS does not support creating AMIs directly from volumes, only from snapshots; if you are starting from a leftover volume, first create a snapshot of it.

[6]. Request a new spot instance based on the architecture from step 1, the kernel ID from step 1, and the AMI created in step 5.

This should work.
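Steps 3 and 5 map to the boto3 EC2 calls `create_snapshot(VolumeId=...)` and `register_image(...)`. A minimal sketch of the `register_image` arguments for step 5, assuming a paravirtual AMI (where a kernel ID applies) whose root device is `/dev/sda1`. The image name, snapshot ID and kernel ID are placeholders for the values you noted in steps 1 and 3.

```python
def register_image_params(name: str, snapshot_id: str, kernel_id: str,
                          architecture: str = "x86_64",
                          root_device: str = "/dev/sda1") -> dict:
    """Build register_image() kwargs for an EBS-backed AMI made from a root-volume snapshot."""
    return {
        "Name": name,
        "Architecture": architecture,  # must match the original instance (step 1)
        "KernelId": kernel_id,  # kernel ID noted in step 1
        "VirtualizationType": "paravirtual",  # assumption: kernel IDs apply to PV AMIs
        "RootDeviceName": root_device,  # assumed root device name
        "BlockDeviceMappings": [
            {
                "DeviceName": root_device,
                # Root volume is recreated from the snapshot taken in step 3
                "Ebs": {"SnapshotId": snapshot_id, "DeleteOnTermination": False},
            }
        ],
    }


if __name__ == "__main__":
    params = register_image_params("my-persistent-ami", "snap-12345678", "aki-12345678")
    print(params["BlockDeviceMappings"])
    # With boto3: boto3.client("ec2").register_image(**params)
```

The resulting AMI ID, together with the same architecture and kernel ID, is what goes into the new spot request in step 6.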


We ended up finding a solution, and here is what we had to do. I'm going to list it out step by step, to make it easier to recreate for anyone looking for a similar solution:

  1. Create a new spot instance request. Make sure to uncheck "Delete on Termination" for the root device, so that the volume stays behind in the next step. Make sure to note the architecture (we always use x86_64) and the kernel ID that your instance is using (very important!).
  2. Now, SSH into your new instance and create a file or two, so you can see the effect of persistence first-hand. After making some changes to the filesystem, log out of the SSH connection and terminate the instance.
  3. Awesome. Now, go to your EC2 web console and find the volume that was being used by the instance we just terminated. Right-click the volume and select "Create Image". Follow the wizard, making certain to select the same architecture and kernel ID that we noted earlier.
  4. Now, start the spot request wizard using your new image. Follow the wizard, again making certain to uncheck "Delete on Termination". Additionally, and this is the easy step to miss, make sure to expand the collapsed section titled 'Advanced Options' and set the correct kernel ID again.

If you follow the above steps to a T, you will have a new instance at the same point your old instance was at when it was terminated. In other words, we have achieved a form of persistence.
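The two settings that are easy to miss in step 4, the kernel ID and "Delete on Termination", both live in the `LaunchSpecification` of boto3's `request_spot_instances`. A sketch under the assumption that the root device is `/dev/sda1`. The AMI ID, kernel ID, instance type and price are placeholders for your own values.

```python
def spot_request_from_image(image_id: str, kernel_id: str, instance_type: str,
                            max_price: str, root_device: str = "/dev/sda1") -> dict:
    """Build request_spot_instances() kwargs that keep the kernel and the root volume."""
    return {
        "SpotPrice": max_price,  # maximum hourly price in USD, as a string
        "InstanceCount": 1,
        "LaunchSpecification": {
            "ImageId": image_id,  # the image created in step 3
            "InstanceType": instance_type,
            "KernelId": kernel_id,  # same kernel as the original instance ("Advanced Options")
            "BlockDeviceMappings": [
                # Override only DeleteOnTermination so the root volume outlives the instance
                {"DeviceName": root_device, "Ebs": {"DeleteOnTermination": False}},
            ],
        },
    }


if __name__ == "__main__":
    req = spot_request_from_image("ami-12345678", "aki-12345678", "m1.small", "0.05")
    print(req["LaunchSpecification"]["KernelId"])
    # With boto3: boto3.client("ec2").request_spot_instances(**req)
```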