How to make GCE instance stop when its deployed container finishes?

Having grappled with the problem for some time, here's a full solution that works pretty well.

This solution doesn't use the "start machine with a container image" option. Instead it uses a startup script, which is more flexible. You still use a Container-Optimized OS instance.

  1. Create a startup script:
#!/usr/bin/env bash

# get image name and container parameters from the metadata
IMAGE_NAME=$(curl http://metadata.google.internal/computeMetadata/v1/instance/attributes/image_name -H "Metadata-Flavor: Google")

CONTAINER_PARAM=$(curl http://metadata.google.internal/computeMetadata/v1/instance/attributes/container_param -H "Metadata-Flavor: Google")

# This is needed if you are using a private images in GCP Container Registry
# (possibly also for the gcp log driver?)
sudo HOME=/home/root /usr/bin/docker-credential-gcr configure-docker

# Run! The logs will go to stack driver 
sudo HOME=/home/root  docker run --log-driver=gcplogs ${IMAGE_NAME} ${CONTAINER_PARAM}

# Get the zone
zoneMetadata=$(curl "http://metadata.google.internal/computeMetadata/v1/instance/zone" -H "Metadata-Flavor:Google")
# Split on / and get the 4th element to get the actual zone name
IFS=$'/'
zoneMetadataSplit=($zoneMetadata)
ZONE="${zoneMetadataSplit[3]}"

# Run compute delete on the current instance. Need to run in a container 
# because COS machines don't come with gcloud installed 
docker run --entrypoint "gcloud" google/cloud-sdk:alpine compute instances delete ${HOSTNAME}  --delete-disks=all --zone=${ZONE}
  1. Put the script somewhere public. For example put it on Cloud Storage and create a public URL. You can't use a gs:// URI for a COS startup script.

  2. Start an instance using a startup-script-url, and passing the image name and parameters, e.g.:

gcloud compute --project=PROJECT_NAME instances create INSTANCE_NAME  \
--zone=ZONE --machine-type=TYPE \
--metadata=image_name=IMAGE_NAME,\
container_param="PARAM1 PARAM2 PARAM3",\
startup-script-url=PUBLIC_SCRIPT_URL \
--maintenance-policy=MIGRATE --service-account=SERVICE_ACCUNT \
--scopes=https://www.googleapis.com/auth/cloud-platform --image-family=cos-stable \
--image-project=cos-cloud --boot-disk-size=10GB --boot-disk-device-name=DISK_NAME

(You probably want to limit the scopes, the example uses full access for simplicity)


When you create the VM, you'll need to give it write access to compute so you can delete the instance from within. You should also set container environment variables like gce_zone and gce_project_id at this time. You'll need them to delete the instance.

gcloud beta compute instances create-with-container {NAME} \
    --container-env=gce_zone={ZONE},gce_project_id={PROJECT_ID} \
    --service-account={SERVICE_ACCOUNT} \
    --scopes=https://www.googleapis.com/auth/compute,...
    ...

Then within the container, whenever YOU determine your task is finished:

  1. request an api token (im using curl for simplicity and DEFAULT gce service account)
curl "http://metadata.google.internal/computeMetadata/v1/instance/service-accounts/default/token" -H "Metadata-Flavor: Google"

This will respond with json that looks like

{
  "access_token": "foobarbaz...",
  "expires_in": 1234,
  "token_type": "Bearer"
}
  1. Take that access token and hit the instances.delete api endpoint (notice the environment variables)
curl -XDELETE -H 'Authorization: Bearer {TOKEN}' https://www.googleapis.com/compute/v1/projects/$gce_project_id/zones/$gce_zone/instances/$HOSTNAME