How to speed up deployments on AWS Fargate?

Through further googling I found this Reddit thread. An AWS employee wrote:

With regard to time to provision and start a container it is definitely longer when using Fargate. We may reduce the length of the provisioning state in the future, but Fargate is doing much more under the hood than ECS on your own self managed hosts. When you self manage hosts they are already up and running, and may even already have your docker image downloaded and cached locally, so ECS is able to launch the container very quickly. That's not the case with Fargate.

So shrinking the image should help a little. But in general I guess I'll have to live with it and hope for optimizations on AWS' side.


Here's the breakdown of tasks and possible improvements that I've found while researching options to improve my deployment times with ECS Fargate:

Fargate Deployment Overview

Here's a breakdown of what goes on behind the scenes and contributes to the deployment duration:

  • Provision the Fargate worker instance
  • Provision/attach the ENI
  • Download the Docker image
    • Here you have opportunities for improvement:
      • reduce the size of your Docker image
      • Network throughput scales with the CPU allocated to the Fargate task, so if you allocate more CPU you get more bandwidth and the image downloads faster
  • Application Startup time
    • Becomes a factor if your application requires a health check grace period; again affected by CPU allocation

If your task is associated with a load balancer the deployment will also need to pass health checks, and you'll need to account for:

  • Load balancer deregistration delay
  • Passing health checks: at least (health check interval * healthy threshold) seconds
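
To put rough numbers on the load balancer part, here is a minimal sketch of the arithmetic. It assumes the health checks and the deregistration delay happen one after the other, which is only an approximation of a rolling deployment. The function names are hypothetical, and the 30 second deregistration delay in the second call is just an example value.

```python
def min_health_check_seconds(interval_s: int, healthy_threshold: int) -> int:
    """Shortest time for a new target to be marked healthy."""
    return interval_s * healthy_threshold


def lb_overhead_seconds(interval_s: int, healthy_threshold: int,
                        deregistration_delay_s: int) -> int:
    """Rough upper bound: health checks for the new task, then draining the old one."""
    return min_health_check_seconds(interval_s, healthy_threshold) + deregistration_delay_s


# ALB defaults: 30 s interval, 5 healthy checks, 300 s deregistration delay
print(lb_overhead_seconds(30, 5, 300))  # 450

# Tuned: 5 s interval, 2 healthy checks, and an example 30 s deregistration delay
print(lb_overhead_seconds(5, 2, 30))  # 40
```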

How to deploy Fargate Task updates faster

  • Over-allocate the CPU
  • Reduce the deregistration delay
  • Set the health check threshold to 2 and interval to 5 seconds
    • don't forget to account for a health check grace period if your app needs it
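
If you manage the target group with boto3, the load balancer settings above could be applied roughly like this. This is a sketch, not a drop-in script: the function names, the 30 second deregistration delay, and the target group ARN are placeholders of mine, and I'm assuming the health check timeout has to be shorter than the interval, so it is lowered along with it.

```python
def fast_health_check_settings(target_group_arn, interval_s=5,
                               healthy_threshold=2, deregistration_delay_s=30):
    """Build the arguments for the two elbv2 calls (no AWS access needed)."""
    health_check = {
        "TargetGroupArn": target_group_arn,
        "HealthCheckIntervalSeconds": interval_s,
        # Assumption: the timeout must be shorter than the interval
        "HealthCheckTimeoutSeconds": max(2, interval_s - 1),
        "HealthyThresholdCount": healthy_threshold,
    }
    attributes = {
        "TargetGroupArn": target_group_arn,
        "Attributes": [
            # Attribute values are passed as strings
            {"Key": "deregistration_delay.timeout_seconds",
             "Value": str(deregistration_delay_s)},
        ],
    }
    return health_check, attributes


def apply_settings(elbv2_client, target_group_arn):
    """Apply the settings using a boto3 'elbv2' client passed in by the caller."""
    health_check, attributes = fast_health_check_settings(target_group_arn)
    elbv2_client.modify_target_group(**health_check)
    elbv2_client.modify_target_group_attributes(**attributes)


# Usage (needs boto3 and AWS credentials; the ARN is a placeholder):
#   import boto3
#   apply_settings(boto3.client("elbv2"), "<your-target-group-arn>")
```

The health check grace period is a separate setting on the ECS service itself, so it isn't covered here.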

My Results

During my testing, a deployment of my application that typically takes about 8 minutes with 1024 CPU units (1 vCPU) finished in under 4 minutes with 4096 CPU units (4 vCPU).

Disclaimer

Your tasks probably need considerably less CPU in normal operation, and you don't want to pay for over-allocated CPU all the time. So run the deployment with over-allocated resources, then run a second deployment right after it with the original CPU allocation.

Probably not a solution you want to use for every deployment, but could be a solution for hotfix deployments.
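
For what it's worth, the two-step deployment could be scripted along these lines with boto3. Treat it as a sketch under assumptions: the helper names are mine, the list of read-only fields is what I understand describe_task_definition returns beyond what register_task_definition accepts, and the boosted memory value is only an example that has to be a valid Fargate pairing for the chosen CPU. The second step registers a new revision with the original sizes rather than reusing the old one.

```python
import copy

# Fields returned by describe_task_definition that (as far as I know)
# register_task_definition does not accept
READ_ONLY_FIELDS = ("taskDefinitionArn", "revision", "status",
                    "requiresAttributes", "compatibilities",
                    "registeredAt", "registeredBy", "deregisteredAt")


def with_cpu(task_definition, cpu, memory):
    """Copy a task definition dict with a different task-level cpu/memory."""
    new_def = {key: copy.deepcopy(value)
               for key, value in task_definition.items()
               if key not in READ_ONLY_FIELDS}
    new_def["cpu"] = cpu
    new_def["memory"] = memory
    return new_def


def deploy(ecs, cluster, service, task_definition):
    """Register a revision, point the service at it, wait until it is stable."""
    response = ecs.register_task_definition(**task_definition)
    arn = response["taskDefinition"]["taskDefinitionArn"]
    ecs.update_service(cluster=cluster, service=service, taskDefinition=arn)
    ecs.get_waiter("services_stable").wait(cluster=cluster, services=[service])
    return arn


def hotfix_deploy(ecs, cluster, service, task_definition,
                  boosted_cpu="4096", boosted_memory="8192"):
    """Deploy with extra CPU first, then again with the original allocation."""
    original = with_cpu(task_definition,
                        task_definition["cpu"], task_definition["memory"])
    boosted = with_cpu(task_definition, boosted_cpu, boosted_memory)
    deploy(ecs, cluster, service, boosted)
    deploy(ecs, cluster, service, original)


# Usage (needs boto3 and AWS credentials; names are placeholders):
#   import boto3
#   ecs = boto3.client("ecs")
#   td = ecs.describe_task_definition(taskDefinition="my-app")["taskDefinition"]
#   hotfix_deploy(ecs, "my-cluster", "my-service", td)
```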

Additional Reading

Highly recommend reading Scaling containers on AWS in 2022


Two reasons they're slower, in my experience:

  1. awsvpc network mode attaches an ENI to the task. The same step is known to dramatically increase the initial spin-up time of a Lambda function that runs in a VPC.

  2. Docker image size also affects startup time, since the image usually has to be downloaded to whichever hidden host launches the task. I've done some benchmarking with a small 200MB container and a 2.5GB container. The former did start up quicker.

You can't do much about awsvpc, since Fargate requires it. Shrinking down that image would be your next biggest impact.