Airflow Worker Configuration

Some of the biggest pain points with Airflow come up around deployment and keeping DAG files and plugins in sync across your Airflow scheduler, Airflow webserver, and Celery worker nodes.

We've created an open source project called Astronomer Open that automates a Dockerized setup of Airflow, Celery, and PostgreSQL, with some other goodies baked in. The project was motivated by seeing so many people hit the same pain points while building very similar setups.

For example, here's the Airflow Dockerfile: https://github.com/astronomer/astronomer/blob/master/docker/airflow/1.10.2/Dockerfile

And the docs: https://open.astronomer.io/

Full disclosure: this is a project I contribute to at work. We also offer a paid enterprise edition that runs on Kubernetes (docs). That said, the Open Edition is totally free to use.


A little late on this, but it might still help someone. From the existing answers it looks as though there is no way to share DAGs other than "manual" deployment (via git/scp etc.), but there is one.

Airflow supports pickling DAGs (the -p flag on the scheduler CLI, or command: scheduler -p in your docker-compose file), which lets you deploy the DAGs only on the server/master and have them serialized and sent to the workers, so you don't have to deploy the DAG files in multiple places and you avoid out-of-sync DAGs.
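
For illustration, a scheduler service in a docker-compose file could look something like this (the service name, image, and entrypoint behaviour are assumptions about your particular setup):

    version: "3"
    services:
      scheduler:
        image: my-airflow-image   # assumption: the Airflow image you already build or pull
        # -p makes the scheduler pickle DAGs and send them to the workers,
        # instead of expecting each worker to have its own copy of the DAG files
        command: scheduler -p
        # if your image's entrypoint doesn't map "scheduler" to the Airflow CLI,
        # the direct equivalent is: command: ["airflow", "scheduler", "-p"]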

Pickling is compatible with CeleryExecutor.

Pickling has some limitations that can bite you, notably that the actual code of classes and functions is not serialized (only the fully qualified name is), so you will get an error if you try to deserialize a DAG that refers to code you don't have in the target environment. For more info on pickle you can have a look here: https://docs.python.org/3.3/library/pickle.html
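
To see that limitation outside of Airflow, here is a small standalone sketch showing that pickling a function only records where to find it, not its body (the function name is made up for illustration):

    import pickle
    import pickletools

    def extract_rows(**context):
        # A task callable defined at module level, as it would be in a DAG file.
        return [1, 2, 3]

    # The pickle payload stores only a reference (module name + qualified name),
    # not the function's bytecode or source.
    payload = pickle.dumps(extract_rows)
    pickletools.dis(payload)  # disassembly shows a GLOBAL/STACK_GLOBAL opcode, no code object

    # Loading works here because the defining module is importable in this process;
    # on a worker that cannot import it, pickle.loads raises an ImportError/AttributeError instead.
    restored = pickle.loads(payload)
    assert restored is extract_rows

In practice this means the workers still need any libraries and helper modules your DAGs import; pickling only spares you from shipping the DAG files themselves.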


Your configuration files look okay. As you suspected, all workers do indeed require a copy of the DAG folder. You can use something like git to keep them in sync and up to date.
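
For example, a cron entry like the following on each scheduler, webserver, and worker node would keep the folder current (the path and interval are placeholders for your setup):

    # placeholder path: point this at your actual dags_folder
    */5 * * * * cd /usr/local/airflow/dags && git pull --ff-only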