Airflow S3KeySensor - How to make it continue running

Within Airflow, there isn't a concept that maps to an always-running DAG. You could have a DAG run very frequently, say every 1 to 5 minutes, if that suits your use case.

The main thing here is that the S3KeySensor keeps poking until it detects that the first file exists in the key's wildcard path (or it times out); only then do the downstream tasks run. But when a second, third, or fourth file lands, the sensor will already have completed for that DAG run, and it won't be scheduled to run again until the next DAG run. (The looping idea you described is roughly equivalent to what the scheduler does when it creates DAG runs, except not forever.)
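If the frequent-schedule approach is good enough, here is a minimal sketch of such a DAG. The DAG id, bucket, and key pattern are made up, and the S3KeySensor import path depends on your Airflow and Amazon provider versions:

```python
from datetime import datetime, timedelta

from airflow import DAG
from airflow.operators.python import PythonOperator
# Import path varies by version; recent Amazon provider releases expose it here:
from airflow.providers.amazon.aws.sensors.s3 import S3KeySensor


def process_new_file(**_):
    # Placeholder for whatever should happen once a matching file exists.
    print("A matching file landed in S3")


with DAG(
    dag_id="s3_polling_example",             # hypothetical DAG id
    start_date=datetime(2023, 1, 1),
    schedule_interval=timedelta(minutes=5),  # run the whole DAG every 5 minutes
    catchup=False,
    max_active_runs=1,
) as dag:
    wait_for_file = S3KeySensor(
        task_id="wait_for_file",
        bucket_name="my-bucket",        # hypothetical bucket
        bucket_key="incoming/*.csv",    # wildcard path to watch
        wildcard_match=True,
        poke_interval=30,               # check S3 every 30 seconds
        timeout=4 * 60,                 # give up before the next run starts
        mode="reschedule",              # free the worker slot between pokes
    )

    process = PythonOperator(
        task_id="process_new_file",
        python_callable=process_new_file,
    )

    wait_for_file >> process
```

Using `mode="reschedule"` keeps the sensor from holding a worker slot for the whole poll window, which matters when the DAG runs this often.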

An external trigger definitely sounds like the best approach for your use case, whether that trigger comes via the Airflow CLI's trigger_dag command ($ airflow trigger_dag ...):

https://github.com/apache/incubator-airflow/blob/972086aeba4616843005b25210ba3b2596963d57/airflow/bin/cli.py#L206-L222

Or via the REST API:

https://github.com/apache/incubator-airflow/blob/5de22d7fa0d8bc6b9267ea13579b5ac5f62c8bb5/airflow/www/api/experimental/endpoints.py#L41-L89

Both turn around and call the trigger_dag function in the common (experimental) API:

https://github.com/apache/incubator-airflow/blob/089c996fbd9ecb0014dbefedff232e8699ce6283/airflow/api/common/experimental/trigger_dag.py#L28-L67
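As a rough sketch of what those entry points do, you can call that same function yourself from code running alongside Airflow. The module path below matches the experimental API linked above (it moved under airflow.api.common in Airflow 2), and the DAG id, run id, and conf are placeholders:

```python
# Triggering a DAG programmatically via the common (experimental) API
# that the CLI and REST endpoints both delegate to.
from airflow.api.common.experimental.trigger_dag import trigger_dag

trigger_dag(
    dag_id="s3_processing_dag",               # hypothetical DAG id
    run_id="manual_trigger_for_new_file",     # optional custom run id
    conf='{"s3_key": "incoming/file_0004.csv"}',  # JSON string in this older API;
                                                  # later versions also accept a dict
)
```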

You could, for instance, set up an AWS Lambda function, invoked when a file lands on S3, that makes the trigger-DAG call.


Another way is to have the S3 event trigger an AWS Lambda function, which then invokes the DAG via the Airflow REST API:

S3 event -> AWS Lambda -> Airflow API

Set up an S3 event notification to trigger the Lambda:

https://docs.aws.amazon.com/lambda/latest/dg/with-s3.html
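For example, here is a sketch of wiring that notification up with boto3. The bucket name, Lambda ARN, and prefix are placeholders, and the Lambda must also grant s3.amazonaws.com permission to invoke it, as the linked guide describes:

```python
import boto3

s3 = boto3.client("s3")

# Invoke the (hypothetical) Lambda whenever an object is created
# under the incoming/ prefix of the bucket.
s3.put_bucket_notification_configuration(
    Bucket="my-bucket",
    NotificationConfiguration={
        "LambdaFunctionConfigurations": [
            {
                "LambdaFunctionArn": "arn:aws:lambda:us-east-1:123456789012:function:trigger-airflow-dag",
                "Events": ["s3:ObjectCreated:*"],
                "Filter": {
                    "Key": {
                        "FilterRules": [
                            {"Name": "prefix", "Value": "incoming/"},
                        ]
                    }
                },
            }
        ]
    },
)
```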

Airflow REST API:

https://airflow.apache.org/docs/apache-airflow/stable/rest-api-ref.html
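Putting the two together, a minimal Lambda handler sketch might look like this. The Airflow host, credentials, and DAG id are placeholders; it assumes the stable REST API with the basic-auth backend enabled, and uses only the Python standard library so nothing extra needs to be bundled with the function:

```python
import base64
import json
import os
import urllib.request

# Hypothetical configuration, supplied via Lambda environment variables.
AIRFLOW_BASE_URL = os.environ["AIRFLOW_BASE_URL"]  # e.g. https://airflow.example.com
AIRFLOW_USER = os.environ["AIRFLOW_USER"]
AIRFLOW_PASSWORD = os.environ["AIRFLOW_PASSWORD"]
DAG_ID = os.environ.get("DAG_ID", "s3_processing_dag")


def lambda_handler(event, context):
    # S3 puts one or more records in the event; trigger one DAG run per object.
    for record in event.get("Records", []):
        bucket = record["s3"]["bucket"]["name"]
        # Note: S3 URL-encodes object keys in event notifications; decode if needed.
        key = record["s3"]["object"]["key"]

        payload = json.dumps({"conf": {"bucket": bucket, "key": key}}).encode("utf-8")
        url = f"{AIRFLOW_BASE_URL}/api/v1/dags/{DAG_ID}/dagRuns"

        auth = base64.b64encode(
            f"{AIRFLOW_USER}:{AIRFLOW_PASSWORD}".encode("utf-8")
        ).decode("ascii")

        request = urllib.request.Request(
            url,
            data=payload,
            method="POST",
            headers={
                "Content-Type": "application/json",
                "Authorization": f"Basic {auth}",
            },
        )

        with urllib.request.urlopen(request) as response:
            print(f"Triggered {DAG_ID} for s3://{bucket}/{key}: {response.status}")

    return {"statusCode": 200}
```

The bucket and key end up in `dag_run.conf`, so the triggered DAG can process exactly the file that landed instead of rescanning the wildcard path.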