What is the difference between min_file_process_interval and dag_dir_list_interval in Apache Airflow 1.9.0?

min_file_process_interval:

In cases where there are only a small number of DAG definition files, the loop could potentially process the DAG definition files many times a minute. To control the rate of DAG file processing, the min_file_process_interval can be set to a higher value. This parameter ensures that a DAG definition file is not processed more often than once every min_file_process_interval seconds.

dag_dir_list_interval:

Since the scheduler can run indefinitely, it's necessary to periodically refresh the list of files in the DAG definition directory. The refresh interval is controlled with the dag_dir_list_interval configuration parameter.

Source: a Google search on both terms led to this as the first result: https://cwiki.apache.org/confluence/display/AIRFLOW/Scheduler+Basics
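To make the interaction between the two settings concrete, here is a minimal simulation of the behaviour described above. It is only a sketch, not Airflow's actual scheduler code: the function `simulate_scheduler`, the one-iteration-per-second loop, and the file names are all hypothetical and exist purely for illustration.

```python
def simulate_scheduler(duration, files_on_disk,
                       min_file_process_interval, dag_dir_list_interval):
    """Toy model of the scheduler loop (NOT Airflow's real implementation).

    files_on_disk maps a file name to the second at which it appears in
    the DAG folder. Returns a list of (time, file) processing events.
    """
    known_files = []       # files found by the most recent directory listing
    last_listed = None     # time of the most recent directory listing
    last_processed = {}    # file name -> time it was last processed
    events = []

    for now in range(duration):  # assumption: one loop iteration per second
        # Refresh the file list at most once per dag_dir_list_interval.
        if last_listed is None or now - last_listed >= dag_dir_list_interval:
            known_files = sorted(
                name for name, created in files_on_disk.items() if created <= now
            )
            last_listed = now

        # Process each known file at most once per min_file_process_interval.
        for name in known_files:
            if (name not in last_processed
                    or now - last_processed[name] >= min_file_process_interval):
                last_processed[name] = now
                events.append((now, name))

    return events


events = simulate_scheduler(
    duration=30,
    files_on_disk={"a.py": 0, "b.py": 5},  # b.py is dropped in at t=5
    min_file_process_interval=10,
    dag_dir_list_interval=20,
)
print(events)
# [(0, 'a.py'), (10, 'a.py'), (20, 'a.py'), (20, 'b.py')]
```

In this toy run, a.py is re-processed every 10 seconds, while b.py, although present from t=5, is not seen until the next directory listing at t=20.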


I ran some experiments manually and found the following; I hope this clarifies things.

min_file_process_interval: for example, let's say this is set to 10 seconds. Each DAG file is then processed at most once every 10 seconds. This also means that between the completion of a task in any DAG and the triggering of its dependent task there can be a delay of up to 10 seconds, because Airflow checks whether upstream tasks have completed, and triggers the dependent tasks, only every 10 seconds.

If this value is higher, the tasks in your DAGs will take longer to trigger, but Airflow will consume less CPU.

See also the related question: Airbnb Airflow using all system resources

dag_dir_list_interval: any new Python DAG file that you put in the dags folder can take up to this long to be picked up by Airflow and show up in the UI.
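As a rough illustration of where these settings live, the sketch below parses a sample config fragment with Python's standard configparser. To my understanding both options belong to the [scheduler] section of airflow.cfg; the values 10 and 60 are illustrative assumptions, not recommendations or defaults.

```python
import configparser

# Hypothetical airflow.cfg fragment; the values are for illustration only.
SAMPLE_CFG = """
[scheduler]
# Process each DAG file at most once every 10 seconds.
min_file_process_interval = 10
# Re-scan the dags folder for new files every 60 seconds.
dag_dir_list_interval = 60
"""

parser = configparser.ConfigParser()
parser.read_string(SAMPLE_CFG)

scheduler = parser["scheduler"]
min_interval = scheduler.getint("min_file_process_interval")
list_interval = scheduler.getint("dag_dir_list_interval")

print("Each DAG file is processed at most once every", min_interval, "seconds")
print("New DAG files are discovered within roughly", list_interval, "seconds")
```

With values like these, task triggering would lag by at most about 10 seconds under the behaviour observed above, while a newly added DAG file could take up to about a minute to appear.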
