Airflow - creating dynamic Tasks from XCOM

I wouldn't do what you're trying to achieve mainly because:

  1. XCOM value is a state generated in runtime
  2. DAG structure is something determined in parse time

Even if you use something like the following to get an access to XCOM values generated by some upstream task:

from airflow.models import TaskInstance
from airflow.utils.db import provide_session

dag = DAG(...)

@provide_session
def get_files_list(session):
    execution_date = dag.previous_schedule(datetime.now())

    // Find previous task instance:
    ti = session.query(TaskInstance).filter(
        TaskInstance.dag_id == dag.dag_id,
        TaskInstance.execution_date == execution_date,
        TaskInstance.task_id == upstream_task_id).first()
    if ti:
        files_list = ti.xcom_pull()
        if files_list:
            return files_list
    // Return default state:
    return {...}


files_list = get_files_list()
// Generate tasks based on upstream task state:
task = PythonOperator(
    ...
    xcom_push=True,
    dag=dag)

But this would behave very strangely, because DAG parsing and task execution are not synchronised in a way you wish.

If the main reason you want to do this is parallelising files processing, I'd have some static number of processing tasks (determined by the required parallelism) that read files list from upstream task's XCOM value and operate on a relevant portion of that list.

Another option is parallelising files processing using some framework for distributed computations like Apache Spark.

Tags:

Python

Airflow