Is there a way to control the parallelism for particular tasks in an
Airflow DAG? E.g., say I have a DAG definition like...

for dataset in list_of_datasets:
    # some simple operation
    task_1 = BashOperator(task_id=f'task_1_{dataset.name}', ...)
    # load-intensive operation
    task_2 = BashOperator(task_id=f'task_2_{dataset.name}', ...)
    # another simple operation
    task_3 = BashOperator(task_id=f'task_3_{dataset.name}', ...)

    task_1 >> task_2 >> task_3

Is there a way to arrange, say, for up to 5 instances of task_1 to run
concurrently within a DAG run, while only 2 instances of task_2 may run
at a time (also implying that if 2 instances of task_2 are already
running, only 3 instances of task_1 can run)? Are there other common
ways to handle this kind of requirement (I imagine it must come up
often in pipelines)?
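
For context, the closest built-in mechanism I've found is pools: each
task is assigned to a named pool, and the scheduler runs only as many
task instances as that pool has slots. Here is a minimal sketch of what
I have in mind, assuming a recent Airflow 2.x; the pool names, DAG id,
and bash commands are all made up for illustration:

from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator

# Pools are created ahead of time, e.g. in the UI or via the CLI:
#   airflow pools set simple_pool 5 "simple per-dataset steps"
#   airflow pools set heavy_pool 2 "load-intensive steps"

with DAG(
    dag_id="dataset_pipeline",          # illustrative name
    start_date=datetime(2024, 1, 1),
    schedule=None,
    catchup=False,
) as dag:
    for name in ["a", "b", "c"]:        # stand-in for list_of_datasets
        # simple operation: all task_1_* share simple_pool, so at
        # most 5 of them run at once regardless of dataset count
        task_1 = BashOperator(
            task_id=f"task_1_{name}",
            bash_command="echo extract",
            pool="simple_pool",
        )
        # load-intensive operation: heavy_pool has 2 slots, so at
        # most 2 task_2_* instances run concurrently
        task_2 = BashOperator(
            task_id=f"task_2_{name}",
            bash_command="echo load",
            pool="heavy_pool",
        )
        # another simple operation, left in the default pool
        task_3 = BashOperator(
            task_id=f"task_3_{name}",
            bash_command="echo cleanup",
        )
        task_1 >> task_2 >> task_3

If I understand pools correctly, though, a task belongs to exactly one
pool, so this caps the two groups independently (5 and 2) rather than
enforcing a combined budget of 5 with task_2 capped at 2 inside it; I
gather pool_slots can make a heavy task consume several slots of a
shared pool, but that still doesn't quite express the coupled
constraint above, hence the question.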
