I am not sure how you want to multiply the tasks - I imagine you want to
have something like task1_1, task1_2, task1_3 etc. Or maybe you think about
back-filling and running same tasks for different runs?

In both cases you can use pools:
https://airflow.apache.org/docs/stable/concepts.html#pools to limit how
many tasks can be run in parallel for given pool.

There is another way to limit parallelism of tasks - applicable for example
in cases where you have different kinds of machines with different
capabilities (for example with/without GPU). You can have some affinities
defined between tasks and actual machines that are executing them - in
Celery executor you can define queues in your celery configuration and you
can assign your task to one of the queues. Then you can have a number of
workers/slots defined for all machines in the queue and as effect you can
also limit parallelism of the tasks this way:
https://airflow.apache.org/docs/stable/concepts.html#queues

J.

On Thu, Dec 19, 2019 at 1:29 AM Reed Villanueva <[email protected]>
wrote:

> Is there a way to control the parallelism for particular tasks in an
> airflow dag? Eg. say I have a dag definition like...
>
> for dataset in list_of_datasets:
>     # some simple operation
>     task_1 = BashOperator(task_id=f'task_1_{dataset.name}', ...)
>     # load intensive operation
>     task_2 = BashOperator()
>     # another simple operation
>     task_3 = BashOperator()
>
>     task_1 >> task_2 >> task_3
>
> Is there a way to have something where task_1 can have, say, 5 of its kind
> running in a dag instance, while only 2 instances of task_2 may be running
> in a dag instance (also implying that if there are 2 instances of task_2
> already running, then only 3 instances of task_1 can run)? Any other common
> ways to work around this kind of requirement (I imagine this must come up
> often for pipelines)?
>
> This electronic message is intended only for the named
> recipient, and may contain information that is confidential or
> privileged. If you are not the intended recipient, you are
> hereby notified that any disclosure, copying, distribution or
> use of the contents of this message is strictly prohibited. If
> you have received this message in error or are not the named
> recipient, please notify us immediately by contacting the
> sender at the electronic mail address noted above, and delete
> and destroy all copies of this message. Thank you.
>


-- 

Jarek Potiuk
Polidea <https://www.polidea.com/> | Principal Software Engineer

M: +48 660 796 129 <+48660796129>
[image: Polidea] <https://www.polidea.com/>

Reply via email to