Re: [DISCUSS] Leverage serialized DAG in airflow run local process to avoid dag parsing

2021-12-17 Thread Ping Zhang
Hi Ash, Thanks for the inputs about the fork approach. I have checked the code. The fork only applies when there is no run_as_user. I think the run_as_user is an important feature. I will create an AIP with more details. Best wishes Ping Zhang On Fri, Dec 17, 2021 at 9:59 AM Jarek Potiuk wro

Re: [LAZY CONSENSUS] Smart Sensors - Deprecation and removal

2021-12-17 Thread Ping Zhang
Hi Ash, Nice, thanks for the explanation. If it does not need to go back to a worker, it will be perfect. Thanks, Ping Best wishes Ping Zhang On Fri, Dec 17, 2021 at 2:16 AM Ash Berlin-Taylor wrote: > > A few lightweight long running smart sensors can help to reduce the > burden of the sch

Re: [DISCUSS] Docker runtime isolation for airflow tasks

2021-12-17 Thread Ping Zhang
Hi Jarek, Thanks for the inputs. Yep, docker runtime is an add-on feature that is controlled by a feature flag, and definitely *not* the default way to run tasks. I agree that this should be the next AIP. It actually does not affect how the AIP-43 proposal is designed and written. Best wishes P

Re: [DISCUSS] Docker runtime isolation for airflow tasks

2021-12-17 Thread Ping Zhang
Hi Alexander, Thanks for the inputs. Docker runtime is an add-on feature that is controlled by a feature flag. It does not force users to use docker to run tasks. Best wishes Ping Zhang On Fri, Dec 17, 2021 at 2:53 AM Alexander Shorin wrote: > How should your idea work on systems without do

Re: [DISCUSS] Docker runtime isolation for airflow tasks

2021-12-17 Thread Ping Zhang
Hi Ash, Thanks for the inputs. I should have specifically called out that the docker runtime is an add-on feature that is controlled by a feature flag. Users/infra team can choose to enable it or not. When not enabled, it stays with the current behavior. This docker runtime feature has helped a lot

Re: [DISCUSS] Docker runtime isolation for airflow tasks

2021-12-17 Thread Jarek Potiuk
Yeah. I think for sure "Docker" as a "common execution environment" is convenient in certain situations. But for sure it should not be the default as mentioned before (as much as I love containers I also know - from the surveys we run for one but also from interacting with many users of Airflow - f

Re: [DISCUSS] Leverage serialized DAG in airflow run local process to avoid dag parsing

2021-12-17 Thread Jarek Potiuk
Yeah. I would also love to see some details in the meeting I proposed :). I am particularly interested in the current limitations of the solution in the "general" case. J, On Fri, Dec 17, 2021 at 11:16 AM Ash Berlin-Taylor wrote: > > On Thu, Dec 16 2021 at 16:19:45 -0800, Ping Zhang wrote: > > To

Re: [DISCUSS] Move expensive dag run creation back to the DagFileProcessorManager loop

2021-12-17 Thread Jarek Potiuk
Yep. I second Ash. There were enormous changes under the hood in Airflow 2 especially when it comes to the performance. A lot of assumptions and problems from 1.10 do not hold any more on Airflow 2 when it comes to performance characteristics, so you might want to run your DAGs through Airflow 2 to

Re: [DISCUSS] AIP-1 and Airflow multi-tenancy

2021-12-17 Thread Jarek Potiuk
Hey everyone, Great to see others chiming in :). It's really great that we have various stakeholders and users taking part in the discussion - and I hope what we come up with will be something that will be driven and supported by the whole community. Also John from Amazon promised to bring his tea

Re: [DISCUSS] Docker runtime isolation for airflow tasks

2021-12-17 Thread Alexander Shorin
How should your idea work on systems without docker? Like FreeBSD? And why did you make such leaky tasks that couldn't be isolated with common tools like system packages, venv, etc.? -- ,,,^..^,,, On Fri, Dec 17, 2021 at 2:53 AM Ping Zhang wrote: > Hi Airflow Community, > > This is Ping Zhang from

Re: [DISCUSS] Docker runtime isolation for airflow tasks

2021-12-17 Thread Ash Berlin-Taylor
Hi Ping, (The dev list doesn't allow attachments, so we can't see any of the images you've posted, so some of my questions might have been addressed by those images.) It seems that a lot of the goals here are overlapping with the AIP-1 and proposed separation of dag processor from scheduler

Re: [DISCUSS] Leverage serialized DAG in airflow run local process to avoid dag parsing

2021-12-17 Thread Ash Berlin-Taylor
On Thu, Dec 16 2021 at 16:19:45 -0800, Ping Zhang wrote: To run airflow tasks, airflow needs to parse dag file twice, once in airflow run local process, once in airflow run raw This isn't true in most cases anymore thanks to a change from spawning a new process (os.exec(["airflow",...]) to fo

Re: [DISCUSS] Move expensive dag run creation back to the DagFileProcessorManager loop

2021-12-17 Thread Ash Berlin-Taylor
We have massively re-worked (and benchmarked) verify_integrity as part of the HA work (including using a dummy sample of your large DAG structure provided by Kevin) since the 1.10.4 version, and it is no longer the bottleneck it once was. From memory this was mostly fixed around 1.10.12 by impr

Re: [LAZY CONSENSUS] Smart Sensors - Deprecation and removal

2021-12-17 Thread Ash Berlin-Taylor
> A few lightweight long running smart sensors can help to reduce the burden of the scheduler and workers. That is exactly what deferrable tasks are. When a task is deferred it gets picked up by a triggerer process, and it runs in an async IO loop until the task is ready to continue. We c