Re: [PROPOSAL][AIP-36 DAG Versioning]

2022-06-01 Thread Jarek Potiuk
I think Airflow Summit and some 2.3.0 teething had (un)successfully :) dragged most of the committers away from the few AIPs, but I believe there will shortly be a real "reinvigorating" of some work there (speaking for myself though :)). On Fri, May 27, 2022 at 3:28 AM Max Payton wrote: > Hey, I was

Re: [PROPOSAL][AIP-36 DAG Versioning]

2022-05-26 Thread Max Payton
Hey, I was wondering if the resurrected AIP was ever published? This is something that we (Lyft) are very interested in, and would like to contribute to as well. *Max Payton* He/Him/His Software Engineer On Tue, Feb 15, 2022 at 4:23

Re: [PROPOSAL][AIP-36 DAG Versioning]

2022-02-15 Thread Jarek Potiuk
Woohoo! Looking forward to it! On Tue, Feb 15, 2022 at 1:11 PM Kaxil Naik wrote: > > Hey folks, > > Just reviving this old thread to provide an update that we (Astronomer) will > be resurrecting AIP-36 DAG Versioning with a different scope in the coming > days that will be more consistent with

Re: [PROPOSAL][AIP-36 DAG Versioning]

2022-02-15 Thread Kaxil Naik
Hey folks, Just reviving this old thread to provide an update that we (Astronomer) will be resurrecting AIP-36 DAG Versioning with a different scope in the coming days that will be more consistent with what has been discussed in this thread. Regards, Kaxil On Thu, Aug 13, 2020 at 9:32 PM Jarek P

Re: [PROPOSAL][AIP-36 DAG Versioning]

2020-09-02 Thread Yulei Li
Hi Team, My name is Yulei and I'm from the workflow team @ Pinterest. We want to join the discussion regarding [AIP-36 DAG Versioning], as at Pinterest we implemented something similar to what has been proposed. So we want to hear and gather feedback from the community and discuss the possibil

Re: [PROPOSAL][AIP-36 DAG Versioning]

2020-08-13 Thread Jarek Potiuk
I fully agree with the "user" not having to know any of the "wheel" details. Similarly, they do not have to know the Python interpreter or the underlying libc library details. This all should be hidden from the users. I think the wheels API that we might have there does not have to be user-facing.

Re: [PROPOSAL][AIP-36 DAG Versioning]

2020-08-11 Thread Ash Berlin-Taylor
Anything to do with the process of building wheels should be a "power user" only feature, and should not be required for many users - many, many users of Airflow are not primarily Python developers, but data scientists, and needing them to understand anything about the Python build toolchain i

Re: [PROPOSAL][AIP-36 DAG Versioning]

2020-08-10 Thread Tomasz Urbaszek
I like the idea of wheels as this is probably the "most pythonic" solution. And a "DAG version" is not only defined by DAG code but also by all dependencies the DAG uses (custom functions, libraries, etc.) and it seems that wheels can address that. However, I second Ash - keeping wheels in db doesn't

Re: [PROPOSAL][AIP-36 DAG Versioning]

2020-08-08 Thread Ash Berlin-Taylor
Quick comment (as I'm still mostly on paternity leave): Storing wheels in the db sounds like a bad idea to me, especially if we need to store deps in there too (and if we don't store deps, then they are incomplete) - they could get very large, and I've stored blobs of ~10 MB in Postgres before:

Re: [PROPOSAL][AIP-36 DAG Versioning]

2020-08-07 Thread Jacob Ferriero
I like the idea of wheels. > There were concerns about the size of the code to keep in the DB I think this may still be a valid concern and JSON serialization may not be comparable in size to a compressed wheel if "additional packages" are many or large. If DAG Version = Wheel we could consider havi

Re: [PROPOSAL][AIP-36 DAG Versioning]

2020-08-02 Thread Jarek Potiuk
A few points from my side (and proposal!): 1) Agree with Max - with a rather strong NO for pickles (however, indeed cloudpickle solves some of the problems). Pickles came up in our discussion in Polidea recently and the overall message was "no". I agree with Max here - if we can ship python code, tu

Re: [PROPOSAL][AIP-36 DAG Versioning]

2020-07-30 Thread Maxime Beauchemin
Having tried it early on, I'd advocate pretty strongly against pickles and would rather not get too deep into the why here. Short story is they can pull the entire memory space or much more than you want, and it's impossible to reason about where they end. For that reason and other reasons, they're

Re: [PROPOSAL][AIP-36 DAG Versioning]

2020-07-29 Thread Kaxil Naik
Thanks, both Max and Dan for your comments, please check my reply below: > Personally I vote for a DAG version to be pinned and consistent for the > duration of the DAG run. Some of the reasons why: > - it's easier to reason about, and therefore visualize and troubleshoot > - it prevents some ca

Re: [PROPOSAL][AIP-36 DAG Versioning]

2020-07-29 Thread Jacob Ward
I came here to say what Max has said, only less eloquently. I do have one concern with locking the version for a single run. Currently it is possible for a user to create a DAG which intentionally changes as it executes, i.e. dynamically creating a task for the DAG during a run by modifying ext

Re: [PROPOSAL][AIP-36 DAG Versioning]

2020-07-28 Thread Dan Davydov
Strongly agree with Max's points. I also feel the right way to go about this is that, instead of Airflow schedulers/webservers/workers reading DAG Python files, they would read from serialized representations of the DAGs (e.g. JSON representation in the Airflow DB). Instead of DAG owners pushing

Re: [PROPOSAL][AIP-36 DAG Versioning]

2020-07-28 Thread Maxime Beauchemin
> "mixed version" Personally I vote for a DAG version to be pinned and consistent for the duration of the DAG run. Some of the reasons why: - it's easier to reason about, and therefore visualize and troubleshoot - it prevents some cases where dependencies are never met - it prevents the explosion

Re: [PROPOSAL][AIP-36 DAG Versioning]

2020-07-28 Thread Kaxil Naik
Thanks Max for your comments. > *DAG Fingerprinting:* this can be tricky, especially in regards to dynamic > DAGs, where in some cases each parsing of the DAG can result in a different > fingerprint. I think DAG and task attributes are left out from the > proposal that should be considered as part

Re: [PROPOSAL][AIP-36 DAG Versioning]

2020-07-27 Thread Maxime Beauchemin
Some notes and ideas: *DAG Fingerprinting:* this can be tricky, especially in regards to dynamic DAGs, where in some cases each parsing of the DAG can result in a different fingerprint. I think DAG and task attributes are left out from the proposal that should be considered as part of the fingerp

[PROPOSAL][AIP-36 DAG Versioning]

2020-07-24 Thread Vikram Koka
Team, We just created 'AIP-36 DAG Versioning' on Confluence and would very much appreciate feedback and suggestions from the community. https://cwiki.apache.org/confluence/display/AIRFLOW/AIP-36+DAG+Versioning The DAG Versioning concept has been discussed on multiple occasions in the past