Re: [DISCUSS] AIP-12 Persist DAG into DB

2019-07-24 Thread Gabriel Silk
I'm really excited about this feature, and I'd love to be able to provide feedback on the proposed design. On Thu, Jul 18, 2019 at 10:21 AM Tao Feng wrote: > Thanks Ash. This will be huge! > > On Thu, Jul 18, 2019 at 4:00 AM Jarek Potiuk > wrote: > > > Cool! > > > > On Thu, Jul 18, 2019 at 11:4

Re: [DISCUSS] AIP-12 Persist DAG into DB

2019-07-18 Thread Tao Feng
Thanks Ash. This will be huge! On Thu, Jul 18, 2019 at 4:00 AM Jarek Potiuk wrote: > Cool! > > On Thu, Jul 18, 2019 at 11:46 AM Ash Berlin-Taylor wrote: > > > We didn't reach any conclusion on this yet but I agree, and this is the > > big task that we at Astronomer are going to work on next for

Re: [DISCUSS] AIP-12 Persist DAG into DB

2019-07-18 Thread Jarek Potiuk
Cool! On Thu, Jul 18, 2019 at 11:46 AM Ash Berlin-Taylor wrote: > We didn't reach any conclusion on this yet but I agree, and this is the > big task that we at Astronomer are going to work on next for Airflow. > > I've started chatting to a few of the other committers about this to get a > an id

Re: [DISCUSS] AIP-12 Persist DAG into DB

2019-07-18 Thread Ash Berlin-Taylor
We didn't reach any conclusion on this yet but I agree, and this is the big task that we at Astronomer are going to work on next for Airflow. I've started chatting to a few of the other committers about this to get an idea of people's priorities, and have had a chat with Alex at Uber about the

Re: [DISCUSS] AIP-12 Persist DAG into DB

2019-07-17 Thread Tao Feng
Did we reach any consensus on this topic/AIP? I think persisting the DAG is actually pretty important. -Tao On Tue, Mar 12, 2019 at 3:01 AM Kevin Yang wrote: > Hi Fokko, > > As a large cluster maintainer, I’m not a big fan of large DAG files > neither. But I’m not sure if I’ll consider this bad pra

Re: [DISCUSS] AIP-12 Persist DAG into DB

2019-03-12 Thread Kevin Yang
Hi Fokko, As a large cluster maintainer, I’m not a big fan of large DAG files either. But I’m not sure I’d consider this bad practice. We have some large frameworks, e.g. experimentation and machine learning, that are complex by nature and generate a large number of DAGs from their customer con

Re: [DISCUSS] AIP-12 Persist DAG into DB

2019-03-10 Thread Driesprong, Fokko
Thanks Kevin for opening the discussion. I think it is important to have a clear overview on how to approach the AIP. First of all, how many DAGs do we have that take 30s to parse? I consider this bad practice, and this would also result in an unworkable situation with the current setup of Airflow

Re: [DISCUSS] AIP-12 Persist DAG into DB

2019-03-09 Thread Maxime Beauchemin
I want to raise the question of how much normalization we want to use here, as it seems to be an area that needs more attention. The AIP suggests having DAG blobs, task blobs and edges (call it fairly-normalized). I also like the idea of an all-encompassing (call it very-denormalized) DAG

Re: [DISCUSS] AIP-12 Persist DAG into DB

2019-03-08 Thread Kevin Yang
Hi Julian, I'm definitely aligned with you guys on making the webserver independent of DAG parsing; it's just that the end goal for me would be to build a complete story around serializing DAGs, and to move with that story in mind. I feel like you guys may already have a list of dynamic features we need to depreca

Re: [DISCUSS] AIP-12 Persist DAG into DB

2019-03-08 Thread Dan Davydov
> > Personally I don’t understand why people are pushing for a JSON-based DAG > representation It sounds like you agree that DAGs should be serialized (just in the DB instead of JSON), so will only address why JSON is better than MySQL (AKA serializing at the DAG level vs the task level) as far as

Re: [DISCUSS] AIP-12 Persist DAG into DB

2019-03-08 Thread Ash Berlin-Taylor
Comments inline. > On 8 Mar 2019, at 11:28, Kevin Yang wrote: > > Hi all, > When I was preparing some work related to this AIP I found something very > concerning. I noticed this JIRA ticket > is trying to remove the > dependency of dagbag

Re: [DISCUSS] AIP-12 Persist DAG into DB

2019-03-08 Thread Julian De Ruiter
Hi all, When I was preparing some work related to this AIP I found something very concerning. I noticed this JIRA ticket <https://issues.apache.org/jira/browse/AIRFLOW-3562> is trying to remove the

Re: [DISCUSS] AIP-12 Persist DAG into DB

2019-03-08 Thread Kevin Yang
Ty Xiaodong, my bad there. Attached the file to this email and also uploaded it here and here. Cheers, Kevin Y On Fri, Mar 8, 2019 at 3:42 AM Deng Xiaodong wrote: > Hi Kevin, > > The image you attached is not displayed p

Re: [DISCUSS] AIP-12 Persist DAG into DB

2019-03-08 Thread Deng Xiaodong
Hi Kevin, The image you attached is not displayed properly. Could you upload it somewhere and provide a link instead? Thanks! XD On Fri, Mar 8, 2019 at 19:38 Kevin Yang wrote: > Hi all, > When I was preparing some work related to this AIP I found something very > concerning. I noti

Re: [DISCUSS] AIP-12 Persist DAG into DB

2019-03-08 Thread Kevin Yang
Hi all, When I was preparing some work related to this AIP I found something very concerning. I noticed this JIRA ticket is trying to remove the webserver's dependency on dagbag, which is awesome; we wanted this badly but never got to start work on

Re: [DISCUSS] AIP-12 Persist DAG into DB

2019-02-28 Thread Peter van t Hof
Hi all, Just some comments on the points Bolke gave in relation to my PR. First of all, the main focus is making the webserver stateless. > 1) Make the webserver stateless: needs the graph of the *current* dag This is the main goal, but for this a lot more PRs will be coming once my current

Re: [DISCUSS] AIP-12 Persist DAG into DB

2019-02-27 Thread Dan Davydov
My main concern is around the data model: I feel like the data model should take into account future serialization efforts. This is mainly because migrations are not cheap, and sometimes they are not easy to automate with an Alembic migration. E.g. the AIP proposes storing DAG edges, whereas I think w

Re: [DISCUSS] AIP-12 Persist DAG into DB

2019-02-27 Thread Jarek Potiuk
Just my two cents. I agree with Fokko that the discussion is about a much bigger topic than just serialising DAGs. But I think it might be a very good direction nevertheless, as it potentially addresses several other AIPs (I believe at least three other AIPs can benefit from better container support f

Re: [DISCUSS] AIP-12 Persist DAG into DB

2019-02-27 Thread Kevin Yang
Excellent discussion. +1 for Max and Dan's points (DAG serialization+SDK+docker). Taking one step back: this PR, which adds serialization of the DagRun graph, is already a step beyond just showing the latest version and can easily work with the proposed plan, right? Later on if we serialize the entire DA

Re: [DISCUSS] AIP-12 Persist DAG into DB

2019-02-27 Thread Maxime Beauchemin
Agreed, I wasn't trying to inflate the scope of the AIP, just raising related topics to see how it all fits together. Max On Wed, Feb 27, 2019 at 1:17 PM Bolke de Bruin wrote: > I agree with Fokko here. We have been discussing serialisation for a > veerrryyy long time and nothing has come of it

Re: [DISCUSS] AIP-12 Persist DAG into DB

2019-02-27 Thread Bolke de Bruin
I agree with Fokko here. We have been discussing serialisation for a veerrryyy long time and nothing has come of it :-). Probably because we are making it too big. As Fokko states, there are several goals and each of them probably warrants an AIP: 1) Make the webserver stateless: needs the graph

Re: [DISCUSS] AIP-12 Persist DAG into DB

2019-02-27 Thread Ash Berlin-Taylor
I agree with Fokko - I feel this AIP is a decent stepping stone, and doesn't significantly change the Airflow execution model - my major concern about _requiring_ Kubernetes is that it is a barrier to entry that might put off smaller users: operating a Kubernetes cluster is hard to impossible if y

Re: [DISCUSS] AIP-12 Persist DAG into DB

2019-02-27 Thread Dan Davydov
Very happy to hear we are in agreement :)! Since it's pretty clear we need SimpleDAG serialization, and we can see > through the requirements, people can pretty much get started on this. I agree, just want to make sure we think through the big picture to make sure we don't have to undo any work,

Re: [DISCUSS] AIP-12 Persist DAG into DB

2019-02-27 Thread Driesprong, Fokko
I feel we're going a bit off topic here, although it is good to discuss the possibilities. From my perspective the AIP tries to kill two birds with one stone: 1. Decoupling the web-server from the actual Python DAG files 2. Versioning the DAGs so we can have a historical view of the dags a

Re: [DISCUSS] AIP-12 Persist DAG into DB

2019-02-27 Thread Maxime Beauchemin
I fully agree on all your points Dan. First about PEX vs Docker, Docker is a clear choice here. It's a superset of what PEX can do (PEX is limited to python env) and a great standard that has awesome tooling around it, works natively in k8s, which is becoming the preferred executor. PEX is a bit o

Re: [DISCUSS] AIP-12 Persist DAG into DB

2019-02-27 Thread James Meickle
Love this idea, Max - this exactly fits with how we'd want to use it. Almost like an Airflow "wire protocol". I think there'd be two specifications: one for how the running container communicates back a DAG (or other state) via a REST API, and another for a container metadata specification (such as

Re: [DISCUSS] AIP-12 Persist DAG into DB

2019-02-27 Thread Dan Davydov
> > * on the topic of serialization, let's be clear whether we're talking about > unidirectional serialization and *not* deserialization back to the object. > This works for making the web server stateless, but isn't a solution around > how DAG definition get shipped around on the cluster (which wo

Re: [DISCUSS] AIP-12 Persist DAG into DB

2019-02-27 Thread Maxime Beauchemin
Oh a few more things I wanted to bring up. While playing with pex (years ago!), I added a feature to open up a full-featured, DAG-centric CLI from a DAG file. That feature could become the interface of a containerized DAG approach. As far as I know the feature is not well documented and important

Re: [DISCUSS] AIP-12 Persist DAG into DB

2019-02-27 Thread Kyle Hamlin
This AIP looks really great, thanks to all involved with it! I do have a question about it that I'm hoping someone can answer. I’ve noticed with the Kubernetes executor that when a DAG runs and there is, say, a syntax error in it or a misconfiguration of some sort that ends up in the task attributes,

Re: [DISCUSS] AIP-12 Persist DAG into DB

2019-02-27 Thread James Meickle
On the topic of using Docker, I highly recommend looking at Argo Workflows and some of their sample code: https://github.com/argoproj/argo tl;dr is that it's a workflow management tool where DAGs are expressed as YAML manifests, and tasks are just containers run on Kubernetes. I think that there'

Re: [DISCUSS] AIP-12 Persist DAG into DB

2019-02-26 Thread Maxime Beauchemin
Related thoughts: * on the topic of serialization, let's be clear whether we're talking about unidirectional serialization and *not* deserialization back to the object. This works for making the web server stateless, but isn't a solution around how DAG definitions get shipped around on the cluster

Re: [DISCUSS] AIP-12 Persist DAG into DB

2019-02-26 Thread Kevin Yang
My bad, I was misunderstanding a bit and mixing up two issues. I was thinking about the issue of multiple runs for one DagRun (e.g. after we clear the DagRun). This is an orthogonal issue, so the current implementation can work with the long-term plan. Cheers, Kevin Y On Tue, Feb 26, 2019 at 2:34 AM

Re: [DISCUSS] AIP-12 Persist DAG into DB

2019-02-26 Thread Ash Berlin-Taylor
> On 26 Feb 2019, at 09:37, Kevin Yang wrote: > > Now since we're already trying to have multiple graphs for one > execution_date, maybe we should just have multiple DagRun. I thought that there is exactly 1 graph for a DAG run - dag_run has a "graph_id" column

Re: [DISCUSS] AIP-12 Persist DAG into DB

2019-02-26 Thread Kevin Yang
> > possible as it is potentially a big(breaking) change. We would like to >> > make sure nothing unexpected happens if we are going with this >> direction. >> > >> > Thanks, >> > -Tao >> > >> > On Sat, Feb 23, 2019 at 11:41 AM Bas Harens

Re: [DISCUSS] AIP-12 Persist DAG into DB

2019-02-24 Thread Kevin Yang
y a big(breaking) change. We would like to > > make sure nothing unexpected happens if we are going with this direction. > > > > Thanks, > > -Tao > > > > On Sat, Feb 23, 2019 at 11:41 AM Bas Harenslak < > > basharens...@godatadriven.com> wrote: > >

Re: [DISCUSS] AIP-12 Persist DAG into DB

2019-02-23 Thread Tao Feng
ld like to > make sure nothing unexpected happens if we are going with this direction. > > Thanks, > -Tao > > On Sat, Feb 23, 2019 at 11:41 AM Bas Harenslak < > basharens...@godatadriven.com> wrote: > >> Let's discuss AIP-12 here: >> https://cwiki.apache.

Re: [DISCUSS] AIP-12 Persist DAG into DB

2019-02-23 Thread Tao Feng
s direction. Thanks, -Tao On Sat, Feb 23, 2019 at 11:41 AM Bas Harenslak < basharens...@godatadriven.com> wrote: > Let's discuss AIP-12 here: > https://cwiki.apache.org/confluence/display/AIRFLOW/AIP-12+Persist+DAG+into+DB. > It involves persisting the entire DAG into the met

[DISCUSS] AIP-12 Persist DAG into DB

2019-02-23 Thread Bas Harenslak
Let's discuss AIP-12 here: https://cwiki.apache.org/confluence/display/AIRFLOW/AIP-12+Persist+DAG+into+DB. It involves persisting the entire DAG into the metastore. For full details, please read the AIP. A PR was made to create “versioned graphs” given by option #3 in the AIP:

Re: AIP-12 Persist DAG into DB

2019-02-01 Thread Dan Davydov
Gruns can be ensured to > > > > >run at > > > > >the same version of a DAG instead of whatever happens to live on the > > > > >worker > > > > >at the time). > > > > > > > > > >On Thu, Jan 31, 2019 at 9:44

Re: AIP-12 Persist DAG into DB

2019-02-01 Thread Ben Tallman
happens to live on the > > > >worker > > > >at the time). > > > > > > > >On Thu, Jan 31, 2019 at 9:44 AM Peter van ‘t Hof < > > > >petervant...@godatadriven.com> wrote: > > > > > > > >> Hi A

Re: AIP-12 Persist DAG into DB

2019-01-31 Thread Maxime Beauchemin
vant...@godatadriven.com> wrote: > > > > > >> Hi All, > > >> > > >> As most of you guys know, airflow got an issue when loading new dags > > >where > > >> the webserver sometimes sees it and sometimes not. > > >>

Re: AIP-12 Persist DAG into DB

2019-01-31 Thread Dan Davydov
> >petervant...@godatadriven.com> wrote: > > > >> Hi All, > >> > >> As most of you guys know, airflow got an issue when loading new dags > >where > >> the webserver sometimes sees it and sometimes not. > >> Because of this we did wro

Re: AIP-12 Persist DAG into DB

2019-01-31 Thread Ash Berlin-Taylor
>> As most of you guys know, airflow got an issue when loading new dags >where >> the webserver sometimes sees it and sometimes not. >> Because of this we did wrote this AIP to solve this issue: >> >> >https://cwiki.apache.org/confluence/display/AIRF

Re: AIP-12 Persist DAG into DB

2019-01-31 Thread Dan Davydov
guys know, airflow got an issue when loading new dags where > the webserver sometimes sees it and sometimes not. > Because of this we did wrote this AIP to solve this issue: > > https://cwiki.apache.org/confluence/display/AIRFLOW/AIP-12+Persist+DAG+into+DB > > Any feedback is wel

Re: AIP-12 Persist DAG into DB

2019-01-31 Thread airflowuser
, > > As most of you guys know, airflow got an issue when loading new dags where > the webserver sometimes sees it and sometimes not. > Because of this we did wrote this AIP to solve this issue: > https://cwiki.apache.org/confluence/display/AIRFLOW/AIP-12+Persist+DAG+into+DB >

AIP-12 Persist DAG into DB

2019-01-31 Thread Peter van ‘t Hof
Hi All, As most of you know, Airflow has an issue when loading new DAGs where the webserver sometimes sees them and sometimes does not. Because of this we wrote this AIP to solve the issue: https://cwiki.apache.org/confluence/display/AIRFLOW/AIP-12+Persist+DAG+into+DB Any feedback is welcome