Re: [Discuss] EMR Step operator/sensor

2019-10-15 Thread Jonathan Miles
That's a great idea, to provide a generic way to do these. I feel like standalone sensors are a bit abused in the framework, like they're better suited as triggers when an external source is ready (e.g. new S3 file appears) than to poll for completion of a previous task (e.g. EmrStepSensor); ex

Re: [Notes] KNative executor sync

2019-10-15 Thread Gaetan Semet
Hi > Today Astronomer and Polidea have a very inspiring call about KNative > executor proposed in AIP-25. I am sending over notes from the meeting: > https://docs.google.com/document/d/1UGLq6KxEglvhp8wqKIwpevBoov4Rro8YiW-JufYfV_Q/edit?usp=sharing About the celery autoscaling, have you tried to wo

Managing external resources in Airflow

2019-10-15 Thread Jonathan Miles
Hi all, Posting this to dev instead of users because it crosses into framework territory. I've been using Airflow for six months or so and I'm starting to think about how to better manage Airflow tasks that are proxies for compute tasks running elsewhere, e.g. steps on Amazon EMR clusters. I

Re: [VOTE] AIP-24: Persisting serialized DAG in DB for webserver scalability

2019-10-15 Thread Driesprong, Fokko
Big +1 from my side, looking forward to making this happen. Two things that aren't completely clear to me: - Are we going to extend the existing data model to allow the RDBMS to optimize queries on fields that we use a lot? - How are we going to do state evolution when we extend the JSON m

Re: Dockerhub issues

2019-10-15 Thread Driesprong, Fokko
Thanks for the update, Jarek. I know that UTC+1 is the leading timezone, but calling it a day in SF would be a bit optimistic :-) On Tue, 15 Oct 2019 at 21:46, Jarek Potiuk wrote: > Seems that we have continuous DockerHub incident that impacts nearly > everyone > https://twitter.com/search?q

Dockerhub issues

2019-10-15 Thread Jarek Potiuk
It seems that we have an ongoing DockerHub incident that impacts nearly everyone: https://twitter.com/search?q=dockerhub&src=typed_query If you have better things to do, I'd say let's call it a day ;). J. -- Jarek Potiuk Polidea | Principal Software Engineer M: +48 660

Re: [VOTE] AIP-24: Persisting serialized DAG in DB for webserver scalability

2019-10-15 Thread Dan Davydov
I have been following it from the beginning as well. I understand there would be short-term wins for some users (I don't think a huge number of users?), but I still feel like we are being a bit short-sighted here and that we are creating more work for ourselves and potentially our users in the futu

Re: [VOTE] AIP-24: Persisting serialized DAG in DB for webserver scalability

2019-10-15 Thread Jarek Potiuk
Hello Dan, Alex, I believe all the points you make are super-valid ones. But maybe you are missing the full context a bit. I followed the original discussion from the very

Re: [VOTE] AIP-24: Persisting serialized DAG in DB for webserver scalability

2019-10-15 Thread Alex Guziel
-1 (binding) Good points made by Dan. We don't need to have the future plan implemented completely but it would be nice to see more detailed notes about how this will play out in the future. We shouldn't walk into a system that causes more pain in the future. (I can't say for sure that it does, but

Re: [Discuss] EMR Step operator/sensor

2019-10-15 Thread Jarek Potiuk
I think it could be solved in a more consistent and future-proof way. There is this new proposal for Base Async operators proposed by Jacob: https://github.com/apache/airflow/pull/6210 This is an interesting approach that might solve the problem in a slightly different way. Rather than combining

Re: [VOTE] AIP-24: Persisting serialized DAG in DB for webserver scalability

2019-10-15 Thread Dan Davydov
-1 (binding), this may sound a bit FUD-y but I don't feel this has been thought through enough... Having both a SimpleDagBag representation and the JSON representation doesn't make sense to me at the moment: *"**Quoting from Airflow code, it is “a simplified representation of a DAG that contains a

Re: [Notes] KNative executor sync

2019-10-15 Thread Jarek Potiuk
Fantastic! On Tue, Oct 15, 2019 at 6:24 PM Ash Berlin-Taylor wrote: > > https://cwiki.apache.org/confluence/display/AIRFLOW/2019-10-15+Knative+Executor+Meeting+notes?flashId=1743277309 > > Haven't put it anywhere in particular just yet. > > File list is for uploads (it's a built in Jira "compone

Re: [Discuss] EMR Step operator/sensor

2019-10-15 Thread Jonathan Miles
Yes, I often refer to this as the "atomicity problem": we usually want all four "create", "add steps", "wait for steps", "terminate cluster" tasks to retry together and in order if any one of them fails (well, the terminate one is questionable). In our current Dags we resolved this by putting t

Re: [Notes] KNative executor sync

2019-10-15 Thread Ash Berlin-Taylor
https://cwiki.apache.org/confluence/display/AIRFLOW/2019-10-15+Knative+Executor+Meeting+notes?flashId=1743277309 Haven't put it anywhere in particular just yet. File list is for uploads (it's a built in Jira "component") - and Jira has a template for meeting notes so I just used that. -a > On

Re: [Notes] KNative executor sync

2019-10-15 Thread Tomasz Urbaszek
I agree with you. Should we add something like 'Special Interest Groups' to the page tree and then group notes by SIG subject? Or should we reuse the already existing 'File list'? Bests, Tomek On Tue, Oct 15, 2019 at 6:06 PM Kamil Breguła wrote: > Hello, > > I think we should think about archiving the

Re: [Notes] KNative executor sync

2019-10-15 Thread Ash Berlin-Taylor
Agreed - already on it. Link to follow shortly. > On 15 Oct 2019, at 17:06, Kamil Breguła wrote: > > Hello, > > I think we should think about archiving these types of documents. What > do you think about copying this document to the cwiki? > > Best regards, > > On Tue, Oct 15, 2019 at 5:35 P

Re: [Notes] KNative executor sync

2019-10-15 Thread Kamil Breguła
Hello, I think we should think about archiving these types of documents. What do you think about copying this document to the cwiki? Best regards, On Tue, Oct 15, 2019 at 5:35 PM Tomasz Urbaszek wrote: > > Hi all, > > Today Astronomer and Polidea have a very inspiring call about KNative > execu

[Notes] KNative executor sync

2019-10-15 Thread Tomasz Urbaszek
Hi all, Today Astronomer and Polidea had a very inspiring call about the KNative executor proposed in AIP-25. I am sending over notes from the meeting: https://docs.google.com/document/d/1UGLq6KxEglvhp8wqKIwpevBoov4Rro8YiW-JufYfV_Q/edit?usp=sharing I think we can treat this as a launch of the idea o

Re: [VOTE] AIP-24: Persisting serialized DAG in DB for webserver scalability

2019-10-15 Thread Kamil Breguła
+1 (binding) On Tue, Oct 15, 2019 at 2:57 AM Kaxil Naik wrote: > Hello, Airflow community, > > This email calls for a vote to add the DAG Serialization feature at > https://github.com/apache/airflow/pull/5743. > > *AIP*: > https://cwiki.apache.org/confluence/display/AIRFLOW/AIP-24+DAG+Persistenc

Re: [VOTE] AIP-24: Persisting serialized DAG in DB for webserver scalability

2019-10-15 Thread Philippe Gagnon
+1 non-binding On Mon, Oct 14, 2019 at 8:57 PM Kaxil Naik wrote: > Hello, Airflow community, > > This email calls for a vote to add the DAG Serialization feature at > https://github.com/apache/airflow/pull/5743. > > *AIP*: > https://cwiki.apache.org/confluence/display/AIRFLOW/AIP-24+DAG+Persiste

Re: Kubernetes tests failing

2019-10-15 Thread Jarek Potiuk
Feel free to rebase on top of master branch. The failed Kubernetes tests should work now and they should be more stable (hopefully!). Once again thanks Gerardo for this KIND change :). Kubernetes instability was one of the recent pain points and it will make it easier to migrate to GitLab once the

Kubernetes tests failing

2019-10-15 Thread Jarek Potiuk
Today we started to have Kubernetes tests failing due to some dependency change. Luckily Gerardo just completed a change which makes the way we run Kubernetes tests more stable (using Kind, Kubernetes in Docker). I took the liberty of merging that change (it should fix the problem). I am run

Re: [PROPOSAL] Production-ready Airflow docker image and helm chart proposal

2019-10-15 Thread Gaetan Semet
Hi > For the governance I am happy to be added to the OWNERS list and I think > Daniel will be happy as well :). Feel free to submit a pull request so that I can approve directly. > [...] then official release of the helm chart needs to be voted by the PMC Hum, as of today, each approved PR res

Re: [VOTE] AIP-24: Persisting serialized DAG in DB for webserver scalability

2019-10-15 Thread Jarek Potiuk
(binding) On Tue, Oct 15, 2019 at 9:43 AM Jarek Potiuk wrote: > > I looked over your shoulders while you were implementing it. Big +1 from > me. > > On Tue, Oct 15, 2019 at 2:57 AM Kaxil Naik wrote: > >> Hello, Airflow community, >> >> This email calls for a vote to add the DAG Serialization fe

Re: [VOTE] AIP-24: Persisting serialized DAG in DB for webserver scalability

2019-10-15 Thread Jarek Potiuk
I looked over your shoulders while you were implementing it. Big +1 from me. On Tue, Oct 15, 2019 at 2:57 AM Kaxil Naik wrote: > Hello, Airflow community, > > This email calls for a vote to add the DAG Serialization feature at > https://github.com/apache/airflow/pull/5743. > > *AIP*: > https://c