Re: [Discussion] In Prep for AIP: Stateful XComs and Poke Rescheduling in Operators

2020-01-10 Thread Alex Guziel
I feel like for this, we can incorporate the smart sensor we have implemented at Airbnb that we plan on open sourcing. The TL;DR is that it works by having the Sensor task run briefly and materialize some state into the DB, which master sensor tasks then poke for. This can be done with custom time intervals.
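
A minimal sketch of the pattern described above, not the Airbnb implementation. It assumes an in-memory SQLite table as a stand-in for the Airflow DB; the table `sensor_state` and the functions `register_sensor` and `master_poke` are hypothetical names made up for illustration.

```python
import sqlite3


def make_db():
    """Create an in-memory stand-in for the metadata DB (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE sensor_state ("
        " task_id TEXT PRIMARY KEY,"
        " target TEXT NOT NULL,"
        " interval_s INTEGER NOT NULL,"
        " next_poke_at INTEGER NOT NULL,"
        " done INTEGER NOT NULL DEFAULT 0)"
    )
    return conn


def register_sensor(conn, task_id, target, interval_s, now):
    """What the short-lived sensor task does: persist its request, then exit."""
    conn.execute(
        "INSERT INTO sensor_state (task_id, target, interval_s, next_poke_at)"
        " VALUES (?, ?, ?, ?)",
        (task_id, target, interval_s, now),
    )


def master_poke(conn, now, is_ready):
    """One pass of the master sensor: poke every entry that is due."""
    due = conn.execute(
        "SELECT task_id, target, interval_s FROM sensor_state"
        " WHERE done = 0 AND next_poke_at <= ? ORDER BY task_id",
        (now,),
    ).fetchall()
    finished = []
    for task_id, target, interval_s in due:
        if is_ready(target):
            conn.execute(
                "UPDATE sensor_state SET done = 1 WHERE task_id = ?", (task_id,)
            )
            finished.append(task_id)
        else:
            # Not ready yet: come back after this sensor's own interval.
            conn.execute(
                "UPDATE sensor_state SET next_poke_at = ? WHERE task_id = ?",
                (now + interval_s, task_id),
            )
    return finished
```

Each registered row carries its own `interval_s`, which is one way the "custom time intervals" could be modelled: the master only pokes rows whose `next_poke_at` has passed.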

Re: Improving the Airflow UI

2019-11-27 Thread Alex Guziel
The issue was before they re-licensed it. Now I believe the issue is put to bed as MIT is Apache compatible. On Wed, Nov 27, 2019 at 7:38 AM Kamil Breguła wrote: > But there is the question, does Apache have additional restrictions on > this issue? > > On Wed, Nov 27, 2019 at 4:30 PM Colin Ingar

Re: [DISCUSS] Using shared memory for inter-task communication

2019-11-27 Thread Alex Guziel
Agreed on running before we can crawl. The logical way to do this now is to group it as one big task with more resources. With respect to affinity on the same machine, that's basically what it is. I guess this hinges on how well your solution can handle workloads with different resource requirements.

Re: [VOTE] AIP-24: Persisting serialized DAG in DB for webserver scalability

2019-10-15 Thread Alex Guziel
-1 (binding) Good points made by Dan. We don't need to have the future plan implemented completely but it would be nice to see more detailed notes about how this will play out in the future. We shouldn't walk into a system that causes more pain in the future. (I can't say for sure that it does, but

Re: How to manage Airflow SIGTERM excetion catches?

2019-10-03 Thread Alex Guziel
if people use either: > > try: > ... > except: > ... > > Or > > try: > ... > finally: > ... > > Their code will still run on this type of exit, but in the case of 1) this > can at least be put down to poor python code and case 2) any code

Re: How to manage Airflow SIGTERM excetion catches?

2019-10-02 Thread Alex Guziel
Task_copy.on_kill() should probably be killing the underlying process, but I think it's fuzzy where the exception gets thrown. I think the intention is for the exception to get caught in that same block, so the cleanup can happen, but this is not the case since it is thrown in the main thread. I th

Re: How to manage Airflow SIGTERM excetion catches?

2019-10-02 Thread Alex Guziel
Actually, reading the docs, the handler throws it in the main thread. In that case we should definitely change it to subclass SystemExit, or just use sys.exit. On Wed, Oct 2, 2019 at 12:53 PM Alex Guziel wrote: > It's been a while since I've looked at this code, but the exc
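
A small sketch of why subclassing SystemExit matters here. The exception name `TaskTerminated` and the helper functions are hypothetical, not Airflow's actual classes, and the handler is called directly to simulate signal delivery so the example stays deterministic. In a real process it would be installed with `signal.signal(signal.SIGTERM, handle_sigterm)`, and Python would run it in the main thread.

```python
import signal


class TaskTerminated(SystemExit):
    """Hypothetical exception; derives from SystemExit, not Exception."""


def handle_sigterm(signum, frame):
    # Same signature as a real signal handler. Raising here surfaces the
    # exception wherever the main thread happens to be executing.
    raise TaskTerminated(f"received signal {int(signum)}")


def guarded_by_except_exception():
    """User code that catches only ordinary exceptions."""
    try:
        handle_sigterm(signal.SIGTERM, None)  # simulated delivery
        return "finished"
    except Exception:
        return "swallowed"


def guarded_by_bare_except():
    """User code with a bare except, which also catches SystemExit."""
    try:
        handle_sigterm(signal.SIGTERM, None)  # simulated delivery
        return "finished"
    except:  # noqa: E722 (deliberately bare for the demonstration)
        return "swallowed"


def outcome(fn):
    """Report whether the termination escaped the user code."""
    try:
        return fn()
    except TaskTerminated:
        return "terminated"
```

With this setup, `except Exception:` lets the termination propagate, while a bare `except:` still swallows it, which matches the distinction drawn elsewhere in this thread.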

Re: How to manage Airflow SIGTERM excetion catches?

2019-10-02 Thread Alex Guziel
It's been a while since I've looked at this code, but the exception thrown there is thrown from a place where it should not be able to be caught by your operator code, so the issue may be somewhere else. On Wed, Oct 2, 2019 at 12:41 PM Shaw, Damian P. < damian.sha...@credit-suisse.com> wrote: > T

Re: Airflow node different versions

2019-09-14 Thread Alex Guziel
Agree with Bolke here. Not much is going on in worker as long as there aren’t breaking changes. On Sat, Sep 14, 2019 at 1:24 PM Bolke de Bruin wrote: > I actually think that it is not that risky (although ymmv). Worker nodes > are pretty independent from the scheduler/webserver. As long as the >

Re: [DISCUSS] Tweaks to the Airflow logo

2019-08-20 Thread Alex Guziel
Latest one looks great. On Tue, Aug 20, 2019 at 11:22 AM Aizhamal Nurmamat kyzy wrote: > Great job Chris! Love it :) Thank you for your patience and such a big > contribution! > > On Tue, Aug 20, 2019 at 10:45 AM Jarek Potiuk > wrote: > > > All for it :) > > > > On Tue, Aug 20, 2019 at 1:08 PM

Re: [ANNOUNCE] Please welcome new Airflow committer Kevin Yang

2019-04-30 Thread Alex Guziel
Congratulations Kevin! On Tue, Apr 30, 2019 at 10:58 AM Tao Feng wrote: > Congrats! > > On Tue, Apr 30, 2019 at 10:09 AM Daniel Imberman < > dimberman.opensou...@gmail.com> wrote: > > > Congrats Kevin! > > > > On Tue, Apr 30, 2019 at 9:09 AM Aizhamal Nurmamat kyzy > > wrote: > > > > > Congratul

Re: Database referral integrity

2019-04-10 Thread Alex Guziel
flow > clusters. > > On Wed, Apr 10, 2019 at 1:05 PM Alex Guziel > wrote: > > > I'm not a huge fan of having foreign keys. I know Airbnb has and > definitely > > still has problems with DB load. I don't see any real convincing > arguments >

Re: Database referral integrity

2019-04-10 Thread Alex Guziel
I'm not a huge fan of having foreign keys. I know Airbnb has had and definitely still has problems with DB load. I don't see any real convincing arguments for how adding referential integrity will improve Airflow meaningfully (yet). On Wed, Apr 10, 2019 at 12:38 PM Bas Harenslak < basharens...@godatad

Re: [Discuss] Airflow sensor optimization

2019-03-06 Thread Alex Guziel
Sensor-service thing seems to open the door to make sensors a pubsub-type deal where possible. For example, in Hive, you can keep an in-memory registry of what partitions to sense for, and tail the audit log to see when they are populated, instead of polling. On Wed, Mar 6, 2019 at 1:51 PM Alex
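
A rough sketch of the registry idea, assuming a made-up audit log format of `EVENT partition` per line. The class `PartitionRegistry`, the function `consume`, and the `ADD_PARTITION` event name are all hypothetical; a real Hive audit log looks different and would need its own parser.

```python
from collections import defaultdict


class PartitionRegistry:
    """In-memory map from partition name to the task ids waiting on it."""

    def __init__(self):
        self._waiters = defaultdict(set)

    def register(self, task_id, partition):
        self._waiters[partition].add(task_id)

    def on_partition_added(self, partition):
        """Return the waiting task ids (sorted) and forget them."""
        return sorted(self._waiters.pop(partition, set()))


def consume(registry, audit_lines):
    """Tail-style loop: react to log events instead of polling each partition."""
    satisfied = []
    for line in audit_lines:
        event, _, partition = line.partition(" ")
        if event == "ADD_PARTITION":  # hypothetical event name
            satisfied.extend(registry.on_partition_added(partition))
    return satisfied
```

The cost per sensor here is a dictionary entry rather than a recurring poll, which is the appeal of the pubsub framing.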

Re: [Discuss] Airflow sensor optimization

2019-03-06 Thread Alex Guziel
Smart sensor seems like a good idea, but I wonder how much performance will be improved in practice. And of course, one must think about sharding and such. I'm not sure how helpful rescheduling sensors is, since it will seemingly add scheduler and DB load, which is already a bottleneck. On Wed, M

Re: question on an embarrassingly parallelism

2019-02-05 Thread Alex Guziel
The scheduler isn't guaranteed to compute them in that order to maximize parallelism. You can imagine in the case where m = n - 1, that it just computes the m branches in parallel, then it has to complete the nth branch with parallelism 1. On Tue, Feb 5, 2019 at 7:20 AM soma dhavala wrote: > Imag
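
To put rough numbers on this, here is a small sketch under an assumption the excerpt does not state: each of the n branches is a chain of k sequential tasks of equal length, and m is the number of tasks that can run at once. The function names are made up for illustration.

```python
import math


def branch_at_a_time(n, m, k):
    """Time steps needed if whole branches are run in waves of at most m."""
    return math.ceil(n / m) * k


def lower_bound(n, m, k):
    """No schedule can beat total work / slots, nor the length of one branch."""
    return max(k, math.ceil(n * k / m))
```

For n = 5, m = 4, k = 4, running branch by branch takes 8 steps (4 for the first four branches, then 4 more for the last one alone), while the lower bound for any schedule is 5 steps.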