Re: Big performance optimization of Scheduler - 10x faster , 2000+ fewer queries count

2020-02-24 Thread Driesprong, Fokko
Sweet work Kamil and others! I'll try to go through them today! Cheers, Fokko Op ma 24 feb. 2020 om 22:37 schreef Tao Feng : > Great work Kamil! Let us know once it is landed in one of the future > releases. Would love to try it out :) > > Best, > -Tao > > On Mon, Feb 24, 2020 at 12:54 PM Qingpi

Re: Airflow and Machine Learning

2020-02-24 Thread Jarek Potiuk
Agree some kind of benchmarking is needed indeed. And I think this discussion is great BTW :) This would be great if we could achieve the same consistent environment through the whole lifecycle of a DAG. But I am not sure it can be achieved - I got quite disappointed with the iteration speed that

Re: Airflow and Machine Learning

2020-02-24 Thread Dan Davydov
I see things the same way as James. That being said I have not worked with Docker very much (maybe Daniel Imberman can comment?), so I could have some blindspots. I have heard latency concerns expressed by several people for example (can't remember in which areas). The main thing that draws me to

Re: [DISCUSS] Reduce (remove?) automated imports in Airflow 2.0

2020-02-24 Thread Jarek Potiuk
Cool! On Mon, Feb 24, 2020 at 3:58 PM Ash Berlin-Taylor wrote: > https://github.com/apache/airflow/pull/7517 has been merged now, so we > have kept airflow.DAG working, just lazily loaded. > > -a > On Feb 23 2020, at 10:53 pm, Kaxil Naik wrote: > > Yay !! Nice suggestion Kamil, good work Ash >

Re: Airflow and Machine Learning

2020-02-24 Thread James Meickle
I appreciate where you're coming from on wanting to enhance productivity for different types of users, but as a cluster administrator, I _really_ don't want to be running software that's managing its own Docker builds, virtualenvs, zip uploads, etc.! It will almost certainly not do so in a way comp

Re: Airflow and Machine Learning

2020-02-24 Thread Jarek Potiuk
I think what would help a lot to solve the problems of env/deps/DAGS is the wheel packaging we started to talk about in another thread. I personally think what argo does is too much "cloud native" - you have to build the image, push to registry, get it pulled by the engine, execute etc. I've been

Re: Big performance optimization of Scheduler - 10x faster , 2000+ fewer queries count

2020-02-24 Thread Tao Feng
Great work Kamil! Let us know once it is landed in one of the future releases. Would love to try it out :) Best, -Tao On Mon, Feb 24, 2020 at 12:54 PM Qingping Hou wrote: > Awesome work Kamil! Great to see us embracing query batching in the > code base. I can't wait to deploy those optimization

Re: Big performance optimization of Scheduler - 10x faster , 2000+ fewer queries count

2020-02-24 Thread Qingping Hou
Awesome work Kamil! Great to see us embracing query batching in the code base. I can't wait to deploy those optimizations into our production environment. Thanks, QP Hou On Mon, Feb 24, 2020 at 8:35 AM Kamil Breguła wrote: > > Hello, > > Polidea [1] together with Databand [2] has taken steps to

Re: airflow webserver start and stops immediately.

2020-02-24 Thread Ash Berlin-Taylor
Attachments aren't allowed on this mailing list, sorry. On Feb 24 2020, at 7:08 pm, Sasi kumar Deivasikamani wrote: > attached is the error. > > https://linux.die.net/man/8/initctl > > On Mon, Feb 24, 2020 at 12:51 PM Ash Berlin-Taylor wrote: > > > > What is initctl configured to run? You will nee

Re: airflow webserver start and stops immediately.

2020-02-24 Thread Sasi kumar Deivasikamani
attached is the error. https://linux.die.net/man/8/initctl On Mon, Feb 24, 2020 at 12:51 PM Ash Berlin-Taylor wrote: > > What is initctl configured to run? You will need to find logs from initctl as > from what you've described airflow isn't even starting up enough to configure > its logging

Re: airflow webserver start and stops immediately.

2020-02-24 Thread Ash Berlin-Taylor
What is initctl configured to run? You will need to find logs from initctl as from what you've described airflow isn't even starting up enough to configure its logging. -a On Feb 24 2020, at 5:39 pm, Sasi kumar Deivasikamani wrote: > after installation > airflow initdb > > i am attempting to

Re: airflow webserver start and stops immediately.

2020-02-24 Thread Sasi kumar Deivasikamani
after installation airflow initdb i am attempting to kick off webserver and scheduler using sudo command ie. sudo initctl start airflow-webserver sudo initctl start airflow-scheduler above helps me start the process (supervisor i am using hadoop use id - EMR setup). unfortunately the process sho

Re: airflow webserver start and stops immediately.

2020-02-24 Thread Ash Berlin-Taylor
You haven't given us enough information to help debug this. What command is initctl running? (I'm not familiar with which process supervisor that is) What logs have you looked into for an error? What steps have you already tried? -ash On Feb 24 2020, at 5:27 pm, Sasi kumar Deivasikamani wrote:

Re: [DISCUSS] AIP-31: Airflow functional DAG API

2020-02-24 Thread Jarek Potiuk
Ah yeah... I totally forgot about that :) (shame on me) ... But it does seem appropriate if I came to the same conclusion again looking from another angle :D J. On Mon, Feb 24, 2020 at 6:25 PM Gerard Casas Saez wrote: > Agree, I initially pitched the idea on the lineage thread and was > e

airflow webserver start and stops immediately.

2020-02-24 Thread Sasi kumar Deivasikamani
Hi, can anybody help me out? Why does the webserver go to the stop/waiting state immediately after starting? sudo initctl status airflow-webserver airflow-webserver start/running, process 4480 sudo initctl status airflow-webserver airflow-webserver stop/waiting Thanks, Sasi

Re: [DISCUSS] AIP-31: Airflow functional DAG API

2020-02-24 Thread Gerard Casas Saez
Agree, I initially pitched the idea on the lineage thread and was encouraged to pitch it separately. I would love to help figure out how to align these 2 projects better. Bolke - want to set up a call or how should we discuss this better? Would love to hear feedback on my proposal. Gerard Casas

Re: Big performance optimization of Scheduler - 10x faster , 2000+ fewer queries count

2020-02-24 Thread Evgeny Shulman
This is a really great improvement! Great job by everybody, we are really excited about this contribution! These changes make it easier for Airflow to support much more complex/large scale use cases in the future. Looking forward to more improvements like this one! * Huge thanks to friends from Pol

Re: Big performance optimization of Scheduler - 10x faster , 2000+ fewer queries count

2020-02-24 Thread Jarek Potiuk
Those are all great improvements Kamil! It would be great to have them reviewed, tested and merged for 2.0 ! J. On Mon, Feb 24, 2020 at 5:35 PM Kamil Breguła wrote: > Hello, > > Polidea [1] together with Databand [2] has taken steps to optimize > scheduler performance. > I made many changes l

Re: Big performance optimization of Scheduler - 10x faster , 2000+ fewer queries count

2020-02-24 Thread Tomasz Urbaszek
Thanks Kamil for the work! I've reviewed your PRs and everything looks good so I keep my fingers crossed for this optimization to be true ;) T. On Mon, Feb 24, 2020 at 5:35 PM Kamil Breguła wrote: > > Hello, > > Polidea [1] together with Databand [2] has taken steps to optimize > scheduler per

Big performance optimization of Scheduler - 10x faster , 2000+ fewer queries count

2020-02-24 Thread Kamil Breguła
Hello, Polidea [1] together with Databand [2] has taken steps to optimize scheduler performance. I made many changes last weekend: 1. [AIRFLOW-6856] Bulk fetch paused_dag_ids https://github.com/apache/airflow/pull/7476 2. [AIRFLOW-6857] Bulk sync DAGs https://github.com/apache/airflow/pull/7477 3
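For readers unfamiliar with the "bulk fetch" idea named in the PR titles above, here is a minimal sketch of query batching in general. It is not the code from the linked PRs: the `dag` table, its columns, and both helper functions are hypothetical, and SQLite is used only so the example runs on its own.

```python
import sqlite3

# Hypothetical schema, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE dag (dag_id TEXT PRIMARY KEY, is_paused INTEGER)")
conn.executemany("INSERT INTO dag VALUES (?, ?)", [("a", 0), ("b", 1), ("c", 1)])


def paused_one_by_one(conn, dag_ids):
    """Issue one query per DAG id: N round trips for N ids."""
    paused = set()
    for dag_id in dag_ids:
        row = conn.execute(
            "SELECT is_paused FROM dag WHERE dag_id = ?", (dag_id,)
        ).fetchone()
        if row and row[0]:
            paused.add(dag_id)
    return paused


def paused_bulk(conn, dag_ids):
    """Fetch all paused ids in a single query using an IN clause."""
    dag_ids = list(dag_ids)
    if not dag_ids:
        return set()
    placeholders = ",".join("?" * len(dag_ids))
    rows = conn.execute(
        f"SELECT dag_id FROM dag WHERE is_paused = 1 AND dag_id IN ({placeholders})",
        dag_ids,
    )
    return {row[0] for row in rows}
```

Both helpers return the same set; the second does it in one round trip to the database instead of one per DAG, which is where a lower query count would come from.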

Re: Airflow and Machine Learning

2020-02-24 Thread Ash Berlin-Taylor
> DAG state (currently stored directly in the DB) Can you expand on this point James? What is the problem or limitation here? And would those be solved by expanding on the APIs to allow this to be set by some external process? On Feb 24 2020, at 3:45 pm, James Meickle wrote: > I really agree w

Re: Airflow and Machine Learning

2020-02-24 Thread Ash Berlin-Taylor
To address one point: > stores git commits and retrieves DAG definitions as needed Is sometimes counter to the desired behaviour. For instance if there is a bug in your python operator or even your dag parameters you don't want it to re-run with the same version of the code it ran with last t

Re: Airflow and Machine Learning

2020-02-24 Thread James Meickle
I really agree with most of what was posted above but particularly love what Evgeny wrote about having a DAG API. As an end user, I would love to be able to provide different implementations of core DAG functionality, similar to how Executor can already be subclassed. Some key behavior points I ei

Re: MySQL version support for Airflow 5.6 ->? 5.7

2020-02-24 Thread Kaxil Naik
> > What do you think ? Should we move to 5.7 as a base with 2.0? Or should we > test both? Move to 5.7 On Mon, Feb 24, 2020 at 2:55 PM Ash Berlin-Taylor wrote: > I'm in favour of moving forward, possibly even to 8.0. But it seems that > Cloud SQL is tardy and hasn't supported a new major vers

Re: [DISCUSS] Reduce (remove?) automated imports in Airflow 2.0

2020-02-24 Thread Ash Berlin-Taylor
https://github.com/apache/airflow/pull/7517 has been merged now, so we have kept airflow.DAG working, just lazily loaded. -a On Feb 23 2020, at 10:53 pm, Kaxil Naik wrote: > Yay !! Nice suggestion Kamil, good work Ash > ᐧ > > On Sun, Feb 23, 2020 at 10:51 PM Ash Berlin-Taylor wrote: > > Thanks
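As a rough illustration of what "lazily loaded" can mean for a name exposed at package level, the sketch below uses a module-level `__getattr__` (PEP 562, Python 3.7+). Whether the merged PR uses exactly this mechanism is an assumption; `lazy_pkg`, `_LAZY_ATTRS`, and the choice of `OrderedDict` as the stand-in class are all hypothetical.

```python
import importlib
import types

# Hypothetical package object standing in for a real top-level package.
pkg = types.ModuleType("lazy_pkg")

# Attribute name -> module that actually defines it (illustrative mapping).
_LAZY_ATTRS = {"OrderedDict": "collections"}


def _lazy_getattr(name):
    """Import the defining module only when the attribute is first requested."""
    if name not in _LAZY_ATTRS:
        raise AttributeError(f"module 'lazy_pkg' has no attribute {name!r}")
    value = getattr(importlib.import_module(_LAZY_ATTRS[name]), name)
    setattr(pkg, name, value)  # cache so later lookups skip this hook
    return value


# PEP 562: a module's __getattr__ is consulted when normal lookup fails.
pkg.__getattr__ = _lazy_getattr
```

Accessing `pkg.OrderedDict` triggers the import on first use, so simply importing the package stays cheap while the old spelling keeps working.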

Re: MySQL version support for Airflow 5.6 ->? 5.7

2020-02-24 Thread Ash Berlin-Taylor
I'm in favour of moving forward, possibly even to 8.0. But it seems that Cloud SQL is tardy and hasn't supported a new major version even though it has been out for almost two years: https://cloud.google.com/sql/docs/db-versions I think given Cloud SQL now defaults to 5.7 for MySQL that 5.7 maki

task failed without running

2020-02-24 Thread heng gu
I have this dag with a BranchPythonOperator task kicking off many of 24 tasks, in this case, 4 tasks. 2 of the tasks were successful, the other two (register_YZ, register_ZY) failed without running (see the attached UI screenshots). There is no log for tasks register_YZ and register_ZY. I

Re: Internal APIs of Airflow

2020-02-24 Thread Jarek Potiuk
Ah yeah I had a good discussion today about unsuccessful (and abandoned) attempts to try to sandbox python in Python 2. You are probably right that it is not pythonic and likely we will never be able to fully "forbid" something. Indeed "forbidding" might be a bit too strong. But warnings and explic

Re: Internal APIs of Airflow

2020-02-24 Thread James Meickle
I think that trying to *forbid* this is really not Pythonic. A more appropriate way would be to have import paths ("from airflow.internals"), docstrings, and warnings (via the silenceable warnings module) indicating which APIs are "internal" (i.e., subject to change even in patch versions). That is
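The warnings-based approach suggested above might look roughly like the following. This is only a sketch: `InternalApiWarning`, the `internal_api` decorator, and `resolve_dag_state` are hypothetical names, not anything that exists in Airflow.

```python
import functools
import warnings


class InternalApiWarning(UserWarning):
    """Hypothetical category for APIs that may change even in patch versions."""


def internal_api(func):
    """Mark func as internal: warn on every call, but never block it."""

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        warnings.warn(
            f"{func.__name__} is an internal API and may change without notice",
            InternalApiWarning,
            stacklevel=2,  # attribute the warning to the caller's line
        )
        return func(*args, **kwargs)

    return wrapper


@internal_api
def resolve_dag_state(dag_id):
    # Hypothetical internal helper, used only to demonstrate the decorator.
    return f"state-of-{dag_id}"
```

Callers who knowingly depend on an internal function can opt out with `warnings.simplefilter("ignore", InternalApiWarning)`, which keeps the mechanism advisory rather than forbidding.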

Internal APIs of Airflow

2020-02-24 Thread Jarek Potiuk
I would like to open another discussion :) .. Following recent discussions about the "DAG" being an important internal API of Airflow but also following this post: https://medium.com/maisonsdumonde/road-to-add-form-for-airflows-dag-1dcf2e7583ef I think we should consider marking certain internal p

MySQL version support for Airflow 5.6 ->? 5.7

2020-02-24 Thread Jarek Potiuk
Currently we are running our tests for MySQL using both client and server of mysql in 5.6 version and I wonder how we should proceed with it. Is there any reason why we would stick to 5.6 and not use 5.7 (or maybe we should do both)? While 5.6 is still "alive", it was released 7 years ago and