Re: Airflow 2.0

2016-11-21 Thread Gerard Toonstra
More ideas: - An "airflow" plugin at the moment is more of an extension; operators, hooks, macros. Consider an additional plugin API + default implementation for code inside airflow that has a cross-cutting concern, like: * We start to use datadog for heavier monitoring of what's going on

Re: Dynamic creation of DAG

2016-11-21 Thread Maxime Beauchemin
I just added a bit of information about dynamic DAG creation here: https://github.com/apache/incubator-airflow/pull/1889/files#diff-c6f0a0722c6a2f86277535d7bcec7f8cR162 Let me know if it helps. Max On Mon, Nov 21, 2016 at 2:58 AM, Deepak Kumar Malladi wrote: > Hi, > > I want to dynamically cre

Re: Airflow 2.0

2016-11-21 Thread siddharth anand
1) The restart should not be needed, but if folks are reporting it, I'm curious what the problem might be. If you are running on master, then you may not be aware of the min_file_process_interval setting. [scheduler] min_file_process_interval = 0 max_threads = 4 2) Yes.. security is not there. I

Re: Airflow 2.0

2016-11-21 Thread Boris Tyukin
I am still deciding between Airflow and Oozie for our brand new Hadoop project, but here are a few things that I did not like during my limited testing: 1) pain with scheduler/webserver restarts - things magically begin working after restart or disappear (like DAG tasks that are no longer part of DA

Re: Airflow 2.0

2016-11-21 Thread siddharth anand
Also, a survey will be a little less noisy and easier to summarize than +1s in this email thread. -s (Sid) On Mon, Nov 21, 2016 at 2:25 PM, siddharth anand wrote: > Sergei, > These are some great ideas -- I would classify at least half of them as > pain points. > > Folks! > I suggest people (on

Re: Airflow 2.0

2016-11-21 Thread siddharth anand
Sergei, These are some great ideas -- I would classify at least half of them as pain points. Folks! I suggest people (on the dev list) keep feeding this thread at least for the next 2 days. I can then float a survey based on these ideas and give the community a chance to vote so we can prioritize

Re: Airflow 2.0

2016-11-21 Thread Gerard Toonstra
+1 on driving everything through a REST API including the UI. This unifies the access to the scheduler and increases stability. Consider running a very small webserver (node.js + socket.io), which enables airflow to communicate scheduler events as they happen to anything that connects to it throug

Re: Airflow 2.0

2016-11-21 Thread Arunprasad Venkatraman
> Add FK to dag_run to the task_instance table on Postgres so that task_instances can be uniquely attributed to dag runs. > Ensure scheduler can be run continuously without needing restarts. > Ensure scheduler can handle tens of thousands of active workflows +1 We are planning to run around 40

Re: Airflow 2.0

2016-11-21 Thread Chris Riccomini
> Ensure scheduler can be run continuously without needing restarts +1 On Mon, Nov 21, 2016 at 5:25 AM, David Batista wrote: > A small request, which might be handy. > > Having the possibility to select multiple tasks and mark them as > Success/Clear/etc. > > Allow the UI to select individual ta

Dynamic creation of DAG

2016-11-21 Thread Deepak Kumar Malladi
Hi, I want to dynamically create a DAG at run time. I tried the snippet given in the documentation, but it didn't work for me. Any pointers on how to trigger DAGs which aren't actually present in the DAG folder but are created through code execution (dynamically created)? Thanks & Regards, Deepak

Re: Airflow 2.0

2016-11-21 Thread David Batista
A small request, which might be handy. Having the possibility to select multiple tasks and mark them as Success/Clear/etc. Allow the UI to select individual tasks (i.e., inside the Tree View) and then have a button to mark them as Success/Clear/etc. On 21 November 2016 at 14:22, Sergei Iakhnin

Re: Airflow 2.0

2016-11-21 Thread Sergei Iakhnin
I've been running Airflow on 1500 cores in the context of scientific workflows for the past year and a half. Features that would be important to me for 2.0: - Add FK to dag_run to the task_instance table on Postgres so that task_instances can be uniquely attributed to dag runs. - Ensure scheduler

RE: Airflow 2.0

2016-11-21 Thread Ryabchuk, Pavlo
-1. We rely heavily on data profiling as a pipeline health monitoring tool -Original Message- From: Chris Riccomini [mailto:criccom...@apache.org] Sent: Saturday, November 19, 2016 1:57 AM To: dev@airflow.incubator.apache.org Subject: Re: Airflow 2.0 > RIP out the charting applicatio

Re: Airflow 2.0

2016-11-21 Thread twinkle
Hi, Just as we have an admin panel where we can configure the database connections and query them, some information should be provided based on the executor backend chosen. For example, with an Airflow + RabbitMQ + Celery backend, if RabbitMQ goes down, it keeps on showing the message that task ha