Re: dag file processing times

2017-04-24 Thread Bolke de Bruin
Cross-platform is nice, but OSX and Windows are imho not production systems for Airflow. We have too many idiosyncrasies tied to Linux (cgroups, etc.) to warrant cross-platform support. Bolke. > On 25 Apr 2017, at 08:10, Gerard Toonstra wrote: > > Hey, > > Wow, many responses...

Re: dag file processing times

2017-04-24 Thread Gerard Toonstra
Hey, Wow, many responses... Trying to keep things simple: The easiest is to reduce scope to stop processing files that are currently not active, i.e. don't have a currently active dagrun. Without an active dagrun, the processormanager would put a (datetime, filename) tuple on a deque or priority

Re: dag file processing times

2017-04-24 Thread Maxime Beauchemin
[I wrote this while offline without having received the full conversation, sorry if it's a bit off and looks like it's disregarding previous comments] On Mon, Apr 24, 2017 at 4:09 PM, Maxime Beauchemin < maximebeauche...@gmail.com> wrote: > With configuration as code, you can't really know whethe

Re: dag file processing times

2017-04-24 Thread Maxime Beauchemin
With configuration as code, you can't really know whether the DAG definition has changed based on whether the module was altered. This python module could be importing other modules that have been changed, could have read a config file somewhere on the drive that might have changed, or read from a

Re: dag file processing times

2017-04-24 Thread Alex Guziel
It wouldn't really be serialization. You would still need to watch all the dependent code unless you wanted a continuous parse going on. On Mon, Apr 24, 2017 at 3:19 PM, Bolke de Bruin wrote: > That would be close to serialization which you could do with marshmallow > (which works better than pi

Re: dag file processing times

2017-04-24 Thread Bolke de Bruin
That would be close to serialization which you could do with marshmallow (which works better than pickle). B. Sent from my iPhone > On 25 Apr 2017, at 00:07, Alex Guziel wrote: > > You can also use reflection in Python to read the modules all the way down. > > On Mon, Apr 24, 2017 at 3:05

Re: dag file processing times

2017-04-24 Thread Alex Guziel
You can also use reflection in Python to read the modules all the way down. On Mon, Apr 24, 2017 at 3:05 PM, Dan Davydov wrote: > Was talking with Alex about the DB case offline, for those we could support > a force refresh arg with an interval param. > > Manifests would need to be hierarchical bu
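
As a rough illustration of following a DAG file's imports "all the way down", here is a minimal sketch. It uses static parsing with `ast` instead of runtime reflection, which is an assumption made so that no DAG code has to be executed. `local_imports` and its `root` parameter are hypothetical names, not part of Airflow, and the sketch only follows absolute imports that resolve to `.py` files directly under `root`.

```python
import ast
import os


def local_imports(path, root):
    """Return the DAG file plus every file under `root` it imports, transitively.

    Hypothetical helper, not an Airflow API. Only absolute imports that map to
    `<root>/<package>/<module>.py` are followed; everything else is ignored.
    """
    root = os.path.abspath(root)
    seen = set()
    stack = [os.path.abspath(path)]
    while stack:
        current = stack.pop()
        if current in seen:
            continue
        seen.add(current)
        with open(current) as handle:
            tree = ast.parse(handle.read(), filename=current)
        for node in ast.walk(tree):
            if isinstance(node, ast.Import):
                names = [alias.name for alias in node.names]
            elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
                names = [node.module]
            else:
                continue
            for name in names:
                # Map "pkg.mod" to "<root>/pkg/mod.py" and keep it only if it exists.
                candidate = os.path.join(root, *name.split(".")) + ".py"
                if os.path.isfile(candidate):
                    stack.append(candidate)
    return seen
```

The resulting set is the list of files to watch for one DAG file. It does not cover configuration read from disk or from a database at parse time, which is the limitation raised elsewhere in this thread.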

Re: dag file processing times

2017-04-24 Thread Dan Davydov
Was talking with Alex about the DB case offline; for those we could support a force refresh arg with an interval param. Manifests would need to be hierarchical, but I feel like it would inevitably spin out into a full-blown build system. On Mon, Apr 24, 2017 at 3:02 PM, Arthur Wiedmer wrote: > Wha

Re: dag file processing times

2017-04-24 Thread Arthur Wiedmer
What if the DAG actually depends on configuration that only exists in a database and is retrieved by the Python code generating the DAG? Just asking because we have this case in production here. It is slowly changing, so still fits within the Airflow framework, but you cannot just watch a file...

Re: dag file processing times

2017-04-24 Thread Bolke de Bruin
Inotify can work without a daemon. Just fire a call to the API when a file changes. Just a few lines in bash. If you bundle your dependencies in a zip you should be fine with the above. Or if we start using manifests that list the files that are needed in a DAG... Sent from my iPhone > On 24

Re: dag file processing times

2017-04-24 Thread Dan Davydov
One idea to solve this is to use a daemon that uses inotify to watch for changes in files and then reprocesses just those files. The hard part is that, without any kind of dependency/build system for DAGs, it can be hard to tell which DAGs depend on which files. On Mon, Apr 24, 2017 at 1:21 PM, Gerard To
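
The "reprocess only what changed" idea can be sketched without inotify. The example below swaps in plain modification-time polling from the standard library, an assumption made to keep it dependency-free and portable; it is not how Airflow's file processor works. `snapshot` and `changed_files` are hypothetical names.

```python
import os


def snapshot(paths):
    """Map each existing path to its last-modified time in nanoseconds."""
    return {p: os.stat(p).st_mtime_ns for p in paths if os.path.exists(p)}


def changed_files(before, after):
    """Return paths that were added, removed, or modified between two snapshots."""
    return sorted(
        p for p in set(before) | set(after) if before.get(p) != after.get(p)
    )
```

A scheduler loop would take a snapshot on each pass, compare it with the previous one, and hand only the returned paths to the file processor. This still says nothing about which DAGs depend on which files, which is the hard part described above.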

dag file processing times

2017-04-24 Thread Gerard Toonstra
Hey, I've seen some people complain about DAG file processing times. An issue was raised about this today: https://issues.apache.org/jira/browse/AIRFLOW-1139 I attempted to provide a good explanation of what's going on. Feel free to validate and comment. I'm noticing that the file processor is a

Re: [VOTE] Release Airflow 1.8.1 based on Airflow 1.8.1 RC0

2017-04-24 Thread Bolke de Bruin
> On 23 Apr 2017, at 09:17, Bolke de Bruin wrote: > > > > Sent from my iPhone > >> On 23 Apr 2017, at 03:46, Hitesh Shah wrote: >> >> On Fri, Apr 21, 2017 at 8:19 AM, Chris Riccomini >> wrote: >> >>> Version in pkg-info has an rc0 notation. It should just be >>> 1.8.1-incubating. >>

Re: [VOTE] Release Airflow 1.8.1 RC1

2017-04-24 Thread Sumit Maheshwari
+1 (binding) On Mon, Apr 24, 2017 at 10:09 PM, Chris Riccomini wrote: > Dear All, > > I've made Airflow 1.8.1 RC1 available at: > https://dist.apache.org/repos/dist/dev/incubator/airflow, public keys are > available at https://dist.apache.org/repos/dist/release/incubator/airflow. > > New issue

[VOTE] Release Airflow 1.8.1 RC1

2017-04-24 Thread Chris Riccomini
Dear All, I've made Airflow 1.8.1 RC1 available at: https://dist.apache.org/repos/dist/dev/incubator/airflow, public keys are available at https://dist.apache.org/repos/dist/release/incubator/airflow. New issues fixed in 1.8.1 RC1: [AIRFLOW-1138] Add licenses to files in scripts directory [AIRFL

[CANCEL] [VOTE] Release Airflow 1.8.1 based on Airflow 1.8.1 RC0

2017-04-24 Thread Chris Riccomini
Canceling this vote due to various concerns and bug fixes listed below. Will start an RC1 vote shortly. On Sun, Apr 23, 2017 at 12:17 AM, Bolke de Bruin wrote: > > > Sent from my iPhone > > > On 23 Apr 2017, at 03:46, Hitesh Shah wrote: > > > > On Fri, Apr 21, 2017 at 8:19 AM, Chris Riccomini