Re: execution_date - can we stop the confusion?

2018-09-26 Thread George Leslie-Waksman
This comes up a lot. I've seen it on this mailing list multiple times and it's something that I have to explicitly call out to every single person that I've helped train up on Airflow. If we take a moment to set aside why things are the way they are, what the documentation says, and how

Re: execution_date - can we stop the confusion?

2018-09-26 Thread Brian Greene
It took a minute to grok, but in the larger context of how Airflow works it makes perfect sense the way it is. A change so fundamentally breaking to every DAG in existence should bring a comparable benefit. Beyond avoiding teaching a concept you disagree with, what benefits does the

Re: hooks & operators improvement proposal

2018-09-26 Thread Jeff Payne
Ah, OK. Thanks for the clarification.

Re: hooks & operators improvement proposal

2018-09-26 Thread Daniel Cohen
Hi Jeff, it seems that I was a bit unclear. The ETL DAG spans multiple tasks and usually looks like kickoff >> source_to_staging >> staging_to_warehouse >> warehouse_post_process. I'm not proposing changes to operators, they are great; what I am proposing is to borrow the same concept to the

Re: hooks & operators improvement proposal

2018-09-26 Thread Jeff Payne
So, in your scenario, the ETL pipeline happens inside the single operator/task? If so, would it not make sense for the pipeline to span multiple tasks and provide a standard set of functions/decorators/etc for defining the input/output to/from each task? That way you would leverage the ability

Re: execution_date - can we stop the confusion?

2018-09-26 Thread Kaxil Naik
This has been clearly documented, as Bolke stated. It is an integral part of Airflow, and a user learning Airflow needs to learn this. If you think about it from an ETL perspective, it makes complete sense. Also, it would be good if you could use your real name rather than "airflowuser"; your preference, though. Also,

hooks & operators improvement proposal

2018-09-26 Thread Daniel Cohen
Some thoughts about operators / hooks: Operators are composable; a typical ETL flow looks like `kickoff >> source_to_staging >> staging_to_warehouse >> warehouse_post_process`, where tasks use shared state (like S3) or naming conventions to continue work where the upstream task left off. hooks on the
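The `>>` chaining mentioned in this message can be illustrated with a small standalone sketch. The `Task` class and `linear_order` helper below are hypothetical stand-ins written for this illustration, not Airflow's own classes; the only behaviour assumed is that `a >> b` records `b` as downstream of `a` and returns `b`, which is what lets the expression chain left to right.

```python
class Task:
    """Hypothetical stand-in for an operator instance; not Airflow's class."""

    def __init__(self, task_id):
        self.task_id = task_id
        self.downstream = []

    def __rshift__(self, other):
        # a >> b records b as downstream of a and returns b,
        # so a >> b >> c links a -> b and then b -> c.
        self.downstream.append(other)
        return other


def linear_order(first):
    """Walk a strictly linear chain and return task ids in run order."""
    order, current = [], first
    while current is not None:
        order.append(current.task_id)
        current = current.downstream[0] if current.downstream else None
    return order


kickoff = Task("kickoff")
source_to_staging = Task("source_to_staging")
staging_to_warehouse = Task("staging_to_warehouse")
warehouse_post_process = Task("warehouse_post_process")

# Same shape as the flow quoted in the message.
kickoff >> source_to_staging >> staging_to_warehouse >> warehouse_post_process

print(linear_order(kickoff))
```

Nothing in the sketch passes data between tasks; as the message notes, that handoff relies on shared state or naming conventions outside the dependency graph.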

Re: execution_date - can we stop the confusion?

2018-09-26 Thread Maxime Beauchemin
I think if you have a functional mindset (as in "functional data engineering") as opposed to a cron mindset, using the left bound of the time interval makes a lot of sense.
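The "left bound" convention can be made concrete with plain date arithmetic. This is a minimal sketch that does not use Airflow itself; `run_window` is a hypothetical helper, and the daily interval and dates are assumptions chosen for illustration.

```python
from datetime import datetime, timedelta


def run_window(execution_date, schedule_interval):
    """Return (interval_start, interval_end) for one scheduled run.

    execution_date labels the left bound of the interval; the run can
    only start once the interval has closed, at or after interval_end.
    """
    interval_start = execution_date
    interval_end = execution_date + schedule_interval
    return interval_start, interval_end


# A daily run labelled 2018-09-25 covers that whole day ...
start, end = run_window(datetime(2018, 9, 25), timedelta(days=1))
# ... so the earliest moment it can be triggered is the end of the interval.
earliest_trigger = end

print(start.isoformat(), "->", earliest_trigger.isoformat())
```

Read this way, the label names the data the run processes rather than the wall-clock time at which it starts, which is where the cron intuition and the scheduler's behaviour diverge.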

Re: execution_date - can we stop the confusion?

2018-09-26 Thread Sam Elamin
Hi Bolke, speaking as a consultant who is constantly training other teams on how to use Airflow, I do frequently see this confusion. Another one is how the batch_date is always batch_date + interval, or as the docs make quite clear, "*Let’s Repeat That* The scheduler runs your job one

Re: execution_date - can we stop the confusion?

2018-09-26 Thread Bolke de Bruin
I don't think this makes sense, and I don't think that anyone has had a real issue with this. Execution date has been clearly documented and is part of the core principles of Airflow. Renaming will create more confusion. Please note that I do think that as an anonymous user you cannot speak for any

execution_date - can we stop the confusion?

2018-09-26 Thread airflowuser
One of the most annoying, hard-to-understand and counterintuitive things in Airflow is the execution_date behavior. I assume that any new Airflow user has been struggling with it. The amount of questions with answers referring to: https://airflow.apache.org/scheduler.html?scheduling-triggers is

SqlAlchemy Pool config parameters to minimize connectivity issue impact

2018-09-26 Thread ramandumcs
Hi All, We sometimes observe DAG tasks failing because of connectivity issues with the MySQL server. Are there any recommended settings for the MySQL pool-related parameters, such as sql_alchemy_pool_size = 5 and sql_alchemy_pool_recycle = 3600, to minimise the impact of connectivity issues?
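One way to reason about the two values in the question is to check them against the server's idle timeout. The sketch below is not an official recommendation and does not read Airflow's configuration; `engine_pool_kwargs` is a hypothetical helper. It assumes MySQL's default `wait_timeout` of 28800 seconds and builds keyword arguments (`pool_size`, `pool_recycle`, `pool_pre_ping`) that SQLAlchemy's `create_engine` accepts; `pool_pre_ping` requires SQLAlchemy 1.2 or later.

```python
MYSQL_DEFAULT_WAIT_TIMEOUT = 28800  # seconds; assumed server default


def engine_pool_kwargs(pool_size=5, pool_recycle=3600,
                       server_wait_timeout=MYSQL_DEFAULT_WAIT_TIMEOUT):
    """Build pool keyword arguments for sqlalchemy.create_engine.

    Connections should be recycled before the server closes them for
    being idle, so pool_recycle must stay below the server's wait_timeout.
    """
    if pool_recycle >= server_wait_timeout:
        raise ValueError(
            "pool_recycle (%d s) should be lower than wait_timeout (%d s)"
            % (pool_recycle, server_wait_timeout)
        )
    return {
        "pool_size": pool_size,
        "pool_recycle": pool_recycle,
        # Test each connection on checkout and replace it if it is dead.
        "pool_pre_ping": True,
    }


kwargs = engine_pool_kwargs(pool_size=5, pool_recycle=3600)
print(kwargs)
```

The resulting dictionary could be passed as `create_engine(url, **kwargs)`. If the server or a proxy in between uses a shorter idle timeout than the default assumed here, `pool_recycle` would need to be lowered to match.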