Re: Question regarding data usage in the DAG itself

2020-10-04 Thread Tomasz Urbaszek
Chen, take a look at the `processor_poll_interval` and `min_file_process_interval` options in the Airflow configuration. Still, I would strongly recommend removing from your DAGs any top-level code that gets executed on every parse. Cheers, Tomek On Sun, Oct 4, 2020 at 6:45 PM Chen Michaeli wrote: > > Hi, a quick fo
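
For illustration, here is a minimal sketch of how those two options could be written out as an `airflow.cfg` fragment. It assumes both options live in the `[scheduler]` section, as they did in the Airflow releases current at the time of this thread; the helper name `scheduler_overrides` and the values are hypothetical, chosen only to show the shape of the setting.

```python
import configparser
import io


def scheduler_overrides(parse_every_minutes: float) -> str:
    """Render a hypothetical [scheduler] fragment for airflow.cfg."""
    config = configparser.ConfigParser()
    config["scheduler"] = {
        # Minimum number of seconds between two parses of the same DAG file.
        "min_file_process_interval": str(int(parse_every_minutes * 60)),
        # Seconds the scheduler sleeps between loops (illustrative value).
        "processor_poll_interval": "1",
    }
    buffer = io.StringIO()
    config.write(buffer)
    return buffer.getvalue()


# Ask for each DAG file to be re-parsed at most once every 5 minutes.
print(scheduler_overrides(5))
```

The same settings can usually be supplied through environment variables of the form `AIRFLOW__SCHEDULER__MIN_FILE_PROCESS_INTERVAL`. Note that this is a global setting, not a per-DAG one, and option names may differ in other Airflow versions, so check the configuration reference for the release you run.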

Re: Question regarding data usage in the DAG itself

2020-10-04 Thread Chen Michaeli
Hi, a quick follow-up. Is there a parameter I can configure to alter that behavior? Say I want a specific DAG, or all DAGs, to be parsed every X minutes instead of the default few seconds? Thanks again :) On Tue, Sep 22, 2020, 20:15, Tomasz Urbaszek <turbas...@apache.org> wrote: > The DAG i

Re: Question regarding data usage in the DAG itself

2020-09-22 Thread Chen Michaeli
Oh, I thought the DAG was parsed only prior to execution. Thank you so much! :) On Tue, Sep 22, 2020, 20:15, Tomasz Urbaszek <turbas...@apache.org> wrote: > The DAG is parsed every few seconds (by scheduler). It means that any > top-level code is executed every few seconds. So if you will

Re: Question regarding data usage in the DAG itself

2020-09-22 Thread Tomasz Urbaszek
The DAG is parsed every few seconds (by the scheduler). That means any top-level code is executed every few seconds. So if you query an external API or database at the DAG level (not in an operator), the request will be sent quite often, and that's definitely not the expected behavior :)
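
The difference can be sketched in plain Python, without Airflow itself. Everything here is a hypothetical stand-in: `fetch_rows` plays the external API or database call, each `dag_file_*` function models one parse of a DAG file, and the returned `task` models the callable an operator would run. In a real DAG the deferred version would typically be passed as the `python_callable` of a `PythonOperator`.

```python
request_count = 0


def fetch_rows():
    """Hypothetical stand-in for a request to an external API or database."""
    global request_count
    request_count += 1
    return ["row-1", "row-2"]


def dag_file_with_top_level_request():
    """Models a DAG file that fetches data at the top level."""
    rows = fetch_rows()  # runs on every parse

    def task():
        return len(rows)

    return task


def dag_file_with_deferred_request():
    """Models a DAG file that fetches data only inside the task."""

    def task():
        return len(fetch_rows())  # runs only when the task executes

    return task


def count_requests(dag_file, parses):
    """Parse the file `parses` times (at least once), run the task once."""
    global request_count
    request_count = 0
    for _ in range(parses):
        task = dag_file()
    task()
    return request_count


print(count_requests(dag_file_with_top_level_request, parses=10))  # 10
print(count_requests(dag_file_with_deferred_request, parses=10))   # 1
```

With the request at the top level, the number of calls grows with the number of parses; with the request inside the task, it depends only on how often the task actually runs.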

Question regarding data usage in the DAG itself

2020-09-21 Thread Chen Michaeli
Hello, I am using Apache Airflow for fun and to gain experience, and it is great! I hope this is the right address for questions; please correct me if I'm wrong. I was wondering why I shouldn't let the DAG itself do any data gathering. For example, and for the sake of simplicity, I have a pipeline