Re: Airflow and Machine Learning

2020-02-19 Thread Evgeny Shulman
Hey Everybody (Fully agreed on Dan's post. These are the main pain points we see and are trying to fix. Here is our reply on the thread topic) We have numerous ML engineers that use our open source project (DBND) with Airflow for their everyday work. We help them create and monitor ML/DATA pipelines

Re: Airflow Celery Executor - Some Task instances turning to failed without running and exporting log

2020-02-19 Thread Mehmet Ersoy
Hi Tomasz, What do you mean when you say "The mentioned DAG is missing"? I did add the DAG, but Gmail may have rejected that .py file. I have attached the file again as .txt. Thanks, Mehmet. On Wed, 19 Feb 2020 at 15:20, Mehmet Ersoy wrote: > I'm using 1.10.6 version of Airflow. > Yes

Re: Airflow and Machine Learning

2020-02-19 Thread Soma S Dhavala
The daggit design doc outlines the vision of what we were looking for in terms of an ML-As-A-Service platform. Some ideas on making the apps composable are here

Re: Airflow and Machine Learning

2020-02-19 Thread Soma S Dhavala
At project sunbird, we built daggit, an open source ML-As-A-Service platform on top of Airflow. While Airflow and other ML platforms have taken a *code-as-configuration* approach, we like to have users declaratively specify their ML A

Re: Airflow and Machine Learning

2020-02-19 Thread Daniel Imberman
Thank you everyone for this feedback! I will organize these (and other) ideas and look forward to the conversation it starts! On Wed, Feb 19, 2020 at 9:54 AM, Ben Tallman wrote: I don’t really have time to unpack a lot here, but we use airflow to extensively orchestrate Databricks Notebook base

Custom DAG throws OOops

2020-02-19 Thread Shubhada
Hi Team, Here is the link to my Stack Overflow question. I appreciate your help in advance. https://stackoverflow.com/questions/60307169/airflow-dag-throws-recursionerror-triggered-via-web-console After this error, if I still keep the scheduler running, I get the following error: self.callHandlers(record) Fil

RE: How is DAG's default_arg parameter forwarded to operators ?

2020-02-19 Thread Shaw, Damian P.
I saw that PR today and it made me realize I've not been using apply_defaults in much of my code! Look forward to this being an automatic operation via a MetaClass, great work :) ! In general I wish Airflow would more heavily leverage MetaClasses and __init_subclass__ for stuff like this, having
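
The idea raised in this message, applying defaults automatically through __init_subclass__ instead of a per-class decorator, can be illustrated with a small sketch. This is not the code from the linked PR and not Airflow's API: ToyBaseOperator, ToyBashOperator and _merge_defaults are hypothetical names, and the sketch assumes operators are always constructed with keyword arguments.

```python
import functools


def _merge_defaults(init):
    """Wrap an __init__ so that entries of default_args fill in missing kwargs.

    Hypothetical helper for illustration only. It passes every default through,
    so a default_args key the __init__ chain does not accept raises TypeError.
    """
    @functools.wraps(init)
    def wrapper(self, **kwargs):
        merged = dict(kwargs.pop("default_args", None) or {})
        merged.update(kwargs)  # explicitly passed keyword arguments win
        return init(self, **merged)

    wrapper._defaults_applied = True  # marker to avoid wrapping twice
    return wrapper


class ToyBaseOperator:
    """Toy base class; subclasses get their __init__ wrapped automatically."""

    def __init_subclass__(cls, **kwargs):
        super().__init_subclass__(**kwargs)
        # Only wrap an __init__ defined directly on the new subclass.
        init = cls.__dict__.get("__init__")
        if init is not None and not getattr(init, "_defaults_applied", False):
            cls.__init__ = _merge_defaults(init)

    def __init__(self, task_id, retries=0):
        self.task_id = task_id
        self.retries = retries


class ToyBashOperator(ToyBaseOperator):
    # No decorator needed here: __init_subclass__ already wrapped this method.
    def __init__(self, bash_command, **kwargs):
        super().__init__(**kwargs)
        self.bash_command = bash_command


op = ToyBashOperator(task_id="t1", bash_command="echo hi",
                     default_args={"retries": 2})
```

The design point is that the subclass author cannot forget the decorator, because the base class installs the wrapper when the subclass is created.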

Re: How is DAG's default_arg parameter forwarded to operators ?

2020-02-19 Thread Kamil Breguła
Hello, You may also be interested in this PR: https://github.com/apache/airflow/pull/7450 Best regards, Kamil On Wed, Feb 19, 2020 at 4:57 PM Shaw, Damian P. wrote: > > I'm not an expert of the Airflow code, but in 1.10.2 I notice the decorator > on the __init__ of the BaseOperator named "appl

Re: Airflow and Machine Learning

2020-02-19 Thread Ben Tallman
I don’t really have time to unpack a lot here, but we use airflow to extensively orchestrate Databricks Notebook based jobs. To date, we haven’t really exposed the notebook visualizations in the Airflow UI, but instead provide deep links to the job output. We spent a not insignificant amount of

Re: Airflow and Machine Learning

2020-02-19 Thread Maxime Beauchemin
I'd have a lot of thoughts to unpack here, but top of mind is a deeper integration with [jupyter] notebooks and/or hosted notebooks-type systems. Notebooks [with papermill ] can be parameterized predictably, and notebook files provide rich log outputs (organize

Re: Airflow and Machine Learning

2020-02-19 Thread Dan Davydov
Twitter uses Airflow primarily for ML, to create automated pipelines for retraining data, but also for more ad-hoc training jobs. The biggest gaps are on the experimentation side. It takes too long for a new user to set up and run a pipeline and then iterate on it. This problem is a bit more uniqu

RE: How is DAG's default_arg parameter forwarded to operators ?

2020-02-19 Thread Shaw, Damian P.
I'm not an expert of the Airflow code, but in 1.10.2 I notice the decorator on the __init__ of the BaseOperator named "apply_defaults": https://github.com/apache/airflow/blob/1.10.2/airflow/models.py#L2472 Which is located here: https://github.com/apache/airflow/blob/1.10.2/airflow/utils/decorato

Re: Airflow and Machine Learning

2020-02-19 Thread Germain Tanguy
Hello Daniel, In my company we use Airflow to update our ML models and to predict. As we use kubernetesOperator to trigger jobs, all ML DAGs are similar, and ML/data science engineers can reuse a template and choose which type of machine they need (highcpu, highmem, GPU or not, etc.). We have a p

Re: How is DAG's default_arg parameter forwarded to operators ?

2020-02-19 Thread Ash Berlin-Taylor
https://github.com/apache/airflow/blob/175a1604638016b0a663711cc584496c2fdcd828/airflow/utils/decorators.py#L30-L92 Or the specific version from 1.10.2 https://github.com/apache/airflow/blob/1.10.2/airflow/utils/decorators.py#L37-L100 On Feb 19 2020, at 3:49 pm, Massy Bourennani wrote: > Hello al

How is DAG's default_arg parameter forwarded to operators ?

2020-02-19 Thread Massy Bourennani
Hello all, I'm searching through the Git repo but I couldn't find the code responsible for forwarding a DAG's default_args to its operators. I'm using Airflow 1.10.2. Many thanks, Massy
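
The replies in this thread point to the apply_defaults decorator in airflow/utils/decorators.py. As a rough illustration of how such a decorator can forward default_args into an operator's constructor, here is a minimal, self-contained sketch. It is a simplified stand-in, not Airflow's actual implementation: ToyDAG and ToyOperator are hypothetical classes, and the merge order shown (DAG-level defaults, then a task-level default_args dict, then explicit keyword arguments) is an assumption made for the example.

```python
import functools
import inspect


def apply_defaults(func):
    """Simplified stand-in for a decorator in the style of apply_defaults."""
    sig = inspect.signature(func)
    # Names of the constructor's own parameters, excluding self.
    param_names = [
        name for name, p in sig.parameters.items()
        if name != "self"
        and p.kind in (p.POSITIONAL_OR_KEYWORD, p.KEYWORD_ONLY)
    ]

    @functools.wraps(func)
    def wrapper(self, **kwargs):
        dag = kwargs.get("dag")
        # Start from the DAG-level defaults, then let a task-level
        # default_args dict override them (assumed order for this sketch).
        defaults = dict(getattr(dag, "default_args", None) or {})
        defaults.update(kwargs.get("default_args") or {})
        # Fill in only the parameters the caller did not pass explicitly.
        for name in param_names:
            if name not in kwargs and name in defaults:
                kwargs[name] = defaults[name]
        return func(self, **kwargs)

    return wrapper


class ToyDAG:
    """Hypothetical DAG that only stores its default_args."""

    def __init__(self, dag_id, default_args=None):
        self.dag_id = dag_id
        self.default_args = default_args or {}


class ToyOperator:
    """Hypothetical operator whose constructor is wrapped by the decorator."""

    @apply_defaults
    def __init__(self, task_id, owner="airflow", retries=0,
                 dag=None, default_args=None):
        self.task_id = task_id
        self.owner = owner
        self.retries = retries
        self.dag = dag


dag = ToyDAG("example", default_args={"owner": "massy", "retries": 3})
inherited = ToyOperator(task_id="a", dag=dag)              # takes both defaults
overridden = ToyOperator(task_id="b", dag=dag, retries=1)  # explicit value wins
```

In this sketch an operator only picks up a default when the key matches one of its own constructor parameters, so unrelated entries in default_args are ignored rather than raising an error.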

Airflow and Machine Learning

2020-02-19 Thread Daniel Imberman
Hello everyone! I’m working on a few proposals to make Apache Airflow more friendly for ML/Data science use-cases, and I wanted to reach out in hopes of hearing from people that are using/wish to use Airflow for ML. If you have any opinions on the subject, I’d love to hear what you’re all worki

Re: [PROPOSAL] Approach for releasing the backported "providers" packages

2020-02-19 Thread Jarek Potiuk
And just to clarify - I do not think we should make "massive" releases of all providers. Ash - you are completely right we should only release what's changed AND when it is tested. And it can all be automated so that the overhead will be rather small. And in case it is your concern - together with

Re: [PROPOSAL] Approach for releasing the backported "providers" packages

2020-02-19 Thread Jarek Potiuk
> To address point 1: I would favour individual, non-cal ver releases of the > providers, and that way we don't have to release the providers that don't > change. > Agree with releasing only what's needed. I still think CalVer is good as we can release (for example) GCP packages several times if w

Airflow Celery Executor - Adjustment Airflow roles as Linux service (Worker, Scheduler, Web Server)

2020-02-19 Thread Mehmet Ersoy
Hello friends, Is there a reliable and consistent method for running Airflow roles from a virtual environment as Linux services? There are many methods on blogs, but most of them do not cover virtual environments specifically. Has anyone here done this? What do you use when you

Re: [PROPOSAL] Approach for releasing the backported "providers" packages

2020-02-19 Thread Jarek Potiuk
> > 1. So instead of releasing one package, as stated in the wiki, we're going > to release over 50 provider packages > (apache-airflow-providers-*-*-1.10-.MM.DD)? If I understand > it correctly. That sounds to me like a lot of work to release. > The proposal is to release only the packages th

Re: [VOTE] Release process for backported (AIP-21) package

2020-02-19 Thread Ash Berlin-Taylor
[DISCUSS] all the things! \o/ I guess I wasn't clear either what I was voting on - was this a vote on the process, or the specific change? Probably best it was cancelled for now then :) -a On Feb 19 2020, at 12:37 pm, Jarek Potiuk wrote: > I think it's something that we can discuss (joking - let

Re: [VOTE] Release process for backported (AIP-21) package

2020-02-19 Thread Jarek Potiuk
I think it's something that we can discuss (joking - let's not do it ;) ) According to https://www.apache.org/foundation/voting.html#votes-on-code-modification, -1 is a veto on code modifications. "A code-modification proposal may be stopped dead in its tracks by a -1 vote by a qualified voter. T

Re: [PROPOSAL] Approach for releasing the backported "providers" packages

2020-02-19 Thread Ash Berlin-Taylor
To address point 1: I would favour individual, non-cal ver releases of the providers, and that way we don't have to release the providers that don't change. We _could_ split out the providers in to separate repos if we wanted, much like https://github.com/terraform-providers repos (though we wo

Re: [VOTE] Release process for backported (AIP-21) package

2020-02-19 Thread Ash Berlin-Taylor
I should clarify, my -1 here is not a veto. -ash On Feb 19 2020, at 9:40 am, Jarek Potiuk wrote: > Since we have -1 from Ash. The vote is cancelled for now. I have some ideas > to address Ash's concern and will continue the discussion in the > corresponding [PROPOSAL] thread here: > https://lists

Re: Airflow Celery Executor - Some Task instances turning to failed without running and exporting log

2020-02-19 Thread Mehmet Ersoy
I'm using 1.10.6 version of Airflow. Yes, this problem occurs in all of my parallel DAGs. And I attached one of my DAGs in my first mail. In addition, graph view of my DAG is as follows: [image: image.png] Thanks. On Wed, 19 Feb 2020 at 15:09, Tomasz Urbaszek wrote: > What version of

Re: [PROPOSAL] Approach for releasing the backported "providers" packages

2020-02-19 Thread Driesprong, Fokko
I'm sorry, I've been a bit busy lately. Keeping track of these discussions in the evening doesn't always work out, as it seems. 1. So instead of releasing one package, as stated in the wiki, we're going to release over 50 provider packages (apache-airflow-providers-*-*-1.10-.MM.DD)? If I unders

Re: Airflow Celery Executor - Some Task instances turning to failed without running and exporting log

2020-02-19 Thread Tomasz Urbaszek
What version of Airflow do you use? The mentioned DAG is missing but I'm curious about "parallel" jobs you are running :) Does this problem occur with only one DAG? T. On Wed, Feb 19, 2020 at 12:58 PM Mehmet Ersoy wrote: > Hi Tomasz, > For now, I'm syncing the DAGs manually sending across airf

Re: [PROPOSAL] Approach for releasing the backported "providers" packages

2020-02-19 Thread Jarek Potiuk
+Fokko Driesprong -> in case you missed it this is the thread where I proposed and we had some discussion about backporting release process (you asked for it in the cancelled [VOTE] thread). The links to PR are earlier in the discussion but for the sake of restarting the discussion I will summaris

Re: Airflow Celery Executor - Some Task instances turning to failed without running and exporting log

2020-02-19 Thread Mehmet Ersoy
Hi Tomasz, For now, I'm syncing the DAGs manually sending across airflow hosts. So, for now there is no git repository etc. In addition, My configs related with parallelism are as follows: # How many processes CeleryExecutor uses to sync task state. # 0 means to use max(1, number of cores - 1)

Re: Airflow Celery Executor - Some Task instances turning to failed without running and exporting log

2020-02-19 Thread Tomasz Urbaszek
Can you please tell me more about your environment? Especially how do you sync your DAGs / logs from celery workers? I know one setup where I've seen I/O error when writing to a log... T. On Wed, Feb 19, 2020 at 10:56 AM Mehmet Ersoy wrote: > > Hello Friends, > > I'm new to Airflow and I'm us

Airflow Celery Executor - Some Task instances turning to failed without running and exporting log

2020-02-19 Thread Mehmet Ersoy
Hello Friends, I'm new to Airflow and I'm using Airflow Celery executor with Postgres backend and Redis Message Queue service. For now, there are 4 workers, 1 Scheduler and 1 Web Server. I have been preparing parallel Sqoop Jobs in my daily DAGs. When I scheduled a daily DAG, Often some task instanc

Re: [DISCUSS] Reduce (remove?) automated imports in Airflow 2.0

2020-02-19 Thread Kaxil Naik
+1 - happy with that On Wed, Feb 19, 2020 at 8:09 AM Driesprong, Fokko wrote: > I'm totally fine with that. > > Cheers, Fokko > > On Tue, 18 Feb 2020 at 13:46, Jarek Potiuk wrote: > > > I believe this is one of the cases where we can just go with the > consensus > > indeed :). > > > > J. > >

Re: [VOTE] Release process for backported (AIP-21) package

2020-02-19 Thread Jarek Potiuk
Since we have -1 from Ash. The vote is cancelled for now. I have some ideas to address Ash's concern and will continue the discussion in the corresponding [PROPOSAL] thread here: https://lists.apache.org/thread.html/rf6f2de8056b00ad084c96a9428670c14421a89ba2bbbd362d833bb50%40%3Cdev.airflow.apache.o

Re: [VOTE] Release process for backported (AIP-21) package

2020-02-19 Thread Driesprong, Fokko
Where is the code? I don't know what I'm voting for. If we want to do this, why aren't we cherry-picking this in the branch? I feel like I'm missing something. Cheers, Fokko On Tue, 18 Feb 2020 at 23:14, Jarek Potiuk wrote: > How do you propose to address this Ash (and _we_ especially) ? Any id

Re: [DISCUSS] Reduce (remove?) automated imports in Airflow 2.0

2020-02-19 Thread Driesprong, Fokko
I'm totally fine with that. Cheers, Fokko On Tue, 18 Feb 2020 at 13:46, Jarek Potiuk wrote: > I believe this is one of the cases where we can just go with the consensus > indeed :). > > J. > > On Tue, Feb 18, 2020 at 12:51 PM Ash Berlin-Taylor wrote: > > > Do we need to have a vote on it? I'm