Hello Massy,

I just answered on Reddit; I'm copying the answer here in case someone is interested.

Dataflow supports Python.

At my company we use apache-beam/Dataflow in production, with a setup.py that
installs the dependencies, including non-Python ones, as with
polyglot <https://polyglot.readthedocs.io/en/latest/Installation.html>.
The juliaset example is a helpful starting point.
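
To make the setup.py idea concrete, here is a minimal sketch modeled on the
juliaset example. It is not our production file: the package name, the version,
and the exact apt package (libicu-dev, which I am assuming from the polyglot
installation page) are placeholders to adapt.

```python
# setup.py: a sketch, not a definitive implementation.
import subprocess

import setuptools
# Available in recent setuptools; older versions expose the same class as
# distutils.command.build.build.
from setuptools.command.build import build as _build

# Assumption: system packages needed on the workers. libicu-dev is taken
# from the polyglot installation page; adjust to your own dependencies.
CUSTOM_COMMANDS = [
    ["apt-get", "update"],
    ["apt-get", "--assume-yes", "install", "libicu-dev"],
]


class build(_build):
    """Regular build step, extended to also run the custom commands."""

    sub_commands = _build.sub_commands + [("CustomCommands", None)]


class CustomCommands(setuptools.Command):
    """Runs each entry of CUSTOM_COMMANDS while the package is built."""

    user_options = []

    def initialize_options(self):
        pass

    def finalize_options(self):
        pass

    def run(self):
        for command in CUSTOM_COMMANDS:
            # check_call raises if a command fails, which aborts the build.
            subprocess.check_call(command)


setuptools.setup(
    name="my_pipeline",               # hypothetical package name
    version="0.0.1",                  # hypothetical version
    install_requires=["polyglot"],    # Python dependencies go here
    packages=setuptools.find_packages(),
    cmdclass={"build": build, "CustomCommands": CustomCommands},
)
```

The pipeline is then launched with the --setup_file ./setup.py option so that
each worker builds the package and runs the commands at startup.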

We have the same constraint as you regarding DS, but on our side it is mainly 

Don't hesitate to take a look at this,
which gives an overview of how we work with DS.

You should be able to wrap apache-beam/Dataflow code so that it has the same
syntax as sklearn. The data scientists can then work autonomously and get the
scalability without having to know the complexity of the cluster-computing
framework.
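
As an illustration of that wrapping idea, here is a minimal sketch of an
sklearn-style facade over a Beam pipeline. The class name BeamLineTransformer
and its parameters are hypothetical (this is not our internal code); only the
Beam calls themselves (PipelineOptions, Pipeline, ReadFromText, Map,
WriteToText) are real API.

```python
class BeamLineTransformer:
    """Hypothetical sklearn-like wrapper: fit()/transform() hide the pipeline."""

    def __init__(self, fn, runner="DirectRunner", **options):
        # fn maps one input line (str) to one output line (str).
        self.fn = fn
        # "DirectRunner" runs locally; "DataflowRunner" runs on Dataflow.
        self.runner = runner
        # Extra pipeline options, e.g. project, region, temp_location.
        self.options = options

    def get_params(self):
        """Return the constructor arguments, in the spirit of sklearn."""
        return {"fn": self.fn, "runner": self.runner, **self.options}

    def fit(self, X=None, y=None):
        """Nothing to learn in this stateless sketch; return self to chain."""
        return self

    def transform(self, input_pattern, output_prefix):
        """Apply fn to every line of the input files with a Beam pipeline."""
        # Imported here so that building the object does not require Beam.
        import apache_beam as beam
        from apache_beam.options.pipeline_options import PipelineOptions

        opts = PipelineOptions(runner=self.runner, **self.options)
        # Leaving the with-block runs the pipeline and waits for it to finish.
        with beam.Pipeline(options=opts) as pipeline:
            (
                pipeline
                | "Read" >> beam.io.ReadFromText(input_pattern)
                | "Apply" >> beam.Map(self.fn)
                | "Write" >> beam.io.WriteToText(output_prefix)
            )
        return output_prefix
```

A data scientist would then write something like
BeamLineTransformer(str.upper).fit().transform("in.txt", "out") and only
switch runner to "DataflowRunner" (plus project and temp_location) to scale.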

Hope this helps.


From: Massy Bourennani <massybourenn...@gmail.com>
Reply-To: "user@beam.apache.org" <user@beam.apache.org>
Date: Tuesday 16 July 2019 at 10:49
To: "user@beam.apache.org" <user@beam.apache.org>
Subject: Industrializing batch ML algorithm using Apache Beam/Dataflow (on 
Google Cloud Platform)

Hi all,
Here is the link to the Reddit post[1]
Many thanks for your help.

