Hello Massy,

I just answered on Reddit; I'm copying the answer here in case someone else is
interested too.


Dataflow supports Python 
3.5<https://beam.apache.org/roadmap/python-sdk/#python-3-support>.



In my company we use Apache Beam/Dataflow in production with a setup.py to 
install dependencies, even non-Python 
ones<https://beam.apache.org/documentation/sdks/python-pipeline-dependencies/#nonpython>
 like polyglot<https://polyglot.readthedocs.io/en/latest/Installation.html>. 
The juliaset example is a helpful starting point.
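For reference, the custom-commands pattern that the Beam docs and the juliaset example use in setup.py looks roughly like this sketch (the package name and the apt-get packages are hypothetical placeholders; libicu-dev is one of the system libraries polyglot needs):

```python
# Sketch of a Dataflow setup.py installing non-Python dependencies,
# modeled on the Beam juliaset example. The commands in CUSTOM_COMMANDS
# run on each Dataflow worker before the normal build.
import subprocess
from distutils.command.build import build as _build

import setuptools

CUSTOM_COMMANDS = [
    ["apt-get", "update"],
    ["apt-get", "--assume-yes", "install", "libicu-dev"],  # e.g. for polyglot
]


class CustomCommands(setuptools.Command):
    """Run the shell commands above as a build step."""

    user_options = []

    def initialize_options(self):
        pass

    def finalize_options(self):
        pass

    def run(self):
        for command in CUSTOM_COMMANDS:
            subprocess.check_call(command)


class build(_build):
    # Prepend the custom step to the standard build sub-commands.
    sub_commands = [("CustomCommands", None)] + _build.sub_commands


setuptools.setup(
    name="my_pipeline",  # hypothetical package name
    version="0.0.1",
    install_requires=["polyglot"],
    packages=setuptools.find_packages(),
    cmdclass={"build": build, "CustomCommands": CustomCommands},
)
```

You then point the pipeline at it with the --setup_file ./setup.py option, and Dataflow runs the build (including the custom commands) on every worker.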

We have the same constraint as you regarding DS, but on our side it is mainly 
TensorFlow.



Don't hesitate to take a look at this 
article<https://medium.com/dailymotion/collaboration-between-data-engineers-data-analysts-and-data-scientists-97c00ab1211f>
 which gives an overview of how we work with DS.


You should be able to wrap the Apache Beam/Dataflow code so it exposes the same 
syntax as sklearn. DS can then scale their work autonomously, without having 
to know the internals of the cluster-computing framework.
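Such a wrapper could look roughly like the sketch below. All names here are hypothetical; the injected pipeline_runner stands in for the code that would actually build and run an apache_beam.Pipeline, so the sketch stays self-contained:

```python
# Sketch of an sklearn-style facade over a Beam/Dataflow pipeline.
class DataflowEstimator:
    """Expose fit()/predict() so data scientists never touch Beam directly."""

    def __init__(self, pipeline_runner, **pipeline_options):
        self._run = pipeline_runner       # builds and runs the Beam job
        self._options = pipeline_options  # e.g. project, region, num_workers
        self.model_path_ = None           # set after fit(), sklearn-style

    def fit(self, input_path, output_path):
        # Behind this call the real implementation would submit a
        # Dataflow job and block until it finishes.
        self.model_path_ = self._run(input_path, output_path, **self._options)
        return self  # sklearn convention: fit returns self

    def predict(self, input_path, output_path):
        if self.model_path_ is None:
            raise RuntimeError("call fit() before predict()")
        # A second pipeline would apply the trained model in batch.
        return self._run(input_path, output_path,
                         model_path=self.model_path_, **self._options)
```

The design choice is just the facade pattern: the DS sees fit()/predict() on file paths, while the engineering team owns the pipeline construction and the Dataflow options behind it.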

Hope this helps.

Germain.

From: Massy Bourennani <massybourenn...@gmail.com>
Reply-To: "user@beam.apache.org" <user@beam.apache.org>
Date: Tuesday 16 July 2019 at 10:49
To: "user@beam.apache.org" <user@beam.apache.org>
Subject: Industrializing batch ML algorithm using Apache Beam/Dataflow (on 
Google Cloud Platform)

Hi all,
Here is the link to the Reddit post[1]
Many thanks for your help.
Massy

[1] 
https://www.reddit.com/r/dataengineering/comments/cdp5i3/industrializing_batch_ml_algorithm_using_apache/
