Hi,
We are using BigQuery for our querying needs.
We are also looking to use Dataflow with some statistical libraries.
We build these statistical models with R libraries.

We are looking to run our data through statistical models such as ELM,
GAM, ARIMA, etc. We see that Python does not have equivalents of all the
libraries that we get as CRAN packages in R.

We have seen the following example, which shows that it is possible to run
R on Dataflow:

https://medium.com/google-cloud/cloud-dataflow-can-autoscale-r-programs-for-massively-parallel-data-processing-492b57bd732d
https://github.com/gregmcinnes/incubator-beam/blob/python-sdk/sdks/python/apache_beam/examples/complete/wordlength/wordlength_R/wordlength_R.py

If we are able to use the parallelization provided by Dataflow together with
R libraries, this would be a great help for us as a team, and also for the
whole data science community that relies on R packages.

We would need some help from the Beam community to achieve this.

I think this would be a very good use case for the whole data science
community, as it would enable the use of both Python and R on Beam and
Dataflow.

Regards,
Anant