Hi,

I have been working on a POC involving some time-series analysis. I'm using
Python, since I need Spark Streaming and SparkR does not yet have a Spark
Streaming front end. A couple of algorithms I want to use are not yet
available in the Spark-TS package, so I'm thinking of invoking an external R
script for the algorithm part and passing the data from Spark to the R
script via the RDD pipe() transformation.
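Roughly what I have in mind (a minimal sketch; algo.R is just a placeholder
name for an R script that reads newline-delimited values from stdin and
writes its results to stdout):

    from pyspark import SparkContext

    sc = SparkContext(appName="pipe-to-r-poc")

    # Each RDD element is written as a line of text to the R script's stdin;
    # each line the script writes to stdout becomes an element of the result
    # RDD. pipe() launches one external process per partition.
    data = sc.parallelize(["1.0", "2.5", "3.7"])
    result = data.pipe("Rscript algo.R")  # algo.R: placeholder script name
    print(result.collect())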


What I wanted to understand is whether something like this can be used in a
production deployment. Passing the data through an external R script would
mean a lot of serialization, and I'm concerned it would not actually use the
power of Spark for parallel execution.

Has anyone used this kind of workaround: Spark -> pipe() -> R script?


Thanks,
Sujeet