Hi, I have been working on a POC for some time-series-related work. I'm using Python, since I need Spark Streaming and SparkR does not yet have a Spark Streaming front end. A couple of the algorithms I want to use are not yet present in the Spark-TS package, so I'm thinking of invoking an external R script for the algorithm part and passing the data from Spark to the R script via a pipe RDD (RDD.pipe).
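For context, here is a minimal sketch of the protocol that RDD.pipe uses: each element of a partition is serialized to one text line on the external process's stdin, and each stdout line comes back as one element. The helper below mimics that contract with subprocess so it runs without Spark or R; the stand-in command and the script name forecast.R are purely illustrative, not part of any real setup.

```python
import subprocess
import sys

# Stand-in for "Rscript forecast.R": an external process that reads one
# number per line from stdin and writes one result per line to stdout.
# (A tiny Python one-liner is used here so the sketch runs without R.)
EXTERNAL_CMD = [sys.executable, "-c",
                "import sys\n"
                "for line in sys.stdin:\n"
                "    print(float(line) * 2)"]

def pipe_partition(elements):
    """Mimic what RDD.pipe() does for a single partition: serialize each
    element to a text line, feed the lines to the external process, and
    parse the process's stdout back into elements."""
    proc = subprocess.run(
        EXTERNAL_CMD,
        input="\n".join(str(x) for x in elements),
        capture_output=True, text=True, check=True)
    return [float(line) for line in proc.stdout.splitlines()]

result = pipe_partition([1.0, 2.0, 3.0])
```

With Spark the equivalent one-liner would be `rdd.pipe("Rscript forecast.R")`; the text serialization on both sides of the pipe is exactly the per-record overhead the question is asking about.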
What I wanted to understand is whether something like this can be used in a production deployment. Passing the data through an R script would mean a lot of serialization and would not really use Spark's power for parallel execution. Has anyone used this kind of workaround: Spark -> pipe RDD -> R script? Thanks, Sujeet