I had a look at the new R "on Spark" API/feature in Spark 1.4.0.
For those "skilled in the art" (of R and distributed computing) it will be immediately clear that "ON" is a marketing ploy and that what it actually is, is "TO": Spark 1.4.0 offers an INTERFACE from R TO DATA stored in Spark in a distributed fashion, plus some distributed queries which can be initiated FROM R and run on that data within Spark. These are essentially certain types of SQL-style queries.

In order to deserve the "ON" label, SparkR would have to be able to run ON Spark most of the statistical analysis and machine learning algorithms found in the R engine. This is absolutely not the case at the moment.

As an example of the type of solution/architecture I am referring to, you can review Revolution Analytics (recently acquired by Microsoft) and some other open source frameworks for running R ON distributed clusters.
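
To make the "TO" vs "ON" distinction concrete, here is a toy sketch in plain Python. It is not Spark or SparkR code: the "cluster" is just a list of lists, and the names `query`, `run_on_partitions` and `distributed_variance` are made up for illustration. The first style only lets the client pick from a fixed menu of SQL-style operations; the second lets the client ship an arbitrary function to every partition, which is what running a statistics engine ON the cluster would require.

```python
from functools import reduce

# Toy "cluster": one dataset split into partitions (plain lists here).
PARTITIONS = [[1.0, 2.0, 3.0], [4.0, 5.0], [6.0, 7.0, 8.0, 9.0]]

# Style 1, an interface TO the data: the client can only choose from
# the aggregates the engine already knows how to distribute.
SUPPORTED_AGGREGATES = {"count": len, "sum": sum}


def query(partitions, aggregate, minimum=None):
    """SQL-style query: optional filter, then one built-in aggregate."""
    if aggregate not in SUPPORTED_AGGREGATES:
        raise ValueError(f"unsupported aggregate: {aggregate}")
    per_partition = SUPPORTED_AGGREGATES[aggregate]
    partials = []
    for part in partitions:
        rows = [x for x in part if minimum is None or x >= minimum]
        partials.append(per_partition(rows))
    # Both built-in aggregates combine across partitions by addition.
    return sum(partials)


# Style 2, running ON the cluster: the client ships its own function,
# which executes next to each partition, plus a function to merge results.
def run_on_partitions(partitions, local_fn, combine_fn):
    """Apply local_fn to every partition and merge the partial results."""
    return reduce(combine_fn, (local_fn(part) for part in partitions))


def distributed_variance(partitions):
    """Population variance computed from per-partition moments."""

    def local_moments(part):
        return (len(part), sum(part), sum(x * x for x in part))

    def add_moments(a, b):
        return tuple(x + y for x, y in zip(a, b))

    n, total, total_sq = run_on_partitions(partitions, local_moments, add_moments)
    mean = total / n
    return total_sq / n - mean * mean


print(query(PARTITIONS, "sum", minimum=4.0))
print(distributed_variance(PARTITIONS))
```

Asking the first style for anything outside its menu, e.g. `query(PARTITIONS, "median")`, simply fails, whereas the second style can in principle run any algorithm that can be split into per-partition work and a merge step.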