I had a look at the new R "on Spark" API/feature in Spark 1.4.0.

For those "skilled in the art" (of R and distributed computing) it will be
immediately clear that "ON" is a marketing ploy and that what it actually is
is "TO", i.e. Spark 1.4.0 offers an INTERFACE from R TO DATA stored in Spark
in a distributed fashion, plus some distributed queries which can be
initiated FROM R and run on that data within Spark. These are essentially
certain types of SQL-style queries.

In order to deserve the "ON" label, SparkR has to be able to run ON Spark
most of the statistical analysis and machine learning algorithms found in
the R engine. This is absolutely not the case at the moment.

As an example of the type of solution/architecture I am referring to, you
can review Revolution Analytics (recently acquired by Microsoft) and some
other open source frameworks for running R ON distributed clusters.

 



