Hi Kui,

DDF (open sourced) also aims to do something similar, adding RDBMS idioms,
and is already implemented on top of Spark.

One design principle is that the DDF API aggressively hides the notion of
parallel datasets, exposing only (mutable) tables to users, on which they
can apply R and other familiar data mining/machine learning idioms without
having to know about the distributed representation underneath. That said,
you can still get at the underlying RDDs if you want to, simply by asking
for them.

This was launched at the July Spark Summit. See:
http://spark-summit.org/2014/talk/distributed-dataframe-ddf-on-apache-spark-simplifying-big-data-for-the-rest-of-us

Sent while mobile. Please excuse typos etc.
On Sep 4, 2014 1:59 PM, "Shivaram Venkataraman" <shiva...@eecs.berkeley.edu> wrote:

> Thanks Kui. SparkR is a pretty young project, but there are a bunch of
> things we are working on. One of the main features is to expose a data
> frame API (https://sparkr.atlassian.net/browse/SPARKR-1), and we will
> be integrating this with Spark's MLlib. At a high level, this will
> allow R users to use a familiar API while making use of MLlib's efficient
> distributed implementations. This is the same strategy used for Python.
>
> Also, we do hope to merge SparkR into mainline Spark. We have a few
> features to complete before that and plan to aim for integration by
> Spark 1.3.
>
> Thanks
> Shivaram
>
> On Wed, Sep 3, 2014 at 9:24 PM, oppokui <oppo...@gmail.com> wrote:
> > Thanks, Shivaram.
> >
> > No specific use case yet. We are trying to use R in our project, as our
> > data scientists all know R. Our concern is how R handles massive amounts
> > of data. Spark does better in the big data area, and Spark ML is focusing
> > on the predictive analysis area, so we are thinking about whether we can
> > combine R and Spark. We tried SparkR and it is pretty easy to use, but we
> > haven't seen any feedback on this package from industry. It would be
> > better if the Spark team had R support just like Scala/Java/Python.
> >
> > Another question: if MLlib will re-implement all the well-known data
> > mining algorithms in Spark, then what is the purpose of using R?
> >
> > Another option for us is H2O, which supports R natively. H2O is more
> > friendly to data scientists. I saw that H2O can also run on Spark
> > (Sparkling Water). Is it better than using SparkR?
> >
> > Thanks and Regards.
> >
> > Kui
> >
> >
> > On Sep 4, 2014, at 1:47 AM, Shivaram Venkataraman <shiva...@eecs.berkeley.edu> wrote:
> >
> > Hi
> >
> > Do you have a specific use case where SparkR doesn't work well? We'd love
> > to hear more about use cases and features that can be improved in SparkR.
> >
> > Thanks
> > Shivaram
> >
> >
> > On Wed, Sep 3, 2014 at 3:19 AM, oppokui <oppo...@gmail.com> wrote:
> >>
> >> Does the Spark ML team have plans to support R scripts natively? There is
> >> a SparkR project, but it is not from the Spark team. Spark ML uses
> >> netlib-java to talk to native Fortran routines, or uses NumPy, so why not
> >> try to use R in a similar way?
> >>
> >> R has a lot of useful packages. If the Spark ML team can include R
> >> support, it will be very powerful.
> >>
> >> Any comment?
> >>
> >>
> >> ---------------------------------------------------------------------
> >> To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
> >> For additional commands, e-mail: user-h...@spark.apache.org
> >>
> >
> >
>
