As R is single-threaded, SparkR launches one R process per executor on the worker side.
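To illustrate the general pattern being described (and not SparkR's actual implementation), here is a minimal sketch of an "executor" driving one worker subprocess by exchanging length-prefixed serialized byte arrays over stdin/stdout. A Python child process stands in for the R worker, and the framing protocol and function names are made up for the example:

```python
# Illustrative sketch only: an "executor" launches one worker subprocess and
# ships each partition to it as a length-prefixed, pickled byte array --
# the same general shape as driving an external R process from the JVM.
import pickle
import struct
import subprocess
import sys

# Stand-in for the R worker: reads one framed message, applies a function,
# writes one framed message back.
WORKER_SOURCE = r"""
import pickle, struct, sys

def read_msg(stream):
    header = stream.read(4)                  # 4-byte big-endian length prefix
    length = struct.unpack(">I", header)[0]
    return pickle.loads(stream.read(length))

def write_msg(stream, obj):
    payload = pickle.dumps(obj)
    stream.write(struct.pack(">I", len(payload)) + payload)
    stream.flush()

data = read_msg(sys.stdin.buffer)            # receive one partition
write_msg(sys.stdout.buffer, [x * x for x in data])  # apply the "user function"
"""

def run_partition(partition):
    """Launch one worker process and round-trip a partition as byte arrays."""
    worker = subprocess.Popen(
        [sys.executable, "-c", WORKER_SOURCE],
        stdin=subprocess.PIPE,
        stdout=subprocess.PIPE,
    )
    payload = pickle.dumps(partition)
    worker.stdin.write(struct.pack(">I", len(payload)) + payload)
    worker.stdin.flush()
    header = worker.stdout.read(4)
    length = struct.unpack(">I", header)[0]
    result = pickle.loads(worker.stdout.read(length))
    worker.wait()
    return result

if __name__ == "__main__":
    print(run_partition([1, 2, 3]))  # prints [1, 4, 9]
```

Because the worker is a separate OS process, one such process per executor is a natural fit for a single-threaded runtime like R: each executor owns its worker and serializes data across the process boundary rather than sharing memory.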
Thanks
Shivaram

On Thu, Sep 18, 2014 at 7:49 AM, oppokui <oppo...@gmail.com> wrote:
> Shivaram,
>
> As I know, SparkR uses the rJava package. On the worker node, Spark executes
> R code by launching an R process and sending/receiving byte arrays.
> I have a question about when the R process is launched. Is it per worker
> process, per executor thread, or per RDD operation?
>
> Thanks and regards.
>
> Kui
>
>> On Sep 6, 2014, at 5:53 PM, oppokui <oppo...@gmail.com> wrote:
>>
>> Cool! That is very good news. Can't wait for it.
>>
>> Kui
>>
>>> On Sep 5, 2014, at 1:58 AM, Shivaram Venkataraman
>>> <shiva...@eecs.berkeley.edu> wrote:
>>>
>>> Thanks Kui. SparkR is a pretty young project, but there are a bunch of
>>> things we are working on. One of the main features is to expose a data
>>> frame API (https://sparkr.atlassian.net/browse/SPARKR-1), and we will
>>> be integrating this with Spark's MLlib. At a high level this will
>>> allow R users to use a familiar API but make use of MLlib's efficient
>>> distributed implementations. This is the same strategy used in Python
>>> as well.
>>>
>>> Also, we do hope to merge SparkR with mainline Spark -- we have a few
>>> features to complete before that and plan to shoot for integration by
>>> Spark 1.3.
>>>
>>> Thanks
>>> Shivaram
>>>
>>> On Wed, Sep 3, 2014 at 9:24 PM, oppokui <oppo...@gmail.com> wrote:
>>>> Thanks, Shivaram.
>>>>
>>>> No specific use case yet. We are trying to use R in our project because
>>>> our data scientists all know R, but we had a concern about how R handles
>>>> large data. Spark does a better job in the big-data area, and Spark ML
>>>> focuses on predictive analytics, so we are considering combining R and
>>>> Spark. We tried SparkR and it is pretty easy to use, but we haven't seen
>>>> much feedback on this package from industry. It would be better if the
>>>> Spark team supported R just like Scala/Java/Python.
>>>>
>>>> Another question: if MLlib will re-implement all the well-known data
>>>> mining algorithms in Spark, what is the purpose of using R?
>>>>
>>>> There is another option for us, H2O, which supports R natively. H2O is
>>>> more friendly to data scientists, and I saw that H2O can also work on
>>>> Spark (Sparkling Water). Is it better than using SparkR?
>>>>
>>>> Thanks and regards.
>>>>
>>>> Kui
>>>>
>>>> On Sep 4, 2014, at 1:47 AM, Shivaram Venkataraman
>>>> <shiva...@eecs.berkeley.edu> wrote:
>>>>
>>>> Hi
>>>>
>>>> Do you have a specific use case where SparkR doesn't work well? We'd
>>>> love to hear more about use cases and features that can be improved in
>>>> SparkR.
>>>>
>>>> Thanks
>>>> Shivaram
>>>>
>>>> On Wed, Sep 3, 2014 at 3:19 AM, oppokui <oppo...@gmail.com> wrote:
>>>>>
>>>>> Does the Spark ML team have a plan to support R scripts natively? There
>>>>> is a SparkR project, but it is not from the Spark team. Spark ML uses
>>>>> netlib-java to talk to native Fortran routines, or uses NumPy, so why
>>>>> not try to use R in some sense?
>>>>>
>>>>> R has a lot of useful packages. If the Spark ML team can include R
>>>>> support, it will be very powerful.
>>>>>
>>>>> Any comment?
>>>>>
>>>>> ---------------------------------------------------------------------
>>>>> To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
>>>>> For additional commands, e-mail: user-h...@spark.apache.org