Re: How to Map and Reduce in sparkR

Shivaram Venkataraman Thu, 25 Jun 2015 15:21:17 -0700

SparkR does not support UDFs on DataFrames in the 1.4 release. The existing
DataFrame functionality is best treated as a pre-processing step: do what you
can there, then collect the data back to the driver for more complex
processing. Along with the RDD API ticket, we are also working on UDF support.
For a bigger-picture view, see the Spark Summit talk slides from last week:
http://www.slideshare.net/SparkSummit/07-venkataraman-sun
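
A minimal sketch of the driver-side half of that pattern, under stated
assumptions: `local_df` is a hypothetical stand-in for what `collect(df)`
would return (an ordinary local data.frame), and the column names and the
`count_words` helper are made up for illustration. Only base R is used, so it
runs without a Spark installation.

```r
# Hypothetical stand-in for `collect(df)`: in SparkR, collect() on a
# DataFrame brings the rows back to the driver as a local data.frame.
local_df <- data.frame(
  id = 1:3,
  text = c("a b", "c d e", "f"),
  stringsAsFactors = FALSE
)

# "Map": apply an arbitrary R function to the value in each row.
# count_words is an illustrative helper, not part of SparkR.
count_words <- function(row_text) {
  length(strsplit(row_text, " ", fixed = TRUE)[[1]])
}
word_counts <- vapply(local_df$text, count_words, integer(1), USE.NAMES = FALSE)

# "Reduce": fold the per-row results into a single value.
total_words <- Reduce(`+`, word_counts)

print(word_counts)
print(total_words)
```

This only works when the pre-processed data is small enough to fit in the
driver's memory; anything larger would still need the RDD or UDF support
discussed above.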


Thanks
Shivaram

On Thu, Jun 25, 2015 at 3:08 PM, Wei Zhou <zhweisop...@gmail.com> wrote:

> Hi Shivaram/Alek,
>
> I understand that it is better to import data into a DataFrame than into an
> RDD. If one wants to do a map-like transformation for each row in SparkR,
> one could use SparkR:::lapply(), but is there a counterpart row operation
> on DataFrame? The use case I am working on requires complicated row-level
> pre-processing before the actual modeling.
>
> Thanks.
>
> Best,
> Wei
>
> 2015-06-25 9:25 GMT-07:00 Shivaram Venkataraman <
> shiva...@eecs.berkeley.edu>:
>
>> In addition to Aleksander's point, please let us know which use cases
>> would need an RDD-like API in
>> https://issues.apache.org/jira/browse/SPARK-7264 --
>> We are hoping to have a version of this API in upcoming releases.
>>
>> Thanks
>> Shivaram
>>
>> On Thu, Jun 25, 2015 at 6:02 AM, Eskilson,Aleksander <
>> alek.eskil...@cerner.com> wrote:
>>
>>> The simple answer is that SparkR does support map/reduce operations
>>> over RDDs through the RDD API, but as of Spark 1.4.0 those functions
>>> are private in SparkR. They can still be accessed by prefixing the
>>> function with the namespace, like SparkR:::lapply(rdd, func). The
>>> thinking was that many of the functions in the RDD API were too low level
>>> to expose, with much more of the focus going into the DataFrame API. The
>>> original rationale for this decision can be found in its JIRA [1]. The devs
>>> are still deciding which functions of the RDD API, if any, should be made
>>> public in future releases. If you feel some use cases are most easily
>>> handled in SparkR through RDD functions, go ahead and let the dev email
>>> list know.
>>>
>>>  Alek
>>> [1] -- https://issues.apache.org/jira/browse/SPARK-7230
>>>
>>>   From: Wei Zhou <zhweisop...@gmail.com>
>>> Date: Wednesday, June 24, 2015 at 4:59 PM
>>> To: "user@spark.apache.org" <user@spark.apache.org>
>>> Subject: How to Map and Reduce in sparkR
>>>
>>> Does anyone know whether SparkR supports map and reduce operations like
>>> the RDD transformations? Thanks in advance.
>>>
>>>  Best,
>>> Wei
>>>
>>
>>
>
