You can create standalone jobs in SparkR as plain R files that are run using
the sparkR script. The commands in the script are sent to a Spark cluster, and
the examples in the SparkR repository (
https://github.com/amplab-extras/SparkR-pkg#examples-unit-tests) are in
fact standalone jobs.
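
For example, a minimal standalone script would look something like the sketch
below (save it as an R file and run it with the sparkR script). I'm writing
this from memory, so treat it as a rough outline and check the examples in the
repo for the exact API:

    library(SparkR)

    # Connect to Spark ("local" here; point this at your cluster's master URL)
    sc <- sparkR.init(master = "local", appName = "StandaloneExample")

    # Distribute a small vector across the cluster and square each element
    rdd <- parallelize(sc, 1:100, 2L)
    squares <- lapply(rdd, function(x) { x * x })

    # Bring the results back to the driver and print them
    print(sum(unlist(collect(squares))))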

However, I don't think that will completely solve your use case of combining
Streaming and R. We don't yet have a way to call R functions from Spark's
Java or Scala API, so for now one thing you can try is to save the data from
Spark Streaming to HDFS and then run a SparkR job that reads in the files.
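
For instance, once the streaming job has written the tweets out as text files
under some HDFS directory, a SparkR script along these lines could read them
back and apply your R function to each record. This is an untested sketch;
the HDFS path is just a stand-in for wherever your streaming job writes, and
scoreSentiment is a placeholder for your own sentiment analysis function:

    library(SparkR)

    sc <- sparkR.init(master = "local", appName = "TweetSentiment")

    # Read the text files that the streaming job saved
    # (replace the path with the directory your streaming job writes to)
    tweets <- textFile(sc, "hdfs:///tweets")

    # scoreSentiment stands in for your own R sentiment function;
    # it is applied to each tweet in the R worker processes
    scores <- lapply(tweets, function(tweet) { scoreSentiment(tweet) })

    print(collect(scores))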

Regarding the other idea of calling R from Scala -- it might be possible to
do that in your code if the classpath etc. is set up correctly. I haven't
tried it myself, but do let us know if you get it to work.

Thanks
Shivaram


On Mon, Apr 7, 2014 at 2:21 PM, pawan kumar <pkv...@gmail.com> wrote:

> Hi,
>
> Is it possible to create a standalone job in Scala using SparkR? If
> possible, can you provide me with information on the setup process
> (like the dependencies in SBT and where to include the JAR files)?
>
> This is my use-case:
>
> 1. I have a standalone Spark Streaming job running on my local machine
> which streams Twitter data.
> 2. I have an R script which performs Sentiment Analysis.
>
> I am looking for an optimal way to combine these two operations into a
> single job and run it using the "sbt run" command.
>
> I came across this document, which talks about embedding R in Scala (
> http://dahl.byu.edu/software/jvmr/dahl-payne-uppalapati-2013.pdf), but was
> not sure whether that would work well within the Spark context.
>
> Thanks,
> Pawan Venugopal
>
>
