The API exported in the 1.4 release is different from the one used in the 2014 demo. Please see the latest documentation at http://people.apache.org/~pwendell/spark-releases/latest/sparkr.html or Chris's demo from Spark Summit at https://spark-summit.org/2015/events/a-data-frame-abstraction-layer-for-sparkr/
Thanks
Shivaram

On Tue, Jun 30, 2015 at 7:40 AM, Nicholas Sharkey <nicholasshar...@gmail.com> wrote:
> Good morning Shivaram,
>
> I believe I have our setup close, but I'm getting an error on the last step
> of the word count example from the Spark Summit
> <https://spark-summit.org/2014/wp-content/uploads/2014/07/SparkR-SparkSummit.pdf>.
>
> Off the top of your head, can you think of where this error (below, and
> attached) is coming from? I can get into the details of how I set up this
> machine if needed, but wanted to keep the initial question short.
>
> Thanks.
>
> *Begin Code*
>
> library(SparkR)
>
> # sc <- sparkR.init("local[2]")
> sc <- sparkR.init("http://ec2-54-171-173-195.eu-west-1.compute.amazonaws.com:[2]")
>
> lines <- textFile(sc, "mytextfile.txt") # hi hi all all all one one one one
>
> words <- flatMap(lines,
>                  function(line){
>                    strsplit(line, " ")[[1]]
>                  })
>
> wordcount <- lapply(words,
>                     function(word){
>                       list(word, 1)
>                     })
>
> counts <- reduceByKey(wordcount, "+", numPartitions=2)
>
> # Error in (function (classes, fdef, mtable) :
> #   unable to find an inherited method for function ‘reduceByKey’
> #   for signature ‘"PipelinedRDD", "character", "numeric"’
>
> *End Code*
>
> On Fri, Jun 26, 2015 at 7:04 PM, Shivaram Venkataraman
> <shiva...@eecs.berkeley.edu> wrote:
>
>> My workflow was to install RStudio on a cluster launched using the Spark
>> EC2 scripts. However, I did a bunch of tweaking after that (like copying
>> the Spark installation over, etc.). When I get some time I'll try to
>> write the steps down in the JIRA.
>>
>> Thanks
>> Shivaram
>>
>> On Fri, Jun 26, 2015 at 10:21 AM, <m...@redoakstrategic.com> wrote:
>>
>>> So you created an EC2 instance with RStudio installed first, then
>>> installed Spark under that same username? That makes sense, I just
>>> want to verify your workflow.
>>>
>>> Thank you again for your willingness to help!
>>>
>>> On Fri, Jun 26, 2015 at 10:13 AM -0700, "Shivaram Venkataraman"
>>> <shiva...@eecs.berkeley.edu> wrote:
>>>
>>>> I was using RStudio on the master node of the same cluster in the
>>>> demo. However, I had installed Spark under the user `rstudio` (i.e. in
>>>> /home/rstudio), and that makes the permissions work correctly. You will
>>>> need to copy the config files from /root/spark/conf after installing
>>>> Spark, though, and it might need some more manual tweaks.
>>>>
>>>> Thanks
>>>> Shivaram
>>>>
>>>> On Fri, Jun 26, 2015 at 9:59 AM, Mark Stephenson
>>>> <m...@redoakstrategic.com> wrote:
>>>>
>>>>> Thanks!
>>>>>
>>>>> In your demo video, were you using RStudio to hit a separate EC2
>>>>> Spark cluster? I noticed from your browser that you appeared to be
>>>>> using EC2 at the time, so I was just curious. It seems that might be
>>>>> one of the possible workarounds: fire up a separate EC2 instance with
>>>>> RStudio Server that initializes the Spark context against a separate
>>>>> Spark cluster.
>>>>>
>>>>> On Jun 26, 2015, at 11:46 AM, Shivaram Venkataraman
>>>>> <shiva...@eecs.berkeley.edu> wrote:
>>>>>
>>>>> We don't have a documented way to use RStudio on EC2 right now. We
>>>>> have a ticket open at https://issues.apache.org/jira/browse/SPARK-8596
>>>>> to discuss workarounds and potential solutions for this.
>>>>>
>>>>> Thanks
>>>>> Shivaram
>>>>>
>>>>> On Fri, Jun 26, 2015 at 6:27 AM, RedOakMark <m...@redoakstrategic.com>
>>>>> wrote:
>>>>>
>>>>>> Good morning,
>>>>>>
>>>>>> I am having a bit of trouble finalizing the installation and use of
>>>>>> the newest Spark version, 1.4.0, deploying to an Amazon EC2 instance
>>>>>> and using RStudio to run on top of it.
>>>>>>
>>>>>> Using these instructions
>>>>>> (http://spark.apache.org/docs/latest/ec2-scripts.html) we can fire
>>>>>> up an EC2 instance (which we have been successful doing - we have
>>>>>> gotten the cluster to launch from the command line without an
>>>>>> issue). Then I installed RStudio Server on the same EC2 instance
>>>>>> (the master) and successfully logged into it (using the test/test
>>>>>> user) through the web browser.
>>>>>>
>>>>>> This is where I get stuck: within RStudio, when I try to find the
>>>>>> folder where SparkR was installed, so that I can load the SparkR
>>>>>> library and initialize a SparkContext, I get permission errors on
>>>>>> the folders, or the library cannot be found because I cannot locate
>>>>>> the folder it is sitting in.
>>>>>>
>>>>>> Has anyone successfully launched and utilized SparkR 1.4.0 in this
>>>>>> way, with RStudio Server running on top of the master instance? Are
>>>>>> we on the right track, or should we manually launch a cluster and
>>>>>> attempt to connect to it from another instance running R?
>>>>>>
>>>>>> Thank you in advance!
>>>>>>
>>>>>> Mark
>>>>>>
>>>>>> --
>>>>>> View this message in context:
>>>>>> http://apache-spark-user-list.1001560.n3.nabble.com/Spark-1-4-0-Using-SparkR-on-EC2-Instance-tp23506.html
>>>>>> Sent from the Apache Spark User List mailing list archive at
>>>>>> Nabble.com.
>>>>>>
>>>>>> ---------------------------------------------------------------------
>>>>>> To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
>>>>>> For additional commands, e-mail: user-h...@spark.apache.org
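Following up for the archive on the exact error in the quoted word-count code: `reduceByKey` is an S4 generic, and the 2014-demo method dispatched on an *integer* partition count, so the numeric literal 2 fails to match (hence the signature `"PipelinedRDD", "character", "numeric"` in the error). A minimal sketch of a fix is below. It assumes the RDD functions, which are no longer exported in the 1.4 package, can still be reached with `:::`; the triple-colon calls and the master URL are assumptions, not the documented 1.4 API. Note also that the master should be a `spark://host:7077` or `local[2]` URL, not an `http://` address with `[2]` appended.

```r
library(SparkR)

# Use "local[2]" (or the cluster's spark://host:7077 URL), not http://.
sc <- sparkR.init("local[2]")

# In 1.4 the RDD-level functions are private, hence the SparkR::: prefix.
lines <- SparkR:::textFile(sc, "mytextfile.txt")
words <- SparkR:::flatMap(lines, function(line) strsplit(line, " ")[[1]])
pairs <- SparkR:::lapply(words, function(word) list(word, 1L))

# numPartitions must be an integer: 2L matches the method signature,
# while plain 2 is a double in R and triggers the dispatch error above.
counts <- SparkR:::reduceByKey(pairs, "+", 2L)
SparkR:::collect(counts)
```

The one-character change from `2` to `2L` is the part that addresses the reported error; the `:::` workaround is only needed because 1.4 stopped exporting the RDD API that the 2014 slides used.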
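For readers hitting the same RStudio permissions problem: in a 1.4-style install the SparkR package is built into `$SPARK_HOME/R/lib` rather than the usual site-library, so once the Spark directory has been copied somewhere the `rstudio` user can read (as described above), a session can be sketched like this. The paths and master URL here are assumptions about a spark-ec2 layout, not tested settings:

```r
# Point R at the copied Spark installation (this path is an assumption;
# it mirrors the /home/rstudio setup described earlier in the thread).
Sys.setenv(SPARK_HOME = "/home/rstudio/spark")

# SparkR lives under $SPARK_HOME/R/lib, so add that directory to the
# library search path before calling library().
.libPaths(c(file.path(Sys.getenv("SPARK_HOME"), "R", "lib"), .libPaths()))

library(SparkR)

# "local[2]" for a quick sanity check; substitute the cluster's
# spark://master-host:7077 URL to run against the EC2 cluster itself.
sc <- sparkR.init(master = "local[2]")
sqlContext <- sparkRSQL.init(sc)
```

The point of the `.libPaths()` call is that RStudio Server sessions will not otherwise see a package installed outside the standard library locations, which is one way the "library cannot be found" symptom arises.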