The API exported in the 1.4 release is different from the one used in the 2014 demo. Please see the latest documentation at http://people.apache.org/~pwendell/spark-releases/latest/sparkr.html or Chris's demo from Spark Summit at https://spark-summit.org/2015/events/a-data-frame-abstraction-layer-for-sparkr/
Thanks
Shivaram

On Tue, Jun 30, 2015 at 7:40 AM, Nicholas Sharkey <nicholasshar...@gmail.com> wrote:
> Good morning Shivaram,
>
> I believe I have our setup close, but I'm getting an error on the last step
> of the word count example from the Spark Summit
> <https://spark-summit.org/2014/wp-content/uploads/2014/07/SparkR-SparkSummit.pdf>.
>
> Off the top of your head, can you think of where this error (below, and
> attached) is coming from? I can get into the details of how I set up this
> machine if needed, but wanted to keep the initial question short.
>
> Thanks.
>
> *Begin Code*
>
> library(SparkR)
>
> # sc <- sparkR.init("local[2]")
> sc <- sparkR.init("http://ec2-54-171-173-195.eu-west-1.compute.amazonaws.com:[2]")
>
> lines <- textFile(sc, "mytextfile.txt") # hi hi all all all one one one one
>
> words <- flatMap(lines,
>                  function(line){
>                    strsplit(line, " ")[[1]]
>                  })
>
> wordcount <- lapply(words,
>                     function(word){
>                       list(word, 1)
>                     })
>
> counts <- reduceByKey(wordcount, "+", numPartitions=2)
>
> # Error in (function (classes, fdef, mtable) :
> #   unable to find an inherited method for function ‘reduceByKey’
> #   for signature ‘"PipelinedRDD", "character", "numeric"’
>
> *End Code*
>
> On Fri, Jun 26, 2015 at 7:04 PM, Shivaram Venkataraman
> <shiva...@eecs.berkeley.edu> wrote:
>
>> My workflow was to install RStudio on a cluster launched using the Spark
>> EC2 scripts. However, I did a bunch of tweaking after that (like copying
>> the Spark installation over, etc.). When I get some time I'll try to
>> write the steps down in the JIRA.
>>
>> Thanks
>> Shivaram
>>
>> On Fri, Jun 26, 2015 at 10:21 AM, <m...@redoakstrategic.com> wrote:
>>
>>> So you created an EC2 instance with RStudio installed first, then
>>> installed Spark under that same username? That makes sense, I just
>>> want to verify your workflow.
>>>
>>> Thank you again for your willingness to help!
>>>
>>> On Fri, Jun 26, 2015 at 10:13 AM -0700, "Shivaram Venkataraman"
>>> <shiva...@eecs.berkeley.edu> wrote:
>>>
>>>> I was using RStudio on the master node of the same cluster in the
>>>> demo. However, I had installed Spark under the user `rstudio` (i.e. in
>>>> /home/rstudio), and that makes the permissions work correctly. You will
>>>> need to copy the config files from /root/spark/conf after installing
>>>> Spark, though, and it might need some more manual tweaks.
>>>>
>>>> Thanks
>>>> Shivaram
>>>>
>>>> On Fri, Jun 26, 2015 at 9:59 AM, Mark Stephenson
>>>> <m...@redoakstrategic.com> wrote:
>>>>
>>>>> Thanks!
>>>>>
>>>>> In your demo video, were you using RStudio to hit a separate EC2
>>>>> Spark cluster? I noticed from your browser that you appeared to be
>>>>> using EC2 at the time, so I was just curious. It seems that might be
>>>>> one of the possible workarounds: fire up a separate EC2 instance with
>>>>> RStudio Server that initializes the Spark context against a separate
>>>>> Spark cluster.
>>>>>
>>>>> On Jun 26, 2015, at 11:46 AM, Shivaram Venkataraman
>>>>> <shiva...@eecs.berkeley.edu> wrote:
>>>>>
>>>>> We don't have a documented way to use RStudio on EC2 right now. We
>>>>> have a ticket open at https://issues.apache.org/jira/browse/SPARK-8596
>>>>> to discuss workarounds and potential solutions for this.
>>>>>
>>>>> Thanks
>>>>> Shivaram
>>>>>
>>>>> On Fri, Jun 26, 2015 at 6:27 AM, RedOakMark <m...@redoakstrategic.com>
>>>>> wrote:
>>>>>
>>>>>> Good morning,
>>>>>>
>>>>>> I am having a bit of trouble finalizing the installation and use of
>>>>>> the newest Spark version, 1.4.0, deploying to an Amazon EC2 instance
>>>>>> and using RStudio to run on top of it.
>>>>>>
>>>>>> Using these instructions
>>>>>> (http://spark.apache.org/docs/latest/ec2-scripts.html) we can fire
>>>>>> up an EC2 instance (which we have been successful doing - we have
>>>>>> gotten the cluster to launch from the command line without an
>>>>>> issue). Then I installed RStudio Server on the same EC2 instance
>>>>>> (the master) and successfully logged into it (using the test/test
>>>>>> user) through the web browser.
>>>>>>
>>>>>> This is where I get stuck: within RStudio, when I try to find the
>>>>>> folder where SparkR was installed, so that I can load the SparkR
>>>>>> library and initialize a SparkContext, I get permission errors on
>>>>>> the folders, or the library cannot be found because I cannot locate
>>>>>> the folder it is sitting in.
>>>>>>
>>>>>> Has anyone successfully launched and utilized SparkR 1.4.0 in this
>>>>>> way, with RStudio Server running on top of the master instance? Are
>>>>>> we on the right track, or should we manually launch a cluster and
>>>>>> attempt to connect to it from another instance running R?
>>>>>>
>>>>>> Thank you in advance!
>>>>>>
>>>>>> Mark
>>>>>>
>>>>>> --
>>>>>> View this message in context:
>>>>>> http://apache-spark-user-list.1001560.n3.nabble.com/Spark-1-4-0-Using-SparkR-on-EC2-Instance-tp23506.html
>>>>>> Sent from the Apache Spark User List mailing list archive at
>>>>>> Nabble.com.
>>>>>>
>>>>>> ---------------------------------------------------------------------
>>>>>> To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
>>>>>> For additional commands, e-mail: user-h...@spark.apache.org
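Following up for the archive on the exact error in the quoted word-count code: `reduceByKey` is an S4 generic, and the 2014-demo method dispatched on an *integer* partition count, so the numeric literal 2 fails to match (hence the signature `"PipelinedRDD", "character", "numeric"` in the error). A minimal sketch of a fix is below. It assumes the RDD functions, which are no longer exported in the 1.4 package, can still be reached with `:::`; the triple-colon calls and the master URL are assumptions, not the documented 1.4 API. Note also that the master should be a `spark://host:7077` or `local[2]` URL, not an `http://` address with `[2]` appended.

```r
library(SparkR)

# Use "local[2]" (or the cluster's spark://host:7077 URL), not http://.
sc <- sparkR.init("local[2]")

# In 1.4 the RDD-level functions are private, hence the SparkR::: prefix.
lines <- SparkR:::textFile(sc, "mytextfile.txt")
words <- SparkR:::flatMap(lines, function(line) strsplit(line, " ")[[1]])
pairs <- SparkR:::lapply(words, function(word) list(word, 1L))

# numPartitions must be an integer: 2L matches the method signature,
# while plain 2 is a double in R and triggers the dispatch error above.
counts <- SparkR:::reduceByKey(pairs, "+", 2L)
SparkR:::collect(counts)
```

The one-character change from `2` to `2L` is the part that addresses the reported error; the `:::` workaround is only needed because 1.4 stopped exporting the RDD API that the 2014 slides used.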
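For readers hitting the same RStudio permissions problem: in a 1.4-style install the SparkR package is built into `$SPARK_HOME/R/lib` rather than the usual site-library, so once the Spark directory has been copied somewhere the `rstudio` user can read (as described above), a session can be sketched like this. The paths and master URL here are assumptions about a spark-ec2 layout, not tested settings:

```r
# Point R at the copied Spark installation (this path is an assumption;
# it mirrors the /home/rstudio setup described earlier in the thread).
Sys.setenv(SPARK_HOME = "/home/rstudio/spark")

# SparkR lives under $SPARK_HOME/R/lib, so add that directory to the
# library search path before calling library().
.libPaths(c(file.path(Sys.getenv("SPARK_HOME"), "R", "lib"), .libPaths()))

library(SparkR)

# "local[2]" for a quick sanity check; substitute the cluster's
# spark://master-host:7077 URL to run against the EC2 cluster itself.
sc <- sparkR.init(master = "local[2]")
sqlContext <- sparkRSQL.init(sc)
```

The point of the `.libPaths()` call is that RStudio Server sessions will not otherwise see a package installed outside the standard library locations, which is one way the "library cannot be found" symptom arises.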