I am having exactly the same problem when calling it from PySpark. Has anyone managed to get this script to work?
--
Martin Goodson | VP Data Science
(0)20 3397 1240

On Wed, Jul 16, 2014 at 2:10 PM, Ian Wilkinson <ia...@me.com> wrote:
> Hi,
>
> I’m trying to run the Spark (1.0.0) shell on EMR and encountering a
> classpath issue.
> I suspect I’m missing something gloriously obvious, but so far it is
> eluding me.
>
> I launch the EMR Cluster (using the aws cli) with:
>
>   aws emr create-cluster --name "Test Cluster" \
>     --ami-version 3.0.3 \
>     --no-auto-terminate \
>     --ec2-attributes KeyName=<...> \
>     --bootstrap-actions Path=s3://elasticmapreduce/samples/spark/1.0.0/install-spark-shark.rb \
>     --instance-groups InstanceGroupType=MASTER,InstanceCount=1,InstanceType=m1.medium \
>       InstanceGroupType=CORE,InstanceCount=1,InstanceType=m1.medium \
>     --region eu-west-1
>
> then,
>
>   $ aws emr ssh --cluster-id <...> --key-pair-file <...> --region eu-west-1
>
> On the master node, I then launch the shell with:
>
>   [hadoop@ip-... spark]$ ./bin/spark-shell
>
> and try performing:
>
>   scala> val logs = sc.textFile("s3n://.../")
>
> this produces:
>
>   14/07/16 12:40:35 WARN storage.BlockManager: Putting block broadcast_0 failed
>   java.lang.NoSuchMethodError:
>   com.google.common.hash.HashFunction.hashInt(I)Lcom/google/common/hash/HashCode;
>
> Any help mighty welcome,
> ian