Hi Steve,

I am trying to read from s3n://"bucket" and have already included aws-java-sdk 1.7.4 in my classpath. My machine is AWS EMR with Hadoop 2.7.2 and Spark 1.6.1 installed. The post below suggests there is an issue with EMR's Hadoop 2.7.2:

http://stackoverflow.com/questions/30385981/how-to-access-s3a-files-from-apache-spark

Is it really the issue? Could somebody help me validate the above?
Thanks,
Divya

On 1 September 2016 at 16:59, Steve Loughran <ste...@hortonworks.com> wrote:

> On 1 Sep 2016, at 03:45, Divya Gehlot <divya.htco...@gmail.com> wrote:
>
> Hi,
> I am using Spark 1.6.1 on an EMR machine and am trying to read S3 buckets
> in my Spark job. When I read them through the Spark shell it works, but
> when I package the job and run it with spark-submit I get the error below:
>
> 16/08/31 07:36:38 INFO ApplicationMaster: Registered signal handlers for
> [TERM, HUP, INT]
>
>> 16/08/31 07:36:39 INFO ApplicationMaster: ApplicationAttemptId:
>> appattempt_1468570153734_2851_000001
>> Exception in thread "main" java.util.ServiceConfigurationError:
>> org.apache.hadoop.fs.FileSystem: Provider
>> org.apache.hadoop.fs.s3a.S3AFileSystem could not be instantiated
>> at java.util.ServiceLoader.fail(ServiceLoader.java:224)
>> at java.util.ServiceLoader.access$100(ServiceLoader.java:181)
>> at java.util.ServiceLoader$LazyIterator.next(ServiceLoader.java:377)
>
> I have already included
>
> "com.amazonaws" % "aws-java-sdk-s3" % "1.11.15",
>
> in my build.sbt.
>
>
> Assuming you are using a released version of Hadoop 2.6 or 2.7 underneath
> Spark, you will need to make sure aws-java-sdk 1.7.4 is on your classpath.
> You can't just drop in a new JAR, as it is incompatible at the API level
> (https://issues.apache.org/jira/browse/HADOOP-12269):
>
> <dependency>
>   <groupId>com.amazonaws</groupId>
>   <artifactId>aws-java-sdk</artifactId>
>   <version>1.7.4</version>
>   <scope>compile</scope>
> </dependency>
>
> and keep the Jackson databind and annotations artifacts in sync with the
> rest of your app:
>
> <dependency>
>   <groupId>com.fasterxml.jackson.core</groupId>
>   <artifactId>jackson-databind</artifactId>
> </dependency>
> <dependency>
>   <groupId>com.fasterxml.jackson.core</groupId>
>   <artifactId>jackson-annotations</artifactId>
> </dependency>
>
>
> I also tried providing the access key in my job; the same error persists.
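For an sbt build like the one in the quoted mail, Steve's Maven coordinates would translate to something like the sketch below. The Jackson version shown is only an illustrative assumption; it should match whatever the rest of the application already pulls in:

```scala
// build.sbt sketch: pin the AWS SDK to the version Hadoop 2.7.x was built
// against, rather than a newer, API-incompatible 1.11.x release.
libraryDependencies ++= Seq(
  "com.amazonaws" % "aws-java-sdk" % "1.7.4",
  // Keep the Jackson artifacts in sync with the rest of the app;
  // "2.4.4" here is a placeholder version, not a recommendation.
  "com.fasterxml.jackson.core" % "jackson-databind"    % "2.4.4",
  "com.fasterxml.jackson.core" % "jackson-annotations" % "2.4.4"
)
```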
> When I googled it, I found that if you have an IAM role created there is
> no need to provide an access key.
>
>
> You don't get IAM support until Hadoop 2.8 ships, sorry. It needed a fair
> amount of reworking of how S3A does authentication.
>
> Note that if you launch Spark jobs with the AWS environment variables set,
> these will be automatically picked up and used to set the relevant
> properties in the configuration.
>
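Steve's last point about the AWS environment variables could look like the following sketch; the class and jar names are hypothetical, and the key values are placeholders:

```shell
# Sketch: export the standard AWS credential environment variables before
# submitting; they are picked up and copied into the job's S3 credential
# properties in the Hadoop configuration.
export AWS_ACCESS_KEY_ID=...        # placeholder
export AWS_SECRET_ACCESS_KEY=...    # placeholder

spark-submit \
  --class com.example.MyJob \
  my-job-assembly.jar
```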