Hi, I am on EMR 4.7 with Spark 1.6.1 I am trying to read from s3n buckets in spark Option 1 : If I set up
hadoopConf.set("fs.s3.impl", "org.apache.hadoop.fs.s3.S3FileSystem") hadoopConf.set("fs.s3.awsSecretAccessKey", sys.env("AWS_SECRET_ACCESS_KEY")) hadoopConf.set("fs.s3.awsAccessKeyId", sys.env("AWS_ACCESS_KEY_ID")) and access the bucket as s3://bucket-name I am getting below error Exception in thread "main" java.io.IOException: /batch-1473134100000 doesn't exist at org.apache.hadoop.fs.s3.Jets3tFileSystemStore.get(Jets3tFileSystemStore.java:170) at org.apache.hadoop.fs.s3.Jets3tFileSystemStore.retrieveINode(Jets3tFileSystemStore.java:221) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:191) at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102) at com.sun.proxy.$Proxy37.retrieveINode(Unknown Source) at org.apache.hadoop.fs.s3.S3FileSystem.getFileStatus(S3FileSystem.java:340) at org.apache.hadoop.fs.Globber.getFileStatus(Globber.java:57) at org.apache.hadoop.fs.Globber.glob(Globber.java:252) at org.apache.hadoop.fs.FileSystem.globStatus(FileSystem.java:1730) at org.apache.hadoop.mapred.FileInputFormat.singleThreadedListStatus(FileInputFormat.java:231) at org.apache.hadoop.mapred.FileInputFormat.listStatus(FileInputFormat.java:201) at org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:281) at org.apache.spark.rdd.HadoopRDD.getPartitions(HadoopRDD.scala:199) Option 2: If I set up hadoopConf.set("fs.s3n.impl", "org.apache.hadoop.fs.s3native.NativeS3FileSystem") hadoopConf.set("fs.s3n.awsSecretAccessKey", sys.env("AWS_SECRET_ACCESS_KEY")) hadoopConf.set("fs.s3n.awsAccessKeyId", sys.env("AWS_ACCESS_KEY_ID")) and try to access the bucket as s3n://bucket-name getting the below error : Caused by: org.apache.hadoop.security.AccessControlException: Permission denied: s3n://bucket-name/batch-1473137100000_$folder$ at org.apache.hadoop.fs.s3native.Jets3tNativeFileSystemStore.processException(Jets3tNativeFileSystemStore.java:449) at org.apache.hadoop.fs.s3native.Jets3tNativeFileSystemStore.processException(Jets3tNativeFileSystemStore.java:427) at org.apache.hadoop.fs.s3native.Jets3tNativeFileSystemStore.handleException(Jets3tNativeFileSystemStore.java:411) at org.apache.hadoop.fs.s3native.Jets3tNativeFileSystemStore.retrieveMetadata(Jets3tNativeFileSystemStore.java:181) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) When I try to list the bucket using aws cli aws s3 ls s3://bucket-name/ It gives me the bucket listing. Would really appreciate the help. Thanks, Divya