Hey,

Please recheck the access key and secret key being used to fetch the Parquet file. This looks like a credential error: either the keys are mismatched, or they are not being loaded at all. If it is a load problem, first set them directly in code and see whether the issue resolves; once it works, they can be hidden and read from input params.
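Something like this, as a rough sketch for Spark 1.6 with the s3a connector (the key values, app name, and bucket path are placeholders, not from your setup):

    // Rough sketch: set the S3A credentials directly on the Hadoop
    // configuration before reading, so the executors can pick them up.
    // YOUR_ACCESS_KEY, YOUR_SECRET_KEY, and the s3a path are placeholders.
    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.sql.SQLContext

    val sc = new SparkContext(new SparkConf().setAppName("s3a-credentials-test"))
    sc.hadoopConfiguration.set("fs.s3a.access.key", "YOUR_ACCESS_KEY")
    sc.hadoopConfiguration.set("fs.s3a.secret.key", "YOUR_SECRET_KEY")

    val sqlContext = new SQLContext(sc)
    val df = sqlContext.read.parquet("s3a://your-bucket/path/to/file.parquet")
    df.printSchema()     // metadata-only, this part was already working for you
    println(df.count())  // the action that was failing on the executors

Once that works, you can move the keys out of the code, e.g. by passing them at submit time via --conf spark.hadoop.fs.s3a.access.key=... and --conf spark.hadoop.fs.s3a.secret.key=... so they end up in the same Hadoop configuration.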
Thanks,
Aakash.

On 23-Feb-2017 11:54 PM, "Benjamin Kim" <bbuil...@gmail.com> wrote:

We are trying to use Spark 1.6 within CDH 5.7.1 to retrieve a 1.3GB Parquet file from AWS S3. We can read the schema and show some data when the file is loaded into a DataFrame, but when we try to do some operations, such as count, we get this error below.

com.cloudera.com.amazonaws.AmazonClientException: Unable to load AWS credentials from any provider in the chain
    at com.cloudera.com.amazonaws.auth.AWSCredentialsProviderChain.getCredentials(AWSCredentialsProviderChain.java:117)
    at com.cloudera.com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:3779)
    at com.cloudera.com.amazonaws.services.s3.AmazonS3Client.headBucket(AmazonS3Client.java:1107)
    at com.cloudera.com.amazonaws.services.s3.AmazonS3Client.doesBucketExist(AmazonS3Client.java:1070)
    at org.apache.hadoop.fs.s3a.S3AFileSystem.initialize(S3AFileSystem.java:239)
    at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2711)
    at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:97)
    at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2748)
    at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2730)
    at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:385)
    at org.apache.hadoop.fs.Path.getFileSystem(Path.java:296)
    at parquet.hadoop.ParquetFileReader.readFooter(ParquetFileReader.java:385)
    at parquet.hadoop.ParquetRecordReader.initializeInternalReader(ParquetRecordReader.java:162)
    at parquet.hadoop.ParquetRecordReader.initialize(ParquetRecordReader.java:145)
    at org.apache.spark.rdd.SqlNewHadoopRDD$$anon$1.<init>(SqlNewHadoopRDD.scala:180)
    at org.apache.spark.rdd.SqlNewHadoopRDD.compute(SqlNewHadoopRDD.scala:126)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
    at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:73)
    at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
    at org.apache.spark.scheduler.Task.run(Task.scala:89)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:229)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)

Can anyone help?

Cheers,
Ben