Hi All,
I have some code to access S3 from Spark. The code is as simple as:
import org.apache.hadoop.conf.Configuration;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.sql.DataFrame;
import org.apache.spark.sql.SQLContext;

JavaSparkContext ctx = new JavaSparkContext(sparkConf);
Configuration hadoopConf = ctx.hadoopConfiguration();
// Use the native S3 filesystem and pass the AWS credentials
// (redacted here) through the Hadoop configuration.
hadoopConf.set("fs.s3n.impl",
    "org.apache.hadoop.fs.s3native.NativeS3FileSystem");
hadoopConf.set("fs.s3n.awsAccessKeyId",
    "-----------------------");
hadoopConf.set("fs.s3n.awsSecretAccessKey",
    "------------------------------");
SQLContext sql = new SQLContext(ctx);
// Read a Parquet file from S3; count() forces the read to happen.
DataFrame grid_lookup =
    sql.parquetFile("s3n://-------------------");
grid_lookup.count();
ctx.stop();
The code works with 1.3.1, but with 1.4.0 and the latest 1.5.0 it always
gives me the exception below:
Exception in thread "main" java.lang.IllegalArgumentException: AWS Access
Key ID and Secret Access Key must be specified as the username or password
(respectively) of a s3 URL, or by setting the fs.s3.awsAccessKeyId or
fs.s3.awsSecretAccessKey properties (respectively).
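For what it's worth, the message refers to the fs.s3 properties rather than
the fs.s3n ones I set above. An untested sketch of what the exception seems
to be asking for (same hadoopConf object, credentials redacted again) would
be:

// Untested: set the fs.s3 credential keys the exception message names,
// in addition to the fs.s3n ones above.
hadoopConf.set("fs.s3.awsAccessKeyId",
    "-----------------------");
hadoopConf.set("fs.s3.awsSecretAccessKey",
    "------------------------------");

but I don't understand why the fs.s3n settings stopped being picked up.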
I don't know why. I remember this was a known issue in 1.3.0
(https://issues.apache.org/jira/browse/SPARK-6330) and was fixed in 1.3.1,
but now it is broken again in the newer versions?
I also remember that when I first switched to 1.4.0 it worked for a while
(I was building from the master branch at the time), but after I refreshed
to the latest code I started getting this error again.
Does anyone have an idea?
Regards,
Shuai