It looks like this is related to the underlying Hadoop configuration. Try deploying the Hadoop configuration with your job via --files and --driver-class-path, or add the keys to the default /etc/hadoop/conf/core-site.xml. If that is not an option (depending on how your Hadoop cluster is set up), then hard-code the values via -Dkey=value to see if it works. The downside is that your credentials are exposed in plaintext on the java command line. You can also define them in spark-defaults.conf with the "spark.executor.extraJavaOptions" property, e.g. for s3n:

    spark.executor.extraJavaOptions "-Dfs.s3n.awsAccessKeyId=XXXXX -Dfs.s3n.awsSecretAccessKey=YYYY"

or for s3:

    spark.executor.extraJavaOptions "-Dfs.s3.awsAccessKeyId=XXXXX -Dfs.s3.awsSecretAccessKey=YYYY"

Hope this works. Or embed them in the s3n path itself, though that is not good security practice either.
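For the --files route, here is a minimal sketch of what the submit command might look like, assuming your keys live in a core-site.xml under /path/to/conf (the paths, class name, and jar are placeholders, not values from this thread):

    # Ship a core-site.xml containing fs.s3n.awsAccessKeyId /
    # fs.s3n.awsSecretAccessKey to the executors, and put the directory
    # holding it on the driver classpath so Hadoop's Configuration finds it.
    spark-submit \
      --files /path/to/conf/core-site.xml \
      --driver-class-path /path/to/conf \
      --class com.example.StreamingApp streaming-app.jar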
spark.executor.extraJavaOptions "-Dfs.s3n.awsAccessKeyId=XXXXX -Dfs.s3n.awsSecretAccessKey=YYYY" s3spark.executor.extraJavaOptions "-Dfs.s3.awsAccessKeyId=XXXXX -Dfs.s3.awsSecretAccessKey=YYYY" Hope this works. Or embed them in the s3n path. Not good security practice though. From: mslimo...@gmail.com Date: Tue, 10 Feb 2015 10:57:47 -0500 Subject: Re: hadoopConfiguration for StreamingContext To: ak...@sigmoidanalytics.com CC: u...@spark.incubator.apache.org Thanks, Akhil. I had high hopes for #2, but tried all and no luck. I was looking at the source and found something interesting. The Stack Trace (below) directs me to FileInputDStream.scala (line 141). This is version 1.1.1, btw. Line 141 has: private def fs: FileSystem = { if (fs_ == null) fs_ = directoryPath.getFileSystem(new Configuration()) fs_ } So it looks to me like it doesn't make any attempt to use a configured HadoopConf. Here is the StackTrace: java.lang.IllegalArgumentException: AWS Access Key ID and Secret Access Key must be specified as the username or password (respectively) of a s3n URL, or by setting the fs.s3n.awsAccessKeyId or fs.s3n.awsSecretAccessKey properties (respectively). at org.apache.hadoop.fs.s3.S3Credentials.initialize(S3Credentials.java:66) at org.apache.hadoop.fs.s3native.Jets3tNativeFileSystemStore.initialize(Jets3tNativeFileSystemStore.java:49) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:82) at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:59) at org.apache.hadoop.fs.s3native.$Proxy5.initialize(Unknown Source) at org.apache.hadoop.fs.s3native.NativeS3FileSystem.initialize(NativeS3FileSystem.java:216) at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:1386) at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:66) at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:1404) at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:254) at org.apache.hadoop.fs.Path.getFileSystem(Path.java:187) at org.apache.spark.streaming.dstream.FileInputDStream.org$apache$spark$streaming$dstream$FileInputDStream$$fs(FileInputDStream.scala:141) at org.apache.spark.streaming.dstream.FileInputDStream.findNewFiles(FileInputDStream.scala:107) at org.apache.spark.streaming.dstream.FileInputDStream.compute(FileInputDStream.scala:75) ... On Tue, Feb 10, 2015 at 10:28 AM, Akhil Das <ak...@sigmoidanalytics.com> wrote: Try the following: 1. Set the access key and secret key in the sparkContext: ssc.sparkContext.hadoopConfiguration.set("AWS_ACCESS_KEY_ID",yourAccessKey) ssc.sparkContext.hadoopConfiguration.set("AWS_SECRET_ACCESS_KEY",yourSecretKey) 2. Set the access key and secret key in the environment before startingyour application: export AWS_ACCESS_KEY_ID=<your access> export AWS_SECRET_ACCESS_KEY=<your secret> 3. Set the access key and secret key inside the hadoop configurations val hadoopConf=ssc.sparkContext.hadoopConfiguration;hadoopConf.set("fs.s3.impl","org.apache.hadoop.fs.s3native.NativeS3FileSystem")hadoopConf.set("fs.s3.awsAccessKeyId",yourAccessKey)hadoopConf.set("fs.s3.awsSecretAccessKey",yourSecretKey) 4. 
On Tue, Feb 10, 2015 at 10:28 AM, Akhil Das <ak...@sigmoidanalytics.com> wrote:

Try the following:

1. Set the access key and secret key in the sparkContext:

    ssc.sparkContext.hadoopConfiguration.set("AWS_ACCESS_KEY_ID", yourAccessKey)
    ssc.sparkContext.hadoopConfiguration.set("AWS_SECRET_ACCESS_KEY", yourSecretKey)

2. Set the access key and secret key in the environment before starting your application:

    export AWS_ACCESS_KEY_ID=<your access>
    export AWS_SECRET_ACCESS_KEY=<your secret>

3. Set the access key and secret key inside the Hadoop configuration:

    val hadoopConf = ssc.sparkContext.hadoopConfiguration
    hadoopConf.set("fs.s3.impl", "org.apache.hadoop.fs.s3native.NativeS3FileSystem")
    hadoopConf.set("fs.s3.awsAccessKeyId", yourAccessKey)
    hadoopConf.set("fs.s3.awsSecretAccessKey", yourSecretKey)

4. You can also try embedding the credentials in the path:

    val stream = ssc.textFileStream("s3n://yourAccessKey:yourSecretKey@<yourBucket>/path/")

Thanks
Best Regards

On Tue, Feb 10, 2015 at 8:27 PM, Marc Limotte <mslimo...@gmail.com> wrote:

I see that SparkContext has a hadoopConfiguration() method, which can be used like this sample I found:

    sc.hadoopConfiguration().set("fs.s3.awsAccessKeyId", "XXXXXX");
    sc.hadoopConfiguration().set("fs.s3.awsSecretAccessKey", "XXXXXX");

But StreamingContext doesn't have the same thing. I want to use a StreamingContext with s3n: text file input, but can't find a way to set the AWS credentials. I also tried (with no success):

- adding the properties to conf/spark-defaults.conf
- $HADOOP_HOME/conf/hdfs-site.xml
- ENV variables
- embedding them as user:password in s3n://user:password@... (w/ URL encoding)
- setting the conf as above on a new SparkContext and passing that to the StreamingContext constructor: StreamingContext(sparkContext: SparkContext, batchDuration: Duration)

Can someone point me in the right direction for setting AWS creds (Hadoop conf options) for a StreamingContext?

thanks,
Marc Limotte
Climate Corporation
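One note on the embedded-credentials attempt: AWS secret keys often contain '/' characters, which break the s3n URI unless they are percent-encoded. A hedged sketch of that variant (placeholder keys and bucket, mirroring the s3n://user:password@ form tried above; URLEncoder is used here as an approximation of URI percent-encoding):

    import java.net.URLEncoder

    // Placeholder credentials and bucket -- not values from this thread.
    // Encoding turns '/' in a secret key into %2F so the URI parses.
    val accessKey = URLEncoder.encode("YOUR_ACCESS_KEY", "UTF-8")
    val secretKey = URLEncoder.encode("YOUR_SECRET_KEY", "UTF-8")
    val stream = ssc.textFileStream(s"s3n://$accessKey:$secretKey@your-bucket/path/")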