Thanks, Akhil.  I had high hopes for #2, but I tried all of them and had no luck.

I was looking at the source and found something interesting.  The stack
trace (below) points to FileInputDStream.scala, line 141.  This is Spark
version 1.1.1, btw.  Line 141 has:

  private def fs: FileSystem = {
    if (fs_ == null) fs_ = directoryPath.getFileSystem(new Configuration())
    fs_
  }


So it looks to me like it builds a bare new Configuration() and makes no
attempt to use a configured HadoopConf.
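
If I'm reading it right, that new Configuration() only loads
core-default.xml / core-site.xml off the classpath, so anything set
programmatically on the context never reaches it.  A minimal sketch of the
mismatch (assuming a StreamingContext named ssc, as in your examples):

  import org.apache.hadoop.conf.Configuration

  // Keys set here live only on the SparkContext's own Configuration
  // instance...
  ssc.sparkContext.hadoopConfiguration.set("fs.s3n.awsAccessKeyId", "XXXXXX")

  // ...while a fresh Configuration, like the one FileInputDStream builds,
  // sees only classpath resources, so this comes back null:
  val fresh = new Configuration()
  fresh.get("fs.s3n.awsAccessKeyId")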

Here is the stack trace:

java.lang.IllegalArgumentException: AWS Access Key ID and Secret Access Key
must be specified as the username or password (respectively) of a s3n URL,
or by setting the fs.s3n.awsAccessKeyId or fs.s3n.awsSecretAccessKey
properties (respectively).
  at org.apache.hadoop.fs.s3.S3Credentials.initialize(S3Credentials.java:66)
  at org.apache.hadoop.fs.s3native.Jets3tNativeFileSystemStore.initialize(Jets3tNativeFileSystemStore.java:49)
  at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
  at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
  at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
  at java.lang.reflect.Method.invoke(Method.java:606)
  at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:82)
  at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:59)
  at org.apache.hadoop.fs.s3native.$Proxy5.initialize(Unknown Source)
  at org.apache.hadoop.fs.s3native.NativeS3FileSystem.initialize(NativeS3FileSystem.java:216)
  at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:1386)
  at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:66)
  at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:1404)
  at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:254)
  at org.apache.hadoop.fs.Path.getFileSystem(Path.java:187)
  at org.apache.spark.streaming.dstream.FileInputDStream.org$apache$spark$streaming$dstream$FileInputDStream$$fs(FileInputDStream.scala:141)
  at org.apache.spark.streaming.dstream.FileInputDStream.findNewFiles(FileInputDStream.scala:107)
  at org.apache.spark.streaming.dstream.FileInputDStream.compute(FileInputDStream.scala:75)
  ...

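One workaround I may try: Hadoop's Configuration has a static
addDefaultResource(), and resources registered that way get folded into
every Configuration created afterwards, including the fresh one inside
FileInputDStream.  An untested sketch, where s3-credentials.xml is a
hypothetical file on the driver classpath carrying the
fs.s3n.awsAccessKeyId / fs.s3n.awsSecretAccessKey properties:

  import org.apache.hadoop.conf.Configuration

  // Must run before the streaming job first touches the s3n path.
  // "s3-credentials.xml" is a made-up resource name for illustration.
  Configuration.addDefaultResource("s3-credentials.xml")

Putting the keys straight into a core-site.xml on the classpath should
amount to the same thing.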


On Tue, Feb 10, 2015 at 10:28 AM, Akhil Das <ak...@sigmoidanalytics.com>
wrote:

> Try the following:
>
> 1. Set the access key and secret key in the sparkContext:
>
> ssc.sparkContext.hadoopConfiguration.set("AWS_ACCESS_KEY_ID", yourAccessKey)
> ssc.sparkContext.hadoopConfiguration.set("AWS_SECRET_ACCESS_KEY", yourSecretKey)
>
> 2. Set the access key and secret key in the environment before starting
> your application:
>
> export AWS_ACCESS_KEY_ID=<your access>
> export AWS_SECRET_ACCESS_KEY=<your secret>
>
>
>
> 3. Set the access key and secret key inside the hadoop configurations
>
> val hadoopConf = ssc.sparkContext.hadoopConfiguration
>
> hadoopConf.set("fs.s3.impl", "org.apache.hadoop.fs.s3native.NativeS3FileSystem")
> hadoopConf.set("fs.s3.awsAccessKeyId", yourAccessKey)
> hadoopConf.set("fs.s3.awsSecretAccessKey", yourSecretKey)
>
>
> 4. You can also try:
>
> val stream = ssc.textFileStream("s3n://yourAccessKey:yourSecretKey@<yourBucket>/path/")
>
>
> Thanks
> Best Regards
>
> On Tue, Feb 10, 2015 at 8:27 PM, Marc Limotte <mslimo...@gmail.com> wrote:
>
>> I see that SparkContext has a hadoopConfiguration() method, which can
>> be used like this sample I found:
>>
>> sc.hadoopConfiguration().set("fs.s3.awsAccessKeyId", "XXXXXX");
>> sc.hadoopConfiguration().set("fs.s3.awsSecretAccessKey", "XXXXXX");
>>
>>
>> But StreamingContext doesn't have the same thing.  I want to use a
>> StreamingContext with s3n: text file input, but can't find a way to set the
>> AWS credentials.  I also tried (with no success):
>>
>>
>>    - adding the properties to conf/spark-defaults.conf
>>    - $HADOOP_HOME/conf/hdfs-site.xml
>>    - ENV variables
>>    - Embedded as user:password in s3n://user:password@... (w/ url
>>    encoding)
>>    - Setting the conf as above on a new SparkContext and passing that to
>>    the StreamingContext constructor: StreamingContext(sparkContext:
>>    SparkContext, batchDuration: Duration)
>>
>> Can someone point me in the right direction for setting AWS creds (hadoop
>> conf options) for a StreamingContext?
>>
>> thanks,
>> Marc Limotte
>> Climate Corporation
>>
>
>
