Looks like the latest version (1.2.1) actually does use the configured Hadoop
conf.  I tested it out and that resolves my problem.
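
For reference, the fix boils down to something like the following sketch
(bucket, path, app name and batch interval are placeholders, not my exact
code; the key names are the standard s3n Hadoop properties):

  import org.apache.spark.SparkConf
  import org.apache.spark.streaming.{Seconds, StreamingContext}

  val ssc = new StreamingContext(new SparkConf().setAppName("s3n-stream"), Seconds(30))
  // on 1.2.1 the file stream picks these up from the configured Hadoop conf
  ssc.sparkContext.hadoopConfiguration.set("fs.s3n.awsAccessKeyId", "XXXXXX")
  ssc.sparkContext.hadoopConfiguration.set("fs.s3n.awsSecretAccessKey", "XXXXXX")
  val lines = ssc.textFileStream("s3n://<yourBucket>/path/")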

thanks,
marc


On Tue, Feb 10, 2015 at 10:57 AM, Marc Limotte <mslimo...@gmail.com> wrote:

> Thanks, Akhil.  I had high hopes for #2, but tried all and no luck.
>
> I was looking at the source and found something interesting.  The Stack
> Trace (below) directs me to FileInputDStream.scala (line 141).  This is
> version 1.1.1, btw.  Line 141 has:
>
>   private def fs: FileSystem = {
>>     if (fs_ == null) fs_ = directoryPath.getFileSystem(new Configuration())
>>     fs_
>>   }
>
>
> So it looks to me like it doesn't make any attempt to use a configured
> HadoopConf.
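>
> A quick illustration of why that matters (purely for clarity, not taken
> from the Spark source): a fresh Configuration() only loads core-default.xml
> and core-site.xml from the classpath, so anything set programmatically on
> the SparkContext's hadoopConfiguration never reaches it.
>
>   import org.apache.hadoop.conf.Configuration
>
>   val programmatic = new Configuration()
>   programmatic.set("fs.s3n.awsAccessKeyId", "AKIA...")  // set on one instance
>
>   val fresh = new Configuration()                        // what line 141 does
>   fresh.get("fs.s3n.awsAccessKeyId")                     // null unless it is in core-site.xml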
>
> Here is the StackTrace:
>
> java.lang.IllegalArgumentException: AWS Access Key ID and Secret Access
>> Key must be specified as the username or password (respectively) of a s3n
>> URL, or by setting the fs.s3n.awsAccessKeyId or fs.s3n.awsSecretAccessKey
>> properties (respectively).
>> at org.apache.hadoop.fs.s3.S3Credentials.initialize(S3Credentials.java:66)
>> at
>> org.apache.hadoop.fs.s3native.Jets3tNativeFileSystemStore.initialize(Jets3tNativeFileSystemStore.java:49)
>> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>> at
>> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>> at
>> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>> at java.lang.reflect.Method.invoke(Method.java:606)
>> at
>> org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:82)
>> at
>> org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:59)
>> at org.apache.hadoop.fs.s3native.$Proxy5.initialize(Unknown Source)
>> at
>> org.apache.hadoop.fs.s3native.NativeS3FileSystem.initialize(NativeS3FileSystem.java:216)
>> at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:1386)
>> at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:66)
>> at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:1404)
>> at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:254)
>> at org.apache.hadoop.fs.Path.getFileSystem(Path.java:187)
>> at org.apache.spark.streaming.dstream.FileInputDStream.org
>> $apache$spark$streaming$dstream$FileInputDStream$$fs(FileInputDStream.scala:141)
>> at
>> org.apache.spark.streaming.dstream.FileInputDStream.findNewFiles(FileInputDStream.scala:107)
>> at
>> org.apache.spark.streaming.dstream.FileInputDStream.compute(FileInputDStream.scala:75)
>> ...
>
>
>
> On Tue, Feb 10, 2015 at 10:28 AM, Akhil Das <ak...@sigmoidanalytics.com>
> wrote:
>
>> Try the following:
>>
>> 1. Set the access key and secret key in the sparkContext:
>>
>>
>> ssc.sparkContext.hadoopConfiguration.set("AWS_ACCESS_KEY_ID",yourAccessKey)
>>
>>
>> ssc.sparkContext.hadoopConfiguration.set("AWS_SECRET_ACCESS_KEY",yourSecretKey)
>>
>>
>>
>> 2. Set the access key and secret key in the environment before starting
>> your application:
>>
>> export AWS_ACCESS_KEY_ID=<your access>
>>
>> export AWS_SECRET_ACCESS_KEY=<your secret>
>>
>>
>>
>> 3. Set the access key and secret key inside the hadoop configurations
>>
>> val hadoopConf=ssc.sparkContext.hadoopConfiguration;
>>
>> hadoopConf.set("fs.s3.impl","org.apache.hadoop.fs.s3native.NativeS3FileSystem")
>> hadoopConf.set("fs.s3.awsAccessKeyId",yourAccessKey)
>> hadoopConf.set("fs.s3.awsSecretAccessKey",yourSecretKey)
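>>
>> A minimal end-to-end sketch of #3 (bucket, path, app name and batch
>> interval are placeholders; for s3n:// paths the equivalent keys are
>> fs.s3n.awsAccessKeyId / fs.s3n.awsSecretAccessKey):
>>
>>   import org.apache.spark.SparkConf
>>   import org.apache.spark.streaming.{Seconds, StreamingContext}
>>
>>   val ssc = new StreamingContext(new SparkConf().setAppName("s3-stream"), Seconds(30))
>>   val hadoopConf = ssc.sparkContext.hadoopConfiguration
>>   hadoopConf.set("fs.s3.impl", "org.apache.hadoop.fs.s3native.NativeS3FileSystem")
>>   hadoopConf.set("fs.s3.awsAccessKeyId", yourAccessKey)
>>   hadoopConf.set("fs.s3.awsSecretAccessKey", yourSecretKey)
>>
>>   val stream = ssc.textFileStream("s3://<yourBucket>/path/")
>>   stream.print()
>>   ssc.start()
>>   ssc.awaitTermination()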
>>
>>
>> 4. You can also try:
>>
>> val stream = ssc.textFileStream("s3n://yourAccessKey:yourSecretKey@<yourBucket>/path/")
>>
>>
>> Thanks
>> Best Regards
>>
>> On Tue, Feb 10, 2015 at 8:27 PM, Marc Limotte <mslimo...@gmail.com>
>> wrote:
>>
>>> I see that SparkContext has a hadoopConfiguration() method, which
>>> can be used like this sample I found:
>>>
>>> sc.hadoopConfiguration().set("fs.s3.awsAccessKeyId", "XXXXXX");
>>>> sc.hadoopConfiguration().set("fs.s3.awsSecretAccessKey", "XXXXXX");
>>>
>>>
>>> But StreamingContext doesn't have the same thing.  I want to use a
>>> StreamingContext with s3n: text file input, but can't find a way to set the
>>> AWS credentials.  I also tried (with no success):
>>>
>>>
>>>    - adding the properties to conf/spark-defaults.conf
>>>    - $HADOOP_HOME/conf/hdfs-site.xml
>>>    - ENV variables
>>>    - Embedded as user:password in s3n://user:password@... (w/ url
>>>    encoding)
>>>    - Setting the conf as above on a new SparkContext and passing that to
>>>    the StreamingContext constructor: StreamingContext(sparkContext:
>>>    SparkContext, batchDuration: Duration) (see the sketch below)
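>>>
>>> For reference, that last attempt looked roughly like this (bucket, path
>>> and batch interval are placeholders, not the exact code I ran):
>>>
>>>   import org.apache.spark.{SparkConf, SparkContext}
>>>   import org.apache.spark.streaming.{Seconds, StreamingContext}
>>>
>>>   // build a SparkContext with the S3 credentials already on its Hadoop conf
>>>   val sc = new SparkContext(new SparkConf().setAppName("s3n-stream"))
>>>   sc.hadoopConfiguration.set("fs.s3n.awsAccessKeyId", "XXXXXX")
>>>   sc.hadoopConfiguration.set("fs.s3n.awsSecretAccessKey", "XXXXXX")
>>>
>>>   // wrap it in a StreamingContext and read from S3
>>>   val ssc = new StreamingContext(sc, Seconds(30))
>>>   val lines = ssc.textFileStream("s3n://<yourBucket>/path/")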
>>>
>>> Can someone point me in the right direction for setting AWS creds
>>> (hadoop conf options) for a StreamingContext?
>>>
>>> thanks,
>>> Marc Limotte
>>> Climate Corporation
>>>
>>
>>
>
