Looks like the latest version 1.2.1 actually does use the configured hadoop conf. I tested it out and that does resolve my problem.
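For anyone who finds this thread later, the working setup on 1.2.1 looks roughly like the sketch below. It is only a sketch: the app name, bucket, batch interval, and reading the keys from the environment are placeholders I've filled in for illustration.

import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

// Sketch only (Spark 1.2.1+): app name, bucket, and batch interval are placeholders.
val conf = new SparkConf().setAppName("S3nStreamingExample")
val ssc = new StreamingContext(conf, Seconds(60))

// Set the s3n credentials on the context's Hadoop configuration before the
// stream is created; on 1.2.1 the file stream uses this configuration when
// it lists the bucket.
ssc.sparkContext.hadoopConfiguration.set("fs.s3n.awsAccessKeyId", sys.env("AWS_ACCESS_KEY_ID"))
ssc.sparkContext.hadoopConfiguration.set("fs.s3n.awsSecretAccessKey", sys.env("AWS_SECRET_ACCESS_KEY"))

val lines = ssc.textFileStream("s3n://your-bucket/path/")
lines.print()

ssc.start()
ssc.awaitTermination()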
thanks,
marc

On Tue, Feb 10, 2015 at 10:57 AM, Marc Limotte <mslimo...@gmail.com> wrote:

> Thanks, Akhil. I had high hopes for #2, but tried all and no luck.
>
> I was looking at the source and found something interesting. The Stack
> Trace (below) directs me to FileInputDStream.scala (line 141). This is
> version 1.1.1, btw. Line 141 has:
>
>> private def fs: FileSystem = {
>>   if (fs_ == null) fs_ = directoryPath.getFileSystem(new Configuration())
>>   fs_
>> }
>
> So it looks to me like it doesn't make any attempt to use a configured
> HadoopConf.
>
> Here is the StackTrace:
>
>> java.lang.IllegalArgumentException: AWS Access Key ID and Secret Access
>> Key must be specified as the username or password (respectively) of a s3n
>> URL, or by setting the fs.s3n.awsAccessKeyId or fs.s3n.awsSecretAccessKey
>> properties (respectively).
>> at org.apache.hadoop.fs.s3.S3Credentials.initialize(S3Credentials.java:66)
>> at org.apache.hadoop.fs.s3native.Jets3tNativeFileSystemStore.initialize(Jets3tNativeFileSystemStore.java:49)
>> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>> at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>> at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>> at java.lang.reflect.Method.invoke(Method.java:606)
>> at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:82)
>> at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:59)
>> at org.apache.hadoop.fs.s3native.$Proxy5.initialize(Unknown Source)
>> at org.apache.hadoop.fs.s3native.NativeS3FileSystem.initialize(NativeS3FileSystem.java:216)
>> at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:1386)
>> at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:66)
>> at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:1404)
>> at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:254)
>> at org.apache.hadoop.fs.Path.getFileSystem(Path.java:187)
>> at org.apache.spark.streaming.dstream.FileInputDStream.org$apache$spark$streaming$dstream$FileInputDStream$$fs(FileInputDStream.scala:141)
>> at org.apache.spark.streaming.dstream.FileInputDStream.findNewFiles(FileInputDStream.scala:107)
>> at org.apache.spark.streaming.dstream.FileInputDStream.compute(FileInputDStream.scala:75)
>> ...
>
> On Tue, Feb 10, 2015 at 10:28 AM, Akhil Das <ak...@sigmoidanalytics.com>
> wrote:
>
>> Try the following:
>>
>> 1. Set the access key and secret key in the sparkContext:
>>
>> ssc.sparkContext.hadoopConfiguration.set("AWS_ACCESS_KEY_ID", yourAccessKey)
>> ssc.sparkContext.hadoopConfiguration.set("AWS_SECRET_ACCESS_KEY", yourSecretKey)
>>
>> 2. Set the access key and secret key in the environment before starting
>> your application:
>>
>> export AWS_ACCESS_KEY_ID=<your access>
>> export AWS_SECRET_ACCESS_KEY=<your secret>
>>
>> 3. Set the access key and secret key inside the hadoop configurations:
>>
>> val hadoopConf = ssc.sparkContext.hadoopConfiguration
>> hadoopConf.set("fs.s3.impl", "org.apache.hadoop.fs.s3native.NativeS3FileSystem")
>> hadoopConf.set("fs.s3.awsAccessKeyId", yourAccessKey)
>> hadoopConf.set("fs.s3.awsSecretAccessKey", yourSecretKey)
>>
>> 4. You can also try:
>>
>> val stream = ssc.textFileStream("s3n://yourAccessKey:yourSecretKey@<yourBucket>/path/")
>>
>> Thanks
>> Best Regards
>>
>> On Tue, Feb 10, 2015 at 8:27 PM, Marc Limotte <mslimo...@gmail.com>
>> wrote:
>>
>>> I see that SparkContext has a hadoopConfiguration() method, which can
>>> be used like this sample I found:
>>>
>>>> sc.hadoopConfiguration().set("fs.s3.awsAccessKeyId", "XXXXXX");
>>>> sc.hadoopConfiguration().set("fs.s3.awsSecretAccessKey", "XXXXXX");
>>>
>>> But StreamingContext doesn't have the same thing. I want to use a
>>> StreamingContext with s3n: text file input, but can't find a way to set
>>> the AWS credentials. I also tried (with no success):
>>>
>>> - adding the properties to conf/spark-defaults.conf
>>> - $HADOOP_HOME/conf/hdfs-site.xml
>>> - ENV variables
>>> - Embedded as user:password in s3n://user:password@... (w/ url encoding)
>>> - Setting the conf as above on a new SparkContext and passing that to
>>>   the StreamingContext constructor: StreamingContext(sparkContext:
>>>   SparkContext, batchDuration: Duration)
>>>
>>> Can someone point me in the right direction for setting AWS creds
>>> (hadoop conf options) for a StreamingContext?
>>>
>>> thanks,
>>> Marc Limotte
>>> Climate Corporation
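For completeness, a concrete version of Akhil's option 4 (credentials embedded in the s3n URL). As noted above, this did not resolve the 1.1.1 problem for me, and the secret key must be URL-encoded if it contains characters like '/' or '+'. Bucket, path, batch interval, and reading the keys from the environment are placeholders.

import java.net.URLEncoder
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

// Sketch of the URL-embedded credential form. The secret is URL-encoded
// because s3n secrets often contain '/' or '+', which break the URL.
val accessKey = sys.env("AWS_ACCESS_KEY_ID")
val secretKey = URLEncoder.encode(sys.env("AWS_SECRET_ACCESS_KEY"), "UTF-8")

val ssc = new StreamingContext(new SparkConf().setAppName("S3nUrlCreds"), Seconds(60))
val stream = ssc.textFileStream(s"s3n://$accessKey:$secretKey@your-bucket/path/")
stream.print()

ssc.start()
ssc.awaitTermination()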