Thanks, Akhil. I had high hopes for #2, but tried all and no luck. I was looking at the source and found something interesting. The Stack Trace (below) directs me to FileInputDStream.scala (line 141). This is version 1.1.1, btw. Line 141 has:
private def fs: FileSystem = { > if (fs_ == null) fs_ = directoryPath.getFileSystem(new Configuration()) > fs_ > } So it looks to me like it doesn't make any attempt to use a configured HadoopConf. Here is the StackTrace: java.lang.IllegalArgumentException: AWS Access Key ID and Secret Access Key > must be specified as the username or password (respectively) of a s3n URL, > or by setting the fs.s3n.awsAccessKeyId or fs.s3n.awsSecretAccessKey > properties (respectively). > at org.apache.hadoop.fs.s3.S3Credentials.initialize(S3Credentials.java:66) > at > org.apache.hadoop.fs.s3native.Jets3tNativeFileSystemStore.initialize(Jets3tNativeFileSystemStore.java:49) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:606) > at > org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:82) > at > org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:59) > at org.apache.hadoop.fs.s3native.$Proxy5.initialize(Unknown Source) > at > org.apache.hadoop.fs.s3native.NativeS3FileSystem.initialize(NativeS3FileSystem.java:216) > at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:1386) > at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:66) > at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:1404) > at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:254) > at org.apache.hadoop.fs.Path.getFileSystem(Path.java:187) > at org.apache.spark.streaming.dstream.FileInputDStream.org > $apache$spark$streaming$dstream$FileInputDStream$$fs(FileInputDStream.scala:141) > at > org.apache.spark.streaming.dstream.FileInputDStream.findNewFiles(FileInputDStream.scala:107) > at > org.apache.spark.streaming.dstream.FileInputDStream.compute(FileInputDStream.scala:75) > ... On Tue, Feb 10, 2015 at 10:28 AM, Akhil Das <ak...@sigmoidanalytics.com> wrote: > Try the following: > > 1. Set the access key and secret key in the sparkContext: > > ssc.sparkContext.hadoopConfiguration.set("AWS_ACCESS_KEY_ID",yourAccessKey) > > > ssc.sparkContext.hadoopConfiguration.set("AWS_SECRET_ACCESS_KEY",yourSecretKey) > > > > 2. Set the access key and secret key in the environment before starting > your application: > > > export AWS_ACCESS_KEY_ID=<your access> > > export AWS_SECRET_ACCESS_KEY=<your secret> > > > > 3. Set the access key and secret key inside the hadoop configurations > > val hadoopConf=ssc.sparkContext.hadoopConfiguration; > > hadoopConf.set("fs.s3.impl","org.apache.hadoop.fs.s3native.NativeS3FileSystem") > hadoopConf.set("fs.s3.awsAccessKeyId",yourAccessKey) > hadoopConf.set("fs.s3.awsSecretAccessKey",yourSecretKey) > > > 4. You can also try: > > val stream = ssc.textFileStream("s3n://yourAccessKey:yourSecretKey@ > <yourBucket>/path/") > > > Thanks > Best Regards > > On Tue, Feb 10, 2015 at 8:27 PM, Marc Limotte <mslimo...@gmail.com> wrote: > >> I see that StreamingContext has a hadoopConfiguration() method, which can >> be used like this sample I found: >> >> sc.hadoopConfiguration().set("fs.s3.awsAccessKeyId", "XXXXXX"); >>> sc.hadoopConfiguration().set("fs.s3.awsSecretAccessKey", "XXXXXX"); >> >> >> But StreamingContext doesn't have the same thing. I want to use a >> StreamingContext with s3n: text file input, but can't find a way to set the >> AWS credentials. I also tried (with no success): >> >> >> - adding the properties to conf/spark-defaults.conf >> - $HADOOP_HOME/conf/hdfs-site.xml >> - ENV variables >> - Embedded as user:password in s3n://user:password@... (w/ url >> encoding) >> - Setting the conf as above on a new SparkContext and passing that >> the StreamingContext constructor: StreamingContext(sparkContext: >> SparkContext, batchDuration: Duration) >> >> Can someone point me in the right direction for setting AWS creds (hadoop >> conf options) for streamingcontext? >> >> thanks, >> Marc Limotte >> Climate Corporation >> > >