Hi Kostiantyn,

I want to confirm that it works first by using hdfs-site.xml. If yes, you could define a different spark-{user-x}.conf per user and source it during spark-submit. Let us know if hdfs-site.xml works first - it should.
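To make the idea concrete, here is a rough sketch (the fs.s3a.* property names are the ones the Hadoop 2.7 s3a connector already uses in your snippet; the file names, user names, key values, and class/jar names are just placeholders):

First, in an hdfs-site.xml that is on the classpath of both the driver and the executors:

  <property>
    <name>fs.s3a.access.key</name>
    <value>ACCESS_KEY_FOR_USER_X</value>   <!-- placeholder -->
  </property>
  <property>
    <name>fs.s3a.secret.key</name>
    <value>SECRET_KEY_FOR_USER_X</value>   <!-- placeholder -->
  </property>

If that works, the same keys can go into a per-user properties file, e.g. spark-user-x.conf, using Spark's spark.hadoop.* passthrough (Spark copies these entries into the Hadoop Configuration on the driver and the executors):

  spark.hadoop.fs.s3a.access.key  ACCESS_KEY_FOR_USER_X
  spark.hadoop.fs.s3a.secret.key  SECRET_KEY_FOR_USER_X

and each user submits with their own file:

  spark-submit --properties-file spark-user-x.conf --class your.Main your-app.jar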
Best Regards,
Jerry

Sent from my iPhone

> On 30 Dec, 2015, at 2:31 pm, KOSTIANTYN Kudriavtsev <kudryavtsev.konstan...@gmail.com> wrote:
>
> Hi Jerry,
>
> I want to run different jobs on different S3 buckets - different AWS creds - on the same instances. Could you shed some light on whether it's possible to achieve this with hdfs-site?
>
> Thank you,
> Konstantin Kudryavtsev
>
>> On Wed, Dec 30, 2015 at 2:10 PM, Jerry Lam <chiling...@gmail.com> wrote:
>> Hi Kostiantyn,
>>
>> Can you define those properties in hdfs-site.xml and make sure it is visible in the classpath when you spark-submit? It looks like a conf sourcing issue to me.
>>
>> Cheers,
>>
>> Sent from my iPhone
>>
>>> On 30 Dec, 2015, at 1:59 pm, KOSTIANTYN Kudriavtsev <kudryavtsev.konstan...@gmail.com> wrote:
>>>
>>> Chris,
>>>
>>> thanks for the hint about IAM roles, but in my case I need to run different jobs with different S3 permissions on the same cluster, so this approach doesn't work for me as far as I understand it.
>>>
>>> Thank you,
>>> Konstantin Kudryavtsev
>>>
>>>> On Wed, Dec 30, 2015 at 1:48 PM, Chris Fregly <ch...@fregly.com> wrote:
>>>> A couple of things:
>>>>
>>>> 1) Switch to IAM roles if at all possible - explicitly passing AWS credentials is a long and lonely road in the end.
>>>>
>>>> 2) One really bad workaround/hack is to run a job that hits every worker and writes the credentials to the proper location (~/.awscredentials or whatever).
>>>>
>>>> ^^ I wouldn't recommend this. ^^ It's horrible and doesn't handle autoscaling, but I'm mentioning it anyway as a temporary fix.
>>>>
>>>> If you switch to IAM roles, things become a lot easier: you can authorize all of the EC2 instances in the cluster, and it handles autoscaling very well - and at some point, you will want to autoscale.
>>>>
>>>>> On Wed, Dec 30, 2015 at 1:08 PM, KOSTIANTYN Kudriavtsev <kudryavtsev.konstan...@gmail.com> wrote:
>>>>> Chris,
>>>>>
>>>>> Good question. As you can see from the code, I set them on the driver, so I expect they will be propagated to all nodes, won't they?
>>>>>
>>>>> Thank you,
>>>>> Konstantin Kudryavtsev
>>>>>
>>>>>> On Wed, Dec 30, 2015 at 1:06 PM, Chris Fregly <ch...@fregly.com> wrote:
>>>>>> Are the credentials visible from each Worker node to all the Executor JVMs on each Worker?
>>>>>>
>>>>>>> On Dec 30, 2015, at 12:45 PM, KOSTIANTYN Kudriavtsev <kudryavtsev.konstan...@gmail.com> wrote:
>>>>>>>
>>>>>>> Dear Spark community,
>>>>>>>
>>>>>>> I faced the following issue when trying to access data on S3a; my code is the following:
>>>>>>>
>>>>>>> val sparkConf = new SparkConf()
>>>>>>> val sc = new SparkContext(sparkConf)
>>>>>>> sc.hadoopConfiguration.set("fs.s3a.impl", "org.apache.hadoop.fs.s3a.S3AFileSystem")
>>>>>>> sc.hadoopConfiguration.set("fs.s3a.access.key", "---")
>>>>>>> sc.hadoopConfiguration.set("fs.s3a.secret.key", "---")
>>>>>>> val sqlContext = SQLContext.getOrCreate(sc)
>>>>>>> val df = sqlContext.read.parquet(...)
>>>>>>> df.count
>>>>>>>
>>>>>>> It results in the following exception and log messages:
>>>>>>>
>>>>>>> 15/12/30 17:00:32 DEBUG AWSCredentialsProviderChain: Unable to load credentials from BasicAWSCredentialsProvider: Access key or secret key is null
>>>>>>> 15/12/30 17:00:32 DEBUG EC2MetadataClient: Connecting to EC2 instance metadata service at URL: http://x.x.x.x/latest/meta-data/iam/security-credentials/
>>>>>>> 15/12/30 17:00:32 DEBUG AWSCredentialsProviderChain: Unable to load credentials from InstanceProfileCredentialsProvider: The requested metadata is not found at http://x.x.x.x/latest/meta-data/iam/security-credentials/
>>>>>>> 15/12/30 17:00:32 ERROR Executor: Exception in task 1.0 in stage 1.0 (TID 3)
>>>>>>> com.amazonaws.AmazonClientException: Unable to load AWS credentials from any provider in the chain
>>>>>>>     at com.amazonaws.auth.AWSCredentialsProviderChain.getCredentials(AWSCredentialsProviderChain.java:117)
>>>>>>>     at com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:3521)
>>>>>>>     at com.amazonaws.services.s3.AmazonS3Client.headBucket(AmazonS3Client.java:1031)
>>>>>>>     at com.amazonaws.services.s3.AmazonS3Client.doesBucketExist(AmazonS3Client.java:994)
>>>>>>>     at org.apache.hadoop.fs.s3a.S3AFileSystem.initialize(S3AFileSystem.java:297)
>>>>>>>
>>>>>>> I run standalone Spark 1.5.2 with Hadoop 2.7.1.
>>>>>>>
>>>>>>> Any ideas/workarounds?
>>>>>>>
>>>>>>> The AWS credentials are correct for this bucket.
>>>>>>>
>>>>>>> Thank you,
>>>>>>> Konstantin Kudryavtsev
>>>>
>>>>
>>>> --
>>>> Chris Fregly
>>>> Principal Data Solutions Engineer
>>>> IBM Spark Technology Center, San Francisco, CA
>>>> http://spark.tc | http://advancedspark.com
>
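For reference, a minimal sketch of the per-job, in-code variant of the same idea, assuming the spark.hadoop.* passthrough mentioned above; the key values and the bucket path are placeholders. The difference from the snippet in the thread is that the keys are set on the SparkConf before the SparkContext is created, so they are shipped to the executors along with the rest of the Spark configuration instead of relying on the driver-side sc.hadoopConfiguration being picked up (which the executor log above shows is not happening for this read path):

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext

// Hypothetical per-job credentials; in practice these would come from a
// per-user properties file passed via --properties-file rather than being
// hard-coded.
val sparkConf = new SparkConf()
  .set("spark.hadoop.fs.s3a.impl", "org.apache.hadoop.fs.s3a.S3AFileSystem")
  .set("spark.hadoop.fs.s3a.access.key", "ACCESS_KEY_FOR_THIS_JOB") // placeholder
  .set("spark.hadoop.fs.s3a.secret.key", "SECRET_KEY_FOR_THIS_JOB") // placeholder

// spark.hadoop.* entries are copied into the Hadoop Configuration used by
// both the driver and the executors.
val sc = new SparkContext(sparkConf)
val sqlContext = SQLContext.getOrCreate(sc)

val df = sqlContext.read.parquet("s3a://some-bucket/some/path") // placeholder path
df.count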