Hi Jerry, I want to run different jobs on different S3 buckets - different AWS creds - on the same instances. Could you shed some light on whether it's possible to achieve this with hdfs-site?
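For context, one per-job alternative: since Spark copies any spark.hadoop.* properties from the SparkConf into the Hadoop configuration, each job could in principle carry its own keys instead of relying on a cluster-wide hdfs-site.xml. A minimal sketch (untested; the env-var names are placeholders):

    import org.apache.spark.{SparkConf, SparkContext}

    // Per-job credentials: each spark-submit carries its own S3 keys, so
    // nothing bucket-specific has to live in the shared hdfs-site.xml.
    val conf = new SparkConf()
      .set("spark.hadoop.fs.s3a.impl", "org.apache.hadoop.fs.s3a.S3AFileSystem")
      .set("spark.hadoop.fs.s3a.access.key", sys.env("JOB_S3_ACCESS_KEY")) // placeholder
      .set("spark.hadoop.fs.s3a.secret.key", sys.env("JOB_S3_SECRET_KEY")) // placeholder
    val sc = new SparkContext(conf)

(Later Hadoop versions, 2.8 and up, also add per-bucket keys of the form fs.s3a.bucket.BUCKETNAME.access.key, which would let one job talk to several buckets with different credentials; the Hadoop 2.7.1 used here predates that.)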
Thank you,
Konstantin Kudryavtsev

On Wed, Dec 30, 2015 at 2:10 PM, Jerry Lam <chiling...@gmail.com> wrote:

> Hi Kostiantyn,
>
> Can you define those properties in hdfs-site.xml and make sure it is
> visible in the class path when you spark-submit? It looks like a conf
> sourcing issue to me.
>
> Cheers,
>
> Sent from my iPhone
>
> On 30 Dec, 2015, at 1:59 pm, KOSTIANTYN Kudriavtsev <
> kudryavtsev.konstan...@gmail.com> wrote:
>
> Chris,
>
> thanks for the hint with IAM roles, but in my case I need to run
> different jobs with different S3 permissions on the same cluster, so this
> approach doesn't work for me as far as I understand it.
>
> Thank you,
> Konstantin Kudryavtsev
>
> On Wed, Dec 30, 2015 at 1:48 PM, Chris Fregly <ch...@fregly.com> wrote:
>
>> A couple of things:
>>
>> 1) Switch to IAM roles if at all possible - explicitly passing AWS
>> credentials is a long and lonely road in the end.
>>
>> 2) One really bad workaround/hack is to run a job that hits every worker
>> and writes the credentials to the proper location (~/.aws/credentials or
>> wherever).
>>
>> ^^ I wouldn't recommend this. ^^ It's horrible and doesn't handle
>> autoscaling, but I'm mentioning it anyway as a temporary fix.
>>
>> If you switch to IAM roles, things become a lot easier: you can
>> authorize all of the EC2 instances in the cluster, and it handles
>> autoscaling very well - and at some point, you will want to autoscale.
>>
>> On Wed, Dec 30, 2015 at 1:08 PM, KOSTIANTYN Kudriavtsev <
>> kudryavtsev.konstan...@gmail.com> wrote:
>>
>>> Chris,
>>>
>>> Good question. As you can see from the code, I set them up on the
>>> driver, so I expect they will be propagated to all nodes, won't they?
>>>
>>> Thank you,
>>> Konstantin Kudryavtsev
>>>
>>> On Wed, Dec 30, 2015 at 1:06 PM, Chris Fregly <ch...@fregly.com> wrote:
>>>
>>>> Are the credentials visible from each Worker node to all the Executor
>>>> JVMs on each Worker?
>>>>
>>>> On Dec 30, 2015, at 12:45 PM, KOSTIANTYN Kudriavtsev <
>>>> kudryavtsev.konstan...@gmail.com> wrote:
>>>>
>>>> Dear Spark community,
>>>>
>>>> I faced the following issue when trying to access data on S3a; my code
>>>> is the following:
>>>>
>>>> import org.apache.spark.{SparkConf, SparkContext}
>>>> import org.apache.spark.sql.SQLContext
>>>>
>>>> val sparkConf = new SparkConf()
>>>> val sc = new SparkContext(sparkConf)
>>>> sc.hadoopConfiguration.set("fs.s3a.impl",
>>>>   "org.apache.hadoop.fs.s3a.S3AFileSystem")
>>>> sc.hadoopConfiguration.set("fs.s3a.access.key", "---")
>>>> sc.hadoopConfiguration.set("fs.s3a.secret.key", "---")
>>>>
>>>> val sqlContext = SQLContext.getOrCreate(sc)
>>>> val df = sqlContext.read.parquet(...)
>>>> df.count
>>>>
>>>> It results in the following exception and log messages:
>>>>
>>>> 15/12/30 17:00:32 DEBUG AWSCredentialsProviderChain: Unable to load
>>>> credentials from BasicAWSCredentialsProvider: Access key or secret key
>>>> is null
>>>> 15/12/30 17:00:32 DEBUG EC2MetadataClient: Connecting to EC2 instance
>>>> metadata service at URL:
>>>> http://x.x.x.x/latest/meta-data/iam/security-credentials/
>>>> 15/12/30 17:00:32 DEBUG AWSCredentialsProviderChain: Unable to load
>>>> credentials from InstanceProfileCredentialsProvider: The requested
>>>> metadata is not found at
>>>> http://x.x.x.x/latest/meta-data/iam/security-credentials/
>>>> 15/12/30 17:00:32 ERROR Executor: Exception in task 1.0 in stage 1.0
>>>> (TID 3)
>>>> com.amazonaws.AmazonClientException: Unable to load AWS credentials
>>>> from any provider in the chain
>>>>   at com.amazonaws.auth.AWSCredentialsProviderChain.getCredentials(AWSCredentialsProviderChain.java:117)
>>>>   at com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:3521)
>>>>   at com.amazonaws.services.s3.AmazonS3Client.headBucket(AmazonS3Client.java:1031)
>>>>   at com.amazonaws.services.s3.AmazonS3Client.doesBucketExist(AmazonS3Client.java:994)
>>>>   at org.apache.hadoop.fs.s3a.S3AFileSystem.initialize(S3AFileSystem.java:297)
>>>>
>>>> I run standalone Spark 1.5.2 with Hadoop 2.7.1.
>>>>
>>>> Any ideas/workarounds? The AWS credentials are correct for this bucket.
>>>>
>>>> Thank you,
>>>> Konstantin Kudryavtsev
>>
>> --
>> Chris Fregly
>> Principal Data Solutions Engineer
>> IBM Spark Technology Center, San Francisco, CA
>> http://spark.tc | http://advancedspark.com
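A follow-up sketch (untested; the helper name and partition counts are arbitrary) that tests Jerry's conf-sourcing theory and Chris's executor-visibility question directly, by reporting from each executor JVM whether hdfs-site.xml is on its classpath:

    import java.net.InetAddress
    import org.apache.spark.SparkContext

    // Ask every executor JVM whether hdfs-site.xml is on its classpath.
    // If the file is missing there, fs.s3a.* keys defined in it can never
    // be sourced on the workers.
    def checkExecutorClasspath(sc: SparkContext): Unit = {
      sc.parallelize(1 to 1000, 16) // enough partitions to hit every executor
        .map { _ =>
          val host = InetAddress.getLocalHost.getHostName
          val res  = Thread.currentThread().getContextClassLoader
            .getResource("hdfs-site.xml")
          s"$host -> ${if (res == null) "hdfs-site.xml missing" else res.toString}"
        }
        .distinct()
        .collect()
        .foreach(println)
    }

If some hosts report the file missing, that matches the conf-sourcing issue Jerry suspected; if it is present everywhere and the keys still come back null, the per-job spark.hadoop.* route sketched above sidesteps the file entirely.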