As per the EMR documentation:
http://docs.amazonaws.cn/en_us/ElasticMapReduce/latest/DeveloperGuide/emr-iam-roles.html
Access AWS Resources Using IAM Roles

If you've launched your cluster with an IAM role, applications running on the 
EC2 instances of that cluster can use the IAM role to obtain temporary account 
credentials to use when calling services in AWS.

The version of Hadoop available on AMI 2.3.0 and later has already been updated 
to make use of IAM roles. If your application runs strictly on top of the 
Hadoop architecture, and does not directly call any service in AWS, it should 
work with IAM roles with no modification.

If your application calls services in AWS directly, you'll need to update it to 
take advantage of IAM roles. This means that instead of obtaining account 
credentials from /home/hadoop/conf/core-site.xml on the EC2 instances in the 
cluster, your application will now either use an SDK to access the resources 
using IAM roles, or call the EC2 instance metadata to obtain the temporary 
credentials.
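
For example, with the AWS SDK for Java 1.x the role's temporary credentials can be
picked up from the instance profile without any keys in the configuration. A minimal
Scala sketch, assuming the SDK is on the classpath; the bucket name is a placeholder
and error handling is omitted:

  import com.amazonaws.auth.InstanceProfileCredentialsProvider
  import com.amazonaws.services.s3.AmazonS3Client
  import scala.collection.JavaConverters._

  // The provider reads (and refreshes) the temporary credentials exposed by the
  // EC2 instance metadata service for the IAM role attached to the instance.
  val credentialsProvider = new InstanceProfileCredentialsProvider()
  val s3 = new AmazonS3Client(credentialsProvider)

  // "my-bucket" is a placeholder; use a bucket the role is allowed to read.
  s3.listObjects("my-bucket").getObjectSummaries.asScala.foreach(o => println(o.getKey))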

--

Maybe you can use the AWS SDK in your application to provide AWS credentials?

https://github.com/seratch/AWScala
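
A rough AWScala sketch along the lines of its README (I have not verified these exact
class and method names against the current AWScala release, so treat them as
assumptions):

  import awscala._, s3._

  // With no explicit keys, AWScala should fall back to the default credential chain
  // of the AWS SDK for Java it wraps, which includes the EC2 instance-profile role.
  implicit val s3 = S3.at(Region.Tokyo)

  // List the buckets visible to the role.
  s3.buckets.foreach(println)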


On Oct 14, 2014, at 11:10 AM, Ranga <sra...@gmail.com> wrote:

> One related question. Could I specify the 
> "com.amazonaws.services.s3.AmazonS3Client" implementation for the  
> "fs.s3.impl" parameter? Let me try that and update this thread with my 
> findings. 
> 
> On Tue, Oct 14, 2014 at 10:48 AM, Ranga <sra...@gmail.com> wrote:
> Thanks for the input. 
> Yes, I did use the "temporary" access credentials provided by the IAM role 
> (also detailed in the link you provided). The session token needs to be 
> specified and I was looking for a way to set that in the header (which 
> doesn't seem possible).
> Looks like a static key/secret is the only option.
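> For example, when going through the AWS SDK directly rather than the s3n
> connector, the temporary credentials including the session token can at least
> be passed with BasicSessionCredentials; a rough sketch with placeholder values:
> 
>   import com.amazonaws.auth.BasicSessionCredentials
>   import com.amazonaws.services.s3.AmazonS3Client
> 
>   // The three values would come from the instance metadata / STS response;
>   // the literals below are placeholders, not real credentials.
>   val sessionCredentials = new BasicSessionCredentials(
>     "TEMP_ACCESS_KEY_ID",
>     "TEMP_SECRET_ACCESS_KEY",
>     "TEMP_SESSION_TOKEN")
>   val s3 = new AmazonS3Client(sessionCredentials)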
> 
> On Tue, Oct 14, 2014 at 10:32 AM, Gen <gen.tan...@gmail.com> wrote:
> Hi,
> 
> If I remember well, Spark cannot use the IAM role credentials to access S3.
> It first uses the id/key in the environment; if those are not set, it uses
> the values in the file core-site.xml. So an IAM role is not useful for Spark.
> The same problem happens if you want to use the distcp command in Hadoop.
> 
> 
> Do you use curl http://169.254.169.254/latest/meta-data/iam/... to get the
> "temporary" access? If yes, that cannot be used directly by Spark. For more
> information, you can take a look at
> http://docs.aws.amazon.com/STS/latest/UsingSTS/using-temp-creds.html
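> 
> As a rough sketch, the same metadata endpoint can also be read from Scala; the
> role name at the end of the URL is a placeholder, and the response is a small
> JSON document with AccessKeyId, SecretAccessKey, Token and Expiration:
> 
>   import scala.io.Source
> 
>   // "my-emr-role" is a placeholder for the IAM role attached to the instance.
>   val url = "http://169.254.169.254/latest/meta-data/iam/security-credentials/my-emr-role"
>   val credentialsJson = Source.fromURL(url).mkString
>   println(credentialsJson)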
> 
> 
> 
> sranga wrote
> > Thanks for the pointers.
> > I verified that the access key-id/secret used are valid. However, the
> > secret may contain "/" at times. The issues I am facing are as follows:
> >
> >    - The EC2 instances are set up with an IAMRole () and don't have a
> >      static key-id/secret
> >    - All of the EC2 instances have access to S3 based on this role (I used
> >      s3ls and s3cp commands to verify this)
> >    - I can get a "temporary" access key-id/secret based on the IAMRole, but
> >      they generally expire in an hour
> >    - If Spark is not able to use the IAMRole credentials, I may have to
> >      generate a static key-id/secret. This may or may not be possible in the
> >      environment I am in (from a policy perspective)
> >
> >
> > - Ranga
> >
> > On Tue, Oct 14, 2014 at 4:21 AM, Rafal Kwasny <mag@> wrote:
> >
> >> Hi,
> >> keep in mind that you're going to have a bad time if your secret key
> >> contains a "/".
> >> This is due to an old and stupid Hadoop bug:
> >> https://issues.apache.org/jira/browse/HADOOP-3733
> >>
> >> The best way is to regenerate the key so that it does not include a "/".
> >>
> >> /Raf
> >>
> >>
> >> Akhil Das wrote:
> >>
> >> Try the following:
> >>
> >> 1. Set the access key and secret key in the sparkContext:
> >>
> >>   sparkContext.set("AWS_ACCESS_KEY_ID", yourAccessKey)
> >>   sparkContext.set("AWS_SECRET_ACCESS_KEY", yourSecretKey)
> >>
> >>
> >> 2. Set the access key and secret key in the environment before starting
> >> your application:
> >>
> >>   export AWS_ACCESS_KEY_ID=<your access>
> >>   export AWS_SECRET_ACCESS_KEY=<your secret>
> >>
> >> 3. Set the access key and secret key inside the Hadoop configuration:
> >>
> >>   val hadoopConf = sparkContext.hadoopConfiguration
> >>   hadoopConf.set("fs.s3.impl", "org.apache.hadoop.fs.s3native.NativeS3FileSystem")
> >>   hadoopConf.set("fs.s3.awsAccessKeyId", yourAccessKey)
> >>   hadoopConf.set("fs.s3.awsSecretAccessKey", yourSecretKey)
> >>
> >> 4. You can also try:
> >>
> >>   val lines = sparkContext.textFile("s3n://yourAccessKey:yourSecretKey@<yourBucket>/path/")
> >>
> >>
> >> Thanks
> >> Best Regards
> >>
> >> On Mon, Oct 13, 2014 at 11:33 PM, Ranga <sranga@> wrote:
> >>
> >>> Hi
> >>>
> >>> I am trying to access files/buckets in S3 and encountering a permissions
> >>> issue. The buckets are configured to authenticate using an IAMRole
> >>> provider.
> >>> I have set the KeyId and Secret using environment variables (
> >>> AWS_SECRET_ACCESS_KEY and AWS_ACCESS_KEY_ID). However, I am still unable
> >>> to access the S3 buckets.
> >>>
> >>> Before setting the access key and secret the error was:
> >>> "java.lang.IllegalArgumentException:
> >>> AWS Access Key ID and Secret Access Key must be specified as the
> >>> username
> >>> or password (respectively) of a s3n URL, or by setting the
> >>> fs.s3n.awsAccessKeyId or fs.s3n.awsSecretAccessKey properties
> >>> (respectively)."
> >>>
> >>> After setting the access key and secret, the error is: "The AWS Access
> >>> Key Id you provided does not exist in our records."
> >>>
> >>> The id/secret being set are the right values. This makes me believe that
> >>> something else ("token", etc.) needs to be set as well.
> >>> Any help is appreciated.
> >>>
> >>>
> >>> - Ranga
> >>>
> >>
> >>
> >>
> 