[
https://issues.apache.org/jira/browse/YARN-10940?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17414584#comment-17414584
]
Girish Ganesan commented on YARN-10940:
---
Thanks Steve! I will try to fix the documentation. Your point on AWS
environments is well taken. I will look into the method you mentioned.
In my specific use case, the AWS credentials are not static; they are
regenerated about every 12 hours. I was testing out Yarn on a single node and
was taken by surprise when an environment variable was not read. But in
retrospect that makes sense: even a single node is a "different machine" in
a way.
In case this thread shows up in a search for a similar problem: one thing to be
aware of with the AWS_PROFILE variable is that the credentials for the named
profile are looked up in a local file on the machine (usually
~/.aws/credentials). This file needs to be accessible on all the hosts.
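
For anyone who wants to try the whitelist approach mentioned in the issue
description below, here is a minimal sketch of the yarn-site.xml entry. The
entries before AWS_PROFILE are my assumption of the default list and may differ
between Hadoop releases, so check yarn-default.xml for your version before
copying them. As far as I understand, the variable also has to be set in the
NodeManager's own environment for containers to pick it up.

```xml
<!-- yarn-site.xml (sketch): allow containers to see AWS_PROFILE -->
<property>
  <name>yarn.nodemanager.env-whitelist</name>
  <!-- Setting this property replaces the built-in default list, so keep the
       entries your release ships with and append AWS_PROFILE at the end.
       The entries before AWS_PROFILE are an assumed default, not verified
       for every Hadoop version. -->
  <value>JAVA_HOME,HADOOP_COMMON_HOME,HADOOP_HDFS_HOME,HADOOP_CONF_DIR,CLASSPATH_PREPEND_DISTCACHE,HADOOP_YARN_HOME,HADOOP_HOME,PATH,LANG,TZ,AWS_PROFILE</value>
</property>
```

The NodeManager presumably needs a restart after the change for the new
whitelist to take effect.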
> Fix Documentation for AWS-Hadoop integration / yarn-site.xml
>
>
> Key: YARN-10940
> URL: https://issues.apache.org/jira/browse/YARN-10940
> Project: Hadoop YARN
> Issue Type: Task
> Reporter: Girish Ganesan
> Priority: Major
> Labels: Documentation
> Original Estimate: 2h
> Remaining Estimate: 2h
>
> The following document on AWS-Hadoop integration describes authenticating via
> AWS environment variables:
> [https://hadoop.apache.org/docs/stable/hadoop-aws/tools/hadoop-aws/index.html#Authenticating_via_the_AWS_Environment_Variables]
> It provides a warning:
> _Important: These environment variables are generally not propagated from
> client to server when YARN applications are launched. That is: having the AWS
> environment variables set when an application is launched will not permit the
> launched application to access S3 resources. The environment variables must
> (somehow) be set on the hosts/processes where the work is executed._
> This is somewhat cryptic. A few things need to be clarified in the doc:
> # This is true even when Yarn is running on a single node (pseudo
> distributed).
> # *This also affects authentication via named profile credentials:*
> [https://hadoop.apache.org/docs/stable/hadoop-aws/tools/hadoop-aws/index.html#Using_Named_Profile_Credentials_with_ProfileCredentialsProvider]
> This method depends on the AWS_PROFILE variable.
> # Please give some pointers on how the variables can be propagated. One way
> is to whitelist the variable in yarn.nodemanager.env-whitelist (set in
> yarn-site.xml):
> [https://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-common/ClusterSetup.html#Configuring_Environment_of_Hadoop_Daemons]
> I was trying to figure out why hive was failing on a query (using mapred) on
> an external table created from S3. After a while I realized it was not
> getting the AWS_PROFILE variable. Eventually I realized that adding the
> variable to the Yarn whitelist will do the trick. Hopefully this ticket will
> help someone else.
>
>