[jira] [Commented] (YARN-10940) Fix Documentation for AWS-Hadoop integration / yarn-site.xml

2021-09-13 Thread Girish Ganesan (Jira)


[ https://issues.apache.org/jira/browse/YARN-10940?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17414584#comment-17414584 ]

Girish Ganesan commented on YARN-10940:
---

Thanks Steve! I will try to fix the documentation. Your point on AWS 
environments is well taken. I will look into the method you mentioned.

In my specific use case, the AWS credentials are not static; they are 
regenerated about every 12 hours. I was testing YARN on a single node and was 
taken by surprise when an environment variable was not read. In retrospect 
that makes sense: even a single node is a "different machine" in a way, 
because the launched containers do not inherit the client's environment.

In case this thread shows up in a search for a similar problem: one thing to be 
aware of about the AWS_PROFILE variable is that it only names a profile. The 
credentials for that named profile are looked up in a local file on the machine 
(usually ~/.aws/credentials), so this file needs to be accessible on all the 
hosts.
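
A quick way to check this on each host is to confirm that the credentials file 
actually contains the profile that AWS_PROFILE names. The following is a 
minimal sketch, not part of Hadoop or the AWS SDK: the helper name 
profile_has_credentials is hypothetical, and it assumes the usual INI layout of 
the credentials file, with an aws_access_key_id entry under a [profile-name] 
section.

```python
import configparser
import os
from pathlib import Path


def profile_has_credentials(profile: str, credentials_path: Path) -> bool:
    """Return True if `profile` has an access key in the credentials file.

    Hypothetical helper for illustration only. It checks that the file
    is readable and that the section exists; it does not validate the
    credentials against AWS.
    """
    parser = configparser.ConfigParser()
    # read() silently skips files that are missing or unreadable, so a
    # missing file simply results in "profile not found".
    parser.read(credentials_path)
    return parser.has_section(profile) and parser.has_option(
        profile, "aws_access_key_id"
    )


if __name__ == "__main__":
    # Assumption: the file lives at the usual default location.
    path = Path.home() / ".aws" / "credentials"
    name = os.environ.get("AWS_PROFILE", "default")
    found = profile_has_credentials(name, path)
    print(f"profile {name!r} in {path}: {'found' if found else 'missing'}")
```

Run it as the same user that the YARN containers run as, since the home 
directory (and therefore the file) is resolved per user.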

> Fix Documentation for AWS-Hadoop integration / yarn-site.xml
> 
>
> Key: YARN-10940
> URL: https://issues.apache.org/jira/browse/YARN-10940
> Project: Hadoop YARN
> Issue Type: Task
> Reporter: Girish Ganesan
> Priority: Major
> Labels: Documentation
> Original Estimate: 2h
> Remaining Estimate: 2h



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Created] (YARN-10940) Fix Documentation for AWS-Hadoop integration / yarn-site.xml

2021-09-09 Thread Girish Ganesan (Jira)
Girish Ganesan created YARN-10940:
-

 Summary: Fix Documentation for AWS-Hadoop integration / yarn-site.xml
 Key: YARN-10940
 URL: https://issues.apache.org/jira/browse/YARN-10940
 Project: Hadoop YARN
 Issue Type: Task
 Reporter: Girish Ganesan


The following document on AWS-Hadoop integration describes authenticating via 
AWS environment variables:
[https://hadoop.apache.org/docs/stable/hadoop-aws/tools/hadoop-aws/index.html#Authenticating_via_the_AWS_Environment_Variables]

It provides a warning:

_Important: These environment variables are generally not propagated from 
client to server when YARN applications are launched. That is: having the AWS 
environment variables set when an application is launched will not permit the 
launched application to access S3 resources. The environment variables must 
(somehow) be set on the hosts/processes where the work is executed._

This is somewhat cryptic. A few things need to be clarified in the doc:
 # This is true even when YARN is running on a single node (pseudo-distributed).
 # *This also affects authentication via named profile credentials*: 
[https://hadoop.apache.org/docs/stable/hadoop-aws/tools/hadoop-aws/index.html#Using_Named_Profile_Credentials_with_ProfileCredentialsProvider]
 This method depends on the AWS_PROFILE variable.
 # Please give some pointers on how the variables can be propagated. One way is 
to whitelist the variable in yarn.nodemanager.env-whitelist (set in 
yarn-site.xml): 
[https://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-common/ClusterSetup.html#Configuring_Environment_of_Hadoop_Daemons]
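
The whitelist approach in the last item can be sketched as follows. This is a 
minimal illustration, not an official Hadoop tool: the helper 
add_to_env_whitelist is hypothetical, and it assumes yarn-site.xml already 
defines yarn.nodemanager.env-whitelist. If the property is absent, the sketch 
refuses to add it, because a value set in yarn-site.xml replaces the default 
list from yarn-default.xml rather than extending it.

```python
import xml.etree.ElementTree as ET

WHITELIST_KEY = "yarn.nodemanager.env-whitelist"


def add_to_env_whitelist(site_xml: str, variable: str) -> str:
    """Return yarn-site.xml text with `variable` appended to the whitelist.

    Hypothetical helper for illustration. Note that ElementTree does not
    preserve XML comments or the XML declaration when re-serializing.
    """
    root = ET.fromstring(site_xml)
    for prop in root.findall("property"):
        if prop.findtext("name", default="").strip() != WHITELIST_KEY:
            continue
        value = prop.find("value")
        if value is None:
            value = ET.SubElement(prop, "value")
        # The whitelist is a comma-separated list of variable names.
        names = [n.strip() for n in (value.text or "").split(",") if n.strip()]
        if variable not in names:
            names.append(variable)
        value.text = ",".join(names)
        return ET.tostring(root, encoding="unicode")
    # Adding the property with only one name would drop the defaults,
    # so ask the caller to copy them in from yarn-default.xml first.
    raise ValueError(
        f"{WHITELIST_KEY} is not set; copy the default value for your "
        "Hadoop version into yarn-site.xml before appending to it"
    )


if __name__ == "__main__":
    # The existing whitelist entries below are illustrative values only.
    sample = (
        "<configuration><property>"
        "<name>yarn.nodemanager.env-whitelist</name>"
        "<value>JAVA_HOME,HADOOP_CONF_DIR</value>"
        "</property></configuration>"
    )
    print(add_to_env_whitelist(sample, "AWS_PROFILE"))
```

The NodeManager has to be restarted to pick up the change. The whitelist is 
understood to forward values from the NodeManager's own environment, so 
AWS_PROFILE presumably needs to be set there as well; treat that as an 
assumption to verify on your cluster.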

I was trying to figure out why Hive was failing on a query (using the MapReduce 
engine) on an external table created from S3. After a while I realized the job 
was not getting the AWS_PROFILE variable. Eventually I found that adding the 
variable to the YARN whitelist does the trick. Hopefully this ticket will help 
someone else. 
 

 

 


