[jira] [Commented] (YARN-1541) Invalidate AM Host/Port when app attempt is done so that in the mean-while client doesn’t get wrong information.

2021-08-02 Thread Yongjun Zhang (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-1541?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17391814#comment-17391814
 ] 

Yongjun Zhang commented on YARN-1541:
-

HI [~jianhe],

Thanks for your earlier work here. One question, if we set AM host to N/A, we 
lost the chance to report to appsummary the AM host. I wonder if the issue to 
to address here can be fixed by adding a flag to indicate no need to talk to 
AM, and change the logic to check the flag instead of setting the AM host to 
N/A?

Thanks

> Invalidate AM Host/Port when app attempt is done so that in the mean-while 
> client doesn’t get wrong information.
> 
>
> Key: YARN-1541
> URL: https://issues.apache.org/jira/browse/YARN-1541
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Jian He
>Assignee: Jian He
>Priority: Major
> Fix For: 2.3.0
>
> Attachments: YARN-1541.1.patch, YARN-1541.2.patch, YARN-1541.3.patch, 
> YARN-1541.3.patch, YARN-1541.4.patch
>
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8200) Backport resource types/GPU features to branch-3.0/branch-2

2021-07-12 Thread Yongjun Zhang (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-8200?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17379477#comment-17379477
 ] 

Yongjun Zhang commented on YARN-8200:
-

HI [~jhung], thanks for your work here! would you please comment: Is this 
feature only available to capacity scheduler? or it's also available with fair 
scheduler? thanks.

> Backport resource types/GPU features to branch-3.0/branch-2
> ---
>
> Key: YARN-8200
> URL: https://issues.apache.org/jira/browse/YARN-8200
> Project: Hadoop YARN
>  Issue Type: Task
>Reporter: Jonathan Hung
>Assignee: Jonathan Hung
>Priority: Major
>  Labels: release-blocker
> Fix For: 2.10.0
>
> Attachments: YARN-8200-branch-2.001.patch, 
> YARN-8200-branch-2.002.patch, YARN-8200-branch-2.003.patch, 
> YARN-8200-branch-3.0.001.patch, 
> counter.scheduler.operation.allocate.csv.defaultResources, 
> counter.scheduler.operation.allocate.csv.gpuResources, synth_sls.json
>
>
> Currently we have a need for GPU scheduling on our YARN clusters to support 
> deep learning workloads. However, our main production clusters are running 
> older versions of branch-2 (2.7 in our case). To prevent supporting too many 
> very different hadoop versions across multiple clusters, we would like to 
> backport the resource types/resource profiles feature to branch-2, as well as 
> the GPU specific support.
>  
> We have done a trial backport of YARN-3926 and some miscellaneous patches in 
> YARN-7069 based on issues we uncovered, and the backport was fairly smooth. 
> We also did a trial backport of most of YARN-6223 (sans docker support).
>  
> Regarding the backports, perhaps we can do the development in a feature 
> branch and then merge to branch-2 when ready.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (YARN-5939) FSDownload leaks FileSystem resources

2019-11-27 Thread Yongjun Zhang (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-5939?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16984188#comment-16984188
 ] 

Yongjun Zhang edited comment on YARN-5939 at 11/28/19 7:18 AM:
---

HI [~lxw668899], [~cheersyang] and [~bibinchundatt],

Thank you guys for the work here.

I have a question, this issue was created at 29/Nov/16 01:05, while YARN-58 was 
fixed at 20/Aug/12 11:33 (there is a 4+ years gap!). And it was concluded that 
YARN-58 should be the fix. Wonder if the version [~lxw668899] reported the 
issue about already had the fix of YARN-58? Sounds the answer should be Yes 
given "Affects Version/s:" field contains 2.5.1, 2.7.3".

Based on the comment below made by [~lxw668899]:
https://issues.apache.org/jira/browse/YARN-5939?focusedCommentId=15750463=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15750463

the patch provided in this jira seem to have helped. Wonder if we missed 
something here?

Thanks.



was (Author: yzhangal):
HI [~lxw668899], [~cheersyang] and [~bibinchundatt],

Thank you guys for the work here.

I have a question, this issue was created at 29/Nov/16 01:05, while YARN-58 was 
fixed at 20/Aug/12 11:33 (there is a 4+ years gap!). And it was concluded that 
YARN-58 should be the fix. Wonder if the version [~lxw668899] reported the 
issue about already had the fix of YARN-58? Sounds the answer should be Yes 
given "2.5.1, 2.7.3" are said to be the impacted version.

Based on the comment below made by [~lxw668899]:
https://issues.apache.org/jira/browse/YARN-5939?focusedCommentId=15750463=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15750463

the patch provided in this jira seem to have helped. Wonder if we missed 
something here?

Thanks.


> FSDownload leaks FileSystem resources
> -
>
> Key: YARN-5939
> URL: https://issues.apache.org/jira/browse/YARN-5939
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Affects Versions: 2.5.1, 2.7.3
>Reporter: liuxiangwei
>Assignee: Weiwei Yang
>Priority: Major
>  Labels: leak
> Attachments: YARN-5939.004.patch, YARN-5939.005.patch, 
> YARN-5939.01.patch, YARN-5939.02.patch, YARN-5939.03.patch
>
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>
> Background
> To use our self-defined FileSystem class, the item of configuration 
> "fs.%s.impl.disable.cache" should set to true.
> In YARN's source code, the class named 
> "org.apache.hadoop.yarn.util.FSDownload" use getFileSystem but never close, 
> which leading to file descriptor leak because our self-defined FileSystem 
> class close the file descriptor when the close function is invoked.
> My Question below:
> 1. whether invoking "getFileSystem" but never close is YARN's expected 
> behavior 
> 2. what should we do in our self-defined FileSystem resolve it.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (YARN-5939) FSDownload leaks FileSystem resources

2019-11-27 Thread Yongjun Zhang (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-5939?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16984188#comment-16984188
 ] 

Yongjun Zhang edited comment on YARN-5939 at 11/28/19 7:17 AM:
---

HI [~lxw668899], [~cheersyang] and [~bibinchundatt],

Thank you guys for the work here.

I have a question, this issue was created at 29/Nov/16 01:05, while YARN-58 was 
fixed at 20/Aug/12 11:33 (there is a 4+ years gap!). And it was concluded that 
YARN-58 should be the fix. Wonder if the version [~lxw668899] reported the 
issue about already had the fix of YARN-58? Sounds the answer should be Yes 
given "2.5.1, 2.7.3" are said to be the impacted version.

Based on the comment below made by [~lxw668899]:
https://issues.apache.org/jira/browse/YARN-5939?focusedCommentId=15750463=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15750463

the patch provided in this jira seem to have helped. Wonder if we missed 
something here?

Thanks.



was (Author: yzhangal):
HI [~lxw668899], [~cheersyang] and [~bibinchundatt],

Thank you guys for the work here.

I have a question, this issue was created at 29/Nov/16 01:05, while YARN-58 was 
fixed at 20/Aug/12 11:33 (there is a 4+ years gap!). And it was concluded that 
YARN-58 should be the fix. Wonder if the version [~lxw668899] reported the 
issue about already had the fix of YARN-58? 

Based on the comment below made by [~lxw668899]:
https://issues.apache.org/jira/browse/YARN-5939?focusedCommentId=15750463=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15750463

the patch provided in this jira seem to have helped. Wonder if we missed 
something here?

Thanks.


> FSDownload leaks FileSystem resources
> -
>
> Key: YARN-5939
> URL: https://issues.apache.org/jira/browse/YARN-5939
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Affects Versions: 2.5.1, 2.7.3
>Reporter: liuxiangwei
>Assignee: Weiwei Yang
>Priority: Major
>  Labels: leak
> Attachments: YARN-5939.004.patch, YARN-5939.005.patch, 
> YARN-5939.01.patch, YARN-5939.02.patch, YARN-5939.03.patch
>
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>
> Background
> To use our self-defined FileSystem class, the item of configuration 
> "fs.%s.impl.disable.cache" should set to true.
> In YARN's source code, the class named 
> "org.apache.hadoop.yarn.util.FSDownload" use getFileSystem but never close, 
> which leading to file descriptor leak because our self-defined FileSystem 
> class close the file descriptor when the close function is invoked.
> My Question below:
> 1. whether invoking "getFileSystem" but never close is YARN's expected 
> behavior 
> 2. what should we do in our self-defined FileSystem resolve it.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-5939) FSDownload leaks FileSystem resources

2019-11-27 Thread Yongjun Zhang (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-5939?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16984188#comment-16984188
 ] 

Yongjun Zhang commented on YARN-5939:
-

HI [~lxw668899], [~cheersyang] and [~bibinchundatt],

Thank you guys for the work here.

I have a question, this issue was created at 29/Nov/16 01:05, while YARN-58 was 
fixed at 20/Aug/12 11:33 (there is a 4+ years gap!). And it was concluded that 
YARN-58 should be the fix. Wonder if the version [~lxw668899] reported the 
issue about already had the fix of YARN-58? 

Based on the comment below made by [~lxw668899]:
https://issues.apache.org/jira/browse/YARN-5939?focusedCommentId=15750463=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15750463

the patch provided in this jira seem to have helped. Wonder if we missed 
something here?

Thanks.


> FSDownload leaks FileSystem resources
> -
>
> Key: YARN-5939
> URL: https://issues.apache.org/jira/browse/YARN-5939
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Affects Versions: 2.5.1, 2.7.3
>Reporter: liuxiangwei
>Assignee: Weiwei Yang
>Priority: Major
>  Labels: leak
> Attachments: YARN-5939.004.patch, YARN-5939.005.patch, 
> YARN-5939.01.patch, YARN-5939.02.patch, YARN-5939.03.patch
>
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>
> Background
> To use our self-defined FileSystem class, the item of configuration 
> "fs.%s.impl.disable.cache" should set to true.
> In YARN's source code, the class named 
> "org.apache.hadoop.yarn.util.FSDownload" use getFileSystem but never close, 
> which leading to file descriptor leak because our self-defined FileSystem 
> class close the file descriptor when the close function is invoked.
> My Question below:
> 1. whether invoking "getFileSystem" but never close is YARN's expected 
> behavior 
> 2. what should we do in our self-defined FileSystem resolve it.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Resolved] (YARN-8507) EINVRES Request to https://bower.herokuapp.com/packages/ember-cli-shims failed with 502

2018-07-09 Thread Yongjun Zhang (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8507?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yongjun Zhang resolved YARN-8507.
-
Resolution: Duplicate

> EINVRES Request to https://bower.herokuapp.com/packages/ember-cli-shims 
> failed with 502
> ---
>
> Key: YARN-8507
> URL: https://issues.apache.org/jira/browse/YARN-8507
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn
>Reporter: Yongjun Zhang
>Assignee: Yongjun Zhang
>Priority: Major
>
> When doing an update to 3.0.3 release, by running command
> dev-support/bin/create-release --asfrelease --docker --dockercache
> documented in 
> https://wiki.apache.org/hadoop/HowToRelease
> I hit the following problem:
> {quote}
> 43566 WARNING: Invalid cookie header: "Set-Cookie: logged_in=no; 
> domain=.github.com; path=/; expires=Thu, 08 Jul 2038 07:00:00 -; secure; 
> HttpOnly". Invalid 'expires' attribute: Thu, 08 Jul 2038 07:  00:00 -
> 43567 [INFO] Unpacking 
> /maven/com/github/eirslett/yarn/0.21.3/yarn-0.21.3.tar.gz into 
> /build/source/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-ui/target/webapp/node/yarn
> 43568 [INFO] Installed Yarn locally.
> 43569 [INFO]
> 43570 [INFO] --- frontend-maven-plugin:1.5:yarn (yarn install) @ 
> hadoop-yarn-ui ---
> 43571 [INFO] Running 'yarn install' in 
> /build/source/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-ui/target/webapp
> 43572 [INFO] yarn install v0.21.3
> 43573 [INFO] [1/4] Resolving packages...
> 43574 [INFO] [2/4] Fetching packages...
> 43575 [INFO] [3/4] Linking dependencies...
> 43576 [INFO] [4/4] Building fresh packages...
> 43577 [INFO] success Saved lockfile.
> 43578 [INFO] Done in 33.19s.
> 43579 [INFO]
> 43580 [INFO] --- frontend-maven-plugin:1.5:bower (bower install) @ 
> hadoop-yarn-ui ---
> 43581 [INFO] Running 'bower install' in 
> /build/source/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-ui/target/webapp
> 43582 [ERROR] bower ember-cli-shims#0.0.6EINVRES Request to 
> https://bower.herokuapp.com/packages/ember-cli-shims failed with 502
> {quote}
> And I found the solution here:
> https://stackoverflow.com/questions/51020317/einvres-request-to-https-bower-herokuapp-com-packages-failed-with-502#
> This is due to a recent change
> {quote}
> Bower is deprecating their registry hosted with heroku. 
> http://bower.herokuapp.com/ wont be accessible any more or it might be down 
> intermittently there by forcing users to new registry.
> Users working on old bower versions can update the .bowerrc file with the 
> following data.
> {quote}
> {code}
> {
>   "registry": "https://registry.bower.io;
> }
> {code}
> Create this jira to update  
> hadoop-yarn-project/hadoop-yarn/hadoop-yarn-ui/src/main/webapp/.bowerrc
> accordingly.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8507) EINVRES Request to https://bower.herokuapp.com/packages/ember-cli-shims failed with 502

2018-07-09 Thread Yongjun Zhang (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8507?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16537335#comment-16537335
 ] 

Yongjun Zhang commented on YARN-8507:
-

Thanks [~jlowe], 

I was trying to put the fix there, and I found the same:-) I used this solution 
to update 3.0.3 artifacts without committing, since I don't want to change the 
commit tip of the 3.0.3 release which is out already. 


> EINVRES Request to https://bower.herokuapp.com/packages/ember-cli-shims 
> failed with 502
> ---
>
> Key: YARN-8507
> URL: https://issues.apache.org/jira/browse/YARN-8507
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn
>Reporter: Yongjun Zhang
>Assignee: Yongjun Zhang
>Priority: Major
>
> When doing an update to 3.0.3 release, by running command
> dev-support/bin/create-release --asfrelease --docker --dockercache
> documented in 
> https://wiki.apache.org/hadoop/HowToRelease
> I hit the following problem:
> {quote}
> 43566 WARNING: Invalid cookie header: "Set-Cookie: logged_in=no; 
> domain=.github.com; path=/; expires=Thu, 08 Jul 2038 07:00:00 -; secure; 
> HttpOnly". Invalid 'expires' attribute: Thu, 08 Jul 2038 07:  00:00 -
> 43567 [INFO] Unpacking 
> /maven/com/github/eirslett/yarn/0.21.3/yarn-0.21.3.tar.gz into 
> /build/source/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-ui/target/webapp/node/yarn
> 43568 [INFO] Installed Yarn locally.
> 43569 [INFO]
> 43570 [INFO] --- frontend-maven-plugin:1.5:yarn (yarn install) @ 
> hadoop-yarn-ui ---
> 43571 [INFO] Running 'yarn install' in 
> /build/source/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-ui/target/webapp
> 43572 [INFO] yarn install v0.21.3
> 43573 [INFO] [1/4] Resolving packages...
> 43574 [INFO] [2/4] Fetching packages...
> 43575 [INFO] [3/4] Linking dependencies...
> 43576 [INFO] [4/4] Building fresh packages...
> 43577 [INFO] success Saved lockfile.
> 43578 [INFO] Done in 33.19s.
> 43579 [INFO]
> 43580 [INFO] --- frontend-maven-plugin:1.5:bower (bower install) @ 
> hadoop-yarn-ui ---
> 43581 [INFO] Running 'bower install' in 
> /build/source/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-ui/target/webapp
> 43582 [ERROR] bower ember-cli-shims#0.0.6EINVRES Request to 
> https://bower.herokuapp.com/packages/ember-cli-shims failed with 502
> {quote}
> And I found the solution here:
> https://stackoverflow.com/questions/51020317/einvres-request-to-https-bower-herokuapp-com-packages-failed-with-502#
> This is due to a recent change
> {quote}
> Bower is deprecating their registry hosted with heroku. 
> http://bower.herokuapp.com/ wont be accessible any more or it might be down 
> intermittently there by forcing users to new registry.
> Users working on old bower versions can update the .bowerrc file with the 
> following data.
> {quote}
> {code}
> {
>   "registry": "https://registry.bower.io;
> }
> {code}
> Create this jira to update  
> hadoop-yarn-project/hadoop-yarn/hadoop-yarn-ui/src/main/webapp/.bowerrc
> accordingly.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Assigned] (YARN-8507) EINVRES Request to https://bower.herokuapp.com/packages/ember-cli-shims failed with 502

2018-07-09 Thread Yongjun Zhang (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8507?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yongjun Zhang reassigned YARN-8507:
---

Assignee: Yongjun Zhang

> EINVRES Request to https://bower.herokuapp.com/packages/ember-cli-shims 
> failed with 502
> ---
>
> Key: YARN-8507
> URL: https://issues.apache.org/jira/browse/YARN-8507
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn
>Reporter: Yongjun Zhang
>Assignee: Yongjun Zhang
>Priority: Major
>
> When doing an update to 3.0.3 release, by running command
> dev-support/bin/create-release --asfrelease --docker --dockercache
> documented in 
> https://wiki.apache.org/hadoop/HowToRelease
> I hit the following problem:
> {quote}
> 43566 WARNING: Invalid cookie header: "Set-Cookie: logged_in=no; 
> domain=.github.com; path=/; expires=Thu, 08 Jul 2038 07:00:00 -; secure; 
> HttpOnly". Invalid 'expires' attribute: Thu, 08 Jul 2038 07:  00:00 -
> 43567 [INFO] Unpacking 
> /maven/com/github/eirslett/yarn/0.21.3/yarn-0.21.3.tar.gz into 
> /build/source/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-ui/target/webapp/node/yarn
> 43568 [INFO] Installed Yarn locally.
> 43569 [INFO]
> 43570 [INFO] --- frontend-maven-plugin:1.5:yarn (yarn install) @ 
> hadoop-yarn-ui ---
> 43571 [INFO] Running 'yarn install' in 
> /build/source/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-ui/target/webapp
> 43572 [INFO] yarn install v0.21.3
> 43573 [INFO] [1/4] Resolving packages...
> 43574 [INFO] [2/4] Fetching packages...
> 43575 [INFO] [3/4] Linking dependencies...
> 43576 [INFO] [4/4] Building fresh packages...
> 43577 [INFO] success Saved lockfile.
> 43578 [INFO] Done in 33.19s.
> 43579 [INFO]
> 43580 [INFO] --- frontend-maven-plugin:1.5:bower (bower install) @ 
> hadoop-yarn-ui ---
> 43581 [INFO] Running 'bower install' in 
> /build/source/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-ui/target/webapp
> 43582 [ERROR] bower ember-cli-shims#0.0.6EINVRES Request to 
> https://bower.herokuapp.com/packages/ember-cli-shims failed with 502
> {quote}
> And I found the solution here:
> https://stackoverflow.com/questions/51020317/einvres-request-to-https-bower-herokuapp-com-packages-failed-with-502#
> This is due to a recent change
> {quote}
> Bower is deprecating their registry hosted with heroku. 
> http://bower.herokuapp.com/ wont be accessible any more or it might be down 
> intermittently there by forcing users to new registry.
> Users working on old bower versions can update the .bowerrc file with the 
> following data.
> {quote}
> {code}
> {
>   "registry": "https://registry.bower.io;
> }
> {code}
> Create this jira to update  
> hadoop-yarn-project/hadoop-yarn/hadoop-yarn-ui/src/main/webapp/.bowerrc
> accordingly.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-8507) EINVRES Request to https://bower.herokuapp.com/packages/ember-cli-shims failed with 502

2018-07-09 Thread Yongjun Zhang (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8507?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yongjun Zhang updated YARN-8507:

Description: 
When doing an update to 3.0.3 release, by running command

dev-support/bin/create-release --asfrelease --docker --dockercache

documented in 

https://wiki.apache.org/hadoop/HowToRelease

I hit the following problem:
{quote}
43566 WARNING: Invalid cookie header: "Set-Cookie: logged_in=no; 
domain=.github.com; path=/; expires=Thu, 08 Jul 2038 07:00:00 -; secure; 
HttpOnly". Invalid 'expires' attribute: Thu, 08 Jul 2038 07:  00:00 -
43567 [INFO] Unpacking 
/maven/com/github/eirslett/yarn/0.21.3/yarn-0.21.3.tar.gz into 
/build/source/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-ui/target/webapp/node/yarn
43568 [INFO] Installed Yarn locally.
43569 [INFO]
43570 [INFO] --- frontend-maven-plugin:1.5:yarn (yarn install) @ hadoop-yarn-ui 
---
43571 [INFO] Running 'yarn install' in 
/build/source/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-ui/target/webapp
43572 [INFO] yarn install v0.21.3
43573 [INFO] [1/4] Resolving packages...
43574 [INFO] [2/4] Fetching packages...
43575 [INFO] [3/4] Linking dependencies...
43576 [INFO] [4/4] Building fresh packages...
43577 [INFO] success Saved lockfile.
43578 [INFO] Done in 33.19s.
43579 [INFO]
43580 [INFO] --- frontend-maven-plugin:1.5:bower (bower install) @ 
hadoop-yarn-ui ---
43581 [INFO] Running 'bower install' in 
/build/source/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-ui/target/webapp
43582 [ERROR] bower ember-cli-shims#0.0.6EINVRES Request to 
https://bower.herokuapp.com/packages/ember-cli-shims failed with 502
{quote}

And I found the solution here:

https://stackoverflow.com/questions/51020317/einvres-request-to-https-bower-herokuapp-com-packages-failed-with-502#

This is due to a recent change
{quote}
Bower is deprecating their registry hosted with heroku. 
http://bower.herokuapp.com/ wont be accessible any more or it might be down 
intermittently there by forcing users to new registry.
Users working on old bower versions can update the .bowerrc file with the 
following data.
{quote}
{code}
{
  "registry": "https://registry.bower.io;
}
{code}

Create this jira to update  
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-ui/src/main/webapp/.bowerrc
accordingly.


  was:
When doing an update to 3.0.3 release, by running command

dev-support/bin/create-release --asfrelease --docker --dockercache

documented in 

https://wiki.apache.org/hadoop/HowToRelease

I hit the following problem:
{quote}
43566 WARNING: Invalid cookie header: "Set-Cookie: logged_in=no; 
domain=.github.com; path=/; expires=Thu, 08 Jul 2038 07:00:00 -; secure; 
HttpOnly". Invalid 'expires' attribute: Thu, 08 Jul 2038 07:  00:00 -
43567 [INFO] Unpacking 
/maven/com/github/eirslett/yarn/0.21.3/yarn-0.21.3.tar.gz into 
/build/source/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-ui/target/webapp/node/yarn
43568 [INFO] Installed Yarn locally.
43569 [INFO]
43570 [INFO] --- frontend-maven-plugin:1.5:yarn (yarn install) @ hadoop-yarn-ui 
---
43571 [INFO] Running 'yarn install' in 
/build/source/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-ui/target/webapp
43572 [INFO] yarn install v0.21.3
43573 [INFO] [1/4] Resolving packages...
43574 [INFO] [2/4] Fetching packages...
43575 [INFO] [3/4] Linking dependencies...
43576 [INFO] [4/4] Building fresh packages...
43577 [INFO] success Saved lockfile.
43578 [INFO] Done in 33.19s.
43579 [INFO]
43580 [INFO] --- frontend-maven-plugin:1.5:bower (bower install) @ 
hadoop-yarn-ui ---
43581 [INFO] Running 'bower install' in 
/build/source/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-ui/target/webapp
43582 [ERROR] bower ember-cli-shims#0.0.6EINVRES Request to 
https://bower.herokuapp.com/packages/ember-cli-shims failed with 502
{quote}

And I found the solution here:

https://stackoverflow.com/questions/51020317/einvres-request-to-https-bower-herokuapp-com-packages-failed-with-502#

This is due to a recent change
{quote}
Bower is deprecating their registry hosted with heroku. 
http://bower.herokuapp.com/ wont be accessible any more or it might be down 
intermittently there by forcing users to new registry.
Users working on old bower versions can update the .bowerrc file with the 
following data.
{
  "registry": "https://registry.bower.io;
}
{quote}

Create this jira to update  
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-ui/src/main/webapp/.bowerrc
accordingly.



> EINVRES Request to https://bower.herokuapp.com/packages/ember-cli-shims 
> failed with 502
> ---
>
> Key: YARN-8507
> URL: https://issues.apache.org/jira/browse/YARN-8507
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn
>Reporter: Yongjun Zhang
>Priority: Major
>
> When doing an update to 3.0.3 release, 

[jira] [Created] (YARN-8507) EINVRES Request to https://bower.herokuapp.com/packages/ember-cli-shims failed with 502

2018-07-09 Thread Yongjun Zhang (JIRA)
Yongjun Zhang created YARN-8507:
---

 Summary: EINVRES Request to 
https://bower.herokuapp.com/packages/ember-cli-shims failed with 502
 Key: YARN-8507
 URL: https://issues.apache.org/jira/browse/YARN-8507
 Project: Hadoop YARN
  Issue Type: Bug
  Components: yarn
Reporter: Yongjun Zhang


When doing an update to 3.0.3 release, by running command

dev-support/bin/create-release --asfrelease --docker --dockercache

documented in 

https://wiki.apache.org/hadoop/HowToRelease

I hit the following problem:
{quote}
43566 WARNING: Invalid cookie header: "Set-Cookie: logged_in=no; 
domain=.github.com; path=/; expires=Thu, 08 Jul 2038 07:00:00 -; secure; 
HttpOnly". Invalid 'expires' attribute: Thu, 08 Jul 2038 07:  00:00 -
43567 [INFO] Unpacking 
/maven/com/github/eirslett/yarn/0.21.3/yarn-0.21.3.tar.gz into 
/build/source/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-ui/target/webapp/node/yarn
43568 [INFO] Installed Yarn locally.
43569 [INFO]
43570 [INFO] --- frontend-maven-plugin:1.5:yarn (yarn install) @ hadoop-yarn-ui 
---
43571 [INFO] Running 'yarn install' in 
/build/source/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-ui/target/webapp
43572 [INFO] yarn install v0.21.3
43573 [INFO] [1/4] Resolving packages...
43574 [INFO] [2/4] Fetching packages...
43575 [INFO] [3/4] Linking dependencies...
43576 [INFO] [4/4] Building fresh packages...
43577 [INFO] success Saved lockfile.
43578 [INFO] Done in 33.19s.
43579 [INFO]
43580 [INFO] --- frontend-maven-plugin:1.5:bower (bower install) @ 
hadoop-yarn-ui ---
43581 [INFO] Running 'bower install' in 
/build/source/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-ui/target/webapp
43582 [ERROR] bower ember-cli-shims#0.0.6EINVRES Request to 
https://bower.herokuapp.com/packages/ember-cli-shims failed with 502
{quote}

And I found the solution here:

https://stackoverflow.com/questions/51020317/einvres-request-to-https-bower-herokuapp-com-packages-failed-with-502#

This is due to a recent change
{quote}
Bower is deprecating their registry hosted with heroku. 
http://bower.herokuapp.com/ wont be accessible any more or it might be down 
intermittently there by forcing users to new registry.
Users working on old bower versions can update the .bowerrc file with the 
following data.
{
  "registry": "https://registry.bower.io;
}
{quote}

Create this jira to update  
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-ui/src/main/webapp/.bowerrc
accordingly.




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-4781) Support intra-queue preemption for fairness ordering policy.

2018-05-30 Thread Yongjun Zhang (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-4781?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yongjun Zhang updated YARN-4781:

Fix Version/s: 3.0.3

> Support intra-queue preemption for fairness ordering policy.
> 
>
> Key: YARN-4781
> URL: https://issues.apache.org/jira/browse/YARN-4781
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: scheduler
>Reporter: Wangda Tan
>Assignee: Eric Payne
>Priority: Major
> Fix For: 3.0.3
>
> Attachments: YARN-4781.001.patch, YARN-4781.002.patch, 
> YARN-4781.003.patch, YARN-4781.004.patch, YARN-4781.005.branch-2.patch, 
> YARN-4781.005.patch
>
>
> We introduced fairness queue policy since YARN-3319, which will let large 
> applications make progresses and not starve small applications. However, if a 
> large application takes the queue’s resources, and containers of the large 
> app has long lifespan, small applications could still wait for resources for 
> long time and SLAs cannot be guaranteed.
> Instead of wait for application release resources on their own, we need to 
> preempt resources of queue with fairness policy enabled.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-8232) RMContainer lost queue name when RM HA happens

2018-05-26 Thread Yongjun Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-8232?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yongjun Zhang updated YARN-8232:

Fix Version/s: 3.0.3

> RMContainer lost queue name when RM HA happens
> --
>
> Key: YARN-8232
> URL: https://issues.apache.org/jira/browse/YARN-8232
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.8.3
>Reporter: Hu Ziqian
>Assignee: Hu Ziqian
>Priority: Major
> Fix For: 2.10.0, 3.2.0, 3.1.1, 2.9.2, 3.0.3, 2.8.5
>
> Attachments: YARN-8232-branch-2.8.3.001.patch, YARN-8232.001.patch, 
> YARN-8232.002.patch, YARN-8232.003.patch
>
>
> RMContainer has a member variable queuename to store which queue the 
> container belongs to. When RM HA happens and RMContainers are recovered by 
> scheduler based on NM reports, the queue name isn't recovered and always be 
> null.
> This situation causes some problems. Here is a case in preemption. Preemption 
> uses container's queue name to deduct preemptable resources when we use more 
> than one preempt selector, (for example, enable intra-queue preemption,) . 
> The detail is in
> {code:java}
> CapacitySchedulerPreemptionUtils.deductPreemptableResourcesBasedSelectedCandidates(){code}
> If the contain's queue name is null, this function will throw a 
> YarnRuntimeException because it tries to get the container's 
> TempQueuePerPartition and the preemption fails.
> Our patch solved this problem by setting container queue name when recover 
> containers. The patch is based on branch-2.8.3.
>  
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8346) Upgrading to 3.1 kills running containers with error "Opportunistic container queue is full"

2018-05-23 Thread Yongjun Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8346?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16487810#comment-16487810
 ] 

Yongjun Zhang commented on YARN-8346:
-

Thanks a lot for the quick turnaround [~jlowe] and [~kkaranasos].


> Upgrading to 3.1 kills running containers with error "Opportunistic container 
> queue is full"
> 
>
> Key: YARN-8346
> URL: https://issues.apache.org/jira/browse/YARN-8346
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Affects Versions: 3.1.0, 3.0.2
>Reporter: Rohith Sharma K S
>Assignee: Jason Lowe
>Priority: Blocker
> Attachments: YARN-8346.001.patch
>
>
> It is observed while rolling upgrade from 2.8.4 to 3.1 release, all the 
> running containers are killed and second attempt is launched for that 
> application. The diagnostics message is "Opportunistic container queue is 
> full" which is the reason for container killed. 
> In NM log, I see below logs for after container is recovered.
> {noformat}
> 2018-05-23 17:18:50,655 INFO 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.scheduler.ContainerScheduler:
>  Opportunistic container [container_e06_1527075664705_0001_01_01] will 
> not be queued at the NMsince max queue length [0] has been reached
> {noformat}
> Following steps are executed for rolling upgrade
> # Install 2.8.4 cluster and launch a MR job with distributed cache enabled.
> # Stop 2.8.4 RM. Start 3.1.0 RM with same configuration.
> # Stop 2.8.4 NM batch by batch. Start 3.1.0 NM batch by batch. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8108) RM metrics rest API throws GSSException in kerberized environment

2018-05-23 Thread Yongjun Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8108?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16487806#comment-16487806
 ] 

Yongjun Zhang commented on YARN-8108:
-

Hi [~eyang],

It seems the issue also exists in 3.0.2 release. The above discussion indicates 
that it might take some time for the solution to converge, should we move 3.0.3 
out of the target release and list this jira as a known issue for 3.0.3? or we 
should fix this issue in 3.0.3?

Thanks.


> RM metrics rest API throws GSSException in kerberized environment
> -
>
> Key: YARN-8108
> URL: https://issues.apache.org/jira/browse/YARN-8108
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 3.0.0
>Reporter: Kshitij Badani
>Assignee: Eric Yang
>Priority: Blocker
> Attachments: YARN-8108.001.patch
>
>
> Test is trying to pull up metrics data from SHS after kiniting as 'test_user'
> It is throwing GSSException as follows
> {code:java}
> b2b460b80713|RUNNING: curl --silent -k -X GET -D 
> /hwqe/hadoopqe/artifacts/tmp-94845 --negotiate -u : 
> http://rm_host:8088/proxy/application_1518674952153_0070/metrics/json2018-02-15
>  07:15:48,757|INFO|MainThread|machine.py:194 - 
> run()||GUID=fc5a3266-28f8-4eed-bae2-b2b460b80713|Exit Code: 0
> 2018-02-15 07:15:48,758|INFO|MainThread|spark.py:1757 - 
> getMetricsJsonData()|metrics:
> 
> 
> 
> Error 403 GSSException: Failure unspecified at GSS-API level 
> (Mechanism level: Request is a replay (34))
> 
> HTTP ERROR 403
> Problem accessing /proxy/application_1518674952153_0070/metrics/json. 
> Reason:
>  GSSException: Failure unspecified at GSS-API level (Mechanism level: 
> Request is a replay (34))
> 
> 
> {code}
> Rootcausing : proxyserver on RM can't be supported for Kerberos enabled 
> cluster because AuthenticationFilter is applied twice in Hadoop code (once in 
> httpServer2 for RM, and another instance from AmFilterInitializer for proxy 
> server). This will require code changes to hadoop-yarn-server-web-proxy 
> project



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8346) Upgrading to 3.1 kills running containers with error "Opportunistic container queue is full"

2018-05-23 Thread Yongjun Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8346?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16487699#comment-16487699
 ] 

Yongjun Zhang commented on YARN-8346:
-

Hi Guys,

Thanks for reporting and working on the issue. I'm preparing 3.0.3 release, 
wonder if we can prioritize this one if we deem its a blocker?

Thanks,


> Upgrading to 3.1 kills running containers with error "Opportunistic container 
> queue is full"
> 
>
> Key: YARN-8346
> URL: https://issues.apache.org/jira/browse/YARN-8346
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Affects Versions: 3.1.0, 3.0.2
>Reporter: Rohith Sharma K S
>Priority: Blocker
>
> It is observed while rolling upgrade from 2.8.4 to 3.1 release, all the 
> running containers are killed and second attempt is launched for that 
> application. The diagnostics message is "Opportunistic container queue is 
> full" which is the reason for container killed. 
> In NM log, I see below logs for after container is recovered.
> {noformat}
> 2018-05-23 17:18:50,655 INFO 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.scheduler.ContainerScheduler:
>  Opportunistic container [container_e06_1527075664705_0001_01_01] will 
> not be queued at the NMsince max queue length [0] has been reached
> {noformat}
> Following steps are executed for rolling upgrade
> # Install 2.8.4 cluster and launch a MR job with distributed cache enabled.
> # Stop 2.8.4 RM. Start 3.1.0 RM with same configuration.
> # Stop 2.8.4 NM batch by batch. Start 3.1.0 NM batch by batch. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (YARN-8108) RM metrics rest API throws GSSException in kerberized environment

2018-05-21 Thread Yongjun Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8108?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16483464#comment-16483464
 ] 

Yongjun Zhang edited comment on YARN-8108 at 5/22/18 4:15 AM:
--

Hi [~eyang] and [~kbadani],

Would you please confirm that this jira is a blocker for 3.0.3 release? If so, 
can we expedite the work here since we are in the process of preparing 3.0.3 
release?

Thanks a lot.




was (Author: yzhangal):
Hi Guys,

Would you please confirm that this jira is a blocker for 3.0.3 release? If so, 
can we expedite the work here since we are in the process of preparing 3.0.3 
release?

Thanks a lot.



> RM metrics rest API throws GSSException in kerberized environment
> -
>
> Key: YARN-8108
> URL: https://issues.apache.org/jira/browse/YARN-8108
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 3.0.0
>Reporter: Kshitij Badani
>Assignee: Eric Yang
>Priority: Blocker
> Attachments: YARN-8108.001.patch
>
>
> Test is trying to pull up metrics data from SHS after kiniting as 'test_user'
> It is throwing GSSException as follows
> {code:java}
> b2b460b80713|RUNNING: curl --silent -k -X GET -D 
> /hwqe/hadoopqe/artifacts/tmp-94845 --negotiate -u : 
> http://rm_host:8088/proxy/application_1518674952153_0070/metrics/json2018-02-15
>  07:15:48,757|INFO|MainThread|machine.py:194 - 
> run()||GUID=fc5a3266-28f8-4eed-bae2-b2b460b80713|Exit Code: 0
> 2018-02-15 07:15:48,758|INFO|MainThread|spark.py:1757 - 
> getMetricsJsonData()|metrics:
> 
> 
> 
> Error 403 GSSException: Failure unspecified at GSS-API level 
> (Mechanism level: Request is a replay (34))
> 
> HTTP ERROR 403
> Problem accessing /proxy/application_1518674952153_0070/metrics/json. 
> Reason:
>  GSSException: Failure unspecified at GSS-API level (Mechanism level: 
> Request is a replay (34))
> 
> 
> {code}
> Rootcausing : proxyserver on RM can't be supported for Kerberos enabled 
> cluster because AuthenticationFilter is applied twice in Hadoop code (once in 
> httpServer2 for RM, and another instance from AmFilterInitializer for proxy 
> server). This will require code changes to hadoop-yarn-server-web-proxy 
> project



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8108) RM metrics rest API throws GSSException in kerberized environment

2018-05-21 Thread Yongjun Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8108?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16483464#comment-16483464
 ] 

Yongjun Zhang commented on YARN-8108:
-

Hi Guys,

Would you please confirm that this jira is a blocker for 3.0.3 release? If so, 
can we expedite the work here since we are in the process of preparing 3.0.3 
release?

Thanks a lot.



> RM metrics rest API throws GSSException in kerberized environment
> -
>
> Key: YARN-8108
> URL: https://issues.apache.org/jira/browse/YARN-8108
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 3.0.0
>Reporter: Kshitij Badani
>Assignee: Eric Yang
>Priority: Blocker
> Attachments: YARN-8108.001.patch
>
>
> Test is trying to pull up metrics data from SHS after kiniting as 'test_user'
> It is throwing GSSException as follows
> {code:java}
> b2b460b80713|RUNNING: curl --silent -k -X GET -D 
> /hwqe/hadoopqe/artifacts/tmp-94845 --negotiate -u : 
> http://rm_host:8088/proxy/application_1518674952153_0070/metrics/json2018-02-15
>  07:15:48,757|INFO|MainThread|machine.py:194 - 
> run()||GUID=fc5a3266-28f8-4eed-bae2-b2b460b80713|Exit Code: 0
> 2018-02-15 07:15:48,758|INFO|MainThread|spark.py:1757 - 
> getMetricsJsonData()|metrics:
> 
> 
> 
> Error 403 GSSException: Failure unspecified at GSS-API level 
> (Mechanism level: Request is a replay (34))
> 
> HTTP ERROR 403
> Problem accessing /proxy/application_1518674952153_0070/metrics/json. 
> Reason:
>  GSSException: Failure unspecified at GSS-API level (Mechanism level: 
> Request is a replay (34))
> 
> 
> {code}
> Rootcausing : proxyserver on RM can't be supported for Kerberos enabled 
> cluster because AuthenticationFilter is applied twice in Hadoop code (once in 
> httpServer2 for RM, and another instance from AmFilterInitializer for proxy 
> server). This will require code changes to hadoop-yarn-server-web-proxy 
> project



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-8022) ResourceManager UI cluster/app/ page fails to render

2018-05-09 Thread Yongjun Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-8022?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yongjun Zhang updated YARN-8022:

Fix Version/s: (was: 3.0.1)
   3.0.3

> ResourceManager UI cluster/app/ page fails to render
> 
>
> Key: YARN-8022
> URL: https://issues.apache.org/jira/browse/YARN-8022
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: webapp
>Reporter: Tarun Parimi
>Assignee: Tarun Parimi
>Priority: Blocker
> Fix For: 3.1.0, 3.0.3
>
> Attachments: Screen Shot 2018-03-12 at 1.45.05 PM.png, 
> YARN-8022.001.patch, YARN-8022.002.patch
>
>
> The page displays the message "Failed to read the attempts of the application"
>  
> The following stack trace is observed in RM log.
> org.apache.hadoop.yarn.server.webapp.AppBlock: Failed to read the attempts of 
> the application application_1520597233415_0002.
> java.lang.NullPointerException
>  at org.apache.hadoop.yarn.server.webapp.AppBlock$3.run(AppBlock.java:283)
>  at org.apache.hadoop.yarn.server.webapp.AppBlock$3.run(AppBlock.java:280)
>  at java.security.AccessController.doPrivileged(Native Method)
>  at javax.security.auth.Subject.doAs(Subject.java:422)
>  at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1682)
>  at org.apache.hadoop.yarn.server.webapp.AppBlock.render(AppBlock.java:279)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.webapp.RMAppBlock.render(RMAppBlock.java:71)
>  at org.apache.hadoop.yarn.webapp.view.HtmlBlock.render(HtmlBlock.java:69)
>  at 
> org.apache.hadoop.yarn.webapp.view.HtmlBlock.renderPartial(HtmlBlock.java:79)
>  at org.apache.hadoop.yarn.webapp.View.render(View.java:235)
>  at org.apache.hadoop.yarn.webapp.view.HtmlPage$Page.subView(HtmlPage.java:49)
>  at 
> org.apache.hadoop.yarn.webapp.hamlet2.HamletImpl$EImp._v(HamletImpl.java:117)
>  at org.apache.hadoop.yarn.webapp.hamlet2.Hamlet$TD.__(Hamlet.java:848)
>  at 
> org.apache.hadoop.yarn.webapp.view.TwoColumnLayout.render(TwoColumnLayout.java:71)
>  at org.apache.hadoop.yarn.webapp.view.HtmlPage.render(HtmlPage.java:82)
>  at org.apache.hadoop.yarn.webapp.Controller.render(Controller.java:212)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.webapp.RmController.app(RmController.java:54)



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-7937) Fix http method name in Cluster Application Timeout Update API example request

2018-05-09 Thread Yongjun Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-7937?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yongjun Zhang updated YARN-7937:

Fix Version/s: (was: 3.0.1)
   3.0.3

> Fix http method name in Cluster Application Timeout Update API example request
> --
>
> Key: YARN-7937
> URL: https://issues.apache.org/jira/browse/YARN-7937
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: docs, documentation
>Affects Versions: 2.9.0, 3.0.0
>Reporter: Charan Hebri
>Assignee: Charan Hebri
>Priority: Minor
> Fix For: 3.1.0, 2.10.0, 3.0.3
>
> Attachments: YARN-7937.001.patch
>
>
> In section, Cluster Application Timeout Update API, 
> https://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-site/ResourceManagerRest.html#Cluster_Application_Timeout_Update_API
> the example requests for both XML and JSON formats show "GET" as the http 
> method. This should actually be a PUT.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-7810) TestDockerContainerRuntime test failures due to UID lookup of a non-existent user

2018-05-09 Thread Yongjun Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-7810?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yongjun Zhang updated YARN-7810:

Fix Version/s: (was: 3.0.2)
   3.0.3

> TestDockerContainerRuntime test failures due to UID lookup of a non-existent 
> user
> -
>
> Key: YARN-7810
> URL: https://issues.apache.org/jira/browse/YARN-7810
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Shane Kumpf
>Assignee: Shane Kumpf
>Priority: Major
>  Labels: Docker
> Fix For: 3.1.0, 2.10.0, 2.9.1, 3.0.3
>
> Attachments: YARN-7810-branch-2.001.patch, 
> YARN-7810-branch-2.002.patch, YARN-7810-branch-3.0.001.patch, 
> YARN-7810.001.patch, YARN-7810.002.patch
>
>
> YARN-7782 enabled the Docker runtime feature to remap the username to uid:gid 
> form for launching Docker containers. The feature does an {{id -u}} and {{id 
> -G}} to get the UID and GIDs. This fails with the test user, as that user 
> doesn't actually exist on the host.
> {code:java}
> [ERROR] 
> testContainerLaunchWithCustomNetworks(org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.runtime.TestDockerContainerRuntime)
>   Time elapsed: 0.411 s  <<< ERROR!
> org.apache.hadoop.yarn.server.nodemanager.containermanager.runtime.ContainerExecutionException:
>  
> ExitCodeException exitCode=1: id: 'run_as_user': no such user
>   at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.runtime.DockerLinuxContainerRuntime.getUserIdInfo(DockerLinuxContainerRuntime.java:711)
>   at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.runtime.DockerLinuxContainerRuntime.launchContainer(DockerLinuxContainerRuntime.java:757)
>   at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.runtime.TestDockerContainerRuntime.testContainerLaunchWithCustomNetworks(TestDockerContainerRuntime.java:599){code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-6073) Misuse of format specifier in Preconditions.checkArgument

2017-01-08 Thread Yongjun Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6073?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15810798#comment-15810798
 ] 

Yongjun Zhang commented on YARN-6073:
-

Sure [~yuanbo], seems someone need to help to add your name to YARN contributer 
list before your name can be assigned to (I tried and could not).


> Misuse of format specifier in Preconditions.checkArgument
> -
>
> Key: YARN-6073
> URL: https://issues.apache.org/jira/browse/YARN-6073
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Yongjun Zhang
>Priority: Trivial
>
> RMAdminCLI.java
> {code}
>  int nLabels = map.get(nodeId).size();
>   Preconditions.checkArgument(nLabels <= 1, "%d labels specified on 
> host=%s"
>   + ", please note that we do not support specifying multiple"
>   + " labels on a single host for now.", nLabels, nodeIdStr);
> {code}
> The {{%d}} should be replaced with {{%s}}, per
> https://google.github.io/guava/releases/19.0/api/docs/com/google/common/base/Preconditions.html



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Created] (YARN-6073) Misuse of format specifier in Preconditions.checkArgument

2017-01-08 Thread Yongjun Zhang (JIRA)
Yongjun Zhang created YARN-6073:
---

 Summary: Misuse of format specifier in Preconditions.checkArgument
 Key: YARN-6073
 URL: https://issues.apache.org/jira/browse/YARN-6073
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Yongjun Zhang
Priority: Trivial


RMAdminCLI.java

{code}
 int nLabels = map.get(nodeId).size();
  Preconditions.checkArgument(nLabels <= 1, "%d labels specified on host=%s"
  + ", please note that we do not support specifying multiple"
  + " labels on a single host for now.", nLabels, nodeIdStr);
{code}

The {{%d}} should be replaced with {{%s}}, per

https://google.github.io/guava/releases/19.0/api/docs/com/google/common/base/Preconditions.html






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Created] (YARN-5057) resourcemanager.security.TestDelegationTokenRenewer fails in trunk

2016-05-06 Thread Yongjun Zhang (JIRA)
Yongjun Zhang created YARN-5057:
---

 Summary: resourcemanager.security.TestDelegationTokenRenewer fails 
in trunk
 Key: YARN-5057
 URL: https://issues.apache.org/jira/browse/YARN-5057
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Yongjun Zhang


The following test seems to fail easily if not always:

{code}
---
 T E S T S
---
Running 
org.apache.hadoop.yarn.server.resourcemanager.security.TestDelegationTokenRenewer
Tests run: 14, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 42.996 sec <<< 
FAILURE! - in 
org.apache.hadoop.yarn.server.resourcemanager.security.TestDelegationTokenRenewer
testCancelWithMultipleAppSubmissions(org.apache.hadoop.yarn.server.resourcemanager.security.TestDelegationTokenRenewer)
  Time elapsed: 0.382 sec  <<< FAILURE!
java.lang.AssertionError: null
at org.junit.Assert.fail(Assert.java:86)
at org.junit.Assert.assertTrue(Assert.java:41)
at org.junit.Assert.assertFalse(Assert.java:64)
at org.junit.Assert.assertFalse(Assert.java:74)
at 
org.apache.hadoop.yarn.server.resourcemanager.security.TestDelegationTokenRenewer.testCancelWithMultipleAppSubmissions(TestDelegationTokenRenewer.java:1134)
{code}







--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-5048) DelegationTokenRenewer#skipTokenRenewal may throw NPE

2016-05-06 Thread Yongjun Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5048?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15274862#comment-15274862
 ] 

Yongjun Zhang commented on YARN-5048:
-

Sure Jian, will soon.


> DelegationTokenRenewer#skipTokenRenewal may throw NPE 
> --
>
> Key: YARN-5048
> URL: https://issues.apache.org/jira/browse/YARN-5048
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Jian He
>Assignee: Jian He
> Attachments: YARN-5048.1.patch, YARN-5048.2.patch
>
>
> {{((Token)token).decodeIdentifier()}} may 
> throw NPE if RM does not have the corresponding toke kind class.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-5048) DelegationTokenRenewer#skipTokenRenewal may throw NPE

2016-05-06 Thread Yongjun Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5048?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15274281#comment-15274281
 ] 

Yongjun Zhang commented on YARN-5048:
-

Thanks [~jianhe] for the rev. +1. 

> DelegationTokenRenewer#skipTokenRenewal may throw NPE 
> --
>
> Key: YARN-5048
> URL: https://issues.apache.org/jira/browse/YARN-5048
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Jian He
>Assignee: Jian He
> Attachments: YARN-5048.1.patch, YARN-5048.2.patch
>
>
> {{((Token)token).decodeIdentifier()}} may 
> throw NPE if RM does not have the corresponding toke kind class.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-5048) DelegationTokenRenewer#skipTokenRenewal may throw NPE

2016-05-06 Thread Yongjun Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5048?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15273680#comment-15273680
 ] 

Yongjun Zhang commented on YARN-5048:
-

Thanks [~jianhe].

Noticed one little thing, we need to move {{@SuppressWarnings("unchecked")}} to 
right before the code you added, as
{code}
@SuppressWarnings("unchecked")
AbstractDelegationTokenIdentifier identifier =
((Token) token).decodeIdentifier();
if (identifier == null) {
  return false;
}   
{code}

 +1 after that.

BTW, I wonder why our build system doesn't point that out as a warning.







> DelegationTokenRenewer#skipTokenRenewal may throw NPE 
> --
>
> Key: YARN-5048
> URL: https://issues.apache.org/jira/browse/YARN-5048
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Jian He
>Assignee: Jian He
> Attachments: YARN-5048.1.patch
>
>
> {{((Token)token).decodeIdentifier()}} may 
> throw NPE if RM does not have the corresponding toke kind class.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (YARN-5048) DelegationTokenRenewer#skipTokenRenewal may throw NPE

2016-05-05 Thread Yongjun Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5048?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15273372#comment-15273372
 ] 

Yongjun Zhang edited comment on YARN-5048 at 5/6/16 12:20 AM:
--

Thanks for the explanation [~jianhe].  
I'm about to +1, but saw some test failures. Would you please check on? thanks.




was (Author: yzhangal):
Thanks for the explanation [~jianhe]. I'm +1 on your patch.


> DelegationTokenRenewer#skipTokenRenewal may throw NPE 
> --
>
> Key: YARN-5048
> URL: https://issues.apache.org/jira/browse/YARN-5048
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Jian He
>Assignee: Jian He
> Attachments: YARN-5048.1.patch
>
>
> {{((Token)token).decodeIdentifier()}} may 
> throw NPE if RM does not have the corresponding toke kind class.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-5048) DelegationTokenRenewer#skipTokenRenewal may throw NPE

2016-05-05 Thread Yongjun Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5048?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15273372#comment-15273372
 ] 

Yongjun Zhang commented on YARN-5048:
-

Thanks for the explanation [~jianhe]. I'm +1 on your patch.


> DelegationTokenRenewer#skipTokenRenewal may throw NPE 
> --
>
> Key: YARN-5048
> URL: https://issues.apache.org/jira/browse/YARN-5048
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Jian He
>Assignee: Jian He
> Attachments: YARN-5048.1.patch
>
>
> {{((Token)token).decodeIdentifier()}} may 
> throw NPE if RM does not have the corresponding toke kind class.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-5048) DelegationTokenRenewer#skipTokenRenewal may throw NPE

2016-05-05 Thread Yongjun Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5048?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15273347#comment-15273347
 ] 

Yongjun Zhang commented on YARN-5048:
-

HI [~jianhe],

Thanks for catching the issue and the patch!

The patch looks good to me. Some questions,
 
1. what circumstance that RM does not have the token kind class?
2. This method is to check whether we need to skip renewing a token, if RM does 
not have the corresponding token kind class, does it mean we should not skip 
renewing thus RM need to renew? 
3. If RM need to renew, does it mean RM need to have the token kind class?

Thanks.
 


> DelegationTokenRenewer#skipTokenRenewal may throw NPE 
> --
>
> Key: YARN-5048
> URL: https://issues.apache.org/jira/browse/YARN-5048
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Jian He
>Assignee: Jian He
> Attachments: YARN-5048.1.patch
>
>
> {{((Token)token).decodeIdentifier()}} may 
> throw NPE if RM does not have the corresponding toke kind class.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-3021) YARN's delegation-token handling disallows certain trust setups to operate properly over DistCp

2015-04-17 Thread Yongjun Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3021?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yongjun Zhang updated YARN-3021:

Release Note: 
ResourceManager renews delegation tokens for applications. This behavior has 
been changed to renew tokens only if the token's renewer is a non-empty string. 
MapReduce jobs can instruct ResourceManager to skip renewal of tokens obtained 
from certain hosts by specifying the hosts with configuration 
mapreduce.job.hdfs-servers.token-renewal.exclude=host1,host2,..,hostN. 


 YARN's delegation-token handling disallows certain trust setups to operate 
 properly over DistCp
 ---

 Key: YARN-3021
 URL: https://issues.apache.org/jira/browse/YARN-3021
 Project: Hadoop YARN
  Issue Type: Bug
  Components: security
Affects Versions: 2.3.0
Reporter: Harsh J
Assignee: Yongjun Zhang
 Fix For: 2.8.0

 Attachments: YARN-3021.001.patch, YARN-3021.002.patch, 
 YARN-3021.003.patch, YARN-3021.004.patch, YARN-3021.005.patch, 
 YARN-3021.006.patch, YARN-3021.007.patch, YARN-3021.007.patch, 
 YARN-3021.007.patch, YARN-3021.patch


 Consider this scenario of 3 realms: A, B and COMMON, where A trusts COMMON, 
 and B trusts COMMON (one way trusts both), and both A and B run HDFS + YARN 
 clusters.
 Now if one logs in with a COMMON credential, and runs a job on A's YARN that 
 needs to access B's HDFS (such as a DistCp), the operation fails in the RM, 
 as it attempts a renewDelegationToken(…) synchronously during application 
 submission (to validate the managed token before it adds it to a scheduler 
 for automatic renewal). The call obviously fails cause B realm will not trust 
 A's credentials (here, the RM's principal is the renewer).
 In the 1.x JobTracker the same call is present, but it is done asynchronously 
 and once the renewal attempt failed we simply ceased to schedule any further 
 attempts of renewals, rather than fail the job immediately.
 We should change the logic such that we attempt the renewal but go easy on 
 the failure and skip the scheduling alone, rather than bubble back an error 
 to the client, failing the app submission. This way the old behaviour is 
 retained.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3021) YARN's delegation-token handling disallows certain trust setups to operate properly over DistCp

2015-04-17 Thread Yongjun Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3021?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14500368#comment-14500368
 ] 

Yongjun Zhang commented on YARN-3021:
-

Thanks also to [~ka...@cloudera.com] for the earlier discussions, and we worked 
out a release notes which I just updated.


 YARN's delegation-token handling disallows certain trust setups to operate 
 properly over DistCp
 ---

 Key: YARN-3021
 URL: https://issues.apache.org/jira/browse/YARN-3021
 Project: Hadoop YARN
  Issue Type: Bug
  Components: security
Affects Versions: 2.3.0
Reporter: Harsh J
Assignee: Yongjun Zhang
 Fix For: 2.8.0

 Attachments: YARN-3021.001.patch, YARN-3021.002.patch, 
 YARN-3021.003.patch, YARN-3021.004.patch, YARN-3021.005.patch, 
 YARN-3021.006.patch, YARN-3021.007.patch, YARN-3021.007.patch, 
 YARN-3021.007.patch, YARN-3021.patch


 Consider this scenario of 3 realms: A, B and COMMON, where A trusts COMMON, 
 and B trusts COMMON (one way trusts both), and both A and B run HDFS + YARN 
 clusters.
 Now if one logs in with a COMMON credential, and runs a job on A's YARN that 
 needs to access B's HDFS (such as a DistCp), the operation fails in the RM, 
 as it attempts a renewDelegationToken(…) synchronously during application 
 submission (to validate the managed token before it adds it to a scheduler 
 for automatic renewal). The call obviously fails cause B realm will not trust 
 A's credentials (here, the RM's principal is the renewer).
 In the 1.x JobTracker the same call is present, but it is done asynchronously 
 and once the renewal attempt failed we simply ceased to schedule any further 
 attempts of renewals, rather than fail the job immediately.
 We should change the logic such that we attempt the renewal but go easy on 
 the failure and skip the scheduling alone, rather than bubble back an error 
 to the client, failing the app submission. This way the old behaviour is 
 retained.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3021) YARN's delegation-token handling disallows certain trust setups to operate properly over DistCp

2015-04-17 Thread Yongjun Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3021?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14500029#comment-14500029
 ] 

Yongjun Zhang commented on YARN-3021:
-

Thanks again [~jianhe] for the reviews/suggestions and committing!

Thanks [~qwertymaniac] for diagnosing and reporting the issue, Harsh, 
[~vinodkv], [~adhoot] for the reviews and discussions!



 YARN's delegation-token handling disallows certain trust setups to operate 
 properly over DistCp
 ---

 Key: YARN-3021
 URL: https://issues.apache.org/jira/browse/YARN-3021
 Project: Hadoop YARN
  Issue Type: Bug
  Components: security
Affects Versions: 2.3.0
Reporter: Harsh J
Assignee: Yongjun Zhang
 Fix For: 2.8.0

 Attachments: YARN-3021.001.patch, YARN-3021.002.patch, 
 YARN-3021.003.patch, YARN-3021.004.patch, YARN-3021.005.patch, 
 YARN-3021.006.patch, YARN-3021.007.patch, YARN-3021.007.patch, 
 YARN-3021.007.patch, YARN-3021.patch


 Consider this scenario of 3 realms: A, B and COMMON, where A trusts COMMON, 
 and B trusts COMMON (one way trusts both), and both A and B run HDFS + YARN 
 clusters.
 Now if one logs in with a COMMON credential, and runs a job on A's YARN that 
 needs to access B's HDFS (such as a DistCp), the operation fails in the RM, 
 as it attempts a renewDelegationToken(…) synchronously during application 
 submission (to validate the managed token before it adds it to a scheduler 
 for automatic renewal). The call obviously fails cause B realm will not trust 
 A's credentials (here, the RM's principal is the renewer).
 In the 1.x JobTracker the same call is present, but it is done asynchronously 
 and once the renewal attempt failed we simply ceased to schedule any further 
 attempts of renewals, rather than fail the job immediately.
 We should change the logic such that we attempt the renewal but go easy on 
 the failure and skip the scheduling alone, rather than bubble back an error 
 to the client, failing the app submission. This way the old behaviour is 
 retained.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3021) YARN's delegation-token handling disallows certain trust setups to operate properly over DistCp

2015-04-16 Thread Yongjun Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3021?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14498253#comment-14498253
 ] 

Yongjun Zhang commented on YARN-3021:
-

HI [~jianhe],

Thanks for looking at it again and sorry for late response, I was out for some 
time myself too.

It turned out that the same patch 007 applies for me with today's trunk, and I 
uploaded it again.



 YARN's delegation-token handling disallows certain trust setups to operate 
 properly over DistCp
 ---

 Key: YARN-3021
 URL: https://issues.apache.org/jira/browse/YARN-3021
 Project: Hadoop YARN
  Issue Type: Bug
  Components: security
Affects Versions: 2.3.0
Reporter: Harsh J
Assignee: Yongjun Zhang
 Attachments: YARN-3021.001.patch, YARN-3021.002.patch, 
 YARN-3021.003.patch, YARN-3021.004.patch, YARN-3021.005.patch, 
 YARN-3021.006.patch, YARN-3021.007.patch, YARN-3021.007.patch, 
 YARN-3021.007.patch, YARN-3021.patch


 Consider this scenario of 3 realms: A, B and COMMON, where A trusts COMMON, 
 and B trusts COMMON (one way trusts both), and both A and B run HDFS + YARN 
 clusters.
 Now if one logs in with a COMMON credential, and runs a job on A's YARN that 
 needs to access B's HDFS (such as a DistCp), the operation fails in the RM, 
 as it attempts a renewDelegationToken(…) synchronously during application 
 submission (to validate the managed token before it adds it to a scheduler 
 for automatic renewal). The call obviously fails cause B realm will not trust 
 A's credentials (here, the RM's principal is the renewer).
 In the 1.x JobTracker the same call is present, but it is done asynchronously 
 and once the renewal attempt failed we simply ceased to schedule any further 
 attempts of renewals, rather than fail the job immediately.
 We should change the logic such that we attempt the renewal but go easy on 
 the failure and skip the scheduling alone, rather than bubble back an error 
 to the client, failing the app submission. This way the old behaviour is 
 retained.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3021) YARN's delegation-token handling disallows certain trust setups to operate properly over DistCp

2015-04-16 Thread Yongjun Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3021?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yongjun Zhang updated YARN-3021:

Attachment: YARN-3021.007.patch

 YARN's delegation-token handling disallows certain trust setups to operate 
 properly over DistCp
 ---

 Key: YARN-3021
 URL: https://issues.apache.org/jira/browse/YARN-3021
 Project: Hadoop YARN
  Issue Type: Bug
  Components: security
Affects Versions: 2.3.0
Reporter: Harsh J
Assignee: Yongjun Zhang
 Attachments: YARN-3021.001.patch, YARN-3021.002.patch, 
 YARN-3021.003.patch, YARN-3021.004.patch, YARN-3021.005.patch, 
 YARN-3021.006.patch, YARN-3021.007.patch, YARN-3021.007.patch, 
 YARN-3021.007.patch, YARN-3021.patch


 Consider this scenario of 3 realms: A, B and COMMON, where A trusts COMMON, 
 and B trusts COMMON (one way trusts both), and both A and B run HDFS + YARN 
 clusters.
 Now if one logs in with a COMMON credential, and runs a job on A's YARN that 
 needs to access B's HDFS (such as a DistCp), the operation fails in the RM, 
 as it attempts a renewDelegationToken(…) synchronously during application 
 submission (to validate the managed token before it adds it to a scheduler 
 for automatic renewal). The call obviously fails cause B realm will not trust 
 A's credentials (here, the RM's principal is the renewer).
 In the 1.x JobTracker the same call is present, but it is done asynchronously 
 and once the renewal attempt failed we simply ceased to schedule any further 
 attempts of renewals, rather than fail the job immediately.
 We should change the logic such that we attempt the renewal but go easy on 
 the failure and skip the scheduling alone, rather than bubble back an error 
 to the client, failing the app submission. This way the old behaviour is 
 retained.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3021) YARN's delegation-token handling disallows certain trust setups to operate properly over DistCp

2015-04-16 Thread Yongjun Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3021?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14498773#comment-14498773
 ] 

Yongjun Zhang commented on YARN-3021:
-

Thanks a lot [~jianhe]!


 YARN's delegation-token handling disallows certain trust setups to operate 
 properly over DistCp
 ---

 Key: YARN-3021
 URL: https://issues.apache.org/jira/browse/YARN-3021
 Project: Hadoop YARN
  Issue Type: Bug
  Components: security
Affects Versions: 2.3.0
Reporter: Harsh J
Assignee: Yongjun Zhang
 Attachments: YARN-3021.001.patch, YARN-3021.002.patch, 
 YARN-3021.003.patch, YARN-3021.004.patch, YARN-3021.005.patch, 
 YARN-3021.006.patch, YARN-3021.007.patch, YARN-3021.007.patch, 
 YARN-3021.007.patch, YARN-3021.patch


 Consider this scenario of 3 realms: A, B and COMMON, where A trusts COMMON, 
 and B trusts COMMON (one way trusts both), and both A and B run HDFS + YARN 
 clusters.
 Now if one logs in with a COMMON credential, and runs a job on A's YARN that 
 needs to access B's HDFS (such as a DistCp), the operation fails in the RM, 
 as it attempts a renewDelegationToken(…) synchronously during application 
 submission (to validate the managed token before it adds it to a scheduler 
 for automatic renewal). The call obviously fails cause B realm will not trust 
 A's credentials (here, the RM's principal is the renewer).
 In the 1.x JobTracker the same call is present, but it is done asynchronously 
 and once the renewal attempt failed we simply ceased to schedule any further 
 attempts of renewals, rather than fail the job immediately.
 We should change the logic such that we attempt the renewal but go easy on 
 the failure and skip the scheduling alone, rather than bubble back an error 
 to the client, failing the app submission. This way the old behaviour is 
 retained.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3021) YARN's delegation-token handling disallows certain trust setups to operate properly over DistCp

2015-04-16 Thread Yongjun Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3021?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14498565#comment-14498565
 ] 

Yongjun Zhang commented on YARN-3021:
-

I don't see 
{code}
java.lang.AssertionError: AppAttempt state is not correct (timedout) 
expected:ALLOCATED but was:SCHEDULED
at org.junit.Assert.fail(Assert.java:88)
{code}
reported in YARN-2483 here, so the failure here may be for a different reason.

The same patch finished successfully in previous jenkins run, which indicates 
some flakiness of the failed test. Will throw another jenkins run.


 YARN's delegation-token handling disallows certain trust setups to operate 
 properly over DistCp
 ---

 Key: YARN-3021
 URL: https://issues.apache.org/jira/browse/YARN-3021
 Project: Hadoop YARN
  Issue Type: Bug
  Components: security
Affects Versions: 2.3.0
Reporter: Harsh J
Assignee: Yongjun Zhang
 Attachments: YARN-3021.001.patch, YARN-3021.002.patch, 
 YARN-3021.003.patch, YARN-3021.004.patch, YARN-3021.005.patch, 
 YARN-3021.006.patch, YARN-3021.007.patch, YARN-3021.007.patch, 
 YARN-3021.007.patch, YARN-3021.patch


 Consider this scenario of 3 realms: A, B and COMMON, where A trusts COMMON, 
 and B trusts COMMON (one way trusts both), and both A and B run HDFS + YARN 
 clusters.
 Now if one logs in with a COMMON credential, and runs a job on A's YARN that 
 needs to access B's HDFS (such as a DistCp), the operation fails in the RM, 
 as it attempts a renewDelegationToken(…) synchronously during application 
 submission (to validate the managed token before it adds it to a scheduler 
 for automatic renewal). The call obviously fails cause B realm will not trust 
 A's credentials (here, the RM's principal is the renewer).
 In the 1.x JobTracker the same call is present, but it is done asynchronously 
 and once the renewal attempt failed we simply ceased to schedule any further 
 attempts of renewals, rather than fail the job immediately.
 We should change the logic such that we attempt the renewal but go easy on 
 the failure and skip the scheduling alone, rather than bubble back an error 
 to the client, failing the app submission. This way the old behaviour is 
 retained.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3021) YARN's delegation-token handling disallows certain trust setups to operate properly over DistCp

2015-04-16 Thread Yongjun Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3021?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14498488#comment-14498488
 ] 

Yongjun Zhang commented on YARN-3021:
-

The test failure is likely YARN-2483.


 YARN's delegation-token handling disallows certain trust setups to operate 
 properly over DistCp
 ---

 Key: YARN-3021
 URL: https://issues.apache.org/jira/browse/YARN-3021
 Project: Hadoop YARN
  Issue Type: Bug
  Components: security
Affects Versions: 2.3.0
Reporter: Harsh J
Assignee: Yongjun Zhang
 Attachments: YARN-3021.001.patch, YARN-3021.002.patch, 
 YARN-3021.003.patch, YARN-3021.004.patch, YARN-3021.005.patch, 
 YARN-3021.006.patch, YARN-3021.007.patch, YARN-3021.007.patch, 
 YARN-3021.007.patch, YARN-3021.patch


 Consider this scenario of 3 realms: A, B and COMMON, where A trusts COMMON, 
 and B trusts COMMON (one way trusts both), and both A and B run HDFS + YARN 
 clusters.
 Now if one logs in with a COMMON credential, and runs a job on A's YARN that 
 needs to access B's HDFS (such as a DistCp), the operation fails in the RM, 
 as it attempts a renewDelegationToken(…) synchronously during application 
 submission (to validate the managed token before it adds it to a scheduler 
 for automatic renewal). The call obviously fails cause B realm will not trust 
 A's credentials (here, the RM's principal is the renewer).
 In the 1.x JobTracker the same call is present, but it is done asynchronously 
 and once the renewal attempt failed we simply ceased to schedule any further 
 attempts of renewals, rather than fail the job immediately.
 We should change the logic such that we attempt the renewal but go easy on 
 the failure and skip the scheduling alone, rather than bubble back an error 
 to the client, failing the app submission. This way the old behaviour is 
 retained.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3021) YARN's delegation-token handling disallows certain trust setups to operate properly over DistCp

2015-04-07 Thread Yongjun Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3021?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yongjun Zhang updated YARN-3021:

Attachment: YARN-3021.007.patch

Upload same patch again for another test.


 YARN's delegation-token handling disallows certain trust setups to operate 
 properly over DistCp
 ---

 Key: YARN-3021
 URL: https://issues.apache.org/jira/browse/YARN-3021
 Project: Hadoop YARN
  Issue Type: Bug
  Components: security
Affects Versions: 2.3.0
Reporter: Harsh J
Assignee: Yongjun Zhang
 Attachments: YARN-3021.001.patch, YARN-3021.002.patch, 
 YARN-3021.003.patch, YARN-3021.004.patch, YARN-3021.005.patch, 
 YARN-3021.006.patch, YARN-3021.007.patch, YARN-3021.007.patch, YARN-3021.patch


 Consider this scenario of 3 realms: A, B and COMMON, where A trusts COMMON, 
 and B trusts COMMON (one way trusts both), and both A and B run HDFS + YARN 
 clusters.
 Now if one logs in with a COMMON credential, and runs a job on A's YARN that 
 needs to access B's HDFS (such as a DistCp), the operation fails in the RM, 
 as it attempts a renewDelegationToken(…) synchronously during application 
 submission (to validate the managed token before it adds it to a scheduler 
 for automatic renewal). The call obviously fails cause B realm will not trust 
 A's credentials (here, the RM's principal is the renewer).
 In the 1.x JobTracker the same call is present, but it is done asynchronously 
 and once the renewal attempt failed we simply ceased to schedule any further 
 attempts of renewals, rather than fail the job immediately.
 We should change the logic such that we attempt the renewal but go easy on 
 the failure and skip the scheduling alone, rather than bubble back an error 
 to the client, failing the app submission. This way the old behaviour is 
 retained.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3021) YARN's delegation-token handling disallows certain trust setups to operate properly over DistCp

2015-04-06 Thread Yongjun Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3021?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14481881#comment-14481881
 ] 

Yongjun Zhang commented on YARN-3021:
-

Hi [~jianhe],

Thanks for taking a further look. No worry about the delay, I guessed you were 
out.

About your comment, the code 
{code}
 private void collectDelegationTokens(final String renewer,
   final Credentials credentials,
   final ListToken? tokens)
   throws IOException {
final String serviceName = getCanonicalServiceName();
// Collect token of the this filesystem and then of its embedded children
if (serviceName != null) { // fs has token, grab it
  final Text service = new Text(serviceName);
  Token? token = credentials.getToken(service); 
  if (token == null) {
token = getDelegationToken(renewer);
if (token != null) {
  tokens.add(token);
  credentials.addToken(service, token);
}
  }
}
{code}
The line highlighted with === indicates that a token could be retrieved from 
the token map. In this case, are we sure that they always have a non-empty 
renewer? In addition, it's possible that we might change the 
{{skipTokenRenewer}} method in the future to do some additional checking.   
Seems safer to have this check. Do you think we should just keep this checking?

Thanks.




 YARN's delegation-token handling disallows certain trust setups to operate 
 properly over DistCp
 ---

 Key: YARN-3021
 URL: https://issues.apache.org/jira/browse/YARN-3021
 Project: Hadoop YARN
  Issue Type: Bug
  Components: security
Affects Versions: 2.3.0
Reporter: Harsh J
Assignee: Yongjun Zhang
 Attachments: YARN-3021.001.patch, YARN-3021.002.patch, 
 YARN-3021.003.patch, YARN-3021.004.patch, YARN-3021.005.patch, 
 YARN-3021.006.patch, YARN-3021.patch


 Consider this scenario of 3 realms: A, B and COMMON, where A trusts COMMON, 
 and B trusts COMMON (one way trusts both), and both A and B run HDFS + YARN 
 clusters.
 Now if one logs in with a COMMON credential, and runs a job on A's YARN that 
 needs to access B's HDFS (such as a DistCp), the operation fails in the RM, 
 as it attempts a renewDelegationToken(…) synchronously during application 
 submission (to validate the managed token before it adds it to a scheduler 
 for automatic renewal). The call obviously fails cause B realm will not trust 
 A's credentials (here, the RM's principal is the renewer).
 In the 1.x JobTracker the same call is present, but it is done asynchronously 
 and once the renewal attempt failed we simply ceased to schedule any further 
 attempts of renewals, rather than fail the job immediately.
 We should change the logic such that we attempt the renewal but go easy on 
 the failure and skip the scheduling alone, rather than bubble back an error 
 to the client, failing the app submission. This way the old behaviour is 
 retained.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3021) YARN's delegation-token handling disallows certain trust setups to operate properly over DistCp

2015-04-06 Thread Yongjun Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3021?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yongjun Zhang updated YARN-3021:

Attachment: YARN-3021.007.patch

 YARN's delegation-token handling disallows certain trust setups to operate 
 properly over DistCp
 ---

 Key: YARN-3021
 URL: https://issues.apache.org/jira/browse/YARN-3021
 Project: Hadoop YARN
  Issue Type: Bug
  Components: security
Affects Versions: 2.3.0
Reporter: Harsh J
Assignee: Yongjun Zhang
 Attachments: YARN-3021.001.patch, YARN-3021.002.patch, 
 YARN-3021.003.patch, YARN-3021.004.patch, YARN-3021.005.patch, 
 YARN-3021.006.patch, YARN-3021.007.patch, YARN-3021.patch


 Consider this scenario of 3 realms: A, B and COMMON, where A trusts COMMON, 
 and B trusts COMMON (one way trusts both), and both A and B run HDFS + YARN 
 clusters.
 Now if one logs in with a COMMON credential, and runs a job on A's YARN that 
 needs to access B's HDFS (such as a DistCp), the operation fails in the RM, 
 as it attempts a renewDelegationToken(…) synchronously during application 
 submission (to validate the managed token before it adds it to a scheduler 
 for automatic renewal). The call obviously fails cause B realm will not trust 
 A's credentials (here, the RM's principal is the renewer).
 In the 1.x JobTracker the same call is present, but it is done asynchronously 
 and once the renewal attempt failed we simply ceased to schedule any further 
 attempts of renewals, rather than fail the job immediately.
 We should change the logic such that we attempt the renewal but go easy on 
 the failure and skip the scheduling alone, rather than bubble back an error 
 to the client, failing the app submission. This way the old behaviour is 
 retained.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3021) YARN's delegation-token handling disallows certain trust setups to operate properly over DistCp

2015-04-06 Thread Yongjun Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3021?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14482635#comment-14482635
 ] 

Yongjun Zhang commented on YARN-3021:
-

Hi [~jianhe],

I uploaded rev 007 to address your latest comment. I agree that the token 
renewer won't be empty in that case, and if we need to modify the definition of 
{{skipTokenRenewal}} in the future, we can add back the check at that time. 

Would you please take a look?

Thanks.

 YARN's delegation-token handling disallows certain trust setups to operate 
 properly over DistCp
 ---

 Key: YARN-3021
 URL: https://issues.apache.org/jira/browse/YARN-3021
 Project: Hadoop YARN
  Issue Type: Bug
  Components: security
Affects Versions: 2.3.0
Reporter: Harsh J
Assignee: Yongjun Zhang
 Attachments: YARN-3021.001.patch, YARN-3021.002.patch, 
 YARN-3021.003.patch, YARN-3021.004.patch, YARN-3021.005.patch, 
 YARN-3021.006.patch, YARN-3021.007.patch, YARN-3021.patch


 Consider this scenario of 3 realms: A, B and COMMON, where A trusts COMMON, 
 and B trusts COMMON (one way trusts both), and both A and B run HDFS + YARN 
 clusters.
 Now if one logs in with a COMMON credential, and runs a job on A's YARN that 
 needs to access B's HDFS (such as a DistCp), the operation fails in the RM, 
 as it attempts a renewDelegationToken(…) synchronously during application 
 submission (to validate the managed token before it adds it to a scheduler 
 for automatic renewal). The call obviously fails cause B realm will not trust 
 A's credentials (here, the RM's principal is the renewer).
 In the 1.x JobTracker the same call is present, but it is done asynchronously 
 and once the renewal attempt failed we simply ceased to schedule any further 
 attempts of renewals, rather than fail the job immediately.
 We should change the logic such that we attempt the renewal but go easy on 
 the failure and skip the scheduling alone, rather than bubble back an error 
 to the client, failing the app submission. This way the old behaviour is 
 retained.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3021) YARN's delegation-token handling disallows certain trust setups to operate properly over DistCp

2015-03-31 Thread Yongjun Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3021?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14389392#comment-14389392
 ] 

Yongjun Zhang commented on YARN-3021:
-

HI [~vinodkv],

Seems [~jianhe] is not available. Would you please help with a review?.

Thanks a lot.


 YARN's delegation-token handling disallows certain trust setups to operate 
 properly over DistCp
 ---

 Key: YARN-3021
 URL: https://issues.apache.org/jira/browse/YARN-3021
 Project: Hadoop YARN
  Issue Type: Bug
  Components: security
Affects Versions: 2.3.0
Reporter: Harsh J
Assignee: Yongjun Zhang
 Attachments: YARN-3021.001.patch, YARN-3021.002.patch, 
 YARN-3021.003.patch, YARN-3021.004.patch, YARN-3021.005.patch, 
 YARN-3021.006.patch, YARN-3021.patch


 Consider this scenario of 3 realms: A, B and COMMON, where A trusts COMMON, 
 and B trusts COMMON (one way trusts both), and both A and B run HDFS + YARN 
 clusters.
 Now if one logs in with a COMMON credential, and runs a job on A's YARN that 
 needs to access B's HDFS (such as a DistCp), the operation fails in the RM, 
 as it attempts a renewDelegationToken(…) synchronously during application 
 submission (to validate the managed token before it adds it to a scheduler 
 for automatic renewal). The call obviously fails cause B realm will not trust 
 A's credentials (here, the RM's principal is the renewer).
 In the 1.x JobTracker the same call is present, but it is done asynchronously 
 and once the renewal attempt failed we simply ceased to schedule any further 
 attempts of renewals, rather than fail the job immediately.
 We should change the logic such that we attempt the renewal but go easy on 
 the failure and skip the scheduling alone, rather than bubble back an error 
 to the client, failing the app submission. This way the old behaviour is 
 retained.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3021) YARN's delegation-token handling disallows certain trust setups to operate properly over DistCp

2015-03-26 Thread Yongjun Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3021?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14382799#comment-14382799
 ] 

Yongjun Zhang commented on YARN-3021:
-

Hi [~jianhe], would you please take a look at the latest patch? thanks a lot.


 YARN's delegation-token handling disallows certain trust setups to operate 
 properly over DistCp
 ---

 Key: YARN-3021
 URL: https://issues.apache.org/jira/browse/YARN-3021
 Project: Hadoop YARN
  Issue Type: Bug
  Components: security
Affects Versions: 2.3.0
Reporter: Harsh J
Assignee: Yongjun Zhang
 Attachments: YARN-3021.001.patch, YARN-3021.002.patch, 
 YARN-3021.003.patch, YARN-3021.004.patch, YARN-3021.005.patch, 
 YARN-3021.006.patch, YARN-3021.patch


 Consider this scenario of 3 realms: A, B and COMMON, where A trusts COMMON, 
 and B trusts COMMON (one way trusts both), and both A and B run HDFS + YARN 
 clusters.
 Now if one logs in with a COMMON credential, and runs a job on A's YARN that 
 needs to access B's HDFS (such as a DistCp), the operation fails in the RM, 
 as it attempts a renewDelegationToken(…) synchronously during application 
 submission (to validate the managed token before it adds it to a scheduler 
 for automatic renewal). The call obviously fails cause B realm will not trust 
 A's credentials (here, the RM's principal is the renewer).
 In the 1.x JobTracker the same call is present, but it is done asynchronously 
 and once the renewal attempt failed we simply ceased to schedule any further 
 attempts of renewals, rather than fail the job immediately.
 We should change the logic such that we attempt the renewal but go easy on 
 the failure and skip the scheduling alone, rather than bubble back an error 
 to the client, failing the app submission. This way the old behaviour is 
 retained.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3021) YARN's delegation-token handling disallows certain trust setups to operate properly over DistCp

2015-03-24 Thread Yongjun Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3021?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14379249#comment-14379249
 ] 

Yongjun Zhang commented on YARN-3021:
-

Restarted my VM (the same one on which I reported the trace stack in my last 
update), and rerun the failed test TestCapacitySchedulerNodeLabelUpdate, and it 
is successful. There is some flakiness with this test but not related to this 
jira.



 YARN's delegation-token handling disallows certain trust setups to operate 
 properly over DistCp
 ---

 Key: YARN-3021
 URL: https://issues.apache.org/jira/browse/YARN-3021
 Project: Hadoop YARN
  Issue Type: Bug
  Components: security
Affects Versions: 2.3.0
Reporter: Harsh J
Assignee: Yongjun Zhang
 Attachments: YARN-3021.001.patch, YARN-3021.002.patch, 
 YARN-3021.003.patch, YARN-3021.004.patch, YARN-3021.005.patch, 
 YARN-3021.006.patch, YARN-3021.patch


 Consider this scenario of 3 realms: A, B and COMMON, where A trusts COMMON, 
 and B trusts COMMON (one way trusts both), and both A and B run HDFS + YARN 
 clusters.
 Now if one logs in with a COMMON credential, and runs a job on A's YARN that 
 needs to access B's HDFS (such as a DistCp), the operation fails in the RM, 
 as it attempts a renewDelegationToken(…) synchronously during application 
 submission (to validate the managed token before it adds it to a scheduler 
 for automatic renewal). The call obviously fails cause B realm will not trust 
 A's credentials (here, the RM's principal is the renewer).
 In the 1.x JobTracker the same call is present, but it is done asynchronously 
 and once the renewal attempt failed we simply ceased to schedule any further 
 attempts of renewals, rather than fail the job immediately.
 We should change the logic such that we attempt the renewal but go easy on 
 the failure and skip the scheduling alone, rather than bubble back an error 
 to the client, failing the app submission. This way the old behaviour is 
 retained.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3021) YARN's delegation-token handling disallows certain trust setups to operate properly over DistCp

2015-03-23 Thread Yongjun Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3021?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14376765#comment-14376765
 ] 

Yongjun Zhang commented on YARN-3021:
-

Hi [~jianhe],

Thanks a lot for the clarification, I did a new rev (06) to address your latest 
comment, and also tested it against real clusters. Would you please take a  
further look? Thanks.





 YARN's delegation-token handling disallows certain trust setups to operate 
 properly over DistCp
 ---

 Key: YARN-3021
 URL: https://issues.apache.org/jira/browse/YARN-3021
 Project: Hadoop YARN
  Issue Type: Bug
  Components: security
Affects Versions: 2.3.0
Reporter: Harsh J
Assignee: Yongjun Zhang
 Attachments: YARN-3021.001.patch, YARN-3021.002.patch, 
 YARN-3021.003.patch, YARN-3021.004.patch, YARN-3021.005.patch, 
 YARN-3021.006.patch, YARN-3021.patch


 Consider this scenario of 3 realms: A, B and COMMON, where A trusts COMMON, 
 and B trusts COMMON (one way trusts both), and both A and B run HDFS + YARN 
 clusters.
 Now if one logs in with a COMMON credential, and runs a job on A's YARN that 
 needs to access B's HDFS (such as a DistCp), the operation fails in the RM, 
 as it attempts a renewDelegationToken(…) synchronously during application 
 submission (to validate the managed token before it adds it to a scheduler 
 for automatic renewal). The call obviously fails cause B realm will not trust 
 A's credentials (here, the RM's principal is the renewer).
 In the 1.x JobTracker the same call is present, but it is done asynchronously 
 and once the renewal attempt failed we simply ceased to schedule any further 
 attempts of renewals, rather than fail the job immediately.
 We should change the logic such that we attempt the renewal but go easy on 
 the failure and skip the scheduling alone, rather than bubble back an error 
 to the client, failing the app submission. This way the old behaviour is 
 retained.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3021) YARN's delegation-token handling disallows certain trust setups to operate properly over DistCp

2015-03-23 Thread Yongjun Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3021?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yongjun Zhang updated YARN-3021:

Attachment: YARN-3021.006.patch

 YARN's delegation-token handling disallows certain trust setups to operate 
 properly over DistCp
 ---

 Key: YARN-3021
 URL: https://issues.apache.org/jira/browse/YARN-3021
 Project: Hadoop YARN
  Issue Type: Bug
  Components: security
Affects Versions: 2.3.0
Reporter: Harsh J
Assignee: Yongjun Zhang
 Attachments: YARN-3021.001.patch, YARN-3021.002.patch, 
 YARN-3021.003.patch, YARN-3021.004.patch, YARN-3021.005.patch, 
 YARN-3021.006.patch, YARN-3021.patch


 Consider this scenario of 3 realms: A, B and COMMON, where A trusts COMMON, 
 and B trusts COMMON (one way trusts both), and both A and B run HDFS + YARN 
 clusters.
 Now if one logs in with a COMMON credential, and runs a job on A's YARN that 
 needs to access B's HDFS (such as a DistCp), the operation fails in the RM, 
 as it attempts a renewDelegationToken(…) synchronously during application 
 submission (to validate the managed token before it adds it to a scheduler 
 for automatic renewal). The call obviously fails cause B realm will not trust 
 A's credentials (here, the RM's principal is the renewer).
 In the 1.x JobTracker the same call is present, but it is done asynchronously 
 and once the renewal attempt failed we simply ceased to schedule any further 
 attempts of renewals, rather than fail the job immediately.
 We should change the logic such that we attempt the renewal but go easy on 
 the failure and skip the scheduling alone, rather than bubble back an error 
 to the client, failing the app submission. This way the old behaviour is 
 retained.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3021) YARN's delegation-token handling disallows certain trust setups to operate properly over DistCp

2015-03-23 Thread Yongjun Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3021?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yongjun Zhang updated YARN-3021:

Attachment: YARN-3021.006.patch

The test failure seems to be unrelated, upload same patch 06 to trigger another 
jenkins run. 


 YARN's delegation-token handling disallows certain trust setups to operate 
 properly over DistCp
 ---

 Key: YARN-3021
 URL: https://issues.apache.org/jira/browse/YARN-3021
 Project: Hadoop YARN
  Issue Type: Bug
  Components: security
Affects Versions: 2.3.0
Reporter: Harsh J
Assignee: Yongjun Zhang
 Attachments: YARN-3021.001.patch, YARN-3021.002.patch, 
 YARN-3021.003.patch, YARN-3021.004.patch, YARN-3021.005.patch, 
 YARN-3021.006.patch, YARN-3021.patch


 Consider this scenario of 3 realms: A, B and COMMON, where A trusts COMMON, 
 and B trusts COMMON (one way trusts both), and both A and B run HDFS + YARN 
 clusters.
 Now if one logs in with a COMMON credential, and runs a job on A's YARN that 
 needs to access B's HDFS (such as a DistCp), the operation fails in the RM, 
 as it attempts a renewDelegationToken(…) synchronously during application 
 submission (to validate the managed token before it adds it to a scheduler 
 for automatic renewal). The call obviously fails cause B realm will not trust 
 A's credentials (here, the RM's principal is the renewer).
 In the 1.x JobTracker the same call is present, but it is done asynchronously 
 and once the renewal attempt failed we simply ceased to schedule any further 
 attempts of renewals, rather than fail the job immediately.
 We should change the logic such that we attempt the renewal but go easy on 
 the failure and skip the scheduling alone, rather than bubble back an error 
 to the client, failing the app submission. This way the old behaviour is 
 retained.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3021) YARN's delegation-token handling disallows certain trust setups to operate properly over DistCp

2015-03-23 Thread Yongjun Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3021?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yongjun Zhang updated YARN-3021:

Attachment: (was: YARN-3021.006.patch)

 YARN's delegation-token handling disallows certain trust setups to operate 
 properly over DistCp
 ---

 Key: YARN-3021
 URL: https://issues.apache.org/jira/browse/YARN-3021
 Project: Hadoop YARN
  Issue Type: Bug
  Components: security
Affects Versions: 2.3.0
Reporter: Harsh J
Assignee: Yongjun Zhang
 Attachments: YARN-3021.001.patch, YARN-3021.002.patch, 
 YARN-3021.003.patch, YARN-3021.004.patch, YARN-3021.005.patch, 
 YARN-3021.006.patch, YARN-3021.patch


 Consider this scenario of 3 realms: A, B and COMMON, where A trusts COMMON, 
 and B trusts COMMON (one way trusts both), and both A and B run HDFS + YARN 
 clusters.
 Now if one logs in with a COMMON credential, and runs a job on A's YARN that 
 needs to access B's HDFS (such as a DistCp), the operation fails in the RM, 
 as it attempts a renewDelegationToken(…) synchronously during application 
 submission (to validate the managed token before it adds it to a scheduler 
 for automatic renewal). The call obviously fails cause B realm will not trust 
 A's credentials (here, the RM's principal is the renewer).
 In the 1.x JobTracker the same call is present, but it is done asynchronously 
 and once the renewal attempt failed we simply ceased to schedule any further 
 attempts of renewals, rather than fail the job immediately.
 We should change the logic such that we attempt the renewal but go easy on 
 the failure and skip the scheduling alone, rather than bubble back an error 
 to the client, failing the app submission. This way the old behaviour is 
 retained.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3021) YARN's delegation-token handling disallows certain trust setups to operate properly over DistCp

2015-03-20 Thread Yongjun Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3021?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14370811#comment-14370811
 ] 

Yongjun Zhang commented on YARN-3021:
-

HI Jian,

Thanks a lot for your detailed review and comments! I'm attaching rev5 to 
address all of them.

* Replaced {{new Text(HDFS_DELEGATION_TOKEN)}} with predefined constant
* About does conf.getStrings strip off the leading or ending empty strings? if 
not, we may strip those off., I followed {{JobSubmitter#populateTokenCache}}. 
I think it makes sense for user to not to put leading and ending empty strings.
* Removed NON_RENEWER. But still use empty renewer string instead of null. 
* I did test rev 4 earlier, and I also tested rev5 with real clusters.

Thanks for taking look at the new rev.



 YARN's delegation-token handling disallows certain trust setups to operate 
 properly over DistCp
 ---

 Key: YARN-3021
 URL: https://issues.apache.org/jira/browse/YARN-3021
 Project: Hadoop YARN
  Issue Type: Bug
  Components: security
Affects Versions: 2.3.0
Reporter: Harsh J
Assignee: Yongjun Zhang
 Attachments: YARN-3021.001.patch, YARN-3021.002.patch, 
 YARN-3021.003.patch, YARN-3021.004.patch, YARN-3021.patch


 Consider this scenario of 3 realms: A, B and COMMON, where A trusts COMMON, 
 and B trusts COMMON (one way trusts both), and both A and B run HDFS + YARN 
 clusters.
 Now if one logs in with a COMMON credential, and runs a job on A's YARN that 
 needs to access B's HDFS (such as a DistCp), the operation fails in the RM, 
 as it attempts a renewDelegationToken(…) synchronously during application 
 submission (to validate the managed token before it adds it to a scheduler 
 for automatic renewal). The call obviously fails cause B realm will not trust 
 A's credentials (here, the RM's principal is the renewer).
 In the 1.x JobTracker the same call is present, but it is done asynchronously 
 and once the renewal attempt failed we simply ceased to schedule any further 
 attempts of renewals, rather than fail the job immediately.
 We should change the logic such that we attempt the renewal but go easy on 
 the failure and skip the scheduling alone, rather than bubble back an error 
 to the client, failing the app submission. This way the old behaviour is 
 retained.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3021) YARN's delegation-token handling disallows certain trust setups to operate properly over DistCp

2015-03-20 Thread Yongjun Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3021?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yongjun Zhang updated YARN-3021:

Attachment: YARN-3021.005.patch

 YARN's delegation-token handling disallows certain trust setups to operate 
 properly over DistCp
 ---

 Key: YARN-3021
 URL: https://issues.apache.org/jira/browse/YARN-3021
 Project: Hadoop YARN
  Issue Type: Bug
  Components: security
Affects Versions: 2.3.0
Reporter: Harsh J
Assignee: Yongjun Zhang
 Attachments: YARN-3021.001.patch, YARN-3021.002.patch, 
 YARN-3021.003.patch, YARN-3021.004.patch, YARN-3021.005.patch, YARN-3021.patch


 Consider this scenario of 3 realms: A, B and COMMON, where A trusts COMMON, 
 and B trusts COMMON (one way trusts both), and both A and B run HDFS + YARN 
 clusters.
 Now if one logs in with a COMMON credential, and runs a job on A's YARN that 
 needs to access B's HDFS (such as a DistCp), the operation fails in the RM, 
 as it attempts a renewDelegationToken(…) synchronously during application 
 submission (to validate the managed token before it adds it to a scheduler 
 for automatic renewal). The call obviously fails cause B realm will not trust 
 A's credentials (here, the RM's principal is the renewer).
 In the 1.x JobTracker the same call is present, but it is done asynchronously 
 and once the renewal attempt failed we simply ceased to schedule any further 
 attempts of renewals, rather than fail the job immediately.
 We should change the logic such that we attempt the renewal but go easy on 
 the failure and skip the scheduling alone, rather than bubble back an error 
 to the client, failing the app submission. This way the old behaviour is 
 retained.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3021) YARN's delegation-token handling disallows certain trust setups to operate properly over DistCp

2015-03-19 Thread Yongjun Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3021?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14369886#comment-14369886
 ] 

Yongjun Zhang commented on YARN-3021:
-

Running the failed test TestRM locally is successful.


 YARN's delegation-token handling disallows certain trust setups to operate 
 properly over DistCp
 ---

 Key: YARN-3021
 URL: https://issues.apache.org/jira/browse/YARN-3021
 Project: Hadoop YARN
  Issue Type: Bug
  Components: security
Affects Versions: 2.3.0
Reporter: Harsh J
Assignee: Yongjun Zhang
 Attachments: YARN-3021.001.patch, YARN-3021.002.patch, 
 YARN-3021.003.patch, YARN-3021.004.patch, YARN-3021.patch


 Consider this scenario of 3 realms: A, B and COMMON, where A trusts COMMON, 
 and B trusts COMMON (one way trusts both), and both A and B run HDFS + YARN 
 clusters.
 Now if one logs in with a COMMON credential, and runs a job on A's YARN that 
 needs to access B's HDFS (such as a DistCp), the operation fails in the RM, 
 as it attempts a renewDelegationToken(…) synchronously during application 
 submission (to validate the managed token before it adds it to a scheduler 
 for automatic renewal). The call obviously fails cause B realm will not trust 
 A's credentials (here, the RM's principal is the renewer).
 In the 1.x JobTracker the same call is present, but it is done asynchronously 
 and once the renewal attempt failed we simply ceased to schedule any further 
 attempts of renewals, rather than fail the job immediately.
 We should change the logic such that we attempt the renewal but go easy on 
 the failure and skip the scheduling alone, rather than bubble back an error 
 to the client, failing the app submission. This way the old behaviour is 
 retained.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3021) YARN's delegation-token handling disallows certain trust setups to operate properly over DistCp

2015-03-19 Thread Yongjun Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3021?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yongjun Zhang updated YARN-3021:

Attachment: YARN-3021.004.patch

 YARN's delegation-token handling disallows certain trust setups to operate 
 properly over DistCp
 ---

 Key: YARN-3021
 URL: https://issues.apache.org/jira/browse/YARN-3021
 Project: Hadoop YARN
  Issue Type: Bug
  Components: security
Affects Versions: 2.3.0
Reporter: Harsh J
Assignee: Yongjun Zhang
 Attachments: YARN-3021.001.patch, YARN-3021.002.patch, 
 YARN-3021.003.patch, YARN-3021.004.patch, YARN-3021.patch


 Consider this scenario of 3 realms: A, B and COMMON, where A trusts COMMON, 
 and B trusts COMMON (one way trusts both), and both A and B run HDFS + YARN 
 clusters.
 Now if one logs in with a COMMON credential, and runs a job on A's YARN that 
 needs to access B's HDFS (such as a DistCp), the operation fails in the RM, 
 as it attempts a renewDelegationToken(…) synchronously during application 
 submission (to validate the managed token before it adds it to a scheduler 
 for automatic renewal). The call obviously fails cause B realm will not trust 
 A's credentials (here, the RM's principal is the renewer).
 In the 1.x JobTracker the same call is present, but it is done asynchronously 
 and once the renewal attempt failed we simply ceased to schedule any further 
 attempts of renewals, rather than fail the job immediately.
 We should change the logic such that we attempt the renewal but go easy on 
 the failure and skip the scheduling alone, rather than bubble back an error 
 to the client, failing the app submission. This way the old behaviour is 
 retained.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3021) YARN's delegation-token handling disallows certain trust setups to operate properly over DistCp

2015-03-19 Thread Yongjun Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3021?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14369646#comment-14369646
 ] 

Yongjun Zhang commented on YARN-3021:
-

Hi [~jianhe], thanks for your help yesterday. 

Jian and all, I uploaded patch rev 004, would you please help taking a look? 
thanks.



 YARN's delegation-token handling disallows certain trust setups to operate 
 properly over DistCp
 ---

 Key: YARN-3021
 URL: https://issues.apache.org/jira/browse/YARN-3021
 Project: Hadoop YARN
  Issue Type: Bug
  Components: security
Affects Versions: 2.3.0
Reporter: Harsh J
Assignee: Yongjun Zhang
 Attachments: YARN-3021.001.patch, YARN-3021.002.patch, 
 YARN-3021.003.patch, YARN-3021.004.patch, YARN-3021.patch


 Consider this scenario of 3 realms: A, B and COMMON, where A trusts COMMON, 
 and B trusts COMMON (one way trusts both), and both A and B run HDFS + YARN 
 clusters.
 Now if one logs in with a COMMON credential, and runs a job on A's YARN that 
 needs to access B's HDFS (such as a DistCp), the operation fails in the RM, 
 as it attempts a renewDelegationToken(…) synchronously during application 
 submission (to validate the managed token before it adds it to a scheduler 
 for automatic renewal). The call obviously fails cause B realm will not trust 
 A's credentials (here, the RM's principal is the renewer).
 In the 1.x JobTracker the same call is present, but it is done asynchronously 
 and once the renewal attempt failed we simply ceased to schedule any further 
 attempts of renewals, rather than fail the job immediately.
 We should change the logic such that we attempt the renewal but go easy on 
 the failure and skip the scheduling alone, rather than bubble back an error 
 to the client, failing the app submission. This way the old behaviour is 
 retained.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3021) YARN's delegation-token handling disallows certain trust setups to operate properly over DistCp

2015-03-19 Thread Yongjun Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3021?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14369650#comment-14369650
 ] 

Yongjun Zhang commented on YARN-3021:
-

BTW, there is some weird problem with setting renewer string to null, I chose 
to set it to empty string instead.


 YARN's delegation-token handling disallows certain trust setups to operate 
 properly over DistCp
 ---

 Key: YARN-3021
 URL: https://issues.apache.org/jira/browse/YARN-3021
 Project: Hadoop YARN
  Issue Type: Bug
  Components: security
Affects Versions: 2.3.0
Reporter: Harsh J
Assignee: Yongjun Zhang
 Attachments: YARN-3021.001.patch, YARN-3021.002.patch, 
 YARN-3021.003.patch, YARN-3021.004.patch, YARN-3021.patch


 Consider this scenario of 3 realms: A, B and COMMON, where A trusts COMMON, 
 and B trusts COMMON (one way trusts both), and both A and B run HDFS + YARN 
 clusters.
 Now if one logs in with a COMMON credential, and runs a job on A's YARN that 
 needs to access B's HDFS (such as a DistCp), the operation fails in the RM, 
 as it attempts a renewDelegationToken(…) synchronously during application 
 submission (to validate the managed token before it adds it to a scheduler 
 for automatic renewal). The call obviously fails cause B realm will not trust 
 A's credentials (here, the RM's principal is the renewer).
 In the 1.x JobTracker the same call is present, but it is done asynchronously 
 and once the renewal attempt failed we simply ceased to schedule any further 
 attempts of renewals, rather than fail the job immediately.
 We should change the logic such that we attempt the renewal but go easy on 
 the failure and skip the scheduling alone, rather than bubble back an error 
 to the client, failing the app submission. This way the old behaviour is 
 retained.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3021) YARN's delegation-token handling disallows certain trust setups to operate properly over DistCp

2015-03-18 Thread Yongjun Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3021?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14367536#comment-14367536
 ] 

Yongjun Zhang commented on YARN-3021:
-

HI [~jianhe] and all,

I resumed working on this and found an obstacle here. 

See org.apache.hadoop.security.token.Token:

{code}
 private synchronized TokenRenewer getRenewer() throws IOException {
if (renewer != null) {
  return renewer;
}
renewer = TRIVIAL_RENEWER;
synchronized (renewers) {
  for (TokenRenewer canidate : renewers) {
if (canidate.handleKind(this.kind)) {
  renewer = canidate;
  return renewer;
}
  }
}
LOG.warn(No TokenRenewer defined for token kind  + this.kind);
return renewer;
  }

 public boolean isManaged() throws IOException {
return getRenewer().isManaged(this);
  }

  public long renew(Configuration conf
) throws IOException, InterruptedException {
return getRenewer().renew(this, conf);
  }
  
  public void cancel(Configuration conf
 ) throws IOException, InterruptedException {
getRenewer().cancel(this, conf);
  }

{code}

We can see that {{getRenewer()}} does more work than simply return the renewer. 
And non-null renewer is guaranteed to be returned currently. The other methods 
(listed above, called at server side) count on this behavior.

If we set the renewer to null at client side and expect the server to pick it 
up, we need to do either

1. change the behaviour of {{getRenewer()} to return whatever renewer set by 
client. 
2. or we change the token's {{kind}} to make {{getRenewer}} to return null, 
which will be really hacky.

Making this kind of change seems to be more wide impact than expected, and 
things likely will broken by this change.

Any thoughts?

Thanks a lot.


 YARN's delegation-token handling disallows certain trust setups to operate 
 properly over DistCp
 ---

 Key: YARN-3021
 URL: https://issues.apache.org/jira/browse/YARN-3021
 Project: Hadoop YARN
  Issue Type: Bug
  Components: security
Affects Versions: 2.3.0
Reporter: Harsh J
 Attachments: YARN-3021.001.patch, YARN-3021.002.patch, 
 YARN-3021.003.patch, YARN-3021.patch


 Consider this scenario of 3 realms: A, B and COMMON, where A trusts COMMON, 
 and B trusts COMMON (one way trusts both), and both A and B run HDFS + YARN 
 clusters.
 Now if one logs in with a COMMON credential, and runs a job on A's YARN that 
 needs to access B's HDFS (such as a DistCp), the operation fails in the RM, 
 as it attempts a renewDelegationToken(…) synchronously during application 
 submission (to validate the managed token before it adds it to a scheduler 
 for automatic renewal). The call obviously fails cause B realm will not trust 
 A's credentials (here, the RM's principal is the renewer).
 In the 1.x JobTracker the same call is present, but it is done asynchronously 
 and once the renewal attempt failed we simply ceased to schedule any further 
 attempts of renewals, rather than fail the job immediately.
 We should change the logic such that we attempt the renewal but go easy on 
 the failure and skip the scheduling alone, rather than bubble back an error 
 to the client, failing the app submission. This way the old behaviour is 
 retained.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3021) YARN's delegation-token handling disallows certain trust setups to operate properly over DistCp

2015-03-18 Thread Yongjun Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3021?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14367692#comment-14367692
 ] 

Yongjun Zhang commented on YARN-3021:
-

Hi Jian, looking closer at what you suggested, I think I was wrong about 
setting the TokenRenewer object in the token to null, instead, we want to set 
the renewer string to null. :-) thanks.


 YARN's delegation-token handling disallows certain trust setups to operate 
 properly over DistCp
 ---

 Key: YARN-3021
 URL: https://issues.apache.org/jira/browse/YARN-3021
 Project: Hadoop YARN
  Issue Type: Bug
  Components: security
Affects Versions: 2.3.0
Reporter: Harsh J
Assignee: Yongjun Zhang
 Attachments: YARN-3021.001.patch, YARN-3021.002.patch, 
 YARN-3021.003.patch, YARN-3021.patch


 Consider this scenario of 3 realms: A, B and COMMON, where A trusts COMMON, 
 and B trusts COMMON (one way trusts both), and both A and B run HDFS + YARN 
 clusters.
 Now if one logs in with a COMMON credential, and runs a job on A's YARN that 
 needs to access B's HDFS (such as a DistCp), the operation fails in the RM, 
 as it attempts a renewDelegationToken(…) synchronously during application 
 submission (to validate the managed token before it adds it to a scheduler 
 for automatic renewal). The call obviously fails cause B realm will not trust 
 A's credentials (here, the RM's principal is the renewer).
 In the 1.x JobTracker the same call is present, but it is done asynchronously 
 and once the renewal attempt failed we simply ceased to schedule any further 
 attempts of renewals, rather than fail the job immediately.
 We should change the logic such that we attempt the renewal but go easy on 
 the failure and skip the scheduling alone, rather than bubble back an error 
 to the client, failing the app submission. This way the old behaviour is 
 retained.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3021) YARN's delegation-token handling disallows certain trust setups to operate properly over DistCp

2015-03-18 Thread Yongjun Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3021?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14367669#comment-14367669
 ] 

Yongjun Zhang commented on YARN-3021:
-

Hi [~jianhe],

Thanks for your comment. I'm actually aligned with what you suggested. 

The problem I was trying to point out is, we will have to change the behavior 
of the code I pasted above to deal with null renewer. E.g., the 
{{getRenewer()}} method will return a non-null based on current implementation 
(if not set or found, TRIVIAL_RENEWER will be returned); after making the 
suggested change for this jira,  the renewer can be null, so we should return 
null from {{getRenewer()}}.

My question was, I'm not sure about the impact of this behavior change. I 
expect some application does count on the current behavior.

More comments?

Thanks.



 YARN's delegation-token handling disallows certain trust setups to operate 
 properly over DistCp
 ---

 Key: YARN-3021
 URL: https://issues.apache.org/jira/browse/YARN-3021
 Project: Hadoop YARN
  Issue Type: Bug
  Components: security
Affects Versions: 2.3.0
Reporter: Harsh J
Assignee: Yongjun Zhang
 Attachments: YARN-3021.001.patch, YARN-3021.002.patch, 
 YARN-3021.003.patch, YARN-3021.patch


 Consider this scenario of 3 realms: A, B and COMMON, where A trusts COMMON, 
 and B trusts COMMON (one way trusts both), and both A and B run HDFS + YARN 
 clusters.
 Now if one logs in with a COMMON credential, and runs a job on A's YARN that 
 needs to access B's HDFS (such as a DistCp), the operation fails in the RM, 
 as it attempts a renewDelegationToken(…) synchronously during application 
 submission (to validate the managed token before it adds it to a scheduler 
 for automatic renewal). The call obviously fails cause B realm will not trust 
 A's credentials (here, the RM's principal is the renewer).
 In the 1.x JobTracker the same call is present, but it is done asynchronously 
 and once the renewal attempt failed we simply ceased to schedule any further 
 attempts of renewals, rather than fail the job immediately.
 We should change the logic such that we attempt the renewal but go easy on 
 the failure and skip the scheduling alone, rather than bubble back an error 
 to the client, failing the app submission. This way the old behaviour is 
 retained.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3021) YARN's delegation-token handling disallows certain trust setups to operate properly over DistCp

2015-03-18 Thread Yongjun Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3021?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14367539#comment-14367539
 ] 

Yongjun Zhang commented on YARN-3021:
-

Possibly introduce a dummy renewer class and make its methods no op, instead of 
setting renewer to null?

I wonder whether this would be compatible change ...


 YARN's delegation-token handling disallows certain trust setups to operate 
 properly over DistCp
 ---

 Key: YARN-3021
 URL: https://issues.apache.org/jira/browse/YARN-3021
 Project: Hadoop YARN
  Issue Type: Bug
  Components: security
Affects Versions: 2.3.0
Reporter: Harsh J
 Attachments: YARN-3021.001.patch, YARN-3021.002.patch, 
 YARN-3021.003.patch, YARN-3021.patch


 Consider this scenario of 3 realms: A, B and COMMON, where A trusts COMMON, 
 and B trusts COMMON (one way trusts both), and both A and B run HDFS + YARN 
 clusters.
 Now if one logs in with a COMMON credential, and runs a job on A's YARN that 
 needs to access B's HDFS (such as a DistCp), the operation fails in the RM, 
 as it attempts a renewDelegationToken(…) synchronously during application 
 submission (to validate the managed token before it adds it to a scheduler 
 for automatic renewal). The call obviously fails cause B realm will not trust 
 A's credentials (here, the RM's principal is the renewer).
 In the 1.x JobTracker the same call is present, but it is done asynchronously 
 and once the renewal attempt failed we simply ceased to schedule any further 
 attempts of renewals, rather than fail the job immediately.
 We should change the logic such that we attempt the renewal but go easy on 
 the failure and skip the scheduling alone, rather than bubble back an error 
 to the client, failing the app submission. This way the old behaviour is 
 retained.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3021) YARN's delegation-token handling disallows certain trust setups to operate properly over DistCp

2015-03-11 Thread Yongjun Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3021?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14357569#comment-14357569
 ] 

Yongjun Zhang commented on YARN-3021:
-

Hi [~vinodkv],

Thanks for the comments. We do have the consensus about the approach too, I 
have been caught on other critical stuff. Will try to get to this asap. Thanks.




 YARN's delegation-token handling disallows certain trust setups to operate 
 properly over DistCp
 ---

 Key: YARN-3021
 URL: https://issues.apache.org/jira/browse/YARN-3021
 Project: Hadoop YARN
  Issue Type: Bug
  Components: security
Affects Versions: 2.3.0
Reporter: Harsh J
 Attachments: YARN-3021.001.patch, YARN-3021.002.patch, 
 YARN-3021.003.patch, YARN-3021.patch


 Consider this scenario of 3 realms: A, B and COMMON, where A trusts COMMON, 
 and B trusts COMMON (one way trusts both), and both A and B run HDFS + YARN 
 clusters.
 Now if one logs in with a COMMON credential, and runs a job on A's YARN that 
 needs to access B's HDFS (such as a DistCp), the operation fails in the RM, 
 as it attempts a renewDelegationToken(…) synchronously during application 
 submission (to validate the managed token before it adds it to a scheduler 
 for automatic renewal). The call obviously fails cause B realm will not trust 
 A's credentials (here, the RM's principal is the renewer).
 In the 1.x JobTracker the same call is present, but it is done asynchronously 
 and once the renewal attempt failed we simply ceased to schedule any further 
 attempts of renewals, rather than fail the job immediately.
 We should change the logic such that we attempt the renewal but go easy on 
 the failure and skip the scheduling alone, rather than bubble back an error 
 to the client, failing the app submission. This way the old behaviour is 
 retained.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3021) YARN's delegation-token handling disallows certain trust setups to operate properly over DistCp

2015-03-06 Thread Yongjun Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3021?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14350672#comment-14350672
 ] 

Yongjun Zhang commented on YARN-3021:
-

Gotcha, thanks Jian.


 YARN's delegation-token handling disallows certain trust setups to operate 
 properly over DistCp
 ---

 Key: YARN-3021
 URL: https://issues.apache.org/jira/browse/YARN-3021
 Project: Hadoop YARN
  Issue Type: Bug
  Components: security
Affects Versions: 2.3.0
Reporter: Harsh J
 Attachments: YARN-3021.001.patch, YARN-3021.002.patch, 
 YARN-3021.003.patch, YARN-3021.patch


 Consider this scenario of 3 realms: A, B and COMMON, where A trusts COMMON, 
 and B trusts COMMON (one way trusts both), and both A and B run HDFS + YARN 
 clusters.
 Now if one logs in with a COMMON credential, and runs a job on A's YARN that 
 needs to access B's HDFS (such as a DistCp), the operation fails in the RM, 
 as it attempts a renewDelegationToken(…) synchronously during application 
 submission (to validate the managed token before it adds it to a scheduler 
 for automatic renewal). The call obviously fails cause B realm will not trust 
 A's credentials (here, the RM's principal is the renewer).
 In the 1.x JobTracker the same call is present, but it is done asynchronously 
 and once the renewal attempt failed we simply ceased to schedule any further 
 attempts of renewals, rather than fail the job immediately.
 We should change the logic such that we attempt the renewal but go easy on 
 the failure and skip the scheduling alone, rather than bubble back an error 
 to the client, failing the app submission. This way the old behaviour is 
 retained.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3021) YARN's delegation-token handling disallows certain trust setups to operate properly over DistCp

2015-03-05 Thread Yongjun Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3021?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14349494#comment-14349494
 ] 

Yongjun Zhang commented on YARN-3021:
-

Thanks a lot Jian,

Good suggestion of #1, and I agree that in reality I don't foresee big 
breakage.  I further discussed [~adhoot], and he agrees with this approach too.

Hi [~vinodkv] and [~qwertymaniac], comments on this approach that Jian 
described above?

If there is no objection, I will try to work out a revised patch asap.

Thanks.





 YARN's delegation-token handling disallows certain trust setups to operate 
 properly over DistCp
 ---

 Key: YARN-3021
 URL: https://issues.apache.org/jira/browse/YARN-3021
 Project: Hadoop YARN
  Issue Type: Bug
  Components: security
Affects Versions: 2.3.0
Reporter: Harsh J
 Attachments: YARN-3021.001.patch, YARN-3021.002.patch, 
 YARN-3021.003.patch, YARN-3021.patch


 Consider this scenario of 3 realms: A, B and COMMON, where A trusts COMMON, 
 and B trusts COMMON (one way trusts both), and both A and B run HDFS + YARN 
 clusters.
 Now if one logs in with a COMMON credential, and runs a job on A's YARN that 
 needs to access B's HDFS (such as a DistCp), the operation fails in the RM, 
 as it attempts a renewDelegationToken(…) synchronously during application 
 submission (to validate the managed token before it adds it to a scheduler 
 for automatic renewal). The call obviously fails cause B realm will not trust 
 A's credentials (here, the RM's principal is the renewer).
 In the 1.x JobTracker the same call is present, but it is done asynchronously 
 and once the renewal attempt failed we simply ceased to schedule any further 
 attempts of renewals, rather than fail the job immediately.
 We should change the logic such that we attempt the renewal but go easy on 
 the failure and skip the scheduling alone, rather than bubble back an error 
 to the client, failing the app submission. This way the old behaviour is 
 retained.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3021) YARN's delegation-token handling disallows certain trust setups to operate properly over DistCp

2015-03-05 Thread Yongjun Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3021?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14350013#comment-14350013
 ] 

Yongjun Zhang commented on YARN-3021:
-

Hi [~jianhe],

{quote}
My only concern is not to add a user-facing API only used for short-term.
{quote}

I thought about it a bit more, it seems that  
mapreduce.job.hdfs-servers.token-renewal.exclude is still going to be a 
user-facing API only used for short-term, because when we introduce external 
renewer, the tokens need to be assigned to the renewer, after all, we want the 
tokens to be renewed. Right? Or there are use cases that we really don't want 
to renew?

That said, I think the solution would solve our current problem.

Thanks.
 

 YARN's delegation-token handling disallows certain trust setups to operate 
 properly over DistCp
 ---

 Key: YARN-3021
 URL: https://issues.apache.org/jira/browse/YARN-3021
 Project: Hadoop YARN
  Issue Type: Bug
  Components: security
Affects Versions: 2.3.0
Reporter: Harsh J
 Attachments: YARN-3021.001.patch, YARN-3021.002.patch, 
 YARN-3021.003.patch, YARN-3021.patch


 Consider this scenario of 3 realms: A, B and COMMON, where A trusts COMMON, 
 and B trusts COMMON (one way trusts both), and both A and B run HDFS + YARN 
 clusters.
 Now if one logs in with a COMMON credential, and runs a job on A's YARN that 
 needs to access B's HDFS (such as a DistCp), the operation fails in the RM, 
 as it attempts a renewDelegationToken(…) synchronously during application 
 submission (to validate the managed token before it adds it to a scheduler 
 for automatic renewal). The call obviously fails cause B realm will not trust 
 A's credentials (here, the RM's principal is the renewer).
 In the 1.x JobTracker the same call is present, but it is done asynchronously 
 and once the renewal attempt failed we simply ceased to schedule any further 
 attempts of renewals, rather than fail the job immediately.
 We should change the logic such that we attempt the renewal but go easy on 
 the failure and skip the scheduling alone, rather than bubble back an error 
 to the client, failing the app submission. This way the old behaviour is 
 retained.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3021) YARN's delegation-token handling disallows certain trust setups to operate properly over DistCp

2015-03-04 Thread Yongjun Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3021?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14347026#comment-14347026
 ] 

Yongjun Zhang commented on YARN-3021:
-

Many thanks Jian.

{quote}
Change MR client to set null renewer for the token coming from a different 
cluster
{quote}
In the special case that we are dealing with in this jira, cluster A and 
cluster B don't trust each other. However, in other scenarios, two clusters may 
trust each other. So we can't always set null renewer based on which cluster 
the token is from. 
Maybe we can combine our approaches, set null renewer for external cluster only 
when 
{{-Dmapreduce.job.delegation.tokenrenewer.for.external.cluster=null}} is 
specified for a job?

{quote}
Actually, YARN can also provide a constant string say SKIP_RENEW_TOKEN, MR 
uses this string as the renewer for tokens it doesn't want to renew. RM detects 
if the renewer equals the constant string and skip renew if it is.
{quote}
Maybe we can use string null for SKIP_RENEW_TOKEN? we need to document 
whatever string here as a special string so application don't use it for tokens 
that need to be renewed.

There is still chance of changing existing applications behavior for those who 
happen to set the renewer to our special string. So what about we still 
introduce {{yarn.resourcemanager.validate.tokenrenewer}} described in my last 
comment (enable renewer validation only when the config is true)?

Thanks.



 YARN's delegation-token handling disallows certain trust setups to operate 
 properly over DistCp
 ---

 Key: YARN-3021
 URL: https://issues.apache.org/jira/browse/YARN-3021
 Project: Hadoop YARN
  Issue Type: Bug
  Components: security
Affects Versions: 2.3.0
Reporter: Harsh J
 Attachments: YARN-3021.001.patch, YARN-3021.002.patch, 
 YARN-3021.003.patch, YARN-3021.patch


 Consider this scenario of 3 realms: A, B and COMMON, where A trusts COMMON, 
 and B trusts COMMON (one way trusts both), and both A and B run HDFS + YARN 
 clusters.
 Now if one logs in with a COMMON credential, and runs a job on A's YARN that 
 needs to access B's HDFS (such as a DistCp), the operation fails in the RM, 
 as it attempts a renewDelegationToken(…) synchronously during application 
 submission (to validate the managed token before it adds it to a scheduler 
 for automatic renewal). The call obviously fails cause B realm will not trust 
 A's credentials (here, the RM's principal is the renewer).
 In the 1.x JobTracker the same call is present, but it is done asynchronously 
 and once the renewal attempt failed we simply ceased to schedule any further 
 attempts of renewals, rather than fail the job immediately.
 We should change the logic such that we attempt the renewal but go easy on 
 the failure and skip the scheduling alone, rather than bubble back an error 
 to the client, failing the app submission. This way the old behaviour is 
 retained.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3021) YARN's delegation-token handling disallows certain trust setups to operate properly over DistCp

2015-03-03 Thread Yongjun Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3021?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14346182#comment-14346182
 ] 

Yongjun Zhang commented on YARN-3021:
-

Hi [~jianhe],

Thanks a lot for your comments.

I discussed with [~adhoot], below is what we thought:

1.
{quote}
we may choose to add a server-side config to not let application fail if 
renewal fails
{quote}
The change is to let RM ignore token renewal failure, this means all 
applications will be impacted by this change since it's a server side config.  
What Harsh did in the initial patch is to ignore renewal failure in a hardcoded 
way.  What my earlier patch does is to skip renewing instead of ignore token 
renewal failure. 

We think skipping renewal seems better than ignoring renewal failure, because 
we are also talking about adding external renewer in the future, and these two 
changes will be compatible. Say, there might be renewal failure with external 
renewer, which we don't want to ignore.

2. API change in the current patch. It's an optional parameter, so it's 
compatible change. Consider it's also matching our future extension of 
introducing external renewer, it seems ok to have the API change.

Comments?

Many thanks.







 YARN's delegation-token handling disallows certain trust setups to operate 
 properly over DistCp
 ---

 Key: YARN-3021
 URL: https://issues.apache.org/jira/browse/YARN-3021
 Project: Hadoop YARN
  Issue Type: Bug
  Components: security
Affects Versions: 2.3.0
Reporter: Harsh J
 Attachments: YARN-3021.001.patch, YARN-3021.002.patch, 
 YARN-3021.003.patch, YARN-3021.patch


 Consider this scenario of 3 realms: A, B and COMMON, where A trusts COMMON, 
 and B trusts COMMON (one way trusts both), and both A and B run HDFS + YARN 
 clusters.
 Now if one logs in with a COMMON credential, and runs a job on A's YARN that 
 needs to access B's HDFS (such as a DistCp), the operation fails in the RM, 
 as it attempts a renewDelegationToken(…) synchronously during application 
 submission (to validate the managed token before it adds it to a scheduler 
 for automatic renewal). The call obviously fails cause B realm will not trust 
 A's credentials (here, the RM's principal is the renewer).
 In the 1.x JobTracker the same call is present, but it is done asynchronously 
 and once the renewal attempt failed we simply ceased to schedule any further 
 attempts of renewals, rather than fail the job immediately.
 We should change the logic such that we attempt the renewal but go easy on 
 the failure and skip the scheduling alone, rather than bubble back an error 
 to the client, failing the app submission. This way the old behaviour is 
 retained.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3021) YARN's delegation-token handling disallows certain trust setups to operate properly over DistCp

2015-03-03 Thread Yongjun Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3021?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14346347#comment-14346347
 ] 

Yongjun Zhang commented on YARN-3021:
-

Hi [~jianhe],

I agree with you about the longer term solution.  However, with the short term 
solution you suggested, #2 would let certain jobs continue to run even if they 
would have failed token renewal without this short term solution. 

Another alternative is,
#  have MR client specify the token renewer it needs to use (instead of your 
step 1), such as passing  -Dmapreduce.job.delegation.tokenrenewer=null
# client code will update the token renewer if specified
# RM implements the logic to only renew its own token, however, this is 
configurable by a new server-side config property, such as 
yarn.resourcemanager.validate.tokenrenewer. This config property  defaults to 
false to retain old behavior (means not validating the renewer). We may 
consider changing the default to true in the future.

This alternative avoids the new temporary API, but it also involves setting a 
server-side config property that impacts all clients. 
(The advantage of my earlier proposed solution has minimum impact, of course at 
the cost of introducing the temporary new API)

Thoughts?

Thanks.


 YARN's delegation-token handling disallows certain trust setups to operate 
 properly over DistCp
 ---

 Key: YARN-3021
 URL: https://issues.apache.org/jira/browse/YARN-3021
 Project: Hadoop YARN
  Issue Type: Bug
  Components: security
Affects Versions: 2.3.0
Reporter: Harsh J
 Attachments: YARN-3021.001.patch, YARN-3021.002.patch, 
 YARN-3021.003.patch, YARN-3021.patch


 Consider this scenario of 3 realms: A, B and COMMON, where A trusts COMMON, 
 and B trusts COMMON (one way trusts both), and both A and B run HDFS + YARN 
 clusters.
 Now if one logs in with a COMMON credential, and runs a job on A's YARN that 
 needs to access B's HDFS (such as a DistCp), the operation fails in the RM, 
 as it attempts a renewDelegationToken(…) synchronously during application 
 submission (to validate the managed token before it adds it to a scheduler 
 for automatic renewal). The call obviously fails cause B realm will not trust 
 A's credentials (here, the RM's principal is the renewer).
 In the 1.x JobTracker the same call is present, but it is done asynchronously 
 and once the renewal attempt failed we simply ceased to schedule any further 
 attempts of renewals, rather than fail the job immediately.
 We should change the logic such that we attempt the renewal but go easy on 
 the failure and skip the scheduling alone, rather than bubble back an error 
 to the client, failing the app submission. This way the old behaviour is 
 retained.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3200) Factor OSType out from Shell: changes in YARN

2015-02-15 Thread Yongjun Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3200?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14321812#comment-14321812
 ] 

Yongjun Zhang commented on YARN-3200:
-

This is a subtask that I will be working on, but it seems I can't assign the 
jira to myself. Anyone who has the privilege, appreciate your help to assign it 
to me. Thanks.



 Factor OSType out from Shell: changes in YARN
 -

 Key: YARN-3200
 URL: https://issues.apache.org/jira/browse/YARN-3200
 Project: Hadoop YARN
  Issue Type: Improvement
Reporter: Yongjun Zhang
 Fix For: 2.7.0






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3200) Factor OSType out from Shell: changes in YARN

2015-02-15 Thread Yongjun Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3200?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14322070#comment-14322070
 ] 

Yongjun Zhang commented on YARN-3200:
-

Thanks a lot [~brahmareddy] and [~ozawa]! I will submit patch after 
HADOOP-11597 is resolved because of the dependency.



 Factor OSType out from Shell: changes in YARN
 -

 Key: YARN-3200
 URL: https://issues.apache.org/jira/browse/YARN-3200
 Project: Hadoop YARN
  Issue Type: Improvement
Reporter: Yongjun Zhang
Assignee: Yongjun Zhang
 Fix For: 2.7.0






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-3200) Factor OSType out from Shell: changes in YARN

2015-02-15 Thread Yongjun Zhang (JIRA)
Yongjun Zhang created YARN-3200:
---

 Summary: Factor OSType out from Shell: changes in YARN
 Key: YARN-3200
 URL: https://issues.apache.org/jira/browse/YARN-3200
 Project: Hadoop YARN
  Issue Type: Improvement
Reporter: Yongjun Zhang
 Fix For: 2.7.0






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3021) YARN's delegation-token handling disallows certain trust setups to operate properly over DistCp

2015-02-12 Thread Yongjun Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3021?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14318760#comment-14318760
 ] 

Yongjun Zhang commented on YARN-3021:
-

HI [~vinodkv] and [~jianhe],

Would you please comment on [~qwertymaniac]'s comment above?

Thanks a lot.


 YARN's delegation-token handling disallows certain trust setups to operate 
 properly over DistCp
 ---

 Key: YARN-3021
 URL: https://issues.apache.org/jira/browse/YARN-3021
 Project: Hadoop YARN
  Issue Type: Bug
  Components: security
Affects Versions: 2.3.0
Reporter: Harsh J
 Attachments: YARN-3021.001.patch, YARN-3021.002.patch, 
 YARN-3021.003.patch, YARN-3021.patch


 Consider this scenario of 3 realms: A, B and COMMON, where A trusts COMMON, 
 and B trusts COMMON (one way trusts both), and both A and B run HDFS + YARN 
 clusters.
 Now if one logs in with a COMMON credential, and runs a job on A's YARN that 
 needs to access B's HDFS (such as a DistCp), the operation fails in the RM, 
 as it attempts a renewDelegationToken(…) synchronously during application 
 submission (to validate the managed token before it adds it to a scheduler 
 for automatic renewal). The call obviously fails cause B realm will not trust 
 A's credentials (here, the RM's principal is the renewer).
 In the 1.x JobTracker the same call is present, but it is done asynchronously 
 and once the renewal attempt failed we simply ceased to schedule any further 
 attempts of renewals, rather than fail the job immediately.
 We should change the logic such that we attempt the renewal but go easy on 
 the failure and skip the scheduling alone, rather than bubble back an error 
 to the client, failing the app submission. This way the old behaviour is 
 retained.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3021) YARN's delegation-token handling disallows certain trust setups to operate properly over DistCp

2015-02-12 Thread Yongjun Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3021?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14319024#comment-14319024
 ] 

Yongjun Zhang commented on YARN-3021:
-

Hi [~vinodkv],

Thanks a lot for your comment!

{quote}
The question is whether we continue supporting this implicit aux feature or 
drop it. And given my earlier point that RM cannot know either ways, this 
implicit feature was always broken. 
{quote}
Agree. What about we use the patch of this jira to disable/enable this implicit 
feature (as it currently does), and create a new jira to address the broken 
implicit feature when enabled?

Thanks.



 YARN's delegation-token handling disallows certain trust setups to operate 
 properly over DistCp
 ---

 Key: YARN-3021
 URL: https://issues.apache.org/jira/browse/YARN-3021
 Project: Hadoop YARN
  Issue Type: Bug
  Components: security
Affects Versions: 2.3.0
Reporter: Harsh J
 Attachments: YARN-3021.001.patch, YARN-3021.002.patch, 
 YARN-3021.003.patch, YARN-3021.patch


 Consider this scenario of 3 realms: A, B and COMMON, where A trusts COMMON, 
 and B trusts COMMON (one way trusts both), and both A and B run HDFS + YARN 
 clusters.
 Now if one logs in with a COMMON credential, and runs a job on A's YARN that 
 needs to access B's HDFS (such as a DistCp), the operation fails in the RM, 
 as it attempts a renewDelegationToken(…) synchronously during application 
 submission (to validate the managed token before it adds it to a scheduler 
 for automatic renewal). The call obviously fails cause B realm will not trust 
 A's credentials (here, the RM's principal is the renewer).
 In the 1.x JobTracker the same call is present, but it is done asynchronously 
 and once the renewal attempt failed we simply ceased to schedule any further 
 attempts of renewals, rather than fail the job immediately.
 We should change the logic such that we attempt the renewal but go easy on 
 the failure and skip the scheduling alone, rather than bubble back an error 
 to the client, failing the app submission. This way the old behaviour is 
 retained.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3021) YARN's delegation-token handling disallows certain trust setups to operate properly over DistCp

2015-02-09 Thread Yongjun Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3021?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14313600#comment-14313600
 ] 

Yongjun Zhang commented on YARN-3021:
-

Thanks [~qwertymaniac], I agree with your comment. Look forward to hearing 
other folks' viewpoints.


 YARN's delegation-token handling disallows certain trust setups to operate 
 properly over DistCp
 ---

 Key: YARN-3021
 URL: https://issues.apache.org/jira/browse/YARN-3021
 Project: Hadoop YARN
  Issue Type: Bug
  Components: security
Affects Versions: 2.3.0
Reporter: Harsh J
 Attachments: YARN-3021.001.patch, YARN-3021.002.patch, 
 YARN-3021.003.patch, YARN-3021.patch


 Consider this scenario of 3 realms: A, B and COMMON, where A trusts COMMON, 
 and B trusts COMMON (one way trusts both), and both A and B run HDFS + YARN 
 clusters.
 Now if one logs in with a COMMON credential, and runs a job on A's YARN that 
 needs to access B's HDFS (such as a DistCp), the operation fails in the RM, 
 as it attempts a renewDelegationToken(…) synchronously during application 
 submission (to validate the managed token before it adds it to a scheduler 
 for automatic renewal). The call obviously fails cause B realm will not trust 
 A's credentials (here, the RM's principal is the renewer).
 In the 1.x JobTracker the same call is present, but it is done asynchronously 
 and once the renewal attempt failed we simply ceased to schedule any further 
 attempts of renewals, rather than fail the job immediately.
 We should change the logic such that we attempt the renewal but go easy on 
 the failure and skip the scheduling alone, rather than bubble back an error 
 to the client, failing the app submission. This way the old behaviour is 
 retained.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3021) YARN's delegation-token handling disallows certain trust setups to operate properly over DistCp

2015-02-09 Thread Yongjun Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3021?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14312707#comment-14312707
 ] 

Yongjun Zhang commented on YARN-3021:
-

Hi [~vinodkv],

Thanks for your further look and confirm that the uploaded patch would work.

About your non-code comments, I assume you meant to  inspect the incoming 
renewer specified in the token and skip renewing those tokens if the renewer 
doesn't match it's own address. I have two thoughts:

* Seems regardless of this jira, we could do a renewer address match as a 
validation step. Right?
* In this case, actually looks like the renewer would be cluster A's yarn, 
based on {{TokenCache@obtainTokensForNamenodesInternal}} and 
{{Master.getMasterPrincipal}}. 

{code}
public static String getMasterUserName(Configuration conf) {
String framework = conf.get(MRConfig.FRAMEWORK_NAME, 
MRConfig.YARN_FRAMEWORK_NAME);
if (framework.equals(MRConfig.CLASSIC_FRAMEWORK_NAME)) {
  return conf.get(MRConfig.MASTER_USER_NAME);
} 
else {
  return conf.get(YarnConfiguration.RM_PRINCIPAL);
}
  }
{code}

So it looks like that even if we check, the renewer would match in this case. 
Please correct me if I'm wrong.

Thanks.





 YARN's delegation-token handling disallows certain trust setups to operate 
 properly over DistCp
 ---

 Key: YARN-3021
 URL: https://issues.apache.org/jira/browse/YARN-3021
 Project: Hadoop YARN
  Issue Type: Bug
  Components: security
Affects Versions: 2.3.0
Reporter: Harsh J
 Attachments: YARN-3021.001.patch, YARN-3021.002.patch, 
 YARN-3021.003.patch, YARN-3021.patch


 Consider this scenario of 3 realms: A, B and COMMON, where A trusts COMMON, 
 and B trusts COMMON (one way trusts both), and both A and B run HDFS + YARN 
 clusters.
 Now if one logs in with a COMMON credential, and runs a job on A's YARN that 
 needs to access B's HDFS (such as a DistCp), the operation fails in the RM, 
 as it attempts a renewDelegationToken(…) synchronously during application 
 submission (to validate the managed token before it adds it to a scheduler 
 for automatic renewal). The call obviously fails cause B realm will not trust 
 A's credentials (here, the RM's principal is the renewer).
 In the 1.x JobTracker the same call is present, but it is done asynchronously 
 and once the renewal attempt failed we simply ceased to schedule any further 
 attempts of renewals, rather than fail the job immediately.
 We should change the logic such that we attempt the renewal but go easy on 
 the failure and skip the scheduling alone, rather than bubble back an error 
 to the client, failing the app submission. This way the old behaviour is 
 retained.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3021) YARN's delegation-token handling disallows certain trust setups to operate properly

2015-02-06 Thread Yongjun Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3021?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14309703#comment-14309703
 ] 

Yongjun Zhang commented on YARN-3021:
-

Hi [~vinodkv] and [~jianhe],

Thank you so much for review and commenting!

I will try to respond to part of your comments here and keep looking into the 
rest.

{quote}
RM can simply inspect the incoming renewer specified in the token and skip 
renewing those tokens if the renewer doesn't match it's own address. This way, 
we don't need an explicit API in the submission context.
{quote}
Seems regardless of this jira, we could have do the above change, right? any 
catch?

{quote}
Apologies for going back and forth on this one.
{quote}
I appreciate the insight you provided, and we are trying to figure out the best 
solution together. All the points you provided are reasonable, so absolutely no 
need for apologies here.

{quote}
Irrespective of how we decide to skip tokens, the way the patch is skipping 
renewal will not work. In secure mode, DelegationTokenRenewer drives the app 
state machine. So if you skip adding the app itself to DTR, the app will be 
completely 
{quote}
I did test in a secure env and it worked. Would you please elaborate?

{quote}
I think in this case, the renewer specified in the token is the same as the RM. 
IIUC, the JobClient will request the token from B cluster, but still specify 
the renewer as the A cluster RM (via the A cluster local config), am I right?
{quote}
I think that's the case. The problem is that there is no trust between A and B. 
So common should be the one to renew the token.

Thanks.





 YARN's delegation-token handling disallows certain trust setups to operate 
 properly
 ---

 Key: YARN-3021
 URL: https://issues.apache.org/jira/browse/YARN-3021
 Project: Hadoop YARN
  Issue Type: Bug
  Components: security
Affects Versions: 2.3.0
Reporter: Harsh J
 Attachments: YARN-3021.001.patch, YARN-3021.002.patch, 
 YARN-3021.003.patch, YARN-3021.patch


 Consider this scenario of 3 realms: A, B and COMMON, where A trusts COMMON, 
 and B trusts COMMON (one way trusts both), and both A and B run HDFS + YARN 
 clusters.
 Now if one logs in with a COMMON credential, and runs a job on A's YARN that 
 needs to access B's HDFS (such as a DistCp), the operation fails in the RM, 
 as it attempts a renewDelegationToken(…) synchronously during application 
 submission (to validate the managed token before it adds it to a scheduler 
 for automatic renewal). The call obviously fails cause B realm will not trust 
 A's credentials (here, the RM's principal is the renewer).
 In the 1.x JobTracker the same call is present, but it is done asynchronously 
 and once the renewal attempt failed we simply ceased to schedule any further 
 attempts of renewals, rather than fail the job immediately.
 We should change the logic such that we attempt the renewal but go easy on 
 the failure and skip the scheduling alone, rather than bubble back an error 
 to the client, failing the app submission. This way the old behaviour is 
 retained.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3021) YARN's delegation-token handling disallows certain trust setups to operate properly

2015-02-05 Thread Yongjun Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3021?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yongjun Zhang updated YARN-3021:

Attachment: YARN-3021.003.patch

Uploaded rev 003 to address Zhihai's comments. Thanks Zhihai.


 YARN's delegation-token handling disallows certain trust setups to operate 
 properly
 ---

 Key: YARN-3021
 URL: https://issues.apache.org/jira/browse/YARN-3021
 Project: Hadoop YARN
  Issue Type: Bug
  Components: security
Affects Versions: 2.3.0
Reporter: Harsh J
 Attachments: YARN-3021.001.patch, YARN-3021.002.patch, 
 YARN-3021.003.patch, YARN-3021.patch


 Consider this scenario of 3 realms: A, B and COMMON, where A trusts COMMON, 
 and B trusts COMMON (one way trusts both), and both A and B run HDFS + YARN 
 clusters.
 Now if one logs in with a COMMON credential, and runs a job on A's YARN that 
 needs to access B's HDFS (such as a DistCp), the operation fails in the RM, 
 as it attempts a renewDelegationToken(…) synchronously during application 
 submission (to validate the managed token before it adds it to a scheduler 
 for automatic renewal). The call obviously fails cause B realm will not trust 
 A's credentials (here, the RM's principal is the renewer).
 In the 1.x JobTracker the same call is present, but it is done asynchronously 
 and once the renewal attempt failed we simply ceased to schedule any further 
 attempts of renewals, rather than fail the job immediately.
 We should change the logic such that we attempt the renewal but go easy on 
 the failure and skip the scheduling alone, rather than bubble back an error 
 to the client, failing the app submission. This way the old behaviour is 
 retained.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3021) YARN's delegation-token handling disallows certain trust setups to operate properly

2015-02-02 Thread Yongjun Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3021?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14302003#comment-14302003
 ] 

Yongjun Zhang commented on YARN-3021:
-

Thanks for the in-person discussion and comment [~zxu]. I will do the change in 
next rev.

Hi [~vinodkv], your help is greatly appreciated here, would you please take a 
look at the patch? thanks.


 YARN's delegation-token handling disallows certain trust setups to operate 
 properly
 ---

 Key: YARN-3021
 URL: https://issues.apache.org/jira/browse/YARN-3021
 Project: Hadoop YARN
  Issue Type: Bug
  Components: security
Affects Versions: 2.3.0
Reporter: Harsh J
 Attachments: YARN-3021.001.patch, YARN-3021.002.patch, YARN-3021.patch


 Consider this scenario of 3 realms: A, B and COMMON, where A trusts COMMON, 
 and B trusts COMMON (one way trusts both), and both A and B run HDFS + YARN 
 clusters.
 Now if one logs in with a COMMON credential, and runs a job on A's YARN that 
 needs to access B's HDFS (such as a DistCp), the operation fails in the RM, 
 as it attempts a renewDelegationToken(…) synchronously during application 
 submission (to validate the managed token before it adds it to a scheduler 
 for automatic renewal). The call obviously fails cause B realm will not trust 
 A's credentials (here, the RM's principal is the renewer).
 In the 1.x JobTracker the same call is present, but it is done asynchronously 
 and once the renewal attempt failed we simply ceased to schedule any further 
 attempts of renewals, rather than fail the job immediately.
 We should change the logic such that we attempt the renewal but go easy on 
 the failure and skip the scheduling alone, rather than bubble back an error 
 to the client, failing the app submission. This way the old behaviour is 
 retained.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3021) YARN's delegation-token handling disallows certain trust setups to operate properly

2015-02-01 Thread Yongjun Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3021?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14300258#comment-14300258
 ] 

Yongjun Zhang commented on YARN-3021:
-

I submitted the same patch 002 twice (only diff between them is a comment in 
xml file), the two tests failed at the 1st submission are the same as rev 001, 
which I described at
https://issues.apache.org/jira/browse/YARN-3021?focusedCommentId=14290493page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14290493

Rerunning the other failed tests of rev 002 locally is successful.

Thanks.





 YARN's delegation-token handling disallows certain trust setups to operate 
 properly
 ---

 Key: YARN-3021
 URL: https://issues.apache.org/jira/browse/YARN-3021
 Project: Hadoop YARN
  Issue Type: Bug
  Components: security
Affects Versions: 2.3.0
Reporter: Harsh J
 Attachments: YARN-3021.001.patch, YARN-3021.002.patch, YARN-3021.patch


 Consider this scenario of 3 realms: A, B and COMMON, where A trusts COMMON, 
 and B trusts COMMON (one way trusts both), and both A and B run HDFS + YARN 
 clusters.
 Now if one logs in with a COMMON credential, and runs a job on A's YARN that 
 needs to access B's HDFS (such as a DistCp), the operation fails in the RM, 
 as it attempts a renewDelegationToken(…) synchronously during application 
 submission (to validate the managed token before it adds it to a scheduler 
 for automatic renewal). The call obviously fails cause B realm will not trust 
 A's credentials (here, the RM's principal is the renewer).
 In the 1.x JobTracker the same call is present, but it is done asynchronously 
 and once the renewal attempt failed we simply ceased to schedule any further 
 attempts of renewals, rather than fail the job immediately.
 We should change the logic such that we attempt the renewal but go easy on 
 the failure and skip the scheduling alone, rather than bubble back an error 
 to the client, failing the app submission. This way the old behaviour is 
 retained.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3021) YARN's delegation-token handling disallows certain trust setups to operate properly

2015-01-31 Thread Yongjun Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3021?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14300093#comment-14300093
 ] 

Yongjun Zhang commented on YARN-3021:
-

Hi [~qwertymaniac],

Thanks a lot for your review and feedback! I just uploaded rev 002 to address 
them,

Right now my fix is to change {{RMAppManager#submitApplication}} (that calls 
{{DeletationTokenRenewer#addApplicationAsync}}) to check the config property 
setting. Because I saw that it's the method that gets called when distcp 
submits job. 

There is another place in {{RMAppImpl#transition}} that calls 
{{DeletationTokenRenewer#addApplicationSync}}, which I'm not familiar with.

I wonder if you could comment on that?

Thanks.




 YARN's delegation-token handling disallows certain trust setups to operate 
 properly
 ---

 Key: YARN-3021
 URL: https://issues.apache.org/jira/browse/YARN-3021
 Project: Hadoop YARN
  Issue Type: Bug
  Components: security
Affects Versions: 2.3.0
Reporter: Harsh J
 Attachments: YARN-3021.001.patch, YARN-3021.002.patch, YARN-3021.patch


 Consider this scenario of 3 realms: A, B and COMMON, where A trusts COMMON, 
 and B trusts COMMON (one way trusts both), and both A and B run HDFS + YARN 
 clusters.
 Now if one logs in with a COMMON credential, and runs a job on A's YARN that 
 needs to access B's HDFS (such as a DistCp), the operation fails in the RM, 
 as it attempts a renewDelegationToken(…) synchronously during application 
 submission (to validate the managed token before it adds it to a scheduler 
 for automatic renewal). The call obviously fails cause B realm will not trust 
 A's credentials (here, the RM's principal is the renewer).
 In the 1.x JobTracker the same call is present, but it is done asynchronously 
 and once the renewal attempt failed we simply ceased to schedule any further 
 attempts of renewals, rather than fail the job immediately.
 We should change the logic such that we attempt the renewal but go easy on 
 the failure and skip the scheduling alone, rather than bubble back an error 
 to the client, failing the app submission. This way the old behaviour is 
 retained.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3021) YARN's delegation-token handling disallows certain trust setups to operate properly

2015-01-31 Thread Yongjun Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3021?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yongjun Zhang updated YARN-3021:

Attachment: YARN-3021.002.patch

 YARN's delegation-token handling disallows certain trust setups to operate 
 properly
 ---

 Key: YARN-3021
 URL: https://issues.apache.org/jira/browse/YARN-3021
 Project: Hadoop YARN
  Issue Type: Bug
  Components: security
Affects Versions: 2.3.0
Reporter: Harsh J
 Attachments: YARN-3021.001.patch, YARN-3021.002.patch, YARN-3021.patch


 Consider this scenario of 3 realms: A, B and COMMON, where A trusts COMMON, 
 and B trusts COMMON (one way trusts both), and both A and B run HDFS + YARN 
 clusters.
 Now if one logs in with a COMMON credential, and runs a job on A's YARN that 
 needs to access B's HDFS (such as a DistCp), the operation fails in the RM, 
 as it attempts a renewDelegationToken(…) synchronously during application 
 submission (to validate the managed token before it adds it to a scheduler 
 for automatic renewal). The call obviously fails cause B realm will not trust 
 A's credentials (here, the RM's principal is the renewer).
 In the 1.x JobTracker the same call is present, but it is done asynchronously 
 and once the renewal attempt failed we simply ceased to schedule any further 
 attempts of renewals, rather than fail the job immediately.
 We should change the logic such that we attempt the renewal but go easy on 
 the failure and skip the scheduling alone, rather than bubble back an error 
 to the client, failing the app submission. This way the old behaviour is 
 retained.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3021) YARN's delegation-token handling disallows certain trust setups to operate properly

2015-01-31 Thread Yongjun Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3021?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yongjun Zhang updated YARN-3021:

Attachment: YARN-3021.002.patch

 YARN's delegation-token handling disallows certain trust setups to operate 
 properly
 ---

 Key: YARN-3021
 URL: https://issues.apache.org/jira/browse/YARN-3021
 Project: Hadoop YARN
  Issue Type: Bug
  Components: security
Affects Versions: 2.3.0
Reporter: Harsh J
 Attachments: YARN-3021.001.patch, YARN-3021.002.patch, YARN-3021.patch


 Consider this scenario of 3 realms: A, B and COMMON, where A trusts COMMON, 
 and B trusts COMMON (one way trusts both), and both A and B run HDFS + YARN 
 clusters.
 Now if one logs in with a COMMON credential, and runs a job on A's YARN that 
 needs to access B's HDFS (such as a DistCp), the operation fails in the RM, 
 as it attempts a renewDelegationToken(…) synchronously during application 
 submission (to validate the managed token before it adds it to a scheduler 
 for automatic renewal). The call obviously fails cause B realm will not trust 
 A's credentials (here, the RM's principal is the renewer).
 In the 1.x JobTracker the same call is present, but it is done asynchronously 
 and once the renewal attempt failed we simply ceased to schedule any further 
 attempts of renewals, rather than fail the job immediately.
 We should change the logic such that we attempt the renewal but go easy on 
 the failure and skip the scheduling alone, rather than bubble back an error 
 to the client, failing the app submission. This way the old behaviour is 
 retained.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3021) YARN's delegation-token handling disallows certain trust setups to operate properly

2015-01-31 Thread Yongjun Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3021?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yongjun Zhang updated YARN-3021:

Attachment: (was: YARN-3021.002.patch)

 YARN's delegation-token handling disallows certain trust setups to operate 
 properly
 ---

 Key: YARN-3021
 URL: https://issues.apache.org/jira/browse/YARN-3021
 Project: Hadoop YARN
  Issue Type: Bug
  Components: security
Affects Versions: 2.3.0
Reporter: Harsh J
 Attachments: YARN-3021.001.patch, YARN-3021.patch


 Consider this scenario of 3 realms: A, B and COMMON, where A trusts COMMON, 
 and B trusts COMMON (one way trusts both), and both A and B run HDFS + YARN 
 clusters.
 Now if one logs in with a COMMON credential, and runs a job on A's YARN that 
 needs to access B's HDFS (such as a DistCp), the operation fails in the RM, 
 as it attempts a renewDelegationToken(…) synchronously during application 
 submission (to validate the managed token before it adds it to a scheduler 
 for automatic renewal). The call obviously fails cause B realm will not trust 
 A's credentials (here, the RM's principal is the renewer).
 In the 1.x JobTracker the same call is present, but it is done asynchronously 
 and once the renewal attempt failed we simply ceased to schedule any further 
 attempts of renewals, rather than fail the job immediately.
 We should change the logic such that we attempt the renewal but go easy on 
 the failure and skip the scheduling alone, rather than bubble back an error 
 to the client, failing the app submission. This way the old behaviour is 
 retained.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3021) YARN's delegation-token handling disallows certain trust setups to operate properly

2015-01-29 Thread Yongjun Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3021?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14297112#comment-14297112
 ] 

Yongjun Zhang commented on YARN-3021:
-

Hi [~vinodkv],

Thanks for your earlier comments and suggestions, would you please help taking 
a look at the patch? Thanks a lot!



 YARN's delegation-token handling disallows certain trust setups to operate 
 properly
 ---

 Key: YARN-3021
 URL: https://issues.apache.org/jira/browse/YARN-3021
 Project: Hadoop YARN
  Issue Type: Bug
  Components: security
Affects Versions: 2.3.0
Reporter: Harsh J
 Attachments: YARN-3021.001.patch, YARN-3021.patch


 Consider this scenario of 3 realms: A, B and COMMON, where A trusts COMMON, 
 and B trusts COMMON (one way trusts both), and both A and B run HDFS + YARN 
 clusters.
 Now if one logs in with a COMMON credential, and runs a job on A's YARN that 
 needs to access B's HDFS (such as a DistCp), the operation fails in the RM, 
 as it attempts a renewDelegationToken(…) synchronously during application 
 submission (to validate the managed token before it adds it to a scheduler 
 for automatic renewal). The call obviously fails cause B realm will not trust 
 A's credentials (here, the RM's principal is the renewer).
 In the 1.x JobTracker the same call is present, but it is done asynchronously 
 and once the renewal attempt failed we simply ceased to schedule any further 
 attempts of renewals, rather than fail the job immediately.
 We should change the logic such that we attempt the renewal but go easy on 
 the failure and skip the scheduling alone, rather than bubble back an error 
 to the client, failing the app submission. This way the old behaviour is 
 retained.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3021) YARN's delegation-token handling disallows certain trust setups to operate properly

2015-01-28 Thread Yongjun Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3021?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14296066#comment-14296066
 ] 

Yongjun Zhang commented on YARN-3021:
-

Hi [~rkanter] and [~adhoot],

Thanks for your comments. Yes, we are talking about HDFS delegation token, this 
patch provides an option to turn off initialization validation inside RM, and 
the token verification will happen inside HDFS when distcp job runs.


 YARN's delegation-token handling disallows certain trust setups to operate 
 properly
 ---

 Key: YARN-3021
 URL: https://issues.apache.org/jira/browse/YARN-3021
 Project: Hadoop YARN
  Issue Type: Bug
  Components: security
Affects Versions: 2.3.0
Reporter: Harsh J
 Attachments: YARN-3021.001.patch, YARN-3021.patch


 Consider this scenario of 3 realms: A, B and COMMON, where A trusts COMMON, 
 and B trusts COMMON (one way trusts both), and both A and B run HDFS + YARN 
 clusters.
 Now if one logs in with a COMMON credential, and runs a job on A's YARN that 
 needs to access B's HDFS (such as a DistCp), the operation fails in the RM, 
 as it attempts a renewDelegationToken(…) synchronously during application 
 submission (to validate the managed token before it adds it to a scheduler 
 for automatic renewal). The call obviously fails cause B realm will not trust 
 A's credentials (here, the RM's principal is the renewer).
 In the 1.x JobTracker the same call is present, but it is done asynchronously 
 and once the renewal attempt failed we simply ceased to schedule any further 
 attempts of renewals, rather than fail the job immediately.
 We should change the logic such that we attempt the renewal but go easy on 
 the failure and skip the scheduling alone, rather than bubble back an error 
 to the client, failing the app submission. This way the old behaviour is 
 retained.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3021) YARN's delegation-token handling disallows certain trust setups to operate properly

2015-01-28 Thread Yongjun Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3021?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14296147#comment-14296147
 ] 

Yongjun Zhang commented on YARN-3021:
-

Hi [~rkanter], the reason I moved {{context.setResourcere(resource)}} is 
because it seems to be misplaced: all other parameters appear to be set in the 
order they are passed to the method, except this one. Thanks.


 YARN's delegation-token handling disallows certain trust setups to operate 
 properly
 ---

 Key: YARN-3021
 URL: https://issues.apache.org/jira/browse/YARN-3021
 Project: Hadoop YARN
  Issue Type: Bug
  Components: security
Affects Versions: 2.3.0
Reporter: Harsh J
 Attachments: YARN-3021.001.patch, YARN-3021.patch


 Consider this scenario of 3 realms: A, B and COMMON, where A trusts COMMON, 
 and B trusts COMMON (one way trusts both), and both A and B run HDFS + YARN 
 clusters.
 Now if one logs in with a COMMON credential, and runs a job on A's YARN that 
 needs to access B's HDFS (such as a DistCp), the operation fails in the RM, 
 as it attempts a renewDelegationToken(…) synchronously during application 
 submission (to validate the managed token before it adds it to a scheduler 
 for automatic renewal). The call obviously fails cause B realm will not trust 
 A's credentials (here, the RM's principal is the renewer).
 In the 1.x JobTracker the same call is present, but it is done asynchronously 
 and once the renewal attempt failed we simply ceased to schedule any further 
 attempts of renewals, rather than fail the job immediately.
 We should change the logic such that we attempt the renewal but go easy on 
 the failure and skip the scheduling alone, rather than bubble back an error 
 to the client, failing the app submission. This way the old behaviour is 
 retained.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3021) YARN's delegation-token handling disallows certain trust setups to operate properly

2015-01-23 Thread Yongjun Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3021?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14290493#comment-14290493
 ] 

Yongjun Zhang commented on YARN-3021:
-

I reran the failed tests locally, 

The following tests
{quote}
org.apache.hadoop.yarn.server.resourcemanager.recovery.TestFSRMStatorg.apache.hadoop.yarn.server.resourcemanager.recovery.TestFSRMStateStore
org.apache.hadoop.mapred.TestJobConf
{quote}
are successful

The following test failed:
{quote}
org.apache.hadoop.conf.TestJobConf.testNegativeValueForTaskVmem
TestJobConf.testNegativeValueForTaskVmem:111 expected:1024 but was:-1
{quote}
and it was reported as MAPREDUCE-6223.

TestLargeSort failed in 
https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2033/


 YARN's delegation-token handling disallows certain trust setups to operate 
 properly
 ---

 Key: YARN-3021
 URL: https://issues.apache.org/jira/browse/YARN-3021
 Project: Hadoop YARN
  Issue Type: Bug
  Components: security
Affects Versions: 2.3.0
Reporter: Harsh J
 Attachments: YARN-3021.001.patch, YARN-3021.patch


 Consider this scenario of 3 realms: A, B and COMMON, where A trusts COMMON, 
 and B trusts COMMON (one way trusts both), and both A and B run HDFS + YARN 
 clusters.
 Now if one logs in with a COMMON credential, and runs a job on A's YARN that 
 needs to access B's HDFS (such as a DistCp), the operation fails in the RM, 
 as it attempts a renewDelegationToken(…) synchronously during application 
 submission (to validate the managed token before it adds it to a scheduler 
 for automatic renewal). The call obviously fails cause B realm will not trust 
 A's credentials (here, the RM's principal is the renewer).
 In the 1.x JobTracker the same call is present, but it is done asynchronously 
 and once the renewal attempt failed we simply ceased to schedule any further 
 attempts of renewals, rather than fail the job immediately.
 We should change the logic such that we attempt the renewal but go easy on 
 the failure and skip the scheduling alone, rather than bubble back an error 
 to the client, failing the app submission. This way the old behaviour is 
 retained.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3021) YARN's delegation-token handling disallows certain trust setups to operate properly

2015-01-23 Thread Yongjun Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3021?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yongjun Zhang updated YARN-3021:

Attachment: YARN-3021.001.patch

 YARN's delegation-token handling disallows certain trust setups to operate 
 properly
 ---

 Key: YARN-3021
 URL: https://issues.apache.org/jira/browse/YARN-3021
 Project: Hadoop YARN
  Issue Type: Bug
  Components: security
Affects Versions: 2.3.0
Reporter: Harsh J
 Attachments: YARN-3021.001.patch, YARN-3021.patch


 Consider this scenario of 3 realms: A, B and COMMON, where A trusts COMMON, 
 and B trusts COMMON (one way trusts both), and both A and B run HDFS + YARN 
 clusters.
 Now if one logs in with a COMMON credential, and runs a job on A's YARN that 
 needs to access B's HDFS (such as a DistCp), the operation fails in the RM, 
 as it attempts a renewDelegationToken(…) synchronously during application 
 submission (to validate the managed token before it adds it to a scheduler 
 for automatic renewal). The call obviously fails cause B realm will not trust 
 A's credentials (here, the RM's principal is the renewer).
 In the 1.x JobTracker the same call is present, but it is done asynchronously 
 and once the renewal attempt failed we simply ceased to schedule any further 
 attempts of renewals, rather than fail the job immediately.
 We should change the logic such that we attempt the renewal but go easy on 
 the failure and skip the scheduling alone, rather than bubble back an error 
 to the client, failing the app submission. This way the old behaviour is 
 retained.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3021) YARN's delegation-token handling disallows certain trust setups to operate properly

2015-01-23 Thread Yongjun Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3021?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14289439#comment-14289439
 ] 

Yongjun Zhang commented on YARN-3021:
-

Hi [~qwertymaniac], [~vinodkv], [~adhoot],

Thanks for the earlier discussion and input. I uploaded patch rev 001 by 
introducing a new job configuration property 
mapreduce.job.skip.rm.token.renewal, thus passing 
-Dmapreduce.job.skip.rm.token.renewal=true to distcp (to instruct Resource 
Manager to skip token renewal) would solve the problem.

I did test in the env Harsh helped to set up, thanks Harsh.

Would you please help taking a look at the patch?

Thanks.


 YARN's delegation-token handling disallows certain trust setups to operate 
 properly
 ---

 Key: YARN-3021
 URL: https://issues.apache.org/jira/browse/YARN-3021
 Project: Hadoop YARN
  Issue Type: Bug
  Components: security
Affects Versions: 2.3.0
Reporter: Harsh J
 Attachments: YARN-3021.001.patch, YARN-3021.patch


 Consider this scenario of 3 realms: A, B and COMMON, where A trusts COMMON, 
 and B trusts COMMON (one way trusts both), and both A and B run HDFS + YARN 
 clusters.
 Now if one logs in with a COMMON credential, and runs a job on A's YARN that 
 needs to access B's HDFS (such as a DistCp), the operation fails in the RM, 
 as it attempts a renewDelegationToken(…) synchronously during application 
 submission (to validate the managed token before it adds it to a scheduler 
 for automatic renewal). The call obviously fails cause B realm will not trust 
 A's credentials (here, the RM's principal is the renewer).
 In the 1.x JobTracker the same call is present, but it is done asynchronously 
 and once the renewal attempt failed we simply ceased to schedule any further 
 attempts of renewals, rather than fail the job immediately.
 We should change the logic such that we attempt the renewal but go easy on 
 the failure and skip the scheduling alone, rather than bubble back an error 
 to the client, failing the app submission. This way the old behaviour is 
 retained.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3021) YARN's delegation-token handling disallows certain trust setups to operate properly

2015-01-15 Thread Yongjun Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3021?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14279506#comment-14279506
 ] 

Yongjun Zhang commented on YARN-3021:
-

Hi [~adhoot], Thanks for clarifying. That sounds good.


 YARN's delegation-token handling disallows certain trust setups to operate 
 properly
 ---

 Key: YARN-3021
 URL: https://issues.apache.org/jira/browse/YARN-3021
 Project: Hadoop YARN
  Issue Type: Bug
  Components: security
Affects Versions: 2.3.0
Reporter: Harsh J
 Attachments: YARN-3021.patch


 Consider this scenario of 3 realms: A, B and COMMON, where A trusts COMMON, 
 and B trusts COMMON (one way trusts both), and both A and B run HDFS + YARN 
 clusters.
 Now if one logs in with a COMMON credential, and runs a job on A's YARN that 
 needs to access B's HDFS (such as a DistCp), the operation fails in the RM, 
 as it attempts a renewDelegationToken(…) synchronously during application 
 submission (to validate the managed token before it adds it to a scheduler 
 for automatic renewal). The call obviously fails cause B realm will not trust 
 A's credentials (here, the RM's principal is the renewer).
 In the 1.x JobTracker the same call is present, but it is done asynchronously 
 and once the renewal attempt failed we simply ceased to schedule any further 
 attempts of renewals, rather than fail the job immediately.
 We should change the logic such that we attempt the renewal but go easy on 
 the failure and skip the scheduling alone, rather than bubble back an error 
 to the client, failing the app submission. This way the old behaviour is 
 retained.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3021) YARN's delegation-token handling disallows certain trust setups to operate properly

2015-01-13 Thread Yongjun Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3021?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14276438#comment-14276438
 ] 

Yongjun Zhang commented on YARN-3021:
-

Thanks a lot [~adhoot] and [~vinodkv]!

{quote}
Having said that, if RM cannot validate the token as valid why would the job 
itself work? Would not the containers themselves face the same issue using the 
tokens?
{quote}
Based on the scenario [~qwertymaniac] described in the jira description, the 
token is from realm B, which can not be validated by realm A's YARN since A and 
B doesn't trust each other. However, the token can be used by distcp job 
running in realm A to access B's file (B is the distcp source).

For the scenario described in the jira, I think we are aligned that it would be 
better to add an additional parameter at the time of job submission, so client 
to can tell YARN explicitly that YARN should not try to renew the token. 

What I wanted to clarify with my earlier question was, if we support this 
scenario by having YARN not to validate the token, do we open any security 
hole? Anyone could submit a job and ask YARN not to renew the token, right?

Thanks.






 YARN's delegation-token handling disallows certain trust setups to operate 
 properly
 ---

 Key: YARN-3021
 URL: https://issues.apache.org/jira/browse/YARN-3021
 Project: Hadoop YARN
  Issue Type: Bug
  Components: security
Affects Versions: 2.3.0
Reporter: Harsh J
 Attachments: YARN-3021.patch


 Consider this scenario of 3 realms: A, B and COMMON, where A trusts COMMON, 
 and B trusts COMMON (one way trusts both), and both A and B run HDFS + YARN 
 clusters.
 Now if one logs in with a COMMON credential, and runs a job on A's YARN that 
 needs to access B's HDFS (such as a DistCp), the operation fails in the RM, 
 as it attempts a renewDelegationToken(…) synchronously during application 
 submission (to validate the managed token before it adds it to a scheduler 
 for automatic renewal). The call obviously fails cause B realm will not trust 
 A's credentials (here, the RM's principal is the renewer).
 In the 1.x JobTracker the same call is present, but it is done asynchronously 
 and once the renewal attempt failed we simply ceased to schedule any further 
 attempts of renewals, rather than fail the job immediately.
 We should change the logic such that we attempt the renewal but go easy on 
 the failure and skip the scheduling alone, rather than bubble back an error 
 to the client, failing the app submission. This way the old behaviour is 
 retained.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3021) YARN's delegation-token handling disallows certain trust setups to operate properly

2015-01-09 Thread Yongjun Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3021?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14272317#comment-14272317
 ] 

Yongjun Zhang commented on YARN-3021:
-

{quote}
If we can have a way to tell YARN not to renew a token, and the client either 
wants to just to use the lifetime of the token, or the client can renew itself 
before the job finish, that will solve the problem. One possible solution is to 
add a new parameter when submitting a job, or ro add a field in the job context.
{quote}
However, assume we have the API to let YARN not to renew the token, YARN might 
still want to check whether a token submitted from client is valid. Currently 
the validity checking is done in the renewal code described in
https://issues.apache.org/jira/browse/YARN-3021?focusedCommentId=14271642page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14271642
and Harsh's patch tries to fix. In other words, the renewal code serves as both 
validating and renewing the token. There seems to be no dedicated API to check 
validity.

In the scenario Harsh described, even if we have dedicated API to check token 
validity, the check would fail too because realm A and B don't have trust, even 
though the token is valid from realm B's point of view.

So my question is, should YARN support running a job without validating a 
token? (Though MR1 support this because the token renewal is asynchronous as 
Harsh pointed out). 

Thanks.






 YARN's delegation-token handling disallows certain trust setups to operate 
 properly
 ---

 Key: YARN-3021
 URL: https://issues.apache.org/jira/browse/YARN-3021
 Project: Hadoop YARN
  Issue Type: Bug
  Components: security
Affects Versions: 2.3.0
Reporter: Harsh J
 Attachments: YARN-3021.patch


 Consider this scenario of 3 realms: A, B and COMMON, where A trusts COMMON, 
 and B trusts COMMON (one way trusts both), and both A and B run HDFS + YARN 
 clusters.
 Now if one logs in with a COMMON credential, and runs a job on A's YARN that 
 needs to access B's HDFS (such as a DistCp), the operation fails in the RM, 
 as it attempts a renewDelegationToken(…) synchronously during application 
 submission (to validate the managed token before it adds it to a scheduler 
 for automatic renewal). The call obviously fails cause B realm will not trust 
 A's credentials (here, the RM's principal is the renewer).
 In the 1.x JobTracker the same call is present, but it is done asynchronously 
 and once the renewal attempt failed we simply ceased to schedule any further 
 attempts of renewals, rather than fail the job immediately.
 We should change the logic such that we attempt the renewal but go easy on 
 the failure and skip the scheduling alone, rather than bubble back an error 
 to the client, failing the app submission. This way the old behaviour is 
 retained.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3021) YARN's delegation-token handling disallows certain trust setups to operate properly

2015-01-09 Thread Yongjun Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3021?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14272308#comment-14272308
 ] 

Yongjun Zhang commented on YARN-3021:
-

HI [~qwertymaniac],

I looked at your patch, I think if we can be sure renewing an invalid token 
will throw InvalidToken instead of IOException, then the fix would be good. 
However, it seems to me that there is chance to get IOException. Comments?

If we can have a way to tell YARN not to renew a token, and the client either 
wants to just to use the lifetime of the token, or the client can renew itself 
before the job finish, that will solve the problem.  One possible solution is 
to add a new parameter when submitting a job, or ro add a field in the job 
context.

Should we create a YARN jira to request this support? Seems to be a useful 
infrastructure to have...

Thanks.
 

 YARN's delegation-token handling disallows certain trust setups to operate 
 properly
 ---

 Key: YARN-3021
 URL: https://issues.apache.org/jira/browse/YARN-3021
 Project: Hadoop YARN
  Issue Type: Bug
  Components: security
Affects Versions: 2.3.0
Reporter: Harsh J
 Attachments: YARN-3021.patch


 Consider this scenario of 3 realms: A, B and COMMON, where A trusts COMMON, 
 and B trusts COMMON (one way trusts both), and both A and B run HDFS + YARN 
 clusters.
 Now if one logs in with a COMMON credential, and runs a job on A's YARN that 
 needs to access B's HDFS (such as a DistCp), the operation fails in the RM, 
 as it attempts a renewDelegationToken(…) synchronously during application 
 submission (to validate the managed token before it adds it to a scheduler 
 for automatic renewal). The call obviously fails cause B realm will not trust 
 A's credentials (here, the RM's principal is the renewer).
 In the 1.x JobTracker the same call is present, but it is done asynchronously 
 and once the renewal attempt failed we simply ceased to schedule any further 
 attempts of renewals, rather than fail the job immediately.
 We should change the logic such that we attempt the renewal but go easy on 
 the failure and skip the scheduling alone, rather than bubble back an error 
 to the client, failing the app submission. This way the old behaviour is 
 retained.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3021) YARN's delegation-token handling disallows certain trust setups to operate properly

2015-01-09 Thread Yongjun Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3021?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14271642#comment-14271642
 ] 

Yongjun Zhang commented on YARN-3021:
-

Thanks [~qwertymaniac] for reporting the issue and providing patch.

A bit more info:

In DelegationTokenRenewer.java
{code}
   if (!tokenList.isEmpty()) {
  // Renewing token and adding it to timer calls are separated purposefully
  // If user provides incorrect token then it should not be added for
  // renewal.
  for (DelegationTokenToRenew dtr : tokenList) {
renewToken(dtr);
  }
  for (DelegationTokenToRenew dtr : tokenList) {
addTokenToList(dtr);
setTimerForTokenRenewal(dtr);
if (LOG.isDebugEnabled()) {
  LOG.debug(Registering token for renewal for: +  service = 
  + dtr.token.getService() +  for appId =  + dtr.applicationId);
}
  }
}
{code}
It seems that YARN tries to fix an issue of MR1: instead of not checking the 
token when scheduling the job (MR1), YARN tries to do it based on the comment 
in the above code.

I posted a question to MAPREDUCE-2764 to see if there is way to control YARN, 
so YARN doesn't renew the token and let the client to do the renewal. Not sure 
whether it's feasible yet.

See 

https://issues.apache.org/jira/browse/MAPREDUCE-2764?focusedCommentId=14270283page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14270283

and the one after.

Thanks.



 YARN's delegation-token handling disallows certain trust setups to operate 
 properly
 ---

 Key: YARN-3021
 URL: https://issues.apache.org/jira/browse/YARN-3021
 Project: Hadoop YARN
  Issue Type: Bug
  Components: security
Affects Versions: 2.3.0
Reporter: Harsh J
 Attachments: YARN-3021.patch


 Consider this scenario of 3 realms: A, B and COMMON, where A trusts COMMON, 
 and B trusts COMMON (one way trusts both), and both A and B run HDFS + YARN 
 clusters.
 Now if one logs in with a COMMON credential, and runs a job on A's YARN that 
 needs to access B's HDFS (such as a DistCp), the operation fails in the RM, 
 as it attempts a renewDelegationToken(…) synchronously during application 
 submission (to validate the managed token before it adds it to a scheduler 
 for automatic renewal). The call obviously fails cause B realm will not trust 
 A's credentials (here, the RM's principal is the renewer).
 In the 1.x JobTracker the same call is present, but it is done asynchronously 
 and once the renewal attempt failed we simply ceased to schedule any further 
 attempts of renewals, rather than fail the job immediately.
 We should change the logic such that we attempt the renewal but go easy on 
 the failure and skip the scheduling alone, rather than bubble back an error 
 to the client, failing the app submission. This way the old behaviour is 
 retained.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)