[jira] [Commented] (YARN-9667) Container-executor.c duplicates messages to stdout
[ https://issues.apache.org/jira/browse/YARN-9667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16901762#comment-16901762 ] Szilard Nemeth commented on YARN-9667: -- Thanks [~eyang]! > Container-executor.c duplicates messages to stdout > -- > > Key: YARN-9667 > URL: https://issues.apache.org/jira/browse/YARN-9667 > Project: Hadoop YARN > Issue Type: Improvement > Components: nodemanager, yarn >Affects Versions: 3.2.0 >Reporter: Adam Antal >Assignee: Peter Bacsko >Priority: Major > Fix For: 3.3.0 > > Attachments: YARN-9667-001.patch > > > When a container is killed by its AM we get an error message similar to this: > {noformat} > 2019-06-30 12:09:04,412 WARN > org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.privileged.PrivilegedOperationExecutor: > Shell execution returned exit code: 143. Privileged Execution Operation > Stderr: > Stdout: main : command provided 1 > main : run as user is systest > main : requested yarn user is systest > Getting exit code file... > Creating script paths... > Writing pid file... > Writing to tmp file > /yarn/nm/nmPrivate/application_1561921629886_0001/container_e84_1561921629886_0001_01_19/container_e84_1561921629886_0001_01_19.pid.tmp > Writing to cgroup task files... > Creating local dirs... > Launching container... > Getting exit code file... > Creating script paths... > {noformat} > In container-executor.c the fork point is right after the "Creating script > paths..." part, yet in the stdout log we can clearly see it has been > written twice. After consulting with [~pbacsko] it seems like there's a > missing flush in container-executor.c before the fork, and that causes the > duplication. > I suggest adding a flush there so that the output won't be duplicated: it's a bit > misleading that the child process writes out "Getting exit code file" and > "Creating script paths" even though it is clearly not doing that. > A more appealing solution could be to revisit the fprintf-fflush pairs in the > code and change them to a single call, so that the fflush calls would not be > forgotten accidentally. (A forgotten fflush can cause problems in every place where > fork is used.) > Note: this issue probably affects every occurrence of fork(), not just the one > from {{launch_container_as_user}} in {{main.c}}. -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
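A minimal sketch of the fix being discussed, for illustration only (the helper print_and_flush and the function launch_example are made-up names, and this is not the attached YARN-9667-001.patch): flush stdout before the fork, or fold fprintf and fflush into one helper so the flush cannot be forgotten.
{code:c}
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/* Print a progress message and flush immediately, so no buffered output
 * survives to be written a second time by a forked child. */
static void print_and_flush(const char *msg) {
  fprintf(stdout, "%s\n", msg);
  fflush(stdout);
}

/* Illustrative call site, loosely modeled on the container launch path. */
static int launch_example(void) {
  print_and_flush("Creating script paths...");

  /* Flush anything still buffered right before forking; otherwise the
   * child inherits the buffer and the same lines appear twice in stdout. */
  fflush(stdout);
  fflush(stderr);

  pid_t child = fork();
  if (child == 0) {
    /* child: exec the container launch script here */
    _exit(0);
  }
  return (child > 0) ? 0 : -1;
}

int main(void) {
  return launch_example();
}
{code}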
[jira] [Updated] (YARN-9715) [YARN UI2] yarn-container-log support for https Knox Gateway url in nodes page
[ https://issues.apache.org/jira/browse/YARN-9715?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Akhil PB updated YARN-9715: --- Summary: [YARN UI2] yarn-container-log support for https Knox Gateway url in nodes page (was: YARN UI2 - yarn-container-log support for https Knox Gateway url) > [YARN UI2] yarn-container-log support for https Knox Gateway url in nodes page > -- > > Key: YARN-9715 > URL: https://issues.apache.org/jira/browse/YARN-9715 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Prabhu Joseph >Assignee: Akhil PB >Priority: Major > > Currently yarn-container-log (UI2 - Nodes - List of Containers - log file) > creates the URL with the node scheme (http) and nodeHttpAddress. This does not work > with a Knox Gateway https URL. The URL construction logic can be improved to > accept both the normal and the Knox case. A similar approach is used in the Applications -> > Logs section. > Also, UI2 - Nodes - List of Containers - log file does not have pagination > support for the log file. -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-9688) Variable description error of method in stateMachine class
[ https://issues.apache.org/jira/browse/YARN-9688?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] runzhou wu updated YARN-9688: - Labels: Newbie (was: ) > Variable description error of method in stateMachine class > --- > > Key: YARN-9688 > URL: https://issues.apache.org/jira/browse/YARN-9688 > Project: Hadoop YARN > Issue Type: Bug >Reporter: runzhou wu >Assignee: runzhou wu >Priority: Trivial > Labels: Newbie > Attachments: YARN-9688.001.patch > > > In the StateMachineFactory class, the javadoc parameter names *state* and *cause* do not match the actual parameters oldState and event of doTransition: > /** > * Effect a transition due to the effecting stimulus. > * @param *state* current state > * @param eventType trigger to initiate the transition > * @param *cause* causal eventType context > * @return transitioned state > */ > private STATE doTransition > (OPERAND operand, STATE oldState, EVENTTYPE eventType, EVENT event) > throws InvalidStateTransitionException { -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
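For reference, a javadoc consistent with the quoted signature could read as follows. This is only a sketch, not necessarily what YARN-9688.001.patch contains; the wrapper class, the dropped throws clause, and the wording for the operand parameter are illustrative.
{code:java}
// Self-contained illustration (not the Hadoop class itself): the javadoc
// parameter names now match the declared parameters.
public class DoTransitionJavadocExample<OPERAND, STATE, EVENTTYPE, EVENT> {
  /**
   * Effect a transition due to the effecting stimulus.
   * @param operand the operand carried through the transition (illustrative wording)
   * @param oldState current state
   * @param eventType trigger to initiate the transition
   * @param event causal eventType context
   * @return transitioned state
   */
  private STATE doTransition(OPERAND operand, STATE oldState,
      EVENTTYPE eventType, EVENT event) {
    return oldState; // placeholder body for the illustration
  }
}
{code}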
[jira] [Updated] (YARN-9692) ContainerAllocationExpirer is misspelled
[ https://issues.apache.org/jira/browse/YARN-9692?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] runzhou wu updated YARN-9692: - Labels: newbie (was: ) > ContainerAllocationExpirer is misspelled > - > > Key: YARN-9692 > URL: https://issues.apache.org/jira/browse/YARN-9692 > Project: Hadoop YARN > Issue Type: Bug >Reporter: runzhou wu >Assignee: runzhou wu >Priority: Trivial > Labels: newbie > Attachments: YARN-9692.001.patch > > > The class ContainerAllocationExpirer is misspelled. > I think it should be changed to ContainerAllocationExpired -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-9527) Rogue LocalizerRunner/ContainerLocalizer repeatedly downloading same file
[ https://issues.apache.org/jira/browse/YARN-9527?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16901665#comment-16901665 ] Eric Badger commented on YARN-9527: --- [~billie.rina...@gmail.com], [~djp], [~eyang], [~bibinchundatt], you've all committed changes to the ResourceLocalizationService recently. Could one of you give an additional review on this change? > Rogue LocalizerRunner/ContainerLocalizer repeatedly downloading same file > - > > Key: YARN-9527 > URL: https://issues.apache.org/jira/browse/YARN-9527 > Project: Hadoop YARN > Issue Type: Bug > Components: yarn >Affects Versions: 2.8.5, 3.1.2 >Reporter: Jim Brennan >Assignee: Jim Brennan >Priority: Major > Attachments: YARN-9527.001.patch, YARN-9527.002.patch, > YARN-9527.003.patch, YARN-9527.004.patch > > > A rogue ContainerLocalizer can get stuck in a loop continuously downloading > the same file while generating an "Invalid event: LOCALIZED at LOCALIZED" > exception on each iteration. Sometimes this continues long enough that it > fills up a disk or depletes available inodes for the filesystem. -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Comment Edited] (YARN-9721) An easy method to exclude a nodemanager from the yarn cluster cleanly
[ https://issues.apache.org/jira/browse/YARN-9721?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16901656#comment-16901656 ] Zhankun Tang edited comment on YARN-9721 at 8/7/19 3:20 AM: [~yuan_zac], Thanks for raising this issue! This is very helpful in a hybrid elastic environment. I'm checking this story to get a clearer understanding. BTW, which solution do you prefer? was (Author: tangzhankun): [~yuan_zac], Thanks for raising this issue! This is very helpful in a hybrid environment. I'm checking this story to get a clearer understanding. BTW, which solution do you prefer? > An easy method to exclude a nodemanager from the yarn cluster cleanly > - > > Key: YARN-9721 > URL: https://issues.apache.org/jira/browse/YARN-9721 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Zac Zhou >Priority: Major > Attachments: decommission nodes.png > > > If we want to take a nodemanager server offline, nodes.exclude-path > and the "rmadmin -refreshNodes" command are used to decommission the server. > But this method cannot clean up the node cleanly. Nodemanager servers are > still in Decommissioned Nodes, as the attachment shows. > !decommission nodes.png! > YARN-4311 enabled a removalTimer to clean up untracked nodes. > But the logic of the isUntrackedNode method is too restrictive. If include-path is > not used, no servers can meet the criteria. Using an include file would create > a potential maintenance risk. > If the yarn cluster is installed on a cloud, nodemanager servers are created and > deleted frequently. We need a way to exclude a nodemanager from the yarn > cluster cleanly. Otherwise, the map of rmContext.getInactiveRMNodes() would > keep growing, which would cause a memory issue in the RM. -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-9721) An easy method to exclude a nodemanager from the yarn cluster cleanly
[ https://issues.apache.org/jira/browse/YARN-9721?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16901656#comment-16901656 ] Zhankun Tang commented on YARN-9721: [~yuan_zac], Thanks for raising this issue! This is very helpful in a hybrid environment. I'm checking this story to get a clearer understanding. BTW, which solution do you prefer? > An easy method to exclude a nodemanager from the yarn cluster cleanly > - > > Key: YARN-9721 > URL: https://issues.apache.org/jira/browse/YARN-9721 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Zac Zhou >Priority: Major > Attachments: decommission nodes.png > > > If we want to take a nodemanager server offline, nodes.exclude-path > and the "rmadmin -refreshNodes" command are used to decommission the server. > But this method cannot clean up the node cleanly. Nodemanager servers are > still in Decommissioned Nodes, as the attachment shows. > !decommission nodes.png! > YARN-4311 enabled a removalTimer to clean up untracked nodes. > But the logic of the isUntrackedNode method is too restrictive. If include-path is > not used, no servers can meet the criteria. Using an include file would create > a potential maintenance risk. > If the yarn cluster is installed on a cloud, nodemanager servers are created and > deleted frequently. We need a way to exclude a nodemanager from the yarn > cluster cleanly. Otherwise, the map of rmContext.getInactiveRMNodes() would > keep growing, which would cause a memory issue in the RM. -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
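For context, the decommission flow being discussed looks like this today. The hostname and file path below are made up for illustration; the exclude file is whatever yarn.resourcemanager.nodes.exclude-path points to.
{code:bash}
# Add the host to the RM's exclude file and ask the RM to re-read it.
echo "nm-host-01.example.com" >> /etc/hadoop/conf/yarn.exclude
yarn rmadmin -refreshNodes

# The NodeManager then shows up under "Decommissioned Nodes". With no include
# file configured, it is never considered untracked by isUntrackedNode, so it
# remains in rmContext.getInactiveRMNodes().
{code}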
[jira] [Commented] (YARN-9442) container working directory has group read permissions
[ https://issues.apache.org/jira/browse/YARN-9442?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16901647#comment-16901647 ] Eric Badger commented on YARN-9442: --- +1 lgtm. I'll commit in a day or two if there are no objections. > container working directory has group read permissions > -- > > Key: YARN-9442 > URL: https://issues.apache.org/jira/browse/YARN-9442 > Project: Hadoop YARN > Issue Type: Improvement > Components: yarn >Affects Versions: 3.2.2 >Reporter: Jim Brennan >Assignee: Jim Brennan >Priority: Minor > Attachments: YARN-9442.001.patch, YARN-9442.002.patch > > > Container working directories are currently created with permissions 0750, > owned by the user and with the group set to the node manager group. > Is there any reason why these directories need group read permissions? > I have been testing with group read permissions removed and so far I haven't > encountered any problems. -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-6539) Create SecureLogin inside Router
[ https://issues.apache.org/jira/browse/YARN-6539?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16901638#comment-16901638 ] Xie YiFan commented on YARN-6539: - [~subru] Could you review this patch for me? > Create SecureLogin inside Router > > > Key: YARN-6539 > URL: https://issues.apache.org/jira/browse/YARN-6539 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Giovanni Matteo Fumarola >Assignee: Xie YiFan >Priority: Minor > Attachments: YARN-6359_1.patch, YARN-6359_2.patch, YARN-6539_3.patch > > -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-9564) Create docker-to-squash tool for image conversion
[ https://issues.apache.org/jira/browse/YARN-9564?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16901604#comment-16901604 ] Hadoop QA commented on YARN-9564: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 19s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:red}-1{color} | {color:red} test4tests {color} | {color:red} 0m 0s{color} | {color:red} The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color} | || || || || {color:brown} trunk Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 1m 23s{color} | {color:blue} Maven dependency ordering for branch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 17m 6s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 16m 0s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 4m 59s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 10m 19s{color} | {color:green} branch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 2m 18s{color} | {color:green} trunk passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 22s{color} | {color:blue} Maven dependency ordering for patch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 4m 24s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 15m 57s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 15m 57s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 4m 51s{color} | {color:green} the patch passed {color} | | {color:orange}-0{color} | {color:orange} pylint {color} | {color:orange} 0m 6s{color} | {color:orange} The patch generated 113 new + 0 unchanged - 0 fixed = 113 total (was 0) {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} xml {color} | {color:green} 0m 1s{color} | {color:green} The patch has no ill-formed XML file. {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 10m 10s{color} | {color:green} patch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 2m 15s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 14s{color} | {color:green} hadoop-assemblies in the patch passed. 
{color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green}203m 15s{color} | {color:green} hadoop-yarn in the patch passed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 48s{color} | {color:green} The patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black}296m 1s{color} | {color:black} {color} | \\ \\ || Subsystem || Report/Notes || | Docker | Client=19.03.1 Server=19.03.1 Image:yetus/hadoop:bdbca0e | | JIRA Issue | YARN-9564 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12976847/YARN-9564.002.patch | | Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient xml pylint | | uname | Linux b5a644c5e0af 4.4.0-138-generic #164-Ubuntu SMP Tue Oct 2 17:16:02 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/patchprocess/precommit/personality/provided.sh | | git revision | trunk / b77761b | | maven | version: Apache Maven 3.3.9 | | Default Java | 1.8.0_212 | | pylint | v1.9.2 | | pylint | https://builds.apache.org/job/PreCommit-YARN-Build/24483/artifact/out/diff-patch-pylint.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/24483/testReport/ | | Max. process+thread count | 896 (vs. ulimi
[jira] [Commented] (YARN-9678) TestGpuResourceHandler / TestFpgaResourceHandler should be renamed
[ https://issues.apache.org/jira/browse/YARN-9678?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16901554#comment-16901554 ] kevin su commented on YARN-9678: [~jojochuang] Thank you so much > TestGpuResourceHandler / TestFpgaResourceHandler should be renamed > -- > > Key: YARN-9678 > URL: https://issues.apache.org/jira/browse/YARN-9678 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Szilard Nemeth >Assignee: kevin su >Priority: Major > Labels: newbie, newbie++ > Fix For: 3.3.0 > > Attachments: YARN-9678.001.patch > > > Their respective production classes are GpuResourceHandlerImpl and > FpgaResourceHandlerImpl so we are missing the "Impl" from the testcase > classnames. -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Issue Comment Deleted] (YARN-9559) Create AbstractContainersLauncher for pluggable ContainersLauncher logic
[ https://issues.apache.org/jira/browse/YARN-9559?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Hung updated YARN-9559: Comment: was deleted (was: Attached a branch-3.2 version. It's the same as trunk modulo trivial conflicts.) > Create AbstractContainersLauncher for pluggable ContainersLauncher logic > > > Key: YARN-9559 > URL: https://issues.apache.org/jira/browse/YARN-9559 > Project: Hadoop YARN > Issue Type: Task >Reporter: Jonathan Hung >Assignee: Jonathan Hung >Priority: Major > Fix For: 3.3.0 > > Attachments: YARN-9559.001.patch, YARN-9559.002.patch, > YARN-9559.003.patch, YARN-9559.004.patch, YARN-9559.005.patch > > -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-9559) Create AbstractContainersLauncher for pluggable ContainersLauncher logic
[ https://issues.apache.org/jira/browse/YARN-9559?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Hung updated YARN-9559: Attachment: (was: YARN-9559-branch-3.2.001.patch) > Create AbstractContainersLauncher for pluggable ContainersLauncher logic > > > Key: YARN-9559 > URL: https://issues.apache.org/jira/browse/YARN-9559 > Project: Hadoop YARN > Issue Type: Task >Reporter: Jonathan Hung >Assignee: Jonathan Hung >Priority: Major > Fix For: 3.3.0 > > Attachments: YARN-9559.001.patch, YARN-9559.002.patch, > YARN-9559.003.patch, YARN-9559.004.patch, YARN-9559.005.patch > > -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-9559) Create AbstractContainersLauncher for pluggable ContainersLauncher logic
[ https://issues.apache.org/jira/browse/YARN-9559?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16901501#comment-16901501 ] Hudson commented on YARN-9559: -- FAILURE: Integrated in Jenkins build Hadoop-trunk-Commit #17050 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/17050/]) YARN-9559. Create AbstractContainersLauncher for pluggable (haibochen: rev f51702d5398531835b24d812f6f95094a0e0493e) * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/launcher/ContainersLauncher.java * (add) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/launcher/AbstractContainersLauncher.java * (add) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/launcher/package-info.java * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/resources/yarn-default.xml * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/ContainerManagerImpl.java * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/conf/YarnConfiguration.java > Create AbstractContainersLauncher for pluggable ContainersLauncher logic > > > Key: YARN-9559 > URL: https://issues.apache.org/jira/browse/YARN-9559 > Project: Hadoop YARN > Issue Type: Task >Reporter: Jonathan Hung >Assignee: Jonathan Hung >Priority: Major > Fix For: 3.3.0 > > Attachments: YARN-9559-branch-3.2.001.patch, YARN-9559.001.patch, > YARN-9559.002.patch, YARN-9559.003.patch, YARN-9559.004.patch, > YARN-9559.005.patch > > -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-9564) Create docker-to-squash tool for image conversion
[ https://issues.apache.org/jira/browse/YARN-9564?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16901500#comment-16901500 ] Eric Yang commented on YARN-9564: - [~ebadger] {quote}Yes, you'll need to run this script as sudo due to a few of the commands in the script. It's probably easiest to run the whole script as root, but I like to run as little as possible as root. I could require the script run as root and then drop privileges when they aren't needed.{quote} My first choice is to route this operation via the YARN daemon to perform the privilege escalation operations, similar to container launch. This would make the usage of the command similar to "docker build", which requires a trusted daemon to validate security and then perform the image build operation. The second choice is to check whether the current user is a privileged user and run accordingly. {quote}I believe that Craig Condit has a java version of something similar to this tool. I don't think that I'm going to have time to rewrite this in Java, but we might be able to leverage his tool if you think that approach is better.{quote} [~ccondit]'s tool provides basic flattening of a docker image to squashfs. The python implementation provides more metadata management of layers. Unless there is an effort to add metadata management to Craig's version, they are not equal in functionality. I cannot recommend changing direction unless someone is willing to put in the effort to convert the python work to Java. I am shy about committing the python version at this time because external dependencies keep this script from functioning as a standalone unit. I think it is too hard to replicate for normal users. The script could be improved to detect prerequisite dependencies ahead of time instead of erroring out halfway through execution. > Create docker-to-squash tool for image conversion > - > > Key: YARN-9564 > URL: https://issues.apache.org/jira/browse/YARN-9564 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Eric Badger >Assignee: Eric Badger >Priority: Major > Attachments: YARN-9564.001.patch, YARN-9564.002.patch > > > The new runc runtime uses docker images that are converted into multiple > squashfs images. Each layer of the docker image will get its own squashfs > image. We need a tool to help automate the creation of these squashfs images > when all we have is a docker image -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-9720) MR job submitted to a queue with default partition accessing the non-exclusive label resources
[ https://issues.apache.org/jira/browse/YARN-9720?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16901499#comment-16901499 ] Eric Payne commented on YARN-9720: -- [~gb.ana...@gmail.com], can you please attach a copy of your capacity-scheduler.xml to show the queue and label configuration properties? > MR job submitted to a queue with default partition accessing the > non-exclusive label resources > -- > > Key: YARN-9720 > URL: https://issues.apache.org/jira/browse/YARN-9720 > Project: Hadoop YARN > Issue Type: Bug > Components: capacityscheduler, resourcemanager >Affects Versions: 3.1.1, 3.1.2 >Reporter: ANANDA G B >Assignee: ANANDA G B >Priority: Major > Attachments: Issue.png > > > When an MR job is submitted to queue1 with the default partition, it > accesses non-exclusive partition resources. Please find the attachments. > MR Job command: > ./yarn jar ../share/hadoop/mapreduce/hadoop-mapreduce-examples-3.1.1.0201.jar > pi -Dmapreduce.job.queuename=queue1 -Dmapreduce.job.node-label-expression= 10 > 10 > -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
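For context, the kind of configuration being asked for looks roughly like the sketch below; the queue name, capacity value, and label name are illustrative assumptions, not the reporter's actual settings.
{code:xml}
<!-- capacity-scheduler.xml (illustrative excerpt) -->
<configuration>
  <property>
    <name>yarn.scheduler.capacity.root.queues</name>
    <value>default,queue1</value>
  </property>
  <property>
    <name>yarn.scheduler.capacity.root.queue1.capacity</name>
    <value>50</value>
  </property>
  <property>
    <!-- which partitions queue1 is allowed to use; this value is part of
         what decides whether label1 resources are reachable from queue1 -->
    <name>yarn.scheduler.capacity.root.queue1.accessible-node-labels</name>
    <value>label1</value>
  </property>
</configuration>
{code}
A non-exclusive label itself would have been registered with something like {{yarn rmadmin -addToClusterNodeLabels "label1(exclusive=false)"}}.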
[jira] [Commented] (YARN-9442) container working directory has group read permissions
[ https://issues.apache.org/jira/browse/YARN-9442?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16901490#comment-16901490 ] Hadoop QA commented on YARN-9442: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s{color} | {color:blue} Docker mode activated. {color} | | {color:red}-1{color} | {color:red} patch {color} | {color:red} 0m 14s{color} | {color:red} YARN-9442 does not apply to trunk. Rebase required? Wrong Branch? See https://wiki.apache.org/hadoop/HowToContribute for help. {color} | \\ \\ || Subsystem || Report/Notes || | JIRA Issue | YARN-9442 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12965018/YARN-9442.002.patch | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/24485/console | | Powered by | Apache Yetus 0.8.0 http://yetus.apache.org | This message was automatically generated. > container working directory has group read permissions > -- > > Key: YARN-9442 > URL: https://issues.apache.org/jira/browse/YARN-9442 > Project: Hadoop YARN > Issue Type: Improvement > Components: yarn >Affects Versions: 3.2.2 >Reporter: Jim Brennan >Assignee: Jim Brennan >Priority: Minor > Attachments: YARN-9442.001.patch, YARN-9442.002.patch > > > Container working directories are currently created with permissions 0750, > owned by the user and with the group set to the node manager group. > Is there any reason why these directories need group read permissions? > I have been testing with group read permissions removed and so far I haven't > encountered any problems. -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-6315) Improve LocalResourcesTrackerImpl#isResourcePresent to return false for corrupted files
[ https://issues.apache.org/jira/browse/YARN-6315?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16901483#comment-16901483 ] Hadoop QA commented on YARN-6315: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s{color} | {color:blue} Docker mode activated. {color} | | {color:red}-1{color} | {color:red} patch {color} | {color:red} 0m 11s{color} | {color:red} YARN-6315 does not apply to trunk. Rebase required? Wrong Branch? See https://wiki.apache.org/hadoop/HowToContribute for help. {color} | \\ \\ || Subsystem || Report/Notes || | JIRA Issue | YARN-6315 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12901695/YARN-6315.006.patch | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/24484/console | | Powered by | Apache Yetus 0.8.0 http://yetus.apache.org | This message was automatically generated. > Improve LocalResourcesTrackerImpl#isResourcePresent to return false for > corrupted files > --- > > Key: YARN-6315 > URL: https://issues.apache.org/jira/browse/YARN-6315 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 2.7.3, 2.8.1 >Reporter: Kuhu Shukla >Assignee: Kuhu Shukla >Priority: Major > Attachments: YARN-6315.001.patch, YARN-6315.002.patch, > YARN-6315.003.patch, YARN-6315.004.patch, YARN-6315.005.patch, > YARN-6315.006.patch > > > We currently check if a resource is present by making sure that the file > exists locally. There can be a case where the LocalizationTracker thinks that > it has the resource if the file exists but with size 0 or less than the > "expected" size of the LocalResource. This JIRA tracks the change to harden > the isResourcePresent call to address that case. -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-9527) Rogue LocalizerRunner/ContainerLocalizer repeatedly downloading same file
[ https://issues.apache.org/jira/browse/YARN-9527?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16901473#comment-16901473 ] Jim Brennan commented on YARN-9527: --- We have been running with this patch on one of our large research clusters for about a month. I scanned for this issue again today and there were no instances of it. That is not definitive, but it is a good sign. We also have not had any new problems reported as a result of this change. I will continue to monitor our clusters for this. [~ebadger], did you want to see if we can get some other reviewers for this patch? > Rogue LocalizerRunner/ContainerLocalizer repeatedly downloading same file > - > > Key: YARN-9527 > URL: https://issues.apache.org/jira/browse/YARN-9527 > Project: Hadoop YARN > Issue Type: Bug > Components: yarn >Affects Versions: 2.8.5, 3.1.2 >Reporter: Jim Brennan >Assignee: Jim Brennan >Priority: Major > Attachments: YARN-9527.001.patch, YARN-9527.002.patch, > YARN-9527.003.patch, YARN-9527.004.patch > > > A rogue ContainerLocalizer can get stuck in a loop continuously downloading > the same file while generating an "Invalid event: LOCALIZED at LOCALIZED" > exception on each iteration. Sometimes this continues long enough that it > fills up a disk or depletes available inodes for the filesystem. -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-9559) Create AbstractContainersLauncher for pluggable ContainersLauncher logic
[ https://issues.apache.org/jira/browse/YARN-9559?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16901467#comment-16901467 ] Jonathan Hung commented on YARN-9559: - Attached a branch-3.2 version. It's the same as trunk modulo trivial conflicts. > Create AbstractContainersLauncher for pluggable ContainersLauncher logic > > > Key: YARN-9559 > URL: https://issues.apache.org/jira/browse/YARN-9559 > Project: Hadoop YARN > Issue Type: Task >Reporter: Jonathan Hung >Assignee: Jonathan Hung >Priority: Major > Fix For: 3.3.0 > > Attachments: YARN-9559-branch-3.2.001.patch, YARN-9559.001.patch, > YARN-9559.002.patch, YARN-9559.003.patch, YARN-9559.004.patch, > YARN-9559.005.patch > > -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-9559) Create AbstractContainersLauncher for pluggable ContainersLauncher logic
[ https://issues.apache.org/jira/browse/YARN-9559?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Hung updated YARN-9559: Attachment: YARN-9559-branch-3.2.001.patch > Create AbstractContainersLauncher for pluggable ContainersLauncher logic > > > Key: YARN-9559 > URL: https://issues.apache.org/jira/browse/YARN-9559 > Project: Hadoop YARN > Issue Type: Task >Reporter: Jonathan Hung >Assignee: Jonathan Hung >Priority: Major > Fix For: 3.3.0 > > Attachments: YARN-9559-branch-3.2.001.patch, YARN-9559.001.patch, > YARN-9559.002.patch, YARN-9559.003.patch, YARN-9559.004.patch, > YARN-9559.005.patch > > -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-9559) Create AbstractContainersLauncher for pluggable ContainersLauncher logic
[ https://issues.apache.org/jira/browse/YARN-9559?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Haibo Chen updated YARN-9559: - Fix Version/s: 3.3.0 > Create AbstractContainersLauncher for pluggable ContainersLauncher logic > > > Key: YARN-9559 > URL: https://issues.apache.org/jira/browse/YARN-9559 > Project: Hadoop YARN > Issue Type: Task >Reporter: Jonathan Hung >Assignee: Jonathan Hung >Priority: Major > Fix For: 3.3.0 > > Attachments: YARN-9559.001.patch, YARN-9559.002.patch, > YARN-9559.003.patch, YARN-9559.004.patch, YARN-9559.005.patch > > -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-9559) Create AbstractContainersLauncher for pluggable ContainersLauncher logic
[ https://issues.apache.org/jira/browse/YARN-9559?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16901450#comment-16901450 ] Haibo Chen commented on YARN-9559: -- The unit test failure is reported at YARN-5857, independent of the change here. Committing to trunk soon. > Create AbstractContainersLauncher for pluggable ContainersLauncher logic > > > Key: YARN-9559 > URL: https://issues.apache.org/jira/browse/YARN-9559 > Project: Hadoop YARN > Issue Type: Task >Reporter: Jonathan Hung >Assignee: Jonathan Hung >Priority: Major > Attachments: YARN-9559.001.patch, YARN-9559.002.patch, > YARN-9559.003.patch, YARN-9559.004.patch, YARN-9559.005.patch > > -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-9442) container working directory has group read permissions
[ https://issues.apache.org/jira/browse/YARN-9442?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16901443#comment-16901443 ] Jim Brennan commented on YARN-9442: --- [~eyang], [~ebadger], [~shaneku...@gmail.com], [~jeagles], any further comments on this? > container working directory has group read permissions > -- > > Key: YARN-9442 > URL: https://issues.apache.org/jira/browse/YARN-9442 > Project: Hadoop YARN > Issue Type: Improvement > Components: yarn >Affects Versions: 3.2.2 >Reporter: Jim Brennan >Assignee: Jim Brennan >Priority: Minor > Attachments: YARN-9442.001.patch, YARN-9442.002.patch > > > Container working directories are currently created with permissions 0750, > owned by the user and with the group set to the node manager group. > Is there any reason why these directories need group read permissions? > I have been testing with group read permissions removed and so far I haven't > encountered any problems. -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-9559) Create AbstractContainersLauncher for pluggable ContainersLauncher logic
[ https://issues.apache.org/jira/browse/YARN-9559?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16901433#comment-16901433 ] Hadoop QA commented on YARN-9559: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 36s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 1s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:red}-1{color} | {color:red} test4tests {color} | {color:red} 0m 0s{color} | {color:red} The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color} | || || || || {color:brown} trunk Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 49s{color} | {color:blue} Maven dependency ordering for branch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 18m 43s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 9m 29s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 19s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 2m 36s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 16m 13s{color} | {color:green} branch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 4m 33s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 2m 3s{color} | {color:green} trunk passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 15s{color} | {color:blue} Maven dependency ordering for patch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 1m 53s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 8m 8s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 8m 8s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 16s{color} | {color:green} hadoop-yarn-project/hadoop-yarn: The patch generated 0 new + 318 unchanged - 2 fixed = 318 total (was 320) {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 2m 22s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} xml {color} | {color:green} 0m 1s{color} | {color:green} The patch has no ill-formed XML file. {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 12m 34s{color} | {color:green} patch has no errors when building and testing our client artifacts. 
{color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 4m 43s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 57s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 56s{color} | {color:green} hadoop-yarn-api in the patch passed. {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 4m 25s{color} | {color:green} hadoop-yarn-common in the patch passed. {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 22m 50s{color} | {color:red} hadoop-yarn-server-nodemanager in the patch failed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 57s{color} | {color:green} The patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black}117m 15s{color} | {color:black} {color} | \\ \\ || Reason || Tests || | Failed junit tests | hadoop.yarn.server.nodemanager.containermanager.logaggregation.TestLogAggregationService | \\ \\ || Subsystem || Report/Notes || | Docker | Client=19.03.1 Server=19.03.1 Image:yetus/hadoop:bdbca0e53b4 | | JIRA Issue | YARN-9559 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12976
[jira] [Commented] (YARN-8045) Reduce log output from container status calls
[ https://issues.apache.org/jira/browse/YARN-8045?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16901432#comment-16901432 ] Jim Brennan commented on YARN-8045: --- The patch for 2.8 looks good to me. +1 (non-binding) > Reduce log output from container status calls > - > > Key: YARN-8045 > URL: https://issues.apache.org/jira/browse/YARN-8045 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Shane Kumpf >Assignee: Craig Condit >Priority: Major > Fix For: 2.10.0, 3.2.0, 3.0.4, 2.8.6, 2.9.3, 3.1.3 > > Attachments: YARN-8045.001-branch-2.8.patch, YARN-8045.001.patch > > > Each time a container's status is returned a log entry is produced in the NM > from {{ContainerManagerImpl}}. The container status includes the diagnostics > field for the container. If the diagnostics field contains an exception, it > can appear as if the exception is logged repeatedly every second. The > diagnostics message can also span many lines, which puts pressure on the logs > and makes it harder to read. > For example: > {code} > 2018-03-17 22:01:11,632 INFO > org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl: > Getting container-status for container_e01_1521323860653_0001_01_05 > 2018-03-17 22:01:11,632 INFO > org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl: > Returning ContainerStatus: [ContainerId: > container_e01_1521323860653_0001_01_05, ExecutionType: GUARANTEED, State: > RUNNING, Capability: , Diagnostics: [2018-03-17 > 22:01:00.675]Exception from container-launch. > Container id: container_e01_1521323860653_0001_01_05 > Exit code: -1 > Exception message: > Shell ouput: > [2018-03-17 22:01:00.750]Diagnostic message from attempt : > [2018-03-17 22:01:00.750]Container exited with a non-zero exit code -1. > , ExitStatus: -1, IP: null, Host: null, ContainerSubState: SCHEDULED] > {code} -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-6315) Improve LocalResourcesTrackerImpl#isResourcePresent to return false for corrupted files
[ https://issues.apache.org/jira/browse/YARN-6315?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16901428#comment-16901428 ] Kuhu Shukla commented on YARN-6315: --- Thanks for the ping Eric and sorry about the delay on this. This is not a trivial change when it comes to archives and directories and I would have difficulty making time for this patch rework. I apologize and please feel free to reassign and use the existing patch if it is any good. :( > Improve LocalResourcesTrackerImpl#isResourcePresent to return false for > corrupted files > --- > > Key: YARN-6315 > URL: https://issues.apache.org/jira/browse/YARN-6315 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 2.7.3, 2.8.1 >Reporter: Kuhu Shukla >Assignee: Kuhu Shukla >Priority: Major > Attachments: YARN-6315.001.patch, YARN-6315.002.patch, > YARN-6315.003.patch, YARN-6315.004.patch, YARN-6315.005.patch, > YARN-6315.006.patch > > > We currently check if a resource is present by making sure that the file > exists locally. There can be a case where the LocalizationTracker thinks that > it has the resource if the file exists but with size 0 or less than the > "expected" size of the LocalResource. This JIRA tracks the change to harden > the isResourcePresent call to address that case. -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-6315) Improve LocalResourcesTrackerImpl#isResourcePresent to return false for corrupted files
[ https://issues.apache.org/jira/browse/YARN-6315?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16901424#comment-16901424 ] Eric Payne commented on YARN-6315: -- [~kshukla], this came up internally for us recently. Do you plan on addressing the comments from [~jlowe] above? > Improve LocalResourcesTrackerImpl#isResourcePresent to return false for > corrupted files > --- > > Key: YARN-6315 > URL: https://issues.apache.org/jira/browse/YARN-6315 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 2.7.3, 2.8.1 >Reporter: Kuhu Shukla >Assignee: Kuhu Shukla >Priority: Major > Attachments: YARN-6315.001.patch, YARN-6315.002.patch, > YARN-6315.003.patch, YARN-6315.004.patch, YARN-6315.005.patch, > YARN-6315.006.patch > > > We currently check if a resource is present by making sure that the file > exists locally. There can be a case where the LocalizationTracker thinks that > it has the resource if the file exists but with size 0 or less than the > "expected" size of the LocalResource. This JIRA tracks the change to harden > the isResourcePresent call to address that case. -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
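A minimal sketch of the file-size check being discussed, assuming a plain (non-archive) local file; as the comments above note, archives and directories need more care because their size on disk is not the LocalResource size. The class and method names here are illustrative, not the actual LocalResourcesTrackerImpl change.
{code:java}
import java.io.File;

public final class ResourcePresenceCheck {

  /**
   * Returns true only if the localized file exists and is at least as large
   * as the size recorded for the resource. A plain exists() check would
   * treat a truncated or zero-byte file as present.
   */
  public static boolean isResourcePresent(String localPath, long expectedSize) {
    File file = new File(localPath);
    if (!file.exists()) {
      return false;
    }
    // For simple files the localized copy should be at least the expected size;
    // archives and directories are deliberately out of scope for this sketch.
    return file.length() >= expectedSize;
  }
}
{code}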
[jira] [Commented] (YARN-9564) Create docker-to-squash tool for image conversion
[ https://issues.apache.org/jira/browse/YARN-9564?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16901413#comment-16901413 ] Eric Badger commented on YARN-9564: --- Patch 002 addresses the rest of your review comments, [~eyang]. I still have open questions from your comments #5 and #7 from my previous comment. > Create docker-to-squash tool for image conversion > - > > Key: YARN-9564 > URL: https://issues.apache.org/jira/browse/YARN-9564 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Eric Badger >Assignee: Eric Badger >Priority: Major > Attachments: YARN-9564.001.patch, YARN-9564.002.patch > > > The new runc runtime uses docker images that are converted into multiple > squashfs images. Each layer of the docker image will get its own squashfs > image. We need a tool to help automate the creation of these squashfs images > when all we have is a docker image -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-9564) Create docker-to-squash tool for image conversion
[ https://issues.apache.org/jira/browse/YARN-9564?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eric Badger updated YARN-9564: -- Attachment: YARN-9564.002.patch > Create docker-to-squash tool for image conversion > - > > Key: YARN-9564 > URL: https://issues.apache.org/jira/browse/YARN-9564 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Eric Badger >Assignee: Eric Badger >Priority: Major > Attachments: YARN-9564.001.patch, YARN-9564.002.patch > > > The new runc runtime uses docker images that are converted into multiple > squashfs images. Each layer of the docker image will get its own squashfs > image. We need a tool to help automate the creation of these squashfs images > when all we have is a docker image -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-9559) Create AbstractContainersLauncher for pluggable ContainersLauncher logic
[ https://issues.apache.org/jira/browse/YARN-9559?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16901373#comment-16901373 ] Haibo Chen commented on YARN-9559: -- +1 on the latest patch pending Jenkins. > Create AbstractContainersLauncher for pluggable ContainersLauncher logic > > > Key: YARN-9559 > URL: https://issues.apache.org/jira/browse/YARN-9559 > Project: Hadoop YARN > Issue Type: Task >Reporter: Jonathan Hung >Assignee: Jonathan Hung >Priority: Major > Attachments: YARN-9559.001.patch, YARN-9559.002.patch, > YARN-9559.003.patch, YARN-9559.004.patch, YARN-9559.005.patch > > -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-9724) ERROR SparkContext: Error initializing SparkContext.
[ https://issues.apache.org/jira/browse/YARN-9724?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16901360#comment-16901360 ] Steve Loughran commented on YARN-9724: -- looking @ the spark side of things, it's coming from the line {code} logInfo("Requesting a new application from cluster with %d NodeManagers" .format(yarnClient.getYarnClusterMetrics.getNumNodeManagers)) {code} That is, it's not actually doing much and could probably be reworked to be less brittle. For now, set the logger org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend to log at WARN, not INFO and that line should be skipped. > ERROR SparkContext: Error initializing SparkContext. > > > Key: YARN-9724 > URL: https://issues.apache.org/jira/browse/YARN-9724 > Project: Hadoop YARN > Issue Type: Bug > Components: federation, router, yarn >Affects Versions: 3.0.0, 3.1.0 > Environment: Hadoop:3.1.0 > Spark:2.3.3 >Reporter: panlijie >Priority: Major > Fix For: 3.2.0 > > Attachments: spark.log > > > we have some problemes about hadoop-yarn-federation when we use spark on > yarn-federation > The flowing Error find : > org.apache.commons.lang.NotImplementedException: Code is not implemented > at > org.apache.hadoop.yarn.server.router.clientrm.FederationClientInterceptor.getClusterMetrics(FederationClientInterceptor.java:573) > at > org.apache.hadoop.yarn.server.router.clientrm.RouterClientRMService.getClusterMetrics(RouterClientRMService.java:230) > at > org.apache.hadoop.yarn.api.impl.pb.service.ApplicationClientProtocolPBServiceImpl.getClusterMetrics(ApplicationClientProtocolPBServiceImpl.java:248) > at > org.apache.hadoop.yarn.proto.ApplicationClientProtocol$ApplicationClientProtocolService$2.callBlockingMethod(ApplicationClientProtocol.java:569) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:523) > at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:991) > at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:872) > at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:818) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:422) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1729) > at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2678) > at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) > at > sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62) > at > sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) > at java.lang.reflect.Constructor.newInstance(Constructor.java:423) > at org.apache.hadoop.yarn.ipc.RPCUtil.instantiateException(RPCUtil.java:53) > at > org.apache.hadoop.yarn.ipc.RPCUtil.unwrapAndThrowException(RPCUtil.java:107) > at > org.apache.hadoop.yarn.api.impl.pb.client.ApplicationClientProtocolPBClientImpl.getClusterMetrics(ApplicationClientProtocolPBClientImpl.java:209) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:498) > at > org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:191) > at > org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102) > at 
com.sun.proxy.$Proxy16.getClusterMetrics(Unknown Source) > at > org.apache.hadoop.yarn.client.api.impl.YarnClientImpl.getYarnClusterMetrics(YarnClientImpl.java:487) > at > org.apache.spark.deploy.yarn.Client$$anonfun$submitApplication$1.apply(Client.scala:155) > at > org.apache.spark.deploy.yarn.Client$$anonfun$submitApplication$1.apply(Client.scala:155) > at org.apache.spark.internal.Logging$class.logInfo(Logging.scala:54) > at org.apache.spark.deploy.yarn.Client.logInfo(Client.scala:59) > at org.apache.spark.deploy.yarn.Client.submitApplication(Client.scala:154) > at > org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend.start(YarnClientSchedulerBackend.scala:57) > at > org.apache.spark.scheduler.TaskSchedulerImpl.start(TaskSchedulerImpl.scala:164) > at org.apache.spark.SparkContext.(SparkContext.scala:500) > at org.apache.spark.SparkContext$.getOrCreate(SparkContext.scala:2493) > at > org.apache.spark.sql.SparkSession$Builder$$anonfun$7.apply(SparkSession.scala:934) > at > org.apache.spark.sql.SparkSession$Builder$$anonfun$7.apply(SparkSession.scala:925) > at scala.Option.getOrElse(Option.scala:121) > at > org.apache.spark.sql.SparkSession$Builde
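A concrete way to apply the workaround Steve describes, assuming Spark's standard log4j 1.x conf/log4j.properties. The second logger is an addition based on the stack trace above, which shows the logInfo call coming from org.apache.spark.deploy.yarn.Client.
{code}
# Raise these loggers to WARN so the INFO message (and the
# getYarnClusterMetrics() call inside it) should be skipped. log4j 1.x syntax.
log4j.logger.org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend=WARN
log4j.logger.org.apache.spark.deploy.yarn.Client=WARN
{code}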
[jira] [Commented] (YARN-9559) Create AbstractContainersLauncher for pluggable ContainersLauncher logic
[ https://issues.apache.org/jira/browse/YARN-9559?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16901352#comment-16901352 ] Jonathan Hung commented on YARN-9559: - Thanks Adam! Attached 005 to fix checkstyle > Create AbstractContainersLauncher for pluggable ContainersLauncher logic > > > Key: YARN-9559 > URL: https://issues.apache.org/jira/browse/YARN-9559 > Project: Hadoop YARN > Issue Type: Task >Reporter: Jonathan Hung >Assignee: Jonathan Hung >Priority: Major > Attachments: YARN-9559.001.patch, YARN-9559.002.patch, > YARN-9559.003.patch, YARN-9559.004.patch, YARN-9559.005.patch > > -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-9559) Create AbstractContainersLauncher for pluggable ContainersLauncher logic
[ https://issues.apache.org/jira/browse/YARN-9559?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Hung updated YARN-9559: Attachment: YARN-9559.005.patch > Create AbstractContainersLauncher for pluggable ContainersLauncher logic > > > Key: YARN-9559 > URL: https://issues.apache.org/jira/browse/YARN-9559 > Project: Hadoop YARN > Issue Type: Task >Reporter: Jonathan Hung >Assignee: Jonathan Hung >Priority: Major > Attachments: YARN-9559.001.patch, YARN-9559.002.patch, > YARN-9559.003.patch, YARN-9559.004.patch, YARN-9559.005.patch > > -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-9678) TestGpuResourceHandler / TestFpgaResourceHandler should be renamed
[ https://issues.apache.org/jira/browse/YARN-9678?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16901312#comment-16901312 ] Hudson commented on YARN-9678: -- FAILURE: Integrated in Jenkins build Hadoop-trunk-Commit #17048 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/17048/]) YARN-9678. Addendum: TestGpuResourceHandler / TestFpgaResourceHandler (weichiu: rev 7c2042a44d1cd7e60b911cb40642cdd9c443b076) * (add) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/linux/resources/fpga/TestFpgaResourceHandlerImpl.java * (add) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/linux/resources/gpu/TestGpuResourceHandlerImpl.java > TestGpuResourceHandler / TestFpgaResourceHandler should be renamed > -- > > Key: YARN-9678 > URL: https://issues.apache.org/jira/browse/YARN-9678 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Szilard Nemeth >Assignee: kevin su >Priority: Major > Labels: newbie, newbie++ > Fix For: 3.3.0 > > Attachments: YARN-9678.001.patch > > > Their respective production classes are GpuResourceHandlerImpl and > FpgaResourceHandlerImpl so we are missing the "Impl" from the testcase > classnames. -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-9678) TestGpuResourceHandler / TestFpgaResourceHandler should be renamed
[ https://issues.apache.org/jira/browse/YARN-9678?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16901308#comment-16901308 ] Wei-Chiu Chuang commented on YARN-9678: --- Added an addendum patch since I forgot to add the new files into the commit. It is now updated. > TestGpuResourceHandler / TestFpgaResourceHandler should be renamed > -- > > Key: YARN-9678 > URL: https://issues.apache.org/jira/browse/YARN-9678 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Szilard Nemeth >Assignee: kevin su >Priority: Major > Labels: newbie, newbie++ > Fix For: 3.3.0 > > Attachments: YARN-9678.001.patch > > > Their respective production classes are GpuResourceHandlerImpl and > FpgaResourceHandlerImpl so we are missing the "Impl" from the testcase > classnames. -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-9694) UI always show default-rack for all the nodes while running SLS.
[ https://issues.apache.org/jira/browse/YARN-9694?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16901299#comment-16901299 ] Íñigo Goiri commented on YARN-9694: --- +1 on [^YARN-9694.004.patch]. > UI always show default-rack for all the nodes while running SLS. > > > Key: YARN-9694 > URL: https://issues.apache.org/jira/browse/YARN-9694 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Abhishek Modi >Assignee: Abhishek Modi >Priority: Major > Attachments: YARN-9694.001.patch, YARN-9694.002.patch, > YARN-9694.003.patch, YARN-9694.004.patch > > > Currently, independent of the specification of the nodes in SLS.json or > nodes.json, UI always shows that rack of the node is default-rack. -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-9724) ERROR SparkContext: Error initializing SparkContext.
[ https://issues.apache.org/jira/browse/YARN-9724?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wei-Chiu Chuang updated YARN-9724: -- Issue Type: Bug (was: New Feature) > ERROR SparkContext: Error initializing SparkContext. > > > Key: YARN-9724 > URL: https://issues.apache.org/jira/browse/YARN-9724 > Project: Hadoop YARN > Issue Type: Bug > Components: federation, router, yarn >Affects Versions: 3.0.0, 3.1.0 > Environment: Hadoop:3.1.0 > Spark:2.3.3 >Reporter: panlijie >Priority: Major > Fix For: 3.2.0 > > Attachments: spark.log > > > we have some problemes about hadoop-yarn-federation when we use spark on > yarn-federation > The flowing Error find : > org.apache.commons.lang.NotImplementedException: Code is not implemented > at > org.apache.hadoop.yarn.server.router.clientrm.FederationClientInterceptor.getClusterMetrics(FederationClientInterceptor.java:573) > at > org.apache.hadoop.yarn.server.router.clientrm.RouterClientRMService.getClusterMetrics(RouterClientRMService.java:230) > at > org.apache.hadoop.yarn.api.impl.pb.service.ApplicationClientProtocolPBServiceImpl.getClusterMetrics(ApplicationClientProtocolPBServiceImpl.java:248) > at > org.apache.hadoop.yarn.proto.ApplicationClientProtocol$ApplicationClientProtocolService$2.callBlockingMethod(ApplicationClientProtocol.java:569) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:523) > at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:991) > at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:872) > at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:818) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:422) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1729) > at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2678) > at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) > at > sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62) > at > sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) > at java.lang.reflect.Constructor.newInstance(Constructor.java:423) > at org.apache.hadoop.yarn.ipc.RPCUtil.instantiateException(RPCUtil.java:53) > at > org.apache.hadoop.yarn.ipc.RPCUtil.unwrapAndThrowException(RPCUtil.java:107) > at > org.apache.hadoop.yarn.api.impl.pb.client.ApplicationClientProtocolPBClientImpl.getClusterMetrics(ApplicationClientProtocolPBClientImpl.java:209) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:498) > at > org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:191) > at > org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102) > at com.sun.proxy.$Proxy16.getClusterMetrics(Unknown Source) > at > org.apache.hadoop.yarn.client.api.impl.YarnClientImpl.getYarnClusterMetrics(YarnClientImpl.java:487) > at > org.apache.spark.deploy.yarn.Client$$anonfun$submitApplication$1.apply(Client.scala:155) > at > org.apache.spark.deploy.yarn.Client$$anonfun$submitApplication$1.apply(Client.scala:155) > at org.apache.spark.internal.Logging$class.logInfo(Logging.scala:54) > at org.apache.spark.deploy.yarn.Client.logInfo(Client.scala:59) 
> at org.apache.spark.deploy.yarn.Client.submitApplication(Client.scala:154) > at > org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend.start(YarnClientSchedulerBackend.scala:57) > at > org.apache.spark.scheduler.TaskSchedulerImpl.start(TaskSchedulerImpl.scala:164) > at org.apache.spark.SparkContext.(SparkContext.scala:500) > at org.apache.spark.SparkContext$.getOrCreate(SparkContext.scala:2493) > at > org.apache.spark.sql.SparkSession$Builder$$anonfun$7.apply(SparkSession.scala:934) > at > org.apache.spark.sql.SparkSession$Builder$$anonfun$7.apply(SparkSession.scala:925) > at scala.Option.getOrElse(Option.scala:121) > at > org.apache.spark.sql.SparkSession$Builder.getOrCreate(SparkSession.scala:925) > at org.apache.spark.examples.SparkPi$.main(SparkPi.scala:31) > at org.apache.spark.examples.SparkPi.main(SparkPi.scala) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:498) > at > org.apac
[jira] [Resolved] (YARN-9724) ERROR SparkContext: Error initializing SparkContext.
[ https://issues.apache.org/jira/browse/YARN-9724?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wei-Chiu Chuang resolved YARN-9724. --- Resolution: Duplicate Release Note: (was: has solved) > ERROR SparkContext: Error initializing SparkContext. > > > Key: YARN-9724 > URL: https://issues.apache.org/jira/browse/YARN-9724 > Project: Hadoop YARN > Issue Type: Bug > Components: federation, router, yarn >Affects Versions: 3.0.0, 3.1.0 > Environment: Hadoop:3.1.0 > Spark:2.3.3 >Reporter: panlijie >Priority: Major > Fix For: 3.2.0 > > Attachments: spark.log > > > we have some problemes about hadoop-yarn-federation when we use spark on > yarn-federation > The flowing Error find : > org.apache.commons.lang.NotImplementedException: Code is not implemented > at > org.apache.hadoop.yarn.server.router.clientrm.FederationClientInterceptor.getClusterMetrics(FederationClientInterceptor.java:573) > at > org.apache.hadoop.yarn.server.router.clientrm.RouterClientRMService.getClusterMetrics(RouterClientRMService.java:230) > at > org.apache.hadoop.yarn.api.impl.pb.service.ApplicationClientProtocolPBServiceImpl.getClusterMetrics(ApplicationClientProtocolPBServiceImpl.java:248) > at > org.apache.hadoop.yarn.proto.ApplicationClientProtocol$ApplicationClientProtocolService$2.callBlockingMethod(ApplicationClientProtocol.java:569) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:523) > at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:991) > at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:872) > at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:818) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:422) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1729) > at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2678) > at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) > at > sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62) > at > sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) > at java.lang.reflect.Constructor.newInstance(Constructor.java:423) > at org.apache.hadoop.yarn.ipc.RPCUtil.instantiateException(RPCUtil.java:53) > at > org.apache.hadoop.yarn.ipc.RPCUtil.unwrapAndThrowException(RPCUtil.java:107) > at > org.apache.hadoop.yarn.api.impl.pb.client.ApplicationClientProtocolPBClientImpl.getClusterMetrics(ApplicationClientProtocolPBClientImpl.java:209) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:498) > at > org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:191) > at > org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102) > at com.sun.proxy.$Proxy16.getClusterMetrics(Unknown Source) > at > org.apache.hadoop.yarn.client.api.impl.YarnClientImpl.getYarnClusterMetrics(YarnClientImpl.java:487) > at > org.apache.spark.deploy.yarn.Client$$anonfun$submitApplication$1.apply(Client.scala:155) > at > org.apache.spark.deploy.yarn.Client$$anonfun$submitApplication$1.apply(Client.scala:155) > at org.apache.spark.internal.Logging$class.logInfo(Logging.scala:54) > at 
org.apache.spark.deploy.yarn.Client.logInfo(Client.scala:59) > at org.apache.spark.deploy.yarn.Client.submitApplication(Client.scala:154) > at > org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend.start(YarnClientSchedulerBackend.scala:57) > at > org.apache.spark.scheduler.TaskSchedulerImpl.start(TaskSchedulerImpl.scala:164) > at org.apache.spark.SparkContext.(SparkContext.scala:500) > at org.apache.spark.SparkContext$.getOrCreate(SparkContext.scala:2493) > at > org.apache.spark.sql.SparkSession$Builder$$anonfun$7.apply(SparkSession.scala:934) > at > org.apache.spark.sql.SparkSession$Builder$$anonfun$7.apply(SparkSession.scala:925) > at scala.Option.getOrElse(Option.scala:121) > at > org.apache.spark.sql.SparkSession$Builder.getOrCreate(SparkSession.scala:925) > at org.apache.spark.examples.SparkPi$.main(SparkPi.scala:31) > at org.apache.spark.examples.SparkPi.main(SparkPi.scala) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method
[jira] [Reopened] (YARN-9724) ERROR SparkContext: Error initializing SparkContext.
[ https://issues.apache.org/jira/browse/YARN-9724?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wei-Chiu Chuang reopened YARN-9724: --- > ERROR SparkContext: Error initializing SparkContext. > > > Key: YARN-9724 > URL: https://issues.apache.org/jira/browse/YARN-9724 > Project: Hadoop YARN > Issue Type: Bug > Components: federation, router, yarn >Affects Versions: 3.0.0, 3.1.0 > Environment: Hadoop:3.1.0 > Spark:2.3.3 >Reporter: panlijie >Priority: Major > Fix For: 3.2.0 > > Attachments: spark.log > > > we have some problemes about hadoop-yarn-federation when we use spark on > yarn-federation > The flowing Error find : > org.apache.commons.lang.NotImplementedException: Code is not implemented > at > org.apache.hadoop.yarn.server.router.clientrm.FederationClientInterceptor.getClusterMetrics(FederationClientInterceptor.java:573) > at > org.apache.hadoop.yarn.server.router.clientrm.RouterClientRMService.getClusterMetrics(RouterClientRMService.java:230) > at > org.apache.hadoop.yarn.api.impl.pb.service.ApplicationClientProtocolPBServiceImpl.getClusterMetrics(ApplicationClientProtocolPBServiceImpl.java:248) > at > org.apache.hadoop.yarn.proto.ApplicationClientProtocol$ApplicationClientProtocolService$2.callBlockingMethod(ApplicationClientProtocol.java:569) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:523) > at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:991) > at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:872) > at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:818) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:422) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1729) > at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2678) > at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) > at > sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62) > at > sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) > at java.lang.reflect.Constructor.newInstance(Constructor.java:423) > at org.apache.hadoop.yarn.ipc.RPCUtil.instantiateException(RPCUtil.java:53) > at > org.apache.hadoop.yarn.ipc.RPCUtil.unwrapAndThrowException(RPCUtil.java:107) > at > org.apache.hadoop.yarn.api.impl.pb.client.ApplicationClientProtocolPBClientImpl.getClusterMetrics(ApplicationClientProtocolPBClientImpl.java:209) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:498) > at > org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:191) > at > org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102) > at com.sun.proxy.$Proxy16.getClusterMetrics(Unknown Source) > at > org.apache.hadoop.yarn.client.api.impl.YarnClientImpl.getYarnClusterMetrics(YarnClientImpl.java:487) > at > org.apache.spark.deploy.yarn.Client$$anonfun$submitApplication$1.apply(Client.scala:155) > at > org.apache.spark.deploy.yarn.Client$$anonfun$submitApplication$1.apply(Client.scala:155) > at org.apache.spark.internal.Logging$class.logInfo(Logging.scala:54) > at org.apache.spark.deploy.yarn.Client.logInfo(Client.scala:59) > at 
org.apache.spark.deploy.yarn.Client.submitApplication(Client.scala:154) > at > org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend.start(YarnClientSchedulerBackend.scala:57) > at > org.apache.spark.scheduler.TaskSchedulerImpl.start(TaskSchedulerImpl.scala:164) > at org.apache.spark.SparkContext.(SparkContext.scala:500) > at org.apache.spark.SparkContext$.getOrCreate(SparkContext.scala:2493) > at > org.apache.spark.sql.SparkSession$Builder$$anonfun$7.apply(SparkSession.scala:934) > at > org.apache.spark.sql.SparkSession$Builder$$anonfun$7.apply(SparkSession.scala:925) > at scala.Option.getOrElse(Option.scala:121) > at > org.apache.spark.sql.SparkSession$Builder.getOrCreate(SparkSession.scala:925) > at org.apache.spark.examples.SparkPi$.main(SparkPi.scala:31) > at org.apache.spark.examples.SparkPi.main(SparkPi.scala) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:498) > at > org.apache.spark.deploy.JavaMainApplication.st
[jira] [Commented] (YARN-9678) TestGpuResourceHandler / TestFpgaResourceHandler should be renamed
[ https://issues.apache.org/jira/browse/YARN-9678?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16901220#comment-16901220 ] Hudson commented on YARN-9678: -- FAILURE: Integrated in Jenkins build Hadoop-trunk-Commit #17047 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/17047/]) YARN-9678. TestGpuResourceHandler / TestFpgaResourceHandler should be (weichiu: rev b8bf09ba3d2514ccfa3c6beb4a7530cd2f3555c7) * (delete) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/linux/resources/gpu/TestGpuResourceHandler.java * (delete) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/linux/resources/fpga/TestFpgaResourceHandler.java > TestGpuResourceHandler / TestFpgaResourceHandler should be renamed > -- > > Key: YARN-9678 > URL: https://issues.apache.org/jira/browse/YARN-9678 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Szilard Nemeth >Assignee: kevin su >Priority: Major > Labels: newbie, newbie++ > Attachments: YARN-9678.001.patch > > > Their respective production classes are GpuResourceHandlerImpl and > FpgaResourceHandlerImpl so we are missing the "Impl" from the testcase > classnames. -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-8045) Reduce log output from container status calls
[ https://issues.apache.org/jira/browse/YARN-8045?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16901179#comment-16901179 ] Eric Badger commented on YARN-8045: --- [~shaneku...@gmail.com], can you review the 2.8 patch? > Reduce log output from container status calls > - > > Key: YARN-8045 > URL: https://issues.apache.org/jira/browse/YARN-8045 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Shane Kumpf >Assignee: Craig Condit >Priority: Major > Fix For: 2.10.0, 3.2.0, 3.0.4, 2.8.6, 2.9.3, 3.1.3 > > Attachments: YARN-8045.001-branch-2.8.patch, YARN-8045.001.patch > > > Each time a container's status is returned a log entry is produced in the NM > from {{ContainerManagerImpl}}. The container status includes the diagnostics > field for the container. If the diagnostics field contains an exception, it > can appear as if the exception is logged repeatedly every second. The > diagnostics message can also span many lines, which puts pressure on the logs > and makes it harder to read. > For example: > {code} > 2018-03-17 22:01:11,632 INFO > org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl: > Getting container-status for container_e01_1521323860653_0001_01_05 > 2018-03-17 22:01:11,632 INFO > org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl: > Returning ContainerStatus: [ContainerId: > container_e01_1521323860653_0001_01_05, ExecutionType: GUARANTEED, State: > RUNNING, Capability: , Diagnostics: [2018-03-17 > 22:01:00.675]Exception from container-launch. > Container id: container_e01_1521323860653_0001_01_05 > Exit code: -1 > Exception message: > Shell ouput: > [2018-03-17 22:01:00.750]Diagnostic message from attempt : > [2018-03-17 22:01:00.750]Container exited with a non-zero exit code -1. > , ExitStatus: -1, IP: null, Host: null, ContainerSubState: SCHEDULED] > {code} -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-9559) Create AbstractContainersLauncher for pluggable ContainersLauncher logic
[ https://issues.apache.org/jira/browse/YARN-9559?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16901166#comment-16901166 ] Adam Antal commented on YARN-9559: -- (aside from the checkstyle error) > Create AbstractContainersLauncher for pluggable ContainersLauncher logic > > > Key: YARN-9559 > URL: https://issues.apache.org/jira/browse/YARN-9559 > Project: Hadoop YARN > Issue Type: Task >Reporter: Jonathan Hung >Assignee: Jonathan Hung >Priority: Major > Attachments: YARN-9559.001.patch, YARN-9559.002.patch, > YARN-9559.003.patch, YARN-9559.004.patch > > -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-9559) Create AbstractContainersLauncher for pluggable ContainersLauncher logic
[ https://issues.apache.org/jira/browse/YARN-9559?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16901165#comment-16901165 ] Adam Antal commented on YARN-9559: -- Perfect. +1 (non-binding). > Create AbstractContainersLauncher for pluggable ContainersLauncher logic > > > Key: YARN-9559 > URL: https://issues.apache.org/jira/browse/YARN-9559 > Project: Hadoop YARN > Issue Type: Task >Reporter: Jonathan Hung >Assignee: Jonathan Hung >Priority: Major > Attachments: YARN-9559.001.patch, YARN-9559.002.patch, > YARN-9559.003.patch, YARN-9559.004.patch > > -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-9438) launchTime not written to state store for running applications
[ https://issues.apache.org/jira/browse/YARN-9438?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16901156#comment-16901156 ] Adam Antal commented on YARN-9438: -- Trunk patch LGTM (non-binding). > launchTime not written to state store for running applications > -- > > Key: YARN-9438 > URL: https://issues.apache.org/jira/browse/YARN-9438 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 2.10.0 >Reporter: Jonathan Hung >Assignee: Jonathan Hung >Priority: Major > Attachments: YARN-9438-branch-2.001.patch, > YARN-9438-branch-2.002.patch, YARN-9438.001.patch, YARN-9438.002.patch, > YARN-9438.003.patch > > > launchTime is only saved to state store after application finishes, so if > restart happens, any running applications will have launchTime set as -1 > (since this is the default timestamp of the recovery event). -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-9694) UI always show default-rack for all the nodes while running SLS.
[ https://issues.apache.org/jira/browse/YARN-9694?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16901105#comment-16901105 ] Abhishek Modi commented on YARN-9694: - Tested the latest patch at scale with a 4500-node json and everything worked fine. [~elgoiri], could you please review it? Thanks. > UI always show default-rack for all the nodes while running SLS. > > > Key: YARN-9694 > URL: https://issues.apache.org/jira/browse/YARN-9694 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Abhishek Modi >Assignee: Abhishek Modi >Priority: Major > Attachments: YARN-9694.001.patch, YARN-9694.002.patch, > YARN-9694.003.patch, YARN-9694.004.patch > > > Currently, independent of the specification of the nodes in SLS.json or > nodes.json, UI always shows that rack of the node is default-rack. -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-9410) Typo in documentation: Using FPGA On YARN
[ https://issues.apache.org/jira/browse/YARN-9410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16901075#comment-16901075 ] Hudson commented on YARN-9410: -- FAILURE: Integrated in Jenkins build Hadoop-trunk-Commit #17045 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/17045/]) YARN-9410. Typo in documentation: Using FPGA On YARN (#1220) Contributed (weichiu: rev 1c53ce0cda9b21ec7b56c76a65c7b742491b1a67) * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/markdown/UsingFPGA.md > Typo in documentation: Using FPGA On YARN > -- > > Key: YARN-9410 > URL: https://issues.apache.org/jira/browse/YARN-9410 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Szilard Nemeth >Assignee: kevin su >Priority: Major > Labels: newbie, newbie++ > Fix For: 3.3.0 > > > fpag.major-device-number should be changed to fpga... -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-9678) TestGpuResourceHandler / TestFpgaResourceHandler should be renamed
[ https://issues.apache.org/jira/browse/YARN-9678?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16901066#comment-16901066 ] Wei-Chiu Chuang commented on YARN-9678: --- LGTM +1 > TestGpuResourceHandler / TestFpgaResourceHandler should be renamed > -- > > Key: YARN-9678 > URL: https://issues.apache.org/jira/browse/YARN-9678 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Szilard Nemeth >Assignee: kevin su >Priority: Major > Labels: newbie, newbie++ > Attachments: YARN-9678.001.patch > > > Their respective production classes are GpuResourceHandlerImpl and > FpgaResourceHandlerImpl so we are missing the "Impl" from the testcase > classnames. -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Resolved] (YARN-9571) RM state store purging should be configurable if log aggregation does not terminate
[ https://issues.apache.org/jira/browse/YARN-9571?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Craig Condit resolved YARN-9571. Resolution: Won't Fix Closing as won't fix since it appears we will revert YARN-4946 instead. > RM state store purging should be configurable if log aggregation does not > terminate > --- > > Key: YARN-9571 > URL: https://issues.apache.org/jira/browse/YARN-9571 > Project: Hadoop YARN > Issue Type: Improvement >Affects Versions: 3.2.0 >Reporter: Craig Condit >Assignee: Craig Condit >Priority: Major > Attachments: YARN-9571.001.patch > > > YARN-4946 introduced logic which prevents applications from being removed > from the RM State Store unless log aggregation has reached a terminal state. > However, there are cases where log aggregation may not always complete or > fail cleanly (we've seen instances where it shows as NOT_STARTED on > production clusters), and in this case the state store will continue to build > up large numbers of applications. > We should make this behavior configurable. I propose to add a new > configuration: > {code:java} > yarn.resourcemanager.completed-application-ttl-secs > {code} > This can be used to force removal of an application if it would otherwise be > purged from the state store if not for log aggregation status. If this > configuration is set to a positive value, it will be used in conjunction with > the application finish time to determine whether or not to purge the app. The > app would be removed if the max # of completed applications has been reached > AND either the app has completed more than ttl-secs seconds ago OR the app > has completed log aggregation. For backwards compatibility we can set the > default value of this parameter to zero, which would disable the new logic. -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
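For illustration only, the purge rule described above (never merged, since YARN-4946 is being reverted instead) could be sketched roughly as follows; the method and parameter names are hypothetical, not the actual RMStateStore/RMAppManager API:
{code:java}
// Sketch of the proposed rule: once the completed-app cap is exceeded, an
// app may be purged if log aggregation reached a terminal state OR it
// finished more than ttlSecs ago. ttlSecs <= 0 keeps the YARN-4946 behavior.
boolean shouldPurge(long nowMs, long appFinishTimeMs, boolean logAggregationTerminal,
    int completedApps, int maxCompletedApps, long ttlSecs) {
  if (completedApps <= maxCompletedApps) {
    return false; // still under the cap, keep the app in the state store
  }
  if (ttlSecs <= 0) {
    return logAggregationTerminal; // new logic disabled by default
  }
  boolean olderThanTtl = (nowMs - appFinishTimeMs) / 1000 >= ttlSecs;
  return logAggregationTerminal || olderThanTtl;
}
{code}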
[jira] [Commented] (YARN-4946) RM should not consider an application as COMPLETED when log aggregation is not in a terminal state
[ https://issues.apache.org/jira/browse/YARN-4946?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16901040#comment-16901040 ] Craig Condit commented on YARN-4946: I'd be in favor of reverting, but I will defer to [~wangda]. > RM should not consider an application as COMPLETED when log aggregation is > not in a terminal state > -- > > Key: YARN-4946 > URL: https://issues.apache.org/jira/browse/YARN-4946 > Project: Hadoop YARN > Issue Type: Improvement > Components: log-aggregation >Affects Versions: 2.8.0 >Reporter: Robert Kanter >Assignee: Szilard Nemeth >Priority: Major > Fix For: 3.2.0 > > Attachments: YARN-4946.001.patch, YARN-4946.002.patch, > YARN-4946.003.patch, YARN-4946.004.patch > > > MAPREDUCE-6415 added a tool that combines the aggregated log files for each > Yarn App into a HAR file. When run, it seeds the list by looking at the > aggregated logs directory, and then filters out ineligible apps. One of the > criteria involves checking with the RM that an Application's log aggregation > status is not still running and has not failed. When the RM "forgets" about > an older completed Application (e.g. RM failover, enough time has passed, > etc), the tool won't find the Application in the RM and will just assume that > its log aggregation succeeded, even if it actually failed or is still running. > We can solve this problem by doing the following: > The RM should not consider an app to be fully completed (and thus removed > from its history) until the aggregation status has reached a terminal state > (e.g. SUCCEEDED, FAILED, TIME_OUT). -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Resolved] (YARN-9724) ERROR SparkContext: Error initializing SparkContext.
[ https://issues.apache.org/jira/browse/YARN-9724?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] panlijie resolved YARN-9724. Resolution: Fixed Release Note: has solved Target Version/s: 3.2.0 > ERROR SparkContext: Error initializing SparkContext. > > > Key: YARN-9724 > URL: https://issues.apache.org/jira/browse/YARN-9724 > Project: Hadoop YARN > Issue Type: New Feature > Components: federation, router, yarn >Affects Versions: 3.0.0, 3.1.0 > Environment: Hadoop:3.1.0 > Spark:2.3.3 >Reporter: panlijie >Priority: Major > Fix For: 3.2.0 > > Attachments: spark.log > > > we have some problemes about hadoop-yarn-federation when we use spark on > yarn-federation > The flowing Error find : > org.apache.commons.lang.NotImplementedException: Code is not implemented > at > org.apache.hadoop.yarn.server.router.clientrm.FederationClientInterceptor.getClusterMetrics(FederationClientInterceptor.java:573) > at > org.apache.hadoop.yarn.server.router.clientrm.RouterClientRMService.getClusterMetrics(RouterClientRMService.java:230) > at > org.apache.hadoop.yarn.api.impl.pb.service.ApplicationClientProtocolPBServiceImpl.getClusterMetrics(ApplicationClientProtocolPBServiceImpl.java:248) > at > org.apache.hadoop.yarn.proto.ApplicationClientProtocol$ApplicationClientProtocolService$2.callBlockingMethod(ApplicationClientProtocol.java:569) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:523) > at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:991) > at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:872) > at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:818) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:422) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1729) > at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2678) > at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) > at > sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62) > at > sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) > at java.lang.reflect.Constructor.newInstance(Constructor.java:423) > at org.apache.hadoop.yarn.ipc.RPCUtil.instantiateException(RPCUtil.java:53) > at > org.apache.hadoop.yarn.ipc.RPCUtil.unwrapAndThrowException(RPCUtil.java:107) > at > org.apache.hadoop.yarn.api.impl.pb.client.ApplicationClientProtocolPBClientImpl.getClusterMetrics(ApplicationClientProtocolPBClientImpl.java:209) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:498) > at > org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:191) > at > org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102) > at com.sun.proxy.$Proxy16.getClusterMetrics(Unknown Source) > at > org.apache.hadoop.yarn.client.api.impl.YarnClientImpl.getYarnClusterMetrics(YarnClientImpl.java:487) > at > org.apache.spark.deploy.yarn.Client$$anonfun$submitApplication$1.apply(Client.scala:155) > at > org.apache.spark.deploy.yarn.Client$$anonfun$submitApplication$1.apply(Client.scala:155) > at org.apache.spark.internal.Logging$class.logInfo(Logging.scala:54) > at 
org.apache.spark.deploy.yarn.Client.logInfo(Client.scala:59) > at org.apache.spark.deploy.yarn.Client.submitApplication(Client.scala:154) > at > org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend.start(YarnClientSchedulerBackend.scala:57) > at > org.apache.spark.scheduler.TaskSchedulerImpl.start(TaskSchedulerImpl.scala:164) > at org.apache.spark.SparkContext.(SparkContext.scala:500) > at org.apache.spark.SparkContext$.getOrCreate(SparkContext.scala:2493) > at > org.apache.spark.sql.SparkSession$Builder$$anonfun$7.apply(SparkSession.scala:934) > at > org.apache.spark.sql.SparkSession$Builder$$anonfun$7.apply(SparkSession.scala:925) > at scala.Option.getOrElse(Option.scala:121) > at > org.apache.spark.sql.SparkSession$Builder.getOrCreate(SparkSession.scala:925) > at org.apache.spark.examples.SparkPi$.main(SparkPi.scala:31) > at org.apache.spark.examples.SparkPi.main(SparkPi.scala) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Met
[jira] [Updated] (YARN-9724) ERROR SparkContext: Error initializing SparkContext.
[ https://issues.apache.org/jira/browse/YARN-9724?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] panlijie updated YARN-9724: --- Fix Version/s: 3.2.0 > ERROR SparkContext: Error initializing SparkContext. > > > Key: YARN-9724 > URL: https://issues.apache.org/jira/browse/YARN-9724 > Project: Hadoop YARN > Issue Type: New Feature > Components: federation, router, yarn >Affects Versions: 3.0.0, 3.1.0 > Environment: Hadoop:3.1.0 > Spark:2.3.3 >Reporter: panlijie >Priority: Major > Fix For: 3.2.0 > > Attachments: spark.log > > > we have some problemes about hadoop-yarn-federation when we use spark on > yarn-federation > The flowing Error find : > org.apache.commons.lang.NotImplementedException: Code is not implemented > at > org.apache.hadoop.yarn.server.router.clientrm.FederationClientInterceptor.getClusterMetrics(FederationClientInterceptor.java:573) > at > org.apache.hadoop.yarn.server.router.clientrm.RouterClientRMService.getClusterMetrics(RouterClientRMService.java:230) > at > org.apache.hadoop.yarn.api.impl.pb.service.ApplicationClientProtocolPBServiceImpl.getClusterMetrics(ApplicationClientProtocolPBServiceImpl.java:248) > at > org.apache.hadoop.yarn.proto.ApplicationClientProtocol$ApplicationClientProtocolService$2.callBlockingMethod(ApplicationClientProtocol.java:569) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:523) > at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:991) > at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:872) > at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:818) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:422) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1729) > at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2678) > at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) > at > sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62) > at > sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) > at java.lang.reflect.Constructor.newInstance(Constructor.java:423) > at org.apache.hadoop.yarn.ipc.RPCUtil.instantiateException(RPCUtil.java:53) > at > org.apache.hadoop.yarn.ipc.RPCUtil.unwrapAndThrowException(RPCUtil.java:107) > at > org.apache.hadoop.yarn.api.impl.pb.client.ApplicationClientProtocolPBClientImpl.getClusterMetrics(ApplicationClientProtocolPBClientImpl.java:209) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:498) > at > org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:191) > at > org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102) > at com.sun.proxy.$Proxy16.getClusterMetrics(Unknown Source) > at > org.apache.hadoop.yarn.client.api.impl.YarnClientImpl.getYarnClusterMetrics(YarnClientImpl.java:487) > at > org.apache.spark.deploy.yarn.Client$$anonfun$submitApplication$1.apply(Client.scala:155) > at > org.apache.spark.deploy.yarn.Client$$anonfun$submitApplication$1.apply(Client.scala:155) > at org.apache.spark.internal.Logging$class.logInfo(Logging.scala:54) > at org.apache.spark.deploy.yarn.Client.logInfo(Client.scala:59) > at 
org.apache.spark.deploy.yarn.Client.submitApplication(Client.scala:154) > at > org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend.start(YarnClientSchedulerBackend.scala:57) > at > org.apache.spark.scheduler.TaskSchedulerImpl.start(TaskSchedulerImpl.scala:164) > at org.apache.spark.SparkContext.(SparkContext.scala:500) > at org.apache.spark.SparkContext$.getOrCreate(SparkContext.scala:2493) > at > org.apache.spark.sql.SparkSession$Builder$$anonfun$7.apply(SparkSession.scala:934) > at > org.apache.spark.sql.SparkSession$Builder$$anonfun$7.apply(SparkSession.scala:925) > at scala.Option.getOrElse(Option.scala:121) > at > org.apache.spark.sql.SparkSession$Builder.getOrCreate(SparkSession.scala:925) > at org.apache.spark.examples.SparkPi$.main(SparkPi.scala:31) > at org.apache.spark.examples.SparkPi.main(SparkPi.scala) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:498) > at > org.apache.spark.deploy.JavaM
[jira] [Commented] (YARN-9723) ApplicationPlacementContext is not required for terminated jobs during recovery
[ https://issues.apache.org/jira/browse/YARN-9723?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16900900#comment-16900900 ] Prabhu Joseph commented on YARN-9723: - [~eyang] Can you review this Jira when you get time? This fixes RMAppManager recovery to not call placeApplication for terminated jobs, as ApplicationPlacementContext is not required for those jobs. The failing testcase is not related; it is reported in YARN-9333. > ApplicationPlacementContext is not required for terminated jobs during > recovery > --- > > Key: YARN-9723 > URL: https://issues.apache.org/jira/browse/YARN-9723 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Affects Versions: 3.3.0 >Reporter: Prabhu Joseph >Assignee: Prabhu Joseph >Priority: Major > Attachments: YARN-9723-001.patch > > >Placement of application (RMAppManager.placeApplication) is called for all > the jobs during recovery. This can be ignored for the terminated jobs. > {code} > at > org.apache.hadoop.yarn.server.resourcemanager.placement.AppNameMappingPlacementRule.getPlacementForApp(AppNameMappingPlacementRule.java:193) > at > org.apache.hadoop.yarn.server.resourcemanager.placement.PlacementManager.placeApplication(PlacementManager.java:66) > at > org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.placeApplication(RMAppManager.java:867) > at > org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.createAndPopulateNewRMApp(RMAppManager.java:421) > at > org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recoverApplication(RMAppManager.java:410) > at > org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recover(RMAppManager.java:637) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.recover(ResourceManager.java:1536) > {code} -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
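A minimal sketch of the change being described, for orientation only (the variable and helper names below are hypothetical; the real change is in YARN-9723-001.patch):
{code:java}
// During recovery, skip queue placement for apps that already reached a
// terminal state. isTerminal() is a hypothetical stand-in for checking the
// recovered RMAppState (FINISHED / FAILED / KILLED).
ApplicationPlacementContext placementContext = null;
if (!isRecovery || !isTerminal(recoveredAppState)) {
  placementContext = placeApplication(placementManager, submissionContext, user);
}
// Terminated apps are recovered with a null placement context, avoiding the
// AppNameMappingPlacementRule call seen in the stack trace above.
{code}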
[jira] [Commented] (YARN-9723) ApplicationPlacementContext is not required for terminated jobs during recovery
[ https://issues.apache.org/jira/browse/YARN-9723?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16900875#comment-16900875 ] Hadoop QA commented on YARN-9723: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 23s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:red}-1{color} | {color:red} test4tests {color} | {color:red} 0m 0s{color} | {color:red} The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color} | || || || || {color:brown} trunk Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 17m 8s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 41s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 28s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 49s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 11m 6s{color} | {color:green} branch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 16s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 32s{color} | {color:green} trunk passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 43s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 43s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 43s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 29s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 47s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 11m 30s{color} | {color:green} patch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 21s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 24s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:red}-1{color} | {color:red} unit {color} | {color:red} 78m 50s{color} | {color:red} hadoop-yarn-server-resourcemanager in the patch failed. 
{color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 24s{color} | {color:green} The patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black}127m 25s{color} | {color:black} {color} | \\ \\ || Reason || Tests || | Failed junit tests | hadoop.yarn.server.resourcemanager.scheduler.fair.TestFairSchedulerPreemption | \\ \\ || Subsystem || Report/Notes || | Docker | Client=19.03.1 Server=19.03.1 Image:yetus/hadoop:bdbca0e | | JIRA Issue | YARN-9723 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12976795/YARN-9723-001.patch | | Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle | | uname | Linux b2271482e27f 4.4.0-139-generic #165-Ubuntu SMP Wed Oct 24 10:58:50 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/patchprocess/precommit/personality/provided.sh | | git revision | trunk / 1127215 | | maven | version: Apache Maven 3.3.9 | | Default Java | 1.8.0_212 | | findbugs | v3.1.0-RC1 | | unit | https://builds.apache.org/job/PreCommit-YARN-Build/24481/artifact/out/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/24481/testReport/ | | Max. process+thread count | 888 (vs. ulimit of 1) |
[jira] [Commented] (YARN-9724) ERROR SparkContext: Error initializing SparkContext.
[ https://issues.apache.org/jira/browse/YARN-9724?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16900836#comment-16900836 ] panlijie commented on YARN-9724: [~Prabhu Joseph] Thank you , I'll keep care for this update > ERROR SparkContext: Error initializing SparkContext. > > > Key: YARN-9724 > URL: https://issues.apache.org/jira/browse/YARN-9724 > Project: Hadoop YARN > Issue Type: New Feature > Components: federation, router, yarn >Affects Versions: 3.0.0, 3.1.0 > Environment: Hadoop:3.1.0 > Spark:2.3.3 >Reporter: panlijie >Priority: Major > Attachments: spark.log > > > we have some problemes about hadoop-yarn-federation when we use spark on > yarn-federation > The flowing Error find : > org.apache.commons.lang.NotImplementedException: Code is not implemented > at > org.apache.hadoop.yarn.server.router.clientrm.FederationClientInterceptor.getClusterMetrics(FederationClientInterceptor.java:573) > at > org.apache.hadoop.yarn.server.router.clientrm.RouterClientRMService.getClusterMetrics(RouterClientRMService.java:230) > at > org.apache.hadoop.yarn.api.impl.pb.service.ApplicationClientProtocolPBServiceImpl.getClusterMetrics(ApplicationClientProtocolPBServiceImpl.java:248) > at > org.apache.hadoop.yarn.proto.ApplicationClientProtocol$ApplicationClientProtocolService$2.callBlockingMethod(ApplicationClientProtocol.java:569) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:523) > at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:991) > at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:872) > at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:818) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:422) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1729) > at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2678) > at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) > at > sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62) > at > sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) > at java.lang.reflect.Constructor.newInstance(Constructor.java:423) > at org.apache.hadoop.yarn.ipc.RPCUtil.instantiateException(RPCUtil.java:53) > at > org.apache.hadoop.yarn.ipc.RPCUtil.unwrapAndThrowException(RPCUtil.java:107) > at > org.apache.hadoop.yarn.api.impl.pb.client.ApplicationClientProtocolPBClientImpl.getClusterMetrics(ApplicationClientProtocolPBClientImpl.java:209) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:498) > at > org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:191) > at > org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102) > at com.sun.proxy.$Proxy16.getClusterMetrics(Unknown Source) > at > org.apache.hadoop.yarn.client.api.impl.YarnClientImpl.getYarnClusterMetrics(YarnClientImpl.java:487) > at > org.apache.spark.deploy.yarn.Client$$anonfun$submitApplication$1.apply(Client.scala:155) > at > org.apache.spark.deploy.yarn.Client$$anonfun$submitApplication$1.apply(Client.scala:155) > at org.apache.spark.internal.Logging$class.logInfo(Logging.scala:54) > at 
org.apache.spark.deploy.yarn.Client.logInfo(Client.scala:59) > at org.apache.spark.deploy.yarn.Client.submitApplication(Client.scala:154) > at > org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend.start(YarnClientSchedulerBackend.scala:57) > at > org.apache.spark.scheduler.TaskSchedulerImpl.start(TaskSchedulerImpl.scala:164) > at org.apache.spark.SparkContext.(SparkContext.scala:500) > at org.apache.spark.SparkContext$.getOrCreate(SparkContext.scala:2493) > at > org.apache.spark.sql.SparkSession$Builder$$anonfun$7.apply(SparkSession.scala:934) > at > org.apache.spark.sql.SparkSession$Builder$$anonfun$7.apply(SparkSession.scala:925) > at scala.Option.getOrElse(Option.scala:121) > at > org.apache.spark.sql.SparkSession$Builder.getOrCreate(SparkSession.scala:925) > at org.apache.spark.examples.SparkPi$.main(SparkPi.scala:31) > at org.apache.spark.examples.SparkPi.main(SparkPi.scala) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.
[jira] [Commented] (YARN-9724) ERROR SparkContext: Error initializing SparkContext.
[ https://issues.apache.org/jira/browse/YARN-9724?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16900821#comment-16900821 ] Prabhu Joseph commented on YARN-9724: - [~panlijie] This will be fixed by YARN-8699 which has implemented the getClusterMetrics in FederationClientInterceptor. > ERROR SparkContext: Error initializing SparkContext. > > > Key: YARN-9724 > URL: https://issues.apache.org/jira/browse/YARN-9724 > Project: Hadoop YARN > Issue Type: New Feature > Components: federation, router, yarn >Affects Versions: 3.0.0, 3.1.0 > Environment: Hadoop:3.1.0 > Spark:2.3.3 >Reporter: panlijie >Priority: Major > Attachments: spark.log > > > we have some problemes about hadoop-yarn-federation when we use spark on > yarn-federation > The flowing Error find : > org.apache.commons.lang.NotImplementedException: Code is not implemented > at > org.apache.hadoop.yarn.server.router.clientrm.FederationClientInterceptor.getClusterMetrics(FederationClientInterceptor.java:573) > at > org.apache.hadoop.yarn.server.router.clientrm.RouterClientRMService.getClusterMetrics(RouterClientRMService.java:230) > at > org.apache.hadoop.yarn.api.impl.pb.service.ApplicationClientProtocolPBServiceImpl.getClusterMetrics(ApplicationClientProtocolPBServiceImpl.java:248) > at > org.apache.hadoop.yarn.proto.ApplicationClientProtocol$ApplicationClientProtocolService$2.callBlockingMethod(ApplicationClientProtocol.java:569) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:523) > at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:991) > at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:872) > at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:818) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:422) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1729) > at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2678) > at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) > at > sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62) > at > sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) > at java.lang.reflect.Constructor.newInstance(Constructor.java:423) > at org.apache.hadoop.yarn.ipc.RPCUtil.instantiateException(RPCUtil.java:53) > at > org.apache.hadoop.yarn.ipc.RPCUtil.unwrapAndThrowException(RPCUtil.java:107) > at > org.apache.hadoop.yarn.api.impl.pb.client.ApplicationClientProtocolPBClientImpl.getClusterMetrics(ApplicationClientProtocolPBClientImpl.java:209) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:498) > at > org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:191) > at > org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102) > at com.sun.proxy.$Proxy16.getClusterMetrics(Unknown Source) > at > org.apache.hadoop.yarn.client.api.impl.YarnClientImpl.getYarnClusterMetrics(YarnClientImpl.java:487) > at > org.apache.spark.deploy.yarn.Client$$anonfun$submitApplication$1.apply(Client.scala:155) > at > org.apache.spark.deploy.yarn.Client$$anonfun$submitApplication$1.apply(Client.scala:155) > at 
org.apache.spark.internal.Logging$class.logInfo(Logging.scala:54) > at org.apache.spark.deploy.yarn.Client.logInfo(Client.scala:59) > at org.apache.spark.deploy.yarn.Client.submitApplication(Client.scala:154) > at > org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend.start(YarnClientSchedulerBackend.scala:57) > at > org.apache.spark.scheduler.TaskSchedulerImpl.start(TaskSchedulerImpl.scala:164) > at org.apache.spark.SparkContext.(SparkContext.scala:500) > at org.apache.spark.SparkContext$.getOrCreate(SparkContext.scala:2493) > at > org.apache.spark.sql.SparkSession$Builder$$anonfun$7.apply(SparkSession.scala:934) > at > org.apache.spark.sql.SparkSession$Builder$$anonfun$7.apply(SparkSession.scala:925) > at scala.Option.getOrElse(Option.scala:121) > at > org.apache.spark.sql.SparkSession$Builder.getOrCreate(SparkSession.scala:925) > at org.apache.spark.examples.SparkPi$.main(SparkPi.scala:31) > at org.apache.spark.examples.SparkPi.main(SparkPi.scala) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invok
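The fix referenced in Prabhu Joseph's comment above (YARN-8699) implements getClusterMetrics in FederationClientInterceptor so the Router no longer throws NotImplementedException. As a rough, method-level sketch of the idea only, and not the actual patch, the Router can query each active subcluster's ResourceManager and merge the per-subcluster metrics; the helpers federationFacade and getClientRMProxyForSubCluster below are assumed names, and a complete implementation would merge every metric field and tolerate unreachable subclusters.

{code}
// Hedged sketch only -- not the YARN-8699 patch. It illustrates answering
// getClusterMetrics at the Router by merging per-subcluster results instead of
// throwing NotImplementedException. federationFacade and
// getClientRMProxyForSubCluster are assumed helpers of the interceptor.
@Override
public GetClusterMetricsResponse getClusterMetrics(GetClusterMetricsRequest request)
    throws YarnException, IOException {
  int totalNodeManagers = 0;
  // Ask every active subcluster's RM for its metrics and sum them up.
  for (SubClusterInfo subCluster : federationFacade.getSubClusters(true).values()) {
    ApplicationClientProtocol rm =
        getClientRMProxyForSubCluster(subCluster.getSubClusterId());
    YarnClusterMetrics metrics = rm.getClusterMetrics(request).getClusterMetrics();
    totalNodeManagers += metrics.getNumNodeManagers();
  }
  // A complete implementation would merge all metric fields, not only the
  // NodeManager count, and handle per-subcluster failures.
  return GetClusterMetricsResponse.newInstance(
      YarnClusterMetrics.newInstance(totalNodeManagers));
}
{code}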
[jira] [Updated] (YARN-9724) ERROR SparkContext: Error initializing SparkContext.
[ https://issues.apache.org/jira/browse/YARN-9724?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] panlijie updated YARN-9724: --- Description: we have some problems with hadoop-yarn-federation when we use Spark on yarn-federation. The following error is found: org.apache.commons.lang.NotImplementedException: Code is not implemented at org.apache.hadoop.yarn.server.router.clientrm.FederationClientInterceptor.getClusterMetrics(FederationClientInterceptor.java:573) at org.apache.hadoop.yarn.server.router.clientrm.RouterClientRMService.getClusterMetrics(RouterClientRMService.java:230) at org.apache.hadoop.yarn.api.impl.pb.service.ApplicationClientProtocolPBServiceImpl.getClusterMetrics(ApplicationClientProtocolPBServiceImpl.java:248) at org.apache.hadoop.yarn.proto.ApplicationClientProtocol$ApplicationClientProtocolService$2.callBlockingMethod(ApplicationClientProtocol.java:569) at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:523) at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:991) at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:872) at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:818) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:422) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1729) at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2678) at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62) at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) at java.lang.reflect.Constructor.newInstance(Constructor.java:423) at org.apache.hadoop.yarn.ipc.RPCUtil.instantiateException(RPCUtil.java:53) at org.apache.hadoop.yarn.ipc.RPCUtil.unwrapAndThrowException(RPCUtil.java:107) at org.apache.hadoop.yarn.api.impl.pb.client.ApplicationClientProtocolPBClientImpl.getClusterMetrics(ApplicationClientProtocolPBClientImpl.java:209) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:498) at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:191) at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102) at com.sun.proxy.$Proxy16.getClusterMetrics(Unknown Source) at org.apache.hadoop.yarn.client.api.impl.YarnClientImpl.getYarnClusterMetrics(YarnClientImpl.java:487) at org.apache.spark.deploy.yarn.Client$$anonfun$submitApplication$1.apply(Client.scala:155) at org.apache.spark.deploy.yarn.Client$$anonfun$submitApplication$1.apply(Client.scala:155) at org.apache.spark.internal.Logging$class.logInfo(Logging.scala:54) at org.apache.spark.deploy.yarn.Client.logInfo(Client.scala:59) at org.apache.spark.deploy.yarn.Client.submitApplication(Client.scala:154) at org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend.start(YarnClientSchedulerBackend.scala:57) at org.apache.spark.scheduler.TaskSchedulerImpl.start(TaskSchedulerImpl.scala:164) at org.apache.spark.SparkContext.(SparkContext.scala:500) at org.apache.spark.SparkContext$.getOrCreate(SparkContext.scala:2493) at org.apache.spark.sql.SparkSession$Builder$$anonfun$7.apply(SparkSession.scala:934) at 
org.apache.spark.sql.SparkSession$Builder$$anonfun$7.apply(SparkSession.scala:925) at scala.Option.getOrElse(Option.scala:121) at org.apache.spark.sql.SparkSession$Builder.getOrCreate(SparkSession.scala:925) at org.apache.spark.examples.SparkPi$.main(SparkPi.scala:31) at org.apache.spark.examples.SparkPi.main(SparkPi.scala) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:498) at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52) at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:894) at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:198) at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:228) at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:137) at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala) was: In our HDFS and YARN Federation deploy. we run sparkDemo , The flowing Error find : org.apache.commons.lang.NotImplementedException: Code is not implemented at org.apache.hadoop.yarn.server.router.clientrm.FederationClientInterceptor.getClusterMetrics(Feder
[jira] [Created] (YARN-9724) ERROR SparkContext: Error initializing SparkContext.
panlijie created YARN-9724: -- Summary: ERROR SparkContext: Error initializing SparkContext. Key: YARN-9724 URL: https://issues.apache.org/jira/browse/YARN-9724 Project: Hadoop YARN Issue Type: New Feature Components: federation, router, yarn Affects Versions: 3.1.0, 3.0.0 Environment: Hadoop:3.1.0 Spark:2.3.3 Reporter: panlijie Attachments: spark.log In our HDFS and YARN Federation deployment, we run a Spark demo and the following error is found: org.apache.commons.lang.NotImplementedException: Code is not implemented at org.apache.hadoop.yarn.server.router.clientrm.FederationClientInterceptor.getClusterMetrics(FederationClientInterceptor.java:573) at org.apache.hadoop.yarn.server.router.clientrm.RouterClientRMService.getClusterMetrics(RouterClientRMService.java:230) at org.apache.hadoop.yarn.api.impl.pb.service.ApplicationClientProtocolPBServiceImpl.getClusterMetrics(ApplicationClientProtocolPBServiceImpl.java:248) at org.apache.hadoop.yarn.proto.ApplicationClientProtocol$ApplicationClientProtocolService$2.callBlockingMethod(ApplicationClientProtocol.java:569) at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:523) at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:991) at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:872) at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:818) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:422) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1729) at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2678) at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62) at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) at java.lang.reflect.Constructor.newInstance(Constructor.java:423) at org.apache.hadoop.yarn.ipc.RPCUtil.instantiateException(RPCUtil.java:53) at org.apache.hadoop.yarn.ipc.RPCUtil.unwrapAndThrowException(RPCUtil.java:107) at org.apache.hadoop.yarn.api.impl.pb.client.ApplicationClientProtocolPBClientImpl.getClusterMetrics(ApplicationClientProtocolPBClientImpl.java:209) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:498) at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:191) at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102) at com.sun.proxy.$Proxy16.getClusterMetrics(Unknown Source) at org.apache.hadoop.yarn.client.api.impl.YarnClientImpl.getYarnClusterMetrics(YarnClientImpl.java:487) at org.apache.spark.deploy.yarn.Client$$anonfun$submitApplication$1.apply(Client.scala:155) at org.apache.spark.deploy.yarn.Client$$anonfun$submitApplication$1.apply(Client.scala:155) at org.apache.spark.internal.Logging$class.logInfo(Logging.scala:54) at org.apache.spark.deploy.yarn.Client.logInfo(Client.scala:59) at org.apache.spark.deploy.yarn.Client.submitApplication(Client.scala:154) at org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend.start(YarnClientSchedulerBackend.scala:57) at org.apache.spark.scheduler.TaskSchedulerImpl.start(TaskSchedulerImpl.scala:164) at org.apache.spark.SparkContext.(SparkContext.scala:500) at 
org.apache.spark.SparkContext$.getOrCreate(SparkContext.scala:2493) at org.apache.spark.sql.SparkSession$Builder$$anonfun$7.apply(SparkSession.scala:934) at org.apache.spark.sql.SparkSession$Builder$$anonfun$7.apply(SparkSession.scala:925) at scala.Option.getOrElse(Option.scala:121) at org.apache.spark.sql.SparkSession$Builder.getOrCreate(SparkSession.scala:925) at org.apache.spark.examples.SparkPi$.main(SparkPi.scala:31) at org.apache.spark.examples.SparkPi.main(SparkPi.scala) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:498) at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52) at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:894) at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:198) at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:228) at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:137) at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala) -
[jira] [Updated] (YARN-9723) ApplicationPlacementContext is not required for terminated jobs during recovery
[ https://issues.apache.org/jira/browse/YARN-9723?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prabhu Joseph updated YARN-9723: Attachment: YARN-9723-001.patch > ApplicationPlacementContext is not required for terminated jobs during > recovery > --- > > Key: YARN-9723 > URL: https://issues.apache.org/jira/browse/YARN-9723 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Affects Versions: 3.3.0 >Reporter: Prabhu Joseph >Assignee: Prabhu Joseph >Priority: Major > Attachments: YARN-9723-001.patch > > >Placement of application (RMAppManager.placeApplication) is called for all > the jobs during recovery. This can be ignored for the terminated jobs. > {code} > at > org.apache.hadoop.yarn.server.resourcemanager.placement.AppNameMappingPlacementRule.getPlacementForApp(AppNameMappingPlacementRule.java:193) > at > org.apache.hadoop.yarn.server.resourcemanager.placement.PlacementManager.placeApplication(PlacementManager.java:66) > at > org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.placeApplication(RMAppManager.java:867) > at > org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.createAndPopulateNewRMApp(RMAppManager.java:421) > at > org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recoverApplication(RMAppManager.java:410) > at > org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recover(RMAppManager.java:637) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.recover(ResourceManager.java:1536) > {code} -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
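The idea behind the change can be pictured with the hedged sketch below; it is not the attached YARN-9723-001.patch, and the helper name placeIfStillActive as well as the exact signatures are assumptions. Applications recovered in a terminal state (FINISHED, FAILED, KILLED) simply skip the placement-rule evaluation that live applications still go through.

{code}
// Hedged illustration of the YARN-9723 idea, not the attached patch.
// Helper name and signatures are assumptions for readability.
import org.apache.hadoop.yarn.api.records.ApplicationSubmissionContext;
import org.apache.hadoop.yarn.exceptions.YarnException;
import org.apache.hadoop.yarn.server.resourcemanager.placement.ApplicationPlacementContext;
import org.apache.hadoop.yarn.server.resourcemanager.placement.PlacementManager;
import org.apache.hadoop.yarn.server.resourcemanager.recovery.records.ApplicationStateData;
import org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppState;

final class RecoveryPlacementSketch {

  private RecoveryPlacementSketch() {
  }

  /**
   * Runs the placement rules only for applications that are still live when
   * recovered; terminated jobs return null and skip the placeApplication call
   * shown in the stack trace above.
   */
  static ApplicationPlacementContext placeIfStillActive(
      PlacementManager placementManager, ApplicationStateData appState,
      ApplicationSubmissionContext submissionContext, String user)
      throws YarnException {
    RMAppState recoveredState = appState.getState();
    if (recoveredState == RMAppState.FINISHED
        || recoveredState == RMAppState.FAILED
        || recoveredState == RMAppState.KILLED) {
      // The queue placement of a finished job is never used again.
      return null;
    }
    // Live job: evaluate the configured placement rules as usual.
    return placementManager.placeApplication(submissionContext, user);
  }
}
{code}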
[jira] [Created] (YARN-9723) ApplicationPlacementContext is not required for terminated jobs during recovery
Prabhu Joseph created YARN-9723: --- Summary: ApplicationPlacementContext is not required for terminated jobs during recovery Key: YARN-9723 URL: https://issues.apache.org/jira/browse/YARN-9723 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 3.3.0 Reporter: Prabhu Joseph Assignee: Prabhu Joseph Placement of application (RMAppManager.placeApplication) is called for all the jobs during recovery. This can be ignored for the terminated jobs. {code} at org.apache.hadoop.yarn.server.resourcemanager.placement.AppNameMappingPlacementRule.getPlacementForApp(AppNameMappingPlacementRule.java:193) at org.apache.hadoop.yarn.server.resourcemanager.placement.PlacementManager.placeApplication(PlacementManager.java:66) at org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.placeApplication(RMAppManager.java:867) at org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.createAndPopulateNewRMApp(RMAppManager.java:421) at org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recoverApplication(RMAppManager.java:410) at org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recover(RMAppManager.java:637) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.recover(ResourceManager.java:1536) {code} -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-9721) An easy method to exclude a nodemanager from the yarn cluster cleanly
[ https://issues.apache.org/jira/browse/YARN-9721?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16900727#comment-16900727 ] Zac Zhou commented on YARN-9721: [~sunilg] Thanks a lot for your comments~ Maybe one of the following methods could be used to clean up the inactive list (a rough configuration sketch of option 2 follows this message). # Add a parameter like "--prune-nodes" to the command "rmadmin -refreshNodes". A flag named something like "prunable" can be added to RMNodes; when "rmadmin -refreshNodes --prune-nodes" is executed, the flag is set to true and the flagged RMNodes are deleted by removalTimer. # Add a time-period parameter to the yarn configuration. If an RMNode stays in the inactive list longer than that period, delete it. # Add a boolean parameter to the yarn configuration. If the parameter is true, delete the RMNodes from the inactive list directly. [~sunilg], [~leftnoteasy], [~cheersyang], [~tangzhankun] Any Ideas~ > An easy method to exclude a nodemanager from the yarn cluster cleanly > - > > Key: YARN-9721 > URL: https://issues.apache.org/jira/browse/YARN-9721 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Zac Zhou >Priority: Major > Attachments: decommission nodes.png > > > If we want to take a nodemanager server offline, the nodes.exclude-path > and the "rmadmin -refreshNodes" command are used to decommission the server. > But this method cannot clean up the node completely. Nodemanager servers are > still listed under Decommissioned Nodes, as the attachment shows. > !decommission nodes.png! > YARN-4311 enabled a removalTimer to clean up untracked nodes. > But the logic of the isUntrackedNode method is too restrictive. If include-path is > not used, no servers can meet the criteria, and using an include file would introduce > a maintenance risk. > If the yarn cluster is installed on a cloud, nodemanager servers are created and > deleted frequently. We need a way to exclude a nodemanager from the yarn > cluster cleanly. Otherwise, the map of rmContext.getInactiveRMNodes() would > keep growing, which would cause a memory issue in the RM. -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
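To make option 2 in the comment above concrete, the following self-contained sketch shows time-based pruning of an inactive-node map. It is illustrative only: the class, the hypothetical expiry property it mentions, and the map are stand-ins for rmContext.getInactiveRMNodes() and a new YARN configuration setting, not ResourceManager code.

{code}
// Rough sketch of option 2 above, assuming a hypothetical property
// (e.g. yarn.resourcemanager.inactive-node.expiry-ms) and a simplified node map.
// It is not RM code; it only illustrates time-based pruning of inactive nodes.
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

public class InactiveNodePruner {

  /** How long an inactive (e.g. decommissioned) node entry is kept, in ms. */
  private final long expiryMs;
  /** nodeId -> time the node became inactive; stands in for rmContext.getInactiveRMNodes(). */
  private final Map<String, Long> inactiveSince = new ConcurrentHashMap<>();
  private final ScheduledExecutorService scheduler =
      Executors.newSingleThreadScheduledExecutor();

  public InactiveNodePruner(long expiryMs) {
    this.expiryMs = expiryMs;
    // Periodically drop entries older than the configured expiry, so the
    // inactive map cannot grow without bound on an elastic cluster.
    scheduler.scheduleWithFixedDelay(this::prune, expiryMs, expiryMs, TimeUnit.MILLISECONDS);
  }

  public void nodeBecameInactive(String nodeId) {
    inactiveSince.put(nodeId, System.currentTimeMillis());
  }

  private void prune() {
    long cutoff = System.currentTimeMillis() - expiryMs;
    inactiveSince.entrySet().removeIf(e -> e.getValue() < cutoff);
  }

  public void stop() {
    scheduler.shutdownNow();
  }
}
{code}

With an approach along these lines, a cloud cluster that creates and deletes NodeManagers frequently would keep the inactive map bounded without requiring an include file.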
[jira] [Assigned] (YARN-9715) YARN UI2 - yarn-container-log support for https Knox Gateway url
[ https://issues.apache.org/jira/browse/YARN-9715?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prabhu Joseph reassigned YARN-9715: --- Assignee: Akhil PB > YARN UI2 - yarn-container-log support for https Knox Gateway url > > > Key: YARN-9715 > URL: https://issues.apache.org/jira/browse/YARN-9715 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Prabhu Joseph >Assignee: Akhil PB >Priority: Major > > Currently yarn-container-log (UI2 - Nodes - List of Containers - log file) > creates url with node scheme (http) and nodeHttpAddress. This does not work > with Knox Gateway https url. The logic to construct url can be improved to > accept both normal and knox case. The similar way is used in Applications -> > Logs Section. > And also UI2 - Nodes - List of Containers - log file does not have pagination > support for log file. -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org