[jira] [Commented] (YARN-10001) Add explanation of unimplemented methods in InMemoryConfigurationStore

2020-04-07 Thread Siddharth Ahuja (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10001?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17077891#comment-17077891
 ] 

Siddharth Ahuja commented on YARN-10001:


Hmm, the build at https://builds.apache.org/job/PreCommit-YARN-Build/25822/ has 
been running for 5 hr 43 min. It seems stuck at the unit tests. Any idea if this 
has been seen before?

> Add explanation of unimplemented methods in InMemoryConfigurationStore
> --
>
> Key: YARN-10001
> URL: https://issues.apache.org/jira/browse/YARN-10001
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Szilard Nemeth
>Assignee: Siddharth Ahuja
>Priority: Major
> Fix For: 3.3.0, 3.4.0
>
> Attachments: YARN-10001-branch-3.2.003.patch, YARN-10001.001.patch, 
> YARN-10001.002.patch
>
>







[jira] [Resolved] (YARN-10128) [FederationSecurity] YARN RMAdmin commands fail when Authorization is enabled on router

2020-04-07 Thread Bilwa S T (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-10128?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bilwa S T resolved YARN-10128.
--
Resolution: Fixed

> [FederationSecurity] YARN RMAdmin commands fail when Authorization is enabled 
> on router
> ---
>
> Key: YARN-10128
> URL: https://issues.apache.org/jira/browse/YARN-10128
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Bilwa S T
>Assignee: Bilwa S T
>Priority: Major
>
> Exception thrown is 
> {quote}Protocol interface 
> org.apache.hadoop.yarn.server.api.ResourceManagerAdministrationProtocolPB is 
> not known., while invoking 
> ResourceManagerAdministrationProtocolPBClientImpl.refreshQueues over rm2 
> after 1 failover attempts. Trying to failover after sleeping for 44717ms.
> {quote}






[jira] [Commented] (YARN-10128) [FederationSecurity] YARN RMAdmin commands fail when Authorization is enabled on router

2020-04-07 Thread Bilwa S T (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10128?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17077862#comment-17077862
 ] 

Bilwa S T commented on YARN-10128:
--

Handled as part of Jira YARN-10110

> [FederationSecurity] YARN RMAdmin commands fail when Authorization is enabled 
> on router
> ---
>
> Key: YARN-10128
> URL: https://issues.apache.org/jira/browse/YARN-10128
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Bilwa S T
>Assignee: Bilwa S T
>Priority: Major
>
> Exception thrown is 
> {quote}Protocol interface 
> org.apache.hadoop.yarn.server.api.ResourceManagerAdministrationProtocolPB is 
> not known., while invoking 
> ResourceManagerAdministrationProtocolPBClientImpl.refreshQueues over rm2 
> after 1 failover attempts. Trying to failover after sleeping for 44717ms.
> {quote}






[jira] [Assigned] (YARN-9306) Detect docker image existence during container launch

2020-04-07 Thread Bilwa S T (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-9306?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bilwa S T reassigned YARN-9306:
---

Assignee: (was: Bilwa S T)

> Detect docker image existence during container launch
> -
>
> Key: YARN-9306
> URL: https://issues.apache.org/jira/browse/YARN-9306
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Eric Yang
>Priority: Major
>
> It would be good to check the yarn.nodemanager.runtime.linux.docker.image-update 
> flag. When the flag is false and the Docker image doesn't exist in the local 
> Docker cache, the container launch should abort and be retried on another node.
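
For illustration only (this is not code from the YARN tree), the behaviour 
described above could be sketched as follows; the configuration key is the one 
named in the description, while the class, method and parameter names are 
hypothetical:
{code}
// Hypothetical sketch of the proposed check; only the configuration key
// comes from the JIRA text above, everything else is illustrative.
public final class DockerImageCheckSketch {

  static final String IMAGE_UPDATE_KEY =
      "yarn.nodemanager.runtime.linux.docker.image-update";

  /**
   * Decide whether a container launch should proceed on this node. If
   * automatic image updates are disabled (assumed default: false) and the
   * image is not in the local Docker cache, abort here so the container can
   * be retried on another node.
   */
  static boolean shouldLaunchLocally(org.apache.hadoop.conf.Configuration conf,
      boolean imageInLocalCache) {
    boolean imageUpdateEnabled = conf.getBoolean(IMAGE_UPDATE_KEY, false);
    if (!imageUpdateEnabled && !imageInLocalCache) {
      return false; // abort the launch; let the scheduler try another node
    }
    return true;
  }
}
{code}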






[jira] [Resolved] (YARN-10063) Usage output of container-executor binary needs to include --http/--https argument

2020-04-07 Thread Wilfred Spiegelenburg (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-10063?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wilfred Spiegelenburg resolved YARN-10063.
--
Fix Version/s: 3.4.0
   3.3.0
   Resolution: Fixed

Committed to 3.3.

Removed the backport to the 3.2 branch: YARN-6586 was not backported to 3.2.

> Usage output of container-executor binary needs to include --http/--https 
> argument
> --
>
> Key: YARN-10063
> URL: https://issues.apache.org/jira/browse/YARN-10063
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Siddharth Ahuja
>Assignee: Siddharth Ahuja
>Priority: Minor
> Fix For: 3.3.0, 3.4.0
>
> Attachments: YARN-10063.001.patch, YARN-10063.002.patch, 
> YARN-10063.003.patch, YARN-10063.004.patch
>
>
> YARN-8448/YARN-6586 appear to have introduced two new options - "\--http" 
> (default) and "\--https" - that can be passed to the container-executor 
> binary, see:
> https://github.com/apache/hadoop/blob/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/native/container-executor/impl/main.c#L564
> and 
> https://github.com/apache/hadoop/blob/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/native/container-executor/impl/main.c#L521
> However, the usage output does not mention them:
> https://github.com/apache/hadoop/blob/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/native/container-executor/impl/main.c#L74
> Raising this JIRA to improve this.






[jira] [Commented] (YARN-10063) Usage output of container-executor binary needs to include --http/--https argument

2020-04-07 Thread Hudson (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10063?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17077739#comment-17077739
 ] 

Hudson commented on YARN-10063:
---

SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #18126 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/18126/])
YARN-10063. Add container-executor arguments --http/--https to usage. 
(wilfreds: rev 2214005c0f11955b2c50c4d2d4bd14947dd797ba)
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/native/container-executor/impl/main.c


> Usage output of container-executor binary needs to include --http/--https 
> argument
> --
>
> Key: YARN-10063
> URL: https://issues.apache.org/jira/browse/YARN-10063
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Siddharth Ahuja
>Assignee: Siddharth Ahuja
>Priority: Minor
> Attachments: YARN-10063.001.patch, YARN-10063.002.patch, 
> YARN-10063.003.patch, YARN-10063.004.patch
>
>
> YARN-8448/YARN-6586 appear to have introduced two new options - "\--http" 
> (default) and "\--https" - that can be passed to the container-executor 
> binary, see:
> https://github.com/apache/hadoop/blob/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/native/container-executor/impl/main.c#L564
> and 
> https://github.com/apache/hadoop/blob/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/native/container-executor/impl/main.c#L521
> However, the usage output does not mention them:
> https://github.com/apache/hadoop/blob/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/native/container-executor/impl/main.c#L74
> Raising this JIRA to improve this.






[jira] [Comment Edited] (YARN-10001) Add explanation of unimplemented methods in InMemoryConfigurationStore

2020-04-07 Thread Siddharth Ahuja (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10001?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17077722#comment-17077722
 ] 

Siddharth Ahuja edited comment on YARN-10001 at 4/8/20, 1:23 AM:
-

Hey [~snemeth], thank you very much for your review and commits! In regard to 
the branch-3.2 conflicts, I suspect the patch cannot be applied directly to 
branch-3.2 because that branch does not contain the getLogs() method inside 
InMemoryConfigurationStore.java. From inspection, this method comes from 
YARN-10002, which is quite a recent change. I have uploaded a patch 
specifically for branch-3.2 that provides the method explanations for the 
unimplemented methods existing in branch-3.2. However, as per YARN-10002 - 
https://issues.apache.org/jira/browse/YARN-10002?focusedCommentId=17056066&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-17056066,
 there is a pending task to backport its changes to branch-3.2. When that 
happens, getLogs() will come along but it won't have a method explanation. Do 
you therefore want to wait for the backport of YARN-10002 to branch-3.2 first 
and then commit my changes after that instead? I will let you think about it 
:) Either way, you will have patches for both branches. Thanks again! 


was (Author: sahuja):
Hey [~snemeth], thank you very much for your review and commits! In regards to 
branch-3.2 conflicts, I suspect the patch cannot be directly applied to 
branch-3.2 because it does not contain the getLogs() method inside 
InMemoryConfigurationStore.java. From inspection, this method comes along from 
YARN-10002 which is quite a recent change. I have uploaded a patch for 
branch-3.2 specifically that provides the method explanations for the existing 
unimplemented methods in branch-3.2. However, as per YARN-10002 - 
https://issues.apache.org/jira/browse/YARN-10002?focusedCommentId=17056066&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-17056066,
 there is a pending task to backport changes to branch-3.2. Therefore, when 
that happens, getLogs() will come along but it won't have method explanations. 
Therefore, do you want to wait for the backport of YARN-10002 first to 
branch-3.2 and then commit my changes after that instead? I will let you think 
about it :) Thanks again! 

> Add explanation of unimplemented methods in InMemoryConfigurationStore
> --
>
> Key: YARN-10001
> URL: https://issues.apache.org/jira/browse/YARN-10001
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Szilard Nemeth
>Assignee: Siddharth Ahuja
>Priority: Major
> Fix For: 3.3.0, 3.4.0
>
> Attachments: YARN-10001-branch-3.2.003.patch, YARN-10001.001.patch, 
> YARN-10001.002.patch
>
>







[jira] [Commented] (YARN-10001) Add explanation of unimplemented methods in InMemoryConfigurationStore

2020-04-07 Thread Siddharth Ahuja (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10001?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17077722#comment-17077722
 ] 

Siddharth Ahuja commented on YARN-10001:


Hey [~snemeth], thank you very much for your review and commits! In regard to 
the branch-3.2 conflicts, I suspect the patch cannot be applied directly to 
branch-3.2 because that branch does not contain the getLogs() method inside 
InMemoryConfigurationStore.java. From inspection, this method comes from 
YARN-10002, which is quite a recent change. I have uploaded a patch 
specifically for branch-3.2 that provides the method explanations for the 
unimplemented methods existing in branch-3.2. However, as per YARN-10002 - 
https://issues.apache.org/jira/browse/YARN-10002?focusedCommentId=17056066&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-17056066,
 there is a pending task to backport its changes to branch-3.2. When that 
happens, getLogs() will come along but it won't have a method explanation. Do 
you therefore want to wait for the backport of YARN-10002 to branch-3.2 first 
and then commit my changes after that instead? I will let you think about it 
:) Thanks again! 
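
Purely as an illustration of the kind of explanation being added (the method 
name, signature and wording below are hypothetical and are not copied from 
InMemoryConfigurationStore.java):
{code}
// Illustrative sketch only: a no-op method documented so that readers know
// the missing behaviour is intentional, which is the spirit of this JIRA.
public class InMemoryStoreSketch {

  /**
   * The in-memory store keeps configuration mutations only for the lifetime
   * of the ResourceManager process, so there is no persisted version to
   * report. Returning null here is deliberate, not a missing implementation.
   */
  public Object getPersistedVersion() {
    return null;
  }
}
{code}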

> Add explanation of unimplemented methods in InMemoryConfigurationStore
> --
>
> Key: YARN-10001
> URL: https://issues.apache.org/jira/browse/YARN-10001
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Szilard Nemeth
>Assignee: Siddharth Ahuja
>Priority: Major
> Fix For: 3.3.0, 3.4.0
>
> Attachments: YARN-10001-branch-3.2.003.patch, YARN-10001.001.patch, 
> YARN-10001.002.patch
>
>







[jira] [Updated] (YARN-10001) Add explanation of unimplemented methods in InMemoryConfigurationStore

2020-04-07 Thread Siddharth Ahuja (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-10001?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siddharth Ahuja updated YARN-10001:
---
Attachment: YARN-10001-branch-3.2.003.patch

> Add explanation of unimplemented methods in InMemoryConfigurationStore
> --
>
> Key: YARN-10001
> URL: https://issues.apache.org/jira/browse/YARN-10001
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Szilard Nemeth
>Assignee: Siddharth Ahuja
>Priority: Major
> Fix For: 3.3.0, 3.4.0
>
> Attachments: YARN-10001-branch-3.2.003.patch, YARN-10001.001.patch, 
> YARN-10001.002.patch
>
>







[jira] [Commented] (YARN-5277) when localizers fail due to resource timestamps being out, provide more diagnostics

2020-04-07 Thread Hadoop QA (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-5277?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17077720#comment-17077720
 ] 

Hadoop QA commented on YARN-5277:
-

| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
29s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 18m 
18s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
40s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
33s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
43s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
14m 15s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
36s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
46s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
36s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
31s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
31s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
26s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
35s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 1s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
12m 58s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
40s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
42s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  3m 
50s{color} | {color:green} hadoop-yarn-common in the patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
31s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 59m 18s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=19.03.8 Server=19.03.8 Image:yetus/hadoop:e6455cc864d |
| JIRA Issue | YARN-5277 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12999279/YARN-5277.002.patch |
| Optional Tests |  dupname  asflicense  compile  javac  javadoc  mvninstall  
mvnsite  unit  shadedclient  findbugs  checkstyle  |
| uname | Linux 7752f74e197d 4.15.0-58-generic #64-Ubuntu SMP Tue Aug 6 
11:12:41 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 20eec95 |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_242 |
| findbugs | v3.1.0-RC1 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/25821/testReport/ |
| Max. process+thread count | 456 (vs. ulimit of 5500) |
| modules | C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common U: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/25821/console |
| Powered by | Apache Yetus 0.8.0   http://yetus.apache.org |


This message was automatically generated.



> when localizers fail due to resource timestamps being out, provide more 
> diagnostics
> ---

[jira] [Commented] (YARN-5277) when localizers fail due to resource timestamps being out, provide more diagnostics

2020-04-07 Thread Siddharth Ahuja (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-5277?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17077673#comment-17077673
 ] 

Siddharth Ahuja commented on YARN-5277:
---

Hi [~snemeth], thank you very much for your review and comments. 

I have gone ahead and incorporated both of your valid comments in the latest 
patch - YARN-5277.002.patch.

Thanks in advance again for your review!

> when localizers fail due to resource timestamps being out, provide more 
> diagnostics
> ---
>
> Key: YARN-5277
> URL: https://issues.apache.org/jira/browse/YARN-5277
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: nodemanager
>Affects Versions: 2.8.0
>Reporter: Steve Loughran
>Assignee: Siddharth Ahuja
>Priority: Major
> Attachments: YARN-5277.001.patch, YARN-5277.002.patch
>
>
> When an NM fails a resource D/L as the timestamps are wrong, there's not much 
> info, just two long values. 
> It would be good to also include the local time values, *and the current wall 
> time*. These are the things people need to know when trying to work out what 
> went wrong
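
As a rough illustration of the diagnostics described above (the helper below 
is hypothetical; the actual change is in YARN-5277.002.patch and is not 
reproduced here), the message could carry both timestamps plus the current 
wall time:
{code}
// Illustrative sketch only; method, parameter names and wording are made up.
import java.io.IOException;
import java.util.Date;

public final class TimestampDiagnosticsSketch {

  /** Build a message that includes both timestamps and the current wall time. */
  static IOException timestampMismatch(String path, long expectedTs, long actualTs) {
    long now = System.currentTimeMillis();
    return new IOException(String.format(
        "Resource %s changed on src filesystem - expected timestamp %d (%s), "
            + "actual timestamp %d (%s), current wall time %d (%s)",
        path, expectedTs, new Date(expectedTs),
        actualTs, new Date(actualTs), now, new Date(now)));
  }
}
{code}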






[jira] [Updated] (YARN-5277) when localizers fail due to resource timestamps being out, provide more diagnostics

2020-04-07 Thread Siddharth Ahuja (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-5277?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siddharth Ahuja updated YARN-5277:
--
Attachment: YARN-5277.002.patch

> when localizers fail due to resource timestamps being out, provide more 
> diagnostics
> ---
>
> Key: YARN-5277
> URL: https://issues.apache.org/jira/browse/YARN-5277
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: nodemanager
>Affects Versions: 2.8.0
>Reporter: Steve Loughran
>Assignee: Siddharth Ahuja
>Priority: Major
> Attachments: YARN-5277.001.patch, YARN-5277.002.patch
>
>
> When an NM fails a resource D/L as the timestamps are wrong, there's not much 
> info, just two long values. 
> It would be good to also include the local time values, *and the current wall 
> time*. These are the things people need to know when trying to work out what 
> went wrong






[jira] [Commented] (YARN-10207) CLOSE_WAIT socket connection leaks during rendering of (corrupted) aggregated logs on the JobHistoryServer Web UI

2020-04-07 Thread Siddharth Ahuja (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10207?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17077664#comment-17077664
 ] 

Siddharth Ahuja commented on YARN-10207:


Hi [~snemeth], thank you so much for reviewing this and helping to get it fixed 
in the relevant branches. I have gone ahead and raised 
https://issues.apache.org/jira/browse/YARN-10224 for the incorrect logging and 
handling of the exception coming from rendering a corrupted IFile on the JHS 
Web UI. In regard to CLOSE_WAITs on the ApplicationHistoryServer process, I 
don't have any evidence for those, so I will not raise a JIRA for that. I hope 
this resolves the actions from my end for this JIRA. Thanks again for having 
this resolved in trunk and the other relevant branches!
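
For readers following along, the leak fixed here boils down to a reader 
constructor throwing after the underlying stream has already been opened; a 
generic sketch of the defensive pattern (not the actual AggregatedLogFormat 
change) looks like this:
{code}
// Generic illustration of the fix pattern only; the class and the parse()
// placeholder are hypothetical, the Hadoop types are real.
import java.io.IOException;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IOUtils;

public final class LogReaderSketch {

  /** Open and parse an aggregated log, closing the stream even if parsing fails. */
  static void readAggregatedLog(FileSystem fs, Path logPath) throws IOException {
    FSDataInputStream in = fs.open(logPath);
    try {
      parse(in); // may throw "Not a valid BCFile" style errors on corrupt input
    } finally {
      // Without this, a corrupted file leaves the connection in CLOSE_WAIT.
      IOUtils.cleanupWithLogger(null, in);
    }
  }

  private static void parse(FSDataInputStream in) throws IOException {
    // placeholder for the TFile/BCFile parsing that can fail on corrupt logs
  }
}
{code}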

> CLOSE_WAIT socket connection leaks during rendering of (corrupted) aggregated 
> logs on the JobHistoryServer Web UI
> -
>
> Key: YARN-10207
> URL: https://issues.apache.org/jira/browse/YARN-10207
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn
>Reporter: Siddharth Ahuja
>Assignee: Siddharth Ahuja
>Priority: Major
> Fix For: 3.3.0, 3.4.0
>
> Attachments: YARN-10207.001.patch, YARN-10207.002.patch, 
> YARN-10207.003.patch, YARN-10207.004.patch, YARN-10207.branch-3.2.001.patch
>
>
> File descriptor leaks are observed coming from the JobHistoryServer process 
> while it tries to render a "corrupted" aggregated log on the JHS Web UI.
> Issue reproduced using the following steps:
> # Ran a sample Hadoop MR Pi job, it had the id - 
> application_1582676649923_0026.
> # Copied an aggregated log file from HDFS to local FS:
> {code}
> hdfs dfs -get 
> /tmp/logs/systest/logs/application_1582676649923_0026/_8041
> {code}
> # Updated the TFile metadata at the bottom of this file with some junk to 
> corrupt the file :
> *Before:*
> {code}
>   
> ^@^GVERSION*(^@&container_1582676649923_0026_01_03^F^Dnone^A^Pª5²ª5²^C^Qdata:BCFile.index^Dnoneª5þ^M^M^Pdata:TFile.index^Dnoneª5È66^Odata:TFile.meta^Dnoneª5Â^F^F^@^@^@^@^@^B6^K^@^A^@^@Ñ^QÓh<91>µ×¶9ßA@<92>ºáP
> {code}
> *After:*
> {code}
>   
> ^@^GVERSION*(^@&container_1582676649923_0026_01_03^F^Dnone^A^Pª5²ª5²^C^Qdata:BCFile.index^Dnoneª5þ^M^M^Pdata:TFile.index^Dnoneª5È66^Odata:TFile.meta^Dnoneª5Â^F^F^@^@^@^@^@^B6^K^@^A^@^@Ñ^QÓh<91>µ×¶9ßA@<92>ºáPblah
> {code}
> Notice "blah" (junk) added at the very end.
> # Remove the existing aggregated log file that will need to be replaced by 
> our modified copy from step 3 (as otherwise HDFS will prevent it from placing 
> the file with the same name as it already exists):
> {code}
> hdfs dfs -rm -r -f 
> /tmp/logs/systest/logs/application_1582676649923_0026/_8041
> {code}
> # Upload the corrupted aggregated file back to HDFS:
> {code}
> hdfs dfs -put _8041 
> /tmp/logs/systest/logs/application_1582676649923_0026
> {code}
> # Visit HistoryServer Web UI
> # Click on job_1582676649923_0026
> # Click on "logs" link against the AM (assuming the AM ran on nm_hostname)
> # Review the JHS logs, following exception will be seen:
> {code}
>   2020-03-24 20:03:48,484 ERROR org.apache.hadoop.yarn.webapp.View: Error 
> getting logs for job_1582676649923_0026
>   java.io.IOException: Not a valid BCFile.
>   at 
> org.apache.hadoop.io.file.tfile.BCFile$Magic.readAndVerify(BCFile.java:927)
>   at 
> org.apache.hadoop.io.file.tfile.BCFile$Reader.<init>(BCFile.java:628)
>   at 
> org.apache.hadoop.io.file.tfile.TFile$Reader.<init>(TFile.java:804)
>   at 
> org.apache.hadoop.yarn.logaggregation.AggregatedLogFormat$LogReader.<init>(AggregatedLogFormat.java:588)
>   at 
> org.apache.hadoop.yarn.logaggregation.filecontroller.tfile.TFileAggregatedLogsBlock.render(TFileAggregatedLogsBlock.java:111)
>   at 
> org.apache.hadoop.yarn.logaggregation.filecontroller.tfile.LogAggregationTFileController.renderAggregatedLogsBlock(LogAggregationTFileController.java:341)
>   at 
> org.apache.hadoop.yarn.webapp.log.AggregatedLogsBlock.render(AggregatedLogsBlock.java:117)
>   at 
> org.apache.hadoop.yarn.webapp.view.HtmlBlock.render(HtmlBlock.java:69)
>   at 
> org.apache.hadoop.yarn.webapp.view.HtmlBlock.renderPartial(HtmlBlock.java:79)
>   at org.apache.hadoop.yarn.webapp.View.render(View.java:235)
>   at 
> org.apache.hadoop.yarn.webapp.view.HtmlPage$Page.subView(HtmlPage.java:49)
>   at 
> org.apache.hadoop.yarn.webapp.hamlet2.HamletImpl$EImp._v(HamletImpl.java:117)
>   at 
> org.apache.hadoop.yarn.webapp.hamlet2.Hamlet$TD.__(Hamlet.java:848)
>   at 
> org.apache.hadoop.yarn.webapp

[jira] [Created] (YARN-10224) Exception resulting from failure of a corrupted IFile log in JHS UI incorrectly logged as WARN with no exception stacktrace provided

2020-04-07 Thread Siddharth Ahuja (Jira)
Siddharth Ahuja created YARN-10224:
--

 Summary: Exception resulting from failure of a corrupted IFile log 
in JHS UI incorrectly logged as WARN with no exception stacktrace provided
 Key: YARN-10224
 URL: https://issues.apache.org/jira/browse/YARN-10224
 Project: Hadoop YARN
  Issue Type: Improvement
Reporter: Siddharth Ahuja


While investigating YARN-10207, it was noticed that when an IFile (an 
aggregated log stored in HDFS in IndexedFormat) is being rendered inside the 
JHS UI, an IOException is (correctly) encountered inside a try block. 

The finally clause here -> 
https://github.com/apache/hadoop/blob/4af2556b48e01150851c7f273a254a16324ba843/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/logaggregation/filecontroller/ifile/LogAggregationIndexedFileController.java#L900
 helps clean up the socket connection by closing the FSDataInputStream.

However, only a WARN log is presented to the user in the JHS logs in case of a 
rendering failure, and no stack trace is logged for the exception here - 
https://github.com/apache/hadoop/blob/c24af4b0d6fc32938b076161b5a8c86d38e3e0a1/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/logaggregation/filecontroller/ifile/IndexedFileAggregatedLogsBlock.java#L136
 because the exception is simply swallowed inside the catch{} clause. This 
prevents understanding the cause of the rendering failure.

We should log the message as an ERROR along with the exception coming from the 
rendering failure. This is how the TFile rendering failure is handled, as 
identified in YARN-10207.
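
To make the proposal concrete, the change being asked for is essentially the 
difference between the two logging calls sketched below (illustrative only; 
the real code lives in IndexedFileAggregatedLogsBlock.java and is not 
reproduced here):
{code}
// Sketch of the proposed logging change; class, message text and logger name
// are illustrative.
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public final class RenderLoggingSketch {
  private static final Logger LOG =
      LoggerFactory.getLogger(RenderLoggingSketch.class);

  static void render(Runnable renderBlock) {
    try {
      renderBlock.run();
    } catch (Exception e) {
      // Before (as described above): a WARN without the exception argument,
      // so the stack trace is swallowed and the cause is lost.
      // After: an ERROR that carries the exception as the last argument.
      LOG.error("Error rendering the aggregated log block", e);
    }
  }
}
{code}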






[jira] [Assigned] (YARN-10224) Exception resulting from failure of a corrupted IFile log in JHS UI incorrectly logged as WARN with no exception stacktrace provided

2020-04-07 Thread Siddharth Ahuja (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-10224?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siddharth Ahuja reassigned YARN-10224:
--

Assignee: Siddharth Ahuja

> Exception resulting from failure of a corrupted IFile log in JHS UI 
> incorrectly logged as WARN with no exception stacktrace provided
> 
>
> Key: YARN-10224
> URL: https://issues.apache.org/jira/browse/YARN-10224
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Siddharth Ahuja
>Assignee: Siddharth Ahuja
>Priority: Major
>
> While investigating YARN-10207, it was noticed that when an IFile (an 
> aggregated log stored in HDFS in IndexedFormat) is being rendered inside the 
> JHS UI, an IOException is (correctly) encountered inside a try block. 
> The finally clause here -> 
> https://github.com/apache/hadoop/blob/4af2556b48e01150851c7f273a254a16324ba843/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/logaggregation/filecontroller/ifile/LogAggregationIndexedFileController.java#L900
>  helps clean up the socket connection by closing the FSDataInputStream.
> However, only a WARN log is presented to the user in the JHS logs in case of a 
> rendering failure, and no stack trace is logged for the exception here - 
> https://github.com/apache/hadoop/blob/c24af4b0d6fc32938b076161b5a8c86d38e3e0a1/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/logaggregation/filecontroller/ifile/IndexedFileAggregatedLogsBlock.java#L136
>  because the exception is simply swallowed inside the catch{} clause. This 
> prevents understanding the cause of the rendering failure.
> We should log the message as an ERROR along with the exception coming from 
> the rendering failure. This is how the TFile rendering failure is handled, as 
> identified in YARN-10207.






[jira] [Commented] (YARN-10212) Create separate configuration for max global AM attempts

2020-04-07 Thread Jonathan Hung (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10212?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17077515#comment-17077515
 ] 

Jonathan Hung commented on YARN-10212:
--

Thanks [~BilwaST]. A few comments:
 * For the javadoc of RM_AM_MAX_ATTEMPTS, can we change it to something like "The 
maximum number of application attempts for an application, if unset by the user"?
 * Can we change the new config to at least have "am"? e.g.  
yarn.resourcemanager.global.max-attempts to 
yarn.resourcemanager.am.global-max-attempts? 
 * In ResourceManager.java I think we should validate both RM_AM_MAX_ATTEMPTS 
and GLOBAL_RM_AM_MAX_ATTEMPTS (and change the message in the RuntimeExceptions 
accordingly)

 * In RMAppImpl, we need to split this case into two:
{noformat}
if (individualMaxAppAttempts <= 0 ||
individualMaxAppAttempts > globalMaxAppAttempts) {
  this.maxAppAttempts = globalMaxAppAttempts; {noformat}
If individualMaxAppAttempts <= 0, set this.maxAppAttempts to 
RM_AM_MAX_ATTEMPTS. If individualMaxAppAttempts > globalMaxAppAttempts, set 
this.maxAppAttempts to globalMaxAppAttempts (see the sketch after this list).

 * In the test case:
{noformat}
int[] rmAmMaxAttempts = new int[] { 8, 0 };{noformat}
I don't think 0 is a valid config for RM_AM_MAX_ATTEMPTS; can we set this to \{ 
8, 1 }?
 * Based on the above changes we will need to change the expected values in the 
test case from
{noformat}
int[][] expectedNums = new int[][]{
new int[]{ 9, 10, 10, 10 }, {noformat}
to
{noformat}
int[][] expectedNums = new int[][]{
new int[]{ 9, 10, 10, 8 }, {noformat}
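
A sketch of the split described in the RMAppImpl bullet above (illustrative 
only; names mirror the snippet in that bullet, and the surrounding class and 
config lookups are omitted):
{code}
// Sketch only, not the committed change.
static int resolveMaxAppAttempts(int individualMaxAppAttempts,
    int rmAmMaxAttempts, int globalMaxAppAttempts) {
  if (individualMaxAppAttempts <= 0) {
    // Unset/invalid per-app value: fall back to the RM_AM_MAX_ATTEMPTS default.
    return rmAmMaxAttempts;
  } else if (individualMaxAppAttempts > globalMaxAppAttempts) {
    // Requested more than allowed: cap at the new global maximum.
    return globalMaxAppAttempts;
  }
  return individualMaxAppAttempts;
}
{code}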

> Create separate configuration for max global AM attempts
> 
>
> Key: YARN-10212
> URL: https://issues.apache.org/jira/browse/YARN-10212
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Jonathan Hung
>Assignee: Bilwa S T
>Priority: Major
> Attachments: YARN-10212.001.patch
>
>
> Right now user's default max AM attempts is set to the same as global max AM 
> attempts:
> {noformat}
> int globalMaxAppAttempts = conf.getInt(YarnConfiguration.RM_AM_MAX_ATTEMPTS,
> YarnConfiguration.DEFAULT_RM_AM_MAX_ATTEMPTS); {noformat}
> If we want to increase global max AM attempts, it will also increase the 
> default. So we should create a separate global AM max attempts config to 
> separate the two.






[jira] [Commented] (YARN-10209) DistributedShell should initialize TimelineClient conditionally

2020-04-07 Thread Hadoop QA (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10209?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17077463#comment-17077463
 ] 

Hadoop QA commented on YARN-10209:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m  
0s{color} | {color:blue} Docker mode activated. {color} |
| {color:blue}0{color} | {color:blue} docker {color} | {color:blue}  0m 
10s{color} | {color:blue} Dockerfile 
'/home/jenkins/jenkins-slave/workspace/PreCommit-YARN-Build/sourcedir/dev-support/docker/Dockerfile'
 not found, falling back to built-in. {color} |
| {color:red}-1{color} | {color:red} docker {color} | {color:red}  0m 
13s{color} | {color:red} Docker failed to build yetus/hadoop:date2020-04-07. 
{color} |
\\
\\
|| Subsystem || Report/Notes ||
| JIRA Issue | YARN-10209 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12999255/YARN-10209.branch-2.6.0.v2.patch
 |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/25820/console |
| Powered by | Apache Yetus 0.8.0   http://yetus.apache.org |


This message was automatically generated.



> DistributedShell should initialize TimelineClient conditionally
> ---
>
> Key: YARN-10209
> URL: https://issues.apache.org/jira/browse/YARN-10209
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.6.0
>Reporter: Benjamin Teke
>Assignee: Bilwa S T
>Priority: Major
> Fix For: 2.6.0
>
> Attachments: YARN-10209.001.patch, YARN-10209.branch-2.6.0.patch, 
> YARN-10209.branch-2.6.0.v2.patch
>
>
> YarnConfiguration was changed along with the introduction of newer Timeline 
> Service versions to include configuration about the version in use. In Hadoop 
> 2.6.0 the distributed shell instantiates the Timeline Client whether or not 
> it is enabled in the configuration. Running this distributed shell on newer 
> Hadoop versions (where the new Timeline Service is available) causes an 
> exception, because the bundled YarnConfiguration doesn't have the necessary 
> version configuration property. Making the Timeline Client initialization 
> conditional, the distributed shell would at least run with the Timeline 
> Service disabled.






[jira] [Commented] (YARN-10209) DistributedShell should initialize TimelineClient conditionally

2020-04-07 Thread Bilwa S T (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10209?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17077454#comment-17077454
 ] 

Bilwa S T commented on YARN-10209:
--

[~bteke] Thanks for reviewing. I have updated the patch.

Can you please tell me who can help me check why the build failed?

> DistributedShell should initialize TimelineClient conditionally
> ---
>
> Key: YARN-10209
> URL: https://issues.apache.org/jira/browse/YARN-10209
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.6.0
>Reporter: Benjamin Teke
>Assignee: Bilwa S T
>Priority: Major
> Fix For: 2.6.0
>
> Attachments: YARN-10209.001.patch, YARN-10209.branch-2.6.0.patch, 
> YARN-10209.branch-2.6.0.v2.patch
>
>
> YarnConfiguration was changed along with the introduction of newer Timeline 
> Service versions to include configuration about the version in use. In Hadoop 
> 2.6.0 the distributed shell instantiates the Timeline Client whether or not 
> it is enabled in the configuration. Running this distributed shell on newer 
> Hadoop versions (where the new Timeline Service is available) causes an 
> exception, because the bundled YarnConfiguration doesn't have the necessary 
> version configuration property. Making the Timeline Client initialization 
> conditional, the distributed shell would at least run with the Timeline 
> Service disabled.






[jira] [Updated] (YARN-10209) DistributedShell should initialize TimelineClient conditionally

2020-04-07 Thread Bilwa S T (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-10209?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bilwa S T updated YARN-10209:
-
Attachment: YARN-10209.branch-2.6.0.v2.patch

> DistributedShell should initialize TimelineClient conditionally
> ---
>
> Key: YARN-10209
> URL: https://issues.apache.org/jira/browse/YARN-10209
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.6.0
>Reporter: Benjamin Teke
>Assignee: Bilwa S T
>Priority: Major
> Fix For: 2.6.0
>
> Attachments: YARN-10209.001.patch, YARN-10209.branch-2.6.0.patch, 
> YARN-10209.branch-2.6.0.v2.patch
>
>
> YarnConfiguration was changed along with the introduction of newer Timeline 
> Service versions to include configuration about the version in use. In Hadoop 
> 2.6.0 the distributed shell instantiates the Timeline Client whether or not 
> it is enabled in the configuration. Running this distributed shell on newer 
> Hadoop versions (where the new Timeline Service is available) causes an 
> exception, because the bundled YarnConfiguration doesn't have the necessary 
> version configuration property. Making the Timeline Client initialization 
> conditional, the distributed shell would at least run with the Timeline 
> Service disabled.






[jira] [Commented] (YARN-10209) DistributedShell should initialize TimelineClient conditionally

2020-04-07 Thread Benjamin Teke (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10209?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17077392#comment-17077392
 ] 

Benjamin Teke commented on YARN-10209:
--

Hi [~BilwaST],

Thanks for providing the patch! I have just one minor nit: assigning false to 
timelineServiceEnabled by default is unnecessary; the explicit default is also 
omitted above for the done boolean variable. Otherwise it looks good to me. Not 
sure what is wrong with the build, though.
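
For context, the shape of the change under review is roughly the following (a 
sketch that assumes a timelineServiceEnabled flag guards client creation, as 
mentioned above; the method and class names are illustrative and the actual 
change is in the attached patches):
{code}
// Illustrative sketch of conditional TimelineClient initialization.
import org.apache.hadoop.yarn.client.api.TimelineClient;
import org.apache.hadoop.yarn.conf.YarnConfiguration;

public final class TimelineInitSketch {

  static TimelineClient maybeCreateTimelineClient(YarnConfiguration conf) {
    boolean timelineServiceEnabled = conf.getBoolean(
        YarnConfiguration.TIMELINE_SERVICE_ENABLED,
        YarnConfiguration.DEFAULT_TIMELINE_SERVICE_ENABLED);
    if (!timelineServiceEnabled) {
      // Timeline Service disabled: skip client creation so the shell can
      // still run without it.
      return null;
    }
    TimelineClient client = TimelineClient.createTimelineClient();
    client.init(conf);
    client.start();
    return client;
  }
}
{code}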

> DistributedShell should initialize TimelineClient conditionally
> ---
>
> Key: YARN-10209
> URL: https://issues.apache.org/jira/browse/YARN-10209
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.6.0
>Reporter: Benjamin Teke
>Assignee: Bilwa S T
>Priority: Major
> Fix For: 2.6.0
>
> Attachments: YARN-10209.001.patch, YARN-10209.branch-2.6.0.patch
>
>
> YarnConfiguration was changed along with the introduction of newer Timeline 
> Service versions to include configuration about the version in use. In Hadoop 
> 2.6.0 the distributed shell instantiates the Timeline Client whether or not 
> it is enabled in the configuration. Running this distributed shell on newer 
> Hadoop versions (where the new Timeline Service is available) causes an 
> exception, because the bundled YarnConfiguration doesn't have the necessary 
> version configuration property. Making the Timeline Client initialization 
> conditional, the distributed shell would at least run with the Timeline 
> Service disabled.






[jira] [Commented] (YARN-10207) CLOSE_WAIT socket connection leaks during rendering of (corrupted) aggregated logs on the JobHistoryServer Web UI

2020-04-07 Thread Hadoop QA (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10207?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17077384#comment-17077384
 ] 

Hadoop QA commented on YARN-10207:
--

| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  9m 
46s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
|| || || || {color:brown} branch-3.2 Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 24m 
14s{color} | {color:green} branch-3.2 passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
36s{color} | {color:green} branch-3.2 passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
27s{color} | {color:green} branch-3.2 passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
38s{color} | {color:green} branch-3.2 passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
13m 48s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
18s{color} | {color:green} branch-3.2 passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
44s{color} | {color:green} branch-3.2 passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
38s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
31s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
31s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
21s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
33s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
14m 22s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
25s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
41s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  3m 
24s{color} | {color:green} hadoop-yarn-common in the patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
23s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 73m 57s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=19.03.8 Server=19.03.8 Image:yetus/hadoop:11aff6c269f |
| JIRA Issue | YARN-10207 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12999248/YARN-10207.branch-3.2.001.patch
 |
| Optional Tests |  dupname  asflicense  compile  javac  javadoc  mvninstall  
mvnsite  unit  shadedclient  findbugs  checkstyle  |
| uname | Linux 139b8c6182d4 4.15.0-74-generic #84-Ubuntu SMP Thu Dec 19 
08:06:28 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | branch-3.2 / 11aff6c |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_242 |
| findbugs | v3.1.0-RC1 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/25819/testReport/ |
| Max. process+thread count | 306 (vs. ulimit of 5500) |
| modules | C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common U: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/25819/console |
| Powered by | Apache Yetus 0.8.0   http://yetus.apache.org |


This message was automatically generated.



> CLOSE_WAIT socket connection leaks during rendering of (corrupted) agg

[jira] [Commented] (YARN-10207) CLOSE_WAIT socket connection leaks during rendering of (corrupted) aggregated logs on the JobHistoryServer Web UI

2020-04-07 Thread Hudson (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10207?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17077332#comment-17077332
 ] 

Hudson commented on YARN-10207:
---

SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #18124 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/18124/])
YARN-10207. CLOSE_WAIT socket connection leaks during rendering of (snemeth: 
rev bffb43b00e14a23d96f08b5a5df01e7f760b11ed)
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/logaggregation/TestAggregatedLogFormat.java
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/logaggregation/AggregatedLogFormat.java


> CLOSE_WAIT socket connection leaks during rendering of (corrupted) aggregated 
> logs on the JobHistoryServer Web UI
> -
>
> Key: YARN-10207
> URL: https://issues.apache.org/jira/browse/YARN-10207
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn
>Reporter: Siddharth Ahuja
>Assignee: Siddharth Ahuja
>Priority: Major
> Fix For: 3.3.0, 3.4.0
>
> Attachments: YARN-10207.001.patch, YARN-10207.002.patch, 
> YARN-10207.003.patch, YARN-10207.004.patch, YARN-10207.branch-3.2.001.patch
>
>
> File descriptor leaks are observed coming from the JobHistoryServer process 
> while it tries to render a "corrupted" aggregated log on the JHS Web UI.
> Issue reproduced using the following steps:
> # Ran a sample Hadoop MR Pi job, it had the id - 
> application_1582676649923_0026.
> # Copied an aggregated log file from HDFS to local FS:
> {code}
> hdfs dfs -get 
> /tmp/logs/systest/logs/application_1582676649923_0026/_8041
> {code}
> # Updated the TFile metadata at the bottom of this file with some junk to 
> corrupt the file :
> *Before:*
> {code}
>   
> ^@^GVERSION*(^@&container_1582676649923_0026_01_03^F^Dnone^A^Pª5²ª5²^C^Qdata:BCFile.index^Dnoneª5þ^M^M^Pdata:TFile.index^Dnoneª5È66^Odata:TFile.meta^Dnoneª5Â^F^F^@^@^@^@^@^B6^K^@^A^@^@Ñ^QÓh<91>µ×¶9ßA@<92>ºáP
> {code}
> *After:*
> {code}
>   
> ^@^GVERSION*(^@&container_1582676649923_0026_01_03^F^Dnone^A^Pª5²ª5²^C^Qdata:BCFile.index^Dnoneª5þ^M^M^Pdata:TFile.index^Dnoneª5È66^Odata:TFile.meta^Dnoneª5Â^F^F^@^@^@^@^@^B6^K^@^A^@^@Ñ^QÓh<91>µ×¶9ßA@<92>ºáPblah
> {code}
> Notice "blah" (junk) added at the very end.
> # Remove the existing aggregated log file that will need to be replaced by 
> our modified copy from step 3 (as otherwise HDFS will prevent it from placing 
> the file with the same name as it already exists):
> {code}
> hdfs dfs -rm -r -f 
> /tmp/logs/systest/logs/application_1582676649923_0026/_8041
> {code}
> # Upload the corrupted aggregated file back to HDFS:
> {code}
> hdfs dfs -put _8041 
> /tmp/logs/systest/logs/application_1582676649923_0026
> {code}
> # Visit HistoryServer Web UI
> # Click on job_1582676649923_0026
> # Click on "logs" link against the AM (assuming the AM ran on nm_hostname)
> # Review the JHS logs, following exception will be seen:
> {code}
>   2020-03-24 20:03:48,484 ERROR org.apache.hadoop.yarn.webapp.View: Error 
> getting logs for job_1582676649923_0026
>   java.io.IOException: Not a valid BCFile.
>   at 
> org.apache.hadoop.io.file.tfile.BCFile$Magic.readAndVerify(BCFile.java:927)
>   at 
> org.apache.hadoop.io.file.tfile.BCFile$Reader.<init>(BCFile.java:628)
>   at 
> org.apache.hadoop.io.file.tfile.TFile$Reader.<init>(TFile.java:804)
>   at 
> org.apache.hadoop.yarn.logaggregation.AggregatedLogFormat$LogReader.<init>(AggregatedLogFormat.java:588)
>   at 
> org.apache.hadoop.yarn.logaggregation.filecontroller.tfile.TFileAggregatedLogsBlock.render(TFileAggregatedLogsBlock.java:111)
>   at 
> org.apache.hadoop.yarn.logaggregation.filecontroller.tfile.LogAggregationTFileController.renderAggregatedLogsBlock(LogAggregationTFileController.java:341)
>   at 
> org.apache.hadoop.yarn.webapp.log.AggregatedLogsBlock.render(AggregatedLogsBlock.java:117)
>   at 
> org.apache.hadoop.yarn.webapp.view.HtmlBlock.render(HtmlBlock.java:69)
>   at 
> org.apache.hadoop.yarn.webapp.view.HtmlBlock.renderPartial(HtmlBlock.java:79)
>   at org.apache.hadoop.yarn.webapp.View.render(View.java:235)
>   at 
> org.apache.hadoop.yarn.webapp.view.HtmlPage$Page.subView(HtmlPage.java:49)
>   at 
> org.apache.hadoop.yarn.webapp.hamlet2.HamletImpl$EImp._v(HamletImpl.java:117)
>   at 
> org.apache.hadoop.yarn.webapp.hamlet2.Hamlet$TD.__(Hamlet.java:848)
>   at 
> org.apache.hadoop.yarn.webapp.view.TwoColumnLayout.render(TwoColumnLayout.java:71)
>  

[jira] [Comment Edited] (YARN-10207) CLOSE_WAIT socket connection leaks during rendering of (corrupted) aggregated logs on the JobHistoryServer Web UI

2020-04-07 Thread Szilard Nemeth (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10207?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17077325#comment-17077325
 ] 

Szilard Nemeth edited comment on YARN-10207 at 4/7/20, 3:10 PM:


[~sahuja],
Committed to trunk and branch-3.3.
For branch-3.2, I uploaded a patch ( [^YARN-10207.branch-3.2.001.patch] ) to 
get Jenkins results.


was (Author: snemeth):
[~sahuja],
Committed to trunk and branch-3.3.
For branch-3.2, I uploaded a patch 
(https://issues.apache.org/jira/secure/attachment/12999248/YARN-10207.branch-3.2.001.patch)
 to have a Jenkins results

> CLOSE_WAIT socket connection leaks during rendering of (corrupted) aggregated 
> logs on the JobHistoryServer Web UI
> -
>
> Key: YARN-10207
> URL: https://issues.apache.org/jira/browse/YARN-10207
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn
>Reporter: Siddharth Ahuja
>Assignee: Siddharth Ahuja
>Priority: Major
> Fix For: 3.3.0, 3.4.0
>
> Attachments: YARN-10207.001.patch, YARN-10207.002.patch, 
> YARN-10207.003.patch, YARN-10207.004.patch, YARN-10207.branch-3.2.001.patch
>
>
> File descriptor leaks are observed coming from the JobHistoryServer process 
> while it tries to render a "corrupted" aggregated log on the JHS Web UI.
> Issue reproduced using the following steps:
> # Ran a sample Hadoop MR Pi job, it had the id - 
> application_1582676649923_0026.
> # Copied an aggregated log file from HDFS to local FS:
> {code}
> hdfs dfs -get 
> /tmp/logs/systest/logs/application_1582676649923_0026/_8041
> {code}
> # Updated the TFile metadata at the bottom of this file with some junk to 
> corrupt the file :
> *Before:*
> {code}
>   
> ^@^GVERSION*(^@&container_1582676649923_0026_01_03^F^Dnone^A^Pª5²ª5²^C^Qdata:BCFile.index^Dnoneª5þ^M^M^Pdata:TFile.index^Dnoneª5È66^Odata:TFile.meta^Dnoneª5Â^F^F^@^@^@^@^@^B6^K^@^A^@^@Ñ^QÓh<91>µ×¶9ßA@<92>ºáP
> {code}
> *After:*
> {code}
>   
> ^@^GVERSION*(^@&container_1582676649923_0026_01_03^F^Dnone^A^Pª5²ª5²^C^Qdata:BCFile.index^Dnoneª5þ^M^M^Pdata:TFile.index^Dnoneª5È66^Odata:TFile.meta^Dnoneª5Â^F^F^@^@^@^@^@^B6^K^@^A^@^@Ñ^QÓh<91>µ×¶9ßA@<92>ºáPblah
> {code}
> Notice "blah" (junk) added at the very end.
> # Remove the existing aggregated log file that will need to be replaced by 
> our modified copy from step 3 (as otherwise HDFS will prevent it from placing 
> the file with the same name as it already exists):
> {code}
> hdfs dfs -rm -r -f 
> /tmp/logs/systest/logs/application_1582676649923_0026/_8041
> {code}
> # Upload the corrupted aggregated file back to HDFS:
> {code}
> hdfs dfs -put _8041 
> /tmp/logs/systest/logs/application_1582676649923_0026
> {code}
> # Visit HistoryServer Web UI
> # Click on job_1582676649923_0026
> # Click on "logs" link against the AM (assuming the AM ran on nm_hostname)
> # Review the JHS logs, following exception will be seen:
> {code}
>   2020-03-24 20:03:48,484 ERROR org.apache.hadoop.yarn.webapp.View: Error 
> getting logs for job_1582676649923_0026
>   java.io.IOException: Not a valid BCFile.
>   at 
> org.apache.hadoop.io.file.tfile.BCFile$Magic.readAndVerify(BCFile.java:927)
>   at 
> org.apache.hadoop.io.file.tfile.BCFile$Reader.<init>(BCFile.java:628)
>   at 
> org.apache.hadoop.io.file.tfile.TFile$Reader.<init>(TFile.java:804)
>   at 
> org.apache.hadoop.yarn.logaggregation.AggregatedLogFormat$LogReader.<init>(AggregatedLogFormat.java:588)
>   at 
> org.apache.hadoop.yarn.logaggregation.filecontroller.tfile.TFileAggregatedLogsBlock.render(TFileAggregatedLogsBlock.java:111)
>   at 
> org.apache.hadoop.yarn.logaggregation.filecontroller.tfile.LogAggregationTFileController.renderAggregatedLogsBlock(LogAggregationTFileController.java:341)
>   at 
> org.apache.hadoop.yarn.webapp.log.AggregatedLogsBlock.render(AggregatedLogsBlock.java:117)
>   at 
> org.apache.hadoop.yarn.webapp.view.HtmlBlock.render(HtmlBlock.java:69)
>   at 
> org.apache.hadoop.yarn.webapp.view.HtmlBlock.renderPartial(HtmlBlock.java:79)
>   at org.apache.hadoop.yarn.webapp.View.render(View.java:235)
>   at 
> org.apache.hadoop.yarn.webapp.view.HtmlPage$Page.subView(HtmlPage.java:49)
>   at 
> org.apache.hadoop.yarn.webapp.hamlet2.HamletImpl$EImp._v(HamletImpl.java:117)
>   at 
> org.apache.hadoop.yarn.webapp.hamlet2.Hamlet$TD.__(Hamlet.java:848)
>   at 
> org.apache.hadoop.yarn.webapp.view.TwoColumnLayout.render(TwoColumnLayout.java:71)
>   at 
> org.apache.hadoop.yarn.webapp.view.HtmlPage.render(HtmlPage.java:82)
>   at 
>

[jira] [Commented] (YARN-10207) CLOSE_WAIT socket connection leaks during rendering of (corrupted) aggregated logs on the JobHistoryServer Web UI

2020-04-07 Thread Szilard Nemeth (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10207?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17077325#comment-17077325
 ] 

Szilard Nemeth commented on YARN-10207:
---

[~sahuja],
Committed to trunk and branch-3.3.
For branch-3.2, I uploaded a patch 
(https://issues.apache.org/jira/secure/attachment/12999248/YARN-10207.branch-3.2.001.patch)
 to get Jenkins results.

> CLOSE_WAIT socket connection leaks during rendering of (corrupted) aggregated 
> logs on the JobHistoryServer Web UI
> -
>
> Key: YARN-10207
> URL: https://issues.apache.org/jira/browse/YARN-10207
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn
>Reporter: Siddharth Ahuja
>Assignee: Siddharth Ahuja
>Priority: Major
> Fix For: 3.3.0, 3.4.0
>
> Attachments: YARN-10207.001.patch, YARN-10207.002.patch, 
> YARN-10207.003.patch, YARN-10207.004.patch, YARN-10207.branch-3.2.001.patch
>
>
> File descriptor leaks are observed coming from the JobHistoryServer process 
> while it tries to render a "corrupted" aggregated log on the JHS Web UI.
> Issue reproduced using the following steps:
> # Ran a sample Hadoop MR Pi job, it had the id - 
> application_1582676649923_0026.
> # Copied an aggregated log file from HDFS to local FS:
> {code}
> hdfs dfs -get 
> /tmp/logs/systest/logs/application_1582676649923_0026/_8041
> {code}
> # Updated the TFile metadata at the bottom of this file with some junk to 
> corrupt the file:
> *Before:*
> {code}
>   
> ^@^GVERSION*(^@&container_1582676649923_0026_01_03^F^Dnone^A^Pª5²ª5²^C^Qdata:BCFile.index^Dnoneª5þ^M^M^Pdata:TFile.index^Dnoneª5È66^Odata:TFile.meta^Dnoneª5Â^F^F^@^@^@^@^@^B6^K^@^A^@^@Ñ^QÓh<91>µ×¶9ßA@<92>ºáP
> {code}
> *After:*
> {code}
>   
> ^@^GVERSION*(^@&container_1582676649923_0026_01_03^F^Dnone^A^Pª5²ª5²^C^Qdata:BCFile.index^Dnoneª5þ^M^M^Pdata:TFile.index^Dnoneª5È66^Odata:TFile.meta^Dnoneª5Â^F^F^@^@^@^@^@^B6^K^@^A^@^@Ñ^QÓh<91>µ×¶9ßA@<92>ºáPblah
> {code}
> Notice "blah" (junk) added at the very end.
> # Remove the existing aggregated log file so that it can be replaced by our 
> modified copy from step 3 (otherwise HDFS will refuse to place a file with 
> the same name because one already exists):
> {code}
> hdfs dfs -rm -r -f 
> /tmp/logs/systest/logs/application_1582676649923_0026/_8041
> {code}
> # Upload the corrupted aggregated file back to HDFS:
> {code}
> hdfs dfs -put _8041 
> /tmp/logs/systest/logs/application_1582676649923_0026
> {code}
> # Visit HistoryServer Web UI
> # Click on job_1582676649923_0026
> # Click on "logs" link against the AM (assuming the AM ran on nm_hostname)
> # Review the JHS logs; the following exception will be seen:
> {code}
>   2020-03-24 20:03:48,484 ERROR org.apache.hadoop.yarn.webapp.View: Error 
> getting logs for job_1582676649923_0026
>   java.io.IOException: Not a valid BCFile.
>   at 
> org.apache.hadoop.io.file.tfile.BCFile$Magic.readAndVerify(BCFile.java:927)
>   at 
> org.apache.hadoop.io.file.tfile.BCFile$Reader.(BCFile.java:628)
>   at 
> org.apache.hadoop.io.file.tfile.TFile$Reader.(TFile.java:804)
>   at 
> org.apache.hadoop.yarn.logaggregation.AggregatedLogFormat$LogReader.(AggregatedLogFormat.java:588)
>   at 
> org.apache.hadoop.yarn.logaggregation.filecontroller.tfile.TFileAggregatedLogsBlock.render(TFileAggregatedLogsBlock.java:111)
>   at 
> org.apache.hadoop.yarn.logaggregation.filecontroller.tfile.LogAggregationTFileController.renderAggregatedLogsBlock(LogAggregationTFileController.java:341)
>   at 
> org.apache.hadoop.yarn.webapp.log.AggregatedLogsBlock.render(AggregatedLogsBlock.java:117)
>   at 
> org.apache.hadoop.yarn.webapp.view.HtmlBlock.render(HtmlBlock.java:69)
>   at 
> org.apache.hadoop.yarn.webapp.view.HtmlBlock.renderPartial(HtmlBlock.java:79)
>   at org.apache.hadoop.yarn.webapp.View.render(View.java:235)
>   at 
> org.apache.hadoop.yarn.webapp.view.HtmlPage$Page.subView(HtmlPage.java:49)
>   at 
> org.apache.hadoop.yarn.webapp.hamlet2.HamletImpl$EImp._v(HamletImpl.java:117)
>   at 
> org.apache.hadoop.yarn.webapp.hamlet2.Hamlet$TD.__(Hamlet.java:848)
>   at 
> org.apache.hadoop.yarn.webapp.view.TwoColumnLayout.render(TwoColumnLayout.java:71)
>   at 
> org.apache.hadoop.yarn.webapp.view.HtmlPage.render(HtmlPage.java:82)
>   at 
> org.apache.hadoop.yarn.webapp.Controller.render(Controller.java:212)
>   at 
> org.apache.hadoop.mapreduce.v2.hs.webapp.HsController.logs(HsController.java:202)
>   at sun.reflect.NativeMethodAcce

[jira] [Updated] (YARN-10207) CLOSE_WAIT socket connection leaks during rendering of (corrupted) aggregated logs on the JobHistoryServer Web UI

2020-04-07 Thread Szilard Nemeth (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-10207?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Szilard Nemeth updated YARN-10207:
--
Attachment: YARN-10207.branch-3.2.001.patch

> CLOSE_WAIT socket connection leaks during rendering of (corrupted) aggregated 
> logs on the JobHistoryServer Web UI
> -
>
> Key: YARN-10207
> URL: https://issues.apache.org/jira/browse/YARN-10207
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn
>Reporter: Siddharth Ahuja
>Assignee: Siddharth Ahuja
>Priority: Major
> Fix For: 3.3.0, 3.4.0
>
> Attachments: YARN-10207.001.patch, YARN-10207.002.patch, 
> YARN-10207.003.patch, YARN-10207.004.patch, YARN-10207.branch-3.2.001.patch
>
>
> File descriptor leaks are observed coming from the JobHistoryServer process 
> while it tries to render a "corrupted" aggregated log on the JHS Web UI.
> Issue reproduced using the following steps:
> # Ran a sample Hadoop MR Pi job, it had the id - 
> application_1582676649923_0026.
> # Copied an aggregated log file from HDFS to local FS:
> {code}
> hdfs dfs -get 
> /tmp/logs/systest/logs/application_1582676649923_0026/_8041
> {code}
> # Updated the TFile metadata at the bottom of this file with some junk to 
> corrupt the file:
> *Before:*
> {code}
>   
> ^@^GVERSION*(^@&container_1582676649923_0026_01_03^F^Dnone^A^Pª5²ª5²^C^Qdata:BCFile.index^Dnoneª5þ^M^M^Pdata:TFile.index^Dnoneª5È66^Odata:TFile.meta^Dnoneª5Â^F^F^@^@^@^@^@^B6^K^@^A^@^@Ñ^QÓh<91>µ×¶9ßA@<92>ºáP
> {code}
> *After:*
> {code}
>   
> ^@^GVERSION*(^@&container_1582676649923_0026_01_03^F^Dnone^A^Pª5²ª5²^C^Qdata:BCFile.index^Dnoneª5þ^M^M^Pdata:TFile.index^Dnoneª5È66^Odata:TFile.meta^Dnoneª5Â^F^F^@^@^@^@^@^B6^K^@^A^@^@Ñ^QÓh<91>µ×¶9ßA@<92>ºáPblah
> {code}
> Notice "blah" (junk) added at the very end.
> # Remove the existing aggregated log file so that it can be replaced by our 
> modified copy from step 3 (otherwise HDFS will refuse to place a file with 
> the same name because one already exists):
> {code}
> hdfs dfs -rm -r -f 
> /tmp/logs/systest/logs/application_1582676649923_0026/_8041
> {code}
> # Upload the corrupted aggregated file back to HDFS:
> {code}
> hdfs dfs -put _8041 
> /tmp/logs/systest/logs/application_1582676649923_0026
> {code}
> # Visit HistoryServer Web UI
> # Click on job_1582676649923_0026
> # Click on "logs" link against the AM (assuming the AM ran on nm_hostname)
> # Review the JHS logs; the following exception will be seen:
> {code}
>   2020-03-24 20:03:48,484 ERROR org.apache.hadoop.yarn.webapp.View: Error 
> getting logs for job_1582676649923_0026
>   java.io.IOException: Not a valid BCFile.
>   at 
> org.apache.hadoop.io.file.tfile.BCFile$Magic.readAndVerify(BCFile.java:927)
>   at 
> org.apache.hadoop.io.file.tfile.BCFile$Reader.(BCFile.java:628)
>   at 
> org.apache.hadoop.io.file.tfile.TFile$Reader.(TFile.java:804)
>   at 
> org.apache.hadoop.yarn.logaggregation.AggregatedLogFormat$LogReader.(AggregatedLogFormat.java:588)
>   at 
> org.apache.hadoop.yarn.logaggregation.filecontroller.tfile.TFileAggregatedLogsBlock.render(TFileAggregatedLogsBlock.java:111)
>   at 
> org.apache.hadoop.yarn.logaggregation.filecontroller.tfile.LogAggregationTFileController.renderAggregatedLogsBlock(LogAggregationTFileController.java:341)
>   at 
> org.apache.hadoop.yarn.webapp.log.AggregatedLogsBlock.render(AggregatedLogsBlock.java:117)
>   at 
> org.apache.hadoop.yarn.webapp.view.HtmlBlock.render(HtmlBlock.java:69)
>   at 
> org.apache.hadoop.yarn.webapp.view.HtmlBlock.renderPartial(HtmlBlock.java:79)
>   at org.apache.hadoop.yarn.webapp.View.render(View.java:235)
>   at 
> org.apache.hadoop.yarn.webapp.view.HtmlPage$Page.subView(HtmlPage.java:49)
>   at 
> org.apache.hadoop.yarn.webapp.hamlet2.HamletImpl$EImp._v(HamletImpl.java:117)
>   at 
> org.apache.hadoop.yarn.webapp.hamlet2.Hamlet$TD.__(Hamlet.java:848)
>   at 
> org.apache.hadoop.yarn.webapp.view.TwoColumnLayout.render(TwoColumnLayout.java:71)
>   at 
> org.apache.hadoop.yarn.webapp.view.HtmlPage.render(HtmlPage.java:82)
>   at 
> org.apache.hadoop.yarn.webapp.Controller.render(Controller.java:212)
>   at 
> org.apache.hadoop.mapreduce.v2.hs.webapp.HsController.logs(HsController.java:202)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(Delegatin

[jira] [Updated] (YARN-10207) CLOSE_WAIT socket connection leaks during rendering of (corrupted) aggregated logs on the JobHistoryServer Web UI

2020-04-07 Thread Szilard Nemeth (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-10207?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Szilard Nemeth updated YARN-10207:
--
Fix Version/s: 3.3.0

> CLOSE_WAIT socket connection leaks during rendering of (corrupted) aggregated 
> logs on the JobHistoryServer Web UI
> -
>
> Key: YARN-10207
> URL: https://issues.apache.org/jira/browse/YARN-10207
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn
>Reporter: Siddharth Ahuja
>Assignee: Siddharth Ahuja
>Priority: Major
> Fix For: 3.3.0, 3.4.0
>
> Attachments: YARN-10207.001.patch, YARN-10207.002.patch, 
> YARN-10207.003.patch, YARN-10207.004.patch
>
>
> File descriptor leaks are observed coming from the JobHistoryServer process 
> while it tries to render a "corrupted" aggregated log on the JHS Web UI.
> Issue reproduced using the following steps:
> # Ran a sample Hadoop MR Pi job, it had the id - 
> application_1582676649923_0026.
> # Copied an aggregated log file from HDFS to local FS:
> {code}
> hdfs dfs -get 
> /tmp/logs/systest/logs/application_1582676649923_0026/_8041
> {code}
> # Updated the TFile metadata at the bottom of this file with some junk to 
> corrupt the file:
> *Before:*
> {code}
>   
> ^@^GVERSION*(^@&container_1582676649923_0026_01_03^F^Dnone^A^Pª5²ª5²^C^Qdata:BCFile.index^Dnoneª5þ^M^M^Pdata:TFile.index^Dnoneª5È66^Odata:TFile.meta^Dnoneª5Â^F^F^@^@^@^@^@^B6^K^@^A^@^@Ñ^QÓh<91>µ×¶9ßA@<92>ºáP
> {code}
> *After:*
> {code}
>   
> ^@^GVERSION*(^@&container_1582676649923_0026_01_03^F^Dnone^A^Pª5²ª5²^C^Qdata:BCFile.index^Dnoneª5þ^M^M^Pdata:TFile.index^Dnoneª5È66^Odata:TFile.meta^Dnoneª5Â^F^F^@^@^@^@^@^B6^K^@^A^@^@Ñ^QÓh<91>µ×¶9ßA@<92>ºáPblah
> {code}
> Notice "blah" (junk) added at the very end.
> # Remove the existing aggregated log file so that it can be replaced by our 
> modified copy from step 3 (otherwise HDFS will refuse to place a file with 
> the same name because one already exists):
> {code}
> hdfs dfs -rm -r -f 
> /tmp/logs/systest/logs/application_1582676649923_0026/_8041
> {code}
> # Upload the corrupted aggregated file back to HDFS:
> {code}
> hdfs dfs -put _8041 
> /tmp/logs/systest/logs/application_1582676649923_0026
> {code}
> # Visit HistoryServer Web UI
> # Click on job_1582676649923_0026
> # Click on "logs" link against the AM (assuming the AM ran on nm_hostname)
> # Review the JHS logs; the following exception will be seen:
> {code}
>   2020-03-24 20:03:48,484 ERROR org.apache.hadoop.yarn.webapp.View: Error 
> getting logs for job_1582676649923_0026
>   java.io.IOException: Not a valid BCFile.
>   at 
> org.apache.hadoop.io.file.tfile.BCFile$Magic.readAndVerify(BCFile.java:927)
>   at 
> org.apache.hadoop.io.file.tfile.BCFile$Reader.(BCFile.java:628)
>   at 
> org.apache.hadoop.io.file.tfile.TFile$Reader.(TFile.java:804)
>   at 
> org.apache.hadoop.yarn.logaggregation.AggregatedLogFormat$LogReader.(AggregatedLogFormat.java:588)
>   at 
> org.apache.hadoop.yarn.logaggregation.filecontroller.tfile.TFileAggregatedLogsBlock.render(TFileAggregatedLogsBlock.java:111)
>   at 
> org.apache.hadoop.yarn.logaggregation.filecontroller.tfile.LogAggregationTFileController.renderAggregatedLogsBlock(LogAggregationTFileController.java:341)
>   at 
> org.apache.hadoop.yarn.webapp.log.AggregatedLogsBlock.render(AggregatedLogsBlock.java:117)
>   at 
> org.apache.hadoop.yarn.webapp.view.HtmlBlock.render(HtmlBlock.java:69)
>   at 
> org.apache.hadoop.yarn.webapp.view.HtmlBlock.renderPartial(HtmlBlock.java:79)
>   at org.apache.hadoop.yarn.webapp.View.render(View.java:235)
>   at 
> org.apache.hadoop.yarn.webapp.view.HtmlPage$Page.subView(HtmlPage.java:49)
>   at 
> org.apache.hadoop.yarn.webapp.hamlet2.HamletImpl$EImp._v(HamletImpl.java:117)
>   at 
> org.apache.hadoop.yarn.webapp.hamlet2.Hamlet$TD.__(Hamlet.java:848)
>   at 
> org.apache.hadoop.yarn.webapp.view.TwoColumnLayout.render(TwoColumnLayout.java:71)
>   at 
> org.apache.hadoop.yarn.webapp.view.HtmlPage.render(HtmlPage.java:82)
>   at 
> org.apache.hadoop.yarn.webapp.Controller.render(Controller.java:212)
>   at 
> org.apache.hadoop.mapreduce.v2.hs.webapp.HsController.logs(HsController.java:202)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lan

[jira] [Updated] (YARN-10207) CLOSE_WAIT socket connection leaks during rendering of (corrupted) aggregated logs on the JobHistoryServer Web UI

2020-04-07 Thread Szilard Nemeth (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-10207?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Szilard Nemeth updated YARN-10207:
--
Fix Version/s: 3.4.0

> CLOSE_WAIT socket connection leaks during rendering of (corrupted) aggregated 
> logs on the JobHistoryServer Web UI
> -
>
> Key: YARN-10207
> URL: https://issues.apache.org/jira/browse/YARN-10207
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn
>Reporter: Siddharth Ahuja
>Assignee: Siddharth Ahuja
>Priority: Major
> Fix For: 3.4.0
>
> Attachments: YARN-10207.001.patch, YARN-10207.002.patch, 
> YARN-10207.003.patch, YARN-10207.004.patch
>
>
> File descriptor leaks are observed coming from the JobHistoryServer process 
> while it tries to render a "corrupted" aggregated log on the JHS Web UI.
> Issue reproduced using the following steps:
> # Ran a sample Hadoop MR Pi job, it had the id - 
> application_1582676649923_0026.
> # Copied an aggregated log file from HDFS to local FS:
> {code}
> hdfs dfs -get 
> /tmp/logs/systest/logs/application_1582676649923_0026/_8041
> {code}
> # Updated the TFile metadata at the bottom of this file with some junk to 
> corrupt the file:
> *Before:*
> {code}
>   
> ^@^GVERSION*(^@&container_1582676649923_0026_01_03^F^Dnone^A^Pª5²ª5²^C^Qdata:BCFile.index^Dnoneª5þ^M^M^Pdata:TFile.index^Dnoneª5È66^Odata:TFile.meta^Dnoneª5Â^F^F^@^@^@^@^@^B6^K^@^A^@^@Ñ^QÓh<91>µ×¶9ßA@<92>ºáP
> {code}
> *After:*
> {code}
>   
> ^@^GVERSION*(^@&container_1582676649923_0026_01_03^F^Dnone^A^Pª5²ª5²^C^Qdata:BCFile.index^Dnoneª5þ^M^M^Pdata:TFile.index^Dnoneª5È66^Odata:TFile.meta^Dnoneª5Â^F^F^@^@^@^@^@^B6^K^@^A^@^@Ñ^QÓh<91>µ×¶9ßA@<92>ºáPblah
> {code}
> Notice "blah" (junk) added at the very end.
> # Remove the existing aggregated log file so that it can be replaced by our 
> modified copy from step 3 (otherwise HDFS will refuse to place a file with 
> the same name because one already exists):
> {code}
> hdfs dfs -rm -r -f 
> /tmp/logs/systest/logs/application_1582676649923_0026/_8041
> {code}
> # Upload the corrupted aggregated file back to HDFS:
> {code}
> hdfs dfs -put _8041 
> /tmp/logs/systest/logs/application_1582676649923_0026
> {code}
> # Visit HistoryServer Web UI
> # Click on job_1582676649923_0026
> # Click on "logs" link against the AM (assuming the AM ran on nm_hostname)
> # Review the JHS logs; the following exception will be seen:
> {code}
>   2020-03-24 20:03:48,484 ERROR org.apache.hadoop.yarn.webapp.View: Error 
> getting logs for job_1582676649923_0026
>   java.io.IOException: Not a valid BCFile.
>   at 
> org.apache.hadoop.io.file.tfile.BCFile$Magic.readAndVerify(BCFile.java:927)
>   at 
> org.apache.hadoop.io.file.tfile.BCFile$Reader.(BCFile.java:628)
>   at 
> org.apache.hadoop.io.file.tfile.TFile$Reader.(TFile.java:804)
>   at 
> org.apache.hadoop.yarn.logaggregation.AggregatedLogFormat$LogReader.(AggregatedLogFormat.java:588)
>   at 
> org.apache.hadoop.yarn.logaggregation.filecontroller.tfile.TFileAggregatedLogsBlock.render(TFileAggregatedLogsBlock.java:111)
>   at 
> org.apache.hadoop.yarn.logaggregation.filecontroller.tfile.LogAggregationTFileController.renderAggregatedLogsBlock(LogAggregationTFileController.java:341)
>   at 
> org.apache.hadoop.yarn.webapp.log.AggregatedLogsBlock.render(AggregatedLogsBlock.java:117)
>   at 
> org.apache.hadoop.yarn.webapp.view.HtmlBlock.render(HtmlBlock.java:69)
>   at 
> org.apache.hadoop.yarn.webapp.view.HtmlBlock.renderPartial(HtmlBlock.java:79)
>   at org.apache.hadoop.yarn.webapp.View.render(View.java:235)
>   at 
> org.apache.hadoop.yarn.webapp.view.HtmlPage$Page.subView(HtmlPage.java:49)
>   at 
> org.apache.hadoop.yarn.webapp.hamlet2.HamletImpl$EImp._v(HamletImpl.java:117)
>   at 
> org.apache.hadoop.yarn.webapp.hamlet2.Hamlet$TD.__(Hamlet.java:848)
>   at 
> org.apache.hadoop.yarn.webapp.view.TwoColumnLayout.render(TwoColumnLayout.java:71)
>   at 
> org.apache.hadoop.yarn.webapp.view.HtmlPage.render(HtmlPage.java:82)
>   at 
> org.apache.hadoop.yarn.webapp.Controller.render(Controller.java:212)
>   at 
> org.apache.hadoop.mapreduce.v2.hs.webapp.HsController.logs(HsController.java:202)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.refle

[jira] [Commented] (YARN-10207) CLOSE_WAIT socket connection leaks during rendering of (corrupted) aggregated logs on the JobHistoryServer Web UI

2020-04-07 Thread Szilard Nemeth (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10207?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17077318#comment-17077318
 ] 

Szilard Nemeth commented on YARN-10207:
---

Hi [~sahuja],
Thanks for working on this patch.
Very thorough investigation 
[here|https://issues.apache.org/jira/browse/YARN-10207?focusedCommentId=17071542&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-17071542],
 thumbs up for this.
The investigation and analysis make sense to me, as does the patch. LGTM, 
committing shortly.
Thanks [~adam.antal] for the review.

{quote}
You will notice that this is a different call stack to the TFile case as we 
don't have a call to AggregatedLogFormat.LogReader i.e. it is coded differently.
Regardless, thanks to that finally clause, it does end up cleaning the 
connection and there are no CLOSE_WAIT leaks in case of a corrupted log file 
being encountered. (Bad thing here is that only a WARN log is presented to the 
user in the JHS logs in case of rendering failing for Tfile logs and there is 
no stacktrace logged coming from the exception here - 
https://github.com/apache/hadoop/blob/c24af4b0d6fc32938b076161b5a8c86d38e3e0a1/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/logaggregation/filecontroller/ifile/IndexedFileAggregatedLogsBlock.java#L136
 as the exception is just swallowed up inside the catch{} clause. This may 
warrant a separate JIRA.)
{quote}
I'd vote for filing the jira, given that your explanation makes sense; 
copy/pasting your investigation bits over to the other jira should work well.

{quote}
however, did not find CLOSE_WAITS on the ApplicationHistoryServer process. 
From code inspection it wasn't obvious to me as to why there weren't any leaks 
but then again I am not well-versed with AHS so this could be taken up in a 
separate JIRA by someone who has more experience with this component.
{quote}
I think if you haven't found anything suspicious there, and given you haven't 
found CLOSE_WAIT sockets, I suggest not reporting a jira for this, unless you 
suspect something could be hiding there as a potential problem.
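
As a general note on this class of leak: whatever reader is opened over the 
aggregated log has to be closed even when parsing fails, and the failure is 
worth logging with its stack trace rather than being swallowed. A minimal 
sketch of that pattern only (not the actual patch; LogReaderLike and render() 
are hypothetical stand-ins rather than the real AggregatedLogFormat.LogReader 
or TFileAggregatedLogsBlock API, and slf4j is assumed to be on the classpath):

{code:java}
import java.io.Closeable;
import java.io.IOException;

import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

// Sketch only: close the reader even when rendering fails, and surface the
// failure with its stack trace instead of swallowing it.
final class CorruptLogRenderSketch {
  private static final Logger LOG =
      LoggerFactory.getLogger(CorruptLogRenderSketch.class);

  // Hypothetical stand-in for whatever reader is opened over the
  // aggregated log file on HDFS.
  interface LogReaderLike extends Closeable {
    void renderInto(StringBuilder html) throws IOException;
  }

  static void render(LogReaderLike reader, StringBuilder html) {
    try {
      reader.renderInto(html);
    } catch (IOException e) {
      // Log with the stack trace so a corrupted file shows up clearly
      // in the JHS logs instead of being silently dropped.
      LOG.error("Error rendering aggregated logs", e);
    } finally {
      try {
        reader.close();  // releases the HDFS input stream and its socket
      } catch (IOException closeError) {
        LOG.warn("Failed to close aggregated log reader", closeError);
      }
    }
  }
}
{code}

Since the reader in this sketch is Closeable, the same effect can also be had 
with try-with-resources.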

> CLOSE_WAIT socket connection leaks during rendering of (corrupted) aggregated 
> logs on the JobHistoryServer Web UI
> -
>
> Key: YARN-10207
> URL: https://issues.apache.org/jira/browse/YARN-10207
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn
>Reporter: Siddharth Ahuja
>Assignee: Siddharth Ahuja
>Priority: Major
> Attachments: YARN-10207.001.patch, YARN-10207.002.patch, 
> YARN-10207.003.patch, YARN-10207.004.patch
>
>
> File descriptor leaks are observed coming from the JobHistoryServer process 
> while it tries to render a "corrupted" aggregated log on the JHS Web UI.
> Issue reproduced using the following steps:
> # Ran a sample Hadoop MR Pi job, it had the id - 
> application_1582676649923_0026.
> # Copied an aggregated log file from HDFS to local FS:
> {code}
> hdfs dfs -get 
> /tmp/logs/systest/logs/application_1582676649923_0026/_8041
> {code}
> # Updated the TFile metadata at the bottom of this file with some junk to 
> corrupt the file:
> *Before:*
> {code}
>   
> ^@^GVERSION*(^@&container_1582676649923_0026_01_03^F^Dnone^A^Pª5²ª5²^C^Qdata:BCFile.index^Dnoneª5þ^M^M^Pdata:TFile.index^Dnoneª5È66^Odata:TFile.meta^Dnoneª5Â^F^F^@^@^@^@^@^B6^K^@^A^@^@Ñ^QÓh<91>µ×¶9ßA@<92>ºáP
> {code}
> *After:*
> {code}
>   
> ^@^GVERSION*(^@&container_1582676649923_0026_01_03^F^Dnone^A^Pª5²ª5²^C^Qdata:BCFile.index^Dnoneª5þ^M^M^Pdata:TFile.index^Dnoneª5È66^Odata:TFile.meta^Dnoneª5Â^F^F^@^@^@^@^@^B6^K^@^A^@^@Ñ^QÓh<91>µ×¶9ßA@<92>ºáPblah
> {code}
> Notice "blah" (junk) added at the very end.
> # Remove the existing aggregated log file so that it can be replaced by our 
> modified copy from step 3 (otherwise HDFS will refuse to place a file with 
> the same name because one already exists):
> {code}
> hdfs dfs -rm -r -f 
> /tmp/logs/systest/logs/application_1582676649923_0026/_8041
> {code}
> # Upload the corrupted aggregated file back to HDFS:
> {code}
> hdfs dfs -put _8041 
> /tmp/logs/systest/logs/application_1582676649923_0026
> {code}
> # Visit HistoryServer Web UI
> # Click on job_1582676649923_0026
> # Click on "logs" link against the AM (assuming the AM ran on nm_hostname)
> # Review the JHS logs; the following exception will be seen:
> {code}
>   2020-03-24 20:03:48,484 ERROR org.apache.hadoop.yarn.webapp.View: Error 
> getting logs for job_1582676649923_0026
>   java.io.IOException: Not a valid BCFile.
>   at 
> org.apache.hadoop.io.file.tfile.BCFile$Magic.readAndVerify(BCFile.java:927)
>

[jira] [Commented] (YARN-5277) when localizers fail due to resource timestamps being out, provide more diagnostics

2020-04-07 Thread Szilard Nemeth (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-5277?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17077304#comment-17077304
 ] 

Szilard Nemeth commented on YARN-5277:
--

Hi [~sahuja],
Some comments: 
1. In FSDownload: 

{code:java}
if (sStat.getModificationTime() != resource.getTimestamp()) {
  throw new IOException("Resource " + sCopy + " change on src filesystem" +
  " - expected: " +
  "\"" + Times.formatISO8601(resource.getTimestamp()) + "\"" +
  ", was: " +
  "\"" + Times.formatISO8601(sStat.getModificationTime()) + "\"" +
  ", current time: " + "\"" + Times.formatISO8601(Time.now()) + "\"");
}
{code}
I think the message should be "changed", not "change".

2. The testcase you added (testResourceTimestampChangeDuringDownload) makes 
sense overall; however, I think at the end you need an Assert.fail() in case 
the exception hasn't been thrown: 

{code:java}
 try {
  for (Map.Entry> p : pending.entrySet()) {
p.getValue().get();
  }
//TODO Add Assert.fail() here
} catch (ExecutionException ee) {
  Assert.assertTrue(ee.getCause() instanceof IOException);
  Assert.assertTrue("Exception contains original timestamp",
  ee.getMessage().contains(Times.formatISO8601(origLRTimestamp)));
  Assert.assertTrue("Exception contains modified timestamp",
  ee.getMessage().contains(Times.formatISO8601(modifiedFSTimestamp)));
}
{code}
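
For reference, a minimal, self-contained sketch of the suggested control flow 
- this is not the actual FSDownload/TestFSDownload code; the class and the 
submitted task below are made up purely to show where the Assert.fail() 
belongs:

{code:java}
import java.io.IOException;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

import org.junit.Assert;
import org.junit.Test;

public class ExpectedFailurePatternTest {

  @Test
  public void testFailsWhenExpectedExceptionIsMissing() throws Exception {
    ExecutorService exec = Executors.newSingleThreadExecutor();
    // Stand-in for the pending download that is expected to fail on a
    // timestamp mismatch; in the real test this is the FSDownload future.
    Future<Void> pending = exec.submit((Callable<Void>) () -> {
      throw new IOException("Resource changed on src filesystem");
    });
    try {
      pending.get();
      // If we get here, no exception was raised, so the test must fail
      // explicitly instead of passing silently.
      Assert.fail("Expected an ExecutionException caused by an IOException");
    } catch (ExecutionException ee) {
      Assert.assertTrue(ee.getCause() instanceof IOException);
    } finally {
      exec.shutdownNow();
    }
  }
}
{code}

The important bit is the Assert.fail() right after the get() calls: without 
it, a change that stops throwing the IOException would still leave the test 
green.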



> when localizers fail due to resource timestamps being out, provide more 
> diagnostics
> ---
>
> Key: YARN-5277
> URL: https://issues.apache.org/jira/browse/YARN-5277
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: nodemanager
>Affects Versions: 2.8.0
>Reporter: Steve Loughran
>Assignee: Siddharth Ahuja
>Priority: Major
> Attachments: YARN-5277.001.patch
>
>
> When an NM fails a resource D/L as the timestamps are wrong, there's not much 
> info, just two long values. 
> It would be good to also include the local time values, *and the current wall 
> time*. These are the things people need to know when trying to work out what 
> went wrong



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-10063) Usage output of container-executor binary needs to include --http/--https argument

2020-04-07 Thread Szilard Nemeth (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10063?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17077280#comment-17077280
 ] 

Szilard Nemeth commented on YARN-10063:
---

Hi [~sahuja],
Thanks for working on this patch; the latest patch LGTM.

[~wilfreds] Do you want to commit this or are you fine with me committing it?
thanks


> Usage output of container-executor binary needs to include --http/--https 
> argument
> --
>
> Key: YARN-10063
> URL: https://issues.apache.org/jira/browse/YARN-10063
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Siddharth Ahuja
>Assignee: Siddharth Ahuja
>Priority: Minor
> Attachments: YARN-10063.001.patch, YARN-10063.002.patch, 
> YARN-10063.003.patch, YARN-10063.004.patch
>
>
> YARN-8448/YARN-6586 seem to have introduced new options - "\--http" 
> (default) and "\--https" - that can be passed to the 
> container-executor binary, see:
> https://github.com/apache/hadoop/blob/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/native/container-executor/impl/main.c#L564
> and 
> https://github.com/apache/hadoop/blob/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/native/container-executor/impl/main.c#L521
> however, the usage output seems to have missed this:
> https://github.com/apache/hadoop/blob/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/native/container-executor/impl/main.c#L74
> Raising this jira to improve this.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-10001) Add explanation of unimplemented methods in InMemoryConfigurationStore

2020-04-07 Thread Hudson (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10001?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17077270#comment-17077270
 ] 

Hudson commented on YARN-10001:
---

SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #18123 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/18123/])
YARN-10001. Add explanation of unimplemented methods in (snemeth: rev 
45362a9f4cbe512ee4cd6b7f65aa47d59fee612e)
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/conf/InMemoryConfigurationStore.java


> Add explanation of unimplemented methods in InMemoryConfigurationStore
> --
>
> Key: YARN-10001
> URL: https://issues.apache.org/jira/browse/YARN-10001
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Szilard Nemeth
>Assignee: Siddharth Ahuja
>Priority: Major
> Fix For: 3.3.0, 3.4.0
>
> Attachments: YARN-10001.001.patch, YARN-10001.002.patch
>
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-10001) Add explanation of unimplemented methods in InMemoryConfigurationStore

2020-04-07 Thread Szilard Nemeth (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-10001?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Szilard Nemeth updated YARN-10001:
--
Fix Version/s: 3.4.0
   3.3.0

> Add explanation of unimplemented methods in InMemoryConfigurationStore
> --
>
> Key: YARN-10001
> URL: https://issues.apache.org/jira/browse/YARN-10001
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Szilard Nemeth
>Assignee: Siddharth Ahuja
>Priority: Major
> Fix For: 3.3.0, 3.4.0
>
> Attachments: YARN-10001.001.patch, YARN-10001.002.patch
>
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-10001) Add explanation of unimplemented methods in InMemoryConfigurationStore

2020-04-07 Thread Szilard Nemeth (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10001?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17077265#comment-17077265
 ] 

Szilard Nemeth commented on YARN-10001:
---

[~sahuja]: 
UPDATE: Never mind, I cherry-picked and pushed it to branch-3.3, so don't 
bother with that.
However, while backporting the commit to branch-3.2, I had a conflict. Please 
resolve it and upload a patch targeting 3.2. Thanks


> Add explanation of unimplemented methods in InMemoryConfigurationStore
> --
>
> Key: YARN-10001
> URL: https://issues.apache.org/jira/browse/YARN-10001
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Szilard Nemeth
>Assignee: Siddharth Ahuja
>Priority: Major
> Attachments: YARN-10001.001.patch, YARN-10001.002.patch
>
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-10001) Add explanation of unimplemented methods in InMemoryConfigurationStore

2020-04-07 Thread Szilard Nemeth (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10001?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17077257#comment-17077257
 ] 

Szilard Nemeth commented on YARN-10001:
---

Hi [~sahuja],
Thanks for this patch; LGTM, committed to trunk.
Please upload a patch targeting branch-3.3 as well.
Please check if it makes sense to backport this to branch-3.2, so that branch 
can stay closer to trunk.
thanks

> Add explanation of unimplemented methods in InMemoryConfigurationStore
> --
>
> Key: YARN-10001
> URL: https://issues.apache.org/jira/browse/YARN-10001
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Szilard Nemeth
>Assignee: Siddharth Ahuja
>Priority: Major
> Attachments: YARN-10001.001.patch, YARN-10001.002.patch
>
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-5277) when localizers fail due to resource timestamps being out, provide more diagnostics

2020-04-07 Thread Steve Loughran (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-5277?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17077232#comment-17077232
 ] 

Steve Loughran commented on YARN-5277:
--

can you submit this as a github PR so I can review it there?

> when localizers fail due to resource timestamps being out, provide more 
> diagnostics
> ---
>
> Key: YARN-5277
> URL: https://issues.apache.org/jira/browse/YARN-5277
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: nodemanager
>Affects Versions: 2.8.0
>Reporter: Steve Loughran
>Assignee: Siddharth Ahuja
>Priority: Major
> Attachments: YARN-5277.001.patch
>
>
> When an NM fails a resource D/L as the timestamps are wrong, there's not much 
> info, just two long values. 
> It would be good to also include the local time values, *and the current wall 
> time*. These are the things people need to know when trying to work out what 
> went wrong



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-10120) In Federation Router Nodes/Applications/About pages throws 500 exception when https is enabled

2020-04-07 Thread Bilwa S T (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10120?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17077215#comment-17077215
 ] 

Bilwa S T commented on YARN-10120:
--

Hi [~prabhujoseph], can you please commit the branch-3.3 patch?

> In Federation Router Nodes/Applications/About pages throws 500 exception when 
> https is enabled
> --
>
> Key: YARN-10120
> URL: https://issues.apache.org/jira/browse/YARN-10120
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: federation
>Reporter: Sushanta Sen
>Assignee: Bilwa S T
>Priority: Critical
> Fix For: 3.4.0
>
> Attachments: YARN-10120-YARN-7402.patch, 
> YARN-10120-YARN-7402.v2.patch, YARN-10120-addendum-01.patch, 
> YARN-10120-branch-3.3.patch, YARN-10120-branch-3.3.v2.patch, 
> YARN-10120.001.patch, YARN-10120.002.patch
>
>
> In the Federation Router, the Nodes/Applications/About pages throw a 500 
> exception when https is enabled.
> yarn.router.webapp.https.address =router ip:8091
> {noformat}
> 2020-02-07 16:38:49,990 ERROR org.apache.hadoop.yarn.webapp.Dispatcher: error 
> handling URI: /cluster/apps
> java.lang.reflect.InvocationTargetException
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at org.apache.hadoop.yarn.webapp.Dispatcher.service(Dispatcher.java:166)
>   at javax.servlet.http.HttpServlet.service(HttpServlet.java:790)
>   at 
> com.google.inject.servlet.ServletDefinition.doServiceImpl(ServletDefinition.java:287)
>   at 
> com.google.inject.servlet.ServletDefinition.doService(ServletDefinition.java:277)
>   at 
> com.google.inject.servlet.ServletDefinition.service(ServletDefinition.java:182)
>   at 
> com.google.inject.servlet.ManagedServletPipeline.service(ManagedServletPipeline.java:91)
>   at 
> com.google.inject.servlet.FilterChainInvocation.doFilter(FilterChainInvocation.java:85)
>   at 
> com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:941)
>   at 
> com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:875)
>   at 
> com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:829)
>   at 
> com.google.inject.servlet.FilterChainInvocation.doFilter(FilterChainInvocation.java:82)
>   at 
> com.google.inject.servlet.ManagedFilterPipeline.dispatch(ManagedFilterPipeline.java:119)
>   at com.google.inject.servlet.GuiceFilter$1.call(GuiceFilter.java:133)
>   at com.google.inject.servlet.GuiceFilter$1.call(GuiceFilter.java:130)
>   at 
> com.google.inject.servlet.GuiceFilter$Context.call(GuiceFilter.java:203)
>   at com.google.inject.servlet.GuiceFilter.doFilter(GuiceFilter.java:130)
>   at 
> org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1767)
>   at 
> org.apache.hadoop.security.http.XFrameOptionsFilter.doFilter(XFrameOptionsFilter.java:57)
>   at 
> org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1767)
>   at 
> org.apache.hadoop.security.authentication.server.AuthenticationFilter.doFilter(AuthenticationFilter.java:644)
>   at 
> org.apache.hadoop.security.authentication.server.AuthenticationFilter.doFilter(AuthenticationFilter.java:592)
>   at 
> org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1767)
>   at 
> org.apache.hadoop.http.HttpServer2$QuotingInputFilter.doFilter(HttpServer2.java:1622)
>   at 
> org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1767)
>   at org.apache.hadoop.http.NoCacheFilter.doFilter(NoCacheFilter.java:45)
>   at 
> org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1767)
>   at 
> org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:583)
>   at 
> org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)
>   at 
> org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:548)
>   at 
> org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:226)
>   at 
> org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1180)
>   at 
> org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:513)
>   at 
> org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:185)
>   at 
> org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1112)
>   at