[jira] [Comment Edited] (YARN-10855) yarn logs cli fails to retrieve logs if any TFile is corrupt or empty

2021-07-16 Thread Qi Zhu (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10855?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17382403#comment-17382403
 ] 

Qi Zhu edited comment on YARN-10855 at 7/17/21, 2:42 AM:
-

Thanks [~Jim_Brennan] for the update.

cc [~epayne]

If there are no other comments, I will commit it.


was (Author: zhuqi):
Thanks [~Jim_Brennan] for the update.

If there are no other comments, I will commit it.

> yarn logs cli fails to retrieve logs if any TFile is corrupt or empty
> -
>
> Key: YARN-10855
> URL: https://issues.apache.org/jira/browse/YARN-10855
> Project: Hadoop YARN
> Issue Type: Bug
> Components: yarn
> Affects Versions: 3.2.2, 2.10.1, 3.4.0, 3.3.1
> Reporter: Jim Brennan
> Assignee: Jim Brennan
> Priority: Major
> Attachments: YARN-10855.001.patch, YARN-10855.002.patch, YARN-10855.003.patch
>
>
> An attempt to retrieve YARN logs via the CLI failed with the following stack
> trace (on branch-2.10):
> {noformat}
> yarn logs -applicationId application_1591017890475_1049740 > logs
> 20/06/05 19:15:50 INFO client.RMProxy: Connecting to ResourceManager
> 20/06/05 19:15:51 INFO client.AHSProxy: Connecting to Application History server
> Exception in thread "main" java.io.EOFException: Cannot seek to negative offset
>   at org.apache.hadoop.hdfs.DFSInputStream.seek(DFSInputStream.java:1701)
>   at org.apache.hadoop.fs.FSDataInputStream.seek(FSDataInputStream.java:65)
>   at org.apache.hadoop.io.file.tfile.BCFile$Reader.<init>(BCFile.java:624)
>   at org.apache.hadoop.io.file.tfile.TFile$Reader.<init>(TFile.java:804)
>   at org.apache.hadoop.yarn.logaggregation.AggregatedLogFormat$LogReader.<init>(AggregatedLogFormat.java:503)
>   at org.apache.hadoop.yarn.logaggregation.LogCLIHelpers.dumpAllContainersLogs(LogCLIHelpers.java:227)
>   at org.apache.hadoop.yarn.client.cli.LogsCLI.run(LogsCLI.java:333)
>   at org.apache.hadoop.yarn.client.cli.LogsCLI.main(LogsCLI.java:367)
> {noformat}
> The problem was a zero-length TFile for one of the containers in the
> application's aggregated log directory in HDFS. When we removed the
> zero-length file, {{yarn logs}} was able to retrieve the logs.
> A corrupt or zero-length TFile for one container should not prevent loading
> logs for the rest of the application.
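
The requirement in the last sentence can be sketched in Java as a per-file try/catch. This is only an illustration under stated assumptions, not the attached patch: `SkipBadLogFiles`, `readLogFile`, and `dumpAll` are hypothetical names, the in-memory byte arrays stand in for aggregated log files in HDFS, and a zero-length file is assumed to surface as an `IOException` when it is opened, as in the stack trace above.

```java
import java.io.EOFException;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

class SkipBadLogFiles {

  /** Hypothetical stand-in for opening and reading one aggregated log file. */
  static String readLogFile(String name, byte[] contents) throws IOException {
    if (contents.length == 0) {
      // Mimics the reported failure: an empty file cannot be parsed.
      throw new EOFException("Cannot seek to negative offset: " + name);
    }
    return new String(contents, StandardCharsets.UTF_8);
  }

  /**
   * Reads every file. A file that fails to open is recorded in `skipped`
   * and the loop continues, so one bad file does not abort the whole dump.
   */
  static List<String> dumpAll(Map<String, byte[]> files, List<String> skipped) {
    List<String> logs = new ArrayList<>();
    for (Map.Entry<String, byte[]> entry : files.entrySet()) {
      try {
        logs.add(readLogFile(entry.getKey(), entry.getValue()));
      } catch (IOException ex) {
        // Report the unreadable file and move on to the next one.
        skipped.add(entry.getKey());
      }
    }
    return logs;
  }
}
```

Given three files of which the second is empty, `dumpAll` returns the contents of the first and third and records the name of the second in `skipped`.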



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-10855) yarn logs cli fails to retrieve logs if any TFile is corrupt or empty

2021-07-16 Thread Qi Zhu (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10855?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17382403#comment-17382403
 ] 

Qi Zhu commented on YARN-10855:
---

Thanks [~Jim_Brennan] for the update.

If there are no other comments, I will commit it.







[jira] [Commented] (YARN-10855) yarn logs cli fails to retrieve logs if any TFile is corrupt or empty

2021-07-16 Thread Hadoop QA (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10855?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17382384#comment-17382384
 ] 

Hadoop QA commented on YARN-10855:
--

| (/) *+1 overall* |
\\
\\
|| Vote || Subsystem || Runtime || Logfile || Comment ||
| 0 | reexec | 0m 43s | | Docker mode activated. |
|| || || || Prechecks || ||
| +1 | dupname | 0m 0s | | No case conflicting files found. |
| +1 | @author | 0m 0s | | The patch does not contain any @author tags. |
| +1 | | 0m 0s | test4tests | The patch appears to include 1 new or modified test files. |
|| || || || trunk Compile Tests || ||
| 0 | mvndep | 1m 45s | | Maven dependency ordering for branch |
| +1 | mvninstall | 19m 34s | | trunk passed |
| +1 | compile | 9m 16s | | trunk passed with JDK Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04 |
| +1 | compile | 7m 54s | | trunk passed with JDK Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 |
| +1 | checkstyle | 1m 44s | | trunk passed |
| +1 | mvnsite | 1m 48s | | trunk passed |
| +1 | shadedclient | 17m 9s | | branch has no errors when building and testing our client artifacts. |
| +1 | javadoc | 1m 38s | | trunk passed with JDK Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04 |
| +1 | javadoc | 1m 42s | | trunk passed with JDK Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 |
| 0 | spotbugs | 23m 21s | | Both FindBugs and SpotBugs are enabled, using SpotBugs. |
| +1 | spotbugs | 2m 54s | | trunk passed |
|| || || || Patch Compile Tests || ||
| 0 | mvndep | 0m 27s | | Maven dependency ordering for patch |
| +1 | mvninstall | 1m 5s | | the patch passed |
| +1 | compile | 8m 32s | | the patch passed with JDK Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04 |
| +1 | javac | 8m 32s | | the patch passed |
| +1 | compile | 7m 51s | | the patch passed with JDK Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 |
| +1 | javac | 7m 51s | | the patch passed |
| +1 | checkstyle | 1m 38s | | the patch passed |
| +1 | mvnsite | 1m 38s | | the patch passed |
| +1 | whitespace | 0m 0s | | The patch has no whitespace issues. |
| +1 | shadedclient | 13m 9s | | patch has no errors when building and testing our client artifacts. |
| 

[jira] [Commented] (YARN-10855) yarn logs cli fails to retrieve logs if any TFile is corrupt or empty

2021-07-16 Thread Jim Brennan (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10855?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17382331#comment-17382331
 ] 

Jim Brennan commented on YARN-10855:


Patch 003 fixes the checkstyle issues.
[~epayne], can you please review this?








[jira] [Updated] (YARN-10855) yarn logs cli fails to retrieve logs if any TFile is corrupt or empty

2021-07-16 Thread Jim Brennan (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-10855?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jim Brennan updated YARN-10855:
---
Attachment: YARN-10855.003.patch







[jira] [Updated] (YARN-10857) YarnClient Caching Addresses

2021-07-16 Thread Steve Suh (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-10857?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Suh updated YARN-10857:
-
Description: 
We have noticed that when the YarnClient is initialized and used, it is not 
very resilient when dns or /etc/hosts is modified in the following scenario:

Take for instance the following (and reproducible) sequence of events that can 
occur on a service that instantiates and uses YarnClient. 
  - Yarn has rm HA enabled (*yarn.resourcemanager.ha.enabled* is *true*) and 
there are two rms (rm1 and rm2).
  - *yarn.client.failover-proxy-provider* is set to 
*org.apache.hadoop.yarn.client.RequestHedgingRMFailoverProxyProvider*

1)  rm2 is currently the active rm
2)  /etc/hosts (or dns) is missing host information for rm2
3)  A service is started and it initializes the YarnClient at startup.
4)  At some point in time after YarnClient is done initializing, /etc/hosts 
is updated and contains host information for rm2
5)  Yarn is queried, for instance calling *yarnclient.getApplications()*
6)  All YarnClient attempts to communicate with rm2 fail with 
UnknownHostExceptions, even though /etc/hosts now contains host information for 
it.



  was:
We have noticed that when the YarnClient is initialized and used, it is not 
very resilient when dns or /etc/hosts is modified in the following scenario:

Take for instance the following (and reproducible) sequence of events that can 
occur on a service that instantiates and uses YarnClient. 
  - Yarn has rm HA enabled (*yarn.resourcemanager.ha.enabled* is *true*) and 
there are two rms (rm1 and rm2).
  - *yarn.client.failover-proxy-provider* is set to 
*org.apache.hadoop.yarn.client.RequestHedgingRMFailoverProxyProvider*

1)  rm2 is currently the active rm
2)  /etc/hosts (or dns) is missing host information for rm2
3)  A service is started and it initializes the YarnClient at startup.
4)  At some point in time after YarnClient is done initializing, /etc/hosts 
is updated and contains host information for rm2
5)  Yarn is queried using YarnClient, for instance calling 
`.getApplications()`
6)  All YarnClient attempts to communicate with rm2 fail with 
UnknownHostExceptions, even though /etc/hosts now contains host information for 
it.




> YarnClient Caching Addresses
> 
>
> Key: YARN-10857
> URL: https://issues.apache.org/jira/browse/YARN-10857
> Project: Hadoop YARN
> Issue Type: Improvement
> Components: client, yarn
> Reporter: Steve Suh
> Priority: Minor
>
> We have noticed that when the YarnClient is initialized and used, it is not
> very resilient when dns or /etc/hosts is modified in the following scenario:
> Take for instance the following (and reproducible) sequence of events that
> can occur on a service that instantiates and uses YarnClient.
>   - Yarn has rm HA enabled (*yarn.resourcemanager.ha.enabled* is *true*) and
> there are two rms (rm1 and rm2).
>   - *yarn.client.failover-proxy-provider* is set to
> *org.apache.hadoop.yarn.client.RequestHedgingRMFailoverProxyProvider*
> 1)  rm2 is currently the active rm
> 2)  /etc/hosts (or dns) is missing host information for rm2
> 3)  A service is started and it initializes the YarnClient at startup.
> 4)  At some point in time after YarnClient is done initializing, /etc/hosts
> is updated and contains host information for rm2
> 5)  Yarn is queried, for instance calling *yarnclient.getApplications()*
> 6)  All YarnClient attempts to communicate with rm2 fail with
> UnknownHostExceptions, even though /etc/hosts now contains host information
> for it.






[jira] [Updated] (YARN-10857) YarnClient Caching Addresses

2021-07-16 Thread Steve Suh (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-10857?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Suh updated YARN-10857:
-
Description: 
We have noticed that when the YarnClient is initialized and used, it is not 
very resilient when dns or /etc/hosts is modified in the following scenario:

Take for instance the following (and reproducible) sequence of events that can 
occur on a service that instantiates and uses YarnClient. 
  - Yarn has rm HA enabled (*yarn.resourcemanager.ha.enabled* is *true*) and 
there are two rms (rm1 and rm2).
  - *yarn.client.failover-proxy-provider* is set to 
*org.apache.hadoop.yarn.client.RequestHedgingRMFailoverProxyProvider*

1)  rm2 is currently the active rm
2)  /etc/hosts (or dns) is missing host information for rm2
3)  A service is started and it initializes the YarnClient at startup.
4)  At some point in time after YarnClient is done initializing, /etc/hosts 
is updated and contains host information for rm2
5)  Yarn is queried using YarnClient, for instance calling 
`.getApplications()`
6)  All YarnClient attempts to communicate with rm2 fail with 
UnknownHostExceptions, even though /etc/hosts now contains host information for 
it.



  was:
We have noticed that when the YarnClient is initialized and used, it is not 
very resilient when dns or /etc/hosts is modified in the following scenario:

Take for instance the following (and reproducible) sequence of events that can 
occur on a service that instantiates and uses YarnClient. 
  - Yarn has rm HA enabled (`yarn.resourcemanager.ha.enabled` is `true`) and 
there are two rms (rm1 and rm2).
  - `yarn.client.failover-proxy-provider` is set to 
`org.apache.hadoop.yarn.client.RequestHedgingRMFailoverProxyProvider`

1)  rm2 is currently the active rm
2)  /etc/hosts (or dns) is missing host information for rm2
3)  A service is started and it initializes the YarnClient at startup.
4)  At some point in time after YarnClient is done initializing, /etc/hosts 
is updated and contains host information for rm2
5)  Yarn is queried using YarnClient, for instance calling 
`.getApplications()`
6)  All YarnClient attempts to communicate with rm2 fail with 
UnknownHostExceptions, even though /etc/hosts now contains host information for 
it.




> YarnClient Caching Addresses
> 
>
> Key: YARN-10857
> URL: https://issues.apache.org/jira/browse/YARN-10857
> Project: Hadoop YARN
> Issue Type: Improvement
> Components: client, yarn
> Reporter: Steve Suh
> Priority: Minor
>
> We have noticed that when the YarnClient is initialized and used, it is not
> very resilient when dns or /etc/hosts is modified in the following scenario:
> Take for instance the following (and reproducible) sequence of events that
> can occur on a service that instantiates and uses YarnClient.
>   - Yarn has rm HA enabled (*yarn.resourcemanager.ha.enabled* is *true*) and
> there are two rms (rm1 and rm2).
>   - *yarn.client.failover-proxy-provider* is set to
> *org.apache.hadoop.yarn.client.RequestHedgingRMFailoverProxyProvider*
> 1)  rm2 is currently the active rm
> 2)  /etc/hosts (or dns) is missing host information for rm2
> 3)  A service is started and it initializes the YarnClient at startup.
> 4)  At some point in time after YarnClient is done initializing, /etc/hosts
> is updated and contains host information for rm2
> 5)  Yarn is queried using YarnClient, for instance calling
> `.getApplications()`
> 6)  All YarnClient attempts to communicate with rm2 fail with
> UnknownHostExceptions, even though /etc/hosts now contains host information
> for it.






[jira] [Created] (YARN-10857) YarnClient Caching Addresses

2021-07-16 Thread Steve Suh (Jira)
Steve Suh created YARN-10857:


 Summary: YarnClient Caching Addresses
 Key: YARN-10857
 URL: https://issues.apache.org/jira/browse/YARN-10857
 Project: Hadoop YARN
 Issue Type: Improvement
 Components: client, yarn
 Reporter: Steve Suh


We have noticed that when the YarnClient is initialized and used, it is not 
very resilient when dns or /etc/hosts is modified in the following scenario:

Take for instance the following (and reproducible) sequence of events that can 
occur on a service that instantiates and uses YarnClient. 
  - Yarn has rm HA enabled (`yarn.resourcemanager.ha.enabled` is `true`) and 
there are two rms (rm1 and rm2).
  - `yarn.client.failover-proxy-provider` is set to 
`org.apache.hadoop.yarn.client.RequestHedgingRMFailoverProxyProvider`

1)  rm2 is currently the active rm
2)  /etc/hosts (or dns) is missing host information for rm2
3)  A service is started and it initializes the YarnClient at startup.
4)  At some point in time after YarnClient is done initializing, /etc/hosts 
is updated and contains host information for rm2
5)  Yarn is queried using YarnClient, for instance calling 
`.getApplications()`
6)  All YarnClient attempts to communicate with rm2 fail with 
UnknownHostExceptions, even though /etc/hosts now contains host information for 
it.
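
The issue does not name the root cause. One plausible mechanism, stated here purely as an assumption, is that a `java.net.InetSocketAddress` created while the host could not be resolved stays unresolved for its lifetime, because the class is immutable. The sketch below shows that behavior and one way to retry resolution. `AddressRefresh` and `refreshIfUnresolved` are hypothetical names, not part of YarnClient, and the IP literal and port are arbitrary values chosen so that no DNS lookup is needed.

```java
import java.net.InetSocketAddress;

class AddressRefresh {

  /**
   * Returns the address unchanged if it is already resolved. Otherwise
   * builds a new InetSocketAddress from the same host string and port,
   * which makes a fresh resolution attempt at construction time.
   */
  static InetSocketAddress refreshIfUnresolved(InetSocketAddress addr) {
    if (!addr.isUnresolved()) {
      return addr;
    }
    // getHostString() returns the stored host name without a reverse lookup.
    return new InetSocketAddress(addr.getHostString(), addr.getPort());
  }

  public static void main(String[] args) {
    // Simulates an address that was cached before the host became resolvable.
    InetSocketAddress cached = InetSocketAddress.createUnresolved("127.0.0.1", 8032);
    InetSocketAddress refreshed = refreshIfUnresolved(cached);
    System.out.println("cached unresolved:    " + cached.isUnresolved());
    System.out.println("refreshed unresolved: " + refreshed.isUnresolved());
  }
}
```

A client that keeps the original object keeps seeing it as unresolved no matter how /etc/hosts changes; only constructing a new address triggers another lookup.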








[jira] [Commented] (YARN-1187) Add discrete event-based simulation to yarn scheduler simulator

2021-07-16 Thread Anup Agarwal (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-1187?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17382305#comment-17382305
 ] 

Anup Agarwal commented on YARN-1187:


Currently, alarm events that share the same time instant are ordered 
arbitrarily by the alarm's UUID. This can cause causally dependent events to 
be triggered or handled out of order.

To fix this, a sequence number (incremented each time an alarm is created) 
can be added to each alarm so that the causal order between events is 
preserved in the simulation.
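
The proposal can be sketched in Java as a comparator that orders by time first and by creation sequence second. This is a minimal illustration, not the simulator's code: `AlarmOrdering`, `Alarm`, `NEXT_SEQ`, and `drain` are hypothetical names, and a plain `PriorityQueue` is assumed as the event queue.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.PriorityQueue;
import java.util.concurrent.atomic.AtomicLong;

class AlarmOrdering {

  // Global counter; each new alarm takes the next value at creation.
  static final AtomicLong NEXT_SEQ = new AtomicLong();

  static final class Alarm implements Comparable<Alarm> {
    final long time;
    final long seq;
    final String name;

    Alarm(long time, String name) {
      this.time = time;
      this.seq = NEXT_SEQ.getAndIncrement();
      this.name = name;
    }

    @Override
    public int compareTo(Alarm other) {
      // Earlier time first; for equal times, the alarm created first wins.
      int byTime = Long.compare(time, other.time);
      return byTime != 0 ? byTime : Long.compare(seq, other.seq);
    }
  }

  /** Pops all alarms in queue order and returns their names. */
  static List<String> drain(PriorityQueue<Alarm> queue) {
    List<String> order = new ArrayList<>();
    while (!queue.isEmpty()) {
      order.add(queue.poll().name);
    }
    return order;
  }
}
```

Two alarms scheduled for the same instant now come out in the order they were created, which is deterministic across runs, whereas a random UUID gives no such guarantee.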

> Add discrete event-based simulation to yarn scheduler simulator
> ---
>
> Key: YARN-1187
> URL: https://issues.apache.org/jira/browse/YARN-1187
> Project: Hadoop YARN
> Issue Type: Improvement
> Reporter: Wei Yan
> Assignee: Andrew Chung
> Priority: Major
> Attachments: YARN-1187 design doc.pdf, YARN-1187-branch-2.1.3.001.patch, YARN-1187-trunk.001.patch
>
>
> Follow the discussion in YARN-1021.
> Discrete event simulation decouples the run from any real-world clock.
> This allows users to step through the execution, set debug points, and
> get a deterministic re-execution.






[jira] [Commented] (YARN-10855) yarn logs cli fails to retrieve logs if any TFile is corrupt or empty

2021-07-16 Thread Hadoop QA (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10855?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17382279#comment-17382279
 ] 

Hadoop QA commented on YARN-10855:
--

| (/) *+1 overall* |
\\
\\
|| Vote || Subsystem || Runtime || Logfile || Comment ||
| 0 | reexec | 0m 40s | | Docker mode activated. |
|| || || || Prechecks || ||
| +1 | dupname | 0m 0s | | No case conflicting files found. |
| +1 | @author | 0m 0s | | The patch does not contain any @author tags. |
| +1 | | 0m 0s | test4tests | The patch appears to include 1 new or modified test files. |
|| || || || trunk Compile Tests || ||
| 0 | mvndep | 1m 47s | | Maven dependency ordering for branch |
| +1 | mvninstall | 19m 33s | | trunk passed |
| +1 | compile | 9m 13s | | trunk passed with JDK Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04 |
| +1 | compile | 8m 21s | | trunk passed with JDK Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 |
| +1 | checkstyle | 1m 37s | | trunk passed |
| +1 | mvnsite | 1m 37s | | trunk passed |
| +1 | shadedclient | 17m 3s | | branch has no errors when building and testing our client artifacts. |
| +1 | javadoc | 1m 28s | | trunk passed with JDK Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04 |
| +1 | javadoc | 1m 26s | | trunk passed with JDK Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 |
| 0 | spotbugs | 22m 41s | | Both FindBugs and SpotBugs are enabled, using SpotBugs. |
| +1 | spotbugs | 2m 46s | | trunk passed |
|| || || || Patch Compile Tests || ||
| 0 | mvndep | 0m 27s | | Maven dependency ordering for patch |
| +1 | mvninstall | 1m 5s | | the patch passed |
| +1 | compile | 9m 29s | | the patch passed with JDK Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04 |
| +1 | javac | 9m 29s | | the patch passed |
| +1 | compile | 9m 48s | | the patch passed with JDK Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 |
| +1 | javac | 9m 48s | | the patch passed |
| -0 | checkstyle | 1m 55s | https://ci-hadoop.apache.org/job/PreCommit-YARN-Build/1123/artifact/out/diff-checkstyle-hadoop-yarn-project_hadoop-yarn.txt | hadoop-yarn-project/hadoop-yarn: The patch generated 2 new + 127 unchanged - 0 fixed = 129 total (was 127) |
| +1 | mvnsite | 1m 39s | | the patch passed |
| +1 | whitespace | 0m 0s | | The patch has no whitespace issues. |
| 

[jira] [Updated] (YARN-10855) yarn logs cli fails to retrieve logs if any TFile is corrupt or empty

2021-07-16 Thread Jim Brennan (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-10855?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jim Brennan updated YARN-10855:
---
Attachment: YARN-10855.002.patch

> yarn logs cli fails to retrieve logs if any TFile is corrupt or empty
> -
>
> Key: YARN-10855
> URL: https://issues.apache.org/jira/browse/YARN-10855
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn
>Affects Versions: 3.2.2, 2.10.1, 3.4.0, 3.3.1
>Reporter: Jim Brennan
>Assignee: Jim Brennan
>Priority: Major
> Attachments: YARN-10855.001.patch, YARN-10855.002.patch
>
>
> When attempting to retrieve yarn logs via the CLI command, it failed with the 
> following stack trace (on branch-2.10):
> {noformat}
> yarn logs -applicationId application_1591017890475_1049740 > logs
> 20/06/05 19:15:50 INFO client.RMProxy: Connecting to ResourceManager 
> 20/06/05 19:15:51 INFO client.AHSProxy: Connecting to Application History server
> Exception in thread "main" java.io.EOFException: Cannot seek to negative offset
>   at org.apache.hadoop.hdfs.DFSInputStream.seek(DFSInputStream.java:1701)
>   at org.apache.hadoop.fs.FSDataInputStream.seek(FSDataInputStream.java:65)
>   at org.apache.hadoop.io.file.tfile.BCFile$Reader.<init>(BCFile.java:624)
>   at org.apache.hadoop.io.file.tfile.TFile$Reader.<init>(TFile.java:804)
>   at org.apache.hadoop.yarn.logaggregation.AggregatedLogFormat$LogReader.<init>(AggregatedLogFormat.java:503)
>   at org.apache.hadoop.yarn.logaggregation.LogCLIHelpers.dumpAllContainersLogs(LogCLIHelpers.java:227)
>   at org.apache.hadoop.yarn.client.cli.LogsCLI.run(LogsCLI.java:333)
>   at org.apache.hadoop.yarn.client.cli.LogsCLI.main(LogsCLI.java:367)
> {noformat}
> The problem was that there was a zero-length TFile for one of the containers 
> in the application's aggregated log directory in HDFS.  When we removed the 
> zero-length file, {{yarn logs}} was able to retrieve the logs.
> A corrupt or zero-length TFile for one container should not prevent loading 
> logs for the rest of the application.
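
The behaviour requested above (one unreadable file should not abort the whole
retrieval) can be sketched as a per-file try/catch. This is only a minimal
illustration under stated assumptions, not the attached patch: it reads plain
local files through java.nio instead of TFiles on HDFS, and the class
SkipBadLogFiles and its methods readLog, readAll and demo are hypothetical
names that do not exist in Hadoop.

```java
import java.io.EOFException;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for the CLI loop over per-node aggregated log files.
class SkipBadLogFiles {

  // Hypothetical reader: rejects an empty file, loosely mimicking a reader
  // that fails while seeking to a footer that is not there.
  static String readLog(Path file) throws IOException {
    if (Files.size(file) == 0) {
      throw new EOFException("Empty log file: " + file);
    }
    return new String(Files.readAllBytes(file), StandardCharsets.UTF_8);
  }

  // Reads every file; a failure on one file is reported and skipped
  // instead of propagating and ending the whole loop.
  static List<String> readAll(List<Path> files) {
    List<String> logs = new ArrayList<>();
    for (Path file : files) {
      try {
        logs.add(readLog(file));
      } catch (IOException e) {
        System.err.println("Skipping unreadable log file " + file
            + ": " + e.getMessage());
      }
    }
    return logs;
  }

  // Builds one empty and one valid file in a temp directory and reads both.
  static List<String> demo() {
    try {
      Path dir = Files.createTempDirectory("agg-logs");
      Path empty = Files.createFile(dir.resolve("node1"));
      Path good = Files.write(dir.resolve("node2"),
          "container log".getBytes(StandardCharsets.UTF_8));
      return readAll(Arrays.asList(empty, good));
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  public static void main(String[] args) {
    System.out.println(demo());
  }
}
```

With the empty file listed first, the loop still returns the content of the
second file; the skipped file is only reported on stderr.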



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9551) TestTimelineClientV2Impl.testSyncCall fails intermittent

2021-07-16 Thread Andras Gyori (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9551?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17382157#comment-17382157
 ] 

Andras Gyori commented on YARN-9551:


Unfortunately, the async pushes are not guaranteed to be merged. 
The intermittent failure occurs because the third entity is put into the queue 
after the polling at TimelineV2ClientImpl#515. Since polling is a non-blocking 
operation, nextEntityInQueue is null, and therefore the second entity is 
published before it has a chance to merge with the third entity. Either we 
would need a guarantee (e.g. waiting on the queue instead of polling), or we 
simply drop this wrong assumption from the test.

I have chosen the latter, since I am reluctant to change the client behaviour 
just for the sake of a test.
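
The difference between polling and waiting on the queue can be sketched with
a plain java.util.concurrent.BlockingQueue. This is an illustration of the
race only, not the TimelineV2ClientImpl code: the class PollVersusWait, its
methods and the "entity3" payload are hypothetical.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

// Hypothetical illustration of non-blocking poll versus a bounded wait.
class PollVersusWait {

  // Non-blocking: returns null at once when nothing has been queued yet.
  static String pollNow(BlockingQueue<String> queue) {
    return queue.poll();
  }

  // Bounded wait: gives a late producer up to timeoutMs to enqueue.
  static String pollWithWait(BlockingQueue<String> queue, long timeoutMs) {
    try {
      return queue.poll(timeoutMs, TimeUnit.MILLISECONDS);
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      return null;
    }
  }

  // Starts a producer thread that enqueues "entity3" after delayMs.
  static Thread lateProducer(BlockingQueue<String> queue, long delayMs) {
    Thread t = new Thread(() -> {
      try {
        Thread.sleep(delayMs);
        queue.put("entity3");
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
      }
    });
    t.start();
    return t;
  }

  public static void main(String[] args) throws InterruptedException {
    BlockingQueue<String> queue = new LinkedBlockingQueue<>();
    Thread producer = lateProducer(queue, 100);
    // The producer is still sleeping here, so this normally sees no entity.
    System.out.println("poll():        " + pollNow(queue));
    // Waiting lets the late entity arrive before the consumer moves on.
    System.out.println("poll(timeout): " + pollWithWait(queue, 5000));
    producer.join();
  }
}
```

A consumer that only calls poll() will usually miss an entity that arrives
slightly late, which is why a test cannot assume that two consecutive async
entities are always merged.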


[jira] [Updated] (YARN-9551) TestTimelineClientV2Impl.testSyncCall fails intermittent

2021-07-16 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-9551?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated YARN-9551:
-
Labels: pull-request-available  (was: )

> TestTimelineClientV2Impl.testSyncCall fails intermittent
> 
>
> Key: YARN-9551
> URL: https://issues.apache.org/jira/browse/YARN-9551
> Project: Hadoop YARN
> Issue Type: Bug
> Components: ATSv2, test
> Affects Versions: 3.3.0
> Reporter: Prabhu Joseph
> Assignee: Prabhu Joseph
> Priority: Minor
> Labels: pull-request-available
> Time Spent: 10m
> Remaining Estimate: 0h
>
> TestTimelineClientV2Impl.testSyncCall fails intermittent
> {code:java}
> Failed
> org.apache.hadoop.yarn.client.api.impl.TestTimelineClientV2Impl.testSyncCall
> Failing for the past 1 build (Since #24083 )
> Took 1.5 sec.
> Error Message
> TimelineEntities not published as desired expected:<3> but was:<4>
> Stacktrace
> java.lang.AssertionError: TimelineEntities not published as desired expected:<3> but was:<4>
>   at org.junit.Assert.fail(Assert.java:88)
>   at org.junit.Assert.failNotEquals(Assert.java:834)
>   at org.junit.Assert.assertEquals(Assert.java:645)
>   at org.apache.hadoop.yarn.client.api.impl.TestTimelineClientV2Impl.testSyncCall(TestTimelineClientV2Impl.java:251)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
>   at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
>   at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
>   at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
>   at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
>   at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
>   at org.junit.rules.TestWatcher$1.evaluate(TestWatcher.java:55)
>   at org.junit.rules.RunRules.evaluate(RunRules.java:20)
>   at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:325)
>   at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:78)
>   at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:57)
>   at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290)
>   at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71)
>   at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288)
>   at org.junit.runners.ParentRunner.access$000(ParentRunner.java:58)
>   at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268)
>   at org.junit.runners.ParentRunner.run(ParentRunner.java:363)
>   at org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:365)
>   at org.apache.maven.surefire.junit4.JUnit4Provider.executeWithRerun(JUnit4Provider.java:273)
>   at org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:238)
>   at org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:159)
>   at org.apache.maven.surefire.booter.ForkedBooter.invokeProviderInSameClassLoader(ForkedBooter.java:384)
>   at org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:345)
>   at org.apache.maven.surefire.booter.ForkedBooter.execute(ForkedBooter.java:126)
>   at org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:418)
> Standard Output
> 2019-05-13 15:33:46,596 WARN  [main] util.NativeCodeLoader (NativeCodeLoader.java:<clinit>(60)) - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
> 2019-05-13 15:33:47,763 INFO  [main] impl.TestTimelineClientV2Impl (TestTimelineClientV2Impl.java:printReceivedEntities(413)) - Entities Published @ index 0 : 1,
> 2019-05-13 15:33:47,764 INFO  [main] impl.TestTimelineClientV2Impl (TestTimelineClientV2Impl.java:printReceivedEntities(413)) - Entities Published @ index 1 : 2,
> 2019-05-13 15:33:47,764 INFO  [main] impl.TestTimelineClientV2Impl (TestTimelineClientV2Impl.java:printReceivedEntities(413)) - Entities Published @ index 2 : 3,
> 2019-05-13 15:33:47,764 INFO  [main] impl.TestTimelineClientV2Impl (TestTimelineClientV2Impl.java:printReceivedEntities(413)) - Entities Published @ index 3 : 4,
> 2019-05-13 15:33:47,765 INFO  [main] impl.TimelineV2ClientImpl 

[jira] [Commented] (YARN-10855) yarn logs cli fails to retrieve logs if any TFile is corrupt or empty

2021-07-16 Thread Jim Brennan (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10855?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17382114#comment-17382114
 ] 

Jim Brennan commented on YARN-10855:


Thanks for the review and the suggestion [~zhuqi]!  I will update the patch.







[jira] [Commented] (YARN-10821) User limit is not calculated as per definition for preemption

2021-07-16 Thread Andras Gyori (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10821?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17381810#comment-17381810
 ] 

Andras Gyori commented on YARN-10821:
-

Thanks for the insights [~epayne]; they really helped me get my head around 
user limit calculation. I am still investigating how to easily reproduce this 
issue, and will get back to you if I succeed. In the meantime, thanks again for 
jumping in.

> User limit is not calculated as per definition for preemption
> -
>
> Key: YARN-10821
> URL: https://issues.apache.org/jira/browse/YARN-10821
> Project: Hadoop YARN
> Issue Type: Bug
> Components: capacity scheduler
> Reporter: Andras Gyori
> Assignee: Andras Gyori
> Priority: Major
> Attachments: YARN-10821.001.patch
>
>
> Minimum user limit percent (MULP) is a soft limit by definition. Preemption 
> uses pending resources to determine the resources needed by a queue, which is 
> calculated in LeafQueue#getTotalPendingResourcesConsideringUserLimit. This 
> method involves headroom calculated by UsersManager#computeUserLimit. 
> However, the pending resources for preemption are limited in an unexpected 
> fashion.
>  * In LeafQueue#getUserAMResourceLimitPerPartition an effective userLimit is 
> calculated first:
> {code:java}
> float effectiveUserLimit = Math.max(usersManager.getUserLimit() / 100.0f,
>     1.0f / Math.max(getAbstractUsersManager().getNumActiveUsers(), 1));
> {code}
>  * In UsersManager#computeUserLimit the userLimit is calculated as is 
> (currentCapacity * userLimit)
> {code:java}
> Resource userLimitResource = Resources.max(resourceCalculator,
>     partitionResource,
>     Resources.divideAndCeil(resourceCalculator, resourceUsed,
>         usersSummedByWeight),
>     Resources.divideAndCeil(resourceCalculator,
>         Resources.multiplyAndRoundDown(currentCapacity, getUserLimit()),
>         100));
> {code}
> The fewer users occupy the queue, the more pronounced this effect will be 
> in preemption.
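
To make the dependence on the number of active users concrete, the first
quoted expression can be evaluated for a few values. This is a small worked
example, not scheduler code: the class EffectiveUserLimit, the plain
float/int arguments and the 25% setting are assumptions chosen for
illustration. The configuredFraction method mirrors only the percent-based
term (currentCapacity * userLimit / 100) of the second snippet, not the whole
Resources.max expression.

```java
// Hypothetical worked example of the two expressions quoted above.
class EffectiveUserLimit {

  // Same arithmetic as the quoted LeafQueue expression, with the
  // configured percent and the active user count passed in directly.
  static float effectiveUserLimit(float userLimitPercent, int numActiveUsers) {
    return Math.max(userLimitPercent / 100.0f,
        1.0f / Math.max(numActiveUsers, 1));
  }

  // Percent-only term: it does not depend on the number of active users.
  static float configuredFraction(float userLimitPercent) {
    return userLimitPercent / 100.0f;
  }

  public static void main(String[] args) {
    float mulp = 25.0f; // assumed minimum user limit percent
    for (int users = 1; users <= 6; users++) {
      System.out.printf("activeUsers=%d effective=%.3f configured=%.3f%n",
          users, effectiveUserLimit(mulp, users), configuredFraction(mulp));
    }
  }
}
```

With a 25% setting, the effective value is 1.0 for one active user and 0.5
for two, and only falls to the configured 0.25 from four users onward, while
the percent-only term stays at 0.25 throughout. The gap between the two is
largest when few users are active.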


