[jira] [Commented] (YARN-10450) Add cpu and memory utilization per node and cluster-wide metrics

2020-10-12 Thread Jonathan Hung (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10450?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17212676#comment-17212676
 ] 

Jonathan Hung commented on YARN-10450:
--

[~Jim_Brennan], Physical Mem Used % makes sense to me. We also refer to this as 
"Memory Efficiency" internally. 

> Add cpu and memory utilization per node and cluster-wide metrics
> 
>
> Key: YARN-10450
> URL: https://issues.apache.org/jira/browse/YARN-10450
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: yarn
>Affects Versions: 3.3.1
>Reporter: Jim Brennan
>Assignee: Jim Brennan
>Priority: Minor
> Attachments: NodesPage.png, YARN-10450.001.patch, YARN-10450.002.patch
>
>
> Add metrics to show actual cpu and memory utilization for each node and 
> aggregated for the entire cluster. This information is already passed 
> from the NM to the RM in the node status update.
> We have been running with this internally for quite a while and found it 
> useful to be able to quickly see the actual cpu/memory utilization on the 
> node/cluster.  It's especially useful if some form of overcommit is used.






[jira] [Commented] (YARN-10450) Add cpu and memory utilization per node and cluster-wide metrics

2020-10-12 Thread Jim Brennan (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10450?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17212662#comment-17212662
 ] 

Jim Brennan commented on YARN-10450:


Thanks for the review and comments [~ebadger]!  I agree the names could be 
clearer.  I'm not sure we should change *Mem Used*, though; even if a new name 
could be more accurate, it has been called that for a long time.

I'm definitely open to changing the name of *Mem Utilization %*, which in the 
Cluster Metrics is the actual memory utilization percentage across all nodes in 
the cluster, and in the Node Metrics is the actual memory utilization 
percentage for that node. Maybe it should be something like *Physical Mem Used 
%* / *Physical VCores Used %*?


 [~epayne], [~jhung]  what do you think?

> Add cpu and memory utilization per node and cluster-wide metrics
> 
>
> Key: YARN-10450
> URL: https://issues.apache.org/jira/browse/YARN-10450
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: yarn
>Affects Versions: 3.3.1
>Reporter: Jim Brennan
>Assignee: Jim Brennan
>Priority: Minor
> Attachments: NodesPage.png, YARN-10450.001.patch, YARN-10450.002.patch
>
>
> Add metrics to show actual cpu and memory utilization for each node and 
> aggregated for the entire cluster. This information is already passed 
> from the NM to the RM in the node status update.
> We have been running with this internally for quite a while and found it 
> useful to be able to quickly see the actual cpu/memory utilization on the 
> node/cluster.  It's especially useful if some form of overcommit is used.






[jira] [Commented] (YARN-10422) Create the script responsible for collecting the bundle data

2020-10-12 Thread Hadoop QA (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10422?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17212650#comment-17212650
 ] 

Hadoop QA commented on YARN-10422:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime ||  Logfile || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  1m 
15s{color} |  | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} || ||
| {color:green}+1{color} | {color:green} dupname {color} | {color:green}  0m  
0s{color} |  | {color:green} No case conflicting files found. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} |  | {color:green} The patch does not contain any @author tags. 
{color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red}  0m  
0s{color} |  | {color:red} The patch doesn't appear to include any new or 
modified tests. Please justify why no new tests are needed for this patch. Also 
please list what manual steps were performed to verify this patch. {color} |
|| || || || {color:brown} trunk Compile Tests {color} || ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 22m 
 6s{color} |  | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m  
1s{color} |  | {color:green} trunk passed with JDK 
Ubuntu-11.0.8+10-post-Ubuntu-0ubuntu118.04.1 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
51s{color} |  | {color:green} trunk passed with JDK Private 
Build-1.8.0_265-8u265-b01-0ubuntu2~18.04-b01 {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
54s{color} |  | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
15m 32s{color} |  | {color:green} branch has no errors when building and 
testing our client artifacts. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
39s{color} |  | {color:green} trunk passed with JDK 
Ubuntu-11.0.8+10-post-Ubuntu-0ubuntu118.04.1 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
35s{color} |  | {color:green} trunk passed with JDK Private 
Build-1.8.0_265-8u265-b01-0ubuntu2~18.04-b01 {color} |
|| || || || {color:brown} Patch Compile Tests {color} || ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
52s{color} |  | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
52s{color} |  | {color:green} the patch passed with JDK 
Ubuntu-11.0.8+10-post-Ubuntu-0ubuntu118.04.1 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
52s{color} |  | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
43s{color} |  | {color:green} the patch passed with JDK Private 
Build-1.8.0_265-8u265-b01-0ubuntu2~18.04-b01 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
43s{color} |  | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} blanks {color} | {color:green}  0m  
0s{color} |  | {color:green} The patch has no blanks issues. {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
50s{color} |  | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} pylint {color} | {color:orange}  0m  
4s{color} | 
[/diff-pylint.txt|https://ci-hadoop.apache.org/job/PreCommit-YARN-Build/229/artifact/out/diff-pylint.txt]
 | {color:orange} The patch generated 42 new + 0 unchanged - 0 fixed = 42 total 
(was 0) {color} |
| {color:red}-1{color} | {color:red} shadedclient {color} | {color:red} 15m 
44s{color} | 
[/patch-shadedclient.txt|https://ci-hadoop.apache.org/job/PreCommit-YARN-Build/229/artifact/out/patch-shadedclient.txt]
 | {color:red} patch has errors when building and testing our client artifacts. 
{color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
36s{color} |  | {color:green} the patch passed with JDK 
Ubuntu-11.0.8+10-post-Ubuntu-0ubuntu118.04.1 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
31s{color} |  | {color:green} the patch passed with JDK Private 
Build-1.8.0_265-8u265-b01-0ubuntu2~18.04-b01 {color} |
|| || || || {color:brown} Other Tests {color} || ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 95m  2s{color} 
| 
[/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager.txt|https://ci-hadoop.apache.org/job/PreCommit-YARN-Build/229/artifact/out/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager.txt]
 | {color:red} 

[jira] [Commented] (YARN-10450) Add cpu and memory utilization per node and cluster-wide metrics

2020-10-12 Thread Eric Badger (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10450?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17212632#comment-17212632
 ] 

Eric Badger commented on YARN-10450:


The patch itself looks good to me. However, I'm wondering whether "Mem 
Utilization" is the right phrase to convey what we mean. To me it suggests "Mem 
Used" / "Mem Avail", but in this case it is the actual utilization of the node. 
And "Mem Used" isn't really the memory that is actually being used; it's the 
memory that YARN has allocated on that node.

[~Jim_Brennan], [~epayne], do you have any thoughts on making this terminology 
a little clearer in the UI?
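
For illustration, here is a rough sketch of the two different ratios being 
discussed. The struct and field names are hypothetical, not YARN's actual 
metrics code; the point is only the arithmetic behind "allocated" versus 
"actually used" memory:

{code:c}
#include <stdio.h>

/* Hypothetical per-node numbers, for illustration only. */
struct node_stats {
    long mem_allocated_mb;  /* memory handed out to containers by YARN     */
    long mem_used_mb;       /* physical memory actually in use on the host */
    long mem_total_mb;      /* physical memory capacity of the host        */
};

int main(void) {
    struct node_stats node = { .mem_allocated_mb = 180 * 1024,
                               .mem_used_mb      =  90 * 1024,
                               .mem_total_mb     = 256 * 1024 };

    /* "Mem Used" today: what YARN has allocated, relative to capacity. */
    double allocated_pct = 100.0 * node.mem_allocated_mb / node.mem_total_mb;

    /* Proposed "Physical Mem Used %": actual utilization relative to
       capacity, as reported by the NM in its node status update. */
    double physical_pct = 100.0 * node.mem_used_mb / node.mem_total_mb;

    printf("allocated: %.1f%%  physical: %.1f%%\n", allocated_pct, physical_pct);
    return 0;
}
{code}

With the made-up numbers above, the two percentages differ by roughly a factor 
of two, which is exactly the kind of gap the new metric is meant to surface.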

> Add cpu and memory utilization per node and cluster-wide metrics
> 
>
> Key: YARN-10450
> URL: https://issues.apache.org/jira/browse/YARN-10450
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: yarn
>Affects Versions: 3.3.1
>Reporter: Jim Brennan
>Assignee: Jim Brennan
>Priority: Minor
> Attachments: NodesPage.png, YARN-10450.001.patch, YARN-10450.002.patch
>
>
> Add metrics to show actual cpu and memory utilization for each node and 
> aggregated for the entire cluster. This information is already passed 
> from the NM to the RM in the node status update.
> We have been running with this internally for quite a while and found it 
> useful to be able to quickly see the actual cpu/memory utilization on the 
> node/cluster.  It's especially useful if some form of overcommit is used.






[jira] [Commented] (YARN-10450) Add cpu and memory utilization per node and cluster-wide metrics

2020-10-12 Thread Eric Badger (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10450?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17212557#comment-17212557
 ] 

Eric Badger commented on YARN-10450:


I'll review it

> Add cpu and memory utilization per node and cluster-wide metrics
> 
>
> Key: YARN-10450
> URL: https://issues.apache.org/jira/browse/YARN-10450
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: yarn
>Affects Versions: 3.3.1
>Reporter: Jim Brennan
>Assignee: Jim Brennan
>Priority: Minor
> Attachments: NodesPage.png, YARN-10450.001.patch, YARN-10450.002.patch
>
>
> Add metrics to show actual cpu and memory utilization for each node and 
> aggregated for the entire cluster. This information is already passed 
> from the NM to the RM in the node status update.
> We have been running with this internally for quite a while and found it 
> useful to be able to quickly see the actual cpu/memory utilization on the 
> node/cluster.  It's especially useful if some form of overcommit is used.






[jira] [Commented] (YARN-10450) Add cpu and memory utilization per node and cluster-wide metrics

2020-10-12 Thread Jim Brennan (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10450?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17212547#comment-17212547
 ] 

Jim Brennan commented on YARN-10450:


Anyone else available to review? [~jhung], [~ebadger] ?

> Add cpu and memory utilization per node and cluster-wide metrics
> 
>
> Key: YARN-10450
> URL: https://issues.apache.org/jira/browse/YARN-10450
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: yarn
>Affects Versions: 3.3.1
>Reporter: Jim Brennan
>Assignee: Jim Brennan
>Priority: Minor
> Attachments: NodesPage.png, YARN-10450.001.patch, YARN-10450.002.patch
>
>
> Add metrics to show actual cpu and memory utilization for each node and 
> aggregated for the entire cluster. This information is already passed 
> from the NM to the RM in the node status update.
> We have been running with this internally for quite a while and found it 
> useful to be able to quickly see the actual cpu/memory utilization on the 
> node/cluster.  It's especially useful if some form of overcommit is used.






[jira] [Updated] (YARN-10422) Create the script responsible for collecting the bundle data

2020-10-12 Thread Benjamin Teke (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-10422?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benjamin Teke updated YARN-10422:
-
Attachment: YARN-10422.POC.003.patch

> Create the script responsible for collecting the bundle data
> 
>
> Key: YARN-10422
> URL: https://issues.apache.org/jira/browse/YARN-10422
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Benjamin Teke
>Assignee: Benjamin Teke
>Priority: Major
> Attachments: YARN-10422.POC.001.patch, YARN-10422.POC.002.patch, 
> YARN-10422.POC.003.patch
>
>
> The script should provide the list of diagnostic use-cases described in 
> YARN-10421. If a request comes in to the YarnDiagnosticCollector servlet, the 
> script will be invoked. It collects all the information required for that 
> diagnostic category and saves it into a configurable directory as a 
> compressed tar file. 
> An example of what the script could look like:
> {code:bash}
> if [ "$1" = "listcommonissues" ]; then
>   echo "1, Application Failed"
>   echo "2, Application Hanging"
>   echo "3, Scheduler Related Issue"
>   echo "4, RM failure to start"
>   echo "5, NM failure to start"
> elif [ "$1" = "collect" ]; then
>   if [ "$2" -eq 1 ]; then
>     appId=$3
>     mkdir /tmp/$appId
>     yarn logs -applicationId $appId > /tmp/$appId/joblogs
>     curl /{appId}/conf > /tmp/$appId/conf
>     curl /logs | grep container > /tmp/$appId/rmlogs
>     curl /logs | grep container > /tmp/$appId/nmlogs
>     outputpath=/tmp/$appId
>   elif ...
>   elif ...
>   fi
> fi
> # tar and compress outputpath
> {code}
>  
> During class load, YarnDiagnosticsCollector reads the list of common issues 
> from the script and keeps it in memory. Whenever the YARN UI2 diagnostics 
> page starts up, it fetches the list from the servlet and displays it. The 
> servlet should handle script changes, so if a new diagnostic case is added, a 
> YARN UI2 reload should show it. This way users can easily plug in new 
> categories without any UI2 or servlet code changes.






[jira] [Commented] (YARN-9667) Container-executor.c duplicates messages to stdout

2020-10-12 Thread Eric Badger (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17212524#comment-17212524
 ] 

Eric Badger commented on YARN-9667:
---

Thanks, [~Jim_Brennan]!

> Container-executor.c duplicates messages to stdout
> --
>
> Key: YARN-9667
> URL: https://issues.apache.org/jira/browse/YARN-9667
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: nodemanager, yarn
>Affects Versions: 3.2.0
>Reporter: Adam Antal
>Assignee: Peter Bacsko
>Priority: Major
> Fix For: 3.3.0, 3.2.2, 3.1.5, 2.10.2
>
> Attachments: YARN-9667-001.patch, YARN-9667-branch-2.10.001.patch, 
> YARN-9667-branch-3.2.001.patch
>
>
> When a container is killed by its AM, we get an error message similar to this:
> {noformat}
> 2019-06-30 12:09:04,412 WARN 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.privileged.PrivilegedOperationExecutor:
>  Shell execution returned exit code: 143. Privileged Execution Operation 
> Stderr:
> Stdout: main : command provided 1
> main : run as user is systest
> main : requested yarn user is systest
> Getting exit code file...
> Creating script paths...
> Writing pid file...
> Writing to tmp file 
> /yarn/nm/nmPrivate/application_1561921629886_0001/container_e84_1561921629886_0001_01_19/container_e84_1561921629886_0001_01_19.pid.tmp
> Writing to cgroup task files...
> Creating local dirs...
> Launching container...
> Getting exit code file...
> Creating script paths...
> {noformat}
> In container-executor.c the fork point is right after the "Creating script 
> paths..." part, yet in the stdout log we can clearly see that line has been 
> written twice. After consulting with [~pbacsko], it seems there is a missing 
> flush in container-executor.c before the fork, and that causes the 
> duplication.
> I suggest adding a flush there so that the output won't be duplicated: it's a 
> bit misleading that the child process writes out "Getting exit code file" and 
> "Creating script paths" even though it is clearly not doing that.
> A more appealing solution would be to revisit the fprintf-fflush pairs in the 
> code and change them into a single call, so that the fflush calls cannot be 
> forgotten accidentally. (This can cause problems everywhere it's used.)
> Note: this issue probably affects every occurrence of fork(), not just the 
> one from {{launch_container_as_user}} in {{main.c}}.
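
To illustrate the buffering behavior described above, here is a minimal, 
standalone C sketch (not container-executor.c itself) of how a missing 
fflush before fork() duplicates buffered output, together with a combined 
print-and-flush helper along the lines of the suggestion in the description:

{code:c}
#include <stdio.h>
#include <stdarg.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/wait.h>

/* Combined print-and-flush helper, so a flush can never be forgotten. */
static void print_and_flush(const char *fmt, ...) {
    va_list ap;
    va_start(ap, fmt);
    vfprintf(stdout, fmt, ap);
    va_end(ap);
    fflush(stdout);
}

int main(void) {
    /* When stdout is redirected to a file or pipe it is fully buffered,
       so this line sits in the parent's stdio buffer for now. */
    fprintf(stdout, "Creating script paths...\n");

    /* Without this flush, the child inherits a copy of that buffer and
       the line above ends up being written out twice. */
    /* fflush(stdout); */

    pid_t pid = fork();
    if (pid == 0) {
        /* Child: the helper flushes immediately, which also drains the
           inherited (duplicated) buffer contents. */
        print_and_flush("Launching container...\n");
        _exit(0);
    } else if (pid > 0) {
        waitpid(pid, NULL, 0);
        print_and_flush("Getting exit code file...\n");
    }
    return 0;
}
{code}

Run it with stdout redirected to a file to reproduce the duplicated "Creating 
script paths..." line; un-comment the fflush, or route every message through 
the helper, and it appears only once.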






[jira] [Commented] (YARN-10448) SLS should set default user to handle SYNTH format

2020-10-12 Thread Hadoop QA (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10448?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17212476#comment-17212476
 ] 

Hadoop QA commented on YARN-10448:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime ||  Logfile || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  1m 
12s{color} |  | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} || ||
| {color:green}+1{color} | {color:green} dupname {color} | {color:green}  0m  
0s{color} |  | {color:green} No case conflicting files found. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} |  | {color:green} The patch does not contain any @author tags. 
{color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red}  0m  
0s{color} |  | {color:red} The patch doesn't appear to include any new or 
modified tests. Please justify why no new tests are needed for this patch. Also 
please list what manual steps were performed to verify this patch. {color} |
|| || || || {color:brown} trunk Compile Tests {color} || ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 22m 
23s{color} |  | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
25s{color} |  | {color:green} trunk passed with JDK 
Ubuntu-11.0.8+10-post-Ubuntu-0ubuntu118.04.1 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
23s{color} |  | {color:green} trunk passed with JDK Private 
Build-1.8.0_265-8u265-b01-0ubuntu2~18.04-b01 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
17s{color} |  | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
26s{color} |  | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
16m 22s{color} |  | {color:green} branch has no errors when building and 
testing our client artifacts. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
24s{color} |  | {color:green} trunk passed with JDK 
Ubuntu-11.0.8+10-post-Ubuntu-0ubuntu118.04.1 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
22s{color} |  | {color:green} trunk passed with JDK Private 
Build-1.8.0_265-8u265-b01-0ubuntu2~18.04-b01 {color} |
| {color:blue}0{color} | {color:blue} spotbugs {color} | {color:blue}  0m 
47s{color} |  | {color:blue} Used deprecated FindBugs config; considering 
switching to SpotBugs. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  0m 
44s{color} |  | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} || ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
25s{color} |  | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
22s{color} |  | {color:green} the patch passed with JDK 
Ubuntu-11.0.8+10-post-Ubuntu-0ubuntu118.04.1 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
22s{color} |  | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
18s{color} |  | {color:green} the patch passed with JDK Private 
Build-1.8.0_265-8u265-b01-0ubuntu2~18.04-b01 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
18s{color} |  | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} blanks {color} | {color:green}  0m  
0s{color} |  | {color:green} The patch has no blanks issues. {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
13s{color} |  | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
21s{color} |  | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
16m 38s{color} |  | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
20s{color} |  | {color:green} the patch passed with JDK 
Ubuntu-11.0.8+10-post-Ubuntu-0ubuntu118.04.1 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
20s{color} |  | {color:green} the patch passed with JDK Private 
Build-1.8.0_265-8u265-b01-0ubuntu2~18.04-b01 {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  0m 
49s{color} |  | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} || ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 11m 54s{color} 
| 

[jira] [Updated] (YARN-9667) Container-executor.c duplicates messages to stdout

2020-10-12 Thread Jim Brennan (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-9667?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jim Brennan updated YARN-9667:
--
Fix Version/s: 2.10.2

> Container-executor.c duplicates messages to stdout
> --
>
> Key: YARN-9667
> URL: https://issues.apache.org/jira/browse/YARN-9667
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: nodemanager, yarn
>Affects Versions: 3.2.0
>Reporter: Adam Antal
>Assignee: Peter Bacsko
>Priority: Major
> Fix For: 3.3.0, 3.2.2, 3.1.5, 2.10.2
>
> Attachments: YARN-9667-001.patch, YARN-9667-branch-2.10.001.patch, 
> YARN-9667-branch-3.2.001.patch
>
>
> When a container is killed by its AM, we get an error message similar to this:
> {noformat}
> 2019-06-30 12:09:04,412 WARN 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.privileged.PrivilegedOperationExecutor:
>  Shell execution returned exit code: 143. Privileged Execution Operation 
> Stderr:
> Stdout: main : command provided 1
> main : run as user is systest
> main : requested yarn user is systest
> Getting exit code file...
> Creating script paths...
> Writing pid file...
> Writing to tmp file 
> /yarn/nm/nmPrivate/application_1561921629886_0001/container_e84_1561921629886_0001_01_19/container_e84_1561921629886_0001_01_19.pid.tmp
> Writing to cgroup task files...
> Creating local dirs...
> Launching container...
> Getting exit code file...
> Creating script paths...
> {noformat}
> In container-executor.c the fork point is right after the "Creating script 
> paths..." part, yet in the stdout log we can clearly see that line has been 
> written twice. After consulting with [~pbacsko], it seems there is a missing 
> flush in container-executor.c before the fork, and that causes the 
> duplication.
> I suggest adding a flush there so that the output won't be duplicated: it's a 
> bit misleading that the child process writes out "Getting exit code file" and 
> "Creating script paths" even though it is clearly not doing that.
> A more appealing solution would be to revisit the fprintf-fflush pairs in the 
> code and change them into a single call, so that the fflush calls cannot be 
> forgotten accidentally. (This can cause problems everywhere it's used.)
> Note: this issue probably affects every occurrence of fork(), not just the 
> one from {{launch_container_as_user}} in {{main.c}}.






[jira] [Commented] (YARN-10448) SLS should set default user to handle SYNTH format

2020-10-12 Thread zhuqi (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10448?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17212437#comment-17212437
 ] 

zhuqi commented on YARN-10448:
--

CC [~adam.antal] 

Thanks for your patient review and commit.

The unit test failures are not related to this patch, and I have fixed the 
checkstyle warning in the new patch.

> SLS should set default user to handle SYNTH format
> --
>
> Key: YARN-10448
> URL: https://issues.apache.org/jira/browse/YARN-10448
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: scheduler-load-simulator
>Affects Versions: 3.2.1, 3.4.0
>Reporter: zhuqi
>Assignee: zhuqi
>Priority: Major
> Attachments: YARN-10448.001.patch, YARN-10448.002.patch, 
> YARN-10448.003.patch, YARN-10448.004.patch, 
> image-2020-10-11-22-01-37-227.png, image-2020-10-11-22-02-17-166.png
>
>
> When using the synthetic generator json file example from the doc ( 
> https://hadoop.apache.org/docs/current/hadoop-sls/SchedulerLoadSimulator.html#SYNTH_JSON_input_file_format
>  ), it throws the following exception:
> {noformat}
> java.lang.IllegalArgumentException: Null user
> at 
> org.apache.hadoop.security.UserGroupInformation.createRemoteUser(UserGroupInformation.java:1269)
> at 
> org.apache.hadoop.security.UserGroupInformation.createRemoteUser(UserGroupInformation.java:1256)
> at 
> org.apache.hadoop.yarn.sls.appmaster.AMSimulator.submitReservationWhenSpecified(AMSimulator.java:191)
> at 
> org.apache.hadoop.yarn.sls.appmaster.AMSimulator.firstStep(AMSimulator.java:161)
> at 
> org.apache.hadoop.yarn.sls.scheduler.TaskRunner$Task.run(TaskRunner.java:88)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> {noformat}
> So the solution is either:
> 1) to make {{user_name}} a mandatory field, or
> 2) to set a default user in the SLS code if the JSON file does not define it.
> IMO, solution 2 might be better, because in most cases (if not all) 
> {{user_name}} has no impact on scheduler performance, so it is reasonable 
> to make it an optional field, which is also consistent with the {{job.user}} 
> field in the SLS JSON file.






[jira] [Commented] (YARN-10448) SLS should set default user to handle SYNTH format

2020-10-12 Thread Adam Antal (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10448?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17212423#comment-17212423
 ] 

Adam Antal commented on YARN-10448:
---

Thanks for the patch [~zhuqi], looks good to me.

Could you please double-check whether the unit test failures are related? 
Also, there's one remaining checkstyle warning. I can commit this once you take 
care of that.

> SLS should set default user to handle SYNTH format
> --
>
> Key: YARN-10448
> URL: https://issues.apache.org/jira/browse/YARN-10448
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: scheduler-load-simulator
>Affects Versions: 3.2.1, 3.4.0
>Reporter: zhuqi
>Assignee: zhuqi
>Priority: Major
> Attachments: YARN-10448.001.patch, YARN-10448.002.patch, 
> YARN-10448.003.patch, image-2020-10-11-22-01-37-227.png, 
> image-2020-10-11-22-02-17-166.png
>
>
> When using the synthetic generator json file example from the doc ( 
> https://hadoop.apache.org/docs/current/hadoop-sls/SchedulerLoadSimulator.html#SYNTH_JSON_input_file_format
>  ), it throws the following exception:
> {noformat}
> java.lang.IllegalArgumentException: Null user
> at 
> org.apache.hadoop.security.UserGroupInformation.createRemoteUser(UserGroupInformation.java:1269)
> at 
> org.apache.hadoop.security.UserGroupInformation.createRemoteUser(UserGroupInformation.java:1256)
> at 
> org.apache.hadoop.yarn.sls.appmaster.AMSimulator.submitReservationWhenSpecified(AMSimulator.java:191)
> at 
> org.apache.hadoop.yarn.sls.appmaster.AMSimulator.firstStep(AMSimulator.java:161)
> at 
> org.apache.hadoop.yarn.sls.scheduler.TaskRunner$Task.run(TaskRunner.java:88)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> {noformat}
> So the solution is either:
> 1) to make {{user_name}} a mandatory field, or
> 2) to set a default user in the SLS code if the JSON file does not define it.
> IMO, solution 2 might be better, because in most cases (if not all) 
> {{user_name}} has no impact on scheduler performance, so it is reasonable 
> to make it an optional field, which is also consistent with the {{job.user}} 
> field in the SLS JSON file.






[jira] [Commented] (YARN-10420) Update CS MappingRule documentation with the new format and features

2020-10-12 Thread Adam Antal (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10420?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17212411#comment-17212411
 ] 

Adam Antal commented on YARN-10420:
---

Thanks for the patch [~pbacsko]. I'll add my replies inline.

1. Ok, let's not touch it then.
2. Can we check what happens, and document it as well? I think users would 
also be interested in that.
3. Can we also add this to the document?
4,5,6. Ok, got it, thanks.

bq. "If the target queue doesn't exist or and it cannot be created..." - you 
propose "and" but that would mean that we always try to create a non-existing 
queue, which is not the case in CS. Under regular parents, queues cannot be 
created dynamically and CS doesn't even try. Therefore "or" is more appropriate 
here.
Thanks for the clarification. I suggest adding this to the doc, because I 
didn't know that "cannot be created" referred to the case you illustrated. 
Something like "If the target queue doesn't exist or cannot be created (e.g. 
under regular parents) ..."

For all the other points, I'm fine.

> Update CS MappingRule documentation with the new format and features
> 
>
> Key: YARN-10420
> URL: https://issues.apache.org/jira/browse/YARN-10420
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Gergely Pollak
>Assignee: Peter Bacsko
>Priority: Major
> Attachments: YARN-10420-001.patch, YARN-10420-002.patch, 
> YARN-10420-003.patch, YARN-10420-004.patch, YARN-10420-005.patch
>
>
> Update the upstream documentation with the new changes.






[jira] [Assigned] (YARN-10457) Add a configuration switch to change between legacy and JSON placement rule format.

2020-10-12 Thread Gergely Pollak (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-10457?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gergely Pollak reassigned YARN-10457:
-

Assignee: Gergely Pollak

> Add a configuration switch to change between legacy and JSON placement rule 
> format.
> ---
>
> Key: YARN-10457
> URL: https://issues.apache.org/jira/browse/YARN-10457
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Gergely Pollak
>Assignee: Gergely Pollak
>Priority: Major
>







[jira] [Created] (YARN-10457) Add a configuration switch to change between legacy and JSON placement rule format.

2020-10-12 Thread Gergely Pollak (Jira)
Gergely Pollak created YARN-10457:
-

 Summary: Add a configuration switch to change between legacy and 
JSON placement rule format.
 Key: YARN-10457
 URL: https://issues.apache.org/jira/browse/YARN-10457
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Gergely Pollak









[jira] [Updated] (YARN-10431) [Umbrella] Job group management

2020-10-12 Thread jialei weng (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-10431?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

jialei weng updated YARN-10431:
---
Attachment: YarnJobGroupImpl design.pdf

> [Umbrella] Job group management
> ---
>
> Key: YARN-10431
> URL: https://issues.apache.org/jira/browse/YARN-10431
> Project: Hadoop YARN
>  Issue Type: New Feature
>Affects Versions: 2.9.2
>Reporter: jialei weng
>Priority: Major
> Attachments: YarnJobGroupImpl design.pdf
>
>
> In the current YARN job management, we don't have an efficient mechanism to 
> manage several jobs together. For example, one batch job may trigger several 
> sub-jobs running at the same time, such as one job to process the data and 
> another to monitor job metrics. And when we want to cancel these jobs, we 
> have to kill them one by one in the current design. I propose a job group 
> concept to handle such parent-child jobs as one unit.






[jira] [Updated] (YARN-10431) [Umbrella] Job group management

2020-10-12 Thread jialei weng (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-10431?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

jialei weng updated YARN-10431:
---
Attachment: YarnJobObjectImpl Design.pdf

> [Umbrella] Job group management
> ---
>
> Key: YARN-10431
> URL: https://issues.apache.org/jira/browse/YARN-10431
> Project: Hadoop YARN
>  Issue Type: New Feature
>Affects Versions: 2.9.2
>Reporter: jialei weng
>Priority: Major
>
> In the current YARN job management, we don't have an efficient mechanism to 
> manage several jobs together. For example, one batch job may trigger several 
> sub-jobs running at the same time, such as one job to process the data and 
> another to monitor job metrics. And when we want to cancel these jobs, we 
> have to kill them one by one in the current design. I propose a job group 
> concept to handle such parent-child jobs as one unit.






[jira] [Updated] (YARN-10431) [Umbrella] Job group management

2020-10-12 Thread jialei weng (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-10431?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

jialei weng updated YARN-10431:
---
Attachment: (was: YarnJobObjectImpl Design.pdf)

> [Umbrella] Job group management
> ---
>
> Key: YARN-10431
> URL: https://issues.apache.org/jira/browse/YARN-10431
> Project: Hadoop YARN
>  Issue Type: New Feature
>Affects Versions: 2.9.2
>Reporter: jialei weng
>Priority: Major
>
> In the current YARN job management, we don't have an efficient mechanism to 
> manage several jobs together. For example, one batch job may trigger several 
> sub-jobs running at the same time, such as one job to process the data and 
> another to monitor job metrics. And when we want to cancel these jobs, we 
> have to kill them one by one in the current design. I propose a job group 
> concept to handle such parent-child jobs as one unit.






[jira] [Commented] (YARN-10422) Create the script responsible for collecting the bundle data

2020-10-12 Thread Hadoop QA (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10422?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17212248#comment-17212248
 ] 

Hadoop QA commented on YARN-10422:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime ||  Logfile || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m  
0s{color} |  | {color:blue} Docker mode activated. {color} |
| {color:red}-1{color} | {color:red} patch {color} | {color:red}  0m  9s{color} 
|  | {color:red} YARN-10422 does not apply to trunk. Rebase required? Wrong 
Branch? See https://wiki.apache.org/hadoop/HowToContribute for help. {color} |
\\
\\
|| Subsystem || Report/Notes ||
| JIRA Issue | YARN-10422 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/13013317/YARN-10422.POC.002.patch
 |
| Console output | 
https://ci-hadoop.apache.org/job/PreCommit-YARN-Build/227/console |
| versions | git=2.17.1 |
| Powered by | Apache Yetus 0.13.0-SNAPSHOT https://yetus.apache.org |


This message was automatically generated.



> Create the script responsible for collecting the bundle data
> 
>
> Key: YARN-10422
> URL: https://issues.apache.org/jira/browse/YARN-10422
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Benjamin Teke
>Assignee: Benjamin Teke
>Priority: Major
> Attachments: YARN-10422.POC.001.patch, YARN-10422.POC.002.patch
>
>
> The script should provide the list of diagnostic use-cases described in 
> YARN-10421. If a request comes in to the YarnDiagnosticCollector servlet, the 
> script will be invoked. It collects all the information required for that 
> diagnostic category and saves it into a configurable directory as a 
> compressed tar file. 
> An example of what the script could look like:
> {code:bash}
> if [ "$1" = "listcommonissues" ]; then
>   echo "1, Application Failed"
>   echo "2, Application Hanging"
>   echo "3, Scheduler Related Issue"
>   echo "4, RM failure to start"
>   echo "5, NM failure to start"
> elif [ "$1" = "collect" ]; then
>   if [ "$2" -eq 1 ]; then
>     appId=$3
>     mkdir /tmp/$appId
>     yarn logs -applicationId $appId > /tmp/$appId/joblogs
>     curl /{appId}/conf > /tmp/$appId/conf
>     curl /logs | grep container > /tmp/$appId/rmlogs
>     curl /logs | grep container > /tmp/$appId/nmlogs
>     outputpath=/tmp/$appId
>   elif ...
>   elif ...
>   fi
> fi
> # tar and compress outputpath
> {code}
>  
> During class load, YarnDiagnosticsCollector reads the list of common issues 
> from the script and keeps it in memory. Whenever the YARN UI2 diagnostics 
> page starts up, it fetches the list from the servlet and displays it. The 
> servlet should handle script changes, so if a new diagnostic case is added, a 
> YARN UI2 reload should show it. This way users can easily plug in new 
> categories without any UI2 or servlet code changes.


