[jira] [Commented] (YARN-10427) Duplicate Job IDs in SLS output

2021-12-16 Thread Gergely Pollák (Jira)


[ https://issues.apache.org/jira/browse/YARN-10427?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17461121#comment-17461121 ]

Gergely Pollák commented on YARN-10427:
---

[~snemeth] thank you for the patch. It seems fine to me: LGTM, +1, merged to 
trunk.

> Duplicate Job IDs in SLS output
> ---
>
> Key: YARN-10427
> URL: https://issues.apache.org/jira/browse/YARN-10427
> Project: Hadoop YARN
> Issue Type: Bug
> Components: scheduler-load-simulator
> Affects Versions: 3.0.0, 3.3.0, 3.2.1, 3.4.0
> Environment: I ran the attached inputs on my MacBook Pro, using 
> Hadoop compiled from the latest trunk (as of commit 139a43e98e). I also 
> tested against 3.2.1 and 3.3.0 release branches.
> Reporter: Drew Merrill
> Assignee: Szilard Nemeth
> Priority: Major
> Labels: pull-request-available
> Attachments: YARN-10427-sls-scriptsandlogs.tar.gz, 
> YARN-10427.001.patch, YARN-10427.002.patch, YARN-10427.003.patch, 
> YARN-10427.004.patch, fair-scheduler.xml, inputsls.json, jobruntime.csv, 
> jobruntime.csv, mapred-site.xml, sls-runner.xml, yarn-site.xml
>
> Time Spent: 40m
> Remaining Estimate: 0h
>
> Hello, I'm hoping someone can help me resolve or understand some issues I've 
> been having with the YARN Scheduler Load Simulator (SLS). I've been 
> experimenting with SLS for several months now at work as we're trying to 
> build a simulation model to characterize our enterprise Hadoop infrastructure 
> for purposes of future capacity planning. In the process of attempting to 
> verify and validate the SLS output, I've encountered a number of issues 
> including runtime exceptions and bad output. The focus of this issue is the 
> bad output. In all my simulation runs, the jobruntime.csv output seems to 
> have one or more of the following problems: no output, duplicate job ids, 
> and/or missing job ids.
>  
> Because of where I work, I'm unable to provide the exact inputs I typically 
> use, but I'm able to reproduce the problem of the duplicate Job IDs using 
> some simplified inputs and configuration files, which I've attached, along 
> with the output I obtained.
>  
> The command I used to run the simulation:
> {{./runsls.sh --tracetype=SLS --tracelocation=./inputsls.json 
> --output-dir=sls-run-1 --print-simulation 
> --track-jobs=job_1,job_2,job_3,job_4,job_5,job_6,job_7,job_8,job_9,job_10}}
>  
> Can anyone help me understand what would cause the duplicate Job IDs in the 
> output? Is this a bug in Hadoop or a problem with my inputs? Thanks in 
> advance.
>  
> PS: This is the first issue I've ever opened so please be kind if I've missed 
> something or am not understanding something obvious about the way Hadoop 
> works. I'll gladly follow-up with more info as requested.
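
A quick way to confirm the duplicate job IDs described above is to count how often each ID occurs in jobruntime.csv. The following is a minimal Java sketch, not part of SLS: the class name DuplicateJobIdFinder and the method findDuplicates are hypothetical, and it assumes that the first line of the file is a header and that the job ID is the first comma-separated column.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper for inspecting SLS output; not part of Hadoop.
final class DuplicateJobIdFinder {

    /**
     * Returns every job ID that occurs more than once, mapped to its
     * number of occurrences, in first-seen order.
     * Assumption: line 0 is a header and the job ID is the first column.
     */
    static Map<String, Integer> findDuplicates(List<String> csvLines) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (int i = 1; i < csvLines.size(); i++) {
            String line = csvLines.get(i).trim();
            if (line.isEmpty()) {
                continue; // ignore blank lines
            }
            String jobId = line.split(",", 2)[0].trim();
            counts.merge(jobId, 1, Integer::sum);
        }
        // Keep only the IDs that were seen at least twice.
        counts.values().removeIf(count -> count < 2);
        return counts;
    }
}
```

The lines can be read with java.nio.file.Files.readAllLines from wherever the run wrote its jobruntime.csv. An empty result means no ID was repeated; it does not detect missing IDs.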



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-10427) Duplicate Job IDs in SLS output

2021-12-16 Thread Szilard Nemeth (Jira)


[ https://issues.apache.org/jira/browse/YARN-10427?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17460802#comment-17460802 ]

Szilard Nemeth commented on YARN-10427:
---

Hi [~shuzirra],
I don't know what's going on; my latest change should not have broken the unit 
tests at all.
Moreover, I managed to run the UTs locally and none of them failed.
I created a PR from the same state of my branch; let's see what results Jenkins 
produces for it.




[jira] [Commented] (YARN-10427) Duplicate Job IDs in SLS output

2021-12-16 Thread Hadoop QA (Jira)


[ https://issues.apache.org/jira/browse/YARN-10427?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17460663#comment-17460663 ]

Hadoop QA commented on YARN-10427:
--

| (x) *-1 overall* |

|| Vote || Subsystem || Runtime || Logfile || Comment ||
| 0 | reexec | 17m 57s | | Docker mode activated. |
|| || || || Prechecks || ||
| +1 | dupname | 0m 0s | | No case conflicting files found. |
| +1 | @author | 0m 0s | | The patch does not contain any @author tags. |
| -1 | test4tests | 0m 0s | | The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. |
|| || || || trunk Compile Tests || ||
| -1 | mvninstall | 22m 41s | https://ci-hadoop.apache.org/job/PreCommit-YARN-Build/1264/artifact/out/branch-mvninstall-root.txt | root in trunk failed. |
| +1 | compile | 0m 29s | | trunk passed with JDK Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04 |
| +1 | compile | 0m 29s | | trunk passed with JDK Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 |
| +1 | checkstyle | 0m 22s | | trunk passed |
| +1 | mvnsite | 0m 30s | | trunk passed |
| +1 | shadedclient | 15m 31s | | branch has no errors when building and testing our client artifacts. |
| +1 | javadoc | 0m 26s | | trunk passed with JDK Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04 |
| +1 | javadoc | 0m 23s | | trunk passed with JDK Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 |
| 0 | spotbugs | 17m 7s | | Both FindBugs and SpotBugs are enabled, using SpotBugs. |
| +1 | spotbugs | 0m 47s | | trunk passed |
|| || || || Patch Compile Tests || ||
| +1 | mvninstall | 0m 25s | | the patch passed |
| +1 | compile | 0m 20s | | the patch passed with JDK Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04 |
| +1 | javac | 0m 20s | | the patch passed |
| +1 | compile | 0m 20s | | the patch passed with JDK Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 |
| +1 | javac | 0m 20s | | the patch passed |
| -0 | checkstyle | 0m 14s | https://ci-hadoop.apache.org/job/PreCommit-YARN-Build/1264/artifact/out/diff-checkstyle-hadoop-tools_hadoop-sls.txt | hadoop-tools/hadoop-sls: The patch generated 1 new + 22 unchanged - 1 fixed = 23 total (was 23) |
| +1 | mvnsite | 0m 23s | | the patch passed |
| +1 | whitespace | 0m 0s | | The patch has no whitespace issues. |
| +1 | shadedclient | 14m 33s | | (truncated)

[jira] [Commented] (YARN-10427) Duplicate Job IDs in SLS output

2021-12-16 Thread Szilard Nemeth (Jira)


[ https://issues.apache.org/jira/browse/YARN-10427?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17460624#comment-17460624 ]

Szilard Nemeth commented on YARN-10427:
---

Thanks [~shuzirra] for your review and comment.
I have addressed your concern; please check the latest patch.

Thanks.




[jira] [Commented] (YARN-10427) Duplicate Job IDs in SLS output

2021-12-15 Thread Gergely Pollák (Jira)


[ https://issues.apache.org/jira/browse/YARN-10427?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17460347#comment-17460347 ]

Gergely Pollák commented on YARN-10427:
---

[~snemeth] thank you for the patch. I have only one minor observation:

simulateFinishTimeMS is initialized to 0, and a value of 0 is supposed to mean 
that simulateFinishTimeMS has not been calculated yet. However, that is not 
necessarily true: since simulateFinishTimeMS is calculated as 
currentTimestamp - baseline, it might legitimately be 0 (very unlikely, but 
theoretically possible). 
So I think -1 or Long.MIN_VALUE would be a better indicator for the "empty" 
timestamp. We could also introduce a boolean for the same purpose, and then we 
wouldn't have to rely on magic numbers, but a magic number is acceptable here. 
(Also, please make a constant for the value, just to make clear what this 
number means.)
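
To make the suggestion concrete, here is a minimal Java sketch of a named sentinel constant. It is an illustration only, not the code from the patch: the class FinishTimeTracker, the constant FINISH_TIME_NOT_INITIALIZED and the method names are hypothetical, and it assumes the elapsed time (currentTimestamp - baseline) is never negative, so -1 cannot collide with a real value.

```java
// Hypothetical illustration; names are not taken from the Hadoop source.
final class FinishTimeTracker {

    /**
     * Sentinel meaning "finish time not calculated yet". 0 cannot be used
     * because currentTimestamp - baseline may legitimately be 0.
     * Assumption: elapsed time is never negative.
     */
    static final long FINISH_TIME_NOT_INITIALIZED = -1L;

    private final long baselineTimeMS;
    private long simulateFinishTimeMS = FINISH_TIME_NOT_INITIALIZED;

    FinishTimeTracker(long baselineTimeMS) {
        this.baselineTimeMS = baselineTimeMS;
    }

    /** Records the finish time on the first call; later calls keep that value. */
    void markFinished(long currentTimestampMS) {
        if (!isFinishTimeSet()) {
            simulateFinishTimeMS = currentTimestampMS - baselineTimeMS;
        }
    }

    /** True once a finish time has been recorded, even if that time is 0. */
    boolean isFinishTimeSet() {
        return simulateFinishTimeMS != FINISH_TIME_NOT_INITIALIZED;
    }

    long getSimulateFinishTimeMS() {
        return simulateFinishTimeMS;
    }
}
```

With the sentinel, a job that finishes exactly at the baseline is still reported as finished, which a 0-based check would miss. The boolean alternative mentioned above would avoid the assumption about negative values altogether.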




[jira] [Commented] (YARN-10427) Duplicate Job IDs in SLS output

2021-12-09 Thread Hadoop QA (Jira)


[ https://issues.apache.org/jira/browse/YARN-10427?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17456414#comment-17456414 ]

Hadoop QA commented on YARN-10427:
--

| (x) *-1 overall* |

|| Vote || Subsystem || Runtime || Logfile || Comment ||
| 0 | reexec | 14m 30s | | Docker mode activated. |
|| || || || Prechecks || ||
| +1 | dupname | 0m 0s | | No case conflicting files found. |
| +1 | @author | 0m 0s | | The patch does not contain any @author tags. |
| -1 | test4tests | 0m 0s | | The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. |
|| || || || trunk Compile Tests || ||
| +1 | mvninstall | 24m 35s | | trunk passed |
| +1 | compile | 0m 30s | | trunk passed with JDK Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04 |
| +1 | compile | 0m 26s | | trunk passed with JDK Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 |
| +1 | checkstyle | 0m 22s | | trunk passed |
| +1 | mvnsite | 0m 31s | | trunk passed |
| +1 | shadedclient | 15m 17s | | branch has no errors when building and testing our client artifacts. |
| +1 | javadoc | 0m 28s | | trunk passed with JDK Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04 |
| +1 | javadoc | 0m 27s | | trunk passed with JDK Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 |
| 0 | spotbugs | 16m 59s | | Both FindBugs and SpotBugs are enabled, using SpotBugs. |
| +1 | spotbugs | 0m 49s | | trunk passed |
|| || || || Patch Compile Tests || ||
| +1 | mvninstall | 0m 25s | | the patch passed |
| +1 | compile | 0m 23s | | the patch passed with JDK Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04 |
| +1 | javac | 0m 23s | | the patch passed |
| +1 | compile | 0m 20s | | the patch passed with JDK Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 |
| +1 | javac | 0m 20s | | the patch passed |
| -0 | checkstyle | 0m 14s | https://ci-hadoop.apache.org/job/PreCommit-YARN-Build/1259/artifact/out/diff-checkstyle-hadoop-tools_hadoop-sls.txt | hadoop-tools/hadoop-sls: The patch generated 1 new + 22 unchanged - 1 fixed = 23 total (was 23) |
| +1 | mvnsite | 0m 22s | | the patch passed |
| +1 | whitespace | 0m 0s | | The patch has no whitespace issues. |
| +1 | shadedclient | 14m 26s | | patch has no errors when building and testing our client artifacts. |
| (truncated)

[jira] [Commented] (YARN-10427) Duplicate Job IDs in SLS output

2021-02-02 Thread Andras Gyori (Jira)


[ https://issues.apache.org/jira/browse/YARN-10427?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17276924#comment-17276924 ]

Andras Gyori commented on YARN-10427:
-

Thank you [~snemeth] for the extremely detailed analysis! I have checked 
the main points of your feedback and I agree with your solution. My additions 
to the topic:
 * The logic is quite confusing, and I agree with the proposed approach of 
preventing lastStep from being invoked multiple times.
 * I can see that the processResponseQueue method sets the isFinished variable 
to true after invoking lastStep. This could be useful for additional checks 
inside MRAMSimulator#lastStep. I do not think invoking those clear methods on 
the collections multiple times is problematic, but it might be worth a double 
check.
{code:java}
@Override
public void lastStep() throws Exception {
  super.lastStep();

  // clear data structures, but still invoked twice
  allMaps.clear();
  allReduces.clear();
  assignedMaps.clear();
  assignedReduces.clear();
  pendingFailedMaps.clear();
  pendingFailedReduces.clear();
  pendingMaps.clear();
  pendingReduces.clear();
  scheduledMaps.clear();
  scheduledReduces.clear();
  responseQueue.clear();
}
{code}
All in all, I have no objections; it's a +1 from me.
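
The idea of using isFinished as an additional check can be sketched as follows. This is a simplified, hypothetical stand-in and not the real MRAMSimulator: the class GuardedSimulator, the counter finishReports and the single pendingMaps list are invented for illustration, and unlike the code discussed above, the flag is set inside lastStep itself rather than in processResponseQueue.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a simulator; not the real MRAMSimulator.
class GuardedSimulator {
    private boolean isFinished = false;
    // Stands in for whatever per-job output the real lastStep produces (assumption).
    private int finishReports = 0;
    private final List<String> pendingMaps = new ArrayList<>();

    GuardedSimulator() {
        pendingMaps.add("map_0");
    }

    /** Finalizes the job once; repeated invocations return immediately. */
    void lastStep() {
        if (isFinished) {
            return; // already finalized, do not report the job again
        }
        isFinished = true;
        finishReports++;
        pendingMaps.clear(); // clearing twice would be harmless, but is now skipped
    }

    int getFinishReports() {
        return finishReports;
    }

    boolean hasPendingMaps() {
        return !pendingMaps.isEmpty();
    }
}
```

With such a guard, a second call to lastStep becomes a no-op, so any side effect that must happen exactly once per job is protected even if the caller invokes the method twice.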





[jira] [Commented] (YARN-10427) Duplicate Job IDs in SLS output

2021-01-08 Thread Szilard Nemeth (Jira)


[ https://issues.apache.org/jira/browse/YARN-10427?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17261287#comment-17261287 ]

Szilard Nemeth commented on YARN-10427:
---

Hi [~werd.up],
I'm glad that you found it useful.





[jira] [Commented] (YARN-10427) Duplicate Job IDs in SLS output

2020-12-22 Thread Drew Merrill (Jira)


[ https://issues.apache.org/jira/browse/YARN-10427?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17253892#comment-17253892 ]

Drew Merrill commented on YARN-10427:
-

Hi [~snemeth], wow, your response is amazing! I need to set aside a good chunk 
of time to digest it in its entirety and actually work through the debugging 
procedure you went through, step by step. But I just want to express my sincere 
gratitude for putting the time and energy into crafting such a detailed and 
instructive follow-up, which both confirmed my findings and showed in 
great detail and clarity the steps you took to identify the source of the 
problem, along with possible solutions. Thank you!

> Duplicate Job IDs in SLS output
> ---
>
> Key: YARN-10427
> URL: https://issues.apache.org/jira/browse/YARN-10427
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: scheduler-load-simulator
>Affects Versions: 3.0.0, 3.3.0, 3.2.1, 3.4.0
> Environment: I ran the attached inputs on my MacBook Pro, using 
> Hadoop compiled from the latest trunk (as of commit 139a43e98e). I also 
> tested against 3.2.1 and 3.3.0 release branches.
>  
>Reporter: Drew Merrill
>Assignee: Szilard Nemeth
>Priority: Major
> Attachments: YARN-10427-sls-scriptsandlogs.tar.gz, 
> YARN-10427.001.patch, YARN-10427.002.patch, YARN-10427.003.patch, 
> fair-scheduler.xml, inputsls.json, jobruntime.csv, jobruntime.csv, 
> mapred-site.xml, sls-runner.xml, yarn-site.xml
>
>
> Hello, I'm hoping someone can help me resolve or understand some issues I've 
> been having with the YARN Scheduler Load Simulator (SLS). I've been 
> experimenting with SLS for several months now at work as we're trying to 
> build a simulation model to characterize our enterprise Hadoop infrastructure 
> for purposes of future capacity planning. In the process of attempting to 
> verify and validate the SLS output, I've encountered a number of issues 
> including runtime exceptions and bad output. The focus of this issue is the 
> bad output. In all my simulation runs, the jobruntime.csv output seems to 
> have one or more of the following problems: no output, duplicate job ids, 
> and/or missing job ids.
>  
> Because of where I work, I'm unable to provide the exact inputs I typically 
> use, but I'm able to reproduce the problem of the duplicate Job IDs using 
> some simplified inputs and configuration files, which I've attached, along 
> with the output I obtained.
>  
> The command I used to run the simulation:
> {{./runsls.sh --tracetype=SLS --tracelocation=./inputsls.json 
> --output-dir=sls-run-1 --print-simulation 
> --track-jobs=job_1,job_2,job_3,job_4,job_5,job_6,job_7,job_8,job_9,job_10}}
>  
> Can anyone help me understand what would cause the duplicate Job IDs in the 
> output? Is this a bug in Hadoop or a problem with my inputs? Thanks in 
> advance.
>  
> PS: This is my first issue I've ever opened so please be kind if I've missed 
> something or am not understanding something obvious about the way Hadoop 
> works. I'll gladly follow-up with more info as requested.
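
One way to check a run for the symptoms described above (duplicate or missing job IDs) is to scan jobruntime.csv directly. The following is only a sketch, not part of SLS: it assumes the file starts with a header row and that the job ID is the first column, and `check_job_ids` with its parameters is a hypothetical helper. The sample rows are made up.

```python
import csv
import io
from collections import Counter


def check_job_ids(csv_text, expected_ids):
    """Return (duplicates, missing) for the job IDs found in csv_text.

    Assumptions (not confirmed by the issue): the first row is a header
    and the job ID is the first column of every following row.
    """
    reader = csv.reader(io.StringIO(csv_text))
    next(reader, None)  # skip the header row, if there is one
    counts = Counter(row[0].strip() for row in reader if row)
    duplicates = sorted(job for job, n in counts.items() if n > 1)
    missing = sorted(job for job in expected_ids if job not in counts)
    return duplicates, missing


# Made-up sample data, only to show the shape of the check.
sample = """JobID,real_start_time,real_end_time,simulate_start_time,simulate_end_time
job_1,0,10,0,10
job_2,5,20,5,20
job_2,5,20,5,20
"""
duplicates, missing = check_job_ids(sample, ["job_1", "job_2", "job_3"])
print("duplicates:", duplicates)
print("missing:", missing)
```

Passing the same job list that was given to --track-jobs as `expected_ids` makes it easy to compare what was requested with what the simulator actually wrote.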



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-10427) Duplicate Job IDs in SLS output

2020-12-22 Thread Hadoop QA (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10427?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17253693#comment-17253693
 ] 

Hadoop QA commented on YARN-10427:
--

| (x) *-1 overall* |

|| Vote || Subsystem || Runtime || Logfile || Comment ||
| 0 | reexec | 0m 44s | | Docker mode activated. |
|| || || || Prechecks || ||
| +1 | dupname | 0m 0s | | No case conflicting files found. |
| +1 | @author | 0m 0s | | The patch does not contain any @author tags. |
| -1 | test4tests | 0m 0s | | The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. |
|| || || || trunk Compile Tests || ||
| +1 | mvninstall | 20m 52s | | trunk passed |
| +1 | compile | 0m 29s | | trunk passed with JDK Ubuntu-11.0.9.1+1-Ubuntu-0ubuntu1.18.04 |
| +1 | compile | 0m 28s | | trunk passed with JDK Private Build-1.8.0_275-8u275-b01-0ubuntu1~18.04-b01 |
| +1 | checkstyle | 0m 24s | | trunk passed |
| +1 | mvnsite | 0m 33s | | trunk passed |
| +1 | shadedclient | 15m 59s | | branch has no errors when building and testing our client artifacts. |
| +1 | javadoc | 0m 28s | | trunk passed with JDK Ubuntu-11.0.9.1+1-Ubuntu-0ubuntu1.18.04 |
| +1 | javadoc | 0m 27s | | trunk passed with JDK Private Build-1.8.0_275-8u275-b01-0ubuntu1~18.04-b01 |
| 0 | spotbugs | 0m 49s | | Used deprecated FindBugs config; considering switching to SpotBugs. |
| +1 | findbugs | 0m 46s | | trunk passed |
|| || || || Patch Compile Tests || ||
| +1 | mvninstall | 0m 26s | | the patch passed |
| +1 | compile | 0m 22s | | the patch passed with JDK Ubuntu-11.0.9.1+1-Ubuntu-0ubuntu1.18.04 |
| +1 | javac | 0m 22s | | the patch passed |
| +1 | compile | 0m 20s | | the patch passed with JDK Private Build-1.8.0_275-8u275-b01-0ubuntu1~18.04-b01 |
| +1 | javac | 0m 20s | | the patch passed |
| -0 | checkstyle | 0m 16s | https://ci-hadoop.apache.org/job/PreCommit-YARN-Build/411/artifact/out/diff-checkstyle-hadoop-tools_hadoop-sls.txt | hadoop-tools/hadoop-sls: The patch generated 1 new + 22 unchanged - 1 fixed = 23 total (was 23) |
| +1 | mvnsite | 0m 22s | | the patch passed |
| +1 | whitespace | 0m 0s | | The patch has no whitespace issues. |
| +1 | shadedclient | 14m 40s | | patch has no errors when building and testing our client artifacts. |

[jira] [Commented] (YARN-10427) Duplicate Job IDs in SLS output

2020-12-22 Thread Hadoop QA (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10427?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17253654#comment-17253654
 ] 

Hadoop QA commented on YARN-10427:
--

| (x) *-1 overall* |

|| Vote || Subsystem || Runtime || Logfile || Comment ||
| 0 | reexec | 1m 35s | | Docker mode activated. |
|| || || || Prechecks || ||
| +1 | dupname | 0m 0s | | No case conflicting files found. |
| +1 | @author | 0m 0s | | The patch does not contain any @author tags. |
| -1 | test4tests | 0m 0s | | The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. |
|| || || || trunk Compile Tests || ||
| +1 | mvninstall | 21m 5s | | trunk passed |
| +1 | compile | 0m 31s | | trunk passed with JDK Ubuntu-11.0.9.1+1-Ubuntu-0ubuntu1.18.04 |
| +1 | compile | 0m 28s | | trunk passed with JDK Private Build-1.8.0_275-8u275-b01-0ubuntu1~18.04-b01 |
| +1 | checkstyle | 0m 23s | | trunk passed |
| +1 | mvnsite | 0m 31s | | trunk passed |
| +1 | shadedclient | 16m 9s | | branch has no errors when building and testing our client artifacts. |
| +1 | javadoc | 0m 29s | | trunk passed with JDK Ubuntu-11.0.9.1+1-Ubuntu-0ubuntu1.18.04 |
| +1 | javadoc | 0m 27s | | trunk passed with JDK Private Build-1.8.0_275-8u275-b01-0ubuntu1~18.04-b01 |
| 0 | spotbugs | 0m 48s | | Used deprecated FindBugs config; considering switching to SpotBugs. |
| +1 | findbugs | 0m 45s | | trunk passed |
|| || || || Patch Compile Tests || ||
| +1 | mvninstall | 0m 26s | | the patch passed |
| +1 | compile | 0m 21s | | the patch passed with JDK Ubuntu-11.0.9.1+1-Ubuntu-0ubuntu1.18.04 |
| +1 | javac | 0m 21s | | the patch passed |
| +1 | compile | 0m 20s | | the patch passed with JDK Private Build-1.8.0_275-8u275-b01-0ubuntu1~18.04-b01 |
| +1 | javac | 0m 20s | | the patch passed |
| -0 | checkstyle | 0m 15s | https://ci-hadoop.apache.org/job/PreCommit-YARN-Build/410/artifact/out/diff-checkstyle-hadoop-tools_hadoop-sls.txt | hadoop-tools/hadoop-sls: The patch generated 1 new + 22 unchanged - 1 fixed = 23 total (was 23) |
| +1 | mvnsite | 0m 23s | | the patch passed |
| +1 | whitespace | 0m 0s | | The patch has no whitespace issues. |
| +1 | shadedclient | 14m 47s | | patch has no errors when building and testing our client artifacts. |

[jira] [Commented] (YARN-10427) Duplicate Job IDs in SLS output

2020-12-22 Thread Hadoop QA (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10427?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17253608#comment-17253608
 ] 

Hadoop QA commented on YARN-10427:
--

| (x) *-1 overall* |

|| Vote || Subsystem || Runtime || Logfile || Comment ||
| 0 | reexec | 0m 0s | | Docker mode activated. |
| -1 | patch | 0m 10s | | YARN-10427 does not apply to trunk. Rebase required? Wrong Branch? See https://wiki.apache.org/hadoop/HowToContribute for help. |

|| Subsystem || Report/Notes ||
| JIRA Issue | YARN-10427 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/13017522/YARN-10427.002.patch |
| Console output | https://ci-hadoop.apache.org/job/PreCommit-YARN-Build/409/console |
| versions | git=2.17.1 |
| Powered by | Apache Yetus 0.13.0-SNAPSHOT https://yetus.apache.org |


This message was automatically generated.
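
When several of these QA reports arrive in a row, the negative votes can be picked out mechanically. The sketch below is illustrative only: it assumes each table row sits on a single line with the {color} markup already removed, and `failing_checks` is a hypothetical helper, not part of Yetus or Hadoop.

```python
def failing_checks(report_lines):
    """Return (vote, subsystem, comment) for rows whose vote is -1 or -0.

    Assumes one table row per line, five cells per data row
    (vote, subsystem, runtime, logfile, comment), no {color} markup.
    """
    failures = []
    for line in report_lines:
        line = line.strip()
        # Data rows start with a single '|'; header rows start with '||'.
        if not line.startswith("|") or line.startswith("||"):
            continue
        cells = [cell.strip() for cell in line.strip("|").split("|")]
        if len(cells) == 5 and cells[0] in ("-1", "-0"):
            failures.append((cells[0], cells[1], cells[4]))
    return failures


# Rows shortened from the report above, used only as sample input.
sample = [
    "|| Vote || Subsystem || Runtime || Logfile || Comment ||",
    "| 0 | reexec | 0m 0s | | Docker mode activated. |",
    "| -1 | patch | 0m 10s | | YARN-10427 does not apply to trunk. |",
]
for vote, subsystem, comment in failing_checks(sample):
    print(vote, subsystem, comment)
```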



> Duplicate Job IDs in SLS output
> ---
>
> Key: YARN-10427
> URL: https://issues.apache.org/jira/browse/YARN-10427
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: scheduler-load-simulator
>Affects Versions: 3.0.0, 3.3.0, 3.2.1, 3.4.0
> Environment: I ran the attached inputs on my MacBook Pro, using 
> Hadoop compiled from the latest trunk (as of commit 139a43e98e). I also 
> tested against 3.2.1 and 3.3.0 release branches.
>  
>Reporter: Drew Merrill
>Assignee: Szilard Nemeth
>Priority: Major
> Attachments: YARN-10427-sls-scriptsandlogs.tar.gz, 
> YARN-10427.001.patch, YARN-10427.002.patch, fair-scheduler.xml, 
> inputsls.json, jobruntime.csv, jobruntime.csv, mapred-site.xml, 
> sls-runner.xml, yarn-site.xml
>
>



[jira] [Commented] (YARN-10427) Duplicate Job IDs in SLS output

2020-12-22 Thread Szilard Nemeth (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10427?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17253604#comment-17253604
 ] 

Szilard Nemeth commented on YARN-10427:
---

Accidentally attached a patch that also contains all the logging.
Adding a second patch with just the fix.

> Duplicate Job IDs in SLS output
> ---
>
> Key: YARN-10427
> URL: https://issues.apache.org/jira/browse/YARN-10427
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: scheduler-load-simulator
>Affects Versions: 3.0.0, 3.3.0, 3.2.1, 3.4.0
> Environment: I ran the attached inputs on my MacBook Pro, using 
> Hadoop compiled from the latest trunk (as of commit 139a43e98e). I also 
> tested against 3.2.1 and 3.3.0 release branches.
>  
>Reporter: Drew Merrill
>Assignee: Szilard Nemeth
>Priority: Major
> Attachments: YARN-10427-sls-scriptsandlogs.tar.gz, 
> YARN-10427.001.patch, fair-scheduler.xml, inputsls.json, jobruntime.csv, 
> jobruntime.csv, mapred-site.xml, sls-runner.xml, yarn-site.xml
>
>



[jira] [Commented] (YARN-10427) Duplicate Job IDs in SLS output

2020-12-22 Thread Szilard Nemeth (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10427?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17253589#comment-17253589
 ] 

Szilard Nemeth commented on YARN-10427:
---

Hi [~werd.up],
 Thanks for reporting this issue, and congratulations on your first reported 
Hadoop YARN jira.
{quote}In the process of attempting to verify and validate the SLS output, I've 
encountered a number of issues including runtime exceptions and bad output.
{quote}
I read through your observations and spent some time playing around with SLS.

If you encountered other issues, please file separate jiras for them when you 
have some time.

Since running SLS involved some repetitive tasks, such as uploading configs to 
the remote machine, launching SLS, and saving the resulting logs, I added some 
scripts to my public GitHub repo here: 
[https://github.com/szilard-nemeth/linux-env/tree/ff84652b34bc23c1f88766f781f6648365becde5/workplace-specific/cloudera/investigations/YARN-10427]

Let me summarize what these scripts do: 
 1. [config 
dir|https://github.com/szilard-nemeth/linux-env/tree/ff84652b34bc23c1f88766f781f6648365becde5/workplace-specific/cloudera/investigations/YARN-10427/config]:
 This is the same set of configuration files that you attached to this jira, 
with one exception: the log4j.properties file, which turns on DEBUG logging 
for SLS.

2. [upstream-patches 
dir|https://github.com/szilard-nemeth/linux-env/tree/ff84652b34bc23c1f88766f781f6648365becde5/workplace-specific/cloudera/investigations/YARN-10427/upstream-patches]:
 This is the directory of the logging patch that helped me see the issues more 
clearly.
 My code changes are also pushed to my Hadoop fork: 
[https://github.com/szilard-nemeth/hadoop/tree/YARN-10427-investigation]

3. [scripts 
dir|https://github.com/szilard-nemeth/linux-env/tree/ff84652b34bc23c1f88766f781f6648365becde5/workplace-specific/cloudera/investigations/YARN-10427/scripts]:
 This is the directory that contains all my scripts to build Hadoop + launch 
SLS and save produced logs to the local machine.
 As I have been working on a remote cluster, there's a script called 
[setup-vars-upstream.sh|https://github.com/szilard-nemeth/linux-env/blob/ff84652b34bc23c1f88766f781f6648365becde5/workplace-specific/cloudera/investigations/YARN-10427/scripts/setup-vars-upstream.sh]
 that contains some configuration values for the remote cluster + some local 
directories. If you want to use the scripts, all you need to do is to replace 
the configs in this file according to your environment.

3.1 
[build-and-launch.sh|https://github.com/szilard-nemeth/linux-env/blob/ff84652b34bc23c1f88766f781f6648365becde5/workplace-specific/cloudera/investigations/YARN-10427/scripts/build-and-launch-sls.sh]:
 This is the script that builds Hadoop according to the environment variables 
and launches the SLS suite on the remote cluster.

3.2 
[start-sls.sh|https://github.com/szilard-nemeth/linux-env/blob/ff84652b34bc23c1f88766f781f6648365becde5/workplace-specific/cloudera/investigations/YARN-10427/scripts/start-sls.sh]:
 This is the most important script as this will be executed on the remote 
machine. 
 I think the script itself is straightforward enough, but let me briefly list 
what it does:
 - This script assumes that the Hadoop dist package is copied to the remote 
machine (this was done by 
[build-and-launch.sh|https://github.com/szilard-nemeth/linux-env/blob/ff84652b34bc23c1f88766f781f6648365becde5/workplace-specific/cloudera/investigations/YARN-10427/scripts/build-and-launch-sls.sh])
 - Cleans up all Hadoop-related directories and extracts the Hadoop dist tar.gz
 - Copies the config to Hadoop's config dirs so SLS will use these particular 
configs
 - Launches SLS by starting slsrun.sh with the appropriate CLI switches
 - Greps for some useful data in the resulting SLS log file.

3.3 
[launch-sls.sh|https://github.com/szilard-nemeth/linux-env/blob/ff84652b34bc23c1f88766f781f6648365becde5/workplace-specific/cloudera/investigations/YARN-10427/scripts/launch-sls.sh]:
 This script is executed by 
[build-and-launch.sh|https://github.com/szilard-nemeth/linux-env/blob/ff84652b34bc23c1f88766f781f6648365becde5/workplace-specific/cloudera/investigations/YARN-10427/scripts/build-and-launch-sls.sh]
 as its last step. Once start-sls.sh has finished, the 
[save-latest-sls-logs.sh|https://github.com/szilard-nemeth/linux-env/blob/ff84652b34bc23c1f88766f781f6648365becde5/workplace-specific/cloudera/investigations/YARN-10427/scripts/save-latest-sls-logs.sh]
 script is started. As the name implies, it saves the latest SLS log dir and 
SCPs it to the local machine. The target directory on the local machine is 
determined by the config 
([setup-vars-upstream.sh|https://github.com/szilard-nemeth/linux-env/blob/ff84652b34bc23c1f88766f781f6648365becde5/workplace-specific/cloudera/investigations/YARN-10427/scripts/setup-vars-upstream.sh]).
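
The remote-side steps listed under 3.2 can be sketched as a dry-run plan. This is not the content of start-sls.sh; it is an illustration under stated assumptions. Every path, the log file name, and the grep pattern are hypothetical, and the slsrun.sh switches are the ones from the command in the issue description. Nothing is executed; the function only returns the commands.

```python
import shlex


def plan_sls_run(hadoop_tarball, hadoop_home, config_dir, trace_file, output_dir):
    """Return the remote-side steps as a list of argv lists (nothing is run).

    All arguments are hypothetical locations chosen by the caller.
    """
    slsrun = f"{hadoop_home}/share/hadoop/tools/sls/bin/slsrun.sh"
    return [
        # Clean up the previous installation and extract the dist tarball.
        ["rm", "-rf", hadoop_home],
        ["mkdir", "-p", hadoop_home],
        ["tar", "-xzf", hadoop_tarball, "-C", hadoop_home, "--strip-components=1"],
        # Copy the investigation configs over Hadoop's own config dir.
        ["cp", "-r", f"{config_dir}/.", f"{hadoop_home}/etc/hadoop/"],
        # Launch SLS with the switches used in the issue description.
        [slsrun, "--tracetype=SLS", f"--tracelocation={trace_file}",
         f"--output-dir={output_dir}", "--print-simulation"],
        # Grep the resulting log; file name and pattern are placeholders.
        ["grep", "job_", f"{output_dir}/sls.log"],
    ]


steps = plan_sls_run(
    "/tmp/hadoop-dist.tar.gz", "/opt/hadoop", "/tmp/sls-config",
    "/tmp/sls-config/inputsls.json", "/tmp/sls-run-1",
)
for step in steps:
    print(" ".join(shlex.quote(arg) for arg in step))
```

Keeping the plan as data makes it possible to review the commands before running anything on the remote machine.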

*The late

[jira] [Commented] (YARN-10427) Duplicate Job IDs in SLS output

2020-11-23 Thread Drew Merrill (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10427?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17237866#comment-17237866
 ] 

Drew Merrill commented on YARN-10427:
-

Yes it does. Output attached. [^jobruntime.csv]

> Duplicate Job IDs in SLS output
> ---
>
> Key: YARN-10427
> URL: https://issues.apache.org/jira/browse/YARN-10427
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: scheduler-load-simulator
>Affects Versions: 3.0.0, 3.3.0, 3.2.1, 3.4.0
> Environment: I ran the attached inputs on my MacBook Pro, using 
> Hadoop compiled from the latest trunk (as of commit 139a43e98e). I also 
> tested against 3.2.1 and 3.3.0 release branches.
>  
>Reporter: Drew Merrill
>Priority: Major
> Attachments: fair-scheduler.xml, inputsls.json, jobruntime.csv, 
> jobruntime.csv, jobruntime.csv, mapred-site.xml, sls-runner.xml, yarn-site.xml
>
>



[jira] [Commented] (YARN-10427) Duplicate Job IDs in SLS output

2020-11-23 Thread Agshin Kazimli (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10427?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17237288#comment-17237288
 ] 

Agshin Kazimli commented on YARN-10427:
---

Does this issue persist with the following command?
bin/slsrun.sh --input-sls=/inputsls.json --output-dir=

> Duplicate Job IDs in SLS output
> ---
>
> Key: YARN-10427
> URL: https://issues.apache.org/jira/browse/YARN-10427
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: scheduler-load-simulator
>Affects Versions: 3.0.0, 3.3.0, 3.2.1, 3.4.0
> Environment: I ran the attached inputs on my MacBook Pro, using 
> Hadoop compiled from the latest trunk (as of commit 139a43e98e). I also 
> tested against 3.2.1 and 3.3.0 release branches.
>  
>Reporter: Drew Merrill
>Priority: Major
> Attachments: fair-scheduler.xml, inputsls.json, jobruntime.csv, 
> mapred-site.xml, sls-runner.xml, yarn-site.xml
>
>



[jira] [Commented] (YARN-10427) Duplicate Job IDs in SLS output

2020-11-19 Thread Drew Merrill (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10427?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17235876#comment-17235876
 ] 

Drew Merrill commented on YARN-10427:
-

_*Anyone?*_ *I'd really appreciate a response from someone on this.* _*A 
developer? A fellow user? A computer?*_

Have I not included enough info or the right info needed to investigate this? 
If so, please let me know!

*At the very least, can someone else please _confirm that the issue with 
duplicate Job IDs is reproducible?_ It's frustrating and stressful not knowing 
if the problem is due to something that _I'm doing wrong_ or if it's a bug in 
Hadoop.*

*There's either a teachable moment here where I can learn what I'm doing wrong 
or else an opportunity to identify and fix a bug in Hadoop. Both are good 
outcomes!*

> Duplicate Job IDs in SLS output
> ---
>
> Key: YARN-10427
> URL: https://issues.apache.org/jira/browse/YARN-10427
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: scheduler-load-simulator
>Affects Versions: 3.0.0, 3.3.0, 3.2.1, 3.4.0
> Environment: I ran the attached inputs on my MacBook Pro, using 
> Hadoop compiled from the latest trunk (as of commit 139a43e98e). I also 
> tested against 3.2.1 and 3.3.0 release branches.
>  
>Reporter: Drew Merrill
>Priority: Major
> Attachments: fair-scheduler.xml, inputsls.json, jobruntime.csv, 
> mapred-site.xml, sls-runner.xml, yarn-site.xml
>
>