[jira] [Commented] (YARN-10427) Duplicate Job IDs in SLS output
[ https://issues.apache.org/jira/browse/YARN-10427?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17461121#comment-17461121 ]

Gergely Pollák commented on YARN-10427:
---

[~snemeth] thank you for the patch, it seems fine to me. LGTM +1, merged to trunk.

> Duplicate Job IDs in SLS output
> ---
>
> Key: YARN-10427
> URL: https://issues.apache.org/jira/browse/YARN-10427
> Project: Hadoop YARN
> Issue Type: Bug
> Components: scheduler-load-simulator
> Affects Versions: 3.0.0, 3.3.0, 3.2.1, 3.4.0
> Environment: I ran the attached inputs on my MacBook Pro, using
> Hadoop compiled from the latest trunk (as of commit 139a43e98e). I also
> tested against 3.2.1 and 3.3.0 release branches.
> Reporter: Drew Merrill
> Assignee: Szilard Nemeth
> Priority: Major
> Labels: pull-request-available
> Attachments: YARN-10427-sls-scriptsandlogs.tar.gz,
> YARN-10427.001.patch, YARN-10427.002.patch, YARN-10427.003.patch,
> YARN-10427.004.patch, fair-scheduler.xml, inputsls.json, jobruntime.csv,
> jobruntime.csv, mapred-site.xml, sls-runner.xml, yarn-site.xml
> Time Spent: 40m
> Remaining Estimate: 0h
>
> Hello, I'm hoping someone can help me resolve or understand some issues I've
> been having with the YARN Scheduler Load Simulator (SLS). I've been
> experimenting with SLS for several months now at work as we're trying to
> build a simulation model to characterize our enterprise Hadoop infrastructure
> for purposes of future capacity planning. In the process of attempting to
> verify and validate the SLS output, I've encountered a number of issues
> including runtime exceptions and bad output. The focus of this issue is the
> bad output. In all my simulation runs, the jobruntime.csv output seems to
> have one or more of the following problems: no output, duplicate job IDs,
> and/or missing job IDs.
>
> Because of where I work, I'm unable to provide the exact inputs I typically
> use, but I'm able to reproduce the problem of the duplicate job IDs using
> some simplified inputs and configuration files, which I've attached, along
> with the output I obtained.
>
> The command I used to run the simulation:
> {{./runsls.sh --tracetype=SLS --tracelocation=./inputsls.json --output-dir=sls-run-1 --print-simulation --track-jobs=job_1,job_2,job_3,job_4,job_5,job_6,job_7,job_8,job_9,job_10}}
>
> Can anyone help me understand what would cause the duplicate job IDs in the
> output? Is this a bug in Hadoop or a problem with my inputs? Thanks in
> advance.
>
> PS: This is the first issue I've ever opened, so please be kind if I've missed
> something or am not understanding something obvious about the way Hadoop
> works. I'll gladly follow up with more info as requested.

--
This message was sent by Atlassian Jira (v8.20.1#820001)

To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
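For readers who want to check their own jobruntime.csv for the duplicate job IDs described in the report above, here is a minimal sketch. It assumes that the first line of the file is a header row and that the job ID is the first comma-separated column; the class and method names are hypothetical and not part of SLS.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of Hadoop or SLS.
final class DuplicateJobIdCheck {

  /** Returns the job IDs that occur more than once, in first-seen order. */
  static List<String> findDuplicateJobIds(List<String> csvLines) {
    Map<String, Integer> counts = new LinkedHashMap<>();
    // Assumption: the first line is a header row, so start at index 1.
    int firstDataLine = Math.min(1, csvLines.size());
    for (String line : csvLines.subList(firstDataLine, csvLines.size())) {
      if (line.trim().isEmpty()) {
        continue; // ignore blank lines
      }
      // Assumption: the job ID is the first comma-separated column.
      String jobId = line.split(",", 2)[0].trim();
      counts.merge(jobId, 1, Integer::sum);
    }
    List<String> duplicates = new ArrayList<>();
    for (Map.Entry<String, Integer> entry : counts.entrySet()) {
      if (entry.getValue() > 1) {
        duplicates.add(entry.getKey());
      }
    }
    return duplicates;
  }

  public static void main(String[] args) throws IOException {
    // Usage: java DuplicateJobIdCheck path/to/jobruntime.csv
    List<String> lines = Files.readAllLines(Paths.get(args[0]));
    System.out.println("Duplicate job IDs: " + findDuplicateJobIds(lines));
  }
}
```

An empty result only shows that no ID is repeated; it does not detect the missing job IDs that the report also mentions.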
[jira] [Commented] (YARN-10427) Duplicate Job IDs in SLS output
[ https://issues.apache.org/jira/browse/YARN-10427?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17460802#comment-17460802 ]

Szilard Nemeth commented on YARN-10427:
---

Hi [~shuzirra], I don't know what's going on; my latest change should not have broken the unit tests at all. Moreover, I managed to run the UTs locally and none of them failed. I created a PR from the same state of my branch, so let's see what Jenkins results it produces.
[jira] [Commented] (YARN-10427) Duplicate Job IDs in SLS output
[ https://issues.apache.org/jira/browse/YARN-10427?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17460663#comment-17460663 ]

Hadoop QA commented on YARN-10427:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Logfile || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 17m 57s{color} | {color:blue}{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} || ||
| {color:green}+1{color} | {color:green} dupname {color} | {color:green} 0m 0s{color} | {color:green}{color} | {color:green} No case conflicting files found. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green}{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red} 0m 0s{color} | {color:red}{color} | {color:red} The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color} |
|| || || || {color:brown} trunk Compile Tests {color} || ||
| {color:red}-1{color} | {color:red} mvninstall {color} | {color:red} 22m 41s{color} | {color:red}https://ci-hadoop.apache.org/job/PreCommit-YARN-Build/1264/artifact/out/branch-mvninstall-root.txt{color} | {color:red} root in trunk failed. {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 29s{color} | {color:green}{color} | {color:green} trunk passed with JDK Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 29s{color} | {color:green}{color} | {color:green} trunk passed with JDK Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 22s{color} | {color:green}{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 30s{color} | {color:green}{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 15m 31s{color} | {color:green}{color} | {color:green} branch has no errors when building and testing our client artifacts. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 26s{color} | {color:green}{color} | {color:green} trunk passed with JDK Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 23s{color} | {color:green}{color} | {color:green} trunk passed with JDK Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 {color} |
| {color:blue}0{color} | {color:blue} spotbugs {color} | {color:blue} 17m 7s{color} | {color:blue}{color} | {color:blue} Both FindBugs and SpotBugs are enabled, using SpotBugs. {color} |
| {color:green}+1{color} | {color:green} spotbugs {color} | {color:green} 0m 47s{color} | {color:green}{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} || ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 25s{color} | {color:green}{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 20s{color} | {color:green}{color} | {color:green} the patch passed with JDK Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 20s{color} | {color:green}{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 20s{color} | {color:green}{color} | {color:green} the patch passed with JDK Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 20s{color} | {color:green}{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange} 0m 14s{color} | {color:orange}https://ci-hadoop.apache.org/job/PreCommit-YARN-Build/1264/artifact/out/diff-checkstyle-hadoop-tools_hadoop-sls.txt{color} | {color:orange} hadoop-tools/hadoop-sls: The patch generated 1 new + 22 unchanged - 1 fixed = 23 total (was 23) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 23s{color} | {color:green}{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green}{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 14m 33s{color} | {color:green}{color} | {c
[jira] [Commented] (YARN-10427) Duplicate Job IDs in SLS output
[ https://issues.apache.org/jira/browse/YARN-10427?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17460624#comment-17460624 ]

Szilard Nemeth commented on YARN-10427:
---

Thanks [~shuzirra] for your review and comment. I have fixed your concern; please check the latest patch. Thanks.
[jira] [Commented] (YARN-10427) Duplicate Job IDs in SLS output
[ https://issues.apache.org/jira/browse/YARN-10427?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17460347#comment-17460347 ]

Gergely Pollák commented on YARN-10427:
---

[~snemeth] thank you for the patch. I have only one minor observation:

simulateFinishTimeMS is initialized to 0, and a value of 0 is supposed to mean that simulateFinishTimeMS has not been calculated yet. However, that is not necessarily true: since simulateFinishTimeMS is calculated as currentTimestamp - baseline, it might legitimately be 0 (very unlikely, but theoretically possible). So I think -1 or Long.MIN_VALUE would be a better indicator for the "empty" timestamp. We could also introduce a boolean for the same purpose, and then we wouldn't have to rely on magic numbers, but it's acceptable here. (Also, please make a constant for the value, just to make clear what this number means.)
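The review comment about simulateFinishTimeMS can be illustrated with a small sketch. This is not the code from the patch: the class, constant, and method names below are hypothetical, and the sketch assumes the current timestamp is never earlier than the baseline, so the computed value cannot collide with the -1 sentinel.

```java
// Hypothetical illustration of a named sentinel; not the actual SLS code.
final class FinishTimeHolder {

  /** Sentinel meaning "not calculated yet"; 0 remains a valid finish time. */
  static final long FINISH_TIME_NOT_SET = -1L;

  private final long baselineTimeMs;
  private long simulateFinishTimeMs = FINISH_TIME_NOT_SET;

  FinishTimeHolder(long baselineTimeMs) {
    this.baselineTimeMs = baselineTimeMs;
  }

  boolean isFinishTimeSet() {
    return simulateFinishTimeMs != FINISH_TIME_NOT_SET;
  }

  /**
   * Records the finish time relative to the baseline, but only on the first
   * call. Returns true if the value was recorded by this call.
   */
  boolean recordFinish(long currentTimeMs) {
    if (isFinishTimeSet()) {
      return false; // already recorded, keep the first value
    }
    // Assumption: currentTimeMs >= baselineTimeMs, so the result is >= 0.
    simulateFinishTimeMs = currentTimeMs - baselineTimeMs;
    return true;
  }

  long getSimulateFinishTimeMs() {
    return simulateFinishTimeMs;
  }
}
```

With 0 as the "empty" marker, a job finishing exactly at the baseline would look unfinished and could be recorded a second time; with the named sentinel that case is handled like any other finish time.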
[jira] [Commented] (YARN-10427) Duplicate Job IDs in SLS output
[ https://issues.apache.org/jira/browse/YARN-10427?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17456414#comment-17456414 ]

Hadoop QA commented on YARN-10427:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Logfile || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 14m 30s{color} | {color:blue}{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} || ||
| {color:green}+1{color} | {color:green} dupname {color} | {color:green} 0m 0s{color} | {color:green}{color} | {color:green} No case conflicting files found. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green}{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red} 0m 0s{color} | {color:red}{color} | {color:red} The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color} |
|| || || || {color:brown} trunk Compile Tests {color} || ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 24m 35s{color} | {color:green}{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 30s{color} | {color:green}{color} | {color:green} trunk passed with JDK Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 26s{color} | {color:green}{color} | {color:green} trunk passed with JDK Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 22s{color} | {color:green}{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 31s{color} | {color:green}{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 15m 17s{color} | {color:green}{color} | {color:green} branch has no errors when building and testing our client artifacts. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 28s{color} | {color:green}{color} | {color:green} trunk passed with JDK Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 27s{color} | {color:green}{color} | {color:green} trunk passed with JDK Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 {color} |
| {color:blue}0{color} | {color:blue} spotbugs {color} | {color:blue} 16m 59s{color} | {color:blue}{color} | {color:blue} Both FindBugs and SpotBugs are enabled, using SpotBugs. {color} |
| {color:green}+1{color} | {color:green} spotbugs {color} | {color:green} 0m 49s{color} | {color:green}{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} || ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 25s{color} | {color:green}{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 23s{color} | {color:green}{color} | {color:green} the patch passed with JDK Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 23s{color} | {color:green}{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 20s{color} | {color:green}{color} | {color:green} the patch passed with JDK Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 20s{color} | {color:green}{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange} 0m 14s{color} | {color:orange}https://ci-hadoop.apache.org/job/PreCommit-YARN-Build/1259/artifact/out/diff-checkstyle-hadoop-tools_hadoop-sls.txt{color} | {color:orange} hadoop-tools/hadoop-sls: The patch generated 1 new + 22 unchanged - 1 fixed = 23 total (was 23) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 22s{color} | {color:green}{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green}{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 14m 26s{color} | {color:green}{color} | {color:green} patch has no errors when building and testing our client artifacts. {color} |
| {color
[jira] [Commented] (YARN-10427) Duplicate Job IDs in SLS output
[ https://issues.apache.org/jira/browse/YARN-10427?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17276924#comment-17276924 ]

Andras Gyori commented on YARN-10427:
-

Thank you [~snemeth] for the extremely detailed analysis! I have checked through the main points of your feedback and I agree with your solution. My additions to the topic:
* The logic is quite confusing, so I agree with the proposed approach to prevent lastStep from being invoked multiple times.
* I can see that the processResponseQueue method sets the isFinished variable to true after invoking lastStep. This could be useful for additional checks inside MRAMSimulator#lastStep. I do not think invoking those clear methods on the collections multiple times is problematic, but it might be worth a double check.

{code:java}
@Override
public void lastStep() throws Exception {
  super.lastStep();

  // clear data structures, but still invoked twice
  allMaps.clear();
  allReduces.clear();
  assignedMaps.clear();
  assignedReduces.clear();
  pendingFailedMaps.clear();
  pendingFailedReduces.clear();
  pendingMaps.clear();
  pendingReduces.clear();
  scheduledMaps.clear();
  scheduledReduces.clear();
  responseQueue.clear();
}
{code}

All in all, I have no objections, it's a +1.
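The idea discussed in the comment above, making sure the work in lastStep happens only once even if the method is invoked twice, can be sketched as follows. This is a simplified stand-in, not the MRAMSimulator code or the committed patch: the class, the fields, and the choice to set the flag inside lastStep itself are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a simulator with a one-shot final step.
class GuardedSimulator {

  private boolean finished = false;
  private int finalStepRuns = 0; // counts how often the final work really ran
  private final List<String> pendingTasks = new ArrayList<>();

  void addTask(String task) {
    pendingTasks.add(task);
  }

  /** Performs the final bookkeeping once; repeated calls are no-ops. */
  void lastStep() {
    if (finished) {
      return; // already finished, do not report or clean up again
    }
    finished = true;
    finalStepRuns++;      // stands in for writing the job's output record
    pendingTasks.clear(); // clearing twice would be harmless, reporting twice is not
  }

  boolean isFinished() {
    return finished;
  }

  int getFinalStepRuns() {
    return finalStepRuns;
  }

  int getPendingTaskCount() {
    return pendingTasks.size();
  }
}
```

Calling lastStep() twice on the same instance leaves getFinalStepRuns() at 1, which is the property that would keep a job from being written to the output more than once.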
[jira] [Commented] (YARN-10427) Duplicate Job IDs in SLS output
[ https://issues.apache.org/jira/browse/YARN-10427?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17261287#comment-17261287 ]

Szilard Nemeth commented on YARN-10427:
---

Hi [~werd.up], I'm glad that you found it useful.
[jira] [Commented] (YARN-10427) Duplicate Job IDs in SLS output
[ https://issues.apache.org/jira/browse/YARN-10427?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17253892#comment-17253892 ]

Drew Merrill commented on YARN-10427:
-

Hi [~snemeth], wow, your response is amazing! I need to set aside a good chunk of time to digest it in its entirety and actually work through the debugging procedure you went through, step by step. But I just want to express my sincere gratitude for putting the time and energy into crafting such a detailed and instructive follow-up, one that both confirmed my findings and showed in great detail and clarity the steps you took to identify the source of the problem, along with possible solutions. Thank you!
[jira] [Commented] (YARN-10427) Duplicate Job IDs in SLS output
[ https://issues.apache.org/jira/browse/YARN-10427?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17253693#comment-17253693 ] Hadoop QA commented on YARN-10427:

| (x) *-1 overall* |

|| Vote || Subsystem || Runtime || Logfile || Comment ||
| 0 | reexec | 0m 44s | | Docker mode activated. |
|| || || || Prechecks || ||
| +1 | dupname | 0m 0s | | No case conflicting files found. |
| +1 | @author | 0m 0s | | The patch does not contain any @author tags. |
| -1 | test4tests | 0m 0s | | The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. |
|| || || || trunk Compile Tests || ||
| +1 | mvninstall | 20m 52s | | trunk passed |
| +1 | compile | 0m 29s | | trunk passed with JDK Ubuntu-11.0.9.1+1-Ubuntu-0ubuntu1.18.04 |
| +1 | compile | 0m 28s | | trunk passed with JDK Private Build-1.8.0_275-8u275-b01-0ubuntu1~18.04-b01 |
| +1 | checkstyle | 0m 24s | | trunk passed |
| +1 | mvnsite | 0m 33s | | trunk passed |
| +1 | shadedclient | 15m 59s | | branch has no errors when building and testing our client artifacts. |
| +1 | javadoc | 0m 28s | | trunk passed with JDK Ubuntu-11.0.9.1+1-Ubuntu-0ubuntu1.18.04 |
| +1 | javadoc | 0m 27s | | trunk passed with JDK Private Build-1.8.0_275-8u275-b01-0ubuntu1~18.04-b01 |
| 0 | spotbugs | 0m 49s | | Used deprecated FindBugs config; considering switching to SpotBugs. |
| +1 | findbugs | 0m 46s | | trunk passed |
|| || || || Patch Compile Tests || ||
| +1 | mvninstall | 0m 26s | | the patch passed |
| +1 | compile | 0m 22s | | the patch passed with JDK Ubuntu-11.0.9.1+1-Ubuntu-0ubuntu1.18.04 |
| +1 | javac | 0m 22s | | the patch passed |
| +1 | compile | 0m 20s | | the patch passed with JDK Private Build-1.8.0_275-8u275-b01-0ubuntu1~18.04-b01 |
| +1 | javac | 0m 20s | | the patch passed |
| -0 | checkstyle | 0m 16s | https://ci-hadoop.apache.org/job/PreCommit-YARN-Build/411/artifact/out/diff-checkstyle-hadoop-tools_hadoop-sls.txt | hadoop-tools/hadoop-sls: The patch generated 1 new + 22 unchanged - 1 fixed = 23 total (was 23) |
| +1 | mvnsite | 0m 22s | | the patch passed |
| +1 | whitespace | 0m 0s | | The patch has no whitespace issues. |
| +1 | shadedclient | 14m 40s | | patch has no errors when building and testing our client artifacts. |
[jira] [Commented] (YARN-10427) Duplicate Job IDs in SLS output
[ https://issues.apache.org/jira/browse/YARN-10427?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17253654#comment-17253654 ] Hadoop QA commented on YARN-10427:

| (x) *-1 overall* |

|| Vote || Subsystem || Runtime || Logfile || Comment ||
| 0 | reexec | 1m 35s | | Docker mode activated. |
|| || || || Prechecks || ||
| +1 | dupname | 0m 0s | | No case conflicting files found. |
| +1 | @author | 0m 0s | | The patch does not contain any @author tags. |
| -1 | test4tests | 0m 0s | | The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. |
|| || || || trunk Compile Tests || ||
| +1 | mvninstall | 21m 5s | | trunk passed |
| +1 | compile | 0m 31s | | trunk passed with JDK Ubuntu-11.0.9.1+1-Ubuntu-0ubuntu1.18.04 |
| +1 | compile | 0m 28s | | trunk passed with JDK Private Build-1.8.0_275-8u275-b01-0ubuntu1~18.04-b01 |
| +1 | checkstyle | 0m 23s | | trunk passed |
| +1 | mvnsite | 0m 31s | | trunk passed |
| +1 | shadedclient | 16m 9s | | branch has no errors when building and testing our client artifacts. |
| +1 | javadoc | 0m 29s | | trunk passed with JDK Ubuntu-11.0.9.1+1-Ubuntu-0ubuntu1.18.04 |
| +1 | javadoc | 0m 27s | | trunk passed with JDK Private Build-1.8.0_275-8u275-b01-0ubuntu1~18.04-b01 |
| 0 | spotbugs | 0m 48s | | Used deprecated FindBugs config; considering switching to SpotBugs. |
| +1 | findbugs | 0m 45s | | trunk passed |
|| || || || Patch Compile Tests || ||
| +1 | mvninstall | 0m 26s | | the patch passed |
| +1 | compile | 0m 21s | | the patch passed with JDK Ubuntu-11.0.9.1+1-Ubuntu-0ubuntu1.18.04 |
| +1 | javac | 0m 21s | | the patch passed |
| +1 | compile | 0m 20s | | the patch passed with JDK Private Build-1.8.0_275-8u275-b01-0ubuntu1~18.04-b01 |
| +1 | javac | 0m 20s | | the patch passed |
| -0 | checkstyle | 0m 15s | https://ci-hadoop.apache.org/job/PreCommit-YARN-Build/410/artifact/out/diff-checkstyle-hadoop-tools_hadoop-sls.txt | hadoop-tools/hadoop-sls: The patch generated 1 new + 22 unchanged - 1 fixed = 23 total (was 23) |
| +1 | mvnsite | 0m 23s | | the patch passed |
| +1 | whitespace | 0m 0s | | The patch has no whitespace issues. |
| +1 | shadedclient | 14m 47s | | patch has no errors when building and testing our client artifacts. |
[jira] [Commented] (YARN-10427) Duplicate Job IDs in SLS output
[ https://issues.apache.org/jira/browse/YARN-10427?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17253608#comment-17253608 ] Hadoop QA commented on YARN-10427:

| (x) *-1 overall* |

|| Vote || Subsystem || Runtime || Logfile || Comment ||
| 0 | reexec | 0m 0s | | Docker mode activated. |
| -1 | patch | 0m 10s | | YARN-10427 does not apply to trunk. Rebase required? Wrong Branch? See https://wiki.apache.org/hadoop/HowToContribute for help. |

|| Subsystem || Report/Notes ||
| JIRA Issue | YARN-10427 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/13017522/YARN-10427.002.patch |
| Console output | https://ci-hadoop.apache.org/job/PreCommit-YARN-Build/409/console |
| versions | git=2.17.1 |
| Powered by | Apache Yetus 0.13.0-SNAPSHOT https://yetus.apache.org |

This message was automatically generated.
[jira] [Commented] (YARN-10427) Duplicate Job IDs in SLS output
[ https://issues.apache.org/jira/browse/YARN-10427?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17253604#comment-17253604 ] Szilard Nemeth commented on YARN-10427:

Accidentally attached a patch that also contains all the logging. Adding a second patch with just the fix.
[jira] [Commented] (YARN-10427) Duplicate Job IDs in SLS output
[ https://issues.apache.org/jira/browse/YARN-10427?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17253589#comment-17253589 ] Szilard Nemeth commented on YARN-10427:

Hi [~werd.up],

Thanks for reporting this issue, and congratulations on your first reported Hadoop YARN jira.

{quote}In the process of attempting to verify and validate the SLS output, I've encountered a number of issues including runtime exceptions and bad output.{quote}

I read through your observations and spent some time playing around with SLS. If you encountered other issues, please file separate jiras for them when you have some time.

As the process of running SLS involved some repetitive tasks (uploading configs to the remote machine, launching SLS, saving the resulting logs, ...), I created some scripts in my public GitHub repo here: [https://github.com/szilard-nemeth/linux-env/tree/ff84652b34bc23c1f88766f781f6648365becde5/workplace-specific/cloudera/investigations/YARN-10427]

Let me summarize what these scripts do:

1. [config dir|https://github.com/szilard-nemeth/linux-env/tree/ff84652b34bc23c1f88766f781f6648365becde5/workplace-specific/cloudera/investigations/YARN-10427/config]: This is the exact same set of configuration files that you attached to this jira, with the one exception of the log4j.properties file, which turns on DEBUG logging for SLS.

2. [upstream-patches dir|https://github.com/szilard-nemeth/linux-env/tree/ff84652b34bc23c1f88766f781f6648365becde5/workplace-specific/cloudera/investigations/YARN-10427/upstream-patches]: This is the directory of the logging patch that helped me see the issues more clearly. My code changes are also pushed to my Hadoop fork: [https://github.com/szilard-nemeth/hadoop/tree/YARN-10427-investigation]

3. [scripts dir|https://github.com/szilard-nemeth/linux-env/tree/ff84652b34bc23c1f88766f781f6648365becde5/workplace-specific/cloudera/investigations/YARN-10427/scripts]: This is the directory that contains all my scripts to build Hadoop, launch SLS, and save the produced logs to the local machine. As I have been working on a remote cluster, there's a script called [setup-vars-upstream.sh|https://github.com/szilard-nemeth/linux-env/blob/ff84652b34bc23c1f88766f781f6648365becde5/workplace-specific/cloudera/investigations/YARN-10427/scripts/setup-vars-upstream.sh] that contains some configuration values for the remote cluster plus some local directories. If you want to use the scripts, all you need to do is replace the configs in this file according to your environment.

3.1 [build-and-launch-sls.sh|https://github.com/szilard-nemeth/linux-env/blob/ff84652b34bc23c1f88766f781f6648365becde5/workplace-specific/cloudera/investigations/YARN-10427/scripts/build-and-launch-sls.sh]: This is the script that builds Hadoop according to the environment variables and launches the SLS suite on the remote cluster.

3.2 [start-sls.sh|https://github.com/szilard-nemeth/linux-env/blob/ff84652b34bc23c1f88766f781f6648365becde5/workplace-specific/cloudera/investigations/YARN-10427/scripts/start-sls.sh]: This is the most important script, as it is the one executed on the remote machine. I think the script itself is straightforward enough, but let me briefly list what it does:
- Assumes that the Hadoop dist package has been copied to the remote machine (this was done by [build-and-launch-sls.sh|https://github.com/szilard-nemeth/linux-env/blob/ff84652b34bc23c1f88766f781f6648365becde5/workplace-specific/cloudera/investigations/YARN-10427/scripts/build-and-launch-sls.sh])
- Cleans up all Hadoop-related directories and extracts the Hadoop dist tar.gz
- Copies the config to Hadoop's config dirs so SLS will use these particular configs
- Launches SLS by starting slsrun.sh with the appropriate CLI switches
- Greps for some useful data in the resulting SLS log file.

3.3 [launch-sls.sh|https://github.com/szilard-nemeth/linux-env/blob/ff84652b34bc23c1f88766f781f6648365becde5/workplace-specific/cloudera/investigations/YARN-10427/scripts/launch-sls.sh]: This script is executed by [build-and-launch-sls.sh|https://github.com/szilard-nemeth/linux-env/blob/ff84652b34bc23c1f88766f781f6648365becde5/workplace-specific/cloudera/investigations/YARN-10427/scripts/build-and-launch-sls.sh] as its last step. Once start-sls.sh has finished, the [save-latest-sls-logs.sh|https://github.com/szilard-nemeth/linux-env/blob/ff84652b34bc23c1f88766f781f6648365becde5/workplace-specific/cloudera/investigations/YARN-10427/scripts/save-latest-sls-logs.sh] script is started. As the name implies, it saves the latest SLS log dir and SCPs it to the local machine. The target directory on the local machine is determined by the config ([setup-vars-upstream.sh|https://github.com/szilard-nemeth/linux-env/blob/ff84652b34bc23c1f88766f781f6648365becde5/workplace-specific/cloudera/investigations/YARN-10427/scripts/setup-vars-upstream.sh]).

*The late
[jira] [Commented] (YARN-10427) Duplicate Job IDs in SLS output
[ https://issues.apache.org/jira/browse/YARN-10427?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17237866#comment-17237866 ] Drew Merrill commented on YARN-10427:

Yes it does. Output attached. [^jobruntime.csv]
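The symptoms described in this issue (no output, duplicate job IDs, missing job IDs) can be checked mechanically on a jobruntime.csv file. Below is a minimal sketch, assuming the file has a header row and the job ID in the first column; that layout, the function name `check_job_ids`, and the sample rows are assumptions made for illustration and are not taken from the attached files.

```python
import csv
import io
from collections import Counter
from typing import Dict, Iterable, List, TextIO


def check_job_ids(csv_file: TextIO, expected_ids: Iterable[str]) -> Dict[str, List[str]]:
    """Report duplicate and missing job IDs in a jobruntime.csv-style file."""
    reader = csv.reader(csv_file)
    next(reader, None)  # assumption: the first row is a header and is skipped
    # Assumption: the job ID is the first column of every data row.
    counts = Counter(row[0].strip() for row in reader if row and row[0].strip())
    duplicates = sorted(job for job, n in counts.items() if n > 1)
    missing = sorted(job for job in expected_ids if job not in counts)
    return {"duplicates": duplicates, "missing": missing}


if __name__ == "__main__":
    # Made-up sample data with a hypothetical header, for illustration only.
    sample = io.StringIO(
        "JobID,start,end\n"
        "job_1,100,200\n"
        "job_2,110,210\n"
        "job_2,120,220\n"
    )
    print(check_job_ids(sample, ["job_1", "job_2", "job_3"]))
```

An empty file yields no duplicates and reports every expected ID as missing, which covers the "no output" case as well.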
[jira] [Commented] (YARN-10427) Duplicate Job IDs in SLS output
[ https://issues.apache.org/jira/browse/YARN-10427?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17237288#comment-17237288 ] Agshin Kazimli commented on YARN-10427:

Does this issue persist with the following command?

bin/slsrun.sh --input-sls=/inputsls.json --output-dir=
[jira] [Commented] (YARN-10427) Duplicate Job IDs in SLS output
[ https://issues.apache.org/jira/browse/YARN-10427?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17235876#comment-17235876 ] Drew Merrill commented on YARN-10427:

_*Anyone?*_ *I'd really appreciate a response from someone on this.* _*A developer? A fellow user? A computer?*_

Have I not included enough info or the right info needed to investigate this? If so, please let me know!

*At the very least, can someone else please _confirm that the issue with duplicate Job IDs is reproducible?_ It's frustrating and stressful not knowing if the problem is due to something that _I'm doing wrong_ or if it's a bug in Hadoop.*

*There's either a teachable moment here where I can learn what I'm doing wrong or else an opportunity to identify and fix a bug in Hadoop. Both are good outcomes!*