[jira] [Commented] (YARN-6645) Bug fix in ContainerImpl when calling the symLink of LinuxContainerExecutor
[ https://issues.apache.org/jira/browse/YARN-6645?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16024312#comment-16024312 ] Bingxue Qiu commented on YARN-6645: --- Hi [~cheersyang], we backported YARN-1503 to Hadoop 2.8 in our clusters. For this exception, we create the nmPrivateDir in the writeScriptToNMPrivateDir method like this; please feel free to give suggestions. Thank you!

{code}
private File writeScriptToNMPrivateDir(String nmPrivateDir, String command)
    throws IOException {
  File file = new File(nmPrivateDir);
  if (!file.mkdirs() && !file.exists()) {
    LOG.error("Failed to create nmPrivate dir " + file);
  }
  File tmp = File.createTempFile("cmd_", "_tmp", new File(nmPrivateDir));
  Writer writer = new OutputStreamWriter(new FileOutputStream(tmp), "UTF-8");
  PrintWriter printWriter = new PrintWriter(writer);
  printWriter.print(command);
  printWriter.close();
  return tmp;
}
{code}

> Bug fix in ContainerImpl when calling the symLink of LinuxContainerExecutor
> ---
>
> Key: YARN-6645
> URL: https://issues.apache.org/jira/browse/YARN-6645
> Project: Hadoop YARN
> Issue Type: Bug
> Components: nodemanager
> Reporter: Bingxue Qiu
> Fix For: 2.9.0
>
> Attachments: error when creating symlink.png
>
> When a symlink is created after the resource is localized in our clusters, an
> IOException is thrown because the nmPrivateDir doesn't exist. We add a
> patch to fix it.

-- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
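Outside the Hadoop codebase, the same create-then-write pattern can be sketched with java.nio.file, where directory creation is idempotent and fails loudly instead of only logging. The class and method names below are hypothetical stand-ins, not YARN code:

```java
import java.io.IOException;
import java.io.PrintWriter;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class NmPrivateDirDemo {
    // Hypothetical stand-in for writeScriptToNMPrivateDir: createDirectories
    // is a no-op when the directory already exists and throws IOException on
    // a real failure, so a missing nmPrivate dir cannot slip through silently.
    static Path writeScript(Path nmPrivateDir, String command) throws IOException {
        Files.createDirectories(nmPrivateDir);                 // idempotent, fails loudly
        Path tmp = Files.createTempFile(nmPrivateDir, "cmd_", "_tmp");
        try (PrintWriter out = new PrintWriter(
                Files.newBufferedWriter(tmp, StandardCharsets.UTF_8))) {
            out.print(command);                                // write the container command
        }
        return tmp;
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("nmPrivate").resolve("container_01");
        Path script = writeScript(dir, "echo hello");
        System.out.println(Files.readAllLines(script, StandardCharsets.UTF_8).get(0));
    }
}
```

The try-with-resources block also guarantees the writer is closed even if the write throws, which the original snippet does not.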
[jira] [Commented] (YARN-6111) Rumen input doesn't work in SLS
[ https://issues.apache.org/jira/browse/YARN-6111?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16024353#comment-16024353 ] YuJie Huang commented on YARN-6111: --- Yufei Gu, there are some strange symbols in your patch, like:

{noformat}
@@ -1,4 +1,4 @@
-[{
+{
 "priority" : "NORMAL",
 "jobID" : "job_1369942127770_1205",
 "user" : "jenkins",
@@ -5078,7 +5078,8 @@
 "clusterReduceMB" : -1,
 "jobMapMB" : 200,
 "jobReduceMB" : 200
-}, {
{noformat}

Is there something wrong with how I am opening the patch, or do I just need to remove these symbols (@, -, -5078, ...)?

> Rumen input doesn't work in SLS
> --
>
> Key: YARN-6111
> URL: https://issues.apache.org/jira/browse/YARN-6111
> Project: Hadoop YARN
> Issue Type: Sub-task
> Components: scheduler-load-simulator
> Affects Versions: 2.6.0, 2.7.3, 3.0.0-alpha2
> Environment: ubuntu14.0.4 os
> Reporter: YuJie Huang
> Assignee: Yufei Gu
> Labels: test
> Fix For: 3.0.0-alpha3
>
> Attachments: YARN-6111.001.patch
>
>
> Hi guys,
> I am trying to learn the use of SLS.
> I would like to get the file realtimetrack.json, but it only
> contains "[]" at the end of a simulation. This is the command I use to
> run the instance:
> HADOOP_HOME $ bin/slsrun.sh --input-rumen=sample-data/2jobsmin-rumen-jh.json
> --output-dir=sample-data
> All other files, including metrics, appear to be properly populated. I can
> also trace with web:http://localhost:10001/simulate
> Can someone help?
> Thanks
[jira] [Updated] (YARN-6577) Remove unused ContainerLocalization classes
[ https://issues.apache.org/jira/browse/YARN-6577?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Akira Ajisaka updated YARN-6577: Fix Version/s: 2.9.0 > Remove unused ContainerLocalization classes > --- > > Key: YARN-6577 > URL: https://issues.apache.org/jira/browse/YARN-6577 > Project: Hadoop YARN > Issue Type: Bug > Components: nodemanager >Affects Versions: 2.7.3, 3.0.0-alpha2 >Reporter: ZhangBing Lin >Assignee: ZhangBing Lin >Priority: Minor > Fix For: 2.9.0, 2.8.1, 3.0.0-alpha3 > > Attachments: YARN-6577.001.patch > > > From 2.7.3 and 3.0.0-alpha2, the ContainerLocalization interface and the > ContainerLocalizationImpl implementation class are of no use, and I recommend > removing the useless interface and implementation classes
[jira] [Commented] (YARN-6111) Rumen input doesn't work in SLS
[ https://issues.apache.org/jira/browse/YARN-6111?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16024368#comment-16024368 ] YuJie Huang commented on YARN-6111: --- There are only two jobs in the 2jobs2min-rumen-jh.json file in Hadoop 2.7.3, and its format is {job1} {job2} rather than [{job1},{job2}], but the realtimetrack.json file still only contains "[]".

> Rumen input doesn't work in SLS
> --
>
> Key: YARN-6111
> URL: https://issues.apache.org/jira/browse/YARN-6111
> Project: Hadoop YARN
> Issue Type: Sub-task
> Components: scheduler-load-simulator
> Affects Versions: 2.6.0, 2.7.3, 3.0.0-alpha2
> Environment: ubuntu14.0.4 os
> Reporter: YuJie Huang
> Assignee: Yufei Gu
> Labels: test
> Fix For: 3.0.0-alpha3
>
> Attachments: YARN-6111.001.patch
>
>
> Hi guys,
> I am trying to learn the use of SLS.
> I would like to get the file realtimetrack.json, but it only
> contains "[]" at the end of a simulation. This is the command I use to
> run the instance:
> HADOOP_HOME $ bin/slsrun.sh --input-rumen=sample-data/2jobsmin-rumen-jh.json
> --output-dir=sample-data
> All other files, including metrics, appear to be properly populated. I can
> also trace with web:http://localhost:10001/simulate
> Can someone help?
> Thanks
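The format mismatch described above is the crux: the SLS Rumen reader expects one JSON array, [{job1},{job2}], not concatenated top-level objects. A rough standalone sketch (a hypothetical helper, not part of Hadoop) that wraps a stream of concatenated objects into that array form, tracking brace depth and skipping braces inside JSON strings:

```java
public class RumenJsonFix {
    // Wrap concatenated top-level JSON objects {job1}{job2} into [{job1},{job2}].
    // Rough sketch: tracks brace depth, ignores braces inside quoted strings,
    // and drops whitespace between top-level objects.
    static String wrapAsArray(String concatenated) {
        StringBuilder out = new StringBuilder("[");
        int depth = 0;
        boolean inString = false, escaped = false, first = true;
        for (int i = 0; i < concatenated.length(); i++) {
            char c = concatenated.charAt(i);
            if (inString) {
                if (escaped) escaped = false;
                else if (c == '\\') escaped = true;
                else if (c == '"') inString = false;
            } else if (c == '"') {
                inString = true;
            } else if (c == '{') {
                if (depth == 0) {               // new top-level object begins
                    if (!first) out.append(','); // separate it from the previous one
                    first = false;
                }
                depth++;
            } else if (c == '}') {
                depth--;
            }
            // Emit characters inside objects, plus each object's closing brace.
            if (depth > 0 || c == '}') out.append(c);
        }
        return out.append(']').toString();
    }

    public static void main(String[] args) {
        System.out.println(wrapAsArray("{\"a\":1} {\"b\":2}")); // prints [{"a":1},{"b":2}]
    }
}
```

In practice one could also just edit the sample file by hand; the point is only what shape the reader expects.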
[jira] [Commented] (YARN-6141) ppc64le on Linux doesn't trigger __linux get_executable codepath
[ https://issues.apache.org/jira/browse/YARN-6141?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16024373#comment-16024373 ] Akira Ajisaka commented on YARN-6141: - LGTM, +1. Checking this in. > ppc64le on Linux doesn't trigger __linux get_executable codepath > > > Key: YARN-6141 > URL: https://issues.apache.org/jira/browse/YARN-6141 > Project: Hadoop YARN > Issue Type: Bug > Components: nodemanager >Affects Versions: 3.0.0-alpha3 > Environment: $ uname -a > Linux f8eef0f055cf 3.16.0-30-generic #40~14.04.1-Ubuntu SMP Thu Jan 15 > 17:42:36 UTC 2015 ppc64le ppc64le ppc64le GNU/Linux >Reporter: Sonia Garudi >Assignee: Ayappan > Labels: ppc64le > Attachments: YARN-6141.patch > > > On ppc64le architecture, the build fails in the 'Hadoop YARN NodeManager' > project with the below error: > Cannot safely determine executable path with a relative HADOOP_CONF_DIR on > this operating system. > [WARNING] #error Cannot safely determine executable path with a relative > HADOOP_CONF_DIR on this operating system. > [WARNING] ^ > [WARNING] make[2]: *** > [CMakeFiles/container.dir/main/native/container-executor/impl/get_executable.c.o] > Error 1 > [WARNING] make[2]: *** Waiting for unfinished jobs > [WARNING] make[1]: *** [CMakeFiles/container.dir/all] Error 2 > [WARNING] make: *** [all] Error 2 > [INFO] > > [INFO] BUILD FAILURE > [INFO] > > Cmake version used : > $ /usr/bin/cmake --version > cmake version 2.8.12.2
[jira] [Commented] (YARN-6141) ppc64le on Linux doesn't trigger __linux get_executable codepath
[ https://issues.apache.org/jira/browse/YARN-6141?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16024396#comment-16024396 ] Hudson commented on YARN-6141: -- SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #11779 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/11779/]) YARN-6141. ppc64le on Linux doesn't trigger __linux get_executable (aajisaka: rev bc28da65fb1c67904aa3cefd7273cb7423521014) * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/native/container-executor/impl/get_executable.c > ppc64le on Linux doesn't trigger __linux get_executable codepath > > > Key: YARN-6141 > URL: https://issues.apache.org/jira/browse/YARN-6141 > Project: Hadoop YARN > Issue Type: Bug > Components: nodemanager >Affects Versions: 3.0.0-alpha3 > Environment: $ uname -a > Linux f8eef0f055cf 3.16.0-30-generic #40~14.04.1-Ubuntu SMP Thu Jan 15 > 17:42:36 UTC 2015 ppc64le ppc64le ppc64le GNU/Linux >Reporter: Sonia Garudi >Assignee: Ayappan > Labels: ppc64le > Fix For: 2.9.0, 2.8.1, 3.0.0-alpha3 > > Attachments: YARN-6141.patch > > > On ppc64le architecture, the build fails in the 'Hadoop YARN NodeManager' > project with the below error: > Cannot safely determine executable path with a relative HADOOP_CONF_DIR on > this operating system. > [WARNING] #error Cannot safely determine executable path with a relative > HADOOP_CONF_DIR on this operating system. 
> [WARNING] ^ > [WARNING] make[2]: *** > [CMakeFiles/container.dir/main/native/container-executor/impl/get_executable.c.o] > Error 1 > [WARNING] make[2]: *** Waiting for unfinished jobs > [WARNING] make[1]: *** [CMakeFiles/container.dir/all] Error 2 > [WARNING] make: *** [all] Error 2 > [INFO] > > [INFO] BUILD FAILURE > [INFO] > > Cmake version used : > $ /usr/bin/cmake --version > cmake version 2.8.12.2
[jira] [Created] (YARN-6646) Modifier 'static' is redundant for inner enums less
ZhangBing Lin created YARN-6646: --- Summary: Modifier 'static' is redundant for inner enums less Key: YARN-6646 URL: https://issues.apache.org/jira/browse/YARN-6646 Project: Hadoop YARN Issue Type: Bug Reporter: ZhangBing Lin Assignee: ZhangBing Lin Priority: Minor
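For context on the issue title: the Java Language Specification (§8.9) makes a nested enum implicitly static, so the explicit modifier adds nothing. A minimal illustration:

```java
public class EnumDemo {
    // A nested enum is implicitly static (JLS §8.9), so the explicit
    // modifier below is redundant -- both declarations are identical.
    static enum WithModifier { A, B }   // 'static' is redundant
    enum WithoutModifier { A, B }       // same semantics, cleaner

    public static void main(String[] args) {
        // Both can be referenced without an enclosing instance:
        System.out.println(WithModifier.A + " " + WithoutModifier.B);
    }
}
```

Removing the redundant modifier is purely cosmetic; the compiled class files are equivalent.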
[jira] [Commented] (YARN-6141) ppc64le on Linux doesn't trigger __linux get_executable codepath
[ https://issues.apache.org/jira/browse/YARN-6141?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16024434#comment-16024434 ] Ayappan commented on YARN-6141: --- Thanks [~ajisakaa] > ppc64le on Linux doesn't trigger __linux get_executable codepath > > > Key: YARN-6141 > URL: https://issues.apache.org/jira/browse/YARN-6141 > Project: Hadoop YARN > Issue Type: Bug > Components: nodemanager >Affects Versions: 3.0.0-alpha3 > Environment: $ uname -a > Linux f8eef0f055cf 3.16.0-30-generic #40~14.04.1-Ubuntu SMP Thu Jan 15 > 17:42:36 UTC 2015 ppc64le ppc64le ppc64le GNU/Linux >Reporter: Sonia Garudi >Assignee: Ayappan > Labels: ppc64le > Fix For: 2.9.0, 2.8.1, 3.0.0-alpha3 > > Attachments: YARN-6141.patch > > > On ppc64le architecture, the build fails in the 'Hadoop YARN NodeManager' > project with the below error: > Cannot safely determine executable path with a relative HADOOP_CONF_DIR on > this operating system. > [WARNING] #error Cannot safely determine executable path with a relative > HADOOP_CONF_DIR on this operating system. > [WARNING] ^ > [WARNING] make[2]: *** > [CMakeFiles/container.dir/main/native/container-executor/impl/get_executable.c.o] > Error 1 > [WARNING] make[2]: *** Waiting for unfinished jobs > [WARNING] make[1]: *** [CMakeFiles/container.dir/all] Error 2 > [WARNING] make: *** [all] Error 2 > [INFO] > > [INFO] BUILD FAILURE > [INFO] > > Cmake version used : > $ /usr/bin/cmake --version > cmake version 2.8.12.2
[jira] [Updated] (YARN-6646) Modifier 'static' is redundant for inner enums less
[ https://issues.apache.org/jira/browse/YARN-6646?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ZhangBing Lin updated YARN-6646: Attachment: YARN-6646.001.patch > Modifier 'static' is redundant for inner enums less > --- > > Key: YARN-6646 > URL: https://issues.apache.org/jira/browse/YARN-6646 > Project: Hadoop YARN > Issue Type: Bug >Reporter: ZhangBing Lin >Assignee: ZhangBing Lin >Priority: Minor > Attachments: YARN-6646.001.patch > >
[jira] [Commented] (YARN-6141) ppc64le on Linux doesn't trigger __linux get_executable codepath
[ https://issues.apache.org/jira/browse/YARN-6141?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16024445#comment-16024445 ] Sonia Garudi commented on YARN-6141: Thanks [~ajisakaa] . > ppc64le on Linux doesn't trigger __linux get_executable codepath > > > Key: YARN-6141 > URL: https://issues.apache.org/jira/browse/YARN-6141 > Project: Hadoop YARN > Issue Type: Bug > Components: nodemanager >Affects Versions: 3.0.0-alpha3 > Environment: $ uname -a > Linux f8eef0f055cf 3.16.0-30-generic #40~14.04.1-Ubuntu SMP Thu Jan 15 > 17:42:36 UTC 2015 ppc64le ppc64le ppc64le GNU/Linux >Reporter: Sonia Garudi >Assignee: Ayappan > Labels: ppc64le > Fix For: 2.9.0, 2.8.1, 3.0.0-alpha3 > > Attachments: YARN-6141.patch > > > On ppc64le architecture, the build fails in the 'Hadoop YARN NodeManager' > project with the below error: > Cannot safely determine executable path with a relative HADOOP_CONF_DIR on > this operating system. > [WARNING] #error Cannot safely determine executable path with a relative > HADOOP_CONF_DIR on this operating system. > [WARNING] ^ > [WARNING] make[2]: *** > [CMakeFiles/container.dir/main/native/container-executor/impl/get_executable.c.o] > Error 1 > [WARNING] make[2]: *** Waiting for unfinished jobs > [WARNING] make[1]: *** [CMakeFiles/container.dir/all] Error 2 > [WARNING] make: *** [all] Error 2 > [INFO] > > [INFO] BUILD FAILURE > [INFO] > > Cmake version used : > $ /usr/bin/cmake --version > cmake version 2.8.12.2
[jira] [Updated] (YARN-6646) Modifier 'static' is redundant for inner enums less
[ https://issues.apache.org/jira/browse/YARN-6646?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ZhangBing Lin updated YARN-6646: Affects Version/s: 3.0.0-alpha3 > Modifier 'static' is redundant for inner enums less > --- > > Key: YARN-6646 > URL: https://issues.apache.org/jira/browse/YARN-6646 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 3.0.0-alpha3 >Reporter: ZhangBing Lin >Assignee: ZhangBing Lin >Priority: Minor > Attachments: YARN-6646.001.patch > >
[jira] [Commented] (YARN-6644) The demand of FSAppAttempt may be negative
[ https://issues.apache.org/jira/browse/YARN-6644?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16024538#comment-16024538 ] Feng Yuan commented on YARN-6644: - Hi [~jack zhou], this is because before 2.8, the Resource used by FSAppAttempt#demand stores memory as an int, so the accumulated value can overflow past Integer.MAX_VALUE. Check issue YARN-6020. > The demand of FSAppAttempt may be negative > --- > > Key: YARN-6644 > URL: https://issues.apache.org/jira/browse/YARN-6644 > Project: Hadoop YARN > Issue Type: Bug > Components: fairscheduler >Affects Versions: 2.7.2 > Environment: CentOS release 6.7 (Final) >Reporter: JackZhou > Fix For: 2.9.0 > >
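The int overflow Feng Yuan describes is easy to reproduce in isolation; this toy sketch mimics only the arithmetic (the names are illustrative, not FSAppAttempt code):

```java
public class DemandOverflowDemo {
    // Accumulate a resource demand in a 32-bit int, as pre-YARN-6020 code did.
    static int sumAsInt(int asks, int eachMb) {
        int total = 0;
        for (int i = 0; i < asks; i++) {
            total += eachMb;                  // silently wraps past Integer.MAX_VALUE
        }
        return total;
    }

    public static void main(String[] args) {
        int demandMb = sumAsInt(300_000, 8192);   // 300k asks of 8 GB each
        long correctMb = 300_000L * 8192;         // widening to long stays correct
        System.out.println(demandMb < 0);         // true: the demand "went negative"
        System.out.println(correctMb);            // 2457600000, beyond int range
    }
}
```

300,000 × 8192 MB is about 2.46 billion, just past Integer.MAX_VALUE (2,147,483,647), so the int sum wraps negative while the long stays correct; this is exactly the symptom in the issue title.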
[jira] [Commented] (YARN-6644) The demand of FSAppAttempt may be negative
[ https://issues.apache.org/jira/browse/YARN-6644?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16024581#comment-16024581 ] JackZhou commented on YARN-6644: Thank you, Yufei. I found my problem is the same as YARN-6020, so it is solved. Thanks a lot. > The demand of FSAppAttempt may be negative > --- > > Key: YARN-6644 > URL: https://issues.apache.org/jira/browse/YARN-6644 > Project: Hadoop YARN > Issue Type: Bug > Components: fairscheduler >Affects Versions: 2.7.2 > Environment: CentOS release 6.7 (Final) >Reporter: JackZhou > Fix For: 2.9.0 > >
[jira] [Commented] (YARN-6644) The demand of FSAppAttempt may be negative
[ https://issues.apache.org/jira/browse/YARN-6644?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16024582#comment-16024582 ] JackZhou commented on YARN-6644: [~Feng Yuan] Thanks a lot. > The demand of FSAppAttempt may be negative > --- > > Key: YARN-6644 > URL: https://issues.apache.org/jira/browse/YARN-6644 > Project: Hadoop YARN > Issue Type: Bug > Components: fairscheduler >Affects Versions: 2.7.2 > Environment: CentOS release 6.7 (Final) >Reporter: JackZhou > Fix For: 2.9.0 > >
[jira] [Commented] (YARN-6646) Modifier 'static' is redundant for inner enums less
[ https://issues.apache.org/jira/browse/YARN-6646?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16024617#comment-16024617 ] Hadoop QA commented on YARN-6646:

-1 overall

|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 0m 17s | Docker mode activated. |
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
| -1 | test4tests | 0m 0s | The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. |
| 0 | mvndep | 0m 55s | Maven dependency ordering for branch |
| +1 | mvninstall | 13m 42s | trunk passed |
| +1 | compile | 8m 34s | trunk passed |
| +1 | checkstyle | 0m 54s | trunk passed |
| +1 | mvnsite | 3m 35s | trunk passed |
| +1 | mvneclipse | 2m 36s | trunk passed |
| -1 | findbugs | 0m 50s | hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager in trunk has 5 extant Findbugs warnings. |
| +1 | javadoc | 2m 40s | trunk passed |
| 0 | mvndep | 0m 10s | Maven dependency ordering for patch |
| +1 | mvninstall | 2m 34s | the patch passed |
| +1 | compile | 7m 58s | the patch passed |
| +1 | javac | 7m 58s | the patch passed |
| -0 | checkstyle | 0m 53s | hadoop-yarn-project/hadoop-yarn: The patch generated 2 new + 147 unchanged - 14 fixed = 149 total (was 161) |
| +1 | mvnsite | 3m 30s | the patch passed |
| +1 | mvneclipse | 2m 30s | the patch passed |
| +1 | whitespace | 0m 0s | The patch has no whitespace issues. |
| +1 | findbugs | 6m 9s | the patch passed |
| +1 | javadoc | 2m 33s | the patch passed |
| +1 | unit | 0m 33s | hadoop-yarn-api in the patch passed. |
| +1 | unit | 12m 57s | hadoop-yarn-server-nodemanager in the patch passed. |
| +1 | unit | 3m 8s | hadoop-yarn-server-applicationhistoryservice in the patch passed. |
| +1 | unit | 0m 50s | hadoop-yarn-server-timelineservice in the patch passed. |
| -1 | unit | 38m 58s | hadoop-yarn-server-resourcemanager in the patch failed. |
| +1 | unit | 19m 16s | hadoop-yarn-client in the patch passed. |
| +1 | unit | 8m 39s | hadoop-yarn-applications-distributedshell in the patch passed. |
| +1 | asflicense | 0m 36s | The patch does not generate ASF License warnings. |
| | | 159m 47s | |

|| Reason || Tests ||
[jira] [Commented] (YARN-6646) Modifier 'static' is redundant for inner enums less
[ https://issues.apache.org/jira/browse/YARN-6646?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16024621#comment-16024621 ] ZhangBing Lin commented on YARN-6646: - From the log, the unit test failure and the FindBugs warnings are not caused by this patch. > Modifier 'static' is redundant for inner enums less > --- > > Key: YARN-6646 > URL: https://issues.apache.org/jira/browse/YARN-6646 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 3.0.0-alpha3 >Reporter: ZhangBing Lin >Assignee: ZhangBing Lin >Priority: Minor > Attachments: YARN-6646.001.patch > >
[jira] [Commented] (YARN-6641) Non-public resource localization on a bad disk causes subsequent containers failure
[ https://issues.apache.org/jira/browse/YARN-6641?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16024814#comment-16024814 ] Kuhu Shukla commented on YARN-6641: --- Minor checkstyle issues. Will fix in upcoming patches. Request for review on the approach and any concerns with this change. [~jlowe]/ [~nroberts]. > Non-public resource localization on a bad disk causes subsequent containers > failure > --- > > Key: YARN-6641 > URL: https://issues.apache.org/jira/browse/YARN-6641 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 2.8.0 >Reporter: Kuhu Shukla >Assignee: Kuhu Shukla > Attachments: YARN-6641.001.patch, YARN-6641.002.patch > > > YARN-3591 added the {{checkLocalResource}} method to {{isResourcePresent()}} > call to allow checking an already localized resource against the list of > good/full directories. > Since LocalResourcesTrackerImpl instantiations for app level resources and > private resources do not use the new constructor, such resources that are on > bad disk will never be checked against good dirs.
[jira] [Commented] (YARN-6641) Non-public resource localization on a bad disk causes subsequent containers failure
[ https://issues.apache.org/jira/browse/YARN-6641?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16024818#comment-16024818 ] Jason Lowe commented on YARN-6641: -- Thanks for the patch! Patch looks good overall. At this point the only callers of the LocalResourcesTrackerImpl constructor that omits the directory handler are tests, and I think it would be better to simply remove this constructor and update the few places in the tests to explicitly pass null. That way future maintainers won't be lulled into thinking it's OK to call the constructor without a handler, since it clearly needs a dir handler to properly deal with resources that get orphaned on bad disks. > Non-public resource localization on a bad disk causes subsequent containers > failure > --- > > Key: YARN-6641 > URL: https://issues.apache.org/jira/browse/YARN-6641 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 2.8.0 >Reporter: Kuhu Shukla >Assignee: Kuhu Shukla > Attachments: YARN-6641.001.patch, YARN-6641.002.patch > > > YARN-3591 added the {{checkLocalResource}} method to {{isResourcePresent()}} > call to allow checking an already localized resource against the list of > good/full directories. > Since LocalResourcesTrackerImpl instantiations for app level resources and > private resources do not use the new constructor, such resources that are on > bad disk will never be checked against good dirs.
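Jason Lowe's suggestion amounts to a single-constructor design, so no caller can accidentally skip the directory handler. A minimal sketch with hypothetical names (not the real LocalResourcesTrackerImpl API):

```java
public class ResourceTracker {
    // Functional stand-in for the NM's good/full-directories check.
    interface DirHandler {
        boolean isGoodDir(String path);
    }

    private final DirHandler dirHandler;   // null only when a test passes it explicitly

    // The only constructor: every caller must decide what handler to pass,
    // rather than a convenience overload silently omitting it.
    ResourceTracker(DirHandler handler) {
        this.dirHandler = handler;
    }

    // Without a handler we cannot validate against the good-dirs list,
    // so a null handler skips the check (explicit, test-only behavior).
    boolean isResourcePresent(String localizedPath) {
        return dirHandler == null || dirHandler.isGoodDir(localizedPath);
    }

    public static void main(String[] args) {
        ResourceTracker tracker = new ResourceTracker(p -> p.startsWith("/good"));
        System.out.println(tracker.isResourcePresent("/good/filecache/10")); // true
        System.out.println(tracker.isResourcePresent("/bad/filecache/10"));  // false
    }
}
```

The design point is the one from the review: forcing callers to pass the handler (even an explicit null in tests) makes the dependency visible at every construction site.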
[jira] [Resolved] (YARN-6390) Support service assembly
[ https://issues.apache.org/jira/browse/YARN-6390?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Billie Rinaldi resolved YARN-6390. -- Resolution: Duplicate This will be completed as part of YARN-6613. > Support service assembly > > > Key: YARN-6390 > URL: https://issues.apache.org/jira/browse/YARN-6390 > Project: Hadoop YARN > Issue Type: Sub-task > Components: yarn-native-services >Reporter: Jian He > > An assembly is a hierarchical app-of-apps. Say, an assembly could be a > combination of zookeeper + hbase + kafka. > This functionality was there in slider, need to re-implement this in the new > yarn-native-service framework. > Also, the new yarn-native-service UI needs to account for the assembly > concept.
[jira] [Commented] (YARN-4925) ContainerRequest in AMRMClient, application should be able to specify nodes/racks together with nodeLabelExpression
[ https://issues.apache.org/jira/browse/YARN-4925?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16024866#comment-16024866 ] Bibin A Chundatt commented on YARN-4925: We should also port the dependent YARN-4140.

> ContainerRequest in AMRMClient, application should be able to specify
> nodes/racks together with nodeLabelExpression
> ---
>
> Key: YARN-4925
> URL: https://issues.apache.org/jira/browse/YARN-4925
> Project: Hadoop YARN
> Issue Type: Bug
> Reporter: Bibin A Chundatt
> Assignee: Bibin A Chundatt
> Labels: release-blocker
> Fix For: 2.8.0, 3.0.0-alpha1
>
> Attachments: 0001-YARN-4925.patch, 0002-YARN-4925.patch,
> YARN-4925-branch-2.7.001.patch
>
> Currently with node labels, AMRMClient is not able to specify node labels
> together with node/rack requests. For applications like Spark, NODE_LOCAL
> requests cannot be made with a label expression.
> As per the check in {{AMRMClientImpl#checkNodeLabelExpression}}:
> {noformat}
> // Don't allow specify node label against ANY request
> if ((containerRequest.getRacks() != null &&
>     (!containerRequest.getRacks().isEmpty()))
>     || (containerRequest.getNodes() != null &&
>     (!containerRequest.getNodes().isEmpty()))) {
>   throw new InvalidContainerRequestException(
>       "Cannot specify node label with rack and node");
> }
> {noformat}
> In {{AppSchedulingInfo#updateResourceRequests}} we reset the labels to those
> of the OFF-SWITCH request.
> The above check is not required for a ContainerRequest ask. /cc [~wangda],
> thank you for confirming.
[jira] [Created] (YARN-6647) ZKRMStateStore can crash during shutdown due to InterruptedException
Jason Lowe created YARN-6647: Summary: ZKRMStateStore can crash during shutdown due to InterruptedException Key: YARN-6647 URL: https://issues.apache.org/jira/browse/YARN-6647 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Reporter: Jason Lowe Noticed some tests were failing due to the JVM shutting down early. I was able to reproduce this occasionally with TestKillApplicationWithRMHA. Stacktrace to follow.
[jira] [Commented] (YARN-6647) ZKRMStateStore can crash during shutdown due to InterruptedException
[ https://issues.apache.org/jira/browse/YARN-6647?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16024901#comment-16024901 ] Jason Lowe commented on YARN-6647: -- Sample test output showing the mishandling of InterruptedException and a forced exit of the RM as a result. In this case it causes tests to error because the JVM exits without notifying the test framework. {noformat} 2017-05-25 10:23:45,835 INFO [Thread-50] zookeeper.JUnit4ZKTestRunner (JUnit4ZKTestRunner.java:evaluate(78)) - FINISHED TEST METHOD testKillAppWhenFailoverHappensAtNewState 2017-05-25 10:23:45,835 DEBUG [main] service.AbstractService (AbstractService.java:enterState(452)) - Service: ResourceManager entered state STOPPED 2017-05-25 10:23:45,835 DEBUG [main] service.CompositeService (CompositeService.java:serviceStop(129)) - ResourceManager: stopping services, size=3 2017-05-25 10:23:45,835 DEBUG [main] service.CompositeService (CompositeService.java:stop(151)) - Stopping service #2: Service Dispatcher in state Dispatcher: STARTED 2017-05-25 10:23:45,835 DEBUG [main] service.AbstractService (AbstractService.java:enterState(452)) - Service: Dispatcher entered state STOPPED 2017-05-25 10:23:45,835 INFO [org.apache.hadoop.util.JvmPauseMonitor$Monitor@233aac83] util.JvmPauseMonitor (JvmPauseMonitor.java:run(188)) - Starting JVM pause monitor 2017-05-25 10:23:45,836 DEBUG [main] service.CompositeService (CompositeService.java:stop(151)) - Stopping service #1: Service org.apache.hadoop.yarn.server.resourcemanager.ahs.RMApplicationHistoryWriter in state org.apache.hadoop.yarn.server.res ourcemanager.ahs.RMApplicationHistoryWriter: STARTED 2017-05-25 10:23:45,836 DEBUG [main] service.AbstractService (AbstractService.java:enterState(452)) - Service: org.apache.hadoop.yarn.server.resourcemanager.ahs.RMApplicationHistoryWriter entered state STOPPED 2017-05-25 10:23:45,836 DEBUG [main] service.CompositeService (CompositeService.java:serviceStop(129)) - 
org.apache.hadoop.yarn.server.resourcemanager.ahs.RMApplicationHistoryWriter: stopping services, size=0 2017-05-25 10:23:45,836 DEBUG [main] service.CompositeService (CompositeService.java:stop(151)) - Stopping service #0: Service org.apache.hadoop.yarn.server.resourcemanager.AdminService in state org.apache.hadoop.yarn.server.resourcemanager.Admin Service: STARTED 2017-05-25 10:23:45,836 DEBUG [main] service.AbstractService (AbstractService.java:enterState(452)) - Service: org.apache.hadoop.yarn.server.resourcemanager.AdminService entered state STOPPED 2017-05-25 10:23:45,836 DEBUG [main] service.CompositeService (CompositeService.java:serviceStop(129)) - org.apache.hadoop.yarn.server.resourcemanager.AdminService: stopping services, size=0 2017-05-25 10:23:45,836 INFO [main] resourcemanager.ResourceManager (ResourceManager.java:transitionToStandby(1191)) - Already in standby state 2017-05-25 10:23:45,836 DEBUG [main] service.AbstractService (AbstractService.java:enterState(452)) - Service: ResourceManager entered state STOPPED 2017-05-25 10:23:45,836 DEBUG [main] service.CompositeService (CompositeService.java:serviceStop(129)) - ResourceManager: stopping services, size=3 2017-05-25 10:23:45,836 DEBUG [main] service.CompositeService (CompositeService.java:stop(151)) - Stopping service #2: Service org.apache.hadoop.yarn.server.resourcemanager.ahs.RMApplicationHistoryWriter in state org.apache.hadoop.yarn.server.resourcemanager.ahs.RMApplicationHistoryWriter: STARTED 2017-05-25 10:23:45,836 DEBUG [main] service.AbstractService (AbstractService.java:enterState(452)) - Service: org.apache.hadoop.yarn.server.resourcemanager.ahs.RMApplicationHistoryWriter entered state STOPPED 2017-05-25 10:23:45,837 DEBUG [main] service.CompositeService (CompositeService.java:serviceStop(129)) - org.apache.hadoop.yarn.server.resourcemanager.ahs.RMApplicationHistoryWriter: stopping services, size=0 2017-05-25 10:23:45,837 DEBUG [main] service.CompositeService 
(CompositeService.java:stop(151)) - Stopping service #1: Service org.apache.hadoop.yarn.server.resourcemanager.AdminService in state org.apache.hadoop.yarn.server.resourcemanager.AdminService: STARTED 2017-05-25 10:23:45,837 DEBUG [main] service.AbstractService (AbstractService.java:enterState(452)) - Service: org.apache.hadoop.yarn.server.resourcemanager.AdminService entered state STOPPED 2017-05-25 10:23:45,837 DEBUG [main] service.CompositeService (CompositeService.java:serviceStop(129)) - org.apache.hadoop.yarn.server.resourcemanager.AdminService: stopping services, size=0 2017-05-25 10:23:45,837 DEBUG [main] service.CompositeService (CompositeService.java:stop(151)) - Stopping service #0: Service Dispatcher in state Dispatcher: STARTED 2017-05-25 10:23:45,837 DEBUG [main] service.AbstractService (AbstractService.java:enterState(452)) - Service: Dispatcher entered state STOPPED 2017-05-25 10:23:45,837 INFO [main] resou
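The pattern this report points at can be modeled outside of Hadoop in a few lines. The sketch below is illustrative only (class and method names are hypothetical, not the actual ZKRMStateStore code): an interrupt that arrives after stop() has been called is treated as a normal part of shutdown, the interrupt flag is restored, and no forced JVM exit occurs.

```java
// Illustrative model (not the actual ZKRMStateStore code) of the fix this
// report suggests: a retry loop that treats an interrupt arriving after
// stop() as a normal part of shutdown -- restore the interrupt flag and
// return -- rather than a fatal error that forces a JVM exit mid-test.
public class ShutdownInterruptSketch {

    private volatile boolean stopped = false;

    // Returns true if the wait completed, false if shutdown interrupted it.
    public boolean waitForRetry(long millis) {
        try {
            Thread.sleep(millis);
            return true;
        } catch (InterruptedException ie) {
            Thread.currentThread().interrupt(); // preserve interrupt status
            if (!stopped) {
                // Interrupted while still running: genuinely unexpected.
                throw new IllegalStateException("Interrupted outside shutdown", ie);
            }
            return false; // interrupted during shutdown: benign, no System.exit
        }
    }

    public void stop() {
        stopped = true;
    }

    public static void main(String[] args) throws InterruptedException {
        ShutdownInterruptSketch store = new ShutdownInterruptSketch();
        Thread retrier = new Thread(() ->
                System.out.println("completed=" + store.waitForRetry(60_000)));
        retrier.start();
        store.stop();        // shutdown begins...
        retrier.interrupt(); // ...and interrupts the waiting thread
        retrier.join();      // prints completed=false
    }
}
```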
[jira] [Commented] (YARN-6643) TestRMFailover fails rarely due to port conflict
[ https://issues.apache.org/jira/browse/YARN-6643?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16024904#comment-16024904 ] Jason Lowe commented on YARN-6643: -- +1 lgtm. The unit tests that failed don't even call the code that was changed. I was able to reproduce one of the tests exiting early and filed YARN-6647. I'll commit this later today if there are no objections. > TestRMFailover fails rarely due to port conflict > > > Key: YARN-6643 > URL: https://issues.apache.org/jira/browse/YARN-6643 > Project: Hadoop YARN > Issue Type: Bug > Components: test >Affects Versions: 2.9.0, 3.0.0-alpha3 >Reporter: Robert Kanter >Assignee: Robert Kanter > Attachments: YARN-6643.001.patch > > > We've seen various tests in {{TestRMFailover}} fail very rarely with a > message like "org.apache.hadoop.yarn.exceptions.YarnRuntimeException: > java.io.IOException: ResourceManager failed to start. Final state is > STOPPED". > After some digging, it turns out that it's due to a port conflict with the > embedded ZooKeeper in the tests. The embedded ZooKeeper uses > {{ServerSocketUtil#getPort}} to choose a free port, but the RMs are > configured to 1 + <port> and 2 + <port> (e.g. the > default port for the RM is 8032, so you'd use 18032 and 28032). > When I was able to reproduce this, I saw that ZooKeeper was using port 18033, > which is 1 + 8033, the default RM Admin port. 
It results in an error > like this, causing the RM to be unable to start, and hence the original error > message in the test failure: > {noformat} > 2017-05-24 01:16:52,735 INFO service.AbstractService > (AbstractService.java:noteFailure(272)) - Service ResourceManager failed in > state STARTED; cause: org.apache.hadoop.yarn.exceptions.YarnRuntimeException: > java.net.BindException: Problem binding to [0.0.0.0:18033] > java.net.BindException: Address already in use; For more details see: > http://wiki.apache.org/hadoop/BindException > org.apache.hadoop.yarn.exceptions.YarnRuntimeException: > java.net.BindException: Problem binding to [0.0.0.0:18033] > java.net.BindException: Address already in use; For more details see: > http://wiki.apache.org/hadoop/BindException > at > org.apache.hadoop.yarn.factories.impl.pb.RpcServerFactoryPBImpl.getServer(RpcServerFactoryPBImpl.java:139) > at > org.apache.hadoop.yarn.ipc.HadoopYarnProtoRPC.getServer(HadoopYarnProtoRPC.java:65) > at org.apache.hadoop.yarn.ipc.YarnRPC.getServer(YarnRPC.java:54) > at > org.apache.hadoop.yarn.server.resourcemanager.AdminService.startServer(AdminService.java:171) > at > org.apache.hadoop.yarn.server.resourcemanager.AdminService.serviceStart(AdminService.java:158) > at > org.apache.hadoop.service.AbstractService.start(AbstractService.java:193) > at > org.apache.hadoop.service.CompositeService.serviceStart(CompositeService.java:120) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.serviceStart(ResourceManager.java:1147) > at > org.apache.hadoop.service.AbstractService.start(AbstractService.java:193) > at > org.apache.hadoop.yarn.server.MiniYARNCluster$2.run(MiniYARNCluster.java:310) > Caused by: java.net.BindException: Problem binding to [0.0.0.0:18033] > java.net.BindException: Address already in use; For more details see: > http://wiki.apache.org/hadoop/BindException > at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native > Method) > at > 
sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57) > at > sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) > at java.lang.reflect.Constructor.newInstance(Constructor.java:526) > at org.apache.hadoop.net.NetUtils.wrapWithMessage(NetUtils.java:791) > at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:720) > at org.apache.hadoop.ipc.Server.bind(Server.java:482) > at org.apache.hadoop.ipc.Server$Listener.<init>(Server.java:688) > at org.apache.hadoop.ipc.Server.<init>(Server.java:2376) > at org.apache.hadoop.ipc.RPC$Server.<init>(RPC.java:1042) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Server.<init>(ProtobufRpcEngine.java:535) > at > org.apache.hadoop.ipc.ProtobufRpcEngine.getServer(ProtobufRpcEngine.java:510) > at org.apache.hadoop.ipc.RPC$Builder.build(RPC.java:887) > at > org.apache.hadoop.yarn.factories.impl.pb.RpcServerFactoryPBImpl.createServer(RpcServerFactoryPBImpl.java:169) > at > org.apache.hadoop.yarn.factories.impl.pb.RpcServerFactoryPBImpl.getServer(RpcServerFactoryPBImpl.java:132) > ... 9 more > Caused by: java.net.BindException: Address already in use >
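The offset scheme described in the report can be illustrated with a few lines of plain Java. The method `offsetPort` below is a hypothetical stand-in for what the test cluster effectively does (prefixing a "1" or "2" digit onto the default port), not a real Hadoop API:

```java
// Illustration (not actual Hadoop code) of the port-offset scheme the test
// uses for its ResourceManagers, and why a randomly chosen embedded-ZooKeeper
// port can collide with it. The method name offsetPort is hypothetical.
public class PortOffsetSketch {

    // MiniYARNCluster-style offsetting: RM #i listens on
    // (i + 1) * 10000 + basePort, i.e. the default port with a digit prefixed.
    static int offsetPort(int rmIndex, int basePort) {
        return (rmIndex + 1) * 10000 + basePort;
    }

    public static void main(String[] args) {
        int rmAdminDefault = 8033; // default RM admin port quoted above
        int zkPort = 18033;        // the port ServerSocketUtil#getPort happened to pick
        // RM #0's offset admin port is also 18033, hence the BindException.
        System.out.println(offsetPort(0, rmAdminDefault) == zkPort); // prints true
    }
}
```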
[jira] [Updated] (YARN-6641) Non-public resource localization on a bad disk causes subsequent containers failure
[ https://issues.apache.org/jira/browse/YARN-6641?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kuhu Shukla updated YARN-6641: -- Attachment: YARN-6641.003.patch Thanks [~jlowe] for the quick response. I have updated the patch. > Non-public resource localization on a bad disk causes subsequent containers > failure > --- > > Key: YARN-6641 > URL: https://issues.apache.org/jira/browse/YARN-6641 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 2.8.0 >Reporter: Kuhu Shukla >Assignee: Kuhu Shukla > Attachments: YARN-6641.001.patch, YARN-6641.002.patch, > YARN-6641.003.patch > > > YARN-3591 added the {{checkLocalResource}} method to {{isResourcePresent()}} > call to allow checking an already localized resource against the list of > good/full directories. > Since LocalResourcesTrackerImpl instantiations for app level resources and > private resources do not use the new constructor, such resources that are on > bad disk will never be checked against good dirs. -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
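The presence check that YARN-3591 introduced, and that this issue extends to app-level and private resource trackers, can be sketched in isolation. The class and method names below are illustrative, not the NodeManager's actual API:

```java
import java.util.List;

// Hedged sketch of the check YARN-3591 added: a localized resource only
// counts as present if its path is rooted in a currently-good local dir.
// Resources on a dir that has gone bad must be re-localized rather than
// handed to subsequent containers.
public class ResourcePresenceSketch {

    // Returns true only if the localized path lives under one of the good dirs.
    static boolean isOnGoodDir(String localizedPath, List<String> goodLocalDirs) {
        for (String dir : goodLocalDirs) {
            if (localizedPath.startsWith(dir + "/")) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        List<String> goodDirs = List.of("/data1/yarn/local", "/data3/yarn/local");
        // /data2 went bad: a resource localized there is no longer present.
        System.out.println(isOnGoodDir("/data2/yarn/local/filecache/10/job.jar", goodDirs)); // prints false
        System.out.println(isOnGoodDir("/data1/yarn/local/filecache/11/job.jar", goodDirs)); // prints true
    }
}
```

The bug described here is that only some tracker instantiations were wired up to perform this check, so resources tracked by the others were never validated against the good-dir list.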
[jira] [Resolved] (YARN-6644) The demand of FSAppAttempt may be negative
[ https://issues.apache.org/jira/browse/YARN-6644?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yufei Gu resolved YARN-6644. Resolution: Duplicate > The demand of FSAppAttempt may be negative > --- > > Key: YARN-6644 > URL: https://issues.apache.org/jira/browse/YARN-6644 > Project: Hadoop YARN > Issue Type: Bug > Components: fairscheduler >Affects Versions: 2.7.2 > Environment: CentOS release 6.7 (Final) >Reporter: JackZhou > Fix For: 2.9.0 > > -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-6111) Rumen input does't work in SLS
[ https://issues.apache.org/jira/browse/YARN-6111?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16024979#comment-16024979 ] Yufei Gu commented on YARN-6111: [~yoyo], The patch file is a diff, which tells the different made for the repo. Try to play 'git', especially how to generate patch and apply patch. SLS in Hadoop-2.7.3 may be broken in some way, use the trunk instead. YARN-6608 tries to backport all SLS improvements from trunk to branch-2. You can try branch-2 after that. > Rumen input does't work in SLS > -- > > Key: YARN-6111 > URL: https://issues.apache.org/jira/browse/YARN-6111 > Project: Hadoop YARN > Issue Type: Sub-task > Components: scheduler-load-simulator >Affects Versions: 2.6.0, 2.7.3, 3.0.0-alpha2 > Environment: ubuntu14.0.4 os >Reporter: YuJie Huang >Assignee: Yufei Gu > Labels: test > Fix For: 3.0.0-alpha3 > > Attachments: YARN-6111.001.patch > > > Hi guys, > I am trying to learn the use of SLS. > I would like to get the file realtimetrack.json, but this it only > contains "[]" at the end of a simulation. This is the command I use to > run the instance: > HADOOP_HOME $ bin/slsrun.sh --input-rumen=sample-data/2jobsmin-rumen-jh.json > --output-dir=sample-data > All other files, including metrics, appears to be properly populated.I can > also trace with web:http://localhost:10001/simulate > Can someone help? > Thanks -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Comment Edited] (YARN-6111) Rumen input does't work in SLS
[ https://issues.apache.org/jira/browse/YARN-6111?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16024979#comment-16024979 ] Yufei Gu edited comment on YARN-6111 at 5/25/17 5:18 PM: - [~yoyo], The patch file is a diff file, which tells the differences made for the repo. Try to play tool 'git', especially how to generate a patch and apply a patch. SLS in Hadoop-2.7.3 may be broken in some way, please use the trunk instead. YARN-6608 tries to backport all recent SLS improvements from trunk to branch-2. You can try branch-2 after that. was (Author: yufeigu): [~yoyo], The patch file is a diff, which tells the different made for the repo. Try to play 'git', especially how to generate patch and apply patch. SLS in Hadoop-2.7.3 may be broken in some way, use the trunk instead. YARN-6608 tries to backport all SLS improvements from trunk to branch-2. You can try branch-2 after that. > Rumen input does't work in SLS > -- > > Key: YARN-6111 > URL: https://issues.apache.org/jira/browse/YARN-6111 > Project: Hadoop YARN > Issue Type: Sub-task > Components: scheduler-load-simulator >Affects Versions: 2.6.0, 2.7.3, 3.0.0-alpha2 > Environment: ubuntu14.0.4 os >Reporter: YuJie Huang >Assignee: Yufei Gu > Labels: test > Fix For: 3.0.0-alpha3 > > Attachments: YARN-6111.001.patch > > > Hi guys, > I am trying to learn the use of SLS. > I would like to get the file realtimetrack.json, but this it only > contains "[]" at the end of a simulation. This is the command I use to > run the instance: > HADOOP_HOME $ bin/slsrun.sh --input-rumen=sample-data/2jobsmin-rumen-jh.json > --output-dir=sample-data > All other files, including metrics, appears to be properly populated.I can > also trace with web:http://localhost:10001/simulate > Can someone help? > Thanks -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-6641) Non-public resource localization on a bad disk causes subsequent containers failure
[ https://issues.apache.org/jira/browse/YARN-6641?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16025057#comment-16025057 ] Hadoop QA commented on YARN-6641: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 27s{color} | {color:blue} Docker mode activated. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 2 new or modified test files. {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 17m 53s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 43s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 26s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 34s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 17s{color} | {color:green} trunk passed {color} | | {color:red}-1{color} | {color:red} findbugs {color} | {color:red} 0m 46s{color} | {color:red} hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager in trunk has 5 extant Findbugs warnings. 
{color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 18s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 25s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 28s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 28s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 19s{color} | {color:green} hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager: The patch generated 0 new + 250 unchanged - 1 fixed = 250 total (was 251) {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 27s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 13s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 49s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 16s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 14m 17s{color} | {color:green} hadoop-yarn-server-nodemanager in the patch passed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 22s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} | | {color:black}{color} | {color:black} {color} | {color:black} 40m 28s{color} | {color:black} {color} | \\ \\ || Subsystem || Report/Notes || | Docker | Image:yetus/hadoop:14b5c93 | | JIRA Issue | YARN-6641 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12869891/YARN-6641.003.patch | | Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle | | uname | Linux cec92e0cea29 3.13.0-108-generic #155-Ubuntu SMP Wed Jan 11 16:58:52 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh | | git revision | trunk / 2e41f88 | | Default Java | 1.8.0_131 | | findbugs | v3.1.0-RC1 | | findbugs | https://builds.apache.org/job/PreCommit-YARN-Build/16019/artifact/patchprocess/branch-findbugs-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-nodemanager-warnings.html | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/16019/testReport/ | | modules | C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager U: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/16019/console | | Powered by | Apache Yetus 0.5.0-SNAPSHOT http://yetus.apache.org | This message was automatically generated. > Non-public resource localization on a bad disk causes
[jira] [Updated] (YARN-6555) Enable flow context read (& corresponding write) for recovering application with NM restart
[ https://issues.apache.org/jira/browse/YARN-6555?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rohith Sharma K S updated YARN-6555: Attachment: YARN-6555.003.patch Updated the patch fixing review comment from Haibo. > Enable flow context read (& corresponding write) for recovering application > with NM restart > > > Key: YARN-6555 > URL: https://issues.apache.org/jira/browse/YARN-6555 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Affects Versions: YARN-5355, YARN-5355-branch-2, 3.0.0-alpha3 >Reporter: Vrushali C >Assignee: Rohith Sharma K S > Labels: yarn-5355-merge-blocker > Attachments: YARN-6555.001.patch, YARN-6555.002.patch, > YARN-6555.003.patch > > > If timeline service v2 is enabled and NM is restarted with recovery enabled, > then NM fails to start and throws an error as "flow context can't be null". > This is happening because the flow context did not exist before but now that > timeline service v2 is enabled, ApplicationImpl expects it to exist. > This would also happen even if flow context existed before but since we are > not persisting it / reading it during > ContainerManagerImpl#recoverApplication, it does not get passed in to > ApplicationImpl. 
> full stack trace > {code} > 2017-05-03 21:51:52,178 FATAL > org.apache.hadoop.yarn.server.nodemanager.NodeManager: Error starting > NodeManager > java.lang.IllegalArgumentException: flow context cannot be null > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl.<init>(ApplicationImpl.java:104) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl.<init>(ApplicationImpl.java:90) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl.recoverApplication(ContainerManagerImpl.java:318) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl.recover(ContainerManagerImpl.java:280) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl.serviceInit(ContainerManagerImpl.java:267) > at > org.apache.hadoop.service.AbstractService.init(AbstractService.java:163) > at > org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:107) > at > org.apache.hadoop.yarn.server.nodemanager.NodeManager.serviceInit(NodeManager.java:276) > at > org.apache.hadoop.service.AbstractService.init(AbstractService.java:163) > at > org.apache.hadoop.yarn.server.nodemanager.NodeManager.initAndStartNodeManager(NodeManager.java:588) > at > org.apache.hadoop.yarn.server.nodemanager.NodeManager.main(NodeManager.java:649) > {code} -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
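The recovery-path behavior under discussion can be sketched in plain Java. All names below are illustrative (not the NodeManager's actual classes), and the fallback defaults are an assumption modeled on what the RM would choose at submission time:

```java
// Hedged sketch of the fix being discussed: when the NM recovers an
// application whose stored state predates timeline service v2, derive a
// default flow context instead of failing with "flow context cannot be null".
public class FlowContextRecoverySketch {

    static final class FlowContext {
        final String flowName;
        final String flowVersion;
        final long flowRunId;

        FlowContext(String flowName, String flowVersion, long flowRunId) {
            this.flowName = flowName;
            this.flowVersion = flowVersion;
            this.flowRunId = flowRunId;
        }
    }

    // Use the persisted context when a v2-aware NM wrote one; otherwise fall
    // back to defaults derived from the recovered application itself.
    static FlowContext recoverFlowContext(FlowContext stored, String appName,
                                          long appSubmitTime) {
        if (stored != null) {
            return stored;
        }
        return new FlowContext(appName, "1", appSubmitTime);
    }

    public static void main(String[] args) {
        FlowContext fc = recoverFlowContext(null, "word count", 1493848312178L);
        System.out.println(fc.flowName + "/" + fc.flowRunId);
    }
}
```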
[jira] [Created] (YARN-6648) Add FederationStateStore interfaces for Global Policy Generator
Botong Huang created YARN-6648: -- Summary: Add FederationStateStore interfaces for Global Policy Generator Key: YARN-6648 URL: https://issues.apache.org/jira/browse/YARN-6648 Project: Hadoop YARN Issue Type: Task Reporter: Botong Huang Assignee: Botong Huang Priority: Minor -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-6555) Enable flow context read (& corresponding write) for recovering application with NM restart
[ https://issues.apache.org/jira/browse/YARN-6555?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16025226#comment-16025226 ] Hadoop QA commented on YARN-6555: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 14s{color} | {color:blue} Docker mode activated. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 1 new or modified test files. {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 12m 55s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 26s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 15s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 26s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 13s{color} | {color:green} trunk passed {color} | | {color:red}-1{color} | {color:red} findbugs {color} | {color:red} 0m 42s{color} | {color:red} hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager in trunk has 5 extant Findbugs warnings. 
{color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 16s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 24s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 25s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} cc {color} | {color:green} 0m 25s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 25s{color} | {color:green} the patch passed {color} | | {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange} 0m 14s{color} | {color:orange} hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager: The patch generated 1 new + 65 unchanged - 0 fixed = 66 total (was 65) {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 22s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 11s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 44s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 13s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 13m 17s{color} | {color:green} hadoop-yarn-server-nodemanager in the patch passed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 15s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} | | {color:black}{color} | {color:black} {color} | {color:black} 32m 51s{color} | {color:black} {color} | \\ \\ || Subsystem || Report/Notes || | Docker | Image:yetus/hadoop:14b5c93 | | JIRA Issue | YARN-6555 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12869901/YARN-6555.003.patch | | Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle cc | | uname | Linux ee85dfafbee4 4.4.0-43-generic #63-Ubuntu SMP Wed Oct 12 13:48:03 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh | | git revision | trunk / 29b7df9 | | Default Java | 1.8.0_131 | | findbugs | v3.1.0-RC1 | | findbugs | https://builds.apache.org/job/PreCommit-YARN-Build/16020/artifact/patchprocess/branch-findbugs-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-nodemanager-warnings.html | | checkstyle | https://builds.apache.org/job/PreCommit-YARN-Build/16020/artifact/patchprocess/diff-checkstyle-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-nodemanager.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/16020/testReport/ | | modules | C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager U: hadoop-yarn-proje
[jira] [Updated] (YARN-6648) Add FederationStateStore interfaces for Global Policy Generator
[ https://issues.apache.org/jira/browse/YARN-6648?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Subru Krishnan updated YARN-6648: - Issue Type: Sub-task (was: Task) Parent: YARN-5597 > Add FederationStateStore interfaces for Global Policy Generator > --- > > Key: YARN-6648 > URL: https://issues.apache.org/jira/browse/YARN-6648 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Botong Huang >Assignee: Botong Huang >Priority: Minor > -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-6641) Non-public resource localization on a bad disk causes subsequent containers failure
[ https://issues.apache.org/jira/browse/YARN-6641?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16025341#comment-16025341 ] Kuhu Shukla commented on YARN-6641: --- [~jlowe], request for some more comments. Thanks a lot! > Non-public resource localization on a bad disk causes subsequent containers > failure > --- > > Key: YARN-6641 > URL: https://issues.apache.org/jira/browse/YARN-6641 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 2.8.0 >Reporter: Kuhu Shukla >Assignee: Kuhu Shukla > Attachments: YARN-6641.001.patch, YARN-6641.002.patch, > YARN-6641.003.patch > > > YARN-3591 added the {{checkLocalResource}} method to {{isResourcePresent()}} > call to allow checking an already localized resource against the list of > good/full directories. > Since LocalResourcesTrackerImpl instantiations for app level resources and > private resources do not use the new constructor, such resources that are on > bad disk will never be checked against good dirs. -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Comment Edited] (YARN-4925) ContainerRequest in AMRMClient, application should be able to specify nodes/racks together with nodeLabelExpression
[ https://issues.apache.org/jira/browse/YARN-4925?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16024866#comment-16024866 ] Bibin A Chundatt edited comment on YARN-4925 at 5/25/17 8:39 PM: - We should also backport dependent YARN-4140 was (Author: bibinchundatt): We should also port dependent YARN-4140 > ContainerRequest in AMRMClient, application should be able to specify > nodes/racks together with nodeLabelExpression > --- > > Key: YARN-4925 > URL: https://issues.apache.org/jira/browse/YARN-4925 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Bibin A Chundatt >Assignee: Bibin A Chundatt > Labels: release-blocker > Fix For: 2.8.0, 3.0.0-alpha1 > > Attachments: 0001-YARN-4925.patch, 0002-YARN-4925.patch, > YARN-4925-branch-2.7.001.patch > > > Currently with nodelabel AMRMClient will not be able to specify nodelabels > with Node/Rack requests.For application like spark NODE_LOCAL requests cannot > be asked with label expression. > As per the check in {{AMRMClientImpl#checkNodeLabelExpression}} > {noformat} > // Don't allow specify node label against ANY request > if ((containerRequest.getRacks() != null && > (!containerRequest.getRacks().isEmpty())) > || > (containerRequest.getNodes() != null && > (!containerRequest.getNodes().isEmpty()))) { > throw new InvalidContainerRequestException( > "Cannot specify node label with rack and node"); > } > {noformat} > {{AppSchedulingInfo#updateResourceRequests}} we do reset of labels to that of > OFF-SWITCH. > The above check is not required for ContainerRequest ask /cc [~wangda] thank > you for confirming -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
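The check quoted in the description can be restated in self-contained Java. The `checkRemoved` flag below is purely illustrative, modeling before/after behavior; it is not part of AMRMClient's real signature:

```java
import java.util.Collections;
import java.util.List;

// Plain-Java restatement of the quoted validation: the old code rejected any
// ContainerRequest combining a node-label expression with explicit nodes or
// racks; the fix drops that check so NODE_LOCAL asks can carry a label.
public class LabelCheckSketch {

    static void checkNodeLabelExpression(List<String> nodes, List<String> racks,
                                         String labelExpression, boolean checkRemoved) {
        boolean hasPlacement = (nodes != null && !nodes.isEmpty())
                || (racks != null && !racks.isEmpty());
        if (!checkRemoved && labelExpression != null && hasPlacement) {
            throw new IllegalArgumentException(
                "Cannot specify node label with rack and node");
        }
    }

    public static void main(String[] args) {
        List<String> nodes = Collections.singletonList("host1");
        try {
            checkNodeLabelExpression(nodes, null, "gpu", false); // old behavior
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
        checkNodeLabelExpression(nodes, null, "gpu", true); // after the fix
        System.out.println("accepted");
    }
}
```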
[jira] [Created] (YARN-6649) RollingLevelDBTimelineServer throws RuntimeException if object decoding ever fails runtime exception
Jonathan Eagles created YARN-6649: - Summary: RollingLevelDBTimelineServer throws RuntimeException if object decoding ever fails runtime exception Key: YARN-6649 URL: https://issues.apache.org/jira/browse/YARN-6649 Project: Hadoop YARN Issue Type: Bug Reporter: Jonathan Eagles Assignee: Jonathan Eagles Priority: Critical -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-6582) FSAppAttempt demand can be updated atomically in updateDemand()
[ https://issues.apache.org/jira/browse/YARN-6582?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16025381#comment-16025381 ] Yufei Gu commented on YARN-6582: Thanks [~kasha] for working on this. The patch looks good to me. Both {{getSchedulerKeys()}} and {{getPendingAsk}} are fine after removing the write lock. +1. > FSAppAttempt demand can be updated atomically in updateDemand() > --- > > Key: YARN-6582 > URL: https://issues.apache.org/jira/browse/YARN-6582 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Karthik Kambatla >Assignee: Karthik Kambatla > Attachments: YARN-6582.001.patch > > > FSAppAttempt#updateDemand first sets demand to 0, and then adds up all the > outstanding requests. Instead, we could use another variable tmpDemand to > build the new value and atomically replace the demand. -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
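The tmpDemand pattern described in this issue can be modeled in a few lines. A Resource is reduced to a single long here for brevity, so this is a sketch of the pattern, not the FSAppAttempt implementation:

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicLong;

// Minimal model of the pattern this patch applies: accumulate the new demand
// in a local temporary and publish it in a single store, so concurrent
// readers never observe the transient zero that "reset, then add" exposes.
public class DemandUpdateSketch {

    private final AtomicLong demand = new AtomicLong();

    void updateDemand(List<Long> outstandingAsks) {
        long tmpDemand = 0;          // build the new value off to the side
        for (long ask : outstandingAsks) {
            tmpDemand += ask;
        }
        demand.set(tmpDemand);       // then replace atomically
    }

    long getDemand() {
        return demand.get();
    }

    public static void main(String[] args) {
        DemandUpdateSketch app = new DemandUpdateSketch();
        app.updateDemand(List.of(1024L, 2048L));
        System.out.println(app.getDemand()); // prints 3072
    }
}
```

The benefit is that a scheduler thread reading demand mid-update sees either the old total or the new total, never an artificially small intermediate value.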
[jira] [Commented] (YARN-6555) Enable flow context read (& corresponding write) for recovering application with NM restart
[ https://issues.apache.org/jira/browse/YARN-6555?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16025390#comment-16025390 ] Haibo Chen commented on YARN-6555: -- Thanks [~rohithsharma] for the explanation, and updating the patch! +1 on the latest patch. [~vrushalic] do you have any other comments? > Enable flow context read (& corresponding write) for recovering application > with NM restart > > > Key: YARN-6555 > URL: https://issues.apache.org/jira/browse/YARN-6555 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Affects Versions: YARN-5355, YARN-5355-branch-2, 3.0.0-alpha3 >Reporter: Vrushali C >Assignee: Rohith Sharma K S > Labels: yarn-5355-merge-blocker > Attachments: YARN-6555.001.patch, YARN-6555.002.patch, > YARN-6555.003.patch > > > If timeline service v2 is enabled and NM is restarted with recovery enabled, > then NM fails to start and throws an error as "flow context can't be null". > This is happening because the flow context did not exist before but now that > timeline service v2 is enabled, ApplicationImpl expects it to exist. > This would also happen even if flow context existed before but since we are > not persisting it / reading it during > ContainerManagerImpl#recoverApplication, it does not get passed in to > ApplicationImpl. 
> full stack trace > {code} > 2017-05-03 21:51:52,178 FATAL > org.apache.hadoop.yarn.server.nodemanager.NodeManager: Error starting > NodeManager > java.lang.IllegalArgumentException: flow context cannot be null > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl.(ApplicationImpl.java:104) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl.(ApplicationImpl.java:90) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl.recoverApplication(ContainerManagerImpl.java:318) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl.recover(ContainerManagerImpl.java:280) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl.serviceInit(ContainerManagerImpl.java:267) > at > org.apache.hadoop.service.AbstractService.init(AbstractService.java:163) > at > org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:107) > at > org.apache.hadoop.yarn.server.nodemanager.NodeManager.serviceInit(NodeManager.java:276) > at > org.apache.hadoop.service.AbstractService.init(AbstractService.java:163) > at > org.apache.hadoop.yarn.server.nodemanager.NodeManager.initAndStartNodeManager(NodeManager.java:588) > at > org.apache.hadoop.yarn.server.nodemanager.NodeManager.main(NodeManager.java:649) > {code} -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
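The recovery behavior implied by the stack trace above can be sketched like this (names and the default-flow convention are illustrative, not the actual YARN API):

```java
// Hypothetical sketch: when timeline service v2 is enabled, an application
// recovered from NM state that predates the flow-context field should get a
// synthesized default context instead of the null that makes
// ApplicationImpl throw "flow context cannot be null".
public class FlowContextRecovery {
    static final class FlowContext {
        final String flowName;
        final String flowVersion;
        FlowContext(String name, String version) {
            this.flowName = name;
            this.flowVersion = version;
        }
    }

    static FlowContext recover(String persistedFlowName, String persistedFlowVersion,
                               String user, String appId) {
        if (persistedFlowName == null) {
            // State stored before the field existed: fall back to a default
            // flow rather than failing NM startup during recovery.
            return new FlowContext(user + "_" + appId, "1");
        }
        return new FlowContext(persistedFlowName, persistedFlowVersion);
    }
}
```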
[jira] [Commented] (YARN-5892) Capacity Scheduler: Support user-specific minimum user limit percent
[ https://issues.apache.org/jira/browse/YARN-5892?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16025425#comment-16025425 ] Eric Payne commented on YARN-5892: -- Thank you for the reviews. bq. allUsersTimesWeights will be less than 1. I think in this case UL value is higher [~sunilg], I think this is similar to the question I answered [above|https://issues.apache.org/jira/browse/YARN-5892?focusedCommentId=15972782&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15972782], but I'll restate it here for the sake of clarity. Having a combined sum of weights < 1 does not cause UL to be too large. This is because {{userLimitResource}} (the return value of {{computeUserLimit}}) is only ever used by {{getComputedResourceLimitFor\[Active|All\]Users}}, which then multiplies the value of {{userLimitResource}} by the appropriate user's weight before returning it. This will result in the correct value of userLimit for each specific user. When the sum of active user(s)'s weight(s) is < 1, then it is true that {{userLimitResource}} is larger than the actual number of resources used. However, {{userLimitResource}} is just an intermediate value. bq. In UserManager, do we also need to lock while updating "activeUsersTimesWeights". Can you please clarify where you see it being read or written outside the lock? I think the code is within locks everywhere it is used. bq. 1) Could you move CapacitySchedulerQueueManager#updateUserWeights to LeafQueue#setupQueueConfigs. [~leftnoteasy], good optimization. I will make this change, do testing and await [~sunilg]'s response before submitting a new patch. 
> Capacity Scheduler: Support user-specific minimum user limit percent > > > Key: YARN-5892 > URL: https://issues.apache.org/jira/browse/YARN-5892 > Project: Hadoop YARN > Issue Type: Improvement > Components: capacityscheduler >Reporter: Eric Payne >Assignee: Eric Payne > Attachments: Active users highlighted.jpg, YARN-5892.001.patch, > YARN-5892.002.patch, YARN-5892.003.patch, YARN-5892.004.patch, > YARN-5892.005.patch, YARN-5892.006.patch, YARN-5892.007.patch, > YARN-5892.008.patch, YARN-5892.009.patch, YARN-5892.010.patch, > YARN-5892.012.patch, YARN-5892.013.patch > > > Currently, in the capacity scheduler, the {{minimum-user-limit-percent}} > property is per queue. A cluster admin should be able to set the minimum user > limit percent on a per-user basis within the queue. > This functionality is needed so that when intra-queue preemption is enabled > (YARN-4945 / YARN-2113), some users can be deemed as more important than > other users, and resources from VIP users won't be as likely to be preempted. > For example, if the {{getstuffdone}} queue has a MULP of 25 percent, but user > {{jane}} is a power user of queue {{getstuffdone}} and needs to be guaranteed > 75 percent, the properties for {{getstuffdone}} and {{jane}} would look like > this: > {code} > > > yarn.scheduler.capacity.root.getstuffdone.minimum-user-limit-percent > 25 > > > > yarn.scheduler.capacity.root.getstuffdone.jane.minimum-user-limit-percent > 75 > > {code} -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
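The arithmetic behind the weight discussion above can be illustrated with a small sketch (method names and the weight derivation are assumptions for illustration, not the actual CapacityScheduler code):

```java
// Hedged sketch: userLimitResource is an intermediate queue-wide value;
// each user's effective limit is that value scaled by the user's weight.
// Here a user's weight is assumed to be its per-user MULP over the queue's
// MULP, so a combined weight sum below 1 still yields correct per-user limits.
public class UserLimitDemo {
    // e.g. queue MULP = 25, jane's MULP = 75 -> jane's weight = 3.0
    static double userWeight(double userMulp, double queueMulp) {
        return userMulp / queueMulp;
    }

    static double effectiveUserLimit(double userLimitResource, double userWeight) {
        return userLimitResource * userWeight;
    }
}
```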
[jira] [Commented] (YARN-6582) FSAppAttempt demand can be updated atomically in updateDemand()
[ https://issues.apache.org/jira/browse/YARN-6582?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16025424#comment-16025424 ] Yufei Gu commented on YARN-6582: Committed to trunk and branch-2. Thanks [~kasha] for the patch. > FSAppAttempt demand can be updated atomically in updateDemand() > --- > > Key: YARN-6582 > URL: https://issues.apache.org/jira/browse/YARN-6582 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Karthik Kambatla >Assignee: Karthik Kambatla > Fix For: 2.9.0, 3.0.0-alpha3 > > Attachments: YARN-6582.001.patch > > > FSAppAttempt#updateDemand first sets demand to 0, and then adds up all the > outstanding requests. Instead, we could use another variable tmpDemand to > build the new value and atomically replace the demand. -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-2113) Add cross-user preemption within CapacityScheduler's leaf-queue
[ https://issues.apache.org/jira/browse/YARN-2113?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eric Payne updated YARN-2113: - Attachment: YARN-2113.branch-2.8.0019.patch [~sunilg], I took a shot at backporting YARN-2113.0019.patch to branch-2.8, decoupling it from YARN-5889. I am still running through the tests, but this seems to work fairly well so far. > Add cross-user preemption within CapacityScheduler's leaf-queue > --- > > Key: YARN-2113 > URL: https://issues.apache.org/jira/browse/YARN-2113 > Project: Hadoop YARN > Issue Type: Sub-task > Components: capacity scheduler >Reporter: Vinod Kumar Vavilapalli >Assignee: Sunil G > Fix For: 3.0.0-alpha3 > > Attachments: IntraQueue Preemption-Impact Analysis.pdf, > TestNoIntraQueuePreemptionIfBelowUserLimitAndDifferentPrioritiesWithExtraUsers.txt, > YARN-2113.0001.patch, YARN-2113.0002.patch, YARN-2113.0003.patch, > YARN-2113.0004.patch, YARN-2113.0005.patch, YARN-2113.0006.patch, > YARN-2113.0007.patch, YARN-2113.0008.patch, YARN-2113.0009.patch, > YARN-2113.0010.patch, YARN-2113.0011.patch, YARN-2113.0012.patch, > YARN-2113.0013.patch, YARN-2113.0014.patch, YARN-2113.0015.patch, > YARN-2113.0016.patch, YARN-2113.0017.patch, YARN-2113.0018.patch, > YARN-2113.0019.patch, YARN-2113.apply.onto.0012.ericp.patch, > YARN-2113.branch-2.8.0019.patch, YARN-2113 Intra-QueuePreemption > Behavior.pdf, YARN-2113.v0.patch > > > Preemption today only works across queues and moves around resources across > queues per demand and usage. We should also have user-level preemption within > a queue, to balance capacity across users in a predictable manner. -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-6555) Enable flow context read (& corresponding write) for recovering application with NM restart
[ https://issues.apache.org/jira/browse/YARN-6555?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16025452#comment-16025452 ] Vrushali C commented on YARN-6555: -- No more comments, patch looks good. +1. [~haibo.chen] please feel free to commit it. > Enable flow context read (& corresponding write) for recovering application > with NM restart > > > Key: YARN-6555 > URL: https://issues.apache.org/jira/browse/YARN-6555 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Affects Versions: YARN-5355, YARN-5355-branch-2, 3.0.0-alpha3 >Reporter: Vrushali C >Assignee: Rohith Sharma K S > Labels: yarn-5355-merge-blocker > Attachments: YARN-6555.001.patch, YARN-6555.002.patch, > YARN-6555.003.patch > > > If timeline service v2 is enabled and NM is restarted with recovery enabled, > then NM fails to start and throws an error as "flow context can't be null". > This is happening because the flow context did not exist before but now that > timeline service v2 is enabled, ApplicationImpl expects it to exist. > This would also happen even if flow context existed before but since we are > not persisting it / reading it during > ContainerManagerImpl#recoverApplication, it does not get passed in to > ApplicationImpl. 
> full stack trace > {code} > 2017-05-03 21:51:52,178 FATAL > org.apache.hadoop.yarn.server.nodemanager.NodeManager: Error starting > NodeManager > java.lang.IllegalArgumentException: flow context cannot be null > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl.(ApplicationImpl.java:104) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl.(ApplicationImpl.java:90) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl.recoverApplication(ContainerManagerImpl.java:318) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl.recover(ContainerManagerImpl.java:280) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl.serviceInit(ContainerManagerImpl.java:267) > at > org.apache.hadoop.service.AbstractService.init(AbstractService.java:163) > at > org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:107) > at > org.apache.hadoop.yarn.server.nodemanager.NodeManager.serviceInit(NodeManager.java:276) > at > org.apache.hadoop.service.AbstractService.init(AbstractService.java:163) > at > org.apache.hadoop.yarn.server.nodemanager.NodeManager.initAndStartNodeManager(NodeManager.java:588) > at > org.apache.hadoop.yarn.server.nodemanager.NodeManager.main(NodeManager.java:649) > {code} -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-6484) [Documentation] Documenting the YARN Federation feature
[ https://issues.apache.org/jira/browse/YARN-6484?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16025456#comment-16025456 ] Subru Krishnan commented on YARN-6484: -- Thanks [~curino] for the solid documentation. I have a few minor comments: * Can you please add a sequence diagram for the *Job execution flow* as that'll make it much easier to understand. * It'll be good if we can also reuse the {{AMRMProxy}} internals diagram from our Hadoop summit talk. * We should call out that the _yarn.resourcemanager.cluster-id_ is the same as what's used for RM HA, i.e. we simply reuse the config. * I feel we should clarify that the _yarn.resourcemanager.epoch_ is unique per sub-cluster, _yarn.resourcemanager.cluster-id_ and having increments of 1000 will provide practical safety from ContainerId clashes. * Nit: {{Federation.md}} has a whitespace at the end. > [Documentation] Documenting the YARN Federation feature > --- > > Key: YARN-6484 > URL: https://issues.apache.org/jira/browse/YARN-6484 > Project: Hadoop YARN > Issue Type: Sub-task > Components: nodemanager, resourcemanager >Affects Versions: YARN-2915 >Reporter: Subru Krishnan >Assignee: Carlo Curino > Attachments: YARN-6484-YARN-2915.v0.patch, > YARN-6484-YARN-2915.v1.patch, YARN-6484-YARN-2915.v2.patch > > > We should document the high level design and configuration to enable YARN > Federation -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-6634) [API] Define an API for ResourceManager WebServices
[ https://issues.apache.org/jira/browse/YARN-6634?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Giovanni Matteo Fumarola updated YARN-6634: --- Attachment: YARN-6634.v3.patch > [API] Define an API for ResourceManager WebServices > --- > > Key: YARN-6634 > URL: https://issues.apache.org/jira/browse/YARN-6634 > Project: Hadoop YARN > Issue Type: Improvement > Components: resourcemanager >Affects Versions: 2.8.0 >Reporter: Subru Krishnan >Assignee: Giovanni Matteo Fumarola >Priority: Critical > Attachments: YARN-6634.proto.patch, YARN-6634.v1.patch, > YARN-6634.v2.patch, YARN-6634.v3.patch > > > The RM exposes few REST queries but there's no clear API interface defined. > This makes it painful to build either clients or extension components like > Router (YARN-5412) that expose REST interfaces themselves. This jira proposes > adding a RM WebServices protocol similar to the one we have for RPC, i.e. > {{ApplicationClientProtocol}}. -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-6634) [API] Define an API for ResourceManager WebServices
[ https://issues.apache.org/jira/browse/YARN-6634?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16025463#comment-16025463 ] Giovanni Matteo Fumarola commented on YARN-6634: Fixed the Yetus warnings. TestRMWebServicesAppsModification failed due to the difference between appId and appid. TestRMRestart is not related. > [API] Define an API for ResourceManager WebServices > --- > > Key: YARN-6634 > URL: https://issues.apache.org/jira/browse/YARN-6634 > Project: Hadoop YARN > Issue Type: Improvement > Components: resourcemanager >Affects Versions: 2.8.0 >Reporter: Subru Krishnan >Assignee: Giovanni Matteo Fumarola >Priority: Critical > Attachments: YARN-6634.proto.patch, YARN-6634.v1.patch, > YARN-6634.v2.patch, YARN-6634.v3.patch > > > The RM exposes few REST queries but there's no clear API interface defined. > This makes it painful to build either clients or extension components like > Router (YARN-5412) that expose REST interfaces themselves. This jira proposes > adding a RM WebServices protocol similar to the one we have for RPC, i.e. > {{ApplicationClientProtocol}}. -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Created] (YARN-6650) ContainerTokenIdentifier is re-encoded during token verification
Jason Lowe created YARN-6650: Summary: ContainerTokenIdentifier is re-encoded during token verification Key: YARN-6650 URL: https://issues.apache.org/jira/browse/YARN-6650 Project: Hadoop YARN Issue Type: Bug Components: security Affects Versions: 2.8.0 Reporter: Jason Lowe A ContainerTokenIdentifier is serialized into bytes and signed by the RM secret key. When the NM needs to verify the identifier, it is decoding the bytes into a ContainerTokenIdentifier to get the key ID then re-encoding the identifier into a byte buffer to hash it with the key. This is fine as long as the RM and NM both agree how a ContainerTokenIdentifier should be serialized into bytes. However when the versions of the RM and NM are different and fields were added to the identifier between those versions then the NM may end up re-serializing the fields in a different order than the RM did, especially when there were gaps in the protocol field IDs that were filled in between the versions. If the fields are reordered during the re-encoding then the bytes will not match the original stream that was signed and the token verification will fail. The original token identifier bytes received via RPC need to be used by the verification process, not the bytes generated by re-encoding the identifier. -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
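The fix direction described in this report can be sketched as follows (a minimal illustration; the class and method names are assumptions, not the real Hadoop security API):

```java
import java.security.MessageDigest;
import java.util.Arrays;

// Hedged sketch: compute the hash over the exact identifier bytes received
// on the wire, never over bytes produced by decoding and re-encoding the
// identifier, so version-dependent field ordering cannot break verification.
public class TokenVerifySketch {
    static byte[] sign(byte[] identifierBytes, byte[] secretKey) throws Exception {
        MessageDigest md = MessageDigest.getInstance("SHA-256");
        md.update(secretKey);
        return md.digest(identifierBytes);
    }

    // Verify against the original wire bytes, not a re-serialized copy.
    static boolean verify(byte[] originalWireBytes, byte[] secretKey, byte[] password)
            throws Exception {
        return Arrays.equals(sign(originalWireBytes, secretKey), password);
    }
}
```

Any re-encoding step between receipt and verification risks producing a byte stream that differs from what the RM signed, even though both decode to an equivalent identifier.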
[jira] [Commented] (YARN-2113) Add cross-user preemption within CapacityScheduler's leaf-queue
[ https://issues.apache.org/jira/browse/YARN-2113?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16025487#comment-16025487 ] Hadoop QA commented on YARN-2113: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s{color} | {color:blue} Docker mode activated. {color} | | {color:red}-1{color} | {color:red} docker {color} | {color:red} 4m 26s{color} | {color:red} Docker failed to build yetus/hadoop:5970e82. {color} | \\ \\ || Subsystem || Report/Notes || | JIRA Issue | YARN-2113 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12869941/YARN-2113.branch-2.8.0019.patch | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/16022/console | | Powered by | Apache Yetus 0.5.0-SNAPSHOT http://yetus.apache.org | This message was automatically generated. > Add cross-user preemption within CapacityScheduler's leaf-queue > --- > > Key: YARN-2113 > URL: https://issues.apache.org/jira/browse/YARN-2113 > Project: Hadoop YARN > Issue Type: Sub-task > Components: capacity scheduler >Reporter: Vinod Kumar Vavilapalli >Assignee: Sunil G > Fix For: 3.0.0-alpha3 > > Attachments: IntraQueue Preemption-Impact Analysis.pdf, > TestNoIntraQueuePreemptionIfBelowUserLimitAndDifferentPrioritiesWithExtraUsers.txt, > YARN-2113.0001.patch, YARN-2113.0002.patch, YARN-2113.0003.patch, > YARN-2113.0004.patch, YARN-2113.0005.patch, YARN-2113.0006.patch, > YARN-2113.0007.patch, YARN-2113.0008.patch, YARN-2113.0009.patch, > YARN-2113.0010.patch, YARN-2113.0011.patch, YARN-2113.0012.patch, > YARN-2113.0013.patch, YARN-2113.0014.patch, YARN-2113.0015.patch, > YARN-2113.0016.patch, YARN-2113.0017.patch, YARN-2113.0018.patch, > YARN-2113.0019.patch, YARN-2113.apply.onto.0012.ericp.patch, > YARN-2113.branch-2.8.0019.patch, YARN-2113 Intra-QueuePreemption > Behavior.pdf, YARN-2113.v0.patch > > > Preemption today only works 
across queues and moves around resources across > queues per demand and usage. We should also have user-level preemption within > a queue, to balance capacity across users in a predictable manner. -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-6641) Non-public resource localization on a bad disk causes subsequent containers failure
[ https://issues.apache.org/jira/browse/YARN-6641?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16025489#comment-16025489 ] Jason Lowe commented on YARN-6641: -- Thanks for updating the patch! One last thing I missed in the previous review, the new getDirsHandler method should be package-private like the other only-for-testing methods. > Non-public resource localization on a bad disk causes subsequent containers > failure > --- > > Key: YARN-6641 > URL: https://issues.apache.org/jira/browse/YARN-6641 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 2.8.0 >Reporter: Kuhu Shukla >Assignee: Kuhu Shukla > Attachments: YARN-6641.001.patch, YARN-6641.002.patch, > YARN-6641.003.patch > > > YARN-3591 added the {{checkLocalResource}} method to {{isResourcePresent()}} > call to allow checking an already localized resource against the list of > good/full directories. > Since LocalResourcesTrackerImpl instantiations for app level resources and > private resources do not use the new constructor, such resources that are on > bad disk will never be checked against good dirs. -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
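The presence check discussed in this issue can be sketched like this (a hypothetical simplification, not the actual LocalResourcesTrackerImpl code):

```java
import java.util.List;

// Hedged sketch: a previously localized resource only counts as present if
// its local path lives under a directory that is still good. A tracker
// constructed without the dirs handler (the bug described here) skips the
// check entirely and keeps serving resources from bad disks.
public class ResourcePresence {
    static boolean isResourcePresent(String localPath, List<String> goodDirs) {
        if (goodDirs == null) {
            return true; // no dirs handler wired in: check silently skipped
        }
        for (String dir : goodDirs) {
            if (localPath.startsWith(dir + "/")) {
                return true;
            }
        }
        return false; // on a bad/full disk: treat as absent, re-localize
    }
}
```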
[jira] [Updated] (YARN-5531) UnmanagedAM pool manager for federating application across clusters
[ https://issues.apache.org/jira/browse/YARN-5531?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Botong Huang updated YARN-5531: --- Attachment: YARN-5531-YARN-2915.v13.patch > UnmanagedAM pool manager for federating application across clusters > --- > > Key: YARN-5531 > URL: https://issues.apache.org/jira/browse/YARN-5531 > Project: Hadoop YARN > Issue Type: Sub-task > Components: nodemanager, resourcemanager >Reporter: Subru Krishnan >Assignee: Botong Huang > Attachments: YARN-5531-YARN-2915.v10.patch, > YARN-5531-YARN-2915.v11.patch, YARN-5531-YARN-2915.v12.patch, > YARN-5531-YARN-2915.v13.patch, YARN-5531-YARN-2915.v1.patch, > YARN-5531-YARN-2915.v2.patch, YARN-5531-YARN-2915.v3.patch, > YARN-5531-YARN-2915.v4.patch, YARN-5531-YARN-2915.v5.patch, > YARN-5531-YARN-2915.v6.patch, YARN-5531-YARN-2915.v7.patch, > YARN-5531-YARN-2915.v8.patch, YARN-5531-YARN-2915.v9.patch > > > One of the main tenets the YARN Federation is to *transparently* scale > applications across multiple clusters. This is achieved by running UAMs on > behalf of the application on other clusters. This JIRA tracks the addition of > a UnmanagedAM pool manager for federating application across clusters which > will be used the FederationInterceptor (YARN-3666) which is part of the > AMRMProxy pipeline introduced in YARN-2884. -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-6648) Add FederationStateStore interfaces for Global Policy Generator
[ https://issues.apache.org/jira/browse/YARN-6648?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Botong Huang updated YARN-6648: --- Attachment: YARN-6648-YARN-2915.v1.patch > Add FederationStateStore interfaces for Global Policy Generator > --- > > Key: YARN-6648 > URL: https://issues.apache.org/jira/browse/YARN-6648 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Botong Huang >Assignee: Botong Huang >Priority: Minor > Attachments: YARN-6648-YARN-2915.v1.patch > > -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-6643) TestRMFailover fails rarely due to port conflict
[ https://issues.apache.org/jira/browse/YARN-6643?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16025501#comment-16025501 ] Hudson commented on YARN-6643: -- SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #11784 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/11784/]) YARN-6643. TestRMFailover fails rarely due to port conflict. Contributed (jlowe: rev 3fd6a2da4e537423d1462238e10cc9e1f698d1c2) * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/HATestUtil.java > TestRMFailover fails rarely due to port conflict > > > Key: YARN-6643 > URL: https://issues.apache.org/jira/browse/YARN-6643 > Project: Hadoop YARN > Issue Type: Bug > Components: test >Affects Versions: 2.9.0, 3.0.0-alpha3 >Reporter: Robert Kanter >Assignee: Robert Kanter > Fix For: 2.9.0, 3.0.0-alpha3, 2.8.2 > > Attachments: YARN-6643.001.patch > > > We've seen various tests in {{TestRMFailover}} fail very rarely with a > message like "org.apache.hadoop.yarn.exceptions.YarnRuntimeException: > java.io.IOException: ResourceManager failed to start. Final state is > STOPPED". > After some digging, it turns out that it's due to a port conflict with the > embedded ZooKeeper in the tests. The embedded ZooKeeper uses > {{ServerSocketUtil#getPort}} to choose a free port, but the RMs are > configured to use the default ports with 1 and 2 prepended (e.g. the > default port for the RM is 8032, so you'd use 18032 and 28032). > When I was able to reproduce this, I saw that ZooKeeper was using port 18033, > which is 1 + 8033, the default RM Admin port. 
It results in an error > like this, causing the RM to be unable to start, and hence the original error > message in the test failure: > {noformat} > 2017-05-24 01:16:52,735 INFO service.AbstractService > (AbstractService.java:noteFailure(272)) - Service ResourceManager failed in > state STARTED; cause: org.apache.hadoop.yarn.exceptions.YarnRuntimeException: > java.net.BindException: Problem binding to [0.0.0.0:18033] > java.net.BindException: Address already in use; For more details see: > http://wiki.apache.org/hadoop/BindException > org.apache.hadoop.yarn.exceptions.YarnRuntimeException: > java.net.BindException: Problem binding to [0.0.0.0:18033] > java.net.BindException: Address already in use; For more details see: > http://wiki.apache.org/hadoop/BindException > at > org.apache.hadoop.yarn.factories.impl.pb.RpcServerFactoryPBImpl.getServer(RpcServerFactoryPBImpl.java:139) > at > org.apache.hadoop.yarn.ipc.HadoopYarnProtoRPC.getServer(HadoopYarnProtoRPC.java:65) > at org.apache.hadoop.yarn.ipc.YarnRPC.getServer(YarnRPC.java:54) > at > org.apache.hadoop.yarn.server.resourcemanager.AdminService.startServer(AdminService.java:171) > at > org.apache.hadoop.yarn.server.resourcemanager.AdminService.serviceStart(AdminService.java:158) > at > org.apache.hadoop.service.AbstractService.start(AbstractService.java:193) > at > org.apache.hadoop.service.CompositeService.serviceStart(CompositeService.java:120) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.serviceStart(ResourceManager.java:1147) > at > org.apache.hadoop.service.AbstractService.start(AbstractService.java:193) > at > org.apache.hadoop.yarn.server.MiniYARNCluster$2.run(MiniYARNCluster.java:310) > Caused by: java.net.BindException: Problem binding to [0.0.0.0:18033] > java.net.BindException: Address already in use; For more details see: > http://wiki.apache.org/hadoop/BindException > at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native > Method) > at > 
sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57) > at > sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) > at java.lang.reflect.Constructor.newInstance(Constructor.java:526) > at org.apache.hadoop.net.NetUtils.wrapWithMessage(NetUtils.java:791) > at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:720) > at org.apache.hadoop.ipc.Server.bind(Server.java:482) > at org.apache.hadoop.ipc.Server$Listener.(Server.java:688) > at org.apache.hadoop.ipc.Server.(Server.java:2376) > at org.apache.hadoop.ipc.RPC$Server.(RPC.java:1042) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Server.(ProtobufRpcEngine.java:535) > at > org.apache.hadoop.ipc.ProtobufRpcEngine.getServer(ProtobufRpcEngine.java:510) > at org.apache.hadoop.ipc.RPC$Builder.build(RPC.java:887) > at > org.apache.hadoop.yarn.factories.impl.pb.RpcServerFactoryP
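The port scheme behind this conflict can be illustrated with a small sketch (not the actual HATestUtil code):

```java
// Hedged sketch: each RM instance's test ports are the default ports with
// the RM index prepended (8032 -> 18032 for RM1, 28032 for RM2). A test
// ZooKeeper port chosen independently can land on one of these derived
// ports, e.g. 18033 = 1 prepended to the default admin port 8033.
public class PortScheme {
    static int derivedPort(int rmIndex, int basePort) {
        return rmIndex * 10000 + basePort;
    }

    static boolean conflictsWithRm(int candidatePort, int basePort) {
        return candidatePort == derivedPort(1, basePort)
            || candidatePort == derivedPort(2, basePort);
    }
}
```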
[jira] [Commented] (YARN-6582) FSAppAttempt demand can be updated atomically in updateDemand()
[ https://issues.apache.org/jira/browse/YARN-6582?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16025502#comment-16025502 ] Hudson commented on YARN-6582: -- SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #11784 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/11784/]) YARN-6582. FSAppAttempt demand can be updated atomically in (yufei: rev 87590090c887829e874a7132be9cf8de061437d6) * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FSAppAttempt.java > FSAppAttempt demand can be updated atomically in updateDemand() > --- > > Key: YARN-6582 > URL: https://issues.apache.org/jira/browse/YARN-6582 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Karthik Kambatla >Assignee: Karthik Kambatla > Fix For: 2.9.0, 3.0.0-alpha3 > > Attachments: YARN-6582.001.patch > > > FSAppAttempt#updateDemand first sets demand to 0, and then adds up all the > outstanding requests. Instead, we could use another variable tmpDemand to > build the new value and atomically replace the demand. -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-6650) ContainerTokenIdentifier is re-encoded during token verification
[ https://issues.apache.org/jira/browse/YARN-6650?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16025503#comment-16025503 ] Jason Lowe commented on YARN-6650: -- The decode then re-encode issue is not really specific to ContainerTokenIdentifier. Any token that is re-encoded in such a way where unknown fields are either omitted or not guaranteed to be serialized in the same order as done by the token creator could be problematic for upgrade scenarios. > ContainerTokenIdentifier is re-encoded during token verification > > > Key: YARN-6650 > URL: https://issues.apache.org/jira/browse/YARN-6650 > Project: Hadoop YARN > Issue Type: Bug > Components: security >Affects Versions: 2.8.0 >Reporter: Jason Lowe > > A ContainerTokenIdentifier is serialized into bytes and signed by the RM > secret key. When the NM needs to verify the identifier, it is decoding the > bytes into a ContainerTokenIdentifier to get the key ID then re-encoding the > identifier into a byte buffer to hash it with the key. This is fine as long > as the RM and NM both agree how a ContainerTokenIdentifier should be > serialized into bytes. > However when the versions of the RM and NM are different and fields were > added to the identifier between those versions then the NM may end up > re-serializing the fields in a different order than the RM did, especially > when there were gaps in the protocol field IDs that were filled in between > the versions. If the fields are reordered during the re-encoding then the > bytes will not match the original stream that was signed and the token > verification will fail. > The original token identifier bytes received via RPC need to be used by the > verification process, not the bytes generated by re-encoding the identifier. -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-6528) Add JMX metrics for Plan Follower and Agent Placement and Plan Operations
[ https://issues.apache.org/jira/browse/YARN-6528?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Po updated YARN-6528: -- Attachment: YARN-6528.v005.patch YARN-6528.v005.patch fixes the findbugs error and the test failures. This patch has been tested on a single node cluster. > Add JMX metrics for Plan Follower and Agent Placement and Plan Operations > - > > Key: YARN-6528 > URL: https://issues.apache.org/jira/browse/YARN-6528 > Project: Hadoop YARN > Issue Type: Task >Reporter: Sean Po >Assignee: Sean Po > Attachments: YARN-6528.v001.patch, YARN-6528.v002.patch, > YARN-6528.v003.patch, YARN-6528.v004.patch, YARN-6528.v005.patch > > > YARN-1051 introduced a ReservationSytem that enables the YARN RM to handle > time explicitly, i.e. users can now "reserve" capacity ahead of time which is > predictably allocated to them. In order to understand in finer detail the > performance of Rayon, YARN-6528 proposes to include JMX metrics in the Plan > Follower, Agent Placement and Plan Operations components of Rayon. -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-6634) [API] Define an API for ResourceManager WebServices
[ https://issues.apache.org/jira/browse/YARN-6634?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16025550#comment-16025550 ] Hadoop QA commented on YARN-6634: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 17s{color} | {color:blue} Docker mode activated. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:red}-1{color} | {color:red} test4tests {color} | {color:red} 0m 0s{color} | {color:red} The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 13m 53s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 39s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 27s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 44s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 20s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 9s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 23s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 34s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 32s{color} | 
{color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 32s{color} | {color:green} the patch passed {color} | | {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange} 0m 24s{color} | {color:orange} hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: The patch generated 25 new + 4 unchanged - 36 fixed = 29 total (was 40) {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 35s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 16s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 7s{color} | {color:green} the patch passed {color} | | {color:red}-1{color} | {color:red} javadoc {color} | {color:red} 0m 19s{color} | {color:red} hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager generated 145 new + 852 unchanged - 22 fixed = 997 total (was 874) {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 40m 14s{color} | {color:red} hadoop-yarn-server-resourcemanager in the patch failed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 19s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} | | {color:black}{color} | {color:black} {color} | {color:black} 63m 38s{color} | {color:black} {color} | \\ \\ || Reason || Tests || | Failed junit tests | hadoop.yarn.server.resourcemanager.TestRMRestart | \\ \\ || Subsystem || Report/Notes || | Docker | Image:yetus/hadoop:14b5c93 | | JIRA Issue | YARN-6634 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12869943/YARN-6634.v3.patch | | Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle | | uname | Linux 75348e203b80 3.13.0-106-generic #153-Ubuntu SMP Tue Dec 6 15:44:32 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh | | git revision | trunk / 2b5ad48 | | Default Java | 1.8.0_131 | | findbugs | v3.1.0-RC1 | | checkstyle | https://builds.apache.org/job/PreCommit-YARN-Build/16021/artifact/patchprocess/diff-checkstyle-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager.txt | | javadoc | https://builds.apache.org/job/PreCommit-YARN-Build/16021/artifact/patchprocess/diff-javadoc-javadoc-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager.txt | | unit | https://builds.apache.org/job/PreCommit-YARN-Build/16021/artifact/patchp
[jira] [Created] (YARN-6651) Flow Activity page should specify 'metricstoretrieve' in its query to ATSv2 to get back CPU and memory
Haibo Chen created YARN-6651: Summary: Flow Activity page should specify 'metricstoretrieve' in its query to ATSv2 to get back CPU and memory Key: YARN-6651 URL: https://issues.apache.org/jira/browse/YARN-6651 Project: Hadoop YARN Issue Type: Bug Affects Versions: 3.0.0-alpha2 Reporter: Haibo Chen When you click on Flow Activity => {a flow} => flow runs, the web server sends a REST query to ATSv2 TimelineReaderServer, but it does not include a query param 'metricstoretrieve' to get any metrics back. Instead, we should add '?metricstoretrieve=YARN_APPLICATION_CPU,YARN_APPLICATION_MEMORY' to the query to get CPU and MEMORY back. -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-6651) Flow Activity should specify 'metricstoretrieve' in its query to ATSv2 to retrieve CPU and memory
[ https://issues.apache.org/jira/browse/YARN-6651?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Haibo Chen updated YARN-6651: - Summary: Flow Activity should specify 'metricstoretrieve' in its query to ATSv2 to retrieve CPU and memory (was: Flow Activity page should specify 'metricstoretrieve' in its query to ATSv2 to get back CPU and memory ) > Flow Activity should specify 'metricstoretrieve' in its query to ATSv2 to > retrieve CPU and memory > -- > > Key: YARN-6651 > URL: https://issues.apache.org/jira/browse/YARN-6651 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 3.0.0-alpha2 >Reporter: Haibo Chen > > When you click on Flow Activity => {a flow} => flow runs, the web server > sends a REST query to ATSv2 TimelineReaderServer, but it does not include a > query param 'metricstoretrieve' to get any metrics back. > Instead, we should add > '?metricstoretrieve=YARN_APPLICATION_CPU,YARN_APPLICATION_MEMORY' to the > query to get CPU and MEMORY back. -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-6651) Flow Activity should specify 'metricstoretrieve' in its query to ATSv2 to retrieve CPU and memory
[ https://issues.apache.org/jira/browse/YARN-6651?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Haibo Chen updated YARN-6651: - Issue Type: Task (was: Bug) > Flow Activity should specify 'metricstoretrieve' in its query to ATSv2 to > retrieve CPU and memory > -- > > Key: YARN-6651 > URL: https://issues.apache.org/jira/browse/YARN-6651 > Project: Hadoop YARN > Issue Type: Task >Affects Versions: 3.0.0-alpha2 >Reporter: Haibo Chen > > When you click on Flow Activity => {a flow} => flow runs, the web server > sends a REST query to ATSv2 TimelineReaderServer, but it does not include a > query param 'metricstoretrieve' to get any metrics back. > Instead, we should add > '?metricstoretrieve=YARN_APPLICATION_CPU,YARN_APPLICATION_MEMORY' to the > query to get CPU and MEMORY back. -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-6651) Flow Activity should specify 'metricstoretrieve' in its query to ATSv2 to retrieve CPU and memory
[ https://issues.apache.org/jira/browse/YARN-6651?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Haibo Chen updated YARN-6651: - Description: When you click on Flow Activity => \{a flow\} => flow runs, the web server sends a REST query to ATSv2 TimelineReaderServer, but it does not include a query param 'metricstoretrieve' to get any metrics back. Instead, we should add '?metricstoretrieve=YARN_APPLICATION_CPU,YARN_APPLICATION_MEMORY' to the query to get CPU and MEMORY back. was: When you click on Flow Activity => {a flow} => flow runs, the web server sends a REST query to ATSv2 TimelineReaderServer, but it does not include a query param 'metricstoretrieve' to get any metrics back. Instead, we should add '?metricstoretrieve=YARN_APPLICATION_CPU,YARN_APPLICATION_MEMORY' to the query to get CPU and MEMORY back. > Flow Activity should specify 'metricstoretrieve' in its query to ATSv2 to > retrieve CPU and memory > -- > > Key: YARN-6651 > URL: https://issues.apache.org/jira/browse/YARN-6651 > Project: Hadoop YARN > Issue Type: Sub-task >Affects Versions: 3.0.0-alpha2 >Reporter: Haibo Chen > > When you click on Flow Activity => \{a flow\} => flow runs, the web server > sends a REST query to ATSv2 TimelineReaderServer, but it does not include a > query param 'metricstoretrieve' to get any metrics back. > Instead, we should add > '?metricstoretrieve=YARN_APPLICATION_CPU,YARN_APPLICATION_MEMORY' to the > query to get CPU and MEMORY back. -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-6651) Flow Activity should specify 'metricstoretrieve' in its query to ATSv2 to retrieve CPU and memory
[ https://issues.apache.org/jira/browse/YARN-6651?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Haibo Chen updated YARN-6651: - Issue Type: Sub-task (was: Task) Parent: YARN-3368 > Flow Activity should specify 'metricstoretrieve' in its query to ATSv2 to > retrieve CPU and memory > -- > > Key: YARN-6651 > URL: https://issues.apache.org/jira/browse/YARN-6651 > Project: Hadoop YARN > Issue Type: Sub-task >Affects Versions: 3.0.0-alpha2 >Reporter: Haibo Chen > > When you click on Flow Activity => {a flow} => flow runs, the web server > sends a REST query to ATSv2 TimelineReaderServer, but it does not include a > query param 'metricstoretrieve' to get any metrics back. > Instead, we should add > '?metricstoretrieve=YARN_APPLICATION_CPU,YARN_APPLICATION_MEMORY' to the > query to get CPU and MEMORY back. -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
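As a sketch of the change YARN-6651 proposes, the UI's flow-runs request would simply append the 'metricstoretrieve' parameter to the TimelineReaderServer query. The host and REST path below are illustrative placeholders, not the exact endpoint the YARN web UI uses.

```java
public class FlowRunQuerySketch {
    // Append the metric filter proposed in YARN-6651 to an ATSv2 reader query.
    static String withCpuAndMemory(String baseUrl) {
        return baseUrl
            + "?metricstoretrieve=YARN_APPLICATION_CPU,YARN_APPLICATION_MEMORY";
    }

    public static void main(String[] args) {
        // Hypothetical TimelineReaderServer endpoint for a flow's runs.
        String base = "http://timeline-reader:8198/ws/v2/timeline/"
            + "users/alice/flows/myflow/runs";
        System.out.println(withCpuAndMemory(base));
    }
}
```

With the parameter present, the reader returns the CPU and MEMORY metrics along with each flow run instead of omitting metrics entirely.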
[jira] [Created] (YARN-6652) Merge flow info and flow runs
Haibo Chen created YARN-6652: Summary: Merge flow info and flow runs Key: YARN-6652 URL: https://issues.apache.org/jira/browse/YARN-6652 Project: Hadoop YARN Issue Type: Improvement Components: yarn-ui-v2 Affects Versions: 3.0.0-alpha2 Reporter: Haibo Chen If a user clicks on a flow from the flow activity page, Flow Run and Flow Info are shown separately. Usually, users want to go to individual flow runs. With the current workflow, the user will need to click on Flow Run because Flow Info is selected by default. Given that Flow Info does not have much information, it'd be a nice improvement if we can show flow info and flow runs together, that is, one section at the top containing flow info, another section at the bottom containing the flow runs. -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-6652) Merge flow info and flow runs
[ https://issues.apache.org/jira/browse/YARN-6652?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Haibo Chen updated YARN-6652: - Issue Type: Sub-task (was: Improvement) Parent: YARN-3368 > Merge flow info and flow runs > - > > Key: YARN-6652 > URL: https://issues.apache.org/jira/browse/YARN-6652 > Project: Hadoop YARN > Issue Type: Sub-task > Components: yarn-ui-v2 >Affects Versions: 3.0.0-alpha2 >Reporter: Haibo Chen > > If a user clicks on a flow from the flow activity page, Flow Run and Flow > Info are shown separately. Usually, users want to go to individual flow runs. > With the current workflow, the user will need to click on Flow Run because > Flow Info is selected by default. > Given that Flow Info does not have much information, it'd be a nice > improvement if we can show flow info and flow runs together, that is, one > section at the top containing flow info, another section at the bottom > containing the flow runs. -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-6246) Identifying starved apps does not need the scheduler writelock
[ https://issues.apache.org/jira/browse/YARN-6246?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16025578#comment-16025578 ] Daniel Templeton commented on YARN-6246: LGTM. Fix your javadoc error and the checkstyle issue, and I'm happy. > Identifying starved apps does not need the scheduler writelock > -- > > Key: YARN-6246 > URL: https://issues.apache.org/jira/browse/YARN-6246 > Project: Hadoop YARN > Issue Type: Sub-task > Components: fairscheduler >Affects Versions: 2.9.0 >Reporter: Karthik Kambatla >Assignee: Karthik Kambatla > Attachments: YARN-6246.001.patch, YARN-6246.002.patch, > YARN-6246.003.patch, YARN-6246.004.patch > > > Currently, the starvation checks are done holding the scheduler writelock. We > are probably better off doing this outside. -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
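The idea in YARN-6246 (identify starved apps without holding the scheduler writelock) can be sketched with a read/write lock: the scan over applications takes only the read lock, so scheduling threads are not serialized behind it, and the write lock would be taken later only to act on the starved set. The class and field names below are hypothetical, not FairScheduler's actual code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.locks.ReentrantReadWriteLock;

public class StarvationCheckSketch {
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
    private final List<String> apps = new ArrayList<>();

    // Scan for starved apps under the read lock only; other readers
    // (including scheduling threads) proceed concurrently.
    List<String> identifyStarvedApps() {
        lock.readLock().lock();
        try {
            List<String> starved = new ArrayList<>();
            for (String app : apps) {
                // Placeholder starvation test; the real check compares
                // an app's share against its fair/min share over time.
                if (app.startsWith("starved")) {
                    starved.add(app);
                }
            }
            return starved;
        } finally {
            lock.readLock().unlock();
        }
    }

    public static void main(String[] args) {
        StarvationCheckSketch s = new StarvationCheckSketch();
        s.apps.add("starved-app-1");
        s.apps.add("healthy-app-2");
        System.out.println(s.identifyStarvedApps());
    }
}
```

Only the subsequent preemption step, which mutates scheduler state, would need the write lock; the identification pass does not.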
[jira] [Commented] (YARN-6643) TestRMFailover fails rarely due to port conflict
[ https://issues.apache.org/jira/browse/YARN-6643?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16025582#comment-16025582 ] Robert Kanter commented on YARN-6643: - Thanks Jason! > TestRMFailover fails rarely due to port conflict > > > Key: YARN-6643 > URL: https://issues.apache.org/jira/browse/YARN-6643 > Project: Hadoop YARN > Issue Type: Bug > Components: test >Affects Versions: 2.9.0, 3.0.0-alpha3 >Reporter: Robert Kanter >Assignee: Robert Kanter > Fix For: 2.9.0, 3.0.0-alpha3, 2.8.2 > > Attachments: YARN-6643.001.patch > > > We've seen various tests in {{TestRMFailover}} fail very rarely with a > message like "org.apache.hadoop.yarn.exceptions.YarnRuntimeException: > java.io.IOException: ResourceManager failed to start. Final state is > STOPPED". > After some digging, it turns out that it's due to a port conflict with the > embedded ZooKeeper in the tests. The embedded ZooKeeper uses > {{ServerSocketUtil#getPort}} to choose a free port, but the RMs are > configured to use the default ports with a leading 1 or 2 prepended (e.g. the > default port for the RM is 8032, so you'd use 18032 and 28032). > When I was able to reproduce this, I saw that ZooKeeper was using port 18033, > which is 8033, the default RM Admin port, with a leading 1. 
It results in an error > like this, causing the RM to be unable to start, and hence the original error > message in the test failure: > {noformat} > 2017-05-24 01:16:52,735 INFO service.AbstractService > (AbstractService.java:noteFailure(272)) - Service ResourceManager failed in > state STARTED; cause: org.apache.hadoop.yarn.exceptions.YarnRuntimeException: > java.net.BindException: Problem binding to [0.0.0.0:18033] > java.net.BindException: Address already in use; For more details see: > http://wiki.apache.org/hadoop/BindException > org.apache.hadoop.yarn.exceptions.YarnRuntimeException: > java.net.BindException: Problem binding to [0.0.0.0:18033] > java.net.BindException: Address already in use; For more details see: > http://wiki.apache.org/hadoop/BindException > at > org.apache.hadoop.yarn.factories.impl.pb.RpcServerFactoryPBImpl.getServer(RpcServerFactoryPBImpl.java:139) > at > org.apache.hadoop.yarn.ipc.HadoopYarnProtoRPC.getServer(HadoopYarnProtoRPC.java:65) > at org.apache.hadoop.yarn.ipc.YarnRPC.getServer(YarnRPC.java:54) > at > org.apache.hadoop.yarn.server.resourcemanager.AdminService.startServer(AdminService.java:171) > at > org.apache.hadoop.yarn.server.resourcemanager.AdminService.serviceStart(AdminService.java:158) > at > org.apache.hadoop.service.AbstractService.start(AbstractService.java:193) > at > org.apache.hadoop.service.CompositeService.serviceStart(CompositeService.java:120) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.serviceStart(ResourceManager.java:1147) > at > org.apache.hadoop.service.AbstractService.start(AbstractService.java:193) > at > org.apache.hadoop.yarn.server.MiniYARNCluster$2.run(MiniYARNCluster.java:310) > Caused by: java.net.BindException: Problem binding to [0.0.0.0:18033] > java.net.BindException: Address already in use; For more details see: > http://wiki.apache.org/hadoop/BindException > at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native > Method) > at > 
sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57) > at > sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) > at java.lang.reflect.Constructor.newInstance(Constructor.java:526) > at org.apache.hadoop.net.NetUtils.wrapWithMessage(NetUtils.java:791) > at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:720) > at org.apache.hadoop.ipc.Server.bind(Server.java:482) > at org.apache.hadoop.ipc.Server$Listener.(Server.java:688) > at org.apache.hadoop.ipc.Server.(Server.java:2376) > at org.apache.hadoop.ipc.RPC$Server.(RPC.java:1042) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Server.(ProtobufRpcEngine.java:535) > at > org.apache.hadoop.ipc.ProtobufRpcEngine.getServer(ProtobufRpcEngine.java:510) > at org.apache.hadoop.ipc.RPC$Builder.build(RPC.java:887) > at > org.apache.hadoop.yarn.factories.impl.pb.RpcServerFactoryPBImpl.createServer(RpcServerFactoryPBImpl.java:169) > at > org.apache.hadoop.yarn.factories.impl.pb.RpcServerFactoryPBImpl.getServer(RpcServerFactoryPBImpl.java:132) > ... 9 more > Caused by: java.net.BindException: Address already in use > at sun.nio.ch.Net.bind0(Native Method) > at sun.nio.ch.Net.bind(Net.java:444) > at sun.nio.ch.Net.bind(Net.java:436) >
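For tests like this, one common way to avoid such collisions is to let the OS pick an ephemeral port rather than deriving test ports by prefixing digits to well-known defaults. A minimal sketch using plain java.net (not Hadoop's ServerSocketUtil, whose behavior differs):

```java
import java.io.IOException;
import java.net.ServerSocket;

public class FreePortSketch {
    // Bind to port 0 so the OS assigns a currently free ephemeral port,
    // which cannot collide with fixed offsets like 18032/28032.
    static int getFreePort() throws IOException {
        try (ServerSocket socket = new ServerSocket(0)) {
            return socket.getLocalPort();
        }
    }

    public static void main(String[] args) throws IOException {
        int port = getFreePort();
        // The assigned port is always a positive, valid port number.
        System.out.println(port > 0 && port <= 65535);
    }
}
```

Note the caveat this pattern shares with ServerSocketUtil#getPort: the port is released before the server under test binds it, so a race is still possible, just far less likely than with hard-coded derived ports.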
[jira] [Created] (YARN-6653) Retrieve CPU and MEMORY metrics for applications in a flow run
Haibo Chen created YARN-6653: Summary: Retrieve CPU and MEMORY metrics for applications in a flow run Key: YARN-6653 URL: https://issues.apache.org/jira/browse/YARN-6653 Project: Hadoop YARN Issue Type: Bug Affects Versions: 3.0.0-alpha2 Reporter: Haibo Chen Similar to YARN-6651, 'metricstoretrieve=YARN_APPLICATION_CPU,YARN_APPLICATION_MEMORY' can be added to the web UI query fired by a user listing all applications in a flow run. CPU and MEMORY can be retrieved this way. -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-6653) Retrieve CPU and MEMORY metrics for applications in a flow run
[ https://issues.apache.org/jira/browse/YARN-6653?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Haibo Chen updated YARN-6653: - Issue Type: Sub-task (was: Bug) Parent: YARN-3368 > Retrieve CPU and MEMORY metrics for applications in a flow run > -- > > Key: YARN-6653 > URL: https://issues.apache.org/jira/browse/YARN-6653 > Project: Hadoop YARN > Issue Type: Sub-task >Affects Versions: 3.0.0-alpha2 >Reporter: Haibo Chen > > Similar to YARN-6651, > 'metricstoretrieve=YARN_APPLICATION_CPU,YARN_APPLICATION_MEMORY' can be added > to the web UI query fired by a user listing all applications in a flow run. > CPU and MEMORY can be retrieved this way. -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-5531) UnmanagedAM pool manager for federating application across clusters
[ https://issues.apache.org/jira/browse/YARN-5531?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16025603#comment-16025603 ] Hadoop QA commented on YARN-5531: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 17s{color} | {color:blue} Docker mode activated. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 3 new or modified test files. {color} | | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 52s{color} | {color:blue} Maven dependency ordering for branch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 15m 15s{color} | {color:green} YARN-2915 passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 8m 18s{color} | {color:green} YARN-2915 passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 53s{color} | {color:green} YARN-2915 passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 2m 27s{color} | {color:green} YARN-2915 passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 1m 29s{color} | {color:green} YARN-2915 passed {color} | | {color:red}-1{color} | {color:red} findbugs {color} | {color:red} 1m 5s{color} | {color:red} hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common in YARN-2915 has 1 extant Findbugs warnings. {color} | | {color:red}-1{color} | {color:red} findbugs {color} | {color:red} 0m 51s{color} | {color:red} hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager in YARN-2915 has 5 extant Findbugs warnings. 
{color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 52s{color} | {color:green} YARN-2915 passed {color} | | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 10s{color} | {color:blue} Maven dependency ordering for patch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 1m 52s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 7m 45s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 7m 45s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 53s{color} | {color:green} hadoop-yarn-project/hadoop-yarn: The patch generated 0 new + 49 unchanged - 1 fixed = 49 total (was 50) {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 2m 25s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 1m 25s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 4m 35s{color} | {color:green} the patch passed {color} | | {color:red}-1{color} | {color:red} javadoc {color} | {color:red} 0m 22s{color} | {color:red} hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-common generated 1 new + 162 unchanged - 0 fixed = 163 total (was 162) {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 2m 26s{color} | {color:green} hadoop-yarn-common in the patch passed. {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 1m 15s{color} | {color:green} hadoop-yarn-server-common in the patch passed. 
{color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 12m 56s{color} | {color:green} hadoop-yarn-server-nodemanager in the patch passed. {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 38m 56s{color} | {color:red} hadoop-yarn-server-resourcemanager in the patch failed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 35s{color} | {color:green} The patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black}122m 57s{color} | {color:black} {color} | \\ \\ || Reason || Tests || | Failed junit tests | hadoop.yarn.server.resourcemanager.TestRMRestart | \\ \\ || Subsystem || Report/Notes || | Docker | Image:yetus/hadoop:14b5c93 | | JIRA Issue | YARN-5531 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12869947/YARN-5531-YARN-2915.v13.patch | | Optio
[jira] [Commented] (YARN-6528) Add JMX metrics for Plan Follower and Agent Placement and Plan Operations
[ https://issues.apache.org/jira/browse/YARN-6528?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16025606#comment-16025606 ] Hadoop QA commented on YARN-6528: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 14s{color} | {color:blue} Docker mode activated. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 7 new or modified test files. {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 12m 31s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 30s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 29s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 33s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 16s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 53s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 19s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 30s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 28s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 28s{color} | 
{color:green} the patch passed {color} | | {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange} 0m 29s{color} | {color:orange} hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: The patch generated 29 new + 448 unchanged - 2 fixed = 477 total (was 450) {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 30s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 13s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 59s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 17s{color} | {color:green} the patch passed {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 40m 9s{color} | {color:red} hadoop-yarn-server-resourcemanager in the patch failed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 16s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} | | {color:black}{color} | {color:black} {color} | {color:black} 61m 2s{color} | {color:black} {color} | \\ \\ || Reason || Tests || | Failed junit tests | hadoop.yarn.server.resourcemanager.reservation.planning.TestSimpleCapacityReplanner | | | hadoop.yarn.server.resourcemanager.reservation.TestInMemoryPlan | | | hadoop.yarn.server.resourcemanager.reservation.planning.TestGreedyReservationAgent | | | hadoop.yarn.server.resourcemanager.TestRMRestart | | | hadoop.yarn.server.resourcemanager.reservation.TestCapacityOverTimePolicy | | | hadoop.yarn.server.resourcemanager.reservation.planning.TestAlignedPlanner | | | hadoop.yarn.server.resourcemanager.reservation.TestNoOverCommitPolicy | \\ \\ || Subsystem || Report/Notes || | Docker | Image:yetus/hadoop:14b5c93 | | JIRA Issue | YARN-6528 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12869954/YARN-6528.v005.patch | | Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle | | uname | Linux f3a11d9e60fb 4.4.0-43-generic #63-Ubuntu SMP Wed Oct 12 13:48:03 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh | | git revision | trunk / 2b5ad48 | | Default Java | 1.8.0_131 | | findbugs | v3.1.0-RC1 | | checkstyle | https://builds.apache.org/job/PreCommit-YARN-Build/16025/artifact/patchprocess/diff-checkstyle-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager.txt | | unit | https://builds.apache.org/job
[jira] [Updated] (YARN-5531) UnmanagedAM pool manager for federating application across clusters
[ https://issues.apache.org/jira/browse/YARN-5531?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Botong Huang updated YARN-5531: --- Attachment: YARN-5531-YARN-2915.v14.patch > UnmanagedAM pool manager for federating application across clusters > --- > > Key: YARN-5531 > URL: https://issues.apache.org/jira/browse/YARN-5531 > Project: Hadoop YARN > Issue Type: Sub-task > Components: nodemanager, resourcemanager >Reporter: Subru Krishnan >Assignee: Botong Huang > Attachments: YARN-5531-YARN-2915.v10.patch, > YARN-5531-YARN-2915.v11.patch, YARN-5531-YARN-2915.v12.patch, > YARN-5531-YARN-2915.v13.patch, YARN-5531-YARN-2915.v14.patch, > YARN-5531-YARN-2915.v1.patch, YARN-5531-YARN-2915.v2.patch, > YARN-5531-YARN-2915.v3.patch, YARN-5531-YARN-2915.v4.patch, > YARN-5531-YARN-2915.v5.patch, YARN-5531-YARN-2915.v6.patch, > YARN-5531-YARN-2915.v7.patch, YARN-5531-YARN-2915.v8.patch, > YARN-5531-YARN-2915.v9.patch > > > One of the main tenets of YARN Federation is to *transparently* scale > applications across multiple clusters. This is achieved by running UAMs on > behalf of the application on other clusters. This JIRA tracks the addition of > an UnmanagedAM pool manager for federating applications across clusters, which > will be used by the FederationInterceptor (YARN-3666), which is part of the > AMRMProxy pipeline introduced in YARN-2884.
[jira] [Updated] (YARN-6646) Modifier 'static' is redundant for inner enums
[ https://issues.apache.org/jira/browse/YARN-6646?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ZhangBing Lin updated YARN-6646: Description: A Java enum nested in a class is implicitly static final, so the explicit 'static' modifier on inner enums is redundant. I suggest deleting the 'static' modifier. > Modifier 'static' is redundant for inner enums > --- > > Key: YARN-6646 > URL: https://issues.apache.org/jira/browse/YARN-6646 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 3.0.0-alpha3 >Reporter: ZhangBing Lin >Assignee: ZhangBing Lin >Priority: Minor > Attachments: YARN-6646.001.patch > > > A Java enum nested in a class is implicitly static final, so the explicit > 'static' modifier on inner enums is redundant. I suggest deleting the > 'static' modifier.
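The redundancy being described can be seen in a minimal example (hypothetical class and enum names, not taken from the patch): a nested enum is implicitly static, so both declarations below compile to exactly the same thing.

```java
public class Outer {
    // Implicitly static: referenced without any Outer instance.
    enum Status { RUNNING, DONE }

    // The explicit 'static' modifier here is redundant and can be removed.
    static enum Phase { INIT, CLEANUP }

    public static void main(String[] args) {
        // Neither enum needs (or can use) an enclosing Outer instance.
        System.out.println(Status.RUNNING + " " + Phase.INIT);
    }
}
```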
[jira] [Commented] (YARN-5892) Capacity Scheduler: Support user-specific minimum user limit percent
[ https://issues.apache.org/jira/browse/YARN-5892?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16025713#comment-16025713 ] Sunil G commented on YARN-5892: --- Thanks [~eepayne] bq. which then multiplies the value of userLimitResource by the appropriate user's weight before returning it I think I am fine here, as we multiply with the real weight of the user; this will help bring the UL to the correct value. Thanks for explaining in detail. I also have one more doubt now. {code} Resource userSpecificUserLimit = Resources.multiplyAndNormalizeUp(resourceCalculator, userLimitResource, weight, lQueue.getMinimumAllocation()); {code} I think we could use multiplyAndNormalizeDown here. I have 2 reasons for this. 1) Ideally we allow at least one (extra) container for a user when the UL is smaller. So it might be fine to use multiplyAndNormalizeDown, given we are not breaking a valid use case. We do a > check here, not >= {code:title=LeafQueue#canAssignToUser} if (Resources.greaterThan(resourceCalculator, clusterResource, user.getUsed(nodePartition), limit)) { ... {code} 2) weight_user1=0.1, weight_user2=0.1. Now consider userLimitResource is somehow 10GB and minimumAllocation is 4GB. In this case, both user1 and user2 will get a UL of 4GB. This will help each user get 2 containers each. I assume we have queue elasticity and the other queue has some more resources. In this case, I feel we do not need to award a user with 2 containers, correct? Please correct me if I am wrong. bq. I think the code is within locks everywhere it is used. Yes. I did check the code in detail. We are fine here; the code below was not taking a lock, which confused me, but its caller holds the correct lock.
{{UsersManager.addUser(String userName, User user)}} > Capacity Scheduler: Support user-specific minimum user limit percent > > > Key: YARN-5892 > URL: https://issues.apache.org/jira/browse/YARN-5892 > Project: Hadoop YARN > Issue Type: Improvement > Components: capacityscheduler >Reporter: Eric Payne >Assignee: Eric Payne > Attachments: Active users highlighted.jpg, YARN-5892.001.patch, > YARN-5892.002.patch, YARN-5892.003.patch, YARN-5892.004.patch, > YARN-5892.005.patch, YARN-5892.006.patch, YARN-5892.007.patch, > YARN-5892.008.patch, YARN-5892.009.patch, YARN-5892.010.patch, > YARN-5892.012.patch, YARN-5892.013.patch > > > Currently, in the capacity scheduler, the {{minimum-user-limit-percent}} > property is per queue. A cluster admin should be able to set the minimum user > limit percent on a per-user basis within the queue. > This functionality is needed so that when intra-queue preemption is enabled > (YARN-4945 / YARN-2113), some users can be deemed as more important than > other users, and resources from VIP users won't be as likely to be preempted. > For example, if the {{getstuffdone}} queue has a MULP of 25 percent, but user > {{jane}} is a power user of queue {{getstuffdone}} and needs to be guaranteed > 75 percent, the properties for {{getstuffdone}} and {{jane}} would look like > this: > {code} > <property> > <name>yarn.scheduler.capacity.root.getstuffdone.minimum-user-limit-percent</name> > <value>25</value> > </property> > <property> > <name>yarn.scheduler.capacity.root.getstuffdone.jane.minimum-user-limit-percent</name> > <value>75</value> > </property> > {code}
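The arithmetic in Sunil's second point can be checked with a small sketch. This is simplified integer arithmetic, not the actual {{Resources}} calculator API: {{normalizeUp}}/{{normalizeDown}} here stand in for the rounding behavior of {{multiplyAndNormalizeUp}}/{{multiplyAndNormalizeDown}}.

```java
public class UserLimitSketch {
    // Round down to a multiple of the minimum allocation (MB).
    static long normalizeDown(long mb, long stepMb) {
        return (mb / stepMb) * stepMb;
    }

    // Round up to a multiple of the minimum allocation (MB).
    static long normalizeUp(long mb, long stepMb) {
        return ((mb + stepMb - 1) / stepMb) * stepMb;
    }

    public static void main(String[] args) {
        long userLimitMb = 10 * 1024; // computed user limit: 10 GB
        long minAllocMb = 4 * 1024;   // minimum allocation: 4 GB
        double weight = 0.1;          // per-user weight

        long weighted = (long) (userLimitMb * weight); // 1024 MB
        // Rounding up grants a full 4 GB container's worth of headroom...
        System.out.println("up:   " + normalizeUp(weighted, minAllocMb));
        // ...while rounding down grants none at this weight.
        System.out.println("down: " + normalizeDown(weighted, minAllocMb));
    }
}
```

With the `>` (not `>=`) check in {{LeafQueue#canAssignToUser}}, rounding down still lets a user obtain one container before exceeding the limit, which is the behavior Sunil argues is sufficient.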
[jira] [Updated] (YARN-6649) RollingLevelDBTimelineServer throws RuntimeException if object decoding ever fails runtime exception
[ https://issues.apache.org/jira/browse/YARN-6649?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Eagles updated YARN-6649: -- Attachment: YARN-6649.1.patch > RollingLevelDBTimelineServer throws RuntimeException if object decoding ever > fails runtime exception > > > Key: YARN-6649 > URL: https://issues.apache.org/jira/browse/YARN-6649 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Jonathan Eagles >Assignee: Jonathan Eagles >Priority: Critical > Attachments: YARN-6649.1.patch > > -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-6630) Container worker dir could not recover when NM restart
[ https://issues.apache.org/jira/browse/YARN-6630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16025741#comment-16025741 ] Feng Yuan commented on YARN-6630: - Hi wy, {code}ContainerRetryPolicy{code} is configurable; for example, if you are using the DistributedShell app you can set it with the parameter *--container_retry_policy*. IMO, {code}yarn.nodemanager.recovery.enabled=true{code} and {code}ContainerRetryPolicy=NEVER_RETRY{code} are not contradictory. I think ContainerRetryPolicy was created to let the app control which containers should retry and which should not. > Container worker dir could not recover when NM restart > -- > > Key: YARN-6630 > URL: https://issues.apache.org/jira/browse/YARN-6630 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Yang Wang > Attachments: YARN-6630.001.patch > > > When yarn.nodemanager.recovery.enabled is true and ContainerRetryPolicy is > NEVER_RETRY, container worker dir will not be saved in NM state store. > {code:title=ContainerLaunch.java} > ... > private void recordContainerWorkDir(ContainerId containerId, > String workDir) throws IOException{ > container.setWorkDir(workDir); > if (container.isRetryContextSet()) { > context.getNMStateStore().storeContainerWorkDir(containerId, workDir); > } > } > {code} > Then NM restarts, container.workDir is null, and may cause other exceptions. > {code:title=ContainerImpl.java} > static class ResourceLocalizedWhileRunningTransition > extends ContainerTransition { > ... > String linkFile = new Path(container.workDir, link).toString(); > ... > {code} > {code} > java.lang.IllegalArgumentException: Can not create a Path from a null string > at org.apache.hadoop.fs.Path.checkPathArg(Path.java:159) > at org.apache.hadoop.fs.Path.<init>(Path.java:175) > at org.apache.hadoop.fs.Path.<init>(Path.java:110) > ... ... 
> {code}
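The direction the report points at can be sketched as follows. This is a minimal sketch with hypothetical stand-in types, not the actual NodeManager API or the attached patch: persist the work dir unconditionally, rather than only when a retry context is set, so recovery after an NM restart never observes a null workDir.

```java
import java.util.HashMap;
import java.util.Map;

public class WorkDirRecoverySketch {
    // Stand-in for the NM state store (hypothetical, in-memory).
    static class StateStore {
        final Map<String, String> workDirs = new HashMap<>();
        void storeContainerWorkDir(String id, String dir) { workDirs.put(id, dir); }
        String recoverWorkDir(String id) { return workDirs.get(id); }
    }

    // Store the work dir regardless of retry policy, instead of guarding
    // the write behind isRetryContextSet().
    static void recordContainerWorkDir(StateStore store, String containerId,
                                       String workDir) {
        store.storeContainerWorkDir(containerId, workDir);
    }

    public static void main(String[] args) {
        StateStore store = new StateStore();
        recordContainerWorkDir(store, "container_01", "/tmp/nm/container_01");
        // After a simulated restart, the work dir is recoverable, not null,
        // so Path construction during recovery cannot fail on a null string.
        System.out.println(store.recoverWorkDir("container_01"));
    }
}
```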
[jira] [Comment Edited] (YARN-6630) Container worker dir could not recover when NM restart
[ https://issues.apache.org/jira/browse/YARN-6630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16025741#comment-16025741 ] Feng Yuan edited comment on YARN-6630 at 5/26/17 3:11 AM: -- Hi wy, {code}ContainerRetryPolicy{code} is configurable; for example, if you are using the DistributedShell app you can set it with the parameter *--container_retry_policy*. IMO, {code}yarn.nodemanager.recovery.enabled=true{code} and {code}ContainerRetryPolicy=NEVER_RETRY{code} are not contradictory. I think ContainerRetryPolicy was created to let the app control which containers should retry and which should not. For example, the ApplicationMaster can set this when assembling the ContainerLaunchContext. was (Author: feng yuan): Hi wy, {code}ContainerRetryPolicy{code} is configurable; for example, if you are using the DistributedShell app you can set it with the parameter *--container_retry_policy*. IMO, {code}yarn.nodemanager.recovery.enabled=true{code} and {code}ContainerRetryPolicy=NEVER_RETRY{code} are not contradictory. I think ContainerRetryPolicy was created to let the app control which containers should retry and which should not. > Container worker dir could not recover when NM restart > -- > > Key: YARN-6630 > URL: https://issues.apache.org/jira/browse/YARN-6630 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Yang Wang > Attachments: YARN-6630.001.patch > > > When yarn.nodemanager.recovery.enabled is true and ContainerRetryPolicy is > NEVER_RETRY, container worker dir will not be saved in NM state store. > {code:title=ContainerLaunch.java} > ... > private void recordContainerWorkDir(ContainerId containerId, > String workDir) throws IOException{ > container.setWorkDir(workDir); > if (container.isRetryContextSet()) { > context.getNMStateStore().storeContainerWorkDir(containerId, workDir); > } > } > {code} > Then NM restarts, container.workDir is null, and may cause other exceptions. 
> {code:title=ContainerImpl.java} > static class ResourceLocalizedWhileRunningTransition > extends ContainerTransition { > ... > String linkFile = new Path(container.workDir, link).toString(); > ... > {code} > {code} > java.lang.IllegalArgumentException: Can not create a Path from a null string > at org.apache.hadoop.fs.Path.checkPathArg(Path.java:159) > at org.apache.hadoop.fs.Path.<init>(Path.java:175) > at org.apache.hadoop.fs.Path.<init>(Path.java:110) > ... ... > {code}
[jira] [Created] (YARN-6654) RollingLevelDBTimelineStore introduce minor backwards compatible change
Jonathan Eagles created YARN-6654: - Summary: RollingLevelDBTimelineStore introduce minor backwards compatible change Key: YARN-6654 URL: https://issues.apache.org/jira/browse/YARN-6654 Project: Hadoop YARN Issue Type: Bug Reporter: Jonathan Eagles Assignee: Jonathan Eagles Priority: Blocker There is a small minor backwards compatible change introduced while upgrading fst library from 2.24 to 2.50. {code} Exception in thread "main" java.io.IOException: java.lang.RuntimeException: unable to find class for code 83 at org.nustaq.serialization.FSTObjectInput.readObject(FSTObjectInput.java:243) at org.nustaq.serialization.FSTConfiguration.asObject(FSTConfiguration.java:1125) at org.nustaq.serialization.FSTNoJackson.main(FSTNoJackson.java:31) Caused by: java.lang.RuntimeException: unable to find class for code 83 at org.nustaq.serialization.FSTClazzNameRegistry.decodeClass(FSTClazzNameRegistry.java:180) at org.nustaq.serialization.coders.FSTStreamDecoder.readClass(FSTStreamDecoder.java:472) at org.nustaq.serialization.FSTObjectInput.readClass(FSTObjectInput.java:933) at org.nustaq.serialization.FSTObjectInput.readObjectWithHeader(FSTObjectInput.java:343) at org.nustaq.serialization.FSTObjectInput.readObjectInternal(FSTObjectInput.java:327) at org.nustaq.serialization.serializers.FSTArrayListSerializer.instantiate(FSTArrayListSerializer.java:63) at org.nustaq.serialization.FSTObjectInput.instantiateAndReadWithSer(FSTObjectInput.java:497) at org.nustaq.serialization.FSTObjectInput.readObjectWithHeader(FSTObjectInput.java:366) at org.nustaq.serialization.FSTObjectInput.readObjectInternal(FSTObjectInput.java:327) at org.nustaq.serialization.serializers.FSTMapSerializer.instantiate(FSTMapSerializer.java:78) at org.nustaq.serialization.FSTObjectInput.instantiateAndReadWithSer(FSTObjectInput.java:497) at org.nustaq.serialization.FSTObjectInput.readObjectWithHeader(FSTObjectInput.java:366) at org.nustaq.serialization.FSTObjectInput.readObjectInternal(FSTObjectInput.java:327) at 
org.nustaq.serialization.FSTObjectInput.readObject(FSTObjectInput.java:307) at org.nustaq.serialization.FSTObjectInput.readObject(FSTObjectInput.java:241) {code}
[jira] [Commented] (YARN-6649) RollingLevelDBTimelineServer throws RuntimeException if object decoding ever fails runtime exception
[ https://issues.apache.org/jira/browse/YARN-6649?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16025779#comment-16025779 ] Hadoop QA commented on YARN-6649: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 18s{color} | {color:blue} Docker mode activated. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:red}-1{color} | {color:red} test4tests {color} | {color:red} 0m 0s{color} | {color:red} The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 15m 16s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 23s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 14s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 26s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 17s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 37s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 15s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 22s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 21s{color} | 
{color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 21s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 12s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 24s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 16s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 47s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 14s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 3m 11s{color} | {color:green} hadoop-yarn-server-applicationhistoryservice in the patch passed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 18s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} | | {color:black}{color} | {color:black} {color} | {color:black} 25m 11s{color} | {color:black} {color} | \\ \\ || Subsystem || Report/Notes || | Docker | Image:yetus/hadoop:14b5c93 | | JIRA Issue | YARN-6649 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12869983/YARN-6649.1.patch | | Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle | | uname | Linux 2ec80658f3e2 3.13.0-106-generic #153-Ubuntu SMP Tue Dec 6 15:44:32 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh | | git revision | trunk / 2b5ad48 | | Default Java | 1.8.0_131 | | findbugs | v3.1.0-RC1 | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/16027/testReport/ | | modules | C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice U: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/16027/console | | Powered by | Apache Yetus 0.5.0-SNAPSHOT http://yetus.apache.org | This message was automatically generated. > RollingLevelDBTimelineServer throws RuntimeException if object decoding ever > fails runtime exception > > > Key: YARN-6649 > URL: https://issues.apache.org/jira/browse/YARN-6649 > Project: Hadoop
[jira] [Updated] (YARN-6654) RollingLevelDBTimelineStore introduce minor backwards compatible change
[ https://issues.apache.org/jira/browse/YARN-6654?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Eagles updated YARN-6654: -- Attachment: YARN-6654.1.patch > RollingLevelDBTimelineStore introduce minor backwards compatible change > --- > > Key: YARN-6654 > URL: https://issues.apache.org/jira/browse/YARN-6654 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Jonathan Eagles >Assignee: Jonathan Eagles >Priority: Blocker > Attachments: YARN-6654.1.patch > > > There is a small minor backwards compatible change introduced while upgrading > fst library from 2.24 to 2.50. > {code} > Exception in thread "main" java.io.IOException: java.lang.RuntimeException: > unable to find class for code 83 > at > org.nustaq.serialization.FSTObjectInput.readObject(FSTObjectInput.java:243) > at > org.nustaq.serialization.FSTConfiguration.asObject(FSTConfiguration.java:1125) > at org.nustaq.serialization.FSTNoJackson.main(FSTNoJackson.java:31) > Caused by: java.lang.RuntimeException: unable to find class for code 83 > at > org.nustaq.serialization.FSTClazzNameRegistry.decodeClass(FSTClazzNameRegistry.java:180) > at > org.nustaq.serialization.coders.FSTStreamDecoder.readClass(FSTStreamDecoder.java:472) > at > org.nustaq.serialization.FSTObjectInput.readClass(FSTObjectInput.java:933) > at > org.nustaq.serialization.FSTObjectInput.readObjectWithHeader(FSTObjectInput.java:343) > at > org.nustaq.serialization.FSTObjectInput.readObjectInternal(FSTObjectInput.java:327) > at > org.nustaq.serialization.serializers.FSTArrayListSerializer.instantiate(FSTArrayListSerializer.java:63) > at > org.nustaq.serialization.FSTObjectInput.instantiateAndReadWithSer(FSTObjectInput.java:497) > at > org.nustaq.serialization.FSTObjectInput.readObjectWithHeader(FSTObjectInput.java:366) > at > org.nustaq.serialization.FSTObjectInput.readObjectInternal(FSTObjectInput.java:327) > at > org.nustaq.serialization.serializers.FSTMapSerializer.instantiate(FSTMapSerializer.java:78) > at > 
org.nustaq.serialization.FSTObjectInput.instantiateAndReadWithSer(FSTObjectInput.java:497) > at > org.nustaq.serialization.FSTObjectInput.readObjectWithHeader(FSTObjectInput.java:366) > at > org.nustaq.serialization.FSTObjectInput.readObjectInternal(FSTObjectInput.java:327) > at > org.nustaq.serialization.FSTObjectInput.readObject(FSTObjectInput.java:307) > at > org.nustaq.serialization.FSTObjectInput.readObject(FSTObjectInput.java:241) > {code} -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-6111) Rumen input does't work in SLS
[ https://issues.apache.org/jira/browse/YARN-6111?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16025782#comment-16025782 ] YuJie Huang commented on YARN-6111: --- Ok, thank you very much! Is the latest Hadoop version ok in SLS? > Rumen input does't work in SLS > -- > > Key: YARN-6111 > URL: https://issues.apache.org/jira/browse/YARN-6111 > Project: Hadoop YARN > Issue Type: Sub-task > Components: scheduler-load-simulator >Affects Versions: 2.6.0, 2.7.3, 3.0.0-alpha2 > Environment: ubuntu14.0.4 os >Reporter: YuJie Huang >Assignee: Yufei Gu > Labels: test > Fix For: 3.0.0-alpha3 > > Attachments: YARN-6111.001.patch > > > Hi guys, > I am trying to learn the use of SLS. > I would like to get the file realtimetrack.json, but it only > contains "[]" at the end of a simulation. This is the command I use to > run the instance: > HADOOP_HOME $ bin/slsrun.sh --input-rumen=sample-data/2jobsmin-rumen-jh.json > --output-dir=sample-data > All other files, including metrics, appear to be properly populated. I can > also trace with the web UI: http://localhost:10001/simulate > Can someone help? > Thanks
[jira] [Updated] (YARN-6555) Store application flow context in NM state store for work-preserving restart
[ https://issues.apache.org/jira/browse/YARN-6555?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Haibo Chen updated YARN-6555: - Summary: Store application flow context in NM state store for work-preserving restart (was: Enable flow context read (& corresponding write) for recovering application with NM restart ) > Store application flow context in NM state store for work-preserving restart > > > Key: YARN-6555 > URL: https://issues.apache.org/jira/browse/YARN-6555 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Affects Versions: YARN-5355, YARN-5355-branch-2, 3.0.0-alpha3 >Reporter: Vrushali C >Assignee: Rohith Sharma K S > Labels: yarn-5355-merge-blocker > Attachments: YARN-6555.001.patch, YARN-6555.002.patch, > YARN-6555.003.patch > > > If timeline service v2 is enabled and NM is restarted with recovery enabled, > then NM fails to start and throws an error as "flow context can't be null". > This is happening because the flow context did not exist before but now that > timeline service v2 is enabled, ApplicationImpl expects it to exist. > This would also happen even if flow context existed before but since we are > not persisting it / reading it during > ContainerManagerImpl#recoverApplication, it does not get passed in to > ApplicationImpl. 
> full stack trace > {code} > 2017-05-03 21:51:52,178 FATAL > org.apache.hadoop.yarn.server.nodemanager.NodeManager: Error starting > NodeManager > java.lang.IllegalArgumentException: flow context cannot be null > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl.(ApplicationImpl.java:104) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl.(ApplicationImpl.java:90) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl.recoverApplication(ContainerManagerImpl.java:318) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl.recover(ContainerManagerImpl.java:280) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl.serviceInit(ContainerManagerImpl.java:267) > at > org.apache.hadoop.service.AbstractService.init(AbstractService.java:163) > at > org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:107) > at > org.apache.hadoop.yarn.server.nodemanager.NodeManager.serviceInit(NodeManager.java:276) > at > org.apache.hadoop.service.AbstractService.init(AbstractService.java:163) > at > org.apache.hadoop.yarn.server.nodemanager.NodeManager.initAndStartNodeManager(NodeManager.java:588) > at > org.apache.hadoop.yarn.server.nodemanager.NodeManager.main(NodeManager.java:649) > {code} -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
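The persistence gap described above can be sketched as follows. This is a hypothetical in-memory store and simplified types, not the actual YARN-6555 patch: write the flow context alongside the application record, and tolerate its absence when recovering records that were persisted before timeline service v2 was enabled.

```java
import java.util.HashMap;
import java.util.Map;

public class FlowContextRecoverySketch {
    static class FlowContext {
        final String flowName;
        FlowContext(String flowName) { this.flowName = flowName; }
    }

    // Stand-in for the NM state store: appId -> persisted flow name (nullable).
    static final Map<String, String> store = new HashMap<>();

    static void storeApplication(String appId, FlowContext ctx) {
        store.put(appId, ctx == null ? null : ctx.flowName);
    }

    // Recovery must handle application records persisted without a flow
    // context, instead of failing with "flow context cannot be null".
    static FlowContext recoverApplication(String appId) {
        String flowName = store.get(appId);
        return new FlowContext(flowName == null ? "default-flow" : flowName);
    }

    public static void main(String[] args) {
        storeApplication("app_1", new FlowContext("etl-flow"));
        storeApplication("app_2", null); // record from before the upgrade
        System.out.println(recoverApplication("app_1").flowName);
        System.out.println(recoverApplication("app_2").flowName);
    }
}
```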
[jira] [Updated] (YARN-6555) Store application flow context in NM state store for work-preserving restart
[ https://issues.apache.org/jira/browse/YARN-6555?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Haibo Chen updated YARN-6555: - Fix Version/s: 3.0.0-alpha3 > Store application flow context in NM state store for work-preserving restart > > > Key: YARN-6555 > URL: https://issues.apache.org/jira/browse/YARN-6555 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Affects Versions: YARN-5355, YARN-5355-branch-2, 3.0.0-alpha3 >Reporter: Vrushali C >Assignee: Rohith Sharma K S > Labels: yarn-5355-merge-blocker > Fix For: 3.0.0-alpha3 > > Attachments: YARN-6555.001.patch, YARN-6555.002.patch, > YARN-6555.003.patch > > > If timeline service v2 is enabled and NM is restarted with recovery enabled, > then NM fails to start and throws an error as "flow context can't be null". > This is happening because the flow context did not exist before but now that > timeline service v2 is enabled, ApplicationImpl expects it to exist. > This would also happen even if flow context existed before but since we are > not persisting it / reading it during > ContainerManagerImpl#recoverApplication, it does not get passed in to > ApplicationImpl. 
> full stack trace > {code} > 2017-05-03 21:51:52,178 FATAL > org.apache.hadoop.yarn.server.nodemanager.NodeManager: Error starting > NodeManager > java.lang.IllegalArgumentException: flow context cannot be null > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl.(ApplicationImpl.java:104) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl.(ApplicationImpl.java:90) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl.recoverApplication(ContainerManagerImpl.java:318) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl.recover(ContainerManagerImpl.java:280) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl.serviceInit(ContainerManagerImpl.java:267) > at > org.apache.hadoop.service.AbstractService.init(AbstractService.java:163) > at > org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:107) > at > org.apache.hadoop.yarn.server.nodemanager.NodeManager.serviceInit(NodeManager.java:276) > at > org.apache.hadoop.service.AbstractService.init(AbstractService.java:163) > at > org.apache.hadoop.yarn.server.nodemanager.NodeManager.initAndStartNodeManager(NodeManager.java:588) > at > org.apache.hadoop.yarn.server.nodemanager.NodeManager.main(NodeManager.java:649) > {code} -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-5531) UnmanagedAM pool manager for federating application across clusters
[ https://issues.apache.org/jira/browse/YARN-5531?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16025798 ] Hadoop QA commented on YARN-5531:
-
(x) *-1 overall*

|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 0m 29s | Docker mode activated. |
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
| +1 | test4tests | 0m 0s | The patch appears to include 3 new or modified test files. |
| 0 | mvndep | 1m 46s | Maven dependency ordering for branch |
| +1 | mvninstall | 17m 58s | YARN-2915 passed |
| +1 | compile | 9m 56s | YARN-2915 passed |
| +1 | checkstyle | 1m 1s | YARN-2915 passed |
| +1 | mvnsite | 2m 29s | YARN-2915 passed |
| +1 | mvneclipse | 1m 33s | YARN-2915 passed |
| -1 | findbugs | 1m 5s | hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common in YARN-2915 has 1 extant Findbugs warning. |
| -1 | findbugs | 0m 51s | hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager in YARN-2915 has 5 extant Findbugs warnings. |
| +1 | javadoc | 2m 1s | YARN-2915 passed |
| 0 | mvndep | 0m 11s | Maven dependency ordering for patch |
| +1 | mvninstall | 2m 4s | the patch passed |
| +1 | compile | 7m 57s | the patch passed |
| +1 | javac | 7m 57s | the patch passed |
| +1 | checkstyle | 0m 53s | hadoop-yarn-project/hadoop-yarn: The patch generated 0 new + 48 unchanged - 1 fixed = 48 total (was 49) |
| +1 | mvnsite | 2m 32s | the patch passed |
| +1 | mvneclipse | 1m 23s | the patch passed |
| +1 | whitespace | 0m 0s | The patch has no whitespace issues. |
| +1 | findbugs | 5m 10s | the patch passed |
| +1 | javadoc | 1m 53s | the patch passed |
| +1 | unit | 2m 26s | hadoop-yarn-common in the patch passed. |
| +1 | unit | 1m 14s | hadoop-yarn-server-common in the patch passed. |
| +1 | unit | 13m 45s | hadoop-yarn-server-nodemanager in the patch passed. |
| -1 | unit | 43m 44s | hadoop-yarn-server-resourcemanager in the patch failed. |
| +1 | asflicense | 0m 36s | The patch does not generate ASF License warnings. |
| | | 134m 24s | |

|| Reason || Tests ||
| Timed out junit tests | org.apache.hadoop.yarn.server.resourcemanager.TestSubmitApplicationWithRMHA |

|| Subsystem || Report/Notes ||
| Docker | Image:yetus/hadoop:14b5c93 |
| JIRA Issue | YARN-5531 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12869974/YARN-5531-YARN-2915.v14.patch |
| Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit findbugs
[jira] [Updated] (YARN-6555) Store application flow context in NM state store for work-preserving restart
[ https://issues.apache.org/jira/browse/YARN-6555?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Haibo Chen updated YARN-6555: - Fix Version/s: YARN-5355-branch-2 YARN-5355 > Store application flow context in NM state store for work-preserving restart > > > Key: YARN-6555 > URL: https://issues.apache.org/jira/browse/YARN-6555 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Affects Versions: YARN-5355, YARN-5355-branch-2, 3.0.0-alpha3 >Reporter: Vrushali C >Assignee: Rohith Sharma K S > Labels: yarn-5355-merge-blocker > Fix For: YARN-5355, YARN-5355-branch-2, 3.0.0-alpha3 > > Attachments: YARN-6555.001.patch, YARN-6555.002.patch, > YARN-6555.003.patch > > > If timeline service v2 is enabled and NM is restarted with recovery enabled, > then NM fails to start and throws an error as "flow context can't be null". > This is happening because the flow context did not exist before but now that > timeline service v2 is enabled, ApplicationImpl expects it to exist. > This would also happen even if flow context existed before but since we are > not persisting it / reading it during > ContainerManagerImpl#recoverApplication, it does not get passed in to > ApplicationImpl. 
> full stack trace
> {code}
> 2017-05-03 21:51:52,178 FATAL org.apache.hadoop.yarn.server.nodemanager.NodeManager: Error starting NodeManager
> java.lang.IllegalArgumentException: flow context cannot be null
> at org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl.<init>(ApplicationImpl.java:104)
> at org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl.<init>(ApplicationImpl.java:90)
> at org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl.recoverApplication(ContainerManagerImpl.java:318)
> at org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl.recover(ContainerManagerImpl.java:280)
> at org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl.serviceInit(ContainerManagerImpl.java:267)
> at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
> at org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:107)
> at org.apache.hadoop.yarn.server.nodemanager.NodeManager.serviceInit(NodeManager.java:276)
> at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
> at org.apache.hadoop.yarn.server.nodemanager.NodeManager.initAndStartNodeManager(NodeManager.java:588)
> at org.apache.hadoop.yarn.server.nodemanager.NodeManager.main(NodeManager.java:649)
> {code}
-- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
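As context for the comments below, the change the description calls for (persist the flow context in the NM state store when the application starts, then read it back during recovery and pass it to the application constructor) can be sketched in miniature. All class and method names in this sketch are simplified stand-ins, not the actual YARN-6555 patch:

```java
import java.util.HashMap;
import java.util.Map;

public class FlowContextRecoverySketch {

    /** Stand-in for the timeline-service-v2 flow context kept per application. */
    static class FlowContext {
        final String flowName;
        final String flowVersion;
        final long flowRunId;
        FlowContext(String flowName, String flowVersion, long flowRunId) {
            this.flowName = flowName;
            this.flowVersion = flowVersion;
            this.flowRunId = flowRunId;
        }
    }

    /**
     * Stand-in for ApplicationImpl: with timeline v2 enabled it rejects a
     * null flow context, mirroring the IllegalArgumentException in the
     * reported stack trace.
     */
    static class Application {
        final FlowContext flowContext;
        Application(boolean timelineV2Enabled, FlowContext flowContext) {
            if (timelineV2Enabled && flowContext == null) {
                throw new IllegalArgumentException("flow context cannot be null");
            }
            this.flowContext = flowContext;
        }
    }

    /** Stand-in for the NM state store. */
    static final Map<String, FlowContext> stateStore = new HashMap<>();

    /** At application start, persist the flow context with the other app state. */
    static void startApplication(String appId, FlowContext ctx) {
        stateStore.put(appId, ctx);
    }

    /**
     * On work-preserving restart, read the persisted flow context back and
     * hand it to the constructor. Before the fix, recovery had no persisted
     * flow context to read, effectively passed null here, and NM startup
     * failed as in the stack trace above.
     */
    static Application recoverApplication(String appId) {
        return new Application(true, stateStore.get(appId));
    }

    public static void main(String[] args) {
        startApplication("application_1", new FlowContext("distributed-grep", "1", 1L));
        Application recovered = recoverApplication("application_1");
        System.out.println(recovered.flowContext.flowName); // prints "distributed-grep"
    }
}
```

The real patch does the persisting through the NM leveldb state store and the `yarn_server_nodemanager_recovery.proto` records touched in the commit noted below; the in-memory map here only illustrates the control flow.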
[jira] [Commented] (YARN-6654) RollingLevelDBTimelineStore introduce minor backwards compatible change
[ https://issues.apache.org/jira/browse/YARN-6654?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16025811 ] Hadoop QA commented on YARN-6654:
-
(x) *-1 overall*

|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 0m 21s | Docker mode activated. |
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
| -1 | test4tests | 0m 0s | The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. |
| +1 | mvninstall | 14m 35s | trunk passed |
| +1 | compile | 0m 21s | trunk passed |
| +1 | checkstyle | 0m 14s | trunk passed |
| +1 | mvnsite | 0m 22s | trunk passed |
| +1 | mvneclipse | 0m 16s | trunk passed |
| +1 | findbugs | 0m 34s | trunk passed |
| +1 | javadoc | 0m 15s | trunk passed |
| +1 | mvninstall | 0m 19s | the patch passed |
| +1 | compile | 0m 18s | the patch passed |
| +1 | javac | 0m 18s | the patch passed |
| +1 | checkstyle | 0m 10s | the patch passed |
| +1 | mvnsite | 0m 19s | the patch passed |
| +1 | mvneclipse | 0m 14s | the patch passed |
| +1 | whitespace | 0m 0s | The patch has no whitespace issues. |
| +1 | findbugs | 0m 36s | the patch passed |
| +1 | javadoc | 0m 12s | the patch passed |
| +1 | unit | 3m 26s | hadoop-yarn-server-applicationhistoryservice in the patch passed. |
| +1 | asflicense | 0m 17s | The patch does not generate ASF License warnings. |
| | | 24m 9s | |

|| Subsystem || Report/Notes ||
| Docker | Image:yetus/hadoop:14b5c93 |
| JIRA Issue | YARN-6654 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12869995/YARN-6654.1.patch |
| Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle |
| uname | Linux d507ff7f8d80 3.13.0-116-generic #163-Ubuntu SMP Fri Mar 31 14:13:22 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 47474ff |
| Default Java | 1.8.0_131 |
| findbugs | v3.1.0-RC1 |
| Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/16028/testReport/ |
| modules | C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice U: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice |
| Console output | https://builds.apache.org/job/PreCommit-YARN-Build/16028/console |
| Powered by | Apache Yetus 0.5.0-SNAPSHOT http://yetus.apache.org |

This message was automatically generated.

> RollingLevelDBTimelineStore introduce minor backwards compatible change
> ---
>
> Key: YARN-6654
> URL: https://issues.apache.org/jira/browse/YARN-6654
> Project: Hadoop YARN
> Issue Type: Bug
>Reporter: Jona
[jira] [Commented] (YARN-6555) Store application flow context in NM state store for work-preserving restart
[ https://issues.apache.org/jira/browse/YARN-6555?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16025813 ] Hudson commented on YARN-6555:
--
SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #11786 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/11786/]) YARN-6555. Store application flow context in NM state store for (haibochen: rev 47474fffac085e0e5ea46336bf80ccd0677017a3)
* (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/ContainerManagerImpl.java
* (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/application/ApplicationImpl.java
* (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/proto/yarn_server_nodemanager_recovery.proto
* (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/TestContainerManagerRecovery.java
[jira] [Commented] (YARN-6555) Store application flow context in NM state store for work-preserving restart
[ https://issues.apache.org/jira/browse/YARN-6555?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16025840 ] Rohith Sharma K S commented on YARN-6555:
-
[~haibo.chen] could we merge this into trunk?
[jira] [Commented] (YARN-6555) Store application flow context in NM state store for work-preserving restart
[ https://issues.apache.org/jira/browse/YARN-6555?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16025843 ] Haibo Chen commented on YARN-6555:
--
Yes, I already committed this into trunk.
[jira] [Commented] (YARN-6555) Store application flow context in NM state store for work-preserving restart
[ https://issues.apache.org/jira/browse/YARN-6555?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16025849 ] Rohith Sharma K S commented on YARN-6555:
-
cool.. thank you :-)
[jira] [Commented] (YARN-6111) Rumen input does't work in SLS
[ https://issues.apache.org/jira/browse/YARN-6111?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16025872 ] Yufei Gu commented on YARN-6111:
-
[~yoyo], SLS should work well on trunk.