[jira] [Commented] (YARN-3601) Fix UT TestRMFailover.testRMWebAppRedirect
[ https://issues.apache.org/jira/browse/YARN-3601?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14549943#comment-14549943 ] Weiwei Yang commented on YARN-3601: --- I set the follow-redirects flag to false so that HttpURLConnection does NOT automatically follow the redirect; this fixes the "too many redirections" problem. (In the past this problem did not show up because there was a refresh time of 3 seconds, so the client was still able to retrieve the redirect URL from the HTTP header.) I am now able to retrieve the redirection URL from the header field "Location", and get null if there is no redirection. The overall logic is not changed, and the test case is fixed now. > Fix UT TestRMFailover.testRMWebAppRedirect > -- > > Key: YARN-3601 > URL: https://issues.apache.org/jira/browse/YARN-3601 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager, webapp > Environment: Red Hat Enterprise Linux Workstation release 6.5 > (Santiago) >Reporter: Weiwei Yang >Assignee: Weiwei Yang >Priority: Critical > Labels: test > Attachments: YARN-3601.001.patch > > > This test case has not been working since the commit from YARN-2605. It fails > with an NPE. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
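The approach Weiwei describes can be sketched in plain Java. This is an illustrative stand-in, not the patch's actual test code; the helper names are hypothetical:

```java
import java.io.IOException;
import java.net.HttpURLConnection;

public class RedirectProbe {
    // A 3xx status code means the response is a redirect whose target
    // is carried in the "Location" header.
    static boolean isRedirect(int code) {
        return code >= 300 && code < 400;
    }

    // Returns the redirect target, or null if the response is not a redirect.
    static String getRedirectUrl(HttpURLConnection conn) throws IOException {
        // The gist of the fix: do NOT follow redirects automatically,
        // so the Location header stays visible to the caller.
        conn.setInstanceFollowRedirects(false);
        return isRedirect(conn.getResponseCode())
                ? conn.getHeaderField("Location")
                : null;
    }

    public static void main(String[] args) {
        // Network-free demonstration of the status-code classification.
        System.out.println(isRedirect(302)); // true
        System.out.println(isRedirect(200)); // false
    }
}
```

In the real test, `getRedirectUrl` would be called on a connection to the standby RM's web address, and the assertion checks the returned Location value.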
[jira] [Commented] (YARN-3633) With Fair Scheduler, cluster can logjam when there are too many queues
[ https://issues.apache.org/jira/browse/YARN-3633?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14549949#comment-14549949 ] Arun Suresh commented on YARN-3633: --- Thanks for the patch [~ragarwal]. Assuming we allow, as per the patch, the first AM to be scheduled, then, per the example you specified in the description, the AM will take up 3GB in a 5GB queue... presuming each worker task requires more resources than the AM (I am guessing this should be true for most cases), no other task can be scheduled on that queue, and the remaining queues are log-jammed anyway since the maxAMShare logic would kick in. Wondering if it's a valid scenario. > With Fair Scheduler, cluster can logjam when there are too many queues > -- > > Key: YARN-3633 > URL: https://issues.apache.org/jira/browse/YARN-3633 > Project: Hadoop YARN > Issue Type: Bug > Components: fairscheduler >Affects Versions: 2.6.0 >Reporter: Rohit Agarwal >Assignee: Rohit Agarwal >Priority: Critical > Attachments: YARN-3633.patch > > > It's possible to logjam a cluster by submitting many applications at once in > different queues. > For example, let's say there is a cluster with 20GB of total memory. Let's > say 4 users submit applications at the same time. The fair share of each > queue is 5GB. Let's say that maxAMShare is 0.5. So, each queue has at most > 2.5GB memory for AMs. If all the users requested AMs of size 3GB - the > cluster logjams. Nothing gets scheduled even when 20GB of resources are > available. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
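The arithmetic in the description can be checked with a tiny model. The names are illustrative, not the FairScheduler code: each queue's AM budget is fairShare × maxAMShare, so a 3GB AM can never fit a 5GB queue at maxAMShare 0.5:

```java
public class AmShareDemo {
    // Simplified model of the maxAMShare check: an AM may start only if its
    // demand fits within fairShare * maxAMShare of its queue.
    static boolean amFits(double fairShareGb, double maxAmShare, double amDemandGb) {
        return amDemandGb <= fairShareGb * maxAmShare;
    }

    public static void main(String[] args) {
        // 20GB cluster split across 4 queues -> 5GB fair share each.
        double fairShare = 20.0 / 4;
        // Every queue rejects its 3GB AM (budget is 2.5GB) -> cluster logjam.
        System.out.println(amFits(fairShare, 0.5, 3.0)); // false
        // A 2GB AM would have fit.
        System.out.println(amFits(fairShare, 0.5, 2.0)); // true
    }
}
```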
[jira] [Commented] (YARN-3654) ContainerLogsPage web UI should not have meta-refresh
[ https://issues.apache.org/jira/browse/YARN-3654?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14549967#comment-14549967 ] Hadoop QA commented on YARN-3654: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | pre-patch | 14m 41s | Pre-patch trunk compilation is healthy. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:red}-1{color} | tests included | 0m 0s | The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. | | {color:green}+1{color} | javac | 7m 34s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 9m 38s | There were no new javadoc warning messages. | | {color:green}+1{color} | release audit | 0m 23s | The applied patch does not increase the total number of release audit warnings. | | {color:red}-1{color} | checkstyle | 0m 36s | The applied patch generated 2 new checkstyle issues (total was 12, now 13). | | {color:green}+1{color} | whitespace | 0m 0s | The patch has no lines that end in whitespace. | | {color:green}+1{color} | install | 1m 34s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 34s | The patch built with eclipse:eclipse. | | {color:green}+1{color} | findbugs | 1m 4s | The patch does not introduce any new Findbugs (version 3.0.0) warnings. | | {color:green}+1{color} | yarn tests | 6m 6s | Tests passed in hadoop-yarn-server-nodemanager. 
| | | | 42m 14s | | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12733729/YARN-3654.2.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / 0790275 | | checkstyle | https://builds.apache.org/job/PreCommit-YARN-Build/7991/artifact/patchprocess/diffcheckstylehadoop-yarn-server-nodemanager.txt | | hadoop-yarn-server-nodemanager test log | https://builds.apache.org/job/PreCommit-YARN-Build/7991/artifact/patchprocess/testrun_hadoop-yarn-server-nodemanager.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/7991/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf905.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/7991/console | This message was automatically generated. > ContainerLogsPage web UI should not have meta-refresh > - > > Key: YARN-3654 > URL: https://issues.apache.org/jira/browse/YARN-3654 > Project: Hadoop YARN > Issue Type: Bug > Components: yarn >Affects Versions: 2.7.1 >Reporter: Xuan Gong >Assignee: Xuan Gong > Attachments: YARN-3654.1.patch, YARN-3654.2.patch > > > Currently, When we try to find the container logs for the finished > application, it will re-direct to the url which we re-configured for > yarn.log.server.url in yarn-site.xml. But in ContainerLogsPage, we are using > meta-refresh: > {code} > set(TITLE, join("Redirecting to log server for ", $(CONTAINER_ID))); > html.meta_http("refresh", "1; url=" + redirectUrl); > {code} > which is not good for some browsers which need to enable the meta-refresh in > their security setting, especially for IE which meta-refresh is considered a > security hole. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
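The direction of this issue - answering with a real HTTP redirect instead of an HTML meta-refresh - can be sketched with the JDK's built-in HttpServer. This is a minimal stand-in, not the NodeManager's actual code; the path and log-server URL are placeholders:

```java
import com.sun.net.httpserver.HttpServer;
import java.net.HttpURLConnection;
import java.net.InetSocketAddress;
import java.net.URL;

public class RedirectServer {
    // Starts a throwaway server whose /containerlogs handler answers with an
    // HTTP 302 + Location header (no meta-refresh), probes it once, and
    // returns the status code the client observed.
    static int probe(String redirectUrl) throws Exception {
        HttpServer server = HttpServer.create(new InetSocketAddress(0), 0);
        server.createContext("/containerlogs", exchange -> {
            // A real 302 works in every browser, including IE configurations
            // where meta-refresh is disabled as a security hole.
            exchange.getResponseHeaders().set("Location", redirectUrl);
            exchange.sendResponseHeaders(302, -1); // -1 = no response body
            exchange.close();
        });
        server.start();
        try {
            URL url = new URL("http://localhost:"
                    + server.getAddress().getPort() + "/containerlogs");
            HttpURLConnection conn = (HttpURLConnection) url.openConnection();
            conn.setInstanceFollowRedirects(false); // observe the 302 itself
            return conn.getResponseCode();
        } finally {
            server.stop(0);
        }
    }

    public static void main(String[] args) throws Exception {
        // Placeholder for the yarn.log.server.url target.
        System.out.println(probe("http://log-server:19888/jobhistory/logs"));
    }
}
```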
[jira] [Commented] (YARN-3601) Fix UT TestRMFailover.testRMWebAppRedirect
[ https://issues.apache.org/jira/browse/YARN-3601?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14549983#comment-14549983 ] Hadoop QA commented on YARN-3601: - \\ \\ | (/) *{color:green}+1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | pre-patch | 5m 10s | Pre-patch trunk compilation is healthy. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 1 new or modified test files. | | {color:green}+1{color} | javac | 7m 27s | There were no new javac warning messages. | | {color:green}+1{color} | release audit | 0m 19s | The applied patch does not increase the total number of release audit warnings. | | {color:green}+1{color} | checkstyle | 0m 33s | There were no new checkstyle issues. | | {color:green}+1{color} | whitespace | 0m 0s | The patch has no lines that end in whitespace. | | {color:green}+1{color} | install | 1m 31s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 32s | The patch built with eclipse:eclipse. | | {color:green}+1{color} | findbugs | 0m 42s | The patch does not introduce any new Findbugs (version 3.0.0) warnings. | | {color:green}+1{color} | yarn tests | 6m 50s | Tests passed in hadoop-yarn-client. 
| | | | 23m 9s | | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12733727/YARN-3601.001.patch | | Optional Tests | javac unit findbugs checkstyle | | git revision | trunk / 93972a3 | | hadoop-yarn-client test log | https://builds.apache.org/job/PreCommit-YARN-Build/7992/artifact/patchprocess/testrun_hadoop-yarn-client.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/7992/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf906.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/7992/console | This message was automatically generated. > Fix UT TestRMFailover.testRMWebAppRedirect > -- > > Key: YARN-3601 > URL: https://issues.apache.org/jira/browse/YARN-3601 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager, webapp > Environment: Red Hat Enterprise Linux Workstation release 6.5 > (Santiago) >Reporter: Weiwei Yang >Assignee: Weiwei Yang >Priority: Critical > Labels: test > Attachments: YARN-3601.001.patch > > > This test case was not working since the commit from YARN-2605. It failed > with NPE exception. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3646) Applications are getting stuck some times in case of retry policy forever
[ https://issues.apache.org/jira/browse/YARN-3646?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Raju Bairishetti updated YARN-3646: --- Attachment: YARN-3646.patch > Applications are getting stuck some times in case of retry policy forever > - > > Key: YARN-3646 > URL: https://issues.apache.org/jira/browse/YARN-3646 > Project: Hadoop YARN > Issue Type: Bug > Components: client >Reporter: Raju Bairishetti > Attachments: YARN-3646.patch > > > We have set *yarn.resourcemanager.connect.wait-ms* to -1 to use the FOREVER > retry policy. > The YARN client retries infinitely on exceptions from the RM because it is > using the FOREVER retry policy. The problem is that it retries for all kinds > of exceptions (like ApplicationNotFoundException), even when the exception is > not a connection failure. Due to this, my application does not progress further. > *The YARN client should not retry infinitely in case of non-connection failures.* > We have written a simple yarn-client which tries to get an application > report for an invalid or older appId. ResourceManager throws an > ApplicationNotFoundException as this is an invalid or older appId. But > because of the FOREVER retry policy, the client keeps on retrying to get the > application report and ResourceManager keeps throwing > ApplicationNotFoundException continuously. 
> {code} > private void testYarnClientRetryPolicy() throws Exception{ > YarnConfiguration conf = new YarnConfiguration(); > conf.setInt(YarnConfiguration.RESOURCEMANAGER_CONNECT_MAX_WAIT_MS, > -1); > YarnClient yarnClient = YarnClient.createYarnClient(); > yarnClient.init(conf); > yarnClient.start(); > ApplicationId appId = ApplicationId.newInstance(1430126768987L, > 10645); > ApplicationReport report = yarnClient.getApplicationReport(appId); > } > {code} > *RM logs:* > {noformat} > 15/05/14 16:33:24 INFO ipc.Server: IPC Server handler 21 on 8032, call > org.apache.hadoop.yarn.api.ApplicationClientProtocolPB.getApplicationReport > from 10.14.120.231:61621 Call#875162 Retry#0 > org.apache.hadoop.yarn.exceptions.ApplicationNotFoundException: Application > with id 'application_1430126768987_10645' doesn't exist in RM. > at > org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.getApplicationReport(ClientRMService.java:284) > at > org.apache.hadoop.yarn.api.impl.pb.service.ApplicationClientProtocolPBServiceImpl.getApplicationReport(ApplicationClientProtocolPBServiceImpl.java:145) > at > org.apache.hadoop.yarn.proto.ApplicationClientProtocol$ApplicationClientProtocolService$2.callBlockingMethod(ApplicationClientProtocol.java:321) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:585) > at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:928) > at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2013) > at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2009) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:415) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1556) > at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2007) > > 15/05/14 16:33:24 INFO ipc.Server: IPC Server handler 47 on 8032, call > org.apache.hadoop.yarn.api.ApplicationClientProtocolPB.getApplicationReport > from 
10.14.120.231:61621 Call#875163 Retry#0 > > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
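The behaviour the reporter asks for can be sketched in plain Java. This is an illustrative stand-in, not Hadoop's actual RetryPolicy API; the retry count is bounded only to keep the demo finite, whereas the RM case uses FOREVER:

```java
import java.net.ConnectException;
import java.util.concurrent.Callable;

public class SelectiveRetry {
    // Retry only on connection failures; any other exception (the analogue of
    // ApplicationNotFoundException) propagates immediately so a bad appId
    // fails fast instead of looping forever.
    static <T> T callWithRetry(Callable<T> call, int maxConnectRetries)
            throws Exception {
        int attempts = 0;
        while (true) {
            try {
                return call.call();
            } catch (ConnectException e) {
                // A connect failure is the only case worth retrying.
                if (++attempts > maxConnectRetries) {
                    throw e;
                }
            }
            // Exceptions other than ConnectException are never caught here,
            // so they escape the loop on the first attempt.
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(callWithRetry(() -> "report", 3));
    }
}
```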
[jira] [Updated] (YARN-3126) FairScheduler: queue's usedResource is always more than the maxResource limit
[ https://issues.apache.org/jira/browse/YARN-3126?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xia Hu updated YARN-3126: - Attachment: resourcelimit-test.patch Added a unit test for this patch. > FairScheduler: queue's usedResource is always more than the maxResource limit > - > > Key: YARN-3126 > URL: https://issues.apache.org/jira/browse/YARN-3126 > Project: Hadoop YARN > Issue Type: Bug > Components: fairscheduler >Affects Versions: 2.3.0 > Environment: hadoop2.3.0. fair scheduler. spark 1.1.0. >Reporter: Xia Hu > Labels: BB2015-05-TBR, assignContainer, fairscheduler, resources > Fix For: trunk-win > > Attachments: resourcelimit-02.patch, resourcelimit-test.patch, > resourcelimit.patch > > > When submitting a spark application (in both spark-on-yarn-cluster and > spark-on-yarn-client mode), the queue's usedResources assigned by > fairscheduler can always exceed the queue's maxResources limit. > From reading the fairscheduler code, I suppose this issue happens because the > requested resources are not checked when assigning a container. > Here is the detail: > 1. Choose a queue. In this process, it checks whether the queue's usedResource > is bigger than its max, with assignContainerPreCheck. > 2. Then choose an app in that queue. > 3. Then choose a container. And here is the problem: there is no check > whether this container would push the queue's resources over its max limit. If a > queue's usedResource is 13G and the maxResource limit is 16G, then a container > asking for 4G of resources may be assigned successfully. > This problem will always happen with spark applications, because we can ask for > different container resources in different applications. > By the way, I have already applied the patch from YARN-2083. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3126) FairScheduler: queue's usedResource is always more than the maxResource limit
[ https://issues.apache.org/jira/browse/YARN-3126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14550007#comment-14550007 ] Xia Hu commented on YARN-3126: -- I have just submitted a unit test; please review it again, thanks. > FairScheduler: queue's usedResource is always more than the maxResource limit > - > > Key: YARN-3126 > URL: https://issues.apache.org/jira/browse/YARN-3126 > Project: Hadoop YARN > Issue Type: Bug > Components: fairscheduler >Affects Versions: 2.3.0 > Environment: hadoop2.3.0. fair scheduler. spark 1.1.0. >Reporter: Xia Hu > Labels: BB2015-05-TBR, assignContainer, fairscheduler, resources > Fix For: trunk-win > > Attachments: resourcelimit-02.patch, resourcelimit-test.patch, > resourcelimit.patch > > > When submitting a spark application (in both spark-on-yarn-cluster and > spark-on-yarn-client mode), the queue's usedResources assigned by > fairscheduler can always exceed the queue's maxResources limit. > From reading the fairscheduler code, I suppose this issue happens because the > requested resources are not checked when assigning a container. > Here is the detail: > 1. Choose a queue. In this process, it checks whether the queue's usedResource > is bigger than its max, with assignContainerPreCheck. > 2. Then choose an app in that queue. > 3. Then choose a container. And here is the problem: there is no check > whether this container would push the queue's resources over its max limit. If a > queue's usedResource is 13G and the maxResource limit is 16G, then a container > asking for 4G of resources may be assigned successfully. > This problem will always happen with spark applications, because we can ask for > different container resources in different applications. > By the way, I have already applied the patch from YARN-2083. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
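The missing check described in step 3 of the report can be sketched as follows (illustrative names, not the actual FairScheduler API):

```java
public class QueueLimitCheck {
    // The check the reporter says is missing: would assigning this container
    // push the queue's usage past its maxResource?
    static boolean canAssign(long usedGb, long maxGb, long requestGb) {
        return usedGb + requestGb <= maxGb;
    }

    public static void main(String[] args) {
        // 13G used, 16G max: a 4G container must be rejected...
        System.out.println(canAssign(13, 16, 4)); // false
        // ...while a 2G container still fits.
        System.out.println(canAssign(13, 16, 2)); // true
    }
}
```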
[jira] [Commented] (YARN-3633) With Fair Scheduler, cluster can logjam when there are too many queues
[ https://issues.apache.org/jira/browse/YARN-3633?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14550008#comment-14550008 ] Rohit Agarwal commented on YARN-3633: - Other non-AM containers can be scheduled in the queue - unlike the maxAMShare limit, the fair share is not a hard limit. So, the FS will schedule non-AM containers in this queue when it cannot schedule AM containers in other queues. I gave a walkthrough in this comment: https://issues.apache.org/jira/browse/YARN-3633?focusedCommentId=14542895&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14542895 > With Fair Scheduler, cluster can logjam when there are too many queues > -- > > Key: YARN-3633 > URL: https://issues.apache.org/jira/browse/YARN-3633 > Project: Hadoop YARN > Issue Type: Bug > Components: fairscheduler >Affects Versions: 2.6.0 >Reporter: Rohit Agarwal >Assignee: Rohit Agarwal >Priority: Critical > Attachments: YARN-3633.patch > > > It's possible to logjam a cluster by submitting many applications at once in > different queues. > For example, let's say there is a cluster with 20GB of total memory. Let's > say 4 users submit applications at the same time. The fair share of each > queue is 5GB. Let's say that maxAMShare is 0.5. So, each queue has at most > 2.5GB memory for AMs. If all the users requested AMs of size 3GB - the > cluster logjams. Nothing gets scheduled even when 20GB of resources are > available. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-3677) Fix findbugs warnings in FileSystemRMStateStore.java
Akira AJISAKA created YARN-3677: --- Summary: Fix findbugs warnings in FileSystemRMStateStore.java Key: YARN-3677 URL: https://issues.apache.org/jira/browse/YARN-3677 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Reporter: Akira AJISAKA Priority: Minor There is 1 findbugs warning in FileSystemRMStateStore.java. {noformat} Inconsistent synchronization of FileSystemRMStateStore.isHDFS; locked 66% of time Unsynchronized access at FileSystemRMStateStore.java: [line 156] Field org.apache.hadoop.yarn.server.resourcemanager.recovery.FileSystemRMStateStore.isHDFS Synchronized 66% of the time Synchronized access at FileSystemRMStateStore.java: [line 148] Synchronized access at FileSystemRMStateStore.java: [line 859] {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3677) Fix findbugs warnings in FileSystemRMStateStore.java
[ https://issues.apache.org/jira/browse/YARN-3677?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14550015#comment-14550015 ] Akira AJISAKA commented on YARN-3677: - The setIsHDFS method should be synchronized. {code} @VisibleForTesting void setIsHDFS(boolean isHDFS) { this.isHDFS = isHDFS; } {code} It looks like this issue was caused by commit 9a2a95, but there is no issue id in the commit message. Hi [~vinodkv], could you point me to the JIRA related to that commit? > Fix findbugs warnings in FileSystemRMStateStore.java > > > Key: YARN-3677 > URL: https://issues.apache.org/jira/browse/YARN-3677 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Reporter: Akira AJISAKA >Priority: Minor > Labels: newbie > > There is 1 findbugs warning in FileSystemRMStateStore.java. > {noformat} > Inconsistent synchronization of FileSystemRMStateStore.isHDFS; locked 66% of > time > Unsynchronized access at FileSystemRMStateStore.java: [line 156] > Field > org.apache.hadoop.yarn.server.resourcemanager.recovery.FileSystemRMStateStore.isHDFS > Synchronized 66% of the time > Synchronized access at FileSystemRMStateStore.java: [line 148] > Synchronized access at FileSystemRMStateStore.java: [line 859] > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
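Akira's suggestion amounts to synchronizing the setter so that every access to isHDFS shares one monitor. A minimal stand-in class (not the real FileSystemRMStateStore) looks like:

```java
public class StateStoreSync {
    private boolean isHDFS;

    // Synchronizing the test-only setter addresses the "locked 66% of the
    // time" findbugs warning: reads and writes now use the same lock.
    synchronized void setIsHDFS(boolean isHDFS) {
        this.isHDFS = isHDFS;
    }

    synchronized boolean isHDFS() {
        return isHDFS;
    }

    public static void main(String[] args) {
        StateStoreSync store = new StateStoreSync();
        store.setIsHDFS(true);
        System.out.println(store.isHDFS()); // true
    }
}
```

An alternative with the same effect would be a `volatile` field, since the flag is a single boolean with no compound read-modify-write.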
[jira] [Commented] (YARN-3677) Fix findbugs warnings in FileSystemRMStateStore.java
[ https://issues.apache.org/jira/browse/YARN-3677?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14550051#comment-14550051 ] Tsuyoshi Ozawa commented on YARN-3677: -- [~ajisakaa] thank you for finding the issue. The commit message says that the contribution was done by [~asuresh]. If a JIRA has not been opened for it yet, I think we should revert the change - we should discuss this point. IMHO, we shouldn't switch the behaviour based on whether HDFS is used or not without a special reason. > Fix findbugs warnings in FileSystemRMStateStore.java > > > Key: YARN-3677 > URL: https://issues.apache.org/jira/browse/YARN-3677 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Reporter: Akira AJISAKA >Priority: Minor > Labels: newbie > > There is 1 findbugs warning in FileSystemRMStateStore.java. > {noformat} > Inconsistent synchronization of FileSystemRMStateStore.isHDFS; locked 66% of > time > Unsynchronized access at FileSystemRMStateStore.java: [line 156] > Field > org.apache.hadoop.yarn.server.resourcemanager.recovery.FileSystemRMStateStore.isHDFS > Synchronized 66% of the time > Synchronized access at FileSystemRMStateStore.java: [line 148] > Synchronized access at FileSystemRMStateStore.java: [line 859] > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3646) Applications are getting stuck some times in case of retry policy forever
[ https://issues.apache.org/jira/browse/YARN-3646?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14550092#comment-14550092 ] Hadoop QA commented on YARN-3646: - \\ \\ | (/) *{color:green}+1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | pre-patch | 14m 43s | Pre-patch trunk compilation is healthy. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 3 new or modified test files. | | {color:green}+1{color} | javac | 7m 37s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 9m 44s | There were no new javadoc warning messages. | | {color:green}+1{color} | release audit | 0m 23s | The applied patch does not increase the total number of release audit warnings. | | {color:green}+1{color} | checkstyle | 2m 1s | There were no new checkstyle issues. | | {color:green}+1{color} | whitespace | 0m 0s | The patch has no lines that end in whitespace. | | {color:green}+1{color} | install | 1m 35s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 32s | The patch built with eclipse:eclipse. | | {color:green}+1{color} | findbugs | 3m 2s | The patch does not introduce any new Findbugs (version 3.0.0) warnings. | | {color:green}+1{color} | common tests | 22m 17s | Tests passed in hadoop-common. | | {color:green}+1{color} | yarn tests | 1m 56s | Tests passed in hadoop-yarn-common. 
| | | | 63m 53s | | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12733743/YARN-3646.patch | | Optional Tests | javac unit findbugs checkstyle javadoc | | git revision | trunk / 93972a3 | | hadoop-common test log | https://builds.apache.org/job/PreCommit-YARN-Build/7994/artifact/patchprocess/testrun_hadoop-common.txt | | hadoop-yarn-common test log | https://builds.apache.org/job/PreCommit-YARN-Build/7994/artifact/patchprocess/testrun_hadoop-yarn-common.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/7994/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf907.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/7994/console | This message was automatically generated. > Applications are getting stuck some times in case of retry policy forever > - > > Key: YARN-3646 > URL: https://issues.apache.org/jira/browse/YARN-3646 > Project: Hadoop YARN > Issue Type: Bug > Components: client >Reporter: Raju Bairishetti > Attachments: YARN-3646.patch > > > We have set *yarn.resourcemanager.connect.wait-ms* to -1 to use FOREVER > retry policy. > Yarn client is infinitely retrying in case of exceptions from the RM as it is > using retrying policy as FOREVER. The problem is it is retrying for all kinds > of exceptions (like ApplicationNotFoundException), even though it is not a > connection failure. Due to this my application is not progressing further. > *Yarn client should not retry infinitely in case of non connection failures.* > We have written a simple yarn-client which is trying to get an application > report for an invalid or older appId. ResourceManager is throwing an > ApplicationNotFoundException as this is an invalid or older appId. 
But > because of retry policy FOREVER, client is keep on retrying for getting the > application report and ResourceManager is throwing > ApplicationNotFoundException continuously. > {code} > private void testYarnClientRetryPolicy() throws Exception{ > YarnConfiguration conf = new YarnConfiguration(); > conf.setInt(YarnConfiguration.RESOURCEMANAGER_CONNECT_MAX_WAIT_MS, > -1); > YarnClient yarnClient = YarnClient.createYarnClient(); > yarnClient.init(conf); > yarnClient.start(); > ApplicationId appId = ApplicationId.newInstance(1430126768987L, > 10645); > ApplicationReport report = yarnClient.getApplicationReport(appId); > } > {code} > *RM logs:* > {noformat} > 15/05/14 16:33:24 INFO ipc.Server: IPC Server handler 21 on 8032, call > org.apache.hadoop.yarn.api.ApplicationClientProtocolPB.getApplicationReport > from 10.14.120.231:61621 Call#875162 Retry#0 > org.apache.hadoop.yarn.exceptions.ApplicationNotFoundException: Application > with id 'application_1430126768987_10645' doesn't exist in RM. > at > org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.getApplicationReport(ClientRMService.java:284) > at > org.apache.hadoop.yarn.api.impl.pb.service.ApplicationClientProtocolPBServiceImpl.getApplicationRepo
[jira] [Commented] (YARN-3126) FairScheduler: queue's usedResource is always more than the maxResource limit
[ https://issues.apache.org/jira/browse/YARN-3126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14550119#comment-14550119 ] Hadoop QA commented on YARN-3126: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | pre-patch | 5m 19s | Pre-patch trunk compilation is healthy. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 1 new or modified test files. | | {color:green}+1{color} | javac | 7m 31s | There were no new javac warning messages. | | {color:green}+1{color} | release audit | 0m 20s | The applied patch does not increase the total number of release audit warnings. | | {color:green}+1{color} | checkstyle | 0m 44s | There were no new checkstyle issues. | | {color:red}-1{color} | whitespace | 0m 0s | The patch has 3 line(s) that end in whitespace. Use git apply --whitespace=fix. | | {color:green}+1{color} | install | 1m 31s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 31s | The patch built with eclipse:eclipse. | | {color:red}-1{color} | findbugs | 1m 16s | The patch appears to introduce 1 new Findbugs (version 3.0.0) warnings. | | {color:red}-1{color} | yarn tests | 60m 19s | Tests failed in hadoop-yarn-server-resourcemanager. 
| | | | 77m 34s | | \\ \\ || Reason || Tests || | FindBugs | module:hadoop-yarn-server-resourcemanager | | | Inconsistent synchronization of org.apache.hadoop.yarn.server.resourcemanager.recovery.FileSystemRMStateStore.isHDFS; locked 66% of time Unsynchronized access at FileSystemRMStateStore.java:66% of time Unsynchronized access at FileSystemRMStateStore.java:[line 156] | | Timed out tests | org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestNodeLabelContainerAllocation | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12733746/resourcelimit-test.patch | | Optional Tests | javac unit findbugs checkstyle | | git revision | trunk / 93972a3 | | whitespace | https://builds.apache.org/job/PreCommit-YARN-Build/7993/artifact/patchprocess/whitespace.txt | | Findbugs warnings | https://builds.apache.org/job/PreCommit-YARN-Build/7993/artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html | | hadoop-yarn-server-resourcemanager test log | https://builds.apache.org/job/PreCommit-YARN-Build/7993/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/7993/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf905.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/7993/console | This message was automatically generated. > FairScheduler: queue's usedResource is always more than the maxResource limit > - > > Key: YARN-3126 > URL: https://issues.apache.org/jira/browse/YARN-3126 > Project: Hadoop YARN > Issue Type: Bug > Components: fairscheduler >Affects Versions: 2.3.0 > Environment: hadoop2.3.0. fair scheduler. spark 1.1.0. 
>Reporter: Xia Hu > Labels: BB2015-05-TBR, assignContainer, fairscheduler, resources > Fix For: trunk-win > > Attachments: resourcelimit-02.patch, resourcelimit-test.patch, > resourcelimit.patch > > > When submitting a spark application (in both spark-on-yarn-cluster and > spark-on-yarn-client mode), the queue's usedResources assigned by > fairscheduler can always exceed the queue's maxResources limit. > From reading the fairscheduler code, I suppose this issue happens because the > requested resources are not checked when assigning a container. > Here is the detail: > 1. Choose a queue. In this process, it checks whether the queue's usedResource > is bigger than its max, with assignContainerPreCheck. > 2. Then choose an app in that queue. > 3. Then choose a container. And here is the problem: there is no check > whether this container would push the queue's resources over its max limit. If a > queue's usedResource is 13G and the maxResource limit is 16G, then a container > asking for 4G of resources may be assigned successfully. > This problem will always happen with spark applications, because we can ask for > different container resources in different applications. > By the way, I have already applied the patch from YARN-2083. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2821) Distributed shell app master becomes unresponsive sometimes
[ https://issues.apache.org/jira/browse/YARN-2821?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Varun Vasudev updated YARN-2821: Attachment: YARN-2821.005.patch Uploaded 005.patch which adds the tests requested by [~jianhe]. > Distributed shell app master becomes unresponsive sometimes > --- > > Key: YARN-2821 > URL: https://issues.apache.org/jira/browse/YARN-2821 > Project: Hadoop YARN > Issue Type: Bug > Components: applications/distributed-shell >Affects Versions: 2.5.1 >Reporter: Varun Vasudev >Assignee: Varun Vasudev > Attachments: YARN-2821.002.patch, YARN-2821.003.patch, > YARN-2821.004.patch, YARN-2821.005.patch, apache-yarn-2821.0.patch, > apache-yarn-2821.1.patch > > > We've noticed that once in a while the distributed shell app master becomes > unresponsive and is eventually killed by the RM. snippet of the logs - > {noformat} > 14/11/04 18:21:37 INFO distributedshell.ApplicationMaster: > appattempt_1415123350094_0017_01 received 0 previous attempts' running > containers on AM registration. 
> 14/11/04 18:21:37 INFO distributedshell.ApplicationMaster: Requested > container ask: Capability[]Priority[0] > 14/11/04 18:21:37 INFO distributedshell.ApplicationMaster: Requested > container ask: Capability[]Priority[0] > 14/11/04 18:21:37 INFO distributedshell.ApplicationMaster: Requested > container ask: Capability[]Priority[0] > 14/11/04 18:21:37 INFO distributedshell.ApplicationMaster: Requested > container ask: Capability[]Priority[0] > 14/11/04 18:21:37 INFO distributedshell.ApplicationMaster: Requested > container ask: Capability[]Priority[0] > 14/11/04 18:21:38 INFO impl.AMRMClientImpl: Received new token for : > onprem-tez2:45454 > 14/11/04 18:21:38 INFO distributedshell.ApplicationMaster: Got response from > RM for container ask, allocatedCnt=1 > 14/11/04 18:21:38 INFO distributedshell.ApplicationMaster: Launching shell > command on a new container., > containerId=container_1415123350094_0017_01_02, > containerNode=onprem-tez2:45454, containerNodeURI=onprem-tez2:50060, > containerResourceMemory1024, containerResourceVirtualCores1 > 14/11/04 18:21:38 INFO distributedshell.ApplicationMaster: Setting up > container launch container for > containerid=container_1415123350094_0017_01_02 > 14/11/04 18:21:39 INFO impl.NMClientAsyncImpl: Processing Event EventType: > START_CONTAINER for Container container_1415123350094_0017_01_02 > 14/11/04 18:21:39 INFO impl.ContainerManagementProtocolProxy: Opening proxy : > onprem-tez2:45454 > 14/11/04 18:21:39 INFO impl.NMClientAsyncImpl: Processing Event EventType: > QUERY_CONTAINER for Container container_1415123350094_0017_01_02 > 14/11/04 18:21:39 INFO impl.ContainerManagementProtocolProxy: Opening proxy : > onprem-tez2:45454 > 14/11/04 18:21:39 INFO impl.AMRMClientImpl: Received new token for : > onprem-tez3:45454 > 14/11/04 18:21:39 INFO impl.AMRMClientImpl: Received new token for : > onprem-tez4:45454 > 14/11/04 18:21:39 INFO distributedshell.ApplicationMaster: Got response from > RM for container ask, 
allocatedCnt=3 > 14/11/04 18:21:39 INFO distributedshell.ApplicationMaster: Launching shell > command on a new container., > containerId=container_1415123350094_0017_01_03, > containerNode=onprem-tez2:45454, containerNodeURI=onprem-tez2:50060, > containerResourceMemory1024, containerResourceVirtualCores1 > 14/11/04 18:21:39 INFO distributedshell.ApplicationMaster: Launching shell > command on a new container., > containerId=container_1415123350094_0017_01_04, > containerNode=onprem-tez3:45454, containerNodeURI=onprem-tez3:50060, > containerResourceMemory1024, containerResourceVirtualCores1 > 14/11/04 18:21:39 INFO distributedshell.ApplicationMaster: Launching shell > command on a new container., > containerId=container_1415123350094_0017_01_05, > containerNode=onprem-tez4:45454, containerNodeURI=onprem-tez4:50060, > containerResourceMemory1024, containerResourceVirtualCores1 > 14/11/04 18:21:39 INFO distributedshell.ApplicationMaster: Setting up > container launch container for > containerid=container_1415123350094_0017_01_03 > 14/11/04 18:21:39 INFO distributedshell.ApplicationMaster: Setting up > container launch container for > containerid=container_1415123350094_0017_01_05 > 14/11/04 18:21:39 INFO distributedshell.ApplicationMaster: Setting up > container launch container for > containerid=container_1415123350094_0017_01_04 > 14/11/04 18:21:39 INFO impl.NMClientAsyncImpl: Processing Event EventType: > START_CONTAINER for Container container_1415123350094_0017_01_05 > 14/11/04 18:21:39 INFO impl.NMClientAsyncImpl: Processing Event EventType: > START_CONTAINER for Container container_1415123350094_0017_01_03 > 14/11/04 18:21:39 INFO impl.Contai
[jira] [Updated] (YARN-41) The RM should handle the graceful shutdown of the NM.
[ https://issues.apache.org/jira/browse/YARN-41?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Devaraj K updated YARN-41: -- Attachment: YARN-41-5.patch I am attaching a patch rebased on the latest source code, with the above comments addressed. > The RM should handle the graceful shutdown of the NM. > - > > Key: YARN-41 > URL: https://issues.apache.org/jira/browse/YARN-41 > Project: Hadoop YARN > Issue Type: New Feature > Components: nodemanager, resourcemanager >Reporter: Ravi Teja Ch N V >Assignee: Devaraj K > Attachments: MAPREDUCE-3494.1.patch, MAPREDUCE-3494.2.patch, > MAPREDUCE-3494.patch, YARN-41-1.patch, YARN-41-2.patch, YARN-41-3.patch, > YARN-41-4.patch, YARN-41.patch > > > Instead of waiting for the NM expiry, the RM should remove and handle an NM > that is shut down gracefully. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
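The behavior YARN-41 asks for can be sketched as a tiny standalone model, assuming nothing about the real RM internals (`NodeTracker` and its methods are invented names): on an explicit shutdown notification, the node is removed immediately instead of lingering until the liveness-monitor expiry.

```java
import java.util.HashSet;
import java.util.Set;

// Illustrative-only sketch, not the Hadoop API: the RM-side bookkeeping reacts
// to a graceful NM unregister right away rather than waiting for expiry.
final class NodeTracker {
    private final Set<String> activeNodes = new HashSet<>();

    void register(String nodeId) { activeNodes.add(nodeId); }

    // Called when the NM announces a graceful shutdown.
    void unregister(String nodeId) { activeNodes.remove(nodeId); }

    boolean isActive(String nodeId) { return activeNodes.contains(nodeId); }
}
```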
[jira] [Commented] (YARN-2821) Distributed shell app master becomes unresponsive sometimes
[ https://issues.apache.org/jira/browse/YARN-2821?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14550216#comment-14550216 ] Hadoop QA commented on YARN-2821: - \\ \\ | (/) *{color:green}+1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | pre-patch | 14m 40s | Pre-patch trunk compilation is healthy. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 1 new or modified test files. | | {color:green}+1{color} | javac | 7m 36s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 9m 37s | There were no new javadoc warning messages. | | {color:green}+1{color} | release audit | 0m 23s | The applied patch does not increase the total number of release audit warnings. | | {color:green}+1{color} | checkstyle | 0m 18s | There were no new checkstyle issues. | | {color:green}+1{color} | whitespace | 0m 1s | The patch has no lines that end in whitespace. | | {color:green}+1{color} | install | 1m 33s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 33s | The patch built with eclipse:eclipse. | | {color:green}+1{color} | findbugs | 0m 35s | The patch does not introduce any new Findbugs (version 3.0.0) warnings. | | {color:green}+1{color} | yarn tests | 6m 56s | Tests passed in hadoop-yarn-applications-distributedshell. 
| | | | 42m 15s | | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12733765/YARN-2821.005.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / eb4c9dd | | hadoop-yarn-applications-distributedshell test log | https://builds.apache.org/job/PreCommit-YARN-Build/7995/artifact/patchprocess/testrun_hadoop-yarn-applications-distributedshell.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/7995/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf905.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/7995/console | This message was automatically generated.
[jira] [Commented] (YARN-3646) Applications are getting stuck some times in case of retry policy forever
[ https://issues.apache.org/jira/browse/YARN-3646?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14550233#comment-14550233 ] Rohith commented on YARN-3646: -- bq. Seems we do not even require exceptionToPolicy for FOREVER policy if we catch the exception in shouldRetry method. Makes sense to me; will review the patch, thanks. > Applications are getting stuck some times in case of retry policy forever > - > > Key: YARN-3646 > URL: https://issues.apache.org/jira/browse/YARN-3646 > Project: Hadoop YARN > Issue Type: Bug > Components: client >Reporter: Raju Bairishetti > Attachments: YARN-3646.patch > > > We have set *yarn.resourcemanager.connect.wait-ms* to -1 to use the FOREVER > retry policy. > The YARN client retries infinitely on exceptions from the RM because it uses > the FOREVER retry policy. The problem is that it retries on all kinds of > exceptions (like ApplicationNotFoundException), even when the failure is not > a connection failure. Because of this, my application does not progress > further. > *The YARN client should not retry infinitely on non-connection failures.* > We wrote a simple YARN client that tries to get an application report for an > invalid or old appId. The ResourceManager throws an > ApplicationNotFoundException since the appId is invalid or old. But because > of the FOREVER retry policy, the client keeps retrying to get the > application report, and the ResourceManager keeps throwing > ApplicationNotFoundException continuously. 
> {code} > private void testYarnClientRetryPolicy() throws Exception{ > YarnConfiguration conf = new YarnConfiguration(); > conf.setInt(YarnConfiguration.RESOURCEMANAGER_CONNECT_MAX_WAIT_MS, > -1); > YarnClient yarnClient = YarnClient.createYarnClient(); > yarnClient.init(conf); > yarnClient.start(); > ApplicationId appId = ApplicationId.newInstance(1430126768987L, > 10645); > ApplicationReport report = yarnClient.getApplicationReport(appId); > } > {code} > *RM logs:* > {noformat} > 15/05/14 16:33:24 INFO ipc.Server: IPC Server handler 21 on 8032, call > org.apache.hadoop.yarn.api.ApplicationClientProtocolPB.getApplicationReport > from 10.14.120.231:61621 Call#875162 Retry#0 > org.apache.hadoop.yarn.exceptions.ApplicationNotFoundException: Application > with id 'application_1430126768987_10645' doesn't exist in RM. > at > org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.getApplicationReport(ClientRMService.java:284) > at > org.apache.hadoop.yarn.api.impl.pb.service.ApplicationClientProtocolPBServiceImpl.getApplicationReport(ApplicationClientProtocolPBServiceImpl.java:145) > at > org.apache.hadoop.yarn.proto.ApplicationClientProtocol$ApplicationClientProtocolService$2.callBlockingMethod(ApplicationClientProtocol.java:321) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:585) > at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:928) > at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2013) > at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2009) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:415) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1556) > at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2007) > > 15/05/14 16:33:24 INFO ipc.Server: IPC Server handler 47 on 8032, call > org.apache.hadoop.yarn.api.ApplicationClientProtocolPB.getApplicationReport > from 
10.14.120.231:61621 Call#875163 Retry#0 > > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
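The fix under discussion can be sketched as a standalone model, assuming nothing about the real `org.apache.hadoop.io.retry.RetryPolicy` interface (the class and method names here are invented): a "retry forever" policy should only retry connection-level failures and surface application-level exceptions such as ApplicationNotFoundException to the caller.

```java
import java.io.EOFException;
import java.net.ConnectException;
import java.net.NoRouteToHostException;
import java.net.SocketTimeoutException;

// Hypothetical sketch of the proposed behavior, not the actual Hadoop code:
// retry indefinitely only while the RM is unreachable.
final class RetryForeverOnConnectionFailures {
    boolean shouldRetry(Exception e) {
        // Connection-level failures: keep retrying.
        // Anything else (e.g. ApplicationNotFoundException): give up.
        return e instanceof ConnectException
            || e instanceof NoRouteToHostException
            || e instanceof SocketTimeoutException
            || e instanceof EOFException;
    }
}
```

Under this policy, the client in the snippet above would fail fast with ApplicationNotFoundException instead of looping against the RM.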
[jira] [Commented] (YARN-3646) Applications are getting stuck some times in case of retry policy forever
[ https://issues.apache.org/jira/browse/YARN-3646?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14550256#comment-14550256 ] Rohith commented on YARN-3646: -- Thanks for working on this issue. The patch overall looks good to me. nit: Can the test be moved to the YARN package, since the issue is in YARN? Otherwise, if there is any change in RMProxy, the test will not run. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3646) Applications are getting stuck some times in case of retry policy forever
[ https://issues.apache.org/jira/browse/YARN-3646?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14550258#comment-14550258 ] Rohith commented on YARN-3646: -- I also verified this on a one-node cluster by enabling and disabling the retry-forever policy. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3543) ApplicationReport should be able to tell whether the Application is AM managed or not.
[ https://issues.apache.org/jira/browse/YARN-3543?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rohith updated YARN-3543: - Attachment: 0004-YARN-3543.patch > ApplicationReport should be able to tell whether the Application is AM > managed or not. > --- > > Key: YARN-3543 > URL: https://issues.apache.org/jira/browse/YARN-3543 > Project: Hadoop YARN > Issue Type: Improvement > Components: api >Affects Versions: 2.6.0 >Reporter: Spandan Dutta >Assignee: Rohith > Labels: BB2015-05-TBR > Attachments: 0001-YARN-3543.patch, 0001-YARN-3543.patch, > 0002-YARN-3543.patch, 0002-YARN-3543.patch, 0003-YARN-3543.patch, > 0003-YARN-3543.patch, 0004-YARN-3543.patch, YARN-3543-AH.PNG, YARN-3543-RM.PNG > > > Currently we can know whether the application submitted by the user is AM > managed from the applicationSubmissionContext. This can be only done at the > time when the user submits the job. We should have access to this info from > the ApplicationReport as well so that we can check whether an app is AM > managed or not anytime. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
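The change YARN-3543 proposes can be illustrated with a minimal standalone sketch (class and method names are assumptions, not the real ApplicationReport API): the unmanaged-AM flag, known at submission time from ApplicationSubmissionContext, is carried on the report so callers can query it at any time.

```java
// Illustrative-only sketch of the proposed API shape, not actual Hadoop code.
final class AppReportSketch {
    private final String applicationId;
    private final boolean unmanagedAM; // copied from the submission context

    AppReportSketch(String applicationId, boolean unmanagedAM) {
        this.applicationId = applicationId;
        this.unmanagedAM = unmanagedAM;
    }

    String getApplicationId() { return applicationId; }

    // The new accessor: whether the app runs with an unmanaged AM.
    boolean isUnmanagedApp() { return unmanagedAM; }
}
```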
[jira] [Commented] (YARN-3541) Add version info on timeline service / generic history web UI and REST API
[ https://issues.apache.org/jira/browse/YARN-3541?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14550268#comment-14550268 ] Hudson commented on YARN-3541: -- FAILURE: Integrated in Hadoop-Yarn-trunk #932 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/932/]) YARN-3541. Add version info on timeline service / generic history web UI and REST API. Contributed by Zhijie Shen (xgong: rev 76afd28862c1f27011273659a82cd45903a77170) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/records/timeline/TimelineAbout.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/test/java/org/apache/hadoop/yarn/server/applicationhistoryservice/webapp/TestAHSWebApp.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/timeline/webapp/TimelineWebServices.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/test/java/org/apache/hadoop/yarn/server/applicationhistoryservice/webapp/TestAHSWebServices.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/test/java/org/apache/hadoop/yarn/server/timeline/webapp/TestTimelineWebServices.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/markdown/TimelineServer.md * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/applicationhistoryservice/webapp/AHSWebApp.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/applicationhistoryservice/webapp/AboutPage.java * 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/applicationhistoryservice/webapp/NavBlock.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/applicationhistoryservice/webapp/AboutBlock.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/applicationhistoryservice/webapp/AHSController.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/util/timeline/TimelineUtils.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/applicationhistoryservice/webapp/AHSWebServices.java > Add version info on timeline service / generic history web UI and REST API > -- > > Key: YARN-3541 > URL: https://issues.apache.org/jira/browse/YARN-3541 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Zhijie Shen >Assignee: Zhijie Shen > Fix For: 2.8.0 > > Attachments: YARN-3541.1.patch, YARN-3541.2.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3646) Applications are getting stuck some times in case of retry policy forever
[ https://issues.apache.org/jira/browse/YARN-3646?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14550288#comment-14550288 ] Raju Bairishetti commented on YARN-3646: Thanks [~rohithsharma] for the review. Looks like it is mainly an issue with the retry policy. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3541) Add version info on timeline service / generic history web UI and REST API
[ https://issues.apache.org/jira/browse/YARN-3541?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14550299#comment-14550299 ] Hudson commented on YARN-3541: -- SUCCESS: Integrated in Hadoop-Yarn-trunk-Java8 #201 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/201/]) YARN-3541. Add version info on timeline service / generic history web UI and REST API. Contributed by Zhijie Shen (xgong: rev 76afd28862c1f27011273659a82cd45903a77170) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/markdown/TimelineServer.md * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/applicationhistoryservice/webapp/AboutBlock.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/applicationhistoryservice/webapp/AboutPage.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/records/timeline/TimelineAbout.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/timeline/webapp/TimelineWebServices.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/test/java/org/apache/hadoop/yarn/server/timeline/webapp/TestTimelineWebServices.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/applicationhistoryservice/webapp/AHSWebApp.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/test/java/org/apache/hadoop/yarn/server/applicationhistoryservice/webapp/TestAHSWebServices.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/util/timeline/TimelineUtils.java * hadoop-yarn-project/CHANGES.txt * 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/applicationhistoryservice/webapp/AHSController.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/applicationhistoryservice/webapp/NavBlock.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/applicationhistoryservice/webapp/AHSWebServices.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/test/java/org/apache/hadoop/yarn/server/applicationhistoryservice/webapp/TestAHSWebApp.java > Add version info on timeline service / generic history web UI and REST API > -- > > Key: YARN-3541 > URL: https://issues.apache.org/jira/browse/YARN-3541 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Zhijie Shen >Assignee: Zhijie Shen > Fix For: 2.8.0 > > Attachments: YARN-3541.1.patch, YARN-3541.2.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-41) The RM should handle the graceful shutdown of the NM.
[ https://issues.apache.org/jira/browse/YARN-41?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14550314#comment-14550314 ] Hadoop QA commented on YARN-41: --- \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | pre-patch | 15m 10s | Pre-patch trunk compilation is healthy. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 9 new or modified test files. | | {color:green}+1{color} | javac | 7m 40s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 9m 40s | There were no new javadoc warning messages. | | {color:green}+1{color} | release audit | 0m 22s | The applied patch does not increase the total number of release audit warnings. | | {color:red}-1{color} | checkstyle | 1m 30s | The applied patch generated 18 new checkstyle issues (total was 15, now 33). | | {color:green}+1{color} | whitespace | 0m 15s | The patch has no lines that end in whitespace. | | {color:green}+1{color} | install | 1m 37s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 33s | The patch built with eclipse:eclipse. | | {color:red}-1{color} | findbugs | 3m 45s | The patch appears to introduce 1 new Findbugs (version 3.0.0) warnings. | | {color:green}+1{color} | yarn tests | 0m 24s | Tests passed in hadoop-yarn-server-common. | | {color:green}+1{color} | yarn tests | 5m 57s | Tests passed in hadoop-yarn-server-nodemanager. | | {color:green}+1{color} | yarn tests | 49m 59s | Tests passed in hadoop-yarn-server-resourcemanager. | | {color:green}+1{color} | yarn tests | 1m 53s | Tests passed in hadoop-yarn-server-tests. 
| | | | 99m 18s | | \\ \\ || Reason || Tests || | FindBugs | module:hadoop-yarn-server-resourcemanager | | | Inconsistent synchronization of org.apache.hadoop.yarn.server.resourcemanager.recovery.FileSystemRMStateStore.isHDFS; locked 66% of time Unsynchronized access at FileSystemRMStateStore.java:66% of time Unsynchronized access at FileSystemRMStateStore.java:[line 156] | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12733771/YARN-41-5.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / eb4c9dd | | checkstyle | https://builds.apache.org/job/PreCommit-YARN-Build/7996/artifact/patchprocess/diffcheckstylehadoop-yarn-server-common.txt | | Findbugs warnings | https://builds.apache.org/job/PreCommit-YARN-Build/7996/artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html | | hadoop-yarn-server-common test log | https://builds.apache.org/job/PreCommit-YARN-Build/7996/artifact/patchprocess/testrun_hadoop-yarn-server-common.txt | | hadoop-yarn-server-nodemanager test log | https://builds.apache.org/job/PreCommit-YARN-Build/7996/artifact/patchprocess/testrun_hadoop-yarn-server-nodemanager.txt | | hadoop-yarn-server-resourcemanager test log | https://builds.apache.org/job/PreCommit-YARN-Build/7996/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt | | hadoop-yarn-server-tests test log | https://builds.apache.org/job/PreCommit-YARN-Build/7996/artifact/patchprocess/testrun_hadoop-yarn-server-tests.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/7996/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf903.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/7996/console | This message was automatically generated. > The RM should handle the graceful shutdown of the NM. 
> - > > Key: YARN-41 > URL: https://issues.apache.org/jira/browse/YARN-41 > Project: Hadoop YARN > Issue Type: New Feature > Components: nodemanager, resourcemanager >Reporter: Ravi Teja Ch N V >Assignee: Devaraj K > Attachments: MAPREDUCE-3494.1.patch, MAPREDUCE-3494.2.patch, > MAPREDUCE-3494.patch, YARN-41-1.patch, YARN-41-2.patch, YARN-41-3.patch, > YARN-41-4.patch, YARN-41-5.patch, YARN-41.patch > > > Instead of waiting for the NM expiry, RM should remove and handle the NM, > which is shutdown gracefully. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-41) The RM should handle the graceful shutdown of the NM.
[ https://issues.apache.org/jira/browse/YARN-41?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Devaraj K updated YARN-41: -- Attachment: (was: YARN-41-5.patch) > The RM should handle the graceful shutdown of the NM. > - > > Key: YARN-41 > URL: https://issues.apache.org/jira/browse/YARN-41 > Project: Hadoop YARN > Issue Type: New Feature > Components: nodemanager, resourcemanager >Reporter: Ravi Teja Ch N V >Assignee: Devaraj K > Attachments: MAPREDUCE-3494.1.patch, MAPREDUCE-3494.2.patch, > MAPREDUCE-3494.patch, YARN-41-1.patch, YARN-41-2.patch, YARN-41-3.patch, > YARN-41-4.patch, YARN-41.patch > > > Instead of waiting for the NM expiry, RM should remove and handle the NM, > which is shutdown gracefully. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-41) The RM should handle the graceful shutdown of the NM.
[ https://issues.apache.org/jira/browse/YARN-41?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Devaraj K updated YARN-41: -- Attachment: YARN-41-5.patch > The RM should handle the graceful shutdown of the NM. > - > > Key: YARN-41 > URL: https://issues.apache.org/jira/browse/YARN-41 > Project: Hadoop YARN > Issue Type: New Feature > Components: nodemanager, resourcemanager >Reporter: Ravi Teja Ch N V >Assignee: Devaraj K > Attachments: MAPREDUCE-3494.1.patch, MAPREDUCE-3494.2.patch, > MAPREDUCE-3494.patch, YARN-41-1.patch, YARN-41-2.patch, YARN-41-3.patch, > YARN-41-4.patch, YARN-41-5.patch, YARN-41.patch > > > Instead of waiting for the NM expiry, RM should remove and handle the NM, > which is shutdown gracefully. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3630) YARN should suggest a heartbeat interval for applications
[ https://issues.apache.org/jira/browse/YARN-3630?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xianyin Xin updated YARN-3630: -- Attachment: YARN-3630.001.patch.patch Initial patch, with the adaptive heartbeat policy left unimplemented. If we decide to implement a good enough adaptive heartbeat policy, this jira would depend on YARN-3652, which gives us enough information about the scheduler's load to determine the heartbeat interval. > YARN should suggest a heartbeat interval for applications > - > > Key: YARN-3630 > URL: https://issues.apache.org/jira/browse/YARN-3630 > Project: Hadoop YARN > Issue Type: Improvement > Components: resourcemanager, scheduler >Affects Versions: 2.7.0 >Reporter: Zoltán Zvara >Assignee: Xianyin Xin >Priority: Minor > Attachments: YARN-3630.001.patch.patch > > > It seems that currently applications - for example Spark - are not adaptive to the RM > regarding heartbeat intervals. The RM should be able to suggest a desired > heartbeat interval to applications. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
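As a rough illustration of what such an adaptive policy might look like, the sketch below scales a suggested heartbeat interval with a normalized scheduler-load figure (the kind of information YARN-3652 is expected to provide). All class and method names here are hypothetical, not code from the attached patch:

```java
// Hypothetical sketch of an adaptive heartbeat policy: the RM suggests an
// interval that grows linearly with scheduler load, clamped to a range.
// None of these names are actual YARN APIs.
public class HeartbeatIntervalPolicy {
    private final long minIntervalMs;
    private final long maxIntervalMs;

    public HeartbeatIntervalPolicy(long minIntervalMs, long maxIntervalMs) {
        this.minIntervalMs = minIntervalMs;
        this.maxIntervalMs = maxIntervalMs;
    }

    /** @param load normalized scheduler load, expected in [0, 1]. */
    public long suggestIntervalMs(double load) {
        double clamped = Math.max(0.0, Math.min(1.0, load));
        // Lightly loaded scheduler -> short interval; heavily loaded -> long.
        return minIntervalMs + (long) (clamped * (maxIntervalMs - minIntervalMs));
    }

    public static void main(String[] args) {
        HeartbeatIntervalPolicy policy = new HeartbeatIntervalPolicy(1000, 10000);
        System.out.println(policy.suggestIntervalMs(0.0)); // lightly loaded cluster
        System.out.println(policy.suggestIntervalMs(1.0)); // heavily loaded cluster
    }
}
```

The suggested interval would be piggybacked on the allocate response, leaving the AM free to ignore it.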
[jira] [Updated] (YARN-41) The RM should handle the graceful shutdown of the NM.
[ https://issues.apache.org/jira/browse/YARN-41?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Devaraj K updated YARN-41: -- Attachment: YARN-41-6.patch Updated the patch checkstyle fixes. > The RM should handle the graceful shutdown of the NM. > - > > Key: YARN-41 > URL: https://issues.apache.org/jira/browse/YARN-41 > Project: Hadoop YARN > Issue Type: New Feature > Components: nodemanager, resourcemanager >Reporter: Ravi Teja Ch N V >Assignee: Devaraj K > Attachments: MAPREDUCE-3494.1.patch, MAPREDUCE-3494.2.patch, > MAPREDUCE-3494.patch, YARN-41-1.patch, YARN-41-2.patch, YARN-41-3.patch, > YARN-41-4.patch, YARN-41-5.patch, YARN-41-6.patch, YARN-41.patch > > > Instead of waiting for the NM expiry, RM should remove and handle the NM, > which is shutdown gracefully. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Comment Edited] (YARN-41) The RM should handle the graceful shutdown of the NM.
[ https://issues.apache.org/jira/browse/YARN-41?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14550380#comment-14550380 ] Devaraj K edited comment on YARN-41 at 5/19/15 12:53 PM: - Updated the patch with checkstyle fixes. was (Author: devaraj.k): Updated the patch checkstyle fixes. > The RM should handle the graceful shutdown of the NM. > - > > Key: YARN-41 > URL: https://issues.apache.org/jira/browse/YARN-41 > Project: Hadoop YARN > Issue Type: New Feature > Components: nodemanager, resourcemanager >Reporter: Ravi Teja Ch N V >Assignee: Devaraj K > Attachments: MAPREDUCE-3494.1.patch, MAPREDUCE-3494.2.patch, > MAPREDUCE-3494.patch, YARN-41-1.patch, YARN-41-2.patch, YARN-41-3.patch, > YARN-41-4.patch, YARN-41-5.patch, YARN-41-6.patch, YARN-41.patch > > > Instead of waiting for the NM expiry, RM should remove and handle the NM, > which is shutdown gracefully. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3591) Resource Localisation on a bad disk causes subsequent containers failure
[ https://issues.apache.org/jira/browse/YARN-3591?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lavkesh Lahngir updated YARN-3591: -- Target Version/s: 2.8.0 (was: 2.7.1) Affects Version/s: (was: 2.6.0) 2.7.0 > Resource Localisation on a bad disk causes subsequent containers failure > - > > Key: YARN-3591 > URL: https://issues.apache.org/jira/browse/YARN-3591 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 2.7.0 >Reporter: Lavkesh Lahngir >Assignee: Lavkesh Lahngir > Attachments: 0001-YARN-3591.1.patch, 0001-YARN-3591.patch, > YARN-3591.2.patch > > > It happens when a resource has been localised on a disk and, after localisation, that > disk has gone bad. The NM keeps paths for localised resources in memory. At the > time of a resource request, isResourcePresent(rsrc) will be called, which calls > file.exists() on the localised path. > In some cases when the disk has gone bad, inodes are still cached and > file.exists() returns true, but at the time of reading, the file will not open. > Note: file.exists() actually calls stat64 natively, which returns true because > it was able to find inode information from the OS. > A proposal is to call file.list() on the parent path of the resource, which > will call open() natively. If the disk is good it should return an array of > paths with length at least 1. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (YARN-3605) _ as method name may not be supported much longer
[ https://issues.apache.org/jira/browse/YARN-3605?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Devaraj K reassigned YARN-3605: --- Assignee: Devaraj K > _ as method name may not be supported much longer > - > > Key: YARN-3605 > URL: https://issues.apache.org/jira/browse/YARN-3605 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Robert Joseph Evans >Assignee: Devaraj K > > I was trying to run the precommit test on my mac under JDK8, and I got the > following error related to javadocs. > > "(use of '_' as an identifier might not be supported in releases after Java > SE 8)" > It looks like we need to at least change the method name to not be '_' any > more, or possibly replace the HTML generation with something more standard. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3591) Resource Localisation on a bad disk causes subsequent containers failure
[ https://issues.apache.org/jira/browse/YARN-3591?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lavkesh Lahngir updated YARN-3591: -- Attachment: YARN-3591.3.patch > Resource Localisation on a bad disk causes subsequent containers failure > - > > Key: YARN-3591 > URL: https://issues.apache.org/jira/browse/YARN-3591 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 2.7.0 >Reporter: Lavkesh Lahngir >Assignee: Lavkesh Lahngir > Attachments: 0001-YARN-3591.1.patch, 0001-YARN-3591.patch, > YARN-3591.2.patch, YARN-3591.3.patch > > > It happens when a resource has been localised on a disk and, after localisation, that > disk has gone bad. The NM keeps paths for localised resources in memory. At the > time of a resource request, isResourcePresent(rsrc) will be called, which calls > file.exists() on the localised path. > In some cases when the disk has gone bad, inodes are still cached and > file.exists() returns true, but at the time of reading, the file will not open. > Note: file.exists() actually calls stat64 natively, which returns true because > it was able to find inode information from the OS. > A proposal is to call file.list() on the parent path of the resource, which > will call open() natively. If the disk is good it should return an array of > paths with length at least 1. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
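A minimal sketch of the proposal in the description: replace the bare file.exists() check with a listing of the parent directory, which forces a real directory read and therefore fails on a bad disk even when inode metadata is still cached. The class and method names are illustrative only; this is not the actual NM implementation:

```java
import java.io.File;

// Sketch of the check proposed in this issue: file.exists() can return true
// from cached inode metadata on a bad disk, whereas parent.list() performs a
// real directory read that fails when the disk is gone.
public class LocalResourceCheck {
    /**
     * Returns true only if the parent directory can actually be read
     * (i.e. the disk is good) and the localized path exists.
     */
    public static boolean isResourcePresent(File localizedPath) {
        File parent = localizedPath.getParentFile();
        if (parent == null) {
            return localizedPath.exists(); // no parent to probe; fall back
        }
        String[] entries = parent.list(); // forces a native directory read
        // null means the directory could not be read (e.g. bad disk);
        // a localized resource's parent should contain at least one entry.
        return entries != null && entries.length >= 1 && localizedPath.exists();
    }
}
```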
[jira] [Created] (YARN-3678) DelayedProcessKiller may kill other process other than container
gu-chi created YARN-3678: Summary: DelayedProcessKiller may kill other process other than container Key: YARN-3678 URL: https://issues.apache.org/jira/browse/YARN-3678 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 2.6.0 Reporter: gu-chi Priority: Critical Suppose a container has finished; it will then do cleanup while the PID file still exists, which triggers a signalContainer call that kills the process with the PID from the PID file. Because the container has already finished, that PID may have been reused by another process, which can cause serious issues. In my case the NM was killed unexpectedly, and what I described is a likely cause, even if it occurs only rarely. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3678) DelayedProcessKiller may kill other process other than container
[ https://issues.apache.org/jira/browse/YARN-3678?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14550390#comment-14550390 ] gu-chi commented on YARN-3678: -- I think decreasing the max_pid setting in the OS can increase the likelihood of reproducing this; working on it. > DelayedProcessKiller may kill other process other than container > > > Key: YARN-3678 > URL: https://issues.apache.org/jira/browse/YARN-3678 > Project: Hadoop YARN > Issue Type: Bug > Components: nodemanager >Affects Versions: 2.6.0 >Reporter: gu-chi >Priority: Critical > > Suppose a container has finished; it will then do cleanup while the PID file > still exists, which triggers a signalContainer call that kills the process with > the PID from the PID file. Because the container has already finished, that PID > may have been reused by another process, which can cause serious issues. > In my case the NM was killed unexpectedly, and what I described is a likely > cause, even if it occurs only rarely. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
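One way to guard against the PID-reuse race described here is to verify, before sending the delayed kill, that the process holding that PID still looks like the container's process. The Linux-only sketch below checks /proc/&lt;pid&gt;/cmdline for the container ID; it is an illustration of the idea, not the NM's actual implementation, and the names are hypothetical:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical guard for DelayedProcessKiller: only signal the PID from the
// PID file if that process's command line still mentions the container ID.
// Linux-specific (relies on /proc) and illustrative only.
public class SafeContainerKill {
    public static boolean pidStillBelongsToContainer(int pid, String containerId) {
        Path cmdline = Paths.get("/proc", Integer.toString(pid), "cmdline");
        try {
            // /proc/<pid>/cmdline separates arguments with NUL bytes.
            String args = new String(Files.readAllBytes(cmdline)).replace('\0', ' ');
            return args.contains(containerId);
        } catch (IOException e) {
            // Process already exited or is not inspectable: do not kill.
            return false;
        }
    }
}
```

With a check like this, a PID recycled by the OS for an unrelated process (such as the NM itself) would no longer be signaled.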
[jira] [Commented] (YARN-3624) ApplicationHistoryServer reverses the order of the filters it gets
[ https://issues.apache.org/jira/browse/YARN-3624?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14550447#comment-14550447 ] Mit Desai commented on YARN-3624: - Filed YARN-2679 > ApplicationHistoryServer reverses the order of the filters it gets > -- > > Key: YARN-3624 > URL: https://issues.apache.org/jira/browse/YARN-3624 > Project: Hadoop YARN > Issue Type: Bug > Components: timelineserver >Affects Versions: 2.6.0 >Reporter: Mit Desai >Assignee: Mit Desai > Attachments: YARN-3624.patch > > > AppliactionHistoryServer should not alter the order in which it gets the > filter chain. Additional filters should be added at the end of the chain. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-3679) Add documentation for timeline server filter ordering
Mit Desai created YARN-3679: --- Summary: Add documentation for timeline server filter ordering Key: YARN-3679 URL: https://issues.apache.org/jira/browse/YARN-3679 Project: Hadoop YARN Issue Type: Bug Reporter: Mit Desai Currently the auth filter is placed before the static user filter by default. After YARN-3624, the filter order is no longer reversed, so the pseudo auth's allow-anonymous config is useless when both filters are loaded in the new order, because the static user will be created before the request is presented to the auth filter. Users can remove the static user filter from the config to make anonymous access work. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (YARN-3679) Add documentation for timeline server filter ordering
[ https://issues.apache.org/jira/browse/YARN-3679?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mit Desai reassigned YARN-3679: --- Assignee: Mit Desai > Add documentation for timeline server filter ordering > - > > Key: YARN-3679 > URL: https://issues.apache.org/jira/browse/YARN-3679 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Mit Desai >Assignee: Mit Desai > > Currently the auth filter is placed before the static user filter by default. > After YARN-3624, the filter order is no longer reversed, so the pseudo auth's > allow-anonymous config is useless when both filters are loaded in the new order, > because the static user will be created before the request is presented to the > auth filter. Users can remove the static user filter from the config to make > anonymous access work. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-1902) Allocation of too many containers when a second request is done with the same resource capability
[ https://issues.apache.org/jira/browse/YARN-1902?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14550462#comment-14550462 ] Sietse T. Au commented on YARN-1902: All solutions will still be workarounds unless the protocol is revised. Another workaround would be to keep track of the requests by counting the number of requested containers and not sending new container requests to the RM until the previous batch has been satisfied. Consider the following scenario, in order:
1. addContainerRequest is called n times; at each call the expectedContainers counter is incremented and the container request is added to a list of currentContainerRequests.
2. allocate is called; a boolean waitingForResponse is set to true when ask.size > 0, which indicates container requests have been made.
3. addContainerRequest is called m times; since waitingForResponse is true, the requests are added to a list of queuedContainerRequests, and the asks are added to asksQueue rather than asks.
4. allocate is called; n - 1 containers are returned, and expectedContainers is decremented by n - 1.
5. allocate is called again; 1 container is returned, so expectedContainers reaches 0 and waitingForResponse is set to false. removeContainerRequest is called for each currentContainerRequest, then currentContainerRequests = queuedContainerRequests, asks = asksQueue, and expectedContainers = queuedContainerRequests.size.
6. allocate is called and the requests from (3) are submitted.
Here, the satisfied container requests are correctly removed from the table without user intervention, and this seems to cover the common use cases. Excess containers would now only occur when a containerRequest is removed after an allocate; since there is no guarantee that it would have been removed in time at the RM anyway, this doesn't seem very significant. 
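The bookkeeping in the steps above can be sketched as a small state holder. The class and method names are illustrative, not AMRMClientImpl internals:

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;

// Sketch of the proposed workaround: hold new container requests in a queue
// while an earlier batch is outstanding, and promote the queued batch once
// all expected containers of the current batch have arrived.
public class BatchedRequestTracker<T> {
    private final List<T> currentBatch = new ArrayList<>();
    private final Queue<T> queuedRequests = new ArrayDeque<>();
    private int expectedContainers = 0;
    private boolean waitingForResponse = false;

    public void addContainerRequest(T request) {
        if (waitingForResponse) {
            queuedRequests.add(request);   // step 3: defer while a batch is out
        } else {
            currentBatch.add(request);     // step 1: grow the current batch
            expectedContainers++;
        }
    }

    /** Call when allocate() is invoked with pending asks (step 2). */
    public void onAllocateSent() {
        if (!currentBatch.isEmpty()) {
            waitingForResponse = true;
        }
    }

    /** Call with the number of containers an allocate() returned (steps 4-5). */
    public void onContainersAllocated(int count) {
        expectedContainers -= count;
        if (expectedContainers <= 0) {
            // Batch satisfied: queued requests become the next batch (step 5).
            currentBatch.clear();
            currentBatch.addAll(queuedRequests);
            expectedContainers = queuedRequests.size();
            queuedRequests.clear();
            waitingForResponse = false;
        }
    }

    public int expected() { return expectedContainers; }
    public boolean waiting() { return waitingForResponse; }
}
```

As the comment notes, this still breaks down when a batch can never be satisfied (e.g. all nodes blacklisted), so it is a mitigation rather than a protocol fix.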
One problem here is that expectedContainers will be invalid when you do the following: blacklist all the possible nodes, add a container request, allocate, remove the blacklist, add a container request, allocate. This would make the client wait forever for a response to the first request, as it will never be satisfied. I'm not sure what else can be done by users apart from extending AMRMClientImpl to fit their use case. > Allocation of too many containers when a second request is done with the same > resource capability > - > > Key: YARN-1902 > URL: https://issues.apache.org/jira/browse/YARN-1902 > Project: Hadoop YARN > Issue Type: Bug > Components: client >Affects Versions: 2.2.0, 2.3.0, 2.4.0 >Reporter: Sietse T. Au >Assignee: Sietse T. Au > Labels: client > Attachments: YARN-1902.patch, YARN-1902.v2.patch, YARN-1902.v3.patch > > > Regarding AMRMClientImpl > Scenario 1: > Given a ContainerRequest x with Resource y, when addContainerRequest is > called z times with x, allocate is called and at least one of the z allocated > containers is started, then if another addContainerRequest call is done and > subsequently an allocate call to the RM, (z+1) containers will be allocated, > where 1 container is expected. > Scenario 2: > No containers are started between the allocate calls. > Analyzing debug logs of the AMRMClientImpl, I have found that indeed (z+1) > containers are requested in both scenarios, but that only in the second scenario > is the correct behavior observed. > Looking at the implementation I have found that this (z+1) request is caused > by the structure of the remoteRequestsTable. The consequence of Map<Resource, > ResourceRequestInfo> is that ResourceRequestInfo does not hold any > information about whether a request has been sent to the RM yet or not. > There are workarounds for this, such as releasing the excess containers > received. 
> The solution implemented is to initialize a new ResourceRequest in > ResourceRequestInfo when a request has been successfully sent to the RM. > The patch includes a test in which scenario one is tested. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3624) ApplicationHistoryServer reverses the order of the filters it gets
[ https://issues.apache.org/jira/browse/YARN-3624?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14550479#comment-14550479 ] Mit Desai commented on YARN-3624: - Correction: YARN-3679 > ApplicationHistoryServer reverses the order of the filters it gets > -- > > Key: YARN-3624 > URL: https://issues.apache.org/jira/browse/YARN-3624 > Project: Hadoop YARN > Issue Type: Bug > Components: timelineserver >Affects Versions: 2.6.0 >Reporter: Mit Desai >Assignee: Mit Desai > Attachments: YARN-3624.patch > > > AppliactionHistoryServer should not alter the order in which it gets the > filter chain. Additional filters should be added at the end of the chain. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-3680) Graceful queue capacity reclaim without KilledTaskAttempts
Hari Sekhon created YARN-3680: - Summary: Graceful queue capacity reclaim without KilledTaskAttempts Key: YARN-3680 URL: https://issues.apache.org/jira/browse/YARN-3680 Project: Hadoop YARN Issue Type: Improvement Components: applications, capacityscheduler, resourcemanager, scheduler Affects Versions: 2.6.0 Environment: HDP 2.2.4 Reporter: Hari Sekhon Request to allow graceful reclaim of queue resources by waiting until running containers finish naturally rather than killing them. For example, if you were to dynamically reconfigure YARN queue capacity/maximum-capacity to decrease one queue, containers in that queue start getting killed (and pre-emption is not configured on this cluster) - instead of containers being allowed to finish naturally, with the freed resources simply no longer available for new tasks of that job. This is relevant when a task makes non-idempotent changes that can cause issues if the task is half completed, the running task is killed, and it is re-run from the beginning later. For example, I bulk index to Elasticsearch with uniquely generated IDs, since the source data doesn't have any key, or even compound key, that is unique. This means that if a task sends half its data and then is killed and started again, it introduces a large number of duplicates into the ES index, with no mechanism to dedupe later other than rebuilding the entire index from scratch - hundreds of millions of docs multiplied by many, many indices. I appreciate this is a serious request and could cause problems with long-running services never returning their resources... so there needs to be some kind of configuration to separate the indefinitely running tasks of long-lived services from finite-runtime analytic job tasks, with some sort of time-based safety cut-off. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3541) Add version info on timeline service / generic history web UI and REST API
[ https://issues.apache.org/jira/browse/YARN-3541?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14550500#comment-14550500 ] Hudson commented on YARN-3541: -- FAILURE: Integrated in Hadoop-Hdfs-trunk #2130 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/2130/]) YARN-3541. Add version info on timeline service / generic history web UI and REST API. Contributed by Zhijie Shen (xgong: rev 76afd28862c1f27011273659a82cd45903a77170) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/applicationhistoryservice/webapp/AboutPage.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/test/java/org/apache/hadoop/yarn/server/applicationhistoryservice/webapp/TestAHSWebServices.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/test/java/org/apache/hadoop/yarn/server/timeline/webapp/TestTimelineWebServices.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/applicationhistoryservice/webapp/AHSWebApp.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/applicationhistoryservice/webapp/NavBlock.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/applicationhistoryservice/webapp/AboutBlock.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/markdown/TimelineServer.md * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/applicationhistoryservice/webapp/AHSWebServices.java * 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/timeline/webapp/TimelineWebServices.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/applicationhistoryservice/webapp/AHSController.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/records/timeline/TimelineAbout.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/util/timeline/TimelineUtils.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/test/java/org/apache/hadoop/yarn/server/applicationhistoryservice/webapp/TestAHSWebApp.java > Add version info on timeline service / generic history web UI and REST API > -- > > Key: YARN-3541 > URL: https://issues.apache.org/jira/browse/YARN-3541 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Zhijie Shen >Assignee: Zhijie Shen > Fix For: 2.8.0 > > Attachments: YARN-3541.1.patch, YARN-3541.2.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3541) Add version info on timeline service / generic history web UI and REST API
[ https://issues.apache.org/jira/browse/YARN-3541?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14550529#comment-14550529 ] Hudson commented on YARN-3541: -- FAILURE: Integrated in Hadoop-Hdfs-trunk-Java8 #190 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/190/]) YARN-3541. Add version info on timeline service / generic history web UI and REST API. Contributed by Zhijie Shen (xgong: rev 76afd28862c1f27011273659a82cd45903a77170) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/util/timeline/TimelineUtils.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/test/java/org/apache/hadoop/yarn/server/applicationhistoryservice/webapp/TestAHSWebApp.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/applicationhistoryservice/webapp/AHSController.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/applicationhistoryservice/webapp/AHSWebApp.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/applicationhistoryservice/webapp/AHSWebServices.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/applicationhistoryservice/webapp/AboutBlock.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/timeline/webapp/TimelineWebServices.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/records/timeline/TimelineAbout.java * hadoop-yarn-project/CHANGES.txt * 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/test/java/org/apache/hadoop/yarn/server/applicationhistoryservice/webapp/TestAHSWebServices.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/applicationhistoryservice/webapp/AboutPage.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/markdown/TimelineServer.md * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/test/java/org/apache/hadoop/yarn/server/timeline/webapp/TestTimelineWebServices.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/applicationhistoryservice/webapp/NavBlock.java > Add version info on timeline service / generic history web UI and REST API > -- > > Key: YARN-3541 > URL: https://issues.apache.org/jira/browse/YARN-3541 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Zhijie Shen >Assignee: Zhijie Shen > Fix For: 2.8.0 > > Attachments: YARN-3541.1.patch, YARN-3541.2.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-41) The RM should handle the graceful shutdown of the NM.
[ https://issues.apache.org/jira/browse/YARN-41?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14550558#comment-14550558 ] Hadoop QA commented on YARN-41: --- \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | pre-patch | 14m 43s | Pre-patch trunk compilation is healthy. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 9 new or modified test files. | | {color:green}+1{color} | javac | 7m 32s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 9m 42s | There were no new javadoc warning messages. | | {color:green}+1{color} | release audit | 0m 22s | The applied patch does not increase the total number of release audit warnings. | | {color:green}+1{color} | checkstyle | 1m 54s | There were no new checkstyle issues. | | {color:green}+1{color} | whitespace | 0m 16s | The patch has no lines that end in whitespace. | | {color:green}+1{color} | install | 1m 34s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 32s | The patch built with eclipse:eclipse. | | {color:red}-1{color} | findbugs | 3m 49s | The patch appears to introduce 1 new Findbugs (version 3.0.0) warnings. | | {color:green}+1{color} | yarn tests | 0m 25s | Tests passed in hadoop-yarn-server-common. | | {color:green}+1{color} | yarn tests | 5m 59s | Tests passed in hadoop-yarn-server-nodemanager. | | {color:green}+1{color} | yarn tests | 50m 15s | Tests passed in hadoop-yarn-server-resourcemanager. | | {color:green}+1{color} | yarn tests | 1m 51s | Tests passed in hadoop-yarn-server-tests. 
| | | | 99m 0s | | \\ \\ || Reason || Tests || | FindBugs | module:hadoop-yarn-server-resourcemanager | | | Inconsistent synchronization of org.apache.hadoop.yarn.server.resourcemanager.recovery.FileSystemRMStateStore.isHDFS; locked 66% of time Unsynchronized access at FileSystemRMStateStore.java:66% of time Unsynchronized access at FileSystemRMStateStore.java:[line 156] | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12733802/YARN-41-6.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / de30d66 | | Findbugs warnings | https://builds.apache.org/job/PreCommit-YARN-Build/7998/artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html | | hadoop-yarn-server-common test log | https://builds.apache.org/job/PreCommit-YARN-Build/7998/artifact/patchprocess/testrun_hadoop-yarn-server-common.txt | | hadoop-yarn-server-nodemanager test log | https://builds.apache.org/job/PreCommit-YARN-Build/7998/artifact/patchprocess/testrun_hadoop-yarn-server-nodemanager.txt | | hadoop-yarn-server-resourcemanager test log | https://builds.apache.org/job/PreCommit-YARN-Build/7998/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt | | hadoop-yarn-server-tests test log | https://builds.apache.org/job/PreCommit-YARN-Build/7998/artifact/patchprocess/testrun_hadoop-yarn-server-tests.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/7998/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf907.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/7998/console | This message was automatically generated. > The RM should handle the graceful shutdown of the NM. 
> - > > Key: YARN-41 > URL: https://issues.apache.org/jira/browse/YARN-41 > Project: Hadoop YARN > Issue Type: New Feature > Components: nodemanager, resourcemanager >Reporter: Ravi Teja Ch N V >Assignee: Devaraj K > Attachments: MAPREDUCE-3494.1.patch, MAPREDUCE-3494.2.patch, > MAPREDUCE-3494.patch, YARN-41-1.patch, YARN-41-2.patch, YARN-41-3.patch, > YARN-41-4.patch, YARN-41-5.patch, YARN-41-6.patch, YARN-41.patch > > > Instead of waiting for the NM expiry, RM should remove and handle the NM, > which is shutdown gracefully. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3591) Resource Localisation on a bad disk causes subsequent containers failure
[ https://issues.apache.org/jira/browse/YARN-3591?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14550566#comment-14550566 ] Hadoop QA commented on YARN-3591: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | pre-patch | 14m 51s | Pre-patch trunk compilation is healthy. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 1 new or modified test files. | | {color:green}+1{color} | javac | 7m 35s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 9m 50s | There were no new javadoc warning messages. | | {color:green}+1{color} | release audit | 0m 21s | The applied patch does not increase the total number of release audit warnings. | | {color:red}-1{color} | checkstyle | 0m 37s | The applied patch generated 3 new checkstyle issues (total was 174, now 177). | | {color:green}+1{color} | whitespace | 0m 0s | The patch has no lines that end in whitespace. | | {color:green}+1{color} | install | 1m 35s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 32s | The patch built with eclipse:eclipse. | | {color:red}-1{color} | findbugs | 1m 4s | The patch appears to introduce 2 new Findbugs (version 3.0.0) warnings. | | {color:green}+1{color} | yarn tests | 6m 10s | Tests passed in hadoop-yarn-server-nodemanager. 
| | | | 42m 40s | | \\ \\ || Reason || Tests || | FindBugs | module:hadoop-yarn-server-nodemanager | | | File.separator used for regular expression in org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.LocalResourcesTrackerImpl.isParent(String, String) At LocalResourcesTrackerImpl.java:in org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.LocalResourcesTrackerImpl.isParent(String, String) At LocalResourcesTrackerImpl.java:[line 483] | | | File.separator used for regular expression in org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.LocalResourcesTrackerImpl.isParent(String, String) At LocalResourcesTrackerImpl.java:in org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.LocalResourcesTrackerImpl.isParent(String, String) At LocalResourcesTrackerImpl.java:[line 484] | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12733804/YARN-3591.3.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / de30d66 | | checkstyle | https://builds.apache.org/job/PreCommit-YARN-Build/7999/artifact/patchprocess/diffcheckstylehadoop-yarn-server-nodemanager.txt | | Findbugs warnings | https://builds.apache.org/job/PreCommit-YARN-Build/7999/artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-nodemanager.html | | hadoop-yarn-server-nodemanager test log | https://builds.apache.org/job/PreCommit-YARN-Build/7999/artifact/patchprocess/testrun_hadoop-yarn-server-nodemanager.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/7999/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf904.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/7999/console | This message was automatically generated. 
> Resource Localisation on a bad disk causes subsequent containers failure > - > > Key: YARN-3591 > URL: https://issues.apache.org/jira/browse/YARN-3591 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 2.7.0 >Reporter: Lavkesh Lahngir >Assignee: Lavkesh Lahngir > Attachments: 0001-YARN-3591.1.patch, 0001-YARN-3591.patch, > YARN-3591.2.patch, YARN-3591.3.patch > > > It happens when a resource is localised on the disk, after localising that > disk has gone bad. NM keeps paths for localised resources in memory. At the > time of resource request isResourcePresent(rsrc) will be called which calls > file.exists() on the localised path. > In some cases when disk has gone bad, inodes are stilled cached and > file.exists() returns true. But at the time of reading, file will not open. > Note: file.exists() actually calls stat64 natively which returns true because > it was able to find inode information from the OS. > A proposal is to call file.list() on the parent path of the resource, which > will call open() natively. If the disk is good it should return an array of > paths with length at-least 1. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
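The failure mode in the description can be sketched in plain Java: `file.exists()` only stats the inode, which the OS may still have cached after the disk goes bad, while listing the parent directory forces a native `open()` that fails on a bad disk. The helper names below are illustrative, not the actual NodeManager code:

```java
import java.io.File;

public class ResourcePresenceCheck {

  // Naive check: File.exists() only stats the path, and a bad disk can
  // still answer the stat from cached inode information.
  static boolean naiveCheck(File localizedPath) {
    return localizedPath.exists();
  }

  // Proposed check: listing the parent directory forces a native open().
  // On a healthy disk a parent containing the resource returns a non-null
  // array with at least one entry; on a bad disk list() returns null.
  static boolean robustCheck(File localizedPath) {
    File parent = localizedPath.getParentFile();
    if (parent == null) {
      return false;
    }
    String[] entries = parent.list(); // null when the native open() fails
    return entries != null && entries.length > 0 && localizedPath.exists();
  }

  public static void main(String[] args) throws Exception {
    File tmp = File.createTempFile("rsrc", ".tmp");
    tmp.deleteOnExit();
    System.out.println(robustCheck(tmp)); // true on a healthy disk
  }
}
```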
[jira] [Updated] (YARN-3679) Add documentation for timeline server filter ordering
[ https://issues.apache.org/jira/browse/YARN-3679?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mit Desai updated YARN-3679: Attachment: YARN-3679.patch [~jeagles], [~zjshen], can you take a look at the patch? > Add documentation for timeline server filter ordering > - > > Key: YARN-3679 > URL: https://issues.apache.org/jira/browse/YARN-3679 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Mit Desai >Assignee: Mit Desai > Attachments: YARN-3679.patch > > > Currently the auth filter is before static user filter by default. After > YARN-3624, the filter order is no longer reversed. So the pseudo auth's > allowing anonymous config is useless with both filters loaded in the new > order, because static user will be created before presenting it to auth > filter. The user can remove static user filter from the config to get > anonymous user work. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3541) Add version info on timeline service / generic history web UI and REST API
[ https://issues.apache.org/jira/browse/YARN-3541?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14550589#comment-14550589 ] Hudson commented on YARN-3541: -- SUCCESS: Integrated in Hadoop-Mapreduce-trunk-Java8 #200 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/200/]) YARN-3541. Add version info on timeline service / generic history web UI and REST API. Contributed by Zhijie Shen (xgong: rev 76afd28862c1f27011273659a82cd45903a77170) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/applicationhistoryservice/webapp/AHSWebApp.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/applicationhistoryservice/webapp/AboutPage.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/applicationhistoryservice/webapp/AHSController.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/applicationhistoryservice/webapp/AboutBlock.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/markdown/TimelineServer.md * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/applicationhistoryservice/webapp/NavBlock.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/applicationhistoryservice/webapp/AHSWebServices.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/records/timeline/TimelineAbout.java * 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/util/timeline/TimelineUtils.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/test/java/org/apache/hadoop/yarn/server/timeline/webapp/TestTimelineWebServices.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/test/java/org/apache/hadoop/yarn/server/applicationhistoryservice/webapp/TestAHSWebApp.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/timeline/webapp/TimelineWebServices.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/test/java/org/apache/hadoop/yarn/server/applicationhistoryservice/webapp/TestAHSWebServices.java > Add version info on timeline service / generic history web UI and REST API > -- > > Key: YARN-3541 > URL: https://issues.apache.org/jira/browse/YARN-3541 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Zhijie Shen >Assignee: Zhijie Shen > Fix For: 2.8.0 > > Attachments: YARN-3541.1.patch, YARN-3541.2.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3679) Add documentation for timeline server filter ordering
[ https://issues.apache.org/jira/browse/YARN-3679?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14550614#comment-14550614 ] Hadoop QA commented on YARN-3679: - \\ \\ | (/) *{color:green}+1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | pre-patch | 2m 54s | Pre-patch trunk compilation is healthy. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:green}+1{color} | release audit | 0m 20s | The applied patch does not increase the total number of release audit warnings. | | {color:green}+1{color} | site | 2m 55s | Site still builds. | | {color:green}+1{color} | whitespace | 0m 0s | The patch has no lines that end in whitespace. | | | | 6m 15s | | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12733835/YARN-3679.patch | | Optional Tests | site | | git revision | trunk / de30d66 | | Java | 1.7.0_55 | | uname | Linux asf905.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/8000/console | This message was automatically generated. > Add documentation for timeline server filter ordering > - > > Key: YARN-3679 > URL: https://issues.apache.org/jira/browse/YARN-3679 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Mit Desai >Assignee: Mit Desai > Attachments: YARN-3679.patch > > > Currently the auth filter is before static user filter by default. After > YARN-3624, the filter order is no longer reversed. So the pseudo auth's > allowing anonymous config is useless with both filters loaded in the new > order, because static user will be created before presenting it to auth > filter. The user can remove static user filter from the config to get > anonymous user work. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3543) ApplicationReport should be able to tell whether the Application is AM managed or not.
[ https://issues.apache.org/jira/browse/YARN-3543?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14550617#comment-14550617 ] Hadoop QA commented on YARN-3543: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | pre-patch | 15m 6s | Pre-patch trunk compilation is healthy. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 14 new or modified test files. | | {color:green}+1{color} | javac | 7m 47s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 9m 56s | There were no new javadoc warning messages. | | {color:green}+1{color} | release audit | 0m 22s | The applied patch does not increase the total number of release audit warnings. | | {color:red}-1{color} | checkstyle | 3m 1s | The applied patch generated 1 new checkstyle issues (total was 14, now 14). | | {color:green}+1{color} | whitespace | 0m 11s | The patch has no lines that end in whitespace. | | {color:green}+1{color} | install | 1m 40s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 33s | The patch built with eclipse:eclipse. | | {color:red}-1{color} | findbugs | 7m 14s | The patch appears to introduce 1 new Findbugs (version 3.0.0) warnings. | | {color:green}+1{color} | mapreduce tests | 103m 57s | Tests passed in hadoop-mapreduce-client-jobclient. | | {color:green}+1{color} | yarn tests | 0m 29s | Tests passed in hadoop-yarn-api. | | {color:red}-1{color} | yarn tests | 6m 54s | Tests failed in hadoop-yarn-client. | | {color:green}+1{color} | yarn tests | 2m 5s | Tests passed in hadoop-yarn-common. | | {color:green}+1{color} | yarn tests | 3m 17s | Tests passed in hadoop-yarn-server-applicationhistoryservice. | | {color:green}+1{color} | yarn tests | 0m 29s | Tests passed in hadoop-yarn-server-common. 
| | {color:red}-1{color} | yarn tests | 49m 58s | Tests failed in hadoop-yarn-server-resourcemanager. | | | | 213m 56s | | \\ \\ || Reason || Tests || | FindBugs | module:hadoop-yarn-server-resourcemanager | | | Inconsistent synchronization of org.apache.hadoop.yarn.server.resourcemanager.recovery.FileSystemRMStateStore.isHDFS; locked 66% of time Unsynchronized access at FileSystemRMStateStore.java:66% of time Unsynchronized access at FileSystemRMStateStore.java:[line 156] | | Failed unit tests | hadoop.yarn.client.api.impl.TestYarnClient | | | hadoop.yarn.server.resourcemanager.scheduler.capacity.TestCapacitySchedulerQueueACLs | | | hadoop.yarn.server.resourcemanager.webapp.TestRMWebappAuthentication | | | hadoop.yarn.server.resourcemanager.rmapp.TestRMAppTransitions | | | hadoop.yarn.server.resourcemanager.TestRM | | | hadoop.yarn.server.resourcemanager.TestSubmitApplicationWithRMHA | | | hadoop.yarn.server.resourcemanager.TestApplicationACLs | | | hadoop.yarn.server.resourcemanager.TestClientRMService | | | hadoop.yarn.server.resourcemanager.webapp.TestAppPage | | | hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesFairScheduler | | | hadoop.yarn.server.resourcemanager.scheduler.fair.TestFairSchedulerQueueACLs | | | hadoop.yarn.server.resourcemanager.security.TestClientToAMTokens | | | hadoop.yarn.server.resourcemanager.scheduler.capacity.TestCapacitySchedulerDynamicBehavior | | | hadoop.yarn.server.resourcemanager.TestRMRestart | | | hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesApps | | | hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesHttpStaticUserPermissions | | | hadoop.yarn.server.resourcemanager.ahs.TestRMApplicationHistoryWriter | | | hadoop.yarn.server.resourcemanager.scheduler.capacity.TestCapacityScheduler | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12733786/0004-YARN-3543.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | 
trunk / eb4c9dd | | checkstyle | https://builds.apache.org/job/PreCommit-YARN-Build/7997/artifact/patchprocess/diffcheckstylehadoop-yarn-api.txt | | Findbugs warnings | https://builds.apache.org/job/PreCommit-YARN-Build/7997/artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html | | hadoop-mapreduce-client-jobclient test log | https://builds.apache.org/job/PreCommit-YARN-Build/7997/artifact/patchprocess/testrun_hadoop-mapreduce-client-jobclient.txt | | hadoop-yarn-api test log | https://builds.apache.org/job/PreCommit-YARN-Build/7997/artifact/patchprocess/testrun_hadoop-yarn-api.txt | | hadoop-yarn-client test log | https://builds.apache.org/job/PreCommit-YARN-Build/7997/artifact/patchprocess/testrun_hadoop-yarn-client.txt | | hadoop-yarn-common test log | https://builds.apache.org/job/PreCommit-YARN-Build/7997/artifact
[jira] [Commented] (YARN-3541) Add version info on timeline service / generic history web UI and REST API
[ https://issues.apache.org/jira/browse/YARN-3541?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14550625#comment-14550625 ] Hudson commented on YARN-3541: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk #2148 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2148/]) YARN-3541. Add version info on timeline service / generic history web UI and REST API. Contributed by Zhijie Shen (xgong: rev 76afd28862c1f27011273659a82cd45903a77170) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/markdown/TimelineServer.md * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/applicationhistoryservice/webapp/AHSWebServices.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/records/timeline/TimelineAbout.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/applicationhistoryservice/webapp/AHSController.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/test/java/org/apache/hadoop/yarn/server/timeline/webapp/TestTimelineWebServices.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/applicationhistoryservice/webapp/AboutPage.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/timeline/webapp/TimelineWebServices.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/applicationhistoryservice/webapp/NavBlock.java * 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/applicationhistoryservice/webapp/AHSWebApp.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/util/timeline/TimelineUtils.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/applicationhistoryservice/webapp/AboutBlock.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/test/java/org/apache/hadoop/yarn/server/applicationhistoryservice/webapp/TestAHSWebServices.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/test/java/org/apache/hadoop/yarn/server/applicationhistoryservice/webapp/TestAHSWebApp.java > Add version info on timeline service / generic history web UI and REST API > -- > > Key: YARN-3541 > URL: https://issues.apache.org/jira/browse/YARN-3541 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Zhijie Shen >Assignee: Zhijie Shen > Fix For: 2.8.0 > > Attachments: YARN-3541.1.patch, YARN-3541.2.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3411) [Storage implementation] explore the native HBase write schema for storage
[ https://issues.apache.org/jira/browse/YARN-3411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14550660#comment-14550660 ] Vrushali C commented on YARN-3411: -- Hi [~zjshen] Thanks for the review!
bq. I saw in HBase implementation flow version is not included as part of row key. This is a bit different from primary key design of Phoenix implementation. Would you mind elaborating your rationale a bit?
Yes, I think the flow version need not be part of the primary key. A flow can be uniquely identified with the flow name and run id (and of course cluster and user id). Given a run id, we can determine the version. For production jobs, the version does not change, so we would be repeating it across runs. I haven’t looked into the Phoenix schema to understand why it is needed on the Phoenix side. cc [~gtCarrera9]
bq. Shall we make the constants in TimelineEntitySchemaConstants follow Hadoop convention? We can keep them in this class now. Once we decide to move on with HBase impl, we should move (some of) them into YarnConfiguration as API.
Yes, I did not add them to YarnConfiguration as API since I figured it may be cleaner to keep this code contained within timelineservice.storage to help remove it if needed. But will rename them as per Hadoop convention.
bq. In fact, you can leave these classes not annotated.
I see, I had added the annotations for these classes after some of the review suggestions, I think from @sjlee.
bq. According to TimelineSchemaCreator, we need to run command line to create the table when we setup the backend, right? Can we include creating the table into the lifecycle of HBaseTimelineWriterImpl?
Hmm, so schema creation happens more or less once in the lifetime of the hbase cluster like during cluster setup (or perhaps if we decide to drop and recreate it, which is rare in production).
I believe writers will come to life and cease to exist with each YARN application lifecycle, but the cluster is more or less eternal, so adding this step to the lifecycle of a writer impl object seems somewhat out of place to me. Appreciate the review! thanks Vrushali > [Storage implementation] explore the native HBase write schema for storage > -- > > Key: YARN-3411 > URL: https://issues.apache.org/jira/browse/YARN-3411 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Sangjin Lee >Assignee: Vrushali C >Priority: Critical > Attachments: ATSv2BackendHBaseSchemaproposal.pdf, > YARN-3411-YARN-2928.001.patch, YARN-3411-YARN-2928.002.patch, > YARN-3411-YARN-2928.003.patch, YARN-3411-YARN-2928.004.patch, > YARN-3411-YARN-2928.005.patch, YARN-3411-YARN-2928.006.patch, > YARN-3411.poc.2.txt, YARN-3411.poc.3.txt, YARN-3411.poc.4.txt, > YARN-3411.poc.5.txt, YARN-3411.poc.6.txt, YARN-3411.poc.7.txt, > YARN-3411.poc.txt > > > There is work that's in progress to implement the storage based on a Phoenix > schema (YARN-3134). > In parallel, we would like to explore an implementation based on a native > HBase schema for the write path. Such a schema does not exclude using > Phoenix, especially for reads and offline queries. > Once we have basic implementations of both options, we could evaluate them in > terms of performance, scalability, usability, etc. and make a call. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
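The row-key layout discussed above — cluster, user, flow name, run id, with the flow version kept out of the key and stored as a column instead — could be sketched as follows. The separator character and field names are illustrative, not the committed schema:

```java
import java.nio.charset.StandardCharsets;

public class FlowRowKey {
  // Illustrative separator; the real schema's byte-level encoding differs.
  private static final String SEP = "!";

  // Row key: cluster!user!flowName!runId. The flow version is *not* part
  // of the key: given a run id the version can be looked up as a column,
  // so it need not be repeated across runs of a production flow.
  static byte[] rowKey(String cluster, String user,
                       String flowName, long runId) {
    String joined =
        String.join(SEP, cluster, user, flowName, Long.toString(runId));
    return joined.getBytes(StandardCharsets.UTF_8);
  }

  public static void main(String[] args) {
    byte[] key = rowKey("cluster1", "alice", "daily-etl", 1431000000L);
    System.out.println(new String(key, StandardCharsets.UTF_8));
    // cluster1!alice!daily-etl!1431000000
  }
}
```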
[jira] [Updated] (YARN-3583) Support of NodeLabel object instead of plain String in YarnClient side.
[ https://issues.apache.org/jira/browse/YARN-3583?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sunil G updated YARN-3583: -- Attachment: 0004-YARN-3583.patch Thank you [~leftnoteasy] for the comments. I updated the patch addressing the same. > Support of NodeLabel object instead of plain String in YarnClient side. > --- > > Key: YARN-3583 > URL: https://issues.apache.org/jira/browse/YARN-3583 > Project: Hadoop YARN > Issue Type: Sub-task > Components: client >Affects Versions: 2.6.0 >Reporter: Sunil G >Assignee: Sunil G > Attachments: 0001-YARN-3583.patch, 0002-YARN-3583.patch, > 0003-YARN-3583.patch, 0004-YARN-3583.patch > > > Similar to YARN-3521, use NodeLabel objects in YarnClient side apis. > getLabelsToNodes/getNodeToLabels api's can use NodeLabel object instead of > using plain label name. > This will help to bring other label details such as Exclusivity to client > side. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-41) The RM should handle the graceful shutdown of the NM.
[ https://issues.apache.org/jira/browse/YARN-41?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14550668#comment-14550668 ] Devaraj K commented on YARN-41: ---
{code:xml}
Inconsistent synchronization of org.apache.hadoop.yarn.server.resourcemanager.recovery.FileSystemRMStateStore.isHDFS; locked 66% of time Unsynchronized access at FileSystemRMStateStore.java:66% of time Unsynchronized access at FileSystemRMStateStore.java:[line 156]
{code}
It is unrelated to the patch; YARN-3677 exists to track this findbugs issue. > The RM should handle the graceful shutdown of the NM. > - > > Key: YARN-41 > URL: https://issues.apache.org/jira/browse/YARN-41 > Project: Hadoop YARN > Issue Type: New Feature > Components: nodemanager, resourcemanager >Reporter: Ravi Teja Ch N V >Assignee: Devaraj K > Attachments: MAPREDUCE-3494.1.patch, MAPREDUCE-3494.2.patch, > MAPREDUCE-3494.patch, YARN-41-1.patch, YARN-41-2.patch, YARN-41-3.patch, > YARN-41-4.patch, YARN-41-5.patch, YARN-41-6.patch, YARN-41.patch > > > Instead of waiting for the NM expiry, RM should remove and handle the NM, > which is shutdown gracefully. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
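The findbugs pattern quoted above ("locked 66% of time") flags a field that is written under synchronization but also accessed without it, so readers may observe a stale value. A minimal illustration of the pattern and the usual fixes — synchronize every access, or use volatile for a simple independent flag — follows; this is not the FileSystemRMStateStore code itself:

```java
// Pattern findbugs reports as IS2_INCONSISTENT_SYNC: `flag` is written
// under the lock but read without it.
class Inconsistent {
  private boolean flag;

  synchronized void set(boolean v) { flag = v; } // locked write
  boolean get() { return flag; }                 // unlocked read -> warning
}

// Usual fix when the field is a simple independent flag: make it
// volatile so reads always see the latest write without the lock.
class Consistent {
  private volatile boolean flag;

  void set(boolean v) { flag = v; }
  boolean get() { return flag; }
}

public class SyncDemo {
  public static void main(String[] args) {
    Consistent c = new Consistent();
    c.set(true);
    System.out.println(c.get()); // true
  }
}
```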
[jira] [Commented] (YARN-3411) [Storage implementation] explore the native HBase write schema for storage
[ https://issues.apache.org/jira/browse/YARN-3411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14550686#comment-14550686 ] Sangjin Lee commented on YARN-3411: --- Regarding the public/private annotations, I have often had it beaten into my head that, in principle, the private annotation is opt-in; i.e. if there is no visibility annotation, a class is implicitly assumed to be up for use. I've gotten review comments that said we should mark them explicitly as private even if they are clearly YARN-internal classes. That's just my experience on this. What is the official recommendation on this? cc [~vinodkv], [~kasha] > [Storage implementation] explore the native HBase write schema for storage > -- > > Key: YARN-3411 > URL: https://issues.apache.org/jira/browse/YARN-3411 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Sangjin Lee >Assignee: Vrushali C >Priority: Critical > Attachments: ATSv2BackendHBaseSchemaproposal.pdf, > YARN-3411-YARN-2928.001.patch, YARN-3411-YARN-2928.002.patch, > YARN-3411-YARN-2928.003.patch, YARN-3411-YARN-2928.004.patch, > YARN-3411-YARN-2928.005.patch, YARN-3411-YARN-2928.006.patch, > YARN-3411.poc.2.txt, YARN-3411.poc.3.txt, YARN-3411.poc.4.txt, > YARN-3411.poc.5.txt, YARN-3411.poc.6.txt, YARN-3411.poc.7.txt, > YARN-3411.poc.txt > > > There is work that's in progress to implement the storage based on a Phoenix > schema (YARN-3134). > In parallel, we would like to explore an implementation based on a native > HBase schema for the write path. Such a schema does not exclude using > Phoenix, especially for reads and offline queries. > Once we have basic implementations of both options, we could evaluate them in > terms of performance, scalability, usability, etc. and make a call. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
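The convention being debated above is whether internal classes should carry the visibility annotations explicitly. A self-contained sketch using stand-in annotation types — real YARN code imports InterfaceAudience.Private and InterfaceStability.Unstable from hadoop-common's org.apache.hadoop.classification package:

```java
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;

// Stand-ins declared locally so this sketch compiles on its own; they
// are NOT the hadoop-common annotations, only shaped like them.
@Retention(RetentionPolicy.RUNTIME) @interface Private {}
@Retention(RetentionPolicy.RUNTIME) @interface Unstable {}

// The explicit style: mark a YARN-internal class as private/unstable
// rather than leaving it unannotated and relying on an implicit default.
@Private
@Unstable
class TimelineInternalHelper {}

public class AnnotationDemo {
  public static void main(String[] args) {
    // Both markers are visible via reflection on the annotated class.
    System.out.println(TimelineInternalHelper.class.getAnnotations().length);
  }
}
```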
[jira] [Commented] (YARN-3411) [Storage implementation] explore the native HBase write schema for storage
[ https://issues.apache.org/jira/browse/YARN-3411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14550688#comment-14550688 ] Junping Du commented on YARN-3411: --
bq. I was thinking it is not necessary since the entity information would come in a more streaming fashion, one update at a time anyways. If say, one column is written and other is not, the callee can retry again, hbase put will simply over-write existing value.
That sounds reasonable. Let's keep it for now and check whether it needs to change for some corner cases (e.g. updating events and metrics at the same time for a job's final counters) in the future. Thanks for addressing my previous comments. The latest patch looks much closer! Some additional comments (besides Zhijie's comments above) on the latest (006) patch:
In TimelineCollectorManager.java,
{code}
+  @Override
+  protected void serviceStop() throws Exception {
+    super.serviceStop();
+    if (writer != null) {
+      writer.close();
+    }
+  }
{code}
We should stop the running collectors before stopping the shared writer. Also, call super.serviceStop() last, per common practice.
In EntityColumnFamilyDetails.java,
{code}
+   * @param rowKey
+   * @param entityTable
+   * @param inputValue
+   * @throws IOException
+   */
+  public void store(byte[] rowKey, BufferedMutator entityTable, String key,
+      String inputValue) throws IOException {
{code}
The javadoc is missing the {{key}} parameter.
In HBaseTimelineWriterImpl.java,
write() is synchronous so far (we haven't implemented async yet), but we put each kind of entity into the table via entityTable.mutate(...), which caches entries locally and flushes them later under certain conditions. Do we need to call entityTable.flush() to keep strict synchronous-write semantics? If not, we should at least flush in serviceStop(), since calling close() directly could lose some cached data.
{code}
  @Override
  protected void serviceStop() throws Exception {
    super.serviceStop();
    if (entityTable != null) {
      LOG.info("closing entity table");
      entityTable.close();
    }
    if (conn != null) {
      LOG.info("closing the hbase Connection");
      conn.close();
    }
  }
{code}
Also, as noted above, call super.serviceStop() last.
In Range.java,
{code}
+@InterfaceAudience.Private
+@InterfaceStability.Unstable
+public class Range {
{code}
For a class marked Private, we don't need to add the InterfaceStability annotation.
In TimelineWriterUtils.java,
{code}
+    for (byte[] comp : components) {
+      finalSize += comp.length;
+    }
{code}
Null check for comp.
{code}
+      System.arraycopy(components[i], 0, buf, offset, components[i].length);
+      offset += components[i].length;
{code}
Null check for components[i].
{code}
+   * @param source
+   * @param separator
+   * @return byte[][] after splitting the input source
+   */
+  public static byte[][] split(byte[] source, byte[] separator, int limit) {
{code}
The javadoc is missing the {{limit}} parameter.
It seems we don't null-check separator in splitRanges(), but we do in join(); we should at least be consistent within the class.
{code}
+  public static long getValueAsLong(final byte[] key,
+      final Map taskValues) throws IOException {
+    byte[] value = taskValues.get(key);
+    if (value != null) {
+      Number val = (Number) GenericObjectMapper.read(value);
+      return val.longValue();
+    } else {
+      return 0L;
+    }
+  }
{code}
Shall we use Long instead of long? Otherwise we cannot distinguish a null value from a real 0.
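The Long-vs-long point can be illustrated with a simplified lookup (the real method deserializes bytes via GenericObjectMapper; this sketch uses a plain map to keep it self-contained):

```java
import java.util.HashMap;
import java.util.Map;

public class NullableLongDemo {
  // Returning the boxed Long preserves "absent": a missing key yields
  // null, while a stored 0 yields 0L. With a primitive long return the
  // two cases would both come back as 0 and be indistinguishable.
  static Long getValueAsLong(String key, Map<String, Long> values) {
    return values.get(key); // null when the key is absent
  }

  public static void main(String[] args) {
    Map<String, Long> values = new HashMap<>();
    values.put("mapTasks", 0L);
    System.out.println(getValueAsLong("mapTasks", values));    // 0
    System.out.println(getValueAsLong("reduceTasks", values)); // null
  }
}
```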
> [Storage implementation] explore the native HBase write schema for storage > -- > > Key: YARN-3411 > URL: https://issues.apache.org/jira/browse/YARN-3411 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Sangjin Lee >Assignee: Vrushali C >Priority: Critical > Attachments: ATSv2BackendHBaseSchemaproposal.pdf, > YARN-3411-YARN-2928.001.patch, YARN-3411-YARN-2928.002.patch, > YARN-3411-YARN-2928.003.patch, YARN-3411-YARN-2928.004.patch, > YARN-3411-YARN-2928.005.patch, YARN-3411-YARN-2928.006.patch, > YARN-3411.poc.2.txt, YARN-3411.poc.3.txt, YARN-3411.poc.4.txt, > YARN-3411.poc.5.txt, YARN-3411.poc.6.txt, YARN-3411.poc.7.txt, > YARN-3411.poc.txt > > > There is work that's in progress to implement the storage based on a Phoenix > schema (YARN-3134). > In parallel, we would like to explore an implementation based on a native > HBase schema for the write path. Such a schema does not exclude using > Phoenix, especially for reads and offline queries. > Once we have basic implementations of both options, we could evaluate them in > terms of performance, scalability, usability, etc. and make a call.
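The long-vs-Long point in Junping's review can be seen with a minimal, self-contained sketch. A plain java.util map stands in here for the HBase-backed taskValues map, and the method names merely mirror the patch; this is an illustration, not the actual implementation:

```java
import java.util.HashMap;
import java.util.Map;

public class LongVsBoxed {
    // Primitive return, as in the 006 patch: an absent key collapses to 0L.
    static long getValueAsLong(String key, Map<String, Long> taskValues) {
        Long value = taskValues.get(key);
        return (value != null) ? value : 0L;
    }

    // Boxed return, as suggested in the review: an absent key stays null,
    // so callers can distinguish "no value written" from "value was 0".
    static Long getValueAsBoxedLong(String key, Map<String, Long> taskValues) {
        return taskValues.get(key);
    }

    public static void main(String[] args) {
        Map<String, Long> taskValues = new HashMap<>();
        taskValues.put("storedZero", 0L);

        // The primitive variant makes both cases look identical.
        System.out.println(getValueAsLong("storedZero", taskValues)); // 0
        System.out.println(getValueAsLong("missing", taskValues));    // 0

        // The boxed variant keeps them apart.
        System.out.println(getValueAsBoxedLong("storedZero", taskValues)); // 0
        System.out.println(getValueAsBoxedLong("missing", taskValues));    // null
    }
}
```

With a primitive long return the caller can never tell whether a counter was genuinely written as 0 or simply never stored, which is exactly the ambiguity the review flags.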
[jira] [Created] (YARN-3681) yarn cmd says "could not find main class 'queue'" in windows
Sumana Sathish created YARN-3681: Summary: yarn cmd says "could not find main class 'queue'" in windows Key: YARN-3681 URL: https://issues.apache.org/jira/browse/YARN-3681 Project: Hadoop YARN Issue Type: Bug Components: yarn Affects Versions: 2.7.0 Environment: Windows Only Reporter: Sumana Sathish Priority: Critical Attached the screenshot of the command prompt in windows running yarn queue command. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3565) NodeHeartbeatRequest/RegisterNodeManagerRequest should use NodeLabel object instead of String
[ https://issues.apache.org/jira/browse/YARN-3565?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14550704#comment-14550704 ] Wangda Tan commented on YARN-3565: -- Thanks [~Naganarasimha]. Looks good, +1. > NodeHeartbeatRequest/RegisterNodeManagerRequest should use NodeLabel object > instead of String > - > > Key: YARN-3565 > URL: https://issues.apache.org/jira/browse/YARN-3565 > Project: Hadoop YARN > Issue Type: Sub-task > Components: api, client, resourcemanager >Reporter: Wangda Tan >Assignee: Naganarasimha G R >Priority: Blocker > Attachments: YARN-3565-20150502-1.patch, YARN-3565.20150515-1.patch, > YARN-3565.20150516-1.patch, YARN-3565.20150519-1.patch > > > Now NM HB/Register uses Set, it will be hard to add new fields if we > want to support specifying NodeLabel type such as exclusivity/constraints, > etc. We need to make sure rolling upgrade works. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3681) yarn cmd says "could not find main class 'queue'" in windows
[ https://issues.apache.org/jira/browse/YARN-3681?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sumana Sathish updated YARN-3681: - Attachment: yarncmd.png > yarn cmd says "could not find main class 'queue'" in windows > > > Key: YARN-3681 > URL: https://issues.apache.org/jira/browse/YARN-3681 > Project: Hadoop YARN > Issue Type: Bug > Components: yarn >Affects Versions: 2.7.0 > Environment: Windows Only >Reporter: Sumana Sathish >Priority: Critical > Labels: yarn-client > Attachments: yarncmd.png > > > Attached the screenshot of the command prompt in windows running yarn queue > command. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3583) Support of NodeLabel object instead of plain String in YarnClient side.
[ https://issues.apache.org/jira/browse/YARN-3583?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14550711#comment-14550711 ] Wangda Tan commented on YARN-3583: -- Thanks [~sunilg], looks good, +1. > Support of NodeLabel object instead of plain String in YarnClient side. > --- > > Key: YARN-3583 > URL: https://issues.apache.org/jira/browse/YARN-3583 > Project: Hadoop YARN > Issue Type: Sub-task > Components: client >Affects Versions: 2.6.0 >Reporter: Sunil G >Assignee: Sunil G > Attachments: 0001-YARN-3583.patch, 0002-YARN-3583.patch, > 0003-YARN-3583.patch, 0004-YARN-3583.patch > > > Similar to YARN-3521, use NodeLabel objects in YarnClient side apis. > getLabelsToNodes/getNodeToLabels api's can use NodeLabel object instead of > using plain label name. > This will help to bring other label details such as Exclusivity to client > side. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3411) [Storage implementation] explore the native HBase write schema for storage
[ https://issues.apache.org/jira/browse/YARN-3411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14550714#comment-14550714 ] Karthik Kambatla commented on YARN-3411: My recommendation regarding visibility annotations is to always specify them irrespective of whether they are Private or Public. My rationale - at the time of implementing something, it is good to actively think about intended usage. That said, our compat guidelines explicitly say that classes not annotated are implicitly Private. > [Storage implementation] explore the native HBase write schema for storage > -- > > Key: YARN-3411 > URL: https://issues.apache.org/jira/browse/YARN-3411 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Sangjin Lee >Assignee: Vrushali C >Priority: Critical > Attachments: ATSv2BackendHBaseSchemaproposal.pdf, > YARN-3411-YARN-2928.001.patch, YARN-3411-YARN-2928.002.patch, > YARN-3411-YARN-2928.003.patch, YARN-3411-YARN-2928.004.patch, > YARN-3411-YARN-2928.005.patch, YARN-3411-YARN-2928.006.patch, > YARN-3411.poc.2.txt, YARN-3411.poc.3.txt, YARN-3411.poc.4.txt, > YARN-3411.poc.5.txt, YARN-3411.poc.6.txt, YARN-3411.poc.7.txt, > YARN-3411.poc.txt > > > There is work that's in progress to implement the storage based on a Phoenix > schema (YARN-3134). > In parallel, we would like to explore an implementation based on a native > HBase schema for the write path. Such a schema does not exclude using > Phoenix, especially for reads and offline queries. > Once we have basic implementations of both options, we could evaluate them in > terms of performance, scalability, usability, etc. and make a call. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (YARN-3681) yarn cmd says "could not find main class 'queue'" in windows
[ https://issues.apache.org/jira/browse/YARN-3681?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Varun Saxena reassigned YARN-3681: -- Assignee: Varun Saxena > yarn cmd says "could not find main class 'queue'" in windows > > > Key: YARN-3681 > URL: https://issues.apache.org/jira/browse/YARN-3681 > Project: Hadoop YARN > Issue Type: Bug > Components: yarn >Affects Versions: 2.7.0 > Environment: Windows Only >Reporter: Sumana Sathish >Assignee: Varun Saxena >Priority: Critical > Labels: yarn-client > Attachments: yarncmd.png > > > Attached the screenshot of the command prompt in windows running yarn queue > command. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3411) [Storage implementation] explore the native HBase write schema for storage
[ https://issues.apache.org/jira/browse/YARN-3411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14550717#comment-14550717 ] Karthik Kambatla commented on YARN-3411: By classes I meant outer classes. Inner classes and other members inherit the annotation of the outer class. > [Storage implementation] explore the native HBase write schema for storage > -- > > Key: YARN-3411 > URL: https://issues.apache.org/jira/browse/YARN-3411 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Sangjin Lee >Assignee: Vrushali C >Priority: Critical > Attachments: ATSv2BackendHBaseSchemaproposal.pdf, > YARN-3411-YARN-2928.001.patch, YARN-3411-YARN-2928.002.patch, > YARN-3411-YARN-2928.003.patch, YARN-3411-YARN-2928.004.patch, > YARN-3411-YARN-2928.005.patch, YARN-3411-YARN-2928.006.patch, > YARN-3411.poc.2.txt, YARN-3411.poc.3.txt, YARN-3411.poc.4.txt, > YARN-3411.poc.5.txt, YARN-3411.poc.6.txt, YARN-3411.poc.7.txt, > YARN-3411.poc.txt > > > There is work that's in progress to implement the storage based on a Phoenix > schema (YARN-3134). > In parallel, we would like to explore an implementation based on a native > HBase schema for the write path. Such a schema does not exclude using > Phoenix, especially for reads and offline queries. > Once we have basic implementations of both options, we could evaluate them in > terms of performance, scalability, usability, etc. and make a call. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3681) yarn cmd says "could not find main class 'queue'" in windows
[ https://issues.apache.org/jira/browse/YARN-3681?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sumana Sathish updated YARN-3681: - Labels: windows yarn-client (was: yarn-client) > yarn cmd says "could not find main class 'queue'" in windows > > > Key: YARN-3681 > URL: https://issues.apache.org/jira/browse/YARN-3681 > Project: Hadoop YARN > Issue Type: Bug > Components: yarn >Affects Versions: 2.7.0 > Environment: Windows Only >Reporter: Sumana Sathish >Assignee: Varun Saxena >Priority: Critical > Labels: windows, yarn-client > Attachments: yarncmd.png > > > Attached the screenshot of the command prompt in windows running yarn queue > command. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-1735) For FairScheduler AvailableMB in QueueMetrics is the same as AllocateMB
[ https://issues.apache.org/jira/browse/YARN-1735?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14550728#comment-14550728 ] Siqi Li commented on YARN-1735: --- Hi [~jianhe], I just rebased the patch, and checkstyle, whitespace and findbugs seem to be irrelevant. > For FairScheduler AvailableMB in QueueMetrics is the same as AllocateMB > --- > > Key: YARN-1735 > URL: https://issues.apache.org/jira/browse/YARN-1735 > Project: Hadoop YARN > Issue Type: Bug > Components: fairscheduler >Reporter: Siqi Li > Attachments: YARN-1735.v1.patch, YARN-1735.v2.patch, > YARN-1735.v3.patch, YARN-1735.v4.patch > > > in monitoring graphs the AvailableMB of each queue regularly spikes between > the AllocatedMB and the entire cluster capacity. > This cannot be correct since AvailableMB should never be more than the queue > max allocation. The spikes are quite confusing since the availableMB is set > as the fair share of each queue and the fair share of each queue is bond by > their allowed max resource. > Other than the spiking, the availableMB is always equal to allocatedMB. I > think this is not very useful, availableMB for each queue should be their > allowed max resource minus allocatedMB. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3681) yarn cmd says "could not find main class 'queue'" in windows
[ https://issues.apache.org/jira/browse/YARN-3681?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Varun Saxena updated YARN-3681: --- Attachment: YARN-3681.01.patch Attaching a trivial patch. This should fix the issue > yarn cmd says "could not find main class 'queue'" in windows > > > Key: YARN-3681 > URL: https://issues.apache.org/jira/browse/YARN-3681 > Project: Hadoop YARN > Issue Type: Bug > Components: yarn >Affects Versions: 2.7.0 > Environment: Windows Only >Reporter: Sumana Sathish >Assignee: Varun Saxena >Priority: Critical > Labels: windows, yarn-client > Attachments: YARN-3681.01.patch, yarncmd.png > > > Attached the screenshot of the command prompt in windows running yarn queue > command. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3411) [Storage implementation] explore the native HBase write schema for storage
[ https://issues.apache.org/jira/browse/YARN-3411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14550733#comment-14550733 ] Junping Du commented on YARN-3411: --
bq. Regarding the public/private annotations, I often got beat it to the head that in principle the private annotation is opt-in; i.e. if there is no visibility annotation it's implicitly assumed to be up for use. I've gotten review comments that said we should mark them explicitly as private even if they are clearly YARN-internal classes. That's just my experience on this.
In my understanding, it depends on whether the class is in an api package under the hadoop-yarn-api or hadoop-yarn-common modules. If so, we may need to mark it explicitly as Private if we don't want to share it outside of the hadoop projects. For other classes (like TimelineWriterUtils here, which is on the server side), we don't have to add any annotation. [~vinodkv], can you confirm this?
> [Storage implementation] explore the native HBase write schema for storage > -- > > Key: YARN-3411 > URL: https://issues.apache.org/jira/browse/YARN-3411 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Sangjin Lee >Assignee: Vrushali C >Priority: Critical > Attachments: ATSv2BackendHBaseSchemaproposal.pdf, > YARN-3411-YARN-2928.001.patch, YARN-3411-YARN-2928.002.patch, > YARN-3411-YARN-2928.003.patch, YARN-3411-YARN-2928.004.patch, > YARN-3411-YARN-2928.005.patch, YARN-3411-YARN-2928.006.patch, > YARN-3411.poc.2.txt, YARN-3411.poc.3.txt, YARN-3411.poc.4.txt, > YARN-3411.poc.5.txt, YARN-3411.poc.6.txt, YARN-3411.poc.7.txt, > YARN-3411.poc.txt > > > There is work that's in progress to implement the storage based on a Phoenix > schema (YARN-3134). > In parallel, we would like to explore an implementation based on a native > HBase schema for the write path. Such a schema does not exclude using > Phoenix, especially for reads and offline queries.
> Once we have basic implementations of both options, we could evaluate them in > terms of performance, scalability, usability, etc. and make a call. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-1945) Adding description for each pool in Fair Scheduler Page from fair-scheduler.xml
[ https://issues.apache.org/jira/browse/YARN-1945?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14550734#comment-14550734 ] Siqi Li commented on YARN-1945: --- Hi [~xgong], I just rebased the patch; the checkstyle and findbugs warnings don't seem to apply. > Adding description for each pool in Fair Scheduler Page from > fair-scheduler.xml > --- > > Key: YARN-1945 > URL: https://issues.apache.org/jira/browse/YARN-1945 > Project: Hadoop YARN > Issue Type: Improvement > Components: fairscheduler >Affects Versions: 2.3.0 >Reporter: Siqi Li >Assignee: Siqi Li > Attachments: YARN-1945.v2.patch, YARN-1945.v3.patch, > YARN-1945.v4.patch, YARN-1945.v5.patch, YARN-1945.v6.patch, YARN-1945.v7.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3681) yarn cmd says "could not find main class 'queue'" in windows
[ https://issues.apache.org/jira/browse/YARN-3681?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14550744#comment-14550744 ] Hadoop QA commented on YARN-3681: - \\ \\ | (/) *{color:green}+1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | pre-patch | 0m 0s | Pre-patch trunk compilation is healthy. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:green}+1{color} | release audit | 0m 14s | The applied patch does not increase the total number of release audit warnings. | | {color:green}+1{color} | whitespace | 0m 0s | The patch has no lines that end in whitespace. | | | | 0m 19s | | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12733860/YARN-3681.01.patch | | Optional Tests | | | git revision | trunk / de30d66 | | Java | 1.7.0_55 | | uname | Linux asf905.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/8002/console | This message was automatically generated. > yarn cmd says "could not find main class 'queue'" in windows > > > Key: YARN-3681 > URL: https://issues.apache.org/jira/browse/YARN-3681 > Project: Hadoop YARN > Issue Type: Bug > Components: yarn >Affects Versions: 2.7.0 > Environment: Windows Only >Reporter: Sumana Sathish >Assignee: Varun Saxena >Priority: Critical > Labels: windows, yarn-client > Attachments: YARN-3681.01.patch, yarncmd.png > > > Attached the screenshot of the command prompt in windows running yarn queue > command. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3601) Fix UT TestRMFailover.testRMWebAppRedirect
[ https://issues.apache.org/jira/browse/YARN-3601?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14550758#comment-14550758 ] Xuan Gong commented on YARN-3601: - +1 LGTM. Will commit > Fix UT TestRMFailover.testRMWebAppRedirect > -- > > Key: YARN-3601 > URL: https://issues.apache.org/jira/browse/YARN-3601 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager, webapp > Environment: Red Hat Enterprise Linux Workstation release 6.5 > (Santiago) >Reporter: Weiwei Yang >Assignee: Weiwei Yang >Priority: Critical > Labels: test > Attachments: YARN-3601.001.patch > > > This test case was not working since the commit from YARN-2605. It failed > with NPE exception. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3601) Fix UT TestRMFailover.testRMWebAppRedirect
[ https://issues.apache.org/jira/browse/YARN-3601?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14550764#comment-14550764 ] Xuan Gong commented on YARN-3601: - Committed into trunk/branch-2/branch-2.7. Thanks, [~cheersyang] > Fix UT TestRMFailover.testRMWebAppRedirect > -- > > Key: YARN-3601 > URL: https://issues.apache.org/jira/browse/YARN-3601 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager, webapp > Environment: Red Hat Enterprise Linux Workstation release 6.5 > (Santiago) >Reporter: Weiwei Yang >Assignee: Weiwei Yang >Priority: Critical > Labels: test > Fix For: 2.7.1 > > Attachments: YARN-3601.001.patch > > > This test case was not working since the commit from YARN-2605. It failed > with NPE exception. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3601) Fix UT TestRMFailover.testRMWebAppRedirect
[ https://issues.apache.org/jira/browse/YARN-3601?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14550770#comment-14550770 ] Hudson commented on YARN-3601: -- FAILURE: Integrated in Hadoop-trunk-Commit #7862 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/7862/]) YARN-3601. Fix UT TestRMFailover.testRMWebAppRedirect. Contributed by Weiwei Yang (xgong: rev 5009ad4a7f712fc578b461ecec53f7f97eaaed0c) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/TestRMFailover.java * hadoop-yarn-project/CHANGES.txt > Fix UT TestRMFailover.testRMWebAppRedirect > -- > > Key: YARN-3601 > URL: https://issues.apache.org/jira/browse/YARN-3601 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager, webapp > Environment: Red Hat Enterprise Linux Workstation release 6.5 > (Santiago) >Reporter: Weiwei Yang >Assignee: Weiwei Yang >Priority: Critical > Labels: test > Fix For: 2.7.1 > > Attachments: YARN-3601.001.patch > > > This test case was not working since the commit from YARN-2605. It failed > with NPE exception. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3302) TestDockerContainerExecutor should run automatically if it can detect docker in the usual place
[ https://issues.apache.org/jira/browse/YARN-3302?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14550806#comment-14550806 ] Ravi Prakash commented on YARN-3302: +1. Lgtm. Committing shortly. Thanks Ravindra, Varun and Vinod. > TestDockerContainerExecutor should run automatically if it can detect docker > in the usual place > --- > > Key: YARN-3302 > URL: https://issues.apache.org/jira/browse/YARN-3302 > Project: Hadoop YARN > Issue Type: Sub-task >Affects Versions: 2.6.0 >Reporter: Ravi Prakash >Assignee: Ravindra Kumar Naik > Attachments: YARN-3302-trunk.001.patch, YARN-3302-trunk.002.patch, > YARN-3302-trunk.003.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3302) TestDockerContainerExecutor should run automatically if it can detect docker in the usual place
[ https://issues.apache.org/jira/browse/YARN-3302?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14550835#comment-14550835 ] Hudson commented on YARN-3302: -- FAILURE: Integrated in Hadoop-trunk-Commit #7863 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/7863/]) YARN-3302. TestDockerContainerExecutor should run automatically if it can detect docker in the usual place (Ravindra Kumar Naik via raviprak) (raviprak: rev c97f32e7b9d9e1d4c80682cc01741579166174d1) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/TestDockerContainerExecutor.java * hadoop-yarn-project/CHANGES.txt > TestDockerContainerExecutor should run automatically if it can detect docker > in the usual place > --- > > Key: YARN-3302 > URL: https://issues.apache.org/jira/browse/YARN-3302 > Project: Hadoop YARN > Issue Type: Sub-task >Affects Versions: 2.6.0 >Reporter: Ravi Prakash >Assignee: Ravindra Kumar Naik > Attachments: YARN-3302-trunk.001.patch, YARN-3302-trunk.002.patch, > YARN-3302-trunk.003.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2336) Fair scheduler REST api returns a missing '[' bracket JSON for deep queue tree
[ https://issues.apache.org/jira/browse/YARN-2336?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14550839#comment-14550839 ] Akira AJISAKA commented on YARN-2336: - The test failure looks unrelated to the patch. The test passed locally. The findbugs warning is not related to the patch. See YARN-3677 for detail. > Fair scheduler REST api returns a missing '[' bracket JSON for deep queue tree > -- > > Key: YARN-2336 > URL: https://issues.apache.org/jira/browse/YARN-2336 > Project: Hadoop YARN > Issue Type: Bug > Components: fairscheduler >Affects Versions: 2.4.1, 2.6.0 >Reporter: Kenji Kikushima >Assignee: Akira AJISAKA > Labels: BB2015-05-RFC > Attachments: YARN-2336-2.patch, YARN-2336-3.patch, YARN-2336-4.patch, > YARN-2336.005.patch, YARN-2336.007.patch, YARN-2336.008.patch, > YARN-2336.009.patch, YARN-2336.patch > > > When we have sub queues in Fair Scheduler, REST api returns a missing '[' > blacket JSON for childQueues. > This issue found by [~ajisakaa] at YARN-1050. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3619) ContainerMetrics unregisters during getMetrics and leads to ConcurrentModificationException
[ https://issues.apache.org/jira/browse/YARN-3619?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhihai xu updated YARN-3619: Attachment: YARN-3619.000.patch > ContainerMetrics unregisters during getMetrics and leads to > ConcurrentModificationException > --- > > Key: YARN-3619 > URL: https://issues.apache.org/jira/browse/YARN-3619 > Project: Hadoop YARN > Issue Type: Bug > Components: nodemanager >Affects Versions: 2.7.0 >Reporter: Jason Lowe >Assignee: zhihai xu > Attachments: YARN-3619.000.patch, test.patch > > > ContainerMetrics is able to unregister itself during the getMetrics method, > but that method can be called by MetricsSystemImpl.sampleMetrics which is > trying to iterate the sources. This leads to a > ConcurrentModificationException log like this: > {noformat} > 2015-05-11 14:00:20,360 [Timer for 'NodeManager' metrics system] WARN > impl.MetricsSystemImpl: java.util.ConcurrentModificationException > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3543) ApplicationReport should be able to tell whether the Application is AM managed or not.
[ https://issues.apache.org/jira/browse/YARN-3543?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14550854#comment-14550854 ] Rohith commented on YARN-3543: --
About the -1's from QA:
# Findbugs: YARN-3677 exists to track the issue.
# The checkstyle error is that the number of parameters exceeds 7, which I think needs to be ignored. I am not sure whether it should be added to an ignore file or simply ignored.
# Regarding the test failures, I suspect the test machines; many tests are failing:
## Type-1: Address already in use exceptions.
## Type-2: NoSuchMethodError.
## Type-3: ClassCastException, and many others.
I rather suspect the order of compilation and test execution. Probably, when running the resourcemanager tests, the modified classes in yarn-api/yarn-common are not being included, so NoSuchMethodError is thrown.
> ApplicationReport should be able to tell whether the Application is AM > managed or not. > --- > > Key: YARN-3543 > URL: https://issues.apache.org/jira/browse/YARN-3543 > Project: Hadoop YARN > Issue Type: Improvement > Components: api >Affects Versions: 2.6.0 >Reporter: Spandan Dutta >Assignee: Rohith > Labels: BB2015-05-TBR > Attachments: 0001-YARN-3543.patch, 0001-YARN-3543.patch, > 0002-YARN-3543.patch, 0002-YARN-3543.patch, 0003-YARN-3543.patch, > 0003-YARN-3543.patch, 0004-YARN-3543.patch, YARN-3543-AH.PNG, YARN-3543-RM.PNG > > > Currently we can know whether the application submitted by the user is AM > managed from the applicationSubmissionContext. This can be only done at the > time when the user submits the job. We should have access to this info from > the ApplicationReport as well so that we can check whether an app is AM > managed or not anytime. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3632) Ordering policy should be allowed to reorder an application when demand changes
[ https://issues.apache.org/jira/browse/YARN-3632?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14550852#comment-14550852 ] Wangda Tan commented on YARN-3632: --
I think we can add a set to the queue to track the apps (schedulableEntities) that need to be re-ordered; we don't need to remove/insert every time, we only need to do it once, when doing assignContainers the next time. Pseudo code may look like:
{code}
if (schedulableEntity.allocate-container/release-container/update-demand) then:
  orderingPolicy.markNeedUpdate(schedulableEntity)
{code}
And
{code}
orderingPolicy#getAllocateIterator:
  for (schedulableEntity : needToUpdateEntities):
    remove-and-insert(schedulableEntity)
{code}
This avoids excessive modifications to the TreeSet in OrderingPolicy. Thoughts? [~cwelch].
> Ordering policy should be allowed to reorder an application when demand > changes > --- > > Key: YARN-3632 > URL: https://issues.apache.org/jira/browse/YARN-3632 > Project: Hadoop YARN > Issue Type: Sub-task > Components: capacityscheduler >Reporter: Craig Welch >Assignee: Craig Welch > Attachments: YARN-3632.0.patch, YARN-3632.1.patch, YARN-3632.3.patch, > YARN-3632.4.patch, YARN-3632.5.patch > > > At present, ordering policies have the option to have an application > re-ordered (for allocation and preemption) when it is allocated to or a > container is recovered from the application. Some ordering policies may also > need to reorder when demand changes if that is part of the ordering > comparison, this needs to be made available (and used by the > fairorderingpolicy when sizebasedweight is true) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
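Wangda's mark-and-update-lazily idea can be sketched as a small self-contained class. Names such as markNeedUpdate and getAllocateIterator mirror his pseudo code rather than the real OrderingPolicy API, and the key subtlety is that the demand mutation is deferred: the entity must still be findable in the TreeSet under its old sort key before it is re-inserted.

```java
import java.util.Comparator;
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;
import java.util.TreeSet;

public class LazyOrderingPolicy {
    public static class Entity {
        public final String id;
        public long demand;
        public Entity(String id, long demand) { this.id = id; this.demand = demand; }
    }

    // Highest demand first; id breaks ties so distinct entities never compare equal.
    private final TreeSet<Entity> schedulable = new TreeSet<>(
        Comparator.comparingLong((Entity e) -> e.demand).reversed()
                  .thenComparing((Entity e) -> e.id));

    // Entities whose demand changed since the last iteration, with the new value.
    private final Map<Entity, Long> pendingDemand = new HashMap<>();

    public void addSchedulableEntity(Entity e) { schedulable.add(e); }

    // O(1) on every allocate/release/update-demand event: just record it.
    // The demand field is NOT mutated yet, so the entity can still be
    // located in the TreeSet under its old sort key.
    public void markNeedUpdate(Entity e, long newDemand) {
        pendingDemand.put(e, newDemand);
    }

    // Re-sort only the dirty entities, once, when the scheduler iterates.
    public Iterator<Entity> getAllocateIterator() {
        for (Map.Entry<Entity, Long> entry : pendingDemand.entrySet()) {
            Entity e = entry.getKey();
            schedulable.remove(e);      // found via the old demand value
            e.demand = entry.getValue();
            schedulable.add(e);         // re-inserted under the new sort key
        }
        pendingDemand.clear();
        return schedulable.iterator();
    }
}
```

Each scheduling event is then O(1); the O(log n) remove/insert cost is paid at most once per dirty entity per scheduling pass, instead of on every demand change.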
[jira] [Commented] (YARN-3411) [Storage implementation] explore the native HBase write schema for storage
[ https://issues.apache.org/jira/browse/YARN-3411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14550858#comment-14550858 ] Vrushali C commented on YARN-3411: -- Thanks [~djp] ! I will make these changes.. > [Storage implementation] explore the native HBase write schema for storage > -- > > Key: YARN-3411 > URL: https://issues.apache.org/jira/browse/YARN-3411 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Sangjin Lee >Assignee: Vrushali C >Priority: Critical > Attachments: ATSv2BackendHBaseSchemaproposal.pdf, > YARN-3411-YARN-2928.001.patch, YARN-3411-YARN-2928.002.patch, > YARN-3411-YARN-2928.003.patch, YARN-3411-YARN-2928.004.patch, > YARN-3411-YARN-2928.005.patch, YARN-3411-YARN-2928.006.patch, > YARN-3411.poc.2.txt, YARN-3411.poc.3.txt, YARN-3411.poc.4.txt, > YARN-3411.poc.5.txt, YARN-3411.poc.6.txt, YARN-3411.poc.7.txt, > YARN-3411.poc.txt > > > There is work that's in progress to implement the storage based on a Phoenix > schema (YARN-3134). > In parallel, we would like to explore an implementation based on a native > HBase schema for the write path. Such a schema does not exclude using > Phoenix, especially for reads and offline queries. > Once we have basic implementations of both options, we could evaluate them in > terms of performance, scalability, usability, etc. and make a call. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3619) ContainerMetrics unregisters during getMetrics and leads to ConcurrentModificationException
[ https://issues.apache.org/jira/browse/YARN-3619?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14550894#comment-14550894 ] zhihai xu commented on YARN-3619: - I uploaded a patch YARN-3619.000.patch for review. I added a configuration NM_CONTAINER_METRICS_UNREGISTER_DELAY_MS to configure when to unregister the container metrics after the container is finished, because there could be a potential memory leak if I scheduled a thread to do the unregistration from getMetrics. It looks like getMetrics is called from two places: MetricsSystemImpl#sampleMetrics and MetricsSourceAdapter#getMBeanInfo. sampleMetrics won't be called if there are no sinks in the MetricsSystemImpl, and getMBeanInfo may not be called after registration if JMXJsonServlet#doGet is never invoked (no HTTP GET requests from JMX clients). So there is a possibility that getMetrics is never called after registration. > ContainerMetrics unregisters during getMetrics and leads to > ConcurrentModificationException > --- > > Key: YARN-3619 > URL: https://issues.apache.org/jira/browse/YARN-3619 > Project: Hadoop YARN > Issue Type: Bug > Components: nodemanager >Affects Versions: 2.7.0 >Reporter: Jason Lowe >Assignee: zhihai xu > Attachments: YARN-3619.000.patch, test.patch > > > ContainerMetrics is able to unregister itself during the getMetrics method, > but that method can be called by MetricsSystemImpl.sampleMetrics which is > trying to iterate the sources. This leads to a > ConcurrentModificationException log like this: > {noformat} > 2015-05-11 14:00:20,360 [Timer for 'NodeManager' metrics system] WARN > impl.MetricsSystemImpl: java.util.ConcurrentModificationException > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
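The iteration hazard in this issue can be reproduced in isolation with a plain HashMap standing in for the metrics system's source map. This is a hypothetical minimal sketch, not MetricsSystemImpl's actual code path: a callback run during iteration removes an entry from the same map, which trips the fail-fast iterator.

```java
import java.util.ConcurrentModificationException;
import java.util.HashMap;
import java.util.Map;

public class UnregisterDuringSample {
    // Iterate all sources, running each callback; returns false if a
    // callback structurally modified the map mid-iteration (the CME case).
    public static boolean sampleAll(Map<String, Runnable> sources) {
        try {
            for (Map.Entry<String, Runnable> entry : sources.entrySet()) {
                entry.getValue().run(); // a source may unregister here
            }
            return true;
        } catch (ConcurrentModificationException cme) {
            return false;
        }
    }

    public static void main(String[] args) {
        Map<String, Runnable> sources = new HashMap<>();
        // Each container's callback unregisters the other container's
        // source, so whichever runs first invalidates the live iterator.
        sources.put("container1", () -> sources.remove("container2"));
        sources.put("container2", () -> sources.remove("container1"));
        System.out.println(sampleAll(sources)); // false: CME was raised

        // Deferring the unregistration until after the sweep (what a
        // delayed unregister achieves at a coarser granularity) is safe.
        Map<String, Runnable> safe = new HashMap<>();
        java.util.List<String> pending = new java.util.ArrayList<>();
        safe.put("container1", () -> pending.add("container1"));
        safe.put("container2", () -> pending.add("container2"));
        boolean ok = sampleAll(safe);
        pending.forEach(safe::remove); // applied only after iteration
        System.out.println(ok + " " + safe.size()); // true 0
    }
}
```

The second half of main shows the deferral pattern: removals are only recorded during the sweep and applied afterwards, so the iterator never sees a structural modification.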
[jira] [Commented] (YARN-3674) YARN application disappears from view
[ https://issues.apache.org/jira/browse/YARN-3674?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14550896#comment-14550896 ] Sergey Shelukhin commented on YARN-3674: I don't think so, unless filtering sticks even if you go and explicitly deselect it. Maybe showing the current filter on the page would be a good start... > YARN application disappears from view > - > > Key: YARN-3674 > URL: https://issues.apache.org/jira/browse/YARN-3674 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 2.8.0 >Reporter: Sergey Shelukhin > > I have 2 tabs open at exact same URL with RUNNING applications view. There is > an application that is, in fact, running, that is visible in one tab but not > the other. This persists across refreshes. If I open new tab from the tab > where the application is not visible, in that tab it shows up ok. > I didn't change scheduler/queue settings before this behavior happened; on > [~sseth]'s advice I went and tried to click the root node of the scheduler on > scheduler page; the app still does not become visible. > Something got stuck somewhere... -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (YARN-2355) MAX_APP_ATTEMPTS_ENV may no longer be a useful env var for a container
[ https://issues.apache.org/jira/browse/YARN-2355?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Darrell Taylor reassigned YARN-2355: Assignee: Darrell Taylor > MAX_APP_ATTEMPTS_ENV may no longer be a useful env var for a container > -- > > Key: YARN-2355 > URL: https://issues.apache.org/jira/browse/YARN-2355 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Zhijie Shen >Assignee: Darrell Taylor > Labels: newbie > > After YARN-2074, YARN-614 and YARN-611, the application cannot judge whether > it will get another attempt based on MAX_APP_ATTEMPTS_ENV alone. We should be > able to notify the application of its up-to-date remaining retry quota. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-1012) NM should report resource utilization of running containers to RM in heartbeat
[ https://issues.apache.org/jira/browse/YARN-1012?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Inigo Goiri updated YARN-1012: -- Attachment: YARN-1012-3.patch Trying with the latest trunk. > NM should report resource utilization of running containers to RM in heartbeat > -- > > Key: YARN-1012 > URL: https://issues.apache.org/jira/browse/YARN-1012 > Project: Hadoop YARN > Issue Type: Sub-task > Components: nodemanager >Affects Versions: 2.7.0 >Reporter: Arun C Murthy >Assignee: Inigo Goiri > Attachments: YARN-1012-1.patch, YARN-1012-2.patch, YARN-1012-3.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3534) Collect node resource utilization
[ https://issues.apache.org/jira/browse/YARN-3534?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Inigo Goiri updated YARN-3534: -- Attachment: YARN-3534-9.patch Removed nodemanager context. Added vmem (I cannot put it inside the Resource though). > Collect node resource utilization > - > > Key: YARN-3534 > URL: https://issues.apache.org/jira/browse/YARN-3534 > Project: Hadoop YARN > Issue Type: Sub-task > Components: nodemanager, resourcemanager >Affects Versions: 2.7.0 >Reporter: Inigo Goiri >Assignee: Inigo Goiri > Attachments: YARN-3534-1.patch, YARN-3534-2.patch, YARN-3534-3.patch, > YARN-3534-3.patch, YARN-3534-4.patch, YARN-3534-5.patch, YARN-3534-6.patch, > YARN-3534-7.patch, YARN-3534-8.patch, YARN-3534-9.patch > > Original Estimate: 336h > Remaining Estimate: 336h > > YARN should be aware of the resource utilization of the nodes when scheduling > containers. For this, this task will implement the NodeResourceMonitor and > send this information to the Resource Manager in the heartbeat. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3534) Collect node resource utilization
[ https://issues.apache.org/jira/browse/YARN-3534?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14550962#comment-14550962 ] Inigo Goiri commented on YARN-3534: --- Where do you want to put the vmem, and where should I get the CPU wall time from? Regarding YarnConfiguration, I didn't understand what you wanted to do there. Do you want to put the constants in the class itself? > Collect node resource utilization > - > > Key: YARN-3534 > URL: https://issues.apache.org/jira/browse/YARN-3534 > Project: Hadoop YARN > Issue Type: Sub-task > Components: nodemanager, resourcemanager >Affects Versions: 2.7.0 >Reporter: Inigo Goiri >Assignee: Inigo Goiri > Attachments: YARN-3534-1.patch, YARN-3534-2.patch, YARN-3534-3.patch, > YARN-3534-3.patch, YARN-3534-4.patch, YARN-3534-5.patch, YARN-3534-6.patch, > YARN-3534-7.patch, YARN-3534-8.patch, YARN-3534-9.patch > > Original Estimate: 336h > Remaining Estimate: 336h > > YARN should be aware of the resource utilization of the nodes when scheduling > containers. For this, this task will implement the NodeResourceMonitor and > send this information to the Resource Manager in the heartbeat. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3633) With Fair Scheduler, cluster can logjam when there are too many queues
[ https://issues.apache.org/jira/browse/YARN-3633?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rohit Agarwal updated YARN-3633: Attachment: YARN-3633-1.patch > With Fair Scheduler, cluster can logjam when there are too many queues > -- > > Key: YARN-3633 > URL: https://issues.apache.org/jira/browse/YARN-3633 > Project: Hadoop YARN > Issue Type: Bug > Components: fairscheduler >Affects Versions: 2.6.0 >Reporter: Rohit Agarwal >Assignee: Rohit Agarwal >Priority: Critical > Attachments: YARN-3633-1.patch, YARN-3633.patch > > > It's possible to logjam a cluster by submitting many applications at once in > different queues. > For example, let's say there is a cluster with 20GB of total memory. Let's > say 4 users submit applications at the same time. The fair share of each > queue is 5GB. Let's say that maxAMShare is 0.5. So, each queue has at most > 2.5GB memory for AMs. If all the users requested AMs of size 3GB - the > cluster logjams. Nothing gets scheduled even when 20GB of resources are > available. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-1012) NM should report resource utilization of running containers to RM in heartbeat
[ https://issues.apache.org/jira/browse/YARN-1012?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14550987#comment-14550987 ] Hadoop QA commented on YARN-1012: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | pre-patch | 15m 17s | Pre-patch trunk compilation is healthy. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 1 new or modified test files. | | {color:red}-1{color} | javac | 2m 23s | The patch appears to cause the build to fail. | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12733899/YARN-1012-3.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / 8860e35 | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/8004/console | This message was automatically generated. > NM should report resource utilization of running containers to RM in heartbeat > -- > > Key: YARN-1012 > URL: https://issues.apache.org/jira/browse/YARN-1012 > Project: Hadoop YARN > Issue Type: Sub-task > Components: nodemanager >Affects Versions: 2.7.0 >Reporter: Arun C Murthy >Assignee: Inigo Goiri > Attachments: YARN-1012-1.patch, YARN-1012-2.patch, YARN-1012-3.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3647) RMWebServices api's should use updated api from CommonNodeLabelsManager to get NodeLabel object
[ https://issues.apache.org/jira/browse/YARN-3647?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sunil G updated YARN-3647: -- Attachment: 0002-YARN-3647.patch Thank you [~leftnoteasy]. I have addressed the comments in the new patch. > RMWebServices api's should use updated api from CommonNodeLabelsManager to > get NodeLabel object > --- > > Key: YARN-3647 > URL: https://issues.apache.org/jira/browse/YARN-3647 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Affects Versions: 2.6.0 >Reporter: Sunil G >Assignee: Sunil G > Attachments: 0001-YARN-3647.patch, 0002-YARN-3647.patch > > > After YARN-3579, RMWebServices apis can use the updated version of apis in > CommonNodeLabelsManager which gives full NodeLabel object instead of creating > NodeLabel object from plain label name. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3619) ContainerMetrics unregisters during getMetrics and leads to ConcurrentModificationException
[ https://issues.apache.org/jira/browse/YARN-3619?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14551006#comment-14551006 ] Hadoop QA commented on YARN-3619: - \\ \\ | (/) *{color:green}+1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | pre-patch | 14m 37s | Pre-patch trunk compilation is healthy. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 1 new or modified test files. | | {color:green}+1{color} | javac | 7m 33s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 9m 34s | There were no new javadoc warning messages. | | {color:green}+1{color} | release audit | 0m 22s | The applied patch does not increase the total number of release audit warnings. | | {color:green}+1{color} | checkstyle | 2m 34s | There were no new checkstyle issues. | | {color:green}+1{color} | whitespace | 0m 1s | The patch has no lines that end in whitespace. | | {color:green}+1{color} | install | 1m 35s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 33s | The patch built with eclipse:eclipse. | | {color:green}+1{color} | findbugs | 4m 6s | The patch does not introduce any new Findbugs (version 3.0.0) warnings. | | {color:green}+1{color} | common tests | 25m 14s | Tests passed in hadoop-common. | | {color:green}+1{color} | yarn tests | 0m 23s | Tests passed in hadoop-yarn-api. | | {color:green}+1{color} | yarn tests | 6m 41s | Tests passed in hadoop-yarn-server-nodemanager. 
| | | | 73m 17s | | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12733882/YARN-3619.000.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / c97f32e | | hadoop-common test log | https://builds.apache.org/job/PreCommit-YARN-Build/8003/artifact/patchprocess/testrun_hadoop-common.txt | | hadoop-yarn-api test log | https://builds.apache.org/job/PreCommit-YARN-Build/8003/artifact/patchprocess/testrun_hadoop-yarn-api.txt | | hadoop-yarn-server-nodemanager test log | https://builds.apache.org/job/PreCommit-YARN-Build/8003/artifact/patchprocess/testrun_hadoop-yarn-server-nodemanager.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/8003/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf901.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/8003/console | This message was automatically generated. > ContainerMetrics unregisters during getMetrics and leads to > ConcurrentModificationException > --- > > Key: YARN-3619 > URL: https://issues.apache.org/jira/browse/YARN-3619 > Project: Hadoop YARN > Issue Type: Bug > Components: nodemanager >Affects Versions: 2.7.0 >Reporter: Jason Lowe >Assignee: zhihai xu > Attachments: YARN-3619.000.patch, test.patch > > > ContainerMetrics is able to unregister itself during the getMetrics method, > but that method can be called by MetricsSystemImpl.sampleMetrics which is > trying to iterate the sources. This leads to a > ConcurrentModificationException log like this: > {noformat} > 2015-05-11 14:00:20,360 [Timer for 'NodeManager' metrics system] WARN > impl.MetricsSystemImpl: java.util.ConcurrentModificationException > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3411) [Storage implementation] explore the native HBase write schema for storage
[ https://issues.apache.org/jira/browse/YARN-3411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14551018#comment-14551018 ] Zhijie Shen commented on YARN-3411: --- bq. A flow can be uniquely identified with the flow name and run id (and of course cluster and user id). I think in the Phoenix impl, we have treated version as part of the identifier of a unique flow. bq. Hmm, so schema creation happens more or less once in the lifetime of the hbase cluster like during cluster setup (or perhaps if we decide to drop and recreate it, which is rare in production). I believe writers will come to life and cease to exist with each yarn application lifecycle but cluster is more or less eternal, so adding to this step to the lifecycle of a Writer Impl object seems somewhat out of place to me. Fair point. And this is another place different from the Phoenix impl, which creates tables if they don't exist. My perspective is more about automation, and it's better to leave fewer steps for users to set up the service. Perhaps we can find somewhere else to invoke the table initialization once, if the service is set up for a YARN cluster and HBase/Phoenix is used as the backend. > [Storage implementation] explore the native HBase write schema for storage > -- > > Key: YARN-3411 > URL: https://issues.apache.org/jira/browse/YARN-3411 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Sangjin Lee >Assignee: Vrushali C >Priority: Critical > Attachments: ATSv2BackendHBaseSchemaproposal.pdf, > YARN-3411-YARN-2928.001.patch, YARN-3411-YARN-2928.002.patch, > YARN-3411-YARN-2928.003.patch, YARN-3411-YARN-2928.004.patch, > YARN-3411-YARN-2928.005.patch, YARN-3411-YARN-2928.006.patch, > YARN-3411.poc.2.txt, YARN-3411.poc.3.txt, YARN-3411.poc.4.txt, > YARN-3411.poc.5.txt, YARN-3411.poc.6.txt, YARN-3411.poc.7.txt, > YARN-3411.poc.txt > > > There is work that's in progress to implement the storage based on a Phoenix > schema (YARN-3134). 
> In parallel, we would like to explore an implementation based on a native > HBase schema for the write path. Such a schema does not exclude using > Phoenix, especially for reads and offline queries. > Once we have basic implementations of both options, we could evaluate them in > terms of performance, scalability, usability, etc. and make a call. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
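The "initialize the schema once at cluster setup, not per writer" idea discussed above can be sketched as below. This is a hypothetical stand-in, not the actual ATS v2 code: TableBackend and both of its methods are invented for illustration, and a real implementation would call the HBase admin API instead.

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Sketch of idempotent, once-only schema initialization, suitable for
// cluster-setup automation rather than the lifecycle of each writer.
public class SchemaInitSketch {
  // Hypothetical backend API; a real impl would wrap HBase's admin client.
  public interface TableBackend {
    boolean tableExists(String name);
    void createTable(String name);
  }

  private static final AtomicBoolean DONE = new AtomicBoolean(false);

  // Only the first caller in this process creates missing tables; later
  // calls (e.g. from short-lived writers) are no-ops. Returns the number
  // of tables actually created.
  public static int initializeOnce(TableBackend backend, String... tables) {
    if (!DONE.compareAndSet(false, true)) {
      return 0;
    }
    int created = 0;
    for (String t : tables) {
      if (!backend.tableExists(t)) {
        backend.createTable(t);
        created++;
      }
    }
    return created;
  }
}
```

The guard makes the call safe to invoke from setup tooling and from writers alike, which matches the automation concern: fewer manual steps for users, without per-writer create-if-absent checks.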
[jira] [Comment Edited] (YARN-3411) [Storage implementation] explore the native HBase write schema for storage
[ https://issues.apache.org/jira/browse/YARN-3411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14551018#comment-14551018 ] Zhijie Shen edited comment on YARN-3411 at 5/19/15 7:00 PM: bq. A flow can be uniquely identified with the flow name and run id (and of course cluster and user id). I think in the Phoenix impl, we have treated version as part of the identifier of a unique flow. bq. Hmm, so schema creation happens more or less once in the lifetime of the hbase cluster like during cluster setup (or perhaps if we decide to drop and recreate it, which is rare in production). I believe writers will come to life and cease to exist with each yarn application lifecycle but cluster is more or less eternal, so adding to this step to the lifecycle of a Writer Impl object seems somewhat out of place to me. Fair point. And this is another place different from the Phoenix impl, which creates tables if they don't exist. My perspective is more about automation, and it's better to leave fewer steps for users to set up the service. Perhaps we can find somewhere else, instead of multiple distributed writers, to invoke the table initialization once, if the service is set up for a YARN cluster and HBase/Phoenix is used as the backend. was (Author: zjshen): bq. A flow can be uniquely identified with the flow name and run id (and of course cluster and user id). I think in Phoenix impl, we have treated version as part of identifier of a unique flow. bq. Hmm, so schema creation happens more or less once in the lifetime of the hbase cluster like during cluster setup (or perhaps if we decide to drop and recreate it, which is rare in production). I believe writers will come to life and cease to exist with each yarn application lifecycle but cluster is more or less eternal, so adding to this step to the lifecycle of a Writer Impl object seems somewhat out of place to me. Fair point. 
And this is another place different from Phoenix impl, which creates table if they don't exist. My perspective is more about automation, and it's better to leave fewer steps for users to setup the service. Perhaps we can find somewhere else to invoke the table initialization once if the service is setup for YARN cluster, and HBase/Phoenix is used as the backend. > [Storage implementation] explore the native HBase write schema for storage > -- > > Key: YARN-3411 > URL: https://issues.apache.org/jira/browse/YARN-3411 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Sangjin Lee >Assignee: Vrushali C >Priority: Critical > Attachments: ATSv2BackendHBaseSchemaproposal.pdf, > YARN-3411-YARN-2928.001.patch, YARN-3411-YARN-2928.002.patch, > YARN-3411-YARN-2928.003.patch, YARN-3411-YARN-2928.004.patch, > YARN-3411-YARN-2928.005.patch, YARN-3411-YARN-2928.006.patch, > YARN-3411.poc.2.txt, YARN-3411.poc.3.txt, YARN-3411.poc.4.txt, > YARN-3411.poc.5.txt, YARN-3411.poc.6.txt, YARN-3411.poc.7.txt, > YARN-3411.poc.txt > > > There is work that's in progress to implement the storage based on a Phoenix > schema (YARN-3134). > In parallel, we would like to explore an implementation based on a native > HBase schema for the write path. Such a schema does not exclude using > Phoenix, especially for reads and offline queries. > Once we have basic implementations of both options, we could evaluate them in > terms of performance, scalability, usability, etc. and make a call. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3583) Support of NodeLabel object instead of plain String in YarnClient side.
[ https://issues.apache.org/jira/browse/YARN-3583?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14551090#comment-14551090 ] Hadoop QA commented on YARN-3583: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | pre-patch | 14m 38s | Pre-patch trunk compilation is healthy. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 2 new or modified test files. | | {color:green}+1{color} | javac | 7m 32s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 9m 36s | There were no new javadoc warning messages. | | {color:green}+1{color} | release audit | 0m 22s | The applied patch does not increase the total number of release audit warnings. | | {color:green}+1{color} | checkstyle | 3m 38s | There were no new checkstyle issues. | | {color:green}+1{color} | whitespace | 0m 8s | The patch has no lines that end in whitespace. | | {color:green}+1{color} | install | 1m 35s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 31s | The patch built with eclipse:eclipse. | | {color:red}-1{color} | findbugs | 6m 17s | The patch appears to introduce 1 new Findbugs (version 3.0.0) warnings. | | {color:green}+1{color} | mapreduce tests | 109m 44s | Tests passed in hadoop-mapreduce-client-jobclient. | | {color:green}+1{color} | yarn tests | 0m 27s | Tests passed in hadoop-yarn-api. | | {color:green}+1{color} | yarn tests | 7m 10s | Tests passed in hadoop-yarn-client. | | {color:green}+1{color} | yarn tests | 2m 2s | Tests passed in hadoop-yarn-common. | | {color:green}+1{color} | yarn tests | 0m 30s | Tests passed in hadoop-yarn-server-common. | | {color:green}+1{color} | yarn tests | 50m 57s | Tests passed in hadoop-yarn-server-resourcemanager. 
| | | | 215m 11s | | \\ \\ || Reason || Tests || | FindBugs | module:hadoop-yarn-server-resourcemanager | | | Inconsistent synchronization of org.apache.hadoop.yarn.server.resourcemanager.recovery.FileSystemRMStateStore.isHDFS; locked 66% of time Unsynchronized access at FileSystemRMStateStore.java:66% of time Unsynchronized access at FileSystemRMStateStore.java:[line 156] | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12733845/0004-YARN-3583.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / de30d66 | | Findbugs warnings | https://builds.apache.org/job/PreCommit-YARN-Build/8001/artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html | | hadoop-mapreduce-client-jobclient test log | https://builds.apache.org/job/PreCommit-YARN-Build/8001/artifact/patchprocess/testrun_hadoop-mapreduce-client-jobclient.txt | | hadoop-yarn-api test log | https://builds.apache.org/job/PreCommit-YARN-Build/8001/artifact/patchprocess/testrun_hadoop-yarn-api.txt | | hadoop-yarn-client test log | https://builds.apache.org/job/PreCommit-YARN-Build/8001/artifact/patchprocess/testrun_hadoop-yarn-client.txt | | hadoop-yarn-common test log | https://builds.apache.org/job/PreCommit-YARN-Build/8001/artifact/patchprocess/testrun_hadoop-yarn-common.txt | | hadoop-yarn-server-common test log | https://builds.apache.org/job/PreCommit-YARN-Build/8001/artifact/patchprocess/testrun_hadoop-yarn-server-common.txt | | hadoop-yarn-server-resourcemanager test log | https://builds.apache.org/job/PreCommit-YARN-Build/8001/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/8001/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf906.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/8001/console | This message was automatically generated. > Support of NodeLabel object instead of plain String in YarnClient side. > --- > > Key: YARN-3583 > URL: https://issues.apache.org/jira/browse/YARN-3583 > Project: Hadoop YARN > Issue Type: Sub-task > Components: client >Affects Versions: 2.6.0 >Reporter: Sunil G >Assignee: Sunil G > Attachments: 0001-YARN-3583.patch, 0002-YARN-3583.patch, > 0003-YARN-3583.patch, 0004-YARN-3583.patch > > > Similar to YARN-3521, use NodeLabel objects in YarnClient side apis. > getLabelsToNodes/getNodeToLabels api's can use NodeLabel object instead of > using plain label name. > This will help to bring other label details such as Exclusivity to client > side. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3560) Not able to navigate to the cluster from tracking url (proxy) generated after submission of job
[ https://issues.apache.org/jira/browse/YARN-3560?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14551092#comment-14551092 ] Hadoop QA commented on YARN-3560: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:red}-1{color} | patch | 0m 0s | The patch command could not apply the patch during dryrun. | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12733403/YARN-3560.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / e422e76 | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/8007/console | This message was automatically generated. > Not able to navigate to the cluster from tracking url (proxy) generated after > submission of job > --- > > Key: YARN-3560 > URL: https://issues.apache.org/jira/browse/YARN-3560 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 2.7.0 >Reporter: Anushri >Priority: Minor > Attachments: YARN-3560.patch > > > A standalone web proxy server is enabled in the cluster. > When a job is submitted, the generated url contains the proxy. > Tracking this url > in the web page, if we try to navigate to the cluster links [about, > applications, or scheduler], it gets redirected to some default port instead > of the actual RM web port configured, > and as such it throws "webpage not available" -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2556) Tool to measure the performance of the timeline server
[ https://issues.apache.org/jira/browse/YARN-2556?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chang Li updated YARN-2556: --- Attachment: YARN-2556.9.patch [~sjlee0] thanks a lot for the review and for pointing me to those two very helpful jiras! I have updated my patch by following the style you used in TimelineServicePerformanceV2, and refactored the entity creation and entity-put work into a separate SimpleEntityWriterV1 mapper. I have also enabled switching between v1 and v2. But I haven't imported the Job History File Replay mapper yet; do I also need to? Thanks! > Tool to measure the performance of the timeline server > -- > > Key: YARN-2556 > URL: https://issues.apache.org/jira/browse/YARN-2556 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Jonathan Eagles >Assignee: Chang Li > Labels: BB2015-05-TBR > Attachments: YARN-2556-WIP.patch, YARN-2556-WIP.patch, > YARN-2556.1.patch, YARN-2556.2.patch, YARN-2556.3.patch, YARN-2556.4.patch, > YARN-2556.5.patch, YARN-2556.6.patch, YARN-2556.7.patch, YARN-2556.8.patch, > YARN-2556.9.patch, YARN-2556.patch, yarn2556.patch, yarn2556.patch, > yarn2556_wip.patch > > > We need to be able to understand the capacity model for the timeline server > to give users the tools they need to deploy a timeline server with the > correct capacity. > I propose we create a mapreduce job that can measure timeline server write > and read performance. Transactions per second, I/O for both read and write > would be a good start. > This could be done as an example or test job that could be tied into gridmix. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-1814) Better error message when browsing logs in the RM/NM webuis
[ https://issues.apache.org/jira/browse/YARN-1814?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14551114#comment-14551114 ] Dustin Cote commented on YARN-1814: --- [~jianhe] indeed it looks like this one already got fixed in a later version. I'm not sure where, but I see that when I test this on 2.6, I get an authorization error instead. This can probably be closed as invalid. > Better error message when browsing logs in the RM/NM webuis > --- > > Key: YARN-1814 > URL: https://issues.apache.org/jira/browse/YARN-1814 > Project: Hadoop YARN > Issue Type: Improvement >Affects Versions: 2.3.0 >Reporter: Andrew Wang >Assignee: Dustin Cote >Priority: Minor > Attachments: YARN-1814-1.patch, YARN-1814-2.patch > > > Browsing the webUI as a different user than the one who ran an MR job, I > click into host:8088/cluster/app/, then the "logs" link. This > redirects to the NM, but since I don't have permissions it prints out: > bq. Failed redirect for container_1394482121761_0010_01_01 > bq. Failed while trying to construct the redirect url to the log server. Log > Server url may not be configured > bq. Container does not exist. > It'd be nicer to print something about permissions instead. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3633) With Fair Scheduler, cluster can logjam when there are too many queues
[ https://issues.apache.org/jira/browse/YARN-3633?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14551146#comment-14551146 ] Hadoop QA commented on YARN-3633: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | pre-patch | 15m 17s | Pre-patch trunk compilation is healthy. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 1 new or modified test files. | | {color:green}+1{color} | javac | 7m 48s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 9m 51s | There were no new javadoc warning messages. | | {color:green}+1{color} | release audit | 0m 22s | The applied patch does not increase the total number of release audit warnings. | | {color:red}-1{color} | checkstyle | 0m 47s | The applied patch generated 2 new checkstyle issues (total was 120, now 120). | | {color:green}+1{color} | whitespace | 0m 2s | The patch has no lines that end in whitespace. | | {color:green}+1{color} | install | 1m 32s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 34s | The patch built with eclipse:eclipse. | | {color:red}-1{color} | findbugs | 1m 19s | The patch appears to introduce 1 new Findbugs (version 3.0.0) warnings. | | {color:green}+1{color} | yarn tests | 50m 40s | Tests passed in hadoop-yarn-server-resourcemanager. 
| | | | 88m 17s | | \\ \\ || Reason || Tests || | FindBugs | module:hadoop-yarn-server-resourcemanager | | | Inconsistent synchronization of org.apache.hadoop.yarn.server.resourcemanager.recovery.FileSystemRMStateStore.isHDFS; locked 66% of time Unsynchronized access at FileSystemRMStateStore.java:66% of time Unsynchronized access at FileSystemRMStateStore.java:[line 156] | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12733912/YARN-3633-1.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / fd3cb53 | | checkstyle | https://builds.apache.org/job/PreCommit-YARN-Build/8005/artifact/patchprocess/diffcheckstylehadoop-yarn-server-resourcemanager.txt | | Findbugs warnings | https://builds.apache.org/job/PreCommit-YARN-Build/8005/artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html | | hadoop-yarn-server-resourcemanager test log | https://builds.apache.org/job/PreCommit-YARN-Build/8005/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/8005/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf903.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/8005/console | This message was automatically generated. > With Fair Scheduler, cluster can logjam when there are too many queues > -- > > Key: YARN-3633 > URL: https://issues.apache.org/jira/browse/YARN-3633 > Project: Hadoop YARN > Issue Type: Bug > Components: fairscheduler >Affects Versions: 2.6.0 >Reporter: Rohit Agarwal >Assignee: Rohit Agarwal >Priority: Critical > Attachments: YARN-3633-1.patch, YARN-3633.patch > > > It's possible to logjam a cluster by submitting many applications at once in > different queues. 
> For example, let's say there is a cluster with 20GB of total memory. Let's > say 4 users submit applications at the same time. The fair share of each > queue is 5GB. Let's say that maxAMShare is 0.5. So, each queue has at most > 2.5GB memory for AMs. If all the users requested AMs of size 3GB - the > cluster logjams. Nothing gets scheduled even when 20GB of resources are > available. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
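The arithmetic behind the logjam example in this issue can be made concrete with a small sketch. This is only an illustration of the admission check, not the actual FairScheduler code; the class and method names are invented.

```java
// Sketch of the maxAMShare admission check described in this issue: an AM
// may start only if the memory already used by AMs in the queue, plus the
// new AM's memory, stays within fairShare * maxAMShare.
public class MaxAmShareSketch {
  public static boolean canStartAm(double fairShareGb, double maxAmShare,
                                   double amUsedGb, double amMemGb) {
    return amUsedGb + amMemGb <= fairShareGb * maxAmShare;
  }
}
```

In the example from the description, every queue evaluates canStartAm(5.0, 0.5, 0.0, 3.0), which is false (3 GB > 2.5 GB), so nothing schedules even though the cluster has 20 GB free; the discussion above is about relaxing this check so that at least the first AM in a starved queue can start.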
[jira] [Updated] (YARN-3044) [Event producers] Implement RM writing app lifecycle events to ATS
[ https://issues.apache.org/jira/browse/YARN-3044?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhijie Shen updated YARN-3044: -- Labels: (was: BB2015-05-TBR) > [Event producers] Implement RM writing app lifecycle events to ATS > -- > > Key: YARN-3044 > URL: https://issues.apache.org/jira/browse/YARN-3044 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Sangjin Lee >Assignee: Naganarasimha G R > Attachments: YARN-3044-YARN-2928.004.patch, > YARN-3044-YARN-2928.005.patch, YARN-3044-YARN-2928.006.patch, > YARN-3044-YARN-2928.007.patch, YARN-3044-YARN-2928.008.patch, > YARN-3044.20150325-1.patch, YARN-3044.20150406-1.patch, > YARN-3044.20150416-1.patch > > > Per design in YARN-2928, implement RM writing app lifecycle events to ATS. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2876) In Fair Scheduler, JMX and Scheduler UI display wrong maxResource info for subqueues
[ https://issues.apache.org/jira/browse/YARN-2876?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Siqi Li updated YARN-2876:
--------------------------
    Attachment: YARN-2876.v4.patch

> In Fair Scheduler, JMX and Scheduler UI display wrong maxResource info for
> subqueues
>
> Key: YARN-2876
> URL: https://issues.apache.org/jira/browse/YARN-2876
> Project: Hadoop YARN
> Issue Type: Bug
> Reporter: Siqi Li
> Assignee: Siqi Li
> Attachments: YARN-2876.v1.patch, YARN-2876.v2.patch, YARN-2876.v3.patch, YARN-2876.v4.patch, screenshot-1.png
>
>
> If a subqueue doesn't have a maxResource set in fair-scheduler.xml, JMX and
> Scheduler UI will display the entire cluster capacity as its maxResource
> instead of its parent queue's maxResource.
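The inheritance behavior the issue asks for can be sketched as follows. This is illustrative only; the maps, queue names, and `effectiveMaxMb` helper are hypothetical, not the actual Fair Scheduler queue code.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative sketch of the maxResource fallback described in YARN-2876:
// a queue with no explicit maxResources should report its parent's
// effective max, not the full cluster capacity. Names are made up.
public class QueueMaxResourceDemo {
    static final long CLUSTER_MB = 20480;
    // queue name -> configured maxResources in MB (absent = unset)
    static Map<String, Long> configuredMaxMb = new HashMap<>();
    // queue name -> parent queue name (absent = root)
    static Map<String, String> parentOf = new HashMap<>();

    // Effective max: the queue's own setting if present, otherwise the
    // parent's effective max; the root falls back to cluster capacity.
    static long effectiveMaxMb(String queue) {
        Long own = configuredMaxMb.get(queue);
        if (own != null) {
            return own;
        }
        String parent = parentOf.get(queue);
        return parent == null ? CLUSTER_MB : effectiveMaxMb(parent);
    }

    public static void main(String[] args) {
        parentOf.put("root.a", "root");
        parentOf.put("root.a.sub", "root.a");
        configuredMaxMb.put("root.a", 8192L);  // parent capped at 8 GB
        // root.a.sub has no maxResources set: it should report the
        // parent's 8 GB cap, not the 20 GB cluster capacity.
        System.out.println(effectiveMaxMb("root.a.sub"));
    }
}
```

The recursion walks up the hierarchy until it finds an explicit cap, which is the behavior the JMX and Scheduler UI displays should reflect per the issue.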
[jira] [Commented] (YARN-1814) Better error message when browsing logs in the RM/NM webuis
[ https://issues.apache.org/jira/browse/YARN-1814?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14551181#comment-14551181 ]

Jian He commented on YARN-1814:
-------------------------------

thanks [~cotedm], closing this.

> Better error message when browsing logs in the RM/NM webuis
> ---
>
> Key: YARN-1814
> URL: https://issues.apache.org/jira/browse/YARN-1814
> Project: Hadoop YARN
> Issue Type: Improvement
> Affects Versions: 2.3.0
> Reporter: Andrew Wang
> Assignee: Dustin Cote
> Priority: Minor
> Attachments: YARN-1814-1.patch, YARN-1814-2.patch
>
>
> Browsing the webUI as a different user than the one who ran an MR job, I
> click into host:8088/cluster/app/, then the "logs" link. This
> redirects to the NM, but since I don't have permissions it prints out:
> bq. Failed redirect for container_1394482121761_0010_01_01
> bq. Failed while trying to construct the redirect url to the log server. Log
> Server url may not be configured
> bq. Container does not exist.
> It'd be nicer to print something about permissions instead.
[jira] [Resolved] (YARN-1814) Better error message when browsing logs in the RM/NM webuis
[ https://issues.apache.org/jira/browse/YARN-1814?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jian He resolved YARN-1814.
---------------------------
    Resolution: Cannot Reproduce
[jira] [Commented] (YARN-3647) RMWebServices api's should use updated api from CommonNodeLabelsManager to get NodeLabel object
[ https://issues.apache.org/jira/browse/YARN-3647?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14551212#comment-14551212 ]

Hadoop QA commented on YARN-3647:
---------------------------------

| (x) *{color:red}-1 overall{color}* |

|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | pre-patch | 15m 11s | Pre-patch trunk compilation is healthy. |
| {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. |
| {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 1 new or modified test files. |
| {color:green}+1{color} | javac | 7m 43s | There were no new javac warning messages. |
| {color:green}+1{color} | javadoc | 9m 48s | There were no new javadoc warning messages. |
| {color:green}+1{color} | release audit | 0m 22s | The applied patch does not increase the total number of release audit warnings. |
| {color:green}+1{color} | checkstyle | 1m 31s | There were no new checkstyle issues. |
| {color:red}-1{color} | whitespace | 0m 0s | The patch has 8 line(s) that end in whitespace. Use git apply --whitespace=fix. |
| {color:green}+1{color} | install | 1m 35s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse | 0m 33s | The patch built with eclipse:eclipse. |
| {color:red}-1{color} | findbugs | 2m 46s | The patch appears to introduce 1 new Findbugs (version 3.0.0) warnings. |
| {color:green}+1{color} | yarn tests | 1m 59s | Tests passed in hadoop-yarn-common. |
| {color:red}-1{color} | yarn tests | 50m 23s | Tests failed in hadoop-yarn-server-resourcemanager. |
| | | | 91m 56s | |

|| Reason || Tests ||
| FindBugs | module:hadoop-yarn-server-resourcemanager |
| | Inconsistent synchronization of org.apache.hadoop.yarn.server.resourcemanager.recovery.FileSystemRMStateStore.isHDFS; locked 66% of time Unsynchronized access at FileSystemRMStateStore.java:66% of time Unsynchronized access at FileSystemRMStateStore.java:[line 156] |
| Failed unit tests | hadoop.yarn.server.resourcemanager.TestRMRestart |

|| Subsystem || Report/Notes ||
| Patch URL | http://issues.apache.org/jira/secure/attachment/12733914/0002-YARN-3647.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / e422e76 |
| whitespace | https://builds.apache.org/job/PreCommit-YARN-Build/8006/artifact/patchprocess/whitespace.txt |
| Findbugs warnings | https://builds.apache.org/job/PreCommit-YARN-Build/8006/artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html |
| hadoop-yarn-common test log | https://builds.apache.org/job/PreCommit-YARN-Build/8006/artifact/patchprocess/testrun_hadoop-yarn-common.txt |
| hadoop-yarn-server-resourcemanager test log | https://builds.apache.org/job/PreCommit-YARN-Build/8006/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt |
| Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/8006/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf904.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | https://builds.apache.org/job/PreCommit-YARN-Build/8006/console |

This message was automatically generated.
> RMWebServices api's should use updated api from CommonNodeLabelsManager to
> get NodeLabel object
> ---
>
> Key: YARN-3647
> URL: https://issues.apache.org/jira/browse/YARN-3647
> Project: Hadoop YARN
> Issue Type: Sub-task
> Components: resourcemanager
> Affects Versions: 2.6.0
> Reporter: Sunil G
> Assignee: Sunil G
> Attachments: 0001-YARN-3647.patch, 0002-YARN-3647.patch
>
>
> After YARN-3579, RMWebServices apis can use the updated version of apis in
> CommonNodeLabelsManager which gives full NodeLabel object instead of creating
> NodeLabel object from plain label name.
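The FindBugs warning flagged in the QA report above ("inconsistent synchronization of FileSystemRMStateStore.isHDFS; locked 66% of time") describes a general pattern: a field written under a lock but read without one. The following is an illustrative sketch of that pattern, not the actual FileSystemRMStateStore code.

```java
// Illustrative sketch of the "inconsistent synchronization" pattern
// FindBugs reports (IS2_INCONSISTENT_SYNC). Not the real
// FileSystemRMStateStore; the class and methods here are made up.
public class InconsistentSyncDemo {
    private boolean isHDFS;          // flagged field: mixed locking

    public synchronized void init(boolean onHdfs) {
        isHDFS = onHdfs;             // write happens under the monitor
    }

    public boolean usesHdfs() {
        return isHDFS;               // read without the monitor -> warning
    }

    // Conventional fixes: declare the field volatile, or synchronize
    // every access so the lock discipline is consistent.
}
```

FindBugs reports the percentage of accesses that were locked (66% here); the fix is to make the discipline uniform rather than to lock "most" accesses.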
[jira] [Commented] (YARN-3633) With Fair Scheduler, cluster can logjam when there are too many queues
[ https://issues.apache.org/jira/browse/YARN-3633?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14551213#comment-14551213 ]

Rohit Agarwal commented on YARN-3633:
-------------------------------------

The remaining checkstyle and findbugs issues seem to be preexisting.