[jira] [Commented] (YARN-1521) Mark appropriate protocol methods with the idempotent annotation or AtMostOnce annotation
[ https://issues.apache.org/jira/browse/YARN-1521?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13946246#comment-13946246 ]

Zhijie Shen commented on YARN-1521:
---

I browsed through the newest patch and have the following comments:

1. If the application is already in the RM cache, we shouldn't log success. Otherwise, there may be multiple log entries for one submission.
{code}
+      RMAuditLogger.logSuccess(user, AuditConstants.SUBMIT_APP_REQUEST,
+          "ClientRMService", applicationId);
{code}

2. Should it sleep a while before retrying?
{code}
+    try {
+      client.getApplications();
+      return;
+    } catch (Exception e) {
+      LOG.error(e.getMessage());
+    } finally {
+      client.stop();
+    }
{code}

3. Again, sleep 1ms before the next try, yielding to the thread that invokes the API methods. Also, should there be a maximum number of retries when an exception occurs?
{code}
while (keepRunning) {
+        if (cluster.getStartFailoverFlag()) {
+          try {
+            explicitFailover();
+            keepRunning = false;
+          } catch (Exception e) {
+            // Do Nothing
+          } finally {
+            keepRunning = false;
+          }
+        }
+      }
{code}

Mark appropriate protocol methods with the idempotent annotation or AtMostOnce annotation
-

Key: YARN-1521
URL: https://issues.apache.org/jira/browse/YARN-1521
Project: Hadoop YARN
Issue Type: Sub-task
Reporter: Xuan Gong
Assignee: Xuan Gong
Attachments: YARN-1521.0.patch, YARN-1521.1.patch

After YARN-1028, automatic failover was added to RMProxy. This JIRA is to identify whether we need to add the idempotent annotation and which methods can be marked as idempotent.

--
This message was sent by Atlassian JIRA (v6.2#6252)
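Review comments 2 and 3 above both ask for a pause between attempts and an upper bound on retries. The following is a minimal sketch of that pattern, not code from the patch: the class name {{BoundedRetry}}, its method and its parameters are hypothetical, and the real test would call its own client instead of a generic {{Callable}}.

```java
import java.util.concurrent.Callable;

/** Hypothetical helper (not part of the YARN patch): bounded retry with a sleep between attempts. */
final class BoundedRetry {
    private BoundedRetry() {}

    /**
     * Calls {@code action} until it succeeds or {@code maxAttempts} attempts have failed.
     * Sleeps {@code sleepMillis} after each failed attempt so that other threads can run.
     */
    static <T> T call(Callable<T> action, int maxAttempts, long sleepMillis) throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        Exception last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return action.call();
            } catch (Exception e) {
                last = e; // remember the failure instead of silently dropping it
                if (attempt < maxAttempts) {
                    Thread.sleep(sleepMillis); // yield before the next try
                }
            }
        }
        throw last; // every attempt failed: surface the last error to the caller
    }
}
```

With a bound in place, a persistent failure ends the loop with the last exception instead of spinning forever, which is the concern behind the "max retry" question.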
[jira] [Commented] (YARN-1838) Timeline service getEntities API should provide ability to get entities from given id
[ https://issues.apache.org/jira/browse/YARN-1838?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13946397#comment-13946397 ]

Hudson commented on YARN-1838:
--

FAILURE: Integrated in Hadoop-Yarn-trunk #520 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/520/])
YARN-1838. Enhanced timeline service getEntities API to get entities from a given entity ID or insertion timestamp. Contributed by Billie Rinaldi. (zjshen: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1580960)
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell/src/test/java/org/apache/hadoop/yarn/applications/distributedshell/TestDistributedShell.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/applicationhistoryservice/timeline/GenericObjectMapper.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/applicationhistoryservice/timeline/LeveldbTimelineStore.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/applicationhistoryservice/timeline/MemoryTimelineStore.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/applicationhistoryservice/timeline/TimelineReader.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/applicationhistoryservice/webapp/TimelineWebServices.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/test/java/org/apache/hadoop/yarn/server/applicationhistoryservice/timeline/TestLeveldbTimelineStore.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/test/java/org/apache/hadoop/yarn/server/applicationhistoryservice/timeline/TestMemoryTimelineStore.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/test/java/org/apache/hadoop/yarn/server/applicationhistoryservice/timeline/TimelineStoreTestUtils.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/test/java/org/apache/hadoop/yarn/server/applicationhistoryservice/webapp/TestTimelineWebServices.java

Timeline service getEntities API should provide ability to get entities from given id
-

Key: YARN-1838
URL: https://issues.apache.org/jira/browse/YARN-1838
Project: Hadoop YARN
Issue Type: Sub-task
Reporter: Srimanth Gunturi
Assignee: Billie Rinaldi
Fix For: 2.4.0
Attachments: YARN-1838.1.patch, YARN-1838.2.patch, YARN-1838.3.patch, YARN-1838.4.patch, YARN-1838.5.patch

To support pagination, we need ability to get entities from a certain ID by providing a new param called {{fromid}}.
For example on a page of 10 jobs, our first call will be like
[http://server:8188/ws/v1/timeline/HIVE_QUERY_ID?fields=events,primaryfilters,otherinfo&limit=11]
When user hits next, we would like to call
[http://server:8188/ws/v1/timeline/HIVE_QUERY_ID?fields=events,primaryfilters,otherinfo&fromid=JID11&limit=11]
and continue on for further _Next_ clicks
On hitting back, we will make similar calls for previous items
[http://server:8188/ws/v1/timeline/HIVE_QUERY_ID?fields=events,primaryfilters,otherinfo&fromid=JID1&limit=11]
{{fromid}} should be inclusive of the id given.
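The pagination scheme in the YARN-1838 description (ask for one entity more than the page size, then use the extra entity as the inclusive {{fromid}} of the next page) can be sketched as below. This is only an illustration under assumptions: the class {{TimelinePaging}} and its methods are hypothetical, the parameter name {{fromid}} is taken from the issue text rather than from the committed API, and ids are assumed to be URL-safe so no encoding is done.

```java
import java.util.List;

/** Hypothetical client-side helper for the paging scheme described in YARN-1838. */
final class TimelinePaging {
    private TimelinePaging() {}

    /** Builds a getEntities URL that requests pageSize + 1 entities, optionally starting at fromId. */
    static String buildUrl(String base, String entityType, String fromId, int pageSize) {
        StringBuilder url = new StringBuilder(base)
            .append("/ws/v1/timeline/").append(entityType)
            .append("?fields=events,primaryfilters,otherinfo");
        if (fromId != null) {
            url.append("&fromid=").append(fromId); // inclusive start id, per the issue text
        }
        // One extra entity tells the client whether a next page exists.
        return url.append("&limit=").append(pageSize + 1).toString();
    }

    /** Returns the id that starts the next page, or null when this was the last page. */
    static String nextFromId(List<String> returnedIds, int pageSize) {
        return returnedIds.size() > pageSize ? returnedIds.get(pageSize) : null;
    }
}
```

Because {{fromid}} is inclusive, the extra entity is shown as the first row of the next page rather than the last row of the current one.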
[jira] [Commented] (YARN-1670) aggregated log writer can write more log data then it says is the log length
[ https://issues.apache.org/jira/browse/YARN-1670?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13946404#comment-13946404 ]

Hudson commented on YARN-1670:
--

FAILURE: Integrated in Hadoop-Yarn-trunk #520 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/520/])
YARN-1670. aggregated log writer can write more log data then it says is the log length (Mit Desai via jeagles) (jeagles: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1580957)
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/logaggregation/AggregatedLogFormat.java

aggregated log writer can write more log data then it says is the log length

Key: YARN-1670
URL: https://issues.apache.org/jira/browse/YARN-1670
Project: Hadoop YARN
Issue Type: Bug
Affects Versions: 3.0.0, 0.23.10, 2.2.0
Reporter: Thomas Graves
Assignee: Mit Desai
Priority: Critical
Fix For: 3.0.0, 0.23.11, 2.4.0, 2.5.0
Attachments: YARN-1670-b23.patch, YARN-1670-v2-b23.patch, YARN-1670-v2.patch, YARN-1670-v3-b23.patch, YARN-1670-v3.patch, YARN-1670-v4-b23.patch, YARN-1670-v4-b23.patch, YARN-1670-v4.patch, YARN-1670-v4.patch, YARN-1670.patch, YARN-1670.patch

We have seen exceptions when using 'yarn logs' to read log files.
at java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
at java.lang.Long.parseLong(Long.java:441)
at java.lang.Long.parseLong(Long.java:483)
at org.apache.hadoop.yarn.logaggregation.AggregatedLogFormat$LogReader.readAContainerLogsForALogType(AggregatedLogFormat.java:518)
at org.apache.hadoop.yarn.logaggregation.LogDumper.dumpAContainerLogs(LogDumper.java:178)
at org.apache.hadoop.yarn.logaggregation.LogDumper.run(LogDumper.java:130)
at org.apache.hadoop.yarn.logaggregation.LogDumper.main(LogDumper.java:246)
We traced it down to the reader trying to read the file type of the next file while the position it reads from is still inside log data from the previous file. What happened was that the log length was written as a certain size but the log data was actually longer than that. Inside the write() routine in LogValue, it first writes the logfile length, but then when it writes the log itself it just reads to the end of the file. There is a race condition here: if someone is still writing to the file when it is aggregated, the length written could be too small.
We should have the write() routine stop once it has written the length it declared. It would be nice if we could somehow tell the user the log might be truncated, but I'm not sure of a good way to do this.
We also noticed a bug in readAContainerLogsForALogType, where it uses an int for curRead whereas it should use a long.
while (len != -1 && curRead < fileLength) {
This isn't actually a problem right now, as it looks like the underlying decoder is doing the right thing and the len condition exits.
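The fix proposed in the YARN-1670 description, having the writer stop once it has written the length it declared, can be sketched as a bounded stream copy. This is a minimal illustration and not the committed change to AggregatedLogFormat: the class {{BoundedCopy}} and the method {{copyAtMost}} are hypothetical names, and the byte counter is a long, in line with the remark about curRead.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

/** Hypothetical helper (not the YARN-1670 patch): copy no more bytes than were declared. */
final class BoundedCopy {
    private BoundedCopy() {}

    /**
     * Copies at most {@code declaredLength} bytes from {@code in} to {@code out},
     * even if the source keeps growing while it is being read.
     *
     * @return the number of bytes actually copied
     */
    static long copyAtMost(InputStream in, OutputStream out, long declaredLength) throws IOException {
        byte[] buf = new byte[8192];
        long written = 0L; // long, so logs larger than 2 GB do not overflow the counter
        while (written < declaredLength) {
            // Never ask for more than what is left of the declared length.
            int toRead = (int) Math.min(buf.length, declaredLength - written);
            int len = in.read(buf, 0, toRead);
            if (len == -1) {
                break; // source ended early; the caller can compare the result to declaredLength
            }
            out.write(buf, 0, len);
            written += len;
        }
        return written;
    }
}
```

A return value smaller than the declared length would signal the opposite problem (a file that shrank), which a caller could log separately.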
[jira] [Commented] (YARN-1852) Application recovery throws InvalidStateTransitonException for FAILED and KILLED jobs
[ https://issues.apache.org/jira/browse/YARN-1852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13946414#comment-13946414 ]

Hudson commented on YARN-1852:
--

FAILURE: Integrated in Hadoop-Yarn-trunk #520 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/520/])
YARN-1852. Fixed RMAppAttempt to not resend AttemptFailed/AttemptKilled events to already recovered Failed/Killed RMApps. Contributed by Rohith Sharmaks (jianhe: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1580997)
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/RMAppImpl.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/TestRMAppTransitions.java

Application recovery throws InvalidStateTransitonException for FAILED and KILLED jobs
-

Key: YARN-1852
URL: https://issues.apache.org/jira/browse/YARN-1852
Project: Hadoop YARN
Issue Type: Bug
Components: resourcemanager
Affects Versions: 2.3.0, 2.4.0
Reporter: Rohith
Assignee: Rohith
Fix For: 2.4.0
Attachments: YARN-1852.2.patch, YARN-1852.3.patch, YARN-1852.patch

Recovering a failed/killed application throws InvalidStateTransitonException. These exceptions are logged during recovery of applications.
[jira] [Commented] (YARN-1850) Make enabling timeline service configurable
[ https://issues.apache.org/jira/browse/YARN-1850?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13946412#comment-13946412 ]

Hudson commented on YARN-1850:
--

FAILURE: Integrated in Hadoop-Yarn-trunk #520 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/520/])
YARN-1850. Introduced the ability to optionally disable sending out timeline-events in the TimelineClient. Contributed by Zhijie Shen. (vinodkv: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1581189)
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/conf/YarnConfiguration.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/main/java/org/apache/hadoop/yarn/client/api/impl/TimelineClientImpl.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/api/impl/TestTimelineClient.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/resources/yarn-default.xml

Make enabling timeline service configurable

Key: YARN-1850
URL: https://issues.apache.org/jira/browse/YARN-1850
Project: Hadoop YARN
Issue Type: Sub-task
Reporter: Zhijie Shen
Assignee: Zhijie Shen
Fix For: 2.4.0
Attachments: YARN-1850.1.patch

As with the generic history service, we had better make enabling the timeline service configurable, in case the timeline server is not up.
[jira] [Created] (YARN-1871) We should eliminate writing *PBImpl code in YARN
Wangda Tan created YARN-1871:

Summary: We should eliminate writing *PBImpl code in YARN
Key: YARN-1871
URL: https://issues.apache.org/jira/browse/YARN-1871
Project: Hadoop YARN
Issue Type: Improvement
Components: api
Affects Versions: 2.4.0
Reporter: Wangda Tan
Assignee: Wangda Tan

Currently, we need to write PBImpl classes one by one. After running
find . -name "*PBImpl*.java" | xargs wc -l
under the hadoop source code directory, we can see there are more than 25,000 LOC. I think we should improve this, which will be very helpful for YARN developers who make changes to YARN protocols.
There are only a limited number of patterns in the current *PBImpl classes:
* Simple types, like string, int32, float.
* List<?> types
* Map<?> types
* Enum types
Code generation should be enough to generate such PBImpl classes.
Some other requirements are:
* Leave other related code alone, like service implementation (e.g. ContainerManagerImpl).
* (If possible) Forward compatibility: developers can write their own PBImpl classes or generate them.
[jira] [Commented] (YARN-1871) We should eliminate writing *PBImpl code in YARN
[ https://issues.apache.org/jira/browse/YARN-1871?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13946460#comment-13946460 ]

Wangda Tan commented on YARN-1871:
--

Some possible methods to eliminate writing PBImpl source code that I have in mind:

1. Use a Java annotation processor (RetentionPolicy=SOURCE); an example is the [google auto|https://github.com/google/auto] project. We can put an annotation on record classes, like
{code}
@GeneratePBImpl(protoclass = "org.apache.hadoop.yarn.proto.YarnProtos.ApplicationIdProto")
public abstract class ApplicationId {
  ...
}
{code}
Then we can implement a GeneratePBImpl annotation processor to generate PBImpl code at compile time.

2. Use a Protocol Buffers parser to parse .proto files directly and generate PBImpl code.
With a PB parser we can get the message descriptions, fields and types from the .proto file and generate code from them. Unfortunately, PB doesn't provide a Java-based parser, so we would need to write a C-based program using such parsers (see [issue-263|https://code.google.com/p/protobuf/issues/detail?id=263]).

3. Similar to the @AtMostOnce annotation, make the ser-de a runtime behavior.
With this method, we don't need to generate PBImpl source code or classes. We can create a RetentionPolicy=RUNTIME annotation and mark record classes with it, such as
{code}
@RecordClass(protoclass = "org.apache.hadoop.yarn.proto.YarnProtos.ApplicationIdProto")
public abstract class ApplicationId {
  ...
}
{code}
Then, when we need to serialize/deserialize a class, we check at runtime whether it is a "record class". If yes, we can simply use its getters/setters and the PB-generated class (*Proto) to do the serialization/deserialization.

Any other thoughts on this? Hope to get your ideas.

We should eliminate writing *PBImpl code in YARN

Key: YARN-1871
URL: https://issues.apache.org/jira/browse/YARN-1871
Project: Hadoop YARN
Issue Type: Improvement
Components: api
Affects Versions: 2.4.0
Reporter: Wangda Tan
Assignee: Wangda Tan

Currently, we need to write PBImpl classes one by one. After running
find . -name "*PBImpl*.java" | xargs wc -l
under the hadoop source code directory, we can see there are more than 25,000 LOC. I think we should improve this, which will be very helpful for YARN developers who make changes to YARN protocols.
There are only a limited number of patterns in the current *PBImpl classes:
* Simple types, like string, int32, float.
* List<?> types
* Map<?> types
* Enum types
Code generation should be enough to generate such PBImpl classes.
Some other requirements are:
* Leave other related code alone, like service implementation (e.g. ContainerManagerImpl).
* (If possible) Forward compatibility: developers can write their own PBImpl classes or generate them.
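Option 3 in the comment above relies on an annotation that is still visible at runtime. A minimal sketch of just the lookup step follows; the actual serialization is left out. Everything here is an assumption made for illustration: {{RecordClass}} is the name proposed in the comment rather than an existing Hadoop annotation, the {{ApplicationId}} class is a local stand-in and not the real YARN record, and {{RecordClasses.protoClassOf}} is a hypothetical helper.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;

/** Proposed marker (hypothetical): names the generated *Proto class for a record class. */
@Retention(RetentionPolicy.RUNTIME) // must be RUNTIME so reflection can see it
@Target(ElementType.TYPE)
@interface RecordClass {
    String protoclass();
}

/** Local stand-in for a record class; not the real org.apache.hadoop.yarn.api.records.ApplicationId. */
@RecordClass(protoclass = "org.apache.hadoop.yarn.proto.YarnProtos.ApplicationIdProto")
abstract class ApplicationId {
    abstract int getId();
}

/** Hypothetical helper that performs the "is it a record class?" check at runtime. */
final class RecordClasses {
    private RecordClasses() {}

    /** Returns the declared proto class name, or null if the type is not a record class. */
    static String protoClassOf(Class<?> type) {
        RecordClass marker = type.getAnnotation(RecordClass.class);
        return marker == null ? null : marker.protoclass();
    }
}
```

Since the sketch does not mark the annotation as {{@Inherited}}, a subclass of a record class would not be treated as a record class unless it is annotated itself; whether that is desirable is a design decision the comment leaves open.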
[jira] [Commented] (YARN-1854) TestRMHA#testStartAndTransitions Fails
[ https://issues.apache.org/jira/browse/YARN-1854?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13946487#comment-13946487 ]

Rohith commented on YARN-1854:
--

[~mitdesai], I checked the attached logs for a while. The behaviour in the attached logs is very strange :-(
Note that there is no removeNode event during processing of the RECONNECTED event! Is the NODE_ADDED event arriving before NODE_REMOVED? Possibly.
Could you run with the latest trunk code, since a retry of 5 sec has been added?

TestRMHA#testStartAndTransitions Fails
--

Key: YARN-1854
URL: https://issues.apache.org/jira/browse/YARN-1854
Project: Hadoop YARN
Issue Type: Test
Affects Versions: 2.4.0
Reporter: Mit Desai
Assignee: Rohith
Priority: Blocker
Fix For: 2.4.0
Attachments: Log.rtf, YARN-1854.1.patch, YARN-1854.patch

{noformat}
testStartAndTransitions(org.apache.hadoop.yarn.server.resourcemanager.TestRMHA)  Time elapsed: 5.883 sec  <<< FAILURE!
java.lang.AssertionError: Incorrect value for metric availableMB expected:<2048> but was:<4096>
	at org.junit.Assert.fail(Assert.java:93)
	at org.junit.Assert.failNotEquals(Assert.java:647)
	at org.junit.Assert.assertEquals(Assert.java:128)
	at org.junit.Assert.assertEquals(Assert.java:472)
	at org.apache.hadoop.yarn.server.resourcemanager.TestRMHA.assertMetric(TestRMHA.java:396)
	at org.apache.hadoop.yarn.server.resourcemanager.TestRMHA.verifyClusterMetrics(TestRMHA.java:387)
	at org.apache.hadoop.yarn.server.resourcemanager.TestRMHA.testStartAndTransitions(TestRMHA.java:160)

Results :

Failed tests:
  TestRMHA.testStartAndTransitions:160->verifyClusterMetrics:387->assertMetric:396 Incorrect value for metric availableMB expected:<2048> but was:<4096>
{noformat}
[jira] [Commented] (YARN-1850) Make enabling timeline service configurable
[ https://issues.apache.org/jira/browse/YARN-1850?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13946544#comment-13946544 ]

Hudson commented on YARN-1850:
--

FAILURE: Integrated in Hadoop-Mapreduce-trunk #1737 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1737/])
YARN-1850. Introduced the ability to optionally disable sending out timeline-events in the TimelineClient. Contributed by Zhijie Shen. (vinodkv: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1581189)
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/conf/YarnConfiguration.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/main/java/org/apache/hadoop/yarn/client/api/impl/TimelineClientImpl.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/api/impl/TestTimelineClient.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/resources/yarn-default.xml

Make enabling timeline service configurable

Key: YARN-1850
URL: https://issues.apache.org/jira/browse/YARN-1850
Project: Hadoop YARN
Issue Type: Sub-task
Reporter: Zhijie Shen
Assignee: Zhijie Shen
Fix For: 2.4.0
Attachments: YARN-1850.1.patch

As with the generic history service, we had better make enabling the timeline service configurable, in case the timeline server is not up.
[jira] [Commented] (YARN-1852) Application recovery throws InvalidStateTransitonException for FAILED and KILLED jobs
[ https://issues.apache.org/jira/browse/YARN-1852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13946546#comment-13946546 ]

Hudson commented on YARN-1852:
--

FAILURE: Integrated in Hadoop-Mapreduce-trunk #1737 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1737/])
YARN-1852. Fixed RMAppAttempt to not resend AttemptFailed/AttemptKilled events to already recovered Failed/Killed RMApps. Contributed by Rohith Sharmaks (jianhe: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1580997)
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/RMAppImpl.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/TestRMAppTransitions.java

Application recovery throws InvalidStateTransitonException for FAILED and KILLED jobs
-

Key: YARN-1852
URL: https://issues.apache.org/jira/browse/YARN-1852
Project: Hadoop YARN
Issue Type: Bug
Components: resourcemanager
Affects Versions: 2.3.0, 2.4.0
Reporter: Rohith
Assignee: Rohith
Fix For: 2.4.0
Attachments: YARN-1852.2.patch, YARN-1852.3.patch, YARN-1852.patch

Recovering a failed/killed application throws InvalidStateTransitonException. These exceptions are logged during recovery of applications.
[jira] [Commented] (YARN-1670) aggregated log writer can write more log data then it says is the log length
[ https://issues.apache.org/jira/browse/YARN-1670?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13946536#comment-13946536 ]

Hudson commented on YARN-1670:
--

FAILURE: Integrated in Hadoop-Mapreduce-trunk #1737 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1737/])
YARN-1670. aggregated log writer can write more log data then it says is the log length (Mit Desai via jeagles) (jeagles: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1580957)
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/logaggregation/AggregatedLogFormat.java

aggregated log writer can write more log data then it says is the log length

Key: YARN-1670
URL: https://issues.apache.org/jira/browse/YARN-1670
Project: Hadoop YARN
Issue Type: Bug
Affects Versions: 3.0.0, 0.23.10, 2.2.0
Reporter: Thomas Graves
Assignee: Mit Desai
Priority: Critical
Fix For: 3.0.0, 0.23.11, 2.4.0, 2.5.0
Attachments: YARN-1670-b23.patch, YARN-1670-v2-b23.patch, YARN-1670-v2.patch, YARN-1670-v3-b23.patch, YARN-1670-v3.patch, YARN-1670-v4-b23.patch, YARN-1670-v4-b23.patch, YARN-1670-v4.patch, YARN-1670-v4.patch, YARN-1670.patch, YARN-1670.patch

We have seen exceptions when using 'yarn logs' to read log files.
at java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
at java.lang.Long.parseLong(Long.java:441)
at java.lang.Long.parseLong(Long.java:483)
at org.apache.hadoop.yarn.logaggregation.AggregatedLogFormat$LogReader.readAContainerLogsForALogType(AggregatedLogFormat.java:518)
at org.apache.hadoop.yarn.logaggregation.LogDumper.dumpAContainerLogs(LogDumper.java:178)
at org.apache.hadoop.yarn.logaggregation.LogDumper.run(LogDumper.java:130)
at org.apache.hadoop.yarn.logaggregation.LogDumper.main(LogDumper.java:246)
We traced it down to the reader trying to read the file type of the next file while the position it reads from is still inside log data from the previous file. What happened was that the log length was written as a certain size but the log data was actually longer than that. Inside the write() routine in LogValue, it first writes the logfile length, but then when it writes the log itself it just reads to the end of the file. There is a race condition here: if someone is still writing to the file when it is aggregated, the length written could be too small.
We should have the write() routine stop once it has written the length it declared. It would be nice if we could somehow tell the user the log might be truncated, but I'm not sure of a good way to do this.
We also noticed a bug in readAContainerLogsForALogType, where it uses an int for curRead whereas it should use a long.
while (len != -1 && curRead < fileLength) {
This isn't actually a problem right now, as it looks like the underlying decoder is doing the right thing and the len condition exits.
[jira] [Commented] (YARN-1850) Make enabling timeline service configurable
[ https://issues.apache.org/jira/browse/YARN-1850?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13946572#comment-13946572 ]

Hudson commented on YARN-1850:
--

FAILURE: Integrated in Hadoop-Hdfs-trunk #1712 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/1712/])
YARN-1850. Introduced the ability to optionally disable sending out timeline-events in the TimelineClient. Contributed by Zhijie Shen. (vinodkv: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1581189)
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/conf/YarnConfiguration.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/main/java/org/apache/hadoop/yarn/client/api/impl/TimelineClientImpl.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/api/impl/TestTimelineClient.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/resources/yarn-default.xml

Make enabling timeline service configurable

Key: YARN-1850
URL: https://issues.apache.org/jira/browse/YARN-1850
Project: Hadoop YARN
Issue Type: Sub-task
Reporter: Zhijie Shen
Assignee: Zhijie Shen
Fix For: 2.4.0
Attachments: YARN-1850.1.patch

As with the generic history service, we had better make enabling the timeline service configurable, in case the timeline server is not up.
[jira] [Commented] (YARN-1852) Application recovery throws InvalidStateTransitonException for FAILED and KILLED jobs
[ https://issues.apache.org/jira/browse/YARN-1852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13946574#comment-13946574 ]

Hudson commented on YARN-1852:
--

FAILURE: Integrated in Hadoop-Hdfs-trunk #1712 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/1712/])
YARN-1852. Fixed RMAppAttempt to not resend AttemptFailed/AttemptKilled events to already recovered Failed/Killed RMApps. Contributed by Rohith Sharmaks (jianhe: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1580997)
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/RMAppImpl.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/TestRMAppTransitions.java

Application recovery throws InvalidStateTransitonException for FAILED and KILLED jobs
-

Key: YARN-1852
URL: https://issues.apache.org/jira/browse/YARN-1852
Project: Hadoop YARN
Issue Type: Bug
Components: resourcemanager
Affects Versions: 2.3.0, 2.4.0
Reporter: Rohith
Assignee: Rohith
Fix For: 2.4.0
Attachments: YARN-1852.2.patch, YARN-1852.3.patch, YARN-1852.patch

Recovering a failed/killed application throws InvalidStateTransitonException. These exceptions are logged during recovery of applications.
[jira] [Commented] (YARN-1670) aggregated log writer can write more log data then it says is the log length
[ https://issues.apache.org/jira/browse/YARN-1670?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13946564#comment-13946564 ]

Hudson commented on YARN-1670:
--

FAILURE: Integrated in Hadoop-Hdfs-trunk #1712 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/1712/])
YARN-1670. aggregated log writer can write more log data then it says is the log length (Mit Desai via jeagles) (jeagles: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1580957)
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/logaggregation/AggregatedLogFormat.java

aggregated log writer can write more log data then it says is the log length

Key: YARN-1670
URL: https://issues.apache.org/jira/browse/YARN-1670
Project: Hadoop YARN
Issue Type: Bug
Affects Versions: 3.0.0, 0.23.10, 2.2.0
Reporter: Thomas Graves
Assignee: Mit Desai
Priority: Critical
Fix For: 3.0.0, 0.23.11, 2.4.0, 2.5.0
Attachments: YARN-1670-b23.patch, YARN-1670-v2-b23.patch, YARN-1670-v2.patch, YARN-1670-v3-b23.patch, YARN-1670-v3.patch, YARN-1670-v4-b23.patch, YARN-1670-v4-b23.patch, YARN-1670-v4.patch, YARN-1670-v4.patch, YARN-1670.patch, YARN-1670.patch

We have seen exceptions when using 'yarn logs' to read log files.
at java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
at java.lang.Long.parseLong(Long.java:441)
at java.lang.Long.parseLong(Long.java:483)
at org.apache.hadoop.yarn.logaggregation.AggregatedLogFormat$LogReader.readAContainerLogsForALogType(AggregatedLogFormat.java:518)
at org.apache.hadoop.yarn.logaggregation.LogDumper.dumpAContainerLogs(LogDumper.java:178)
at org.apache.hadoop.yarn.logaggregation.LogDumper.run(LogDumper.java:130)
at org.apache.hadoop.yarn.logaggregation.LogDumper.main(LogDumper.java:246)
We traced it down to the reader trying to read the file type of the next file while the position it reads from is still inside log data from the previous file. What happened was that the log length was written as a certain size but the log data was actually longer than that. Inside the write() routine in LogValue, it first writes the logfile length, but then when it writes the log itself it just reads to the end of the file. There is a race condition here: if someone is still writing to the file when it is aggregated, the length written could be too small.
We should have the write() routine stop once it has written the length it declared. It would be nice if we could somehow tell the user the log might be truncated, but I'm not sure of a good way to do this.
We also noticed a bug in readAContainerLogsForALogType, where it uses an int for curRead whereas it should use a long.
while (len != -1 && curRead < fileLength) {
This isn't actually a problem right now, as it looks like the underlying decoder is doing the right thing and the len condition exits.
[jira] [Commented] (YARN-1838) Timeline service getEntities API should provide ability to get entities from given id
[ https://issues.apache.org/jira/browse/YARN-1838?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13946529#comment-13946529 ] Hudson commented on YARN-1838: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk #1737 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1737/]) YARN-1838. Enhanced timeline service getEntities API to get entities from a given entity ID or insertion timestamp. Contributed by Billie Rinaldi. (zjshen: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1580960) * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell/src/test/java/org/apache/hadoop/yarn/applications/distributedshell/TestDistributedShell.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/applicationhistoryservice/timeline/GenericObjectMapper.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/applicationhistoryservice/timeline/LeveldbTimelineStore.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/applicationhistoryservice/timeline/MemoryTimelineStore.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/applicationhistoryservice/timeline/TimelineReader.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/applicationhistoryservice/webapp/TimelineWebServices.java * 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/test/java/org/apache/hadoop/yarn/server/applicationhistoryservice/timeline/TestLeveldbTimelineStore.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/test/java/org/apache/hadoop/yarn/server/applicationhistoryservice/timeline/TestMemoryTimelineStore.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/test/java/org/apache/hadoop/yarn/server/applicationhistoryservice/timeline/TimelineStoreTestUtils.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/test/java/org/apache/hadoop/yarn/server/applicationhistoryservice/webapp/TestTimelineWebServices.java Timeline service getEntities API should provide ability to get entities from given id - Key: YARN-1838 URL: https://issues.apache.org/jira/browse/YARN-1838 Project: Hadoop YARN Issue Type: Sub-task Reporter: Srimanth Gunturi Assignee: Billie Rinaldi Fix For: 2.4.0 Attachments: YARN-1838.1.patch, YARN-1838.2.patch, YARN-1838.3.patch, YARN-1838.4.patch, YARN-1838.5.patch To support pagination, we need ability to get entities from a certain ID by providing a new param called {{fromid}}. For example on a page of 10 jobs, our first call will be like [http://server:8188/ws/v1/timeline/HIVE_QUERY_ID?fields=events,primaryfilters,otherinfolimit=11] When user hits next, we would like to call [http://server:8188/ws/v1/timeline/HIVE_QUERY_ID?fields=events,primaryfilters,otherinfofromid=JID11limit=11] and continue on for further _Next_ clicks On hitting back, we will make similar calls for previous items [http://server:8188/ws/v1/timeline/HIVE_QUERY_ID?fields=events,primaryfilters,otherinfofromid=JID1limit=11] {{fromid}} should be inclusive of the id given. 
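The paging scheme in the YARN-1838 description (ask for 11 entries to show a page of 10, then reuse the 11th id as the inclusive {{fromid}} of the next call) can be sketched roughly as below. This is only an illustration in plain Java, not the timeline service code: the class FromIdPagingSketch, its methods, and the in-memory list standing in for the store are all hypothetical, and the sketch assumes the server returns entities in a stable order.

```java
import java.util.ArrayList;
import java.util.List;

/** Sketch of fromid-based paging; every name in this class is illustrative. */
final class FromIdPagingSketch {

    /** One page of ids plus the inclusive start id of the next page (null if none). */
    static final class Page {
        final List<String> items;
        final String nextFromId;

        Page(List<String> items, String nextFromId) {
            this.items = items;
            this.nextFromId = nextFromId;
        }
    }

    /** Stand-in for the server: returns up to limit ids, starting at fromId inclusive. */
    static List<String> getEntities(List<String> store, String fromId, int limit) {
        int start = (fromId == null) ? 0 : store.indexOf(fromId);
        if (start < 0) {
            return new ArrayList<>();
        }
        int end = Math.min(store.size(), start + limit);
        return new ArrayList<>(store.subList(start, end));
    }

    /** Fetches pageSize + 1 ids; the extra one becomes the next call's fromid. */
    static Page fetchPage(List<String> store, String fromId, int pageSize) {
        List<String> fetched = getEntities(store, fromId, pageSize + 1);
        if (fetched.size() <= pageSize) {
            return new Page(fetched, null);
        }
        return new Page(new ArrayList<>(fetched.subList(0, pageSize)), fetched.get(pageSize));
    }

    /** Builds the query string from the description, with explicit '&' separators. */
    static String buildQuery(String fromId, int limit) {
        StringBuilder sb = new StringBuilder("?fields=events,primaryfilters,otherinfo");
        if (fromId != null) {
            sb.append("&fromid=").append(fromId);
        }
        return sb.append("&limit=").append(limit).toString();
    }

    public static void main(String[] args) {
        List<String> store = new ArrayList<>();
        for (int i = 1; i <= 25; i++) {
            store.add("JID" + i);
        }
        Page first = fetchPage(store, null, 10);
        System.out.println(first.items + " next=" + first.nextFromId);
        System.out.println(buildQuery(first.nextFromId, 11));
    }
}
```

Because {{fromid}} is inclusive, the extra entry is never lost: it is simply the first item of the following page, and a null nextFromId tells the UI that there is no further page.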
[jira] [Commented] (YARN-1838) Timeline service getEntities API should provide ability to get entities from given id
[ https://issues.apache.org/jira/browse/YARN-1838?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13946557#comment-13946557 ] Hudson commented on YARN-1838: -- FAILURE: Integrated in Hadoop-Hdfs-trunk #1712 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/1712/]) YARN-1838. Enhanced timeline service getEntities API to get entities from a given entity ID or insertion timestamp. Contributed by Billie Rinaldi. (zjshen: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1580960) * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell/src/test/java/org/apache/hadoop/yarn/applications/distributedshell/TestDistributedShell.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/applicationhistoryservice/timeline/GenericObjectMapper.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/applicationhistoryservice/timeline/LeveldbTimelineStore.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/applicationhistoryservice/timeline/MemoryTimelineStore.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/applicationhistoryservice/timeline/TimelineReader.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/applicationhistoryservice/webapp/TimelineWebServices.java * 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/test/java/org/apache/hadoop/yarn/server/applicationhistoryservice/timeline/TestLeveldbTimelineStore.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/test/java/org/apache/hadoop/yarn/server/applicationhistoryservice/timeline/TestMemoryTimelineStore.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/test/java/org/apache/hadoop/yarn/server/applicationhistoryservice/timeline/TimelineStoreTestUtils.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/test/java/org/apache/hadoop/yarn/server/applicationhistoryservice/webapp/TestTimelineWebServices.java Timeline service getEntities API should provide ability to get entities from given id - Key: YARN-1838 URL: https://issues.apache.org/jira/browse/YARN-1838 Project: Hadoop YARN Issue Type: Sub-task Reporter: Srimanth Gunturi Assignee: Billie Rinaldi Fix For: 2.4.0 Attachments: YARN-1838.1.patch, YARN-1838.2.patch, YARN-1838.3.patch, YARN-1838.4.patch, YARN-1838.5.patch To support pagination, we need ability to get entities from a certain ID by providing a new param called {{fromid}}. For example on a page of 10 jobs, our first call will be like [http://server:8188/ws/v1/timeline/HIVE_QUERY_ID?fields=events,primaryfilters,otherinfolimit=11] When user hits next, we would like to call [http://server:8188/ws/v1/timeline/HIVE_QUERY_ID?fields=events,primaryfilters,otherinfofromid=JID11limit=11] and continue on for further _Next_ clicks On hitting back, we will make similar calls for previous items [http://server:8188/ws/v1/timeline/HIVE_QUERY_ID?fields=events,primaryfilters,otherinfofromid=JID1limit=11] {{fromid}} should be inclusive of the id given. 
[jira] [Commented] (YARN-1612) Change Fair Scheduler to not disable delay scheduling by default
[ https://issues.apache.org/jira/browse/YARN-1612?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13946646#comment-13946646 ] Chen He commented on YARN-1612: --- Hi [~sandyr], would you mind clarifying this JIRA a little for me? I took a look at the FairScheduler source code and have two points: 1) The delay interval benefits data locality but also affects map task assignment if it is too long, so to enable delay scheduling by default we need a reasonable delay interval; 2) This reasonable delay interval can change depending on the size of the cluster. Change Fair Scheduler to not disable delay scheduling by default Key: YARN-1612 URL: https://issues.apache.org/jira/browse/YARN-1612 Project: Hadoop YARN Issue Type: Improvement Components: scheduler Reporter: Sandy Ryza Assignee: Chen He -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (YARN-1872) TestDistributedShell occasionally fails in trunk
Ted Yu created YARN-1872: Summary: TestDistributedShell occasionally fails in trunk Key: YARN-1872 URL: https://issues.apache.org/jira/browse/YARN-1872 Project: Hadoop YARN Issue Type: Test Reporter: Ted Yu From https://builds.apache.org/job/Hadoop-Yarn-trunk/520/console : TestDistributedShell#testDSShellWithCustomLogPropertyFile failed and TestDistributedShell#testDSShell timed out. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-1872) TestDistributedShell occasionally fails in trunk
[ https://issues.apache.org/jira/browse/YARN-1872?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ted Yu updated YARN-1872: - Attachment: TestDistributedShell.out Output from console. TestDistributedShell occasionally fails in trunk Key: YARN-1872 URL: https://issues.apache.org/jira/browse/YARN-1872 Project: Hadoop YARN Issue Type: Test Reporter: Ted Yu Attachments: TestDistributedShell.out From https://builds.apache.org/job/Hadoop-Yarn-trunk/520/console : TestDistributedShell#testDSShellWithCustomLogPropertyFile failed and TestDistributedShell#testDSShell timed out. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1809) Synchronize RM and Generic History Service Web-UIs
[ https://issues.apache.org/jira/browse/YARN-1809?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13946731#comment-13946731 ] Mayank Bansal commented on YARN-1809: - Thanks [~zjshen] for the patch. bq. Yes, I think it should, but I prefer to put it in ApplicationBaseProtocol when ApplicationHistoryClientService has implemented DT related methods. The history protocol already has these methods, so we don't need to wait; they have a dummy implementation for that. bq. ApplicationBaseProtocol and ApplicationContext are completely different things. ApplicationBaseProtocol is the RPC interface. Previously, I thought we should have a uniform ApplicationContext: on the RM side, it wraps RMContext; while on the AHS side, it wraps ApplicationHistory. However, inspired by RMWebServices#getApps, I think the RPC interface is a better place to unify the way of retrieving app info, so I created ApplicationBaseProtocol. And ApplicationContext is no longer useful. ApplicationBaseProtocol would be the base protocol of the client and history protocols; however, the application context is something different. The motivation for the context is to wrap RM and AHS application data, so I think having the context makes sense, as the protocol has a totally different motivation and different methods as well once we add the delegation methods to it. bq. I understand the big patch is desperate for review, but I've to do that because the patch is aiming to refactor the code to avoid duplicate web-UI code for RM and for AHS. The two webUI should share the common code path, and then display similarly. I am fine with this if this is something you want to do. {code} + * <p> + * The protocol between clients and the <code>ResourceManager</code> or + * <code>ApplicationHistoryServer</code> to get information on applications, + * application attempts and containers. + * </p> {code} This should say that it is a base protocol for the application client and history protocols. Shouldn't we add @Idempotent to getallapplications as well? 
If we add the application context back, then we need to rebase the patch accordingly. Synchronize RM and Generic History Service Web-UIs -- Key: YARN-1809 URL: https://issues.apache.org/jira/browse/YARN-1809 Project: Hadoop YARN Issue Type: Bug Reporter: Zhijie Shen Assignee: Zhijie Shen Attachments: YARN-1809.1.patch, YARN-1809.2.patch, YARN-1809.3.patch, YARN-1809.4.patch, YARN-1809.5.patch, YARN-1809.5.patch, YARN-1809.6.patch After YARN-953, the web-UI of the generic history service provides more information than that of the RM, namely the details about app attempts and containers. It's good to provide similar web-UIs, but retrieve the data from separate sources, i.e., the RM cache and the history store respectively. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1873) TestDistributedShell#testDSShell fails
[ https://issues.apache.org/jira/browse/YARN-1873?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13946896#comment-13946896 ] Ted Yu commented on YARN-1873: -- Dup of YARN-1872 ? TestDistributedShell#testDSShell fails -- Key: YARN-1873 URL: https://issues.apache.org/jira/browse/YARN-1873 Project: Hadoop YARN Issue Type: Bug Affects Versions: 3.0.0 Reporter: Mit Desai Assignee: Mit Desai Labels: java7 testDSShell fails when the tests are run in random order. I see a cleanup issue here. {noformat} Tests run: 1, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 72.222 sec FAILURE! - in org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell testOrder(org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell) Time elapsed: 44.127 sec FAILURE! java.lang.AssertionError: expected:1 but was:6 at org.junit.Assert.fail(Assert.java:93) at org.junit.Assert.failNotEquals(Assert.java:647) at org.junit.Assert.assertEquals(Assert.java:128) at org.junit.Assert.assertEquals(Assert.java:472) at org.junit.Assert.assertEquals(Assert.java:456) at org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell.testDSShell(TestDistributedShell.java:204) at org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell.testOrder(TestDistributedShell.java:134) Results : Failed tests: TestDistributedShell.testOrder:134-testDSShell:204 expected:1 but was:6 {noformat} The Line numbers will be little deviated because I was trying to reproduce the error by running the tests in specific order. But the Line that causes the assert fail is {{Assert.assertEquals(1, entitiesAttempts.getEntities().size());}} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1873) TestDistributedShell#testDSShell fails
[ https://issues.apache.org/jira/browse/YARN-1873?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13946939#comment-13946939 ] Mit Desai commented on YARN-1873: - I see it is a Timeout issue in YARN-1872. The error reported here is an assertion failure. I think this is a different issue TestDistributedShell#testDSShell fails -- Key: YARN-1873 URL: https://issues.apache.org/jira/browse/YARN-1873 Project: Hadoop YARN Issue Type: Bug Affects Versions: 3.0.0 Reporter: Mit Desai Assignee: Mit Desai Labels: java7 testDSShell fails when the tests are run in random order. I see a cleanup issue here. {noformat} Tests run: 1, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 72.222 sec FAILURE! - in org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell testOrder(org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell) Time elapsed: 44.127 sec FAILURE! java.lang.AssertionError: expected:1 but was:6 at org.junit.Assert.fail(Assert.java:93) at org.junit.Assert.failNotEquals(Assert.java:647) at org.junit.Assert.assertEquals(Assert.java:128) at org.junit.Assert.assertEquals(Assert.java:472) at org.junit.Assert.assertEquals(Assert.java:456) at org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell.testDSShell(TestDistributedShell.java:204) at org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell.testOrder(TestDistributedShell.java:134) Results : Failed tests: TestDistributedShell.testOrder:134-testDSShell:204 expected:1 but was:6 {noformat} The Line numbers will be little deviated because I was trying to reproduce the error by running the tests in specific order. But the Line that causes the assert fail is {{Assert.assertEquals(1, entitiesAttempts.getEntities().size());}} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-1873) TestDistributedShell#testDSShell fails
[ https://issues.apache.org/jira/browse/YARN-1873?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mit Desai updated YARN-1873: Affects Version/s: 2.4.0 TestDistributedShell#testDSShell fails -- Key: YARN-1873 URL: https://issues.apache.org/jira/browse/YARN-1873 Project: Hadoop YARN Issue Type: Bug Affects Versions: 3.0.0, 2.4.0 Reporter: Mit Desai Assignee: Mit Desai Labels: java7 testDSShell fails when the tests are run in random order. I see a cleanup issue here. {noformat} Tests run: 1, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 72.222 sec FAILURE! - in org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell testOrder(org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell) Time elapsed: 44.127 sec FAILURE! java.lang.AssertionError: expected:1 but was:6 at org.junit.Assert.fail(Assert.java:93) at org.junit.Assert.failNotEquals(Assert.java:647) at org.junit.Assert.assertEquals(Assert.java:128) at org.junit.Assert.assertEquals(Assert.java:472) at org.junit.Assert.assertEquals(Assert.java:456) at org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell.testDSShell(TestDistributedShell.java:204) at org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell.testOrder(TestDistributedShell.java:134) Results : Failed tests: TestDistributedShell.testOrder:134-testDSShell:204 expected:1 but was:6 {noformat} The Line numbers will be little deviated because I was trying to reproduce the error by running the tests in specific order. But the Line that causes the assert fail is {{Assert.assertEquals(1, entitiesAttempts.getEntities().size());}} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1873) TestDistributedShell#testDSShell fails
[ https://issues.apache.org/jira/browse/YARN-1873?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13946971#comment-13946971 ] Mit Desai commented on YARN-1873: - Yes, it fails on branch-2.4. I updated the JIRA to reflect that. I am using JDK7 and have not tried with JDK6. But this is clearly a cleanup issue, so I assumed it is a JDK7 issue. If I run testDSShell independently, it never fails. TestDistributedShell#testDSShell fails -- Key: YARN-1873 URL: https://issues.apache.org/jira/browse/YARN-1873 Project: Hadoop YARN Issue Type: Bug Affects Versions: 3.0.0, 2.4.0 Reporter: Mit Desai Assignee: Mit Desai Labels: java7 testDSShell fails when the tests are run in random order. I see a cleanup issue here. {noformat} Tests run: 1, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 72.222 sec <<< FAILURE! - in org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell testOrder(org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell) Time elapsed: 44.127 sec <<< FAILURE! java.lang.AssertionError: expected:<1> but was:<6> at org.junit.Assert.fail(Assert.java:93) at org.junit.Assert.failNotEquals(Assert.java:647) at org.junit.Assert.assertEquals(Assert.java:128) at org.junit.Assert.assertEquals(Assert.java:472) at org.junit.Assert.assertEquals(Assert.java:456) at org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell.testDSShell(TestDistributedShell.java:204) at org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell.testOrder(TestDistributedShell.java:134) Results : Failed tests: TestDistributedShell.testOrder:134->testDSShell:204 expected:<1> but was:<6> {noformat} The line numbers deviate a little because I was trying to reproduce the error by running the tests in a specific order. But the line that causes the assertion failure is {{Assert.assertEquals(1, entitiesAttempts.getEntities().size());}} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1866) YARN RM fails to load state store with delegation token parsing error
[ https://issues.apache.org/jira/browse/YARN-1866?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13946994#comment-13946994 ] Hudson commented on YARN-1866: -- SUCCESS: Integrated in Hadoop-trunk-Commit #5400 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/5400/]) YARN-1866. Fixed an issue with renewal of RM-delegation tokens on restart or fail-over. Contributed by Jian He. (vinodkv: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1581448) * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/security/DelegationTokenRenewer.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestRMRestart.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/security/TestDelegationTokenRenewer.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/security/TestDelegationTokenRenewerLifecycle.java YARN RM fails to load state store with delegation token parsing error - Key: YARN-1866 URL: https://issues.apache.org/jira/browse/YARN-1866 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.4.0 Reporter: Arpit Gupta Assignee: Jian He Priority: Blocker Fix For: 2.4.0 Attachments: YARN-1866.1.patch, YARN-1866.2.patch In our secure nightlies we saw exceptions in the RM log where it failed to parse the delegation token. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-1521) Mark appropriate protocol methods with the idempotent annotation or AtMostOnce annotation
[ https://issues.apache.org/jira/browse/YARN-1521?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuan Gong updated YARN-1521: Attachment: YARN-1521.2.patch Mark appropriate protocol methods with the idempotent annotation or AtMostOnce annotation - Key: YARN-1521 URL: https://issues.apache.org/jira/browse/YARN-1521 Project: Hadoop YARN Issue Type: Sub-task Reporter: Xuan Gong Assignee: Xuan Gong Attachments: YARN-1521.0.patch, YARN-1521.1.patch, YARN-1521.2.patch After YARN-1028, we add the automatically failover into RMProxy. This JIRA is to identify whether we need to add idempotent annotation and which methods can be marked as idempotent. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1873) TestDistributedShell#testDSShell fails
[ https://issues.apache.org/jira/browse/YARN-1873?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13947056#comment-13947056 ] Zhijie Shen commented on YARN-1873: --- I'll work on a patch for it. Take it over. Thanks for finding the issue. [~mdesai]! TestDistributedShell#testDSShell fails -- Key: YARN-1873 URL: https://issues.apache.org/jira/browse/YARN-1873 Project: Hadoop YARN Issue Type: Bug Affects Versions: 3.0.0, 2.4.0 Reporter: Mit Desai Assignee: Mit Desai Labels: java7 testDSShell fails when the tests are run in random order. I see a cleanup issue here. {noformat} Tests run: 1, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 72.222 sec FAILURE! - in org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell testOrder(org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell) Time elapsed: 44.127 sec FAILURE! java.lang.AssertionError: expected:1 but was:6 at org.junit.Assert.fail(Assert.java:93) at org.junit.Assert.failNotEquals(Assert.java:647) at org.junit.Assert.assertEquals(Assert.java:128) at org.junit.Assert.assertEquals(Assert.java:472) at org.junit.Assert.assertEquals(Assert.java:456) at org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell.testDSShell(TestDistributedShell.java:204) at org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell.testOrder(TestDistributedShell.java:134) Results : Failed tests: TestDistributedShell.testOrder:134-testDSShell:204 expected:1 but was:6 {noformat} The Line numbers will be little deviated because I was trying to reproduce the error by running the tests in specific order. But the Line that causes the assert fail is {{Assert.assertEquals(1, entitiesAttempts.getEntities().size());}} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1873) TestDistributedShell#testDSShell fails
[ https://issues.apache.org/jira/browse/YARN-1873?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13947054#comment-13947054 ] Zhijie Shen commented on YARN-1873: --- Hm, I can imagine the problem. MiniYARNCluster is static, so the timeline store is going to hold the entities from all test cases in TestDistributedShell. I cannot reproduce it locally, because testDSShell is always executed first. {code} protected static MiniYARNCluster yarnCluster = null; {code} TestDistributedShell#testDSShell fails -- Key: YARN-1873 URL: https://issues.apache.org/jira/browse/YARN-1873 Project: Hadoop YARN Issue Type: Bug Affects Versions: 3.0.0, 2.4.0 Reporter: Mit Desai Assignee: Mit Desai Labels: java7 testDSShell fails when the tests are run in random order. I see a cleanup issue here. {noformat} Tests run: 1, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 72.222 sec <<< FAILURE! - in org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell testOrder(org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell) Time elapsed: 44.127 sec <<< FAILURE! java.lang.AssertionError: expected:<1> but was:<6> at org.junit.Assert.fail(Assert.java:93) at org.junit.Assert.failNotEquals(Assert.java:647) at org.junit.Assert.assertEquals(Assert.java:128) at org.junit.Assert.assertEquals(Assert.java:472) at org.junit.Assert.assertEquals(Assert.java:456) at org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell.testDSShell(TestDistributedShell.java:204) at org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell.testOrder(TestDistributedShell.java:134) Results : Failed tests: TestDistributedShell.testOrder:134->testDSShell:204 expected:<1> but was:<6> {noformat} The line numbers deviate a little because I was trying to reproduce the error by running the tests in a specific order. 
But the Line that causes the assert fail is {{Assert.assertEquals(1, entitiesAttempts.getEntities().size());}} -- This message was sent by Atlassian JIRA (v6.2#6252)
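The order dependence described in the comment above (a static cluster whose timeline store accumulates entities from every test case) can be reproduced in miniature. The sketch below is plain Java with hypothetical names (SharedStoreOrderSketch, sharedStore); it does not use MiniYARNCluster or JUnit, and the cleanup shown is only one possible remedy, not necessarily what the attached patch does.

```java
import java.util.ArrayList;
import java.util.List;

/** Miniature of a static fixture leaking state between tests; names are illustrative. */
final class SharedStoreOrderSketch {

    /** Plays the role of the static cluster's timeline store: shared by every "test". */
    static final List<String> sharedStore = new ArrayList<>();

    /** A "test" that publishes one attempt entity and returns how many entities it sees. */
    static int runTest(String name) {
        sharedStore.add(name + "-attempt");
        return sharedStore.size();
    }

    /** The same test, but with per-test cleanup of the shared store first. */
    static int runIsolatedTest(String name) {
        sharedStore.clear();
        return runTest(name);
    }

    public static void main(String[] args) {
        // Another test runs first, so the second one no longer sees exactly one entity.
        runTest("testDSShellWithCustomLogPropertyFile");
        System.out.println("without cleanup: " + runTest("testDSShell"));
        System.out.println("with cleanup: " + runIsolatedTest("testDSShell"));
    }
}
```

An assertion such as assertEquals(1, ...) only holds in the first variant when the test happens to run first, which matches the observation that the failure does not show up when testDSShell is executed first or on its own.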
[jira] [Assigned] (YARN-1873) TestDistributedShell#testDSShell fails
[ https://issues.apache.org/jira/browse/YARN-1873?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhijie Shen reassigned YARN-1873: - Assignee: Zhijie Shen (was: Mit Desai) TestDistributedShell#testDSShell fails -- Key: YARN-1873 URL: https://issues.apache.org/jira/browse/YARN-1873 Project: Hadoop YARN Issue Type: Bug Affects Versions: 3.0.0, 2.4.0 Reporter: Mit Desai Assignee: Zhijie Shen Labels: java7 testDSShell fails when the tests are run in random order. I see a cleanup issue here. {noformat} Tests run: 1, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 72.222 sec FAILURE! - in org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell testOrder(org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell) Time elapsed: 44.127 sec FAILURE! java.lang.AssertionError: expected:1 but was:6 at org.junit.Assert.fail(Assert.java:93) at org.junit.Assert.failNotEquals(Assert.java:647) at org.junit.Assert.assertEquals(Assert.java:128) at org.junit.Assert.assertEquals(Assert.java:472) at org.junit.Assert.assertEquals(Assert.java:456) at org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell.testDSShell(TestDistributedShell.java:204) at org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell.testOrder(TestDistributedShell.java:134) Results : Failed tests: TestDistributedShell.testOrder:134-testDSShell:204 expected:1 but was:6 {noformat} The Line numbers will be little deviated because I was trying to reproduce the error by running the tests in specific order. But the Line that causes the assert fail is {{Assert.assertEquals(1, entitiesAttempts.getEntities().size());}} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1521) Mark appropriate protocol methods with the idempotent annotation or AtMostOnce annotation
[ https://issues.apache.org/jira/browse/YARN-1521?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13947059#comment-13947059 ] Xuan Gong commented on YARN-1521: - bq. 1. If the application is already in the RM cache, we shouldn't log success. Otherwise, there may be multiple logs for one submission. Removed. bq. Should it sleep a while before retry? DONE. bq. Again, sleep 1ms before next try, yielding to the thread of API methods' invoking. And, have max retry when exception? DONE. Did not add a max retry; the failover thread is killed after every test case. Mark appropriate protocol methods with the idempotent annotation or AtMostOnce annotation - Key: YARN-1521 URL: https://issues.apache.org/jira/browse/YARN-1521 Project: Hadoop YARN Issue Type: Sub-task Reporter: Xuan Gong Assignee: Xuan Gong Attachments: YARN-1521.0.patch, YARN-1521.1.patch, YARN-1521.2.patch After YARN-1028, we added automatic failover to RMProxy. This JIRA is to identify whether we need to add the idempotent annotation and which methods can be marked as idempotent. -- This message was sent by Atlassian JIRA (v6.2#6252)
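The two review points answered above (sleep between attempts, and optionally cap the number of attempts) amount to a small retry loop. The following is a generic sketch in plain Java, not the code from the YARN-1521 patch; RetrySketch, retry, maxAttempts and sleepMillis are made-up names, and the cap is included only to show what a max retry would look like, since the comment says the patch leaves it out.

```java
import java.util.concurrent.Callable;

/** Generic bounded retry with a pause between attempts; all names are illustrative. */
final class RetrySketch {

    /**
     * Calls action until it succeeds or maxAttempts is reached, sleeping
     * sleepMillis between failed attempts so other threads get a chance to run.
     */
    static <T> T retry(Callable<T> action, int maxAttempts, long sleepMillis) throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        Exception last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return action.call();
            } catch (InterruptedException e) {
                // Do not swallow interruption: restore the flag and stop retrying.
                Thread.currentThread().interrupt();
                throw e;
            } catch (Exception e) {
                last = e;
                if (attempt < maxAttempts) {
                    Thread.sleep(sleepMillis);
                }
            }
        }
        throw last;
    }

    public static void main(String[] args) throws Exception {
        final int[] calls = {0};
        String result = retry(() -> {
            if (++calls[0] < 3) {
                throw new IllegalStateException("not ready yet");
            }
            return "ok";
        }, 5, 1L);
        System.out.println(result + " after " + calls[0] + " calls");
    }
}
```

In a test helper whose thread is torn down after each test case, an unbounded loop may be acceptable, as the comment argues; the bounded form is the safer default elsewhere because a persistent failure surfaces as an exception instead of a spin.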
[jira] [Updated] (YARN-1873) TestDistributedShell#testDSShell fails
[ https://issues.apache.org/jira/browse/YARN-1873?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mit Desai updated YARN-1873: Attachment: YARN-1873.patch Attaching the patch TestDistributedShell#testDSShell fails -- Key: YARN-1873 URL: https://issues.apache.org/jira/browse/YARN-1873 Project: Hadoop YARN Issue Type: Bug Affects Versions: 3.0.0, 2.4.0 Reporter: Mit Desai Assignee: Zhijie Shen Attachments: YARN-1873.patch testDSShell fails when the tests are run in random order. I see a cleanup issue here. {noformat} Tests run: 1, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 72.222 sec FAILURE! - in org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell testOrder(org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell) Time elapsed: 44.127 sec FAILURE! java.lang.AssertionError: expected:1 but was:6 at org.junit.Assert.fail(Assert.java:93) at org.junit.Assert.failNotEquals(Assert.java:647) at org.junit.Assert.assertEquals(Assert.java:128) at org.junit.Assert.assertEquals(Assert.java:472) at org.junit.Assert.assertEquals(Assert.java:456) at org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell.testDSShell(TestDistributedShell.java:204) at org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell.testOrder(TestDistributedShell.java:134) Results : Failed tests: TestDistributedShell.testOrder:134-testDSShell:204 expected:1 but was:6 {noformat} The Line numbers will be little deviated because I was trying to reproduce the error by running the tests in specific order. But the Line that causes the assert fail is {{Assert.assertEquals(1, entitiesAttempts.getEntities().size());}} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1873) TestDistributedShell#testDSShell fails
[ https://issues.apache.org/jira/browse/YARN-1873?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13947066#comment-13947066 ] Mit Desai commented on YARN-1873: - [~zjshen], Didn't see your comment. Attached the patch already TestDistributedShell#testDSShell fails -- Key: YARN-1873 URL: https://issues.apache.org/jira/browse/YARN-1873 Project: Hadoop YARN Issue Type: Bug Affects Versions: 3.0.0, 2.4.0 Reporter: Mit Desai Assignee: Zhijie Shen Attachments: YARN-1873.patch testDSShell fails when the tests are run in random order. I see a cleanup issue here. {noformat} Tests run: 1, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 72.222 sec FAILURE! - in org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell testOrder(org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell) Time elapsed: 44.127 sec FAILURE! java.lang.AssertionError: expected:1 but was:6 at org.junit.Assert.fail(Assert.java:93) at org.junit.Assert.failNotEquals(Assert.java:647) at org.junit.Assert.assertEquals(Assert.java:128) at org.junit.Assert.assertEquals(Assert.java:472) at org.junit.Assert.assertEquals(Assert.java:456) at org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell.testDSShell(TestDistributedShell.java:204) at org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell.testOrder(TestDistributedShell.java:134) Results : Failed tests: TestDistributedShell.testOrder:134-testDSShell:204 expected:1 but was:6 {noformat} The Line numbers will be little deviated because I was trying to reproduce the error by running the tests in specific order. But the Line that causes the assert fail is {{Assert.assertEquals(1, entitiesAttempts.getEntities().size());}} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-1867) NPE while fetching apps via the REST API
[ https://issues.apache.org/jira/browse/YARN-1867?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinod Kumar Vavilapalli updated YARN-1867: -- Attachment: YARN-1867-20140325.txt The problem is that the web-services cache the acls-managers from the 'previous' RM. The acls-managers are recreated when a transition happens. Here's a patch to fix the issue - Changed web-services to not cache the application and queue acls-managers. I checked other instances in the web-app. These seem like the only two cached objects. - The code in the main ResourceManager has become unmaintainable after the introduction of the active-services. I had to resist cleaning up; quite a few things are broken in more ways than one. For now, moved a couple of things from the top level to be nested inside active-services. Will file a ticket for more cleanup. - Fixed a few existing formatting issues - The test case fails without the code change with the same exception printed above and passes with it. NPE while fetching apps via the REST API Key: YARN-1867 URL: https://issues.apache.org/jira/browse/YARN-1867 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.4.0 Reporter: Karthik Kambatla Assignee: Karthik Kambatla Priority: Blocker Labels: rest_api Attachments: YARN-1867-20140325.txt We ran into the following NPE when fetching applications using the REST API: {noformat} INTERNAL_SERVER_ERROR java.lang.NullPointerException at org.apache.hadoop.yarn.server.security.ApplicationACLsManager.checkAccess(ApplicationACLsManager.java:104) at org.apache.hadoop.yarn.server.resourcemanager.webapp.RMWebServices.hasAccess(RMWebServices.java:123) at org.apache.hadoop.yarn.server.resourcemanager.webapp.RMWebServices.getApps(RMWebServices.java:418) {noformat} -- This message was sent by Atlassian JIRA (v6.2#6252)
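Editorial sketch of the stale-reference problem described in the YARN-1867 update above, assuming nothing about the real patch. All class names here (`SketchAclsManager`, `SketchActiveServices`, `CachingWebService`, `LookupWebService`) are hypothetical stand-ins, not YARN classes. The idea is that a component which captures a manager once at construction keeps talking to the 'previous' instance after a transition recreates it, while a component that looks the manager up on each call does not.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for an ACLs manager; not the real ApplicationACLsManager.
final class SketchAclsManager {
    private final Map<String, String> ownerByApp = new HashMap<>();

    void addApplication(String appId, String owner) {
        ownerByApp.put(appId, owner);
    }

    boolean checkAccess(String user, String appId) {
        // Throws NullPointerException if this instance never saw appId.
        return ownerByApp.get(appId).equals(user);
    }
}

// Hypothetical stand-in for services that are recreated on every transition.
final class SketchActiveServices {
    private SketchAclsManager aclsManager = new SketchAclsManager();

    void transition() {
        aclsManager = new SketchAclsManager(); // the old instance is discarded
    }

    SketchAclsManager getAclsManager() {
        return aclsManager;
    }
}

// Problematic pattern: the manager is captured once and never refreshed.
final class CachingWebService {
    private final SketchAclsManager cached;

    CachingWebService(SketchActiveServices services) {
        this.cached = services.getAclsManager();
    }

    boolean hasAccess(String user, String appId) {
        return cached.checkAccess(user, appId);
    }
}

// Alternative pattern: the current manager is looked up on every call.
final class LookupWebService {
    private final SketchActiveServices services;

    LookupWebService(SketchActiveServices services) {
        this.services = services;
    }

    boolean hasAccess(String user, String appId) {
        return services.getAclsManager().checkAccess(user, appId);
    }
}
```

After `transition()`, an application registered with the new manager is visible through `LookupWebService`, whereas `CachingWebService` still consults the old instance and, in this sketch, fails with a NullPointerException. The sketch is single-threaded; a real service would also need to consider visibility across threads.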
[jira] [Commented] (YARN-1867) NPE while fetching apps via the REST API
[ https://issues.apache.org/jira/browse/YARN-1867?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13947079#comment-13947079 ] Karthik Kambatla commented on YARN-1867: Thanks Vinod. Looking at the patch. NPE while fetching apps via the REST API Key: YARN-1867 URL: https://issues.apache.org/jira/browse/YARN-1867 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.4.0 Reporter: Karthik Kambatla Assignee: Vinod Kumar Vavilapalli Priority: Blocker Labels: rest_api Attachments: YARN-1867-20140325.txt We ran into the following NPE when fetching applications using the REST API: {noformat} INTERNAL_SERVER_ERROR java.lang.NullPointerException at org.apache.hadoop.yarn.server.security.ApplicationACLsManager.checkAccess(ApplicationACLsManager.java:104) at org.apache.hadoop.yarn.server.resourcemanager.webapp.RMWebServices.hasAccess(RMWebServices.java:123) at org.apache.hadoop.yarn.server.resourcemanager.webapp.RMWebServices.getApps(RMWebServices.java:418) {noformat} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1867) NPE while fetching apps via the REST API
[ https://issues.apache.org/jira/browse/YARN-1867?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13947087#comment-13947087 ] Hadoop QA commented on YARN-1867: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12636769/YARN-1867-20140325.txt against trunk revision . {color:red}-1 patch{color}. The patch command could not apply the patch. Console output: https://builds.apache.org/job/PreCommit-YARN-Build/3454//console This message is automatically generated. NPE while fetching apps via the REST API Key: YARN-1867 URL: https://issues.apache.org/jira/browse/YARN-1867 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.4.0 Reporter: Karthik Kambatla Assignee: Vinod Kumar Vavilapalli Priority: Blocker Labels: rest_api Attachments: YARN-1867-20140325.txt We ran into the following NPE when fetching applications using the REST API: {noformat} INTERNAL_SERVER_ERROR java.lang.NullPointerException at org.apache.hadoop.yarn.server.security.ApplicationACLsManager.checkAccess(ApplicationACLsManager.java:104) at org.apache.hadoop.yarn.server.resourcemanager.webapp.RMWebServices.hasAccess(RMWebServices.java:123) at org.apache.hadoop.yarn.server.resourcemanager.webapp.RMWebServices.getApps(RMWebServices.java:418) {noformat} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (YARN-1874) Cleanup: Move RMActiveServices out of ResourceManager into its own file
Karthik Kambatla created YARN-1874: -- Summary: Cleanup: Move RMActiveServices out of ResourceManager into its own file Key: YARN-1874 URL: https://issues.apache.org/jira/browse/YARN-1874 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Reporter: Karthik Kambatla As [~vinodkv] noticed on YARN-1867, ResourceManager is hard to maintain. We should move RMActiveServices out to make it more manageable. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1867) NPE while fetching apps via the REST API
[ https://issues.apache.org/jira/browse/YARN-1867?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13947099#comment-13947099 ] Karthik Kambatla commented on YARN-1867: Good catch. The fix looks good to me. bq. The code in the main ResourceManager has become unmaintainable after the introduction of the active-services. Agree. I have been thinking of it too. I think we should just move RMActiveServices into its own file - that would force us to clean up the unwieldy mess it has become. Filed YARN-1874 to do the same. Feel free to pick it up. I can take a stab, maybe in a week or two. That said, it would be nice to address all the cleanup changes there, particularly if they are not related to the bug we are fixing here. {code} +private DelegationTokenRenewer delegationTokenRenewer; +private EventHandler<SchedulerEvent> schedulerDispatcher; +private ApplicationMasterLauncher applicationMasterLauncher; +private ContainerAllocationExpirer containerAllocationExpirer; + +private boolean recoveryEnabled; {code} Also, we should probably limit the formatting changes to the files that have non-formatting changes. Maybe leave out RMContextImpl?
NPE while fetching apps via the REST API Key: YARN-1867 URL: https://issues.apache.org/jira/browse/YARN-1867 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.4.0 Reporter: Karthik Kambatla Assignee: Vinod Kumar Vavilapalli Priority: Blocker Labels: rest_api Attachments: YARN-1867-20140325.txt We ran into the following NPE when fetching applications using the REST API: {noformat} INTERNAL_SERVER_ERROR java.lang.NullPointerException at org.apache.hadoop.yarn.server.security.ApplicationACLsManager.checkAccess(ApplicationACLsManager.java:104) at org.apache.hadoop.yarn.server.resourcemanager.webapp.RMWebServices.hasAccess(RMWebServices.java:123) at org.apache.hadoop.yarn.server.resourcemanager.webapp.RMWebServices.getApps(RMWebServices.java:418) {noformat} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1854) TestRMHA#testStartAndTransitions Fails
[ https://issues.apache.org/jira/browse/YARN-1854?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13947114#comment-13947114 ] Mit Desai commented on YARN-1854: - [~rohithsharma] : The logs that I have submitted already have the 5-second timeout change. I am creating another jira for the issue. Can you please update the description of this jira so that it describes what it actually fixes? TestRMHA#testStartAndTransitions Fails -- Key: YARN-1854 URL: https://issues.apache.org/jira/browse/YARN-1854 Project: Hadoop YARN Issue Type: Test Affects Versions: 2.4.0 Reporter: Mit Desai Assignee: Rohith Priority: Blocker Fix For: 2.4.0 Attachments: Log.rtf, YARN-1854.1.patch, YARN-1854.patch {noformat} testStartAndTransitions(org.apache.hadoop.yarn.server.resourcemanager.TestRMHA) Time elapsed: 5.883 sec FAILURE! java.lang.AssertionError: Incorrect value for metric availableMB expected:2048 but was:4096 at org.junit.Assert.fail(Assert.java:93) at org.junit.Assert.failNotEquals(Assert.java:647) at org.junit.Assert.assertEquals(Assert.java:128) at org.junit.Assert.assertEquals(Assert.java:472) at org.apache.hadoop.yarn.server.resourcemanager.TestRMHA.assertMetric(TestRMHA.java:396) at org.apache.hadoop.yarn.server.resourcemanager.TestRMHA.verifyClusterMetrics(TestRMHA.java:387) at org.apache.hadoop.yarn.server.resourcemanager.TestRMHA.testStartAndTransitions(TestRMHA.java:160) Results : Failed tests: TestRMHA.testStartAndTransitions:160-verifyClusterMetrics:387-assertMetric:396 Incorrect value for metric availableMB expected:2048 but was:4096 {noformat} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1854) TestRMHA#testStartAndTransitions Fails
[ https://issues.apache.org/jira/browse/YARN-1854?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13947119#comment-13947119 ] Mit Desai commented on YARN-1854: - Created YARN-1875 to track the issue TestRMHA#testStartAndTransitions Fails -- Key: YARN-1854 URL: https://issues.apache.org/jira/browse/YARN-1854 Project: Hadoop YARN Issue Type: Test Affects Versions: 2.4.0 Reporter: Mit Desai Assignee: Rohith Priority: Blocker Fix For: 2.4.0 Attachments: Log.rtf, YARN-1854.1.patch, YARN-1854.patch {noformat} testStartAndTransitions(org.apache.hadoop.yarn.server.resourcemanager.TestRMHA) Time elapsed: 5.883 sec FAILURE! java.lang.AssertionError: Incorrect value for metric availableMB expected:2048 but was:4096 at org.junit.Assert.fail(Assert.java:93) at org.junit.Assert.failNotEquals(Assert.java:647) at org.junit.Assert.assertEquals(Assert.java:128) at org.junit.Assert.assertEquals(Assert.java:472) at org.apache.hadoop.yarn.server.resourcemanager.TestRMHA.assertMetric(TestRMHA.java:396) at org.apache.hadoop.yarn.server.resourcemanager.TestRMHA.verifyClusterMetrics(TestRMHA.java:387) at org.apache.hadoop.yarn.server.resourcemanager.TestRMHA.testStartAndTransitions(TestRMHA.java:160) Results : Failed tests: TestRMHA.testStartAndTransitions:160-verifyClusterMetrics:387-assertMetric:396 Incorrect value for metric availableMB expected:2048 but was:4096 {noformat} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-1875) TestRMHA#testStartAndTransitions is failing
[ https://issues.apache.org/jira/browse/YARN-1875?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mit Desai updated YARN-1875: Attachment: Log.rtf Attaching the logs for the failure TestRMHA#testStartAndTransitions is failing --- Key: YARN-1875 URL: https://issues.apache.org/jira/browse/YARN-1875 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.4.0 Reporter: Mit Desai Attachments: Log.rtf {noformat} testStartAndTransitions(org.apache.hadoop.yarn.server.resourcemanager.TestRMHA) Time elapsed: 5.883 sec FAILURE! java.lang.AssertionError: Incorrect value for metric availableMB expected:2048 but was:4096 at org.junit.Assert.fail(Assert.java:93) at org.junit.Assert.failNotEquals(Assert.java:647) at org.junit.Assert.assertEquals(Assert.java:128) at org.junit.Assert.assertEquals(Assert.java:472) at org.apache.hadoop.yarn.server.resourcemanager.TestRMHA.assertMetric(TestRMHA.java:396) at org.apache.hadoop.yarn.server.resourcemanager.TestRMHA.verifyClusterMetrics(TestRMHA.java:387) at org.apache.hadoop.yarn.server.resourcemanager.TestRMHA.testStartAndTransitions(TestRMHA.java:160) Results : Failed tests: TestRMHA.testStartAndTransitions:160-verifyClusterMetrics:387-assertMetric:396 Incorrect value for metric availableMB expected:2048 but was:4096 {noformat} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (YARN-1875) TestRMHA#testStartAndTransitions is failing
Mit Desai created YARN-1875: --- Summary: TestRMHA#testStartAndTransitions is failing Key: YARN-1875 URL: https://issues.apache.org/jira/browse/YARN-1875 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.4.0 Reporter: Mit Desai Attachments: Log.rtf {noformat} testStartAndTransitions(org.apache.hadoop.yarn.server.resourcemanager.TestRMHA) Time elapsed: 5.883 sec FAILURE! java.lang.AssertionError: Incorrect value for metric availableMB expected:2048 but was:4096 at org.junit.Assert.fail(Assert.java:93) at org.junit.Assert.failNotEquals(Assert.java:647) at org.junit.Assert.assertEquals(Assert.java:128) at org.junit.Assert.assertEquals(Assert.java:472) at org.apache.hadoop.yarn.server.resourcemanager.TestRMHA.assertMetric(TestRMHA.java:396) at org.apache.hadoop.yarn.server.resourcemanager.TestRMHA.verifyClusterMetrics(TestRMHA.java:387) at org.apache.hadoop.yarn.server.resourcemanager.TestRMHA.testStartAndTransitions(TestRMHA.java:160) Results : Failed tests: TestRMHA.testStartAndTransitions:160-verifyClusterMetrics:387-assertMetric:396 Incorrect value for metric availableMB expected:2048 but was:4096 {noformat} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1873) TestDistributedShell#testDSShell fails
[ https://issues.apache.org/jira/browse/YARN-1873?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13947121#comment-13947121 ] Hadoop QA commented on YARN-1873: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12636768/YARN-1873.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/3455//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/3455//console This message is automatically generated. TestDistributedShell#testDSShell fails -- Key: YARN-1873 URL: https://issues.apache.org/jira/browse/YARN-1873 Project: Hadoop YARN Issue Type: Bug Affects Versions: 3.0.0, 2.4.0 Reporter: Mit Desai Assignee: Mit Desai Fix For: 3.0.0, 2.5.0 Attachments: YARN-1873.patch testDSShell fails when the tests are run in random order. I see a cleanup issue here. {noformat} Tests run: 1, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 72.222 sec FAILURE! 
- in org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell testOrder(org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell) Time elapsed: 44.127 sec FAILURE! java.lang.AssertionError: expected:1 but was:6 at org.junit.Assert.fail(Assert.java:93) at org.junit.Assert.failNotEquals(Assert.java:647) at org.junit.Assert.assertEquals(Assert.java:128) at org.junit.Assert.assertEquals(Assert.java:472) at org.junit.Assert.assertEquals(Assert.java:456) at org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell.testDSShell(TestDistributedShell.java:204) at org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell.testOrder(TestDistributedShell.java:134) Results : Failed tests: TestDistributedShell.testOrder:134-testDSShell:204 expected:1 but was:6 {noformat} The Line numbers will be little deviated because I was trying to reproduce the error by running the tests in specific order. But the Line that causes the assert fail is {{Assert.assertEquals(1, entitiesAttempts.getEntities().size());}} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1521) Mark appropriate protocol methods with the idempotent annotation or AtMostOnce annotation
[ https://issues.apache.org/jira/browse/YARN-1521?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13947125#comment-13947125 ] Hadoop QA commented on YARN-1521: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12636766/YARN-1521.2.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 6 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-tests. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/3453//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/3453//console This message is automatically generated. 
Mark appropriate protocol methods with the idempotent annotation or AtMostOnce annotation - Key: YARN-1521 URL: https://issues.apache.org/jira/browse/YARN-1521 Project: Hadoop YARN Issue Type: Sub-task Reporter: Xuan Gong Assignee: Xuan Gong Attachments: YARN-1521.0.patch, YARN-1521.1.patch, YARN-1521.2.patch After YARN-1028, we add the automatically failover into RMProxy. This JIRA is to identify whether we need to add idempotent annotation and which methods can be marked as idempotent. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1521) Mark appropriate protocol methods with the idempotent annotation or AtMostOnce annotation
[ https://issues.apache.org/jira/browse/YARN-1521?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13947139#comment-13947139 ] Xuan Gong commented on YARN-1521: - [~kkambatl] Could you take a look ? Mark appropriate protocol methods with the idempotent annotation or AtMostOnce annotation - Key: YARN-1521 URL: https://issues.apache.org/jira/browse/YARN-1521 Project: Hadoop YARN Issue Type: Sub-task Reporter: Xuan Gong Assignee: Xuan Gong Attachments: YARN-1521.0.patch, YARN-1521.1.patch, YARN-1521.2.patch After YARN-1028, we add the automatically failover into RMProxy. This JIRA is to identify whether we need to add idempotent annotation and which methods can be marked as idempotent. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1873) TestDistributedShell#testDSShell fails
[ https://issues.apache.org/jira/browse/YARN-1873?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13947170#comment-13947170 ] Zhijie Shen commented on YARN-1873: --- [~mitdesai], no problem. I've reviewed your patch. The approach should work. One comment on the patch: These vars don't need to be static any more. {code} protected static MiniYARNCluster yarnCluster = null; protected static Configuration conf = new YarnConfiguration(); protected static String APPMASTER_JAR = JarFinder.getJar(ApplicationMaster.class); {code} TestDistributedShell#testDSShell fails -- Key: YARN-1873 URL: https://issues.apache.org/jira/browse/YARN-1873 Project: Hadoop YARN Issue Type: Bug Affects Versions: 3.0.0, 2.4.0 Reporter: Mit Desai Assignee: Mit Desai Fix For: 3.0.0, 2.5.0 Attachments: YARN-1873.patch testDSShell fails when the tests are run in random order. I see a cleanup issue here. {noformat} Tests run: 1, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 72.222 sec FAILURE! - in org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell testOrder(org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell) Time elapsed: 44.127 sec FAILURE! java.lang.AssertionError: expected:1 but was:6 at org.junit.Assert.fail(Assert.java:93) at org.junit.Assert.failNotEquals(Assert.java:647) at org.junit.Assert.assertEquals(Assert.java:128) at org.junit.Assert.assertEquals(Assert.java:472) at org.junit.Assert.assertEquals(Assert.java:456) at org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell.testDSShell(TestDistributedShell.java:204) at org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell.testOrder(TestDistributedShell.java:134) Results : Failed tests: TestDistributedShell.testOrder:134-testDSShell:204 expected:1 but was:6 {noformat} The Line numbers will be little deviated because I was trying to reproduce the error by running the tests in specific order. 
But the Line that causes the assert fail is {{Assert.assertEquals(1, entitiesAttempts.getEntities().size());}} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1873) TestDistributedShell#testDSShell fails
[ https://issues.apache.org/jira/browse/YARN-1873?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13947176#comment-13947176 ] Mit Desai commented on YARN-1873: - I'll change them and upload a new patch. Thanks for the review! TestDistributedShell#testDSShell fails -- Key: YARN-1873 URL: https://issues.apache.org/jira/browse/YARN-1873 Project: Hadoop YARN Issue Type: Bug Affects Versions: 3.0.0, 2.4.0 Reporter: Mit Desai Assignee: Mit Desai Fix For: 3.0.0, 2.5.0 Attachments: YARN-1873.patch testDSShell fails when the tests are run in random order. I see a cleanup issue here. {noformat} Tests run: 1, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 72.222 sec FAILURE! - in org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell testOrder(org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell) Time elapsed: 44.127 sec FAILURE! java.lang.AssertionError: expected:1 but was:6 at org.junit.Assert.fail(Assert.java:93) at org.junit.Assert.failNotEquals(Assert.java:647) at org.junit.Assert.assertEquals(Assert.java:128) at org.junit.Assert.assertEquals(Assert.java:472) at org.junit.Assert.assertEquals(Assert.java:456) at org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell.testDSShell(TestDistributedShell.java:204) at org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell.testOrder(TestDistributedShell.java:134) Results : Failed tests: TestDistributedShell.testOrder:134-testDSShell:204 expected:1 but was:6 {noformat} The Line numbers will be little deviated because I was trying to reproduce the error by running the tests in specific order. But the Line that causes the assert fail is {{Assert.assertEquals(1, entitiesAttempts.getEntities().size());}} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-1873) TestDistributedShell#testDSShell fails
[ https://issues.apache.org/jira/browse/YARN-1873?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mit Desai updated YARN-1873: Attachment: YARN-1873.patch Attached the new patch TestDistributedShell#testDSShell fails -- Key: YARN-1873 URL: https://issues.apache.org/jira/browse/YARN-1873 Project: Hadoop YARN Issue Type: Bug Affects Versions: 3.0.0, 2.4.0 Reporter: Mit Desai Assignee: Mit Desai Fix For: 3.0.0, 2.5.0 Attachments: YARN-1873.patch, YARN-1873.patch testDSShell fails when the tests are run in random order. I see a cleanup issue here. {noformat} Tests run: 1, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 72.222 sec FAILURE! - in org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell testOrder(org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell) Time elapsed: 44.127 sec FAILURE! java.lang.AssertionError: expected:1 but was:6 at org.junit.Assert.fail(Assert.java:93) at org.junit.Assert.failNotEquals(Assert.java:647) at org.junit.Assert.assertEquals(Assert.java:128) at org.junit.Assert.assertEquals(Assert.java:472) at org.junit.Assert.assertEquals(Assert.java:456) at org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell.testDSShell(TestDistributedShell.java:204) at org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell.testOrder(TestDistributedShell.java:134) Results : Failed tests: TestDistributedShell.testOrder:134-testDSShell:204 expected:1 but was:6 {noformat} The Line numbers will be little deviated because I was trying to reproduce the error by running the tests in specific order. But the Line that causes the assert fail is {{Assert.assertEquals(1, entitiesAttempts.getEntities().size());}} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-1867) NPE while fetching apps via the REST API
[ https://issues.apache.org/jira/browse/YARN-1867?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinod Kumar Vavilapalli updated YARN-1867: -- Attachment: YARN-1867-20140325-trunk.txt Patch for trunk. NPE while fetching apps via the REST API Key: YARN-1867 URL: https://issues.apache.org/jira/browse/YARN-1867 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.4.0 Reporter: Karthik Kambatla Assignee: Vinod Kumar Vavilapalli Priority: Blocker Labels: rest_api Attachments: YARN-1867-20140325-trunk.txt, YARN-1867-20140325.txt We ran into the following NPE when fetching applications using the REST API: {noformat} INTERNAL_SERVER_ERROR java.lang.NullPointerException at org.apache.hadoop.yarn.server.security.ApplicationACLsManager.checkAccess(ApplicationACLsManager.java:104) at org.apache.hadoop.yarn.server.resourcemanager.webapp.RMWebServices.hasAccess(RMWebServices.java:123) at org.apache.hadoop.yarn.server.resourcemanager.webapp.RMWebServices.getApps(RMWebServices.java:418) {noformat} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1867) NPE while fetching apps via the REST API
[ https://issues.apache.org/jira/browse/YARN-1867?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13947199#comment-13947199 ] Karthik Kambatla commented on YARN-1867: bq. A little bit of cleanup per patch should be okay. I'd like to keep them here unless you feel strongly that it is prohibiting the review. I am not very particular about it - the patch is fairly small and those changes don't get in the way of review. I am okay with leaving them in, if you insist. My only concern is the git history wouldn't tell us why we made these changes if we hide them behind a bug fix. NPE while fetching apps via the REST API Key: YARN-1867 URL: https://issues.apache.org/jira/browse/YARN-1867 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.4.0 Reporter: Karthik Kambatla Assignee: Vinod Kumar Vavilapalli Priority: Blocker Labels: rest_api Attachments: YARN-1867-20140325-trunk.txt, YARN-1867-20140325.txt We ran into the following NPE when fetching applications using the REST API: {noformat} INTERNAL_SERVER_ERROR java.lang.NullPointerException at org.apache.hadoop.yarn.server.security.ApplicationACLsManager.checkAccess(ApplicationACLsManager.java:104) at org.apache.hadoop.yarn.server.resourcemanager.webapp.RMWebServices.hasAccess(RMWebServices.java:123) at org.apache.hadoop.yarn.server.resourcemanager.webapp.RMWebServices.getApps(RMWebServices.java:418) {noformat} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1873) TestDistributedShell#testDSShell fails
[ https://issues.apache.org/jira/browse/YARN-1873?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13947224#comment-13947224 ] Zhijie Shen commented on YARN-1873: --- The patch looks almost good to me. One nit: I checked and found APPMASTER_JAR is never changed across the code. I think it would be better as a final static constant. Sorry for my earlier misleading comment. TestDistributedShell#testDSShell fails -- Key: YARN-1873 URL: https://issues.apache.org/jira/browse/YARN-1873 Project: Hadoop YARN Issue Type: Bug Affects Versions: 3.0.0, 2.4.0 Reporter: Mit Desai Assignee: Mit Desai Fix For: 3.0.0, 2.5.0 Attachments: YARN-1873.patch, YARN-1873.patch testDSShell fails when the tests are run in random order. I see a cleanup issue here. {noformat} Tests run: 1, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 72.222 sec FAILURE! - in org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell testOrder(org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell) Time elapsed: 44.127 sec FAILURE! java.lang.AssertionError: expected:1 but was:6 at org.junit.Assert.fail(Assert.java:93) at org.junit.Assert.failNotEquals(Assert.java:647) at org.junit.Assert.assertEquals(Assert.java:128) at org.junit.Assert.assertEquals(Assert.java:472) at org.junit.Assert.assertEquals(Assert.java:456) at org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell.testDSShell(TestDistributedShell.java:204) at org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell.testOrder(TestDistributedShell.java:134) Results : Failed tests: TestDistributedShell.testOrder:134-testDSShell:204 expected:1 but was:6 {noformat} The Line numbers will be little deviated because I was trying to reproduce the error by running the tests in specific order. But the Line that causes the assert fail is {{Assert.assertEquals(1, entitiesAttempts.getEntities().size());}} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1873) TestDistributedShell#testDSShell fails
[ https://issues.apache.org/jira/browse/YARN-1873?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13947244#comment-13947244 ] Mit Desai commented on YARN-1873: - I looked at it. Can you let me know if we need to make it {{protected final String APPMASTER_JAR = JarFinder.getJar(ApplicationMaster.class);}} or {{protected final static String APPMASTER_JAR = JarFinder.getJar(ApplicationMaster.class);}} I think making it final instead of final static will be enough. What do you say? TestDistributedShell#testDSShell fails -- Key: YARN-1873 URL: https://issues.apache.org/jira/browse/YARN-1873 Project: Hadoop YARN Issue Type: Bug Affects Versions: 3.0.0, 2.4.0 Reporter: Mit Desai Assignee: Mit Desai Fix For: 3.0.0, 2.5.0 Attachments: YARN-1873.patch, YARN-1873.patch testDSShell fails when the tests are run in random order. I see a cleanup issue here. {noformat} Tests run: 1, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 72.222 sec FAILURE! - in org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell testOrder(org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell) Time elapsed: 44.127 sec FAILURE! java.lang.AssertionError: expected:1 but was:6 at org.junit.Assert.fail(Assert.java:93) at org.junit.Assert.failNotEquals(Assert.java:647) at org.junit.Assert.assertEquals(Assert.java:128) at org.junit.Assert.assertEquals(Assert.java:472) at org.junit.Assert.assertEquals(Assert.java:456) at org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell.testDSShell(TestDistributedShell.java:204) at org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell.testOrder(TestDistributedShell.java:134) Results : Failed tests: TestDistributedShell.testOrder:134-testDSShell:204 expected:1 but was:6 {noformat} The Line numbers will be little deviated because I was trying to reproduce the error by running the tests in specific order. 
But the Line that causes the assert fail is {{Assert.assertEquals(1, entitiesAttempts.getEntities().size());}} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1521) Mark appropriate protocol methods with the idempotent annotation or AtMostOnce annotation
[ https://issues.apache.org/jira/browse/YARN-1521?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13947295#comment-13947295 ] Jian He commented on YARN-1521: --- - we should at least log this event saying this is an earlier submitted application, which is good for debugging. - TestProtocolHA - TestProtocolHABase.java - I ran TestApplicationClientProtocolOnHA locally without core code changes, the whole test eventually takes 15 mins to crash. I observed that the test keeps doing failover even if the test is done. can you investigate ? {code} 14/03/25 15:02:08 WARN resourcemanager.RMAuditLogger: USER=jhe OPERATION=transitionToStandby TARGET=RMHAProtocolService RESULT=FAILURE DESCRIPTION=Exception transitioning to standby PERMISSIONS=All users are allowed 14/03/25 15:02:09 INFO client.ConfiguredRMFailoverProxyProvider: Failing over to rm1 14/03/25 15:02:11 INFO client.ConfiguredRMFailoverProxyProvider: Failing over to rm2 14/03/25 15:02:14 INFO client.ConfiguredRMFailoverProxyProvider: Failing over to rm1 14/03/25 15:02:23 INFO client.ConfiguredRMFailoverProxyProvider: Failing over to rm2 14/03/25 15:02:35 INFO client.ConfiguredRMFailoverProxyProvider: Failing over to rm1 14/03/25 15:02:57 INFO client.ConfiguredRMFailoverProxyProvider: Failing over to rm2 14/03/25 15:03:30 INFO client.ConfiguredRMFailoverProxyProvider: Failing over to rm1 {code} Mark appropriate protocol methods with the idempotent annotation or AtMostOnce annotation - Key: YARN-1521 URL: https://issues.apache.org/jira/browse/YARN-1521 Project: Hadoop YARN Issue Type: Sub-task Reporter: Xuan Gong Assignee: Xuan Gong Attachments: YARN-1521.0.patch, YARN-1521.1.patch, YARN-1521.2.patch After YARN-1028, we add the automatically failover into RMProxy. This JIRA is to identify whether we need to add idempotent annotation and which methods can be marked as idempotent. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1521) Mark appropriate protocol methods with the idempotent annotation or AtMostOnce annotation
[ https://issues.apache.org/jira/browse/YARN-1521?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13947302#comment-13947302 ] Vinod Kumar Vavilapalli commented on YARN-1521: --- bq. - TestProtocolHA - TestProtocolHABase.java Or ProtocolHATestBase. BTW, why is it ProtocolHA? Mark appropriate protocol methods with the idempotent annotation or AtMostOnce annotation - Key: YARN-1521 URL: https://issues.apache.org/jira/browse/YARN-1521 Project: Hadoop YARN Issue Type: Sub-task Reporter: Xuan Gong Assignee: Xuan Gong Attachments: YARN-1521.0.patch, YARN-1521.1.patch, YARN-1521.2.patch After YARN-1028, we add the automatically failover into RMProxy. This JIRA is to identify whether we need to add idempotent annotation and which methods can be marked as idempotent. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1873) TestDistributedShell#testDSShell fails
[ https://issues.apache.org/jira/browse/YARN-1873?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13947308#comment-13947308 ] Zhijie Shen commented on YARN-1873: --- APPMASTER_JAR is a jar that is used without modification in all the test cases; therefore, it should be safe to make it static. TestDistributedShell#testDSShell fails -- Key: YARN-1873 URL: https://issues.apache.org/jira/browse/YARN-1873 Project: Hadoop YARN Issue Type: Bug Affects Versions: 3.0.0, 2.4.0 Reporter: Mit Desai Assignee: Mit Desai Fix For: 3.0.0, 2.5.0 Attachments: YARN-1873.patch, YARN-1873.patch testDSShell fails when the tests are run in random order. I see a cleanup issue here. {noformat} Tests run: 1, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 72.222 sec FAILURE! - in org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell testOrder(org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell) Time elapsed: 44.127 sec FAILURE! java.lang.AssertionError: expected:1 but was:6 at org.junit.Assert.fail(Assert.java:93) at org.junit.Assert.failNotEquals(Assert.java:647) at org.junit.Assert.assertEquals(Assert.java:128) at org.junit.Assert.assertEquals(Assert.java:472) at org.junit.Assert.assertEquals(Assert.java:456) at org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell.testDSShell(TestDistributedShell.java:204) at org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell.testOrder(TestDistributedShell.java:134) Results : Failed tests: TestDistributedShell.testOrder:134-testDSShell:204 expected:1 but was:6 {noformat} The Line numbers will be little deviated because I was trying to reproduce the error by running the tests in specific order. But the Line that causes the assert fail is {{Assert.assertEquals(1, entitiesAttempts.getEntities().size());}} -- This message was sent by Atlassian JIRA (v6.2#6252)
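Editorial note on the final versus final static question above: JUnit 4 creates a fresh instance of the test class for each test method, so an instance field initializer runs once per test, while a static initializer runs once per class load. The sketch below only illustrates that difference. The class names and findJar() are hypothetical stand-ins for the test class and for JarFinder.getJar(ApplicationMaster.class); it is not the code from the patch.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for a test class. findJar() imitates an expensive
// lookup such as JarFinder.getJar(ApplicationMaster.class) and counts calls.
class JarFieldDemo {
    // Declared first so it is initialized before STATIC_JAR uses it.
    static final AtomicInteger LOOKUPS = new AtomicInteger();

    static String findJar(String label) {
        LOOKUPS.incrementAndGet();
        return "/tmp/" + label + ".jar"; // made-up path, for illustration only
    }

    // Evaluated once, when the class is initialized.
    static final String STATIC_JAR = findJar("static");

    // Evaluated again for every new instance (one per test method in JUnit 4).
    final String instanceJar = findJar("instance");
}

class JarFieldDemoRunner {
    public static void main(String[] args) {
        int afterClassInit = JarFieldDemo.LOOKUPS.get();
        new JarFieldDemo();
        new JarFieldDemo();
        new JarFieldDemo();
        int afterInstances = JarFieldDemo.LOOKUPS.get();
        System.out.println("lookups caused by instances: "
                + (afterInstances - afterClassInit));
    }
}
```

Both variants yield the same value when the lookup is deterministic; the static form simply avoids repeating the lookup for every test instance.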
[jira] [Commented] (YARN-1521) Mark appropriate protocol methods with the idempotent annotation or AtMostOnce annotation
[ https://issues.apache.org/jira/browse/YARN-1521?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13947311#comment-13947311 ] Jian He commented on YARN-1521: --- In TestApplicationClientProtocolOnHA , each client API call involves a new miniYarnCluster start and shutdown, how long does it take for the whole test to finish ? If that's too long, we can just reuse one miniYarnCluster. Mark appropriate protocol methods with the idempotent annotation or AtMostOnce annotation - Key: YARN-1521 URL: https://issues.apache.org/jira/browse/YARN-1521 Project: Hadoop YARN Issue Type: Sub-task Reporter: Xuan Gong Assignee: Xuan Gong Attachments: YARN-1521.0.patch, YARN-1521.1.patch, YARN-1521.2.patch After YARN-1028, we add the automatically failover into RMProxy. This JIRA is to identify whether we need to add idempotent annotation and which methods can be marked as idempotent. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1521) Mark appropriate protocol methods with the idempotent annotation or AtMostOnce annotation
[ https://issues.apache.org/jira/browse/YARN-1521?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13947324#comment-13947324 ] Zhijie Shen commented on YARN-1521: --- bq. If that's too long, we can just reuse one miniYarnCluster. Some input: It may or may not run into the case I saw in YARN-1873. Before reusing a single yarn cluster, it's good to make sure the change on miniYarnCluster in the current test case will not disturb the following test cases. Mark appropriate protocol methods with the idempotent annotation or AtMostOnce annotation - Key: YARN-1521 URL: https://issues.apache.org/jira/browse/YARN-1521 Project: Hadoop YARN Issue Type: Sub-task Reporter: Xuan Gong Assignee: Xuan Gong Attachments: YARN-1521.0.patch, YARN-1521.1.patch, YARN-1521.2.patch After YARN-1028, we add the automatically failover into RMProxy. This JIRA is to identify whether we need to add idempotent annotation and which methods can be marked as idempotent. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1521) Mark appropriate protocol methods with the idempotent annotation or AtMostOnce annotation
[ https://issues.apache.org/jira/browse/YARN-1521?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13947392#comment-13947392 ] Karthik Kambatla commented on YARN-1521: Sorry for the delay in getting to this. Haven't looked at the patch itself yet. {quote} ApplicationMasterProtocol 1. registerApplicationMaster 2. finishApplicationMaster 3. allocate {quote} Don't remember the source corresponding to these methods off the top of my head, but would think we *should* mark all of them as well. I am okay with doing this in a separate JIRA to unblock this. # Register resets the sequence number - Idempotent. YARN-556 allows a running AM to re-register, albeit on resync. # Finish should be similar to kill - Idempotent. Should be similar to killApplication? # Allocate - Idempotent or AtMostOnce - as the AM heartbeats periodically. Preferably Idempotent. Mark appropriate protocol methods with the idempotent annotation or AtMostOnce annotation - Key: YARN-1521 URL: https://issues.apache.org/jira/browse/YARN-1521 Project: Hadoop YARN Issue Type: Sub-task Reporter: Xuan Gong Assignee: Xuan Gong Attachments: YARN-1521.0.patch, YARN-1521.1.patch, YARN-1521.2.patch After YARN-1028, we add the automatically failover into RMProxy. This JIRA is to identify whether we need to add idempotent annotation and which methods can be marked as idempotent. -- This message was sent by Atlassian JIRA (v6.2#6252)
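Editorial note: the marking scheme discussed above can be sketched as follows. This is a minimal, self-contained illustration, not Hadoop code. SketchIdempotent and SketchAtMostOnce are local stand-ins for the annotations in org.apache.hadoop.io.retry; SketchAmProtocol, its unmarked() method, and RetryDecision are hypothetical names. The comment above leaves the choice for allocate open (Idempotent preferred); AtMostOnce is used here only so that both annotations appear.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Method;

// Local stand-ins (assumption) for Hadoop's retry annotations.
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.METHOD)
@interface SketchIdempotent {}

@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.METHOD)
@interface SketchAtMostOnce {}

// Hypothetical protocol interface mirroring the three methods listed above.
interface SketchAmProtocol {
    @SketchIdempotent
    void registerApplicationMaster();

    @SketchIdempotent
    void finishApplicationMaster();

    @SketchAtMostOnce // either annotation was suggested for allocate
    void allocate();

    void unmarked(); // hypothetical method carrying no retry marker
}

class RetryDecision {
    // Simplified rule: after a failover, re-send a call only if its method
    // declares that a repeated invocation is safe.
    static boolean safeToRetry(Method m) {
        return m.isAnnotationPresent(SketchIdempotent.class)
                || m.isAnnotationPresent(SketchAtMostOnce.class);
    }

    public static void main(String[] args) {
        for (Method m : SketchAmProtocol.class.getDeclaredMethods()) {
            System.out.println(m.getName() + " -> " + safeToRetry(m));
        }
    }
}
```

The real retry policy takes more into account (for example, the kind of exception); the point here is only that the annotation is what tells a failover proxy a call may be repeated.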
[jira] [Commented] (YARN-1521) Mark appropriate protocol methods with the idempotent annotation or AtMostOnce annotation
[ https://issues.apache.org/jira/browse/YARN-1521?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13947399#comment-13947399 ] Karthik Kambatla commented on YARN-1521: Looked at the non-test source only. Looks good. One nit - it should be okay to fix at commit time. # Reword to "it checks whether the application already exists"? {code} * <p>During the submission process, it checks whether the application * has already exist. If the application exists, it will simply return * SubmitApplicationResponse</p> {code} Mark appropriate protocol methods with the idempotent annotation or AtMostOnce annotation - Key: YARN-1521 URL: https://issues.apache.org/jira/browse/YARN-1521 Project: Hadoop YARN Issue Type: Sub-task Reporter: Xuan Gong Assignee: Xuan Gong Attachments: YARN-1521.0.patch, YARN-1521.1.patch, YARN-1521.2.patch After YARN-1028, we add the automatically failover into RMProxy. This JIRA is to identify whether we need to add idempotent annotation and which methods can be marked as idempotent. -- This message was sent by Atlassian JIRA (v6.2#6252)
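Editorial note: the behaviour described in the quoted javadoc (check whether the application already exists and, if so, simply return) is what makes a retried submission harmless. A minimal sketch of that idea follows, assuming an in-memory map as the application cache. SubmitSketch and its methods are hypothetical and are not the ClientRMService implementation; the boolean return value exists only to make the two paths visible, whereas the javadoc describes returning the same response in both cases.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Hypothetical sketch of an idempotent submit; not the actual RM code.
class SubmitSketch {
    // Application id -> submitting user; stands in for the RM's app cache.
    private final ConcurrentMap<String, String> apps = new ConcurrentHashMap<>();

    /** Returns true if the application was newly accepted, false if it was a repeat. */
    boolean submit(String applicationId, String user) {
        // putIfAbsent is atomic, so two racing retries cannot both "win".
        String previous = apps.putIfAbsent(applicationId, user);
        if (previous != null) {
            // Repeat submission, e.g. a client retry after failover: note it
            // for debugging, but do not record a second success.
            System.out.println("Application " + applicationId + " was already submitted");
            return false;
        }
        System.out.println("Accepted application " + applicationId + " from " + user);
        return true;
    }

    int size() {
        return apps.size();
    }
}

class SubmitSketchDemo {
    public static void main(String[] args) {
        SubmitSketch rm = new SubmitSketch();
        rm.submit("application_0001", "alice");
        rm.submit("application_0001", "alice"); // retried call, no duplicate entry
        System.out.println("tracked applications: " + rm.size());
    }
}
```

This also reflects the two review points raised earlier in the thread: the success audit entry is written once, and the repeat is still logged for debugging.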
[jira] [Updated] (YARN-1521) Mark appropriate protocol methods with the idempotent annotation or AtMostOnce annotation
[ https://issues.apache.org/jira/browse/YARN-1521?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuan Gong updated YARN-1521: Attachment: YARN-1521.3.patch Mark appropriate protocol methods with the idempotent annotation or AtMostOnce annotation - Key: YARN-1521 URL: https://issues.apache.org/jira/browse/YARN-1521 Project: Hadoop YARN Issue Type: Sub-task Reporter: Xuan Gong Assignee: Xuan Gong Attachments: YARN-1521.0.patch, YARN-1521.1.patch, YARN-1521.2.patch, YARN-1521.3.patch After YARN-1028, we add the automatically failover into RMProxy. This JIRA is to identify whether we need to add idempotent annotation and which methods can be marked as idempotent. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1521) Mark appropriate protocol methods with the idempotent annotation or AtMostOnce annotation
[ https://issues.apache.org/jira/browse/YARN-1521?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13947414#comment-13947414 ] Xuan Gong commented on YARN-1521: - bq. In TestApplicationClientProtocolOnHA , each client API call involves a new miniYarnCluster start and shutdown, how long does it take for the whole test to finish ? If that's too long, we can just reuse one miniYarnCluster. It takes about 120s to finish all the test cases. bq. we should at least log this event saying this is an earlier submitted application, which is good for debugging. ADDED bq. TestProtocolHA - TestProtocolHABase.java changed Mark appropriate protocol methods with the idempotent annotation or AtMostOnce annotation - Key: YARN-1521 URL: https://issues.apache.org/jira/browse/YARN-1521 Project: Hadoop YARN Issue Type: Sub-task Reporter: Xuan Gong Assignee: Xuan Gong Attachments: YARN-1521.0.patch, YARN-1521.1.patch, YARN-1521.2.patch, YARN-1521.3.patch After YARN-1028, we add the automatically failover into RMProxy. This JIRA is to identify whether we need to add idempotent annotation and which methods can be marked as idempotent. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-1452) Document the usage of the generic application history and the timeline data service
[ https://issues.apache.org/jira/browse/YARN-1452?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinod Kumar Vavilapalli updated YARN-1452: -- Attachment: YARN-1452.2.patch Mostly the same patch as before with grammar fixes. Will check this in if Jenkins says okay.. Document the usage of the generic application history and the timeline data service --- Key: YARN-1452 URL: https://issues.apache.org/jira/browse/YARN-1452 Project: Hadoop YARN Issue Type: Task Reporter: Zhijie Shen Assignee: Zhijie Shen Labels: documentation Attachments: TimelineServer.html, YARN-1452.1.patch, YARN-1452.2.patch We need to write a bunch of documents to guide users. such as command line tools, configurations and REST APIs -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1867) NPE while fetching apps via the REST API
[ https://issues.apache.org/jira/browse/YARN-1867?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13947450#comment-13947450 ] Hadoop QA commented on YARN-1867: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12636787/YARN-1867-20140325-trunk.txt against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 2 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/3458//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/3458//console This message is automatically generated. 
NPE while fetching apps via the REST API Key: YARN-1867 URL: https://issues.apache.org/jira/browse/YARN-1867 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.4.0 Reporter: Karthik Kambatla Assignee: Vinod Kumar Vavilapalli Priority: Blocker Labels: rest_api Attachments: YARN-1867-20140325-trunk.txt, YARN-1867-20140325.txt We ran into the following NPE when fetching applications using the REST API: {noformat} INTERNAL_SERVER_ERROR java.lang.NullPointerException at org.apache.hadoop.yarn.server.security.ApplicationACLsManager.checkAccess(ApplicationACLsManager.java:104) at org.apache.hadoop.yarn.server.resourcemanager.webapp.RMWebServices.hasAccess(RMWebServices.java:123) at org.apache.hadoop.yarn.server.resourcemanager.webapp.RMWebServices.getApps(RMWebServices.java:418) {noformat} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1521) Mark appropriate protocol methods with the idempotent annotation or AtMostOnce annotation
[ https://issues.apache.org/jira/browse/YARN-1521?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13947466#comment-13947466 ] Hadoop QA commented on YARN-1521: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12636836/YARN-1521.3.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 6 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-tests. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/3459//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/3459//console This message is automatically generated. 
Mark appropriate protocol methods with the idempotent annotation or AtMostOnce annotation - Key: YARN-1521 URL: https://issues.apache.org/jira/browse/YARN-1521 Project: Hadoop YARN Issue Type: Sub-task Reporter: Xuan Gong Assignee: Xuan Gong Attachments: YARN-1521.0.patch, YARN-1521.1.patch, YARN-1521.2.patch, YARN-1521.3.patch After YARN-1028, we add the automatically failover into RMProxy. This JIRA is to identify whether we need to add idempotent annotation and which methods can be marked as idempotent. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1452) Document the usage of the generic application history and the timeline data service
[ https://issues.apache.org/jira/browse/YARN-1452?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13947480#comment-13947480 ] Hadoop QA commented on YARN-1452: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12636842/YARN-1452.2.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+0 tests included{color}. The patch appears to be a documentation patch that doesn't require tests. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/3460//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/3460//console This message is automatically generated. Document the usage of the generic application history and the timeline data service --- Key: YARN-1452 URL: https://issues.apache.org/jira/browse/YARN-1452 Project: Hadoop YARN Issue Type: Task Reporter: Zhijie Shen Assignee: Zhijie Shen Labels: documentation Attachments: TimelineServer.html, YARN-1452.1.patch, YARN-1452.2.patch We need to write a bunch of documents to guide users. such as command line tools, configurations and REST APIs -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1452) Document the usage of the generic application history and the timeline data service
[ https://issues.apache.org/jira/browse/YARN-1452?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13947491#comment-13947491 ] Vinod Kumar Vavilapalli commented on YARN-1452: --- BTW, we will need another doc describing all the REST APIs in the timeline-service. Let's file another ticket. Document the usage of the generic application history and the timeline data service --- Key: YARN-1452 URL: https://issues.apache.org/jira/browse/YARN-1452 Project: Hadoop YARN Issue Type: Task Reporter: Zhijie Shen Assignee: Zhijie Shen Labels: documentation Fix For: 2.4.0 Attachments: TimelineServer.html, YARN-1452.1.patch, YARN-1452.2.patch We need to write a bunch of documents to guide users. such as command line tools, configurations and REST APIs -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1867) NPE while fetching apps via the REST API
[ https://issues.apache.org/jira/browse/YARN-1867?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13947492#comment-13947492 ] Vinod Kumar Vavilapalli commented on YARN-1867: --- Tx Karthik. Let's keep them in, they don't seem risky. Checking this in for now to unblock the release.. NPE while fetching apps via the REST API Key: YARN-1867 URL: https://issues.apache.org/jira/browse/YARN-1867 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.4.0 Reporter: Karthik Kambatla Assignee: Vinod Kumar Vavilapalli Priority: Blocker Labels: rest_api Attachments: YARN-1867-20140325-trunk.txt, YARN-1867-20140325.txt We ran into the following NPE when fetching applications using the REST API: {noformat} INTERNAL_SERVER_ERROR java.lang.NullPointerException at org.apache.hadoop.yarn.server.security.ApplicationACLsManager.checkAccess(ApplicationACLsManager.java:104) at org.apache.hadoop.yarn.server.resourcemanager.webapp.RMWebServices.hasAccess(RMWebServices.java:123) at org.apache.hadoop.yarn.server.resourcemanager.webapp.RMWebServices.getApps(RMWebServices.java:418) {noformat} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-1521) Mark appropriate protocol methods with the idempotent annotation or AtMostOnce annotation
[ https://issues.apache.org/jira/browse/YARN-1521?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinod Kumar Vavilapalli updated YARN-1521: -- Priority: Blocker (was: Major) Target Version/s: 2.4.0 This is a blocker for 2.4. Without that RM failover will fail in very bad ways. Mark appropriate protocol methods with the idempotent annotation or AtMostOnce annotation - Key: YARN-1521 URL: https://issues.apache.org/jira/browse/YARN-1521 Project: Hadoop YARN Issue Type: Sub-task Reporter: Xuan Gong Assignee: Xuan Gong Priority: Blocker Attachments: YARN-1521.0.patch, YARN-1521.1.patch, YARN-1521.2.patch, YARN-1521.3.patch After YARN-1028, we add the automatically failover into RMProxy. This JIRA is to identify whether we need to add idempotent annotation and which methods can be marked as idempotent. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1452) Document the usage of the generic application history and the timeline data service
[ https://issues.apache.org/jira/browse/YARN-1452?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13947505#comment-13947505 ] Hudson commented on YARN-1452: -- SUCCESS: Integrated in Hadoop-trunk-Commit #5402 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/5402/]) YARN-1452. Added documentation about the configuration and usage of generic application history and the timeline data service. Contributed by Zhijie Shen. (vinodkv: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1581656) * /hadoop/common/trunk/hadoop-project/src/site/site.xml * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/resources/yarn-default.xml * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/apt/TimelineServer.apt.vm * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/apt/index.apt.vm Document the usage of the generic application history and the timeline data service --- Key: YARN-1452 URL: https://issues.apache.org/jira/browse/YARN-1452 Project: Hadoop YARN Issue Type: Task Reporter: Zhijie Shen Assignee: Zhijie Shen Labels: documentation Fix For: 2.4.0 Attachments: TimelineServer.html, YARN-1452.1.patch, YARN-1452.2.patch We need to write a bunch of documents to guide users. such as command line tools, configurations and REST APIs -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1867) NPE while fetching apps via the REST API
[ https://issues.apache.org/jira/browse/YARN-1867?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13947504#comment-13947504 ] Hudson commented on YARN-1867: -- SUCCESS: Integrated in Hadoop-trunk-Commit #5402 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/5402/]) YARN-1867. Fixed a bug in ResourceManager that was causing invalid ACL checks in the web-services after fail-over. Contributed by Vinod Kumar Vavilapalli. (vinodkv: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1581662) * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/RMContextImpl.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ResourceManager.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/RMWebServices.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/MockRM.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestRMHA.java NPE while fetching apps via the REST API Key: YARN-1867 URL: https://issues.apache.org/jira/browse/YARN-1867 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.4.0 Reporter: Karthik Kambatla Assignee: Vinod Kumar Vavilapalli Priority: Blocker Labels: rest_api Fix For: 2.4.0 Attachments: YARN-1867-20140325-trunk.txt, YARN-1867-20140325.txt We ran into the following NPE when fetching applications using the REST API: 
{noformat} INTERNAL_SERVER_ERROR java.lang.NullPointerException at org.apache.hadoop.yarn.server.security.ApplicationACLsManager.checkAccess(ApplicationACLsManager.java:104) at org.apache.hadoop.yarn.server.resourcemanager.webapp.RMWebServices.hasAccess(RMWebServices.java:123) at org.apache.hadoop.yarn.server.resourcemanager.webapp.RMWebServices.getApps(RMWebServices.java:418) {noformat} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-1854) Race condition in TestRMHA#testStartAndTransitions
[ https://issues.apache.org/jira/browse/YARN-1854?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rohith updated YARN-1854: - Description: There is a race in the test. TestRMHA#testStartAndTransitions calls verifyClusterMetrics() immediately after the application is submitted, but QueueMetrics are updated only after the app attempt is scheduled. Calling verifyClusterMetrics() without first verifying that the app attempt is in the Scheduled state causes random test failures. MockRM.submitApp() returns when the application is in ACCEPTED, but QueueMetrics are updated at the APP_ATTEMPT_ADDED event. There is a high chance of reading the queue metrics before the app attempt is Scheduled. {noformat} testStartAndTransitions(org.apache.hadoop.yarn.server.resourcemanager.TestRMHA) Time elapsed: 5.883 sec FAILURE! java.lang.AssertionError: Incorrect value for metric availableMB expected:2048 but was:4096 at org.junit.Assert.fail(Assert.java:93) at org.junit.Assert.failNotEquals(Assert.java:647) at org.junit.Assert.assertEquals(Assert.java:128) at org.junit.Assert.assertEquals(Assert.java:472) at org.apache.hadoop.yarn.server.resourcemanager.TestRMHA.assertMetric(TestRMHA.java:396) at org.apache.hadoop.yarn.server.resourcemanager.TestRMHA.verifyClusterMetrics(TestRMHA.java:387) at org.apache.hadoop.yarn.server.resourcemanager.TestRMHA.testStartAndTransitions(TestRMHA.java:160) Results : Failed tests: TestRMHA.testStartAndTransitions:160-verifyClusterMetrics:387-assertMetric:396 Incorrect value for metric availableMB expected:2048 but was:4096 {noformat} was: {noformat} testStartAndTransitions(org.apache.hadoop.yarn.server.resourcemanager.TestRMHA) Time elapsed: 5.883 sec FAILURE! 
java.lang.AssertionError: Incorrect value for metric availableMB expected:2048 but was:4096 at org.junit.Assert.fail(Assert.java:93) at org.junit.Assert.failNotEquals(Assert.java:647) at org.junit.Assert.assertEquals(Assert.java:128) at org.junit.Assert.assertEquals(Assert.java:472) at org.apache.hadoop.yarn.server.resourcemanager.TestRMHA.assertMetric(TestRMHA.java:396) at org.apache.hadoop.yarn.server.resourcemanager.TestRMHA.verifyClusterMetrics(TestRMHA.java:387) at org.apache.hadoop.yarn.server.resourcemanager.TestRMHA.testStartAndTransitions(TestRMHA.java:160) Results : Failed tests: TestRMHA.testStartAndTransitions:160-verifyClusterMetrics:387-assertMetric:396 Incorrect value for metric availableMB expected:2048 but was:4096 {noformat} Summary: Race condition in TestRMHA#testStartAndTransitions (was: TestRMHA#testStartAndTransitions Fails) I updated the issue description as per the fix. Race condition in TestRMHA#testStartAndTransitions -- Key: YARN-1854 URL: https://issues.apache.org/jira/browse/YARN-1854 Project: Hadoop YARN Issue Type: Test Affects Versions: 2.4.0 Reporter: Mit Desai Assignee: Rohith Priority: Blocker Fix For: 2.4.0 Attachments: Log.rtf, YARN-1854.1.patch, YARN-1854.patch There is a race in the test. TestRMHA#testStartAndTransitions calls verifyClusterMetrics() immediately after the application is submitted, but QueueMetrics are updated only after the app attempt is scheduled. Calling verifyClusterMetrics() without first verifying that the app attempt is in the Scheduled state causes random test failures. MockRM.submitApp() returns when the application is in ACCEPTED, but QueueMetrics are updated at the APP_ATTEMPT_ADDED event. There is a high chance of reading the queue metrics before the app attempt is Scheduled. {noformat} testStartAndTransitions(org.apache.hadoop.yarn.server.resourcemanager.TestRMHA) Time elapsed: 5.883 sec FAILURE! 
java.lang.AssertionError: Incorrect value for metric availableMB expected:2048 but was:4096 at org.junit.Assert.fail(Assert.java:93) at org.junit.Assert.failNotEquals(Assert.java:647) at org.junit.Assert.assertEquals(Assert.java:128) at org.junit.Assert.assertEquals(Assert.java:472) at org.apache.hadoop.yarn.server.resourcemanager.TestRMHA.assertMetric(TestRMHA.java:396) at org.apache.hadoop.yarn.server.resourcemanager.TestRMHA.verifyClusterMetrics(TestRMHA.java:387) at org.apache.hadoop.yarn.server.resourcemanager.TestRMHA.testStartAndTransitions(TestRMHA.java:160) Results : Failed tests: TestRMHA.testStartAndTransitions:160-verifyClusterMetrics:387-assertMetric:396 Incorrect value for metric availableMB expected:2048 but was:4096 {noformat} -- This message was sent by Atlassian JIRA (v6.2#6252)
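Editorial note: a common remedy for the kind of race described above is to poll until the asynchronous state is reached before asserting on values derived from it. The sketch below shows such a helper under stated assumptions: WaitUtil, waitFor and the attemptScheduled flag are hypothetical names, the background thread merely imitates the asynchronous scheduling event, and this is not the change made in the YARN-1854 patch.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.function.BooleanSupplier;

// Hypothetical polling helper for tests.
class WaitUtil {
    /**
     * Polls the condition every pollMs milliseconds until it holds or until
     * timeoutMs milliseconds have elapsed. Returns the final outcome.
     */
    static boolean waitFor(BooleanSupplier condition, long timeoutMs, long pollMs) {
        long deadline = System.nanoTime() + timeoutMs * 1_000_000L;
        while (!condition.getAsBoolean()) {
            if (System.nanoTime() - deadline >= 0) {
                return false; // timed out; the caller should fail the test
            }
            try {
                Thread.sleep(pollMs);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve interrupt status
                return false;
            }
        }
        return true;
    }
}

class WaitUtilDemo {
    public static void main(String[] args) {
        // Stands in for "the app attempt has reached the Scheduled state".
        AtomicBoolean attemptScheduled = new AtomicBoolean(false);
        Thread scheduler = new Thread(() -> {
            try {
                Thread.sleep(50); // simulated delay of the asynchronous event
            } catch (InterruptedException ignored) {
                // fall through and set the flag anyway
            }
            attemptScheduled.set(true);
        });
        scheduler.start();

        boolean ready = WaitUtil.waitFor(attemptScheduled::get, 5000, 10);
        System.out.println("attempt scheduled before timeout: " + ready);
        // Only after this point would the metrics be verified.
    }
}
```

A bounded wait keeps the test deterministic when the event arrives and makes it fail with a clear cause, rather than on a stale metric value, when the event never arrives.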
[jira] [Created] (YARN-1876) Document the REST APIs of timeline and generic history services
Zhijie Shen created YARN-1876: - Summary: Document the REST APIs of timeline and generic history services Key: YARN-1876 URL: https://issues.apache.org/jira/browse/YARN-1876 Project: Hadoop YARN Issue Type: Bug Reporter: Zhijie Shen Assignee: Zhijie Shen -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-1876) Document the REST APIs of timeline and generic history services
[ https://issues.apache.org/jira/browse/YARN-1876?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhijie Shen updated YARN-1876: -- Labels: documentaion (was: ) Document the REST APIs of timeline and generic history services --- Key: YARN-1876 URL: https://issues.apache.org/jira/browse/YARN-1876 Project: Hadoop YARN Issue Type: Bug Reporter: Zhijie Shen Assignee: Zhijie Shen Labels: documentaion -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1854) Race condition in TestRMHA#testStartAndTransitions
[ https://issues.apache.org/jira/browse/YARN-1854?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13947510#comment-13947510 ] Rohith commented on YARN-1854: -- bq. Rohith : The logs that I have submitted already has the 5secs timeout change. I had serious doubts about this change, because the test failure does not match the new code change. In any case, we need to find the real cause of the test failure. The issue hinges on the answer to this question: why is the NODE_ADDED event triggered before the NODE_REMOVED event in RMNodeImpl#ReconnectNodeTransition? Race condition in TestRMHA#testStartAndTransitions -- Key: YARN-1854 URL: https://issues.apache.org/jira/browse/YARN-1854 Project: Hadoop YARN Issue Type: Test Affects Versions: 2.4.0 Reporter: Mit Desai Assignee: Rohith Priority: Blocker Fix For: 2.4.0 Attachments: Log.rtf, YARN-1854.1.patch, YARN-1854.patch There is a race in the test. TestRMHA#testStartAndTransitions calls verifyClusterMetrics() immediately after the application is submitted, but QueueMetrics are updated only after the app attempt is scheduled. Calling verifyClusterMetrics() without first verifying that the app attempt is in the Scheduled state causes random test failures. MockRM.submitApp() returns when the application is in the ACCEPTED state, but QueueMetrics are updated at the APP_ATTEMPT_ADDED event. There is a high chance of reading the queue metrics before the app attempt is Scheduled. {noformat} testStartAndTransitions(org.apache.hadoop.yarn.server.resourcemanager.TestRMHA) Time elapsed: 5.883 sec FAILURE! 
java.lang.AssertionError: Incorrect value for metric availableMB expected:2048 but was:4096 at org.junit.Assert.fail(Assert.java:93) at org.junit.Assert.failNotEquals(Assert.java:647) at org.junit.Assert.assertEquals(Assert.java:128) at org.junit.Assert.assertEquals(Assert.java:472) at org.apache.hadoop.yarn.server.resourcemanager.TestRMHA.assertMetric(TestRMHA.java:396) at org.apache.hadoop.yarn.server.resourcemanager.TestRMHA.verifyClusterMetrics(TestRMHA.java:387) at org.apache.hadoop.yarn.server.resourcemanager.TestRMHA.testStartAndTransitions(TestRMHA.java:160) Results : Failed tests: TestRMHA.testStartAndTransitions:160-verifyClusterMetrics:387-assertMetric:396 Incorrect value for metric availableMB expected:2048 but was:4096 {noformat} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1521) Mark appropriate protocol methods with the idempotent annotation or AtMostOnce annotation
[ https://issues.apache.org/jira/browse/YARN-1521?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13947511#comment-13947511 ] Jian He commented on YARN-1521: --- LGTM, +1. Regarding ApplicationMasterProtocol: since applications today are killed after the RM restarts anyway, there is no point in adding the annotations now. We can add those in YARN-556. Mark appropriate protocol methods with the idempotent annotation or AtMostOnce annotation - Key: YARN-1521 URL: https://issues.apache.org/jira/browse/YARN-1521 Project: Hadoop YARN Issue Type: Sub-task Reporter: Xuan Gong Assignee: Xuan Gong Priority: Blocker Attachments: YARN-1521.0.patch, YARN-1521.1.patch, YARN-1521.2.patch, YARN-1521.3.patch After YARN-1028, automatic failover was added to RMProxy. This JIRA is to identify whether we need to add the idempotent annotation and which methods can be marked as idempotent. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (YARN-1877) ZK store: Add yarn.resourcemanager.zk-state-store.root-node.auth for root node auth
Karthik Kambatla created YARN-1877: -- Summary: ZK store: Add yarn.resourcemanager.zk-state-store.root-node.auth for root node auth Key: YARN-1877 URL: https://issues.apache.org/jira/browse/YARN-1877 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Affects Versions: 2.3.0 Reporter: Karthik Kambatla Assignee: Karthik Kambatla Priority: Critical -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1521) Mark appropriate protocol methods with the idempotent annotation or AtMostOnce annotation
[ https://issues.apache.org/jira/browse/YARN-1521?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13947512#comment-13947512 ] Jian He commented on YARN-1521: --- Checking this in. Mark appropriate protocol methods with the idempotent annotation or AtMostOnce annotation - Key: YARN-1521 URL: https://issues.apache.org/jira/browse/YARN-1521 Project: Hadoop YARN Issue Type: Sub-task Reporter: Xuan Gong Assignee: Xuan Gong Priority: Blocker Attachments: YARN-1521.0.patch, YARN-1521.1.patch, YARN-1521.2.patch, YARN-1521.3.patch After YARN-1028, automatic failover was added to RMProxy. This JIRA is to identify whether we need to add the idempotent annotation and which methods can be marked as idempotent. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (YARN-1878) Yarn standby RM taking long to transition to active
Arpit Gupta created YARN-1878: - Summary: Yarn standby RM taking long to transition to active Key: YARN-1878 URL: https://issues.apache.org/jira/browse/YARN-1878 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.4.0 Reporter: Arpit Gupta Assignee: Xuan Gong In our HA tests we are noticing that it can sometimes take up to 10s for the standby RM to transition to active. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1878) Yarn standby RM taking long to transition to active
[ https://issues.apache.org/jira/browse/YARN-1878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13947513#comment-13947513 ] Arpit Gupta commented on YARN-1878: --- In comparison, the HDFS NameNode can usually transition much faster than 10s. Yarn standby RM taking long to transition to active --- Key: YARN-1878 URL: https://issues.apache.org/jira/browse/YARN-1878 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.4.0 Reporter: Arpit Gupta Assignee: Xuan Gong In our HA tests we are noticing that it can sometimes take up to 10s for the standby RM to transition to active. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1870) FileInputStream is not closed in ProcfsBasedProcessTree#constructProcessSMAPInfo()
[ https://issues.apache.org/jira/browse/YARN-1870?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13947515#comment-13947515 ] Fengdong Yu commented on YARN-1870: --- [~vinodkv], can you add me as a YARN contributor? I am already an HDFS contributor, so I don't think we also need to send mail to the secretary. FileInputStream is not closed in ProcfsBasedProcessTree#constructProcessSMAPInfo() -- Key: YARN-1870 URL: https://issues.apache.org/jira/browse/YARN-1870 Project: Hadoop YARN Issue Type: Bug Reporter: Ted Yu Priority: Minor Attachments: YARN-1870.patch {code} List<String> lines = IOUtils.readLines(new FileInputStream(file)); {code} FileInputStream is not closed. -- This message was sent by Atlassian JIRA (v6.2#6252)
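One way to guarantee that the stream is closed is try-with-resources (Java 7 and later). The sketch below uses only JDK classes rather than the IOUtils call quoted above, and it is not necessarily what YARN-1870.patch does. SmapsLineReader and readLines are hypothetical names, and UTF-8 is an assumed encoding.

```java
import java.io.BufferedReader;
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper; not part of ProcfsBasedProcessTree. */
final class SmapsLineReader {
  private SmapsLineReader() {}

  /** Reads all lines of the file and closes the stream on every code path. */
  static List<String> readLines(File file) throws IOException {
    // Both resources are closed automatically, in reverse order of declaration,
    // whether the body completes normally or throws.
    try (FileInputStream in = new FileInputStream(file);
         BufferedReader reader = new BufferedReader(
             new InputStreamReader(in, StandardCharsets.UTF_8))) {
      List<String> lines = new ArrayList<>();
      String line;
      while ((line = reader.readLine()) != null) {
        lines.add(line);
      }
      return lines;
    }
  }

  public static void main(String[] args) throws IOException {
    // Write a small temporary file so that the example is self-contained.
    File tmp = File.createTempFile("smaps-demo", ".txt");
    tmp.deleteOnExit();
    Files.write(tmp.toPath(), Arrays.asList("Size: 4 kB", "Rss: 4 kB"));
    System.out.println(readLines(tmp));
  }
}
```

On a code base that must still compile with Java 6, the same effect needs an explicit finally block that closes the stream.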
[jira] [Updated] (YARN-1876) Document the REST APIs of timeline and generic history services
[ https://issues.apache.org/jira/browse/YARN-1876?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinod Kumar Vavilapalli updated YARN-1876: -- Issue Type: Improvement (was: Bug) Document the REST APIs of timeline and generic history services --- Key: YARN-1876 URL: https://issues.apache.org/jira/browse/YARN-1876 Project: Hadoop YARN Issue Type: Improvement Reporter: Zhijie Shen Assignee: Zhijie Shen Labels: documentaion -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1521) Mark appropriate protocol methods with the idempotent annotation or AtMostOnce annotation
[ https://issues.apache.org/jira/browse/YARN-1521?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13947525#comment-13947525 ] Hudson commented on YARN-1521: -- SUCCESS: Integrated in Hadoop-trunk-Commit #5404 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/5404/]) YARN-1521. Mark Idempotent/AtMostOnce annotations to the APIs in ApplicationClientProtcol, ResourceManagerAdministrationProtocol and ResourceTrackerProtocol so that they work in HA scenario. Contributed by Xuan Gong (jianhe: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1581678) * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/ApplicationClientProtocol.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/server/api/ResourceManagerAdministrationProtocol.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/ProtocolHATestBase.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/TestApplicationClientProtocolOnHA.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/TestResourceTrackerOnHA.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/api/ResourceTracker.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ClientRMService.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/RMAppImpl.java * 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestClientRMService.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestSubmitApplicationWithRMHA.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-tests/src/test/java/org/apache/hadoop/yarn/server/MiniYARNCluster.java Mark appropriate protocol methods with the idempotent annotation or AtMostOnce annotation - Key: YARN-1521 URL: https://issues.apache.org/jira/browse/YARN-1521 Project: Hadoop YARN Issue Type: Sub-task Reporter: Xuan Gong Assignee: Xuan Gong Priority: Blocker Fix For: 2.4.0 Attachments: YARN-1521.0.patch, YARN-1521.1.patch, YARN-1521.2.patch, YARN-1521.3.patch After YARN-1028, automatic failover was added to RMProxy. This JIRA is to identify whether we need to add the idempotent annotation and which methods can be marked as idempotent. -- This message was sent by Atlassian JIRA (v6.2#6252)
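To illustrate why such annotations matter for automatic failover, here is a rough sketch of how a retrying proxy could use them to decide whether a call may simply be re-sent to the new active RM. Everything in it is a local stand-in: the two annotation types, DemoProtocol, and FailoverRetryCheck are hypothetical and are not the Hadoop classes or the RMProxy logic. The general idea is that an idempotent call has the same effect however often it is repeated, while an at-most-once call must not take effect twice, so re-sending it needs extra support such as server-side de-duplication.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Method;

/** Hypothetical retry check; not the actual RMProxy or retry-handler logic. */
final class FailoverRetryCheck {
  private FailoverRetryCheck() {}

  /** A call may be blindly re-sent after failover only if it is marked idempotent. */
  static boolean canRetryBlindly(Method method) {
    return method.isAnnotationPresent(Idempotent.class);
  }

  public static void main(String[] args) {
    for (Method m : DemoProtocol.class.getMethods()) {
      System.out.println(m.getName() + " -> canRetryBlindly=" + canRetryBlindly(m));
    }
  }
}

/** Local stand-in for an "idempotent" marker; hypothetical, not the Hadoop class. */
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.METHOD)
@interface Idempotent {}

/** Local stand-in for an "at most once" marker; hypothetical, not the Hadoop class. */
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.METHOD)
@interface AtMostOnce {}

/** Hypothetical protocol used only for this illustration. */
interface DemoProtocol {
  @Idempotent
  String getReport(String id); // a read: repeating it changes nothing

  @AtMostOnce
  void incrementCounter(String id); // repeating it would double-count
}
```

The annotations need runtime retention, otherwise the reflective check would never see them.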
[jira] [Commented] (YARN-1878) Yarn standby RM taking long to transition to active
[ https://issues.apache.org/jira/browse/YARN-1878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13947528#comment-13947528 ] Xuan Gong commented on YARN-1878: - ActiveStandbyElector#process waits for zkSessionTimeout (the default value is 10s) before processing the event. I think this is why it takes so long for the standby RM to transition to active. To address this issue, we can simply change the default value to 5s, which is what HDFS uses. We can also adjust this config by setting yarn.resourcemanager.zk-timeout-ms in yarn-site.xml. Yarn standby RM taking long to transition to active --- Key: YARN-1878 URL: https://issues.apache.org/jira/browse/YARN-1878 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.4.0 Reporter: Arpit Gupta Assignee: Xuan Gong In our HA tests we are noticing that it can sometimes take up to 10s for the standby RM to transition to active. -- This message was sent by Atlassian JIRA (v6.2#6252)
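For reference, overriding the timeout in yarn-site.xml would look roughly like the fragment below. The property name is the one given in the comment above, and 5000 ms is only the value proposed there, not a tested recommendation.

```xml
<!-- yarn-site.xml fragment: ZooKeeper session timeout used by the RM, in milliseconds -->
<property>
  <name>yarn.resourcemanager.zk-timeout-ms</name>
  <value>5000</value>
</property>
```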
[jira] [Updated] (YARN-1878) Yarn standby RM taking long to transition to active
[ https://issues.apache.org/jira/browse/YARN-1878?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuan Gong updated YARN-1878: Attachment: YARN-1878.1.patch Changed the default value from 10s to 5s. This is a trivial patch, so no test case was added. Yarn standby RM taking long to transition to active --- Key: YARN-1878 URL: https://issues.apache.org/jira/browse/YARN-1878 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.4.0 Reporter: Arpit Gupta Assignee: Xuan Gong Attachments: YARN-1878.1.patch In our HA tests we are noticing that it can sometimes take up to 10s for the standby RM to transition to active. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1878) Yarn standby RM taking long to transition to active
[ https://issues.apache.org/jira/browse/YARN-1878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13947543#comment-13947543 ] Hadoop QA commented on YARN-1878: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12636856/YARN-1878.1.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/3461//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/3461//console This message is automatically generated. Yarn standby RM taking long to transition to active --- Key: YARN-1878 URL: https://issues.apache.org/jira/browse/YARN-1878 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.4.0 Reporter: Arpit Gupta Assignee: Xuan Gong Attachments: YARN-1878.1.patch In our HA tests we are noticing that it can sometimes take up to 10s for the standby RM to transition to active. 
[jira] [Commented] (YARN-1878) Yarn standby RM taking long to transition to active
[ https://issues.apache.org/jira/browse/YARN-1878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13947550#comment-13947550 ] Jian He commented on YARN-1878: --- Xuan, did you verify in a real cluster that this transition does get faster after making this change? Yarn standby RM taking long to transition to active --- Key: YARN-1878 URL: https://issues.apache.org/jira/browse/YARN-1878 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.4.0 Reporter: Arpit Gupta Assignee: Xuan Gong Attachments: YARN-1878.1.patch In our HA tests we are noticing that it can sometimes take up to 10s for the standby RM to transition to active. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1878) Yarn standby RM taking long to transition to active
[ https://issues.apache.org/jira/browse/YARN-1878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13947558#comment-13947558 ] Tsuyoshi OZAWA commented on YARN-1878: -- Xuan, I'd like to clarify one point: do you intend to speed up ProtocolHATestBase? Yarn standby RM taking long to transition to active --- Key: YARN-1878 URL: https://issues.apache.org/jira/browse/YARN-1878 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.4.0 Reporter: Arpit Gupta Assignee: Xuan Gong Attachments: YARN-1878.1.patch In our HA tests we are noticing that it can sometimes take up to 10s for the standby RM to transition to active. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-1876) Document the REST APIs of timeline and generic history services
[ https://issues.apache.org/jira/browse/YARN-1876?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhijie Shen updated YARN-1876: -- Attachment: YARN-1876.1.patch Uploaded a half-done patch, which documents the generic history service APIs. Document the REST APIs of timeline and generic history services --- Key: YARN-1876 URL: https://issues.apache.org/jira/browse/YARN-1876 Project: Hadoop YARN Issue Type: Improvement Reporter: Zhijie Shen Assignee: Zhijie Shen Labels: documentaion Attachments: YARN-1876.1.patch -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1878) Yarn standby RM taking long to transition to active
[ https://issues.apache.org/jira/browse/YARN-1878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13947568#comment-13947568 ] Xuan Gong commented on YARN-1878: - bq. Xuan, did you verify in real cluster that this transition does get faster after making this change? Yes, verified. Yarn standby RM taking long to transition to active --- Key: YARN-1878 URL: https://issues.apache.org/jira/browse/YARN-1878 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.4.0 Reporter: Arpit Gupta Assignee: Xuan Gong Attachments: YARN-1878.1.patch In our HA tests we are noticing that it can sometimes take up to 10s for the standby RM to transition to active. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (YARN-1879) Mark Idempotent/AtMostOnce annotations to ApplicationMasterProtocol and add tests for it
Jian He created YARN-1879: - Summary: Mark Idempotent/AtMostOnce annotations to ApplicationMasterProtocol and add tests for it Key: YARN-1879 URL: https://issues.apache.org/jira/browse/YARN-1879 Project: Hadoop YARN Issue Type: Sub-task Reporter: Jian He -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1878) Yarn standby RM taking long to transition to active
[ https://issues.apache.org/jira/browse/YARN-1878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13947569#comment-13947569 ] Xuan Gong commented on YARN-1878: - bq. Xuan, I'd like to clarify one point: do you intend to speedup ProtocolHATestBase? Sorry, I don't follow. This is not about the test case; it happens in a real cluster. Yarn standby RM taking long to transition to active --- Key: YARN-1878 URL: https://issues.apache.org/jira/browse/YARN-1878 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.4.0 Reporter: Arpit Gupta Assignee: Xuan Gong Attachments: YARN-1878.1.patch In our HA tests we are noticing that it can sometimes take up to 10s for the standby RM to transition to active. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1878) Yarn standby RM taking long to transition to active
[ https://issues.apache.org/jira/browse/YARN-1878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13947575#comment-13947575 ] Fengdong Yu commented on YARN-1878: --- +1 for the patch. HDFS failover is also 5s by default. Yarn standby RM taking long to transition to active --- Key: YARN-1878 URL: https://issues.apache.org/jira/browse/YARN-1878 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.4.0 Reporter: Arpit Gupta Assignee: Xuan Gong Attachments: YARN-1878.1.patch In our HA tests we are noticing that it can sometimes take up to 10s for the standby RM to transition to active. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-1809) Synchronize RM and Generic History Service Web-UIs
[ https://issues.apache.org/jira/browse/YARN-1809?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhijie Shen updated YARN-1809: -- Attachment: YARN-1809.7.patch Rebased the patch after YARN-1521. Also, following Mayank's comments, moved the DT-related methods to ApplicationBaseProtocol as well. Since the history service hasn't implemented the DT-related methods, it may be safe to mark them as @Idempotent as well (repeating them has no effect). Synchronize RM and Generic History Service Web-UIs -- Key: YARN-1809 URL: https://issues.apache.org/jira/browse/YARN-1809 Project: Hadoop YARN Issue Type: Bug Reporter: Zhijie Shen Assignee: Zhijie Shen Attachments: YARN-1809.1.patch, YARN-1809.2.patch, YARN-1809.3.patch, YARN-1809.4.patch, YARN-1809.5.patch, YARN-1809.5.patch, YARN-1809.6.patch, YARN-1809.7.patch After YARN-953, the web-UI of the generic history service provides more information than that of the RM, namely the details about app attempts and containers. It's good to provide similar web-UIs, but retrieve the data from separate sources, i.e., the RM cache and the history store respectively. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-1879) Mark Idempotent/AtMostOnce annotations to ApplicationMasterProtocol
[ https://issues.apache.org/jira/browse/YARN-1879?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jian He updated YARN-1879: -- Summary: Mark Idempotent/AtMostOnce annotations to ApplicationMasterProtocol (was: Mark Idempotent/AtMostOnce annotations to ApplicationMasterProtocol and add tests for it) Mark Idempotent/AtMostOnce annotations to ApplicationMasterProtocol --- Key: YARN-1879 URL: https://issues.apache.org/jira/browse/YARN-1879 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Jian He -- This message was sent by Atlassian JIRA (v6.2#6252)