date:20140325


[ 
https://issues.apache.org/jira/browse/YARN-1521?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13946246#comment-13946246
 ] 

Zhijie Shen commented on YARN-1521:
---

Browsed through the newest, and have the following comments:

1. If the application is already in the RM cache, we shouldn't log success. 
Otherwise, there may be multiple logs for one submission.
{code}
+  RMAuditLogger.logSuccess(user, AuditConstants.SUBMIT_APP_REQUEST,
+      "ClientRMService", applicationId);
{code}
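
A minimal Java sketch of what comment 1 asks for, assuming the RM cache can be modelled as a concurrent map keyed by application ID. `SubmitOnceAudit`, `appCache`, and `auditLog` are hypothetical names for illustration and are not taken from the patch; the point is only that the audit entry is written when the ID is first cached, not on a retried submission.

```java
import java.util.List;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.CopyOnWriteArrayList;

// Hypothetical stand-in for the RM application cache plus an audit log.
final class SubmitOnceAudit {
    private final ConcurrentMap<String, String> appCache = new ConcurrentHashMap<>();
    private final List<String> auditLog = new CopyOnWriteArrayList<>();

    /** Returns true only for the first submission of applicationId. */
    boolean submit(String user, String applicationId) {
        // putIfAbsent returns null only when the ID was not cached before.
        boolean firstSubmission = appCache.putIfAbsent(applicationId, user) == null;
        if (firstSubmission) {
            // Log success once per application, not once per (retried) request.
            auditLog.add(user + " SUBMIT_APP_REQUEST " + applicationId);
        }
        return firstSubmission;
    }

    int auditEntries() {
        return auditLog.size();
    }

    public static void main(String[] args) {
        SubmitOnceAudit rm = new SubmitOnceAudit();
        rm.submit("alice", "application_1_0001");
        rm.submit("alice", "application_1_0001"); // e.g. resent after a failover
        System.out.println(rm.auditEntries()); // prints 1
    }
}
```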

2. Should it sleep for a while before retrying?
{code}
+  try {
+    client.getApplications();
+    return;
+  } catch (Exception e) {
+    LOG.error(e.getMessage());
+  } finally {
+    client.stop();
+  }
{code}

3. Again, sleep briefly (e.g. 1ms) before the next try, to yield to the thread 
that is invoking the API methods. Also, should there be a maximum number of 
retries when an exception is thrown?
{code}
while (keepRunning) {
+  if (cluster.getStartFailoverFlag()) {
+    try {
+      explicitFailover();
+      keepRunning = false;
+    } catch (Exception e) {
+      // Do Nothing
+    } finally {
+      keepRunning = false;
+    }
+  }
+}
{code}
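
Comments 2 and 3 can be illustrated with a small Java sketch of a retry loop that sleeps between attempts and gives up after a maximum number of tries. This is not the patch's code: `BoundedRetry`, `runWithRetry`, `maxAttempts`, and `sleepMillis` are hypothetical names, and failures are assumed to surface as unchecked exceptions.

```java
// Hypothetical helper: retry an action a bounded number of times,
// sleeping between failed attempts so other threads can make progress.
final class BoundedRetry {
    /** Returns the number of attempts used; rethrows the last failure if all attempts fail. */
    static int runWithRetry(Runnable action, int maxAttempts, long sleepMillis) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        RuntimeException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                action.run();
                return attempt;
            } catch (RuntimeException e) {
                last = e; // remember the failure and fall through to the sleep
            }
            if (attempt < maxAttempts) {
                try {
                    Thread.sleep(sleepMillis);
                } catch (InterruptedException ie) {
                    // Restore the interrupt flag and stop retrying.
                    Thread.currentThread().interrupt();
                    throw new IllegalStateException("interrupted while waiting to retry", ie);
                }
            }
        }
        throw last;
    }

    public static void main(String[] args) {
        int[] calls = {0};
        int used = runWithRetry(() -> {
            if (++calls[0] < 3) {
                throw new IllegalStateException("not ready yet");
            }
        }, 5, 1L);
        System.out.println("succeeded after " + used + " attempts"); // prints "succeeded after 3 attempts"
    }
}
```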

 Mark appropriate protocol methods with the idempotent annotation or 
 AtMostOnce annotation
 -

 Key: YARN-1521
 URL: https://issues.apache.org/jira/browse/YARN-1521
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Xuan Gong
Assignee: Xuan Gong
 Attachments: YARN-1521.0.patch, YARN-1521.1.patch


 After YARN-1028, we added automatic failover to RMProxy. This JIRA is to 
 identify whether we need to add the idempotent annotation and which methods 
 can be marked as idempotent.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

[jira] [Commented] (YARN-1838) Timeline service getEntities API should provide ability to get entities from given id


[ 
https://issues.apache.org/jira/browse/YARN-1838?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13946397#comment-13946397
 ] 

Hudson commented on YARN-1838:
--

FAILURE: Integrated in Hadoop-Yarn-trunk #520 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk/520/])
YARN-1838. Enhanced timeline service getEntities API to get entities from a 
given entity ID or insertion timestamp. Contributed by Billie Rinaldi. (zjshen: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1580960)
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell/src/test/java/org/apache/hadoop/yarn/applications/distributedshell/TestDistributedShell.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/applicationhistoryservice/timeline/GenericObjectMapper.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/applicationhistoryservice/timeline/LeveldbTimelineStore.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/applicationhistoryservice/timeline/MemoryTimelineStore.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/applicationhistoryservice/timeline/TimelineReader.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/applicationhistoryservice/webapp/TimelineWebServices.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/test/java/org/apache/hadoop/yarn/server/applicationhistoryservice/timeline/TestLeveldbTimelineStore.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/test/java/org/apache/hadoop/yarn/server/applicationhistoryservice/timeline/TestMemoryTimelineStore.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/test/java/org/apache/hadoop/yarn/server/applicationhistoryservice/timeline/TimelineStoreTestUtils.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/test/java/org/apache/hadoop/yarn/server/applicationhistoryservice/webapp/TestTimelineWebServices.java


 Timeline service getEntities API should provide ability to get entities from 
 given id
 -

 Key: YARN-1838
 URL: https://issues.apache.org/jira/browse/YARN-1838
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Srimanth Gunturi
Assignee: Billie Rinaldi
 Fix For: 2.4.0

 Attachments: YARN-1838.1.patch, YARN-1838.2.patch, YARN-1838.3.patch, 
 YARN-1838.4.patch, YARN-1838.5.patch


 To support pagination, we need the ability to get entities starting from a 
 certain ID by providing a new param called {{fromid}}.
 For example, on a page of 10 jobs, our first call will be like
 [http://server:8188/ws/v1/timeline/HIVE_QUERY_ID?fields=events,primaryfilters,otherinfo&limit=11]
 When the user hits next, we would like to call
 [http://server:8188/ws/v1/timeline/HIVE_QUERY_ID?fields=events,primaryfilters,otherinfo&fromid=JID11&limit=11]
 and continue on for further _Next_ clicks.
 On hitting back, we will make similar calls for previous items
 [http://server:8188/ws/v1/timeline/HIVE_QUERY_ID?fields=events,primaryfilters,otherinfo&fromid=JID1&limit=11]
 {{fromid}} should be inclusive of the id given.
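
 A rough Java sketch of how a client might build these calls. It assumes that 
 the extra entity (limit = page size + 1) is fetched only to learn the 
 {{fromid}} of the next page, which the description implies but does not state. 
 TimelinePaging, pageUrl, and nextFromId are hypothetical names, and IDs are 
 assumed to be URL-safe.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical client-side helper for fromid-based pagination.
final class TimelinePaging {
    /** Builds the request URL for one page; fromId == null means the first page. */
    static String pageUrl(String base, String entityType, String fromId, int pageSize) {
        StringBuilder sb = new StringBuilder(base)
                .append('/').append(entityType)
                .append("?fields=events,primaryfilters,otherinfo");
        if (fromId != null) {
            sb.append("&fromid=").append(fromId); // assumed URL-safe, so not encoded here
        }
        // Ask for one more entity than the page shows.
        sb.append("&limit=").append(pageSize + 1);
        return sb.toString();
    }

    /** The extra entity, if present, is the inclusive start of the next page. */
    static String nextFromId(List<String> fetchedIds, int pageSize) {
        return fetchedIds.size() > pageSize ? fetchedIds.get(pageSize) : null;
    }

    public static void main(String[] args) {
        String base = "http://server:8188/ws/v1/timeline";
        System.out.println(pageUrl(base, "HIVE_QUERY_ID", null, 10));
        System.out.println(pageUrl(base, "HIVE_QUERY_ID", "JID11", 10));
        System.out.println(nextFromId(Arrays.asList("JID1", "JID2", "JID3"), 2)); // prints JID3
    }
}
```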




[jira] [Commented] (YARN-1670) aggregated log writer can write more log data then it says is the log length


[ 
https://issues.apache.org/jira/browse/YARN-1670?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13946404#comment-13946404
 ] 

Hudson commented on YARN-1670:
--

FAILURE: Integrated in Hadoop-Yarn-trunk #520 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk/520/])
YARN-1670. aggregated log writer can write more log data then it says is the 
log length (Mit Desai via jeagles) (jeagles: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1580957)
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/logaggregation/AggregatedLogFormat.java


 aggregated log writer can write more log data then it says is the log length
 

 Key: YARN-1670
 URL: https://issues.apache.org/jira/browse/YARN-1670
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 3.0.0, 0.23.10, 2.2.0
Reporter: Thomas Graves
Assignee: Mit Desai
Priority: Critical
 Fix For: 3.0.0, 0.23.11, 2.4.0, 2.5.0

 Attachments: YARN-1670-b23.patch, YARN-1670-v2-b23.patch, 
 YARN-1670-v2.patch, YARN-1670-v3-b23.patch, YARN-1670-v3.patch, 
 YARN-1670-v4-b23.patch, YARN-1670-v4-b23.patch, YARN-1670-v4.patch, 
 YARN-1670-v4.patch, YARN-1670.patch, YARN-1670.patch


 We have seen exceptions when using 'yarn logs' to read log files. 
   at java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
   at java.lang.Long.parseLong(Long.java:441)
   at java.lang.Long.parseLong(Long.java:483)
   at org.apache.hadoop.yarn.logaggregation.AggregatedLogFormat$LogReader.readAContainerLogsForALogType(AggregatedLogFormat.java:518)
   at org.apache.hadoop.yarn.logaggregation.LogDumper.dumpAContainerLogs(LogDumper.java:178)
   at org.apache.hadoop.yarn.logaggregation.LogDumper.run(LogDumper.java:130)
   at org.apache.hadoop.yarn.logaggregation.LogDumper.main(LogDumper.java:246)
 We traced it down to the reader trying to read the file type of the next file 
 while the position it reads from still holds log data from the previous file. 
 What happened was that the log length was written as a certain size but the 
 log data was actually longer than that.
 Inside the write() routine in LogValue, it first writes the logfile length, 
 but when it then writes the log itself it just goes to the end of the file. 
 There is a race condition here: if someone is still writing to the file when 
 it is aggregated, the length written could be too small.
 We should have the write() routine stop once it has written whatever it said 
 was the length.  It would be nice if we could somehow tell the user it might 
 be truncated, but I'm not sure of a good way to do this.
 We also noticed a bug in readAContainerLogsForALogType, where it is using 
 an int for curRead whereas it should be using a long. 
   while (len != -1 && curRead < fileLength) {
 This isn't actually a problem right now, as it looks like the underlying 
 decoder is doing the right thing and the len condition exits.
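
 The proposed behaviour (stop writing once the declared length is reached) 
 could look roughly like the Java sketch below. It is an illustration under 
 assumptions, not the committed fix: BoundedCopy and copyAtMost are hypothetical 
 names, and plain java.io streams stand in for the aggregated log writer. Note 
 that the running count is a long, matching the curRead remark above.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

// Hypothetical helper: copy at most declaredLength bytes, even if the
// source keeps growing while it is being read.
final class BoundedCopy {
    /** Returns the number of bytes actually copied (may be less if the source is shorter). */
    static long copyAtMost(InputStream in, OutputStream out, long declaredLength)
            throws IOException {
        byte[] buf = new byte[4096];
        long curRead = 0; // long, so lengths above Integer.MAX_VALUE are handled
        while (curRead < declaredLength) {
            // Never request more than the bytes still owed to the declared length.
            int toRead = (int) Math.min(buf.length, declaredLength - curRead);
            int len = in.read(buf, 0, toRead);
            if (len == -1) {
                break; // source ended early
            }
            out.write(buf, 0, len);
            curRead += len;
        }
        return curRead;
    }

    public static void main(String[] args) throws IOException {
        byte[] log = "hello world".getBytes("UTF-8");
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        long copied = copyAtMost(new ByteArrayInputStream(log), out, 5);
        System.out.println(copied + " " + out.toString("UTF-8")); // prints "5 hello"
    }
}
```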




[jira] [Commented] (YARN-1852) Application recovery throws InvalidStateTransitonException for FAILED and KILLED jobs


[ 
https://issues.apache.org/jira/browse/YARN-1852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13946414#comment-13946414
 ] 

Hudson commented on YARN-1852:
--

FAILURE: Integrated in Hadoop-Yarn-trunk #520 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk/520/])
YARN-1852. Fixed RMAppAttempt to not resend AttemptFailed/AttemptKilled events 
to already recovered Failed/Killed RMApps. Contributed by Rohith Sharmaks 
(jianhe: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1580997)
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/RMAppImpl.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/TestRMAppTransitions.java


 Application recovery throws InvalidStateTransitonException for FAILED and 
 KILLED jobs
 -

 Key: YARN-1852
 URL: https://issues.apache.org/jira/browse/YARN-1852
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.3.0, 2.4.0
Reporter: Rohith
Assignee: Rohith
 Fix For: 2.4.0

 Attachments: YARN-1852.2.patch, YARN-1852.3.patch, YARN-1852.patch


 Recovering a failed/killed application throws InvalidStateTransitonException.
 These exceptions are logged during recovery of applications.




[jira] [Commented] (YARN-1850) Make enabling timeline service configurable


[ 
https://issues.apache.org/jira/browse/YARN-1850?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13946412#comment-13946412
 ] 

Hudson commented on YARN-1850:
--

FAILURE: Integrated in Hadoop-Yarn-trunk #520 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk/520/])
YARN-1850. Introduced the ability to optionally disable sending out 
timeline-events in the TimelineClient. Contributed by Zhijie Shen. (vinodkv: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1581189)
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/conf/YarnConfiguration.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/main/java/org/apache/hadoop/yarn/client/api/impl/TimelineClientImpl.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/api/impl/TestTimelineClient.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/resources/yarn-default.xml


 Make enabling timeline service configurable 
 

 Key: YARN-1850
 URL: https://issues.apache.org/jira/browse/YARN-1850
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Zhijie Shen
Assignee: Zhijie Shen
 Fix For: 2.4.0

 Attachments: YARN-1850.1.patch


 As with the generic history service, we should make enabling the timeline 
 service configurable, in case the timeline server is not up.




[jira] [Created] (YARN-1871) We should eliminate writing *PBImpl code in YARN

2014-03-25 Thread Wangda Tan (JIRA)

Wangda Tan created YARN-1871:


 Summary: We should eliminate writing *PBImpl code in YARN
 Key: YARN-1871
 URL: https://issues.apache.org/jira/browse/YARN-1871
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: api
Affects Versions: 2.4.0
Reporter: Wangda Tan
Assignee: Wangda Tan



Currently, we need to write PBImpl classes one by one. Running find . -name 
*PBImpl*.java | xargs wc -l under the hadoop source code directory shows that 
there are more than 25,000 LOC. I think we should improve this, which will be 
very helpful for YARN developers making changes to YARN protocols.


There are only a limited number of patterns in the current *PBImpl classes:
* Simple types, like string, int32, float.
* List<?> types
* Map<?> types
* Enum types
Code generation should be enough to generate such PBImpl classes.

Some other requirements are:
* Leave other related code alone, like service implementations (e.g. 
ContainerManagerImpl).
* (If possible) Forward compatibility: developers can write their own PBImpl 
or generate them.




[jira] [Commented] (YARN-1871) We should eliminate writing *PBImpl code in YARN

2014-03-25 Thread Wangda Tan (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-1871?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13946460#comment-13946460
 ] 

Wangda Tan commented on YARN-1871:
--

Some possible methods to eliminate writing PBImpl source code that I have in 
mind:
1. Using a Java annotation processor (RetentionPolicy=SOURCE); an example is 
the [google auto|https://github.com/google/auto] project. We can put an 
annotation on record classes, like
{code}
@GeneratePBImpl(protoclass = "org.apache.hadoop.yarn.proto.YarnProtos.ApplicationIdProto")
public abstract class ApplicationId {
   ...
}
{code}
Then we can implement a GeneratePBImpl annotation processor to generate PBImpl 
code when compiling.

2. Using a ProtocolBuffer parser to parse .proto files directly and generate 
PBImpl code.
With a PB parser, we can get the message descriptions, fields, and types from 
the .proto file and generate code from them. Unfortunately, PB doesn’t provide 
a Java-based parser, so we would need to write a C-based program using such 
parsers (see 
[issue-263|https://code.google.com/p/protobuf/issues/detail?id=263])

3. Similar to the @AtMostOnce annotation, make the ser-de a runtime behavior.
With this method, we don’t need to generate PBImpl source code or classes. We 
can create a RetentionPolicy=RUNTIME annotation processor and mark record 
classes, such as,

{code}
@RecordClass(protoclass = "org.apache.hadoop.yarn.proto.YarnProtos.ApplicationIdProto")
public abstract class ApplicationId {
   ...
}
{code} 
Similar to  annotation, when we need to serialize/deserialize this class, we 
will check at runtime whether it is a “record class” or not. If yes, we can 
simply use its getters/setters and the PB-generated class (*Proto) to do the 
serialization/deserialization.
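
A minimal Java sketch of the runtime check in option 3, assuming a 
RUNTIME-retained marker annotation that is looked up by reflection. The 
@RecordClass annotation and the ApplicationId class here are local stand-ins 
defined only for this example, protoClassOf is a hypothetical helper, and the 
actual getter/setter-driven conversion to the *Proto class is left out.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;

final class RecordClassDemo {
    // Hypothetical marker annotation; RUNTIME retention keeps it visible to reflection.
    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.TYPE)
    @interface RecordClass {
        String protoclass();
    }

    // Local stand-in for a record class, not Hadoop's ApplicationId.
    @RecordClass(protoclass = "org.apache.hadoop.yarn.proto.YarnProtos.ApplicationIdProto")
    abstract static class ApplicationId {
    }

    /** Returns the declared proto class name, or null if type is not a record class. */
    static String protoClassOf(Class<?> type) {
        RecordClass marker = type.getAnnotation(RecordClass.class);
        return marker == null ? null : marker.protoclass();
    }

    public static void main(String[] args) {
        // A serializer would branch on this result before touching getters/setters.
        System.out.println(protoClassOf(ApplicationId.class));
        System.out.println(protoClassOf(String.class)); // prints null
    }
}
```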

Any other thoughts on this? Hope to get your ideas.


[jira] [Commented] (YARN-1854) TestRMHA#testStartAndTransitions Fails

2014-03-25 Thread Rohith (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-1854?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13946487#comment-13946487
 ] 

Rohith commented on YARN-1854:
--

[~mitdesai], I checked the attached logs for a while. The behaviour in them is 
very strange :-(

Observe that there is no removeNode event while the RECONNECTED event is being 
processed!

Is the NODE_ADDED event arriving before NODE_REMOVED? Possibly. But can you run 
with the latest trunk code, since a retry for 5 sec has been added?


 TestRMHA#testStartAndTransitions Fails
 --

 Key: YARN-1854
 URL: https://issues.apache.org/jira/browse/YARN-1854
 Project: Hadoop YARN
  Issue Type: Test
Affects Versions: 2.4.0
Reporter: Mit Desai
Assignee: Rohith
Priority: Blocker
 Fix For: 2.4.0

 Attachments: Log.rtf, YARN-1854.1.patch, YARN-1854.patch


 {noformat}
 testStartAndTransitions(org.apache.hadoop.yarn.server.resourcemanager.TestRMHA)  Time elapsed: 5.883 sec  <<< FAILURE!
 java.lang.AssertionError: Incorrect value for metric availableMB expected:<2048> but was:<4096>
   at org.junit.Assert.fail(Assert.java:93)
   at org.junit.Assert.failNotEquals(Assert.java:647)
   at org.junit.Assert.assertEquals(Assert.java:128)
   at org.junit.Assert.assertEquals(Assert.java:472)
   at org.apache.hadoop.yarn.server.resourcemanager.TestRMHA.assertMetric(TestRMHA.java:396)
   at org.apache.hadoop.yarn.server.resourcemanager.TestRMHA.verifyClusterMetrics(TestRMHA.java:387)
   at org.apache.hadoop.yarn.server.resourcemanager.TestRMHA.testStartAndTransitions(TestRMHA.java:160)
 Results :
 Failed tests: 
   TestRMHA.testStartAndTransitions:160->verifyClusterMetrics:387->assertMetric:396 Incorrect value for metric availableMB expected:<2048> but was:<4096>
 {noformat}




[jira] [Commented] (YARN-1850) Make enabling timeline service configurable


[ 
https://issues.apache.org/jira/browse/YARN-1850?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13946544#comment-13946544
 ] 

Hudson commented on YARN-1850:
--

FAILURE: Integrated in Hadoop-Mapreduce-trunk #1737 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1737/])
YARN-1850. Introduced the ability to optionally disable sending out 
timeline-events in the TimelineClient. Contributed by Zhijie Shen. (vinodkv: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1581189)
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/conf/YarnConfiguration.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/main/java/org/apache/hadoop/yarn/client/api/impl/TimelineClientImpl.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/api/impl/TestTimelineClient.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/resources/yarn-default.xml



[jira] [Commented] (YARN-1852) Application recovery throws InvalidStateTransitonException for FAILED and KILLED jobs


[ 
https://issues.apache.org/jira/browse/YARN-1852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13946546#comment-13946546
 ] 

Hudson commented on YARN-1852:
--

FAILURE: Integrated in Hadoop-Mapreduce-trunk #1737 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1737/])
YARN-1852. Fixed RMAppAttempt to not resend AttemptFailed/AttemptKilled events 
to already recovered Failed/Killed RMApps. Contributed by Rohith Sharmaks 
(jianhe: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1580997)
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/RMAppImpl.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/TestRMAppTransitions.java



[jira] [Commented] (YARN-1670) aggregated log writer can write more log data then it says is the log length


[ 
https://issues.apache.org/jira/browse/YARN-1670?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13946536#comment-13946536
 ] 

Hudson commented on YARN-1670:
--

FAILURE: Integrated in Hadoop-Mapreduce-trunk #1737 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1737/])
YARN-1670. aggregated log writer can write more log data then it says is the 
log length (Mit Desai via jeagles) (jeagles: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1580957)
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/logaggregation/AggregatedLogFormat.java



[jira] [Commented] (YARN-1850) Make enabling timeline service configurable


[ 
https://issues.apache.org/jira/browse/YARN-1850?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13946572#comment-13946572
 ] 

Hudson commented on YARN-1850:
--

FAILURE: Integrated in Hadoop-Hdfs-trunk #1712 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/1712/])
YARN-1850. Introduced the ability to optionally disable sending out 
timeline-events in the TimelineClient. Contributed by Zhijie Shen. (vinodkv: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1581189)
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/conf/YarnConfiguration.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/main/java/org/apache/hadoop/yarn/client/api/impl/TimelineClientImpl.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/api/impl/TestTimelineClient.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/resources/yarn-default.xml



[jira] [Commented] (YARN-1852) Application recovery throws InvalidStateTransitonException for FAILED and KILLED jobs


[ 
https://issues.apache.org/jira/browse/YARN-1852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13946574#comment-13946574
 ] 

Hudson commented on YARN-1852:
--

FAILURE: Integrated in Hadoop-Hdfs-trunk #1712 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/1712/])
YARN-1852. Fixed RMAppAttempt to not resend AttemptFailed/AttemptKilled events 
to already recovered Failed/Killed RMApps. Contributed by Rohith Sharmaks 
(jianhe: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1580997)
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/RMAppImpl.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/TestRMAppTransitions.java



[jira] [Commented] (YARN-1670) aggregated log writer can write more log data then it says is the log length


[ 
https://issues.apache.org/jira/browse/YARN-1670?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13946564#comment-13946564
 ] 

Hudson commented on YARN-1670:
--

FAILURE: Integrated in Hadoop-Hdfs-trunk #1712 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/1712/])
YARN-1670. aggregated log writer can write more log data then it says is the 
log length (Mit Desai via jeagles) (jeagles: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1580957)
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/logaggregation/AggregatedLogFormat.java



[jira] [Commented] (YARN-1838) Timeline service getEntities API should provide ability to get entities from given id


[ 
https://issues.apache.org/jira/browse/YARN-1838?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13946529#comment-13946529
 ] 

Hudson commented on YARN-1838:
--

FAILURE: Integrated in Hadoop-Mapreduce-trunk #1737 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1737/])
YARN-1838. Enhanced timeline service getEntities API to get entities from a 
given entity ID or insertion timestamp. Contributed by Billie Rinaldi. (zjshen: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1580960)
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell/src/test/java/org/apache/hadoop/yarn/applications/distributedshell/TestDistributedShell.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/applicationhistoryservice/timeline/GenericObjectMapper.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/applicationhistoryservice/timeline/LeveldbTimelineStore.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/applicationhistoryservice/timeline/MemoryTimelineStore.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/applicationhistoryservice/timeline/TimelineReader.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/applicationhistoryservice/webapp/TimelineWebServices.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/test/java/org/apache/hadoop/yarn/server/applicationhistoryservice/timeline/TestLeveldbTimelineStore.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/test/java/org/apache/hadoop/yarn/server/applicationhistoryservice/timeline/TestMemoryTimelineStore.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/test/java/org/apache/hadoop/yarn/server/applicationhistoryservice/timeline/TimelineStoreTestUtils.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/test/java/org/apache/hadoop/yarn/server/applicationhistoryservice/webapp/TestTimelineWebServices.java


 Timeline service getEntities API should provide ability to get entities from 
 given id
 -

 Key: YARN-1838
 URL: https://issues.apache.org/jira/browse/YARN-1838
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Srimanth Gunturi
Assignee: Billie Rinaldi
 Fix For: 2.4.0

 Attachments: YARN-1838.1.patch, YARN-1838.2.patch, YARN-1838.3.patch, 
 YARN-1838.4.patch, YARN-1838.5.patch


 To support pagination, we need the ability to get entities starting from a 
 certain ID by providing a new param called {{fromid}}.
 For example, on a page of 10 jobs, our first call will be like
 [http://server:8188/ws/v1/timeline/HIVE_QUERY_ID?fields=events,primaryfilters,otherinfo&limit=11]
 When the user hits next, we would like to call
 [http://server:8188/ws/v1/timeline/HIVE_QUERY_ID?fields=events,primaryfilters,otherinfo&fromid=JID11&limit=11]
 and continue in the same way for further _Next_ clicks.
 On hitting back, we will make similar calls for the previous items
 [http://server:8188/ws/v1/timeline/HIVE_QUERY_ID?fields=events,primaryfilters,otherinfo&fromid=JID1&limit=11]
 {{fromid}} should be inclusive of the id given.
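The paging scheme described above (ask for one entity more than the page size, then use that extra entity's ID as the inclusive fromid of the next page) could be driven from a client roughly as follows. This is only a sketch: TimelinePaging, pageUrl and nextFromId are hypothetical names, the IDs are assumed to be URL-safe, and no HTTP call is made.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical client-side helper, not part of the timeline service.
class TimelinePaging {
    static final String FIELDS = "events,primaryfilters,otherinfo";

    /** Builds the request URL; a null fromId means the first page. */
    static String pageUrl(String base, String fromId, int pageSize) {
        StringBuilder url = new StringBuilder(base).append("?fields=").append(FIELDS);
        if (fromId != null) {
            url.append("&fromid=").append(fromId);
        }
        // Request one extra entity so its ID can serve as the next page's fromid.
        return url.append("&limit=").append(pageSize + 1).toString();
    }

    /** Returns the fromid for the next page, or null if this was the last page. */
    static String nextFromId(List<String> returnedIds, int pageSize) {
        return returnedIds.size() > pageSize ? returnedIds.get(pageSize) : null;
    }

    public static void main(String[] args) {
        String base = "http://server:8188/ws/v1/timeline/HIVE_QUERY_ID";
        System.out.println(pageUrl(base, null, 10));

        // Pretend the first call returned JID1..JID11.
        List<String> ids = new ArrayList<String>();
        for (int i = 1; i <= 11; i++) {
            ids.add("JID" + i);
        }
        String next = nextFromId(ids, 10);
        System.out.println(pageUrl(base, next, 10));
    }
}
```

Because fromid is inclusive, the eleventh entity of one response is the first entity of the next, so only the first pageSize entities of each response would be displayed.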




[jira] [Commented] (YARN-1838) Timeline service getEntities API should provide ability to get entities from given id


[ 
https://issues.apache.org/jira/browse/YARN-1838?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13946557#comment-13946557
 ] 

Hudson commented on YARN-1838:
--

FAILURE: Integrated in Hadoop-Hdfs-trunk #1712 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/1712/])
YARN-1838. Enhanced timeline service getEntities API to get entities from a 
given entity ID or insertion timestamp. Contributed by Billie Rinaldi. (zjshen: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1580960)
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell/src/test/java/org/apache/hadoop/yarn/applications/distributedshell/TestDistributedShell.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/applicationhistoryservice/timeline/GenericObjectMapper.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/applicationhistoryservice/timeline/LeveldbTimelineStore.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/applicationhistoryservice/timeline/MemoryTimelineStore.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/applicationhistoryservice/timeline/TimelineReader.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/applicationhistoryservice/webapp/TimelineWebServices.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/test/java/org/apache/hadoop/yarn/server/applicationhistoryservice/timeline/TestLeveldbTimelineStore.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/test/java/org/apache/hadoop/yarn/server/applicationhistoryservice/timeline/TestMemoryTimelineStore.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/test/java/org/apache/hadoop/yarn/server/applicationhistoryservice/timeline/TimelineStoreTestUtils.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/test/java/org/apache/hadoop/yarn/server/applicationhistoryservice/webapp/TestTimelineWebServices.java



[jira] [Commented] (YARN-1612) Change Fair Scheduler to not disable delay scheduling by default

2014-03-25 Thread Chen He (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-1612?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13946646#comment-13946646
 ] 

Chen He commented on YARN-1612:
---

Hi [~sandyr]
Would you mind clarifying this JIRA a little bit for me? I took a look at the 
FairScheduler source code. Here are my 2 questions:

1) Since the delay interval benefits data locality but also hurts map task 
assignment if it is too long, enabling delay scheduling by default requires a 
relatively reasonable delay interval;
2) This relatively reasonable delay interval can change depending on the size 
of the cluster; 



 Change Fair Scheduler to not disable delay scheduling by default
 

 Key: YARN-1612
 URL: https://issues.apache.org/jira/browse/YARN-1612
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: scheduler
Reporter: Sandy Ryza
Assignee: Chen He






[jira] [Created] (YARN-1872) TestDistributedShell occasionally fails in trunk

2014-03-25 Thread Ted Yu (JIRA)

Ted Yu created YARN-1872:


 Summary: TestDistributedShell occasionally fails in trunk
 Key: YARN-1872
 URL: https://issues.apache.org/jira/browse/YARN-1872
 Project: Hadoop YARN
  Issue Type: Test
Reporter: Ted Yu


From https://builds.apache.org/job/Hadoop-Yarn-trunk/520/console :

TestDistributedShell#testDSShellWithCustomLogPropertyFile failed and 
TestDistributedShell#testDSShell timed out.




[jira] [Updated] (YARN-1872) TestDistributedShell occasionally fails in trunk

2014-03-25 Thread Ted Yu (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-1872?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated YARN-1872:
-

Attachment: TestDistributedShell.out

Output from console.

 TestDistributedShell occasionally fails in trunk
 

 Key: YARN-1872
 URL: https://issues.apache.org/jira/browse/YARN-1872
 Project: Hadoop YARN
  Issue Type: Test
Reporter: Ted Yu
 Attachments: TestDistributedShell.out


 From https://builds.apache.org/job/Hadoop-Yarn-trunk/520/console :
 TestDistributedShell#testDSShellWithCustomLogPropertyFile failed and 
 TestDistributedShell#testDSShell timed out.




[jira] [Commented] (YARN-1809) Synchronize RM and Generic History Service Web-UIs

2014-03-25 Thread Mayank Bansal (JIRA)

[
https://issues.apache.org/jira/browse/YARN-1809?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13946731#comment-13946731
]

Mayank Bansal commented on YARN-1809:
-

Thanks [~zjshen] for the patch

bq. Yes, I think it should, but I prefer to put it in ApplicationBaseProtocol
when ApplicationHistoryClientService has implemented DT related methods.
The history protocol already has these methods, so we don't need to wait; they
have a dummy implementation for now.

bq. ApplicationBaseProtocol and ApplicationContext are completely different
things. ApplicationBaseProtocol is the RPC interface. Previously, I thought we
should have a uniformed ApplicationContext: on the RM side, it wraps RMContext;
while on the AHS side, it wraps ApplicationHistory. However, inspired by
RMWebServices#getApps, I think the RPC interface is a better place to uniform
the way of retrieving app info, so I created ApplicationBaseProtocol. And
ApplicationContext is no longer useful.
ApplicationBaseProtocol would be the base protocol for the client and history
protocols; the application context, however, is something different. The
motivation for the context is to wrap RM and AHS application data, so I think
keeping the context makes sense: the protocol has a totally different
motivation, and different methods as well once we add the delegation methods
to it.

bq. I understand the big patch is desperate for review, but I've to do that
because the patch is aiming to refactor the code to avoid duplicate web-UI code
for RM and for AHS. The two webUI should share the common code path, and then
display similarly.
I am fine with this if this is something you want to do.

{code}
<p>
+ * The protocol between clients and the <code>ResourceManager</code> or
+ * <code>ApplicationHistoryServer</code> to get information on applications,
+ * application attempts and containers.
+ * </p>
{code}

This should be: it is a base protocol for the application client and history.

Shouldn't we add @Idempotent to getallapplications as well?

If we add the application context back, then we need to rebase the patch
accordingly.

Synchronize RM and Generic History Service Web-UIs
--

Key: YARN-1809
URL: https://issues.apache.org/jira/browse/YARN-1809
Project: Hadoop YARN
Issue Type: Bug
Reporter: Zhijie Shen
Assignee: Zhijie Shen
Attachments: YARN-1809.1.patch, YARN-1809.2.patch, YARN-1809.3.patch,
YARN-1809.4.patch, YARN-1809.5.patch, YARN-1809.5.patch, YARN-1809.6.patch

After YARN-953, the web-UI of the generic history service provides more
information than that of the RM: the details about app attempts and containers.
It's good to provide similar web-UIs, but retrieve the data from separate
sources, i.e., the RM cache and the history store respectively.


[jira] [Commented] (YARN-1873) TestDistributedShell#testDSShell fails

2014-03-25 Thread Ted Yu (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-1873?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13946896#comment-13946896
 ] 

Ted Yu commented on YARN-1873:
--

Dup of YARN-1872 ?

 TestDistributedShell#testDSShell fails
 --

 Key: YARN-1873
 URL: https://issues.apache.org/jira/browse/YARN-1873
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 3.0.0
Reporter: Mit Desai
Assignee: Mit Desai
  Labels: java7

 testDSShell fails when the tests are run in random order. I see a cleanup 
 issue here.
 {noformat}
 Tests run: 1, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 72.222 sec 
 <<< FAILURE! - in 
 org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell
 testOrder(org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell)
   Time elapsed: 44.127 sec  <<< FAILURE!
 java.lang.AssertionError: expected:<1> but was:<6>
   at org.junit.Assert.fail(Assert.java:93)
   at org.junit.Assert.failNotEquals(Assert.java:647)
   at org.junit.Assert.assertEquals(Assert.java:128)
   at org.junit.Assert.assertEquals(Assert.java:472)
   at org.junit.Assert.assertEquals(Assert.java:456)
   at 
 org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell.testDSShell(TestDistributedShell.java:204)
   at 
 org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell.testOrder(TestDistributedShell.java:134)
 Results :
 Failed tests: 
   TestDistributedShell.testOrder:134->testDSShell:204 expected:<1> but was:<6>
 {noformat}
 The line numbers deviate a little because I was trying to reproduce 
 the error by running the tests in a specific order. But the line that causes 
 the assertion failure is {{Assert.assertEquals(1, 
 entitiesAttempts.getEntities().size());}}




[jira] [Commented] (YARN-1873) TestDistributedShell#testDSShell fails


[ 
https://issues.apache.org/jira/browse/YARN-1873?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13946939#comment-13946939
 ] 

Mit Desai commented on YARN-1873:
-

I see it is a timeout issue in YARN-1872. The error reported here is an 
assertion failure. I think this is a different issue.

 TestDistributedShell#testDSShell fails
 --

 Key: YARN-1873
 URL: https://issues.apache.org/jira/browse/YARN-1873
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 3.0.0
Reporter: Mit Desai
Assignee: Mit Desai
  Labels: java7


[jira] [Updated] (YARN-1873) TestDistributedShell#testDSShell fails


 [ 
https://issues.apache.org/jira/browse/YARN-1873?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mit Desai updated YARN-1873:


Affects Version/s: 2.4.0

 TestDistributedShell#testDSShell fails
 --

 Key: YARN-1873
 URL: https://issues.apache.org/jira/browse/YARN-1873
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 3.0.0, 2.4.0
Reporter: Mit Desai
Assignee: Mit Desai
  Labels: java7


[jira] [Commented] (YARN-1873) TestDistributedShell#testDSShell fails


[ 
https://issues.apache.org/jira/browse/YARN-1873?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13946971#comment-13946971
 ] 

Mit Desai commented on YARN-1873:
-

Yes, it fails on branch-2.4. I updated the JIRA to reflect that.
I am using JDK7 and have not tried with JDK6. But this is clearly a cleanup 
issue, so I assumed it is a JDK7 issue. If I run testDSShell independently, it 
never fails.

 TestDistributedShell#testDSShell fails
 --

 Key: YARN-1873
 URL: https://issues.apache.org/jira/browse/YARN-1873
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 3.0.0, 2.4.0
Reporter: Mit Desai
Assignee: Mit Desai
  Labels: java7


[jira] [Commented] (YARN-1866) YARN RM fails to load state store with delegation token parsing error


[ 
https://issues.apache.org/jira/browse/YARN-1866?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13946994#comment-13946994
 ] 

Hudson commented on YARN-1866:
--

SUCCESS: Integrated in Hadoop-trunk-Commit #5400 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/5400/])
YARN-1866. Fixed an issue with renewal of RM-delegation tokens on restart or 
fail-over. Contributed by Jian He. (vinodkv: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1581448)
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/security/DelegationTokenRenewer.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestRMRestart.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/security/TestDelegationTokenRenewer.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/security/TestDelegationTokenRenewerLifecycle.java


 YARN RM fails to load state store with delegation token parsing error
 -

 Key: YARN-1866
 URL: https://issues.apache.org/jira/browse/YARN-1866
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 2.4.0
Reporter: Arpit Gupta
Assignee: Jian He
Priority: Blocker
 Fix For: 2.4.0

 Attachments: YARN-1866.1.patch, YARN-1866.2.patch


 In our secure nightlies we saw exceptions in the RM log where it failed to 
 parse the delegation token.




[jira] [Updated] (YARN-1521) Mark appropriate protocol methods with the idempotent annotation or AtMostOnce annotation


 [ 
https://issues.apache.org/jira/browse/YARN-1521?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuan Gong updated YARN-1521:


Attachment: YARN-1521.2.patch

 Mark appropriate protocol methods with the idempotent annotation or 
 AtMostOnce annotation
 -

 Key: YARN-1521
 URL: https://issues.apache.org/jira/browse/YARN-1521
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Xuan Gong
Assignee: Xuan Gong
 Attachments: YARN-1521.0.patch, YARN-1521.1.patch, YARN-1521.2.patch


 After YARN-1028, we added automatic failover to RMProxy. This JIRA is to 
 identify whether we need to add the idempotent annotation and which methods 
 can be marked as idempotent.




[jira] [Commented] (YARN-1873) TestDistributedShell#testDSShell fails


[ 
https://issues.apache.org/jira/browse/YARN-1873?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13947056#comment-13947056
 ] 

Zhijie Shen commented on YARN-1873:
---

I'll take it over and work on a patch for it. Thanks for finding the issue, 
[~mdesai]!

 TestDistributedShell#testDSShell fails
 --

 Key: YARN-1873
 URL: https://issues.apache.org/jira/browse/YARN-1873
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 3.0.0, 2.4.0
Reporter: Mit Desai
Assignee: Mit Desai
  Labels: java7


[jira] [Commented] (YARN-1873) TestDistributedShell#testDSShell fails


[ 
https://issues.apache.org/jira/browse/YARN-1873?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13947054#comment-13947054
 ] 

Zhijie Shen commented on YARN-1873:
---

Hm, I can imagine the problem. MiniYARNCluster is static, so the timeline 
store is going to hold the entities from all test cases in 
TestDistributedShell. I cannot reproduce it locally, because testDSShell is 
always executed first.

{code}
  protected static MiniYARNCluster yarnCluster = null;
{code}
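To make the suspected cause concrete, here is a toy model of it, with a plain static list standing in for the timeline store inside the shared cluster. SharedStoreDemo, runTestCase and resetStore are hypothetical names, and the reset shown is only one possible remedy, not necessarily what the eventual patch does.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a test class that shares static state across test cases.
class SharedStoreDemo {
    /** Plays the role of the timeline store inside a static MiniYARNCluster. */
    static final List<String> sharedStore = new ArrayList<String>();

    /** A test body that publishes one entity and returns how many it can see. */
    static int runTestCase(String name) {
        sharedStore.add(name + "-attempt");
        return sharedStore.size();
    }

    /** Per-test cleanup, e.g. something a JUnit @Before or @After method could do. */
    static void resetStore() {
        sharedStore.clear();
    }

    public static void main(String[] args) {
        // Another test runs first and leaves its entity behind.
        runTestCase("testDSShellWithCustomLogPropertyFile");
        System.out.println("without cleanup: " + runTestCase("testDSShell"));

        // With cleanup between tests, testDSShell sees only its own entity.
        resetStore();
        System.out.println("with cleanup:    " + runTestCase("testDSShell"));
    }
}
```

In this model the count seen by testDSShell depends on how many tests ran before it, which matches an assertion that passes when the test runs first and fails under a different ordering.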


 TestDistributedShell#testDSShell fails
 --

 Key: YARN-1873
 URL: https://issues.apache.org/jira/browse/YARN-1873
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 3.0.0, 2.4.0
Reporter: Mit Desai
Assignee: Mit Desai
  Labels: java7


[jira] [Assigned] (YARN-1873) TestDistributedShell#testDSShell fails


 [ 
https://issues.apache.org/jira/browse/YARN-1873?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhijie Shen reassigned YARN-1873:
-

Assignee: Zhijie Shen  (was: Mit Desai)

 TestDistributedShell#testDSShell fails
 --

 Key: YARN-1873
 URL: https://issues.apache.org/jira/browse/YARN-1873
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 3.0.0, 2.4.0
Reporter: Mit Desai
Assignee: Zhijie Shen
  Labels: java7


[jira] [Commented] (YARN-1521) Mark appropriate protocol methods with the idempotent annotation or AtMostOnce annotation


[ 
https://issues.apache.org/jira/browse/YARN-1521?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13947059#comment-13947059
 ] 

Xuan Gong commented on YARN-1521:
-

bq. 1. If the application is already in the RM cache, we shouldn't log success. 
Otherwise, there may be multiple logs for one submission.

Removed

bq. Should it sleep a while before retry?

DONE

bq. Again, sleep 1ms before next try, yielding to the thread of API methods' 
invoking. And, have max retry when exception?

DONE. Did not add a max retry. The failover thread will be killed after every 
test case.
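For reference, the two review suggestions (sleep between attempts, and cap the number of attempts) can be combined in a small generic helper like the one below. This is a sketch rather than the code in the patch: RetryWithSleep and retry are invented names, and only unchecked exceptions are retried to keep the example short.

```java
import java.util.function.Supplier;

// Hypothetical helper, not taken from the YARN-1521 patch.
class RetryWithSleep {
    /**
     * Calls action until it succeeds or maxAttempts is reached, sleeping
     * sleepMillis between attempts. Rethrows the last failure.
     */
    static <T> T retry(Supplier<T> action, int maxAttempts, long sleepMillis) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        RuntimeException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return action.get();
            } catch (RuntimeException e) {
                last = e; // remember the failure and try again
            }
            if (attempt < maxAttempts) {
                try {
                    Thread.sleep(sleepMillis); // yield to other threads before retrying
                } catch (InterruptedException ie) {
                    Thread.currentThread().interrupt();
                    break; // stop retrying when the thread is interrupted
                }
            }
        }
        throw last;
    }

    public static void main(String[] args) {
        final int[] calls = {0};
        String result = retry(() -> {
            if (++calls[0] < 3) {
                throw new IllegalStateException("not ready yet");
            }
            return "succeeded on call " + calls[0];
        }, 5, 1L);
        System.out.println(result);
    }
}
```

A cap is mostly a safety net: if the retrying thread is always torn down after each test case, as described above, the loop cannot spin indefinitely anyway.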

 Mark appropriate protocol methods with the idempotent annotation or 
 AtMostOnce annotation
 -

 Key: YARN-1521
 URL: https://issues.apache.org/jira/browse/YARN-1521
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Xuan Gong
Assignee: Xuan Gong
 Attachments: YARN-1521.0.patch, YARN-1521.1.patch, YARN-1521.2.patch


 After YARN-1028, we added automatic failover to RMProxy. This JIRA is to 
 identify whether we need to add the idempotent annotation and which methods 
 can be marked as idempotent.




[jira] [Updated] (YARN-1873) TestDistributedShell#testDSShell fails


 [ 
https://issues.apache.org/jira/browse/YARN-1873?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mit Desai updated YARN-1873:


Attachment: YARN-1873.patch

Attaching the patch

 TestDistributedShell#testDSShell fails
 --

 Key: YARN-1873
 URL: https://issues.apache.org/jira/browse/YARN-1873
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 3.0.0, 2.4.0
Reporter: Mit Desai
Assignee: Zhijie Shen
 Attachments: YARN-1873.patch


 testDSShell fails when the tests are run in random order. I see a cleanup 
 issue here.
 {noformat}
 Tests run: 1, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 72.222 sec <<< FAILURE! - in org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell
 testOrder(org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell)  Time elapsed: 44.127 sec  <<< FAILURE!
 java.lang.AssertionError: expected:<1> but was:<6>
   at org.junit.Assert.fail(Assert.java:93)
   at org.junit.Assert.failNotEquals(Assert.java:647)
   at org.junit.Assert.assertEquals(Assert.java:128)
   at org.junit.Assert.assertEquals(Assert.java:472)
   at org.junit.Assert.assertEquals(Assert.java:456)
   at org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell.testDSShell(TestDistributedShell.java:204)
   at org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell.testOrder(TestDistributedShell.java:134)
 Results :
 Failed tests: 
   TestDistributedShell.testOrder:134->testDSShell:204 expected:<1> but was:<6>
 {noformat}
 The line numbers will deviate slightly because I was trying to reproduce 
 the error by running the tests in a specific order. But the line that causes 
 the assertion failure is {{Assert.assertEquals(1, 
 entitiesAttempts.getEntities().size());}}
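
 The kind of order dependence described here can be reproduced in miniature. The sketch below is an assumption-laden illustration, not the real test: `SharedStateSketch` and its fields are hypothetical, and a plain list stands in for whatever state survives between test methods.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical fixture, not the real TestDistributedShell. */
final class SharedStateSketch {
  // Shared by every "test" in the JVM; nothing ever clears it.
  static final List<String> SHARED_ENTITIES = new ArrayList<>();

  // Owned by one fixture instance; a fresh instance starts empty.
  final List<String> ownEntities = new ArrayList<>();

  /** Simulates a test that publishes one entity, then returns {sharedCount, ownCount}. */
  int[] runOneTest(String entityId) {
    SHARED_ENTITIES.add(entityId);
    ownEntities.add(entityId);
    return new int[] {SHARED_ENTITIES.size(), ownEntities.size()};
  }
}
```

 A test asserting that exactly one entity exists passes against the per-instance list no matter how many tests ran before it, but against the shared list it passes only when it happens to run first.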




[jira] [Commented] (YARN-1873) TestDistributedShell#testDSShell fails


[ 
https://issues.apache.org/jira/browse/YARN-1873?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13947066#comment-13947066
 ] 

Mit Desai commented on YARN-1873:
-

[~zjshen], I didn't see your comment. I have already attached the patch.


[jira] [Updated] (YARN-1867) NPE while fetching apps via the REST API

[
https://issues.apache.org/jira/browse/YARN-1867?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

Vinod Kumar Vavilapalli updated YARN-1867:
--

Attachment: YARN-1867-20140325.txt

The problem is that the web-services cache the acls-managers from the
'previous' RM. The acls-managers are recreated when a transition happens.

Here's a patch to fix the issue:
- Changed web-services to not cache the application and queue acls-managers. I
checked other instances in the web-app. These seem to be the only two cached
objects.
- The code in the main ResourceManager has become unmaintainable after the
introduction of the active-services. I had to resist cleaning up; quite a few
things are broken in more ways than one. For now, moved a couple of things from
the top level to be nested inside active-services. Will file a ticket for more
cleanup.
- Fixed a few existing formatting issues.
- The test case fails without the code change, with the same exception printed
above, and passes with it.
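
The caching problem described in this comment can be sketched in isolation. The classes below are hypothetical stand-ins (none of them is the real YARN code), and a generation counter replaces the actual ACL logic; the point is only the difference between holding a reference captured at construction time and looking it up on each request.

```java
/** Hypothetical stand-in for an ACLs manager; not the real YARN class. */
final class AclsManagerSketch {
  private final int generation;

  AclsManagerSketch(int generation) {
    this.generation = generation;
  }

  int generation() {
    return generation;
  }
}

/** Hypothetical stand-in for the RM: recreates its manager on every transition. */
final class ResourceManagerSketch {
  private volatile AclsManagerSketch aclsManager = new AclsManagerSketch(1);

  AclsManagerSketch getAclsManager() {
    return aclsManager;
  }

  void transition() {
    aclsManager = new AclsManagerSketch(aclsManager.generation() + 1);
  }
}

/** Captures the manager at construction time, so it goes stale after a transition. */
final class CachingWebServiceSketch {
  private final AclsManagerSketch cached;

  CachingWebServiceSketch(ResourceManagerSketch rm) {
    this.cached = rm.getAclsManager();
  }

  int generationSeen() {
    return cached.generation();
  }
}

/** Keeps only the RM reference and asks for the current manager on each request. */
final class LookupWebServiceSketch {
  private final ResourceManagerSketch rm;

  LookupWebServiceSketch(ResourceManagerSketch rm) {
    this.rm = rm;
  }

  int generationSeen() {
    return rm.getAclsManager().generation();
  }
}
```

After one `transition()`, the caching variant still reports the manager it captured at startup while the lookup variant reports the new one.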

NPE while fetching apps via the REST API

Key: YARN-1867
URL: https://issues.apache.org/jira/browse/YARN-1867
Project: Hadoop YARN
Issue Type: Bug
Components: resourcemanager
Affects Versions: 2.4.0
Reporter: Karthik Kambatla
Assignee: Karthik Kambatla
Priority: Blocker
Labels: rest_api
Attachments: YARN-1867-20140325.txt

We ran into the following NPE when fetching applications using the REST API:
{noformat}
INTERNAL_SERVER_ERROR
java.lang.NullPointerException
at org.apache.hadoop.yarn.server.security.ApplicationACLsManager.checkAccess(ApplicationACLsManager.java:104)
at org.apache.hadoop.yarn.server.resourcemanager.webapp.RMWebServices.hasAccess(RMWebServices.java:123)
at org.apache.hadoop.yarn.server.resourcemanager.webapp.RMWebServices.getApps(RMWebServices.java:418)
{noformat}


[jira] [Commented] (YARN-1867) NPE while fetching apps via the REST API


[ 
https://issues.apache.org/jira/browse/YARN-1867?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13947079#comment-13947079
 ] 

Karthik Kambatla commented on YARN-1867:


Thanks Vinod. Looking at the patch.

 NPE while fetching apps via the REST API
 

 Key: YARN-1867
 URL: https://issues.apache.org/jira/browse/YARN-1867
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.4.0
Reporter: Karthik Kambatla
Assignee: Vinod Kumar Vavilapalli
Priority: Blocker
  Labels: rest_api
 Attachments: YARN-1867-20140325.txt


 We ran into the following NPE when fetching applications using the REST API:
 {noformat}
 INTERNAL_SERVER_ERROR
 java.lang.NullPointerException
 at org.apache.hadoop.yarn.server.security.ApplicationACLsManager.checkAccess(ApplicationACLsManager.java:104)
 at org.apache.hadoop.yarn.server.resourcemanager.webapp.RMWebServices.hasAccess(RMWebServices.java:123)
 at org.apache.hadoop.yarn.server.resourcemanager.webapp.RMWebServices.getApps(RMWebServices.java:418)
 {noformat}




[jira] [Commented] (YARN-1867) NPE while fetching apps via the REST API


[ 
https://issues.apache.org/jira/browse/YARN-1867?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13947087#comment-13947087
 ] 

Hadoop QA commented on YARN-1867:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  
http://issues.apache.org/jira/secure/attachment/12636769/YARN-1867-20140325.txt
  against trunk revision .

{color:red}-1 patch{color}.  The patch command could not apply the patch.

Console output: https://builds.apache.org/job/PreCommit-YARN-Build/3454//console

This message is automatically generated.


[jira] [Created] (YARN-1874) Cleanup: Move RMActiveServices out of ResourceManager into its own file

Karthik Kambatla created YARN-1874:
--

 Summary: Cleanup: Move RMActiveServices out of ResourceManager 
into its own file
 Key: YARN-1874
 URL: https://issues.apache.org/jira/browse/YARN-1874
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Reporter: Karthik Kambatla


As [~vinodkv] noticed on YARN-1867, ResourceManager is hard to maintain. We 
should move RMActiveServices out to make it more manageable. 






[jira] [Commented] (YARN-1867) NPE while fetching apps via the REST API


[ 
https://issues.apache.org/jira/browse/YARN-1867?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13947099#comment-13947099
 ] 

Karthik Kambatla commented on YARN-1867:


Good catch. The fix looks good to me. 

bq. The code in the main ResourceManager has become unmaintainable after the 
introduction of the active-services.
Agreed. I have been thinking about it too. I think we should just move 
RMActiveServices into its own file - that would force us to clean up the 
unwieldy mess it has become. Filed YARN-1874 to do the same. Feel free to pick 
it up. I can take a stab at it, maybe in a week or two.

That said, it would be nice to address all the cleanup changes there, 
particularly if they are not related to the bug we are fixing here. 
{code}
+private DelegationTokenRenewer delegationTokenRenewer;
+private EventHandler<SchedulerEvent> schedulerDispatcher;
+private ApplicationMasterLauncher applicationMasterLauncher;
+private ContainerAllocationExpirer containerAllocationExpirer;
+
+private boolean recoveryEnabled;
{code}

Also, we should probably limit the formatting changes to the files that have 
non-formatting changes. Maybe leave out RMContextImpl?


[jira] [Commented] (YARN-1854) TestRMHA#testStartAndTransitions Fails


[ 
https://issues.apache.org/jira/browse/YARN-1854?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13947114#comment-13947114
 ] 

Mit Desai commented on YARN-1854:
-

[~rohithsharma]: The logs that I have submitted already have the 5-second 
timeout change.
I am creating another jira for the issue. Can you please update the description 
of this jira so that it describes what it actually fixes?

 TestRMHA#testStartAndTransitions Fails
 --

 Key: YARN-1854
 URL: https://issues.apache.org/jira/browse/YARN-1854
 Project: Hadoop YARN
  Issue Type: Test
Affects Versions: 2.4.0
Reporter: Mit Desai
Assignee: Rohith
Priority: Blocker
 Fix For: 2.4.0

 Attachments: Log.rtf, YARN-1854.1.patch, YARN-1854.patch


 {noformat}
 testStartAndTransitions(org.apache.hadoop.yarn.server.resourcemanager.TestRMHA)  Time elapsed: 5.883 sec  <<< FAILURE!
 java.lang.AssertionError: Incorrect value for metric availableMB expected:<2048> but was:<4096>
   at org.junit.Assert.fail(Assert.java:93)
   at org.junit.Assert.failNotEquals(Assert.java:647)
   at org.junit.Assert.assertEquals(Assert.java:128)
   at org.junit.Assert.assertEquals(Assert.java:472)
   at org.apache.hadoop.yarn.server.resourcemanager.TestRMHA.assertMetric(TestRMHA.java:396)
   at org.apache.hadoop.yarn.server.resourcemanager.TestRMHA.verifyClusterMetrics(TestRMHA.java:387)
   at org.apache.hadoop.yarn.server.resourcemanager.TestRMHA.testStartAndTransitions(TestRMHA.java:160)
 Results :
 Failed tests: 
   TestRMHA.testStartAndTransitions:160->verifyClusterMetrics:387->assertMetric:396 Incorrect value for metric availableMB expected:<2048> but was:<4096>
 {noformat}




[jira] [Commented] (YARN-1854) TestRMHA#testStartAndTransitions Fails


[ 
https://issues.apache.org/jira/browse/YARN-1854?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13947119#comment-13947119
 ] 

Mit Desai commented on YARN-1854:
-

Created YARN-1875 to track the issue


[jira] [Updated] (YARN-1875) TestRMHA#testStartAndTransitions is failing


 [ 
https://issues.apache.org/jira/browse/YARN-1875?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mit Desai updated YARN-1875:


Attachment: Log.rtf

Attaching the logs for the failure

 TestRMHA#testStartAndTransitions is failing
 ---

 Key: YARN-1875
 URL: https://issues.apache.org/jira/browse/YARN-1875
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 2.4.0
Reporter: Mit Desai
 Attachments: Log.rtf


 {noformat}
 testStartAndTransitions(org.apache.hadoop.yarn.server.resourcemanager.TestRMHA)  Time elapsed: 5.883 sec  <<< FAILURE!
 java.lang.AssertionError: Incorrect value for metric availableMB expected:<2048> but was:<4096>
   at org.junit.Assert.fail(Assert.java:93)
   at org.junit.Assert.failNotEquals(Assert.java:647)
   at org.junit.Assert.assertEquals(Assert.java:128)
   at org.junit.Assert.assertEquals(Assert.java:472)
   at org.apache.hadoop.yarn.server.resourcemanager.TestRMHA.assertMetric(TestRMHA.java:396)
   at org.apache.hadoop.yarn.server.resourcemanager.TestRMHA.verifyClusterMetrics(TestRMHA.java:387)
   at org.apache.hadoop.yarn.server.resourcemanager.TestRMHA.testStartAndTransitions(TestRMHA.java:160)
 Results :
 Failed tests: 
   TestRMHA.testStartAndTransitions:160->verifyClusterMetrics:387->assertMetric:396 Incorrect value for metric availableMB expected:<2048> but was:<4096>
 {noformat}




[jira] [Created] (YARN-1875) TestRMHA#testStartAndTransitions is failing

Mit Desai created YARN-1875:
---

 Summary: TestRMHA#testStartAndTransitions is failing
 Key: YARN-1875
 URL: https://issues.apache.org/jira/browse/YARN-1875
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 2.4.0
Reporter: Mit Desai
 Attachments: Log.rtf

{noformat}
testStartAndTransitions(org.apache.hadoop.yarn.server.resourcemanager.TestRMHA)  Time elapsed: 5.883 sec  <<< FAILURE!
java.lang.AssertionError: Incorrect value for metric availableMB expected:<2048> but was:<4096>
at org.junit.Assert.fail(Assert.java:93)
at org.junit.Assert.failNotEquals(Assert.java:647)
at org.junit.Assert.assertEquals(Assert.java:128)
at org.junit.Assert.assertEquals(Assert.java:472)
at org.apache.hadoop.yarn.server.resourcemanager.TestRMHA.assertMetric(TestRMHA.java:396)
at org.apache.hadoop.yarn.server.resourcemanager.TestRMHA.verifyClusterMetrics(TestRMHA.java:387)
at org.apache.hadoop.yarn.server.resourcemanager.TestRMHA.testStartAndTransitions(TestRMHA.java:160)

Results :

Failed tests: 
  TestRMHA.testStartAndTransitions:160->verifyClusterMetrics:387->assertMetric:396 Incorrect value for metric availableMB expected:<2048> but was:<4096>
{noformat}




[jira] [Commented] (YARN-1873) TestDistributedShell#testDSShell fails


[ 
https://issues.apache.org/jira/browse/YARN-1873?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13947121#comment-13947121
 ] 

Hadoop QA commented on YARN-1873:
-

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12636768/YARN-1873.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/3455//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/3455//console

This message is automatically generated.

 TestDistributedShell#testDSShell fails
 --

 Key: YARN-1873
 URL: https://issues.apache.org/jira/browse/YARN-1873
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 3.0.0, 2.4.0
Reporter: Mit Desai
Assignee: Mit Desai
 Fix For: 3.0.0, 2.5.0

 Attachments: YARN-1873.patch


 testDSShell fails when the tests are run in random order. I see a cleanup 
 issue here.
 {noformat}
 Tests run: 1, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 72.222 sec <<< FAILURE! - in org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell
 testOrder(org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell)  Time elapsed: 44.127 sec  <<< FAILURE!
 java.lang.AssertionError: expected:<1> but was:<6>
   at org.junit.Assert.fail(Assert.java:93)
   at org.junit.Assert.failNotEquals(Assert.java:647)
   at org.junit.Assert.assertEquals(Assert.java:128)
   at org.junit.Assert.assertEquals(Assert.java:472)
   at org.junit.Assert.assertEquals(Assert.java:456)
   at org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell.testDSShell(TestDistributedShell.java:204)
   at org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell.testOrder(TestDistributedShell.java:134)
 Results :
 Failed tests: 
   TestDistributedShell.testOrder:134->testDSShell:204 expected:<1> but was:<6>
 {noformat}
 The line numbers will deviate slightly because I was trying to reproduce 
 the error by running the tests in a specific order. But the line that causes 
 the assertion failure is {{Assert.assertEquals(1, 
 entitiesAttempts.getEntities().size());}}




[jira] [Commented] (YARN-1521) Mark appropriate protocol methods with the idempotent annotation or AtMostOnce annotation


[ 
https://issues.apache.org/jira/browse/YARN-1521?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13947125#comment-13947125
 ] 

Hadoop QA commented on YARN-1521:
-

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12636766/YARN-1521.2.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 6 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager
 hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-tests.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/3453//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/3453//console

This message is automatically generated.


[jira] [Commented] (YARN-1521) Mark appropriate protocol methods with the idempotent annotation or AtMostOnce annotation


[ 
https://issues.apache.org/jira/browse/YARN-1521?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13947139#comment-13947139
 ] 

Xuan Gong commented on YARN-1521:
-

[~kkambatl] Could you take a look ?


[jira] [Commented] (YARN-1873) TestDistributedShell#testDSShell fails


[ 
https://issues.apache.org/jira/browse/YARN-1873?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13947170#comment-13947170
 ] 

Zhijie Shen commented on YARN-1873:
---

[~mitdesai], no problem. I've reviewed your patch. The approach should work. 
One comment on the patch:
 These vars don't need to be static any more.
{code}
  protected static MiniYARNCluster yarnCluster = null;
  protected static Configuration conf = new YarnConfiguration();

  protected static String APPMASTER_JAR = JarFinder.getJar(ApplicationMaster.class);
{code}


[jira] [Commented] (YARN-1873) TestDistributedShell#testDSShell fails


[ 
https://issues.apache.org/jira/browse/YARN-1873?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13947176#comment-13947176
 ] 

Mit Desai commented on YARN-1873:
-

I'll change them and upload a new patch. Thanks for the review!


[jira] [Updated] (YARN-1873) TestDistributedShell#testDSShell fails


 [ 
https://issues.apache.org/jira/browse/YARN-1873?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mit Desai updated YARN-1873:


Attachment: YARN-1873.patch

Attached the new patch

 TestDistributedShell#testDSShell fails
 --

 Key: YARN-1873
 URL: https://issues.apache.org/jira/browse/YARN-1873
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 3.0.0, 2.4.0
Reporter: Mit Desai
Assignee: Mit Desai
 Fix For: 3.0.0, 2.5.0

 Attachments: YARN-1873.patch, YARN-1873.patch


 testDSShell fails when the tests are run in random order. I see a cleanup 
 issue here.
 {noformat}
 Tests run: 1, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 72.222 sec <<< FAILURE! - in org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell
 testOrder(org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell)  Time elapsed: 44.127 sec  <<< FAILURE!
 java.lang.AssertionError: expected:<1> but was:<6>
   at org.junit.Assert.fail(Assert.java:93)
   at org.junit.Assert.failNotEquals(Assert.java:647)
   at org.junit.Assert.assertEquals(Assert.java:128)
   at org.junit.Assert.assertEquals(Assert.java:472)
   at org.junit.Assert.assertEquals(Assert.java:456)
   at org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell.testDSShell(TestDistributedShell.java:204)
   at org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell.testOrder(TestDistributedShell.java:134)
 Results :
 Failed tests: 
   TestDistributedShell.testOrder:134->testDSShell:204 expected:<1> but was:<6>
 {noformat}
 The line numbers will deviate slightly because I was trying to reproduce 
 the error by running the tests in a specific order. But the line that causes 
 the assertion failure is {{Assert.assertEquals(1, 
 entitiesAttempts.getEntities().size());}}




[jira] [Updated] (YARN-1867) NPE while fetching apps via the REST API


 [ 
https://issues.apache.org/jira/browse/YARN-1867?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kumar Vavilapalli updated YARN-1867:
--

Attachment: YARN-1867-20140325-trunk.txt

Patch for trunk.

 NPE while fetching apps via the REST API
 

 Key: YARN-1867
 URL: https://issues.apache.org/jira/browse/YARN-1867
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.4.0
Reporter: Karthik Kambatla
Assignee: Vinod Kumar Vavilapalli
Priority: Blocker
  Labels: rest_api
 Attachments: YARN-1867-20140325-trunk.txt, YARN-1867-20140325.txt


 We ran into the following NPE when fetching applications using the REST API:
 {noformat}
 INTERNAL_SERVER_ERROR
 java.lang.NullPointerException
 at org.apache.hadoop.yarn.server.security.ApplicationACLsManager.checkAccess(ApplicationACLsManager.java:104)
 at org.apache.hadoop.yarn.server.resourcemanager.webapp.RMWebServices.hasAccess(RMWebServices.java:123)
 at org.apache.hadoop.yarn.server.resourcemanager.webapp.RMWebServices.getApps(RMWebServices.java:418)
 {noformat}




[jira] [Commented] (YARN-1867) NPE while fetching apps via the REST API


[ 
https://issues.apache.org/jira/browse/YARN-1867?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13947199#comment-13947199
 ] 

Karthik Kambatla commented on YARN-1867:


bq. A little bit of cleanup per patch should be okay. I'd like to keep them 
here unless you feel strongly that it is prohibiting the review.
I am not very particular about it - the patch is fairly small and those changes 
don't get in the way of review. I am okay with leaving them in, if you insist. 
My only concern is the git history wouldn't tell us why we made these changes 
if we hide them behind a bug fix. 


[jira] [Commented] (YARN-1873) TestDistributedShell#testDSShell fails


[ 
https://issues.apache.org/jira/browse/YARN-1873?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13947224#comment-13947224
 ] 

Zhijie Shen commented on YARN-1873:
---

The patch looks almost good to me. One nit:

I checked and found that APPMASTER_JAR is never modified anywhere in the code, 
so I think it would be better as a final static constant. Sorry for my earlier 
misleading suggestion.

 TestDistributedShell#testDSShell fails
 --

 Key: YARN-1873
 URL: https://issues.apache.org/jira/browse/YARN-1873
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 3.0.0, 2.4.0
Reporter: Mit Desai
Assignee: Mit Desai
 Fix For: 3.0.0, 2.5.0

 Attachments: YARN-1873.patch, YARN-1873.patch


 testDSShell fails when the tests are run in random order. I see a cleanup 
 issue here.
 {noformat}
 Tests run: 1, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 72.222 sec 
  FAILURE! - in 
 org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell
 testOrder(org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell)
   Time elapsed: 44.127 sec   FAILURE!
 java.lang.AssertionError: expected:1 but was:6
   at org.junit.Assert.fail(Assert.java:93)
   at org.junit.Assert.failNotEquals(Assert.java:647)
   at org.junit.Assert.assertEquals(Assert.java:128)
   at org.junit.Assert.assertEquals(Assert.java:472)
   at org.junit.Assert.assertEquals(Assert.java:456)
   at 
 org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell.testDSShell(TestDistributedShell.java:204)
   at 
 org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell.testOrder(TestDistributedShell.java:134)
 Results :
 Failed tests: 
   TestDistributedShell.testOrder:134-testDSShell:204 expected:1 but was:6
 {noformat}
 The line numbers will deviate slightly because I was trying to reproduce 
 the error by running the tests in a specific order. But the line that causes 
 the assertion failure is {{Assert.assertEquals(1, 
 entitiesAttempts.getEntities().size());}}



--
This message was sent by Atlassian JIRA
(v6.2#6252)

[jira] [Commented] (YARN-1873) TestDistributedShell#testDSShell fails


[ 
https://issues.apache.org/jira/browse/YARN-1873?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13947244#comment-13947244
 ] 

Mit Desai commented on YARN-1873:
-

I looked at it. Can you let me know if we need to make it
{{protected final String APPMASTER_JAR = 
JarFinder.getJar(ApplicationMaster.class);}}
or
{{protected final static String APPMASTER_JAR = 
JarFinder.getJar(ApplicationMaster.class);}}

I think making it final instead of final static will be enough. What do you say?




--
This message was sent by Atlassian JIRA
(v6.2#6252)

[jira] [Commented] (YARN-1521) Mark appropriate protocol methods with the idempotent annotation or AtMostOnce annotation


[ 
https://issues.apache.org/jira/browse/YARN-1521?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13947295#comment-13947295
 ] 

Jian He commented on YARN-1521:
---

- We should at least log this event, saying that this is an already-submitted 
application; that is good for debugging.
- TestProtocolHA -> TestProtocolHABase.java
- I ran TestApplicationClientProtocolOnHA locally without the core code 
changes; the whole test eventually takes 15 minutes to crash.
I observed that the test keeps failing over even after the test is done. Can 
you investigate?
{code}
14/03/25 15:02:08 WARN resourcemanager.RMAuditLogger: USER=jhe  OPERATION=transitionToStandby TARGET=RMHAProtocolService  RESULT=FAILURE  DESCRIPTION=Exception transitioning to standby  PERMISSIONS=All users are allowed
14/03/25 15:02:09 INFO client.ConfiguredRMFailoverProxyProvider: Failing over to rm1
14/03/25 15:02:11 INFO client.ConfiguredRMFailoverProxyProvider: Failing over to rm2
14/03/25 15:02:14 INFO client.ConfiguredRMFailoverProxyProvider: Failing over to rm1
14/03/25 15:02:23 INFO client.ConfiguredRMFailoverProxyProvider: Failing over to rm2
14/03/25 15:02:35 INFO client.ConfiguredRMFailoverProxyProvider: Failing over to rm1
14/03/25 15:02:57 INFO client.ConfiguredRMFailoverProxyProvider: Failing over to rm2
14/03/25 15:03:30 INFO client.ConfiguredRMFailoverProxyProvider: Failing over to rm1
{code}

 Mark appropriate protocol methods with the idempotent annotation or 
 AtMostOnce annotation
 -

 Key: YARN-1521
 URL: https://issues.apache.org/jira/browse/YARN-1521
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Xuan Gong
Assignee: Xuan Gong
 Attachments: YARN-1521.0.patch, YARN-1521.1.patch, YARN-1521.2.patch


 After YARN-1028, automatic failover was added to RMProxy. This JIRA is 
 to identify whether we need to add the idempotent annotation and which methods 
 can be marked as idempotent.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

[jira] [Commented] (YARN-1521) Mark appropriate protocol methods with the idempotent annotation or AtMostOnce annotation


[ 
https://issues.apache.org/jira/browse/YARN-1521?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13947302#comment-13947302
 ] 

Vinod Kumar Vavilapalli commented on YARN-1521:
---

bq. - TestProtocolHA - TestProtocolHABase.java
Or ProtocolHATestBase.

BTW, why is it ProtocolHA?




--
This message was sent by Atlassian JIRA
(v6.2#6252)

[jira] [Commented] (YARN-1873) TestDistributedShell#testDSShell fails


[ 
https://issues.apache.org/jira/browse/YARN-1873?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13947308#comment-13947308
 ] 

Zhijie Shen commented on YARN-1873:
---

APPMASTER_JAR is a jar that is used without modification in all the test cases; 
therefore, it should be safe to make it static.
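
To illustrate the trade-off discussed here, the following is a minimal sketch and not code from the patch. The class names and the `locateJar` helper are hypothetical stand-ins for the test class and `JarFinder.getJar(...)`. It shows that an instance-level `final` field is initialized once per object (JUnit 4 creates a fresh test-class instance for each test method), while a `static final` field is initialized once per class.

```java
import java.util.concurrent.atomic.AtomicInteger;

public class FinalVsStaticFinal {
    // Counts how often the (hypothetical) lookup runs.
    static final AtomicInteger LOOKUPS = new AtomicInteger();

    // Hypothetical stand-in for JarFinder.getJar(ApplicationMaster.class).
    static String locateJar() {
        LOOKUPS.incrementAndGet();
        return "/tmp/appmaster.jar";
    }

    // Instance field: initialized once per object, i.e. once per test instance.
    static class PerInstance {
        protected final String appMasterJar = locateJar();
    }

    // Class-level field: initialized once, when the class is first initialized.
    static class Shared {
        protected static final String APPMASTER_JAR = locateJar();
    }

    public static void main(String[] args) {
        new PerInstance();
        new PerInstance();
        int afterInstances = LOOKUPS.get(); // one lookup per object

        new Shared();
        new Shared();
        int afterShared = LOOKUPS.get();    // only one additional lookup

        System.out.println(afterInstances + " " + afterShared); // prints "2 3"
    }
}
```

Both variants prevent reassignment; the static form additionally avoids repeating the lookup for every test instance, which is safe only as long as the value never needs to differ between tests.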




--
This message was sent by Atlassian JIRA
(v6.2#6252)

[jira] [Commented] (YARN-1521) Mark appropriate protocol methods with the idempotent annotation or AtMostOnce annotation


[ 
https://issues.apache.org/jira/browse/YARN-1521?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13947311#comment-13947311
 ] 

Jian He commented on YARN-1521:
---

In TestApplicationClientProtocolOnHA, each client API call involves a new 
miniYarnCluster start and shutdown. How long does it take for the whole test 
to finish? If that's too long, we can just reuse one miniYarnCluster.




--
This message was sent by Atlassian JIRA
(v6.2#6252)

[jira] [Commented] (YARN-1521) Mark appropriate protocol methods with the idempotent annotation or AtMostOnce annotation


[ 
https://issues.apache.org/jira/browse/YARN-1521?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13947324#comment-13947324
 ] 

Zhijie Shen commented on YARN-1521:
---

bq. If that's too long, we can just reuse one miniYarnCluster.

Some input: It may or may not run into the case I saw in YARN-1873. Before 
reusing a single yarn cluster, it's good to make sure that changes made to 
miniYarnCluster by the current test case will not disturb the subsequent test 
cases.




--
This message was sent by Atlassian JIRA
(v6.2#6252)

[jira] [Commented] (YARN-1521) Mark appropriate protocol methods with the idempotent annotation or AtMostOnce annotation


[ 
https://issues.apache.org/jira/browse/YARN-1521?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13947392#comment-13947392
 ] 

Karthik Kambatla commented on YARN-1521:


Sorry for the delay in getting to this. Haven't looked at the patch itself yet. 

{quote} 
ApplicationMasterProtocol
1. registerApplicationMaster
2. finishApplicationMaster
3. allocate
{quote}
Don't remember the source corresponding to these methods off the top of my 
head, but would think we *should* mark all of them as well. I am okay with 
doing this in a separate JIRA to unblock this.
# Register resets the sequence number - Idempotent. YARN-556 allows a running 
AM to re-register, albeit on resync.
# Finish should be similar to kill - Idempotent. Should be similar to 
killApplication?
# Allocate - Idempotent or AtMostOnce - as the AM heartbeats periodically. 
Preferably Idempotent. 
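
The difference between the two retry semantics named above can be sketched roughly as follows. This is an illustrative toy model, not YARN code: the `Server` class, its fields, and the call-id based de-duplication are all assumptions made for the example. An idempotent call can be re-sent after a failover without changing the outcome, while a non-idempotent call needs the server to recognize the duplicate so that it takes effect at most once.

```java
import java.util.HashSet;
import java.util.Set;

public class RetrySemanticsSketch {
    // Hypothetical server state, for illustration only.
    static class Server {
        int sequenceNumber = 7;
        int containersGranted = 0;
        private final Set<Long> seenCallIds = new HashSet<>();

        // Idempotent: repeating the call leaves the same state.
        void register() {
            sequenceNumber = 0;
        }

        // Not idempotent on its own: each execution changes state further.
        // Remembering call ids makes a re-sent call take effect at most once.
        void grantContainer(long callId) {
            if (!seenCallIds.add(callId)) {
                return; // duplicate delivery of an already-executed call
            }
            containersGranted++;
        }
    }

    public static void main(String[] args) {
        Server server = new Server();
        // Simulate a client that re-sends each call after a failover.
        server.register();
        server.register();
        server.grantContainer(42L);
        server.grantContainer(42L);
        System.out.println(server.sequenceNumber + " " + server.containersGranted); // prints "0 1"
    }
}
```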






--
This message was sent by Atlassian JIRA
(v6.2#6252)

[jira] [Commented] (YARN-1521) Mark appropriate protocol methods with the idempotent annotation or AtMostOnce annotation


[ 
https://issues.apache.org/jira/browse/YARN-1521?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13947399#comment-13947399
 ] 

Karthik Kambatla commented on YARN-1521:


Looked at the non-test source only. Looks good. One nit - it should be okay to 
fix at commit time. 
# Reword to "it checks whether the application already exists"?
{code}
   * <p>During the submission process, it checks whether the application
   * has already exist. If the application exists, it will simply return
   * SubmitApplicationResponse</p>
{code}
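
The behavior described in that javadoc can be sketched as below. This is a simplified illustration under stated assumptions, not the code in the patch: the class, the in-memory map standing in for the RM's application cache, and the string response are all hypothetical. The point is that a re-sent submission for a known application id returns the same response without being counted (or audit-logged) as a new submission.

```java
import java.util.HashMap;
import java.util.Map;

public class IdempotentSubmitSketch {
    // Hypothetical stand-in for the RM's cache of known applications.
    private final Map<String, String> knownApps = new HashMap<>();
    // Counts submissions treated as new (the ones that would be audit-logged).
    private int acceptedSubmissions = 0;

    public synchronized String submitApplication(String applicationId, String user) {
        if (knownApps.containsKey(applicationId)) {
            // Re-sent request, e.g. a client retry after failover:
            // do no new work and return the same kind of response.
            return "SubmitApplicationResponse";
        }
        knownApps.put(applicationId, user);
        acceptedSubmissions++;
        return "SubmitApplicationResponse";
    }

    public synchronized int acceptedSubmissions() {
        return acceptedSubmissions;
    }

    public static void main(String[] args) {
        IdempotentSubmitSketch rm = new IdempotentSubmitSketch();
        rm.submitApplication("application_0001", "alice");
        rm.submitApplication("application_0001", "alice"); // retry of the same call
        System.out.println(rm.acceptedSubmissions()); // prints "1"
    }
}
```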





--
This message was sent by Atlassian JIRA
(v6.2#6252)

[jira] [Updated] (YARN-1521) Mark appropriate protocol methods with the idempotent annotation or AtMostOnce annotation


 [ 
https://issues.apache.org/jira/browse/YARN-1521?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuan Gong updated YARN-1521:


Attachment: YARN-1521.3.patch

 Mark appropriate protocol methods with the idempotent annotation or 
 AtMostOnce annotation
 -

 Key: YARN-1521
 URL: https://issues.apache.org/jira/browse/YARN-1521
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Xuan Gong
Assignee: Xuan Gong
 Attachments: YARN-1521.0.patch, YARN-1521.1.patch, YARN-1521.2.patch, 
 YARN-1521.3.patch


 After YARN-1028, automatic failover was added to RMProxy. This JIRA is 
 to identify whether we need to add the idempotent annotation and which methods 
 can be marked as idempotent.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

[jira] [Commented] (YARN-1521) Mark appropriate protocol methods with the idempotent annotation or AtMostOnce annotation


[ 
https://issues.apache.org/jira/browse/YARN-1521?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13947414#comment-13947414
 ] 

Xuan Gong commented on YARN-1521:
-

bq. In TestApplicationClientProtocolOnHA , each client API call involves a new 
miniYarnCluster start and shutdown, how long does it take for the whole test to 
finish ? If that's too long, we can just reuse one miniYarnCluster.

It takes about 120s to finish all the test cases.

bq. we should at least log this event saying this is an earlier submitted 
application, which is good for debugging.

ADDED

bq. TestProtocolHA - TestProtocolHABase.java

changed








--
This message was sent by Atlassian JIRA
(v6.2#6252)

[jira] [Updated] (YARN-1452) Document the usage of the generic application history and the timeline data service


 [ 
https://issues.apache.org/jira/browse/YARN-1452?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kumar Vavilapalli updated YARN-1452:
--

Attachment: YARN-1452.2.patch

Mostly the same patch as before with grammar fixes.

Will check this in if Jenkins says okay..

 Document the usage of the generic application history and the timeline data 
 service
 ---

 Key: YARN-1452
 URL: https://issues.apache.org/jira/browse/YARN-1452
 Project: Hadoop YARN
  Issue Type: Task
Reporter: Zhijie Shen
Assignee: Zhijie Shen
  Labels: documentation
 Attachments: TimelineServer.html, YARN-1452.1.patch, YARN-1452.2.patch


 We need to write a bunch of documents to guide users, such as command line 
 tools, configurations and REST APIs.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

[jira] [Commented] (YARN-1867) NPE while fetching apps via the REST API


[ 
https://issues.apache.org/jira/browse/YARN-1867?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13947450#comment-13947450
 ] 

Hadoop QA commented on YARN-1867:
-

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  
http://issues.apache.org/jira/secure/attachment/12636787/YARN-1867-20140325-trunk.txt
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 2 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/3458//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/3458//console

This message is automatically generated.




--
This message was sent by Atlassian JIRA
(v6.2#6252)

[jira] [Commented] (YARN-1521) Mark appropriate protocol methods with the idempotent annotation or AtMostOnce annotation


[ 
https://issues.apache.org/jira/browse/YARN-1521?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13947466#comment-13947466
 ] 

Hadoop QA commented on YARN-1521:
-

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12636836/YARN-1521.3.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 6 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager
 hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-tests.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/3459//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/3459//console

This message is automatically generated.




--
This message was sent by Atlassian JIRA
(v6.2#6252)

[jira] [Commented] (YARN-1452) Document the usage of the generic application history and the timeline data service


[ 
https://issues.apache.org/jira/browse/YARN-1452?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13947480#comment-13947480
 ] 

Hadoop QA commented on YARN-1452:
-

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12636842/YARN-1452.2.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+0 tests included{color}.  The patch appears to be a 
documentation patch that doesn't require tests.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/3460//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/3460//console

This message is automatically generated.




--
This message was sent by Atlassian JIRA
(v6.2#6252)

[jira] [Commented] (YARN-1452) Document the usage of the generic application history and the timeline data service


[ 
https://issues.apache.org/jira/browse/YARN-1452?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13947491#comment-13947491
 ] 

Vinod Kumar Vavilapalli commented on YARN-1452:
---

BTW, we will need another doc describing all the REST APIs in the 
timeline-service. Let's file another ticket.

 Document the usage of the generic application history and the timeline data 
 service
 ---

 Key: YARN-1452
 URL: https://issues.apache.org/jira/browse/YARN-1452
 Project: Hadoop YARN
  Issue Type: Task
Reporter: Zhijie Shen
Assignee: Zhijie Shen
  Labels: documentation
 Fix For: 2.4.0

 Attachments: TimelineServer.html, YARN-1452.1.patch, YARN-1452.2.patch


 We need to write a bunch of documents to guide users, such as command line 
 tools, configurations and REST APIs.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

[jira] [Commented] (YARN-1867) NPE while fetching apps via the REST API


[ 
https://issues.apache.org/jira/browse/YARN-1867?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13947492#comment-13947492
 ] 

Vinod Kumar Vavilapalli commented on YARN-1867:
---

Tx Karthik. Let's keep them in, they don't seem risky.

Checking this in for now to unblock the release..




--
This message was sent by Atlassian JIRA
(v6.2#6252)

[jira] [Updated] (YARN-1521) Mark appropriate protocol methods with the idempotent annotation or AtMostOnce annotation


 [ 
https://issues.apache.org/jira/browse/YARN-1521?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kumar Vavilapalli updated YARN-1521:
--

Priority: Blocker  (was: Major)
Target Version/s: 2.4.0

This is a blocker for 2.4. Without it, RM failover will fail in very bad ways.

 Mark appropriate protocol methods with the idempotent annotation or 
 AtMostOnce annotation
 -

 Key: YARN-1521
 URL: https://issues.apache.org/jira/browse/YARN-1521
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Xuan Gong
Assignee: Xuan Gong
Priority: Blocker
 Attachments: YARN-1521.0.patch, YARN-1521.1.patch, YARN-1521.2.patch, 
 YARN-1521.3.patch


 After YARN-1028, automatic failover was added to RMProxy. This JIRA is 
 to identify whether we need to add the idempotent annotation and which methods 
 can be marked as idempotent.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

[jira] [Commented] (YARN-1452) Document the usage of the generic application history and the timeline data service


[ 
https://issues.apache.org/jira/browse/YARN-1452?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13947505#comment-13947505
 ] 

Hudson commented on YARN-1452:
--

SUCCESS: Integrated in Hadoop-trunk-Commit #5402 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/5402/])
YARN-1452. Added documentation about the configuration and usage of generic 
application history and the timeline data service. Contributed by Zhijie Shen. 
(vinodkv: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1581656)
* /hadoop/common/trunk/hadoop-project/src/site/site.xml
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/resources/yarn-default.xml
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/apt/TimelineServer.apt.vm
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/apt/index.apt.vm





--
This message was sent by Atlassian JIRA
(v6.2#6252)

[jira] [Commented] (YARN-1867) NPE while fetching apps via the REST API


[ 
https://issues.apache.org/jira/browse/YARN-1867?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13947504#comment-13947504
 ] 

Hudson commented on YARN-1867:
--

SUCCESS: Integrated in Hadoop-trunk-Commit #5402 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/5402/])
YARN-1867. Fixed a bug in ResourceManager that was causing invalid ACL checks 
in the web-services after fail-over. Contributed by Vinod Kumar Vavilapalli. 
(vinodkv: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1581662)
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/RMContextImpl.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ResourceManager.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/RMWebServices.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/MockRM.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestRMHA.java


 NPE while fetching apps via the REST API
 

 Key: YARN-1867
 URL: https://issues.apache.org/jira/browse/YARN-1867
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.4.0
Reporter: Karthik Kambatla
Assignee: Vinod Kumar Vavilapalli
Priority: Blocker
  Labels: rest_api
 Fix For: 2.4.0

 Attachments: YARN-1867-20140325-trunk.txt, YARN-1867-20140325.txt


 We ran into the following NPE when fetching applications using the REST API:
 {noformat}
 INTERNAL_SERVER_ERROR
 java.lang.NullPointerException
 at 
 org.apache.hadoop.yarn.server.security.ApplicationACLsManager.checkAccess(ApplicationACLsManager.java:104)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.webapp.RMWebServices.hasAccess(RMWebServices.java:123)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.webapp.RMWebServices.getApps(RMWebServices.java:418)
 {noformat}




[jira] [Updated] (YARN-1854) Race condition in TestRMHA#testStartAndTransitions

2014-03-25 Thread Rohith (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-1854?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rohith updated YARN-1854:
-

Description: 
There is a race in the test.
TestRMHA#testStartAndTransitions calls verifyClusterMetrics() immediately after 
the application is submitted, but QueueMetrics are updated only after the app 
attempt is scheduled. Calling verifyClusterMetrics() without first verifying 
that the app attempt is in the Scheduled state causes random test failures.
MockRM.submitApp() returns when the application is in the ACCEPTED state, but 
QueueMetrics are updated at the APP_ATTEMPT_ADDED event. There is a high chance 
of reading the queue metrics before the app attempt is Scheduled.
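
The race described above is usually avoided by polling for the expected state before asserting on the metrics. The following is only a minimal sketch of that idea using plain JDK classes; the class name, the waitFor helper, and the AtomicBoolean flag are hypothetical stand-ins and are not taken from TestRMHA, MockRM, or the attached patch.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.function.BooleanSupplier;

final class WaitForStateSketch {

  /**
   * Polls the condition until it holds or the timeout elapses.
   * Returns true if the condition became true in time, false otherwise.
   */
  public static boolean waitFor(BooleanSupplier condition, long intervalMs, long timeoutMs) {
    long deadline = System.currentTimeMillis() + timeoutMs;
    while (!condition.getAsBoolean()) {
      if (System.currentTimeMillis() >= deadline) {
        return false;
      }
      try {
        Thread.sleep(intervalMs);
      } catch (InterruptedException e) {
        // Restore the interrupt flag and give up waiting.
        Thread.currentThread().interrupt();
        return false;
      }
    }
    return true;
  }

  public static void main(String[] args) throws InterruptedException {
    // Hypothetical stand-in for "the app attempt has reached the Scheduled state".
    AtomicBoolean attemptScheduled = new AtomicBoolean(false);

    // Simulates the asynchronous event that updates the metrics a little later.
    Thread scheduler = new Thread(() -> {
      try {
        Thread.sleep(50);
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
      }
      attemptScheduled.set(true);
    });
    scheduler.start();

    // Verify metrics only after the expected state has been observed.
    boolean ready = waitFor(attemptScheduled::get, 10, 5000);
    System.out.println(ready ? "safe to verify metrics" : "timed out waiting for state");
    scheduler.join();
  }
}
```

In a real test, the condition would check the attempt state through whatever accessor the test harness provides, and the metric assertions would run only when the wait returns true.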




{noformat}
testStartAndTransitions(org.apache.hadoop.yarn.server.resourcemanager.TestRMHA) 
 Time elapsed: 5.883 sec   FAILURE!
java.lang.AssertionError: Incorrect value for metric availableMB 
expected:2048 but was:4096
at org.junit.Assert.fail(Assert.java:93)
at org.junit.Assert.failNotEquals(Assert.java:647)
at org.junit.Assert.assertEquals(Assert.java:128)
at org.junit.Assert.assertEquals(Assert.java:472)
at 
org.apache.hadoop.yarn.server.resourcemanager.TestRMHA.assertMetric(TestRMHA.java:396)
at 
org.apache.hadoop.yarn.server.resourcemanager.TestRMHA.verifyClusterMetrics(TestRMHA.java:387)
at 
org.apache.hadoop.yarn.server.resourcemanager.TestRMHA.testStartAndTransitions(TestRMHA.java:160)


Results :

Failed tests: 
  
TestRMHA.testStartAndTransitions:160-verifyClusterMetrics:387-assertMetric:396
 Incorrect value for metric availableMB expected:2048 but was:4096
{noformat}

  was:
{noformat}
testStartAndTransitions(org.apache.hadoop.yarn.server.resourcemanager.TestRMHA) 
 Time elapsed: 5.883 sec   FAILURE!
java.lang.AssertionError: Incorrect value for metric availableMB 
expected:2048 but was:4096
at org.junit.Assert.fail(Assert.java:93)
at org.junit.Assert.failNotEquals(Assert.java:647)
at org.junit.Assert.assertEquals(Assert.java:128)
at org.junit.Assert.assertEquals(Assert.java:472)
at 
org.apache.hadoop.yarn.server.resourcemanager.TestRMHA.assertMetric(TestRMHA.java:396)
at 
org.apache.hadoop.yarn.server.resourcemanager.TestRMHA.verifyClusterMetrics(TestRMHA.java:387)
at 
org.apache.hadoop.yarn.server.resourcemanager.TestRMHA.testStartAndTransitions(TestRMHA.java:160)


Results :

Failed tests: 
  
TestRMHA.testStartAndTransitions:160-verifyClusterMetrics:387-assertMetric:396
 Incorrect value for metric availableMB expected:2048 but was:4096
{noformat}

Summary: Race condition in TestRMHA#testStartAndTransitions  (was: 
TestRMHA#testStartAndTransitions Fails)

I updated the issue description to match the fix.

 Race condition in TestRMHA#testStartAndTransitions
 --

 Key: YARN-1854
 URL: https://issues.apache.org/jira/browse/YARN-1854
 Project: Hadoop YARN
  Issue Type: Test
Affects Versions: 2.4.0
Reporter: Mit Desai
Assignee: Rohith
Priority: Blocker
 Fix For: 2.4.0

 Attachments: Log.rtf, YARN-1854.1.patch, YARN-1854.patch


 There is a race in the test.
 TestRMHA#testStartAndTransitions calls verifyClusterMetrics() immediately 
 after the application is submitted, but QueueMetrics are updated only after 
 the app attempt is scheduled. Calling verifyClusterMetrics() without first 
 verifying that the app attempt is in the Scheduled state causes random test 
 failures.
 MockRM.submitApp() returns when the application is in the ACCEPTED state, but 
 QueueMetrics are updated at the APP_ATTEMPT_ADDED event. There is a high 
 chance of reading the queue metrics before the app attempt is Scheduled.
 {noformat}
 testStartAndTransitions(org.apache.hadoop.yarn.server.resourcemanager.TestRMHA)
   Time elapsed: 5.883 sec   FAILURE!
 java.lang.AssertionError: Incorrect value for metric availableMB 
 expected:2048 but was:4096
   at org.junit.Assert.fail(Assert.java:93)
   at org.junit.Assert.failNotEquals(Assert.java:647)
   at org.junit.Assert.assertEquals(Assert.java:128)
   at org.junit.Assert.assertEquals(Assert.java:472)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.TestRMHA.assertMetric(TestRMHA.java:396)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.TestRMHA.verifyClusterMetrics(TestRMHA.java:387)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.TestRMHA.testStartAndTransitions(TestRMHA.java:160)
 Results :
 Failed tests: 
   
 TestRMHA.testStartAndTransitions:160-verifyClusterMetrics:387-assertMetric:396
  Incorrect value for metric availableMB expected:2048 but was:4096
 {noformat}




[jira] [Created] (YARN-1876) Document the REST APIs of timeline and generic history services

Zhijie Shen created YARN-1876:
-

 Summary: Document the REST APIs of timeline and generic history 
services
 Key: YARN-1876
 URL: https://issues.apache.org/jira/browse/YARN-1876
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Zhijie Shen
Assignee: Zhijie Shen







[jira] [Updated] (YARN-1876) Document the REST APIs of timeline and generic history services


 [ 
https://issues.apache.org/jira/browse/YARN-1876?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhijie Shen updated YARN-1876:
--

Labels: documentaion  (was: )

 Document the REST APIs of timeline and generic history services
 ---

 Key: YARN-1876
 URL: https://issues.apache.org/jira/browse/YARN-1876
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Zhijie Shen
Assignee: Zhijie Shen
  Labels: documentaion






[jira] [Commented] (YARN-1854) Race condition in TestRMHA#testStartAndTransitions

2014-03-25 Thread Rohith (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-1854?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13947510#comment-13947510
 ] 

Rohith commented on YARN-1854:
--

bq. Rohith : The logs that I have submitted already has the 5secs timeout 
change.
I have doubts about this change, because the test failure does not match the 
new code change.

In any case, we need to find the real cause of the test failure. It hinges on 
the answer to this question: why is the NODE_ADDED event triggered before the 
NODE_REMOVED event in RMNodeImpl#ReconnectNodeTransition?



 Race condition in TestRMHA#testStartAndTransitions
 --

 Key: YARN-1854
 URL: https://issues.apache.org/jira/browse/YARN-1854
 Project: Hadoop YARN
  Issue Type: Test
Affects Versions: 2.4.0
Reporter: Mit Desai
Assignee: Rohith
Priority: Blocker
 Fix For: 2.4.0

 Attachments: Log.rtf, YARN-1854.1.patch, YARN-1854.patch


 There is a race in the test.
 TestRMHA#testStartAndTransitions calls verifyClusterMetrics() immediately 
 after the application is submitted, but QueueMetrics are updated only after 
 the app attempt is scheduled. Calling verifyClusterMetrics() without first 
 verifying that the app attempt is in the Scheduled state causes random test 
 failures.
 MockRM.submitApp() returns when the application is in the ACCEPTED state, but 
 QueueMetrics are updated at the APP_ATTEMPT_ADDED event. There is a high 
 chance of reading the queue metrics before the app attempt is Scheduled.
 {noformat}
 testStartAndTransitions(org.apache.hadoop.yarn.server.resourcemanager.TestRMHA)
   Time elapsed: 5.883 sec   FAILURE!
 java.lang.AssertionError: Incorrect value for metric availableMB 
 expected:2048 but was:4096
   at org.junit.Assert.fail(Assert.java:93)
   at org.junit.Assert.failNotEquals(Assert.java:647)
   at org.junit.Assert.assertEquals(Assert.java:128)
   at org.junit.Assert.assertEquals(Assert.java:472)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.TestRMHA.assertMetric(TestRMHA.java:396)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.TestRMHA.verifyClusterMetrics(TestRMHA.java:387)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.TestRMHA.testStartAndTransitions(TestRMHA.java:160)
 Results :
 Failed tests: 
   
 TestRMHA.testStartAndTransitions:160-verifyClusterMetrics:387-assertMetric:396
  Incorrect value for metric availableMB expected:2048 but was:4096
 {noformat}




[jira] [Commented] (YARN-1521) Mark appropriate protocol methods with the idempotent annotation or AtMostOnce annotation


[ 
https://issues.apache.org/jira/browse/YARN-1521?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13947511#comment-13947511
 ] 

Jian He commented on YARN-1521:
---

LGTM + 1.
Regarding ApplicationMasterProtocol: since applications today are killed after 
the RM restarts anyway, there is no point in adding the annotations now. We can 
add them in YARN-556.

 Mark appropriate protocol methods with the idempotent annotation or 
 AtMostOnce annotation
 -

 Key: YARN-1521
 URL: https://issues.apache.org/jira/browse/YARN-1521
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Xuan Gong
Assignee: Xuan Gong
Priority: Blocker
 Attachments: YARN-1521.0.patch, YARN-1521.1.patch, YARN-1521.2.patch, 
 YARN-1521.3.patch


 After YARN-1028, automatic failover was added to RMProxy. This JIRA is 
 to identify whether we need to add the idempotent annotation and which methods 
 can be marked as idempotent.
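
To illustrate why such annotations matter to a failover proxy, here is a minimal sketch. It does not use the Hadoop classes: the Idempotent and AtMostOnce annotations, the ClientProtocol interface, and the safeToRetryBlindly helper below are hypothetical stand-ins defined only for this example. The assumption is that a proxy may re-send a call to the new active RM without further safeguards only when the method is marked idempotent.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Method;

final class IdempotentSketch {

  /** Hypothetical stand-in: the call can be repeated any number of times safely. */
  @Retention(RetentionPolicy.RUNTIME)
  @Target(ElementType.METHOD)
  @interface Idempotent {}

  /** Hypothetical stand-in: the call must not be blindly executed twice. */
  @Retention(RetentionPolicy.RUNTIME)
  @Target(ElementType.METHOD)
  @interface AtMostOnce {}

  /** Hypothetical protocol, loosely modeled on a client-facing RPC interface. */
  interface ClientProtocol {
    @Idempotent
    String getApplications();

    @AtMostOnce
    String submitApplication(String appId);

    String unmarked();
  }

  /** Returns true only if the named protocol method carries the Idempotent marker. */
  public static boolean safeToRetryBlindly(String methodName) {
    for (Method m : ClientProtocol.class.getDeclaredMethods()) {
      if (m.getName().equals(methodName)) {
        return m.isAnnotationPresent(Idempotent.class);
      }
    }
    throw new IllegalArgumentException("unknown method: " + methodName);
  }

  public static void main(String[] args) {
    for (Method m : ClientProtocol.class.getDeclaredMethods()) {
      System.out.println(m.getName() + " -> retry blindly: " + safeToRetryBlindly(m.getName()));
    }
  }
}
```

A read-only call such as listing applications is a natural candidate for the idempotent marker, while a state-changing call such as a submission needs either duplicate detection on the server side or the more restrictive marker.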




[jira] [Created] (YARN-1877) ZK store: Add yarn.resourcemanager.zk-state-store.root-node.auth for root node auth

Karthik Kambatla created YARN-1877:
--

 Summary: ZK store: Add 
yarn.resourcemanager.zk-state-store.root-node.auth for root node auth
 Key: YARN-1877
 URL: https://issues.apache.org/jira/browse/YARN-1877
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Affects Versions: 2.3.0
Reporter: Karthik Kambatla
Assignee: Karthik Kambatla
Priority: Critical







[jira] [Commented] (YARN-1521) Mark appropriate protocol methods with the idempotent annotation or AtMostOnce annotation


[ 
https://issues.apache.org/jira/browse/YARN-1521?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13947512#comment-13947512
 ] 

Jian He commented on YARN-1521:
---

checking this in.

 Mark appropriate protocol methods with the idempotent annotation or 
 AtMostOnce annotation
 -

 Key: YARN-1521
 URL: https://issues.apache.org/jira/browse/YARN-1521
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Xuan Gong
Assignee: Xuan Gong
Priority: Blocker
 Attachments: YARN-1521.0.patch, YARN-1521.1.patch, YARN-1521.2.patch, 
 YARN-1521.3.patch


 After YARN-1028, automatic failover was added to RMProxy. This JIRA is 
 to identify whether we need to add the idempotent annotation and which methods 
 can be marked as idempotent.




[jira] [Created] (YARN-1878) Yarn standby RM taking long to transition to active

2014-03-25 Thread Arpit Gupta (JIRA)

Arpit Gupta created YARN-1878:
-

 Summary: Yarn standby RM taking long to transition to active
 Key: YARN-1878
 URL: https://issues.apache.org/jira/browse/YARN-1878
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 2.4.0
Reporter: Arpit Gupta
Assignee: Xuan Gong


In our HA tests we are noticing that it can sometimes take up to 10s for the 
standby RM to transition to active.




[jira] [Commented] (YARN-1878) Yarn standby RM taking long to transition to active

2014-03-25 Thread Arpit Gupta (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-1878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13947513#comment-13947513
 ] 

Arpit Gupta commented on YARN-1878:
---

In comparison, the HDFS NameNode can usually transition much faster than 10s.

 Yarn standby RM taking long to transition to active
 ---

 Key: YARN-1878
 URL: https://issues.apache.org/jira/browse/YARN-1878
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 2.4.0
Reporter: Arpit Gupta
Assignee: Xuan Gong

 In our HA tests we are noticing that it can sometimes take up to 10s for the 
 standby RM to transition to active.




[jira] [Commented] (YARN-1870) FileInputStream is not closed in ProcfsBasedProcessTree#constructProcessSMAPInfo()

2014-03-25 Thread Fengdong Yu (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-1870?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13947515#comment-13947515
 ] 

Fengdong Yu commented on YARN-1870:
---

[~vinodkv], can you add me as a YARN contributor? I am already an HDFS 
contributor, so I don't think we need to send mail to the secretary as well.

 FileInputStream is not closed in 
 ProcfsBasedProcessTree#constructProcessSMAPInfo()
 --

 Key: YARN-1870
 URL: https://issues.apache.org/jira/browse/YARN-1870
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Ted Yu
Priority: Minor
 Attachments: YARN-1870.patch


 {code}
   List<String> lines = IOUtils.readLines(new FileInputStream(file));
 {code}
 FileInputStream is not closed.
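
One common way to avoid this kind of leak is a try-with-resources block, which closes the stream even when reading throws. The sketch below uses only JDK classes rather than the IOUtils helper from the snippet above; the class name and the readLines method are hypothetical, and this is not the content of the attached patch.

```java
import java.io.BufferedReader;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class ReadLinesSketch {

  /** Reads all lines of the file; the underlying stream is closed on every exit path. */
  public static List<String> readLines(String file) throws IOException {
    List<String> lines = new ArrayList<>();
    // Closing the BufferedReader also closes the wrapped FileInputStream.
    try (BufferedReader reader = new BufferedReader(
        new InputStreamReader(new FileInputStream(file), StandardCharsets.UTF_8))) {
      String line;
      while ((line = reader.readLine()) != null) {
        lines.add(line);
      }
    }
    return lines;
  }

  public static void main(String[] args) throws IOException {
    // Temporary file with made-up content, used only to exercise the method.
    Path tmp = Files.createTempFile("readlines-sketch", ".txt");
    Files.write(tmp, Arrays.asList("first line", "second line"));
    System.out.println(readLines(tmp.toString()));
    Files.delete(tmp);
  }
}
```

The same pattern applies when a helper such as IOUtils.readLines is kept: open the FileInputStream in the try header and pass the variable to the helper.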




[jira] [Updated] (YARN-1876) Document the REST APIs of timeline and generic history services


 [ 
https://issues.apache.org/jira/browse/YARN-1876?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kumar Vavilapalli updated YARN-1876:
--

Issue Type: Improvement  (was: Bug)

 Document the REST APIs of timeline and generic history services
 ---

 Key: YARN-1876
 URL: https://issues.apache.org/jira/browse/YARN-1876
 Project: Hadoop YARN
  Issue Type: Improvement
Reporter: Zhijie Shen
Assignee: Zhijie Shen
  Labels: documentaion






[jira] [Commented] (YARN-1521) Mark appropriate protocol methods with the idempotent annotation or AtMostOnce annotation


[ 
https://issues.apache.org/jira/browse/YARN-1521?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13947525#comment-13947525
 ] 

Hudson commented on YARN-1521:
--

SUCCESS: Integrated in Hadoop-trunk-Commit #5404 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/5404/])
YARN-1521. Mark Idempotent/AtMostOnce annotations to the APIs in 
ApplicationClientProtocol, ResourceManagerAdministrationProtocol and 
ResourceTrackerProtocol so that they work in HA scenario. Contributed by Xuan 
Gong (jianhe: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1581678)
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/ApplicationClientProtocol.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/server/api/ResourceManagerAdministrationProtocol.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/ProtocolHATestBase.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/TestApplicationClientProtocolOnHA.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/TestResourceTrackerOnHA.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/api/ResourceTracker.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ClientRMService.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/RMAppImpl.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestClientRMService.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestSubmitApplicationWithRMHA.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-tests/src/test/java/org/apache/hadoop/yarn/server/MiniYARNCluster.java


 Mark appropriate protocol methods with the idempotent annotation or 
 AtMostOnce annotation
 -

 Key: YARN-1521
 URL: https://issues.apache.org/jira/browse/YARN-1521
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Xuan Gong
Assignee: Xuan Gong
Priority: Blocker
 Fix For: 2.4.0

 Attachments: YARN-1521.0.patch, YARN-1521.1.patch, YARN-1521.2.patch, 
 YARN-1521.3.patch


 After YARN-1028, automatic failover was added to RMProxy. This JIRA is 
 to identify whether we need to add the idempotent annotation and which methods 
 can be marked as idempotent.




[jira] [Commented] (YARN-1878) Yarn standby RM taking long to transition to active


[ 
https://issues.apache.org/jira/browse/YARN-1878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13947528#comment-13947528
 ] 

Xuan Gong commented on YARN-1878:
-

ActiveStandbyElector#process waits for zkSessionTimeout (10s by default) 
before processing the event. I think this is why it takes so long for the 
standby RM to transition to active.
To address this, we can simply change the default value to 5s, the value used 
in HDFS. The timeout can also be adjusted by setting 
yarn.resourcemanager.zk-timeout-ms in yarn-site.xml.

 Yarn standby RM taking long to transition to active
 ---

 Key: YARN-1878
 URL: https://issues.apache.org/jira/browse/YARN-1878
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 2.4.0
Reporter: Arpit Gupta
Assignee: Xuan Gong

 In our HA tests we are noticing that it can sometimes take up to 10s for the 
 standby RM to transition to active.




[jira] [Updated] (YARN-1878) Yarn standby RM taking long to transition to active


 [ 
https://issues.apache.org/jira/browse/YARN-1878?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuan Gong updated YARN-1878:


Attachment: YARN-1878.1.patch

Changed the default value from 10s to 5s.
This is a trivial patch, so no test case was added.

 Yarn standby RM taking long to transition to active
 ---

 Key: YARN-1878
 URL: https://issues.apache.org/jira/browse/YARN-1878
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 2.4.0
Reporter: Arpit Gupta
Assignee: Xuan Gong
 Attachments: YARN-1878.1.patch


 In our HA tests we are noticing that it can sometimes take up to 10s for the 
 standby RM to transition to active.




[jira] [Commented] (YARN-1878) Yarn standby RM taking long to transition to active


[ 
https://issues.apache.org/jira/browse/YARN-1878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13947543#comment-13947543
 ] 

Hadoop QA commented on YARN-1878:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12636856/YARN-1878.1.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:red}-1 tests included{color}.  The patch doesn't appear to include 
any new or modified tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/3461//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/3461//console

This message is automatically generated.

 Yarn standby RM taking long to transition to active
 ---

 Key: YARN-1878
 URL: https://issues.apache.org/jira/browse/YARN-1878
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 2.4.0
Reporter: Arpit Gupta
Assignee: Xuan Gong
 Attachments: YARN-1878.1.patch


 In our HA tests we are noticing that it can sometimes take up to 10s for the 
 standby RM to transition to active.




[jira] [Commented] (YARN-1878) Yarn standby RM taking long to transition to active


[ 
https://issues.apache.org/jira/browse/YARN-1878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13947550#comment-13947550
 ] 

Jian He commented on YARN-1878:
---

Xuan, did you verify on a real cluster that the transition actually gets 
faster after making this change?

 Yarn standby RM taking long to transition to active
 ---

 Key: YARN-1878
 URL: https://issues.apache.org/jira/browse/YARN-1878
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 2.4.0
Reporter: Arpit Gupta
Assignee: Xuan Gong
 Attachments: YARN-1878.1.patch


 In our HA tests we are noticing that it can sometimes take up to 10s for the 
 standby RM to transition to active.




[jira] [Commented] (YARN-1878) Yarn standby RM taking long to transition to active

2014-03-25 Thread Tsuyoshi OZAWA (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-1878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13947558#comment-13947558
 ] 

Tsuyoshi OZAWA commented on YARN-1878:
--

Xuan, I'd like to clarify one point: do you intend to speed up 
ProtocolHATestBase? 

 Yarn standby RM taking long to transition to active
 ---

 Key: YARN-1878
 URL: https://issues.apache.org/jira/browse/YARN-1878
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 2.4.0
Reporter: Arpit Gupta
Assignee: Xuan Gong
 Attachments: YARN-1878.1.patch


 In our HA tests we are noticing that it can sometimes take up to 10s for the 
 standby RM to transition to active.




[jira] [Updated] (YARN-1876) Document the REST APIs of timeline and generic history services


 [ 
https://issues.apache.org/jira/browse/YARN-1876?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhijie Shen updated YARN-1876:
--

Attachment: YARN-1876.1.patch

Uploaded a half-done patch that contains the documentation of the generic 
history service APIs.

 Document the REST APIs of timeline and generic history services
 ---

 Key: YARN-1876
 URL: https://issues.apache.org/jira/browse/YARN-1876
 Project: Hadoop YARN
  Issue Type: Improvement
Reporter: Zhijie Shen
Assignee: Zhijie Shen
  Labels: documentaion
 Attachments: YARN-1876.1.patch







[jira] [Commented] (YARN-1878) Yarn standby RM taking long to transition to active


[ 
https://issues.apache.org/jira/browse/YARN-1878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13947568#comment-13947568
 ] 

Xuan Gong commented on YARN-1878:
-

bq. Xuan, did you verify in real cluster that this transition does get faster 
after making this change?

Yes, verified.

 Yarn standby RM taking long to transition to active
 ---

 Key: YARN-1878
 URL: https://issues.apache.org/jira/browse/YARN-1878
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 2.4.0
Reporter: Arpit Gupta
Assignee: Xuan Gong
 Attachments: YARN-1878.1.patch


 In our HA tests we are noticing that it can sometimes take up to 10s for the 
 standby RM to transition to active.




[jira] [Created] (YARN-1879) Mark Idempotent/AtMostOnce annotations to ApplicationMasterProtocol and add tests for it

Jian He created YARN-1879:
-

 Summary: Mark Idempotent/AtMostOnce annotations to 
ApplicationMasterProtocol and add tests for it
 Key: YARN-1879
 URL: https://issues.apache.org/jira/browse/YARN-1879
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Jian He







[jira] [Commented] (YARN-1878) Yarn standby RM taking long to transition to active


[ 
https://issues.apache.org/jira/browse/YARN-1878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13947569#comment-13947569
 ] 

Xuan Gong commented on YARN-1878:
-

bq. Xuan, I'd like to clarify one point: do you intend to speedup 
ProtocolHATestBase?

Sorry, I don't follow. This is not about the test case; it happens on a real 
cluster.

 Yarn standby RM taking long to transition to active
 ---

 Key: YARN-1878
 URL: https://issues.apache.org/jira/browse/YARN-1878
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 2.4.0
Reporter: Arpit Gupta
Assignee: Xuan Gong
 Attachments: YARN-1878.1.patch


 In our HA tests we are noticing that it can sometimes take up to 10s for the 
 standby RM to transition to active.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

[jira] [Commented] (YARN-1878) Yarn standby RM taking long to transition to active

2014-03-25 Thread Fengdong Yu (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-1878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13947575#comment-13947575
 ] 

Fengdong Yu commented on YARN-1878:
---

+1 for the patch. 

HDFS failover is also 5s by default.


 Yarn standby RM taking long to transition to active
 ---

 Key: YARN-1878
 URL: https://issues.apache.org/jira/browse/YARN-1878
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 2.4.0
Reporter: Arpit Gupta
Assignee: Xuan Gong
 Attachments: YARN-1878.1.patch


 In our HA tests we are noticing that it can sometimes take up to 10s for the 
 standby RM to transition to active.




[jira] [Updated] (YARN-1809) Synchronize RM and Generic History Service Web-UIs


 [ 
https://issues.apache.org/jira/browse/YARN-1809?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhijie Shen updated YARN-1809:
--

Attachment: YARN-1809.7.patch

Rebased the patch after YARN-1521. Also, per Mayank's comments, moved the 
DT-related methods to ApplicationBaseProtocol. Since the history service has 
not implemented the DT-related methods, it should be safe to mark them as 
\@Idempotent as well (repeating them has no effect).

 Synchronize RM and Generic History Service Web-UIs
 --

 Key: YARN-1809
 URL: https://issues.apache.org/jira/browse/YARN-1809
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Zhijie Shen
Assignee: Zhijie Shen
 Attachments: YARN-1809.1.patch, YARN-1809.2.patch, YARN-1809.3.patch, 
 YARN-1809.4.patch, YARN-1809.5.patch, YARN-1809.5.patch, YARN-1809.6.patch, 
 YARN-1809.7.patch


 After YARN-953, the web UI of the generic history service provides more 
 information than that of the RM, namely the details about app attempts and 
 containers. It would be good to provide similar web UIs that retrieve their 
 data from separate sources, i.e., the RM cache and the history store 
 respectively.




[jira] [Updated] (YARN-1879) Mark Idempotent/AtMostOnce annotations to ApplicationMasterProtocol