[jira] [Commented] (YARN-1527) yarn rmadmin command prints wrong usage info:
[ https://issues.apache.org/jira/browse/YARN-1527?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13857375#comment-13857375 ]

Akira AJISAKA commented on YARN-1527:

Thank you for reviewing and committing, [~jianhe]!

yarn rmadmin command prints wrong usage info:
Key: YARN-1527
URL: https://issues.apache.org/jira/browse/YARN-1527
Project: Hadoop YARN
Issue Type: Bug
Affects Versions: 2.2.0
Reporter: Jian He
Assignee: Akira AJISAKA
Labels: newbie
Fix For: 2.4.0
Attachments: YARN-1527.patch

The usage should be yarn rmadmin, instead of java RMAdmin, and the -refreshQueues should be in the second line.

{code}
Usage: java RMAdmin
   -refreshQueues
   -refreshNodes
   -refreshSuperUserGroupsConfiguration
   -refreshUserToGroupsMappings
   -refreshAdminAcls
   -refreshServiceAcl
   -getGroups [username]
   -help [cmd]
   -transitionToActive serviceId
   -transitionToStandby serviceId
   -failover [--forcefence] [--forceactive] serviceId serviceId
   -getServiceState serviceId
   -checkHealth serviceId
{code}

--
This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (YARN-1474) Make schedulers services
[ https://issues.apache.org/jira/browse/YARN-1474?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tsuyoshi OZAWA updated YARN-1474:

Attachment: YARN-1474.1.patch

Created a patch based on YARN-1172's approach.

Make schedulers services
Key: YARN-1474
URL: https://issues.apache.org/jira/browse/YARN-1474
Project: Hadoop YARN
Issue Type: Sub-task
Components: scheduler
Affects Versions: 2.4.0
Reporter: Sandy Ryza
Assignee: Tsuyoshi OZAWA
Attachments: YARN-1474.1.patch

Schedulers currently have a reinitialize method but no start and stop. Fitting them into the YARN service model would make things more coherent.

--
This message was sent by Atlassian JIRA (v6.1.5#6160)
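To make the "service model" concrete: YARN services move through an init/start/stop lifecycle. The following is a minimal, self-contained sketch of that idea, not the real org.apache.hadoop.service.AbstractService (whose actual hooks are serviceInit/serviceStart/serviceStop); SketchService and SketchScheduler are invented names for illustration.

```java
// Lifecycle states, in the order a service moves through them.
enum State { NOTINITED, INITED, STARTED, STOPPED }

// Hypothetical stand-in for the YARN service base class.
abstract class SketchService {
    private State state = State.NOTINITED;
    public final void init()  { serviceInit();  state = State.INITED;  }
    public final void start() { serviceStart(); state = State.STARTED; }
    public final void stop()  { serviceStop();  state = State.STOPPED; }
    public State getState()   { return state; }
    // Subclasses override these hooks instead of a single ad-hoc reinitialize().
    protected void serviceInit()  {}
    protected void serviceStart() {}
    protected void serviceStop()  {}
}

// A scheduler expressed as a service: config loading belongs in init,
// thread/dispatcher startup in start, teardown in stop.
class SketchScheduler extends SketchService {
    boolean configLoaded, dispatcherRunning;
    @Override protected void serviceInit()  { configLoaded = true; }
    @Override protected void serviceStart() { dispatcherRunning = true; }
    @Override protected void serviceStop()  { dispatcherRunning = false; }
}
```

The payoff of the refactoring is that a composite parent (like the RM) can init/start/stop all its children uniformly instead of each component exposing its own lifecycle convention.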
[jira] [Commented] (YARN-1540) Add an easy way to turn on HA
[ https://issues.apache.org/jira/browse/YARN-1540?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13857380#comment-13857380 ]

Tsuyoshi OZAWA commented on YARN-1540:

+1. As I mentioned on the yarn-dev ML, we currently need to set lots of configs, like this: https://gist.github.com/oza/7055279

Add an easy way to turn on HA
Key: YARN-1540
URL: https://issues.apache.org/jira/browse/YARN-1540
Project: Hadoop YARN
Issue Type: Sub-task
Components: resourcemanager
Affects Versions: 2.4.0
Reporter: Karthik Kambatla

Users will have to modify the configuration significantly to turn on HA. It would be nice to have a simpler way of doing this.

--
This message was sent by Atlassian JIRA (v6.1.5#6160)
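For a sense of the configuration burden being discussed, this is roughly the minimum yarn-site.xml for a two-RM HA setup. The property names are the ones Hadoop 2.x uses; the cluster id, rm ids, hostnames, and ZooKeeper quorum are placeholder values.

```xml
<!-- Sketch of a minimal two-node RM HA config; hostnames are placeholders. -->
<property><name>yarn.resourcemanager.ha.enabled</name><value>true</value></property>
<property><name>yarn.resourcemanager.cluster-id</name><value>cluster1</value></property>
<property><name>yarn.resourcemanager.ha.rm-ids</name><value>rm1,rm2</value></property>
<property><name>yarn.resourcemanager.hostname.rm1</name><value>rm1.example.com</value></property>
<property><name>yarn.resourcemanager.hostname.rm2</name><value>rm2.example.com</value></property>
<property><name>yarn.resourcemanager.zk-address</name><value>zk1.example.com:2181</value></property>
```

Beyond these, a real deployment also duplicates the per-RM service addresses (scheduler, admin, webapp, and so on) for each rm-id, which is what makes the gist linked above so long and why a simpler switch is attractive.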
[jira] [Updated] (YARN-1481) Move internal services logic from AdminService to ResourceManager
[ https://issues.apache.org/jira/browse/YARN-1481?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Karthik Kambatla updated YARN-1481:

Attachment: yarn-1481-addendum.patch

Addendum patch that fixes the synchronization for AdminService#isRMActive.

Move internal services logic from AdminService to ResourceManager
Key: YARN-1481
URL: https://issues.apache.org/jira/browse/YARN-1481
Project: Hadoop YARN
Issue Type: Sub-task
Reporter: Vinod Kumar Vavilapalli
Assignee: Vinod Kumar Vavilapalli
Fix For: 2.4.0
Attachments: YARN-1481-20131207.txt, YARN-1481-20131209.txt, yarn-1481-addendum.patch

This is something I found while reviewing YARN-1318, but didn't halt that patch as many cycles had already gone into it. Some top-level issues:
- Not easy to follow the RM's service life cycle
-- RM adds only AdminService as its service directly.
-- Other services are added to the RM when AdminService's init calls RM.activeServices.init()
- Overall, AdminService shouldn't encompass all of RM's HA state management. It was originally supposed to be the implementation of just the RPC server.

--
This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Reopened] (YARN-1481) Move internal services logic from AdminService to ResourceManager
[ https://issues.apache.org/jira/browse/YARN-1481?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Karthik Kambatla reopened YARN-1481:

Re-opening to fix the synchronization.

Move internal services logic from AdminService to ResourceManager
Key: YARN-1481
URL: https://issues.apache.org/jira/browse/YARN-1481
Project: Hadoop YARN
Issue Type: Sub-task
Reporter: Vinod Kumar Vavilapalli
Assignee: Vinod Kumar Vavilapalli
Fix For: 2.4.0
Attachments: YARN-1481-20131207.txt, YARN-1481-20131209.txt, yarn-1481-addendum.patch

This is something I found while reviewing YARN-1318, but didn't halt that patch as many cycles had already gone into it. Some top-level issues:
- Not easy to follow the RM's service life cycle
-- RM adds only AdminService as its service directly.
-- Other services are added to the RM when AdminService's init calls RM.activeServices.init()
- Overall, AdminService shouldn't encompass all of RM's HA state management. It was originally supposed to be the implementation of just the RPC server.

--
This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (YARN-1029) Allow embedding leader election into the RM
[ https://issues.apache.org/jira/browse/YARN-1029?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13857386#comment-13857386 ]

Karthik Kambatla commented on YARN-1029:

bq. Please take care of it wherever appropriate.
Re-opened YARN-1481 to take care of it there. If it isn't too much trouble, please take a look at it.

bq. Again, if we organize the newly added code such that it's a common event for any module to inform the RM about a fatal error then we are good for the future. Embedded elector can use that event instead of a custom named event.
Oh! I understand it now - will add an RMFatalErrorEvent, the handler for which just terminates the RM, and update RMStateStoreOperationFailedEvent to use that event instead of calling terminate directly.

bq. I am sorry I could not understand your comment explaining how the test passes with these timeouts.
# The ZK timeout comes from RM_ZK_TIMEOUT_MS (2 seconds); the failover could take as long as this. MiniYARNCluster#getActiveRMIndex() waits for this duration to find the active RM.
# The NM-RM connection is verified after a successful failover. The timeout there corresponds to the maximum time taken by failovers until the NM connects to an RM. 5 seconds seems long enough for this.

Allow embedding leader election into the RM
Key: YARN-1029
URL: https://issues.apache.org/jira/browse/YARN-1029
Project: Hadoop YARN
Issue Type: Sub-task
Reporter: Bikas Saha
Assignee: Karthik Kambatla
Attachments: embedded-zkfc-approach.patch, yarn-1029-0.patch, yarn-1029-0.patch, yarn-1029-1.patch, yarn-1029-2.patch, yarn-1029-3.patch, yarn-1029-4.patch, yarn-1029-approach.patch

It should be possible to embed the common ActiveStandbyElector into the RM such that ZooKeeper-based leader election and notification is built in. In conjunction with a ZK state store, this configuration will be a simple deployment option.

--
This message was sent by Atlassian JIRA (v6.1.5#6160)
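The "common fatal-error event" idea in the comment above can be sketched as follows. Only the name RMFatalErrorEvent comes from the comment; the class shapes, the FatalEventDispatcher name, and the terminated flag (standing in for actually exiting the process) are assumptions for illustration, not the real Hadoop classes.

```java
// Hypothetical sketch: one event type that any RM module (state store,
// embedded elector, ...) can post when it hits an unrecoverable error.
class RMFatalErrorEvent {
    final String source;       // which module raised the error
    final String diagnostics;  // human-readable cause
    RMFatalErrorEvent(String source, String diagnostics) {
        this.source = source;
        this.diagnostics = diagnostics;
    }
}

// A single handler owns the decision to shut the RM down, so modules
// never call terminate directly.
class FatalEventDispatcher {
    boolean terminated = false;  // stands in for terminating the JVM
    void handle(RMFatalErrorEvent e) {
        System.err.println("Fatal error from " + e.source + ": " + e.diagnostics);
        terminated = true;
    }
}
```

The design point is decoupling: modules report the fact ("I hit a fatal error"), and the policy (terminate, fail over, log) lives in one place, which is why the elector and the state store can share the same event.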
[jira] [Commented] (YARN-1481) Move internal services logic from AdminService to ResourceManager
[ https://issues.apache.org/jira/browse/YARN-1481?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13857390#comment-13857390 ]

Hadoop QA commented on YARN-1481:

{color:red}-1 overall{color}. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12620598/yarn-1481-addendum.patch
against trunk revision .

{color:red}-1 patch{color}. Trunk compilation may be broken.

Console output: https://builds.apache.org/job/PreCommit-YARN-Build/2737//console

This message is automatically generated.

Move internal services logic from AdminService to ResourceManager
Key: YARN-1481
URL: https://issues.apache.org/jira/browse/YARN-1481
Project: Hadoop YARN
Issue Type: Sub-task
Reporter: Vinod Kumar Vavilapalli
Assignee: Vinod Kumar Vavilapalli
Fix For: 2.4.0
Attachments: YARN-1481-20131207.txt, YARN-1481-20131209.txt, yarn-1481-addendum.patch

This is something I found while reviewing YARN-1318, but didn't halt that patch as many cycles had already gone into it. Some top-level issues:
- Not easy to follow the RM's service life cycle
-- RM adds only AdminService as its service directly.
-- Other services are added to the RM when AdminService's init calls RM.activeServices.init()
- Overall, AdminService shouldn't encompass all of RM's HA state management. It was originally supposed to be the implementation of just the RPC server.

--
This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (YARN-1474) Make schedulers services
[ https://issues.apache.org/jira/browse/YARN-1474?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13857394#comment-13857394 ]

Hadoop QA commented on YARN-1474:

{color:green}+1 overall{color}. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12620597/YARN-1474.1.patch
against trunk revision .

{color:green}+1 @author{color}. The patch does not contain any @author tags.
{color:green}+1 tests included{color}. The patch appears to include 10 new or modified test files.
{color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.
{color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages.
{color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse.
{color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings.
{color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings.
{color:green}+1 core tests{color}. The patch passed unit tests in hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.
{color:green}+1 contrib tests{color}. The patch passed contrib unit tests.

Test results: https://builds.apache.org/job/PreCommit-YARN-Build/2736//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/2736//console

This message is automatically generated.

Make schedulers services
Key: YARN-1474
URL: https://issues.apache.org/jira/browse/YARN-1474
Project: Hadoop YARN
Issue Type: Sub-task
Components: scheduler
Affects Versions: 2.4.0
Reporter: Sandy Ryza
Assignee: Tsuyoshi OZAWA
Attachments: YARN-1474.1.patch

Schedulers currently have a reinitialize method but no start and stop. Fitting them into the YARN service model would make things more coherent.

--
This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (YARN-647) historyServer can't show container's log when aggregation is not enabled
[ https://issues.apache.org/jira/browse/YARN-647?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13857414#comment-13857414 ]

shenhong commented on YARN-647:

Thanks Zhijie! Like caolong, we also set yarn.nodemanager.log.retain-seconds=259200, so NM local logs won't be deleted after the container stops. I think if yarn.log-aggregation-enable=false and yarn.nodemanager.log.retain-seconds > 0, we can change the logsLink.

historyServer can't show container's log when aggregation is not enabled
Key: YARN-647
URL: https://issues.apache.org/jira/browse/YARN-647
Project: Hadoop YARN
Issue Type: Improvement
Affects Versions: 0.23.7, 2.0.4-alpha, 2.2.0
Environment: yarn.log-aggregation-enable=false; the HistoryServer will show: "Aggregation is not enabled. Try the nodemanager at hd13-vm1:34669"
Reporter: shenhong
Assignee: shenhong
Attachments: yarn-647.patch

When yarn.log-aggregation-enable is set to false, after an MR app completes we can't view the container's log from the HistoryServer; it shows a message like: "Aggregation is not enabled. Try the nodemanager at hd13-vm1:34669". We don't want to aggregate the container's logs, because that puts pressure on the namenode, but sometimes we still want to take a look at a container's log. Should we show the container's log through the HistoryServer even if yarn.log-aggregation-enable is set to false?

--
This message was sent by Atlassian JIRA (v6.1.5#6160)
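The proposal in the comment above is: when aggregation is disabled but the NM retains local logs, link to the NodeManager's container-log page instead of printing "Aggregation is not enabled." A self-contained sketch of that decision follows; LogsLinkChooser, the URL shapes, and the parameter names are illustrative assumptions, not the actual HistoryServer code.

```java
// Hypothetical logsLink chooser for the HistoryServer, per the comment:
// aggregation on  -> link to the aggregated-log page;
// aggregation off but logs retained -> link to the NM's local log page;
// otherwise -> no link (nothing to show).
class LogsLinkChooser {
    static String logsLink(boolean aggregationEnabled, long retainSeconds,
                           String nmHttpAddress, String containerId) {
        if (aggregationEnabled) {
            return "/logs/" + containerId;                 // aggregated logs
        }
        if (retainSeconds > 0) {                           // NM still has files
            return "http://" + nmHttpAddress + "/node/containerlogs/" + containerId;
        }
        return null;                                       // logs already gone
    }
}
```

With the settings from the comment (aggregation off, retain-seconds=259200), this would point users at the NM's web UI rather than a dead end.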
[jira] [Updated] (YARN-647) historyServer can't show container's log when aggregation is not enabled
[ https://issues.apache.org/jira/browse/YARN-647?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

shenhong updated YARN-647:

Attachment: yarn-647-2.patch

Added a new patch.

historyServer can't show container's log when aggregation is not enabled
Key: YARN-647
URL: https://issues.apache.org/jira/browse/YARN-647
Project: Hadoop YARN
Issue Type: Improvement
Affects Versions: 0.23.7, 2.0.4-alpha, 2.2.0
Environment: yarn.log-aggregation-enable=false; the HistoryServer will show: "Aggregation is not enabled. Try the nodemanager at hd13-vm1:34669"
Reporter: shenhong
Assignee: shenhong
Attachments: yarn-647-2.patch, yarn-647.patch

When yarn.log-aggregation-enable is set to false, after an MR app completes we can't view the container's log from the HistoryServer; it shows a message like: "Aggregation is not enabled. Try the nodemanager at hd13-vm1:34669". We don't want to aggregate the container's logs, because that puts pressure on the namenode, but sometimes we still want to take a look at a container's log. Should we show the container's log through the HistoryServer even if yarn.log-aggregation-enable is set to false?

--
This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (YARN-647) historyServer can't show container's log when aggregation is not enabled
[ https://issues.apache.org/jira/browse/YARN-647?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13857431#comment-13857431 ]

Hadoop QA commented on YARN-647:

{color:red}-1 overall{color}. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12620602/yarn-647-2.patch
against trunk revision .

{color:green}+1 @author{color}. The patch does not contain any @author tags.
{color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch.
{color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.
{color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages.
{color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse.
{color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings.
{color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings.
{color:green}+1 core tests{color}. The patch passed unit tests in hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-hs.
{color:green}+1 contrib tests{color}. The patch passed contrib unit tests.

Test results: https://builds.apache.org/job/PreCommit-YARN-Build/2738//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/2738//console

This message is automatically generated.

historyServer can't show container's log when aggregation is not enabled
Key: YARN-647
URL: https://issues.apache.org/jira/browse/YARN-647
Project: Hadoop YARN
Issue Type: Improvement
Affects Versions: 0.23.7, 2.0.4-alpha, 2.2.0
Environment: yarn.log-aggregation-enable=false; the HistoryServer will show: "Aggregation is not enabled. Try the nodemanager at hd13-vm1:34669"
Reporter: shenhong
Assignee: shenhong
Attachments: yarn-647-2.patch, yarn-647.patch

When yarn.log-aggregation-enable is set to false, after an MR app completes we can't view the container's log from the HistoryServer; it shows a message like: "Aggregation is not enabled. Try the nodemanager at hd13-vm1:34669". We don't want to aggregate the container's logs, because that puts pressure on the namenode, but sometimes we still want to take a look at a container's log. Should we show the container's log through the HistoryServer even if yarn.log-aggregation-enable is set to false?

--
This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (YARN-1527) yarn rmadmin command prints wrong usage info:
[ https://issues.apache.org/jira/browse/YARN-1527?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13857438#comment-13857438 ]

Hudson commented on YARN-1527:

FAILURE: Integrated in Hadoop-Yarn-trunk #434 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/434/])
YARN-1527. Fixed yarn rmadmin command to print the correct usage info. Contributed by Akira AJISAKA. (jianhe: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1553596)
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/main/java/org/apache/hadoop/yarn/client/cli/RMAdminCLI.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/TestRMAdminCLI.java

yarn rmadmin command prints wrong usage info:
Key: YARN-1527
URL: https://issues.apache.org/jira/browse/YARN-1527
Project: Hadoop YARN
Issue Type: Bug
Affects Versions: 2.2.0
Reporter: Jian He
Assignee: Akira AJISAKA
Labels: newbie
Fix For: 2.4.0
Attachments: YARN-1527.patch

The usage should be yarn rmadmin, instead of java RMAdmin, and the -refreshQueues should be in the second line.

{code}
Usage: java RMAdmin
   -refreshQueues
   -refreshNodes
   -refreshSuperUserGroupsConfiguration
   -refreshUserToGroupsMappings
   -refreshAdminAcls
   -refreshServiceAcl
   -getGroups [username]
   -help [cmd]
   -transitionToActive serviceId
   -transitionToStandby serviceId
   -failover [--forcefence] [--forceactive] serviceId serviceId
   -getServiceState serviceId
   -checkHealth serviceId
{code}

--
This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (YARN-1523) Use StandbyException instead of RMNotYetReadyException
[ https://issues.apache.org/jira/browse/YARN-1523?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13857436#comment-13857436 ]

Hudson commented on YARN-1523:

FAILURE: Integrated in Hadoop-Yarn-trunk #434 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/434/])
YARN-1523. Use StandbyException instead of RMNotYetReadyException (kasha) (kasha: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1553616)
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/exceptions/RMNotYetActiveException.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/server/api/ResourceManagerAdministrationProtocol.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/AdminService.java

Use StandbyException instead of RMNotYetReadyException
Key: YARN-1523
URL: https://issues.apache.org/jira/browse/YARN-1523
Project: Hadoop YARN
Issue Type: Sub-task
Affects Versions: 2.4.0
Reporter: Bikas Saha
Assignee: Karthik Kambatla
Fix For: 2.4.0
Attachments: yarn-1523-1.patch

--
This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (YARN-1527) yarn rmadmin command prints wrong usage info:
[ https://issues.apache.org/jira/browse/YARN-1527?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13857485#comment-13857485 ]

Hudson commented on YARN-1527:

FAILURE: Integrated in Hadoop-Hdfs-trunk #1625 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/1625/])
YARN-1527. Fixed yarn rmadmin command to print the correct usage info. Contributed by Akira AJISAKA. (jianhe: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1553596)
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/main/java/org/apache/hadoop/yarn/client/cli/RMAdminCLI.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/TestRMAdminCLI.java

yarn rmadmin command prints wrong usage info:
Key: YARN-1527
URL: https://issues.apache.org/jira/browse/YARN-1527
Project: Hadoop YARN
Issue Type: Bug
Affects Versions: 2.2.0
Reporter: Jian He
Assignee: Akira AJISAKA
Labels: newbie
Fix For: 2.4.0
Attachments: YARN-1527.patch

The usage should be yarn rmadmin, instead of java RMAdmin, and the -refreshQueues should be in the second line.

{code}
Usage: java RMAdmin
   -refreshQueues
   -refreshNodes
   -refreshSuperUserGroupsConfiguration
   -refreshUserToGroupsMappings
   -refreshAdminAcls
   -refreshServiceAcl
   -getGroups [username]
   -help [cmd]
   -transitionToActive serviceId
   -transitionToStandby serviceId
   -failover [--forcefence] [--forceactive] serviceId serviceId
   -getServiceState serviceId
   -checkHealth serviceId
{code}

--
This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (YARN-1523) Use StandbyException instead of RMNotYetReadyException
[ https://issues.apache.org/jira/browse/YARN-1523?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13857483#comment-13857483 ]

Hudson commented on YARN-1523:

FAILURE: Integrated in Hadoop-Hdfs-trunk #1625 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/1625/])
YARN-1523. Use StandbyException instead of RMNotYetReadyException (kasha) (kasha: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1553616)
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/exceptions/RMNotYetActiveException.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/server/api/ResourceManagerAdministrationProtocol.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/AdminService.java

Use StandbyException instead of RMNotYetReadyException
Key: YARN-1523
URL: https://issues.apache.org/jira/browse/YARN-1523
Project: Hadoop YARN
Issue Type: Sub-task
Affects Versions: 2.4.0
Reporter: Bikas Saha
Assignee: Karthik Kambatla
Fix For: 2.4.0
Attachments: yarn-1523-1.patch

--
This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (YARN-1523) Use StandbyException instead of RMNotYetReadyException
[ https://issues.apache.org/jira/browse/YARN-1523?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13857507#comment-13857507 ]

Hudson commented on YARN-1523:

FAILURE: Integrated in Hadoop-Mapreduce-trunk #1651 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1651/])
YARN-1523. Use StandbyException instead of RMNotYetReadyException (kasha) (kasha: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1553616)
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/exceptions/RMNotYetActiveException.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/server/api/ResourceManagerAdministrationProtocol.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/AdminService.java

Use StandbyException instead of RMNotYetReadyException
Key: YARN-1523
URL: https://issues.apache.org/jira/browse/YARN-1523
Project: Hadoop YARN
Issue Type: Sub-task
Affects Versions: 2.4.0
Reporter: Bikas Saha
Assignee: Karthik Kambatla
Fix For: 2.4.0
Attachments: yarn-1523-1.patch

--
This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (YARN-1527) yarn rmadmin command prints wrong usage info:
[ https://issues.apache.org/jira/browse/YARN-1527?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13857509#comment-13857509 ]

Hudson commented on YARN-1527:

FAILURE: Integrated in Hadoop-Mapreduce-trunk #1651 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1651/])
YARN-1527. Fixed yarn rmadmin command to print the correct usage info. Contributed by Akira AJISAKA. (jianhe: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1553596)
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/main/java/org/apache/hadoop/yarn/client/cli/RMAdminCLI.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/TestRMAdminCLI.java

yarn rmadmin command prints wrong usage info:
Key: YARN-1527
URL: https://issues.apache.org/jira/browse/YARN-1527
Project: Hadoop YARN
Issue Type: Bug
Affects Versions: 2.2.0
Reporter: Jian He
Assignee: Akira AJISAKA
Labels: newbie
Fix For: 2.4.0
Attachments: YARN-1527.patch

The usage should be yarn rmadmin, instead of java RMAdmin, and the -refreshQueues should be in the second line.

{code}
Usage: java RMAdmin
   -refreshQueues
   -refreshNodes
   -refreshSuperUserGroupsConfiguration
   -refreshUserToGroupsMappings
   -refreshAdminAcls
   -refreshServiceAcl
   -getGroups [username]
   -help [cmd]
   -transitionToActive serviceId
   -transitionToStandby serviceId
   -failover [--forcefence] [--forceactive] serviceId serviceId
   -getServiceState serviceId
   -checkHealth serviceId
{code}

--
This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (YARN-1481) Move internal services logic from AdminService to ResourceManager
[ https://issues.apache.org/jira/browse/YARN-1481?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Karthik Kambatla updated YARN-1481:

Attachment: yarn-1481-addendum.patch

Move internal services logic from AdminService to ResourceManager
Key: YARN-1481
URL: https://issues.apache.org/jira/browse/YARN-1481
Project: Hadoop YARN
Issue Type: Sub-task
Reporter: Vinod Kumar Vavilapalli
Assignee: Vinod Kumar Vavilapalli
Fix For: 2.4.0
Attachments: YARN-1481-20131207.txt, YARN-1481-20131209.txt, yarn-1481-addendum.patch, yarn-1481-addendum.patch

This is something I found while reviewing YARN-1318, but didn't halt that patch as many cycles had already gone into it. Some top-level issues:
- Not easy to follow the RM's service life cycle
-- RM adds only AdminService as its service directly.
-- Other services are added to the RM when AdminService's init calls RM.activeServices.init()
- Overall, AdminService shouldn't encompass all of RM's HA state management. It was originally supposed to be the implementation of just the RPC server.

--
This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (YARN-1481) Move internal services logic from AdminService to ResourceManager
[ https://issues.apache.org/jira/browse/YARN-1481?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13857594#comment-13857594 ]

Hadoop QA commented on YARN-1481:

{color:red}-1 overall{color}. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12620625/yarn-1481-addendum.patch
against trunk revision .

{color:green}+1 @author{color}. The patch does not contain any @author tags.
{color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch.
{color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.
{color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages.
{color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse.
{color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings.
{color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings.
{color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.
{color:green}+1 contrib tests{color}. The patch passed contrib unit tests.

Test results: https://builds.apache.org/job/PreCommit-YARN-Build/2739//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/2739//console

This message is automatically generated.

Move internal services logic from AdminService to ResourceManager
Key: YARN-1481
URL: https://issues.apache.org/jira/browse/YARN-1481
Project: Hadoop YARN
Issue Type: Sub-task
Reporter: Vinod Kumar Vavilapalli
Assignee: Vinod Kumar Vavilapalli
Fix For: 2.4.0
Attachments: YARN-1481-20131207.txt, YARN-1481-20131209.txt, yarn-1481-addendum.patch, yarn-1481-addendum.patch

This is something I found while reviewing YARN-1318, but didn't halt that patch as many cycles had already gone into it. Some top-level issues:
- Not easy to follow the RM's service life cycle
-- RM adds only AdminService as its service directly.
-- Other services are added to the RM when AdminService's init calls RM.activeServices.init()
- Overall, AdminService shouldn't encompass all of RM's HA state management. It was originally supposed to be the implementation of just the RPC server.

--
This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (YARN-1481) Move internal services logic from AdminService to ResourceManager
[ https://issues.apache.org/jira/browse/YARN-1481?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13857611#comment-13857611 ]

Sandy Ryza commented on YARN-1481:

+1 to the addendum patch

Move internal services logic from AdminService to ResourceManager
Key: YARN-1481
URL: https://issues.apache.org/jira/browse/YARN-1481
Project: Hadoop YARN
Issue Type: Sub-task
Reporter: Vinod Kumar Vavilapalli
Assignee: Vinod Kumar Vavilapalli
Fix For: 2.4.0
Attachments: YARN-1481-20131207.txt, YARN-1481-20131209.txt, yarn-1481-addendum.patch, yarn-1481-addendum.patch

This is something I found while reviewing YARN-1318, but didn't halt that patch as many cycles had already gone into it. Some top-level issues:
- Not easy to follow the RM's service life cycle
-- RM adds only AdminService as its service directly.
-- Other services are added to the RM when AdminService's init calls RM.activeServices.init()
- Overall, AdminService shouldn't encompass all of RM's HA state management. It was originally supposed to be the implementation of just the RPC server.

--
This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Created] (YARN-1545) [Umbrella] Prevent DoS of YARN components by putting in limits
Vinod Kumar Vavilapalli created YARN-1545:

Summary: [Umbrella] Prevent DoS of YARN components by putting in limits
Key: YARN-1545
URL: https://issues.apache.org/jira/browse/YARN-1545
Project: Hadoop YARN
Issue Type: Improvement
Reporter: Vinod Kumar Vavilapalli

I did a pass and found many places that can cause DoS on various YARN services. Need to fix them.

--
This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Created] (YARN-1546) Prevent DoS of ApplicationClientProtocol by putting in limits
Vinod Kumar Vavilapalli created YARN-1546:

Summary: Prevent DoS of ApplicationClientProtocol by putting in limits
Key: YARN-1546
URL: https://issues.apache.org/jira/browse/YARN-1546
Project: Hadoop YARN
Issue Type: Sub-task
Reporter: Vinod Kumar Vavilapalli

Points of DoS in ApplicationClientProtocol:
- Get new AppId: app IDs are ints and so can be exhausted.
- Unbounded submit-app payload: queueName, ContainerLaunchContext fields etc. in ApplicationSubmissionContext
- Unbounded byte-buffers as part of tokens during renew/cancel of a delegation token

--
This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Created] (YARN-1547) Prevent DoS of ApplicationMasterProtocol by putting in limits
Vinod Kumar Vavilapalli created YARN-1547: - Summary: Prevent DoS of ApplicationMasterProtocol by putting in limits Key: YARN-1547 URL: https://issues.apache.org/jira/browse/YARN-1547 Project: Hadoop YARN Issue Type: Sub-task Reporter: Vinod Kumar Vavilapalli Points of DoS in ApplicationMasterProtocol - Host and trackingURL in RegisterApplicationMasterRequest - Diagnostics, final trackingURL in FinishApplicationMasterRequest - Unlimited number of resourceAsks, containersToBeReleased and resourceBlacklistRequest in AllocateRequest -- Unbounded number of priorities and/or resourceRequests in each ask. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Created] (YARN-1548) Prevent DoS of ContainerManagementProtocol by putting in limits
Vinod Kumar Vavilapalli created YARN-1548: - Summary: Prevent DoS of ContainerManagementProtocol by putting in limits Key: YARN-1548 URL: https://issues.apache.org/jira/browse/YARN-1548 Project: Hadoop YARN Issue Type: Sub-task Reporter: Vinod Kumar Vavilapalli Points of concern in ContainerManagementProtocol - Unbounded number of requests in StartContainersRequest - Similarly, StartContainerRequest can have uncontrolled token buffers and localResources, environment, commands, serviceData, ACLs as part of ContainerLaunchContext - Unlimited number of requests in StopContainersRequest and GetContainerStatusesRequest -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (YARN-1541) Invalidate AM Host/Port when app attempt is done so that in the mean-while client doesn’t get wrong information.
[ https://issues.apache.org/jira/browse/YARN-1541?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jian He updated YARN-1541: -- Attachment: YARN-1541.2.patch Added a test in TestRM to verify that the AM host and RPC port are invalidated after the attempt is done. Invalidate AM Host/Port when app attempt is done so that in the mean-while client doesn’t get wrong information. Key: YARN-1541 URL: https://issues.apache.org/jira/browse/YARN-1541 Project: Hadoop YARN Issue Type: Bug Reporter: Jian He Assignee: Jian He Attachments: YARN-1541.1.patch, YARN-1541.2.patch -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (YARN-1493) Schedulers don't recognize apps separately from app-attempts
[ https://issues.apache.org/jira/browse/YARN-1493?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jian He updated YARN-1493: -- Attachment: YARN-1493.7.patch Upload the patch to only make schedulers aware of the app, this patch is the same as YARN-1493.4.patch Schedulers don't recognize apps separately from app-attempts Key: YARN-1493 URL: https://issues.apache.org/jira/browse/YARN-1493 Project: Hadoop YARN Issue Type: Sub-task Reporter: Jian He Assignee: Jian He Attachments: YARN-1493.1.patch, YARN-1493.2.patch, YARN-1493.3.patch, YARN-1493.4.patch, YARN-1493.5.patch, YARN-1493.6.patch Today, scheduler is tied to attempt only. We need to separate app-level handling logic in scheduler. We can add new app-level events to the scheduler and separate the app-level logic out. This is good for work-preserving AM restart, RM restart, and also needed for differentiating app-level metrics and attempt-level metrics. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (YARN-1493) Schedulers don't recognize apps separately from app-attempts
[ https://issues.apache.org/jira/browse/YARN-1493?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jian He updated YARN-1493: -- Attachment: YARN-1493.7.patch Schedulers don't recognize apps separately from app-attempts Key: YARN-1493 URL: https://issues.apache.org/jira/browse/YARN-1493 Project: Hadoop YARN Issue Type: Sub-task Reporter: Jian He Assignee: Jian He Attachments: YARN-1493.1.patch, YARN-1493.2.patch, YARN-1493.3.patch, YARN-1493.4.patch, YARN-1493.5.patch, YARN-1493.6.patch, YARN-1493.7.patch Today, scheduler is tied to attempt only. We need to separate app-level handling logic in scheduler. We can add new app-level events to the scheduler and separate the app-level logic out. This is good for work-preserving AM restart, RM restart, and also needed for differentiating app-level metrics and attempt-level metrics. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (YARN-1493) Schedulers don't recognize apps separately from app-attempts
[ https://issues.apache.org/jira/browse/YARN-1493?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13857672#comment-13857672 ] Jian He commented on YARN-1493: --- New patch changes: - Make schedulers send the App_accepted/App_rejected event to the RMApp instead of RMAppAttempt. - Create two new events AppAddedSchedulerEvent and AppRemovedSchedulerEvent for adding and removing apps in the schedulers. - Change the state transitions to not start a new attempt until the app is accepted by the scheduler. - Rename SchedulerApplication to SchedulerApplicationAttempt, and create a new SchedulerApplication for keeping track of the app-specific info. Schedulers don't recognize apps separately from app-attempts Key: YARN-1493 URL: https://issues.apache.org/jira/browse/YARN-1493 Project: Hadoop YARN Issue Type: Sub-task Reporter: Jian He Assignee: Jian He Attachments: YARN-1493.1.patch, YARN-1493.2.patch, YARN-1493.3.patch, YARN-1493.4.patch, YARN-1493.5.patch, YARN-1493.6.patch, YARN-1493.7.patch Today, scheduler is tied to attempt only. We need to separate app-level handling logic in scheduler. We can add new app-level events to the scheduler and separate the app-level logic out. This is good for work-preserving AM restart, RM restart, and also needed for differentiating app-level metrics and attempt-level metrics. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
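The separation described in the last bullet — an app-level SchedulerApplication that owns attempt-level SchedulerApplicationAttempt objects — can be sketched roughly as below. This is a simplified illustration of the data-structure split, not the actual YARN classes:

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch of the app/attempt split: app-level state (queue,
// user) lives in one object, while per-attempt state lives in its own
// class, so the scheduler can track apps separately from attempts.
public class SchedulerApplication {
    public static class SchedulerApplicationAttempt {
        public final int attemptId;
        public SchedulerApplicationAttempt(int attemptId) {
            this.attemptId = attemptId;
        }
    }

    public final String queue;
    public final String user;
    private final List<SchedulerApplicationAttempt> attempts = new ArrayList<>();

    public SchedulerApplication(String queue, String user) {
        this.queue = queue;
        this.user = user;
    }

    /** A new attempt (e.g. after AM restart) is added under the same app. */
    public void addAttempt(SchedulerApplicationAttempt attempt) {
        attempts.add(attempt);
    }

    public int attemptCount() {
        return attempts.size();
    }
}
```

With this shape, an AppRemovedSchedulerEvent can drop the whole app, while attempt-level events only touch the list entries.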
[jira] [Commented] (YARN-1541) Invalidate AM Host/Port when app attempt is done so that in the mean-while client doesn’t get wrong information.
[ https://issues.apache.org/jira/browse/YARN-1541?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13857674#comment-13857674 ] Hadoop QA commented on YARN-1541: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12620632/YARN-1541.2.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 2 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/2740//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/2740//console This message is automatically generated. Invalidate AM Host/Port when app attempt is done so that in the mean-while client doesn’t get wrong information. Key: YARN-1541 URL: https://issues.apache.org/jira/browse/YARN-1541 Project: Hadoop YARN Issue Type: Bug Reporter: Jian He Assignee: Jian He Attachments: YARN-1541.1.patch, YARN-1541.2.patch -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (YARN-1493) Schedulers don't recognize apps separately from app-attempts
[ https://issues.apache.org/jira/browse/YARN-1493?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13857677#comment-13857677 ] Hadoop QA commented on YARN-1493: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12620638/YARN-1493.7.patch against trunk revision . {color:red}-1 patch{color}. Trunk compilation may be broken. Console output: https://builds.apache.org/job/PreCommit-YARN-Build/2741//console This message is automatically generated. Schedulers don't recognize apps separately from app-attempts Key: YARN-1493 URL: https://issues.apache.org/jira/browse/YARN-1493 Project: Hadoop YARN Issue Type: Sub-task Reporter: Jian He Assignee: Jian He Attachments: YARN-1493.1.patch, YARN-1493.2.patch, YARN-1493.3.patch, YARN-1493.4.patch, YARN-1493.5.patch, YARN-1493.6.patch, YARN-1493.7.patch Today, scheduler is tied to attempt only. We need to separate app-level handling logic in scheduler. We can add new app-level events to the scheduler and separate the app-level logic out. This is good for work-preserving AM restart, RM restart, and also needed for differentiating app-level metrics and attempt-level metrics. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (YARN-1539) Queue admin ACLs should NOT be similar to submit-acls w.r.t hierarchy.
[ https://issues.apache.org/jira/browse/YARN-1539?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinod Kumar Vavilapalli updated YARN-1539: -- Priority: Major (was: Critical) Hm.. you are right. I checked my notes again and cross-verified in code. The real bug my notes had was about queue-admins needing ACLs across the hierarchy to be able to submit jobs. Irrespective of the operation (submit/kill), if I am a queue-admin for a leaf-queue, I should be able to simply perform the operation irrespective of my permissions on the parent queue. Lowering priority given this. Can you please confirm FairScheduler's behaviour w.r.t this? We should refactor this to be non-scheduler-specific. Queue admin ACLs should NOT be similar to submit-acls w.r.t hierarchy. -- Key: YARN-1539 URL: https://issues.apache.org/jira/browse/YARN-1539 Project: Hadoop YARN Issue Type: Sub-task Reporter: Vinod Kumar Vavilapalli Today, Queue admin ACLs are similar to submit-acls w.r.t hierarchy in that if one has to be able to administer a queue, he/she should be an admin of all the queues in the ancestry - an unnecessary burden. This was added in YARN-899 and I believe is wrong semantics as well as implementation. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (YARN-1541) Invalidate AM Host/Port when app attempt is done so that in the mean-while client doesn’t get wrong information.
[ https://issues.apache.org/jira/browse/YARN-1541?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13857695#comment-13857695 ] Vinod Kumar Vavilapalli commented on YARN-1541: --- It's interesting to realize via the test that we are invalidating host/port information for finished apps. Perhaps we shouldn't be doing that for succeeded apps? Apps may use this similarly to the final tracking URL after YARN-1225. In addition, one more place where this invalidation is really helpful is when an AM crashed or expired and the client doesn't get the same stale address while the RM is in the process of launching a new AM. Can you add a test for that? Invalidate AM Host/Port when app attempt is done so that in the mean-while client doesn’t get wrong information. Key: YARN-1541 URL: https://issues.apache.org/jira/browse/YARN-1541 Project: Hadoop YARN Issue Type: Bug Reporter: Jian He Assignee: Jian He Attachments: YARN-1541.1.patch, YARN-1541.2.patch -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (YARN-1493) Schedulers don't recognize apps separately from app-attempts
[ https://issues.apache.org/jira/browse/YARN-1493?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13857702#comment-13857702 ] Sandy Ryza commented on YARN-1493: -- Thanks for clarifying about the FinalTransition, Jian. Took another look at the patch and I'm +1 once it applies and the findbugs issues are resolved. Schedulers don't recognize apps separately from app-attempts Key: YARN-1493 URL: https://issues.apache.org/jira/browse/YARN-1493 Project: Hadoop YARN Issue Type: Sub-task Reporter: Jian He Assignee: Jian He Attachments: YARN-1493.1.patch, YARN-1493.2.patch, YARN-1493.3.patch, YARN-1493.4.patch, YARN-1493.5.patch, YARN-1493.6.patch, YARN-1493.7.patch Today, scheduler is tied to attempt only. We need to separate app-level handling logic in scheduler. We can add new app-level events to the scheduler and separate the app-level logic out. This is good for work-preserving AM restart, RM restart, and also needed for differentiating app-level metrics and attempt-level metrics. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (YARN-1121) RMStateStore should flush all pending store events before closing
[ https://issues.apache.org/jira/browse/YARN-1121?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13857726#comment-13857726 ] Jian He commented on YARN-1121: --- Now that we have the isThreadAlive check, I also believe it will prevent serviceStop from getting stuck. RMStateStore should flush all pending store events before closing - Key: YARN-1121 URL: https://issues.apache.org/jira/browse/YARN-1121 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Affects Versions: 2.1.0-beta Reporter: Bikas Saha Assignee: Jian He Fix For: 2.4.0 Attachments: YARN-1121.1.patch, YARN-1121.10.patch, YARN-1121.2.patch, YARN-1121.2.patch, YARN-1121.3.patch, YARN-1121.4.patch, YARN-1121.5.patch, YARN-1121.6.patch, YARN-1121.6.patch, YARN-1121.7.patch, YARN-1121.8.patch, YARN-1121.9.patch on serviceStop it should wait for all internal pending events to drain before stopping. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
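A minimal sketch of the drain-before-stop idea discussed above, including the thread-liveness guard that prevents serviceStop from getting stuck. This simplified busy-polling dispatcher is illustrative only; YARN's real AsyncDispatcher and RMStateStore are considerably more involved:

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Illustrative sketch: on stop(), the worker thread keeps handling queued
// events until the queue is empty, and stop() only joins the thread if it
// is still alive -- so stop() cannot hang on an already-dead worker.
public class DrainingDispatcher {
    private final BlockingQueue<Runnable> queue = new LinkedBlockingQueue<>();
    private volatile boolean stopped = false;
    private final Thread worker = new Thread(() -> {
        // Keep draining until both: stop requested AND queue empty.
        while (!stopped || !queue.isEmpty()) {
            Runnable ev = queue.poll();
            if (ev != null) {
                ev.run();
            }
        }
    });

    public void start() {
        worker.start();
    }

    public void dispatch(Runnable ev) {
        queue.add(ev);
    }

    public void stop() {
        stopped = true;
        try {
            // Liveness check: skip the join if the worker already died.
            if (worker.isAlive()) {
                worker.join();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}
```

Every event dispatched before stop() is guaranteed to run before stop() returns, which is the flush behavior this JIRA asks for.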
[jira] [Commented] (YARN-1481) Move internal services logic from AdminService to ResourceManager
[ https://issues.apache.org/jira/browse/YARN-1481?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13857740#comment-13857740 ] Hudson commented on YARN-1481: -- SUCCESS: Integrated in Hadoop-trunk-Commit #4931 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/4931/]) YARN-1481. Addendum patch to fix synchronization in AdminService (kasha: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1553738) * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/AdminService.java Move internal services logic from AdminService to ResourceManager - Key: YARN-1481 URL: https://issues.apache.org/jira/browse/YARN-1481 Project: Hadoop YARN Issue Type: Sub-task Reporter: Vinod Kumar Vavilapalli Assignee: Vinod Kumar Vavilapalli Fix For: 2.4.0 Attachments: YARN-1481-20131207.txt, YARN-1481-20131209.txt, yarn-1481-addendum.patch, yarn-1481-addendum.patch This is something I found while reviewing YARN-1318, but didn't halt that patch as many cycles went there already. Some top level issues - Not easy to follow RM's service life cycle -- RM adds only AdminService as its service directly. -- Other services are added to RM when AdminService's init calls RM.activeServices.init() - Overall, AdminService shouldn't encompass all of RM's HA state management. It was originally supposed to be the implementation of just the RPC server. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (YARN-1029) Allow embedding leader election into the RM
[ https://issues.apache.org/jira/browse/YARN-1029?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Karthik Kambatla updated YARN-1029: --- Attachment: yarn-1029-5.patch Patch that adds RMFatalEvent to handle all fatal RM events, embedded elector uses this. [~vinodkv], I believe the latest patch addresses all of Bikas' comments. Please take a look at the patch (at least MiniYARNCluster changes) when possible. Allow embedding leader election into the RM --- Key: YARN-1029 URL: https://issues.apache.org/jira/browse/YARN-1029 Project: Hadoop YARN Issue Type: Sub-task Reporter: Bikas Saha Assignee: Karthik Kambatla Attachments: embedded-zkfc-approach.patch, yarn-1029-0.patch, yarn-1029-0.patch, yarn-1029-1.patch, yarn-1029-2.patch, yarn-1029-3.patch, yarn-1029-4.patch, yarn-1029-5.patch, yarn-1029-approach.patch It should be possible to embed common ActiveStandyElector into the RM such that ZooKeeper based leader election and notification is in-built. In conjunction with a ZK state store, this configuration will be a simple deployment option. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (YARN-1474) Make schedulers services
[ https://issues.apache.org/jira/browse/YARN-1474?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13857746#comment-13857746 ] Karthik Kambatla commented on YARN-1474: I don't think the approach used in YARN-1172 is the best approach for schedulers. We have different schedulers with different internal states and hence differ in their setup and cleanup. Also, given they are all part of YARN, we could force them all individually to be services. Here, it might make more sense to convert existing schedulers to services before changing the corresponding instantiation code in the ResourceManager. [~sandyr], [~vinodkv] - thoughts? Make schedulers services Key: YARN-1474 URL: https://issues.apache.org/jira/browse/YARN-1474 Project: Hadoop YARN Issue Type: Sub-task Components: scheduler Affects Versions: 2.4.0 Reporter: Sandy Ryza Assignee: Tsuyoshi OZAWA Attachments: YARN-1474.1.patch Schedulers currently have a reinitialize but no start and stop. Fitting them into the YARN service model would make things more coherent. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
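As a rough illustration of what fitting a scheduler into the service model means, here is a minimal init/start/stop lifecycle. YARN's real AbstractService has a much richer state model (state-change listeners, failure states, composite services); this sketch only shows the ordering constraints a service-based scheduler would obey:

```java
// Illustrative sketch only -- not YARN's AbstractService. Shows the
// lifecycle a scheduler gains by becoming a service: explicit init,
// start, and stop phases with enforced ordering.
public class ServiceScheduler {
    public enum State { NOTINITED, INITED, STARTED, STOPPED }

    private State state = State.NOTINITED;

    /** Configuration loading would happen here. */
    public void init() {
        if (state != State.NOTINITED) {
            throw new IllegalStateException("init() called in state " + state);
        }
        state = State.INITED;
    }

    /** Update threads, timers, etc. would be started here. */
    public void start() {
        if (state != State.INITED) {
            throw new IllegalStateException("start() called in state " + state);
        }
        state = State.STARTED;
    }

    /** Internal state is released here on shutdown. */
    public void stop() {
        state = State.STOPPED;
    }

    public State getState() {
        return state;
    }
}
```

Compared with a bare reinitialize(), this makes setup and cleanup explicit and lets the ResourceManager manage the scheduler like its other services.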
[jira] [Commented] (YARN-1541) Invalidate AM Host/Port when app attempt is done so that in the mean-while client doesn’t get wrong information.
[ https://issues.apache.org/jira/browse/YARN-1541?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13857761#comment-13857761 ] Jian He commented on YARN-1541: --- bq. We shouldn't perhaps be doing that for succeeded apps? Yup, the new patch does not invalidate the host and port for succeeded apps. Also added a test for the window between AM restarts. Invalidate AM Host/Port when app attempt is done so that in the mean-while client doesn’t get wrong information. Key: YARN-1541 URL: https://issues.apache.org/jira/browse/YARN-1541 Project: Hadoop YARN Issue Type: Bug Reporter: Jian He Assignee: Jian He Attachments: YARN-1541.1.patch, YARN-1541.2.patch, YARN-1541.3.patch -- This message was sent by Atlassian JIRA (v6.1.5#6160)
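The behavior described in this comment — clearing the AM host/port when an attempt finishes, except for succeeded apps — might look roughly like the following. The field names and the "N/A" placeholder are assumptions for illustration, not the actual patch:

```java
// Illustrative sketch only. When an attempt finishes without success,
// the AM address is replaced with a placeholder so clients polling the
// RM don't read a stale host/port while a new attempt is launching.
public class AttemptReport {
    public String amHost;
    public int amRpcPort;

    /** Clears the AM address on attempt completion, unless the app succeeded. */
    public void onAttemptDone(boolean appSucceeded) {
        if (!appSucceeded) {
            amHost = "N/A";   // assumed placeholder a client can recognize
            amRpcPort = -1;   // assumed sentinel for "no port"
        }
        // Succeeded apps keep their final host/port, analogous to the
        // final tracking URL behavior mentioned in the review comments.
    }
}
```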
[jira] [Commented] (YARN-1541) Invalidate AM Host/Port when app attempt is done so that in the mean-while client doesn’t get wrong information.
[ https://issues.apache.org/jira/browse/YARN-1541?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13857764#comment-13857764 ] Hadoop QA commented on YARN-1541: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12620659/YARN-1541.3.patch against trunk revision . {color:red}-1 patch{color}. Trunk compilation may be broken. Console output: https://builds.apache.org/job/PreCommit-YARN-Build/2743//console This message is automatically generated. Invalidate AM Host/Port when app attempt is done so that in the mean-while client doesn’t get wrong information. Key: YARN-1541 URL: https://issues.apache.org/jira/browse/YARN-1541 Project: Hadoop YARN Issue Type: Bug Reporter: Jian He Assignee: Jian He Attachments: YARN-1541.1.patch, YARN-1541.2.patch, YARN-1541.3.patch -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (YARN-1541) Invalidate AM Host/Port when app attempt is done so that in the mean-while client doesn’t get wrong information.
[ https://issues.apache.org/jira/browse/YARN-1541?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jian He updated YARN-1541: -- Attachment: YARN-1541.3.patch Invalidate AM Host/Port when app attempt is done so that in the mean-while client doesn’t get wrong information. Key: YARN-1541 URL: https://issues.apache.org/jira/browse/YARN-1541 Project: Hadoop YARN Issue Type: Bug Reporter: Jian He Assignee: Jian He Attachments: YARN-1541.1.patch, YARN-1541.2.patch, YARN-1541.3.patch, YARN-1541.3.patch -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (YARN-1463) TestContainerManagerSecurity#testContainerManager fails
[ https://issues.apache.org/jira/browse/YARN-1463?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13857776#comment-13857776 ] Hadoop QA commented on YARN-1463: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12620590/YARN-1463-20131226.txt against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 4 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-hs hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-tests. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/2742//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/2742//console This message is automatically generated. 
TestContainerManagerSecurity#testContainerManager fails --- Key: YARN-1463 URL: https://issues.apache.org/jira/browse/YARN-1463 Project: Hadoop YARN Issue Type: Test Affects Versions: 2.4.0 Reporter: Ted Yu Assignee: Vinod Kumar Vavilapalli Attachments: YARN-1463-20131226.txt, YARN-1463.v1.patch, YARN-1463.v2.patch Here is stack trace: {code} testContainerManager[1](org.apache.hadoop.yarn.server.TestContainerManagerSecurity) Time elapsed: 1.756 sec ERROR! org.apache.hadoop.yarn.exceptions.YarnRuntimeException: java.io.IOException: ResourceManager failed to start. Final state is STOPPED at org.apache.hadoop.yarn.server.MiniYARNCluster$ResourceManagerWrapper.serviceStart(MiniYARNCluster.java:253) at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193) at org.apache.hadoop.service.CompositeService.serviceStart(CompositeService.java:121) at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193) at org.apache.hadoop.yarn.server.TestContainerManagerSecurity.testContainerManager(TestContainerManagerSecurity.java:110) {code} -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (YARN-1463) TestContainerManagerSecurity#testContainerManager fails
[ https://issues.apache.org/jira/browse/YARN-1463?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13857785#comment-13857785 ] Karthik Kambatla commented on YARN-1463: Thanks for taking this up, Vinod. The changes look good to me. +1. TestContainerManagerSecurity#testContainerManager fails --- Key: YARN-1463 URL: https://issues.apache.org/jira/browse/YARN-1463 Project: Hadoop YARN Issue Type: Test Affects Versions: 2.4.0 Reporter: Ted Yu Assignee: Vinod Kumar Vavilapalli Attachments: YARN-1463-20131226.txt, YARN-1463.v1.patch, YARN-1463.v2.patch Here is stack trace: {code} testContainerManager[1](org.apache.hadoop.yarn.server.TestContainerManagerSecurity) Time elapsed: 1.756 sec ERROR! org.apache.hadoop.yarn.exceptions.YarnRuntimeException: java.io.IOException: ResourceManager failed to start. Final state is STOPPED at org.apache.hadoop.yarn.server.MiniYARNCluster$ResourceManagerWrapper.serviceStart(MiniYARNCluster.java:253) at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193) at org.apache.hadoop.service.CompositeService.serviceStart(CompositeService.java:121) at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193) at org.apache.hadoop.yarn.server.TestContainerManagerSecurity.testContainerManager(TestContainerManagerSecurity.java:110) {code} -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (YARN-1463) TestContainerManagerSecurity#testContainerManager fails
[ https://issues.apache.org/jira/browse/YARN-1463?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13857786#comment-13857786 ] Karthik Kambatla commented on YARN-1463: Will commit this later in the day, if no one has any objections. TestContainerManagerSecurity#testContainerManager fails --- Key: YARN-1463 URL: https://issues.apache.org/jira/browse/YARN-1463 Project: Hadoop YARN Issue Type: Test Affects Versions: 2.4.0 Reporter: Ted Yu Assignee: Vinod Kumar Vavilapalli Attachments: YARN-1463-20131226.txt, YARN-1463.v1.patch, YARN-1463.v2.patch Here is stack trace: {code} testContainerManager[1](org.apache.hadoop.yarn.server.TestContainerManagerSecurity) Time elapsed: 1.756 sec ERROR! org.apache.hadoop.yarn.exceptions.YarnRuntimeException: java.io.IOException: ResourceManager failed to start. Final state is STOPPED at org.apache.hadoop.yarn.server.MiniYARNCluster$ResourceManagerWrapper.serviceStart(MiniYARNCluster.java:253) at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193) at org.apache.hadoop.service.CompositeService.serviceStart(CompositeService.java:121) at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193) at org.apache.hadoop.yarn.server.TestContainerManagerSecurity.testContainerManager(TestContainerManagerSecurity.java:110) {code} -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (YARN-1541) Invalidate AM Host/Port when app attempt is done so that in the mean-while client doesn’t get wrong information.
[ https://issues.apache.org/jira/browse/YARN-1541?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13857798#comment-13857798 ] Hadoop QA commented on YARN-1541: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12620662/YARN-1541.3.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 2 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: org.apache.hadoop.yarn.server.resourcemanager.TestRM {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/2744//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/2744//console This message is automatically generated. Invalidate AM Host/Port when app attempt is done so that in the mean-while client doesn’t get wrong information. Key: YARN-1541 URL: https://issues.apache.org/jira/browse/YARN-1541 Project: Hadoop YARN Issue Type: Bug Reporter: Jian He Assignee: Jian He Attachments: YARN-1541.1.patch, YARN-1541.2.patch, YARN-1541.3.patch, YARN-1541.3.patch -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (YARN-1493) Schedulers don't recognize apps separately from app-attempts
[ https://issues.apache.org/jira/browse/YARN-1493?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13857831#comment-13857831 ] Jian He commented on YARN-1493: --- bq. When submission is rejected by a parent queue, you need to call removeApplication. This existed before but your patch removed it. The earlier addApplication is renamed to addApplicationAttempt; this addApplicationAttempt is called when the SchedulerAttemptAddedEvent comes. So we are not adding or removing any application data structure in the leaf queue at all; we are adding/removing attempts in the leaf queue. bq. finishApplicationAttempt: Should Inform the parent queue so that it can call finishApplicationAttempt itself. Similarly for submitApplicationAttempt. ParentQueue's finishApplicationAttempt and submitApplicationAttempt logic is empty; ParentQueue only deals with app-specific logic in the current implementation. Do we still want to call the parent queue in attempt-specific APIs? bq. We shouldn't move to ACCEPTED directly before informing scheduler in case of recovery? YARN-1507 is saving the application after the app is accepted. So after YARN-1507, an app being saved means it is accepted. Maybe leave it for now and fix it in YARN-1507? bq. RMAppEventType.ATTEMPT_FAILED event should not come in at ACCEPTED state? This is possible because RMAppRecoveredTransition is changed to return the ACCEPTED state and wait for the AttemptFailed event to come (waiting for the previous AM to exit). I changed it to ACCEPTED instead of RUNNING because, as said, after YARN-1507 an app being saved means it is ACCEPTED; the app may not necessarily have been at the RUNNING state earlier. bq. When can this happen? During recovery? May be we should fix that correctly? This can happen because I changed the app to return the ACCEPTED state on recovery, and on recovery the app once again goes through the scheduler and triggers one more APP_ACCEPTED event at the ACCEPTED state. bq.
TestFairScheduler: Why the conditional? Because testAclSubmitApplication expects app2 to be null (assertNull("The application was allowed", app2)): the app is rejected, so no app exists. Schedulers don't recognize apps separately from app-attempts Key: YARN-1493 URL: https://issues.apache.org/jira/browse/YARN-1493 Project: Hadoop YARN Issue Type: Sub-task Reporter: Jian He Assignee: Jian He Attachments: YARN-1493.1.patch, YARN-1493.2.patch, YARN-1493.3.patch, YARN-1493.4.patch, YARN-1493.5.patch, YARN-1493.6.patch, YARN-1493.7.patch Today, scheduler is tied to attempt only. We need to separate app-level handling logic in scheduler. We can add new app-level events to the scheduler and separate the app-level logic out. This is good for work-preserving AM restart, RM restart, and also needed for differentiating app-level metrics and attempt-level metrics. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (YARN-1493) Schedulers don't recognize apps separately from app-attempts
[ https://issues.apache.org/jira/browse/YARN-1493?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13857839#comment-13857839 ] Jian He commented on YARN-1493: --- bq. The information about submission to various queues is lost? It is not needed? I investigated this: the queue passed in the earlier code is only used in ParentQueue.submitApplication to compare the queue's name with the parent queue's name for validation. None of the tests relies on this requirement, and every single test uses the same queue. Schedulers don't recognize apps separately from app-attempts Key: YARN-1493 URL: https://issues.apache.org/jira/browse/YARN-1493 Project: Hadoop YARN Issue Type: Sub-task Reporter: Jian He Assignee: Jian He Attachments: YARN-1493.1.patch, YARN-1493.2.patch, YARN-1493.3.patch, YARN-1493.4.patch, YARN-1493.5.patch, YARN-1493.6.patch, YARN-1493.7.patch Today, scheduler is tied to attempt only. We need to separate app-level handling logic in scheduler. We can add new app-level events to the scheduler and separate the app-level logic out. This is good for work-preserving AM restart, RM restart, and also needed for differentiating app-level metrics and attempt-level metrics. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (YARN-1541) Invalidate AM Host/Port when app attempt is done so that in the mean-while client doesn’t get wrong information.
[ https://issues.apache.org/jira/browse/YARN-1541?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jian He updated YARN-1541: -- Attachment: YARN-1541.4.patch The test was failing because the unit-test timeout was set too small. Uploaded a new patch. Invalidate AM Host/Port when app attempt is done so that in the mean-while client doesn’t get wrong information. Key: YARN-1541 URL: https://issues.apache.org/jira/browse/YARN-1541 Project: Hadoop YARN Issue Type: Bug Reporter: Jian He Assignee: Jian He Attachments: YARN-1541.1.patch, YARN-1541.2.patch, YARN-1541.3.patch, YARN-1541.3.patch, YARN-1541.4.patch
[jira] [Updated] (YARN-1461) RM API and RM changes to handle tags for running jobs
[ https://issues.apache.org/jira/browse/YARN-1461?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Karthik Kambatla updated YARN-1461: --- Attachment: yarn-1461-3.patch Updated the patch so that REST-based accesses work. Verified on a pseudo-dist cluster along with the patch posted for MAPREDUCE-5699. RM API and RM changes to handle tags for running jobs - Key: YARN-1461 URL: https://issues.apache.org/jira/browse/YARN-1461 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Affects Versions: 2.2.0 Reporter: Karthik Kambatla Assignee: Karthik Kambatla Attachments: yarn-1461-1.patch, yarn-1461-2.patch, yarn-1461-3.patch
[jira] [Commented] (YARN-1461) RM API and RM changes to handle tags for running jobs
[ https://issues.apache.org/jira/browse/YARN-1461?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13857860#comment-13857860 ] Hadoop QA commented on YARN-1461: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12620677/yarn-1461-3.patch against trunk revision . {color:red}-1 patch{color}. Trunk compilation may be broken. Console output: https://builds.apache.org/job/PreCommit-YARN-Build/2746//console This message is automatically generated.
[jira] [Commented] (YARN-1541) Invalidate AM Host/Port when app attempt is done so that in the mean-while client doesn’t get wrong information.
[ https://issues.apache.org/jira/browse/YARN-1541?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13857863#comment-13857863 ] Hadoop QA commented on YARN-1541: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12620676/YARN-1541.4.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 2 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/2745//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/2745//console This message is automatically generated.
[jira] [Commented] (YARN-1493) Schedulers don't recognize apps separately from app-attempts
[ https://issues.apache.org/jira/browse/YARN-1493?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13857888#comment-13857888 ] Vinod Kumar Vavilapalli commented on YARN-1493: --- bq. The earlier addApplication is renamed to addApplicationAttempt; this addApplicationAttempt is called when the SchedulerAttemptAddedEvent comes. So we are not adding or removing any application data structure in the leaf queue at all; we are adding/removing attempts in the leaf queue. Yeah, that is the current state. Let's do the same to future-proof it. bq. ParentQueue's finishApplicationAttempt and submitApplicationAttempt logic is empty; ParentQueue only deals with app-specific logic in the current implementation. Do we still want to call parentQueue in attempt-specific APIs? Again, that's today. Let's do it the way one would see it in the future. bq. YARN-1507 is saving the application after the app is accepted. So after YARN-1507, an app being saved means it is accepted. Maybe leave it for now and fix it in YARN-1507? Sure, but you should leave a code comment nonetheless pointing to the JIRA. bq. I changed it to ACCEPTED state instead of RUNNING because, as said, after YARN-1507 an app being saved means it is ACCEPTED; the app may not necessarily have been at RUNNING state earlier. Again, let's leave a code comment saying the same. bq. This can happen because I changed the app to return ACCEPTED state on recovery, and on recovery the app once again goes through the scheduler and triggers one more APP_ACCEPTED event at ACCEPTED state. Ditto.
[jira] [Commented] (YARN-1493) Schedulers don't recognize apps separately from app-attempts
[ https://issues.apache.org/jira/browse/YARN-1493?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13857904#comment-13857904 ] Vinod Kumar Vavilapalli commented on YARN-1493: --- bq. When submission is rejected by a parent queue, you need to call removeApplication. This existed before but your patch removed it. I withdraw this comment, misread it. Also, in LeafQueue, ParentQueue etc., please put @Override annotations where necessary. It avoids confusion about internal and implemented methods.
[jira] [Commented] (YARN-1121) RMStateStore should flush all pending store events before closing
[ https://issues.apache.org/jira/browse/YARN-1121?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13857915#comment-13857915 ] Vinod Kumar Vavilapalli commented on YARN-1121: --- This is unnecessary locking to go into the heart of our dispatcher loop. Let's do it in the loop only if drainEventsOnStop is enabled? Even otherwise, I don't see how the code in the dispatcher loop is useful other than optimizing away the 1-sec wait in stop. I think the main change is checking in stop whether the dispatcher thread is alive. Just doing that should be enough? {code} while (!drained && eventHandlingThread.isAlive()) { Thread.sleep(1000); } {code} This should take care of all the races and pay a small 1-second cost during stop for some race conditions, instead of paying the locking cost for every event. Some JVMs do bias the lock away, but it can be avoided altogether. RMStateStore should flush all pending store events before closing - Key: YARN-1121 URL: https://issues.apache.org/jira/browse/YARN-1121 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Affects Versions: 2.1.0-beta Reporter: Bikas Saha Assignee: Jian He Fix For: 2.4.0 Attachments: YARN-1121.1.patch, YARN-1121.10.patch, YARN-1121.2.patch, YARN-1121.2.patch, YARN-1121.3.patch, YARN-1121.4.patch, YARN-1121.5.patch, YARN-1121.6.patch, YARN-1121.6.patch, YARN-1121.7.patch, YARN-1121.8.patch, YARN-1121.9.patch on serviceStop it should wait for all internal pending events to drain before stopping.
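The stop-time check suggested in the comment above can be sketched in a self-contained form. The field and method names (drained, eventHandlingThread) follow the discussion, but the surrounding class is a simplified stand-in, not the real AsyncDispatcher; the polling interval is shortened from 1000 ms for illustration.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

// Simplified stand-in for the dispatcher under discussion. The event loop
// records whether the queue was empty; stop() polls that flag instead of
// taking a lock for every event, so the hot path stays lock-free.
class DispatcherSketch {
    private final BlockingQueue<Runnable> eventQueue = new LinkedBlockingQueue<>();
    private volatile boolean drained = false;
    private volatile boolean stopped = false;
    private Thread eventHandlingThread;

    void submit(Runnable event) {
        eventQueue.offer(event);
    }

    void start() {
        eventHandlingThread = new Thread(() -> {
            while (!stopped) {
                drained = eventQueue.isEmpty();
                Runnable event;
                try {
                    event = eventQueue.poll(100, TimeUnit.MILLISECONDS);
                } catch (InterruptedException e) {
                    return;
                }
                if (event != null) {
                    event.run();
                }
            }
        });
        eventHandlingThread.start();
    }

    // The suggested shape: spin with a short sleep until the queue has
    // drained or the handler thread has died, paying the small wait only
    // at stop time rather than a locking cost per event.
    void stop() throws InterruptedException {
        while (!drained && eventHandlingThread.isAlive()) {
            Thread.sleep(100);
        }
        stopped = true;
        eventHandlingThread.join();
    }
}
```

The trade-off named in the thread is visible here: the per-event path touches only volatile fields, and the bounded sleep is paid once, during shutdown.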
[jira] [Commented] (YARN-1399) Allow users to annotate an application with multiple tags
[ https://issues.apache.org/jira/browse/YARN-1399?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13857916#comment-13857916 ] Vinod Kumar Vavilapalli commented on YARN-1399: --- Tags are a way to filter or search for applications. Let's not conflate that with their display and allow users arbitrary formats. I'd push for case-insensitive, limited-length, and maybe just the ASCII charset. It seems limiting, but that's more than enough. Allow users to annotate an application with multiple tags - Key: YARN-1399 URL: https://issues.apache.org/jira/browse/YARN-1399 Project: Hadoop YARN Issue Type: Improvement Reporter: Zhijie Shen Assignee: Zhijie Shen Nowadays, when submitting an application, users can fill the applicationType field to facilitate searching it later. IMHO, it's good to accept multiple tags to allow users to describe their applications in multiple aspects, including the application type. Then, searching by tags may be more efficient for users to reach their desired application collection. It's pretty much like the tag systems of online photo/video/music sites, etc.
[jira] [Commented] (YARN-1399) Allow users to annotate an application with multiple tags
[ https://issues.apache.org/jira/browse/YARN-1399?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13857918#comment-13857918 ] Vinod Kumar Vavilapalli commented on YARN-1399: --- One thing that just occurred to me: tags, or the source/group originally proposed, won't help the oozie case as described on YARN-1390. Or, to be more accurate, they make it unwieldy. Let's say oozie uses a tag workflow_123_566 for all apps in a workflow; any other application from any other user SHOULD not set that tag, or it runs the risk of getting killed by oozie. That seems like unintended behavior. To avoid it, we'll need to depend on oozie not killing as a privileged user. Further, I could make any other user's application-search cumbersome by reusing his/her tags for my own applications. It seems like tag-search should be linked to, and limited by, some other entity like the user - search for apps matching a tag for a given user/queue, etc.
[jira] [Commented] (YARN-1121) RMStateStore should flush all pending store events before closing
[ https://issues.apache.org/jira/browse/YARN-1121?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13857956#comment-13857956 ] Bikas Saha commented on YARN-1121: -- From the patch it wasn't clear to me that the lock was in the main loop. I should have applied the patch to the code and looked at the adjoining code. I agree it does not make sense to put a lock in the inner loop for every event. It may not be a 1-sec wait: these are all state-store operations against an external system, with unbounded time to complete, and one thread will be spinning all that time. Perhaps we could do the notify only when blockNewEvents is set to true - that would be when we are actually waiting for the drain signal.
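The notify-only-when-blocked idea from the comment above can be sketched as follows. The names blockNewEvents and drained follow the discussion; everything else is a hypothetical stand-in, and the bounded wait guards against the window where a notify fires before the waiter enters the monitor.

```java
// Sketch: the event loop signals the waiter only when draining has
// actually been requested (blockNewEvents == true), so the common case
// pays no synchronization cost per event. Simplified stand-in class.
class DrainSignalSketch {
    private final Object waitLock = new Object();
    private volatile boolean blockNewEvents = false;
    private volatile boolean drained = false;

    // Called by the event loop after handling each event.
    void afterEvent(boolean queueEmpty) {
        drained = queueEmpty;
        if (blockNewEvents && queueEmpty) {
            // Only take the lock when stop() is actually waiting.
            synchronized (waitLock) {
                waitLock.notifyAll();
            }
        }
    }

    // Called on serviceStop when drain-on-stop is enabled.
    void waitForDrain() throws InterruptedException {
        blockNewEvents = true;
        synchronized (waitLock) {
            while (!drained) {
                // Bounded wait: covers a notify that raced ahead of us.
                waitLock.wait(1000);
            }
        }
    }
}
```

The guarded loop plus timed wait is the standard defense against both spurious wakeups and the missed-notify race that a flag-then-wait sequence can otherwise hit.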
[jira] [Updated] (YARN-1463) Tests should avoid starting http-server where possible or creates spnego keytab/principals
[ https://issues.apache.org/jira/browse/YARN-1463?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Karthik Kambatla updated YARN-1463: --- Summary: Tests should avoid starting http-server where possible or creates spnego keytab/principals (was: TestContainerManagerSecurity#testContainerManager fails) Tests should avoid starting http-server where possible or creates spnego keytab/principals -- Key: YARN-1463 URL: https://issues.apache.org/jira/browse/YARN-1463 Project: Hadoop YARN Issue Type: Test Affects Versions: 2.4.0 Reporter: Ted Yu Assignee: Vinod Kumar Vavilapalli Attachments: YARN-1463-20131226.txt, YARN-1463.v1.patch, YARN-1463.v2.patch Here is stack trace: {code} testContainerManager[1](org.apache.hadoop.yarn.server.TestContainerManagerSecurity) Time elapsed: 1.756 sec ERROR! org.apache.hadoop.yarn.exceptions.YarnRuntimeException: java.io.IOException: ResourceManager failed to start. Final state is STOPPED at org.apache.hadoop.yarn.server.MiniYARNCluster$ResourceManagerWrapper.serviceStart(MiniYARNCluster.java:253) at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193) at org.apache.hadoop.service.CompositeService.serviceStart(CompositeService.java:121) at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193) at org.apache.hadoop.yarn.server.TestContainerManagerSecurity.testContainerManager(TestContainerManagerSecurity.java:110) {code}
[jira] [Commented] (YARN-1029) Allow embedding leader election into the RM
[ https://issues.apache.org/jira/browse/YARN-1029?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13857975#comment-13857975 ] Bikas Saha commented on YARN-1029: -- Thanks for addressing the comments. I was expecting RMStateStoreOperationFailedEvent to be replaced by the new RMFatalErrorEvent, just like the embedded elector event got replaced. Not much use in the store sending an event to the RM and then the RM sending an event to itself again, right? Allow embedding leader election into the RM --- Key: YARN-1029 URL: https://issues.apache.org/jira/browse/YARN-1029 Project: Hadoop YARN Issue Type: Sub-task Reporter: Bikas Saha Assignee: Karthik Kambatla Attachments: embedded-zkfc-approach.patch, yarn-1029-0.patch, yarn-1029-0.patch, yarn-1029-1.patch, yarn-1029-2.patch, yarn-1029-3.patch, yarn-1029-4.patch, yarn-1029-5.patch, yarn-1029-approach.patch It should be possible to embed the common ActiveStandbyElector into the RM so that ZooKeeper-based leader election and notification is built in. In conjunction with a ZK state store, this configuration will be a simple deployment option.
[jira] [Commented] (YARN-1481) Move internal services logic from AdminService to ResourceManager
[ https://issues.apache.org/jira/browse/YARN-1481?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13857977#comment-13857977 ] Bikas Saha commented on YARN-1481: -- It's not clear why removing the synchronization is safe. This private method is called from multiple public methods. If the earlier behavior was that the Active state of the RM would not be visible until the locking setter method had completely transitioned the RM to active, then the new code might change that behavior: the callers would see the RM as active as soon as the state variable changes, even though the RM might still be in the process of becoming active. Move internal services logic from AdminService to ResourceManager - Key: YARN-1481 URL: https://issues.apache.org/jira/browse/YARN-1481 Project: Hadoop YARN Issue Type: Sub-task Reporter: Vinod Kumar Vavilapalli Assignee: Vinod Kumar Vavilapalli Fix For: 2.4.0 Attachments: YARN-1481-20131207.txt, YARN-1481-20131209.txt, yarn-1481-addendum.patch, yarn-1481-addendum.patch This is something I found while reviewing YARN-1318, but didn't halt that patch as many cycles went there already. Some top level issues - Not easy to follow RM's service life cycle -- RM adds only AdminService as its service directly. -- Other services are added to RM when AdminService's init calls RM.activeServices.init() - Overall, AdminService shouldn't encompass all of RM's HA state management. It was originally supposed to be the implementation of just the RPC server.
[jira] [Commented] (YARN-1481) Move internal services logic from AdminService to ResourceManager
[ https://issues.apache.org/jira/browse/YARN-1481?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13857979#comment-13857979 ] Karthik Kambatla commented on YARN-1481: RMContextImpl has the HA state. Both getter and setter methods are synchronized on the HA state. bq. The callers would see the RM as active as soon as the state variable changes, even though the RM might be in the process of becoming active. The setter is called only after the corresponding transition. Ref: ResourceManager#transitionTo* Am I missing something?
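The getter/setter arrangement Karthik describes can be sketched like this. It is a simplified stand-in illustrating the pattern under discussion, not the actual RMContextImpl source; the enum values are illustrative.

```java
// Simplified stand-in for the HA-state handling discussed above. The key
// invariant from the thread: the setter is invoked only AFTER the
// corresponding transition completes (cf. ResourceManager#transitionTo*),
// so a synchronized reader never observes ACTIVE mid-transition.
class HAContextSketch {
    enum HAServiceState { INITIALIZING, ACTIVE, STANDBY }

    private HAServiceState haServiceState = HAServiceState.INITIALIZING;

    synchronized void setHAServiceState(HAServiceState state) {
        this.haServiceState = state;
    }

    synchronized HAServiceState getHAServiceState() {
        return haServiceState;
    }
}
```

Under this ordering, removing synchronization from the transition method itself is safe only if the state flip really is the last step of the transition, which is the crux of Bikas's question.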
[jira] [Commented] (YARN-1029) Allow embedding leader election into the RM
[ https://issues.apache.org/jira/browse/YARN-1029?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13857980#comment-13857980 ] Karthik Kambatla commented on YARN-1029: RMStateStoreOperationFailedEvent is not always fatal and might not require terminating the RM; events of type RMStateStoreOperationFailedEventType.FENCED require the RM to transition to standby, and terminate the RM if the transition fails. bq. Not much use in the store sending an event to the RM and then the RM sending an event to itself again, right? Right. That was the reason for my reluctance earlier. But, I guess this addresses any future fatal events.
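The dispatch Karthik describes - FENCED triggers transition-to-standby, and only a failed transition (or any other failure type) is fatal - can be sketched as below. The event-type name follows the discussion; the handler class and RM interface are hypothetical stand-ins.

```java
// Sketch of the failure handling described above: a FENCED store failure
// is recoverable via transition-to-standby; anything else, or a failed
// transition, terminates the RM. Names other than the FENCED type are
// illustrative, not the actual Hadoop API.
class StoreFailureHandlerSketch {
    enum FailureType { FENCED, OTHER }

    interface RM {
        void transitionToStandby() throws Exception;
        void shutdown(String reason);
    }

    void handle(FailureType type, RM rm) {
        if (type == FailureType.FENCED) {
            try {
                rm.transitionToStandby();
                // Fenced but recovered: keep the process alive on standby.
            } catch (Exception e) {
                rm.shutdown("transition to standby failed after fencing");
            }
        } else {
            rm.shutdown("fatal state-store failure");
        }
    }
}
```

This is why folding everything into a single fatal-error event is not a pure simplification: the FENCED branch needs a non-fatal path.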