[jira] [Commented] (YARN-2647) Add yarn queue CLI to get queue infos

2014-10-27 Thread Sunil G (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2647?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14184870#comment-14184870
 ] 

Sunil G commented on YARN-2647:
---

The testResourceTrackerOnHA failure is not caused by this fix; it seems to be a 
connection exception from registerNodeManager.

 Add yarn queue CLI to get queue infos
 -

 Key: YARN-2647
 URL: https://issues.apache.org/jira/browse/YARN-2647
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: client
Reporter: Wangda Tan
Assignee: Sunil G
 Attachments: 0001-YARN-2647.patch, 0002-YARN-2647.patch, 
 0003-YARN-2647.patch, 0004-YARN-2647.patch, 0005-YARN-2647.patch, 
 0006-YARN-2647.patch






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2647) Add yarn queue CLI to get queue infos

2014-10-27 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2647?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14184892#comment-14184892
 ] 

Wangda Tan commented on YARN-2647:
--

Hi [~sunilg],
Latest patch LGTM, +1.
Thanks,
Wangda

 Add yarn queue CLI to get queue infos
 -

 Key: YARN-2647
 URL: https://issues.apache.org/jira/browse/YARN-2647
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: client
Reporter: Wangda Tan
Assignee: Sunil G
 Attachments: 0001-YARN-2647.patch, 0002-YARN-2647.patch, 
 0003-YARN-2647.patch, 0004-YARN-2647.patch, 0005-YARN-2647.patch, 
 0006-YARN-2647.patch






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-2750) Allow StateMachine to have a callback when a transition fails

2014-10-27 Thread Jeff Zhang (JIRA)
Jeff Zhang created YARN-2750:


 Summary: Allow StateMachine to have a callback when a transition fails
 Key: YARN-2750
 URL: https://issues.apache.org/jira/browse/YARN-2750
 Project: Hadoop YARN
  Issue Type: Improvement
Reporter: Jeff Zhang






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2750) Allow StateMachine to have a callback when a transition fails

2014-10-27 Thread Jeff Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2750?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jeff Zhang updated YARN-2750:
-
Description: We have a situation where a Transition may sometimes fail, but we 
don't want to handle the failure in each Transition; we'd like to handle it in 
one centralized place. Allowing the StateMachine to have a callback would be good for us.
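
A minimal, self-contained sketch of the idea (hypothetical classes only; this is 
not the existing Hadoop state machine API, just an illustration of routing every 
failed transition to one centralized callback):
{code}
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: a tiny state machine with one centralized
// failure callback instead of per-transition error handling.
public class CallbackStateMachineSketch {

  interface Transition {
    String apply(String state, String event) throws Exception;
  }

  interface TransitionFailureListener {
    void onFailure(String state, String event, Exception cause);
  }

  private final Map<String, Transition> transitions = new HashMap<>();
  private final TransitionFailureListener failureListener;
  private String state;

  CallbackStateMachineSketch(String initialState, TransitionFailureListener listener) {
    this.state = initialState;
    this.failureListener = listener;
  }

  void addTransition(String from, String event, Transition t) {
    transitions.put(from + "/" + event, t);
  }

  void handle(String event) {
    try {
      Transition t = transitions.get(state + "/" + event);
      if (t == null) {
        throw new IllegalStateException("no transition for " + event + " in " + state);
      }
      state = t.apply(state, event);
    } catch (Exception e) {
      // the single place where any failed transition is handled
      failureListener.onFailure(state, event, e);
    }
  }

  public static void main(String[] args) {
    CallbackStateMachineSketch sm = new CallbackStateMachineSketch("NEW",
        (s, ev, cause) -> System.err.println("transition failed in " + s + " on " + ev + ": " + cause));
    sm.addTransition("NEW", "START", (s, ev) -> "RUNNING");
    sm.handle("START");   // NEW -> RUNNING
    sm.handle("UNKNOWN"); // no transition defined: routed to the failure callback
  }
}
{code}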

 Allow StateMachine to have a callback when a transition fails
 

 Key: YARN-2750
 URL: https://issues.apache.org/jira/browse/YARN-2750
 Project: Hadoop YARN
  Issue Type: Improvement
Reporter: Jeff Zhang

 We have a situation where a Transition may sometimes fail, but we don't want to 
 handle the failure in each Transition; we'd like to handle it in one centralized 
 place. Allowing the StateMachine to have a callback would be good for us.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2750) Allow StateMachine to have a callback when a transition fails

2014-10-27 Thread Jeff Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2750?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jeff Zhang updated YARN-2750:
-
Affects Version/s: 2.5.1

 Allow StateMachine to have a callback when a transition fails
 

 Key: YARN-2750
 URL: https://issues.apache.org/jira/browse/YARN-2750
 Project: Hadoop YARN
  Issue Type: Improvement
Affects Versions: 2.5.1
Reporter: Jeff Zhang

 We have a situation where a Transition may sometimes fail, but we don't want to 
 handle the failure in each Transition; we'd like to handle it in one centralized 
 place. Allowing the StateMachine to have a callback would be good for us.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2750) Allow StateMachine to have a callback when a transition fails

2014-10-27 Thread Jeff Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2750?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jeff Zhang updated YARN-2750:
-
Attachment: YARN-2750.patch

Attaching a patch for initial review.

 Allow StateMachine to have a callback when a transition fails
 

 Key: YARN-2750
 URL: https://issues.apache.org/jira/browse/YARN-2750
 Project: Hadoop YARN
  Issue Type: Improvement
Affects Versions: 2.5.1
Reporter: Jeff Zhang
 Attachments: YARN-2750.patch


 We have a situation where a Transition may sometimes fail, but we don't want to 
 handle the failure in each Transition; we'd like to handle it in one centralized 
 place. Allowing the StateMachine to have a callback would be good for us.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2743) Yarn jobs via oozie fail with failed to renew token (secure) or digest mismatch (unsecure) errors when RM is being killed

2014-10-27 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2743?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14185078#comment-14185078
 ] 

Hudson commented on YARN-2743:
--

SUCCESS: Integrated in Hadoop-Yarn-trunk #725 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk/725/])
YARN-2743. Fixed a bug in ResourceManager that was causing RMDelegationToken 
identifiers to be tampered and thus causing app submission failures in secure 
mode. Contributed by Jian He. (vinodkv: rev 
018664550507981297fd9f91e29408e6b7801ea9)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/security/TestYARNTokenIdentifier.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/ZKRMStateStore.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/proto/server/yarn_security_token.proto
* 
hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/security/token/delegation/AbstractDelegationTokenIdentifier.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/RMDelegationTokenIdentifierForTest.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/proto/yarn_server_resourcemanager_recovery.proto
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/FileSystemRMStateStore.java
* 
hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/security/token/delegation/AbstractDelegationTokenSecretManager.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/security/client/YARNDelegationTokenIdentifier.java
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/pom.xml
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/RMStateStoreTestBase.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/records/RMDelegationTokenIdentifierData.java


 Yarn jobs via oozie fail with failed to renew token (secure) or digest 
 mismatch (unsecure) errors when RM is being killed
 -

 Key: YARN-2743
 URL: https://issues.apache.org/jira/browse/YARN-2743
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.6.0
Reporter: Arpit Gupta
Assignee: Jian He
Priority: Blocker
 Fix For: 2.6.0

 Attachments: YARN-2743.1.patch, YARN-2743.2.patch


 During our HA testing we have seen YARN jobs run via Oozie fail with "failed 
 to renew delegation token" errors on secure clusters and "digest mismatch" 
 errors on unsecure clusters.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2734) If a sub-folder is encountered by log aggregator it results in invalid aggregated file

2014-10-27 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2734?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14185077#comment-14185077
 ] 

Hudson commented on YARN-2734:
--

SUCCESS: Integrated in Hadoop-Yarn-trunk #725 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk/725/])
YARN-2734. Skipped sub-folders in the local log dir when aggregating logs. 
Contributed by Xuan Gong. (zjshen: rev caecd9fffe7c6216be31f3ab65349182045451fa)
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/logaggregation/TestAggregatedLogFormat.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/logaggregation/AggregatedLogFormat.java


 If a sub-folder is encountered by log aggregator it results in invalid 
 aggregated file
 --

 Key: YARN-2734
 URL: https://issues.apache.org/jira/browse/YARN-2734
 Project: Hadoop YARN
  Issue Type: Bug
  Components: log-aggregation
Affects Versions: 2.5.1
Reporter: Sumit Mohanty
Assignee: Xuan Gong
 Fix For: 2.6.0

 Attachments: YARN-2734.1.patch, YARN-2734.2.patch


 See YARN-2724 for some more context on how the error surfaces during the 
 {{yarn logs}} call.
 If the aggregator sees a sub-folder today, it results in the following error when 
 reading the logs:
 {noformat}
 Container: container_1413512973198_0019_01_02 on 
 c6401.ambari.apache.org_45454
 
 LogType: cmd_data
 LogLength: 4096
 Log Contents:
 Error aggregating log file. Log file : 
 /hadoop/yarn/log/application_1413512973198_0019/container_1413512973198_0019_01_02/cmd_data/hadoop/yarn/log/application_1413512973198_0019/container_1413512973198_0019_01_02/cmd_data
  (Is a directory)
 {noformat}
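
 A minimal sketch of the approach named in the commit above (skip sub-folders so 
 only regular files are aggregated); this is illustrative only and not the actual 
 AggregatedLogFormat code:
 {code}
 import java.io.File;
 import java.util.ArrayList;
 import java.util.List;

 // Illustrative only: collect regular files from a container log dir,
 // skipping any sub-directories (e.g. cmd_data/) so they are never aggregated.
 public class LogFileListerSketch {
   public static List<File> listLogFiles(File containerLogDir) {
     List<File> logs = new ArrayList<>();
     File[] entries = containerLogDir.listFiles();
     if (entries != null) {
       for (File entry : entries) {
         if (entry.isFile()) {
           logs.add(entry);
         }
       }
     }
     return logs;
   }
 }
 {code}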



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2741) Windows: Node manager cannot serve up log files via the web user interface when yarn.nodemanager.log-dirs is set to any drive letter other than C: (or, the drive that nodemanager is running on)

2014-10-27 Thread Varun Vasudev (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2741?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14185104#comment-14185104
 ] 

Varun Vasudev commented on YARN-2741:
-

[~cwelch] we should add some sort of unit test to confirm the behavior. I can 
see this bug getting re-introduced by mistake if someone is adding 
functionality or refactoring code.

 Windows: Node manager cannot serve up log files via the web user interface 
 when yarn.nodemanager.log-dirs is set to any drive letter other than C: (or, the 
 drive that nodemanager is running on)
 --

 Key: YARN-2741
 URL: https://issues.apache.org/jira/browse/YARN-2741
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager
Affects Versions: 2.6.0
 Environment: Windows
Reporter: Craig Welch
Assignee: Craig Welch
 Attachments: YARN-2741.1.patch


 PROBLEM: User is getting "No Logs available for Container Container_number" 
 when setting yarn.nodemanager.log-dirs to any drive letter other than C:
 STEPS TO REPRODUCE:
 On Windows
 1) Run the NodeManager on C:
 2) Create two local drive partitions D: and E:
 3) Put yarn.nodemanager.log-dirs = D:\nmlogs or E:\nmlogs
 4) Run an MR job that will last at least 5 minutes
 5) While the job is in flight, log into the YARN web UI, 
 resource_manager_server:8088/cluster
 6) Click on the application_idnumber
 7) Click on the logs link; you will get "No Logs available for Container 
 Container_number"
 ACTUAL BEHAVIOR: Getting an error message when viewing the container logs
 EXPECTED BEHAVIOR: Able to use different drive letters in 
 yarn.nodemanager.log-dirs and not get an error
 NOTE: If we use the drive letter C: in yarn.nodemanager.log-dirs, we are able 
 to see the container logs while the MR job is in flight.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2750) Allow StateMachine to have a callback when a transition fails

2014-10-27 Thread Jeff Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2750?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jeff Zhang updated YARN-2750:
-
Attachment: YARN-2750-2.patch

 Allow StateMachine to have a callback when a transition fails
 

 Key: YARN-2750
 URL: https://issues.apache.org/jira/browse/YARN-2750
 Project: Hadoop YARN
  Issue Type: Improvement
Affects Versions: 2.5.1
Reporter: Jeff Zhang
 Attachments: YARN-2750-2.patch, YARN-2750.patch


 We have a situation where a Transition may sometimes fail, but we don't want to 
 handle the failure in each Transition; we'd like to handle it in one centralized 
 place. Allowing the StateMachine to have a callback would be good for us.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2734) If a sub-folder is encountered by log aggregator it results in invalid aggregated file

2014-10-27 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2734?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14185167#comment-14185167
 ] 

Hudson commented on YARN-2734:
--

SUCCESS: Integrated in Hadoop-Hdfs-trunk #1914 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/1914/])
YARN-2734. Skipped sub-folders in the local log dir when aggregating logs. 
Contributed by Xuan Gong. (zjshen: rev caecd9fffe7c6216be31f3ab65349182045451fa)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/logaggregation/TestAggregatedLogFormat.java
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/logaggregation/AggregatedLogFormat.java


 If a sub-folder is encountered by log aggregator it results in invalid 
 aggregated file
 --

 Key: YARN-2734
 URL: https://issues.apache.org/jira/browse/YARN-2734
 Project: Hadoop YARN
  Issue Type: Bug
  Components: log-aggregation
Affects Versions: 2.5.1
Reporter: Sumit Mohanty
Assignee: Xuan Gong
 Fix For: 2.6.0

 Attachments: YARN-2734.1.patch, YARN-2734.2.patch


 See YARN-2724 for some more context on how the error surfaces during the 
 {{yarn logs}} call.
 If the aggregator sees a sub-folder today, it results in the following error when 
 reading the logs:
 {noformat}
 Container: container_1413512973198_0019_01_02 on 
 c6401.ambari.apache.org_45454
 
 LogType: cmd_data
 LogLength: 4096
 Log Contents:
 Error aggregating log file. Log file : 
 /hadoop/yarn/log/application_1413512973198_0019/container_1413512973198_0019_01_02/cmd_data/hadoop/yarn/log/application_1413512973198_0019/container_1413512973198_0019_01_02/cmd_data
  (Is a directory)
 {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2743) Yarn jobs via oozie fail with failed to renew token (secure) or digest mismatch (unsecure) errors when RM is being killed

2014-10-27 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2743?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14185168#comment-14185168
 ] 

Hudson commented on YARN-2743:
--

SUCCESS: Integrated in Hadoop-Hdfs-trunk #1914 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/1914/])
YARN-2743. Fixed a bug in ResourceManager that was causing RMDelegationToken 
identifiers to be tampered and thus causing app submission failures in secure 
mode. Contributed by Jian He. (vinodkv: rev 
018664550507981297fd9f91e29408e6b7801ea9)
* 
hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/security/token/delegation/AbstractDelegationTokenSecretManager.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/records/RMDelegationTokenIdentifierData.java
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/ZKRMStateStore.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/proto/yarn_server_resourcemanager_recovery.proto
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/security/TestYARNTokenIdentifier.java
* 
hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/security/token/delegation/AbstractDelegationTokenIdentifier.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/security/client/YARNDelegationTokenIdentifier.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/pom.xml
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/proto/server/yarn_security_token.proto
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/FileSystemRMStateStore.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/RMDelegationTokenIdentifierForTest.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/RMStateStoreTestBase.java


 Yarn jobs via oozie fail with failed to renew token (secure) or digest 
 mismatch (unsecure) errors when RM is being killed
 -

 Key: YARN-2743
 URL: https://issues.apache.org/jira/browse/YARN-2743
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.6.0
Reporter: Arpit Gupta
Assignee: Jian He
Priority: Blocker
 Fix For: 2.6.0

 Attachments: YARN-2743.1.patch, YARN-2743.2.patch


 During our HA testing we have seen YARN jobs run via Oozie fail with "failed 
 to renew delegation token" errors on secure clusters and "digest mismatch" 
 errors on unsecure clusters.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2678) Recommended improvements to Yarn Registry

2014-10-27 Thread Steve Loughran (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2678?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran updated YARN-2678:
-
Attachment: YARN-2678-006.patch

Patch 006
# javadoc and javac warnings believed fixed
# purged the no-longer-used header logic
# improved data checks on deserialization

 Recommended improvements to Yarn Registry
 -

 Key: YARN-2678
 URL: https://issues.apache.org/jira/browse/YARN-2678
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: api, resourcemanager
Affects Versions: 2.6.0
Reporter: Gour Saha
Assignee: Steve Loughran
 Attachments: HADOOP-2678-002.patch, YARN-2678-001.patch, 
 YARN-2678-003.patch, YARN-2678-006.patch, yarnregistry.pdf


 While binding to the Slider AM from the Slider agent Python code, here are 
 some of the items I stumbled upon and would recommend as improvements.
 This is how Slider's registry record looks today:
 {noformat}
 jsonservicerec{
   "description" : "Slider Application Master",
   "external" : [ {
     "api" : "org.apache.slider.appmaster",
     "addressType" : "host/port",
     "protocolType" : "hadoop/protobuf",
     "addresses" : [ [ "c6408.ambari.apache.org", "34837" ] ]
   }, {
     "api" : "org.apache.http.UI",
     "addressType" : "uri",
     "protocolType" : "webui",
     "addresses" : [ [ "http://c6408.ambari.apache.org:43314" ] ]
   }, {
     "api" : "org.apache.slider.management",
     "addressType" : "uri",
     "protocolType" : "REST",
     "addresses" : [ [ "http://c6408.ambari.apache.org:43314/ws/v1/slider/mgmt" ] ]
   }, {
     "api" : "org.apache.slider.publisher",
     "addressType" : "uri",
     "protocolType" : "REST",
     "addresses" : [ [ "http://c6408.ambari.apache.org:43314/ws/v1/slider/publisher" ] ]
   }, {
     "api" : "org.apache.slider.registry",
     "addressType" : "uri",
     "protocolType" : "REST",
     "addresses" : [ [ "http://c6408.ambari.apache.org:43314/ws/v1/slider/registry" ] ]
   }, {
     "api" : "org.apache.slider.publisher.configurations",
     "addressType" : "uri",
     "protocolType" : "REST",
     "addresses" : [ [ "http://c6408.ambari.apache.org:43314/ws/v1/slider/publisher/slider" ] ]
   } ],
   "internal" : [ {
     "api" : "org.apache.slider.agents.secure",
     "addressType" : "uri",
     "protocolType" : "REST",
     "addresses" : [ [ "https://c6408.ambari.apache.org:46958/ws/v1/slider/agents" ] ]
   }, {
     "api" : "org.apache.slider.agents.oneway",
     "addressType" : "uri",
     "protocolType" : "REST",
     "addresses" : [ [ "https://c6408.ambari.apache.org:57513/ws/v1/slider/agents" ] ]
   } ],
   "yarn:persistence" : "application",
   "yarn:id" : "application_1412974695267_0015"
 }
 {noformat}
 Recommendations:
 1. I would suggest either removing the string 
 {color:red}jsonservicerec{color}, or, if it is desirable to have non-null 
 data at all times, looping the string into the JSON structure as a top-level 
 attribute to ensure that the registry data is always a valid JSON document.
 2. The {color:red}addresses{color} attribute is currently a list of lists. I 
 would recommend converting it to a list of dictionary objects. In each 
 dictionary object it would be nice to have the host and port portions of 
 addresses of addressType "uri" as separate key-value pairs, to avoid parsing on 
 the client side. The URI should also be retained under a key, say "uri", to avoid 
 clients trying to generate it by concatenating host, port, resource-path, 
 etc. Here is a proposed structure:
 {noformat}
 {
   ...
   "internal" : [ {
     "api" : "org.apache.slider.agents.secure",
     "addressType" : "uri",
     "protocolType" : "REST",
     "addresses" : [ 
       { "uri" : "https://c6408.ambari.apache.org:46958/ws/v1/slider/agents",
         "host" : "c6408.ambari.apache.org",
         "port" : 46958
       }
     ]
   } 
   ],
 }
 {noformat}
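
 A small Jackson-based sketch of how a client could consume the proposed 
 structure (illustrative only; the field names follow the example above, and the 
 parsing code is not part of any patch):
 {code}
 import com.fasterxml.jackson.databind.JsonNode;
 import com.fasterxml.jackson.databind.ObjectMapper;

 public class AddressParseSketch {
   public static void main(String[] args) throws Exception {
     String json =
         "{ \"internal\" : [ { \"api\" : \"org.apache.slider.agents.secure\","
       + "  \"addresses\" : [ { \"uri\" : \"https://c6408.ambari.apache.org:46958/ws/v1/slider/agents\","
       + "                      \"host\" : \"c6408.ambari.apache.org\", \"port\" : 46958 } ] } ] }";
     JsonNode recordNode = new ObjectMapper().readTree(json);
     JsonNode address = recordNode.path("internal").get(0).path("addresses").get(0);
     // host and port are available directly; no parsing of the URI is needed
     System.out.println(address.path("host").asText() + ":" + address.path("port").asInt());
     System.out.println(address.path("uri").asText());
   }
 }
 {code}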



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2683) registry config options: document and move to core-default

2014-10-27 Thread Steve Loughran (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2683?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran updated YARN-2683:
-
Summary: registry config options: document and move to core-default  (was: 
document registry config options)

 registry config options: document and move to core-default
 --

 Key: YARN-2683
 URL: https://issues.apache.org/jira/browse/YARN-2683
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: api, resourcemanager
Affects Versions: 2.6.0
Reporter: Steve Loughran
Assignee: Steve Loughran
 Attachments: YARN-2683-001.patch, YARN-2683-002.patch

   Original Estimate: 1h
  Remaining Estimate: 1h

 Add to {{yarn-site}} a page on registry configuration parameters



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2683) registry config options: document and move to core-default

2014-10-27 Thread Steve Loughran (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2683?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14185213#comment-14185213
 ] 

Steve Loughran commented on YARN-2683:
--

I've added one more action to this JIRA: move the config defaults to 
{{core-default.xml}}.

This may seem odd for a YARN project, but the registry was written so as to 
allow applications without any YARN artifacts on their classpath to resolve 
records. That is: the service is expected to be a YARN app (though not 
exclusively); clients may have a leaner classpath.

While this works, the client applications do not get the default values from 
{{yarn-default.xml}}.
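
As a minimal sketch of why that matters (the property name below is only an 
illustrative registry key, not something taken from this patch): a non-YARN 
client typically builds a plain {{Configuration}}, which loads 
{{core-default.xml}} and {{core-site.xml}} but not {{yarn-default.xml}}.
{code}
import org.apache.hadoop.conf.Configuration;

// Sketch: a non-YARN client resolving a registry setting.
// new Configuration() loads only core-default.xml and core-site.xml,
// so a default declared solely in yarn-default.xml is invisible here.
public class RegistryClientConfigSketch {
  public static void main(String[] args) {
    Configuration conf = new Configuration();
    // "hadoop.registry.zk.quorum" is used purely as an example key name.
    String quorum = conf.get("hadoop.registry.zk.quorum", "localhost:2181");
    System.out.println("registry quorum = " + quorum);
  }
}
{code}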

 registry config options: document and move to core-default
 --

 Key: YARN-2683
 URL: https://issues.apache.org/jira/browse/YARN-2683
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: api, resourcemanager
Affects Versions: 2.6.0
Reporter: Steve Loughran
Assignee: Steve Loughran
 Attachments: YARN-2683-001.patch, YARN-2683-002.patch

   Original Estimate: 1h
  Remaining Estimate: 1h

 Add to {{yarn-site}} a page on registry configuration parameters



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2678) Recommended improvements to Yarn Registry

2014-10-27 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2678?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14185230#comment-14185230
 ] 

Hadoop QA commented on YARN-2678:
-

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12677307/YARN-2678-006.patch
  against trunk revision 0058ead.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 4 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-registry.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/5576//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5576//console

This message is automatically generated.

 Recommended improvements to Yarn Registry
 -

 Key: YARN-2678
 URL: https://issues.apache.org/jira/browse/YARN-2678
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: api, resourcemanager
Affects Versions: 2.6.0
Reporter: Gour Saha
Assignee: Steve Loughran
 Attachments: HADOOP-2678-002.patch, YARN-2678-001.patch, 
 YARN-2678-003.patch, YARN-2678-006.patch, yarnregistry.pdf


 While binding to the Slider AM from the Slider agent Python code, here are 
 some of the items I stumbled upon and would recommend as improvements.
 This is how Slider's registry record looks today:
 {noformat}
 jsonservicerec{
   "description" : "Slider Application Master",
   "external" : [ {
     "api" : "org.apache.slider.appmaster",
     "addressType" : "host/port",
     "protocolType" : "hadoop/protobuf",
     "addresses" : [ [ "c6408.ambari.apache.org", "34837" ] ]
   }, {
     "api" : "org.apache.http.UI",
     "addressType" : "uri",
     "protocolType" : "webui",
     "addresses" : [ [ "http://c6408.ambari.apache.org:43314" ] ]
   }, {
     "api" : "org.apache.slider.management",
     "addressType" : "uri",
     "protocolType" : "REST",
     "addresses" : [ [ "http://c6408.ambari.apache.org:43314/ws/v1/slider/mgmt" ] ]
   }, {
     "api" : "org.apache.slider.publisher",
     "addressType" : "uri",
     "protocolType" : "REST",
     "addresses" : [ [ "http://c6408.ambari.apache.org:43314/ws/v1/slider/publisher" ] ]
   }, {
     "api" : "org.apache.slider.registry",
     "addressType" : "uri",
     "protocolType" : "REST",
     "addresses" : [ [ "http://c6408.ambari.apache.org:43314/ws/v1/slider/registry" ] ]
   }, {
     "api" : "org.apache.slider.publisher.configurations",
     "addressType" : "uri",
     "protocolType" : "REST",
     "addresses" : [ [ "http://c6408.ambari.apache.org:43314/ws/v1/slider/publisher/slider" ] ]
   } ],
   "internal" : [ {
     "api" : "org.apache.slider.agents.secure",
     "addressType" : "uri",
     "protocolType" : "REST",
     "addresses" : [ [ "https://c6408.ambari.apache.org:46958/ws/v1/slider/agents" ] ]
   }, {
     "api" : "org.apache.slider.agents.oneway",
     "addressType" : "uri",
     "protocolType" : "REST",
     "addresses" : [ [ "https://c6408.ambari.apache.org:57513/ws/v1/slider/agents" ] ]
   } ],
   "yarn:persistence" : "application",
   "yarn:id" : "application_1412974695267_0015"
 }
 {noformat}
 Recommendations:
 1. I would suggest either removing the string 
 {color:red}jsonservicerec{color}, or, if it is desirable to have non-null 
 data at all times, looping the string into the JSON structure as a top-level 
 attribute to ensure that the registry data is always a valid JSON document.
 2. The {color:red}addresses{color} attribute is currently a list of lists. I 
 would recommend converting it to a list of dictionary objects. In each 
 dictionary object it would be nice to have the host and port portions of 
 addresses of addressType "uri" as separate key-value pairs, to avoid parsing on 
 the client side. The URI should also be retained under a key, say "uri", to avoid 
 clients trying to generate it by concatenating host, port, resource-path, 
 etc. Here is a proposed structure:
 {noformat}
 {
   ...
   "internal" : [ {
     "api" : "org.apache.slider.agents.secure",
     "addressType" : "uri",
     "protocolType" : "REST",
     "addresses" : [ 
       { "uri" : "https://c6408.ambari.apache.org:46958/ws/v1/slider/agents",
         "host" : 

[jira] [Commented] (YARN-2734) If a sub-folder is encountered by log aggregator it results in invalid aggregated file

2014-10-27 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2734?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14185231#comment-14185231
 ] 

Hudson commented on YARN-2734:
--

FAILURE: Integrated in Hadoop-Mapreduce-trunk #1939 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1939/])
YARN-2734. Skipped sub-folders in the local log dir when aggregating logs. 
Contributed by Xuan Gong. (zjshen: rev caecd9fffe7c6216be31f3ab65349182045451fa)
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/logaggregation/AggregatedLogFormat.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/logaggregation/TestAggregatedLogFormat.java


 If a sub-folder is encountered by log aggregator it results in invalid 
 aggregated file
 --

 Key: YARN-2734
 URL: https://issues.apache.org/jira/browse/YARN-2734
 Project: Hadoop YARN
  Issue Type: Bug
  Components: log-aggregation
Affects Versions: 2.5.1
Reporter: Sumit Mohanty
Assignee: Xuan Gong
 Fix For: 2.6.0

 Attachments: YARN-2734.1.patch, YARN-2734.2.patch


 See YARN-2724 for some more context on how the error surfaces during the 
 {{yarn logs}} call.
 If the aggregator sees a sub-folder today, it results in the following error when 
 reading the logs:
 {noformat}
 Container: container_1413512973198_0019_01_02 on 
 c6401.ambari.apache.org_45454
 
 LogType: cmd_data
 LogLength: 4096
 Log Contents:
 Error aggregating log file. Log file : 
 /hadoop/yarn/log/application_1413512973198_0019/container_1413512973198_0019_01_02/cmd_data/hadoop/yarn/log/application_1413512973198_0019/container_1413512973198_0019_01_02/cmd_data
  (Is a directory)
 {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2743) Yarn jobs via oozie fail with failed to renew token (secure) or digest mismatch (unsecure) errors when RM is being killed

2014-10-27 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2743?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14185232#comment-14185232
 ] 

Hudson commented on YARN-2743:
--

FAILURE: Integrated in Hadoop-Mapreduce-trunk #1939 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1939/])
YARN-2743. Fixed a bug in ResourceManager that was causing RMDelegationToken 
identifiers to be tampered and thus causing app submission failures in secure 
mode. Contributed by Jian He. (vinodkv: rev 
018664550507981297fd9f91e29408e6b7801ea9)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/proto/server/yarn_security_token.proto
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/FileSystemRMStateStore.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/ZKRMStateStore.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/records/RMDelegationTokenIdentifierData.java
* 
hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/security/token/delegation/AbstractDelegationTokenSecretManager.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/security/TestYARNTokenIdentifier.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/proto/yarn_server_resourcemanager_recovery.proto
* 
hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/security/token/delegation/AbstractDelegationTokenIdentifier.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/pom.xml
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/RMDelegationTokenIdentifierForTest.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/security/client/YARNDelegationTokenIdentifier.java
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/RMStateStoreTestBase.java


 Yarn jobs via oozie fail with failed to renew token (secure) or digest 
 mismatch (unsecure) errors when RM is being killed
 -

 Key: YARN-2743
 URL: https://issues.apache.org/jira/browse/YARN-2743
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.6.0
Reporter: Arpit Gupta
Assignee: Jian He
Priority: Blocker
 Fix For: 2.6.0

 Attachments: YARN-2743.1.patch, YARN-2743.2.patch


 During our HA testing we have seen YARN jobs run via Oozie fail with "failed 
 to renew delegation token" errors on secure clusters and "digest mismatch" 
 errors on unsecure clusters.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2683) registry config options: document and move to core-default

2014-10-27 Thread Steve Loughran (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2683?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran updated YARN-2683:
-
Attachment: YARN-2683-003.patch

Patch -003. The configuration document is available [rendered on 
github|https://github.com/steveloughran/hadoop-trunk/blob/YARN-913/trunk-YARN-2683-docs/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/markdown/registry/registry-configuration.md]

Along with the docs, this patch moves all the configuration options into 
{{core-default}}. They are only used in YARN applications today, but it is 
essential to place them there so that non-YARN clients pick up the default 
values. Configuration should go into {{core-site.xml}} too.

 registry config options: document and move to core-default
 --

 Key: YARN-2683
 URL: https://issues.apache.org/jira/browse/YARN-2683
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: api, resourcemanager
Affects Versions: 2.6.0
Reporter: Steve Loughran
Assignee: Steve Loughran
 Attachments: YARN-2683-001.patch, YARN-2683-002.patch, 
 YARN-2683-003.patch

   Original Estimate: 1h
  Remaining Estimate: 1h

 Add to {{yarn-site}} a page on registry configuration parameters



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2186) Node Manager uploader service for cache manager

2014-10-27 Thread Sangjin Lee (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2186?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sangjin Lee updated YARN-2186:
--
Attachment: YARN-2186-trunk-v4.patch

Posted v4. Up to date with YARN-2183.

To see the diffs on github, see 
https://github.com/ctrezzo/hadoop/compare/ctrezzo:trunk...sharedcache-4-YARN-2186-uploader

 Node Manager uploader service for cache manager
 ---

 Key: YARN-2186
 URL: https://issues.apache.org/jira/browse/YARN-2186
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Chris Trezzo
Assignee: Chris Trezzo
 Attachments: YARN-2186-trunk-v1.patch, YARN-2186-trunk-v2.patch, 
 YARN-2186-trunk-v3.patch, YARN-2186-trunk-v4.patch


 Implement the node manager uploader service for the cache manager. This 
 service is responsible for communicating with the node manager when it 
 uploads resources to the shared cache.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2683) registry config options: document and move to core-default

2014-10-27 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2683?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14185356#comment-14185356
 ] 

Hadoop QA commented on YARN-2683:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12677320/YARN-2683-003.patch
  against trunk revision 0058ead.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:red}-1 tests included{color}.  The patch doesn't appear to include 
any new or modified tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-common-project/hadoop-common 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common:

  org.apache.hadoop.yarn.logaggregation.TestAggregatedLogFormat

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/5577//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5577//console

This message is automatically generated.

 registry config options: document and move to core-default
 --

 Key: YARN-2683
 URL: https://issues.apache.org/jira/browse/YARN-2683
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: api, resourcemanager
Affects Versions: 2.6.0
Reporter: Steve Loughran
Assignee: Steve Loughran
 Attachments: YARN-2683-001.patch, YARN-2683-002.patch, 
 YARN-2683-003.patch

   Original Estimate: 1h
  Remaining Estimate: 1h

 Add to {{yarn-site}} a page on registry configuration parameters



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2186) Node Manager uploader service for cache manager

2014-10-27 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2186?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14185366#comment-14185366
 ] 

Hadoop QA commented on YARN-2186:
-

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  
http://issues.apache.org/jira/secure/attachment/12677322/YARN-2186-trunk-v4.patch
  against trunk revision 0058ead.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-sharedcachemanager.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/5578//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5578//console

This message is automatically generated.

 Node Manager uploader service for cache manager
 ---

 Key: YARN-2186
 URL: https://issues.apache.org/jira/browse/YARN-2186
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Chris Trezzo
Assignee: Chris Trezzo
 Attachments: YARN-2186-trunk-v1.patch, YARN-2186-trunk-v2.patch, 
 YARN-2186-trunk-v3.patch, YARN-2186-trunk-v4.patch


 Implement the node manager uploader service for the cache manager. This 
 service is responsible for communicating with the node manager when it 
 uploads resources to the shared cache.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2194) Add Cgroup support for RedHat 7

2014-10-27 Thread bc Wong (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2194?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14185368#comment-14185368
 ] 

bc Wong commented on YARN-2194:
---

container-executor.c
* L1188: If initialize_user() fails, do you not need to clean up?
* L1194: Same for create_log_dirs(). It seems that goto cleanup is still warranted.
* L1207: Missing space before S_IRWXU.
* L1243: Nit: hardcoding 55 here is error-prone. You could allocate a 4K buffer 
here and use snprintf.
* L1244: You need to check the return value from malloc(). Since you're running 
as root here, everything has to be extra careful.
* L1255: On failure, I would log the command being executed.


 Add Cgroup support for RedHat 7
 ---

 Key: YARN-2194
 URL: https://issues.apache.org/jira/browse/YARN-2194
 Project: Hadoop YARN
  Issue Type: Improvement
Reporter: Wei Yan
Assignee: Wei Yan
 Attachments: YARN-2194-1.patch


 In previous versions of RedHat, we can build custom cgroup hierarchies with 
 use of the cgconfig command from the libcgroup package. From RedHat 7, 
 package libcgroup is deprecated and it is not recommended to use it since it 
 can easily create conflicts with the default cgroup hierarchy. The systemd is 
 provided and recommended for cgroup management. We need to add support for 
 this.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2729) Support script based NodeLabelsProvider Interface in Distributed Node Label Configuration Setup

2014-10-27 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2729?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14185372#comment-14185372
 ] 

Wangda Tan commented on YARN-2729:
--

Hi [~Naganarasimha],
For 1.
I think what I meant is to just check the label name locally in the NM.
If the NM register/heartbeat with the RM fails because of labels, I believe I 
commented on that in YARN-2495: 
https://issues.apache.org/jira/browse/YARN-2495?focusedCommentId=14184146&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14184146.
bq. use a flag say if the last sync about node labels is success or not
This should also be your proposal.

For 2.
I think for now, let's keep it simple; I just don't want to change too much of 
what we have in NodeLabelsManager :). As per our discussion in YARN-2495, in the 
future we might need to return rejected labels, so we can change this at that 
time. What do you think?

Thanks,
Wangda

 Support script based NodeLabelsProvider Interface in Distributed Node Label 
 Configuration Setup
 ---

 Key: YARN-2729
 URL: https://issues.apache.org/jira/browse/YARN-2729
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: nodemanager
Reporter: Naganarasimha G R
Assignee: Naganarasimha G R
 Attachments: YARN-2729.20141023-1.patch, YARN-2729.20141024-1.patch


 Support script based NodeLabelsProvider Interface in Distributed Node Label 
 Configuration Setup . 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2726) CapacityScheduler should explicitly log when an accessible label has no capacity

2014-10-27 Thread Xuan Gong (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2726?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14185449#comment-14185449
 ] 

Xuan Gong commented on YARN-2726:
-

+1 lgtm. Will commit it.

 CapacityScheduler should explicitly log when an accessible label has no 
 capacity
 

 Key: YARN-2726
 URL: https://issues.apache.org/jira/browse/YARN-2726
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: capacityscheduler
Reporter: Phil D'Amore
Assignee: Wangda Tan
Priority: Minor
 Attachments: YARN-2726-20141023-1.patch, YARN-2726-20141023-2.patch


 Given:
 - Node label defined: test-label
 - Two queues defined: a, b
 - label accessibility and capacity defined as follows (properties 
 abbreviated for readability):
 root.a.accessible-node-labels = test-label
 root.a.accessible-node-labels.test-label.capacity = 100
 If you restart the RM or do a 'rmadmin -refreshQueues' you will get a stack 
 trace with the following error buried within:
 Illegal capacity of -1.0 for label=test-label in queue=root.b
 This of course occurs because test-label is accessible to b due to 
 inheritance from the root, and -1 is the UNDEFINED value.  To my mind this 
 might not be obvious to the admin, and the error message which results does 
 not help guide someone to the source of the issue.
 I propose that this situation be updated so that when the capacity on an 
 accessible label is undefined, it is explicitly called out instead of falling 
 through to the illegal capacity check.  Something like:
 {code}
 if (capacity == UNDEFINED) {
 throw new IllegalArgumentException("Configuration issue: " + "label=" + 
 label + " is accessible from queue=" + queue + " but has no capacity set.");
 }
 {code}
 I'll leave it to better judgement than mine as to whether I'm throwing the 
 appropriate exception there.  I think this check should be added to both 
 getNodeLabelCapacities and getMaximumNodeLabelCapacities in 
 CapacitySchedulerConfiguration.java.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2744) Under some scenario, it is possible to end up with capacity scheduler configuration that uses labels that no longer exist

2014-10-27 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2744?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14185458#comment-14185458
 ] 

Wangda Tan commented on YARN-2744:
--

As discussed offline with [~vinodkv], what this patch does is not only a fix 
for the memory-based config store; it is still possible that, when we use the 
filesystem-based config store, some labels will not be validated. We should fix 
that.

 Under some scenario, it is possible to end up with capacity scheduler 
 configuration that uses labels that no longer exist
 -

 Key: YARN-2744
 URL: https://issues.apache.org/jira/browse/YARN-2744
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: capacityscheduler
Affects Versions: 2.5.1
Reporter: Sumit Mohanty
Assignee: Wangda Tan
 Attachments: YARN-2744-20141025-1.patch, YARN-2744-20141025-2.patch


 Use the following steps:
 * Ensure default in-memory storage is configured for labels
 * Define some labels and assign nodes to labels (e.g. define two labels and 
 assign both labels to the host on a one host cluster)
 * Invoke refreshQueues
 * Modify capacity scheduler to create two top level queues and allow access 
 to the labels from both the queues
 * Assign appropriate label + queue specific capacities
 * Restart resource manager
 We noticed that the RM starts without any issues. The labels are not preserved 
 across the restart, and thus the capacity scheduler ends up using labels that are 
 no longer present.
 At this point submitting an application to YARN will not succeed as there are 
 no resources available with the labels.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2726) CapacityScheduler should explicitly log when an accessible label has no capacity

2014-10-27 Thread Xuan Gong (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2726?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14185462#comment-14185462
 ] 

Xuan Gong commented on YARN-2726:
-

Committed to trunk, branch-2 and branch-2.6. Thanks, Wangda!

 CapacityScheduler should explicitly log when an accessible label has no 
 capacity
 

 Key: YARN-2726
 URL: https://issues.apache.org/jira/browse/YARN-2726
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: capacityscheduler
Reporter: Phil D'Amore
Assignee: Wangda Tan
Priority: Minor
 Fix For: 2.6.0

 Attachments: YARN-2726-20141023-1.patch, YARN-2726-20141023-2.patch


 Given:
 - Node label defined: test-label
 - Two queues defined: a, b
 - label accessibility and capacity defined as follows (properties 
 abbreviated for readability):
 root.a.accessible-node-labels = test-label
 root.a.accessible-node-labels.test-label.capacity = 100
 If you restart the RM or do a 'rmadmin -refreshQueues' you will get a stack 
 trace with the following error buried within:
 Illegal capacity of -1.0 for label=test-label in queue=root.b
 This of course occurs because test-label is accessible to b due to 
 inheritance from the root, and -1 is the UNDEFINED value.  To my mind this 
 might not be obvious to the admin, and the error message which results does 
 not help guide someone to the source of the issue.
 I propose that this situation be updated so that when the capacity on an 
 accessible label is undefined, it is explicitly called out instead of falling 
 through to the illegal capacity check.  Something like:
 {code}
 if (capacity == UNDEFINED) {
 throw new IllegalArgumentException("Configuration issue: " + "label=" + 
 label + " is accessible from queue=" + queue + " but has no capacity set.");
 }
 {code}
 I'll leave it to better judgement than mine as to whether I'm throwing the 
 appropriate exception there.  I think this check should be added to both 
 getNodeLabelCapacities and getMaximumNodeLabelCapacities in 
 CapacitySchedulerConfiguration.java.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2726) CapacityScheduler should explicitly log when an accessible label has no capacity

2014-10-27 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2726?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14185463#comment-14185463
 ] 

Hudson commented on YARN-2726:
--

FAILURE: Integrated in Hadoop-trunk-Commit #6354 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/6354/])
YARN-2726. CapacityScheduler should explicitly log when an accessible label has 
no capacity. Contributed by Wangda Tan (xgong: rev 
ce1a4419a6c938447a675c416567db56bf9cb29e)
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacitySchedulerConfiguration.java


 CapacityScheduler should explicitly log when an accessible label has no 
 capacity
 

 Key: YARN-2726
 URL: https://issues.apache.org/jira/browse/YARN-2726
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: capacityscheduler
Reporter: Phil D'Amore
Assignee: Wangda Tan
Priority: Minor
 Fix For: 2.6.0

 Attachments: YARN-2726-20141023-1.patch, YARN-2726-20141023-2.patch


 Given:
 - Node label defined: test-label
 - Two queues defined: a, b
 - label accessibility and capacity defined as follows (properties 
 abbreviated for readability):
 root.a.accessible-node-labels = test-label
 root.a.accessible-node-labels.test-label.capacity = 100
 If you restart the RM or do a 'rmadmin -refreshQueues' you will get a stack 
 trace with the following error buried within:
 Illegal capacity of -1.0 for label=test-label in queue=root.b
 This of course occurs because test-label is accessible to b due to 
 inheritance from the root, and -1 is the UNDEFINED value.  To my mind this 
 might not be obvious to the admin, and the error message which results does 
 not help guide someone to the source of the issue.
 I propose that this situation be updated so that when the capacity on an 
 accessible label is undefined, it is explicitly called out instead of falling 
 through to the illegal capacity check.  Something like:
 {code}
 if (capacity == UNDEFINED) {
 throw new IllegalArgumentException("Configuration issue: " + "label=" + 
 label + " is accessible from queue=" + queue + " but has no capacity set.");
 }
 {code}
 I'll leave it to better judgement than mine as to whether I'm throwing the 
 appropriate exception there.  I think this check should be added to both 
 getNodeLabelCapacities and getMaximumNodeLabelCapacities in 
 CapacitySchedulerConfiguration.java.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2726) CapacityScheduler should explicitly log when an accessible label has no capacity

2014-10-27 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2726?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14185464#comment-14185464
 ] 

Wangda Tan commented on YARN-2726:
--

Thanks for [~xgong]'s review and commit!

 CapacityScheduler should explicitly log when an accessible label has no 
 capacity
 

 Key: YARN-2726
 URL: https://issues.apache.org/jira/browse/YARN-2726
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: capacityscheduler
Reporter: Phil D'Amore
Assignee: Wangda Tan
Priority: Minor
 Fix For: 2.6.0

 Attachments: YARN-2726-20141023-1.patch, YARN-2726-20141023-2.patch


 Given:
 - Node label defined: test-label
 - Two queues defined: a, b
 - label accessibility and capacity defined as follows (properties 
 abbreviated for readability):
 root.a.accessible-node-labels = test-label
 root.a.accessible-node-labels.test-label.capacity = 100
 If you restart the RM or do a 'rmadmin -refreshQueues' you will get a stack 
 trace with the following error buried within:
 Illegal capacity of -1.0 for label=test-label in queue=root.b
 This of course occurs because test-label is accessible to b due to 
 inheritance from the root, and -1 is the UNDEFINED value.  To my mind this 
 might not be obvious to the admin, and the error message which results does 
 not help guide someone to the source of the issue.
 I propose that this situation be updated so that when the capacity on an 
 accessible label is undefined, it is explicitly called out instead of falling 
 through to the illegal capacity check.  Something like:
 {code}
 if (capacity == UNDEFINED) {
 throw new IllegalArgumentException("Configuration issue: " + "label=" + 
 label + " is accessible from queue=" + queue + " but has no capacity set.");
 }
 {code}
 I'll leave it to better judgement than mine as to whether I'm throwing the 
 appropriate exception there.  I think this check should be added to both 
 getNodeLabelCapacities and getMaximumNodeLabelCapacities in 
 CapacitySchedulerConfiguration.java.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2744) Under some scenario, it is possible to end up with capacity scheduler configuration that uses labels that no longer exist

2014-10-27 Thread Wangda Tan (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2744?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wangda Tan updated YARN-2744:
-
Priority: Critical  (was: Major)

 Under some scenario, it is possible to end up with capacity scheduler 
 configuration that uses labels that no longer exist
 -

 Key: YARN-2744
 URL: https://issues.apache.org/jira/browse/YARN-2744
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: capacityscheduler
Affects Versions: 2.5.1
Reporter: Sumit Mohanty
Assignee: Wangda Tan
Priority: Critical
 Attachments: YARN-2744-20141025-1.patch, YARN-2744-20141025-2.patch


 Use the following steps:
 * Ensure default in-memory storage is configured for labels
 * Define some labels and assign nodes to labels (e.g. define two labels and 
 assign both labels to the host on a one host cluster)
 * Invoke refreshQueues
 * Modify capacity scheduler to create two top level queues and allow access 
 to the labels from both the queues
 * Assign appropriate label + queue specific capacities
 * Restart resource manager
 Noticed that RM starts without any issues. The labels are not preserved 
 across restart and thus the capacity-scheduler ends up using labels that are 
 no longer present.
 At this point submitting an application to YARN will not succeed as there are 
 no resources available with the labels.
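
A minimal sketch of the kind of validation that would catch this at refreshQueues time, assuming the set of labels currently known to the labels manager is available; the names below are illustrative, not the actual CapacityScheduler code:
{code}
import java.util.Collection;
import java.util.Set;

// Illustrative only: reject a queue configuration that references labels the
// labels manager no longer knows about.
public class LabelConfigValidatorSketch {
  static void validateAccessibleLabels(String queue,
      Collection<String> accessibleLabels, Set<String> knownClusterLabels) {
    for (String label : accessibleLabels) {
      if (!"*".equals(label) && !knownClusterLabels.contains(label)) {
        throw new IllegalArgumentException("Queue " + queue
            + " refers to label " + label + " which no longer exists");
      }
    }
  }
}
{code}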



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-2751) HdfsConstants#MEMORY_STORAGE_POLICY_ID and HdfsConstants#MEMORY_STORAGE_POLICY_ID are missing in branch-2

2014-10-27 Thread Jing Zhao (JIRA)
Jing Zhao created YARN-2751:
---

 Summary: HdfsConstants#MEMORY_STORAGE_POLICY_ID and 
HdfsConstants#MEMORY_STORAGE_POLICY_ID are missing in branch-2
 Key: YARN-2751
 URL: https://issues.apache.org/jira/browse/YARN-2751
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 2.6.0
Reporter: Jing Zhao
Priority: Minor


When HDFS-6581 was merged to branch-2 and branch-2.6, 
HdfsConstants#MEMORY_STORAGE_POLICY_ID and 
HdfsConstants#MEMORY_STORAGE_POLICY_ID, which were defined in 
HDFS-7228, are missing.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2591) AHSWebServices should return FORBIDDEN(403) if the request user doesn't have access to the history data

2014-10-27 Thread Jian He (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2591?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14185485#comment-14185485
 ] 

Jian He commented on YARN-2591:
---

looks good, +1

 AHSWebServices should return FORBIDDEN(403) if the request user doesn't have 
 access to the history data
 ---

 Key: YARN-2591
 URL: https://issues.apache.org/jira/browse/YARN-2591
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: timelineserver
Affects Versions: 3.0.0, 2.6.0
Reporter: Zhijie Shen
Assignee: Zhijie Shen
 Attachments: YARN-2591.1.patch, YARN-2591.2.patch


 AHSWebServices should return FORBIDDEN(403) if the request user doesn't have 
 access to the history data. Currently, it is going to return 
 INTERNAL_SERVER_ERROR(500).
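
A minimal sketch of the kind of mapping involved, assuming a JAX-RS-style web service; the class and check below are illustrative, not the actual AHSWebServices code:
{code}
import javax.ws.rs.WebApplicationException;
import javax.ws.rs.core.Response;

// Illustrative only: surface an authorization failure as 403 FORBIDDEN instead
// of letting it bubble up as a 500 INTERNAL_SERVER_ERROR.
public class HistoryAccessSketch {
  static void checkAccess(boolean callerHasAccess, String appId) {
    if (!callerHasAccess) {
      throw new WebApplicationException(
          Response.status(Response.Status.FORBIDDEN)
              .entity("User does not have privilege to view application " + appId)
              .build());
    }
  }
}
{code}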



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2751) HdfsConstants#MEMORY_STORAGE_POLICY_ID and HdfsConstants#MEMORY_STORAGE_POLICY_ID are missing in branch-2

2014-10-27 Thread Jing Zhao (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2751?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14185490#comment-14185490
 ] 

Jing Zhao commented on YARN-2751:
-

Oops, created in the wrong project...

 HdfsConstants#MEMORY_STORAGE_POLICY_ID and 
 HdfsConstants#MEMORY_STORAGE_POLICY_ID are missing in branch-2
 -

 Key: YARN-2751
 URL: https://issues.apache.org/jira/browse/YARN-2751
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 2.6.0
Reporter: Jing Zhao
Priority: Minor

 When HDFS-6581 was merged to branch-2 and branch-2.6, 
 HdfsConstants#MEMORY_STORAGE_POLICY_ID and 
 HdfsConstants#MEMORY_STORAGE_POLICY_ID, which were defined in 
 HDFS-7228, are missing.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2591) AHSWebServices should return FORBIDDEN(403) if the request user doesn't have access to the history data

2014-10-27 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2591?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14185500#comment-14185500
 ] 

Hudson commented on YARN-2591:
--

FAILURE: Integrated in Hadoop-trunk-Commit #6355 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/6355/])
YARN-2591. Fixed AHSWebServices to return FORBIDDEN(403) if the request user 
doesn't have access to the history data. Contributed by Zhijie Shen (jianhe: 
rev c05b581a5522eed499d3ba16af9fa6dc694563f6)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/test/java/org/apache/hadoop/yarn/server/applicationhistoryservice/TestApplicationHistoryManagerOnTimelineStore.java
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/applicationhistoryservice/ApplicationHistoryManagerOnTimelineStore.java
* 
hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/security/authorize/AuthorizationException.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/webapp/WebServices.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/test/java/org/apache/hadoop/yarn/server/applicationhistoryservice/webapp/TestAHSWebServices.java


 AHSWebServices should return FORBIDDEN(403) if the request user doesn't have 
 access to the history data
 ---

 Key: YARN-2591
 URL: https://issues.apache.org/jira/browse/YARN-2591
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: timelineserver
Affects Versions: 3.0.0, 2.6.0
Reporter: Zhijie Shen
Assignee: Zhijie Shen
 Fix For: 2.6.0

 Attachments: YARN-2591.1.patch, YARN-2591.2.patch


 AHSWebServices should return FORBIDDEN(403) if the request user doesn't have 
 access to the history data. Currently, it is going to return 
 INTERNAL_SERVER_ERROR(500).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2704) Localization and log-aggregation will fail if hdfs delegation token expired after token-max-life-time

2014-10-27 Thread Jian He (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2704?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14185534#comment-14185534
 ] 

Jian He commented on YARN-2704:
---

bq. if (this.token==null || this.applicationId==null || this.conf==null) {
because these null checks cannot happen, and they were causing a previous findbugs 
warning.

  Localization and log-aggregation will fail if hdfs delegation token expired 
 after token-max-life-time
 --

 Key: YARN-2704
 URL: https://issues.apache.org/jira/browse/YARN-2704
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Jian He
Assignee: Jian He
 Attachments: YARN-2704.1.patch, YARN-2704.2.patch, YARN-2704.2.patch, 
 YARN-2704.3.patch


 In secure mode, YARN requires the hdfs-delegation token to do localization 
 and log aggregation on behalf of the user. But the hdfs delegation token will 
 eventually expire after max-token-life-time.  So,  localization and log 
 aggregation will fail after the token expires.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2704) Localization and log-aggregation will fail if hdfs delegation token expired after token-max-life-time

2014-10-27 Thread Jian He (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2704?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jian He updated YARN-2704:
--
Attachment: YARN-2704.4.patch

Fixed previous comments

  Localization and log-aggregation will fail if hdfs delegation token expired 
 after token-max-life-time
 --

 Key: YARN-2704
 URL: https://issues.apache.org/jira/browse/YARN-2704
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Jian He
Assignee: Jian He
 Attachments: YARN-2704.1.patch, YARN-2704.2.patch, YARN-2704.2.patch, 
 YARN-2704.3.patch, YARN-2704.4.patch


 In secure mode, YARN requires the hdfs-delegation token to do localization 
 and log aggregation on behalf of the user. But the hdfs delegation token will 
 eventually expire after max-token-life-time.  So,  localization and log 
 aggregation will fail after the token expires.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-443) allow OS scheduling priority of NM to be different than the containers it launches

2014-10-27 Thread Xuan Gong (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-443?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14185542#comment-14185542
 ] 

Xuan Gong commented on YARN-443:


[~tgraves]
I found that the function ContainerExecutor.getRunCommand() in the trunk patch 
is different from that in the branch-2/branch-0.23 patch. Is there any reason why 
we are doing that?

 allow OS scheduling priority of NM to be different than the containers it 
 launches
 --

 Key: YARN-443
 URL: https://issues.apache.org/jira/browse/YARN-443
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: nodemanager
Affects Versions: 2.0.3-alpha, 0.23.6
Reporter: Thomas Graves
Assignee: Thomas Graves
 Fix For: 0.23.7, 2.0.4-alpha

 Attachments: YARN-443-branch-0.23.patch, YARN-443-branch-0.23.patch, 
 YARN-443-branch-0.23.patch, YARN-443-branch-0.23.patch, 
 YARN-443-branch-2.patch, YARN-443-branch-2.patch, YARN-443-branch-2.patch, 
 YARN-443.patch, YARN-443.patch, YARN-443.patch, YARN-443.patch, 
 YARN-443.patch, YARN-443.patch, YARN-443.patch


 It would be nice if we could have the nodemanager run at a different OS 
 scheduling priority than the containers so that you can still communicate 
 with the nodemanager if the containers are out of control.  
 On linux we could launch the nodemanager at a higher priority, but then all 
 the containers it launches would also be at that higher priority, so we need 
 a way for the container executor to launch them at a lower priority.
 I'm not sure how this applies to windows if at all.
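
A minimal sketch of the general idea on Linux, assuming a configurable niceness adjustment; this is illustrative only, not the actual ContainerExecutor.getRunCommand() implementation:
{code}
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Illustrative only: prepend "nice -n <adjustment>" so containers run at a lower
// OS scheduling priority than the NodeManager that launches them.
public class NicePrefixSketch {
  static String[] buildRunCommand(String command, Integer nicenessAdjustment) {
    List<String> cmd = new ArrayList<String>();
    if (nicenessAdjustment != null) {
      cmd.addAll(Arrays.asList("nice", "-n", Integer.toString(nicenessAdjustment)));
    }
    cmd.addAll(Arrays.asList("bash", command));
    return cmd.toArray(new String[cmd.size()]);
  }
}
{code}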



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2741) Windows: Node manager cannot serve up log files via the web user interface when yarn.nodemanager.log-dirs to any drive letter other than C: (or, the drive that nodemanag

2014-10-27 Thread Craig Welch (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2741?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14185572#comment-14185572
 ] 

Craig Welch commented on YARN-2741:
---

Will do, making patch available so I can see the change run against existing 
tests.

 Windows: Node manager cannot serve up log files via the web user interface 
 when yarn.nodemanager.log-dirs to any drive letter other than C: (or, the 
 drive that nodemanager is running on)
 --

 Key: YARN-2741
 URL: https://issues.apache.org/jira/browse/YARN-2741
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager
Affects Versions: 2.6.0
 Environment: Windows
Reporter: Craig Welch
Assignee: Craig Welch
 Attachments: YARN-2741.1.patch


 PROBLEM: User is getting No Logs available for Container Container_number 
 when setting the yarn.nodemanager.log-dirs to any drive letter other than C:
 STEPS TO REPRODUCE:
 On Windows
 1) Run NodeManager on C:
 2) Create two local drive partitions D: and E:
 3) Put yarn.nodemanager.log-dirs = D:\nmlogs or E:\nmlogs
 4) Run a MR job that will last at least 5 minutes
 5) While the job is in flight, log into the Yarn web ui , 
 resource_manager_server:8088/cluster
 6) Click on the application_idnumber
 7) Click on the logs link, you will get No Logs available for Container 
 Container_number
 ACTUAL BEHAVIOR: Getting an error message when viewing the container logs
 EXPECTED BEHAVIOR: Able to use different drive letters in 
 yarn.nodemanager.log-dirs and not get error
 NOTE: If we use the drive letter C: in yarn.nodemanager.log-dirs, we are able 
 to see the container logs while the MR job is in flight.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-443) allow OS scheduling priority of NM to be different than the containers it launches

2014-10-27 Thread Thomas Graves (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-443?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14185578#comment-14185578
 ] 

Thomas Graves commented on YARN-443:


Can you be more specific about what is different and why it is a problem? 
The trunk patch shows that there was an existing getRunCommand() routine 
(before this change), whereas the other didn't have one before (it looks like 
it was added for Windows support).

 allow OS scheduling priority of NM to be different than the containers it 
 launches
 --

 Key: YARN-443
 URL: https://issues.apache.org/jira/browse/YARN-443
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: nodemanager
Affects Versions: 2.0.3-alpha, 0.23.6
Reporter: Thomas Graves
Assignee: Thomas Graves
 Fix For: 0.23.7, 2.0.4-alpha

 Attachments: YARN-443-branch-0.23.patch, YARN-443-branch-0.23.patch, 
 YARN-443-branch-0.23.patch, YARN-443-branch-0.23.patch, 
 YARN-443-branch-2.patch, YARN-443-branch-2.patch, YARN-443-branch-2.patch, 
 YARN-443.patch, YARN-443.patch, YARN-443.patch, YARN-443.patch, 
 YARN-443.patch, YARN-443.patch, YARN-443.patch


 It would be nice if we could have the nodemanager run at a different OS 
 scheduling priority than the containers so that you can still communicate 
 with the nodemanager if the containers are out of control.  
 On linux we could launch the nodemanager at a higher priority, but then all 
 the containers it launches would also be at that higher priority, so we need 
 a way for the container executor to launch them at a lower priority.
 I'm not sure how this applies to windows if at all.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2571) RM to support YARN registry

2014-10-27 Thread Steve Loughran (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2571?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran updated YARN-2571:
-
Attachment: YARN-2571-006.patch

Patch 006: upgrade "use a URI in the API" from a SHOULD to a MUST; fix 
examples and tests accordingly

 RM to support YARN registry 
 

 Key: YARN-2571
 URL: https://issues.apache.org/jira/browse/YARN-2571
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Reporter: Steve Loughran
Assignee: Steve Loughran
 Attachments: YARN-2571-001.patch, YARN-2571-002.patch, 
 YARN-2571-003.patch, YARN-2571-005.patch, YARN-2571-006.patch


 The RM needs to (optionally) integrate with the YARN registry:
 # startup: create the /services and /users paths with system ACLs (yarn, hdfs 
 principals)
 # app-launch: create the user directory /users/$username with the relevant 
 permissions (CRD) for them to create subnodes.
 # attempt, container, app completion: remove service records with the 
 matching persistence and ID



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2571) RM to support YARN registry

2014-10-27 Thread Steve Loughran (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2571?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran updated YARN-2571:
-
Attachment: (was: YARN-2571-006.patch)

 RM to support YARN registry 
 

 Key: YARN-2571
 URL: https://issues.apache.org/jira/browse/YARN-2571
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Reporter: Steve Loughran
Assignee: Steve Loughran
 Attachments: YARN-2571-001.patch, YARN-2571-002.patch, 
 YARN-2571-003.patch, YARN-2571-005.patch


 The RM needs to (optionally) integrate with the YARN registry:
 # startup: create the /services and /users paths with system ACLs (yarn, hdfs 
 principals)
 # app-launch: create the user directory /users/$username with the relevant 
 permissions (CRD) for them to create subnodes.
 # attempt, container, app completion: remove service records with the 
 matching persistence and ID



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2678) Recommended improvements to Yarn Registry

2014-10-27 Thread Steve Loughran (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2678?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran updated YARN-2678:
-
Attachment: YARN-2678-007.patch

patch 007: upgrade "use a URI in the API" from a SHOULD to a MUST; fix 
examples and tests accordingly

 Recommended improvements to Yarn Registry
 -

 Key: YARN-2678
 URL: https://issues.apache.org/jira/browse/YARN-2678
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: api, resourcemanager
Affects Versions: 2.6.0
Reporter: Gour Saha
Assignee: Steve Loughran
 Attachments: HADOOP-2678-002.patch, YARN-2678-001.patch, 
 YARN-2678-003.patch, YARN-2678-006.patch, YARN-2678-007.patch, 
 yarnregistry.pdf


 In the process of binding to the Slider AM from the Slider agent Python code, here are 
 some of the items I stumbled upon and would recommend as improvements.
 This is how Slider's registry looks today -
 {noformat}
 jsonservicerec{
   "description" : "Slider Application Master",
   "external" : [ {
     "api" : "org.apache.slider.appmaster",
     "addressType" : "host/port",
     "protocolType" : "hadoop/protobuf",
     "addresses" : [ [ "c6408.ambari.apache.org", "34837" ] ]
   }, {
     "api" : "org.apache.http.UI",
     "addressType" : "uri",
     "protocolType" : "webui",
     "addresses" : [ [ "http://c6408.ambari.apache.org:43314" ] ]
   }, {
     "api" : "org.apache.slider.management",
     "addressType" : "uri",
     "protocolType" : "REST",
     "addresses" : [ [ "http://c6408.ambari.apache.org:43314/ws/v1/slider/mgmt" ] ]
   }, {
     "api" : "org.apache.slider.publisher",
     "addressType" : "uri",
     "protocolType" : "REST",
     "addresses" : [ [ "http://c6408.ambari.apache.org:43314/ws/v1/slider/publisher" ] ]
   }, {
     "api" : "org.apache.slider.registry",
     "addressType" : "uri",
     "protocolType" : "REST",
     "addresses" : [ [ "http://c6408.ambari.apache.org:43314/ws/v1/slider/registry" ] ]
   }, {
     "api" : "org.apache.slider.publisher.configurations",
     "addressType" : "uri",
     "protocolType" : "REST",
     "addresses" : [ [ "http://c6408.ambari.apache.org:43314/ws/v1/slider/publisher/slider" ] ]
   } ],
   "internal" : [ {
     "api" : "org.apache.slider.agents.secure",
     "addressType" : "uri",
     "protocolType" : "REST",
     "addresses" : [ [ "https://c6408.ambari.apache.org:46958/ws/v1/slider/agents" ] ]
   }, {
     "api" : "org.apache.slider.agents.oneway",
     "addressType" : "uri",
     "protocolType" : "REST",
     "addresses" : [ [ "https://c6408.ambari.apache.org:57513/ws/v1/slider/agents" ] ]
   } ],
   "yarn:persistence" : "application",
   "yarn:id" : "application_1412974695267_0015"
 }
 {noformat}
 Recommendations:
 1. I would suggest to either remove the string 
 {color:red}jsonservicerec{color} or if it is desirable to have a non-null 
 data at all times then loop the string into the json structure as a top-level 
 attribute to ensure that the registry data is always a valid json document. 
 2. The {color:red}addresses{color} attribute is currently a list of lists. I 
 would recommend converting it to a list of dictionary objects. In the 
 dictionary object it would be nice to have the host and port portions of 
 objects of addressType uri as separate key-value pairs, to avoid parsing on 
 the client side. The URI should also be retained under a key, say uri, to avoid 
 clients trying to generate it by concatenating host, port, resource-path, 
 etc. Here is a proposed structure -
 {noformat}
 {
   ...
   "internal" : [ {
     "api" : "org.apache.slider.agents.secure",
     "addressType" : "uri",
     "protocolType" : "REST",
     "addresses" : [
       { "uri" : "https://c6408.ambari.apache.org:46958/ws/v1/slider/agents",
         "host" : "c6408.ambari.apache.org",
         "port" : 46958
       }
     ]
   } 
   ],
 }
 {noformat}
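
A minimal client-side sketch of why the proposed dictionary form helps, assuming the Jackson library; the field names follow the example above and everything else is illustrative:
{code}
import com.fasterxml.jackson.databind.JsonNode;
import com.fasterxml.jackson.databind.ObjectMapper;

// Illustrative only: with the proposed structure a client reads uri/host/port
// directly instead of re-parsing a nested list of lists.
public class RegistryAddressSketch {
  public static void main(String[] args) throws Exception {
    String json = "{ \"internal\" : [ { \"api\" : \"org.apache.slider.agents.secure\","
        + " \"addresses\" : [ { \"uri\" : \"https://c6408.ambari.apache.org:46958/ws/v1/slider/agents\","
        + " \"host\" : \"c6408.ambari.apache.org\", \"port\" : 46958 } ] } ] }";
    JsonNode address = new ObjectMapper().readTree(json)
        .get("internal").get(0).get("addresses").get(0);
    System.out.println(address.get("host").asText() + ":" + address.get("port").asInt());
  }
}
{code}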



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2741) Windows: Node manager cannot serve up log files via the web user interface when yarn.nodemanager.log-dirs to any drive letter other than C: (or, the drive that nodemanag

2014-10-27 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2741?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14185636#comment-14185636
 ] 

Hadoop QA commented on YARN-2741:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12677052/YARN-2741.1.patch
  against trunk revision c05b581.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:red}-1 tests included{color}.  The patch doesn't appear to include 
any new or modified tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:red}-1 javadoc{color}.  The javadoc tool appears to have generated 
13 warning messages.
See 
https://builds.apache.org/job/PreCommit-YARN-Build/5580//artifact/patchprocess/diffJavadocWarnings.txt
 for details.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-registry.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/5580//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5580//console

This message is automatically generated.

 Windows: Node manager cannot serve up log files via the web user interface 
 when yarn.nodemanager.log-dirs to any drive letter other than C: (or, the 
 drive that nodemanager is running on)
 --

 Key: YARN-2741
 URL: https://issues.apache.org/jira/browse/YARN-2741
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager
Affects Versions: 2.6.0
 Environment: Windows
Reporter: Craig Welch
Assignee: Craig Welch
 Attachments: YARN-2741.1.patch


 PROBLEM: User is getting No Logs available for Container Container_number 
 when setting the yarn.nodemanager.log-dirs to any drive letter other than C:
 STEPS TO REPRODUCE:
 On Windows
 1) Run NodeManager on C:
 2) Create two local drive partitions D: and E:
 3) Put yarn.nodemanager.log-dirs = D:\nmlogs or E:\nmlogs
 4) Run a MR job that will last at least 5 minutes
 5) While the job is in flight, log into the Yarn web ui , 
 resource_manager_server:8088/cluster
 6) Click on the application_idnumber
 7) Click on the logs link, you will get No Logs available for Container 
 Container_number
 ACTUAL BEHAVIOR: Getting an error message when viewing the container logs
 EXPECTED BEHAVIOR: Able to use different drive letters in 
 yarn.nodemanager.log-dirs and not get error
 NOTE: If we use the drive letter C: in yarn.nodemanager.log-dirs, we are able 
 to see the container logs while the MR job is in flight.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2678) Recommended improvements to Yarn Registry

2014-10-27 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2678?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14185637#comment-14185637
 ] 

Hadoop QA commented on YARN-2678:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12677366/YARN-2678-007.patch
  against trunk revision c05b581.

{color:red}-1 patch{color}.  Trunk compilation may be broken.

Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5581//console

This message is automatically generated.

 Recommended improvements to Yarn Registry
 -

 Key: YARN-2678
 URL: https://issues.apache.org/jira/browse/YARN-2678
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: api, resourcemanager
Affects Versions: 2.6.0
Reporter: Gour Saha
Assignee: Steve Loughran
 Attachments: HADOOP-2678-002.patch, YARN-2678-001.patch, 
 YARN-2678-003.patch, YARN-2678-006.patch, YARN-2678-007.patch, 
 yarnregistry.pdf


 In the process of binding to the Slider AM from the Slider agent Python code, here are 
 some of the items I stumbled upon and would recommend as improvements.
 This is how Slider's registry looks today -
 {noformat}
 jsonservicerec{
   "description" : "Slider Application Master",
   "external" : [ {
     "api" : "org.apache.slider.appmaster",
     "addressType" : "host/port",
     "protocolType" : "hadoop/protobuf",
     "addresses" : [ [ "c6408.ambari.apache.org", "34837" ] ]
   }, {
     "api" : "org.apache.http.UI",
     "addressType" : "uri",
     "protocolType" : "webui",
     "addresses" : [ [ "http://c6408.ambari.apache.org:43314" ] ]
   }, {
     "api" : "org.apache.slider.management",
     "addressType" : "uri",
     "protocolType" : "REST",
     "addresses" : [ [ "http://c6408.ambari.apache.org:43314/ws/v1/slider/mgmt" ] ]
   }, {
     "api" : "org.apache.slider.publisher",
     "addressType" : "uri",
     "protocolType" : "REST",
     "addresses" : [ [ "http://c6408.ambari.apache.org:43314/ws/v1/slider/publisher" ] ]
   }, {
     "api" : "org.apache.slider.registry",
     "addressType" : "uri",
     "protocolType" : "REST",
     "addresses" : [ [ "http://c6408.ambari.apache.org:43314/ws/v1/slider/registry" ] ]
   }, {
     "api" : "org.apache.slider.publisher.configurations",
     "addressType" : "uri",
     "protocolType" : "REST",
     "addresses" : [ [ "http://c6408.ambari.apache.org:43314/ws/v1/slider/publisher/slider" ] ]
   } ],
   "internal" : [ {
     "api" : "org.apache.slider.agents.secure",
     "addressType" : "uri",
     "protocolType" : "REST",
     "addresses" : [ [ "https://c6408.ambari.apache.org:46958/ws/v1/slider/agents" ] ]
   }, {
     "api" : "org.apache.slider.agents.oneway",
     "addressType" : "uri",
     "protocolType" : "REST",
     "addresses" : [ [ "https://c6408.ambari.apache.org:57513/ws/v1/slider/agents" ] ]
   } ],
   "yarn:persistence" : "application",
   "yarn:id" : "application_1412974695267_0015"
 }
 {noformat}
 Recommendations:
 1. I would suggest to either remove the string 
 {color:red}jsonservicerec{color} or if it is desirable to have a non-null 
 data at all times then loop the string into the json structure as a top-level 
 attribute to ensure that the registry data is always a valid json document. 
 2. The {color:red}addresses{color} attribute is currently a list of lists. I 
 would recommend converting it to a list of dictionary objects. In the 
 dictionary object it would be nice to have the host and port portions of 
 objects of addressType uri as separate key-value pairs, to avoid parsing on 
 the client side. The URI should also be retained under a key, say uri, to avoid 
 clients trying to generate it by concatenating host, port, resource-path, 
 etc. Here is a proposed structure -
 {noformat}
 {
   ...
   "internal" : [ {
     "api" : "org.apache.slider.agents.secure",
     "addressType" : "uri",
     "protocolType" : "REST",
     "addresses" : [
       { "uri" : "https://c6408.ambari.apache.org:46958/ws/v1/slider/agents",
         "host" : "c6408.ambari.apache.org",
         "port" : 46958
       }
     ]
   } 
   ],
 }
 {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-2752) TestContainerExecutor.testRunCommandNoPriority fails in branch-2

2014-10-27 Thread Xuan Gong (JIRA)
Xuan Gong created YARN-2752:
---

 Summary: TestContainerExecutor.testRunCommandNoPriority fails in 
branch-2
 Key: YARN-2752
 URL: https://issues.apache.org/jira/browse/YARN-2752
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Xuan Gong
Assignee: Xuan Gong






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2752) TestContainerExecutor.testRunCommandNoPriority fails in branch-2

2014-10-27 Thread Xuan Gong (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2752?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuan Gong updated YARN-2752:

Description: 
TestContainerExecutor.testRunCommandNoPriority fails in branch-2. But it passed 
in trunk. 
The function code ContainerExecutor.getRunCommand() in trunk is different from 
that in branch-2.

 TestContainerExecutor.testRunCommandNoPriority fails in branch-2
 

 Key: YARN-2752
 URL: https://issues.apache.org/jira/browse/YARN-2752
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Xuan Gong
Assignee: Xuan Gong

 TestContainerExecutor.testRunCommandNoPriority fails in branch-2. But it 
 passed in trunk. 
 The function code ContainerExecutor.getRunCommand() in trunk is different 
 from that in branch-2.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2752) TestContainerExecutor.testRunCommandNoPriority fails in branch-2

2014-10-27 Thread Xuan Gong (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2752?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuan Gong updated YARN-2752:

Attachment: YARN-2752.1-branch-2.patch

 TestContainerExecutor.testRunCommandNoPriority fails in branch-2
 

 Key: YARN-2752
 URL: https://issues.apache.org/jira/browse/YARN-2752
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Xuan Gong
Assignee: Xuan Gong
 Attachments: YARN-2752.1-branch-2.patch


 TestContainerExecutor.testRunCommandNoPriority fails in branch-2. But it 
 passed in trunk. 
 The function code ContainerExecutor.getRunCommand() in trunk is different 
 from that in branch-2.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2669) FairScheduler: queueName shouldn't allow periods the allocation.xml

2014-10-27 Thread Wei Yan (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2669?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei Yan updated YARN-2669:
--
Attachment: YARN-2669-3.patch

Thanks for the comments, [~bcwalrus]. How about replacing the . with _dot_?
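
A minimal sketch of that escaping idea, assuming the substitution is applied when the name is stored and reversed when it is displayed; the method names are illustrative, not actual FairScheduler code:
{code}
// Illustrative only: replace "." in configured queue names with "_dot_" and
// translate back for display.
public class QueueNameEscaperSketch {
  static String escape(String queueName) {
    return queueName.replace(".", "_dot_");
  }

  static String unescape(String escapedName) {
    return escapedName.replace("_dot_", ".");
  }

  public static void main(String[] args) {
    System.out.println(escape("root.q1"));       // root_dot_q1
    System.out.println(unescape("root_dot_q1")); // root.q1
  }
}
{code}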

 FairScheduler: queueName shouldn't allow periods the allocation.xml
 ---

 Key: YARN-2669
 URL: https://issues.apache.org/jira/browse/YARN-2669
 Project: Hadoop YARN
  Issue Type: Improvement
Reporter: Wei Yan
Assignee: Wei Yan
Priority: Minor
 Attachments: YARN-2669-1.patch, YARN-2669-2.patch, YARN-2669-3.patch


 For an allocation file like:
 {noformat}
 <allocations>
   <queue name="root.q1">
     <minResources>4096mb,4vcores</minResources>
   </queue>
 </allocations>
 {noformat}
 Users may wish to configure minResources for a queue with the full path root.q1. 
 However, right now the fair scheduler will treat this configuration as applying to the 
 queue with the full name root.root.q1. We need to print out a warning message to 
 notify users about this.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2745) YARN new pluggable scheduler which does multi-resource packing

2014-10-27 Thread Robert Grandl (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2745?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Grandl updated YARN-2745:

Attachment: (was: tetris_design_doc.docx)

 YARN new pluggable scheduler which does multi-resource packing
 --

 Key: YARN-2745
 URL: https://issues.apache.org/jira/browse/YARN-2745
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: resourcemanager, scheduler
Reporter: Robert Grandl
 Attachments: sigcomm_14_tetris_talk.pptx, tetris_paper.pdf


 In this umbrella JIRA we propose a new pluggable scheduler, which accounts 
 for all resources used by a task (CPU, memory, disk, network) and is able 
 to achieve three competing objectives: fairness, improved cluster utilization, 
 and reduced average job completion time.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2704) Localization and log-aggregation will fail if hdfs delegation token expired after token-max-life-time

2014-10-27 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2704?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14185686#comment-14185686
 ] 

Hadoop QA commented on YARN-2704:
-

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12677352/YARN-2704.4.patch
  against trunk revision c05b581.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 7 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager
 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/5579//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5579//console

This message is automatically generated.

  Localization and log-aggregation will fail if hdfs delegation token expired 
 after token-max-life-time
 --

 Key: YARN-2704
 URL: https://issues.apache.org/jira/browse/YARN-2704
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Jian He
Assignee: Jian He
 Attachments: YARN-2704.1.patch, YARN-2704.2.patch, YARN-2704.2.patch, 
 YARN-2704.3.patch, YARN-2704.4.patch


 In secure mode, YARN requires the hdfs-delegation token to do localization 
 and log aggregation on behalf of the user. But the hdfs delegation token will 
 eventually expire after max-token-life-time.  So,  localization and log 
 aggregation will fail after the token expires.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2683) registry config options: document and move to core-default

2014-10-27 Thread Gour Saha (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2683?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14185730#comment-14185730
 ] 

Gour Saha commented on YARN-2683:
-

Steve,
It looks great, just one minor thing -

Orig:
It is configured by way of {color:red}Hadoop a{color} Configuration class...

I think you meant:
It is configured by way of {color:blue}a Hadoop{color} Configuration class...

-Gour

 registry config options: document and move to core-default
 --

 Key: YARN-2683
 URL: https://issues.apache.org/jira/browse/YARN-2683
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: api, resourcemanager
Affects Versions: 2.6.0
Reporter: Steve Loughran
Assignee: Steve Loughran
 Attachments: YARN-2683-001.patch, YARN-2683-002.patch, 
 YARN-2683-003.patch

   Original Estimate: 1h
  Remaining Estimate: 1h

 Add to {{yarn-site}} a page on registry configuration parameters



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2752) TestContainerExecutor.testRunCommandNoPriority fails in branch-2

2014-10-27 Thread Xuan Gong (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2752?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14185732#comment-14185732
 ] 

Xuan Gong commented on YARN-2752:
-

Initially, the function ContainerExecutor.getRunCommand() was implemented a 
little differently between trunk and branch-2 in YARN-443. Then, in HADOOP-8562, 
we did a big merge from trunk which did not merge this function correctly. That 
is why the code in trunk now differs from that in branch-2, and this test case 
passes in trunk but fails in branch-2. 

 TestContainerExecutor.testRunCommandNoPriority fails in branch-2
 

 Key: YARN-2752
 URL: https://issues.apache.org/jira/browse/YARN-2752
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Xuan Gong
Assignee: Xuan Gong
 Attachments: YARN-2752.1-branch-2.patch


 TestContainerExecutor.testRunCommandNoPriority fails in branch-2. But it 
 passed in trunk. 
 The function code ContainerExecutor.getRunCommand() in trunk is different 
 from that in branch-2.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2669) FairScheduler: queueName shouldn't allow periods the allocation.xml

2014-10-27 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2669?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14185813#comment-14185813
 ] 

Hadoop QA commented on YARN-2669:
-

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12677381/YARN-2669-3.patch
  against trunk revision 5b1dfe7.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 2 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/5582//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5582//console

This message is automatically generated.

 FairScheduler: queueName shouldn't allow periods the allocation.xml
 ---

 Key: YARN-2669
 URL: https://issues.apache.org/jira/browse/YARN-2669
 Project: Hadoop YARN
  Issue Type: Improvement
Reporter: Wei Yan
Assignee: Wei Yan
Priority: Minor
 Attachments: YARN-2669-1.patch, YARN-2669-2.patch, YARN-2669-3.patch


 For an allocation file like:
 {noformat}
 <allocations>
   <queue name="root.q1">
     <minResources>4096mb,4vcores</minResources>
   </queue>
 </allocations>
 {noformat}
 Users may wish to configure minResources for a queue with the full path root.q1. 
 However, right now the fair scheduler will treat this configuration as applying to the 
 queue with the full name root.root.q1. We need to print out a warning message to 
 notify users about this.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2010) If RM fails to recover an app, it can never transition to active again

2014-10-27 Thread Karthik Kambatla (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2010?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14185845#comment-14185845
 ] 

Karthik Kambatla commented on YARN-2010:


[~vinodkv], [~jlowe], [~jianhe] - will any of you be able to review this? 
Thanks. 

 If RM fails to recover an app, it can never transition to active again
 --

 Key: YARN-2010
 URL: https://issues.apache.org/jira/browse/YARN-2010
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.3.0
Reporter: bc Wong
Assignee: Karthik Kambatla
Priority: Blocker
 Attachments: YARN-2010.1.patch, YARN-2010.patch, 
 issue-stacktrace.rtf, yarn-2010-2.patch, yarn-2010-3.patch, 
 yarn-2010-3.patch, yarn-2010-4.patch, yarn-2010-5.patch, yarn-2010-6.patch


 Sometimes, the RM fails to recover an application. It could be because of 
 turning security on, token expiry, or issues connecting to HDFS etc. The 
 causes could be classified into (1) transient, (2) specific to one 
 application, and (3) permanent and apply to multiple (all) applications. 
 Today, the RM fails to transition to Active and ends up in STOPPED state and 
 can never be transitioned to Active again.
 The initial stacktrace reported is at 
 https://issues.apache.org/jira/secure/attachment/12676476/issue-stacktrace.rtf



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-2753) potential NPE in checkRemoveLabelsFromNode of CommonNodeLabelsManager

2014-10-27 Thread zhihai xu (JIRA)
zhihai xu created YARN-2753:
---

 Summary: potential NPE in checkRemoveLabelsFromNode of 
CommonNodeLabelsManager
 Key: YARN-2753
 URL: https://issues.apache.org/jira/browse/YARN-2753
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: zhihai xu
Assignee: zhihai xu


Potential NPE (NullPointerException) in checkRemoveLabelsFromNode of 
CommonNodeLabelsManager.
This is because when a Node is created, Node.labels can be null.
In this case, nm.labels may be null.
So we need to check that originalLabels is not null before using 
it (originalLabels.containsAll).
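
A minimal sketch of the guard being proposed, with the variable names taken from the description; the surrounding method is illustrative, not the actual CommonNodeLabelsManager code:
{code}
import java.util.Collection;
import java.util.Set;

// Illustrative only: check the label set for null before calling containsAll,
// mirroring the fix proposed for checkRemoveLabelsFromNode.
public class LabelNullCheckSketch {
  static void checkRemoveLabels(Set<String> originalLabels,
      Collection<String> labelsToRemove, String node) {
    if (originalLabels == null || !originalLabels.containsAll(labelsToRemove)) {
      throw new IllegalArgumentException(
          "Labels " + labelsToRemove + " not found on node " + node);
    }
  }
}
{code}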



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2753) potential NPE in checkRemoveLabelsFromNode of CommonNodeLabelsManager

2014-10-27 Thread zhihai xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2753?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhihai xu updated YARN-2753:

Attachment: YARN-2753.000.patch

 potential NPE in checkRemoveLabelsFromNode of CommonNodeLabelsManager
 -

 Key: YARN-2753
 URL: https://issues.apache.org/jira/browse/YARN-2753
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: zhihai xu
Assignee: zhihai xu
 Attachments: YARN-2753.000.patch


 Potential NPE (NullPointerException) in checkRemoveLabelsFromNode of 
 CommonNodeLabelsManager.
 This is because when a Node is created, Node.labels can be null.
 In this case, nm.labels may be null.
 So we need to check that originalLabels is not null before using 
 it (originalLabels.containsAll).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2502) Changes in distributed shell to support specify labels

2014-10-27 Thread Wangda Tan (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2502?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wangda Tan updated YARN-2502:
-
Attachment: YARN-2502-20141027-2.patch

 Changes in distributed shell to support specify labels
 --

 Key: YARN-2502
 URL: https://issues.apache.org/jira/browse/YARN-2502
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Reporter: Wangda Tan
Assignee: Wangda Tan
 Attachments: YARN-2502-20141009.1.patch, YARN-2502-20141009.2.patch, 
 YARN-2502-20141013.1.patch, YARN-2502-20141017-1.patch, 
 YARN-2502-20141017-2.patch, YARN-2502-20141027-2.patch, YARN-2502.patch






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2010) If RM fails to recover an app, it can never transition to active again

2014-10-27 Thread Jian He (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2010?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14185883#comment-14185883
 ] 

Jian He commented on YARN-2010:
---

sorry, was caught up with something. I'll review today. 

 If RM fails to recover an app, it can never transition to active again
 --

 Key: YARN-2010
 URL: https://issues.apache.org/jira/browse/YARN-2010
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.3.0
Reporter: bc Wong
Assignee: Karthik Kambatla
Priority: Blocker
 Attachments: YARN-2010.1.patch, YARN-2010.patch, 
 issue-stacktrace.rtf, yarn-2010-2.patch, yarn-2010-3.patch, 
 yarn-2010-3.patch, yarn-2010-4.patch, yarn-2010-5.patch, yarn-2010-6.patch


 Sometimes, the RM fails to recover an application. It could be because of 
 turning security on, token expiry, or issues connecting to HDFS etc. The 
 causes could be classified into (1) transient, (2) specific to one 
 application, and (3) permanent and apply to multiple (all) applications. 
 Today, the RM fails to transition to Active and ends up in STOPPED state and 
 can never be transitioned to Active again.
 The initial stacktrace reported is at 
 https://issues.apache.org/jira/secure/attachment/12676476/issue-stacktrace.rtf



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2749) Some testcases from TestLogAggregationService fails in trunk

2014-10-27 Thread Xuan Gong (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2749?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14185924#comment-14185924
 ] 

Xuan Gong commented on YARN-2749:
-

This is because of a race condition and it is a test case issue. 
It happens when we call AppLogAggregatorImpl.doLogAggregationOutOfBand, which 
will notify and abort the wait to start aggregating logs. The notify action 
might happen before the AppLogAggregatorImpl thread starts. The simplest 
fix could be to add a Thread.sleep before we call 
AppLogAggregatorImpl.doLogAggregationOutOfBand, to make sure the 
AppLogAggregatorImpl thread has started.
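
A minimal test-side sketch of that workaround; the Runnable stands in for the call to AppLogAggregatorImpl.doLogAggregationOutOfBand() and the 2-second duration is arbitrary, not actual TestLogAggregationService code:
{code}
// Illustrative only: sleep briefly so the aggregator thread can start and enter
// its wait before the out-of-band trigger fires, so the notify is not lost.
public class OutOfBandTriggerSketch {
  static void sleepThenTrigger(Runnable doLogAggregationOutOfBandCall)
      throws InterruptedException {
    Thread.sleep(2000); // give the aggregator thread time to reach its wait
    doLogAggregationOutOfBandCall.run();
  }
}
{code}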

 Some testcases from TestLogAggregationService fails in trunk
 

 Key: YARN-2749
 URL: https://issues.apache.org/jira/browse/YARN-2749
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Xuan Gong
Assignee: Xuan Gong

 Some test cases from TestLogAggregationService fail in trunk. 
 Those can be reproduced on CentOS
 Stack Trace:
 java.lang.AssertionError: null
   at org.junit.Assert.fail(Assert.java:86)
   at org.junit.Assert.assertTrue(Assert.java:41)
   at org.junit.Assert.assertTrue(Assert.java:52)
   at 
 org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.TestLogAggregationService.testLogAggregationService(TestLogAggregationService.java:1362)
   at 
 org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.TestLogAggregationService.testLogAggregationServiceWithRetention(TestLogAggregationService.java:1290)
 Stack Trace:
 java.lang.AssertionError: null
   at org.junit.Assert.fail(Assert.java:86)
   at org.junit.Assert.assertTrue(Assert.java:41)
   at org.junit.Assert.assertTrue(Assert.java:52)
   at 
 org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.TestLogAggregationService.testLogAggregationService(TestLogAggregationService.java:1362)
   at 
 org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.TestLogAggregationService.testLogAggregationServiceWithRetention(TestLogAggregationService.java:1290)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2502) Changes in distributed shell to support specify labels

2014-10-27 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2502?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14185930#comment-14185930
 ] 

Hadoop QA commented on YARN-2502:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  
http://issues.apache.org/jira/secure/attachment/12677419/YARN-2502-20141027-2.patch
  against trunk revision 5b1dfe7.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The test build failed in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell
 

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/5583//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5583//console

This message is automatically generated.

 Changes in distributed shell to support specify labels
 --

 Key: YARN-2502
 URL: https://issues.apache.org/jira/browse/YARN-2502
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Reporter: Wangda Tan
Assignee: Wangda Tan
 Attachments: YARN-2502-20141009.1.patch, YARN-2502-20141009.2.patch, 
 YARN-2502-20141013.1.patch, YARN-2502-20141017-1.patch, 
 YARN-2502-20141017-2.patch, YARN-2502-20141027-2.patch, YARN-2502.patch






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2753) potential NPE in checkRemoveLabelsFromNode of CommonNodeLabelsManager

2014-10-27 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2753?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14185931#comment-14185931
 ] 

Hadoop QA commented on YARN-2753:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12677417/YARN-2753.000.patch
  against trunk revision 5b1dfe7.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

  {color:red}-1 javac{color}.  The applied patch generated 1287 javac 
compiler warnings (more than the trunk's current 1266 warnings).

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell:

  
org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/5584//testReport/
Javac warnings: 
https://builds.apache.org/job/PreCommit-YARN-Build/5584//artifact/patchprocess/diffJavacWarnings.txt
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5584//console

This message is automatically generated.

 potential NPE in checkRemoveLabelsFromNode of CommonNodeLabelsManager
 -

 Key: YARN-2753
 URL: https://issues.apache.org/jira/browse/YARN-2753
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: zhihai xu
Assignee: zhihai xu
 Attachments: YARN-2753.000.patch


 Potential NPE (NullPointerException) in checkRemoveLabelsFromNode of 
 CommonNodeLabelsManager.
 This is because when a Node is created, Node.labels can be null.
 In this case, nm.labels may be null.
 So we need to check that originalLabels is not null before using 
 it (originalLabels.containsAll).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-2755) NM fails to clean up usercache_DEL_timestamp dirs after YARN-661

2014-10-27 Thread Siqi Li (JIRA)
Siqi Li created YARN-2755:
-

 Summary: NM fails to clean up usercache_DEL_timestamp dirs after 
YARN-661
 Key: YARN-2755
 URL: https://issues.apache.org/jira/browse/YARN-2755
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Siqi Li






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (YARN-2755) NM fails to clean up usercache_DEL_timestamp dirs after YARN-661

2014-10-27 Thread Siqi Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2755?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siqi Li reassigned YARN-2755:
-

Assignee: Siqi Li

 NM fails to clean up usercache_DEL_timestamp dirs after YARN-661
 --

 Key: YARN-2755
 URL: https://issues.apache.org/jira/browse/YARN-2755
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Siqi Li
Assignee: Siqi Li





--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2755) NM fails to clean up usercache_DEL_timestamp dirs after YARN-661

2014-10-27 Thread Siqi Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2755?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siqi Li updated YARN-2755:
--
Description: 
When the NM restarts frequently for some reason, a large number of directories 
like these are left in /data/disk$num/yarn/local/:
/data/disk1/yarn/local/usercache_DEL_1414372756105
/data/disk1/yarn/local/usercache_DEL_1413557901696
/data/disk1/yarn/local/usercache_DEL_1413657004894
/data/disk1/yarn/local/usercache_DEL_1413675321860
/data/disk1/yarn/local/usercache_DEL_1414093167936
/data/disk1/yarn/local/usercache_DEL_1413565841271
These directories are empty, but take up 100M+ due to their sheer number; there 
were 38714 per data disk on the machine I looked at.

It appears to be a regression introduced by YARN-661

 NM fails to clean up usercache_DEL_timestamp dirs after YARN-661
 --

 Key: YARN-2755
 URL: https://issues.apache.org/jira/browse/YARN-2755
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Siqi Li
Assignee: Siqi Li

 When the NM restarts frequently for some reason, a large number of directories 
 like these are left in /data/disk$num/yarn/local/:
 /data/disk1/yarn/local/usercache_DEL_1414372756105
 /data/disk1/yarn/local/usercache_DEL_1413557901696
 /data/disk1/yarn/local/usercache_DEL_1413657004894
 /data/disk1/yarn/local/usercache_DEL_1413675321860
 /data/disk1/yarn/local/usercache_DEL_1414093167936
 /data/disk1/yarn/local/usercache_DEL_1413565841271
 These directories are empty, but take up 100M+ due to their sheer number; 
 there were 38714 per data disk on the machine I looked at.
 It appears to be a regression introduced by YARN-661
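
A minimal manual-cleanup sketch for directories matching this pattern, assuming they are empty as described; this is illustrative only, not what the NodeManager itself does:
{code}
import java.io.File;
import java.io.FilenameFilter;

// Illustrative only: list a local dir and delete empty usercache_DEL_<timestamp>
// leftovers.
public class UsercacheDelCleanupSketch {
  static void cleanup(File localDir) {
    File[] leftovers = localDir.listFiles(new FilenameFilter() {
      @Override
      public boolean accept(File dir, String name) {
        return name.startsWith("usercache_DEL_");
      }
    });
    if (leftovers == null) {
      return;
    }
    for (File dir : leftovers) {
      String[] children = dir.list();
      if (children != null && children.length == 0) {
        dir.delete(); // File.delete() only removes empty directories
      }
    }
  }
}
{code}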



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2755) NM fails to clean up usercache_DEL_timestamp dirs after YARN-661

2014-10-27 Thread Siqi Li (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14185948#comment-14185948
 ] 

Siqi Li commented on YARN-2755:
---

[~sjlee0] mentioned that NM's initialization is taking a LONG time because of 
this (this one's been doing this for 1 hour 41 minutes, and not all done), but 
monit isn't restarting it. To me, the ill effect is that the NM startup is 
taking a long time (and probably will get longer each time), and the 
directories are not getting cleaned up.

 NM fails to clean up usercache_DEL_timestamp dirs after YARN-661
 --

 Key: YARN-2755
 URL: https://issues.apache.org/jira/browse/YARN-2755
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Siqi Li
Assignee: Siqi Li

 When the NM restarts frequently for some reason, a large number of directories 
 like these are left in /data/disk$num/yarn/local/:
 /data/disk1/yarn/local/usercache_DEL_1414372756105
 /data/disk1/yarn/local/usercache_DEL_1413557901696
 /data/disk1/yarn/local/usercache_DEL_1413657004894
 /data/disk1/yarn/local/usercache_DEL_1413675321860
 /data/disk1/yarn/local/usercache_DEL_1414093167936
 /data/disk1/yarn/local/usercache_DEL_1413565841271
 These directories are empty, but take up 100M+ due to their sheer number; 
 there were 38714 per data disk on the machine I looked at.
 It appears to be a regression introduced by YARN-661



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2704) Localization and log-aggregation will fail if hdfs delegation token expired after token-max-life-time

2014-10-27 Thread Vinod Kumar Vavilapalli (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2704?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14185954#comment-14185954
 ] 

Vinod Kumar Vavilapalli commented on YARN-2704:
---

+1, looks good. Checking this in.

  Localization and log-aggregation will fail if hdfs delegation token expired 
 after token-max-life-time
 --

 Key: YARN-2704
 URL: https://issues.apache.org/jira/browse/YARN-2704
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Jian He
Assignee: Jian He
 Attachments: YARN-2704.1.patch, YARN-2704.2.patch, YARN-2704.2.patch, 
 YARN-2704.3.patch, YARN-2704.4.patch


 In secure mode, YARN requires the hdfs-delegation token to do localization 
 and log aggregation on behalf of the user. But the hdfs delegation token will 
 eventually expire after max-token-life-time.  So,  localization and log 
 aggregation will fail after the token expires.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2704) Localization and log-aggregation will fail if hdfs delegation token expired after token-max-life-time

2014-10-27 Thread Vinod Kumar Vavilapalli (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2704?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kumar Vavilapalli updated YARN-2704:
--
Priority: Critical  (was: Major)

This is critical for long running services. Getting this in as a critical item 
for 2.6.

  Localization and log-aggregation will fail if hdfs delegation token expired 
 after token-max-life-time
 --

 Key: YARN-2704
 URL: https://issues.apache.org/jira/browse/YARN-2704
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Jian He
Assignee: Jian He
Priority: Critical
 Attachments: YARN-2704.1.patch, YARN-2704.2.patch, YARN-2704.2.patch, 
 YARN-2704.3.patch, YARN-2704.4.patch


 In secure mode, YARN requires the hdfs-delegation token to do localization 
 and log aggregation on behalf of the user. But the hdfs delegation token will 
 eventually expire after max-token-life-time.  So,  localization and log 
 aggregation will fail after the token expires.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2704) Localization and log-aggregation will fail if hdfs delegation token expired after token-max-life-time

2014-10-27 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2704?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14185963#comment-14185963
 ] 

Hudson commented on YARN-2704:
--

FAILURE: Integrated in Hadoop-trunk-Commit #6357 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/6357/])
YARN-2704. Changed ResourceManager to optionally obtain tokens itself for the 
sake of localization and log-aggregation for long-running services. Contributed 
by Jian He. (vinodkv: rev a16d022ca4313a41425c8e97841c841a2d6f2f54)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/security/DelegationTokenRenewer.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/test/java/org/apache/hadoop/yarn/server/api/protocolrecords/TestProtocolRecords.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/RMContextImpl.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/ContainerManagerImpl.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/TestContainerManagerRecovery.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/RMContext.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/RMAppManager.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/DummyContainerManager.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/conf/YarnConfiguration.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/Context.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/api/protocolrecords/NodeHeartbeatResponse.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/TestNodeStatusUpdater.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/resources/yarn-default.xml
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/ResourceLocalizationService.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/TestResourceLocalizationService.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/security/TestDelegationTokenRenewer.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/NodeStatusUpdaterImpl.java
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/logaggregation/LogAggregationService.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ResourceTrackerService.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/TestLocalCacheDirectoryManager.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/proto/yarn_server_common_service_protos.proto
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/NodeManager.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/api/protocolrecords/impl/pb/NodeHeartbeatResponsePBImpl.java


  Localization and log-aggregation will fail if hdfs delegation token expired 
 after token-max-life-time
 --

 Key: YARN-2704
 URL: https://issues.apache.org/jira/browse/YARN-2704
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Jian He
Assignee: Jian He
Priority: 

[jira] [Commented] (YARN-2753) potential NPE in checkRemoveLabelsFromNode of CommonNodeLabelsManager

2014-10-27 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2753?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14185977#comment-14185977
 ] 

Wangda Tan commented on YARN-2753:
--

[~zxu],
Nice finding! Thanks for the patch, it looks good to me. Could you also remove the
{code}
labels == null
{code}
part of the check
{code}
  if (labels == null || labels.isEmpty()) {
    continue;
  }
{code}
in the same patch, since labels will never be null?

Thanks,
Wangda
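
For readers following along, a minimal sketch of the resulting check (the method
shape, map names, and exception message below are illustrative, not the actual
CommonNodeLabelsManager code or the attached patch):

{code}
import java.io.IOException;
import java.util.Map;
import java.util.Set;

// Illustrative sketch: validate a remove-labels request while tolerating a
// node whose label set was never initialized (originalLabels == null), and
// skip empty requests without a redundant labels == null check.
public class RemoveLabelsCheckSketch {
  public static void checkRemoveLabelsFromNode(
      Map<String, Set<String>> removeLabelsFromNode,
      Map<String, Set<String>> nodeToLabels) throws IOException {
    for (Map.Entry<String, Set<String>> entry : removeLabelsFromNode.entrySet()) {
      String node = entry.getKey();
      Set<String> labels = entry.getValue();
      if (labels.isEmpty()) {        // labels is never null here
        continue;
      }
      Set<String> originalLabels = nodeToLabels.get(node); // may be null for a new Node
      if (originalLabels == null || !originalLabels.containsAll(labels)) {
        throw new IOException("Cannot remove labels that are not assigned to node " + node);
      }
    }
  }
}
{code}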

 potential NPE in checkRemoveLabelsFromNode of CommonNodeLabelsManager
 -

 Key: YARN-2753
 URL: https://issues.apache.org/jira/browse/YARN-2753
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: zhihai xu
Assignee: zhihai xu
 Attachments: YARN-2753.000.patch


 Potential NPE (NullPointerException) in checkRemoveLabelsFromNode of
 CommonNodeLabelsManager.
 This is because when a Node is created, Node.labels can be null; in that case,
 nm.labels may be null.
 So we need to check that originalLabels is not null before using it
 (originalLabels.containsAll).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2753) potential NPE in checkRemoveLabelsFromNode of CommonNodeLabelsManager

2014-10-27 Thread Wangda Tan (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2753?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wangda Tan updated YARN-2753:
-
Issue Type: Sub-task  (was: Bug)
Parent: YARN-2492

 potential NPE in checkRemoveLabelsFromNode of CommonNodeLabelsManager
 -

 Key: YARN-2753
 URL: https://issues.apache.org/jira/browse/YARN-2753
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: zhihai xu
Assignee: zhihai xu
 Attachments: YARN-2753.000.patch


 Potential NPE (NullPointerException) in checkRemoveLabelsFromNode of
 CommonNodeLabelsManager.
 This is because when a Node is created, Node.labels can be null; in that case,
 nm.labels may be null.
 So we need to check that originalLabels is not null before using it
 (originalLabels.containsAll).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2755) NM fails to clean up usercache_DEL_timestamp dirs after YARN-661

2014-10-27 Thread Siqi Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2755?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siqi Li updated YARN-2755:
--
Attachment: YARN-2755.v1.patch

 NM fails to clean up usercache_DEL_timestamp dirs after YARN-661
 --

 Key: YARN-2755
 URL: https://issues.apache.org/jira/browse/YARN-2755
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Siqi Li
Assignee: Siqi Li
 Attachments: YARN-2755.v1.patch


 When the NM restarts frequently for some reason, a large number of directories
 like these are left in /data/disk$num/yarn/local/:
 /data/disk1/yarn/local/usercache_DEL_1414372756105
 /data/disk1/yarn/local/usercache_DEL_1413557901696
 /data/disk1/yarn/local/usercache_DEL_1413657004894
 /data/disk1/yarn/local/usercache_DEL_1413675321860
 /data/disk1/yarn/local/usercache_DEL_1414093167936
 /data/disk1/yarn/local/usercache_DEL_1413565841271
 These directories are empty, but take up 100M+ because there are so many of them.
 There were 38714 per data disk on the machine I looked at.
 It appears to be a regression introduced by YARN-661.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2755) NM fails to clean up usercache_DEL_timestamp dirs after YARN-661

2014-10-27 Thread Siqi Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2755?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siqi Li updated YARN-2755:
--
Priority: Critical  (was: Major)

 NM fails to clean up usercache_DEL_timestamp dirs after YARN-661
 --

 Key: YARN-2755
 URL: https://issues.apache.org/jira/browse/YARN-2755
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Siqi Li
Assignee: Siqi Li
Priority: Critical
 Attachments: YARN-2755.v1.patch


 When the NM restarts frequently for some reason, a large number of directories
 like these are left in /data/disk$num/yarn/local/:
 /data/disk1/yarn/local/usercache_DEL_1414372756105
 /data/disk1/yarn/local/usercache_DEL_1413557901696
 /data/disk1/yarn/local/usercache_DEL_1413657004894
 /data/disk1/yarn/local/usercache_DEL_1413675321860
 /data/disk1/yarn/local/usercache_DEL_1414093167936
 /data/disk1/yarn/local/usercache_DEL_1413565841271
 These directories are empty, but take up 100M+ because there are so many of them.
 There were 38714 per data disk on the machine I looked at.
 It appears to be a regression introduced by YARN-661.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2754) addToCluserNodeLabels should be protected by writeLock in RMNodeLabelsManager.java.

2014-10-27 Thread Wangda Tan (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2754?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wangda Tan updated YARN-2754:
-
Issue Type: Sub-task  (was: Bug)
Parent: YARN-2492

 addToCluserNodeLabels should be protected by writeLock in 
 RMNodeLabelsManager.java.
 ---

 Key: YARN-2754
 URL: https://issues.apache.org/jira/browse/YARN-2754
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: zhihai xu
Assignee: zhihai xu
 Attachments: YARN-2754.000.patch


 addToCluserNodeLabels should be protected by writeLock in
 RMNodeLabelsManager.java, because we need to protect labelCollections in
 RMNodeLabelsManager.
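
 As a rough illustration of the locking pattern being asked for (a sketch only,
 with placeholder types for the per-label record; not the actual
 RMNodeLabelsManager code):

{code}
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Illustrative sketch: mutate the shared labelCollections map only while
// holding the manager's write lock, mirroring the other mutating methods.
public class AddLabelsLockingSketch {
  private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
  private final Map<String, Object> labelCollections = new HashMap<>();

  public void addToCluserNodeLabels(Set<String> labels) {
    lock.writeLock().lock();
    try {
      for (String label : labels) {
        if (!labelCollections.containsKey(label)) {
          labelCollections.put(label, new Object()); // placeholder for the per-label record
        }
      }
    } finally {
      lock.writeLock().unlock();
    }
  }
}
{code}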



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2754) addToCluserNodeLabels should be protected by writeLock in RMNodeLabelsManager.java.

2014-10-27 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2754?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14185982#comment-14185982
 ] 

Wangda Tan commented on YARN-2754:
--

Zhihai,
Thanks for reporting this and for the fix.
Patch LGTM, +1.

Wangda

 addToCluserNodeLabels should be protected by writeLock in 
 RMNodeLabelsManager.java.
 ---

 Key: YARN-2754
 URL: https://issues.apache.org/jira/browse/YARN-2754
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: zhihai xu
Assignee: zhihai xu
 Attachments: YARN-2754.000.patch


 addToCluserNodeLabels should be protected by writeLock in
 RMNodeLabelsManager.java, because we need to protect labelCollections in
 RMNodeLabelsManager.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-2756) use static variable (Resources.none()) for not-running Node.resource in CommonNodeLabelsManager to save memory.

2014-10-27 Thread zhihai xu (JIRA)
zhihai xu created YARN-2756:
---

 Summary: use static variable (Resources.none()) for not-running 
Node.resource in CommonNodeLabelsManager to save memory.
 Key: YARN-2756
 URL: https://issues.apache.org/jira/browse/YARN-2756
 Project: Hadoop YARN
  Issue Type: Improvement
Reporter: zhihai xu
Assignee: zhihai xu
Priority: Minor


Use the static variable (Resources.none()) for the not-running Node.resource in
CommonNodeLabelsManager to save memory. When a Node is not activated, its resource
is never used; when a Node is activated, a new resource is assigned to it in
RMNodeLabelsManager#activateNode (nm.resource = resource). So it would be better to
use the static Resources.none() instead of allocating a new
object (Resource.newInstance(0, 0)) for each node deactivation.
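
A minimal sketch of the idea, assuming a simplified node record (field and method
names are illustrative; the shared Resources.none() instance must never be mutated):

{code}
import org.apache.hadoop.yarn.api.records.Resource;
import org.apache.hadoop.yarn.util.resource.Resources;

// Illustrative sketch: a deactivated node shares the Resources.none() instance
// instead of allocating a fresh zero-sized Resource on every deactivation.
public class NodeResourceSketch {
  private Resource resource = Resources.none(); // shared zero resource while inactive

  public void activate(Resource nodeResource) {
    this.resource = nodeResource;               // real resource assigned on activation
  }

  public void deactivate() {
    this.resource = Resources.none();           // instead of Resource.newInstance(0, 0)
  }

  public Resource getResource() {
    return resource;
  }
}
{code}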



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2756) use static variable (Resources.none()) for not-running Node.resource in CommonNodeLabelsManager to save memory.

2014-10-27 Thread zhihai xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2756?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhihai xu updated YARN-2756:

Description: use static variable (Resources.none()) for not-running 
Node.resource in CommonNodeLabelsManager to save memory. When a Node is not 
activated, the resource is never used, When a Node is activated, a new resource 
will be assigned to it in RMNodeLabelsManager#activateNode (nm.resource = 
resource) So it would be better to use static variable Resources.none() instead 
of allocating a new variable(Resource.newInstance(0, 0)) for each node 
deactivation.  (was: use static variable (Resources.none()) for not-running 
Node.resource in CommonNodeLabelsManager to save memory. When a Node is not 
activated, the resource is never used, When a Node is activated, a new resource 
will be assigned to it in RMNodeLabelsManager#activateNode (nm.resource = 
resource;) So it would be better to use static variable Resources.none() 
instead of allocating a new variable(Resource.newInstance(0, 0)) for each node 
deactivation.)

 use static variable (Resources.none()) for not-running Node.resource in 
 CommonNodeLabelsManager to save memory.
 ---

 Key: YARN-2756
 URL: https://issues.apache.org/jira/browse/YARN-2756
 Project: Hadoop YARN
  Issue Type: Improvement
Reporter: zhihai xu
Assignee: zhihai xu
Priority: Minor

 Use the static variable (Resources.none()) for the not-running Node.resource in
 CommonNodeLabelsManager to save memory. When a Node is not activated, its resource
 is never used; when a Node is activated, a new resource is assigned to it in
 RMNodeLabelsManager#activateNode (nm.resource = resource). So it would be better to
 use the static Resources.none() instead of allocating a new
 object (Resource.newInstance(0, 0)) for each node deactivation.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2756) use static variable (Resources.none()) for not-running Node.resource in CommonNodeLabelsManager to save memory.

2014-10-27 Thread zhihai xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2756?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhihai xu updated YARN-2756:

Attachment: YARN-2756.000.patch

 use static variable (Resources.none()) for not-running Node.resource in 
 CommonNodeLabelsManager to save memory.
 ---

 Key: YARN-2756
 URL: https://issues.apache.org/jira/browse/YARN-2756
 Project: Hadoop YARN
  Issue Type: Improvement
Reporter: zhihai xu
Assignee: zhihai xu
Priority: Minor
 Attachments: YARN-2756.000.patch


 Use the static variable (Resources.none()) for the not-running Node.resource in
 CommonNodeLabelsManager to save memory. When a Node is not activated, its resource
 is never used; when a Node is activated, a new resource is assigned to it in
 RMNodeLabelsManager#activateNode (nm.resource = resource). So it would be better to
 use the static Resources.none() instead of allocating a new
 object (Resource.newInstance(0, 0)) for each node deactivation.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2741) Windows: Node manager cannot serve up log files via the web user interface when yarn.nodemanager.log-dirs to any drive letter other than C: (or, the drive that nodemanager

2014-10-27 Thread Craig Welch (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2741?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Craig Welch updated YARN-2741:
--
Attachment: YARN-2741.6.patch

Added unit tests

 Windows: Node manager cannot serve up log files via the web user interface 
 when yarn.nodemanager.log-dirs to any drive letter other than C: (or, the 
 drive that nodemanager is running on)
 --

 Key: YARN-2741
 URL: https://issues.apache.org/jira/browse/YARN-2741
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager
Affects Versions: 2.6.0
 Environment: Windows
Reporter: Craig Welch
Assignee: Craig Welch
 Attachments: YARN-2741.1.patch, YARN-2741.6.patch


 PROBLEM: The user gets "No Logs available for Container Container_number"
 when setting yarn.nodemanager.log-dirs to any drive letter other than C:
 STEPS TO REPRODUCE:
 On Windows
 1) Run the NodeManager on C:
 2) Create two local drive partitions D: and E:
 3) Set yarn.nodemanager.log-dirs = D:\nmlogs or E:\nmlogs
 4) Run an MR job that will last at least 5 minutes
 5) While the job is in flight, log into the YARN web UI,
 resource_manager_server:8088/cluster
 6) Click on the application_idnumber
 7) Click on the logs link; you will get "No Logs available for Container
 Container_number"
 ACTUAL BEHAVIOR: An error message appears when viewing the container logs
 EXPECTED BEHAVIOR: Able to use different drive letters in
 yarn.nodemanager.log-dirs without getting an error
 NOTE: If we use the drive letter C: in yarn.nodemanager.log-dirs, we are able
 to see the container logs while the MR job is in flight.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2502) Changes in distributed shell to support specify labels

2014-10-27 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2502?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14186009#comment-14186009
 ] 

Hadoop QA commented on YARN-2502:
-

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  
http://issues.apache.org/jira/secure/attachment/12677419/YARN-2502-20141027-2.patch
  against trunk revision a16d022.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in .

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/5586//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5586//console

This message is automatically generated.

 Changes in distributed shell to support specify labels
 --

 Key: YARN-2502
 URL: https://issues.apache.org/jira/browse/YARN-2502
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Reporter: Wangda Tan
Assignee: Wangda Tan
 Attachments: YARN-2502-20141009.1.patch, YARN-2502-20141009.2.patch, 
 YARN-2502-20141013.1.patch, YARN-2502-20141017-1.patch, 
 YARN-2502-20141017-2.patch, YARN-2502-20141027-2.patch, YARN-2502.patch






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2755) NM fails to clean up usercache_DEL_timestamp dirs after YARN-661

2014-10-27 Thread Siqi Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2755?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siqi Li updated YARN-2755:
--
Attachment: (was: YARN-2755.v1.patch)

 NM fails to clean up usercache_DEL_timestamp dirs after YARN-661
 --

 Key: YARN-2755
 URL: https://issues.apache.org/jira/browse/YARN-2755
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Siqi Li
Assignee: Siqi Li
Priority: Critical

 When the NM restarts frequently for some reason, a large number of directories
 like these are left in /data/disk$num/yarn/local/:
 /data/disk1/yarn/local/usercache_DEL_1414372756105
 /data/disk1/yarn/local/usercache_DEL_1413557901696
 /data/disk1/yarn/local/usercache_DEL_1413657004894
 /data/disk1/yarn/local/usercache_DEL_1413675321860
 /data/disk1/yarn/local/usercache_DEL_1414093167936
 /data/disk1/yarn/local/usercache_DEL_1413565841271
 These directories are empty, but take up 100M+ because there are so many of them.
 There were 38714 per data disk on the machine I looked at.
 It appears to be a regression introduced by YARN-661.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2755) NM fails to clean up usercache_DEL_timestamp dirs after YARN-661

2014-10-27 Thread Siqi Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2755?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siqi Li updated YARN-2755:
--
Attachment: YARN-2755.v1.patch

 NM fails to clean up usercache_DEL_timestamp dirs after YARN-661
 --

 Key: YARN-2755
 URL: https://issues.apache.org/jira/browse/YARN-2755
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Siqi Li
Assignee: Siqi Li
Priority: Critical
 Attachments: YARN-2755.v1.patch


 When the NM restarts frequently for some reason, a large number of directories
 like these are left in /data/disk$num/yarn/local/:
 /data/disk1/yarn/local/usercache_DEL_1414372756105
 /data/disk1/yarn/local/usercache_DEL_1413557901696
 /data/disk1/yarn/local/usercache_DEL_1413657004894
 /data/disk1/yarn/local/usercache_DEL_1413675321860
 /data/disk1/yarn/local/usercache_DEL_1414093167936
 /data/disk1/yarn/local/usercache_DEL_1413565841271
 These directories are empty, but take up 100M+ because there are so many of them.
 There were 38714 per data disk on the machine I looked at.
 It appears to be a regression introduced by YARN-661.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2753) potential NPE in checkRemoveLabelsFromNode of CommonNodeLabelsManager

2014-10-27 Thread zhihai xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2753?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhihai xu updated YARN-2753:

Attachment: YARN-2753.001.patch

 potential NPE in checkRemoveLabelsFromNode of CommonNodeLabelsManager
 -

 Key: YARN-2753
 URL: https://issues.apache.org/jira/browse/YARN-2753
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: zhihai xu
Assignee: zhihai xu
 Attachments: YARN-2753.000.patch, YARN-2753.001.patch


 Potential NPE (NullPointerException) in checkRemoveLabelsFromNode of
 CommonNodeLabelsManager.
 This is because when a Node is created, Node.labels can be null; in that case,
 nm.labels may be null.
 So we need to check that originalLabels is not null before using it
 (originalLabels.containsAll).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2755) NM fails to clean up usercache_DEL_timestamp dirs after YARN-661

2014-10-27 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14186048#comment-14186048
 ] 

Hadoop QA commented on YARN-2755:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12677439/YARN-2755.v1.patch
  against trunk revision a16d022.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:red}-1 tests included{color}.  The patch doesn't appear to include 
any new or modified tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager:

  
org.apache.hadoop.yarn.server.nodemanager.TestNodeManagerReboot

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/5587//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5587//console

This message is automatically generated.

 NM fails to clean up usercache_DEL_timestamp dirs after YARN-661
 --

 Key: YARN-2755
 URL: https://issues.apache.org/jira/browse/YARN-2755
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Siqi Li
Assignee: Siqi Li
Priority: Critical
 Attachments: YARN-2755.v1.patch


 When the NM restarts frequently for some reason, a large number of directories
 like these are left in /data/disk$num/yarn/local/:
 /data/disk1/yarn/local/usercache_DEL_1414372756105
 /data/disk1/yarn/local/usercache_DEL_1413557901696
 /data/disk1/yarn/local/usercache_DEL_1413657004894
 /data/disk1/yarn/local/usercache_DEL_1413675321860
 /data/disk1/yarn/local/usercache_DEL_1414093167936
 /data/disk1/yarn/local/usercache_DEL_1413565841271
 These directories are empty, but take up 100M+ because there are so many of them.
 There were 38714 per data disk on the machine I looked at.
 It appears to be a regression introduced by YARN-661.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2754) addToCluserNodeLabels should be protected by writeLock in RMNodeLabelsManager.java.

2014-10-27 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2754?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14186049#comment-14186049
 ] 

Hadoop QA commented on YARN-2754:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12677428/YARN-2754.000.patch
  against trunk revision 5b1dfe7.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:red}-1 tests included{color}.  The patch doesn't appear to include 
any new or modified tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/5585//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5585//console

This message is automatically generated.

 addToCluserNodeLabels should be protected by writeLock in 
 RMNodeLabelsManager.java.
 ---

 Key: YARN-2754
 URL: https://issues.apache.org/jira/browse/YARN-2754
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: zhihai xu
Assignee: zhihai xu
 Attachments: YARN-2754.000.patch


 addToCluserNodeLabels should be protected by writeLock in
 RMNodeLabelsManager.java, because we need to protect labelCollections in
 RMNodeLabelsManager.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2753) potential NPE in checkRemoveLabelsFromNode of CommonNodeLabelsManager

2014-10-27 Thread zhihai xu (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2753?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14186047#comment-14186047
 ] 

zhihai xu commented on YARN-2753:
-

[~leftnoteasy], thanks for reviewing the patch. I removed labels == null in the new
patch YARN-2753.001.patch.

 potential NPE in checkRemoveLabelsFromNode of CommonNodeLabelsManager
 -

 Key: YARN-2753
 URL: https://issues.apache.org/jira/browse/YARN-2753
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: zhihai xu
Assignee: zhihai xu
 Attachments: YARN-2753.000.patch, YARN-2753.001.patch


 Potential NPE (NullPointerException) in checkRemoveLabelsFromNode of
 CommonNodeLabelsManager.
 This is because when a Node is created, Node.labels can be null; in that case,
 nm.labels may be null.
 So we need to check that originalLabels is not null before using it
 (originalLabels.containsAll).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2749) Some testcases from TestLogAggregationService fails in trunk

2014-10-27 Thread Xuan Gong (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2749?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuan Gong updated YARN-2749:

Attachment: YARN-2749.1.patch

 Some testcases from TestLogAggregationService fails in trunk
 

 Key: YARN-2749
 URL: https://issues.apache.org/jira/browse/YARN-2749
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Xuan Gong
Assignee: Xuan Gong
 Attachments: YARN-2749.1.patch


 Some testcases from TestLogAggregationService fail in trunk.
 They can be reproduced on CentOS
 Stack Trace:
 java.lang.AssertionError: null
   at org.junit.Assert.fail(Assert.java:86)
   at org.junit.Assert.assertTrue(Assert.java:41)
   at org.junit.Assert.assertTrue(Assert.java:52)
   at 
 org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.TestLogAggregationService.testLogAggregationService(TestLogAggregationService.java:1362)
   at 
 org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.TestLogAggregationService.testLogAggregationServiceWithRetention(TestLogAggregationService.java:1290)
 Stack Trace:
 java.lang.AssertionError: null
   at org.junit.Assert.fail(Assert.java:86)
   at org.junit.Assert.assertTrue(Assert.java:41)
   at org.junit.Assert.assertTrue(Assert.java:52)
   at 
 org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.TestLogAggregationService.testLogAggregationService(TestLogAggregationService.java:1362)
   at 
 org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.TestLogAggregationService.testLogAggregationServiceWithRetention(TestLogAggregationService.java:1290)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2755) NM fails to clean up usercache_DEL_timestamp dirs after YARN-661

2014-10-27 Thread Sangjin Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14186079#comment-14186079
 ] 

Sangjin Lee commented on YARN-2755:
---

@siqi, it would be good if you could elaborate on the nature of the bug a little more.


 NM fails to clean up usercache_DEL_timestamp dirs after YARN-661
 --

 Key: YARN-2755
 URL: https://issues.apache.org/jira/browse/YARN-2755
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Siqi Li
Assignee: Siqi Li
Priority: Critical
 Attachments: YARN-2755.v1.patch


 When the NM restarts frequently for some reason, a large number of directories
 like these are left in /data/disk$num/yarn/local/:
 /data/disk1/yarn/local/usercache_DEL_1414372756105
 /data/disk1/yarn/local/usercache_DEL_1413557901696
 /data/disk1/yarn/local/usercache_DEL_1413657004894
 /data/disk1/yarn/local/usercache_DEL_1413675321860
 /data/disk1/yarn/local/usercache_DEL_1414093167936
 /data/disk1/yarn/local/usercache_DEL_1413565841271
 These directories are empty, but take up 100M+ because there are so many of them.
 There were 38714 per data disk on the machine I looked at.
 It appears to be a regression introduced by YARN-661.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2741) Windows: Node manager cannot serve up log files via the web user interface when yarn.nodemanager.log-dirs to any drive letter other than C: (or, the drive that nodemanag

2014-10-27 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2741?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14186094#comment-14186094
 ] 

Hadoop QA commented on YARN-2741:
-

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12677445/YARN-2741.6.patch
  against trunk revision 00b4e44.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/5589//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5589//console

This message is automatically generated.

 Windows: Node manager cannot serve up log files via the web user interface 
 when yarn.nodemanager.log-dirs to any drive letter other than C: (or, the 
 drive that nodemanager is running on)
 --

 Key: YARN-2741
 URL: https://issues.apache.org/jira/browse/YARN-2741
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager
Affects Versions: 2.6.0
 Environment: Windows
Reporter: Craig Welch
Assignee: Craig Welch
 Attachments: YARN-2741.1.patch, YARN-2741.6.patch


 PROBLEM: The user gets "No Logs available for Container Container_number"
 when setting yarn.nodemanager.log-dirs to any drive letter other than C:
 STEPS TO REPRODUCE:
 On Windows
 1) Run the NodeManager on C:
 2) Create two local drive partitions D: and E:
 3) Set yarn.nodemanager.log-dirs = D:\nmlogs or E:\nmlogs
 4) Run an MR job that will last at least 5 minutes
 5) While the job is in flight, log into the YARN web UI,
 resource_manager_server:8088/cluster
 6) Click on the application_idnumber
 7) Click on the logs link; you will get "No Logs available for Container
 Container_number"
 ACTUAL BEHAVIOR: An error message appears when viewing the container logs
 EXPECTED BEHAVIOR: Able to use different drive letters in
 yarn.nodemanager.log-dirs without getting an error
 NOTE: If we use the drive letter C: in yarn.nodemanager.log-dirs, we are able
 to see the container logs while the MR job is in flight.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2755) NM fails to clean up usercache_DEL_timestamp dirs after YARN-661

2014-10-27 Thread Siqi Li (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14186105#comment-14186105
 ] 

Siqi Li commented on YARN-2755:
---

When the NM starts up, it moves usercache to usercache_DEL_timestamp, tries to
delete everything inside usercache_DEL_timestamp, and then deletes
usercache_DEL_timestamp itself.

However, when usercache is empty at NM startup, usercache_DEL_timestamp is not
deleted properly.

The reason is that FileContext.listStatus(userDirPath) does not return null for
a valid empty directory. That's why empty usercache_DEL_timestamp directories
are not cleaned up properly.

In particular, when the DN/NM is flapping, a large number of empty
usercache_DEL_timestamp directories are generated, which takes up a certain
amount of space and slows down the NM startup process.
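
A minimal sketch of the cleanup shape being described (class and method names are
illustrative, not the actual NodeManager code or the attached patch):

{code}
import java.io.IOException;
import org.apache.hadoop.fs.FileContext;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.RemoteIterator;

// Illustrative sketch: after usercache has been renamed to
// usercache_DEL_timestamp, delete its children and then delete the renamed
// directory itself, even when the listing turns out to be empty.
public class UsercacheCleanupSketch {
  public static void cleanUpRenamedDir(FileContext lfs, Path renamedDir) throws IOException {
    RemoteIterator<FileStatus> entries = lfs.listStatus(renamedDir);
    while (entries.hasNext()) {
      lfs.delete(entries.next().getPath(), true); // remove each child recursively
    }
    // Unconditionally remove the (now empty) usercache_DEL_timestamp directory;
    // skipping this when there were no children is what leaves the dirs behind.
    lfs.delete(renamedDir, true);
  }
}
{code}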

 NM fails to clean up usercache_DEL_timestamp dirs after YARN-661
 --

 Key: YARN-2755
 URL: https://issues.apache.org/jira/browse/YARN-2755
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Siqi Li
Assignee: Siqi Li
Priority: Critical
 Attachments: YARN-2755.v1.patch


 When the NM restarts frequently for some reason, a large number of directories
 like these are left in /data/disk$num/yarn/local/:
 /data/disk1/yarn/local/usercache_DEL_1414372756105
 /data/disk1/yarn/local/usercache_DEL_1413557901696
 /data/disk1/yarn/local/usercache_DEL_1413657004894
 /data/disk1/yarn/local/usercache_DEL_1413675321860
 /data/disk1/yarn/local/usercache_DEL_1414093167936
 /data/disk1/yarn/local/usercache_DEL_1413565841271
 These directories are empty, but take up 100M+ because there are so many of them.
 There were 38714 per data disk on the machine I looked at.
 It appears to be a regression introduced by YARN-661.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-2757) potential NPE in checkNodeLabelExpression of SchedulerUtils for nodeLabels.

2014-10-27 Thread zhihai xu (JIRA)
zhihai xu created YARN-2757:
---

 Summary: potential NPE in checkNodeLabelExpression of 
SchedulerUtils for nodeLabels.
 Key: YARN-2757
 URL: https://issues.apache.org/jira/browse/YARN-2757
 Project: Hadoop YARN
  Issue Type: Improvement
Reporter: zhihai xu
Assignee: zhihai xu


Potential NPE in checkNodeLabelExpression of SchedulerUtils for nodeLabels.
Since we already check whether nodeLabels is null in
{code}
if (!str.trim().isEmpty()
    && (nodeLabels == null || !nodeLabels.contains(str.trim()))) {
  return false;
}
{code}
we should also check whether nodeLabels is null in
{code}
  if (!nodeLabels.isEmpty()) {
    return false;
  }
{code}
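
A minimal sketch of a null-safe version of that check, collapsed to a single
requested label for brevity (the method shape below is illustrative, not the actual
SchedulerUtils code):

{code}
import java.util.Set;

// Illustrative sketch: both branches tolerate a null nodeLabels set, mirroring
// the null guard that the first branch already has.
public class NodeLabelExpressionCheckSketch {
  public static boolean checkNodeLabelExpression(Set<String> nodeLabels, String labelExpression) {
    if (labelExpression != null && !labelExpression.trim().isEmpty()) {
      // a requested label must actually be present on the node
      return nodeLabels != null && nodeLabels.contains(labelExpression.trim());
    }
    // an empty expression should only match nodes without labels,
    // so a null label set is treated the same as an empty one
    return nodeLabels == null || nodeLabels.isEmpty();
  }
}
{code}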



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2757) potential NPE in checkNodeLabelExpression of SchedulerUtils for nodeLabels.

2014-10-27 Thread zhihai xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2757?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhihai xu updated YARN-2757:

Issue Type: Bug  (was: Improvement)

 potential NPE in checkNodeLabelExpression of SchedulerUtils for nodeLabels.
 ---

 Key: YARN-2757
 URL: https://issues.apache.org/jira/browse/YARN-2757
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: zhihai xu
Assignee: zhihai xu

 Potential NPE in checkNodeLabelExpression of SchedulerUtils for nodeLabels.
 Since we already check whether nodeLabels is null in
 {code}
 if (!str.trim().isEmpty()
     && (nodeLabels == null || !nodeLabels.contains(str.trim()))) {
   return false;
 }
 {code}
 we should also check whether nodeLabels is null in
 {code}
   if (!nodeLabels.isEmpty()) {
     return false;
   }
 {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2755) NM fails to clean up usercache_DEL_timestamp dirs after YARN-661

2014-10-27 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14186115#comment-14186115
 ] 

Hadoop QA commented on YARN-2755:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12677456/YARN-2755.v1.patch
  against trunk revision 00b4e44.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:red}-1 tests included{color}.  The patch doesn't appear to include 
any new or modified tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/5590//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5590//console

This message is automatically generated.

 NM fails to clean up usercache_DEL_timestamp dirs after YARN-661
 --

 Key: YARN-2755
 URL: https://issues.apache.org/jira/browse/YARN-2755
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Siqi Li
Assignee: Siqi Li
Priority: Critical
 Attachments: YARN-2755.v1.patch


 When the NM restarts frequently for some reason, a large number of directories
 like these are left in /data/disk$num/yarn/local/:
 /data/disk1/yarn/local/usercache_DEL_1414372756105
 /data/disk1/yarn/local/usercache_DEL_1413557901696
 /data/disk1/yarn/local/usercache_DEL_1413657004894
 /data/disk1/yarn/local/usercache_DEL_1413675321860
 /data/disk1/yarn/local/usercache_DEL_1414093167936
 /data/disk1/yarn/local/usercache_DEL_1413565841271
 These directories are empty, but take up 100M+ because there are so many of them.
 There were 38714 per data disk on the machine I looked at.
 It appears to be a regression introduced by YARN-661.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2698) Move getClusterNodeLabels and getNodeToLabels to YARN CLI instead of RMAdminCLI

2014-10-27 Thread Wangda Tan (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2698?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wangda Tan updated YARN-2698:
-
Target Version/s: 2.6.0

 Move getClusterNodeLabels and getNodeToLabels to YARN CLI instead of 
 RMAdminCLI
 ---

 Key: YARN-2698
 URL: https://issues.apache.org/jira/browse/YARN-2698
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: api, client, resourcemanager
Reporter: Wangda Tan
Assignee: Mayank Bansal
Priority: Critical

 YARN RMAdminCLI and AdminService should have write APIs only; the read APIs
 should be located in YARNCLI and RMClientService.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2698) Move getClusterNodeLabels and getNodeToLabels to YARN CLI instead of RMAdminCLI

2014-10-27 Thread Wangda Tan (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2698?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wangda Tan updated YARN-2698:
-
Priority: Critical  (was: Major)

 Move getClusterNodeLabels and getNodeToLabels to YARN CLI instead of 
 RMAdminCLI
 ---

 Key: YARN-2698
 URL: https://issues.apache.org/jira/browse/YARN-2698
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: api, client, resourcemanager
Reporter: Wangda Tan
Assignee: Mayank Bansal
Priority: Critical

 YARN RMAdminCLI and AdminService should have write APIs only; the read APIs
 should be located in YARNCLI and RMClientService.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-2758) Update TestApplicationHistoryClientService to use the new generic history store

2014-10-27 Thread Zhijie Shen (JIRA)
Zhijie Shen created YARN-2758:
-

 Summary: Update TestApplicationHistoryClientService to use the new 
generic history store
 Key: YARN-2758
 URL: https://issues.apache.org/jira/browse/YARN-2758
 Project: Hadoop YARN
  Issue Type: Test
  Components: timelineserver
Reporter: Zhijie Shen
Assignee: Zhijie Shen


TestApplicationHistoryClientService is still testing against the mock data in
the old MemoryApplicationHistoryStore; hence it needs to be updated.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2753) potential NPE in checkRemoveLabelsFromNode of CommonNodeLabelsManager

2014-10-27 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2753?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14186125#comment-14186125
 ] 

Hadoop QA commented on YARN-2753:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12677455/YARN-2753.001.patch
  against trunk revision 00b4e44.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:red}-1 tests included{color}.  The patch doesn't appear to include 
any new or modified tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/5591//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5591//console

This message is automatically generated.

 potential NPE in checkRemoveLabelsFromNode of CommonNodeLabelsManager
 -

 Key: YARN-2753
 URL: https://issues.apache.org/jira/browse/YARN-2753
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: zhihai xu
Assignee: zhihai xu
 Attachments: YARN-2753.000.patch, YARN-2753.001.patch


 Potential NPE (NullPointerException) in checkRemoveLabelsFromNode of
 CommonNodeLabelsManager.
 This is because when a Node is created, Node.labels can be null; in that case,
 nm.labels may be null.
 So we need to check that originalLabels is not null before using it
 (originalLabels.containsAll).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-1964) Create Docker analog of the LinuxContainerExecutor in YARN

2014-10-27 Thread sidharta seethana (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1964?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14186126#comment-14186126
 ] 

sidharta seethana commented on YARN-1964:
-

Hi Abin,

I applied the patch to a recent snapshot of 2.6 and tested it out. The patch
needs to be rebased (the changes are minor) to fix some minor compilation
issues. Also, the docker pull for the image mentioned in the example
(altiscale/hadoop-docker) appears to be large and takes a long time to pull -
what is the reason for this? Is there a way we can bring down the size of the
image?

thanks,
-Sid




 Create Docker analog of the LinuxContainerExecutor in YARN
 --

 Key: YARN-1964
 URL: https://issues.apache.org/jira/browse/YARN-1964
 Project: Hadoop YARN
  Issue Type: New Feature
Affects Versions: 2.2.0
Reporter: Arun C Murthy
Assignee: Abin Shahab
 Attachments: YARN-1964.patch, YARN-1964.patch, YARN-1964.patch, 
 YARN-1964.patch, YARN-1964.patch, yarn-1964-branch-2.2.0-docker.patch, 
 yarn-1964-branch-2.2.0-docker.patch, yarn-1964-docker.patch, 
 yarn-1964-docker.patch, yarn-1964-docker.patch, yarn-1964-docker.patch, 
 yarn-1964-docker.patch


 Docker (https://www.docker.io/) is, increasingly, a very popular container 
 technology.
 In context of YARN, the support for Docker will provide a very elegant 
 solution to allow applications to *package* their software into a Docker 
 container (entire Linux file system incl. custom versions of perl, python 
 etc.) and use it as a blueprint to launch all their YARN containers with 
 requisite software environment. This provides both consistency (all YARN 
 containers will have the same software environment) and isolation (no 
 interference with whatever is installed on the physical machine).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2749) Some testcases from TestLogAggregationService fails in trunk

2014-10-27 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2749?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14186139#comment-14186139
 ] 

Hadoop QA commented on YARN-2749:
-

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12677462/YARN-2749.1.patch
  against trunk revision 00b4e44.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/5592//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5592//console

This message is automatically generated.

 Some testcases from TestLogAggregationService fails in trunk
 

 Key: YARN-2749
 URL: https://issues.apache.org/jira/browse/YARN-2749
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Xuan Gong
Assignee: Xuan Gong
 Attachments: YARN-2749.1.patch


 Some testcases from TestLogAggregationService fail in trunk.
 They can be reproduced on CentOS
 Stack Trace:
 java.lang.AssertionError: null
   at org.junit.Assert.fail(Assert.java:86)
   at org.junit.Assert.assertTrue(Assert.java:41)
   at org.junit.Assert.assertTrue(Assert.java:52)
   at 
 org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.TestLogAggregationService.testLogAggregationService(TestLogAggregationService.java:1362)
   at 
 org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.TestLogAggregationService.testLogAggregationServiceWithRetention(TestLogAggregationService.java:1290)
 Stack Trace:
 java.lang.AssertionError: null
   at org.junit.Assert.fail(Assert.java:86)
   at org.junit.Assert.assertTrue(Assert.java:41)
   at org.junit.Assert.assertTrue(Assert.java:52)
   at 
 org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.TestLogAggregationService.testLogAggregationService(TestLogAggregationService.java:1362)
   at 
 org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.TestLogAggregationService.testLogAggregationServiceWithRetention(TestLogAggregationService.java:1290)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-2759) addToCluserNodeLabels should not change the value in labelCollections if the key already exists to avoid the Label.resource is reset.

2014-10-27 Thread zhihai xu (JIRA)
zhihai xu created YARN-2759:
---

 Summary: addToCluserNodeLabels should not change the value in 
labelCollections if the key already exists to avoid the Label.resource is reset.
 Key: YARN-2759
 URL: https://issues.apache.org/jira/browse/YARN-2759
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: zhihai xu
Assignee: zhihai xu


addToCluserNodeLabels should not change the value in labelCollections if the
key already exists, to avoid Label.resource being reset.
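
A minimal sketch of the intent (the record type and field below are illustrative
placeholders, not the actual labelCollections entry type):

{code}
import java.util.HashMap;
import java.util.Map;

// Illustrative sketch: re-adding an existing label must not replace its record,
// otherwise the resource already accumulated for that label would be reset.
public class AddExistingLabelSketch {
  static final class LabelRecord {
    long resourceMemoryMb;                       // stands in for the label's Resource
    LabelRecord(long mb) { this.resourceMemoryMb = mb; }
  }

  private final Map<String, LabelRecord> labelCollections = new HashMap<>();

  public void addLabel(String label) {
    // putIfAbsent keeps the existing record untouched when the key is already present
    labelCollections.putIfAbsent(label, new LabelRecord(0));
  }
}
{code}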



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2759) addToCluserNodeLabels should not change the value in labelCollections if the key already exists to avoid the Label.resource is reset.

2014-10-27 Thread zhihai xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2759?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhihai xu updated YARN-2759:

Attachment: YARN-2759.000.patch

 addToCluserNodeLabels should not change the value in labelCollections if the 
 key already exists to avoid the Label.resource is reset.
 -

 Key: YARN-2759
 URL: https://issues.apache.org/jira/browse/YARN-2759
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: zhihai xu
Assignee: zhihai xu
 Attachments: YARN-2759.000.patch


 addToCluserNodeLabels should not change the value in labelCollections if the
 key already exists, to avoid Label.resource being reset.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

