[jira] [Commented] (YARN-668) TokenIdentifier serialization should consider Unknown fields

2014-09-20 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-668?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14141791#comment-14141791
 ] 

Hadoop QA commented on YARN-668:


{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12670165/YARN-668.patch
  against trunk revision f85cc14.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 22 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:red}-1 findbugs{color}.  The patch appears to introduce 3 new 
Findbugs (version 2.0.3) warnings.

{color:red}-1 release audit{color}.  The applied patch generated 1 
release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager
 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager
 hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-tests:

  
org.apache.hadoop.yarn.server.nodemanager.containermanager.application.TestApplication
  
org.apache.hadoop.yarn.server.nodemanager.containermanager.container.TestContainer

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/5058//testReport/
Release audit warnings: 
https://builds.apache.org/job/PreCommit-YARN-Build/5058//artifact/PreCommit-HADOOP-Build-patchprocess/patchReleaseAuditProblems.txt
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-YARN-Build/5058//artifact/PreCommit-HADOOP-Build-patchprocess/newPatchFindbugsWarningshadoop-yarn-common.html
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5058//console

This message is automatically generated.

 TokenIdentifier serialization should consider Unknown fields
 

 Key: YARN-668
 URL: https://issues.apache.org/jira/browse/YARN-668
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Siddharth Seth
Assignee: Junping Du
Priority: Blocker
 Attachments: YARN-668-demo.patch, YARN-668.patch


 This would allow the TokenIdentifier to change between versions. The 
 current serialization is Writable. A simple way to achieve this would be to 
 have a Proto object as the payload for TokenIdentifiers, instead of 
 individual fields.
 TokenIdentifier continues to implement Writable to work with the RPC layer, 
 but the payload itself is serialized using PB.
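 A minimal sketch of that approach, assuming a protobuf-generated message (called TokenPayloadProto here, purely hypothetical) that carries the identifier's fields: the class still implements Writable for the RPC layer, but the bytes it reads and writes are one PB blob, so fields unknown to an older reader survive the round trip.

{code:java}
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.IOException;

import org.apache.hadoop.io.Text;
import org.apache.hadoop.security.UserGroupInformation;
import org.apache.hadoop.security.token.TokenIdentifier;

public class PayloadTokenIdentifier extends TokenIdentifier {

  public static final Text KIND = new Text("EXAMPLE_TOKEN");

  // TokenPayloadProto is an assumed protobuf-generated message holding the
  // identifier fields (owner, renewer, issue date, ...); it is not a real class.
  private TokenPayloadProto payload = TokenPayloadProto.getDefaultInstance();

  @Override
  public void write(DataOutput out) throws IOException {
    // The entire payload is written as a single PB blob.
    out.write(payload.toByteArray());
  }

  @Override
  public void readFields(DataInput in) throws IOException {
    // PB keeps fields this version does not recognise as unknown fields,
    // instead of failing to parse them.
    payload = TokenPayloadProto.parseFrom((DataInputStream) in);
  }

  @Override
  public Text getKind() {
    return KIND;
  }

  @Override
  public UserGroupInformation getUser() {
    return UserGroupInformation.createRemoteUser(payload.getOwner());
  }
}
{code}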



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2565) RM shouldn't use the old RMApplicationHistoryWriter unless explicitly setting FileSystemApplicationHistoryStore

2014-09-20 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2565?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14141931#comment-14141931
 ] 

Hudson commented on YARN-2565:
--

SUCCESS: Integrated in Hadoop-Yarn-trunk #686 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk/686/])
YARN-2565. Fixed RM to not use FileSystemApplicationHistoryStore unless 
explicitly set. Contributed by Zhijie Shen (jianhe: rev 
444acf8ea795e4bc782f1ce3b5ef7a1a47d1d27d)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ahs/RMApplicationHistoryWriter.java
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/ahs/TestRMApplicationHistoryWriter.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/applicationhistoryservice/ApplicationHistoryServer.java


 RM shouldn't use the old RMApplicationHistoryWriter unless explicitly setting 
 FileSystemApplicationHistoryStore
 ---

 Key: YARN-2565
 URL: https://issues.apache.org/jira/browse/YARN-2565
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager, timelineserver
Affects Versions: 2.6.0
 Environment: Secure cluster with ATS (timeline server enabled) and 
 yarn.resourcemanager.system-metrics-publisher.enabled=true
 so that RM can send Application history to Timeline Store
Reporter: Karam Singh
Assignee: Zhijie Shen
 Fix For: 2.6.0

 Attachments: YARN-2565.1.patch, YARN-2565.2.patch, YARN-2565.3.patch


 Observed that the RM fails to start in Secure mode when the GenericHistoryService is 
 enabled and the ResourceManager is set to use the Timeline Store.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2460) Remove obsolete entries from yarn-default.xml

2014-09-20 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2460?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14141930#comment-14141930
 ] 

Hudson commented on YARN-2460:
--

SUCCESS: Integrated in Hadoop-Yarn-trunk #686 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk/686/])
YARN-2460. Remove obsolete entries from yarn-default.xml (Ray Chiang via aw) 
(aw: rev aa1052c34b78b5b8b6a1498c8c842d21b07fceca)
* hadoop-tools/hadoop-sls/src/main/data/2jobs2min-rumen-jh.json
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/resources/yarn-default.xml
* 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-hs/src/test/resources/job_1329348432655_0001_conf.xml


 Remove obsolete entries from yarn-default.xml
 -

 Key: YARN-2460
 URL: https://issues.apache.org/jira/browse/YARN-2460
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 2.5.0
Reporter: Ray Chiang
Assignee: Ray Chiang
Priority: Minor
  Labels: newbie
 Fix For: 2.6.0

 Attachments: YARN-2460-01.patch, YARN-2460-02.patch


 The following properties are defined in yarn-default.xml, but do not exist in 
 YarnConfiguration.
   mapreduce.job.hdfs-servers
   mapreduce.job.jar
   yarn.ipc.exception.factory.class
   yarn.ipc.serializer.type
   yarn.nodemanager.aux-services.mapreduce_shuffle.class
   yarn.nodemanager.hostname
   yarn.nodemanager.resourcemanager.connect.retry_interval.secs
   yarn.nodemanager.resourcemanager.connect.wait.secs
   yarn.resourcemanager.amliveliness-monitor.interval-ms
   yarn.resourcemanager.application-tokens.master-key-rolling-interval-secs
   yarn.resourcemanager.container.liveness-monitor.interval-ms
   yarn.resourcemanager.nm.liveness-monitor.interval-ms
   yarn.timeline-service.hostname
   yarn.timeline-service.http-authentication.simple.anonymous.allowed
   yarn.timeline-service.http-authentication.type
 Presumably, the mapreduce.* properties are okay.  Similarly, the 
 yarn.timeline-service.* properties are for the future TimelineService.  
 However, the rest are likely fully deprecated.
 Submitting bug for comment/feedback about which other properties should be 
 kept in yarn-default.xml.
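 A minimal sketch (not the attached patch) of the kind of audit this report is based on: list the keys that appear in yarn-default.xml but have no matching String constant in YarnConfiguration. It assumes hadoop-common and hadoop-yarn-api (with yarn-default.xml on the classpath) are available, and it treats the constants naively, so prefix-style keys may need manual review.

{code:java}
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.yarn.conf.YarnConfiguration;

public class YarnDefaultAudit {
  public static void main(String[] args) throws Exception {
    // Collect every public static String constant declared on YarnConfiguration.
    Set<String> declaredKeys = new HashSet<String>();
    for (Field f : YarnConfiguration.class.getFields()) {
      if (Modifier.isStatic(f.getModifiers()) && f.getType() == String.class) {
        declaredKeys.add((String) f.get(null));
      }
    }
    // Load only yarn-default.xml, skipping the usual default resources.
    Configuration conf = new Configuration(false);
    conf.addResource("yarn-default.xml");
    for (Map.Entry<String, String> entry : conf) {
      if (!declaredKeys.contains(entry.getKey())) {
        System.out.println("Defined in yarn-default.xml but not in YarnConfiguration: "
            + entry.getKey());
      }
    }
  }
}
{code}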



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2561) MR job client cannot reconnect to AM after NM restart.

2014-09-20 Thread Junping Du (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2561?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Junping Du updated YARN-2561:
-
Issue Type: Sub-task  (was: Bug)
Parent: YARN-666

 MR job client cannot reconnect to AM after NM restart.
 --

 Key: YARN-2561
 URL: https://issues.apache.org/jira/browse/YARN-2561
 Project: Hadoop YARN
  Issue Type: Sub-task
Affects Versions: 2.6.0
Reporter: Tassapol Athiapinya
Assignee: Junping Du
Priority: Blocker
 Fix For: 2.6.0

 Attachments: YARN-2561-v2.patch, YARN-2561-v3.patch, 
 YARN-2561-v4.patch, YARN-2561-v5.patch, YARN-2561.patch


 Work-preserving NM restart is disabled.
 Submit a job, then restart the only NM; the job hangs with connect retries.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (YARN-2289) ApplicationHistoryStore should be versioned

2014-09-20 Thread Junping Du (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2289?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Junping Du reassigned YARN-2289:


Assignee: Junping Du

 ApplicationHistoryStore should be versioned
 ---

 Key: YARN-2289
 URL: https://issues.apache.org/jira/browse/YARN-2289
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: applications
Reporter: Junping Du
Assignee: Junping Du





--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2464) Provide Hadoop as a local resource (on HDFS) which can be used by other projects

2014-09-20 Thread Junping Du (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2464?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Junping Du updated YARN-2464:
-
Issue Type: Sub-task  (was: Improvement)
Parent: YARN-666

 Provide Hadoop as a local resource (on HDFS) which can be used by other 
 projects
 

 Key: YARN-2464
 URL: https://issues.apache.org/jira/browse/YARN-2464
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Siddharth Seth
Assignee: Junping Du

 DEFAULT_YARN_APPLICATION_CLASSPATH is used by YARN projects to set up their 
 AM / task classpaths if they have a dependency on Hadoop libraries.
 It would be useful to provide similar access to a Hadoop tarball (Hadoop libs, 
 native libraries, etc.), which could be used instead for applications that do 
 not want to rely upon Hadoop versions from a cluster node. This would also 
 require functionality to update the classpath/env for the apps based on the 
 structure of the tar.
 As an example, MR has support for a full tar (for rolling upgrades). 
 Similarly, Tez ships Hadoop libraries along with its build. I'm not sure 
 about the Spark / Storm / HBase model for this, but using a common copy 
 instead of everyone localizing Hadoop libraries would be useful.
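 For context, a minimal sketch of how the existing mechanism is typically used today: the launcher builds the AM/task CLASSPATH from the cluster-provided entries, which is exactly the dependence on node-local Hadoop bits described above.

{code:java}
import java.util.HashMap;
import java.util.Map;

import org.apache.hadoop.yarn.api.ApplicationConstants;
import org.apache.hadoop.yarn.conf.YarnConfiguration;

public class ClasspathFromCluster {
  public static void main(String[] args) {
    YarnConfiguration conf = new YarnConfiguration();
    StringBuilder classpath =
        new StringBuilder(ApplicationConstants.Environment.PWD.$());
    // Fall back to DEFAULT_YARN_APPLICATION_CLASSPATH when the cluster does not
    // override yarn.application.classpath.
    for (String entry : conf.getStrings(
        YarnConfiguration.YARN_APPLICATION_CLASSPATH,
        YarnConfiguration.DEFAULT_YARN_APPLICATION_CLASSPATH)) {
      classpath.append(ApplicationConstants.CLASS_PATH_SEPARATOR)
          .append(entry.trim());
    }
    Map<String, String> env = new HashMap<String, String>();
    env.put(ApplicationConstants.Environment.CLASSPATH.name(), classpath.toString());
    System.out.println(env);
  }
}
{code}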



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2565) RM shouldn't use the old RMApplicationHistoryWriter unless explicitly setting FileSystemApplicationHistoryStore

2014-09-20 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2565?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14141973#comment-14141973
 ] 

Hudson commented on YARN-2565:
--

FAILURE: Integrated in Hadoop-Hdfs-trunk #1877 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/1877/])
YARN-2565. Fixed RM to not use FileSystemApplicationHistoryStore unless 
explicitly set. Contributed by Zhijie Shen (jianhe: rev 
444acf8ea795e4bc782f1ce3b5ef7a1a47d1d27d)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ahs/RMApplicationHistoryWriter.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/applicationhistoryservice/ApplicationHistoryServer.java
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/ahs/TestRMApplicationHistoryWriter.java


 RM shouldn't use the old RMApplicationHistoryWriter unless explicitly setting 
 FileSystemApplicationHistoryStore
 ---

 Key: YARN-2565
 URL: https://issues.apache.org/jira/browse/YARN-2565
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager, timelineserver
Affects Versions: 2.6.0
 Environment: Secure cluster with ATS (timeline server enabled) and 
 yarn.resourcemanager.system-metrics-publisher.enabled=true
 so that RM can send Application history to Timeline Store
Reporter: Karam Singh
Assignee: Zhijie Shen
 Fix For: 2.6.0

 Attachments: YARN-2565.1.patch, YARN-2565.2.patch, YARN-2565.3.patch


 Observed that the RM fails to start in Secure mode when the GenericHistoryService is 
 enabled and the ResourceManager is set to use the Timeline Store.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2460) Remove obsolete entries from yarn-default.xml

2014-09-20 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2460?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14141972#comment-14141972
 ] 

Hudson commented on YARN-2460:
--

FAILURE: Integrated in Hadoop-Hdfs-trunk #1877 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/1877/])
YARN-2460. Remove obsolete entries from yarn-default.xml (Ray Chiang via aw) 
(aw: rev aa1052c34b78b5b8b6a1498c8c842d21b07fceca)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/resources/yarn-default.xml
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-hs/src/test/resources/job_1329348432655_0001_conf.xml
* hadoop-tools/hadoop-sls/src/main/data/2jobs2min-rumen-jh.json


 Remove obsolete entries from yarn-default.xml
 -

 Key: YARN-2460
 URL: https://issues.apache.org/jira/browse/YARN-2460
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 2.5.0
Reporter: Ray Chiang
Assignee: Ray Chiang
Priority: Minor
  Labels: newbie
 Fix For: 2.6.0

 Attachments: YARN-2460-01.patch, YARN-2460-02.patch


 The following properties are defined in yarn-default.xml, but do not exist in 
 YarnConfiguration.
   mapreduce.job.hdfs-servers
   mapreduce.job.jar
   yarn.ipc.exception.factory.class
   yarn.ipc.serializer.type
   yarn.nodemanager.aux-services.mapreduce_shuffle.class
   yarn.nodemanager.hostname
   yarn.nodemanager.resourcemanager.connect.retry_interval.secs
   yarn.nodemanager.resourcemanager.connect.wait.secs
   yarn.resourcemanager.amliveliness-monitor.interval-ms
   yarn.resourcemanager.application-tokens.master-key-rolling-interval-secs
   yarn.resourcemanager.container.liveness-monitor.interval-ms
   yarn.resourcemanager.nm.liveness-monitor.interval-ms
   yarn.timeline-service.hostname
   yarn.timeline-service.http-authentication.simple.anonymous.allowed
   yarn.timeline-service.http-authentication.type
 Presumably, the mapreduce.* properties are okay.  Similarly, the 
 yarn.timeline-service.* properties are for the future TimelineService.  
 However, the rest are likely fully deprecated.
 Submitting bug for comment/feedback about which other properties should be 
 kept in yarn-default.xml.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2198) Remove the need to run NodeManager as privileged account for Windows Secure Container Executor

2014-09-20 Thread Remus Rusanu (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2198?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Remus Rusanu updated YARN-2198:
---
Attachment: YARN-2198.trunk.8.patch

.trunk.8.patch is rebased to the new repo's current trunk and has the vcxproj/sln 
hunks manually fixed to CRLF.

 Remove the need to run NodeManager as privileged account for Windows Secure 
 Container Executor
 --

 Key: YARN-2198
 URL: https://issues.apache.org/jira/browse/YARN-2198
 Project: Hadoop YARN
  Issue Type: Improvement
Reporter: Remus Rusanu
Assignee: Remus Rusanu
  Labels: security, windows
 Attachments: YARN-2198.1.patch, YARN-2198.2.patch, YARN-2198.3.patch, 
 YARN-2198.delta.4.patch, YARN-2198.delta.5.patch, YARN-2198.delta.6.patch, 
 YARN-2198.delta.7.patch, YARN-2198.separation.patch, YARN-2198.trunk.4.patch, 
 YARN-2198.trunk.5.patch, YARN-2198.trunk.6.patch, YARN-2198.trunk.8.patch, 
 YARN-2198.trunk.crlf.6.patch


 YARN-1972 introduces a Secure Windows Container Executor. However, this 
 executor requires the process launching the container to be LocalSystem or 
 a member of the local Administrators group. Since the process in question 
 is the NodeManager, the requirement translates to running the entire NM as a 
 privileged account, a very large surface area to review and protect.
 This proposal is to move the privileged operations into a dedicated NT 
 service. The NM can run as a low privilege account and communicate with the 
 privileged NT service when it needs to launch a container. This would reduce 
 the surface exposed to the high privileges. 
 There has to exist a secure, authenticated and authorized channel of 
 communication between the NM and the privileged NT service. Possible 
 alternatives are a new TCP endpoint, Java RPC etc. My proposal though would 
 be to use Windows LPC (Local Procedure Calls), which is a Windows platform 
 specific inter-process communication channel that satisfies all requirements 
 and is easy to deploy. The privileged NT service would register and listen on 
 an LPC port (NtCreatePort, NtListenPort). The NM would use JNI to interop 
 with libwinutils which would host the LPC client code. The client would 
 connect to the LPC port (NtConnectPort) and send a message requesting a 
 container launch (NtRequestWaitReplyPort). LPC provides authentication and 
 the privileged NT service can use authorization API (AuthZ) to validate the 
 caller.
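 A purely illustrative sketch of the JNI boundary the proposal implies (none of these names are from the patch): the NM-side Java code calls into libwinutils, which hosts the actual LPC client (NtConnectPort / NtRequestWaitReplyPort), and the privileged NT service authorizes the caller via AuthZ.

{code:java}
// Hypothetical NM-side wrapper; the real class/method names would come from the patch.
public final class WinutilsLpcClient {

  static {
    // libwinutils would host the native LPC client code.
    System.loadLibrary("winutils");
  }

  /**
   * Ask the privileged NT service to launch a container on behalf of the
   * low-privilege NodeManager. The native side connects to the service's LPC
   * port and sends the launch request.
   */
  public native int launchContainer(String user, String containerId,
      String launchScriptPath);
}
{code}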



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2198) Remove the need to run NodeManager as privileged account for Windows Secure Container Executor

2014-09-20 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2198?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14141987#comment-14141987
 ] 

Hadoop QA commented on YARN-2198:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  
http://issues.apache.org/jira/secure/attachment/12670215/YARN-2198.trunk.8.patch
  against trunk revision f85cc14.

{color:red}-1 patch{color}.  The patch command could not apply the patch.

Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5059//console

This message is automatically generated.

 Remove the need to run NodeManager as privileged account for Windows Secure 
 Container Executor
 --

 Key: YARN-2198
 URL: https://issues.apache.org/jira/browse/YARN-2198
 Project: Hadoop YARN
  Issue Type: Improvement
Reporter: Remus Rusanu
Assignee: Remus Rusanu
  Labels: security, windows
 Attachments: YARN-2198.1.patch, YARN-2198.2.patch, YARN-2198.3.patch, 
 YARN-2198.delta.4.patch, YARN-2198.delta.5.patch, YARN-2198.delta.6.patch, 
 YARN-2198.delta.7.patch, YARN-2198.separation.patch, YARN-2198.trunk.4.patch, 
 YARN-2198.trunk.5.patch, YARN-2198.trunk.6.patch, YARN-2198.trunk.8.patch, 
 YARN-2198.trunk.crlf.6.patch


 YARN-1972 introduces a Secure Windows Container Executor. However, this 
 executor requires the process launching the container to be LocalSystem or 
 a member of the local Administrators group. Since the process in question 
 is the NodeManager, the requirement translates to running the entire NM as a 
 privileged account, a very large surface area to review and protect.
 This proposal is to move the privileged operations into a dedicated NT 
 service. The NM can run as a low privilege account and communicate with the 
 privileged NT service when it needs to launch a container. This would reduce 
 the surface exposed to the high privileges. 
 There has to exist a secure, authenticated and authorized channel of 
 communication between the NM and the privileged NT service. Possible 
 alternatives are a new TCP endpoint, Java RPC etc. My proposal though would 
 be to use Windows LPC (Local Procedure Calls), which is a Windows platform 
 specific inter-process communication channel that satisfies all requirements 
 and is easy to deploy. The privileged NT service would register and listen on 
 an LPC port (NtCreatePort, NtListenPort). The NM would use JNI to interop 
 with libwinutils which would host the LPC client code. The client would 
 connect to the LPC port (NtConnectPort) and send a message requesting a 
 container launch (NtRequestWaitReplyPort). LPC provides authentication and 
 the privileged NT service can use authorization API (AuthZ) to validate the 
 caller.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2496) [YARN-796] Changes for capacity scheduler to support allocate resource respect labels

2014-09-20 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2496?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14142001#comment-14142001
 ] 

Wangda Tan commented on YARN-2496:
--

Craig, still about #2: I think what you commented makes sense to me, the AM can 
get a more precise headroom to plan its subsequent resource usage, but I think:
1) It may not be enough, given what you said:
bq. For this reason, headroom should reflect the labels in the last resource 
request from the application, not the queue's labels.
It is possible that an AM sent resource requests with different label expressions, 
so which headroom should we send back to the AM?
I think maybe we need a new field in AllocateRequest to request different 
headrooms under different label expressions (sketched below).

2) Even with 1), I cannot think of a good way to quickly compute arbitrary label 
expressions in acceptable time complexity; it is possible for thousands of 
different label expressions to exist in a big cluster at the same time. Our 
current implementation can make sure the resources of a queue's labels stay 
up-to-date whenever a resource change happens.

Given 1) and 2), I suggest making this a pending task that we can deal with in 
the future.
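Purely as an illustration of the idea in 1) (no such field exists in AllocateRequest/AllocateResponse today), a response could carry something like a map from label expression to headroom:

{code:java}
import java.util.HashMap;
import java.util.Map;

import org.apache.hadoop.yarn.api.records.Resource;

public class PerLabelHeadroomIdea {
  public static void main(String[] args) {
    // Hypothetical shape of a per-label-expression headroom report.
    Map<String, Resource> headroomByLabelExpression = new HashMap<String, Resource>();
    headroomByLabelExpression.put("", Resource.newInstance(16384, 8));   // default partition
    headroomByLabelExpression.put("GPU", Resource.newInstance(4096, 2)); // "GPU" label
    for (Map.Entry<String, Resource> e : headroomByLabelExpression.entrySet()) {
      System.out.println("label expression '" + e.getKey() + "' -> " + e.getValue());
    }
  }
}
{code}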

About 
bq. -re 5, I though * could be in requests, if no, then should not be an issue.
Yes, we don't support specifying * in requests, because it may cause some 
resource wastage. The AM should clearly know what resources it needs.

Thanks,
Wangda

 [YARN-796] Changes for capacity scheduler to support allocate resource 
 respect labels
 -

 Key: YARN-2496
 URL: https://issues.apache.org/jira/browse/YARN-2496
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Reporter: Wangda Tan
Assignee: Wangda Tan
 Attachments: YARN-2496.patch, YARN-2496.patch, YARN-2496.patch, 
 YARN-2496.patch


 This JIRA Includes:
 - Add/parse labels option to {{capacity-scheduler.xml}} similar to other 
 options of queue like capacity/maximum-capacity, etc.
 - Include a default-label-expression option in queue config, if an app 
 doesn't specify label-expression, default-label-expression of queue will be 
 used.
 - Check if labels can be accessed by the queue when submit an app with 
 labels-expression to queue or update ResourceRequest with label-expression
 - Check labels on NM when trying to allocate ResourceRequest on the NM with 
 label-expression
 - Respect  labels when calculate headroom/user-limit



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2198) Remove the need to run NodeManager as privileged account for Windows Secure Container Executor

2014-09-20 Thread Remus Rusanu (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2198?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Remus Rusanu updated YARN-2198:
---
Attachment: (was: YARN-2198.trunk.8.patch)

 Remove the need to run NodeManager as privileged account for Windows Secure 
 Container Executor
 --

 Key: YARN-2198
 URL: https://issues.apache.org/jira/browse/YARN-2198
 Project: Hadoop YARN
  Issue Type: Improvement
Reporter: Remus Rusanu
Assignee: Remus Rusanu
  Labels: security, windows
 Attachments: YARN-2198.1.patch, YARN-2198.2.patch, YARN-2198.3.patch, 
 YARN-2198.delta.4.patch, YARN-2198.delta.5.patch, YARN-2198.delta.6.patch, 
 YARN-2198.delta.7.patch, YARN-2198.separation.patch, YARN-2198.trunk.4.patch, 
 YARN-2198.trunk.5.patch, YARN-2198.trunk.6.patch, YARN-2198.trunk.crlf.6.patch


 YARN-1972 introduces a Secure Windows Container Executor. However, this 
 executor requires the process launching the container to be LocalSystem or 
 a member of the local Administrators group. Since the process in question 
 is the NodeManager, the requirement translates to running the entire NM as a 
 privileged account, a very large surface area to review and protect.
 This proposal is to move the privileged operations into a dedicated NT 
 service. The NM can run as a low privilege account and communicate with the 
 privileged NT service when it needs to launch a container. This would reduce 
 the surface exposed to the high privileges. 
 There has to exist a secure, authenticated and authorized channel of 
 communication between the NM and the privileged NT service. Possible 
 alternatives are a new TCP endpoint, Java RPC etc. My proposal though would 
 be to use Windows LPC (Local Procedure Calls), which is a Windows platform 
 specific inter-process communication channel that satisfies all requirements 
 and is easy to deploy. The privileged NT service would register and listen on 
 an LPC port (NtCreatePort, NtListenPort). The NM would use JNI to interop 
 with libwinutils which would host the LPC client code. The client would 
 connect to the LPC port (NtConnectPort) and send a message requesting a 
 container launch (NtRequestWaitReplyPort). LPC provides authentication and 
 the privileged NT service can use authorization API (AuthZ) to validate the 
 caller.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2198) Remove the need to run NodeManager as privileged account for Windows Secure Container Executor

2014-09-20 Thread Remus Rusanu (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2198?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Remus Rusanu updated YARN-2198:
---
Attachment: (was: YARN-2198.trunk.crlf.6.patch)

 Remove the need to run NodeManager as privileged account for Windows Secure 
 Container Executor
 --

 Key: YARN-2198
 URL: https://issues.apache.org/jira/browse/YARN-2198
 Project: Hadoop YARN
  Issue Type: Improvement
Reporter: Remus Rusanu
Assignee: Remus Rusanu
  Labels: security, windows
 Attachments: YARN-2198.1.patch, YARN-2198.2.patch, YARN-2198.3.patch, 
 YARN-2198.delta.4.patch, YARN-2198.delta.5.patch, YARN-2198.delta.6.patch, 
 YARN-2198.delta.7.patch, YARN-2198.separation.patch, YARN-2198.trunk.4.patch, 
 YARN-2198.trunk.5.patch, YARN-2198.trunk.6.patch, YARN-2198.trunk.8.patch


 YARN-1972 introduces a Secure Windows Container Executor. However, this 
 executor requires the process launching the container to be LocalSystem or 
 a member of the local Administrators group. Since the process in question 
 is the NodeManager, the requirement translates to running the entire NM as a 
 privileged account, a very large surface area to review and protect.
 This proposal is to move the privileged operations into a dedicated NT 
 service. The NM can run as a low privilege account and communicate with the 
 privileged NT service when it needs to launch a container. This would reduce 
 the surface exposed to the high privileges. 
 There has to exist a secure, authenticated and authorized channel of 
 communication between the NM and the privileged NT service. Possible 
 alternatives are a new TCP endpoint, Java RPC etc. My proposal though would 
 be to use Windows LPC (Local Procedure Calls), which is a Windows platform 
 specific inter-process communication channel that satisfies all requirements 
 and is easy to deploy. The privileged NT service would register and listen on 
 an LPC port (NtCreatePort, NtListenPort). The NM would use JNI to interop 
 with libwinutils which would host the LPC client code. The client would 
 connect to the LPC port (NtConnectPort) and send a message requesting a 
 container launch (NtRequestWaitReplyPort). LPC provides authentication and 
 the privileged NT service can use authorization API (AuthZ) to validate the 
 caller.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2198) Remove the need to run NodeManager as privileged account for Windows Secure Container Executor

2014-09-20 Thread Remus Rusanu (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2198?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Remus Rusanu updated YARN-2198:
---
Attachment: YARN-2198.trunk.8.patch

Fix -Project^M

 Remove the need to run NodeManager as privileged account for Windows Secure 
 Container Executor
 --

 Key: YARN-2198
 URL: https://issues.apache.org/jira/browse/YARN-2198
 Project: Hadoop YARN
  Issue Type: Improvement
Reporter: Remus Rusanu
Assignee: Remus Rusanu
  Labels: security, windows
 Attachments: YARN-2198.1.patch, YARN-2198.2.patch, YARN-2198.3.patch, 
 YARN-2198.delta.4.patch, YARN-2198.delta.5.patch, YARN-2198.delta.6.patch, 
 YARN-2198.delta.7.patch, YARN-2198.separation.patch, YARN-2198.trunk.4.patch, 
 YARN-2198.trunk.5.patch, YARN-2198.trunk.6.patch, YARN-2198.trunk.8.patch


 YARN-1972 introduces a Secure Windows Container Executor. However, this 
 executor requires the process launching the container to be LocalSystem or 
 a member of the local Administrators group. Since the process in question 
 is the NodeManager, the requirement translates to running the entire NM as a 
 privileged account, a very large surface area to review and protect.
 This proposal is to move the privileged operations into a dedicated NT 
 service. The NM can run as a low privilege account and communicate with the 
 privileged NT service when it needs to launch a container. This would reduce 
 the surface exposed to the high privileges. 
 There has to exist a secure, authenticated and authorized channel of 
 communication between the NM and the privileged NT service. Possible 
 alternatives are a new TCP endpoint, Java RPC etc. My proposal though would 
 be to use Windows LPC (Local Procedure Calls), which is a Windows platform 
 specific inter-process communication channel that satisfies all requirements 
 and is easy to deploy. The privileged NT service would register and listen on 
 an LPC port (NtCreatePort, NtListenPort). The NM would use JNI to interop 
 with libwinutils which would host the LPC client code. The client would 
 connect to the LPC port (NtConnectPort) and send a message requesting a 
 container launch (NtRequestWaitReplyPort). LPC provides authentication and 
 the privileged NT service can use authorization API (AuthZ) to validate the 
 caller.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2565) RM shouldn't use the old RMApplicationHistoryWriter unless explicitly setting FileSystemApplicationHistoryStore

2014-09-20 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2565?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14142067#comment-14142067
 ] 

Hudson commented on YARN-2565:
--

FAILURE: Integrated in Hadoop-Mapreduce-trunk #1902 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1902/])
YARN-2565. Fixed RM to not use FileSystemApplicationHistoryStore unless 
explicitly set. Contributed by Zhijie Shen (jianhe: rev 
444acf8ea795e4bc782f1ce3b5ef7a1a47d1d27d)
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ahs/RMApplicationHistoryWriter.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/ahs/TestRMApplicationHistoryWriter.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/applicationhistoryservice/ApplicationHistoryServer.java


 RM shouldn't use the old RMApplicationHistoryWriter unless explicitly setting 
 FileSystemApplicationHistoryStore
 ---

 Key: YARN-2565
 URL: https://issues.apache.org/jira/browse/YARN-2565
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager, timelineserver
Affects Versions: 2.6.0
 Environment: Secure cluster with ATS (timeline server enabled) and 
 yarn.resourcemanager.system-metrics-publisher.enabled=true
 so that RM can send Application history to Timeline Store
Reporter: Karam Singh
Assignee: Zhijie Shen
 Fix For: 2.6.0

 Attachments: YARN-2565.1.patch, YARN-2565.2.patch, YARN-2565.3.patch


 Observed that the RM fails to start in Secure mode when the GenericHistoryService is 
 enabled and the ResourceManager is set to use the Timeline Store.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2460) Remove obsolete entries from yarn-default.xml

2014-09-20 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2460?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14142066#comment-14142066
 ] 

Hudson commented on YARN-2460:
--

FAILURE: Integrated in Hadoop-Mapreduce-trunk #1902 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1902/])
YARN-2460. Remove obsolete entries from yarn-default.xml (Ray Chiang via aw) 
(aw: rev aa1052c34b78b5b8b6a1498c8c842d21b07fceca)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/resources/yarn-default.xml
* 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-hs/src/test/resources/job_1329348432655_0001_conf.xml
* hadoop-yarn-project/CHANGES.txt
* hadoop-tools/hadoop-sls/src/main/data/2jobs2min-rumen-jh.json


 Remove obsolete entries from yarn-default.xml
 -

 Key: YARN-2460
 URL: https://issues.apache.org/jira/browse/YARN-2460
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 2.5.0
Reporter: Ray Chiang
Assignee: Ray Chiang
Priority: Minor
  Labels: newbie
 Fix For: 2.6.0

 Attachments: YARN-2460-01.patch, YARN-2460-02.patch


 The following properties are defined in yarn-default.xml, but do not exist in 
 YarnConfiguration.
   mapreduce.job.hdfs-servers
   mapreduce.job.jar
   yarn.ipc.exception.factory.class
   yarn.ipc.serializer.type
   yarn.nodemanager.aux-services.mapreduce_shuffle.class
   yarn.nodemanager.hostname
   yarn.nodemanager.resourcemanager.connect.retry_interval.secs
   yarn.nodemanager.resourcemanager.connect.wait.secs
   yarn.resourcemanager.amliveliness-monitor.interval-ms
   yarn.resourcemanager.application-tokens.master-key-rolling-interval-secs
   yarn.resourcemanager.container.liveness-monitor.interval-ms
   yarn.resourcemanager.nm.liveness-monitor.interval-ms
   yarn.timeline-service.hostname
   yarn.timeline-service.http-authentication.simple.anonymous.allowed
   yarn.timeline-service.http-authentication.type
 Presumably, the mapreduce.* properties are okay.  Similarly, the 
 yarn.timeline-service.* properties are for the future TimelineService.  
 However, the rest are likely fully deprecated.
 Submitting bug for comment/feedback about which other properties should be 
 kept in yarn-default.xml.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-913) Add a way to register long-lived services in a YARN cluster

2014-09-20 Thread Steve Loughran (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-913?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14142086#comment-14142086
 ] 

Steve Loughran commented on YARN-913:
-

bq. I have some concern around 'naked' zookeeper.* config option

This is something that I do think needs changing in ZK; being driven by JVM 
properties can work for standalone JVM servers, but not for clients. The client 
here sets the properties just before they are needed (e.g. the SASL auth details), 
and I was thinking of making the set-connect operation class-synchronized. 
But... Curator does some session restarting, and if those JVM-wide settings are 
changed, there may be problems. Summary: we need to fix the ZK client and then have 
Curator configure it, so the rest of us don't have to care.

bq.  if a user kills the ZK used for app registry through some action, what 
happens to the RM and other user's bits that are running

# The RM isn't depending on the ZK cluster for information; it just sets up the 
paths for a user, and purges the container and app lifespan parts on their 
completion. I've made both the setup and teardown operations async; the 
{{RMRegistryOperationsService}} class gets the RM event and schedules the work 
on its executor. If ZK is offline then these will block until the quorum is 
back, but this should not delay RM operations. It could block the clients and the 
AM starting up.

# Curator supports different {{EnsembleProviders}} ... classes which provide the 
data needed for the client to reconnect to ZK. The code is currently only 
hooked up to one, the {{FixedEnsembleProvider}}, which uses a classic static ZK 
quorum. There's an alternative, the {{ExhibitorProvider}}, which hooks up to 
[Netflix Exhibitor|https://github.com/Netflix/exhibitor/wiki] and can do 
things like [Rolling Ensemble 
Change|https://github.com/Netflix/exhibitor/wiki/Rolling-Ensemble-Change]. 
This is designed for cloud deployments where a ZK server failure results in a 
new host coming up, with a new hostname/address ... Exhibitor handles the details 
of rebinding.  

I haven't added explicit support for that (straightforward) or got a test setup 
(harder). If you want to play with it though ...
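For reference, a generic Curator sketch (not the YARN-913 registry API) of the static-quorum strategy mentioned above; the registry path used here is hypothetical, and an Exhibitor-backed provider could be swapped in for clusters whose ZK hosts change over time.

{code:java}
import org.apache.curator.ensemble.fixed.FixedEnsembleProvider;
import org.apache.curator.framework.CuratorFramework;
import org.apache.curator.framework.CuratorFrameworkFactory;
import org.apache.curator.retry.ExponentialBackoffRetry;

public class FixedQuorumClient {
  public static void main(String[] args) throws Exception {
    // Classic static quorum via FixedEnsembleProvider.
    CuratorFramework client = CuratorFrameworkFactory.builder()
        .ensembleProvider(new FixedEnsembleProvider("zk1:2181,zk2:2181,zk3:2181"))
        .retryPolicy(new ExponentialBackoffRetry(1000, 3))
        .build();
    client.start();
    // Read a (hypothetical) registry entry path.
    byte[] data = client.getData().forPath("/registry/users/alice/services/example");
    System.out.println(new String(data, "UTF-8"));
    client.close();
  }
}
{code}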


bq. Why doesn't the hostname component allow for FQDNs?

do you mean in the endpoint fields? It should ... let me clarify that in the 
example.

bq. Are we prepared for more backlash when another component requires working 
DNS?

The reason the initial patches here weren't building is a helper method that 
builds up an endpoint address from an {{InetSocketAddress}}: it called 
{{getHostString()}} to get the host/FQDN without doing any DNS work. I had to 
switch to {{getHostName()}}, which can try to do rDNS, and so relies on DNS 
working.

bq. Is ZK the right thing to use here?


# ZK gives us availability; I do plan to add a REST API later on, one that 
works long-haul. It's why there is deliberately no support for ephemeral nodes 
... the {{RegistryOperations}} interface is designed to be implementable by a REST 
client, for which there won't be any sessions to tie ephemeral nodes to. 

# By deliberately publishing nothing but endpoints to services, we're trying to 
keep the content in the store down, with the bulk data being served up by other 
means. In slider, we are publishing dynamically generated config files from the 
AM REST API; all the registry entry does is list the API + URL for that 
service. 

# I do like your idea about just sticking stuff into HDFS, S3, etc.; that's a 
way to share content too, including config data. It'll fit into the general 
category of URL formatted endpoint —maybe I should add it as an explicit 
address type, filesystem? 



 Add a way to register long-lived services in a YARN cluster
 ---

 Key: YARN-913
 URL: https://issues.apache.org/jira/browse/YARN-913
 Project: Hadoop YARN
  Issue Type: New Feature
  Components: api, resourcemanager
Affects Versions: 2.5.0, 2.4.1
Reporter: Steve Loughran
Assignee: Steve Loughran
 Attachments: 2014-09-03_Proposed_YARN_Service_Registry.pdf, 
 2014-09-08_YARN_Service_Registry.pdf, RegistrationServiceDetails.txt, 
 YARN-913-001.patch, YARN-913-002.patch, YARN-913-003.patch, 
 YARN-913-003.patch, YARN-913-004.patch, YARN-913-006.patch, 
 YARN-913-007.patch, YARN-913-008.patch, yarnregistry.pdf, yarnregistry.tla


 In a YARN cluster you can't predict where services will come up -or on what 
 ports. The services need to work those things out as they come up and then 
 publish them somewhere.
 Applications need to be able to find the service instance they are to bond to 
 -and not any others in the cluster.
 Some kind of service registry -in the RM, in ZK, could do this. If the RM 
 held the write access to the ZK nodes, it would be more secure than having 
 apps register with ZK themselves.

[jira] [Commented] (YARN-913) Add a way to register long-lived services in a YARN cluster

2014-09-20 Thread Steve Loughran (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-913?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14142090#comment-14142090
 ] 

Steve Loughran commented on YARN-913:
-

Oh, one more thing: that {{MicroZookeeperService}} which is used in tests? It's 
a YARN service-wrapped ZK microservice (based on Twill's test one), which can 
publish its ensemble information to registry clients running in-VM. This would 
make it straightforward to deploy *inside* the RM ... in a small 1-2 node 
cluster it wouldn't be a load problem, and as the lifespan of the ZK == the 
lifespan of the RM, there's no worry about a single ZK quorum outage impacting the 
RM.

I've not put the service under the RM. Someone is free to do so at some point in the 
future.

 Add a way to register long-lived services in a YARN cluster
 ---

 Key: YARN-913
 URL: https://issues.apache.org/jira/browse/YARN-913
 Project: Hadoop YARN
  Issue Type: New Feature
  Components: api, resourcemanager
Affects Versions: 2.5.0, 2.4.1
Reporter: Steve Loughran
Assignee: Steve Loughran
 Attachments: 2014-09-03_Proposed_YARN_Service_Registry.pdf, 
 2014-09-08_YARN_Service_Registry.pdf, RegistrationServiceDetails.txt, 
 YARN-913-001.patch, YARN-913-002.patch, YARN-913-003.patch, 
 YARN-913-003.patch, YARN-913-004.patch, YARN-913-006.patch, 
 YARN-913-007.patch, YARN-913-008.patch, yarnregistry.pdf, yarnregistry.tla


 In a YARN cluster you can't predict where services will come up -or on what 
 ports. The services need to work those things out as they come up and then 
 publish them somewhere.
 Applications need to be able to find the service instance they are to bond to 
 -and not any others in the cluster.
 Some kind of service registry -in the RM, in ZK, could do this. If the RM 
 held the write access to the ZK nodes, it would be more secure than having 
 apps register with ZK themselves.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-1492) truly shared cache for jars (jobjar/libjar)

2014-09-20 Thread Vinod Kumar Vavilapalli (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1492?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kumar Vavilapalli updated YARN-1492:
--
Priority: Critical  (was: Major)
Target Version/s: 2.6.0

Tx for the notes [~ctrezzo]!

I am marking this as critical for 2.6, given how long it's been out in the 
open. Started reviewing the patches.

 truly shared cache for jars (jobjar/libjar)
 ---

 Key: YARN-1492
 URL: https://issues.apache.org/jira/browse/YARN-1492
 Project: Hadoop YARN
  Issue Type: New Feature
Affects Versions: 2.0.4-alpha
Reporter: Sangjin Lee
Assignee: Chris Trezzo
Priority: Critical
 Attachments: YARN-1492-all-trunk-v1.patch, 
 YARN-1492-all-trunk-v2.patch, YARN-1492-all-trunk-v3.patch, 
 YARN-1492-all-trunk-v4.patch, YARN-1492-all-trunk-v5.patch, 
 shared_cache_design.pdf, shared_cache_design_v2.pdf, 
 shared_cache_design_v3.pdf, shared_cache_design_v4.pdf, 
 shared_cache_design_v5.pdf, shared_cache_design_v6.pdf


 Currently there is the distributed cache that enables you to cache jars and 
 files so that attempts from the same job can reuse them. However, sharing is 
 limited with the distributed cache because it is normally on a per-job basis. 
 On a large cluster, sometimes copying of jobjars and libjars becomes so 
 prevalent that it consumes a large portion of the network bandwidth, not to 
 speak of defeating the purpose of bringing compute to where data is. This 
 is wasteful because in most cases code doesn't change much across many jobs.
 I'd like to propose and discuss feasibility of introducing a truly shared 
 cache so that multiple jobs from multiple users can share and cache jars. 
 This JIRA is to open the discussion.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2168) SCM/Client/NM/Admin protocols

2014-09-20 Thread Vinod Kumar Vavilapalli (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2168?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14142139#comment-14142139
 ] 

Vinod Kumar Vavilapalli commented on YARN-2168:
---

Few comments on the APIs:
 - Let's mark all the APIs as evolving, or maybe even unstable.
 - The setters for responses and objects that are supposed to be created only 
by the server should be marked Private; we don't expect users to use them, 
e.g. UseSharedCacheResourceResponse.setPath() (see the sketch after this list).
 - Let's move SCMAdminProtocol and all related records to the 
org.apache.hadoop.yarn.server.api and 
org.apache.hadoop.yarn.server.api.protocolrecords packages.
 - We are using checksum, key, and resource-key to all refer to the same entity. 
Shall we standardize on resource-key?
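A minimal sketch of the audience/stability convention suggested above; the class and method names follow this JIRA, but the body is illustrative, not the code under review.

{code:java}
import org.apache.hadoop.classification.InterfaceAudience.Private;
import org.apache.hadoop.classification.InterfaceAudience.Public;
import org.apache.hadoop.classification.InterfaceStability.Unstable;

@Public
@Unstable
public abstract class UseSharedCacheResourceResponse {

  /** Clients only read the path. */
  public abstract String getPath();

  /** Only the shared cache manager is expected to set the path. */
  @Private
  @Unstable
  public abstract void setPath(String path);
}
{code}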

 SCM/Client/NM/Admin protocols
 -

 Key: YARN-2168
 URL: https://issues.apache.org/jira/browse/YARN-2168
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Chris Trezzo
Assignee: Chris Trezzo
 Attachments: YARN-2168-trunk-v1.patch, YARN-2168-trunk-v2.patch


 This jira is meant to be used to review the main shared cache APIs. They are 
 as follows:
 * ClientSCMProtocol - The protocol between the yarn client and the cache 
 manager. This protocol controls how resources in the cache are claimed and 
 released.
 ** UseSharedCacheResourceRequest
 ** UseSharedCacheResourceResponse
 ** ReleaseSharedCacheResourceRequest
 ** ReleaseSharedCacheResourceResponse
 * SCMAdminProtocol - This is an administrative protocol for the cache 
 manager. It allows administrators to manually trigger cleaner runs.
 ** RunSharedCacheCleanerTaskRequest
 ** RunSharedCacheCleanerTaskResponse
 * NMCacheUploaderSCMProtocol - The protocol between the NodeManager and the 
 cache manager. This allows the NodeManager to coordinate with the cache 
 manager when uploading new resources to the shared cache.
 ** NotifySCMRequest
 ** NotifySCMResponse



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2179) Initial cache manager structure and context

2014-09-20 Thread Vinod Kumar Vavilapalli (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2179?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14142157#comment-14142157
 ] 

Vinod Kumar Vavilapalli commented on YARN-2179:
---

Some comments on the patch
 - Rename config yarn.sharedcache.root to root-path or root-dir?
 - I cannot see why sharedcachemanager depends on the resourcemanager module, given 
I haven't seen the entire feature-related code yet. Ideally
-- sharedcachemanager simply uses yarn-client
-- sharedcachemanager is its own module
-- ResourceManager can embed shared-cache-manager by making it a run-time 
dependency (and thus not depend on it at compile time)
 - AppChecker.appIsActive() -> isApplicationActive() and getAllActiveApps() -> 
getActiveApplications()? (I tend to favor controlled verbosity :) )
 - RemoteAppChecker won't work when RM failover is enabled. You are better 
off simply using YarnClient instead of building all of that functionality from 
scratch again. Similarly, for getAllActiveApps(), we can just use 
{{List<ApplicationReport> getApplications(EnumSet<YarnApplicationState> 
applicationStates)}} from YarnClient (see the sketch below).

bq. Would it make more sense to leverage getFinalApplicationStatus() instead of 
getYarnApplicationState()? That way we can just say if the 
FinalApplicationStatus is undefined don't clean it up, otherwise we are safe to 
delete the appId.
FinalApplicationStatus is filled in by user APIs and some applications may 
choose to leave it as UNDEFINED, so we cannot depend on it. I propose that we 
keep the usage of ApplicationState and add an API in YarnClient/RM to detect 
active states in a follow-up.
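A minimal sketch of that suggestion: use YarnClient (which already handles RM failover) to list applications that are not yet in a terminal state, instead of re-building that plumbing in RemoteAppChecker.

{code:java}
import java.util.EnumSet;
import java.util.List;

import org.apache.hadoop.yarn.api.records.ApplicationReport;
import org.apache.hadoop.yarn.api.records.YarnApplicationState;
import org.apache.hadoop.yarn.client.api.YarnClient;
import org.apache.hadoop.yarn.conf.YarnConfiguration;

public class ActiveApps {
  public static void main(String[] args) throws Exception {
    YarnClient yarnClient = YarnClient.createYarnClient();
    yarnClient.init(new YarnConfiguration());
    yarnClient.start();
    // Everything not yet FINISHED/FAILED/KILLED counts as active here.
    List<ApplicationReport> active = yarnClient.getApplications(EnumSet.of(
        YarnApplicationState.NEW, YarnApplicationState.NEW_SAVING,
        YarnApplicationState.SUBMITTED, YarnApplicationState.ACCEPTED,
        YarnApplicationState.RUNNING));
    for (ApplicationReport report : active) {
      System.out.println(report.getApplicationId());
    }
    yarnClient.stop();
  }
}
{code}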

 Initial cache manager structure and context
 ---

 Key: YARN-2179
 URL: https://issues.apache.org/jira/browse/YARN-2179
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Chris Trezzo
Assignee: Chris Trezzo
 Attachments: YARN-2179-trunk-v1.patch, YARN-2179-trunk-v2.patch, 
 YARN-2179-trunk-v3.patch, YARN-2179-trunk-v4.patch, YARN-2179-trunk-v5.patch


 Implement the initial shared cache manager structure and context. The 
 SCMContext will be used by a number of manager services (i.e. the backing 
 store and the cleaner service). The AppChecker is used to gather the 
 currently running applications on SCM startup (necessary for an scm that is 
 backed by an in-memory store).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2180) In-memory backing store for cache manager

2014-09-20 Thread Vinod Kumar Vavilapalli (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2180?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14142167#comment-14142167
 ] 

Vinod Kumar Vavilapalli commented on YARN-2180:
---

The patch looks fine overall, some comments:
 - yarn.sharedcache.manager.store.impl -> yarn.sharedcache.store or store-class
 - YarnConfiguration.SCM_STORE_IMPL -> SCM_STORE/SCM_STORE_CLASS
 - We already have Resource and LocalResource. To avoid confusion, shall we use 
SharedCacheResource, and hence ResourceReference -> SharedCacheResourceReference, 
and so on everywhere?
 - InMemoryStore.bootstrap() can be done as part of serviceInit()
 - Isn't synchronization missing from the InMemoryStore operations? You are using 
a ConcurrentHashMap, but to insert correctly (multiple apps adding a cache entry 
for the same path) you'll need to use {{putIfAbsent}} (see the sketch below). I'm 
surprised the test is presumably passing.
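A minimal sketch of the putIfAbsent concern (class and field names are illustrative, not from the patch): with a plain put(), two applications adding a reference for the same cache path can race and one reference set gets lost; putIfAbsent() keeps exactly one entry and the loser adds to the winner's set.

{code:java}
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.ConcurrentSkipListSet;

public class SharedCacheEntries {

  private final ConcurrentMap<String, Set<String>> refsByKey =
      new ConcurrentHashMap<String, Set<String>>();

  /** Register appId as a user of the resource identified by resourceKey. */
  public void addReference(String resourceKey, String appId) {
    Set<String> fresh = new ConcurrentSkipListSet<String>();
    Set<String> existing = refsByKey.putIfAbsent(resourceKey, fresh);
    // If another app won the race, add to its set; otherwise use ours.
    (existing != null ? existing : fresh).add(appId);
  }
}
{code}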

 In-memory backing store for cache manager
 -

 Key: YARN-2180
 URL: https://issues.apache.org/jira/browse/YARN-2180
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Chris Trezzo
Assignee: Chris Trezzo
 Attachments: YARN-2180-trunk-v1.patch, YARN-2180-trunk-v2.patch, 
 YARN-2180-trunk-v3.patch


 Implement an in-memory backing store for the cache manager.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2198) Remove the need to run NodeManager as privileged account for Windows Secure Container Executor

2014-09-20 Thread Remus Rusanu (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2198?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14142173#comment-14142173
 ] 

Remus Rusanu commented on YARN-2198:


Build error is  
[exec] 
/home/jenkins/jenkins-slave/workspace/PreCommit-YARN-Build/hadoop-common-project/hadoop-common/src/main/native/src/org/apache/hadoop/io/nativeio/NativeIO.c:1444:12:
 error: 'INVALID_HANDLE_VALUE' undeclared (first use in this function)
 [exec]  return INVALID_HANDLE_VALUE; 

 Remove the need to run NodeManager as privileged account for Windows Secure 
 Container Executor
 --

 Key: YARN-2198
 URL: https://issues.apache.org/jira/browse/YARN-2198
 Project: Hadoop YARN
  Issue Type: Improvement
Reporter: Remus Rusanu
Assignee: Remus Rusanu
  Labels: security, windows
 Attachments: YARN-2198.1.patch, YARN-2198.2.patch, YARN-2198.3.patch, 
 YARN-2198.delta.4.patch, YARN-2198.delta.5.patch, YARN-2198.delta.6.patch, 
 YARN-2198.delta.7.patch, YARN-2198.separation.patch, YARN-2198.trunk.4.patch, 
 YARN-2198.trunk.5.patch, YARN-2198.trunk.6.patch, YARN-2198.trunk.8.patch


 YARN-1972 introduces a Secure Windows Container Executor. However, this 
 executor requires the process launching the container to be LocalSystem or 
 a member of the local Administrators group. Since the process in question 
 is the NodeManager, the requirement translates to running the entire NM as a 
 privileged account, a very large surface area to review and protect.
 This proposal is to move the privileged operations into a dedicated NT 
 service. The NM can run as a low privilege account and communicate with the 
 privileged NT service when it needs to launch a container. This would reduce 
 the surface exposed to the high privileges. 
 There has to exist a secure, authenticated and authorized channel of 
 communication between the NM and the privileged NT service. Possible 
 alternatives are a new TCP endpoint, Java RPC etc. My proposal though would 
 be to use Windows LPC (Local Procedure Calls), which is a Windows platform 
 specific inter-process communication channel that satisfies all requirements 
 and is easy to deploy. The privileged NT service would register and listen on 
 an LPC port (NtCreatePort, NtListenPort). The NM would use JNI to interop 
 with libwinutils which would host the LPC client code. The client would 
 connect to the LPC port (NtConnectPort) and send a message requesting a 
 container launch (NtRequestWaitReplyPort). LPC provides authentication and 
 the privileged NT service can use authorization API (AuthZ) to validate the 
 caller.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2198) Remove the need to run NodeManager as privileged account for Windows Secure Container Executor

2014-09-20 Thread Remus Rusanu (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2198?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Remus Rusanu updated YARN-2198:
---
Attachment: YARN-2198.trunk.8.patch

 Remove the need to run NodeManager as privileged account for Windows Secure 
 Container Executor
 --

 Key: YARN-2198
 URL: https://issues.apache.org/jira/browse/YARN-2198
 Project: Hadoop YARN
  Issue Type: Improvement
Reporter: Remus Rusanu
Assignee: Remus Rusanu
  Labels: security, windows
 Attachments: YARN-2198.1.patch, YARN-2198.2.patch, YARN-2198.3.patch, 
 YARN-2198.delta.4.patch, YARN-2198.delta.5.patch, YARN-2198.delta.6.patch, 
 YARN-2198.delta.7.patch, YARN-2198.separation.patch, YARN-2198.trunk.4.patch, 
 YARN-2198.trunk.5.patch, YARN-2198.trunk.6.patch, YARN-2198.trunk.8.patch


 YARN-1972 introduces a Secure Windows Container Executor. However, this 
 executor requires the process launching the container to be LocalSystem or 
 a member of the local Administrators group. Since the process in question 
 is the NodeManager, the requirement translates to running the entire NM as a 
 privileged account, a very large surface area to review and protect.
 This proposal is to move the privileged operations into a dedicated NT 
 service. The NM can run as a low privilege account and communicate with the 
 privileged NT service when it needs to launch a container. This would reduce 
 the surface exposed to the high privileges. 
 There has to exist a secure, authenticated and authorized channel of 
 communication between the NM and the privileged NT service. Possible 
 alternatives are a new TCP endpoint, Java RPC etc. My proposal though would 
 be to use Windows LPC (Local Procedure Calls), which is a Windows platform 
 specific inter-process communication channel that satisfies all requirements 
 and is easy to deploy. The privileged NT service would register and listen on 
 an LPC port (NtCreatePort, NtListenPort). The NM would use JNI to interop 
 with libwinutils which would host the LPC client code. The client would 
 connect to the LPC port (NtConnectPort) and send a message requesting a 
 container launch (NtRequestWaitReplyPort). LPC provides authentication and 
 the privileged NT service can use authorization API (AuthZ) to validate the 
 caller.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2198) Remove the need to run NodeManager as privileged account for Windows Secure Container Executor

2014-09-20 Thread Remus Rusanu (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2198?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Remus Rusanu updated YARN-2198:
---
Attachment: (was: YARN-2198.trunk.8.patch)

 Remove the need to run NodeManager as privileged account for Windows Secure 
 Container Executor
 --

 Key: YARN-2198
 URL: https://issues.apache.org/jira/browse/YARN-2198
 Project: Hadoop YARN
  Issue Type: Improvement
Reporter: Remus Rusanu
Assignee: Remus Rusanu
  Labels: security, windows
 Attachments: YARN-2198.1.patch, YARN-2198.2.patch, YARN-2198.3.patch, 
 YARN-2198.delta.4.patch, YARN-2198.delta.5.patch, YARN-2198.delta.6.patch, 
 YARN-2198.delta.7.patch, YARN-2198.separation.patch, YARN-2198.trunk.4.patch, 
 YARN-2198.trunk.5.patch, YARN-2198.trunk.6.patch, YARN-2198.trunk.8.patch


 YARN-1972 introduces a Secure Windows Container Executor. However, this 
 executor requires the process launching the container to be LocalSystem or 
 a member of the local Administrators group. Since the process in question 
 is the NodeManager, the requirement translates to the entire NM running as a 
 privileged account, a very large surface area to review and protect.
 This proposal is to move the privileged operations into a dedicated NT 
 service. The NM can run as a low-privilege account and communicate with the 
 privileged NT service when it needs to launch a container. This would reduce 
 the surface area exposed to high privileges.
 There has to exist a secure, authenticated and authorized channel of 
 communication between the NM and the privileged NT service. Possible 
 alternatives are a new TCP endpoint, Java RPC, etc. My proposal, though, is 
 to use Windows LPC (Local Procedure Calls), a Windows platform-specific 
 inter-process communication channel that satisfies all requirements 
 and is easy to deploy. The privileged NT service would register and listen on 
 an LPC port (NtCreatePort, NtListenPort). The NM would use JNI to interop 
 with libwinutils, which would host the LPC client code. The client would 
 connect to the LPC port (NtConnectPort) and send a message requesting a 
 container launch (NtRequestWaitReplyPort). LPC provides authentication, and 
 the privileged NT service can use the authorization API (AuthZ) to validate 
 the caller.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2252) Intermittent failure for testcase TestFairScheduler.testContinuousScheduling

2014-09-20 Thread Wei Yan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2252?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14142198#comment-14142198
 ] 

Wei Yan commented on YARN-2252:
---

+1 for the proposal, [~kasha].

 Intermittent failure for testcase TestFairScheduler.testContinuousScheduling
 

 Key: YARN-2252
 URL: https://issues.apache.org/jira/browse/YARN-2252
 Project: Hadoop YARN
  Issue Type: Bug
  Components: scheduler
Affects Versions: trunk-win
Reporter: Ratandeep Ratti
  Labels: hadoop2, scheduler, yarn
 Attachments: YARN-2252-1.patch


 This test-case is failing sporadically on my machine. I think I have a 
 plausible explanation for this.
 It seems that when the Scheduler is asked for resources, the resource 
 requests being constructed have no preference for hosts (nodes).
 The two mock hosts constructed both have 8192 MB of memory.
 The containers (resources) being requested each require 1024 MB of memory, 
 hence a single node can satisfy both resource requests for the 
 application.
 At the end of the test-case it is asserted that the containers 
 (resource requests) run on different nodes, but since we haven't 
 specified any node preferences when requesting the resources, the 
 scheduler (at times) places both containers (requests) on the same node.
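
 As a rough illustration of the difference described above (not the test's 
 actual code), a node-agnostic request uses {{ResourceRequest.ANY}}, whereas a 
 host-specific request names the node; the host name below is made up.

{code:java}
import org.apache.hadoop.yarn.api.records.Priority;
import org.apache.hadoop.yarn.api.records.Resource;
import org.apache.hadoop.yarn.api.records.ResourceRequest;

// Rough illustration of the difference described above; this is not the
// test's actual code, and "node1" is a made-up host name.
public class ResourceRequestSketch {
  public static void main(String[] args) {
    Priority priority = Priority.newInstance(1);
    Resource oneGb = Resource.newInstance(1024, 1);  // 1024 MB, 1 vcore

    // No host preference: the scheduler is free to place both containers on
    // the same 8192 MB node.
    ResourceRequest anyHost =
        ResourceRequest.newInstance(priority, ResourceRequest.ANY, oneGb, 2);

    // Host-specific request: only satisfied on "node1".
    ResourceRequest onNode1 =
        ResourceRequest.newInstance(priority, "node1", oneGb, 1);

    System.out.println(anyHost + " vs " + onNode1);
  }
}
{code}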



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2198) Remove the need to run NodeManager as privileged account for Windows Secure Container Executor

2014-09-20 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2198?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14142201#comment-14142201
 ] 

Hadoop QA commented on YARN-2198:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  
http://issues.apache.org/jira/secure/attachment/12670247/YARN-2198.trunk.8.patch
  against trunk revision db890ee.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 5 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:red}-1 javadoc{color}.  The javadoc tool appears to have generated 2 
warning messages.
See 
https://builds.apache.org/job/PreCommit-YARN-Build/5061//artifact/PreCommit-HADOOP-Build-patchprocess/diffJavadocWarnings.txt
 for details.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:red}-1 findbugs{color}.  The patch appears to introduce 1 new 
Findbugs (version 2.0.3) warnings.

{color:red}-1 release audit{color}.  The applied patch generated 2 
release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-common-project/hadoop-common 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/5061//testReport/
Release audit warnings: 
https://builds.apache.org/job/PreCommit-YARN-Build/5061//artifact/PreCommit-HADOOP-Build-patchprocess/patchReleaseAuditProblems.txt
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-YARN-Build/5061//artifact/PreCommit-HADOOP-Build-patchprocess/newPatchFindbugsWarningshadoop-yarn-server-nodemanager.html
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5061//console

This message is automatically generated.

 Remove the need to run NodeManager as privileged account for Windows Secure 
 Container Executor
 --

 Key: YARN-2198
 URL: https://issues.apache.org/jira/browse/YARN-2198
 Project: Hadoop YARN
  Issue Type: Improvement
Reporter: Remus Rusanu
Assignee: Remus Rusanu
  Labels: security, windows
 Attachments: YARN-2198.1.patch, YARN-2198.2.patch, YARN-2198.3.patch, 
 YARN-2198.delta.4.patch, YARN-2198.delta.5.patch, YARN-2198.delta.6.patch, 
 YARN-2198.delta.7.patch, YARN-2198.separation.patch, YARN-2198.trunk.4.patch, 
 YARN-2198.trunk.5.patch, YARN-2198.trunk.6.patch, YARN-2198.trunk.8.patch


 YARN-1972 introduces a Secure Windows Container Executor. However, this 
 executor requires the process launching the container to be LocalSystem or 
 a member of the local Administrators group. Since the process in question 
 is the NodeManager, the requirement translates to the entire NM running as a 
 privileged account, a very large surface area to review and protect.
 This proposal is to move the privileged operations into a dedicated NT 
 service. The NM can run as a low-privilege account and communicate with the 
 privileged NT service when it needs to launch a container. This would reduce 
 the surface area exposed to high privileges.
 There has to exist a secure, authenticated and authorized channel of 
 communication between the NM and the privileged NT service. Possible 
 alternatives are a new TCP endpoint, Java RPC, etc. My proposal, though, is 
 to use Windows LPC (Local Procedure Calls), a Windows platform-specific 
 inter-process communication channel that satisfies all requirements 
 and is easy to deploy. The privileged NT service would register and listen on 
 an LPC port (NtCreatePort, NtListenPort). The NM would use JNI to interop 
 with libwinutils, which would host the LPC client code. The client would 
 connect to the LPC port (NtConnectPort) and send a message requesting a 
 container launch (NtRequestWaitReplyPort). LPC provides authentication, and 
 the privileged NT service can use the authorization API (AuthZ) to validate 
 the caller.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2554) Slider AM Web UI is inaccessible if HTTPS/SSL is specified as the HTTP policy

2014-09-20 Thread Jonathan Maron (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2554?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Maron updated YARN-2554:
-
Attachment: YARN-2554.3.patch

 Slider AM Web UI is inaccessible if HTTPS/SSL is specified as the HTTP policy
 -

 Key: YARN-2554
 URL: https://issues.apache.org/jira/browse/YARN-2554
 Project: Hadoop YARN
  Issue Type: Bug
  Components: webapp
Affects Versions: 2.6.0
Reporter: Jonathan Maron
 Attachments: YARN-2554.1.patch, YARN-2554.2.patch, YARN-2554.3.patch


 If the HTTP policy to enable HTTPS is specified, the RM and AM are 
 initialized with SSL listeners.  The RM has a web app proxy servlet that acts 
 as a proxy for incoming AM requests.  In order to forward the requests to the 
 AM, the proxy servlet makes use of HttpClient.  However, the HttpClient 
 utilized is not initialized with the certificates necessary for 
 successful one-way SSL invocations to the other nodes in the cluster (it is 
 not configured to access/load the client truststore specified in 
 ssl-client.xml).  I imagine SSLFactory.createSSLSocketFactory() could be 
 utilized to create an instance that can be assigned to the HttpClient.
 The symptoms of this issue are:
 AM: Displays an unknown_certificate exception
 RM: Displays an exception such as javax.net.ssl.SSLHandshakeException: 
 sun.security.validator.ValidatorException: PKIX path building failed: 
 sun.security.provider.certpath.SunCertPathBuilderException: unable to find 
 valid certification path to requested target
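
 A minimal sketch of the suggested SSLFactory-based approach, assuming the 
 resulting socket factory could then be wired into the proxy's HTTP client; 
 the class and method names below are illustrative and not the actual web 
 proxy code.

{code:java}
import javax.net.ssl.SSLSocketFactory;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.security.ssl.SSLFactory;

// Minimal sketch: build a client-mode SSLFactory (which reads the truststore
// settings from ssl-client.xml) and obtain a socket factory from it.  The
// class/method names here are illustrative, not the actual proxy code.
public class ProxySslClientSketch {
  public static SSLSocketFactory clientSocketFactory(Configuration conf)
      throws Exception {
    SSLFactory sslFactory = new SSLFactory(SSLFactory.Mode.CLIENT, conf);
    sslFactory.init();  // loads the truststore configured in ssl-client.xml
    return sslFactory.createSSLSocketFactory();
  }
}
{code}

 How the resulting factory would be registered with the proxy's HttpClient 
 depends on the HttpClient version in use.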



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2554) Slider AM Web UI is inaccessible if HTTPS/SSL is specified as the HTTP policy

2014-09-20 Thread Jonathan Maron (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2554?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Maron updated YARN-2554:
-
Attachment: YARN-2554.3.patch

 Slider AM Web UI is inaccessible if HTTPS/SSL is specified as the HTTP policy
 -

 Key: YARN-2554
 URL: https://issues.apache.org/jira/browse/YARN-2554
 Project: Hadoop YARN
  Issue Type: Bug
  Components: webapp
Affects Versions: 2.6.0
Reporter: Jonathan Maron
 Attachments: YARN-2554.1.patch, YARN-2554.2.patch, YARN-2554.3.patch, 
 YARN-2554.3.patch


 If the HTTP policy to enable HTTPS is specified, the RM and AM are 
 initialized with SSL listeners.  The RM has a web app proxy servlet that acts 
 as a proxy for incoming AM requests.  In order to forward the requests to the 
 AM, the proxy servlet makes use of HttpClient.  However, the HttpClient 
 utilized is not initialized with the certificates necessary for 
 successful one-way SSL invocations to the other nodes in the cluster (it is 
 not configured to access/load the client truststore specified in 
 ssl-client.xml).  I imagine SSLFactory.createSSLSocketFactory() could be 
 utilized to create an instance that can be assigned to the HttpClient.
 The symptoms of this issue are:
 AM: Displays an unknown_certificate exception
 RM: Displays an exception such as javax.net.ssl.SSLHandshakeException: 
 sun.security.validator.ValidatorException: PKIX path building failed: 
 sun.security.provider.certpath.SunCertPathBuilderException: unable to find 
 valid certification path to requested target



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2554) Slider AM Web UI is inaccessible if HTTPS/SSL is specified as the HTTP policy

2014-09-20 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2554?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14142212#comment-14142212
 ] 

Hadoop QA commented on YARN-2554:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12670250/YARN-2554.3.patch
  against trunk revision 84a0a62.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:red}-1 tests included{color}.  The patch doesn't appear to include 
any new or modified tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-web-proxy.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/5062//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5062//console

This message is automatically generated.

 Slider AM Web UI is inaccessible if HTTPS/SSL is specified as the HTTP policy
 -

 Key: YARN-2554
 URL: https://issues.apache.org/jira/browse/YARN-2554
 Project: Hadoop YARN
  Issue Type: Bug
  Components: webapp
Affects Versions: 2.6.0
Reporter: Jonathan Maron
 Attachments: YARN-2554.1.patch, YARN-2554.2.patch, YARN-2554.3.patch, 
 YARN-2554.3.patch


 If the HTTP policy to enable HTTPS is specified, the RM and AM are 
 initialized with SSL listeners.  The RM has a web app proxy servlet that acts 
 as a proxy for incoming AM requests.  In order to forward the requests to the 
 AM, the proxy servlet makes use of HttpClient.  However, the HttpClient 
 utilized is not initialized with the certificates necessary for 
 successful one-way SSL invocations to the other nodes in the cluster (it is 
 not configured to access/load the client truststore specified in 
 ssl-client.xml).  I imagine SSLFactory.createSSLSocketFactory() could be 
 utilized to create an instance that can be assigned to the HttpClient.
 The symptoms of this issue are:
 AM: Displays an unknown_certificate exception
 RM: Displays an exception such as javax.net.ssl.SSLHandshakeException: 
 sun.security.validator.ValidatorException: PKIX path building failed: 
 sun.security.provider.certpath.SunCertPathBuilderException: unable to find 
 valid certification path to requested target



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2554) Slider AM Web UI is inaccessible if HTTPS/SSL is specified as the HTTP policy

2014-09-20 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2554?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14142216#comment-14142216
 ] 

Hadoop QA commented on YARN-2554:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12670251/YARN-2554.3.patch
  against trunk revision 84a0a62.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:red}-1 tests included{color}.  The patch doesn't appear to include 
any new or modified tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-web-proxy.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/5063//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5063//console

This message is automatically generated.

 Slider AM Web UI is inaccessible if HTTPS/SSL is specified as the HTTP policy
 -

 Key: YARN-2554
 URL: https://issues.apache.org/jira/browse/YARN-2554
 Project: Hadoop YARN
  Issue Type: Bug
  Components: webapp
Affects Versions: 2.6.0
Reporter: Jonathan Maron
 Attachments: YARN-2554.1.patch, YARN-2554.2.patch, YARN-2554.3.patch, 
 YARN-2554.3.patch


 If the HTTP policy to enable HTTPS is specified, the RM and AM are 
 initialized with SSL listeners.  The RM has a web app proxy servlet that acts 
 as a proxy for incoming AM requests.  In order to forward the requests to the 
 AM, the proxy servlet makes use of HttpClient.  However, the HttpClient 
 utilized is not initialized with the certificates necessary for 
 successful one-way SSL invocations to the other nodes in the cluster (it is 
 not configured to access/load the client truststore specified in 
 ssl-client.xml).  I imagine SSLFactory.createSSLSocketFactory() could be 
 utilized to create an instance that can be assigned to the HttpClient.
 The symptoms of this issue are:
 AM: Displays an unknown_certificate exception
 RM: Displays an exception such as javax.net.ssl.SSLHandshakeException: 
 sun.security.validator.ValidatorException: PKIX path building failed: 
 sun.security.provider.certpath.SunCertPathBuilderException: unable to find 
 valid certification path to requested target



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2180) In-memory backing store for cache manager

2014-09-20 Thread Sangjin Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2180?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14142228#comment-14142228
 ] 

Sangjin Lee commented on YARN-2180:
---

bq. Synchronization is missing from the InMemoryStore operations? You are using 
a ConcurrentHashMap but to insert correctly (multiple apps adding a cache-entry 
to the same path) you'll need to use putIfAbsent? Surprised the test is 
presumably passing.

I can answer this question as I wrote the in-memory store. :)

Actually, all operations (both access and mutation) on {{map}} are synchronized 
on the interned key. Since interned keys are unique objects, synchronizing on 
the key guarantees that no two threads operate on the same key concurrently. 
Therefore, {{putIfAbsent()}} is not necessary; it would only be needed if 
concurrent operations on the same key were possible.

There can be concurrent operations on the map on *different* keys, and that 
thread-safety is addressed by the {{ConcurrentHashMap}}.

There are a couple of exceptions: {{bootstrap()}} and {{clearCache()}}. The 
{{bootstrap()}} method is an exception because it operates on the map before 
the store accepts any reads/writes. The {{clearCache()}} method is provided 
only for test purposes.
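
A minimal sketch of the locking scheme described above; the class and method 
names are illustrative and do not match the actual in-memory store code.

{code:java}
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Illustrative-only sketch of the locking scheme described above; the names
// here do not match the actual in-memory store code.
class InternedKeyStoreSketch {
  private final Map<String, String> map = new ConcurrentHashMap<String, String>();

  // Synchronizing on the interned key serializes all operations on that key,
  // so a plain get/put is safe and putIfAbsent() is not required.  Thread
  // safety across *different* keys is handled by the ConcurrentHashMap.
  String addIfAbsent(String key, String value) {
    String interned = key.intern();
    synchronized (interned) {
      String existing = map.get(interned);
      if (existing == null) {
        map.put(interned, value);
        return value;
      }
      return existing;
    }
  }
}
{code}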

Hope this helps.

 In-memory backing store for cache manager
 -

 Key: YARN-2180
 URL: https://issues.apache.org/jira/browse/YARN-2180
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Chris Trezzo
Assignee: Chris Trezzo
 Attachments: YARN-2180-trunk-v1.patch, YARN-2180-trunk-v2.patch, 
 YARN-2180-trunk-v3.patch


 Implement an in-memory backing store for the cache manager.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2554) Slider AM Web UI is inaccessible if HTTPS/SSL is specified as the HTTP policy

2014-09-20 Thread Vinod Kumar Vavilapalli (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2554?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14142253#comment-14142253
 ] 

Vinod Kumar Vavilapalli commented on YARN-2554:
---

Sorry for jumping in late.

You could fix the webapp proxy in theory, but the setup required to make AM web 
UIs accept HTTPS is impractical. AMs can launch on any machine in a cluster, 
and they can be run by different users. Enabling SSL by distributing keys per 
application, per user across the cluster is not a great solution. This is the 
reason why we chose not to fix it, and thus not to enable the same for MapReduce.

The better solution is either
 - to keep the status quo (AM web UIs don't enable SSL), or
 - to get rid of AM UIs altogether and move to a client-side UI on top of the 
Timeline server (YARN-1530) - it has its own limitations though.

 Slider AM Web UI is inaccessible if HTTPS/SSL is specified as the HTTP policy
 -

 Key: YARN-2554
 URL: https://issues.apache.org/jira/browse/YARN-2554
 Project: Hadoop YARN
  Issue Type: Bug
  Components: webapp
Affects Versions: 2.6.0
Reporter: Jonathan Maron
 Attachments: YARN-2554.1.patch, YARN-2554.2.patch, YARN-2554.3.patch, 
 YARN-2554.3.patch


 If the HTTP policy to enable HTTPS is specified, the RM and AM are 
 initialized with SSL listeners.  The RM has a web app proxy servlet that acts 
 as a proxy for incoming AM requests.  In order to forward the requests to the 
 AM, the proxy servlet makes use of HttpClient.  However, the HttpClient 
 utilized is not initialized with the certificates necessary for 
 successful one-way SSL invocations to the other nodes in the cluster (it is 
 not configured to access/load the client truststore specified in 
 ssl-client.xml).  I imagine SSLFactory.createSSLSocketFactory() could be 
 utilized to create an instance that can be assigned to the HttpClient.
 The symptoms of this issue are:
 AM: Displays an unknown_certificate exception
 RM: Displays an exception such as javax.net.ssl.SSLHandshakeException: 
 sun.security.validator.ValidatorException: PKIX path building failed: 
 sun.security.provider.certpath.SunCertPathBuilderException: unable to find 
 valid certification path to requested target



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2554) Slider AM Web UI is inaccessible if HTTPS/SSL is specified as the HTTP policy

2014-09-20 Thread Jonathan Maron (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2554?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14142256#comment-14142256
 ] 

Jonathan Maron commented on YARN-2554:
--

I'm not certain I understand your comment about the keys. The client truststore 
configured via ssl-client.xml generally contains the certificates for the 
cluster hosts; it is not specific to users or applications. In any usage 
scenario it would by necessity contain those certificates.

 Slider AM Web UI is inaccessible if HTTPS/SSL is specified as the HTTP policy
 -

 Key: YARN-2554
 URL: https://issues.apache.org/jira/browse/YARN-2554
 Project: Hadoop YARN
  Issue Type: Bug
  Components: webapp
Affects Versions: 2.6.0
Reporter: Jonathan Maron
 Attachments: YARN-2554.1.patch, YARN-2554.2.patch, YARN-2554.3.patch, 
 YARN-2554.3.patch


 If the HTTP policy to enable HTTPS is specified, the RM and AM are 
 initialized with SSL listeners.  The RM has a web app proxy servlet that acts 
 as a proxy for incoming AM requests.  In order to forward the requests to the 
 AM, the proxy servlet makes use of HttpClient.  However, the HttpClient 
 utilized is not initialized with the certificates necessary for 
 successful one-way SSL invocations to the other nodes in the cluster (it is 
 not configured to access/load the client truststore specified in 
 ssl-client.xml).  I imagine SSLFactory.createSSLSocketFactory() could be 
 utilized to create an instance that can be assigned to the HttpClient.
 The symptoms of this issue are:
 AM: Displays an unknown_certificate exception
 RM: Displays an exception such as javax.net.ssl.SSLHandshakeException: 
 sun.security.validator.ValidatorException: PKIX path building failed: 
 sun.security.provider.certpath.SunCertPathBuilderException: unable to find 
 valid certification path to requested target



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2056) Disable preemption at Queue level

2014-09-20 Thread Eric Payne (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2056?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Payne updated YARN-2056:
-
Attachment: YARN-2056.201409210049.txt

Hi [~leftnoteasy]. Thank you for spending the time to look at this patch and 
provide helpful suggestions.

{quote}
IMHO, the right place to put reserving resource logic for un-preemptable queue 
is not {{resetCapacity}}, it should in {{computeFixpointAllocation}}.
...
Does this make sense to you?
{quote}

Yes, that makes sense, and I think it is a simpler algorithm. I updated the 
patch, so please have a look.

I have made a conscious decision to only allow disabling preemption at the leaf 
queue level. There may be a use case where you want to disable preemption at 
the parent level, leave other queue hierarchies alone, but still allow 
preemption between children of the disabled parent. So, rather than solve that 
problem with this fix, I only allow leaf queues to disable preemption. Even if 
a leaf queue could inherit its parent's disable-preemption value, there will 
likely be cases where part of the parent queue's over-capacity resources are 
untouchable and part of them are preemptable.

So, I adjusted your suggested algorithm somewhat:

- I collected untouchableExtra instead of preemptableExtra at the TempQueue 
level in {{computeFixpointAllocation}}.
- I looped through each queue, and if one has any untouchableExtra, then the 
queue's {{idealAssigned = guaranteed + untouchableExtra}}.
- In {{TempQueue#offer}}, one of the calculations is {{current + pending - 
idealAssigned}}. I had to take into consideration that if the queue is over 
capacity, some of that over-capacity may be untouchable and some may be 
preemptable. If some of it is preemptable, then {{current}} could be greater 
than {{idealAssigned}}, and {{TempQueue#offer}} would end up assigning more to 
that queue than it should.
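
To make the adjustment concrete, here is a very rough sketch with plain 
integers standing in for Resource objects; the field and method names are 
illustrative and do not match the actual TempQueue code in 
ProportionalCapacityPreemptionPolicy.

{code:java}
// Illustrative-only sketch of the adjustment described above; plain integers
// stand in for Resource objects and the names do not match the real
// TempQueue code in ProportionalCapacityPreemptionPolicy.
class TempQueueSketch {
  int guaranteed;        // guaranteed capacity
  int current;           // currently used resources
  int pending;           // outstanding resource requests
  int untouchableExtra;  // over-capacity usage that must not be preempted
  int idealAssigned;

  // computeFixpointAllocation step: a queue with untouchable extra resources
  // is assigned its guarantee plus that untouchable amount up front.
  void preAssignUntouchable() {
    if (untouchableExtra > 0) {
      idealAssigned = guaranteed + untouchableExtra;
    }
  }

  // offer step: cap what the queue still wants so that preemptable
  // over-capacity (current > idealAssigned) does not inflate its share.
  int offer(int available) {
    int wanted = Math.max(0, current + pending - idealAssigned);
    int accepted = Math.min(available, wanted);
    idealAssigned += accepted;
    return accepted;
  }
}
{code}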


 Disable preemption at Queue level
 -

 Key: YARN-2056
 URL: https://issues.apache.org/jira/browse/YARN-2056
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Affects Versions: 2.4.0
Reporter: Mayank Bansal
Assignee: Eric Payne
 Attachments: YARN-2056.201408202039.txt, YARN-2056.201408260128.txt, 
 YARN-2056.201408310117.txt, YARN-2056.201409022208.txt, 
 YARN-2056.201409181916.txt, YARN-2056.201409210049.txt


 We need to be able to disable preemption at individual queue level



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2554) Slider AM Web UI is inaccessible if HTTPS/SSL is specified as the HTTP policy

2014-09-20 Thread Vinod Kumar Vavilapalli (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2554?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14142268#comment-14142268
 ] 

Vinod Kumar Vavilapalli commented on YARN-2554:
---

I am talking about the server side, i.e. the AMs. To use SSL for AM webapps,
 - the key-store needs to be present on all machines to distribute 
certificates: AMs may come up anywhere.
 - the key-store used by Hadoop daemons *CANNOT* be shared with AMs: AMs run 
user code as the user.
 - the key-store cannot be shared across AMs of different users: assuming I am 
running three different Slider apps as three different users, you don't want to 
have a single key-store instance accessible by all Slider AMs.
 - and distributing/installing/managing it per user is complex.

 Slider AM Web UI is inaccessible if HTTPS/SSL is specified as the HTTP policy
 -

 Key: YARN-2554
 URL: https://issues.apache.org/jira/browse/YARN-2554
 Project: Hadoop YARN
  Issue Type: Bug
  Components: webapp
Affects Versions: 2.6.0
Reporter: Jonathan Maron
 Attachments: YARN-2554.1.patch, YARN-2554.2.patch, YARN-2554.3.patch, 
 YARN-2554.3.patch


 If the HTTP policy to enable HTTPS is specified, the RM and AM are 
 initialized with SSL listeners.  The RM has a web app proxy servlet that acts 
 as a proxy for incoming AM requests.  In order to forward the requests to the 
 AM, the proxy servlet makes use of HttpClient.  However, the HttpClient 
 utilized is not initialized with the certificates necessary for 
 successful one-way SSL invocations to the other nodes in the cluster (it is 
 not configured to access/load the client truststore specified in 
 ssl-client.xml).  I imagine SSLFactory.createSSLSocketFactory() could be 
 utilized to create an instance that can be assigned to the HttpClient.
 The symptoms of this issue are:
 AM: Displays an unknown_certificate exception
 RM: Displays an exception such as javax.net.ssl.SSLHandshakeException: 
 sun.security.validator.ValidatorException: PKIX path building failed: 
 sun.security.provider.certpath.SunCertPathBuilderException: unable to find 
 valid certification path to requested target



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2056) Disable preemption at Queue level

2014-09-20 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2056?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14142290#comment-14142290
 ] 

Hadoop QA commented on YARN-2056:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  
http://issues.apache.org/jira/secure/attachment/12670263/YARN-2056.201409210049.txt
  against trunk revision 84a0a62.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The following test timeouts occurred in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:

org.apache.hadoop.yarn.server.resourcemanager.applicationsmanager.TestAMRestart
org.apache.hadoop.yarn.server.resourcemanager.security.TestRMDelegationTokens
org.apache.hadoop.yarn.server.resourcemanager.TestRMRestart
org.apache.hadoop.yarn.server.resourcemanager.TestWorkPreservingRMRestart
org.apache.hadoop.yarn.server.resourcemanager.TestSubmitApplicationWithRMHA
org.apache.hadoop.yarn.server.resourcemanager.recovery.TestZKRMStateStore
org.apache.hadoop.yarn.server.resourcemanager.TestKillApplicationWithRMHA
org.apache.hadoop.yarn.server.resourcemanager.TestApplicationCleanup
org.apache.hadoop.yarn.server.resourcemanager.TestContainerResourceUsage

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/5064//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5064//console

This message is automatically generated.

 Disable preemption at Queue level
 -

 Key: YARN-2056
 URL: https://issues.apache.org/jira/browse/YARN-2056
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Affects Versions: 2.4.0
Reporter: Mayank Bansal
Assignee: Eric Payne
 Attachments: YARN-2056.201408202039.txt, YARN-2056.201408260128.txt, 
 YARN-2056.201408310117.txt, YARN-2056.201409022208.txt, 
 YARN-2056.201409181916.txt, YARN-2056.201409210049.txt


 We need to be able to disable preemption at individual queue level



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-2577) Clarify ACL delimiter and how to configure ACL groups only

2014-09-20 Thread Miklos Christine (JIRA)
Miklos Christine created YARN-2577:
--

 Summary: Clarify ACL delimiter and how to configure ACL groups only
 Key: YARN-2577
 URL: https://issues.apache.org/jira/browse/YARN-2577
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: documentation, fairscheduler
Affects Versions: 2.5.1
Reporter: Miklos Christine
Priority: Trivial


Reading through the Fair Scheduler documentation, it would be great to 
explicitly state that the delimiter for the fair scheduler ACLs is the space 
character.
If specifying only ACL groups, users should begin the value with the space 
character. 
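
For example, an illustrative allocations-file excerpt (the queue and group 
names are made up) showing groups-only ACLs that begin with a space:

{code:xml}
<!-- Illustrative allocations-file excerpt; the queue and group names are
     made up.  An ACL value is "users groups" separated by a single space,
     so a value that starts with a space grants access to groups only. -->
<allocations>
  <queue name="analytics">
    <aclSubmitApps> data-engineers,analysts</aclSubmitApps>
    <aclAdministerApps> yarn-admins</aclAdministerApps>
  </queue>
</allocations>
{code}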



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2577) Clarify ACL delimiter and how to configure ACL groups only

2014-09-20 Thread Miklos Christine (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2577?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Miklos Christine updated YARN-2577:
---
Attachment: YARN-2577.patch

 Clarify ACL delimiter and how to configure ACL groups only
 --

 Key: YARN-2577
 URL: https://issues.apache.org/jira/browse/YARN-2577
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: documentation, fairscheduler
Affects Versions: 2.5.1
Reporter: Miklos Christine
Priority: Trivial
  Labels: newbie
 Attachments: YARN-2577.patch


 Reading through the Fair Scheduler documentation, it would be great to 
 explicitly state that the delimiter for the fair scheduler ACLs is the space 
 character.
 If specifying only ACL groups, users should begin the value with the space 
 character. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2577) Clarify ACL delimiter and how to configure ACL groups only

2014-09-20 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2577?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14142311#comment-14142311
 ] 

Hadoop QA commented on YARN-2577:
-

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12670266/YARN-2577.patch
  against trunk revision 84a0a62.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+0 tests included{color}.  The patch appears to be a 
documentation patch that doesn't require tests.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in .

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/5065//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5065//console

This message is automatically generated.

 Clarify ACL delimiter and how to configure ACL groups only
 --

 Key: YARN-2577
 URL: https://issues.apache.org/jira/browse/YARN-2577
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: documentation, fairscheduler
Affects Versions: 2.5.1
Reporter: Miklos Christine
Priority: Trivial
  Labels: newbie
 Attachments: YARN-2577.patch


 Reading through the Fair Scheduler documentation, it would be great to 
 explicitly state that the delimiter for the fair scheduler ACLs is the space 
 character.
 If specifying only ACL groups, users should begin the value with the space 
 character. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)