[jira] [Updated] (YARN-4762) NMs failing on DelegatingLinuxContainerRuntime init with LCE on

2016-03-03 Thread Varun Vasudev (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4762?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Varun Vasudev updated YARN-4762:

Priority: Blocker  (was: Critical)

> NMs failing on DelegatingLinuxContainerRuntime init with LCE on
> ---
>
> Key: YARN-4762
> URL: https://issues.apache.org/jira/browse/YARN-4762
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Vinod Kumar Vavilapalli
>Assignee: Sidharta Seethana
>Priority: Blocker
>
> Seeing this exception and the NMs crash.
> {code}
> 2016-03-03 16:47:57,807 DEBUG org.apache.hadoop.service.AbstractService: 
> Service 
> org.apache.hadoop.yarn.server.nodemanager.recovery.NMLeveldbStateStoreService 
> is started
> 2016-03-03 16:47:58,027 DEBUG 
> org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor: 
> checkLinuxExecutorSetup: 
> [/hadoop/hadoop-yarn-nodemanager/bin/container-executor, --checksetup]
> 2016-03-03 16:47:58,043 ERROR 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.resources.CGroupsHandlerImpl:
>  Mount point Based on mtab file: /proc/mounts. Controller mount point not 
> writable for: cpu
> 2016-03-03 16:47:58,043 ERROR 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.runtime.DelegatingLinuxContainerRuntime:
>  Unable to get cgroups handle.
> 2016-03-03 16:47:58,044 DEBUG org.apache.hadoop.service.AbstractService: 
> noteFailure org.apache.hadoop.yarn.exceptions.YarnRuntimeException: Failed to 
> initialize container executor
> 2016-03-03 16:47:58,044 INFO org.apache.hadoop.service.AbstractService: 
> Service NodeManager failed in state INITED; cause: 
> org.apache.hadoop.yarn.exceptions.YarnRuntimeException: Failed to initialize 
> container executor
> org.apache.hadoop.yarn.exceptions.YarnRuntimeException: Failed to initialize 
> container executor
> at 
> org.apache.hadoop.yarn.server.nodemanager.NodeManager.serviceInit(NodeManager.java:240)
> at 
> org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
> at 
> org.apache.hadoop.yarn.server.nodemanager.NodeManager.initAndStartNodeManager(NodeManager.java:539)
> at 
> org.apache.hadoop.yarn.server.nodemanager.NodeManager.main(NodeManager.java:587)
> Caused by: java.io.IOException: Failed to initialize linux container 
> runtime(s)!
> at 
> org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.init(LinuxContainerExecutor.java:207)
> at 
> org.apache.hadoop.yarn.server.nodemanager.NodeManager.serviceInit(NodeManager.java:238)
> ... 3 more
> 2016-03-03 16:47:58,047 DEBUG org.apache.hadoop.service.AbstractService: 
> Service: NodeManager entered state STOPPED
> 2016-03-03 16:47:58,047 DEBUG org.apache.hadoop.service.CompositeService: 
> NodeManager: stopping services, size=0
> 2016-03-03 16:47:58,047 DEBUG org.apache.hadoop.service.AbstractService: 
> Service: 
> org.apache.hadoop.yarn.server.nodemanager.recovery.NMLeveldbStateStoreService 
> entered state STOPPED
> 2016-03-03 16:47:58,047 FATAL 
> org.apache.hadoop.yarn.server.nodemanager.NodeManager: Error starting 
> NodeManager
> org.apache.hadoop.yarn.exceptions.YarnRuntimeException: Failed to initialize 
> container executor
> at 
> org.apache.hadoop.yarn.server.nodemanager.NodeManager.serviceInit(NodeManager.java:240)
> at 
> org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
> at 
> org.apache.hadoop.yarn.server.nodemanager.NodeManager.initAndStartNodeManager(NodeManager.java:539)
> at 
> org.apache.hadoop.yarn.server.nodemanager.NodeManager.main(NodeManager.java:587)
> Caused by: java.io.IOException: Failed to initialize linux container 
> runtime(s)!
> at 
> org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.init(LinuxContainerExecutor.java:207)
> at 
> org.apache.hadoop.yarn.server.nodemanager.NodeManager.serviceInit(NodeManager.java:238)
> ... 3 more
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4553) Add cgroups support for docker containers

2016-03-03 Thread Varun Vasudev (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4553?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Varun Vasudev updated YARN-4553:

Fix Version/s: 2.9.0

> Add cgroups support for docker containers
> -
>
> Key: YARN-4553
> URL: https://issues.apache.org/jira/browse/YARN-4553
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: yarn
>Reporter: Sidharta Seethana
>Assignee: Sidharta Seethana
> Fix For: 2.9.0
>
> Attachments: YARN-4553.001.patch, YARN-4553.002.patch, 
> YARN-4553.003.patch
>
>
> Currently, cgroups-based resource isolation does not work with docker 
> containers under YARN. The processes in these containers are launched by the 
> docker daemon and they are not children of a container-executor process. 
> Docker supports a --cgroup-parent flag which can be used to point to the 
> container-specific cgroups that are created by the nodemanager. This will 
> allow the Nodemanager to manage cgroups (as it does today) while allowing 
> resource isolation to work with docker containers. 
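For illustration, the docker invocation built by the NM under this approach would pass {{--cgroup-parent}} pointing at the NM-created cgroup. The sketch below is only a rough, self-contained example of that idea; the image name, container id, cgroup path, and launch script are made-up placeholders, not the actual patch code.
{code}
import java.util.Arrays;
import java.util.List;

public class DockerCgroupParentSketch {
  public static void main(String[] args) {
    // Hypothetical cgroup created by the NM for this container (path is made up).
    String nmContainerCgroup = "/hadoop-yarn/container_1457031432233_0001_01_000002";

    // Point the docker daemon at the NM-managed cgroup instead of letting it
    // create its own cgroup hierarchy for the container.
    List<String> dockerRun = Arrays.asList(
        "docker", "run",
        "--cgroup-parent=" + nmContainerCgroup,
        "--name=container_1457031432233_0001_01_000002",
        "hadoop-docker-image",
        "bash", "/path/to/launch_container.sh");

    System.out.println(String.join(" ", dockerRun));
  }
}
{code}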



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4761) NMs reconnecting with changed capabilities can lead to wrong cluster resource calculations on fair scheduler

2016-03-03 Thread Sangjin Lee (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4761?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sangjin Lee updated YARN-4761:
--
Attachment: YARN-4761.02.patch

Thanks for the pointer [~rohithsharma]. It's good to know.

Posted patch v.2. I moved the unit test from {{TestCapacityScheduler}} to 
{{TestAbstractYarnScheduler}}. I can confirm that the test fails before the 
fair scheduler changes and passes after.

> NMs reconnecting with changed capabilities can lead to wrong cluster resource 
> calculations on fair scheduler
> 
>
> Key: YARN-4761
> URL: https://issues.apache.org/jira/browse/YARN-4761
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: fairscheduler
>Affects Versions: 2.6.4
>Reporter: Sangjin Lee
>Assignee: Sangjin Lee
> Attachments: YARN-4761.01.patch, YARN-4761.02.patch
>
>
> YARN-3802 uncovered an issue with the scheduler where the resource 
> calculation can be incorrect due to async event handling. It was subsequently 
> fixed by YARN-4344, but it was never fixed for the fair scheduler.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4761) NMs reconnecting with changed capabilities can lead to wrong cluster resource calculations on fair scheduler

2016-03-03 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4761?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15179316#comment-15179316
 ] 

Hadoop QA commented on YARN-4761:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 12s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red} 0m 0s 
{color} | {color:red} The patch doesn't appear to include any new or modified 
tests. Please justify why no new tests are needed for this patch. Also please 
list what manual steps were performed to verify this patch. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 6m 
45s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 27s 
{color} | {color:green} trunk passed with JDK v1.8.0_74 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 29s 
{color} | {color:green} trunk passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
19s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 35s 
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
15s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 7s 
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 22s 
{color} | {color:green} trunk passed with JDK v1.8.0_74 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 25s 
{color} | {color:green} trunk passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 
29s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 23s 
{color} | {color:green} the patch passed with JDK v1.8.0_74 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 23s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 27s 
{color} | {color:green} the patch passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 27s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
16s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 34s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
12s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 
0s {color} | {color:green} Patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 
17s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 19s 
{color} | {color:green} the patch passed with JDK v1.8.0_74 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 24s 
{color} | {color:green} the patch passed with JDK v1.7.0_95 {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 70m 52s {color} 
| {color:red} hadoop-yarn-server-resourcemanager in the patch failed with JDK 
v1.8.0_74. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 72m 6s {color} 
| {color:red} hadoop-yarn-server-resourcemanager in the patch failed with JDK 
v1.7.0_95. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 
17s {color} | {color:green} Patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 159m 31s {color} 
| {color:black} {color} |
\\
\\
|| Reason || Tests ||
| JDK v1.8.0_74 Failed junit tests | 
hadoop.yarn.server.resourcemanager.TestClientRMTokens |
|   | hadoop.yarn.server.resourcemanager.TestAMAuthorization |
| JDK v1.7.0_95 Failed junit tests | 
hadoop.yarn.server.resourcemanager.TestClientRMTokens |
|   | hadoop.yarn.server.resourcemanager.TestAMAuthorization |
\\
\\
|| Subsystem || Report/Notes ||
| Docker |  Image:yetus/hadoop:0ca8df7 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12791373/YARN-4761.01.patch |
| JIRA Issue 

[jira] [Commented] (YARN-2883) Queuing of container requests in the NM

2016-03-03 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2883?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15179208#comment-15179208
 ] 

Hadoop QA commented on YARN-2883:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 17s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 
0s {color} | {color:green} The patch appears to include 8 new or modified test 
files. {color} |
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 1m 33s 
{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 7m 
29s {color} | {color:green} yarn-2877 passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 43s 
{color} | {color:green} yarn-2877 passed with JDK v1.8.0_74 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 5s 
{color} | {color:green} yarn-2877 passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
40s {color} | {color:green} yarn-2877 passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 2m 1s 
{color} | {color:green} yarn-2877 passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 1m 
0s {color} | {color:green} yarn-2877 passed {color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red} 0m 45s 
{color} | {color:red} 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common in 
yarn-2877 has 3 extant Findbugs warnings. {color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red} 1m 7s 
{color} | {color:red} 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager
 in yarn-2877 has 1 extant Findbugs warnings. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 44s 
{color} | {color:green} yarn-2877 passed with JDK v1.8.0_74 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 4m 2s 
{color} | {color:green} yarn-2877 passed with JDK v1.7.0_95 {color} |
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 9s 
{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 1m 
35s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 3s 
{color} | {color:green} the patch passed with JDK v1.8.0_74 {color} |
| {color:green}+1{color} | {color:green} cc {color} | {color:green} 2m 3s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 2m 3s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 15s 
{color} | {color:green} the patch passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} cc {color} | {color:green} 2m 15s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 2m 15s 
{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 0m 37s 
{color} | {color:red} hadoop-yarn-project/hadoop-yarn: patch generated 30 new + 
233 unchanged - 2 fixed = 263 total (was 235) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 54s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
44s {color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} whitespace {color} | {color:red} 0m 0s 
{color} | {color:red} The patch has 33 line(s) that end in whitespace. Use git 
apply --whitespace=fix. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 4m 
30s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 32s 
{color} | {color:green} the patch passed with JDK v1.8.0_74 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 4m 5s 
{color} | {color:green} the patch passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 23s 
{color} | {color:green} hadoop-yarn-api in the patch passed with JDK v1.8.0_74. 
{color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 21s 
{color} | 

[jira] [Updated] (YARN-4762) NMs failing on DelegatingLinuxContainerRuntime init with LCE on

2016-03-03 Thread Sidharta Seethana (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4762?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sidharta Seethana updated YARN-4762:

Priority: Critical  (was: Major)

> NMs failing on DelegatingLinuxContainerRuntime init with LCE on
> ---
>
> Key: YARN-4762
> URL: https://issues.apache.org/jira/browse/YARN-4762
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Vinod Kumar Vavilapalli
>Assignee: Sidharta Seethana
>Priority: Critical
>
> Seeing this exception and the NMs crash.
> {code}
> 2016-03-03 16:47:57,807 DEBUG org.apache.hadoop.service.AbstractService: 
> Service 
> org.apache.hadoop.yarn.server.nodemanager.recovery.NMLeveldbStateStoreService 
> is started
> 2016-03-03 16:47:58,027 DEBUG 
> org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor: 
> checkLinuxExecutorSetup: 
> [/hadoop/hadoop-yarn-nodemanager/bin/container-executor, --checksetup]
> 2016-03-03 16:47:58,043 ERROR 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.resources.CGroupsHandlerImpl:
>  Mount point Based on mtab file: /proc/mounts. Controller mount point not 
> writable for: cpu
> 2016-03-03 16:47:58,043 ERROR 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.runtime.DelegatingLinuxContainerRuntime:
>  Unable to get cgroups handle.
> 2016-03-03 16:47:58,044 DEBUG org.apache.hadoop.service.AbstractService: 
> noteFailure org.apache.hadoop.yarn.exceptions.YarnRuntimeException: Failed to 
> initialize container executor
> 2016-03-03 16:47:58,044 INFO org.apache.hadoop.service.AbstractService: 
> Service NodeManager failed in state INITED; cause: 
> org.apache.hadoop.yarn.exceptions.YarnRuntimeException: Failed to initialize 
> container executor
> org.apache.hadoop.yarn.exceptions.YarnRuntimeException: Failed to initialize 
> container executor
> at 
> org.apache.hadoop.yarn.server.nodemanager.NodeManager.serviceInit(NodeManager.java:240)
> at 
> org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
> at 
> org.apache.hadoop.yarn.server.nodemanager.NodeManager.initAndStartNodeManager(NodeManager.java:539)
> at 
> org.apache.hadoop.yarn.server.nodemanager.NodeManager.main(NodeManager.java:587)
> Caused by: java.io.IOException: Failed to initialize linux container 
> runtime(s)!
> at 
> org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.init(LinuxContainerExecutor.java:207)
> at 
> org.apache.hadoop.yarn.server.nodemanager.NodeManager.serviceInit(NodeManager.java:238)
> ... 3 more
> 2016-03-03 16:47:58,047 DEBUG org.apache.hadoop.service.AbstractService: 
> Service: NodeManager entered state STOPPED
> 2016-03-03 16:47:58,047 DEBUG org.apache.hadoop.service.CompositeService: 
> NodeManager: stopping services, size=0
> 2016-03-03 16:47:58,047 DEBUG org.apache.hadoop.service.AbstractService: 
> Service: 
> org.apache.hadoop.yarn.server.nodemanager.recovery.NMLeveldbStateStoreService 
> entered state STOPPED
> 2016-03-03 16:47:58,047 FATAL 
> org.apache.hadoop.yarn.server.nodemanager.NodeManager: Error starting 
> NodeManager
> org.apache.hadoop.yarn.exceptions.YarnRuntimeException: Failed to initialize 
> container executor
> at 
> org.apache.hadoop.yarn.server.nodemanager.NodeManager.serviceInit(NodeManager.java:240)
> at 
> org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
> at 
> org.apache.hadoop.yarn.server.nodemanager.NodeManager.initAndStartNodeManager(NodeManager.java:539)
> at 
> org.apache.hadoop.yarn.server.nodemanager.NodeManager.main(NodeManager.java:587)
> Caused by: java.io.IOException: Failed to initialize linux container 
> runtime(s)!
> at 
> org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.init(LinuxContainerExecutor.java:207)
> at 
> org.apache.hadoop.yarn.server.nodemanager.NodeManager.serviceInit(NodeManager.java:238)
> ... 3 more
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4762) NMs failing on DelegatingLinuxContainerRuntime init with LCE on

2016-03-03 Thread Sidharta Seethana (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4762?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15179196#comment-15179196
 ] 

Sidharta Seethana commented on YARN-4762:
-

Changing priority to critical - NMs don't seem to come up even when cgroups are not 
in use. 

> NMs failing on DelegatingLinuxContainerRuntime init with LCE on
> ---
>
> Key: YARN-4762
> URL: https://issues.apache.org/jira/browse/YARN-4762
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Vinod Kumar Vavilapalli
>Assignee: Sidharta Seethana
>
> Seeing this exception and the NMs crash.
> {code}
> 2016-03-03 16:47:57,807 DEBUG org.apache.hadoop.service.AbstractService: 
> Service 
> org.apache.hadoop.yarn.server.nodemanager.recovery.NMLeveldbStateStoreService 
> is started
> 2016-03-03 16:47:58,027 DEBUG 
> org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor: 
> checkLinuxExecutorSetup: 
> [/hadoop/hadoop-yarn-nodemanager/bin/container-executor, --checksetup]
> 2016-03-03 16:47:58,043 ERROR 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.resources.CGroupsHandlerImpl:
>  Mount point Based on mtab file: /proc/mounts. Controller mount point not 
> writable for: cpu
> 2016-03-03 16:47:58,043 ERROR 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.runtime.DelegatingLinuxContainerRuntime:
>  Unable to get cgroups handle.
> 2016-03-03 16:47:58,044 DEBUG org.apache.hadoop.service.AbstractService: 
> noteFailure org.apache.hadoop.yarn.exceptions.YarnRuntimeException: Failed to 
> initialize container executor
> 2016-03-03 16:47:58,044 INFO org.apache.hadoop.service.AbstractService: 
> Service NodeManager failed in state INITED; cause: 
> org.apache.hadoop.yarn.exceptions.YarnRuntimeException: Failed to initialize 
> container executor
> org.apache.hadoop.yarn.exceptions.YarnRuntimeException: Failed to initialize 
> container executor
> at 
> org.apache.hadoop.yarn.server.nodemanager.NodeManager.serviceInit(NodeManager.java:240)
> at 
> org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
> at 
> org.apache.hadoop.yarn.server.nodemanager.NodeManager.initAndStartNodeManager(NodeManager.java:539)
> at 
> org.apache.hadoop.yarn.server.nodemanager.NodeManager.main(NodeManager.java:587)
> Caused by: java.io.IOException: Failed to initialize linux container 
> runtime(s)!
> at 
> org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.init(LinuxContainerExecutor.java:207)
> at 
> org.apache.hadoop.yarn.server.nodemanager.NodeManager.serviceInit(NodeManager.java:238)
> ... 3 more
> 2016-03-03 16:47:58,047 DEBUG org.apache.hadoop.service.AbstractService: 
> Service: NodeManager entered state STOPPED
> 2016-03-03 16:47:58,047 DEBUG org.apache.hadoop.service.CompositeService: 
> NodeManager: stopping services, size=0
> 2016-03-03 16:47:58,047 DEBUG org.apache.hadoop.service.AbstractService: 
> Service: 
> org.apache.hadoop.yarn.server.nodemanager.recovery.NMLeveldbStateStoreService 
> entered state STOPPED
> 2016-03-03 16:47:58,047 FATAL 
> org.apache.hadoop.yarn.server.nodemanager.NodeManager: Error starting 
> NodeManager
> org.apache.hadoop.yarn.exceptions.YarnRuntimeException: Failed to initialize 
> container executor
> at 
> org.apache.hadoop.yarn.server.nodemanager.NodeManager.serviceInit(NodeManager.java:240)
> at 
> org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
> at 
> org.apache.hadoop.yarn.server.nodemanager.NodeManager.initAndStartNodeManager(NodeManager.java:539)
> at 
> org.apache.hadoop.yarn.server.nodemanager.NodeManager.main(NodeManager.java:587)
> Caused by: java.io.IOException: Failed to initialize linux container 
> runtime(s)!
> at 
> org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.init(LinuxContainerExecutor.java:207)
> at 
> org.apache.hadoop.yarn.server.nodemanager.NodeManager.serviceInit(NodeManager.java:238)
> ... 3 more
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4762) NMs failing on DelegatingLinuxContainerRuntime init with LCE on

2016-03-03 Thread Sidharta Seethana (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4762?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15179194#comment-15179194
 ] 

Sidharta Seethana commented on YARN-4762:
-

/cc [~vvasudev]

When the new resource handler mechanism was introduced, a CGroupsHandlerImpl 
instance was only created/initialized if one of the resource handlers was 
enabled. Initialization does one of the following:

# If mounting of cgroups is enabled, it does not mount anything, because mounting 
is done on demand by the individual resource handlers.
# If mounting of cgroups is disabled, {{initializeControllerPathsFromMtab}} gets 
called, which checks writability for each of the cgroup mounts.

(2) was correct behavior because the cgroups handler wasn't created unless at 
least one of the (cgroups-based) resource handlers was in use. However, with 
YARN-4553, a CGroupsHandler is always created, even if there are no 
cgroups-based handlers in use. This (incorrectly) leads to an attempt to check 
whether the cgroup mount paths are writable.

I'll take a look at fixing this.
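
Roughly, the fix would be to guard handler creation on whether any cgroups-based resource handler is actually enabled, so the writability check only runs when cgroups are really in use. Below is a minimal, self-contained sketch of that guard; the class, interface, and method names are stand-ins, not the real CGroupsHandlerImpl API or the eventual patch.
{code}
import java.io.IOException;

/** Toy model of the guard; not the real CGroupsHandlerImpl API. */
public class CGroupsInitGuardSketch {

  interface CGroupsHandler {
    void initializeControllerPaths() throws IOException;
  }

  private CGroupsHandler cGroupsHandler; // stays null when cgroups are not in use

  void initializeIfNeeded(boolean anyCGroupsResourceHandlerEnabled) throws IOException {
    if (!anyCGroupsResourceHandlerEnabled) {
      // No handler is built, so no controller mount-point writability checks
      // run and the NM can come up on hosts without writable cgroup mounts.
      return;
    }
    // Only now pay the cost (and enforce the requirements) of cgroups setup.
    cGroupsHandler = () -> { /* would verify controller mount points here */ };
    cGroupsHandler.initializeControllerPaths();
  }
}
{code}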

> NMs failing on DelegatingLinuxContainerRuntime init with LCE on
> ---
>
> Key: YARN-4762
> URL: https://issues.apache.org/jira/browse/YARN-4762
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Vinod Kumar Vavilapalli
>
> Seeing this exception and the NMs crash.
> {code}
> 2016-03-03 16:47:57,807 DEBUG org.apache.hadoop.service.AbstractService: 
> Service 
> org.apache.hadoop.yarn.server.nodemanager.recovery.NMLeveldbStateStoreService 
> is started
> 2016-03-03 16:47:58,027 DEBUG 
> org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor: 
> checkLinuxExecutorSetup: 
> [/hadoop/hadoop-yarn-nodemanager/bin/container-executor, --checksetup]
> 2016-03-03 16:47:58,043 ERROR 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.resources.CGroupsHandlerImpl:
>  Mount point Based on mtab file: /proc/mounts. Controller mount point not 
> writable for: cpu
> 2016-03-03 16:47:58,043 ERROR 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.runtime.DelegatingLinuxContainerRuntime:
>  Unable to get cgroups handle.
> 2016-03-03 16:47:58,044 DEBUG org.apache.hadoop.service.AbstractService: 
> noteFailure org.apache.hadoop.yarn.exceptions.YarnRuntimeException: Failed to 
> initialize container executor
> 2016-03-03 16:47:58,044 INFO org.apache.hadoop.service.AbstractService: 
> Service NodeManager failed in state INITED; cause: 
> org.apache.hadoop.yarn.exceptions.YarnRuntimeException: Failed to initialize 
> container executor
> org.apache.hadoop.yarn.exceptions.YarnRuntimeException: Failed to initialize 
> container executor
> at 
> org.apache.hadoop.yarn.server.nodemanager.NodeManager.serviceInit(NodeManager.java:240)
> at 
> org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
> at 
> org.apache.hadoop.yarn.server.nodemanager.NodeManager.initAndStartNodeManager(NodeManager.java:539)
> at 
> org.apache.hadoop.yarn.server.nodemanager.NodeManager.main(NodeManager.java:587)
> Caused by: java.io.IOException: Failed to initialize linux container 
> runtime(s)!
> at 
> org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.init(LinuxContainerExecutor.java:207)
> at 
> org.apache.hadoop.yarn.server.nodemanager.NodeManager.serviceInit(NodeManager.java:238)
> ... 3 more
> 2016-03-03 16:47:58,047 DEBUG org.apache.hadoop.service.AbstractService: 
> Service: NodeManager entered state STOPPED
> 2016-03-03 16:47:58,047 DEBUG org.apache.hadoop.service.CompositeService: 
> NodeManager: stopping services, size=0
> 2016-03-03 16:47:58,047 DEBUG org.apache.hadoop.service.AbstractService: 
> Service: 
> org.apache.hadoop.yarn.server.nodemanager.recovery.NMLeveldbStateStoreService 
> entered state STOPPED
> 2016-03-03 16:47:58,047 FATAL 
> org.apache.hadoop.yarn.server.nodemanager.NodeManager: Error starting 
> NodeManager
> org.apache.hadoop.yarn.exceptions.YarnRuntimeException: Failed to initialize 
> container executor
> at 
> org.apache.hadoop.yarn.server.nodemanager.NodeManager.serviceInit(NodeManager.java:240)
> at 
> org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
> at 
> org.apache.hadoop.yarn.server.nodemanager.NodeManager.initAndStartNodeManager(NodeManager.java:539)
> at 
> org.apache.hadoop.yarn.server.nodemanager.NodeManager.main(NodeManager.java:587)
> Caused by: java.io.IOException: Failed to initialize linux container 
> runtime(s)!
> at 
> org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.init(LinuxContainerExecutor.java:207)
> at 
> org.apache.hadoop.yarn.server.nodemanager.NodeManager.serviceInit(NodeManager.java:238)
> ... 3 more
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Assigned] (YARN-4762) NMs failing on DelegatingLinuxContainerRuntime init with LCE on

2016-03-03 Thread Sidharta Seethana (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4762?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sidharta Seethana reassigned YARN-4762:
---

Assignee: Sidharta Seethana

> NMs failing on DelegatingLinuxContainerRuntime init with LCE on
> ---
>
> Key: YARN-4762
> URL: https://issues.apache.org/jira/browse/YARN-4762
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Vinod Kumar Vavilapalli
>Assignee: Sidharta Seethana
>
> Seeing this exception and the NMs crash.
> {code}
> 2016-03-03 16:47:57,807 DEBUG org.apache.hadoop.service.AbstractService: 
> Service 
> org.apache.hadoop.yarn.server.nodemanager.recovery.NMLeveldbStateStoreService 
> is started
> 2016-03-03 16:47:58,027 DEBUG 
> org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor: 
> checkLinuxExecutorSetup: 
> [/hadoop/hadoop-yarn-nodemanager/bin/container-executor, --checksetup]
> 2016-03-03 16:47:58,043 ERROR 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.resources.CGroupsHandlerImpl:
>  Mount point Based on mtab file: /proc/mounts. Controller mount point not 
> writable for: cpu
> 2016-03-03 16:47:58,043 ERROR 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.runtime.DelegatingLinuxContainerRuntime:
>  Unable to get cgroups handle.
> 2016-03-03 16:47:58,044 DEBUG org.apache.hadoop.service.AbstractService: 
> noteFailure org.apache.hadoop.yarn.exceptions.YarnRuntimeException: Failed to 
> initialize container executor
> 2016-03-03 16:47:58,044 INFO org.apache.hadoop.service.AbstractService: 
> Service NodeManager failed in state INITED; cause: 
> org.apache.hadoop.yarn.exceptions.YarnRuntimeException: Failed to initialize 
> container executor
> org.apache.hadoop.yarn.exceptions.YarnRuntimeException: Failed to initialize 
> container executor
> at 
> org.apache.hadoop.yarn.server.nodemanager.NodeManager.serviceInit(NodeManager.java:240)
> at 
> org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
> at 
> org.apache.hadoop.yarn.server.nodemanager.NodeManager.initAndStartNodeManager(NodeManager.java:539)
> at 
> org.apache.hadoop.yarn.server.nodemanager.NodeManager.main(NodeManager.java:587)
> Caused by: java.io.IOException: Failed to initialize linux container 
> runtime(s)!
> at 
> org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.init(LinuxContainerExecutor.java:207)
> at 
> org.apache.hadoop.yarn.server.nodemanager.NodeManager.serviceInit(NodeManager.java:238)
> ... 3 more
> 2016-03-03 16:47:58,047 DEBUG org.apache.hadoop.service.AbstractService: 
> Service: NodeManager entered state STOPPED
> 2016-03-03 16:47:58,047 DEBUG org.apache.hadoop.service.CompositeService: 
> NodeManager: stopping services, size=0
> 2016-03-03 16:47:58,047 DEBUG org.apache.hadoop.service.AbstractService: 
> Service: 
> org.apache.hadoop.yarn.server.nodemanager.recovery.NMLeveldbStateStoreService 
> entered state STOPPED
> 2016-03-03 16:47:58,047 FATAL 
> org.apache.hadoop.yarn.server.nodemanager.NodeManager: Error starting 
> NodeManager
> org.apache.hadoop.yarn.exceptions.YarnRuntimeException: Failed to initialize 
> container executor
> at 
> org.apache.hadoop.yarn.server.nodemanager.NodeManager.serviceInit(NodeManager.java:240)
> at 
> org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
> at 
> org.apache.hadoop.yarn.server.nodemanager.NodeManager.initAndStartNodeManager(NodeManager.java:539)
> at 
> org.apache.hadoop.yarn.server.nodemanager.NodeManager.main(NodeManager.java:587)
> Caused by: java.io.IOException: Failed to initialize linux container 
> runtime(s)!
> at 
> org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.init(LinuxContainerExecutor.java:207)
> at 
> org.apache.hadoop.yarn.server.nodemanager.NodeManager.serviceInit(NodeManager.java:238)
> ... 3 more
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-4762) NMs failing on DelegatingLinuxContainerRuntime init with LCE on

2016-03-03 Thread Vinod Kumar Vavilapalli (JIRA)
Vinod Kumar Vavilapalli created YARN-4762:
-

 Summary: NMs failing on DelegatingLinuxContainerRuntime init with 
LCE on
 Key: YARN-4762
 URL: https://issues.apache.org/jira/browse/YARN-4762
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Vinod Kumar Vavilapalli


Seeing this exception and the NMs crash.
{code}
2016-03-03 16:47:57,807 DEBUG org.apache.hadoop.service.AbstractService: 
Service 
org.apache.hadoop.yarn.server.nodemanager.recovery.NMLeveldbStateStoreService 
is started
2016-03-03 16:47:58,027 DEBUG 
org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor: 
checkLinuxExecutorSetup: 
[/hadoop/hadoop-yarn-nodemanager/bin/container-executor, --checksetup]
2016-03-03 16:47:58,043 ERROR 
org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.resources.CGroupsHandlerImpl:
 Mount point Based on mtab file: /proc/mounts. Controller mount point not 
writable for: cpu
2016-03-03 16:47:58,043 ERROR 
org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.runtime.DelegatingLinuxContainerRuntime:
 Unable to get cgroups handle.
2016-03-03 16:47:58,044 DEBUG org.apache.hadoop.service.AbstractService: 
noteFailure org.apache.hadoop.yarn.exceptions.YarnRuntimeException: Failed to 
initialize container executor
2016-03-03 16:47:58,044 INFO org.apache.hadoop.service.AbstractService: Service 
NodeManager failed in state INITED; cause: 
org.apache.hadoop.yarn.exceptions.YarnRuntimeException: Failed to initialize 
container executor
org.apache.hadoop.yarn.exceptions.YarnRuntimeException: Failed to initialize 
container executor
at 
org.apache.hadoop.yarn.server.nodemanager.NodeManager.serviceInit(NodeManager.java:240)
at 
org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
at 
org.apache.hadoop.yarn.server.nodemanager.NodeManager.initAndStartNodeManager(NodeManager.java:539)
at 
org.apache.hadoop.yarn.server.nodemanager.NodeManager.main(NodeManager.java:587)
Caused by: java.io.IOException: Failed to initialize linux container runtime(s)!
at 
org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.init(LinuxContainerExecutor.java:207)
at 
org.apache.hadoop.yarn.server.nodemanager.NodeManager.serviceInit(NodeManager.java:238)
... 3 more
2016-03-03 16:47:58,047 DEBUG org.apache.hadoop.service.AbstractService: 
Service: NodeManager entered state STOPPED
2016-03-03 16:47:58,047 DEBUG org.apache.hadoop.service.CompositeService: 
NodeManager: stopping services, size=0
2016-03-03 16:47:58,047 DEBUG org.apache.hadoop.service.AbstractService: 
Service: 
org.apache.hadoop.yarn.server.nodemanager.recovery.NMLeveldbStateStoreService 
entered state STOPPED
2016-03-03 16:47:58,047 FATAL 
org.apache.hadoop.yarn.server.nodemanager.NodeManager: Error starting 
NodeManager
org.apache.hadoop.yarn.exceptions.YarnRuntimeException: Failed to initialize 
container executor
at 
org.apache.hadoop.yarn.server.nodemanager.NodeManager.serviceInit(NodeManager.java:240)
at 
org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
at 
org.apache.hadoop.yarn.server.nodemanager.NodeManager.initAndStartNodeManager(NodeManager.java:539)
at 
org.apache.hadoop.yarn.server.nodemanager.NodeManager.main(NodeManager.java:587)
Caused by: java.io.IOException: Failed to initialize linux container runtime(s)!
at 
org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.init(LinuxContainerExecutor.java:207)
at 
org.apache.hadoop.yarn.server.nodemanager.NodeManager.serviceInit(NodeManager.java:238)
... 3 more
{code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4761) NMs reconnecting with changed capabilities can lead to wrong cluster resource calculations on fair scheduler

2016-03-03 Thread Rohith Sharma K S (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4761?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15179171#comment-15179171
 ] 

Rohith Sharma K S commented on YARN-4761:
-

I think we can move the test to TestAbstractYarnScheduler to cover the behavior 
from this JIRA. The TestAbstractYarnScheduler test class extends 
ParameterizedSchedulerTestBase, which runs against both CS and FS.

Test cases that are specific to CS behavior are either skipped for the 
FairScheduler or added in the FairScheduler-specific package. The test cases 
that assume CS as the default scheduler need to be revisited for FairScheduler 
impacts like the one in this JIRA.
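
For reference, the parameterization pattern looks roughly like the JUnit sketch below. It only shows the idea of running the same test body against both scheduler classes via {{yarn.resourcemanager.scheduler.class}}; it is not the actual ParameterizedSchedulerTestBase code, and the test body is a placeholder.
{code}
import static org.junit.Assert.assertEquals;

import java.util.Arrays;
import java.util.Collection;

import org.apache.hadoop.yarn.conf.YarnConfiguration;
import org.junit.Test;
import org.junit.runner.RunWith;
import org.junit.runners.Parameterized;
import org.junit.runners.Parameterized.Parameters;

@RunWith(Parameterized.class)
public class SchedulerAgnosticTestSketch {

  @Parameters
  public static Collection<Object[]> schedulers() {
    return Arrays.asList(new Object[][] {
        {"org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler"},
        {"org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler"}});
  }

  private final String schedulerClass;

  public SchedulerAgnosticTestSketch(String schedulerClass) {
    this.schedulerClass = schedulerClass;
  }

  @Test
  public void runsAgainstConfiguredScheduler() {
    YarnConfiguration conf = new YarnConfiguration();
    conf.set(YarnConfiguration.RM_SCHEDULER, schedulerClass);
    // A real test would start a MockRM with this conf and drive a node
    // reconnect; here we only show the per-scheduler parameterization.
    assertEquals(schedulerClass, conf.get(YarnConfiguration.RM_SCHEDULER));
  }
}
{code}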


> NMs reconnecting with changed capabilities can lead to wrong cluster resource 
> calculations on fair scheduler
> 
>
> Key: YARN-4761
> URL: https://issues.apache.org/jira/browse/YARN-4761
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: fairscheduler
>Affects Versions: 2.6.4
>Reporter: Sangjin Lee
>Assignee: Sangjin Lee
> Attachments: YARN-4761.01.patch
>
>
> YARN-3802 uncovered an issue with the scheduler where the resource 
> calculation can be incorrect due to async event handling. It was subsequently 
> fixed by YARN-4344, but it was never fixed for the fair scheduler.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4761) NMs reconnecting with changed capabilities can lead to wrong cluster resource calculations on fair scheduler

2016-03-03 Thread Sangjin Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4761?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15179127#comment-15179127
 ] 

Sangjin Lee commented on YARN-4761:
---

I'd like to discuss the unit test for this. I could essentially duplicate the 
test that was added to {{TestCapacityScheduler}}. However, that would be largely 
a copy-and-paste, which I'm not too happy about, but I could still do it. Do let 
me know your thoughts on this.

A larger question: we have a large number of generic RM unit tests, but they are 
exercised only against the capacity scheduler. Should we try to find ways to 
exercise them against the fair scheduler as well? That would be the most 
effective way of ensuring the soundness of any changes.

> NMs reconnecting with changed capabilities can lead to wrong cluster resource 
> calculations on fair scheduler
> 
>
> Key: YARN-4761
> URL: https://issues.apache.org/jira/browse/YARN-4761
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: fairscheduler
>Affects Versions: 2.6.4
>Reporter: Sangjin Lee
>Assignee: Sangjin Lee
> Attachments: YARN-4761.01.patch
>
>
> YARN-3802 uncovered an issue with the scheduler where the resource 
> calculation can be incorrect due to async event handling. It was subsequently 
> fixed by YARN-4344, but it was never fixed for the fair scheduler.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4761) NMs reconnecting with changed capabilities can lead to wrong cluster resource calculations on fair scheduler

2016-03-03 Thread Sangjin Lee (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4761?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sangjin Lee updated YARN-4761:
--
Attachment: YARN-4761.01.patch

Posted patch v.1.

Applied the same fix to the fair scheduler.

> NMs reconnecting with changed capabilities can lead to wrong cluster resource 
> calculations on fair scheduler
> 
>
> Key: YARN-4761
> URL: https://issues.apache.org/jira/browse/YARN-4761
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: fairscheduler
>Affects Versions: 2.6.4
>Reporter: Sangjin Lee
>Assignee: Sangjin Lee
> Attachments: YARN-4761.01.patch
>
>
> YARN-3802 uncovered an issue with the scheduler where the resource 
> calculation can be incorrect due to async event handling. It was subsequently 
> fixed by YARN-4344, but it was never fixed for the fair scheduler.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4117) End to end unit test with mini YARN cluster for AMRMProxy Service

2016-03-03 Thread Jian He (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4117?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15179070#comment-15179070
 ] 

Jian He commented on YARN-4117:
---

bq. MiniYarnCluster allocates the ports for the RM during the Start phase, and 
there was no way to pass the information to the AMRMProxy. 
I think this approach may not work for the HA case, where you will have multiple 
RM scheduler addresses. Could you check how the NM talks to the RM? I think the 
NM has the same problem; we may let the AMRMProxy follow the same method.
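
For reference, resolving the scheduler address from configuration (the way NM-side components do, rather than from ports handed out when MiniYarnCluster starts) would look roughly like the sketch below. This is a simplified, non-HA sketch; in an HA setup the RM proxy/failover machinery would be used instead of a single configured address.
{code}
import java.net.InetSocketAddress;

import org.apache.hadoop.yarn.conf.YarnConfiguration;

public class SchedulerAddressLookupSketch {
  public static void main(String[] args) {
    YarnConfiguration conf = new YarnConfiguration();
    // Resolve the RM scheduler address from yarn-site.xml instead of relying
    // on values produced during the MiniYarnCluster start phase.
    InetSocketAddress schedulerAddress = conf.getSocketAddr(
        YarnConfiguration.RM_SCHEDULER_ADDRESS,
        YarnConfiguration.DEFAULT_RM_SCHEDULER_ADDRESS,
        YarnConfiguration.DEFAULT_RM_SCHEDULER_PORT);
    System.out.println("RM scheduler address: " + schedulerAddress);
  }
}
{code}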

> End to end unit test with mini YARN cluster for AMRMProxy Service
> -
>
> Key: YARN-4117
> URL: https://issues.apache.org/jira/browse/YARN-4117
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: nodemanager, resourcemanager
>Reporter: Kishore Chaliparambil
>Assignee: Giovanni Matteo Fumarola
> Attachments: YARN-4117.v0.patch, YARN-4117.v1.patch
>
>
> YARN-2884 introduces a proxy between AM and RM. This JIRA proposes an end to 
> end unit test using mini YARN cluster to the AMRMProxy service. This test 
> will validate register, allocate and finish application and token renewal.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4761) NMs reconnecting with changed capabilities can lead to wrong cluster resource calculations on fair scheduler

2016-03-03 Thread zhihai xu (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4761?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15179066#comment-15179066
 ] 

zhihai xu commented on YARN-4761:
-

Good finding, [~sjlee0]! The same issue could also happen for the fair scheduler; 
we should decouple the RMNode status from the fair scheduler as well.

> NMs reconnecting with changed capabilities can lead to wrong cluster resource 
> calculations on fair scheduler
> 
>
> Key: YARN-4761
> URL: https://issues.apache.org/jira/browse/YARN-4761
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: fairscheduler
>Affects Versions: 2.6.4
>Reporter: Sangjin Lee
>Assignee: Sangjin Lee
>
> YARN-3802 uncovered an issue with the scheduler where the resource 
> calculation can be incorrect due to async event handling. It was subsequently 
> fixed by YARN-4344, but it was never fixed for the fair scheduler.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4761) NMs reconnecting with changed capabilities can lead to wrong cluster resource calculations on fair scheduler

2016-03-03 Thread Sangjin Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4761?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15179045#comment-15179045
 ] 

Sangjin Lee commented on YARN-4761:
---

To see this, add the following code to 
{{TestResourceTrackerService#testReconnectNode}}:
{code}
  public void testReconnectNode() throws Exception {
Configuration conf = new Configuration();
conf.set(YarnConfiguration.RM_SCHEDULER,

"org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler");
rm = new MockRM(conf) {
...
{code}

and the test breaks:
{noformat}
testReconnectNode(org.apache.hadoop.yarn.server.resourcemanager.TestResourceTrackerService)
  Time elapsed: 1.188 sec  <<< FAILURE!
java.lang.AssertionError: expected:<15360> but was:<10240>
at org.junit.Assert.fail(Assert.java:88)
at org.junit.Assert.failNotEquals(Assert.java:743)
at org.junit.Assert.assertEquals(Assert.java:118)
at org.junit.Assert.assertEquals(Assert.java:555)
at org.junit.Assert.assertEquals(Assert.java:542)
at 
org.apache.hadoop.yarn.server.resourcemanager.TestResourceTrackerService.testReconnectNode(TestResourceTrackerService.java:1044)
{noformat}

> NMs reconnecting with changed capabilities can lead to wrong cluster resource 
> calculations on fair scheduler
> 
>
> Key: YARN-4761
> URL: https://issues.apache.org/jira/browse/YARN-4761
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: fairscheduler
>Affects Versions: 2.6.4
>Reporter: Sangjin Lee
>Assignee: Sangjin Lee
>
> YARN-3802 uncovered an issue with the scheduler where the resource 
> calculation can be incorrect due to async event handling. It was subsequently 
> fixed by YARN-4344, but it was never fixed for the fair scheduler.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-4761) NMs reconnecting with changed capabilities can lead to wrong cluster resource calculations on fair scheduler

2016-03-03 Thread Sangjin Lee (JIRA)
Sangjin Lee created YARN-4761:
-

 Summary: NMs reconnecting with changed capabilities can lead to 
wrong cluster resource calculations on fair scheduler
 Key: YARN-4761
 URL: https://issues.apache.org/jira/browse/YARN-4761
 Project: Hadoop YARN
  Issue Type: Bug
  Components: fairscheduler
Affects Versions: 2.6.4
Reporter: Sangjin Lee
Assignee: Sangjin Lee


YARN-3802 uncovered an issue with the scheduler where the resource calculation 
can be incorrect due to async event handling. It was subsequently fixed by 
YARN-4344, but it was never fixed for the fair scheduler.




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2888) Corrective mechanisms for rebalancing NM container queues

2016-03-03 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2888?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15178995#comment-15178995
 ] 

Hadoop QA commented on YARN-2888:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 12s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 
0s {color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 1m 27s 
{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 7m 
32s {color} | {color:green} yarn-2877 passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 51s 
{color} | {color:green} yarn-2877 passed with JDK v1.8.0_72 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 6s 
{color} | {color:green} yarn-2877 passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
40s {color} | {color:green} yarn-2877 passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 26s 
{color} | {color:green} yarn-2877 passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
43s {color} | {color:green} yarn-2877 passed {color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red} 0m 42s 
{color} | {color:red} 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common in 
yarn-2877 has 3 extant Findbugs warnings. {color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red} 1m 3s 
{color} | {color:red} 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager
 in yarn-2877 has 1 extant Findbugs warnings. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 15s 
{color} | {color:green} yarn-2877 passed with JDK v1.8.0_72 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 3m 45s 
{color} | {color:green} yarn-2877 passed with JDK v1.7.0_95 {color} |
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 11s 
{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 1m 
10s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 46s 
{color} | {color:green} the patch passed with JDK v1.8.0_72 {color} |
| {color:green}+1{color} | {color:green} cc {color} | {color:green} 1m 46s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 1m 46s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 5s 
{color} | {color:green} the patch passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} cc {color} | {color:green} 2m 5s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 2m 5s 
{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 0m 36s 
{color} | {color:red} hadoop-yarn-project/hadoop-yarn: patch generated 10 new + 
415 unchanged - 1 fixed = 425 total (was 416) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 19s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
34s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 
0s {color} | {color:green} Patch has no whitespace issues. {color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red} 1m 19s 
{color} | {color:red} 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager
 generated 1 new + 1 unchanged - 0 fixed = 2 total (was 1) {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 11s 
{color} | {color:green} the patch passed with JDK v1.8.0_72 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 3m 35s 
{color} | {color:green} the patch passed with JDK v1.7.0_95 {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 0m 21s {color} 
| {color:red} hadoop-yarn-api in the patch failed with JDK v1.8.0_72. {color} |
| 

[jira] [Commented] (YARN-4749) Generalize config file handling in container-executor

2016-03-03 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4749?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15178835#comment-15178835
 ] 

Hadoop QA commented on YARN-4749:
-

| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 17m 48s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 
0s {color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 6m 
59s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 24s 
{color} | {color:green} trunk passed with JDK v1.8.0_74 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 25s 
{color} | {color:green} trunk passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 29s 
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
13s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 
24s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 27s 
{color} | {color:green} the patch passed with JDK v1.8.0_74 {color} |
| {color:green}+1{color} | {color:green} cc {color} | {color:green} 0m 27s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 27s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 25s 
{color} | {color:green} the patch passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} cc {color} | {color:green} 0m 25s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 25s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 27s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
10s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 
0s {color} | {color:green} Patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 9m 35s 
{color} | {color:green} hadoop-yarn-server-nodemanager in the patch passed with 
JDK v1.8.0_74. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 9m 49s 
{color} | {color:green} hadoop-yarn-server-nodemanager in the patch passed with 
JDK v1.7.0_95. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 
18s {color} | {color:green} Patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 48m 10s {color} 
| {color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker |  Image:yetus/hadoop:0ca8df7 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12791263/YARN-4749.002.patch |
| JIRA Issue | YARN-4749 |
| Optional Tests |  asflicense  compile  cc  mvnsite  javac  unit  |
| uname | Linux 782dc5b63277 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed 
Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh 
|
| git revision | trunk / 0a9f00a |
| Default Java | 1.7.0_95 |
| Multi-JDK versions |  /usr/lib/jvm/java-8-oracle:1.8.0_74 
/usr/lib/jvm/java-7-openjdk-amd64:1.7.0_95 |
| JDK v1.7.0_95  Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/10705/testReport/ |
| modules | C: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager
 U: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager
 |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/10705/console |
| Powered by | Apache Yetus 0.3.0-SNAPSHOT   http://yetus.apache.org |


This message was automatically generated.



> Generalize config file handling in container-executor
> -
>
> Key: YARN-4749
> URL: https://issues.apache.org/jira/browse/YARN-4749
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager
>

[jira] [Updated] (YARN-2883) Queuing of container requests in the NM

2016-03-03 Thread Konstantinos Karanasos (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2883?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Konstantinos Karanasos updated YARN-2883:
-
Attachment: YARN-2883-yarn-2877.003.patch

Adding an initial set of test cases for the queuing of containers to the patch.

> Queuing of container requests in the NM
> ---
>
> Key: YARN-2883
> URL: https://issues.apache.org/jira/browse/YARN-2883
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager, resourcemanager
>Reporter: Konstantinos Karanasos
>Assignee: Konstantinos Karanasos
> Attachments: YARN-2883-yarn-2877.001.patch, 
> YARN-2883-yarn-2877.002.patch, YARN-2883-yarn-2877.003.patch
>
>
> We propose to add a queue in each NM, where queueable container requests can 
> be held.
> Based on the available resources in the node and the containers in the queue, 
> the NM will decide when to allow the execution of a queued container.
> In order to ensure the instantaneous start of a guaranteed-start container, 
> the NM may decide to pre-empt/kill running queueable containers.
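
To make the queuing behaviour described above a bit more concrete, here is a minimal, hypothetical sketch of a per-NM queue with a capacity-based admission check and a preemption step for guaranteed-start containers. The class and field names are illustrative only and are not YARN code.

{code}
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;

// Hypothetical sketch of the NM-side queuing idea; not YARN code.
public class ContainerQueueSketch {

  static class ContainerRequest {
    final String id;
    final long memoryMb;
    ContainerRequest(String id, long memoryMb) {
      this.id = id;
      this.memoryMb = memoryMb;
    }
  }

  private final Queue<ContainerRequest> queued = new ArrayDeque<>();
  private final List<ContainerRequest> runningQueueable = new ArrayList<>();
  private long availableMemoryMb;

  ContainerQueueSketch(long availableMemoryMb) {
    this.availableMemoryMb = availableMemoryMb;
  }

  // Hold a queueable request instead of rejecting it outright.
  void enqueue(ContainerRequest c) {
    queued.add(c);
    maybeStartQueued();
  }

  // Start queued containers in FIFO order while the node has spare capacity.
  private void maybeStartQueued() {
    while (!queued.isEmpty() && queued.peek().memoryMb <= availableMemoryMb) {
      ContainerRequest next = queued.poll();
      availableMemoryMb -= next.memoryMb;
      runningQueueable.add(next);
    }
  }

  // A guaranteed-start container preempts running queueable containers until
  // enough memory is free, so it can start immediately.
  void startGuaranteed(ContainerRequest guaranteed) {
    while (availableMemoryMb < guaranteed.memoryMb && !runningQueueable.isEmpty()) {
      ContainerRequest victim = runningQueueable.remove(runningQueueable.size() - 1);
      availableMemoryMb += victim.memoryMb; // killed queueable container frees memory
    }
    availableMemoryMb -= guaranteed.memoryMb;
  }
}
{code}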



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4744) Too many signal to container failure in case of LCE

2016-03-03 Thread Sidharta Seethana (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4744?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15178833#comment-15178833
 ] 

Sidharta Seethana commented on YARN-4744:
-

Thanks, [~jlowe] !

> Too many signal to container failure in case of LCE
> ---
>
> Key: YARN-4744
> URL: https://issues.apache.org/jira/browse/YARN-4744
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.9.0
>Reporter: Bibin A Chundatt
>Assignee: Sidharta Seethana
> Attachments: YARN-4744.001.patch, YARN-4744.002.patch
>
>
> Steps to reproduce:
> Install an HA cluster in secure mode.
> Enable LCE with cgroups.
> Start the servers as the dsperf user.
> Submit a MapReduce application (terasort/teragen) as the yarn/dsperf user.
> "Too many signal to container" failures occur; the following exception is thrown:
> {noformat}
> 2014-03-02 09:20:38,689 INFO 
> SecurityLogger.org.apache.hadoop.security.authorize.ServiceAuthorizationManager:
>  Authorization successful for testing (auth:TOKEN) for protocol=interface 
> org.apache.hadoop.yarn.server.nodemanager.api.LocalizationProtocolPB
> 2014-03-02 09:20:40,158 WARN 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl:
>  Event EventType: KILL_CONTAINER sent to absent container 
> container_e02_1393731146548_0001_01_13
> 2014-03-02 09:20:43,071 INFO 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch:
>  Container container_e02_1393731146548_0001_01_09 succeeded
> 2014-03-02 09:20:43,072 INFO 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerImpl:
>  Container container_e02_1393731146548_0001_01_09 transitioned from 
> RUNNING to EXITED_WITH_SUCCESS
> 2014-03-02 09:20:43,073 INFO 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch:
>  Cleaning up container container_e02_1393731146548_0001_01_09
> 2014-03-02 09:20:43,075 INFO 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.runtime.DelegatingLinuxContainerRuntime:
>  Using container runtime: DefaultLinuxContainerRuntime
> 2014-03-02 09:20:43,081 WARN 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.privileged.PrivilegedOperationExecutor:
>  Shell execution returned exit code: 9. Privileged Execution Operation Output:
> main : command provided 2
> main : run as user is yarn
> main : requested yarn user is yarn
> Full command array for failed execution:
> [/opt/bibin/dsperf/HAINSTALL/install/hadoop/nodemanager/bin/container-executor,
>  yarn, yarn, 2, 9370, 15]
> 2014-03-02 09:20:43,081 WARN 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.runtime.DefaultLinuxContainerRuntime:
>  Signal container failed. Exception:
> org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.privileged.PrivilegedOperationException:
>  ExitCodeException exitCode=9:
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.privileged.PrivilegedOperationExecutor.executePrivilegedOperation(PrivilegedOperationExecutor.java:173)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.runtime.DefaultLinuxContainerRuntime.signalContainer(DefaultLinuxContainerRuntime.java:132)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.runtime.DelegatingLinuxContainerRuntime.signalContainer(DelegatingLinuxContainerRuntime.java:109)
> at 
> org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.signalContainer(LinuxContainerExecutor.java:513)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.cleanupContainer(ContainerLaunch.java:520)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainersLauncher.handle(ContainersLauncher.java:139)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainersLauncher.handle(ContainersLauncher.java:55)
> at 
> org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:184)
> at 
> org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:110)
> at java.lang.Thread.run(Thread.java:745)
> Caused by: ExitCodeException exitCode=9:
> at org.apache.hadoop.util.Shell.runCommand(Shell.java:927)
> at org.apache.hadoop.util.Shell.run(Shell.java:838)
> at 
> org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:1117)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.privileged.PrivilegedOperationExecutor.executePrivilegedOperation(PrivilegedOperationExecutor.java:150)
> ... 9 more
> 2014-03-02 09:20:43,113 INFO 
> org.apache.hadoop.yarn.server.nodemanager.NMAuditLogger: USER=yarn 
> OPERATION=Container Finished - Succeeded

[jira] [Assigned] (YARN-4760) proxy redirect to history server uses wrong URL

2016-03-03 Thread Eric Badger (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4760?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Badger reassigned YARN-4760:
-

Assignee: Eric Badger

> proxy redirect to history server uses wrong URL
> ---
>
> Key: YARN-4760
> URL: https://issues.apache.org/jira/browse/YARN-4760
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: webapp
>Affects Versions: 2.7.2
>Reporter: Jason Lowe
>Assignee: Eric Badger
>
> YARN-3975 added the ability to redirect to the history server when an app 
> fails to specify a tracking URL and the RM has since forgotten about the 
> application.  However, it redirects to /apps/ instead of /app/, 
> which is the wrong destination page.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-4760) proxy redirect to history server uses wrong URL

2016-03-03 Thread Jason Lowe (JIRA)
Jason Lowe created YARN-4760:


 Summary: proxy redirect to history server uses wrong URL
 Key: YARN-4760
 URL: https://issues.apache.org/jira/browse/YARN-4760
 Project: Hadoop YARN
  Issue Type: Bug
  Components: webapp
Affects Versions: 2.7.2
Reporter: Jason Lowe


YARN-3975 added the ability to redirect to the history server when an app fails 
to specify a tracking URL and the RM has since forgotten about the application. 
 However, it redirects to /apps/ instead of /app/, which is the 
wrong destination page.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4744) Too many signal to container failure in case of LCE

2016-03-03 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4744?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15178775#comment-15178775
 ] 

Hadoop QA commented on YARN-4744:
-

| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 12m 54s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 
0s {color} | {color:green} The patch appears to include 2 new or modified test 
files. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 7m 
7s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 24s 
{color} | {color:green} trunk passed with JDK v1.8.0_74 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 28s 
{color} | {color:green} trunk passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
15s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 30s 
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
13s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 
55s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 19s 
{color} | {color:green} trunk passed with JDK v1.8.0_74 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 22s 
{color} | {color:green} trunk passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 
26s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 22s 
{color} | {color:green} the patch passed with JDK v1.8.0_74 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 22s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 24s 
{color} | {color:green} the patch passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 24s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
14s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 28s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
12s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 
0s {color} | {color:green} Patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 6s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 18s 
{color} | {color:green} the patch passed with JDK v1.8.0_74 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 20s 
{color} | {color:green} the patch passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 9m 23s 
{color} | {color:green} hadoop-yarn-server-nodemanager in the patch passed with 
JDK v1.8.0_74. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 9m 48s 
{color} | {color:green} hadoop-yarn-server-nodemanager in the patch passed with 
JDK v1.7.0_95. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 
18s {color} | {color:green} Patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 47m 47s {color} 
| {color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker |  Image:yetus/hadoop:0ca8df7 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12791259/YARN-4744.002.patch |
| JIRA Issue | YARN-4744 |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  
unit  findbugs  checkstyle  |
| uname | Linux 526b1198d34e 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed 
Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh 
|
| git revision | trunk / 0a9f00a |
| Default Java | 1.7.0_95 |
| Multi-JDK versions |  

[jira] [Updated] (YARN-4758) Enable discovery of AMs by containers

2016-03-03 Thread Bikas Saha (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4758?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bikas Saha updated YARN-4758:
-
Description: 
{color:red}
This is already discussed on the umbrella JIRA YARN-1489.

Copying some of my condensed summary from the design doc (section 3.2.10.3) of 
YARN-4692.
{color}

Even after the existing work on Work-preserving AM restart (Section 3.1.2 / 
YARN-1489), we still haven’t solved the problem of old running containers not 
knowing where the new AM starts running after the previous AM crashes. This is 
an especially important problem to solve for long-running services, where we’d 
like to avoid killing service containers when AMs fail over. So far, we left 
this as a task for the apps, but solving it in YARN is highly desirable. 
(Task) This looks very much like service-registry (YARN-913), but for 
app-containers to discover their own AMs.

Combining this requirement (of any container being able to find their AM 
across fail-overs) with those of services (to be able to find through DNS 
where a service container is running - YARN-4757) will push our registry 
scalability needs much higher than those of just service end-points. This 
calls for a more distributed solution for registry readers - something that is 
discussed in the comments section of YARN-1489 and MAPREDUCE-6608.
See comment 
https://issues.apache.org/jira/browse/YARN-1489?focusedCommentId=13862359=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13862359

  was:
{color:red}
This is already discussed on the umbrella JIRA YARN-1489.

Copying some of my condensed summary from the design doc (section 3.2.10.3) of 
YARN-4692.
{color}

Even after the existing work on Work-preserving AM restart (Section 3.1.2 / 
YARN-1489), we still haven’t solved the problem of old running containers not 
knowing where the new AM starts running after the previous AM crashes. This is 
an especially important problem to solve for long-running services, where we’d 
like to avoid killing service containers when AMs fail over. So far, we left 
this as a task for the apps, but solving it in YARN is highly desirable. 
(Task) This looks very much like service-registry (YARN-913), but for 
app-containers to discover their own AMs.

Combining this requirement (of any container being able to find their AM 
across fail-overs) with those of services (to be able to find through DNS 
where a service container is running - YARN-4757) will push our registry 
scalability needs much higher than those of just service end-points. This 
calls for a more distributed solution for registry readers - something that is 
discussed in the comments section of YARN-1489 and MAPREDUCE-6608.


> Enable discovery of AMs by containers
> -
>
> Key: YARN-4758
> URL: https://issues.apache.org/jira/browse/YARN-4758
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Vinod Kumar Vavilapalli
>
> {color:red}
> This is already discussed on the umbrella JIRA YARN-1489.
> Copying some of my condensed summary from the design doc (section 3.2.10.3) 
> of YARN-4692.
> {color}
> Even after the existing work on Work-preserving AM restart (Section 3.1.2 / 
> YARN-1489), we still haven’t solved the problem of old running containers not 
> knowing where the new AM starts running after the previous AM crashes. This 
> is an especially important problem to solve for long-running services, where 
> we’d like to avoid killing service containers when AMs fail over. So far, we 
> left this as a task for the apps, but solving it in YARN is highly desirable. 
> (Task) This looks very much like service-registry (YARN-913), but for 
> app-containers to discover their own AMs.
> Combining this requirement (of any container being able to find their AM 
> across fail-overs) with those of services (to be able to find through DNS 
> where a service container is running - YARN-4757) will push our registry 
> scalability needs much higher than those of just service end-points. This 
> calls for a more distributed solution for registry readers - something 
> that is discussed in the comments section of YARN-1489 and MAPREDUCE-6608.
> See comment 
> https://issues.apache.org/jira/browse/YARN-1489?focusedCommentId=13862359=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13862359



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4749) Generalize config file handling in container-executor

2016-03-03 Thread Sidharta Seethana (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4749?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sidharta Seethana updated YARN-4749:

Attachment: YARN-4749.002.patch

Uploaded a new patch based on review feedback.

> Generalize config file handling in container-executor
> -
>
> Key: YARN-4749
> URL: https://issues.apache.org/jira/browse/YARN-4749
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager
>Reporter: Sidharta Seethana
>Assignee: Sidharta Seethana
> Attachments: YARN-4749.001.patch, YARN-4749.002.patch
>
>
> The current implementation of container-executor already supports parsing of 
> key value pairs from a config file. However, it is currently restricted to 
> {{container-executor.cfg}} and cannot be reused for parsing additional 
> config/command files. Generalizing this is a required step for YARN-4245.
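
The real parser lives in the native container-executor (C) code, so the following is only a Java illustration of the generalization being asked for: a key/value parser that accepts an arbitrary file path instead of being hard-wired to container-executor.cfg. All names here are made up for the example.

{code}
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.HashMap;
import java.util.Map;

// Illustration only: a key=value parser that works on any file, not just one
// fixed configuration file.
public final class KeyValueFileParserSketch {

  private KeyValueFileParserSketch() {
  }

  // Parse "key=value" lines from the given config/command file.
  public static Map<String, String> parse(Path file) throws IOException {
    Map<String, String> values = new HashMap<>();
    for (String line : Files.readAllLines(file)) {
      String trimmed = line.trim();
      if (trimmed.isEmpty() || trimmed.startsWith("#")) {
        continue; // skip blank lines and comments
      }
      int eq = trimmed.indexOf('=');
      if (eq > 0) {
        values.put(trimmed.substring(0, eq).trim(),
            trimmed.substring(eq + 1).trim());
      }
    }
    return values;
  }

  public static void main(String[] args) throws IOException {
    // Works for container-executor.cfg or any other command file passed in.
    System.out.println(parse(Paths.get(args[0])));
  }
}
{code}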



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4744) Too many signal to container failure in case of LCE

2016-03-03 Thread Jason Lowe (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4744?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15178722#comment-15178722
 ] 

Jason Lowe commented on YARN-4744:
--

Thanks for updating the patch!

+1, pending Jenkins.

> Too many signal to container failure in case of LCE
> ---
>
> Key: YARN-4744
> URL: https://issues.apache.org/jira/browse/YARN-4744
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.9.0
>Reporter: Bibin A Chundatt
>Assignee: Sidharta Seethana
> Attachments: YARN-4744.001.patch, YARN-4744.002.patch
>
>
> Steps to reproduce:
> Install an HA cluster in secure mode.
> Enable LCE with cgroups.
> Start the servers as the dsperf user.
> Submit a MapReduce application (terasort/teragen) as the yarn/dsperf user.
> "Too many signal to container" failures occur; the following exception is thrown:
> {noformat}
> 2014-03-02 09:20:38,689 INFO 
> SecurityLogger.org.apache.hadoop.security.authorize.ServiceAuthorizationManager:
>  Authorization successful for testing (auth:TOKEN) for protocol=interface 
> org.apache.hadoop.yarn.server.nodemanager.api.LocalizationProtocolPB
> 2014-03-02 09:20:40,158 WARN 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl:
>  Event EventType: KILL_CONTAINER sent to absent container 
> container_e02_1393731146548_0001_01_13
> 2014-03-02 09:20:43,071 INFO 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch:
>  Container container_e02_1393731146548_0001_01_09 succeeded
> 2014-03-02 09:20:43,072 INFO 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerImpl:
>  Container container_e02_1393731146548_0001_01_09 transitioned from 
> RUNNING to EXITED_WITH_SUCCESS
> 2014-03-02 09:20:43,073 INFO 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch:
>  Cleaning up container container_e02_1393731146548_0001_01_09
> 2014-03-02 09:20:43,075 INFO 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.runtime.DelegatingLinuxContainerRuntime:
>  Using container runtime: DefaultLinuxContainerRuntime
> 2014-03-02 09:20:43,081 WARN 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.privileged.PrivilegedOperationExecutor:
>  Shell execution returned exit code: 9. Privileged Execution Operation Output:
> main : command provided 2
> main : run as user is yarn
> main : requested yarn user is yarn
> Full command array for failed execution:
> [/opt/bibin/dsperf/HAINSTALL/install/hadoop/nodemanager/bin/container-executor,
>  yarn, yarn, 2, 9370, 15]
> 2014-03-02 09:20:43,081 WARN 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.runtime.DefaultLinuxContainerRuntime:
>  Signal container failed. Exception:
> org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.privileged.PrivilegedOperationException:
>  ExitCodeException exitCode=9:
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.privileged.PrivilegedOperationExecutor.executePrivilegedOperation(PrivilegedOperationExecutor.java:173)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.runtime.DefaultLinuxContainerRuntime.signalContainer(DefaultLinuxContainerRuntime.java:132)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.runtime.DelegatingLinuxContainerRuntime.signalContainer(DelegatingLinuxContainerRuntime.java:109)
> at 
> org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.signalContainer(LinuxContainerExecutor.java:513)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.cleanupContainer(ContainerLaunch.java:520)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainersLauncher.handle(ContainersLauncher.java:139)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainersLauncher.handle(ContainersLauncher.java:55)
> at 
> org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:184)
> at 
> org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:110)
> at java.lang.Thread.run(Thread.java:745)
> Caused by: ExitCodeException exitCode=9:
> at org.apache.hadoop.util.Shell.runCommand(Shell.java:927)
> at org.apache.hadoop.util.Shell.run(Shell.java:838)
> at 
> org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:1117)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.privileged.PrivilegedOperationExecutor.executePrivilegedOperation(PrivilegedOperationExecutor.java:150)
> ... 9 more
> 2014-03-02 09:20:43,113 INFO 
> org.apache.hadoop.yarn.server.nodemanager.NMAuditLogger: USER=yarn 
> OPERATION=Container Finished - 

[jira] [Commented] (YARN-4740) container complete msg may lost while AM restart in race condition

2016-03-03 Thread sandflee (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4740?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15178708#comment-15178708
 ] 

sandflee commented on YARN-4740:


Yes, this patch ensures the AM receives each container complete msg at least 
once, but any single AM process receives it only once.
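
A small, purely illustrative sketch of that at-least-once hand-off (the names below are hypothetical, not the RM implementation): finished-container statuses are only forgotten once a later allocate call acknowledges the response that carried them, so a restarted AM attempt can still be given them, at the cost of possible duplicates.

{code}
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch of at-least-once delivery of finished-container statuses.
class FinishedContainersSketch {

  private final Set<String> pending = new LinkedHashSet<>();
  private final List<String> sentButUnacked = new ArrayList<>();

  synchronized void containerFinished(String containerId) {
    pending.add(containerId);
  }

  // Called when building an allocate response: report, but do not forget yet.
  synchronized List<String> pullForAllocateResponse() {
    sentButUnacked.addAll(pending);
    pending.clear();
    return new ArrayList<>(sentButUnacked);
  }

  // Called when the next allocate call shows the AM saw the previous response.
  synchronized void ackPreviousResponse() {
    sentButUnacked.clear();
  }

  // On AM restart, everything not yet acknowledged is handed to the new
  // attempt (the caller re-queues these), which is why the new attempt may
  // see statuses the previous attempt already received.
  synchronized List<String> transferToNewAttempt() {
    List<String> all = new ArrayList<>(sentButUnacked);
    all.addAll(pending);
    return all;
  }
}
{code}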

> container complete msg may lost while AM restart in race condition
> --
>
> Key: YARN-4740
> URL: https://issues.apache.org/jira/browse/YARN-4740
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: sandflee
>Assignee: sandflee
> Attachments: YARN-4740.01.patch, YARN-4740.02.patch
>
>
> 1. A container completes, and the msg is stored in 
> RMAppAttempt.justFinishedContainers.
> 2. The AM calls allocate, but crashes before the allocateResponse reaches it.
> 3. The AM restarts and cannot get the container complete msg.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4744) Too many signal to container failure in case of LCE

2016-03-03 Thread Sidharta Seethana (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4744?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sidharta Seethana updated YARN-4744:

Attachment: YARN-4744.002.patch

Uploaded a new patch - added a new PrivilegedOperation constructor and changed 
all call sites that passed a null second argument to use this new constructor. 

Also filed YARN-4759 to revisit signal handling for docker containers. 

[~jlowe], please take a look? Thanks!
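
Not the actual Hadoop class, just a minimal sketch of the pattern the comment above describes: adding an overloaded constructor so call sites no longer have to pass an explicit null second argument.

{code}
import java.util.ArrayList;
import java.util.List;

// Simplified, hypothetical stand-in showing the constructor-overloading idea;
// it is not the real PrivilegedOperation class.
class PrivilegedOperationSketch {

  enum OperationType { SIGNAL_CONTAINER, LAUNCH_CONTAINER }

  private final OperationType opType;
  private final List<String> args = new ArrayList<>();

  // New constructor: callers with no initial argument use this instead of
  // passing null.
  PrivilegedOperationSketch(OperationType opType) {
    this.opType = opType;
  }

  // Existing-style constructor for callers that do have an initial argument.
  PrivilegedOperationSketch(OperationType opType, String arg) {
    this(opType);
    if (arg != null) {
      args.add(arg);
    }
  }

  PrivilegedOperationSketch appendArgs(String... more) {
    for (String a : more) {
      args.add(a);
    }
    return this;
  }

  @Override
  public String toString() {
    return opType + " " + args;
  }
}
{code}

In this sketch, a call site that used to write new PrivilegedOperationSketch(OperationType.SIGNAL_CONTAINER, null) would simply write new PrivilegedOperationSketch(OperationType.SIGNAL_CONTAINER) instead.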

> Too many signal to container failure in case of LCE
> ---
>
> Key: YARN-4744
> URL: https://issues.apache.org/jira/browse/YARN-4744
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.9.0
>Reporter: Bibin A Chundatt
>Assignee: Sidharta Seethana
> Attachments: YARN-4744.001.patch, YARN-4744.002.patch
>
>
> Steps to reproduce:
> Install an HA cluster in secure mode.
> Enable LCE with cgroups.
> Start the servers as the dsperf user.
> Submit a MapReduce application (terasort/teragen) as the yarn/dsperf user.
> "Too many signal to container" failures occur; the following exception is thrown:
> {noformat}
> 2014-03-02 09:20:38,689 INFO 
> SecurityLogger.org.apache.hadoop.security.authorize.ServiceAuthorizationManager:
>  Authorization successful for testing (auth:TOKEN) for protocol=interface 
> org.apache.hadoop.yarn.server.nodemanager.api.LocalizationProtocolPB
> 2014-03-02 09:20:40,158 WARN 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl:
>  Event EventType: KILL_CONTAINER sent to absent container 
> container_e02_1393731146548_0001_01_13
> 2014-03-02 09:20:43,071 INFO 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch:
>  Container container_e02_1393731146548_0001_01_09 succeeded
> 2014-03-02 09:20:43,072 INFO 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerImpl:
>  Container container_e02_1393731146548_0001_01_09 transitioned from 
> RUNNING to EXITED_WITH_SUCCESS
> 2014-03-02 09:20:43,073 INFO 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch:
>  Cleaning up container container_e02_1393731146548_0001_01_09
> 2014-03-02 09:20:43,075 INFO 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.runtime.DelegatingLinuxContainerRuntime:
>  Using container runtime: DefaultLinuxContainerRuntime
> 2014-03-02 09:20:43,081 WARN 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.privileged.PrivilegedOperationExecutor:
>  Shell execution returned exit code: 9. Privileged Execution Operation Output:
> main : command provided 2
> main : run as user is yarn
> main : requested yarn user is yarn
> Full command array for failed execution:
> [/opt/bibin/dsperf/HAINSTALL/install/hadoop/nodemanager/bin/container-executor,
>  yarn, yarn, 2, 9370, 15]
> 2014-03-02 09:20:43,081 WARN 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.runtime.DefaultLinuxContainerRuntime:
>  Signal container failed. Exception:
> org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.privileged.PrivilegedOperationException:
>  ExitCodeException exitCode=9:
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.privileged.PrivilegedOperationExecutor.executePrivilegedOperation(PrivilegedOperationExecutor.java:173)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.runtime.DefaultLinuxContainerRuntime.signalContainer(DefaultLinuxContainerRuntime.java:132)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.runtime.DelegatingLinuxContainerRuntime.signalContainer(DelegatingLinuxContainerRuntime.java:109)
> at 
> org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.signalContainer(LinuxContainerExecutor.java:513)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.cleanupContainer(ContainerLaunch.java:520)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainersLauncher.handle(ContainersLauncher.java:139)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainersLauncher.handle(ContainersLauncher.java:55)
> at 
> org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:184)
> at 
> org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:110)
> at java.lang.Thread.run(Thread.java:745)
> Caused by: ExitCodeException exitCode=9:
> at org.apache.hadoop.util.Shell.runCommand(Shell.java:927)
> at org.apache.hadoop.util.Shell.run(Shell.java:838)
> at 
> org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:1117)
> at 
> 

[jira] [Commented] (YARN-4744) Too many signal to container failure in case of LCE

2016-03-03 Thread Jason Lowe (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4744?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15178661#comment-15178661
 ] 

Jason Lowe commented on YARN-4744:
--

Ah, ignore my previous comment -- I see now that we don't have the docker tools 
in place to know whether or not the kill failed in that way.

> Too many signal to container failure in case of LCE
> ---
>
> Key: YARN-4744
> URL: https://issues.apache.org/jira/browse/YARN-4744
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.9.0
>Reporter: Bibin A Chundatt
>Assignee: Sidharta Seethana
> Attachments: YARN-4744.001.patch
>
>
> Steps to reproduce:
> Install an HA cluster in secure mode.
> Enable LCE with cgroups.
> Start the servers as the dsperf user.
> Submit a MapReduce application (terasort/teragen) as the yarn/dsperf user.
> "Too many signal to container" failures occur; the following exception is thrown:
> {noformat}
> 2014-03-02 09:20:38,689 INFO 
> SecurityLogger.org.apache.hadoop.security.authorize.ServiceAuthorizationManager:
>  Authorization successful for testing (auth:TOKEN) for protocol=interface 
> org.apache.hadoop.yarn.server.nodemanager.api.LocalizationProtocolPB
> 2014-03-02 09:20:40,158 WARN 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl:
>  Event EventType: KILL_CONTAINER sent to absent container 
> container_e02_1393731146548_0001_01_13
> 2014-03-02 09:20:43,071 INFO 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch:
>  Container container_e02_1393731146548_0001_01_09 succeeded
> 2014-03-02 09:20:43,072 INFO 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerImpl:
>  Container container_e02_1393731146548_0001_01_09 transitioned from 
> RUNNING to EXITED_WITH_SUCCESS
> 2014-03-02 09:20:43,073 INFO 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch:
>  Cleaning up container container_e02_1393731146548_0001_01_09
> 2014-03-02 09:20:43,075 INFO 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.runtime.DelegatingLinuxContainerRuntime:
>  Using container runtime: DefaultLinuxContainerRuntime
> 2014-03-02 09:20:43,081 WARN 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.privileged.PrivilegedOperationExecutor:
>  Shell execution returned exit code: 9. Privileged Execution Operation Output:
> main : command provided 2
> main : run as user is yarn
> main : requested yarn user is yarn
> Full command array for failed execution:
> [/opt/bibin/dsperf/HAINSTALL/install/hadoop/nodemanager/bin/container-executor,
>  yarn, yarn, 2, 9370, 15]
> 2014-03-02 09:20:43,081 WARN 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.runtime.DefaultLinuxContainerRuntime:
>  Signal container failed. Exception:
> org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.privileged.PrivilegedOperationException:
>  ExitCodeException exitCode=9:
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.privileged.PrivilegedOperationExecutor.executePrivilegedOperation(PrivilegedOperationExecutor.java:173)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.runtime.DefaultLinuxContainerRuntime.signalContainer(DefaultLinuxContainerRuntime.java:132)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.runtime.DelegatingLinuxContainerRuntime.signalContainer(DelegatingLinuxContainerRuntime.java:109)
> at 
> org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.signalContainer(LinuxContainerExecutor.java:513)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.cleanupContainer(ContainerLaunch.java:520)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainersLauncher.handle(ContainersLauncher.java:139)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainersLauncher.handle(ContainersLauncher.java:55)
> at 
> org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:184)
> at 
> org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:110)
> at java.lang.Thread.run(Thread.java:745)
> Caused by: ExitCodeException exitCode=9:
> at org.apache.hadoop.util.Shell.runCommand(Shell.java:927)
> at org.apache.hadoop.util.Shell.run(Shell.java:838)
> at 
> org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:1117)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.privileged.PrivilegedOperationExecutor.executePrivilegedOperation(PrivilegedOperationExecutor.java:150)
> ... 9 more
> 2014-03-02 09:20:43,113 INFO 
> 

[jira] [Commented] (YARN-4744) Too many signal to container failure in case of LCE

2016-03-03 Thread Jason Lowe (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4744?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15178658#comment-15178658
 ] 

Jason Lowe commented on YARN-4744:
--

Even if the Docker stuff doesn't work totally, it has the same logic and will 
have the same issue at a high level (i.e.: will always be a race between kill 
and container exiting on its own) -- so why wouldn't we want to make the change 
at least for doc purposes for those coming along later to fix it?
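
As a rough illustration of tolerating that race (this is not the patch, and the "process already gone" exit code below is an assumption made for the example, not a documented container-executor constant), cleanup-time signalling could swallow the benign case where the container exited on its own before the kill arrived:

{code}
// Illustrative only: the meaning of exit code 9 here is an assumption for the
// example, not a documented container-executor constant.
class SignalRaceSketch {

  interface ContainerSignaler {
    // Returns the exit code of the privileged signal operation.
    int signal(String containerId, int signalNumber) throws Exception;
  }

  private static final int ASSUMED_PROCESS_GONE_EXIT_CODE = 9;

  static void signalForCleanup(ContainerSignaler signaler, String containerId,
      int signalNumber) throws Exception {
    int exitCode = signaler.signal(containerId, signalNumber);
    if (exitCode == 0) {
      return; // signal delivered normally
    }
    if (exitCode == ASSUMED_PROCESS_GONE_EXIT_CODE) {
      // The container exited on its own before the kill arrived: a benign
      // race, so do not surface it as a "signal container failed" warning.
      System.out.println("Container " + containerId + " already exited; ignoring.");
      return;
    }
    throw new IllegalStateException("Signalling " + containerId
        + " failed with exit code " + exitCode);
  }
}
{code}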

> Too many signal to container failure in case of LCE
> ---
>
> Key: YARN-4744
> URL: https://issues.apache.org/jira/browse/YARN-4744
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.9.0
>Reporter: Bibin A Chundatt
>Assignee: Sidharta Seethana
> Attachments: YARN-4744.001.patch
>
>
> Steps to reproduce:
> Install an HA cluster in secure mode.
> Enable LCE with cgroups.
> Start the servers as the dsperf user.
> Submit a MapReduce application (terasort/teragen) as the yarn/dsperf user.
> "Too many signal to container" failures occur; the following exception is thrown:
> {noformat}
> 2014-03-02 09:20:38,689 INFO 
> SecurityLogger.org.apache.hadoop.security.authorize.ServiceAuthorizationManager:
>  Authorization successful for testing (auth:TOKEN) for protocol=interface 
> org.apache.hadoop.yarn.server.nodemanager.api.LocalizationProtocolPB
> 2014-03-02 09:20:40,158 WARN 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl:
>  Event EventType: KILL_CONTAINER sent to absent container 
> container_e02_1393731146548_0001_01_13
> 2014-03-02 09:20:43,071 INFO 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch:
>  Container container_e02_1393731146548_0001_01_09 succeeded
> 2014-03-02 09:20:43,072 INFO 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerImpl:
>  Container container_e02_1393731146548_0001_01_09 transitioned from 
> RUNNING to EXITED_WITH_SUCCESS
> 2014-03-02 09:20:43,073 INFO 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch:
>  Cleaning up container container_e02_1393731146548_0001_01_09
> 2014-03-02 09:20:43,075 INFO 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.runtime.DelegatingLinuxContainerRuntime:
>  Using container runtime: DefaultLinuxContainerRuntime
> 2014-03-02 09:20:43,081 WARN 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.privileged.PrivilegedOperationExecutor:
>  Shell execution returned exit code: 9. Privileged Execution Operation Output:
> main : command provided 2
> main : run as user is yarn
> main : requested yarn user is yarn
> Full command array for failed execution:
> [/opt/bibin/dsperf/HAINSTALL/install/hadoop/nodemanager/bin/container-executor,
>  yarn, yarn, 2, 9370, 15]
> 2014-03-02 09:20:43,081 WARN 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.runtime.DefaultLinuxContainerRuntime:
>  Signal container failed. Exception:
> org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.privileged.PrivilegedOperationException:
>  ExitCodeException exitCode=9:
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.privileged.PrivilegedOperationExecutor.executePrivilegedOperation(PrivilegedOperationExecutor.java:173)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.runtime.DefaultLinuxContainerRuntime.signalContainer(DefaultLinuxContainerRuntime.java:132)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.runtime.DelegatingLinuxContainerRuntime.signalContainer(DelegatingLinuxContainerRuntime.java:109)
> at 
> org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.signalContainer(LinuxContainerExecutor.java:513)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.cleanupContainer(ContainerLaunch.java:520)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainersLauncher.handle(ContainersLauncher.java:139)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainersLauncher.handle(ContainersLauncher.java:55)
> at 
> org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:184)
> at 
> org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:110)
> at java.lang.Thread.run(Thread.java:745)
> Caused by: ExitCodeException exitCode=9:
> at org.apache.hadoop.util.Shell.runCommand(Shell.java:927)
> at org.apache.hadoop.util.Shell.run(Shell.java:838)
> at 
> org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:1117)
> at 
> 

[jira] [Commented] (YARN-4740) container complete msg may lost while AM restart in race condition

2016-03-03 Thread Jian He (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4740?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15178657#comment-15178657
 ] 

Jian He commented on YARN-4740:
---

[~sandflee], actually, with this fix, couldn't the 2nd AM receive duplicate 
container statuses that the 1st AM has already received?

> container complete msg may lost while AM restart in race condition
> --
>
> Key: YARN-4740
> URL: https://issues.apache.org/jira/browse/YARN-4740
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: sandflee
>Assignee: sandflee
> Attachments: YARN-4740.01.patch, YARN-4740.02.patch
>
>
> 1. A container completes, and the msg is stored in 
> RMAppAttempt.justFinishedContainers.
> 2. The AM calls allocate, but crashes before the allocateResponse reaches it.
> 3. The AM restarts and cannot get the container complete msg.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-1489) [Umbrella] Work-preserving ApplicationMaster restart

2016-03-03 Thread Vinod Kumar Vavilapalli (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1489?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15178653#comment-15178653
 ] 

Vinod Kumar Vavilapalli commented on YARN-1489:
---

bq. That and the "Old running containers don't know where the new AM is 
running." issue is big enough that we shouldn't close this umbrella as done.
Just filed YARN-4758.

> [Umbrella] Work-preserving ApplicationMaster restart
> 
>
> Key: YARN-1489
> URL: https://issues.apache.org/jira/browse/YARN-1489
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Vinod Kumar Vavilapalli
>Assignee: Vinod Kumar Vavilapalli
> Attachments: Work preserving AM restart.pdf
>
>
> Today if AMs go down,
>  - RM kills all the containers of that ApplicationAttempt
>  - New ApplicationAttempt doesn't know where the previous containers are 
> running
>  - Old running containers don't know where the new AM is running.
> We need to fix this to enable work-preserving AM restart. The latter two 
> potentially can be done at the app level, but it is good to have a common 
> solution for all apps wherever possible.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-4759) Revisit signalContainer() for docker containers

2016-03-03 Thread Sidharta Seethana (JIRA)
Sidharta Seethana created YARN-4759:
---

 Summary: Revisit signalContainer() for docker containers
 Key: YARN-4759
 URL: https://issues.apache.org/jira/browse/YARN-4759
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Sidharta Seethana


The current signal handling (in the DockerContainerRuntime) needs to be 
revisited for docker containers. For example, container reacquisition on NM 
restart might not work, depending on which user the process in the container 
runs as. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2888) Corrective mechanisms for rebalancing NM container queues

2016-03-03 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2888?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15178647#comment-15178647
 ] 

Hadoop QA commented on YARN-2888:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 17s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 
0s {color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 46s 
{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 6m 
40s {color} | {color:green} yarn-2877 passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 44s 
{color} | {color:green} yarn-2877 passed with JDK v1.8.0_72 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 5s 
{color} | {color:green} yarn-2877 passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
38s {color} | {color:green} yarn-2877 passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 26s 
{color} | {color:green} yarn-2877 passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
40s {color} | {color:green} yarn-2877 passed {color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red} 0m 43s 
{color} | {color:red} 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common in 
yarn-2877 has 3 extant Findbugs warnings. {color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red} 1m 4s 
{color} | {color:red} 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager
 in yarn-2877 has 1 extant Findbugs warnings. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 15s 
{color} | {color:green} yarn-2877 passed with JDK v1.8.0_72 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 3m 39s 
{color} | {color:green} yarn-2877 passed with JDK v1.7.0_95 {color} |
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 10s 
{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 1m 
11s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 47s 
{color} | {color:green} the patch passed with JDK v1.8.0_72 {color} |
| {color:green}+1{color} | {color:green} cc {color} | {color:green} 1m 47s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 1m 47s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 6s 
{color} | {color:green} the patch passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} cc {color} | {color:green} 2m 6s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 2m 6s 
{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 0m 35s 
{color} | {color:red} hadoop-yarn-project/hadoop-yarn: patch generated 9 new + 
415 unchanged - 1 fixed = 424 total (was 416) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 21s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
34s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 
0s {color} | {color:green} Patch has no whitespace issues. {color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red} 1m 18s 
{color} | {color:red} 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager
 generated 1 new + 1 unchanged - 0 fixed = 2 total (was 1) {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 13s 
{color} | {color:green} the patch passed with JDK v1.8.0_72 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 3m 35s 
{color} | {color:green} the patch passed with JDK v1.7.0_95 {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 0m 20s {color} 
| {color:red} hadoop-yarn-api in the patch failed with JDK v1.8.0_72. {color} |
| 

[jira] [Created] (YARN-4758) Enable discovery of AMs by containers

2016-03-03 Thread Vinod Kumar Vavilapalli (JIRA)
Vinod Kumar Vavilapalli created YARN-4758:
-

 Summary: Enable discovery of AMs by containers
 Key: YARN-4758
 URL: https://issues.apache.org/jira/browse/YARN-4758
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Vinod Kumar Vavilapalli


{color:red}
This is already discussed on the umbrella JIRA YARN-1489.

Copying some of my condensed summary from the design doc (section 3.2.10.3) of 
YARN-4692.
{color}

Even after the existing work on Work-preserving AM restart (Section 3.1.2 / 
YARN-1489), we still haven’t solved the problem of old running containers not 
knowing where the new AM starts running after the previous AM crashes. This is 
an especially important problem to solve for long-running services, where we’d 
like to avoid killing service containers when AMs fail over. So far, we left 
this as a task for the apps, but solving it in YARN is highly desirable. 
(Task) This looks very much like service-registry (YARN-913), but for 
app-containers to discover their own AMs.

Combining this requirement (of any container being able to find their AM 
across fail-overs) with those of services (to be able to find through DNS 
where a service container is running - YARN-4757) will push our registry 
scalability needs much higher than those of just service end-points. This 
calls for a more distributed solution for registry readers - something that is 
discussed in the comments section of YARN-1489 and MAPREDUCE-6608.
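
As a rough, client-side sketch of what such discovery could look like (every interface below is hypothetical; the actual mechanism is exactly what this JIRA is meant to design), a long-lived container would simply re-resolve its AM's endpoint whenever the connection breaks:

{code}
import java.net.InetSocketAddress;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch: nothing below is an existing YARN API.
class AmDiscoverySketch {

  interface AmResolver {
    // Backed by a registry, DNS, or whatever mechanism this JIRA settles on.
    InetSocketAddress resolveAmAddress(String applicationId) throws Exception;
  }

  interface AmClient extends AutoCloseable {
    void heartbeat() throws Exception;
  }

  interface AmClientFactory {
    AmClient connect(InetSocketAddress address) throws Exception;
  }

  static void runContainerLoop(String appId, AmResolver resolver,
      AmClientFactory factory) throws InterruptedException {
    while (!Thread.currentThread().isInterrupted()) {
      try (AmClient client = factory.connect(resolver.resolveAmAddress(appId))) {
        while (true) {
          client.heartbeat(); // normal work against the current AM attempt
          TimeUnit.SECONDS.sleep(1);
        }
      } catch (InterruptedException ie) {
        Thread.currentThread().interrupt();
      } catch (Exception e) {
        // The AM may have crashed and restarted elsewhere: back off, then
        // re-resolve its new location on the next loop iteration.
        TimeUnit.SECONDS.sleep(5);
      }
    }
  }
}
{code}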



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4737) Use CSRF Filter in YARN

2016-03-03 Thread Jonathan Maron (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4737?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Maron updated YARN-4737:
-
Attachment: YARN-4737.004.patch

> Use CSRF Filter in YARN
> ---
>
> Key: YARN-4737
> URL: https://issues.apache.org/jira/browse/YARN-4737
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager, resourcemanager, webapp
>Reporter: Jonathan Maron
>Assignee: Jonathan Maron
> Attachments: YARN-4737.001.patch, YARN-4737.002.patch, 
> YARN-4737.003.patch, YARN-4737.004.patch
>
>
> A CSRF filter was added to hadoop common 
> (https://issues.apache.org/jira/browse/HADOOP-12691).  The aim of this JIRA 
> is to come up with a mechanism to integrate this filter into the webapps for 
> which it is applicable (web apps that may establish an authenticated 
> identity).  That includes the RM, NM, and mapreduce jobhistory web app.
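
For readers unfamiliar with the technique, a generic header-based CSRF filter looks roughly like the sketch below; this is only an illustration of the approach, not the filter added in HADOOP-12691, and the header name is a placeholder.

{code}
import java.io.IOException;
import javax.servlet.Filter;
import javax.servlet.FilterChain;
import javax.servlet.FilterConfig;
import javax.servlet.ServletException;
import javax.servlet.ServletRequest;
import javax.servlet.ServletResponse;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

// Generic illustration of header-based CSRF prevention (placeholder header
// name); not the hadoop-common implementation.
public class CsrfHeaderFilterSketch implements Filter {

  private static final String CUSTOM_HEADER = "X-XSRF-HEADER";

  @Override
  public void init(FilterConfig config) {
  }

  @Override
  public void doFilter(ServletRequest req, ServletResponse res, FilterChain chain)
      throws IOException, ServletException {
    HttpServletRequest request = (HttpServletRequest) req;
    HttpServletResponse response = (HttpServletResponse) res;
    boolean stateChanging = !"GET".equals(request.getMethod())
        && !"HEAD".equals(request.getMethod());
    if (stateChanging && request.getHeader(CUSTOM_HEADER) == null) {
      // Browsers do not add custom headers to cross-site form posts, so a
      // missing header on a state-changing request is rejected.
      response.sendError(HttpServletResponse.SC_BAD_REQUEST, "Missing CSRF header");
      return;
    }
    chain.doFilter(req, res);
  }

  @Override
  public void destroy() {
  }
}
{code}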



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2888) Corrective mechanisms for rebalancing NM container queues

2016-03-03 Thread Arun Suresh (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2888?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arun Suresh updated YARN-2888:
--
Attachment: YARN-2888-yarn-2877.002.patch

Updating the above patch with some extra documentation and some minor refactoring.

> Corrective mechanisms for rebalancing NM container queues
> -
>
> Key: YARN-2888
> URL: https://issues.apache.org/jira/browse/YARN-2888
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager, resourcemanager
>Reporter: Konstantinos Karanasos
>Assignee: Arun Suresh
> Attachments: YARN-2888-yarn-2877.001.patch, 
> YARN-2888-yarn-2877.002.patch
>
>
> Bad queuing decisions by the LocalRMs (e.g., due to the distributed nature of 
> the scheduling decisions or due to having a stale image of the system) may 
> lead to an imbalance in the waiting times of the NM container queues. This 
> can in turn have an impact on job execution times and cluster utilization.
> To this end, we introduce corrective mechanisms that may remove (whenever 
> needed) container requests from overloaded queues, adding them to less-loaded 
> ones.
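
A toy sketch of such a corrective pass (hypothetical data structures, not the patch code): requests are shed from queues that are longer than the average queue length and re-appended to the currently shortest queues.

{code}
import java.util.ArrayList;
import java.util.Comparator;
import java.util.Deque;
import java.util.List;
import java.util.Map;

// Toy illustration of rebalancing NM container queues; not the patch code.
class QueueRebalancerSketch {

  // queues: NM id -> its queue of pending container request ids.
  static void rebalance(Map<String, Deque<String>> queues) {
    int total = queues.values().stream().mapToInt(Deque::size).sum();
    int average = (int) Math.ceil((double) total / Math.max(1, queues.size()));

    // Remove requests from the tail of any queue longer than the average.
    List<String> displaced = new ArrayList<>();
    for (Deque<String> q : queues.values()) {
      while (q.size() > average) {
        displaced.add(q.removeLast());
      }
    }

    // Re-add each displaced request to the currently shortest queue.
    for (String request : displaced) {
      queues.values().stream()
          .min(Comparator.comparingInt(Deque::size))
          .ifPresent(q -> q.addLast(request));
    }
  }
}
{code}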



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4757) [Umbrella] Simplified discovery of services via DNS mechanisms

2016-03-03 Thread Jonathan Maron (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4757?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15178629#comment-15178629
 ] 

Jonathan Maron commented on YARN-4757:
--

I've actually been working on a DNS approach for some time.  I'll upload a 
document describing the approach soon.

> [Umbrella] Simplified discovery of services via DNS mechanisms
> --
>
> Key: YARN-4757
> URL: https://issues.apache.org/jira/browse/YARN-4757
> Project: Hadoop YARN
>  Issue Type: New Feature
>Reporter: Vinod Kumar Vavilapalli
>Assignee: Jonathan Maron
>
> [See overview doc at YARN-4692, copying the sub-section (3.2.10.2) to track 
> all related efforts.]
> In addition to completing the present story of service-registry (YARN-913), 
> we also need to simplify access to the registry entries. The existing 
> read mechanisms of the YARN Service Registry are currently limited to a 
> registry-specific (Java) API and a REST interface. In practice, this makes it 
> very difficult to wire up existing clients and services. For example, dynamic 
> configuration of dependent end-points of a service is not easy to implement 
> using the present registry-read mechanisms, *without* code changes to 
> existing services.
> A good solution to this is to expose the registry information through a more 
> generic and widely used discovery mechanism: DNS. Service Discovery via DNS 
> uses the well-known DNS interfaces to browse the network for services. 
> YARN-913 in fact talked about such a DNS-based mechanism but left it as a 
> future task. (Task) Having the registry information exposed via DNS 
> simplifies the life of services.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (YARN-4757) [Umbrella] Simplified discovery of services via DNS mechanisms

2016-03-03 Thread Jonathan Maron (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4757?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Maron reassigned YARN-4757:


Assignee: Jonathan Maron

> [Umbrella] Simplified discovery of services via DNS mechanisms
> --
>
> Key: YARN-4757
> URL: https://issues.apache.org/jira/browse/YARN-4757
> Project: Hadoop YARN
>  Issue Type: New Feature
>Reporter: Vinod Kumar Vavilapalli
>Assignee: Jonathan Maron
>
> [See overview doc at YARN-4692, copying the sub-section (3.2.10.2) to track 
> all related efforts.]
> In addition to completing the present story of service-registry (YARN-913), 
> we also need to simplify access to the registry entries. The existing 
> read mechanisms of the YARN Service Registry are currently limited to a 
> registry-specific (Java) API and a REST interface. In practice, this makes it 
> very difficult to wire up existing clients and services. For example, dynamic 
> configuration of dependent end-points of a service is not easy to implement 
> using the present registry-read mechanisms, *without* code changes to 
> existing services.
> A good solution to this is to expose the registry information through a more 
> generic and widely used discovery mechanism: DNS. Service Discovery via DNS 
> uses the well-known DNS interfaces to browse the network for services. 
> YARN-913 in fact talked about such a DNS-based mechanism but left it as a 
> future task. (Task) Having the registry information exposed via DNS 
> simplifies the life of services.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-4757) [Umbrella] Simplified discovery of services via DNS mechanisms

2016-03-03 Thread Vinod Kumar Vavilapalli (JIRA)
Vinod Kumar Vavilapalli created YARN-4757:
-

 Summary: [Umbrella] Simplified discovery of services via DNS 
mechanisms
 Key: YARN-4757
 URL: https://issues.apache.org/jira/browse/YARN-4757
 Project: Hadoop YARN
  Issue Type: New Feature
Reporter: Vinod Kumar Vavilapalli


[See overview doc at YARN-4692, copying the sub-section (3.2.10.2) to track all 
related efforts.]

In addition to completing the present story of service-registry (YARN-913), we 
also need to simplify access to the registry entries. The existing read 
mechanisms of the YARN Service Registry are currently limited to a 
registry-specific (Java) API and a REST interface. In practice, this makes it 
very difficult to wire up existing clients and services. For example, dynamic 
configuration of dependent end-points of a service is not easy to implement 
using the present registry-read mechanisms, *without* code changes to existing 
services.

A good solution to this is to expose the registry information through a more 
generic and widely used discovery mechanism: DNS. Service Discovery via DNS 
uses the well-known DNS interfaces to browse the network for services. 
YARN-913 in fact talked about such a DNS-based mechanism but left it as a 
future task. (Task) Having the registry information exposed via DNS simplifies 
the life of services.
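
To make the "well-known DNS interfaces" point concrete, plain Java can already resolve SRV records through JNDI, as in the standalone example below; the SRV name shown is a made-up placeholder, since YARN does not publish such records today.

{code}
import java.util.Hashtable;
import javax.naming.NamingEnumeration;
import javax.naming.directory.Attribute;
import javax.naming.directory.InitialDirContext;

// Standalone DNS SRV lookup via JNDI; the record name is a placeholder.
public class DnsSrvLookupExample {

  public static void main(String[] args) throws Exception {
    String srvName = args.length > 0 ? args[0] : "_registry._tcp.example.com";

    Hashtable<String, String> env = new Hashtable<>();
    env.put("java.naming.factory.initial", "com.sun.jndi.dns.DnsContextFactory");
    env.put("java.naming.provider.url", "dns:");

    InitialDirContext ctx = new InitialDirContext(env);
    try {
      Attribute srv = ctx.getAttributes(srvName, new String[] {"SRV"}).get("SRV");
      if (srv == null) {
        System.out.println("No SRV records for " + srvName);
        return;
      }
      // Each SRV value has the form "priority weight port target".
      NamingEnumeration<?> values = srv.getAll();
      while (values.hasMore()) {
        String[] parts = values.next().toString().split(" ");
        System.out.println("host=" + parts[3] + " port=" + parts[2]);
      }
    } finally {
      ctx.close();
    }
  }
}
{code}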



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4756) Unnecessary wait in Node Status Updater during reboot

2016-03-03 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4756?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15178614#comment-15178614
 ] 

Hadoop QA commented on YARN-4756:
-

| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 11s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 
0s {color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 6m 
49s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 22s 
{color} | {color:green} trunk passed with JDK v1.8.0_72 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 25s 
{color} | {color:green} trunk passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
16s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 28s 
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
13s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 
52s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 17s 
{color} | {color:green} trunk passed with JDK v1.8.0_72 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 22s 
{color} | {color:green} trunk passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 
24s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 19s 
{color} | {color:green} the patch passed with JDK v1.8.0_72 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 19s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 23s 
{color} | {color:green} the patch passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 23s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
14s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 26s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
10s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 
0s {color} | {color:green} Patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 2s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 15s 
{color} | {color:green} the patch passed with JDK v1.8.0_72 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 19s 
{color} | {color:green} the patch passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 9m 3s 
{color} | {color:green} hadoop-yarn-server-nodemanager in the patch passed with 
JDK v1.8.0_72. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 9m 35s 
{color} | {color:green} hadoop-yarn-server-nodemanager in the patch passed with 
JDK v1.7.0_95. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 
19s {color} | {color:green} Patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 33m 40s {color} 
| {color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker |  Image:yetus/hadoop:0ca8df7 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12791244/YARN-4756.002.patch |
| JIRA Issue | YARN-4756 |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  
unit  findbugs  checkstyle  |
| uname | Linux bb8f2e3e7d26 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed 
Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh 
|
| git revision | trunk / 0a9f00a |
| Default Java | 1.7.0_95 |
| Multi-JDK versions |  

[jira] [Commented] (YARN-4686) MiniYARNCluster.start() returns before cluster is completely started

2016-03-03 Thread Eric Badger (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4686?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15178599#comment-15178599
 ] 

Eric Badger commented on YARN-4686:
---

As per the above comment, these test failures are not related to the patch. All 
relevant test failures have been addressed. [~jlowe] [~kasha], please review 
the patch when you get a chance. 

> MiniYARNCluster.start() returns before cluster is completely started
> 
>
> Key: YARN-4686
> URL: https://issues.apache.org/jira/browse/YARN-4686
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: test
>Reporter: Rohith Sharma K S
>Assignee: Eric Badger
> Attachments: MAPREDUCE-6507.001.patch, YARN-4686.001.patch, 
> YARN-4686.002.patch, YARN-4686.003.patch
>
>
> TestRMNMInfo fails intermittently. Below is the trace for the failure:
> {noformat}
> testRMNMInfo(org.apache.hadoop.mapreduce.v2.TestRMNMInfo)  Time elapsed: 0.28 
> sec  <<< FAILURE!
> java.lang.AssertionError: Unexpected number of live nodes: expected:<4> but 
> was:<3>
>   at org.junit.Assert.fail(Assert.java:88)
>   at org.junit.Assert.failNotEquals(Assert.java:743)
>   at org.junit.Assert.assertEquals(Assert.java:118)
>   at org.junit.Assert.assertEquals(Assert.java:555)
>   at 
> org.apache.hadoop.mapreduce.v2.TestRMNMInfo.testRMNMInfo(TestRMNMInfo.java:111)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4686) MiniYARNCluster.start() returns before cluster is completely started

2016-03-03 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4686?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15178584#comment-15178584
 ] 

Hadoop QA commented on YARN-4686:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 9s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 
0s {color} | {color:green} The patch appears to include 4 new or modified test 
files. {color} |
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 10s 
{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 6m 
40s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 44s 
{color} | {color:green} trunk passed with JDK v1.8.0_74 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 3s 
{color} | {color:green} trunk passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
33s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 11s 
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
38s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 
49s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 43s 
{color} | {color:green} trunk passed with JDK v1.8.0_74 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 54s 
{color} | {color:green} trunk passed with JDK v1.7.0_95 {color} |
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 9s 
{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 
58s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 42s 
{color} | {color:green} the patch passed with JDK v1.8.0_74 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 1m 42s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 2s 
{color} | {color:green} the patch passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 2m 2s 
{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 0m 29s 
{color} | {color:red} hadoop-yarn-project/hadoop-yarn: patch generated 1 new + 
30 unchanged - 0 fixed = 31 total (was 30) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 6s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
34s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 
0s {color} | {color:green} Patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 
19s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 40s 
{color} | {color:green} the patch passed with JDK v1.8.0_74 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 50s 
{color} | {color:green} the patch passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 9m 5s 
{color} | {color:green} hadoop-yarn-server-nodemanager in the patch passed with 
JDK v1.8.0_74. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 6m 20s {color} 
| {color:red} hadoop-yarn-server-tests in the patch failed with JDK v1.8.0_74. 
{color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 63m 43s {color} 
| {color:red} hadoop-yarn-client in the patch failed with JDK v1.8.0_74. 
{color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 9m 29s 
{color} | {color:green} hadoop-yarn-server-nodemanager in the patch passed with 
JDK v1.7.0_95. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 6m 31s {color} 
| {color:red} hadoop-yarn-server-tests in the patch failed with JDK v1.7.0_95. 
{color} |
| {color:red}-1{color} | {color:red} unit {color} | 

[jira] [Updated] (YARN-4756) Unnecessary wait in Node Status Updater during reboot

2016-03-03 Thread Eric Badger (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4756?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Badger updated YARN-4756:
--
Attachment: YARN-4756.002.patch

Fixing the patch so that it applies to trunk instead of depending on the 
[YARN-4686] patch.

> Unnecessary wait in Node Status Updater during reboot
> -
>
> Key: YARN-4756
> URL: https://issues.apache.org/jira/browse/YARN-4756
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Eric Badger
>Assignee: Eric Badger
> Attachments: YARN-4756.001.patch, YARN-4756.002.patch
>
>
> The startStatusUpdater thread waits for the isStopped variable to be set to 
> true, but it is waiting for the next heartbeat. During a reboot, the next 
> heartbeat will not come and so the thread waits for a timeout. Instead, we 
> should notify the thread to continue so that it can check the isStopped 
> variable and exit without having to wait for a timeout. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4737) Use CSRF Filter in YARN

2016-03-03 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4737?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15178467#comment-15178467
 ] 

Hadoop QA commented on YARN-4737:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 18s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 
0s {color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 1m 5s 
{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 7m 
6s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 5m 52s 
{color} | {color:green} trunk passed with JDK v1.8.0_72 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 6m 47s 
{color} | {color:green} trunk passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 
10s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 3m 59s 
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 1m 
52s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 7m 
22s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 3m 3s 
{color} | {color:green} trunk passed with JDK v1.8.0_72 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 5m 53s 
{color} | {color:green} trunk passed with JDK v1.7.0_95 {color} |
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 15s 
{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 3m 
12s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 6m 4s 
{color} | {color:green} the patch passed with JDK v1.8.0_72 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 6m 4s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 7m 0s 
{color} | {color:green} the patch passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 7m 0s 
{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 1m 16s 
{color} | {color:red} root: patch generated 5 new + 431 unchanged - 9 fixed = 
436 total (was 440) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 4m 1s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 1m 
57s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 
0s {color} | {color:green} Patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} xml {color} | {color:green} 0m 1s 
{color} | {color:green} The patch has no ill-formed XML file. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 9m 
14s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 3m 6s 
{color} | {color:green} the patch passed with JDK v1.8.0_72 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 5m 53s 
{color} | {color:green} the patch passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 22s 
{color} | {color:green} hadoop-yarn-api in the patch passed with JDK v1.8.0_72. 
{color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 2m 1s 
{color} | {color:green} hadoop-yarn-common in the patch passed with JDK 
v1.8.0_72. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 9m 16s 
{color} | {color:green} hadoop-yarn-server-nodemanager in the patch passed with 
JDK v1.8.0_72. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 3m 41s 
{color} | {color:green} hadoop-yarn-server-applicationhistoryservice in the 
patch passed with JDK v1.8.0_72. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 71m 13s {color} 
| 

[jira] [Commented] (YARN-4311) Removing nodes from include and exclude lists will not remove them from decommissioned nodes list

2016-03-03 Thread Kuhu Shukla (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4311?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15178448#comment-15178448
 ] 

Kuhu Shukla commented on YARN-4311:
---

Requesting [~jlowe], [~templedf] for review/comments. Thanks a lot!

> Removing nodes from include and exclude lists will not remove them from 
> decommissioned nodes list
> -
>
> Key: YARN-4311
> URL: https://issues.apache.org/jira/browse/YARN-4311
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.6.1
>Reporter: Kuhu Shukla
>Assignee: Kuhu Shukla
> Attachments: YARN-4311-v1.patch, YARN-4311-v10.patch, 
> YARN-4311-v11.patch, YARN-4311-v11.patch, YARN-4311-v2.patch, 
> YARN-4311-v3.patch, YARN-4311-v4.patch, YARN-4311-v5.patch, 
> YARN-4311-v6.patch, YARN-4311-v7.patch, YARN-4311-v8.patch, YARN-4311-v9.patch
>
>
> In order to fully forget about a node, removing the node from the include and 
> exclude lists is not sufficient. The RM still lists it under decommissioned 
> nodes. The tricky part that [~jlowe] pointed out was the case when include 
> lists are not used; in that case we don't want the nodes to fall off if they 
> are not active.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4311) Removing nodes from include and exclude lists will not remove them from decommissioned nodes list

2016-03-03 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4311?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15178426#comment-15178426
 ] 

Hadoop QA commented on YARN-4311:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 14s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 
0s {color} | {color:green} The patch appears to include 3 new or modified test 
files. {color} |
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 15s 
{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 6m 
40s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 5m 46s 
{color} | {color:green} trunk passed with JDK v1.8.0_74 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 6m 40s 
{color} | {color:green} trunk passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 
10s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 55s 
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
54s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 3m 
57s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 37s 
{color} | {color:green} trunk passed with JDK v1.8.0_74 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 4m 9s 
{color} | {color:green} trunk passed with JDK v1.7.0_95 {color} |
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 14s 
{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 1m 
38s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 5m 53s 
{color} | {color:green} the patch passed with JDK v1.8.0_74 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 5m 53s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 6m 46s 
{color} | {color:green} the patch passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 6m 46s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 
11s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 54s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
53s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 
0s {color} | {color:green} Patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} xml {color} | {color:green} 0m 1s 
{color} | {color:green} The patch has no ill-formed XML file. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 4m 
50s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 42s 
{color} | {color:green} the patch passed with JDK v1.8.0_74 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 4m 9s 
{color} | {color:green} the patch passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 23s 
{color} | {color:green} hadoop-yarn-api in the patch passed with JDK v1.8.0_74. 
{color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 1m 54s 
{color} | {color:green} hadoop-yarn-common in the patch passed with JDK 
v1.8.0_74. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 66m 42s {color} 
| {color:red} hadoop-yarn-server-resourcemanager in the patch failed with JDK 
v1.8.0_74. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 51s 
{color} | {color:green} hadoop-sls in the patch passed with JDK v1.8.0_74. 
{color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 23s 
{color} | {color:green} hadoop-yarn-api in the patch passed with JDK v1.7.0_95. 
{color} |

[jira] [Commented] (YARN-4083) Add a discovery mechanism for the scheduler address

2016-03-03 Thread Vinod Kumar Vavilapalli (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4083?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15178423#comment-15178423
 ] 

Vinod Kumar Vavilapalli commented on YARN-4083:
---

bq. Today many apps like Distributed Shell, REEF, etc rely on the fact that the 
HADOOP_CONF_DIR of the NM is on the classpath to discover the scheduler address.
Please see my comment 
[here|https://issues.apache.org/jira/browse/YARN-4650?focusedCommentId=15176322=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15176322]
 on YARN-4650. After rolling upgrades (YARN-666), for correctness' sake, we 
require all apps to *not* depend on server-side config files (which may change 
during upgrades).

bq. I feel an environment variable should be accessible by linux, windows and 
other containers.
I think [~aw] is making the same comment I made in the design doc (in section 
3.2.5) for YARN-4692. Pasting that comment below:
{quote}
All of our platform-to-application communication currently happens only through 
process environment variables: e.g. ApplicationConstants.NM_HOST. With 
things like Linux CGroups, containerization through docker etc., it is now 
possible to launch multi-process containers where the solution of 
environment variables breaks down. (Task) We need better ways of 
propagating important information down to the containers: information like 
the container's resource size, local-dirs and log-dirs available for writing etc.

{quote}
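
For context, a minimal sketch of the status quo being described, assuming a 
client that happens to have the NM's yarn-site.xml on its classpath and a 
process launched directly by the NM; this is exactly the server-side-config and 
environment-variable coupling the comment argues against, not a recommendation.

{code}
import org.apache.hadoop.yarn.api.ApplicationConstants;
import org.apache.hadoop.yarn.conf.YarnConfiguration;

public class StatusQuoDiscoverySketch {
  public static void main(String[] args) {
    // Scheduler address picked up from whatever yarn-site.xml is on the
    // classpath (i.e. the NM's HADOOP_CONF_DIR) -- the server-side config
    // dependence described above.
    YarnConfiguration conf = new YarnConfiguration();
    String schedulerAddress = conf.get(
        YarnConfiguration.RM_SCHEDULER_ADDRESS,
        YarnConfiguration.DEFAULT_RM_SCHEDULER_ADDRESS);

    // NM host handed to the container through a process environment variable,
    // which only the directly launched process inherits.
    String nmHost = System.getenv(ApplicationConstants.Environment.NM_HOST.name());

    System.out.println("scheduler: " + schedulerAddress + ", NM host: " + nmHost);
  }
}
{code}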

> Add a discovery mechanism for the scheduler address
> ---
>
> Key: YARN-4083
> URL: https://issues.apache.org/jira/browse/YARN-4083
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: nodemanager, resourcemanager
>Reporter: Subru Krishnan
>Assignee: Subru Krishnan
>
> Today many apps like Distributed Shell, REEF, etc rely on the fact that the 
> HADOOP_CONF_DIR of the NM is on the classpath to discover the scheduler 
> address. This JIRA proposes the addition of an explicit discovery mechanism 
> for the scheduler address



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4756) Unnecessary wait in Node Status Updater during reboot

2016-03-03 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4756?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15178420#comment-15178420
 ] 

Hadoop QA commented on YARN-4756:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:red}-1{color} | {color:red} patch {color} | {color:red} 0m 4s {color} 
| {color:red} YARN-4756 does not apply to trunk. Rebase required? Wrong Branch? 
See https://wiki.apache.org/hadoop/HowToContribute for help. {color} |
\\
\\
|| Subsystem || Report/Notes ||
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12791229/YARN-4756.001.patch |
| JIRA Issue | YARN-4756 |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/10700/console |
| Powered by | Apache Yetus 0.3.0-SNAPSHOT   http://yetus.apache.org |


This message was automatically generated.



> Unnecessary wait in Node Status Updater during reboot
> -
>
> Key: YARN-4756
> URL: https://issues.apache.org/jira/browse/YARN-4756
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Eric Badger
>Assignee: Eric Badger
> Attachments: YARN-4756.001.patch
>
>
> The startStatusUpdater thread waits for the isStopped variable to be set to 
> true, but it is waiting for the next heartbeat. During a reboot, the next 
> heartbeat will not come and so the thread waits for a timeout. Instead, we 
> should notify the thread to continue so that it can check the isStopped 
> variable and exit without having to wait for a timeout. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4686) MiniYARNCluster.start() returns before cluster is completely started

2016-03-03 Thread Eric Badger (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4686?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15178409#comment-15178409
 ] 

Eric Badger commented on YARN-4686:
---

JIRA [YARN-4756] has been opened regarding the heartbeatMonitor optimization in 
the startStatusUpdater thread.

> MiniYARNCluster.start() returns before cluster is completely started
> 
>
> Key: YARN-4686
> URL: https://issues.apache.org/jira/browse/YARN-4686
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: test
>Reporter: Rohith Sharma K S
>Assignee: Eric Badger
> Attachments: MAPREDUCE-6507.001.patch, YARN-4686.001.patch, 
> YARN-4686.002.patch, YARN-4686.003.patch
>
>
> TestRMNMInfo fails intermittently. Below is the trace for the failure:
> {noformat}
> testRMNMInfo(org.apache.hadoop.mapreduce.v2.TestRMNMInfo)  Time elapsed: 0.28 
> sec  <<< FAILURE!
> java.lang.AssertionError: Unexpected number of live nodes: expected:<4> but 
> was:<3>
>   at org.junit.Assert.fail(Assert.java:88)
>   at org.junit.Assert.failNotEquals(Assert.java:743)
>   at org.junit.Assert.assertEquals(Assert.java:118)
>   at org.junit.Assert.assertEquals(Assert.java:555)
>   at 
> org.apache.hadoop.mapreduce.v2.TestRMNMInfo.testRMNMInfo(TestRMNMInfo.java:111)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4756) Unnecessary wait in Node Status Updater during reboot

2016-03-03 Thread Eric Badger (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4756?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Badger updated YARN-4756:
--
Description: The startStatusUpdater thread waits for the isStopped variable 
to be set to true, but it is waiting for the next heartbeat. During a reboot, 
the next heartbeat will not come and so the thread waits for a timeout. 
Instead, we should notify the thread to continue so that it can check the 
isStopped variable and exit without having to wait for a timeout.   (was: The 
Node Status Updater thread waits for the isStopped variable to be set to true, 
but it is waiting for the next heartbeat. During a reboot, the next heartbeat 
will not come and so the thread waits for a timeout. Instead, we should notify 
the thread to continue so that it can check the isStopped variable and exit 
without having to wait for a timeout. )

> Unnecessary wait in Node Status Updater during reboot
> -
>
> Key: YARN-4756
> URL: https://issues.apache.org/jira/browse/YARN-4756
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Eric Badger
>Assignee: Eric Badger
> Attachments: YARN-4756.001.patch
>
>
> The startStatusUpdater thread waits for the isStopped variable to be set to 
> true, but it is waiting for the next heartbeat. During a reboot, the next 
> heartbeat will not come and so the thread waits for a timeout. Instead, we 
> should notify the thread to continue so that it can check the isStopped 
> variable and exit without having to wait for a timeout. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4756) Unnecessary wait in Node Status Updater during reboot

2016-03-03 Thread Eric Badger (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4756?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Badger updated YARN-4756:
--
Attachment: YARN-4756.001.patch

The optimization to notify the Node Status Updater thread to stop waiting for a 
heartbeat exposes a race condition in the test 
TestNodeManagerResync#testContainerResourceIncreaseIsSynchronizedWithRMResync. 
The test checks the current resources of the NM and then checks them again 
after a different thread has changed them. However, there is no 
synchronization between these threads, and the test was only passing because of 
the excessive wait time from the reboot. The patch adds a barrier to 
synchronize these two threads. 
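
Not the actual patch, but a minimal sketch of the barrier idea under 
discussion, with hypothetical thread bodies standing in for the NM resource 
update and the test assertion.

{code}
import java.util.concurrent.CyclicBarrier;

public class ResyncTestBarrierSketch {
  // One party is the thread that updates the NM's container resources,
  // the other is the test thread that asserts on the updated values.
  private static final CyclicBarrier syncBarrier = new CyclicBarrier(2);

  public static void main(String[] args) throws Exception {
    Thread updater = new Thread(new Runnable() {
      public void run() {
        // ... update the container resources here (hypothetical work) ...
        try {
          syncBarrier.await();  // signal: the update is now visible
        } catch (Exception e) {
          throw new RuntimeException(e);
        }
      }
    });
    updater.start();

    syncBarrier.await();        // test thread waits for the update
    // ... assert on the updated resources here ...
    updater.join();
  }
}
{code}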

> Unnecessary wait in Node Status Updater during reboot
> -
>
> Key: YARN-4756
> URL: https://issues.apache.org/jira/browse/YARN-4756
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Eric Badger
>Assignee: Eric Badger
> Attachments: YARN-4756.001.patch
>
>
> The Node Status Updater thread waits for the isStopped variable to be set to 
> true, but it is waiting for the next heartbeat. During a reboot, the next 
> heartbeat will not come and so the thread waits for a timeout. Instead, we 
> should notify the thread to continue so that it can check the isStopped 
> variable and exit without having to wait for a timeout. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-4756) Unnecessary wait in Node Status Updater during reboot

2016-03-03 Thread Eric Badger (JIRA)
Eric Badger created YARN-4756:
-

 Summary: Unnecessary wait in Node Status Updater during reboot
 Key: YARN-4756
 URL: https://issues.apache.org/jira/browse/YARN-4756
 Project: Hadoop YARN
  Issue Type: Improvement
Reporter: Eric Badger
Assignee: Eric Badger


The Node Status Updater thread waits for the isStopped variable to be set to 
true, but it is waiting for the next heartbeat. During a reboot, the next 
heartbeat will not come and so the thread waits for a timeout. Instead, we 
should notify the thread to continue so that it can check the isStopped 
variable and exit without having to wait for a timeout. 
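
A minimal sketch of the wait/notify pattern being proposed (not the actual 
NodeStatusUpdaterImpl code; field and method names here are illustrative), 
where stop() wakes the heartbeat loop instead of letting it run out its 
timeout.

{code}
public class NodeStatusUpdaterSketch {
  private final Object heartbeatMonitor = new Object();
  private volatile boolean isStopped = false;

  // Heartbeat loop: instead of sleeping the full interval and only then
  // noticing isStopped, the wait can be cut short by a notify from stop().
  public void runHeartbeatLoop(long heartbeatIntervalMs)
      throws InterruptedException {
    while (!isStopped) {
      // ... send the node status heartbeat to the RM here ...
      synchronized (heartbeatMonitor) {
        if (!isStopped) {
          heartbeatMonitor.wait(heartbeatIntervalMs);
        }
      }
    }
  }

  // Called on shutdown/reboot: set the flag and wake the waiting thread so it
  // exits immediately rather than after the heartbeat timeout.
  public void stop() {
    isStopped = true;
    synchronized (heartbeatMonitor) {
      heartbeatMonitor.notifyAll();
    }
  }
}
{code}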



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2888) Corrective mechanisms for rebalancing NM container queues

2016-03-03 Thread Arun Suresh (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2888?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arun Suresh updated YARN-2888:
--
Attachment: YARN-2888-yarn-2877.001.patch

Uploading initial patch to solicit feedback

> Corrective mechanisms for rebalancing NM container queues
> -
>
> Key: YARN-2888
> URL: https://issues.apache.org/jira/browse/YARN-2888
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager, resourcemanager
>Reporter: Konstantinos Karanasos
>Assignee: Arun Suresh
> Attachments: YARN-2888-yarn-2877.001.patch
>
>
> Bad queuing decisions by the LocalRMs (e.g., due to the distributed nature of 
> the scheduling decisions or due to having a stale image of the system) may 
> lead to an imbalance in the waiting times of the NM container queues. This 
> can in turn have an impact on job execution times and cluster utilization.
> To this end, we introduce corrective mechanisms that may remove (whenever 
> needed) container requests from overloaded queues, adding them to less-loaded 
> ones.
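
Purely as a toy illustration of the corrective-rebalancing idea described above 
(not the YARN-2888 design), a sketch that moves queued requests from queues 
well above the average length onto the currently shortest queue.

{code}
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Queue;

public class QueueRebalanceSketch {
  // Move queued requests from queues that are more than 'threshold' above the
  // average length onto the currently shortest queue.
  static void rebalance(List<Queue<String>> nodeQueues, int threshold) {
    double avg = 0;
    for (Queue<String> q : nodeQueues) {
      avg += q.size();
    }
    avg /= nodeQueues.size();

    for (Queue<String> overloaded : nodeQueues) {
      while (overloaded.size() > avg + threshold) {
        Queue<String> target = overloaded;
        for (Queue<String> q : nodeQueues) {
          if (q.size() < target.size()) {
            target = q;
          }
        }
        if (target == overloaded) {
          break;  // nothing shorter to move to
        }
        target.add(overloaded.poll());  // re-queue the request elsewhere
      }
    }
  }

  public static void main(String[] args) {
    List<Queue<String>> queues = new ArrayList<Queue<String>>();
    queues.add(new ArrayDeque<String>(Arrays.asList("c1", "c2", "c3", "c4")));
    queues.add(new ArrayDeque<String>(Arrays.asList("c5")));
    queues.add(new ArrayDeque<String>());
    rebalance(queues, 1);
    System.out.println(queues);  // e.g. [[c3, c4], [c5, c2], [c1]]
  }
}
{code}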



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4686) MiniYARNCluster.start() returns before cluster is completely started

2016-03-03 Thread Eric Badger (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4686?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Badger updated YARN-4686:
--
Attachment: YARN-4686.003.patch

Fixing a deadlock issue related to the Node Status Updater thread and the 
Reboot thread. Also taking out the heartbeatMonitor notify optimization. I will 
file a separate JIRA for that optimization, as well as for the test whose race 
condition it exposes. 

> MiniYARNCluster.start() returns before cluster is completely started
> 
>
> Key: YARN-4686
> URL: https://issues.apache.org/jira/browse/YARN-4686
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: test
>Reporter: Rohith Sharma K S
>Assignee: Eric Badger
> Attachments: MAPREDUCE-6507.001.patch, YARN-4686.001.patch, 
> YARN-4686.002.patch, YARN-4686.003.patch
>
>
> TestRMNMInfo fails intermittently. Below is the trace for the failure:
> {noformat}
> testRMNMInfo(org.apache.hadoop.mapreduce.v2.TestRMNMInfo)  Time elapsed: 0.28 
> sec  <<< FAILURE!
> java.lang.AssertionError: Unexpected number of live nodes: expected:<4> but 
> was:<3>
>   at org.junit.Assert.fail(Assert.java:88)
>   at org.junit.Assert.failNotEquals(Assert.java:743)
>   at org.junit.Assert.assertEquals(Assert.java:118)
>   at org.junit.Assert.assertEquals(Assert.java:555)
>   at 
> org.apache.hadoop.mapreduce.v2.TestRMNMInfo.testRMNMInfo(TestRMNMInfo.java:111)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4744) Too many signal to container failure in case of LCE

2016-03-03 Thread Sidharta Seethana (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4744?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15178179#comment-15178179
 ] 

Sidharta Seethana commented on YARN-4744:
-

Thanks, [~jlowe]. I would prefer to leave the log-then-throw in place right now 
(or at least keep it outside the scope of this JIRA).

About the patch: I didn't modify DockerLinuxContainerRuntime because the 
signaling there needs additional work; it needs to be reimplemented in terms of 
docker operations. I'm not sure if I filed a JIRA for that yet; I'll check. 
I'll fix the PrivilegedOperation constructor. 
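
As a hedged illustration of what "in terms of docker operations" could mean 
(not the planned implementation), a sketch that sends a signal through the 
docker CLI via {{docker kill --signal}} rather than through container-executor. 
The container id below is hypothetical.

{code}
import java.io.IOException;
import java.util.Arrays;
import java.util.List;

public class DockerSignalSketch {
  // Send a signal via the docker CLI ("docker kill --signal=<SIG> <id>")
  // instead of signalling the process directly through container-executor.
  static int signalContainer(String containerId, String signal)
      throws IOException, InterruptedException {
    List<String> cmd = Arrays.asList(
        "docker", "kill", "--signal=" + signal, containerId);
    Process p = new ProcessBuilder(cmd).inheritIO().start();
    return p.waitFor();
  }

  public static void main(String[] args) throws Exception {
    // Hypothetical container id; running this needs a local docker daemon.
    System.out.println(signalContainer("my_yarn_container", "SIGTERM"));
  }
}
{code}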




> Too many signal to container failure in case of LCE
> ---
>
> Key: YARN-4744
> URL: https://issues.apache.org/jira/browse/YARN-4744
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.9.0
>Reporter: Bibin A Chundatt
>Assignee: Sidharta Seethana
> Attachments: YARN-4744.001.patch
>
>
> Install an HA cluster in secure mode
> Enable LCE with cgroups
> Start the server with the dsperf user
> Submit a mapreduce application (terasort/teragen) with user yarn/dsperf
> Too many "signal to container" failures occur
> When submitting with the user, the exception below is thrown:
> {noformat}
> 2014-03-02 09:20:38,689 INFO 
> SecurityLogger.org.apache.hadoop.security.authorize.ServiceAuthorizationManager:
>  Authorization successful for testing (auth:TOKEN) for protocol=interface 
> org.apache.hadoop.yarn.server.nodemanager.api.LocalizationProtocolPB
> 2014-03-02 09:20:40,158 WARN 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl:
>  Event EventType: KILL_CONTAINER sent to absent container 
> container_e02_1393731146548_0001_01_13
> 2014-03-02 09:20:43,071 INFO 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch:
>  Container container_e02_1393731146548_0001_01_09 succeeded
> 2014-03-02 09:20:43,072 INFO 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerImpl:
>  Container container_e02_1393731146548_0001_01_09 transitioned from 
> RUNNING to EXITED_WITH_SUCCESS
> 2014-03-02 09:20:43,073 INFO 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch:
>  Cleaning up container container_e02_1393731146548_0001_01_09
> 2014-03-02 09:20:43,075 INFO 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.runtime.DelegatingLinuxContainerRuntime:
>  Using container runtime: DefaultLinuxContainerRuntime
> 2014-03-02 09:20:43,081 WARN 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.privileged.PrivilegedOperationExecutor:
>  Shell execution returned exit code: 9. Privileged Execution Operation Output:
> main : command provided 2
> main : run as user is yarn
> main : requested yarn user is yarn
> Full command array for failed execution:
> [/opt/bibin/dsperf/HAINSTALL/install/hadoop/nodemanager/bin/container-executor,
>  yarn, yarn, 2, 9370, 15]
> 2014-03-02 09:20:43,081 WARN 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.runtime.DefaultLinuxContainerRuntime:
>  Signal container failed. Exception:
> org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.privileged.PrivilegedOperationException:
>  ExitCodeException exitCode=9:
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.privileged.PrivilegedOperationExecutor.executePrivilegedOperation(PrivilegedOperationExecutor.java:173)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.runtime.DefaultLinuxContainerRuntime.signalContainer(DefaultLinuxContainerRuntime.java:132)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.runtime.DelegatingLinuxContainerRuntime.signalContainer(DelegatingLinuxContainerRuntime.java:109)
> at 
> org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.signalContainer(LinuxContainerExecutor.java:513)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.cleanupContainer(ContainerLaunch.java:520)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainersLauncher.handle(ContainersLauncher.java:139)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainersLauncher.handle(ContainersLauncher.java:55)
> at 
> org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:184)
> at 
> org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:110)
> at java.lang.Thread.run(Thread.java:745)
> Caused by: ExitCodeException exitCode=9:
> at org.apache.hadoop.util.Shell.runCommand(Shell.java:927)
> at org.apache.hadoop.util.Shell.run(Shell.java:838)
> at 
> 

[jira] [Commented] (YARN-4712) CPU Usage Metric is not captured properly in YARN-2928

2016-03-03 Thread Varun Saxena (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4712?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15178165#comment-15178165
 ] 

Varun Saxena commented on YARN-4712:


Yeah, I also think we should make minimal changes here, i.e. only those 
required specifically for YARN-2928, because this is something that has to be 
done primarily in trunk.
I will let [~djp] comment on it since he was reviewing YARN-4308 as well.

> CPU Usage Metric is not captured properly in YARN-2928
> --
>
> Key: YARN-4712
> URL: https://issues.apache.org/jira/browse/YARN-4712
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Naganarasimha G R
>Assignee: Naganarasimha G R
>  Labels: yarn-2928-1st-milestone
> Attachments: YARN-4712-YARN-2928.v1.001.patch, 
> YARN-4712-YARN-2928.v1.002.patch, YARN-4712-YARN-2928.v1.003.patch
>
>
> There are 2 issues with CPU usage collection:
> * I was able to observe that many times the CPU usage obtained from 
> {{pTree.getCpuUsagePercent()}} is 
> ResourceCalculatorProcessTree.UNAVAILABLE (i.e. -1), but ContainersMonitor 
> still does the calculation {{cpuUsageTotalCoresPercentage = 
> cpuUsagePercentPerCore / resourceCalculatorPlugin.getNumProcessors()}}, 
> because of which the UNAVAILABLE check in 
> {{NMTimelinePublisher.reportContainerResourceUsage}} is never reached, so 
> proper checks need to be added.
> * {{EntityColumnPrefix.METRIC}} always uses LongConverter, but 
> ContainerMonitor publishes decimal values for the CPU usage.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4712) CPU Usage Metric is not captured properly in YARN-2928

2016-03-03 Thread Sunil G (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4712?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15178143#comment-15178143
 ] 

Sunil G commented on YARN-4712:
---

Hi [~Naganarasimha Garla] and [~varun_saxena]

I think the changes in {{ContainersMonitorImpl}} need to go to trunk as well. A 
patch was given in YARN-4308, but there was some discussion on YARN-3304 and 
most people agreed on -1 there. Hence YARN-4308 is in a limbo state, and I 
think for the first reading we can send 0 rather than -1. So the patch there 
can go to trunk, I think; in that case, the next trunk sync will fetch that 
change. Thoughts?
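
A minimal sketch of the guard being discussed, with names borrowed from the 
issue description; whether the fallback should be UNAVAILABLE or 0 is exactly 
the open question above.

{code}
public class CpuUsageGuardSketch {
  // Mirrors ResourceCalculatorProcessTree.UNAVAILABLE.
  static final int UNAVAILABLE = -1;

  // Only divide by the number of processors when the per-core reading is
  // valid; otherwise propagate UNAVAILABLE (or 0, per the suggestion above)
  // so downstream publishers can skip the sample.
  static float cpuUsageTotalCoresPercentage(float cpuUsagePercentPerCore,
                                            int numProcessors) {
    if (cpuUsagePercentPerCore == UNAVAILABLE || numProcessors <= 0) {
      return UNAVAILABLE;
    }
    return cpuUsagePercentPerCore / numProcessors;
  }

  public static void main(String[] args) {
    System.out.println(cpuUsageTotalCoresPercentage(UNAVAILABLE, 8)); // -1.0
    System.out.println(cpuUsageTotalCoresPercentage(160.0f, 8));      // 20.0
  }
}
{code}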

> CPU Usage Metric is not captured properly in YARN-2928
> --
>
> Key: YARN-4712
> URL: https://issues.apache.org/jira/browse/YARN-4712
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Naganarasimha G R
>Assignee: Naganarasimha G R
>  Labels: yarn-2928-1st-milestone
> Attachments: YARN-4712-YARN-2928.v1.001.patch, 
> YARN-4712-YARN-2928.v1.002.patch, YARN-4712-YARN-2928.v1.003.patch
>
>
> There are 2 issues with CPU usage collection:
> * I was able to observe that many times the CPU usage obtained from 
> {{pTree.getCpuUsagePercent()}} is 
> ResourceCalculatorProcessTree.UNAVAILABLE (i.e. -1), but ContainersMonitor 
> still does the calculation {{cpuUsageTotalCoresPercentage = 
> cpuUsagePercentPerCore / resourceCalculatorPlugin.getNumProcessors()}}, 
> because of which the UNAVAILABLE check in 
> {{NMTimelinePublisher.reportContainerResourceUsage}} is never reached, so 
> proper checks need to be added.
> * {{EntityColumnPrefix.METRIC}} always uses LongConverter, but 
> ContainerMonitor publishes decimal values for the CPU usage.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4749) Generalize config file handling in container-executor

2016-03-03 Thread Sidharta Seethana (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4749?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15178107#comment-15178107
 ] 

Sidharta Seethana commented on YARN-4749:
-

Thanks, [~vvasudev]. These issues existed before my changes and I missed fixing 
them. New patch coming up.

> Generalize config file handling in container-executor
> -
>
> Key: YARN-4749
> URL: https://issues.apache.org/jira/browse/YARN-4749
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager
>Reporter: Sidharta Seethana
>Assignee: Sidharta Seethana
> Attachments: YARN-4749.001.patch
>
>
> The current implementation of container-executor already supports parsing of 
> key value pairs from a config file. However, it is currently restricted to 
> {{container-executor.cfg}} and cannot be reused for parsing additional 
> config/command files. Generalizing this is a required step for YARN-4245.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4712) CPU Usage Metric is not captured properly in YARN-2928

2016-03-03 Thread Sunil G (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4712?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15178065#comment-15178065
 ] 

Sunil G commented on YARN-4712:
---

[~varun_saxena], I was also planning to handle this problem from its root 
cause, which is on the NM reporting side, as part of YARN-4308. Ideally, to me, 
it's not good to send UNAVAILABLE, because we send it for every first reading. 
In some error cases, when there are no readings at all, I think we may still 
need this error code. But I would like to fix sending -1 in the first response 
at least. 

> CPU Usage Metric is not captured properly in YARN-2928
> --
>
> Key: YARN-4712
> URL: https://issues.apache.org/jira/browse/YARN-4712
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Naganarasimha G R
>Assignee: Naganarasimha G R
>  Labels: yarn-2928-1st-milestone
> Attachments: YARN-4712-YARN-2928.v1.001.patch, 
> YARN-4712-YARN-2928.v1.002.patch, YARN-4712-YARN-2928.v1.003.patch
>
>
> There are 2 issues with CPU usage collection:
> * I was able to observe that many times the CPU usage obtained from 
> {{pTree.getCpuUsagePercent()}} is 
> ResourceCalculatorProcessTree.UNAVAILABLE (i.e. -1), but ContainersMonitor 
> still does the calculation {{cpuUsageTotalCoresPercentage = 
> cpuUsagePercentPerCore / resourceCalculatorPlugin.getNumProcessors()}}, 
> because of which the UNAVAILABLE check in 
> {{NMTimelinePublisher.reportContainerResourceUsage}} is never reached, so 
> proper checks need to be added.
> * {{EntityColumnPrefix.METRIC}} always uses LongConverter, but 
> ContainerMonitor publishes decimal values for the CPU usage.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4749) Generalize config file handling in container-executor

2016-03-03 Thread Varun Vasudev (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4749?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15178053#comment-15178053
 ] 

Varun Vasudev commented on YARN-4749:
-

Thanks for the patch, [~sidharta-s]. It looks mostly good. One 
formatting/indentation fix:

{code}
+if(cfg->confdetails[cfg->size] )
+cfg->size++;
{code}

Please fix the formatting of the if condition, and fix the indentation of the 
increment statement. I would prefer it if you added braces, but that's my 
personal choice.

> Generalize config file handling in container-executor
> -
>
> Key: YARN-4749
> URL: https://issues.apache.org/jira/browse/YARN-4749
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager
>Reporter: Sidharta Seethana
>Assignee: Sidharta Seethana
> Attachments: YARN-4749.001.patch
>
>
> The current implementation of container-executor already supports parsing of 
> key value pairs from a config file. However, it is currently restricted to 
> {{container-executor.cfg}} and cannot be reused for parsing additional 
> config/command files. Generalizing this is a required step for YARN-4245.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4712) CPU Usage Metric is not captured properly in YARN-2928

2016-03-03 Thread Varun Saxena (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4712?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15178055#comment-15178055
 ] 

Varun Saxena commented on YARN-4712:


[~Naganarasimha],
The UNAVAILABLE-related issue exists in trunk. Shouldn't we fix all issues 
related to UNAVAILABLE as part of a JIRA in trunk? YARN-4308 was raised 
specifically for this purpose, if I am not wrong. I was seeing this JIRA more 
as handling the issue of sending a float as the CPU metric and just bringing in 
the YARN-4308 fix in the interim.
However, I am open to fixing it here. But this can duplicate effort and cause a 
clash if the JIRA in trunk goes in before this JIRA.
Thoughts, [~djp]?

> CPU Usage Metric is not captured properly in YARN-2928
> --
>
> Key: YARN-4712
> URL: https://issues.apache.org/jira/browse/YARN-4712
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Naganarasimha G R
>Assignee: Naganarasimha G R
>  Labels: yarn-2928-1st-milestone
> Attachments: YARN-4712-YARN-2928.v1.001.patch, 
> YARN-4712-YARN-2928.v1.002.patch, YARN-4712-YARN-2928.v1.003.patch
>
>
> There are 2 issues with CPU usage collection:
> * I was able to observe that many times the CPU usage obtained from 
> {{pTree.getCpuUsagePercent()}} is 
> ResourceCalculatorProcessTree.UNAVAILABLE (i.e. -1), but ContainersMonitor 
> still does the calculation {{cpuUsageTotalCoresPercentage = 
> cpuUsagePercentPerCore / resourceCalculatorPlugin.getNumProcessors()}}, 
> because of which the UNAVAILABLE check in 
> {{NMTimelinePublisher.reportContainerResourceUsage}} is never reached, so 
> proper checks need to be added.
> * {{EntityColumnPrefix.METRIC}} always uses LongConverter, but 
> ContainerMonitor publishes decimal values for the CPU usage.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4311) Removing nodes from include and exclude lists will not remove them from decommissioned nodes list

2016-03-03 Thread Kuhu Shukla (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4311?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kuhu Shukla updated YARN-4311:
--
Attachment: YARN-4311-v11.patch

Re-attaching the v11 patch with no changes to trigger another pre-commit, since 
the TestResourceTrackerService failures are not reproducible locally and, from 
investigation, seem related to the sleep-based wait. Need to see if this 
failure is consistent. Also checked that it applies cleanly to trunk.

> Removing nodes from include and exclude lists will not remove them from 
> decommissioned nodes list
> -
>
> Key: YARN-4311
> URL: https://issues.apache.org/jira/browse/YARN-4311
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.6.1
>Reporter: Kuhu Shukla
>Assignee: Kuhu Shukla
> Attachments: YARN-4311-v1.patch, YARN-4311-v10.patch, 
> YARN-4311-v11.patch, YARN-4311-v11.patch, YARN-4311-v2.patch, 
> YARN-4311-v3.patch, YARN-4311-v4.patch, YARN-4311-v5.patch, 
> YARN-4311-v6.patch, YARN-4311-v7.patch, YARN-4311-v8.patch, YARN-4311-v9.patch
>
>
> In order to fully forget about a node, removing the node from the include and 
> exclude lists is not sufficient. The RM still lists it under decommissioned 
> nodes. The tricky part that [~jlowe] pointed out was the case when include 
> lists are not used; in that case we don't want the nodes to fall off if they 
> are not active.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4712) CPU Usage Metric is not captured properly in YARN-2928

2016-03-03 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4712?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15177970#comment-15177970
 ] 

Hadoop QA commented on YARN-4712:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 19s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 
0s {color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 11m 
4s {color} | {color:green} YARN-2928 passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 32s 
{color} | {color:green} YARN-2928 passed with JDK v1.8.0_72 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 33s 
{color} | {color:green} YARN-2928 passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
20s {color} | {color:green} YARN-2928 passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 36s 
{color} | {color:green} YARN-2928 passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
16s {color} | {color:green} YARN-2928 passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 
14s {color} | {color:green} YARN-2928 passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 26s 
{color} | {color:green} YARN-2928 passed with JDK v1.8.0_72 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 29s 
{color} | {color:green} YARN-2928 passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 
30s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 29s 
{color} | {color:green} the patch passed with JDK v1.8.0_72 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 29s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 29s 
{color} | {color:green} the patch passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 29s 
{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 0m 15s 
{color} | {color:red} 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager:
 patch generated 1 new + 20 unchanged - 4 fixed = 21 total (was 24) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 32s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
12s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 
0s {color} | {color:green} Patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 
22s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 23s 
{color} | {color:green} the patch passed with JDK v1.8.0_72 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 24s 
{color} | {color:green} the patch passed with JDK v1.7.0_95 {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 9m 16s {color} 
| {color:red} hadoop-yarn-server-nodemanager in the patch failed with JDK 
v1.8.0_72. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 9m 20s {color} 
| {color:red} hadoop-yarn-server-nodemanager in the patch failed with JDK 
v1.7.0_95. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 
19s {color} | {color:green} Patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 40m 34s {color} 
| {color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker |  Image:yetus/hadoop:0ca8df7 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12791179/YARN-4712-YARN-2928.v1.003.patch
 |
| JIRA Issue | YARN-4712 |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  
unit  findbugs  checkstyle  |
| uname | Linux 60f598883cac 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed 
Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | 

[jira] [Updated] (YARN-4712) CPU Usage Metric is not captured properly in YARN-2928

2016-03-03 Thread Naganarasimha G R (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4712?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Naganarasimha G R updated YARN-4712:

Attachment: YARN-4712-YARN-2928.v1.003.patch

Hi [~djp] & [~varun_saxena],
Please find the latest patch addressing the comments. Additionally, I have 
tried to take care of all the other places where -1 
*(ResourceCalculatorProcessTree.UNAVAILABLE)* could end up being used in 
calculations. Please review.
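
For illustration, a minimal sketch of the kind of guard intended, using the 
variable names from the description below (this is a sketch, not the actual 
patch):
{code}
// Skip the per-core division whenever the process tree reports UNAVAILABLE (-1),
// so that -1 never leaks into derived metrics.
float cpuUsagePercentPerCore = pTree.getCpuUsagePercent();
float cpuUsageTotalCoresPercentage =
    cpuUsagePercentPerCore == ResourceCalculatorProcessTree.UNAVAILABLE
        ? ResourceCalculatorProcessTree.UNAVAILABLE
        : cpuUsagePercentPerCore / resourceCalculatorPlugin.getNumProcessors();
// Publishers (e.g. NMTimelinePublisher.reportContainerResourceUsage) can then
// check for UNAVAILABLE before reporting the value.
{code}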

> CPU Usage Metric is not captured properly in YARN-2928
> --
>
> Key: YARN-4712
> URL: https://issues.apache.org/jira/browse/YARN-4712
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Naganarasimha G R
>Assignee: Naganarasimha G R
>  Labels: yarn-2928-1st-milestone
> Attachments: YARN-4712-YARN-2928.v1.001.patch, 
> YARN-4712-YARN-2928.v1.002.patch, YARN-4712-YARN-2928.v1.003.patch
>
>
> There are 2 issues with CPU usage collection:
> * Many times the CPU usage obtained from {{pTree.getCpuUsagePercent()}} is 
> ResourceCalculatorProcessTree.UNAVAILABLE (i.e. -1), but ContainersMonitor 
> still performs the calculation {{cpuUsageTotalCoresPercentage = 
> cpuUsagePercentPerCore / resourceCalculatorPlugin.getNumProcessors()}}, 
> because of which the UNAVAILABLE check in 
> {{NMTimelinePublisher.reportContainerResourceUsage}} is never hit. Proper 
> checks need to be added.
> * {{EntityColumnPrefix.METRIC}} always uses LongConverter, but 
> ContainersMonitor publishes decimal values for CPU usage.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4737) Use CSRF Filter in YARN

2016-03-03 Thread Jonathan Maron (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4737?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Maron updated YARN-4737:
-
Attachment: YARN-4737.003.patch

> Use CSRF Filter in YARN
> ---
>
> Key: YARN-4737
> URL: https://issues.apache.org/jira/browse/YARN-4737
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager, resourcemanager, webapp
>Reporter: Jonathan Maron
>Assignee: Jonathan Maron
> Attachments: YARN-4737.001.patch, YARN-4737.002.patch, 
> YARN-4737.003.patch
>
>
> A CSRF filter was added to hadoop common 
> (https://issues.apache.org/jira/browse/HADOOP-12691).  The aim of this JIRA 
> is to come up with a mechanism to integrate this filter into the webapps for 
> which it is applicable (web apps that may establish an authenticated 
> identity).  That includes the RM, NM, and mapreduce jobhistory web app.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4744) Too many signal to container failure in case of LCE

2016-03-03 Thread Jason Lowe (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4744?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15177872#comment-15177872
 ] 

Jason Lowe commented on YARN-4744:
--

Thanks for the patch!

bq. In addition, logging in PrivilegedOperationExecutor includes information 
that isn't necessarily available when the exception is propagated. 

That problem is solved by having the throwing code either encode that 
information in the exception message or add the necessary fields to the 
exception class, allowing the error handler to retrieve them as needed.  If the 
throwing code can create an appropriate log message then it can put that same 
information in the exception.  There's already a custom exception for these 
errors, so it would be easy to add things like the full command line, etc.  I 
still think the code handling the error is the real problem if we're missing 
appropriate logs, but I don't feel strongly enough about it to block the patch 
if others prefer leaving the log-then-throw logic in place.
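
Purely as an illustration of that suggestion, a sketch of carrying the command 
line in the exception (the field, constructor and accessor below are 
assumptions, not the current PrivilegedOperationException API):
{code}
// Illustrative sketch only -- the real class already has other constructors
// and fields that are not shown here.
public class PrivilegedOperationException extends YarnException {
  private final String fullCommand;   // e.g. the container-executor invocation

  public PrivilegedOperationException(Throwable cause, String fullCommand) {
    super(cause);
    this.fullCommand = fullCommand;
  }

  /** Lets the error handler log the command line at the point of handling. */
  public String getFullCommand() {
    return fullCommand;
  }
}
{code}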

Comments on the patch:

Don't we need to update DockerLinuxContainerRuntime in a similar manner?  I 
think we'll have the same issue there.

PrivilegedOperation should have a constructor that just takes an opType 
parameter and the other constructors should be implemented in terms of it.  
That eliminates the duplicate code maintenance pitfalls and avoids doing odd 
things like passing nulls as standard practice.
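
A rough sketch of the constructor chaining being suggested (simplified; the 
real PrivilegedOperation class carries more state and operation types than 
shown here):
{code}
import java.util.ArrayList;
import java.util.List;

// Simplified sketch: a single base constructor taking only the opType, with
// the other constructors delegating to it instead of duplicating the
// initialization or passing nulls.
public class PrivilegedOperation {
  public enum OperationType { SIGNAL_CONTAINER /* , ... */ }

  private final OperationType opType;
  private final List<String> args = new ArrayList<>();

  public PrivilegedOperation(OperationType opType) {
    this.opType = opType;
  }

  public PrivilegedOperation(OperationType opType, String arg) {
    this(opType);
    if (arg != null) {
      args.add(arg);
    }
  }
}
{code}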


> Too many signal to container failure in case of LCE
> ---
>
> Key: YARN-4744
> URL: https://issues.apache.org/jira/browse/YARN-4744
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.9.0
>Reporter: Bibin A Chundatt
>Assignee: Sidharta Seethana
> Attachments: YARN-4744.001.patch
>
>
> Install an HA cluster in secure mode
> Enable LCE with cgroups
> Start the servers as the dsperf user
> Submit a mapreduce application (terasort/teragen) as user yarn/dsperf
> Too many "signal to container" failures are observed
> When submitting as that user, the exception below is thrown
> {noformat}
> 2014-03-02 09:20:38,689 INFO 
> SecurityLogger.org.apache.hadoop.security.authorize.ServiceAuthorizationManager:
>  Authorization successful for testing (auth:TOKEN) for protocol=interface 
> org.apache.hadoop.yarn.server.nodemanager.api.LocalizationProtocolPB
> 2014-03-02 09:20:40,158 WARN 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl:
>  Event EventType: KILL_CONTAINER sent to absent container 
> container_e02_1393731146548_0001_01_13
> 2014-03-02 09:20:43,071 INFO 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch:
>  Container container_e02_1393731146548_0001_01_09 succeeded
> 2014-03-02 09:20:43,072 INFO 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerImpl:
>  Container container_e02_1393731146548_0001_01_09 transitioned from 
> RUNNING to EXITED_WITH_SUCCESS
> 2014-03-02 09:20:43,073 INFO 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch:
>  Cleaning up container container_e02_1393731146548_0001_01_09
> 2014-03-02 09:20:43,075 INFO 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.runtime.DelegatingLinuxContainerRuntime:
>  Using container runtime: DefaultLinuxContainerRuntime
> 2014-03-02 09:20:43,081 WARN 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.privileged.PrivilegedOperationExecutor:
>  Shell execution returned exit code: 9. Privileged Execution Operation Output:
> main : command provided 2
> main : run as user is yarn
> main : requested yarn user is yarn
> Full command array for failed execution:
> [/opt/bibin/dsperf/HAINSTALL/install/hadoop/nodemanager/bin/container-executor,
>  yarn, yarn, 2, 9370, 15]
> 2014-03-02 09:20:43,081 WARN 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.runtime.DefaultLinuxContainerRuntime:
>  Signal container failed. Exception:
> org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.privileged.PrivilegedOperationException:
>  ExitCodeException exitCode=9:
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.privileged.PrivilegedOperationExecutor.executePrivilegedOperation(PrivilegedOperationExecutor.java:173)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.runtime.DefaultLinuxContainerRuntime.signalContainer(DefaultLinuxContainerRuntime.java:132)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.runtime.DelegatingLinuxContainerRuntime.signalContainer(DelegatingLinuxContainerRuntime.java:109)
> at 
> org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.signalContainer(LinuxContainerExecutor.java:513)
> at 
> 

[jira] [Commented] (YARN-4700) ATS storage has one extra record each time the RM got restarted

2016-03-03 Thread Varun Saxena (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4700?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15177826#comment-15177826
 ] 

Varun Saxena commented on YARN-4700:


Thanks [~Naganarasimha] for the latest patch. 
+1, looks good to me.

I will wait for a while so that others can comment if they have any. I will 
commit it later today.

> ATS storage has one extra record each time the RM got restarted
> ---
>
> Key: YARN-4700
> URL: https://issues.apache.org/jira/browse/YARN-4700
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Affects Versions: YARN-2928
>Reporter: Li Lu
>Assignee: Naganarasimha G R
>  Labels: yarn-2928-1st-milestone
> Attachments: YARN-4700-YARN-2928.v1.001.patch, 
> YARN-4700-YARN-2928.v1.002.patch, YARN-4700-YARN-2928.v1.003.patch, 
> YARN-4700-YARN-2928.v1.004.patch, YARN-4700-YARN-2928.wip.patch
>
>
> When testing the new web UI for ATS v2, I noticed that we're creating one 
> extra record for each finished application (but still hold in the RM state 
> store) each time the RM got restarted. It's quite possible that we add the 
> cluster start timestamp into the default cluster id, thus each time we're 
> creating a new record for one application (cluster id is a part of the row 
> key). We need to fix this behavior, probably by having a better default 
> cluster id. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4602) Scalable and Simple Message Service for YARN application

2016-03-03 Thread Junping Du (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4602?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Junping Du updated YARN-4602:
-
Summary: Scalable and Simple Message Service for YARN application  (was: 
Message/notification service between containers)

> Scalable and Simple Message Service for YARN application
> 
>
> Key: YARN-4602
> URL: https://issues.apache.org/jira/browse/YARN-4602
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: applications, resourcemanager
>Reporter: Junping Du
>Assignee: Junping Du
>
> Currently, most communication among YARN daemons, services and 
> applications goes through RPC. In almost all cases, the logic running inside 
> containers acts as an RPC client but not a server, because it gets launched 
> in flight. 
> The only special case is the AM container: because it gets launched earlier 
> than any other container, it can act as an RPC server and tell newly launched 
> containers the server address in application logic (like the MR AM). 
> The side effects are: 
> 1. When the AM container fails, the new AM attempt gets launched with a 
> new address/port, so previous RPC connections are broken.
> 2. Applications' requirements vary, and there can be other dependencies 
> between containers (not just the AM), so the failover of one container can 
> affect other containers' running logic.
> It would be better to have some message/notification mechanism between 
> containers to handle the above cases.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4737) Use CSRF Filter in YARN

2016-03-03 Thread Varun Vasudev (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4737?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15177543#comment-15177543
 ] 

Varun Vasudev commented on YARN-4737:
-

Thanks for the updated patch, Jon. A few more fixes are required:

1) In WebApps.java -
{code}
+Map params = getCsrfConfigParameters();
+if (hasCSRFEnabled(params)) {
+  LOG.info("CSRF Protection has been enabled for the {} application. "
+  + "Please ensure that there is an authentication mechanism "
+  + "enabled (kerberos, custom, etc).",
+  name);
+  String restCsrfClassName = RestCsrfPreventionFilter.class.getName();
+  HttpServer2.defineFilter(server.getWebAppContext(), 
restCsrfClassName,
+  restCsrfClassName, params,
+  new String[] {"/*"});
+}
{code}
should be before
{code}
 HttpServer2.defineFilter(server.getWebAppContext(), "guice",
   GuiceFilter.class.getName(), null, new String[] { "/*" });
{code}

The guice filter dispatches the request to the appropriate handler, so the 
requests get executed before ever going through the CSRF filter (see the sketch 
after these comments).

2) The JHS configs in mapred-default.xml start with the prefix - 
mapreduce.jobhistory.webapp but the prefix used in code is mapreduce.jobhistory 
(no webapp) - I think you need to create a mapreduce.jobhistory.webapp prefix 
in the code.

3) In yarn-default.xml, all the timeline service configs have an extra "." in 
them after "yarn.timeline-service". e.g. 
yarn.timeline-service..webapp.rest-csrf.methods-to-ignore

The failing tests and ASF warnings are unrelated to the patch.
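
To make (1) concrete, a sketch of the intended registration order, using the 
same calls as in the snippets above (filter parameters elided):
{code}
// Register the CSRF filter first so requests pass through it before the guice
// filter dispatches them to their handlers.
HttpServer2.defineFilter(server.getWebAppContext(), restCsrfClassName,
    restCsrfClassName, params, new String[] {"/*"});
HttpServer2.defineFilter(server.getWebAppContext(), "guice",
    GuiceFilter.class.getName(), null, new String[] {"/*"});
{code}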

> Use CSRF Filter in YARN
> ---
>
> Key: YARN-4737
> URL: https://issues.apache.org/jira/browse/YARN-4737
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager, resourcemanager, webapp
>Reporter: Jonathan Maron
>Assignee: Jonathan Maron
> Attachments: YARN-4737.001.patch, YARN-4737.002.patch
>
>
> A CSRF filter was added to hadoop common 
> (https://issues.apache.org/jira/browse/HADOOP-12691).  The aim of this JIRA 
> is to come up with a mechanism to integrate this filter into the webapps for 
> which it is applicable (web apps that may establish an authenticated 
> identity).  That includes the RM, NM, and mapreduce jobhistory web app.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3863) Support complex filters in TimelineReader

2016-03-03 Thread Varun Saxena (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3863?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15177526#comment-15177526
 ] 

Varun Saxena commented on YARN-3863:


Furthermore, in ATSv1 we had something called secondary filters, which is along 
the lines of our filters (at least similar to info filters). It used to check 
the other info fields in an entity for a match.
Even there, it was not mandatory to specify OTHER_INFO in fields to retrieve 
for the secondary filters to match.
Not saying that we have to do what ATSv1 did, just letting you know what was 
done there.

We can discuss further and take a final decision on this in today's meeting.

> Support complex filters in TimelineReader
> -
>
> Key: YARN-3863
> URL: https://issues.apache.org/jira/browse/YARN-3863
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Affects Versions: YARN-2928
>Reporter: Varun Saxena
>Assignee: Varun Saxena
>  Labels: yarn-2928-1st-milestone
> Attachments: YARN-3863-YARN-2928.v2.01.patch, 
> YARN-3863-YARN-2928.v2.02.patch, YARN-3863-YARN-2928.v2.03.patch, 
> YARN-3863-feature-YARN-2928.wip.003.patch, 
> YARN-3863-feature-YARN-2928.wip.01.patch, 
> YARN-3863-feature-YARN-2928.wip.02.patch, 
> YARN-3863-feature-YARN-2928.wip.04.patch, 
> YARN-3863-feature-YARN-2928.wip.05.patch
>
>
> Currently filters in timeline reader will return an entity only if all the 
> filter conditions hold true i.e. only AND operation is supported. We can 
> support OR operation for the filters as well. Additionally as primary backend 
> implementation is HBase, we can design our filters in a manner, where they 
> closely resemble HBase Filters.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3863) Support complex filters in TimelineReader

2016-03-03 Thread Varun Saxena (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3863?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15177521#comment-15177521
 ] 

Varun Saxena commented on YARN-3863:


Moreover, info also has associated info as well. 
- Sorry, meant "Moreover, events may have associated info as well."



> Support complex filters in TimelineReader
> -
>
> Key: YARN-3863
> URL: https://issues.apache.org/jira/browse/YARN-3863
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Affects Versions: YARN-2928
>Reporter: Varun Saxena
>Assignee: Varun Saxena
>  Labels: yarn-2928-1st-milestone
> Attachments: YARN-3863-YARN-2928.v2.01.patch, 
> YARN-3863-YARN-2928.v2.02.patch, YARN-3863-YARN-2928.v2.03.patch, 
> YARN-3863-feature-YARN-2928.wip.003.patch, 
> YARN-3863-feature-YARN-2928.wip.01.patch, 
> YARN-3863-feature-YARN-2928.wip.02.patch, 
> YARN-3863-feature-YARN-2928.wip.04.patch, 
> YARN-3863-feature-YARN-2928.wip.05.patch
>
>
> Currently filters in timeline reader will return an entity only if all the 
> filter conditions hold true i.e. only AND operation is supported. We can 
> support OR operation for the filters as well. Additionally as primary backend 
> implementation is HBase, we can design our filters in a manner, where they 
> closely resemble HBase Filters.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3863) Support complex filters in TimelineReader

2016-03-03 Thread Varun Saxena (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3863?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15177509#comment-15177509
 ] 

Varun Saxena commented on YARN-3863:


Thanks [~sjlee0] for the review.

bq. One high level question: am I correct in understanding that if a relations 
filter is specified for example but relation was not specified as part of 
fields to retrieve, we would try to fetch the relation?
Yes, we would try to fetch only those relations which are required to match the 
relation filters. Same goes for event filters. We will try to fetch only those 
events which are required to match event filters if fields to retrieve does not 
specify EVENTS.

bq. What if we simply reject or ignore the filters if they do not match the 
fields to retrieve? Would it make the implementation simpler or harder?
It would remove the need for some of the code in GenericEntityReader and 
ApplicationEntity, primarily the methods 
{{fetchPartialColsFromInfoFamily}} and {{createFilterListForColsOfInfoFamily}}.

bq. To me, supporting more contents even if the filters and the fields to 
retrieve are not consistent seems very much optional, and I'm not sure if it is 
worth it especially if it adds a lot more complexity. What do you think?
Personally I think fields to retrieve and filters should be treated separately. 
Filters decide which entities are carried back in the response, and the 
fields/configs/metrics to retrieve decide what is carried in each entity.
Treating filters and fields to retrieve separately is consistent with the code 
written previously in the branch, but as this is new code we can change the 
behavior too. I am not very sure we should, though.
For instance, if I want to get the IDs of all the FINISHED apps, I can make a 
query with eventfilters set to APPLICATION_FINISHED and not specify anything in 
fields to retrieve, as I am only interested in the application IDs. If I tie it 
to fields to retrieve, I will have to unnecessarily fetch other events as well, 
which I have no interest in. This also increases the number of bytes 
transferred across the wire. Moreover, info also has associated info as well. 
Maybe, along the lines of confs/metrics to retrieve, we could have something 
like events to retrieve as well, but in all these cases one query param depends 
on another, which doesn't sound right to me.
Thoughts?
We can discuss further on this in today's meeting.

bq. I know Vrushali C had some thoughts on how to split this monolithic 
TestHBaseTimelineStorage. It might be good to come to a consensus on how to 
split it...
Ok. I had split it across apps and entities. We can seek her opinion too on 
this in today's meeting.

I will check the other comments when I start coding the next version of the 
patch. Most sound valid and fixable.

> Support complex filters in TimelineReader
> -
>
> Key: YARN-3863
> URL: https://issues.apache.org/jira/browse/YARN-3863
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Affects Versions: YARN-2928
>Reporter: Varun Saxena
>Assignee: Varun Saxena
>  Labels: yarn-2928-1st-milestone
> Attachments: YARN-3863-YARN-2928.v2.01.patch, 
> YARN-3863-YARN-2928.v2.02.patch, YARN-3863-YARN-2928.v2.03.patch, 
> YARN-3863-feature-YARN-2928.wip.003.patch, 
> YARN-3863-feature-YARN-2928.wip.01.patch, 
> YARN-3863-feature-YARN-2928.wip.02.patch, 
> YARN-3863-feature-YARN-2928.wip.04.patch, 
> YARN-3863-feature-YARN-2928.wip.05.patch
>
>
> Currently filters in timeline reader will return an entity only if all the 
> filter conditions hold true i.e. only AND operation is supported. We can 
> support OR operation for the filters as well. Additionally as primary backend 
> implementation is HBase, we can design our filters in a manner, where they 
> closely resemble HBase Filters.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4754) Too many connection opened to TimelineServer while publishing entities

2016-03-03 Thread Rohith Sharma K S (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4754?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15177506#comment-15177506
 ] 

Rohith Sharma K S commented on YARN-4754:
-

bq. I still see 2 places where we are not closing ClientResponse, when we call 
putDomain and in doPosting if response is not 200 OK.
It looks like this is the case. After RM recovery completes, timeline entities 
are published in the background. During this span of time, if the timeline 
server is restarted or is down for some time, many connections can be seen 
stuck in the CLOSE_WAIT state.
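
For illustration, a rough sketch of the kind of fix implied, i.e. always 
releasing the Jersey ClientResponse (the helper name below is an assumption, 
not the exact client code):
{code}
// Rough sketch only: release the response on every path, including putDomain
// and the non-200 branch of doPosting, so pooled connections do not linger in
// CLOSE_WAIT.
ClientResponse resp = null;
try {
  resp = doPostingObject(entity, path);   // assumed helper, not the exact signature
  if (resp == null || resp.getStatus() != 200) {
    // log / retry as before
  }
} finally {
  if (resp != null) {
    resp.close();                         // frees the underlying connection
  }
}
{code}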

> Too many connection opened to TimelineServer while publishing entities
> --
>
> Key: YARN-4754
> URL: https://issues.apache.org/jira/browse/YARN-4754
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Rohith Sharma K S
>Priority: Critical
> Attachments: ConnectionLeak.rar
>
>
> It is observed that too many connections are kept open to the TimelineServer 
> while publishing entities via SystemMetricsPublisher. This sometimes causes a 
> resource shortage for other processes or the RM itself.
> {noformat}
> tcp0  0 10.18.99.110:3999   10.18.214.60:59265  
> ESTABLISHED 115302/java 
> tcp0  0 10.18.99.110:25001  :::*LISTEN
>   115302/java 
> tcp0  0 10.18.99.110:25002  :::*LISTEN
>   115302/java 
> tcp0  0 10.18.99.110:25003  :::*LISTEN
>   115302/java 
> tcp0  0 10.18.99.110:25004  :::*LISTEN
>   115302/java 
> tcp0  0 10.18.99.110:25005  :::*LISTEN
>   115302/java 
> tcp1  0 10.18.99.110:48866  10.18.99.110:8188   
> CLOSE_WAIT  115302/java 
> tcp1  0 10.18.99.110:48137  10.18.99.110:8188   
> CLOSE_WAIT  115302/java 
> tcp1  0 10.18.99.110:47553  10.18.99.110:8188   
> CLOSE_WAIT  115302/java 
> tcp1  0 10.18.99.110:48424  10.18.99.110:8188   
> CLOSE_WAIT  115302/java 
> tcp1  0 10.18.99.110:48139  10.18.99.110:8188   
> CLOSE_WAIT  115302/java 
> tcp1  0 10.18.99.110:48096  10.18.99.110:8188   
> CLOSE_WAIT  115302/java 
> tcp1  0 10.18.99.110:47558  10.18.99.110:8188   
> CLOSE_WAIT  115302/java 
> tcp1  0 10.18.99.110:49270  10.18.99.110:8188   
> CLOSE_WAIT  115302/java 
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4754) Too many connection opened to TimelineServer while publishing entities

2016-03-03 Thread Rohith Sharma K S (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4754?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15177478#comment-15177478
 ] 

Rohith Sharma K S commented on YARN-4754:
-

Looking at the ATS logs, there were no exceptions. In the RM logs, there was 
the exception I mentioned in the first comment, {{SocketException: Too many 
open files}}.
I am recovering the applications once again, and will check for CLOSE_WAIT 
connections.

> Too many connection opened to TimelineServer while publishing entities
> --
>
> Key: YARN-4754
> URL: https://issues.apache.org/jira/browse/YARN-4754
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Rohith Sharma K S
>Priority: Critical
> Attachments: ConnectionLeak.rar
>
>
> It is observed that too many connections are kept open to the TimelineServer 
> while publishing entities via SystemMetricsPublisher. This sometimes causes a 
> resource shortage for other processes or the RM itself.
> {noformat}
> tcp0  0 10.18.99.110:3999   10.18.214.60:59265  
> ESTABLISHED 115302/java 
> tcp0  0 10.18.99.110:25001  :::*LISTEN
>   115302/java 
> tcp0  0 10.18.99.110:25002  :::*LISTEN
>   115302/java 
> tcp0  0 10.18.99.110:25003  :::*LISTEN
>   115302/java 
> tcp0  0 10.18.99.110:25004  :::*LISTEN
>   115302/java 
> tcp0  0 10.18.99.110:25005  :::*LISTEN
>   115302/java 
> tcp1  0 10.18.99.110:48866  10.18.99.110:8188   
> CLOSE_WAIT  115302/java 
> tcp1  0 10.18.99.110:48137  10.18.99.110:8188   
> CLOSE_WAIT  115302/java 
> tcp1  0 10.18.99.110:47553  10.18.99.110:8188   
> CLOSE_WAIT  115302/java 
> tcp1  0 10.18.99.110:48424  10.18.99.110:8188   
> CLOSE_WAIT  115302/java 
> tcp1  0 10.18.99.110:48139  10.18.99.110:8188   
> CLOSE_WAIT  115302/java 
> tcp1  0 10.18.99.110:48096  10.18.99.110:8188   
> CLOSE_WAIT  115302/java 
> tcp1  0 10.18.99.110:47558  10.18.99.110:8188   
> CLOSE_WAIT  115302/java 
> tcp1  0 10.18.99.110:49270  10.18.99.110:8188   
> CLOSE_WAIT  115302/java 
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)