[jira] [Commented] (YARN-4734) Merge branch:YARN-3368 to trunk

2016-03-01 Thread Chris Douglas (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4734?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15173427#comment-15173427
 ] 

Chris Douglas commented on YARN-4734:
-

bq. For merge it at the top level, did you mean LICENSE.txt and BUILDING.txt? 
Are there any other files I need to change?

{{NOTICE.txt}} may also need to be updated. No worries on the WIP, we can do a 
pass on the docs when it's ready to merge.

> Merge branch:YARN-3368 to trunk
> ---
>
> Key: YARN-4734
> URL: https://issues.apache.org/jira/browse/YARN-4734
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Wangda Tan
>Assignee: Wangda Tan
> Attachments: YARN-4734.1.patch, YARN-4734.2.patch, YARN-4734.3.patch
>
>
> The YARN-2928 branch is planned to be merged back to trunk shortly, and it 
> depends on the changes in YARN-3368. This JIRA is to track the merge task.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4700) ATS storage has one extra record each time the RM got restarted

2016-03-01 Thread Varun Saxena (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4700?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15173510#comment-15173510
 ] 

Varun Saxena commented on YARN-4700:


Thanks [~Naganarasimha] for the patch.
I have nothing further to add beyond what Vrushali said about the changes in 
the main code; I have the same comments.

Looked at the test failures.
For TestHBaseStorageFlowActivity, the FlowActivityRowKey constructor is used 
while parsing the row key, so I don't think we should change it.
I think we can just change the timestamps of the app events and, as Vrushali 
suggested, keep all the timestamps within one day, so that we can test that 
different apps on a single day generate one flow for that day. Currently 4 flow 
activity entries show up because the app event timestamps generate 4 different 
top-of-the-day timestamps.

For the other test case failure, i.e. in 
TestTimelineReaderWebServicesHBaseStorage, you will have to change the 
date-range queries because those REST queries are based on the current 
timestamp.
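To illustrate the day-bucketing point, here is a minimal, self-contained sketch; 
the helper name and the UTC-day truncation are assumptions for illustration, not 
the actual flow activity row-key code:

{code}
import java.util.concurrent.TimeUnit;

/** Illustrative only: shows why app events spread across days create multiple
 *  flow activity rows, while events within one day collapse into a single row. */
public class TopOfDayExample {
  private static final long MILLIS_PER_DAY = TimeUnit.DAYS.toMillis(1);

  /** Truncate an event timestamp to the start of its UTC day ("top of the day"). */
  static long topOfDay(long eventTimeMs) {
    return eventTimeMs - (eventTimeMs % MILLIS_PER_DAY);
  }

  public static void main(String[] args) {
    long dayStart = topOfDay(System.currentTimeMillis());
    // Keep all app event timestamps inside the same day...
    long created = dayStart + TimeUnit.MINUTES.toMillis(5);
    long finished = dayStart + TimeUnit.MINUTES.toMillis(45);
    // ...so they map to the same day bucket, i.e. one flow activity entry.
    System.out.println(topOfDay(created) == topOfDay(finished)); // prints: true
  }
}
{code}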




> ATS storage has one extra record each time the RM got restarted
> ---
>
> Key: YARN-4700
> URL: https://issues.apache.org/jira/browse/YARN-4700
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Affects Versions: YARN-2928
>Reporter: Li Lu
>Assignee: Naganarasimha G R
>  Labels: yarn-2928-1st-milestone
> Attachments: YARN-4700-YARN-2928.v1.001.patch, 
> YARN-4700-YARN-2928.wip.patch
>
>
> When testing the new web UI for ATS v2, I noticed that we're creating one 
> extra record for each finished application (still held in the RM state 
> store) each time the RM is restarted. It's quite possible that we add the 
> cluster start timestamp into the default cluster id, so each time we create 
> a new record for the same application (the cluster id is part of the row 
> key). We need to fix this behavior, probably by having a better default 
> cluster id. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4002) make ResourceTrackerService.nodeHeartbeat more concurrent

2016-03-01 Thread Rohith Sharma K S (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4002?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rohith Sharma K S updated YARN-4002:

Attachment: 0001-YARN-4002.patch

Updated the patch myself with a small correction.

> make ResourceTrackerService.nodeHeartbeat more concurrent
> -
>
> Key: YARN-4002
> URL: https://issues.apache.org/jira/browse/YARN-4002
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Hong Zhiguo
>Assignee: Hong Zhiguo
>Priority: Critical
> Attachments: 0001-YARN-4002.patch, YARN-4002-lockless-read.patch, 
> YARN-4002-rwlock.patch, YARN-4002-v0.patch
>
>
> We have multiple RPC threads to handle NodeHeartbeatRequest from NMs. By 
> design the method ResourceTrackerService.nodeHeartbeat should be concurrent 
> enough to scale for large clusters.
> But we have a "BIG" lock in NodesListManager.isValidNode which I think is 
> unnecessary.
> First, the fields "includes" and "excludes" of HostsFileReader are only 
> updated on "refresh nodes". All RPC threads handling node heartbeats are 
> only readers, so an RWLock could be used to allow concurrent access by the 
> RPC threads.
> Second, since the fields "includes" and "excludes" of HostsFileReader are 
> always updated by "reference assignment", which is atomic in Java, the 
> reader-side lock could even be skipped.
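A minimal sketch of the locking scheme described above; the class, field, and 
method names are illustrative stand-ins, not the actual 
NodesListManager/HostsFileReader code:

{code}
import java.util.Collections;
import java.util.Set;
import java.util.concurrent.locks.ReentrantReadWriteLock;

public class HostListsSketch {
  private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
  // Replaced only by whole-reference assignment under the write lock.
  private volatile Set<String> includes = Collections.emptySet();
  private volatile Set<String> excludes = Collections.emptySet();

  /** Heartbeat path: many RPC threads may read concurrently. */
  public boolean isValidNode(String host) {
    lock.readLock().lock();
    try {
      return (includes.isEmpty() || includes.contains(host))
          && !excludes.contains(host);
    } finally {
      lock.readLock().unlock();
    }
  }

  /** "refresh nodes" path: swaps in freshly built immutable sets. Helpers
   *  invoked while the write lock is held need not re-take the read lock. */
  public void refresh(Set<String> newIncludes, Set<String> newExcludes) {
    lock.writeLock().lock();
    try {
      includes = newIncludes;
      excludes = newExcludes;
    } finally {
      lock.writeLock().unlock();
    }
  }
}
{code}

As the second point in the description notes, because the fields are volatile 
and replaced atomically, the read lock in isValidNode could in principle be 
dropped entirely.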



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4002) make ResourceTrackerService.nodeHeartbeat more concurrent

2016-03-01 Thread Rohith Sharma K S (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4002?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15173557#comment-15173557
 ] 

Rohith Sharma K S commented on YARN-4002:
-

Recently we hit this issue in 2K-node testing. It would be good to get this 
into branch-2.8.
Nit on the patch: there is no need to take the read lock in 
printConfiguredHosts, since it is called from refreshHostsReader, which already 
holds the write lock.

> make ResourceTrackerService.nodeHeartbeat more concurrent
> -
>
> Key: YARN-4002
> URL: https://issues.apache.org/jira/browse/YARN-4002
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Hong Zhiguo
>Assignee: Hong Zhiguo
>Priority: Critical
> Attachments: 0001-YARN-4002.patch, YARN-4002-lockless-read.patch, 
> YARN-4002-rwlock.patch, YARN-4002-v0.patch
>
>
> We have multiple RPC threads to handle NodeHeartbeatRequest from NMs. By 
> design the method ResourceTrackerService.nodeHeartbeat should be concurrent 
> enough to scale for large clusters.
> But we have a "BIG" lock in NodesListManager.isValidNode which I think is 
> unnecessary.
> First, the fields "includes" and "excludes" of HostsFileReader are only 
> updated on "refresh nodes". All RPC threads handling node heartbeats are 
> only readers, so an RWLock could be used to allow concurrent access by the 
> RPC threads.
> Second, since the fields "includes" and "excludes" of HostsFileReader are 
> always updated by "reference assignment", which is atomic in Java, the 
> reader-side lock could even be skipped.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4746) yarn web services should convert parse failures of appId to 400

2016-03-01 Thread Steve Loughran (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4746?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15173560#comment-15173560
 ] 

Steve Loughran commented on YARN-4746:
--

Having played with this some more, I think it's probably wise to review all the 
uses of the conversion logic in the codebase; bits of it appear to assume that 
the return value is {{null}} if there's no match, rather than anything else.

Regarding the patch, -1 I'm afraid: it loses the stack trace. Look at what I've 
done in YARN-4696.

> yarn web services should convert parse failures of appId to 400
> ---
>
> Key: YARN-4746
> URL: https://issues.apache.org/jira/browse/YARN-4746
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: webapp
>Affects Versions: 2.8.0
>Reporter: Steve Loughran
>Priority: Minor
> Attachments: 0001-YARN-4746.patch
>
>
> I'm seeing somewhere in my WS API tests an error with the exception 
> conversion of a bad app ID sent in as an argument to a GET. I know it's in 
> ATS, but a scan of the core RM web services implies the same problem.
> {{WebServices.parseApplicationId()}} uses {{ConverterUtils.toApplicationId}} 
> to convert an argument; this throws IllegalArgumentException, which is then 
> handled somewhere by jetty as a 500 error.
> In fact, it's a bad argument, which should be handled by returning a 400. 
> This can be done by catching the raised exception and explicitly converting 
> it.


--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-1506) Replace set resource change on RMNode/SchedulerNode directly with event notification.

2016-03-01 Thread wanglei-it (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15173716#comment-15173716
 ] 

wanglei-it commented on YARN-1506:
--

Hi Junping Du, thanks for your work.
I tested this feature in HA. When switching from RM1 to RM2, all of the updated 
resource information is lost.
RM2 will recover the NM's resource configuration as it was at registration. 
Right?

> Replace set resource change on RMNode/SchedulerNode directly with event 
> notification.
> -
>
> Key: YARN-1506
> URL: https://issues.apache.org/jira/browse/YARN-1506
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: graceful, nodemanager, scheduler
>Reporter: Junping Du
>Assignee: Junping Du
> Attachments: YARN-1506-v1.patch, YARN-1506-v10.patch, 
> YARN-1506-v11.patch, YARN-1506-v12.patch, YARN-1506-v13.patch, 
> YARN-1506-v14.patch, YARN-1506-v15.patch, YARN-1506-v16.patch, 
> YARN-1506-v17.patch, YARN-1506-v2.patch, YARN-1506-v3.patch, 
> YARN-1506-v4.patch, YARN-1506-v5.patch, YARN-1506-v6.patch, 
> YARN-1506-v7.patch, YARN-1506-v8.patch, YARN-1506-v9.patch
>
>
> According to Vinod's comments on YARN-312 
> (https://issues.apache.org/jira/browse/YARN-312?focusedCommentId=13846087&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13846087),
>  we should replace RMNode.setResourceOption() with some resource change event.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4002) make ResourceTrackerService.nodeHeartbeat more concurrent

2016-03-01 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4002?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15173728#comment-15173728
 ] 

Hadoop QA commented on YARN-4002:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 16s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red} 0m 0s 
{color} | {color:red} The patch doesn't appear to include any new or modified 
tests. Please justify why no new tests are needed for this patch. Also please 
list what manual steps were performed to verify this patch. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 6m 
27s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 26s 
{color} | {color:green} trunk passed with JDK v1.8.0_72 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 29s 
{color} | {color:green} trunk passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
17s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 35s 
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
15s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 4s 
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 21s 
{color} | {color:green} trunk passed with JDK v1.8.0_72 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 26s 
{color} | {color:green} trunk passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 
29s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 24s 
{color} | {color:green} the patch passed with JDK v1.8.0_72 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 24s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 26s 
{color} | {color:green} the patch passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 26s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
15s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 32s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
12s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 
0s {color} | {color:green} Patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 
15s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 19s 
{color} | {color:green} the patch passed with JDK v1.8.0_72 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 24s 
{color} | {color:green} the patch passed with JDK v1.7.0_95 {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 70m 21s {color} 
| {color:red} hadoop-yarn-server-resourcemanager in the patch failed with JDK 
v1.8.0_72. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 71m 49s {color} 
| {color:red} hadoop-yarn-server-resourcemanager in the patch failed with JDK 
v1.7.0_95. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 
17s {color} | {color:green} Patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 158m 14s {color} 
| {color:black} {color} |
\\
\\
|| Reason || Tests ||
| JDK v1.8.0_72 Failed junit tests | 
hadoop.yarn.server.resourcemanager.TestClientRMTokens |
|   | hadoop.yarn.server.resourcemanager.TestAMAuthorization |
|   | hadoop.yarn.server.resourcemanager.scheduler.fair.TestFairScheduler |
| JDK v1.7.0_95 Failed junit tests | 
hadoop.yarn.server.resourcemanager.TestClientRMTokens |
|   | hadoop.yarn.server.resourcemanager.TestAMAuthorization |
\\
\\
|| Subsystem || Report/Notes ||
| Docker |  Image:yetus/hadoop:0ca8df7 |
| JIRA Patch URL |

[jira] [Updated] (YARN-4696) EntityGroupFSTimelineStore to work in the absence of an RM

2016-03-01 Thread Steve Loughran (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4696?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran updated YARN-4696:
-
Attachment: YARN-4696-010.patch

YARN-4696 patch 010. Addresses checkstyle warnings. The FileSystemTimelineWriter 
now uses FileSystem.newInstance() to create a new FS instance, with the chosen 
retry policies.
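For illustration, a small sketch of the newInstance() pattern: unlike 
FileSystem.get(), FileSystem.newInstance() bypasses the FS cache, so 
per-instance settings in the supplied Configuration do not leak into other 
components. The retry-related configuration key below is illustrative, not 
necessarily what the patch tunes:

{code}
import java.io.IOException;
import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;

public class NewInstanceSketch {
  /** Create a non-cached FileSystem whose retry behaviour is tuned locally. */
  public static FileSystem createWriterFs(URI fsUri, Configuration baseConf)
      throws IOException {
    Configuration conf = new Configuration(baseConf);
    // Illustrative: adjust client retry behaviour for this instance only.
    conf.setInt("dfs.client.retry.max.attempts", 3);
    // newInstance() skips the FileSystem cache, so the setting above applies
    // only to the instance returned here, not to callers of FileSystem.get().
    return FileSystem.newInstance(fsUri, conf);
  }
}
{code}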

> EntityGroupFSTimelineStore to work in the absence of an RM
> --
>
> Key: YARN-4696
> URL: https://issues.apache.org/jira/browse/YARN-4696
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Affects Versions: 2.8.0
>Reporter: Steve Loughran
>Assignee: Steve Loughran
> Attachments: YARN-4696-001.patch, YARN-4696-002.patch, 
> YARN-4696-003.patch, YARN-4696-005.patch, YARN-4696-006.patch, 
> YARN-4696-007.patch, YARN-4696-008.patch, YARN-4696-009.patch, 
> YARN-4696-010.patch
>
>
> {{EntityGroupFSTimelineStore}} now depends on an RM being up and running, 
> with the configuration pointing to it. This is a new change, and it impacts 
> testing, where you have historically been able to test without an RM running.
> The sole purpose of the probe is to automatically determine if an app is 
> running; it falls back to "unknown" if not. If the RM connection were 
> optional, the "unknown" codepath could be called directly, relying on the 
> age of the file as a metric of completion.
> Options
> # add a flag to disable RM connect
> # skip automatically if RM not defined/set to 0.0.0.0
> # disable retries on yarn client IPC; if it fails, tag app as unknown.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4696) EntityGroupFSTimelineStore to work in the absence of an RM

2016-03-01 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4696?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15173824#comment-15173824
 ] 

Hadoop QA commented on YARN-4696:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 15s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 
0s {color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 11s 
{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 6m 
28s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 42s 
{color} | {color:green} trunk passed with JDK v1.8.0_72 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 4s 
{color} | {color:green} trunk passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
31s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 14s 
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
40s {color} | {color:green} trunk passed {color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red} 0m 26s 
{color} | {color:red} 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-timeline-pluginstorage
 in trunk has 1 extant Findbugs warnings. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 52s 
{color} | {color:green} trunk passed with JDK v1.8.0_72 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 6s 
{color} | {color:green} trunk passed with JDK v1.7.0_95 {color} |
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 10s 
{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 
59s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 41s 
{color} | {color:green} the patch passed with JDK v1.8.0_72 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 1m 41s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 3s 
{color} | {color:green} the patch passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 2m 3s 
{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 0m 30s 
{color} | {color:red} hadoop-yarn-project/hadoop-yarn: patch generated 1 new + 
29 unchanged - 0 fixed = 30 total (was 29) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 8s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
35s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 
0s {color} | {color:green} Patch has no whitespace issues. {color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red} 1m 21s 
{color} | {color:red} hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common 
generated 1 new + 0 unchanged - 0 fixed = 1 total (was 0) {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 47s 
{color} | {color:green} the patch passed with JDK v1.8.0_72 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 0s 
{color} | {color:green} the patch passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 1m 51s 
{color} | {color:green} hadoop-yarn-common in the patch passed with JDK 
v1.8.0_72. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 3m 37s 
{color} | {color:green} hadoop-yarn-server-applicationhistoryservice in the 
patch passed with JDK v1.8.0_72. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 50s 
{color} | {color:green} hadoop-yarn-server-timeline-pluginstorage in the patch 
passed with JDK v1.8.0_72. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 2m 9s 
{color} | {color:green} hadoop-yarn-common in the patch passed with JDK 

[jira] [Commented] (YARN-4750) App metrics may not be correct when an app is recovered

2016-03-01 Thread Jian He (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4750?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15173917#comment-15173917
 ] 

Jian He commented on YARN-4750:
---

This was intentional; the thought was that persisting the metrics periodically 
while the app is running would put too much load on the state store.

> App metrics may not be correct when an app is recovered
> ---
>
> Key: YARN-4750
> URL: https://issues.apache.org/jira/browse/YARN-4750
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Lavkesh Lahngir
>Assignee: Lavkesh Lahngir
>
> App metrics (or rather, app attempt metrics) like vcore-seconds and 
> MB-seconds are saved in the state store when there is an attempt state 
> transition. Values for running attempts are kept in memory and are not saved 
> when there is an RM restart/failover. For recovered apps the metrics values 
> will be reset, so these values will be incomplete. 
> Was this intentional, or have we not found a correct way to fix it?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-1506) Replace set resource change on RMNode/SchedulerNode directly with event notification.

2016-03-01 Thread Junping Du (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15173941#comment-15173941
 ] 

Junping Du commented on YARN-1506:
--

Hi Wanglei, yes, the understanding here is correct. We do not persist the 
updated resource configuration so far, but another JIRA (like YARN-1000) will 
track that effort. Thanks for your comments.

> Replace set resource change on RMNode/SchedulerNode directly with event 
> notification.
> -
>
> Key: YARN-1506
> URL: https://issues.apache.org/jira/browse/YARN-1506
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: graceful, nodemanager, scheduler
>Reporter: Junping Du
>Assignee: Junping Du
> Attachments: YARN-1506-v1.patch, YARN-1506-v10.patch, 
> YARN-1506-v11.patch, YARN-1506-v12.patch, YARN-1506-v13.patch, 
> YARN-1506-v14.patch, YARN-1506-v15.patch, YARN-1506-v16.patch, 
> YARN-1506-v17.patch, YARN-1506-v2.patch, YARN-1506-v3.patch, 
> YARN-1506-v4.patch, YARN-1506-v5.patch, YARN-1506-v6.patch, 
> YARN-1506-v7.patch, YARN-1506-v8.patch, YARN-1506-v9.patch
>
>
> According to Vinod's comments on YARN-312 
> (https://issues.apache.org/jira/browse/YARN-312?focusedCommentId=13846087&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13846087),
>  we should replace RMNode.setResourceOption() with some resource change event.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4740) container complete msg may lost while AM restart in race condition

2016-03-01 Thread Jian He (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4740?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15174027#comment-15174027
 ] 

Jian He commented on YARN-4740:
---

thanks [~sandflee] !
- could you add a check that the completed container is indeed in the returned 
allocate response? Also check that the completed container is not in 
RMAppAttemptImpl#justFinishedContainers.
{code}
// sleep a while make sure allocate() get complete container,
// before this msg pass to AM, AM may crash
Thread.sleep(1000);
am1.allocate(
new ArrayList(), new ArrayList());
{code}
- it's unnecessary to parameterize the test based on whether 
RMWorkPreservingEnabled or not, because the test is not doing any RM restart at 
all.
{code}
testAMRestartNotLostContainerCompleteMsg(true);
testAMRestartNotLostContainerCompleteMsg(false);
{code}

> container complete msg may lost while AM restart in race condition
> --
>
> Key: YARN-4740
> URL: https://issues.apache.org/jira/browse/YARN-4740
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: sandflee
>Assignee: sandflee
> Attachments: YARN-4740.01.patch
>
>
> 1. A container completed, and the msg is stored in 
> RMAppAttempt.justFinishedContainers.
> 2. The AM called allocate, and before the allocateResponse reached the AM, 
> the AM crashed.
> 3. The AM restarted and couldn't get the container-complete msg.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (YARN-4740) container complete msg may lost while AM restart in race condition

2016-03-01 Thread Jian He (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4740?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15174027#comment-15174027
 ] 

Jian He edited comment on YARN-4740 at 3/1/16 5:05 PM:
---

thanks [~sandflee] !
- could you add a check that the completed container is indeed in the returned 
allocate response? Also check that the completed container is not in 
RMAppAttemptImpl#justFinishedContainers.
{code}
// sleep a while make sure allocate() get complete container,
// before this msg pass to AM, AM may crash
Thread.sleep(1000);
am1.allocate(
new ArrayList(), new ArrayList());
{code}
- it's unnecessary to parameterize the test based on whether 
RMWorkPreservingEnabled or not, because the test is not doing any RM restart at 
all.
{code}
testAMRestartNotLostContainerCompleteMsg(true);
testAMRestartNotLostContainerCompleteMsg(false);
{code}
- please also add a comment about why doing so in transferStateFromAttempt


was (Author: jianhe):
thanks [~sandflee] !
- could you add a check that the completed container is indeed in the returned 
allocate response ? also check that the completed container is not in the 
RMAppAttemptImpl#justFinishedContainers.
{code}
// sleep a while make sure allocate() get complete container,
// before this msg pass to AM, AM may crash
Thread.sleep(1000);
am1.allocate(
new ArrayList(), new ArrayList());
{code}
- it's unnecessary to parameterize the test based on whether 
RMWorkPreservingEnabled or not, because the test is not doing any RM restart at 
all.
{code}
testAMRestartNotLostContainerCompleteMsg(true);
testAMRestartNotLostContainerCompleteMsg(false);
{code}

> container complete msg may lost while AM restart in race condition
> --
>
> Key: YARN-4740
> URL: https://issues.apache.org/jira/browse/YARN-4740
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: sandflee
>Assignee: sandflee
> Attachments: YARN-4740.01.patch
>
>
> 1. A container completed, and the msg is stored in 
> RMAppAttempt.justFinishedContainers.
> 2. The AM called allocate, and before the allocateResponse reached the AM, 
> the AM crashed.
> 3. The AM restarted and couldn't get the container-complete msg.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4634) Scheduler UI/Metrics need to consider cases like non-queue label mappings

2016-03-01 Thread Sunil G (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4634?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15174082#comment-15174082
 ] 

Sunil G commented on YARN-4634:
---

Thanks [~leftnoteasy] for the comments.

I agree that the earlier patch considered too many variables to decide whether 
to render with labels or not; yes, that adds complexity. Meanwhile, the current 
system also has some corner cases where label-queue mappings are not present. I 
think such cases can be handled by assuming that we render the UI with labels.

To consolidate the idea: if the cluster has labels other than DEFAULT_LABEL and 
at least one such label has >0 active NMs, then we will render the UI with 
labels. Is this fine?
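A rough sketch of that condition, using a hypothetical per-label summary type; 
the actual RM node-labels API and the scheduler UI block code differ:

{code}
import java.util.Collection;

/** Hypothetical per-label summary, used only for this sketch. */
class LabelInfo {
  final String name;      // "" stands for the default (no-label) partition
  final int activeNMs;

  LabelInfo(String name, int activeNMs) {
    this.name = name;
    this.activeNMs = activeNMs;
  }
}

class SchedulerPageHelperSketch {
  /** Render the scheduler UI grouped by labels only if at least one
   *  non-default label has an active NM. */
  static boolean renderWithLabels(Collection<LabelInfo> labels) {
    for (LabelInfo label : labels) {
      if (!label.name.isEmpty() && label.activeNMs > 0) {
        return true;
      }
    }
    return false;
  }
}
{code}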

> Scheduler UI/Metrics need to consider cases like non-queue label mappings
> -
>
> Key: YARN-4634
> URL: https://issues.apache.org/jira/browse/YARN-4634
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Affects Versions: 2.7.1
>Reporter: Sunil G
>Assignee: Sunil G
> Attachments: 0001-YARN-4634.patch, 0002-YARN-4634.patch
>
>
> Currently, when label-queue mappings are not available, a few assumptions 
> are made in the UI and in the metrics.
> In the case where labels are enabled and available in the cluster but 
> without any queue mappings, the UI displays queues under labels. This is not 
> correct.
> Currently the labels-enabled check and the availability of labels are used 
> to decide how to render the scheduler UI. We also need to check whether 
> - queue mappings are available
> - nodes are mapped to labels with the proper exclusivity flags on
> This ticket will also look at the default queue configurations when labels 
> are not mapped. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-4751) In 2.7, Labeled queue usage not shown properly in capacity scheduler UI

2016-03-01 Thread Eric Payne (JIRA)
Eric Payne created YARN-4751:


 Summary: In 2.7, Labeled queue usage not shown properly in 
capacity scheduler UI
 Key: YARN-4751
 URL: https://issues.apache.org/jira/browse/YARN-4751
 Project: Hadoop YARN
  Issue Type: Bug
  Components: capacity scheduler, yarn
Affects Versions: 2.7.3
Reporter: Eric Payne
Assignee: Eric Payne


In 2.6 and 2.7, the capacity scheduler UI does not have the queue graphs 
separated by partition. When applications are running on a labeled queue, no 
color is shown in the bar graph, and several of the "Used" metrics are zero.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4751) In 2.7, Labeled queue usage not shown properly in capacity scheduler UI

2016-03-01 Thread Eric Payne (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4751?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Payne updated YARN-4751:
-
Attachment: 2.7 CS UI No BarGraph.jpg

In the attached screenshot, please note that the {{Used Capacity}}, {{Absolute 
Used Capacity}}, and {{Active User Info::Used Resources}} values are all zero 
even though {{Num Containers}} is 11. The application runs and completes 
successfully.

> In 2.7, Labeled queue usage not shown properly in capacity scheduler UI
> ---
>
> Key: YARN-4751
> URL: https://issues.apache.org/jira/browse/YARN-4751
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacity scheduler, yarn
>Affects Versions: 2.7.3
>Reporter: Eric Payne
>Assignee: Eric Payne
> Attachments: 2.7 CS UI No BarGraph.jpg
>
>
> In 2.6 and 2.7, the capacity scheduler UI does not have the queue graphs 
> separated by partition. When applications are running on a labeled queue, no 
> color is shown in the bar graph, and several of the "Used" metrics are zero.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4737) Use CSRF Filter in YARN

2016-03-01 Thread Jonathan Maron (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4737?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Maron updated YARN-4737:
-
Attachment: (was: YARN-4737.patch.001)

> Use CSRF Filter in YARN
> ---
>
> Key: YARN-4737
> URL: https://issues.apache.org/jira/browse/YARN-4737
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager, resourcemanager, webapp
>Reporter: Jonathan Maron
>Assignee: Jonathan Maron
> Attachments: YARN-4737.001.patch
>
>
> A CSRF filter was added to hadoop common 
> (https://issues.apache.org/jira/browse/HADOOP-12691).  The aim of this JIRA 
> is to come up with a mechanism to integrate this filter into the webapps for 
> which it is applicable (web apps that may establish an authenticated 
> identity).  That includes the RM, NM, and mapreduce jobhistory web app.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4737) Use CSRF Filter in YARN

2016-03-01 Thread Jonathan Maron (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4737?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Maron updated YARN-4737:
-
Attachment: YARN-4737.001.patch

> Use CSRF Filter in YARN
> ---
>
> Key: YARN-4737
> URL: https://issues.apache.org/jira/browse/YARN-4737
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager, resourcemanager, webapp
>Reporter: Jonathan Maron
>Assignee: Jonathan Maron
> Attachments: YARN-4737.001.patch
>
>
> A CSRF filter was added to hadoop common 
> (https://issues.apache.org/jira/browse/HADOOP-12691).  The aim of this JIRA 
> is to come up with a mechanism to integrate this filter into the webapps for 
> which it is applicable (web apps that may establish an authenticated 
> identity).  That includes the RM, NM, and mapreduce jobhistory web app.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4751) In 2.7, Labeled queue usage not shown properly in capacity scheduler UI

2016-03-01 Thread Sunil G (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4751?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15174114#comment-15174114
 ] 

Sunil G commented on YARN-4751:
---

[~eepayne], thanks for updating this. YARN-4304 handled this issue for a few 
metrics. In trunk, I can see this metric coming through correctly, so I think 
one UI ticket has not been picked into 2.7. 

> In 2.7, Labeled queue usage not shown properly in capacity scheduler UI
> ---
>
> Key: YARN-4751
> URL: https://issues.apache.org/jira/browse/YARN-4751
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacity scheduler, yarn
>Affects Versions: 2.7.3
>Reporter: Eric Payne
>Assignee: Eric Payne
> Attachments: 2.7 CS UI No BarGraph.jpg
>
>
> In 2.6 and 2.7, the capacity scheduler UI does not have the queue graphs 
> separated by partition. When applications are running on a labeled queue, no 
> color is shown in the bar graph, and several of the "Used" metrics are zero.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4634) Scheduler UI/Metrics need to consider cases like non-queue label mappings

2016-03-01 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4634?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15174165#comment-15174165
 ] 

Wangda Tan commented on YARN-4634:
--

[~sunilg], sounds good.

> Scheduler UI/Metrics need to consider cases like non-queue label mappings
> -
>
> Key: YARN-4634
> URL: https://issues.apache.org/jira/browse/YARN-4634
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Affects Versions: 2.7.1
>Reporter: Sunil G
>Assignee: Sunil G
> Attachments: 0001-YARN-4634.patch, 0002-YARN-4634.patch
>
>
> Currently, when label-queue mappings are not available, a few assumptions 
> are made in the UI and in the metrics.
> In the case where labels are enabled and available in the cluster but 
> without any queue mappings, the UI displays queues under labels. This is not 
> correct.
> Currently the labels-enabled check and the availability of labels are used 
> to decide how to render the scheduler UI. We also need to check whether 
> - queue mappings are available
> - nodes are mapped to labels with the proper exclusivity flags on
> This ticket will also look at the default queue configurations when labels 
> are not mapped. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4634) Scheduler UI/Metrics need to consider cases like non-queue label mappings

2016-03-01 Thread Sunil G (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4634?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sunil G updated YARN-4634:
--
Attachment: 0003-YARN-4634.patch

Updating patch as per the above mentioned comments.

> Scheduler UI/Metrics need to consider cases like non-queue label mappings
> -
>
> Key: YARN-4634
> URL: https://issues.apache.org/jira/browse/YARN-4634
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Affects Versions: 2.7.1
>Reporter: Sunil G
>Assignee: Sunil G
> Attachments: 0001-YARN-4634.patch, 0002-YARN-4634.patch, 
> 0003-YARN-4634.patch
>
>
> Currently, when label-queue mappings are not available, a few assumptions 
> are made in the UI and in the metrics.
> In the case where labels are enabled and available in the cluster but 
> without any queue mappings, the UI displays queues under labels. This is not 
> correct.
> Currently the labels-enabled check and the availability of labels are used 
> to decide how to render the scheduler UI. We also need to check whether 
> - queue mappings are available
> - nodes are mapped to labels with the proper exclusivity flags on
> This ticket will also look at the default queue configurations when labels 
> are not mapped. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4751) In 2.7, Labeled queue usage not shown properly in capacity scheduler UI

2016-03-01 Thread Eric Payne (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4751?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15174314#comment-15174314
 ] 

Eric Payne commented on YARN-4751:
--

Thanks, [~sunilg], for pointing out YARN-4304. I see that this revision has 
several JIRAs that would also need to be pulled back if YARN-4304 is 
cherry-picked to 2.7, including YARN-1651, YARN-2003, YARN-3362, YARN-3463, 
YARN-3961, YARN-4082, and YARN-4162. Is that correct? I think it would be 
better if we had a 2.7-specific patch for YARN-4304. Is that something you 
would be willing to provide?


> In 2.7, Labeled queue usage not shown properly in capacity scheduler UI
> ---
>
> Key: YARN-4751
> URL: https://issues.apache.org/jira/browse/YARN-4751
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacity scheduler, yarn
>Affects Versions: 2.7.3
>Reporter: Eric Payne
>Assignee: Eric Payne
> Attachments: 2.7 CS UI No BarGraph.jpg
>
>
> In 2.6 and 2.7, the capacity scheduler UI does not have the queue graphs 
> separated by partition. When applications are running on a labeled queue, no 
> color is shown in the bar graph, and several of the "Used" metrics are zero.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4681) ProcfsBasedProcessTree should not calculate private clean pages

2016-03-01 Thread Chris Nauroth (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4681?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Nauroth updated YARN-4681:

Assignee: Jan Lukavsky

> ProcfsBasedProcessTree should not calculate private clean pages
> ---
>
> Key: YARN-4681
> URL: https://issues.apache.org/jira/browse/YARN-4681
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: nodemanager
>Affects Versions: 2.6.0, 2.7.0
>Reporter: Jan Lukavsky
>Assignee: Jan Lukavsky
> Attachments: YARN-4681.patch, YARN-4681.patch
>
>
> ProcfsBasedProcessTree in the NodeManager calculates the memory used by a 
> process tree by parsing {{/proc/<pid>/smaps}}, where it computes {{min(Pss, 
> Shared_Dirty) + Private_Dirty + Private_Clean}}. Because private clean pages 
> that are not {{mlocked}} can be reclaimed by the kernel, this should be 
> changed to count only {{Locked}} pages instead.
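For comparison, a small sketch of the two calculations, with a hypothetical 
holder for the per-mapping smaps fields. The "proposed" method reflects one 
reading of the description above (replace the Private_Clean term with Locked); 
the real ProcfsBasedProcessTree parser is considerably more involved:

{code}
/** Hypothetical holder for one smaps mapping; values in KB as reported by the kernel. */
class SmapsInfoSketch {
  long pss;
  long sharedDirty;
  long privateDirty;
  long privateClean;
  long locked;
}

class SmapsMemCalcSketch {
  /** Current calculation described in the issue. */
  static long currentEstimateKb(SmapsInfoSketch m) {
    return Math.min(m.pss, m.sharedDirty) + m.privateDirty + m.privateClean;
  }

  /** One reading of the proposal: count the locked portion instead of
   *  Private_Clean, since non-mlocked private clean pages are reclaimable. */
  static long proposedEstimateKb(SmapsInfoSketch m) {
    return Math.min(m.pss, m.sharedDirty) + m.privateDirty + m.locked;
  }
}
{code}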



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4681) ProcfsBasedProcessTree should not calculate private clean pages

2016-03-01 Thread Chris Nauroth (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4681?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15174361#comment-15174361
 ] 

Chris Nauroth commented on YARN-4681:
-

[~je.ik], thank you for updating the patch.  I'm +1 for this change, pending a 
pre-commit test run from Jenkins.  I just clicked the Submit Patch button, so 
Jenkins should pick it up now.

However, I'm not confident enough to commit it immediately.  I'd like to see 
reviews from committers who spend more time in YARN than me.  I'd also like to 
find out if anyone thinks it should be configurable whether it checks locked or 
performs the old calculation.  I don't have a sense for how widely people are 
dependent on the current smaps checks.

> ProcfsBasedProcessTree should not calculate private clean pages
> ---
>
> Key: YARN-4681
> URL: https://issues.apache.org/jira/browse/YARN-4681
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: nodemanager
>Affects Versions: 2.6.0, 2.7.0
>Reporter: Jan Lukavsky
>Assignee: Jan Lukavsky
> Attachments: YARN-4681.patch, YARN-4681.patch
>
>
> ProcfsBasedProcessTree in the NodeManager calculates the memory used by a 
> process tree by parsing {{/proc/<pid>/smaps}}, where it computes {{min(Pss, 
> Shared_Dirty) + Private_Dirty + Private_Clean}}. Because private clean pages 
> that are not {{mlocked}} can be reclaimed by the kernel, this should be 
> changed to count only {{Locked}} pages instead.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4681) ProcfsBasedProcessTree should not calculate private clean pages

2016-03-01 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4681?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15174387#comment-15174387
 ] 

Hadoop QA commented on YARN-4681:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 11s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 
0s {color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 6m 
47s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 22s 
{color} | {color:green} trunk passed with JDK v1.8.0_72 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 25s 
{color} | {color:green} trunk passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
19s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 31s 
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
13s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 
11s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 28s 
{color} | {color:green} trunk passed with JDK v1.8.0_72 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 32s 
{color} | {color:green} trunk passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 
26s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 20s 
{color} | {color:green} the patch passed with JDK v1.8.0_72 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 20s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 23s 
{color} | {color:green} the patch passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 23s 
{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 0m 17s 
{color} | {color:red} hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common: patch 
generated 3 new + 32 unchanged - 4 fixed = 35 total (was 36) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 29s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
10s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 
0s {color} | {color:green} Patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 
17s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 24s 
{color} | {color:green} the patch passed with JDK v1.8.0_72 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 30s 
{color} | {color:green} the patch passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 1m 51s 
{color} | {color:green} hadoop-yarn-common in the patch passed with JDK 
v1.8.0_72. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 2m 9s 
{color} | {color:green} hadoop-yarn-common in the patch passed with JDK 
v1.7.0_95. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 
18s {color} | {color:green} Patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 20m 28s {color} 
| {color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker |  Image:yetus/hadoop:0ca8df7 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12787236/YARN-4681.patch |
| JIRA Issue | YARN-4681 |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  
unit  findbugs  checkstyle  |
| uname | Linux 32c7221bb89b 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed 
Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh 
|
| git revision 

[jira] [Commented] (YARN-4634) Scheduler UI/Metrics need to consider cases like non-queue label mappings

2016-03-01 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4634?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15174436#comment-15174436
 ] 

Hadoop QA commented on YARN-4634:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 18s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red} 0m 0s 
{color} | {color:red} The patch doesn't appear to include any new or modified 
tests. Please justify why no new tests are needed for this patch. Also please 
list what manual steps were performed to verify this patch. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 6m 
57s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 26s 
{color} | {color:green} trunk passed with JDK v1.8.0_72 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 28s 
{color} | {color:green} trunk passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
18s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 34s 
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
15s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 4s 
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 20s 
{color} | {color:green} trunk passed with JDK v1.8.0_72 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 26s 
{color} | {color:green} trunk passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 
29s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 25s 
{color} | {color:green} the patch passed with JDK v1.8.0_72 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 25s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 26s 
{color} | {color:green} the patch passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 26s 
{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 0m 16s 
{color} | {color:red} 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:
 patch generated 1 new + 55 unchanged - 0 fixed = 56 total (was 55) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 32s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
12s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 
0s {color} | {color:green} Patch has no whitespace issues. {color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red} 1m 20s 
{color} | {color:red} 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager
 generated 1 new + 0 unchanged - 0 fixed = 1 total (was 0) {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 22s 
{color} | {color:green} the patch passed with JDK v1.8.0_72 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 24s 
{color} | {color:green} the patch passed with JDK v1.7.0_95 {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 71m 4s {color} 
| {color:red} hadoop-yarn-server-resourcemanager in the patch failed with JDK 
v1.8.0_72. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 72m 1s {color} 
| {color:red} hadoop-yarn-server-resourcemanager in the patch failed with JDK 
v1.7.0_95. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 
19s {color} | {color:green} Patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 159m 58s {color} 
| {color:black} {color} |
\\
\\
|| Reason || Tests ||
| FindBugs | 
module:hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager
 |
|  |  Nullcheck of CapacitySchedulerPage$QueuesBlock.nodeLabelsInfo at line 419 
of value previously dereferenced in 
org.apach

[jira] [Commented] (YARN-4671) There is no need to acquire CS lock when completing a container

2016-03-01 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4671?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15174449#comment-15174449
 ] 

Hudson commented on YARN-4671:
--

SUCCESS: Integrated in Hadoop-trunk-Commit #9404 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/9404/])
YARN-4671. There is no need to acquire CS lock when completing a (jianhe: rev 
5c465df90414d43250d09084748ab2d41af44eea)
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestCapacityScheduler.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacityScheduler.java


> There is no need to acquire CS lock when completing a container
> ---
>
> Key: YARN-4671
> URL: https://issues.apache.org/jira/browse/YARN-4671
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: MENG DING
>Assignee: MENG DING
> Fix For: 2.8.0
>
> Attachments: YARN-4671.1.patch, YARN-4671.2.patch
>
>
> In YARN-4519, we discovered that there is no need to acquire the CS lock in 
> CS#completedContainerInternal, because:
> * Access to the critical section is already guarded by the queue lock.
> * It is not essential to guard {{schedulerHealth}} with the CS lock in 
> completedContainerInternal. All maps in schedulerHealth are concurrent maps. 
> Even if schedulerHealth is not consistent at the moment, it will be 
> eventually consistent.
> With this fix, we can truly claim that CS#allocate doesn't require the CS 
> lock.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4737) Use CSRF Filter in YARN

2016-03-01 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4737?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15174555#comment-15174555
 ] 

Hadoop QA commented on YARN-4737:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 15s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 
0s {color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 15s 
{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 6m 
42s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 6m 0s 
{color} | {color:green} trunk passed with JDK v1.8.0_72 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 6m 47s 
{color} | {color:green} trunk passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 
8s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 3m 22s 
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 1m 
35s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 6m 
40s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 2m 45s 
{color} | {color:green} trunk passed with JDK v1.8.0_72 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 5m 23s 
{color} | {color:green} trunk passed with JDK v1.7.0_95 {color} |
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 14s 
{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 2m 
52s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 6m 1s 
{color} | {color:green} the patch passed with JDK v1.8.0_72 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 6m 1s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 6m 37s 
{color} | {color:green} the patch passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 6m 37s 
{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 1m 8s 
{color} | {color:red} root: patch generated 3 new + 387 unchanged - 0 fixed = 
390 total (was 387) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 3m 22s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 1m 
36s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 
0s {color} | {color:green} Patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} xml {color} | {color:green} 0m 1s 
{color} | {color:green} The patch has no ill-formed XML file. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 8m 2s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 2m 38s 
{color} | {color:green} the patch passed with JDK v1.8.0_72 {color} |
| {color:red}-1{color} | {color:red} javadoc {color} | {color:red} 9m 25s 
{color} | {color:red} 
hadoop-yarn-project_hadoop-yarn_hadoop-yarn-common-jdk1.7.0_95 with JDK 
v1.7.0_95 generated 1 new + 0 unchanged - 0 fixed = 1 total (was 0) {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 5m 22s 
{color} | {color:green} the patch passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 24s 
{color} | {color:green} hadoop-yarn-api in the patch passed with JDK v1.8.0_72. 
{color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 1m 56s 
{color} | {color:green} hadoop-yarn-common in the patch passed with JDK 
v1.8.0_72. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 9m 1s {color} | 
{color:red} hadoop-yarn-server-nodemanager in the patch failed with JDK 
v1.8.0_72. {color} |
| {color:red}-1{color} | {

[jira] [Commented] (YARN-4719) Add a helper library to maintain node state and allows common queries

2016-03-01 Thread Karthik Kambatla (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4719?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15174594#comment-15174594
 ] 

Karthik Kambatla commented on YARN-4719:


bq. For ClusterNodeTracker#nodes, can we use lock-free data structure to avoid 
copying the whole set?
Not sure I understand the suggestion. Elaborate? 

bq. We'd better not add addBlacklistedNodeIdsToList to ClusterNodeTracker since 
it calls application's logic, we should only include node related stuffs to 
ClusterNodeTracker.
I feel any logic that has to iterate through all nodes should go through 
ClusterNodeTracker - that way, we don't run into cases where we access the list 
of nodes without a lock. Alternatively, we could get a list of nodeIDs from 
ClusterNodeTracker and then look up individual nodes. I am not particular about 
which approach, but I also don't quite see an issue with it being part of 
ClusterNodeTracker. Any particular reason you think this doesn't belong here? 
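To illustrate the point about keeping whole-cluster iteration behind the tracker's lock, a minimal sketch follows (hypothetical names, not the actual YARN-4719 API): callers hand the tracker a filter and never touch the node map outside its read lock.

{code}
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.locks.ReadWriteLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;
import java.util.function.Predicate;

// Hypothetical sketch of a node tracker; N stands for the scheduler's node type.
class NodeTrackerSketch<N> {
  private final Map<String, N> nodes = new HashMap<>();
  private final ReadWriteLock lock = new ReentrantReadWriteLock();

  void addNode(String nodeId, N node) {
    lock.writeLock().lock();
    try {
      nodes.put(nodeId, node);
    } finally {
      lock.writeLock().unlock();
    }
  }

  // Whole-cluster queries go through the tracker, so iteration always happens under the read lock.
  List<N> getNodes(Predicate<N> filter) {
    lock.readLock().lock();
    try {
      List<N> result = new ArrayList<>();
      for (N n : nodes.values()) {
        if (filter.test(n)) {
          result.add(n);
        }
      }
      return result;
    } finally {
      lock.readLock().unlock();
    }
  }
}
{code}

With this shape, a blacklist query would just be another filter passed to getNodes: the application-side predicate stays with the caller while the locking stays inside the tracker.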

> Add a helper library to maintain node state and allows common queries
> -
>
> Key: YARN-4719
> URL: https://issues.apache.org/jira/browse/YARN-4719
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: scheduler
>Affects Versions: 2.8.0
>Reporter: Karthik Kambatla
>Assignee: Karthik Kambatla
> Attachments: yarn-4719-1.patch, yarn-4719-2.patch
>
>
> The scheduler could use a helper library to maintain node state and allow 
> matching/sorting queries. Several reasons for this:
> # Today, a lot of the node state management is done separately in each 
> scheduler. Having a single library will take us that much closer to reducing 
> duplication among schedulers.
> # Adding a filtering/matching API would simplify node labels and locality 
> significantly. 
> # An API that returns a sorted list for a custom comparator would help 
> YARN-1011 where we want to sort by allocation and utilization for 
> continuous/asynchronous and opportunistic scheduling respectively. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4700) ATS storage has one extra record each time the RM got restarted

2016-03-01 Thread Naganarasimha G R (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4700?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Naganarasimha G R updated YARN-4700:

Attachment: YARN-4700-YARN-2928.v1.002.patch

Thanks for the comments [~varun_saxena] and [~vrushalic],
bq. I believe the constructor for FlowActivityRowKey should change to correctly 
calculate top of the day timestamp given the input timestamp. 
As [~varun_saxena] mentioned, ??FlowActivityRowKey constructor is used while 
parsing row key so I don't think we should be changing that??, and the tests 
pass without that change.
bq.  It might be more explicit to fetch the exact created (or finished) event 
from the TimelineEntity and use the timestamp that belong to either 
ApplicationMetricsConstants.CREATED_EVENT_TYPE or 
I have refactored this quite a bit to avoid looping over the events at 
multiple places. Please check.
I have addressed all the other comments by correcting the timestamps.
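For reference, the "top of the day" timestamp that the flow activity row key is based on is just the event timestamp truncated to the start of its UTC day; here is a minimal sketch of that calculation (assuming the usual epoch-millis convention, not quoting the actual TimelineStorageUtils code):

{code}
import java.util.concurrent.TimeUnit;

final class DayBoundary {
  private static final long MILLIS_PER_DAY = TimeUnit.DAYS.toMillis(1);

  // Truncate an epoch-millis timestamp to 00:00:00 UTC of the same day.
  static long topOfTheDay(long ts) {
    return ts - (ts % MILLIS_PER_DAY);
  }

  public static void main(String[] args) {
    long dayStart = topOfTheDay(System.currentTimeMillis());
    long morning = dayStart + TimeUnit.HOURS.toMillis(1);
    long evening = dayStart + TimeUnit.HOURS.toMillis(20);
    // Two app events on the same day map to the same day bucket.
    System.out.println(topOfTheDay(morning) == topOfTheDay(evening)); // true
  }
}
{code}

Keeping all the test app events' timestamps within one such day bucket is what makes different apps collapse into a single flow activity entry for that day.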

> ATS storage has one extra record each time the RM got restarted
> ---
>
> Key: YARN-4700
> URL: https://issues.apache.org/jira/browse/YARN-4700
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Affects Versions: YARN-2928
>Reporter: Li Lu
>Assignee: Naganarasimha G R
>  Labels: yarn-2928-1st-milestone
> Attachments: YARN-4700-YARN-2928.v1.001.patch, 
> YARN-4700-YARN-2928.v1.002.patch, YARN-4700-YARN-2928.wip.patch
>
>
> When testing the new web UI for ATS v2, I noticed that we're creating one 
> extra record for each finished application (but still held in the RM state 
> store) each time the RM got restarted. It's quite possible that we add the 
> cluster start timestamp into the default cluster id, thus each time we're 
> creating a new record for one application (cluster id is a part of the row 
> key). We need to fix this behavior, probably by having a better default 
> cluster id. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4700) ATS storage has one extra record each time the RM got restarted

2016-03-01 Thread Naganarasimha G R (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4700?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Naganarasimha G R updated YARN-4700:

Attachment: (was: YARN-4700-YARN-2928.v1.002.patch)

> ATS storage has one extra record each time the RM got restarted
> ---
>
> Key: YARN-4700
> URL: https://issues.apache.org/jira/browse/YARN-4700
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Affects Versions: YARN-2928
>Reporter: Li Lu
>Assignee: Naganarasimha G R
>  Labels: yarn-2928-1st-milestone
> Attachments: YARN-4700-YARN-2928.v1.001.patch, 
> YARN-4700-YARN-2928.v1.002.patch, YARN-4700-YARN-2928.wip.patch
>
>
> When testing the new web UI for ATS v2, I noticed that we're creating one 
> extra record for each finished application (but still held in the RM state 
> store) each time the RM got restarted. It's quite possible that we add the 
> cluster start timestamp into the default cluster id, thus each time we're 
> creating a new record for one application (cluster id is a part of the row 
> key). We need to fix this behavior, probably by having a better default 
> cluster id. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4700) ATS storage has one extra record each time the RM got restarted

2016-03-01 Thread Naganarasimha G R (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4700?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Naganarasimha G R updated YARN-4700:

Attachment: YARN-4700-YARN-2928.v1.002.patch

> ATS storage has one extra record each time the RM got restarted
> ---
>
> Key: YARN-4700
> URL: https://issues.apache.org/jira/browse/YARN-4700
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Affects Versions: YARN-2928
>Reporter: Li Lu
>Assignee: Naganarasimha G R
>  Labels: yarn-2928-1st-milestone
> Attachments: YARN-4700-YARN-2928.v1.001.patch, 
> YARN-4700-YARN-2928.v1.002.patch, YARN-4700-YARN-2928.wip.patch
>
>
> When testing the new web UI for ATS v2, I noticed that we're creating one 
> extra record for each finished application (but still held in the RM state 
> store) each time the RM got restarted. It's quite possible that we add the 
> cluster start timestamp into the default cluster id, thus each time we're 
> creating a new record for one application (cluster id is a part of the row 
> key). We need to fix this behavior, probably by having a better default 
> cluster id. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-1040) De-link container life cycle from the process and add ability to execute multiple processes in the same long-lived container

2016-03-01 Thread Vinod Kumar Vavilapalli (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1040?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15174775#comment-15174775
 ] 

Vinod Kumar Vavilapalli commented on YARN-1040:
---

bq. General case: AM launches multiple containers at the same time. This is 
essentially container-groups - we should keep this option open.
Clarification on what I meant here: it's okay for now to only design the APIs 
(and defer implementation), so that even if our first version of the 
implementation only covers allocation-vs-container delinking, container-groups 
remain possible in the future without further API changes or additions.

> De-link container life cycle from the process and add ability to execute 
> multiple processes in the same long-lived container
> 
>
> Key: YARN-1040
> URL: https://issues.apache.org/jira/browse/YARN-1040
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager
>Affects Versions: 3.0.0
>Reporter: Steve Loughran
>
> The AM should be able to exec >1 process in a container, rather than have the 
> NM automatically release the container when the single process exits.
> This would let an AM restart a process on the same container repeatedly, 
> which for HBase would offer locality on a restarted region server.
> We may also want the ability to exec multiple processes in parallel, so that 
> something could be run in the container while a long-lived process was 
> already running. This can be useful in monitoring and reconfiguring the 
> long-lived process, as well as shutting it down.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4002) make ResourceTrackerService.nodeHeartbeat more concurrent

2016-03-01 Thread Rohith Sharma K S (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4002?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rohith Sharma K S updated YARN-4002:

Target Version/s: 2.8.0

> make ResourceTrackerService.nodeHeartbeat more concurrent
> -
>
> Key: YARN-4002
> URL: https://issues.apache.org/jira/browse/YARN-4002
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Hong Zhiguo
>Assignee: Hong Zhiguo
>Priority: Critical
> Attachments: 0001-YARN-4002.patch, YARN-4002-lockless-read.patch, 
> YARN-4002-rwlock.patch, YARN-4002-v0.patch
>
>
> We have multiple RPC threads to handle NodeHeartbeatRequest from NMs. By 
> design the method ResourceTrackerService.nodeHeartbeat should be concurrent 
> enough to scale for large clusters.
> But we have a "BIG" lock in NodesListManager.isValidNode which I think it's 
> unnecessary.
> First, the fields "includes" and "excludes" of HostsFileReader are only 
> updated on "refresh nodes".  All RPC threads handling node heartbeats are 
> only readers.  So RWLock could be used to  alow concurrent access by RPC 
> threads.
> Second, since he fields "includes" and "excludes" of HostsFileReader are 
> always updated by "reference assignment", which is atomic in Java, the reader 
> side lock could just be skipped.
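As a rough sketch of the second option in the description (a hypothetical class, not the actual HostsFileReader/NodesListManager code), publishing both lists as one immutable snapshot behind a volatile reference lets heartbeat threads read without a shared lock while still seeing a mutually consistent pair:

{code}
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch: readers see either the old or the new snapshot, never a mix.
class HostListsSketch {
  private static final class Snapshot {
    final Set<String> includes;
    final Set<String> excludes;
    Snapshot(Set<String> inc, Set<String> exc) {
      this.includes = Collections.unmodifiableSet(inc);
      this.excludes = Collections.unmodifiableSet(exc);
    }
  }

  private volatile Snapshot current = new Snapshot(new HashSet<>(), new HashSet<>());

  // Called on "refresh nodes" only; builds a new immutable snapshot and swaps it in.
  void refresh(Set<String> includes, Set<String> excludes) {
    current = new Snapshot(new HashSet<>(includes), new HashSet<>(excludes));
  }

  // Heartbeat threads read one snapshot, so the include/exclude checks stay consistent.
  boolean isValidNode(String host) {
    Snapshot s = current;
    return (s.includes.isEmpty() || s.includes.contains(host)) && !s.excludes.contains(host);
  }
}
{code}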



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4002) make ResourceTrackerService.nodeHeartbeat more concurrent

2016-03-01 Thread Rohith Sharma K S (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4002?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15174776#comment-15174776
 ] 

Rohith Sharma K S commented on YARN-4002:
-

[~leftnoteasy] would you like to have look at patch? If no comments I will go 
ahead with committing it.

> make ResourceTrackerService.nodeHeartbeat more concurrent
> -
>
> Key: YARN-4002
> URL: https://issues.apache.org/jira/browse/YARN-4002
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Hong Zhiguo
>Assignee: Hong Zhiguo
>Priority: Critical
> Attachments: 0001-YARN-4002.patch, YARN-4002-lockless-read.patch, 
> YARN-4002-rwlock.patch, YARN-4002-v0.patch
>
>
> We have multiple RPC threads to handle NodeHeartbeatRequest from NMs. By 
> design the method ResourceTrackerService.nodeHeartbeat should be concurrent 
> enough to scale for large clusters.
> But we have a "BIG" lock in NodesListManager.isValidNode which I think it's 
> unnecessary.
> First, the fields "includes" and "excludes" of HostsFileReader are only 
> updated on "refresh nodes".  All RPC threads handling node heartbeats are 
> only readers.  So RWLock could be used to  alow concurrent access by RPC 
> threads.
> Second, since he fields "includes" and "excludes" of HostsFileReader are 
> always updated by "reference assignment", which is atomic in Java, the reader 
> side lock could just be skipped.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-4752) [Umbrella] FairScheduler: Improve preemption

2016-03-01 Thread Karthik Kambatla (JIRA)
Karthik Kambatla created YARN-4752:
--

 Summary: [Umbrella] FairScheduler: Improve preemption
 Key: YARN-4752
 URL: https://issues.apache.org/jira/browse/YARN-4752
 Project: Hadoop YARN
  Issue Type: Bug
  Components: fairscheduler
Affects Versions: 2.8.0
Reporter: Karthik Kambatla


A number of issues have been reported with respect to preemption in 
FairScheduler along the lines of:
# FairScheduler preempts resources from nodes even if the resultant free 
resources cannot fit the incoming request.
# Preemption doesn't preempt from sibling queues
# Preemption doesn't preempt from sibling apps under the same queue that is 
over its fairshare
# ...

Filing this umbrella JIRA to group all the issues together and think of a 
comprehensive solution.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3997) An Application requesting multiple core containers can't preempt running application made of single core containers

2016-03-01 Thread Karthik Kambatla (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3997?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karthik Kambatla updated YARN-3997:
---
Issue Type: Sub-task  (was: Bug)
Parent: YARN-4752

> An Application requesting multiple core containers can't preempt running 
> application made of single core containers
> ---
>
> Key: YARN-3997
> URL: https://issues.apache.org/jira/browse/YARN-3997
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: fairscheduler
>Affects Versions: 2.7.1
> Environment: Ubuntu 14.04, Hadoop 2.7.1, Physical Machines
>Reporter: Dan Shechter
>Assignee: Arun Suresh
>Priority: Critical
>
> When our cluster is configured with preemption, and is fully loaded with an 
> application consuming 1-core containers, it will not kill off these 
> containers when a new application kicks in requesting containers with a size 
> > 1, for example 4 core containers.
> When the "second" application attempts to us 1-core containers as well, 
> preemption proceeds as planned and everything works properly.
> It is my assumption, that the fair-scheduler, while recognizing it needs to 
> kill off some container to make room for the new application, fails to find a 
> SINGLE container satisfying the request for a 4-core container (since all 
> existing containers are 1-core containers), and isn't "smart" enough to 
> realize it needs to kill off 4 single-core containers (in this case) on a 
> single node, for the new application to be able to proceed...
> The exhibited effect is that the new application hangs indefinitely and 
> never gets the resources it requires.
> This can easily be replicated with any yarn application.
> Our "goto" scenario in this case is running pyspark with 1-core executors 
> (containers) while trying to launch h20.ai framework which INSISTS on having 
> at least 4 cores per container.
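One way to read the reported gap: preemption would need to aggregate several small containers on a single node until the freed resources fit the large request. A hedged sketch of that idea (hypothetical types and names, not FairScheduler's actual code):

{code}
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: pick 1-core containers on one node until the freed vcores
// cover the incoming multi-core request.
class NodePreemptionSketch {
  static List<String> containersToFree(List<String> containerIds,
                                       int vcoresPerContainer,
                                       int requestedVcores) {
    List<String> selected = new ArrayList<>();
    int freed = 0;
    for (String id : containerIds) {
      if (freed >= requestedVcores) {
        break;
      }
      selected.add(id);
      freed += vcoresPerContainer;
    }
    // If this node cannot free enough even after taking everything, give up on it.
    return freed >= requestedVcores ? selected : new ArrayList<>();
  }

  public static void main(String[] args) {
    List<String> oneCoreContainers = List.of("c1", "c2", "c3", "c4", "c5");
    // A 4-vcore request is satisfied by preempting four 1-core containers on the same node.
    System.out.println(containersToFree(oneCoreContainers, 1, 4)); // [c1, c2, c3, c4]
  }
}
{code}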



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4002) make ResourceTrackerService.nodeHeartbeat more concurrent

2016-03-01 Thread Hong Zhiguo (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4002?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15174795#comment-15174795
 ] 

Hong Zhiguo commented on YARN-4002:
---

Hi, [~rohithsharma], thanks for the refinement.
But why not take the lockless version? 

> make ResourceTrackerService.nodeHeartbeat more concurrent
> -
>
> Key: YARN-4002
> URL: https://issues.apache.org/jira/browse/YARN-4002
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Hong Zhiguo
>Assignee: Hong Zhiguo
>Priority: Critical
> Attachments: 0001-YARN-4002.patch, YARN-4002-lockless-read.patch, 
> YARN-4002-rwlock.patch, YARN-4002-v0.patch
>
>
> We have multiple RPC threads to handle NodeHeartbeatRequest from NMs. By 
> design the method ResourceTrackerService.nodeHeartbeat should be concurrent 
> enough to scale for large clusters.
> But we have a "BIG" lock in NodesListManager.isValidNode which I think it's 
> unnecessary.
> First, the fields "includes" and "excludes" of HostsFileReader are only 
> updated on "refresh nodes".  All RPC threads handling node heartbeats are 
> only readers.  So RWLock could be used to  alow concurrent access by RPC 
> threads.
> Second, since he fields "includes" and "excludes" of HostsFileReader are 
> always updated by "reference assignment", which is atomic in Java, the reader 
> side lock could just be skipped.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3405) FairScheduler's preemption cannot happen between sibling in some case

2016-03-01 Thread Karthik Kambatla (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3405?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karthik Kambatla updated YARN-3405:
---
Issue Type: Sub-task  (was: Bug)
Parent: YARN-4752

> FairScheduler's preemption cannot happen between sibling in some case
> -
>
> Key: YARN-3405
> URL: https://issues.apache.org/jira/browse/YARN-3405
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: fairscheduler
>Affects Versions: 2.7.0
>Reporter: Peng Zhang
>Assignee: Peng Zhang
>Priority: Critical
>  Labels: BB2015-05-TBR
> Attachments: YARN-3405.01.patch, YARN-3405.02.patch
>
>
> Queue hierarchy described as below:
> {noformat}
>           root
>          /    \
>    queue-1    queue-2
>     /     \
> queue-1-1  queue-1-2
> {noformat}
> Assume cluster resource is 100
> # queue-1-1 and queue-2 each have an app. Each gets 50 usage and 50 fairshare. 
> # When queue-1-2 becomes active, it causes a new preemption request for its 
> fairshare of 25.
> # When preemption starts from root, it may pick queue-2 as the preemption 
> candidate. If so, preemptContainerPreCheck for queue-2 returns false because 
> its usage is equal to its fairshare.
> # Finally queue-1-2 will be waiting for a resource release from queue-1-1 
> itself.
> What I expect here is that queue-1-2 preempts from queue-1-1.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4333) Fair scheduler should support preemption within queue

2016-03-01 Thread Karthik Kambatla (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4333?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karthik Kambatla updated YARN-4333:
---
Issue Type: Sub-task  (was: Improvement)
Parent: YARN-4752

> Fair scheduler should support preemption within queue
> -
>
> Key: YARN-4333
> URL: https://issues.apache.org/jira/browse/YARN-4333
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: fairscheduler
>Affects Versions: 2.6.0
>Reporter: Tao Jie
>Assignee: Tao Jie
> Attachments: YARN-4333.001.patch, YARN-4333.002.patch, 
> YARN-4333.003.patch
>
>
> Now each app in the fair scheduler is allocated its fairshare; however, the 
> fairshare resource is not ensured even if fairSharePreemption is enabled.
> Consider: 
> 1. When the cluster is idle, we submit app1 to queueA, which takes the 
> maxResource of queueA.
> 2. Then the cluster becomes busy, but app1 does not release any resource, so 
> queueA's resource usage is over its fairshare.
> 3. Then we submit app2 (maybe with higher priority) to queueA. Now app2 has 
> its own fairshare, but it cannot obtain any resource, since queueA is still 
> over its fairshare and no resource will be assigned to queueA anymore. Also, 
> preemption is not triggered in this case.
> So we should allow preemption within a queue when an app is starved for its 
> fairshare.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2154) FairScheduler: Improve preemption to preempt only those containers that would satisfy the incoming request

2016-03-01 Thread Karthik Kambatla (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2154?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karthik Kambatla updated YARN-2154:
---
Issue Type: Sub-task  (was: Improvement)
Parent: YARN-4752

> FairScheduler: Improve preemption to preempt only those containers that would 
> satisfy the incoming request
> --
>
> Key: YARN-2154
> URL: https://issues.apache.org/jira/browse/YARN-2154
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: fairscheduler
>Affects Versions: 2.4.0
>Reporter: Karthik Kambatla
>Assignee: Arun Suresh
>Priority: Critical
> Attachments: YARN-2154.1.patch
>
>
> Today, FairScheduler uses a spray-gun approach to preemption. Instead, it 
> should only preempt resources that would satisfy the incoming request. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4120) FSAppAttempt.getResourceUsage() should not take preemptedResource into account

2016-03-01 Thread Karthik Kambatla (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4120?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karthik Kambatla updated YARN-4120:
---
Issue Type: Sub-task  (was: Bug)
Parent: YARN-4752

> FSAppAttempt.getResourceUsage() should not take preemptedResource into account
> --
>
> Key: YARN-4120
> URL: https://issues.apache.org/jira/browse/YARN-4120
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: fairscheduler
>Reporter: Xianyin Xin
>
> When computing resource usage for Schedulables, the following code is involved,
> {{FSAppAttempt.getResourceUsage}},
> {code}
> public Resource getResourceUsage() {
>   return Resources.subtract(getCurrentConsumption(), getPreemptedResources());
> }
> {code}
> and this value is aggregated into FSLeafQueues and FSParentQueues. In my 
> opinion, taking {{preemptedResource}} into account here is not reasonable, 
> for two main reasons:
> # it is something in the future, i.e., even though these resources are marked 
> as preempted, they are currently used by the app, and these resources will be 
> subtracted from {{currentConsumption}} once the preemption is finished. It's 
> not reasonable to make arrangements for them ahead of time. 
> # there's another problem here; consider the following case,
> {code}
>          root
>         /    \
>    queue1    queue2
>     /    \
> queue1.3  queue1.4
> {code}
> suppose queue1.3 needs resources and it can preempt resources from queue1.4; 
> the preemption happens in the interior of queue1. But when computing the 
> resource usage of queue1, {{queue1.resourceUsage = its_current_resource_usage 
> - preemption}} according to the current code, which is unfair to queue2 when 
> allocating resources.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4134) FairScheduler preemption stops at queue level that all child queues are not over their fairshare

2016-03-01 Thread Karthik Kambatla (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4134?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karthik Kambatla updated YARN-4134:
---
Issue Type: Sub-task  (was: Bug)
Parent: YARN-4752

> FairScheduler preemption stops at queue level that all child queues are not 
> over their fairshare
> 
>
> Key: YARN-4134
> URL: https://issues.apache.org/jira/browse/YARN-4134
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: fairscheduler
>Reporter: Xianyin Xin
>Assignee: Xianyin Xin
> Attachments: YARN-4134.001.patch, YARN-4134.002.patch, 
> YARN-4134.003.patch
>
>
> Now FairScheduler uses a choose-a-candidate method to select a container to 
> be preempted from the leaf queues, in {{FSParentQueue.preemptContainer()}},
> {code}
> readLock.lock();
> try {
>   for (FSQueue queue : childQueues) {
> if (candidateQueue == null ||
> comparator.compare(queue, candidateQueue) > 0) {
>   candidateQueue = queue;
> }
>   }
> } finally {
>   readLock.unlock();
> }
> // Let the selected queue choose which of its container to preempt
> if (candidateQueue != null) {
>   toBePreempted = candidateQueue.preemptContainer();
> }
> {code}
> a candidate child queue is selected. However, if the queue's usage isn't over 
> its fairshare, preemption will not happen:
> {code}
> if (!preemptContainerPreCheck()) {
>   return toBePreempted;
> }
> {code}
>  A scenario:
> {code}
>      root
>     /    \
> queue1   queue2
>            /      \
>       queue2.3   (queue2.4)
> {code}
> suppose there are 8 containers, and queues at any level have the same weight. 
> queue1 takes 4 and queue2.3 takes 4, so both queue1 and queue2 are at their 
> fairshare. Now we submit an app in queue2.4 that needs 4 containers; it 
> should preempt 2 from queue2.3, but the candidate-container selection 
> procedure will stop at queue1, so none of the containers will be preempted.
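A hedged sketch of the idea implied by this scenario (hypothetical types, not the actual FSParentQueue code): treat a parent as a candidate whenever some descendant leaf is over its own fairshare, so selection can descend into queue2 even though queue2 as a whole sits at its fairshare.

{code}
import java.util.List;

class CandidateSelectionSketch {
  record Leaf(String name, double usage, double fairShare) {
    boolean overFairShare() { return usage > fairShare; }
  }

  record Parent(String name, List<Leaf> leaves) {
    // A parent is preemptable if any descendant leaf exceeds its own fairshare.
    boolean hasPreemptableDescendant() {
      return leaves.stream().anyMatch(Leaf::overFairShare);
    }
  }

  public static void main(String[] args) {
    Parent queue1 = new Parent("queue1", List.of(new Leaf("app-in-queue1", 4, 4)));
    Parent queue2 = new Parent("queue2",
        List.of(new Leaf("queue2.3", 4, 2), new Leaf("queue2.4", 0, 2)));
    System.out.println(queue1.hasPreemptableDescendant()); // false
    System.out.println(queue2.hasPreemptableDescendant()); // true -> descend into queue2
  }
}
{code}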



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3902) Fair scheduler preempts ApplicationMaster

2016-03-01 Thread Karthik Kambatla (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3902?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karthik Kambatla updated YARN-3902:
---
Issue Type: Sub-task  (was: Bug)
Parent: YARN-4752

> Fair scheduler preempts ApplicationMaster
> -
>
> Key: YARN-3902
> URL: https://issues.apache.org/jira/browse/YARN-3902
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: fairscheduler
>Affects Versions: 2.3.0
> Environment: 3.16.0-0.bpo.4-amd64 #1 SMP Debian 3.16.7-ckt2-1~bpo70+1 
> (2014-12-08) x86_64
>Reporter: He Tianyi
>Assignee: He Tianyi
>   Original Estimate: 72h
>  Remaining Estimate: 72h
>
> YARN-2022 have fixed the similar issue related to CapacityScheduler.
> However, FairScheduler still suffer, preempting AM while other normal 
> containers running out there.
> I think we should take the same approach, avoid AM being preempted unless 
> there is no container running other than AM.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4133) Containers to be preempted leak in FairScheduler preemption logic.

2016-03-01 Thread Karthik Kambatla (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4133?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karthik Kambatla updated YARN-4133:
---
Issue Type: Sub-task  (was: Bug)
Parent: YARN-4752

> Containers to be preempted leak in FairScheduler preemption logic.
> --
>
> Key: YARN-4133
> URL: https://issues.apache.org/jira/browse/YARN-4133
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: fairscheduler
>Affects Versions: 2.7.1
>Reporter: zhihai xu
>Assignee: zhihai xu
> Attachments: YARN-4133.000.patch
>
>
> Containers to be preempted leak in the FairScheduler preemption logic. It may 
> cause missed preemption due to containers in {{warnedContainers}} being 
> wrongly removed. The problem is in {{preemptResources}}:
> There are two issues which can cause containers to be wrongly removed from 
> {{warnedContainers}}:
> Firstly, the container state {{RMContainerState.ACQUIRED}} is missing in the 
> condition check:
> {code}
> (container.getState() == RMContainerState.RUNNING ||
>   container.getState() == RMContainerState.ALLOCATED)
> {code}
> Secondly, if {{isResourceGreaterThanNone(toPreempt)}} returns false, we 
> shouldn't remove the container from {{warnedContainers}}. We should only 
> remove a container from {{warnedContainers}} if the container is not in state 
> {{RMContainerState.RUNNING}}, {{RMContainerState.ALLOCATED}} or 
> {{RMContainerState.ACQUIRED}}.
> {code}
>   if ((container.getState() == RMContainerState.RUNNING ||
>   container.getState() == RMContainerState.ALLOCATED) &&
>   isResourceGreaterThanNone(toPreempt)) {
> warnOrKillContainer(container);
> Resources.subtractFrom(toPreempt, 
> container.getContainer().getResource());
>   } else {
> warnedIter.remove();
>   }
> {code}
> Also, once the containers in {{warnedContainers}} are wrongly removed, they 
> will never be preempted, because these containers are already in 
> {{FSAppAttempt#preemptionMap}} and {{FSAppAttempt#preemptContainer}} won't 
> return the containers in {{FSAppAttempt#preemptionMap}}.
> {code}
>   public RMContainer preemptContainer() {
> if (LOG.isDebugEnabled()) {
>   LOG.debug("App " + getName() + " is going to preempt a running " +
>   "container");
> }
> RMContainer toBePreempted = null;
> for (RMContainer container : getLiveContainers()) {
>   if (!getPreemptionContainers().contains(container) &&
>   (toBePreempted == null ||
>   comparator.compare(toBePreempted, container) > 0)) {
> toBePreempted = container;
>   }
> }
> return toBePreempted;
>   }
> {code}
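A toy model of the corrected bookkeeping described above (hypothetical types, not the FairScheduler classes): a warned container is only forgotten once it has left the RUNNING/ALLOCATED/ACQUIRED states, and running out of {{toPreempt}} does not cause removal.

{code}
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

class WarnedContainersSketch {
  enum State { RUNNING, ALLOCATED, ACQUIRED, COMPLETED }

  record Container(String id, State state, int vcores) {}

  static int preemptRound(List<Container> warned, int toPreemptVcores) {
    for (Iterator<Container> it = warned.iterator(); it.hasNext(); ) {
      Container c = it.next();
      boolean live = c.state() == State.RUNNING
          || c.state() == State.ALLOCATED
          || c.state() == State.ACQUIRED;
      if (!live) {
        it.remove();                       // container finished; safe to forget
      } else if (toPreemptVcores > 0) {
        toPreemptVcores -= c.vcores();     // stand-in for warnOrKillContainer + subtractFrom
      }
      // else: still live but nothing left to preempt; keep it for the next round
    }
    return toPreemptVcores;
  }

  public static void main(String[] args) {
    List<Container> warned = new ArrayList<>(List.of(
        new Container("c1", State.ACQUIRED, 1),
        new Container("c2", State.COMPLETED, 1)));
    preemptRound(warned, 0);
    System.out.println(warned.size());     // 1 -- the ACQUIRED container is retained
  }
}
{code}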



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-1961) Fair scheduler preemption doesn't work for non-leaf queues

2016-03-01 Thread Karthik Kambatla (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1961?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karthik Kambatla updated YARN-1961:
---
Issue Type: Sub-task  (was: Bug)
Parent: YARN-4752

> Fair scheduler preemption doesn't work for non-leaf queues
> --
>
> Key: YARN-1961
> URL: https://issues.apache.org/jira/browse/YARN-1961
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: fairscheduler, scheduler
>Affects Versions: 2.4.0
>Reporter: Ashwin Shankar
>Assignee: Ashwin Shankar
>  Labels: scheduler
>
> Setting minResources and minSharePreemptionTimeout to a non-leaf queue 
> doesn't cause preemption to happen when that non-leaf queue is below 
> minResources and there are outstanding demands in that non-leaf queue.
> Here is an example fs allocation config(partial) :
> {code:xml}
> <queue name="abc">
>   <minResources>3072 mb,0 vcores</minResources>
>   <minSharePreemptionTimeout>30</minSharePreemptionTimeout>
>   ...
> </queue>
>  {code}
> With the above configs, preemption doesn't seem to happen if queue abc is 
> below its minShare and it has outstanding unsatisfied demands from apps in 
> its child queues. Ideally in such cases we would like preemption to kick in 
> and reclaim resources from other queues (not under queue abc).
> Looking at the code it seems like preemption checks for starvation only at 
> the leaf queue level and not at the parent level.
> {code:title=FairScheduler.java|borderStyle=solid}
> boolean isStarvedForMinShare(FSLeafQueue sched)
> boolean isStarvedForFairShare(FSLeafQueue sched)
> {code}
> This affects our use case where we have a parent queue with probably a 
> hundred unconfigured leaf queues under it. We want to give a minshare to the 
> parent queue to protect all the leaf queues under it, but we cannot do so due 
> to this bug.
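A hedged sketch of the parent-level check the description asks for (hypothetical signature, not the FairScheduler API): a parent queue counts as minShare-starved when its aggregate usage is below its configured minResources and some child still has unsatisfied demand.

{code}
class ParentStarvationSketch {
  static boolean isStarvedForMinShare(long parentUsageMb, long parentMinShareMb,
                                      long pendingDemandInChildrenMb) {
    // Starved: the parent as a whole is under its minShare while its children still want more.
    return parentUsageMb < parentMinShareMb && pendingDemandInChildrenMb > 0;
  }

  public static void main(String[] args) {
    // Parent "abc": minResources 3072 MB, currently using 1024 MB, children want 2048 MB more.
    System.out.println(isStarvedForMinShare(1024, 3072, 2048)); // true -> should trigger preemption
  }
}
{code}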



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3121) FairScheduler preemption metrics

2016-03-01 Thread Karthik Kambatla (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3121?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karthik Kambatla updated YARN-3121:
---
Issue Type: Sub-task  (was: Improvement)
Parent: YARN-4752

> FairScheduler preemption metrics
> 
>
> Key: YARN-3121
> URL: https://issues.apache.org/jira/browse/YARN-3121
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: fairscheduler
>Reporter: Anubhav Dhoot
>Assignee: Anubhav Dhoot
> Attachments: yARN-3121.prelim.patch, yARN-3121.prelim.patch
>
>
> Add FSQueuemetrics for preemption related information



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3414) FairScheduler's preemption may cause livelock

2016-03-01 Thread Karthik Kambatla (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3414?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karthik Kambatla updated YARN-3414:
---
Issue Type: Sub-task  (was: Bug)
Parent: YARN-4752

> FairScheduler's preemption may cause livelock
> -
>
> Key: YARN-3414
> URL: https://issues.apache.org/jira/browse/YARN-3414
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: fairscheduler
>Affects Versions: 2.6.0
>Reporter: Peng Zhang
>
> I met this problem in our cluster; it causes a livelock between preemption 
> and scheduling.
> Queue hierarchy described as below:
> {noformat}
>            root
>          /   |   \
>   queue-1 queue-2 queue-3
>     /     \
> queue-1-1  queue-1-2
> {noformat}
> # Assume cluster resource is 100G in memory
> # Assume queue-1 has max resource limit 20G
> # queue-1-1 is active and it will get at most 20G memory (equal to its fairshare)
> # queue-2 becomes active next, and it requires 30G memory (less than its fairshare)
> # queue-3 becomes active, and it can be assigned all the other resources, 50G 
> memory (larger than its fairshare). At this point the three queues' fair 
> shares are (20, 40, 40), and their usage is (20, 30, 50)
> # queue-1-2 becomes active; it causes a new preemption request (10G memory, 
> and intuitively it can only preempt from its sibling queue-1-1)
> # Actually preemption starts from root, and it will find queue-3 is the most 
> over its fairshare, and preempt some resources from queue-3.
> # But during scheduling, it will find queue-1 itself has reached its max 
> fairshare, and cannot assign resources to it. Then the resources are again 
> assigned to queue-3
> And then it repeats between the last two steps.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3054) Preempt policy in FairScheduler may cause mapreduce job never finish

2016-03-01 Thread Karthik Kambatla (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3054?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karthik Kambatla updated YARN-3054:
---
Issue Type: Sub-task  (was: Bug)
Parent: YARN-4752

> Preempt policy in FairScheduler may cause mapreduce job never finish
> 
>
> Key: YARN-3054
> URL: https://issues.apache.org/jira/browse/YARN-3054
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: fairscheduler
>Affects Versions: 2.6.0
>Reporter: Peng Zhang
>
> The preemption policy is tied to the scheduling policy now. Using the 
> scheduling policy's comparator to find a preemption candidate cannot 
> guarantee that some subset of containers is never preempted. This may cause 
> tasks to be preempted periodically before they finish, so the job cannot make 
> any progress. 
> I think preemption in YARN should give the assurances below:
> 1. MapReduce jobs can get additional resources when others are idle;
> 2. MapReduce jobs for one user in one queue can still progress with their min 
> share when others preempt resources back.
> Maybe always preempting the latest app and container can achieve this? 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4661) Per-queue preemption policy in FairScheduler

2016-03-01 Thread Karthik Kambatla (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4661?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karthik Kambatla updated YARN-4661:
---
Issue Type: Sub-task  (was: Wish)
Parent: YARN-4752

> Per-queue preemption policy in FairScheduler
> 
>
> Key: YARN-4661
> URL: https://issues.apache.org/jira/browse/YARN-4661
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: fairscheduler
>Affects Versions: 2.6.0
>Reporter: He Tianyi
>Priority: Minor
>
> When {{FairScheduler}} needs to preempt a container, it tries to find one by 
> hierarchically sorting and selecting the {{AppSchedulable}} that is most 
> 'over fairshare' (in {{FairSharePolicy}}), and picking its latest launched 
> container.
> In some cases the strategy above becomes non-optimal: one may want to kill 
> the latest container (not {{AppSchedulable}}) launched in the queue for a 
> better trade-off between fairness and efficiency. Since apps that are over 
> their fairshare tend to have started longer ago than other apps, even their 
> latest launched container may have been running for quite some time.
> Maybe besides {{policy}}, we could make it possible to also specify a 
> {{preemptionPolicy}} used only for selecting the container to preempt, 
> without changing the scheduling policy.
> For example:
> {quote}
> <queue name="...">
>   <preemptionPolicy>fifo</preemptionPolicy>
>   <schedulingPolicy>fair</schedulingPolicy>
> </queue>
> {quote}
> Any suggestions or comments?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3903) Disable preemption at Queue level for Fair Scheduler

2016-03-01 Thread Karthik Kambatla (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3903?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karthik Kambatla updated YARN-3903:
---
Issue Type: Sub-task  (was: Improvement)
Parent: YARN-4752

> Disable preemption at Queue level for Fair Scheduler
> 
>
> Key: YARN-3903
> URL: https://issues.apache.org/jira/browse/YARN-3903
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: fairscheduler
>Affects Versions: 2.3.0, 2.4.0, 2.5.0, 2.6.0, 2.7.0
> Environment: 3.16.0-0.bpo.4-amd64 #1 SMP Debian 3.16.7-ckt2-1~bpo70+1 
> (2014-12-08) x86_64
>Reporter: He Tianyi
>Priority: Trivial
> Attachments: YARN-3093.1.patch, YARN-3093.2.patch
>
>   Original Estimate: 72h
>  Remaining Estimate: 72h
>
> YARN-2056 supports disabling preemption at queue level for CapacityScheduler.
> As for fair scheduler, we recently encountered the same need.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4002) make ResourceTrackerService.nodeHeartbeat more concurrent

2016-03-01 Thread Rohith Sharma K S (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4002?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15174833#comment-15174833
 ] 

Rohith Sharma K S commented on YARN-4002:
-

All the methods in HostsFileReader are synchronized, and the method 
{{isValidNode}} makes 2 separate calls to hostsReader. There could be a 
scenario (if no lock is used) where, after executing 
{{hostsReader.getHosts();}}, the hosts reader does a refresh, which gives an 
updated result for {{hostsReader.getExcludedHosts();}} but stale host details 
from the getHosts call. A lockless read might mix up old and new values, which 
is incorrect.
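A minimal sketch of that concern (hypothetical class, not the actual NodesListManager code): the read lock has to span both reads so they come from the same refresh; releasing it between the two calls reintroduces the mixed old/new view described above.

{code}
import java.util.HashSet;
import java.util.Set;
import java.util.concurrent.locks.ReadWriteLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

class ValidNodeCheckSketch {
  private final ReadWriteLock lock = new ReentrantReadWriteLock();
  private Set<String> includes = new HashSet<>();
  private Set<String> excludes = new HashSet<>();

  boolean isValidNode(String host) {
    lock.readLock().lock();
    try {
      // Both reads happen under the same read lock, so they observe the same refresh.
      return (includes.isEmpty() || includes.contains(host)) && !excludes.contains(host);
    } finally {
      lock.readLock().unlock();
    }
  }

  void refreshNodes(Set<String> newIncludes, Set<String> newExcludes) {
    lock.writeLock().lock();
    try {
      includes = newIncludes;
      excludes = newExcludes;
    } finally {
      lock.writeLock().unlock();
    }
  }
}
{code}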


> make ResourceTrackerService.nodeHeartbeat more concurrent
> -
>
> Key: YARN-4002
> URL: https://issues.apache.org/jira/browse/YARN-4002
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Hong Zhiguo
>Assignee: Hong Zhiguo
>Priority: Critical
> Attachments: 0001-YARN-4002.patch, YARN-4002-lockless-read.patch, 
> YARN-4002-rwlock.patch, YARN-4002-v0.patch
>
>
> We have multiple RPC threads to handle NodeHeartbeatRequest from NMs. By 
> design the method ResourceTrackerService.nodeHeartbeat should be concurrent 
> enough to scale for large clusters.
> But we have a "BIG" lock in NodesListManager.isValidNode which I think it's 
> unnecessary.
> First, the fields "includes" and "excludes" of HostsFileReader are only 
> updated on "refresh nodes".  All RPC threads handling node heartbeats are 
> only readers.  So RWLock could be used to  alow concurrent access by RPC 
> threads.
> Second, since he fields "includes" and "excludes" of HostsFileReader are 
> always updated by "reference assignment", which is atomic in Java, the reader 
> side lock could just be skipped.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3903) Disable preemption at Queue level for Fair Scheduler

2016-03-01 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3903?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15174852#comment-15174852
 ] 

Hadoop QA commented on YARN-3903:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:red}-1{color} | {color:red} patch {color} | {color:red} 0m 4s {color} 
| {color:red} YARN-3903 does not apply to trunk. Rebase required? Wrong Branch? 
See https://wiki.apache.org/hadoop/HowToContribute for help. {color} |
\\
\\
|| Subsystem || Report/Notes ||
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12746093/YARN-3093.2.patch |
| JIRA Issue | YARN-3903 |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/10681/console |
| Powered by | Apache Yetus 0.3.0-SNAPSHOT   http://yetus.apache.org |


This message was automatically generated.



> Disable preemption at Queue level for Fair Scheduler
> 
>
> Key: YARN-3903
> URL: https://issues.apache.org/jira/browse/YARN-3903
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: fairscheduler
>Affects Versions: 2.3.0, 2.4.0, 2.5.0, 2.6.0, 2.7.0
> Environment: 3.16.0-0.bpo.4-amd64 #1 SMP Debian 3.16.7-ckt2-1~bpo70+1 
> (2014-12-08) x86_64
>Reporter: He Tianyi
>Priority: Trivial
> Attachments: YARN-3093.1.patch, YARN-3093.2.patch
>
>   Original Estimate: 72h
>  Remaining Estimate: 72h
>
> YARN-2056 supports disabling preemption at queue level for CapacityScheduler.
> As for fair scheduler, we recently encountered the same need.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4133) Containers to be preempted leak in FairScheduler preemption logic.

2016-03-01 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4133?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15174866#comment-15174866
 ] 

Hadoop QA commented on YARN-4133:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:red}-1{color} | {color:red} patch {color} | {color:red} 0m 3s {color} 
| {color:red} YARN-4133 does not apply to trunk. Rebase required? Wrong Branch? 
See https://wiki.apache.org/hadoop/HowToContribute for help. {color} |
\\
\\
|| Subsystem || Report/Notes ||
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12754810/YARN-4133.000.patch |
| JIRA Issue | YARN-4133 |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/10684/console |
| Powered by | Apache Yetus 0.3.0-SNAPSHOT   http://yetus.apache.org |


This message was automatically generated.



> Containers to be preempted leak in FairScheduler preemption logic.
> --
>
> Key: YARN-4133
> URL: https://issues.apache.org/jira/browse/YARN-4133
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: fairscheduler
>Affects Versions: 2.7.1
>Reporter: zhihai xu
>Assignee: zhihai xu
> Attachments: YARN-4133.000.patch
>
>
> Containers to be preempted leak in the FairScheduler preemption logic. It may 
> cause missed preemption due to containers in {{warnedContainers}} being 
> wrongly removed. The problem is in {{preemptResources}}:
> There are two issues which can cause containers to be wrongly removed from 
> {{warnedContainers}}:
> Firstly, the container state {{RMContainerState.ACQUIRED}} is missing in the 
> condition check:
> {code}
> (container.getState() == RMContainerState.RUNNING ||
>   container.getState() == RMContainerState.ALLOCATED)
> {code}
> Secondly, if {{isResourceGreaterThanNone(toPreempt)}} returns false, we 
> shouldn't remove the container from {{warnedContainers}}. We should only 
> remove a container from {{warnedContainers}} if the container is not in state 
> {{RMContainerState.RUNNING}}, {{RMContainerState.ALLOCATED}} or 
> {{RMContainerState.ACQUIRED}}.
> {code}
>   if ((container.getState() == RMContainerState.RUNNING ||
>   container.getState() == RMContainerState.ALLOCATED) &&
>   isResourceGreaterThanNone(toPreempt)) {
> warnOrKillContainer(container);
> Resources.subtractFrom(toPreempt, 
> container.getContainer().getResource());
>   } else {
> warnedIter.remove();
>   }
> {code}
> Also, once the containers in {{warnedContainers}} are wrongly removed, they 
> will never be preempted, because these containers are already in 
> {{FSAppAttempt#preemptionMap}} and {{FSAppAttempt#preemptContainer}} won't 
> return the containers in {{FSAppAttempt#preemptionMap}}.
> {code}
>   public RMContainer preemptContainer() {
> if (LOG.isDebugEnabled()) {
>   LOG.debug("App " + getName() + " is going to preempt a running " +
>   "container");
> }
> RMContainer toBePreempted = null;
> for (RMContainer container : getLiveContainers()) {
>   if (!getPreemptionContainers().contains(container) &&
>   (toBePreempted == null ||
>   comparator.compare(toBePreempted, container) > 0)) {
> toBePreempted = container;
>   }
> }
> return toBePreempted;
>   }
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3405) FairScheduler's preemption cannot happen between sibling in some case

2016-03-01 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15174865#comment-15174865
 ] 

Hadoop QA commented on YARN-3405:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:red}-1{color} | {color:red} patch {color} | {color:red} 0m 5s {color} 
| {color:red} YARN-3405 does not apply to trunk. Rebase required? Wrong Branch? 
See https://wiki.apache.org/hadoop/HowToContribute for help. {color} |
\\
\\
|| Subsystem || Report/Notes ||
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12727554/YARN-3405.02.patch |
| JIRA Issue | YARN-3405 |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/10683/console |
| Powered by | Apache Yetus 0.3.0-SNAPSHOT   http://yetus.apache.org |


This message was automatically generated.



> FairScheduler's preemption cannot happen between sibling in some case
> -
>
> Key: YARN-3405
> URL: https://issues.apache.org/jira/browse/YARN-3405
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: fairscheduler
>Affects Versions: 2.7.0
>Reporter: Peng Zhang
>Assignee: Peng Zhang
>Priority: Critical
>  Labels: BB2015-05-TBR
> Attachments: YARN-3405.01.patch, YARN-3405.02.patch
>
>
> Queue hierarchy described as below:
> {noformat}
>           root
>          /    \
>    queue-1    queue-2
>     /     \
> queue-1-1  queue-1-2
> {noformat}
> Assume cluster resource is 100
> # queue-1-1 and queue-2 each have an app. Each gets 50 usage and 50 fairshare. 
> # When queue-1-2 becomes active, it causes a new preemption request for its 
> fairshare of 25.
> # When preemption starts from root, it may pick queue-2 as the preemption 
> candidate. If so, preemptContainerPreCheck for queue-2 returns false because 
> its usage is equal to its fairshare.
> # Finally queue-1-2 will be waiting for a resource release from queue-1-1 
> itself.
> What I expect here is that queue-1-2 preempts from queue-1-1.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-1506) Replace set resource change on RMNode/SchedulerNode directly with event notification.

2016-03-01 Thread wanglei-it (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15174870#comment-15174870
 ] 

wanglei-it commented on YARN-1506:
--

Thanks for your reply.

> Replace set resource change on RMNode/SchedulerNode directly with event 
> notification.
> -
>
> Key: YARN-1506
> URL: https://issues.apache.org/jira/browse/YARN-1506
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: graceful, nodemanager, scheduler
>Reporter: Junping Du
>Assignee: Junping Du
> Attachments: YARN-1506-v1.patch, YARN-1506-v10.patch, 
> YARN-1506-v11.patch, YARN-1506-v12.patch, YARN-1506-v13.patch, 
> YARN-1506-v14.patch, YARN-1506-v15.patch, YARN-1506-v16.patch, 
> YARN-1506-v17.patch, YARN-1506-v2.patch, YARN-1506-v3.patch, 
> YARN-1506-v4.patch, YARN-1506-v5.patch, YARN-1506-v6.patch, 
> YARN-1506-v7.patch, YARN-1506-v8.patch, YARN-1506-v9.patch
>
>
> According to Vinod's comments on YARN-312 
> (https://issues.apache.org/jira/browse/YARN-312?focusedCommentId=13846087&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13846087),
>  we should replace RMNode.setResourceOption() with some resource change event.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4478) [Umbrella] : Track all the Test failures in YARN

2016-03-01 Thread Rohith Sharma K S (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4478?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15174874#comment-15174874
 ] 

Rohith Sharma K S commented on YARN-4478:
-

A few test cases are consistently failing because of UnknownHostException. 
Detailed analysis is given in HADOOP-12687. This is mainly because of the 
YARN precommit build machine's hostname.
I have raised INFRA JIRA INFRA-11150 to change the YARN precommit build machine 
hostname. There has been no response from the INFRA team so far.
Does anyone, or any PMC member, know whom to contact to resolve INFRA-11150?


> [Umbrella] : Track all the Test failures in YARN
> 
>
> Key: YARN-4478
> URL: https://issues.apache.org/jira/browse/YARN-4478
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn
>Reporter: Rohith Sharma K S
>
> Recently many test cases have been failing, either timing out or being 
> impacted by new bug fixes. Many test failure JIRAs have been raised and are 
> in progress.
> This is to track all the test failure JIRAs.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4478) [Umbrella] : Track all the Test failures in YARN

2016-03-01 Thread Allen Wittenauer (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4478?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15174888#comment-15174888
 ] 

Allen Wittenauer commented on YARN-4478:



* There are plenty of examples where the Jenkins network connectivity fails, 
which of course would also cause DNS failures...
* Changing the Jenkins servers isn't likely to fix anything given that all of 
the tests run in a docker container that the Hadoop project itself controls.

> [Umbrella] : Track all the Test failures in YARN
> 
>
> Key: YARN-4478
> URL: https://issues.apache.org/jira/browse/YARN-4478
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn
>Reporter: Rohith Sharma K S
>
> Recently many test cases have been failing, either timing out or being 
> impacted by new bug fixes. Many test failure JIRAs have been raised and are 
> in progress.
> This is to track all the test failure JIRAs.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4712) CPU Usage Metric is not captured properly in YARN-2928

2016-03-01 Thread Naganarasimha G R (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4712?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Naganarasimha G R updated YARN-4712:

Attachment: YARN-4712-YARN-2928.v1.002.patch

Hi [~varun_saxena],
bq. I was incorrectly assuming that CPU % is reported to NMTimelinePublisher in 
the range of 0-1. This doesn't seem to be the case though.
You are right; when I tested it, I was also able to get a percentage greater 
than the number of cores, so multiplying by 100 is not required. I also felt 
*round* is better than floor, and I have incorporated the required changes.

bq.  2 of the checkstyle issues seem fixable.
Well, I have corrected it, but I generally use the Eclipse formatter, which 
follows the Sun conventions mentioned in the [Hadoop 
wiki|https://wiki.apache.org/hadoop/HowToContribute]. The Eclipse formatter 
usually takes care of the 80-characters-per-line limit wherever possible; is 
anything required beyond that? cc/ [~sjlee0]
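A hedged sketch of the calculation being discussed (hypothetical helper, not the actual ContainersMonitor/NMTimelinePublisher code): guard against UNAVAILABLE (-1) before converting, then round rather than floor when reporting an integral metric.

{code}
final class CpuUsageSketch {
  // Stand-in for ResourceCalculatorProcessTree.UNAVAILABLE.
  static final int UNAVAILABLE = -1;

  static long totalCoresPercentage(float cpuUsagePercentPerCore, int numProcessors) {
    if (cpuUsagePercentPerCore == UNAVAILABLE || numProcessors <= 0) {
      return UNAVAILABLE;             // propagate "unknown" instead of computing garbage
    }
    float cpuUsageTotalCoresPercentage = cpuUsagePercentPerCore / numProcessors;
    return Math.round(cpuUsageTotalCoresPercentage);  // round, not floor
  }

  public static void main(String[] args) {
    System.out.println(totalCoresPercentage(350.4f, 8));    // 44
    System.out.println(totalCoresPercentage(UNAVAILABLE, 8)); // -1
  }
}
{code}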

> CPU Usage Metric is not captured properly in YARN-2928
> --
>
> Key: YARN-4712
> URL: https://issues.apache.org/jira/browse/YARN-4712
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Naganarasimha G R
>Assignee: Naganarasimha G R
>  Labels: yarn-2928-1st-milestone
> Attachments: YARN-4712-YARN-2928.v1.001.patch, 
> YARN-4712-YARN-2928.v1.002.patch
>
>
> There are 2 issues with CPU usage collection:
> * I observed that many times the CPU usage obtained from 
> {{pTree.getCpuUsagePercent()}} is 
> ResourceCalculatorProcessTree.UNAVAILABLE (i.e. -1), but ContainersMonitor 
> still does the calculation, i.e. {{cpuUsageTotalCoresPercentage = 
> cpuUsagePercentPerCore /resourceCalculatorPlugin.getNumProcessors()}}, 
> because of which the UNAVAILABLE check in 
> {{NMTimelinePublisher.reportContainerResourceUsage}} is not triggered. So 
> proper checks need to be added.
> * {{EntityColumnPrefix.METRIC}} always uses LongConverter, but 
> ContainerMonitor publishes decimal values for the CPU usage.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4478) [Umbrella] : Track all the Test failures in YARN

2016-03-01 Thread Rohith Sharma K S (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4478?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15174968#comment-15174968
 ] 

Rohith Sharma K S commented on YARN-4478:
-

Per the analysis in HADOOP-12687, the test failures are not because of network 
connectivity taking the DNS server down.
The Hadoop security model follows the RFC standards. In 
[this comment|https://issues.apache.org/jira/browse/HADOOP-12687?focusedCommentId=15087185&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15087185],
 Varun talks about RFC 1535. This RFC says a hostname must end with a dot ("."). 
But the Jenkins machines' hostnames are not configured per the RFC, which is 
causing the test failures.

As a test, this is reproducible on Ubuntu: after changing the hostname to end 
with a dot ("."), these test cases pass. We want to bring this change to the 
YARN precommit build machine too.



> [Umbrella] : Track all the Test failures in YARN
> 
>
> Key: YARN-4478
> URL: https://issues.apache.org/jira/browse/YARN-4478
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn
>Reporter: Rohith Sharma K S
>
> Recently many test cases have been failing, either timing out or being 
> impacted by new bug fixes. Many test failure JIRAs have been raised and are 
> in progress.
> This is to track all the test failure JIRAs.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4478) [Umbrella] : Track all the Test failures in YARN

2016-03-01 Thread Allen Wittenauer (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4478?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15174995#comment-15174995
 ] 

Allen Wittenauer commented on YARN-4478:


You realize that RFC is talking about DNS and not /etc/hosts, right?  It's 
specifically to prevent the DNS resolver from adding more domains during 
resolution.

Also, in 20+ years of Unix system administration, I have never configured 
/etc/hosts with an ending period.  That's because /etc/hosts resolution isn't 
supposed to go through the DNS resolver at all.


> [Umbrella] : Track all the Test failures in YARN
> 
>
> Key: YARN-4478
> URL: https://issues.apache.org/jira/browse/YARN-4478
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn
>Reporter: Rohith Sharma K S
>
> Recently many test cases have been failing, either timing out or being 
> impacted by new bug fixes. Many test failure JIRAs have been raised and are 
> in progress.
> This is to track all the test failure JIRAs.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4478) [Umbrella] : Track all the Test failures in YARN

2016-03-01 Thread Rohith Sharma K S (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4478?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15175002#comment-15175002
 ] 

Rohith Sharma K S commented on YARN-4478:
-

bq. Also, in 20+ years of Unix system administration, I have never configured 
/etc/hosts with an ending period. That's because /etc/hosts resolution isn't 
supposed to go through the DNS resolver at all.
Cool.. Then do you think the original patch of HADOOP-12687 can go in?

> [Umbrella] : Track all the Test failures in YARN
> 
>
> Key: YARN-4478
> URL: https://issues.apache.org/jira/browse/YARN-4478
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn
>Reporter: Rohith Sharma K S
>
> Recently many test cases have been failing, either timing out or being 
> impacted by new bug fixes. Many test failure JIRAs have been raised and are 
> in progress.
> This is to track all the test failure JIRAs.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4700) ATS storage has one extra record each time the RM got restarted

2016-03-01 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4700?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15175020#comment-15175020
 ] 

Hadoop QA commented on YARN-4700:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 16s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 
0s {color} | {color:green} The patch appears to include 4 new or modified test 
files. {color} |
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 54s 
{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 7m 
49s {color} | {color:green} YARN-2928 passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 6m 30s 
{color} | {color:green} YARN-2928 passed with JDK v1.8.0_72 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 7m 10s 
{color} | {color:green} YARN-2928 passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 
6s {color} | {color:green} YARN-2928 passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 17s 
{color} | {color:green} YARN-2928 passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
30s {color} | {color:green} YARN-2928 passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 
26s {color} | {color:green} YARN-2928 passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 20s 
{color} | {color:green} YARN-2928 passed with JDK v1.8.0_72 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 2m 8s 
{color} | {color:green} YARN-2928 passed with JDK v1.7.0_95 {color} |
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 16s 
{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 1m 
5s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 6m 2s 
{color} | {color:green} the patch passed with JDK v1.8.0_72 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 6m 2s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 7m 5s 
{color} | {color:green} the patch passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 7m 5s 
{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 1m 7s 
{color} | {color:red} root: patch generated 1 new + 80 unchanged - 0 fixed = 81 
total (was 80) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 17s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
29s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 
0s {color} | {color:green} Patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 
56s {color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} javadoc {color} | {color:red} 0m 15s 
{color} | {color:red} hadoop-yarn-server-timelineservice in the patch failed 
with JDK v1.8.0_72. {color} |
| {color:red}-1{color} | {color:red} javadoc {color} | {color:red} 4m 49s 
{color} | {color:red} 
hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-timelineservice-jdk1.7.0_95
 with JDK v1.7.0_95 generated 1 new + 0 unchanged - 0 fixed = 1 total (was 0) 
{color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 2m 3s 
{color} | {color:green} the patch passed with JDK v1.7.0_95 {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 53m 26s {color} 
| {color:red} hadoop-hdfs in the patch failed with JDK v1.8.0_72. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 3m 1s 
{color} | {color:green} hadoop-yarn-server-timelineservice in the patch passed 
with JDK v1.8.0_72. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 49m 38s {color} 
| {color:red} hadoop-hdfs in the patch failed with JDK v1.7.0_95. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 3m 8s 
{color} | {col

[jira] [Commented] (YARN-4712) CPU Usage Metric is not captured properly in YARN-2928

2016-03-01 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4712?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15175029#comment-15175029
 ] 

Hadoop QA commented on YARN-4712:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 18s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 
0s {color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 8m 
22s {color} | {color:green} YARN-2928 passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 25s 
{color} | {color:green} YARN-2928 passed with JDK v1.8.0_72 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 27s 
{color} | {color:green} YARN-2928 passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
16s {color} | {color:green} YARN-2928 passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 30s 
{color} | {color:green} YARN-2928 passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
13s {color} | {color:green} YARN-2928 passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 
57s {color} | {color:green} YARN-2928 passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 19s 
{color} | {color:green} YARN-2928 passed with JDK v1.8.0_72 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 25s 
{color} | {color:green} YARN-2928 passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 
26s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 21s 
{color} | {color:green} the patch passed with JDK v1.8.0_72 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 21s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 25s 
{color} | {color:green} the patch passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 25s 
{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 0m 14s 
{color} | {color:red} 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager:
 patch generated 1 new + 22 unchanged - 2 fixed = 23 total (was 24) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 27s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
10s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 
0s {color} | {color:green} Patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 7s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 16s 
{color} | {color:green} the patch passed with JDK v1.8.0_72 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 20s 
{color} | {color:green} the patch passed with JDK v1.7.0_95 {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 8m 46s {color} 
| {color:red} hadoop-yarn-server-nodemanager in the patch failed with JDK 
v1.8.0_72. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 9m 14s {color} 
| {color:red} hadoop-yarn-server-nodemanager in the patch failed with JDK 
v1.7.0_95. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 
19s {color} | {color:green} Patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 35m 27s {color} 
| {color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker |  Image:yetus/hadoop:0ca8df7 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12790851/YARN-4712-YARN-2928.v1.002.patch
 |
| JIRA Issue | YARN-4712 |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  
unit  findbugs  checkstyle  |
| uname | Linux c78f28d74503 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed 
Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Pe

[jira] [Created] (YARN-4753) Use doxia macro to generate in-page TOC of YARN site documentation

2016-03-01 Thread Masatake Iwasaki (JIRA)
Masatake Iwasaki created YARN-4753:
--

 Summary: Use doxia macro to generate in-page TOC of YARN site 
documentation
 Key: YARN-4753
 URL: https://issues.apache.org/jira/browse/YARN-4753
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: documentation
Affects Versions: 2.7.0
Reporter: Masatake Iwasaki
Assignee: Masatake Iwasaki


Since maven-site-plugin 3.5 was releaced, we can use toc macro in Markdown.
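
For context, a hedged sketch of what that could look like in a Markdown source 
page (the macro name and parameters are assumed from the doxia Markdown macro 
syntax and should be checked against the maven-site-plugin docs):

{noformat}
<!-- Assumed doxia macro syntax for an in-page TOC in a *.md source page. -->
<!-- MACRO{toc|fromDepth=0|toDepth=3} -->
{noformat}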



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4753) Use doxia macro to generate in-page TOC of YARN site documentation

2016-03-01 Thread Masatake Iwasaki (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4753?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Masatake Iwasaki updated YARN-4753:
---
Description: Since maven-site-plugin 3.5 was released, we can use toc macro 
in Markdown.  (was: Since maven-site-plugin 3.5 was releaced, we can use toc 
macro in Markdown.)

> Use doxia macro to generate in-page TOC of YARN site documentation
> --
>
> Key: YARN-4753
> URL: https://issues.apache.org/jira/browse/YARN-4753
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: documentation
>Affects Versions: 2.7.0
>Reporter: Masatake Iwasaki
>Assignee: Masatake Iwasaki
>
> Since maven-site-plugin 3.5 was released, we can use toc macro in Markdown.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4750) App metrics may not be correct when an app is recovered

2016-03-01 Thread Srikanth Sampath (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4750?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15175105#comment-15175105
 ] 

Srikanth Sampath commented on YARN-4750:


Agree [~jianhe]that it would be expensive to do update periodically.  However, 
it will be useful to indicate that the metrics are compromised.  One option can 
be to set the value to a special value (say a negative number) so as to to 
indicate a compromised value one time.  Just carrying on silently, can be 
misleading.
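
A minimal sketch of the sentinel idea (standalone Java; {{AppMetricsSketch}} and 
{{COMPROMISED}} are hypothetical names, not existing YARN classes): the 
recovered value is replaced by a negative marker once, so consumers can tell it 
is incomplete instead of silently trusting it.

{code:java}
// Illustrative only; not RM code. Shows the "mark once as compromised" idea.
public class AppMetricsSketch {
  // Hypothetical sentinel: a negative value can never be a real vcore-seconds total.
  static final long COMPROMISED = -1L;

  private final long vcoreSeconds;

  AppMetricsSketch(long recoveredVcoreSeconds, boolean attemptWasRunningAtFailover) {
    // If the attempt was still running when the RM went down, the persisted
    // total is known to be incomplete: flag it instead of carrying on silently.
    this.vcoreSeconds =
        attemptWasRunningAtFailover ? COMPROMISED : recoveredVcoreSeconds;
  }

  long getVcoreSeconds() {
    return vcoreSeconds;
  }

  public static void main(String[] args) {
    System.out.println(new AppMetricsSketch(12345L, false).getVcoreSeconds()); // 12345
    System.out.println(new AppMetricsSketch(12345L, true).getVcoreSeconds());  // -1, unreliable
  }
}
{code}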

> App metrics may not be correct when an app is recovered
> ---
>
> Key: YARN-4750
> URL: https://issues.apache.org/jira/browse/YARN-4750
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Lavkesh Lahngir
>Assignee: Lavkesh Lahngir
>
> App metrics (rather, app attempt metrics) like vcore-seconds and MB-seconds 
> are saved in the state store when there is an attempt state transition. 
> Values for running attempts are kept in memory and are not saved when there 
> is an RM restart/failover. For recovered apps the metric values are reset, so 
> in that case these values will be incomplete.
> Was this intentional, or have we not found a correct way to fix it?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4744) Too many signal to container failure in case of LCE

2016-03-01 Thread Sidharta Seethana (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4744?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15175180#comment-15175180
 ] 

Sidharta Seethana commented on YARN-4744:
-

/cc [~vvasudev]

It looks like this is an artifact of existing NM behavior: the NM appears to 
signal containers that have already exited (as a part of 
{{ContainerLaunch.cleanupContainer()}}). This signal operation fails because 
the process has already exited. These failures were not logged before, but they 
are being logged now because of the centralization of container-executor 
operations via {{PrivilegedOperationExecutor}}, which logs all 
container-executor failures.
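
For illustration only (plain Java, not the NodeManager or container-executor 
code path; the {{kill -0}} liveness probe is just an assumption for the sketch): 
signalling a PID that has already exited fails with a non-zero exit code, which 
is the kind of failure showing up in the log below, and a probe like this is one 
way such no-op signals could be skipped instead of reported.

{code:java}
// Illustrative only; not NodeManager code.
import java.io.IOException;

public class SignalSketch {
  // "kill -0 <pid>" delivers no signal; it only checks that the process exists
  // and that we are allowed to signal it. Non-zero exit means the PID is gone.
  static boolean isProcessAlive(long pid) throws IOException, InterruptedException {
    Process p = new ProcessBuilder("kill", "-0", Long.toString(pid)).start();
    return p.waitFor() == 0;
  }

  static void signalIfAlive(long pid, int signal) throws IOException, InterruptedException {
    if (!isProcessAlive(pid)) {
      // The container process has already exited: nothing to signal, no error to log.
      System.out.println("pid " + pid + " already exited; skipping signal " + signal);
      return;
    }
    new ProcessBuilder("kill", "-" + signal, Long.toString(pid)).start().waitFor();
  }

  public static void main(String[] args) throws Exception {
    signalIfAlive(9370, 15); // e.g. the SIGTERM seen in the quoted log
  }
}
{code}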

> Too many signal to container failure in case of LCE
> ---
>
> Key: YARN-4744
> URL: https://issues.apache.org/jira/browse/YARN-4744
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.9.0
>Reporter: Bibin A Chundatt
>Assignee: Sidharta Seethana
>
> Install HA cluster in secure mode
> Enable LCE with cgroups
> Start server with dsperf user
> Submit a mapreduce application (terasort/teragen) with user yarn/dsperf
> Too many "signal to container" failures are observed.
> When submitting as this user, the following exception is thrown:
> {noformat}
> 2014-03-02 09:20:38,689 INFO 
> SecurityLogger.org.apache.hadoop.security.authorize.ServiceAuthorizationManager:
>  Authorization successful for testing (auth:TOKEN) for protocol=interface 
> org.apache.hadoop.yarn.server.nodemanager.api.LocalizationProtocolPB
> 2014-03-02 09:20:40,158 WARN 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl:
>  Event EventType: KILL_CONTAINER sent to absent container 
> container_e02_1393731146548_0001_01_13
> 2014-03-02 09:20:43,071 INFO 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch:
>  Container container_e02_1393731146548_0001_01_09 succeeded
> 2014-03-02 09:20:43,072 INFO 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerImpl:
>  Container container_e02_1393731146548_0001_01_09 transitioned from 
> RUNNING to EXITED_WITH_SUCCESS
> 2014-03-02 09:20:43,073 INFO 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch:
>  Cleaning up container container_e02_1393731146548_0001_01_09
> 2014-03-02 09:20:43,075 INFO 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.runtime.DelegatingLinuxContainerRuntime:
>  Using container runtime: DefaultLinuxContainerRuntime
> 2014-03-02 09:20:43,081 WARN 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.privileged.PrivilegedOperationExecutor:
>  Shell execution returned exit code: 9. Privileged Execution Operation Output:
> main : command provided 2
> main : run as user is yarn
> main : requested yarn user is yarn
> Full command array for failed execution:
> [/opt/bibin/dsperf/HAINSTALL/install/hadoop/nodemanager/bin/container-executor,
>  yarn, yarn, 2, 9370, 15]
> 2014-03-02 09:20:43,081 WARN 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.runtime.DefaultLinuxContainerRuntime:
>  Signal container failed. Exception:
> org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.privileged.PrivilegedOperationException:
>  ExitCodeException exitCode=9:
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.privileged.PrivilegedOperationExecutor.executePrivilegedOperation(PrivilegedOperationExecutor.java:173)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.runtime.DefaultLinuxContainerRuntime.signalContainer(DefaultLinuxContainerRuntime.java:132)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.runtime.DelegatingLinuxContainerRuntime.signalContainer(DelegatingLinuxContainerRuntime.java:109)
> at 
> org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.signalContainer(LinuxContainerExecutor.java:513)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.cleanupContainer(ContainerLaunch.java:520)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainersLauncher.handle(ContainersLauncher.java:139)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainersLauncher.handle(ContainersLauncher.java:55)
> at 
> org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:184)
> at 
> org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:110)
> at java.lang.Thread.run(Thread.java:745)
> Caused by: ExitCodeException exitCode=9:
> at org.apache.hadoop.util.Shell.runCommand(Shell.java:927)
> at org.apache.hadoop.util.Shell.run(Shell.java:838)
> at 
> org.apache.hadoop.util.Sh