[jira] [Commented] (YARN-3548) Missing doc for security configuration for timeline service feature
[ https://issues.apache.org/jira/browse/YARN-3548?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14539313#comment-14539313 ] Bibin A Chundatt commented on YARN-3548: [~ste...@apache.org] In YARN-3539 I couldn't find the above 2 configurations in TimelineServer.html. Could you please add these to the same document? As I remember, I did face some issues during startup of the server in secure mode without them. Please do correct me if I am wrong. Missing doc for security configuration for timeline service feature --- Key: YARN-3548 URL: https://issues.apache.org/jira/browse/YARN-3548 Project: Hadoop YARN Issue Type: Bug Components: timelineserver Reporter: Bibin A Chundatt Assignee: Gururaj Shetty Priority: Minor Documentation for the properties below needs to be added to the Timeline Server documentation: yarn.timeline-service.http-authentication.kerberos.principal yarn.timeline-service.http-authentication.kerberos.keytab url: /hadoop-yarn/hadoop-yarn-site/TimelineServer.html -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3489) RMServerUtils.validateResourceRequests should only obtain queue info once
[ https://issues.apache.org/jira/browse/YARN-3489?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wangda Tan updated YARN-3489: - Attachment: YARN-3489-branch-2.7.patch [~varun_saxena], I noticed there are some test failures with the patch I rebased. Could you take a look at the patch I attached and run the tests of yarn-server-resourcemanager? It seems like there are some test environment setup issues. RMServerUtils.validateResourceRequests should only obtain queue info once - Key: YARN-3489 URL: https://issues.apache.org/jira/browse/YARN-3489 Project: Hadoop YARN Issue Type: Improvement Components: resourcemanager Affects Versions: 2.6.0 Reporter: Jason Lowe Assignee: Varun Saxena Labels: BB2015-05-RFC Attachments: YARN-3489-branch-2.7.patch, YARN-3489.01.patch, YARN-3489.02.patch, YARN-3489.03.patch Since the label support was added we now get the queue info for each request being validated in SchedulerUtils.validateResourceRequest. If validateResourceRequests needs to validate a lot of requests at a time (e.g.: large cluster with lots of varied locality in the requests) then it will get the queue info for each request. Since we build the queue info this generates a lot of unnecessary garbage, as the queue isn't changing between requests. We should grab the queue info once and pass it down rather than building it again for each request. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
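To make the proposed change concrete, here is a minimal sketch of fetching the queue info once and passing it down into per-request validation. The method shape and the scheduler call are simplified illustrations, not the exact Hadoop signatures or the attached patch.

{code}
// Illustrative sketch only -- names loosely follow RMServerUtils/SchedulerUtils,
// but the signatures are simplified and are not the actual Hadoop APIs.
public static void validateResourceRequests(List<ResourceRequest> asks,
    Resource maximumResource, String queueName, YarnScheduler scheduler)
    throws InvalidResourceRequestException {
  QueueInfo queueInfo = null;
  try {
    // Fetch the queue info once, outside the loop...
    queueInfo = scheduler.getQueueInfo(queueName, false, false);
  } catch (IOException e) {
    // leave queueInfo as null; validation can fall back to defaults
  }
  for (ResourceRequest ask : asks) {
    // ...and pass it down instead of rebuilding it for every request.
    SchedulerUtils.validateResourceRequest(ask, maximumResource, queueInfo);
  }
}
{code}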
[jira] [Updated] (YARN-3526) ApplicationMaster tracking URL is incorrectly redirected on a QJM cluster
[ https://issues.apache.org/jira/browse/YARN-3526?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yang Weiwei updated YARN-3526: -- Attachment: YARN-3526.002.patch Fixed trailing whitespace. ApplicationMaster tracking URL is incorrectly redirected on a QJM cluster - Key: YARN-3526 URL: https://issues.apache.org/jira/browse/YARN-3526 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager, webapp Affects Versions: 2.6.0 Environment: Red Hat Enterprise Linux Server 6.4 Reporter: Yang Weiwei Assignee: Yang Weiwei Labels: BB2015-05-TBR Attachments: YARN-3526.001.patch, YARN-3526.002.patch On a QJM HA cluster, when viewing the RM web UI to track job status, it shows "This is standby RM. Redirecting to the current active RM: http://active-RM:8088/proxy/application_1427338037905_0008/mapreduce". It refreshes every 3 seconds but never reaches the correct tracking page. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3489) RMServerUtils.validateResourceRequests should only obtain queue info once
[ https://issues.apache.org/jira/browse/YARN-3489?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14539332#comment-14539332 ] Hadoop QA commented on YARN-3489: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:red}-1{color} | patch | 0m 0s | The patch command could not apply the patch during dryrun. | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12732137/YARN-3489-branch-2.7.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | branch-2 / 12584ac | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/7877/console | This message was automatically generated. RMServerUtils.validateResourceRequests should only obtain queue info once - Key: YARN-3489 URL: https://issues.apache.org/jira/browse/YARN-3489 Project: Hadoop YARN Issue Type: Improvement Components: resourcemanager Affects Versions: 2.6.0 Reporter: Jason Lowe Assignee: Varun Saxena Labels: BB2015-05-RFC Attachments: YARN-3489-branch-2.7.patch, YARN-3489.01.patch, YARN-3489.02.patch, YARN-3489.03.patch Since the label support was added we now get the queue info for each request being validated in SchedulerUtils.validateResourceRequest. If validateResourceRequests needs to validate a lot of requests at a time (e.g.: large cluster with lots of varied locality in the requests) then it will get the queue info for each request. Since we build the queue info this generates a lot of unnecessary garbage, as the queue isn't changing between requests. We should grab the queue info once and pass it down rather than building it again for each request. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-3629) NodeID is always printed as null in node manager initialization log.
nijel created YARN-3629: --- Summary: NodeID is always printed as null in node manager initialization log. Key: YARN-3629 URL: https://issues.apache.org/jira/browse/YARN-3629 Project: Hadoop YARN Issue Type: Bug Reporter: nijel Assignee: nijel During NodeManager startup the following log line is printed: 2015-05-12 11:20:02,347 INFO org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl: Initialized nodemanager for *null* : physical-memory=4096 virtual-memory=8602 virtual-cores=8 This line is printed from NodeStatusUpdaterImpl.serviceInit, but the nodeId assignment happens only in NodeStatusUpdaterImpl.serviceStart: {code} protected void serviceStart() throws Exception { // NodeManager is the last service to start, so NodeId is available. this.nodeId = this.context.getNodeId(); {code} Assigning the nodeId in serviceInit is not feasible since it is generated by ContainerManagerImpl.serviceStart. The log can be moved to serviceStart to give the right information to the user. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2605) [RM HA] Rest api endpoints doing redirect incorrectly
[ https://issues.apache.org/jira/browse/YARN-2605?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14539341#comment-14539341 ] Yang Weiwei commented on YARN-2605: --- Hello [~xgong], I noticed that you set an Ignore tag for testRMWebAppRedirect in your patch. Can you please let me know why this test case is ignored, and do you have any idea how to fix it? I opened another JIRA, YARN-3601; please let me know if that one is valid. [RM HA] Rest api endpoints doing redirect incorrectly - Key: YARN-2605 URL: https://issues.apache.org/jira/browse/YARN-2605 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Affects Versions: 2.4.0 Reporter: bc Wong Assignee: Xuan Gong Labels: newbie Fix For: 2.7.1 Attachments: YARN-2605.1.patch, YARN-2605.2.patch The standby RM's webui tries to do a redirect via meta-refresh. That is fine for pages designed to be viewed by web browsers, but the API endpoints shouldn't do that. Most programmatic HTTP clients do not do meta-refresh. I'd suggest HTTP 303, or returning a well-defined error message (json or xml) stating the standby status and a link to the active RM. The standby RM is returning this today: {noformat} $ curl -i http://bcsec-1.ent.cloudera.com:8088/ws/v1/cluster/metrics HTTP/1.1 200 OK Cache-Control: no-cache Expires: Thu, 25 Sep 2014 18:34:53 GMT Date: Thu, 25 Sep 2014 18:34:53 GMT Pragma: no-cache Expires: Thu, 25 Sep 2014 18:34:53 GMT Date: Thu, 25 Sep 2014 18:34:53 GMT Pragma: no-cache Content-Type: text/plain; charset=UTF-8 Refresh: 3; url=http://bcsec-2.ent.cloudera.com:8088/ws/v1/cluster/metrics Content-Length: 117 Server: Jetty(6.1.26) This is standby RM. Redirecting to the current active RM: http://bcsec-2.ent.cloudera.com:8088/ws/v1/cluster/metrics {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
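For illustration, a minimal sketch of what the suggested behaviour could look like on the standby RM: REST (/ws/) requests get a real HTTP 303 redirect, while browser pages keep the meta-refresh. The method name and the way the active-RM address is obtained are hypothetical, not the actual RM web filter code; it assumes the standard javax.servlet.http classes.

{code}
// Hypothetical handler on the standby RM (sketch only, not the actual filter code).
void redirectToActiveRM(HttpServletRequest req, HttpServletResponse resp,
    String activeRMWebAddress) throws IOException {
  String target = activeRMWebAddress + req.getRequestURI();
  if (req.getRequestURI().startsWith("/ws/")) {
    // Programmatic REST clients: answer with a real redirect (303 See Other).
    resp.setHeader("Location", target);
    resp.setStatus(HttpServletResponse.SC_SEE_OTHER);
  } else {
    // Browser pages: keep the existing meta-refresh style behaviour.
    resp.setHeader("Refresh", "3; url=" + target);
    resp.getWriter().println(
        "This is standby RM. Redirecting to the current active RM: " + target);
  }
}
{code}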
[jira] [Updated] (YARN-3362) Add node label usage in RM CapacityScheduler web UI
[ https://issues.apache.org/jira/browse/YARN-3362?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Naganarasimha G R updated YARN-3362: Attachment: YARN-3362.20150512-1.patch 2015.05.12_3362_Queue_Hierarchy.png Hi [~wangda], I have updated the patch and the image, appending each label with its available resource. Add node label usage in RM CapacityScheduler web UI --- Key: YARN-3362 URL: https://issues.apache.org/jira/browse/YARN-3362 Project: Hadoop YARN Issue Type: Sub-task Components: capacityscheduler, resourcemanager, webapp Reporter: Wangda Tan Assignee: Naganarasimha G R Attachments: 2015.05.06 Folded Queues.png, 2015.05.06 Queue Expanded.png, 2015.05.07_3362_Queue_Hierarchy.png, 2015.05.10_3362_Queue_Hierarchy.png, 2015.05.12_3362_Queue_Hierarchy.png, CSWithLabelsView.png, No-space-between-Active_user_info-and-next-queues.png, Screen Shot 2015-04-29 at 11.42.17 AM.png, YARN-3362.20150428-3-modified.patch, YARN-3362.20150428-3.patch, YARN-3362.20150506-1.patch, YARN-3362.20150507-1.patch, YARN-3362.20150510-1.patch, YARN-3362.20150511-1.patch, YARN-3362.20150512-1.patch, capacity-scheduler.xml We don't show node label usage in the RM CapacityScheduler web UI now; without this, it is hard for users to understand what is happening on nodes that have labels assigned to them. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-1042) add ability to specify affinity/anti-affinity in container requests
[ https://issues.apache.org/jira/browse/YARN-1042?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14539467#comment-14539467 ] Yang Weiwei commented on YARN-1042: --- There has been no update for a long time; anything new here? This looks like a nice feature that could help us a lot. Is it the right direction for us to put more effort into getting this done on the RM side? I noticed there are alternatives in the Slider project, e.g. SLIDER-82. Please advise. add ability to specify affinity/anti-affinity in container requests --- Key: YARN-1042 URL: https://issues.apache.org/jira/browse/YARN-1042 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Affects Versions: 3.0.0 Reporter: Steve Loughran Assignee: Arun C Murthy Attachments: YARN-1042-demo.patch Container requests to the AM should be able to request anti-affinity to ensure that things like Region Servers don't come up in the same failure zones. Similarly, you may want to specify affinity to the same host or rack without specifying which specific host/rack. Example: bringing up a small Giraph cluster in a large YARN cluster would benefit from having the processes in the same rack purely for bandwidth reasons. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3629) NodeID is always printed as null in node manager initialization log.
[ https://issues.apache.org/jira/browse/YARN-3629?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14539495#comment-14539495 ] nijel commented on YARN-3629: - Moving the log message is a bit tricky since it logs some parameters which are not available in serviceStart. So I am keeping this log as it is and adding a new log message to print the nodeId for information purposes. Any different thoughts? NodeID is always printed as null in node manager initialization log. -- Key: YARN-3629 URL: https://issues.apache.org/jira/browse/YARN-3629 Project: Hadoop YARN Issue Type: Bug Reporter: nijel Assignee: nijel Attachments: YARN-3629-1.patch During NodeManager startup the following log line is printed: 2015-05-12 11:20:02,347 INFO org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl: Initialized nodemanager for *null* : physical-memory=4096 virtual-memory=8602 virtual-cores=8 This line is printed from NodeStatusUpdaterImpl.serviceInit, but the nodeId assignment happens only in NodeStatusUpdaterImpl.serviceStart: {code} protected void serviceStart() throws Exception { // NodeManager is the last service to start, so NodeId is available. this.nodeId = this.context.getNodeId(); {code} Assigning the nodeId in serviceInit is not feasible since it is generated by ContainerManagerImpl.serviceStart. The log can be moved to serviceStart to give the right information to the user. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
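A minimal sketch of what the extra log line described in this comment might look like; the exact wording and placement are illustrative, not the attached YARN-3629-1.patch.

{code}
// Sketch only: keep the existing serviceInit() message and add one INFO line in
// NodeStatusUpdaterImpl.serviceStart(), where the NodeId has already been assigned
// and is no longer null.
@Override
protected void serviceStart() throws Exception {
  // NodeManager is the last service to start, so NodeId is available.
  this.nodeId = this.context.getNodeId();
  LOG.info("Node ID assigned is : " + this.nodeId);
  super.serviceStart();
}
{code}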
[jira] [Updated] (YARN-3629) NodeID is always printed as null in node manager initialization log.
[ https://issues.apache.org/jira/browse/YARN-3629?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] nijel updated YARN-3629: Attachment: YARN-3629-1.patch Please review. NodeID is always printed as null in node manager initialization log. -- Key: YARN-3629 URL: https://issues.apache.org/jira/browse/YARN-3629 Project: Hadoop YARN Issue Type: Bug Reporter: nijel Assignee: nijel Attachments: YARN-3629-1.patch During NodeManager startup the following log line is printed: 2015-05-12 11:20:02,347 INFO org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl: Initialized nodemanager for *null* : physical-memory=4096 virtual-memory=8602 virtual-cores=8 This line is printed from NodeStatusUpdaterImpl.serviceInit, but the nodeId assignment happens only in NodeStatusUpdaterImpl.serviceStart: {code} protected void serviceStart() throws Exception { // NodeManager is the last service to start, so NodeId is available. this.nodeId = this.context.getNodeId(); {code} Assigning the nodeId in serviceInit is not feasible since it is generated by ContainerManagerImpl.serviceStart. The log can be moved to serviceStart to give the right information to the user. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3411) [Storage implementation] explore the native HBase write schema for storage
[ https://issues.apache.org/jira/browse/YARN-3411?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vrushali C updated YARN-3411: - Attachment: YARN-3411.poc.6.txt Uploading a patch with the review suggestions. No major changes, some coding updates. Also, rebased to pull in latest commits (Phoenix related changes). But now I am having some trouble getting the timelineservice module to build since it includes both the phoenix and hbase dependencies in the pom. I get Some Enforcer rules have failed errors. I am working on resolving those. Any more eyes on this build error would help! {code} [ERROR] Failed to execute goal org.apache.maven.plugins:maven-enforcer-plugin:1.3.1:enforce (depcheck) on project hadoop-yarn-server-timelineservice: Some Enforcer rules have failed. Look above for specific messages explaining why the rule failed. - [Help 1] [ERROR] {code} {code} [INFO] --- maven-site-plugin:3.4:attach-descriptor (attach-descriptor) @ hadoop-yarn-server-timelineservice --- [INFO] [INFO] --- maven-enforcer-plugin:1.3.1:enforce (depcheck) @ hadoop-yarn-server-timelineservice --- [WARNING] Dependency convergence error for org.apache.hbase:hbase-common:1.0.1 paths to dependency are: +-org.apache.hadoop:hadoop-yarn-server-timelineservice:3.0.0-SNAPSHOT +-org.apache.hbase:hbase-client:1.0.1 +-org.apache.hbase:hbase-common:1.0.1 and +-org.apache.hadoop:hadoop-yarn-server-timelineservice:3.0.0-SNAPSHOT +-org.apache.hbase:hbase-testing-util:1.0.1 +-org.apache.hbase:hbase-common:1.0.1 and +-org.apache.hadoop:hadoop-yarn-server-timelineservice:3.0.0-SNAPSHOT +-org.apache.hbase:hbase-testing-util:1.0.1 +-org.apache.hbase:hbase-common:1.0.1 and +-org.apache.hadoop:hadoop-yarn-server-timelineservice:3.0.0-SNAPSHOT +-org.apache.phoenix:phoenix-core:4.3.0 +-org.apache.hbase:hbase-common:0.98.9-hadoop2 and +-org.apache.hadoop:hadoop-yarn-server-timelineservice:3.0.0-SNAPSHOT +-org.apache.phoenix:phoenix-core:4.3.0 +-org.apache.hbase:hbase-server:1.0.1 +-org.apache.hbase:hbase-common:0.98.9-hadoop2 and +-org.apache.hadoop:hadoop-yarn-server-timelineservice:3.0.0-SNAPSHOT +-org.apache.phoenix:phoenix-core:4.3.0 +-org.apache.hbase:hbase-server:1.0.1 +-org.apache.hbase:hbase-prefix-tree:0.98.9-hadoop2 +-org.apache.hbase:hbase-common:0.98.9-hadoop2 and +-org.apache.hadoop:hadoop-yarn-server-timelineservice:3.0.0-SNAPSHOT +-org.apache.phoenix:phoenix-core:4.3.0 +-org.apache.hbase:hbase-server:1.0.1 +-org.apache.hbase:hbase-prefix-tree:0.98.9-hadoop2 +-org.apache.hbase:hbase-common:0.98.9-hadoop2 and +-org.apache.hadoop:hadoop-yarn-server-timelineservice:3.0.0-SNAPSHOT +-org.apache.phoenix:phoenix-core:4.3.0 +-org.apache.hbase:hbase-server:1.0.1 +-org.apache.hbase:hbase-common:0.98.9-hadoop2 [WARNING] Dependency convergence error for org.apache.hbase:hbase-protocol:1.0.1 paths to dependency are: +-org.apache.hadoop:hadoop-yarn-server-timelineservice:3.0.0-SNAPSHOT +-org.apache.hbase:hbase-client:1.0.1 +-org.apache.hbase:hbase-protocol:1.0.1 and +-org.apache.hadoop:hadoop-yarn-server-timelineservice:3.0.0-SNAPSHOT +-org.apache.hbase:hbase-testing-util:1.0.1 +-org.apache.hbase:hbase-protocol:1.0.1 and +-org.apache.hadoop:hadoop-yarn-server-timelineservice:3.0.0-SNAPSHOT +-org.apache.phoenix:phoenix-core:4.3.0 +-org.apache.hbase:hbase-protocol:0.98.9-hadoop2 and +-org.apache.hadoop:hadoop-yarn-server-timelineservice:3.0.0-SNAPSHOT +-org.apache.phoenix:phoenix-core:4.3.0 +-org.apache.hbase:hbase-server:1.0.1 +-org.apache.hbase:hbase-protocol:0.98.9-hadoop2 and 
+-org.apache.hadoop:hadoop-yarn-server-timelineservice:3.0.0-SNAPSHOT +-org.apache.phoenix:phoenix-core:4.3.0 +-org.apache.hbase:hbase-server:1.0.1 +-org.apache.hbase:hbase-protocol:0.98.9-hadoop2 [WARNING] Dependency convergence error for org.apache.hbase:hbase-hadoop-compat:1.0.1 paths to dependency are: +-org.apache.hadoop:hadoop-yarn-server-timelineservice:3.0.0-SNAPSHOT +-org.apache.hbase:hbase-testing-util:1.0.1 +-org.apache.hbase:hbase-hadoop-compat:1.0.1 and +-org.apache.hadoop:hadoop-yarn-server-timelineservice:3.0.0-SNAPSHOT +-org.apache.hbase:hbase-testing-util:1.0.1 +-org.apache.hbase:hbase-hadoop-compat:1.0.1 and +-org.apache.hadoop:hadoop-yarn-server-timelineservice:3.0.0-SNAPSHOT +-org.apache.hbase:hbase-testing-util:1.0.1 +-org.apache.hbase:hbase-hadoop2-compat:1.0.1 +-org.apache.hbase:hbase-hadoop-compat:1.0.1 and +-org.apache.hadoop:hadoop-yarn-server-timelineservice:3.0.0-SNAPSHOT +-org.apache.phoenix:phoenix-core:4.3.0 +-org.apache.hbase:hbase-server:1.0.1 +-org.apache.hbase:hbase-prefix-tree:0.98.9-hadoop2 +-org.apache.hbase:hbase-hadoop-compat:1.0.1 and +-org.apache.hadoop:hadoop-yarn-server-timelineservice:3.0.0-SNAPSHOT +-org.apache.phoenix:phoenix-core:4.3.0 +-org.apache.hbase:hbase-server:1.0.1
[jira] [Commented] (YARN-3411) [Storage implementation] explore the native HBase write schema for storage
[ https://issues.apache.org/jira/browse/YARN-3411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14539425#comment-14539425 ] Hadoop QA commented on YARN-3411: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | patch | 0m 1s | The patch file was not named according to hadoop's naming conventions. Please see https://wiki.apache.org/hadoop/HowToContribute for instructions. | | {color:red}-1{color} | patch | 0m 0s | The patch command could not apply the patch during dryrun. | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12732170/YARN-3411.poc.6.txt | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / 987abc9 | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/7879/console | This message was automatically generated. [Storage implementation] explore the native HBase write schema for storage -- Key: YARN-3411 URL: https://issues.apache.org/jira/browse/YARN-3411 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Sangjin Lee Assignee: Vrushali C Priority: Critical Attachments: ATSv2BackendHBaseSchemaproposal.pdf, YARN-3411.poc.2.txt, YARN-3411.poc.3.txt, YARN-3411.poc.4.txt, YARN-3411.poc.5.txt, YARN-3411.poc.6.txt, YARN-3411.poc.txt There is work that's in progress to implement the storage based on a Phoenix schema (YARN-3134). In parallel, we would like to explore an implementation based on a native HBase schema for the write path. Such a schema does not exclude using Phoenix, especially for reads and offline queries. Once we have basic implementations of both options, we could evaluate them in terms of performance, scalability, usability, etc. and make a call. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3619) ContainerMetrics unregisters during getMetrics and leads to ConcurrentModificationException
[ https://issues.apache.org/jira/browse/YARN-3619?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14539421#comment-14539421 ] zhihai xu commented on YARN-3619: - Thanks [~kasha] for assigning this JIRA to me. The root cause is exactly what [~jlowe] said; I just added a few more details based on [~jlowe]'s succinct comment. {{sampleMetrics}} is called periodically in MetricsSystemImpl, and it iterates {{sources}} in the following code: {code} for (Entry<String, MetricsSourceAdapter> entry : sources.entrySet()) { if (sourceFilter == null || sourceFilter.accepts(entry.getKey())) { snapshotMetrics(entry.getValue(), bufferBuilder); } } {code} {{snapshotMetrics}} is called to process every entry from {{sources}}. The calling sequence which leads to a ConcurrentModificationException is snapshotMetrics -> MetricsSourceAdapter#getMetrics -> ContainerMetrics#getMetrics -> MetricsSystemImpl#unregisterSource -> sources.remove(name), so an entry in {{sources}} is removed while {{sources}} is being iterated. Therefore unregisterSource can't be called from getMetrics. I will prepare a patch for review. ContainerMetrics unregisters during getMetrics and leads to ConcurrentModificationException --- Key: YARN-3619 URL: https://issues.apache.org/jira/browse/YARN-3619 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 2.7.0 Reporter: Jason Lowe Assignee: zhihai xu ContainerMetrics is able to unregister itself during the getMetrics method, but that method can be called by MetricsSystemImpl.sampleMetrics which is trying to iterate the sources. This leads to a ConcurrentModificationException log like this: {noformat} 2015-05-11 14:00:20,360 [Timer for 'NodeManager' metrics system] WARN impl.MetricsSystemImpl: java.util.ConcurrentModificationException {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
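To make the failure mode concrete, here is a small self-contained illustration of the pattern in plain Java (not the YARN patch): mutating a map while its entrySet() is being iterated throws, and the usual remedy is to defer the removal until the iteration finishes.

{code}
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Plain-Java illustration of the hazard described above, not the actual fix for
// ContainerMetrics/MetricsSystemImpl.
public class ConcurrentRemovalDemo {
  public static void main(String[] args) {
    Map<String, String> sources = new HashMap<>();
    sources.put("source1", "metrics");
    sources.put("source2", "metrics");

    // Unsafe (commented out): removing from the map inside the iteration, which is
    // effectively what unregisterSource() does while sampleMetrics() iterates sources.
    // for (Map.Entry<String, String> e : sources.entrySet()) {
    //   sources.remove(e.getKey());   // throws ConcurrentModificationException
    // }

    // Safer pattern: remember what is finished during the pass and remove afterwards,
    // i.e. defer the "unregister" until the iteration is over.
    List<String> finished = new ArrayList<>();
    for (Map.Entry<String, String> e : sources.entrySet()) {
      finished.add(e.getKey());
    }
    sources.keySet().removeAll(finished);
    System.out.println("Remaining sources: " + sources);
  }
}
{code}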
[jira] [Created] (YARN-3630) YARN should suggest a heartbeat interval for applications
Zoltán Zvara created YARN-3630: -- Summary: YARN should suggest a heartbeat interval for applications Key: YARN-3630 URL: https://issues.apache.org/jira/browse/YARN-3630 Project: Hadoop YARN Issue Type: Improvement Components: client, resourcemanager, scheduler Reporter: Zoltán Zvara It seems currently applications - for example Spark - are not adaptive to RM regarding heartbeat intervals. RM should be able to suggest a desired heartbeat interval to application. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3362) Add node label usage in RM CapacityScheduler web UI
[ https://issues.apache.org/jira/browse/YARN-3362?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14539483#comment-14539483 ] Hadoop QA commented on YARN-3362: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | pre-patch | 14m 47s | Pre-patch trunk compilation is healthy. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 1 new or modified test files. | | {color:green}+1{color} | javac | 7m 50s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 10m 6s | There were no new javadoc warning messages. | | {color:green}+1{color} | release audit | 0m 23s | The applied patch does not increase the total number of release audit warnings. | | {color:green}+1{color} | checkstyle | 0m 30s | There were no new checkstyle issues. | | {color:green}+1{color} | whitespace | 0m 2s | The patch has no lines that end in whitespace. | | {color:green}+1{color} | install | 1m 35s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 34s | The patch built with eclipse:eclipse. | | {color:green}+1{color} | findbugs | 1m 16s | The patch does not introduce any new Findbugs (version 2.0.3) warnings. | | {color:red}-1{color} | yarn tests | 51m 59s | Tests failed in hadoop-yarn-server-resourcemanager. | | | | 89m 6s | | \\ \\ || Reason || Tests || | Timed out tests | org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.TestFairSchedulerQueueACLs | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12732150/YARN-3362.20150512-1.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / 987abc9 | | hadoop-yarn-server-resourcemanager test log | https://builds.apache.org/job/PreCommit-YARN-Build/7878/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/7878/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf904.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/7878/console | This message was automatically generated. Add node label usage in RM CapacityScheduler web UI --- Key: YARN-3362 URL: https://issues.apache.org/jira/browse/YARN-3362 Project: Hadoop YARN Issue Type: Sub-task Components: capacityscheduler, resourcemanager, webapp Reporter: Wangda Tan Assignee: Naganarasimha G R Attachments: 2015.05.06 Folded Queues.png, 2015.05.06 Queue Expanded.png, 2015.05.07_3362_Queue_Hierarchy.png, 2015.05.10_3362_Queue_Hierarchy.png, 2015.05.12_3362_Queue_Hierarchy.png, CSWithLabelsView.png, No-space-between-Active_user_info-and-next-queues.png, Screen Shot 2015-04-29 at 11.42.17 AM.png, YARN-3362.20150428-3-modified.patch, YARN-3362.20150428-3.patch, YARN-3362.20150506-1.patch, YARN-3362.20150507-1.patch, YARN-3362.20150510-1.patch, YARN-3362.20150511-1.patch, YARN-3362.20150512-1.patch, capacity-scheduler.xml We don't have node label usage in RM CapacityScheduler web UI now, without this, user will be hard to understand what happened to nodes have labels assign to it. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-1042) add ability to specify affinity/anti-affinity in container requests
[ https://issues.apache.org/jira/browse/YARN-1042?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14539481#comment-14539481 ] Weiwei Yang commented on YARN-1042: --- Sorry, it is not an alternative; SLIDER-82 depends on this JIRA. add ability to specify affinity/anti-affinity in container requests --- Key: YARN-1042 URL: https://issues.apache.org/jira/browse/YARN-1042 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Affects Versions: 3.0.0 Reporter: Steve Loughran Assignee: Arun C Murthy Attachments: YARN-1042-demo.patch Container requests to the AM should be able to request anti-affinity to ensure that things like Region Servers don't come up in the same failure zones. Similarly, you may want to specify affinity to the same host or rack without specifying which specific host/rack. Example: bringing up a small Giraph cluster in a large YARN cluster would benefit from having the processes in the same rack purely for bandwidth reasons. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2336) Fair scheduler REST api returns a missing '[' bracket JSON for deep queue tree
[ https://issues.apache.org/jira/browse/YARN-2336?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Akira AJISAKA updated YARN-2336: Hadoop Flags: Incompatible change Marking this issue as an incompatible change since this fix includes an API change. Fair scheduler REST api returns a missing '[' bracket JSON for deep queue tree -- Key: YARN-2336 URL: https://issues.apache.org/jira/browse/YARN-2336 Project: Hadoop YARN Issue Type: Bug Components: fairscheduler Affects Versions: 2.4.1 Reporter: Kenji Kikushima Assignee: Kenji Kikushima Labels: BB2015-05-TBR Attachments: YARN-2336-2.patch, YARN-2336-3.patch, YARN-2336-4.patch, YARN-2336.005.patch, YARN-2336.patch When we have sub-queues in the Fair Scheduler, the REST API returns JSON with a missing '[' bracket for childQueues. This issue was found by [~ajisakaa] at YARN-1050. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2336) Fair scheduler REST api returns a missing '[' bracket JSON for deep queue tree
[ https://issues.apache.org/jira/browse/YARN-2336?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Akira AJISAKA updated YARN-2336: Labels: BB2015-05-RFC (was: BB2015-05-TBR) Fair scheduler REST api returns a missing '[' bracket JSON for deep queue tree -- Key: YARN-2336 URL: https://issues.apache.org/jira/browse/YARN-2336 Project: Hadoop YARN Issue Type: Bug Components: fairscheduler Affects Versions: 2.4.1 Reporter: Kenji Kikushima Assignee: Kenji Kikushima Labels: BB2015-05-RFC Attachments: YARN-2336-2.patch, YARN-2336-3.patch, YARN-2336-4.patch, YARN-2336.005.patch, YARN-2336.patch When we have sub-queues in the Fair Scheduler, the REST API returns JSON with a missing '[' bracket for childQueues. This issue was found by [~ajisakaa] at YARN-1050. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3513) Remove unused variables in ContainersMonitorImpl and add debug log for overall resource usage by all containers
[ https://issues.apache.org/jira/browse/YARN-3513?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Devaraj K updated YARN-3513: Hadoop Flags: Reviewed Thanks [~Naganarasimha] for the updated patch. +1, the latest patch looks good to me; I will commit it shortly. Remove unused variables in ContainersMonitorImpl and add debug log for overall resource usage by all containers Key: YARN-3513 URL: https://issues.apache.org/jira/browse/YARN-3513 Project: Hadoop YARN Issue Type: Improvement Components: nodemanager Reporter: Naganarasimha G R Assignee: Naganarasimha G R Priority: Trivial Labels: BB2015-05-TBR, newbie Attachments: YARN-3513.20150421-1.patch, YARN-3513.20150503-1.patch, YARN-3513.20150506-1.patch, YARN-3513.20150507-1.patch, YARN-3513.20150508-1.patch, YARN-3513.20150508-1.patch, YARN-3513.20150511-1.patch Some local variables in MonitoringThread.run(), {{vmemStillInUsage and pmemStillInUsage}}, are only updated and never used. Instead we need to add a debug log for the overall resource usage by all containers. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
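For context, a rough sketch of what an aggregate debug log along these lines could look like; the ContainerUsage type, the field names, and the helper itself are hypothetical stand-ins, not the actual ContainersMonitorImpl change.

{code}
// Hypothetical helper (sketch only): sum usage across the tracked containers inside
// the monitoring loop and emit a single DEBUG line for the whole node.
static void logTotalContainerUsage(org.apache.commons.logging.Log log,
    java.util.Collection<ContainerUsage> containers) {
  long totalVmemBytes = 0;
  long totalPmemBytes = 0;
  for (ContainerUsage usage : containers) {   // ContainerUsage is a stand-in type
    totalVmemBytes += usage.vmemBytes;
    totalPmemBytes += usage.pmemBytes;
  }
  if (log.isDebugEnabled()) {
    log.debug("Total resource usage by all containers: vmem=" + totalVmemBytes
        + " bytes, pmem=" + totalPmemBytes + " bytes");
  }
}
{code}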
[jira] [Commented] (YARN-3526) ApplicationMaster tracking URL is incorrectly redirected on a QJM cluster
[ https://issues.apache.org/jira/browse/YARN-3526?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14539468#comment-14539468 ] Hadoop QA commented on YARN-3526: - \\ \\ | (/) *{color:green}+1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | pre-patch | 14m 42s | Pre-patch trunk compilation is healthy. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 1 new or modified test files. | | {color:green}+1{color} | javac | 7m 36s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 9m 36s | There were no new javadoc warning messages. | | {color:green}+1{color} | release audit | 0m 22s | The applied patch does not increase the total number of release audit warnings. | | {color:green}+1{color} | checkstyle | 1m 15s | There were no new checkstyle issues. | | {color:green}+1{color} | whitespace | 0m 0s | The patch has no lines that end in whitespace. | | {color:green}+1{color} | install | 1m 38s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 33s | The patch built with eclipse:eclipse. | | {color:green}+1{color} | findbugs | 1m 58s | The patch does not introduce any new Findbugs (version 2.0.3) warnings. | | {color:green}+1{color} | yarn tests | 6m 52s | Tests passed in hadoop-yarn-client. | | {color:green}+1{color} | yarn tests | 52m 11s | Tests passed in hadoop-yarn-server-resourcemanager. | | | | 96m 46s | | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12732139/YARN-3526.002.patch | | Optional Tests | javac unit findbugs checkstyle javadoc | | git revision | trunk / 3d28611 | | hadoop-yarn-client test log | https://builds.apache.org/job/PreCommit-YARN-Build/7876/artifact/patchprocess/testrun_hadoop-yarn-client.txt | | hadoop-yarn-server-resourcemanager test log | https://builds.apache.org/job/PreCommit-YARN-Build/7876/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/7876/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf902.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/7876/console | This message was automatically generated. ApplicationMaster tracking URL is incorrectly redirected on a QJM cluster - Key: YARN-3526 URL: https://issues.apache.org/jira/browse/YARN-3526 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager, webapp Affects Versions: 2.6.0 Environment: Red Hat Enterprise Linux Server 6.4 Reporter: Weiwei Yang Assignee: Weiwei Yang Labels: BB2015-05-TBR Attachments: YARN-3526.001.patch, YARN-3526.002.patch On a QJM HA cluster, view RM web UI to track job status, it shows This is standby RM. Redirecting to the current active RM: http://active-RM:8088/proxy/application_1427338037905_0008/mapreduce it refreshes every 3 sec but never going to the correct tracking page. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2336) Fair scheduler REST api returns a missing '[' bracket JSON for deep queue tree
[ https://issues.apache.org/jira/browse/YARN-2336?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Akira AJISAKA updated YARN-2336: Attachment: YARN-2336.005.patch v5 patch * rebased for the latest trunk * updated the document Fair scheduler REST api returns a missing '[' bracket JSON for deep queue tree -- Key: YARN-2336 URL: https://issues.apache.org/jira/browse/YARN-2336 Project: Hadoop YARN Issue Type: Bug Components: fairscheduler Affects Versions: 2.4.1 Reporter: Kenji Kikushima Assignee: Kenji Kikushima Labels: BB2015-05-TBR Attachments: YARN-2336-2.patch, YARN-2336-3.patch, YARN-2336-4.patch, YARN-2336.005.patch, YARN-2336.patch When we have sub-queues in the Fair Scheduler, the REST API returns JSON with a missing '[' bracket for childQueues. This issue was found by [~ajisakaa] at YARN-1050. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3630) YARN should suggest a heartbeat interval for applications
[ https://issues.apache.org/jira/browse/YARN-3630?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zoltán Zvara updated YARN-3630: --- Description: It seems currently applications - for example Spark - are not adaptive to RM regarding heartbeat intervals. RM should be able to suggest a desired heartbeat interval to applications. (was: It seems currently applications - for example Spark - are not adaptive to RM regarding heartbeat intervals. RM should be able to suggest a desired heartbeat interval to application.) YARN should suggest a heartbeat interval for applications - Key: YARN-3630 URL: https://issues.apache.org/jira/browse/YARN-3630 Project: Hadoop YARN Issue Type: Improvement Components: client, resourcemanager, scheduler Reporter: Zoltán Zvara It seems currently applications - for example Spark - are not adaptive to RM regarding heartbeat intervals. RM should be able to suggest a desired heartbeat interval to applications. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-160) nodemanagers should obtain cpu/memory values from underlying OS
[ https://issues.apache.org/jira/browse/YARN-160?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14540229#comment-14540229 ] Hadoop QA commented on YARN-160: \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | pre-patch | 14m 42s | Pre-patch trunk compilation is healthy. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 5 new or modified test files. | | {color:green}+1{color} | javac | 7m 34s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 9m 32s | There were no new javadoc warning messages. | | {color:green}+1{color} | release audit | 0m 22s | The applied patch does not increase the total number of release audit warnings. | | {color:green}+1{color} | checkstyle | 2m 43s | There were no new checkstyle issues. | | {color:red}-1{color} | whitespace | 0m 27s | The patch has 2 line(s) that end in whitespace. Use git apply --whitespace=fix. | | {color:green}+1{color} | install | 1m 34s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 33s | The patch built with eclipse:eclipse. | | {color:green}+1{color} | findbugs | 4m 30s | The patch does not introduce any new Findbugs (version 2.0.3) warnings. | | {color:green}+1{color} | tools/hadoop tests | 14m 37s | Tests passed in hadoop-gridmix. | | {color:green}+1{color} | yarn tests | 0m 26s | Tests passed in hadoop-yarn-api. | | {color:green}+1{color} | yarn tests | 2m 2s | Tests passed in hadoop-yarn-common. | | {color:green}+1{color} | yarn tests | 6m 2s | Tests passed in hadoop-yarn-server-nodemanager. | | | | 65m 9s | | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12732266/YARN-160.005.patch | | Optional Tests | javac unit findbugs checkstyle javadoc | | git revision | trunk / f4e2b3c | | whitespace | https://builds.apache.org/job/PreCommit-YARN-Build/7890/artifact/patchprocess/whitespace.txt | | hadoop-gridmix test log | https://builds.apache.org/job/PreCommit-YARN-Build/7890/artifact/patchprocess/testrun_hadoop-gridmix.txt | | hadoop-yarn-api test log | https://builds.apache.org/job/PreCommit-YARN-Build/7890/artifact/patchprocess/testrun_hadoop-yarn-api.txt | | hadoop-yarn-common test log | https://builds.apache.org/job/PreCommit-YARN-Build/7890/artifact/patchprocess/testrun_hadoop-yarn-common.txt | | hadoop-yarn-server-nodemanager test log | https://builds.apache.org/job/PreCommit-YARN-Build/7890/artifact/patchprocess/testrun_hadoop-yarn-server-nodemanager.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/7890/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf901.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/7890/console | This message was automatically generated. 
nodemanagers should obtain cpu/memory values from underlying OS --- Key: YARN-160 URL: https://issues.apache.org/jira/browse/YARN-160 Project: Hadoop YARN Issue Type: Improvement Components: nodemanager Affects Versions: 2.0.3-alpha Reporter: Alejandro Abdelnur Assignee: Varun Vasudev Labels: BB2015-05-TBR Attachments: YARN-160.005.patch, apache-yarn-160.0.patch, apache-yarn-160.1.patch, apache-yarn-160.2.patch, apache-yarn-160.3.patch As mentioned in YARN-2 *NM memory and CPU configs* Currently these values are coming from the config of the NM, we should be able to obtain those values from the OS (ie, in the case of Linux from /proc/meminfo /proc/cpuinfo). As this is highly OS dependent we should have an interface that obtains this information. In addition implementations of this interface should be able to specify a mem/cpu offset (amount of mem/cpu not to be avail as YARN resource), this would allow to reserve mem/cpu for the OS and other services outside of YARN containers. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
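As a concrete illustration of reading these values from the OS, here is a minimal, self-contained sketch that parses MemTotal from /proc/meminfo on Linux and applies a reserved offset. The class name and the 2 GB offset are purely hypothetical, and this is not the plugin interface introduced by the attached patches.

{code}
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;

// Hypothetical sketch only: derive the node's physical memory from /proc/meminfo
// and subtract an offset reserved for the OS and other daemons.
public class ProcMemInfoReader {
  /** Returns the MemTotal value from /proc/meminfo in kB, or -1 if it cannot be read. */
  public static long readMemTotalKb() {
    try (BufferedReader reader = new BufferedReader(new FileReader("/proc/meminfo"))) {
      String line;
      while ((line = reader.readLine()) != null) {
        if (line.startsWith("MemTotal:")) {
          // Line format: "MemTotal:       16330856 kB"
          String[] parts = line.trim().split("\\s+");
          return Long.parseLong(parts[1]);
        }
      }
    } catch (IOException | NumberFormatException e) {
      // fall through and report failure below
    }
    return -1;
  }

  public static void main(String[] args) {
    long reservedForOsKb = 2L * 1024 * 1024;   // hypothetical 2 GB reserved offset
    long memTotalKb = readMemTotalKb();
    long yarnMemKb = memTotalKb > 0 ? memTotalKb - reservedForOsKb : -1;
    System.out.println("MemTotal=" + memTotalKb + " kB, available to YARN=" + yarnMemKb + " kB");
  }
}
{code}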
[jira] [Commented] (YARN-3613) TestContainerManagerSecurity should init and start Yarn cluster in setup instead of individual methods
[ https://issues.apache.org/jira/browse/YARN-3613?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14540324#comment-14540324 ] Karthik Kambatla commented on YARN-3613: +1. Checking this in. TestContainerManagerSecurity should init and start Yarn cluster in setup instead of individual methods -- Key: YARN-3613 URL: https://issues.apache.org/jira/browse/YARN-3613 Project: Hadoop YARN Issue Type: Improvement Components: test Affects Versions: 2.7.0 Reporter: Karthik Kambatla Assignee: nijel Priority: Minor Labels: newbie Attachments: YARN-3613-1.patch In TestContainerManagerSecurity, individual tests init and start Yarn cluster. This duplication can be avoided by moving that to setup. Further, one could merge the two @Test methods to avoid bringing up another mini-cluster. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3411) [Storage implementation] explore the native HBase write schema for storage
[ https://issues.apache.org/jira/browse/YARN-3411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14540383#comment-14540383 ] Vrushali C commented on YARN-3411: -- Sounds good, thanks [~djp]! But I would like to request that you wait before reviewing again; I am planning to upload a new patch later today. [Storage implementation] explore the native HBase write schema for storage -- Key: YARN-3411 URL: https://issues.apache.org/jira/browse/YARN-3411 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Sangjin Lee Assignee: Vrushali C Priority: Critical Attachments: ATSv2BackendHBaseSchemaproposal.pdf, YARN-3411.poc.2.txt, YARN-3411.poc.3.txt, YARN-3411.poc.4.txt, YARN-3411.poc.5.txt, YARN-3411.poc.6.txt, YARN-3411.poc.txt There is work that's in progress to implement the storage based on a Phoenix schema (YARN-3134). In parallel, we would like to explore an implementation based on a native HBase schema for the write path. Such a schema does not exclude using Phoenix, especially for reads and offline queries. Once we have basic implementations of both options, we could evaluate them in terms of performance, scalability, usability, etc. and make a call. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3529) Add miniHBase cluster and Phoenix support to ATS v2 unit tests
[ https://issues.apache.org/jira/browse/YARN-3529?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14540278#comment-14540278 ] Zhijie Shen commented on YARN-3529: --- I think the reason why you still need version in timelineservice/pom.xml is because the non-test scope dependency is not added in hadoop-project/pom.xml {code} <groupId>org.apache.phoenix</groupId> <artifactId>phoenix-core</artifactId> <version>${phoenix.version}</version> <exclusions> <!-- Exclude jline from here --> <exclusion> <artifactId>jline</artifactId> <groupId>jline</groupId> </exclusion> </exclusions> {code} Add miniHBase cluster and Phoenix support to ATS v2 unit tests -- Key: YARN-3529 URL: https://issues.apache.org/jira/browse/YARN-3529 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Li Lu Assignee: Li Lu Attachments: AbstractMiniHBaseClusterTest.java, YARN-3529-YARN-2928.000.patch, YARN-3529-YARN-2928.001.patch, YARN-3529-YARN-2928.002.patch, output_minicluster2.txt After we have our HBase and Phoenix writer implementations, we may want to find a way to set up HBase and Phoenix in our unit tests. We need to do this integration before the branch got merged back to trunk. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3411) [Storage implementation] explore the native HBase write schema for storage
[ https://issues.apache.org/jira/browse/YARN-3411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14540297#comment-14540297 ] Vrushali C commented on YARN-3411: -- Ah, I haven't added that patch, thanks [~gtCarrera9] let me give that a try. [Storage implementation] explore the native HBase write schema for storage -- Key: YARN-3411 URL: https://issues.apache.org/jira/browse/YARN-3411 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Sangjin Lee Assignee: Vrushali C Priority: Critical Attachments: ATSv2BackendHBaseSchemaproposal.pdf, YARN-3411.poc.2.txt, YARN-3411.poc.3.txt, YARN-3411.poc.4.txt, YARN-3411.poc.5.txt, YARN-3411.poc.6.txt, YARN-3411.poc.txt There is work that's in progress to implement the storage based on a Phoenix schema (YARN-3134). In parallel, we would like to explore an implementation based on a native HBase schema for the write path. Such a schema does not exclude using Phoenix, especially for reads and offline queries. Once we have basic implementations of both options, we could evaluate them in terms of performance, scalability, usability, etc. and make a call. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3625) RollingLevelDBTimelineStore Incorrectly Forbids Related Entity in Same Put
[ https://issues.apache.org/jira/browse/YARN-3625?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14540148#comment-14540148 ] Hadoop QA commented on YARN-3625: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | pre-patch | 14m 36s | Pre-patch trunk compilation is healthy. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 1 new or modified test files. | | {color:green}+1{color} | javac | 7m 33s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 9m 35s | There were no new javadoc warning messages. | | {color:green}+1{color} | release audit | 0m 22s | The applied patch does not increase the total number of release audit warnings. | | {color:red}-1{color} | checkstyle | 0m 28s | The applied patch generated 1 new checkstyle issues (total was 6, now 5). | | {color:green}+1{color} | whitespace | 0m 0s | The patch has no lines that end in whitespace. | | {color:green}+1{color} | install | 1m 33s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 33s | The patch built with eclipse:eclipse. | | {color:green}+1{color} | findbugs | 0m 48s | The patch does not introduce any new Findbugs (version 2.0.3) warnings. | | {color:green}+1{color} | yarn tests | 3m 8s | Tests passed in hadoop-yarn-server-applicationhistoryservice. | | | | 38m 40s | | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12732257/YARN-3625.2.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / f4e2b3c | | checkstyle | https://builds.apache.org/job/PreCommit-YARN-Build/7889/artifact/patchprocess/diffcheckstylehadoop-yarn-server-applicationhistoryservice.txt | | hadoop-yarn-server-applicationhistoryservice test log | https://builds.apache.org/job/PreCommit-YARN-Build/7889/artifact/patchprocess/testrun_hadoop-yarn-server-applicationhistoryservice.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/7889/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf907.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/7889/console | This message was automatically generated. RollingLevelDBTimelineStore Incorrectly Forbids Related Entity in Same Put -- Key: YARN-3625 URL: https://issues.apache.org/jira/browse/YARN-3625 Project: Hadoop YARN Issue Type: Bug Reporter: Jonathan Eagles Assignee: Jonathan Eagles Attachments: YARN-3625.1.patch, YARN-3625.2.patch RollingLevelDBTimelineStore batches all entities in the same put to improve performance. This causes an error when relating to an entity in the same put however. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-160) nodemanagers should obtain cpu/memory values from underlying OS
[ https://issues.apache.org/jira/browse/YARN-160?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Varun Vasudev updated YARN-160: --- Attachment: YARN-160.006.patch Uploaded 006.patch to fix whitespace issues. nodemanagers should obtain cpu/memory values from underlying OS --- Key: YARN-160 URL: https://issues.apache.org/jira/browse/YARN-160 Project: Hadoop YARN Issue Type: Improvement Components: nodemanager Affects Versions: 2.0.3-alpha Reporter: Alejandro Abdelnur Assignee: Varun Vasudev Labels: BB2015-05-TBR Attachments: YARN-160.005.patch, YARN-160.006.patch, apache-yarn-160.0.patch, apache-yarn-160.1.patch, apache-yarn-160.2.patch, apache-yarn-160.3.patch As mentioned in YARN-2 *NM memory and CPU configs* Currently these values are coming from the config of the NM, we should be able to obtain those values from the OS (ie, in the case of Linux from /proc/meminfo /proc/cpuinfo). As this is highly OS dependent we should have an interface that obtains this information. In addition implementations of this interface should be able to specify a mem/cpu offset (amount of mem/cpu not to be avail as YARN resource), this would allow to reserve mem/cpu for the OS and other services outside of YARN containers. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3625) RollingLevelDBTimelineStore Incorrectly Forbids Related Entity in Same Put
[ https://issues.apache.org/jira/browse/YARN-3625?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhijie Shen updated YARN-3625: -- Target Version/s: 2.8.0 RollingLevelDBTimelineStore Incorrectly Forbids Related Entity in Same Put -- Key: YARN-3625 URL: https://issues.apache.org/jira/browse/YARN-3625 Project: Hadoop YARN Issue Type: Bug Reporter: Jonathan Eagles Assignee: Jonathan Eagles Attachments: YARN-3625.1.patch, YARN-3625.2.patch RollingLevelDBTimelineStore batches all entities in the same put to improve performance. This causes an error when relating to an entity in the same put however. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (YARN-3613) TestContainerManagerSecurity should init and start Yarn cluster in setup instead of individual methods
[ https://issues.apache.org/jira/browse/YARN-3613?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Karthik Kambatla resolved YARN-3613. Resolution: Fixed Fix Version/s: 2.8.0 Hadoop Flags: Reviewed Thanks for working on this, nijel. Just committed this to trunk and branch-2. TestContainerManagerSecurity should init and start Yarn cluster in setup instead of individual methods -- Key: YARN-3613 URL: https://issues.apache.org/jira/browse/YARN-3613 Project: Hadoop YARN Issue Type: Improvement Components: test Affects Versions: 2.7.0 Reporter: Karthik Kambatla Assignee: nijel Priority: Minor Labels: newbie Fix For: 2.8.0 Attachments: YARN-3613-1.patch, yarn-3613-2.patch In TestContainerManagerSecurity, individual tests init and start Yarn cluster. This duplication can be avoided by moving that to setup. Further, one could merge the two @Test methods to avoid bringing up another mini-cluster. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3579) getLabelsToNodes in CommonNodeLabelsManager should support NodeLabel instead of label name as String
[ https://issues.apache.org/jira/browse/YARN-3579?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sunil G updated YARN-3579: -- Attachment: 0002-YARN-3579.patch Thank you [~leftnoteasy] for the comments. I have updated the patch accordingly. I used a generic method to remove the code duplication; kindly check it. getLabelsToNodes in CommonNodeLabelsManager should support NodeLabel instead of label name as String Key: YARN-3579 URL: https://issues.apache.org/jira/browse/YARN-3579 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Affects Versions: 2.6.0 Reporter: Sunil G Assignee: Sunil G Priority: Minor Attachments: 0001-YARN-3579.patch, 0002-YARN-3579.patch CommonNodeLabelsManager#getLabelsToNodes returns the label name as a String. It does not pass information such as exclusivity back to the REST interface APIs. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
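Purely as an illustration of the "generic method" idea mentioned in the comment (not the attached patch), here is a sketch of a helper that builds the labels-to-nodes map with the key type chosen by the caller, so a String-keyed variant and a NodeLabel-keyed variant can share one implementation; all names here are hypothetical.

{code}
// Hypothetical sketch; assumes java.util imports and
// org.apache.hadoop.yarn.api.records.NodeId. The key type K can be the plain label
// name (String) or a richer label object, supplied via the keyMapper function.
static <K> Map<K, Set<NodeId>> getLabelsToNodesAs(
    Map<String, Set<NodeId>> labelNameToNodes,
    java.util.function.Function<String, K> keyMapper) {
  Map<K, Set<NodeId>> result = new HashMap<>();
  for (Map.Entry<String, Set<NodeId>> entry : labelNameToNodes.entrySet()) {
    result.put(keyMapper.apply(entry.getKey()), new HashSet<>(entry.getValue()));
  }
  return result;
}
{code}

A String-keyed view would then pass an identity function, while a NodeLabel-keyed view would pass a function that wraps each name into a NodeLabel object.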
[jira] [Commented] (YARN-3411) [Storage implementation] explore the native HBase write schema for storage
[ https://issues.apache.org/jira/browse/YARN-3411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14540211#comment-14540211 ] Li Lu commented on YARN-3411: - Hi [~vrushalic], sorry to hear about the trouble. Have you tried applying the latest patch in YARN-3529? In that JIRA I'm trying to make the Phoenix writer work with the snapshot version of Phoenix, which lives happily with HBase 1.0.1. [Storage implementation] explore the native HBase write schema for storage -- Key: YARN-3411 URL: https://issues.apache.org/jira/browse/YARN-3411 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Sangjin Lee Assignee: Vrushali C Priority: Critical Attachments: ATSv2BackendHBaseSchemaproposal.pdf, YARN-3411.poc.2.txt, YARN-3411.poc.3.txt, YARN-3411.poc.4.txt, YARN-3411.poc.5.txt, YARN-3411.poc.6.txt, YARN-3411.poc.txt There is work that's in progress to implement the storage based on a Phoenix schema (YARN-3134). In parallel, we would like to explore an implementation based on a native HBase schema for the write path. Such a schema does not exclude using Phoenix, especially for reads and offline queries. Once we have basic implementations of both options, we could evaluate them in terms of performance, scalability, usability, etc. and make a call. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3069) Document missing properties in yarn-default.xml
[ https://issues.apache.org/jira/browse/YARN-3069?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ray Chiang updated YARN-3069: - Attachment: YARN-3069.008.patch Update for property added in YARN-1912. Document missing properties in yarn-default.xml --- Key: YARN-3069 URL: https://issues.apache.org/jira/browse/YARN-3069 Project: Hadoop YARN Issue Type: Bug Components: documentation Reporter: Ray Chiang Assignee: Ray Chiang Labels: BB2015-05-TBR, supportability Attachments: YARN-3069.001.patch, YARN-3069.002.patch, YARN-3069.003.patch, YARN-3069.004.patch, YARN-3069.005.patch, YARN-3069.006.patch, YARN-3069.007.patch, YARN-3069.008.patch The following properties are currently not defined in yarn-default.xml. These properties should either be A) documented in yarn-default.xml OR B) listed as an exception (with comments, e.g. for internal use) in the TestYarnConfigurationFields unit test Any comments for any of the properties below are welcome. org.apache.hadoop.yarn.server.sharedcachemanager.RemoteAppChecker org.apache.hadoop.yarn.server.sharedcachemanager.store.InMemorySCMStore security.applicationhistory.protocol.acl yarn.app.container.log.backups yarn.app.container.log.dir yarn.app.container.log.filesize yarn.client.app-submission.poll-interval yarn.client.application-client-protocol.poll-timeout-ms yarn.is.minicluster yarn.log.server.url yarn.minicluster.control-resource-monitoring yarn.minicluster.fixed.ports yarn.minicluster.use-rpc yarn.node-labels.fs-store.retry-policy-spec yarn.node-labels.fs-store.root-dir yarn.node-labels.manager-class yarn.nodemanager.container-executor.os.sched.priority.adjustment yarn.nodemanager.container-monitor.process-tree.class yarn.nodemanager.disk-health-checker.enable yarn.nodemanager.docker-container-executor.image-name yarn.nodemanager.linux-container-executor.cgroups.delete-timeout-ms yarn.nodemanager.linux-container-executor.group yarn.nodemanager.log.deletion-threads-count yarn.nodemanager.user-home-dir yarn.nodemanager.webapp.https.address yarn.nodemanager.webapp.spnego-keytab-file yarn.nodemanager.webapp.spnego-principal yarn.nodemanager.windows-secure-container-executor.group yarn.resourcemanager.configuration.file-system-based-store yarn.resourcemanager.delegation-token-renewer.thread-count yarn.resourcemanager.delegation.key.update-interval yarn.resourcemanager.delegation.token.max-lifetime yarn.resourcemanager.delegation.token.renew-interval yarn.resourcemanager.history-writer.multi-threaded-dispatcher.pool-size yarn.resourcemanager.metrics.runtime.buckets yarn.resourcemanager.nm-tokens.master-key-rolling-interval-secs yarn.resourcemanager.reservation-system.class yarn.resourcemanager.reservation-system.enable yarn.resourcemanager.reservation-system.plan.follower yarn.resourcemanager.reservation-system.planfollower.time-step yarn.resourcemanager.rm.container-allocation.expiry-interval-ms yarn.resourcemanager.webapp.spnego-keytab-file yarn.resourcemanager.webapp.spnego-principal yarn.scheduler.include-port-in-node-name yarn.timeline-service.delegation.key.update-interval yarn.timeline-service.delegation.token.max-lifetime yarn.timeline-service.delegation.token.renew-interval yarn.timeline-service.generic-application-history.enabled yarn.timeline-service.generic-application-history.fs-history-store.compression-type yarn.timeline-service.generic-application-history.fs-history-store.uri yarn.timeline-service.generic-application-history.store-class yarn.timeline-service.http-cross-origin.enabled yarn.tracking.url.generator -- 
This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-3634) TestMRTimelineEventHandling is broken due to timing issues
Sangjin Lee created YARN-3634: - Summary: TestMRTimelineEventHandling is broken due to timing issues Key: YARN-3634 URL: https://issues.apache.org/jira/browse/YARN-3634 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Affects Versions: YARN-2928 Reporter: Sangjin Lee Assignee: Sangjin Lee TestMRTimelineEventHandling is broken. Relevant error message: {noformat} 2015-05-12 06:28:56,415 INFO [AsyncDispatcher event handler] ipc.Client (Client.java:handleConnectionFailure(882)) - Retrying connect to server: asf904.gq1.ygridcore.net/67.195.81.148:0. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS) 2015-05-12 06:28:57,416 INFO [AsyncDispatcher event handler] ipc.Client (Client.java:handleConnectionFailure(882)) - Retrying connect to server: asf904.gq1.ygridcore.net/67.195.81.148:0. Already tried 1 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS) 2015-05-12 06:28:58,416 INFO [AsyncDispatcher event handler] ipc.Client (Client.java:handleConnectionFailure(882)) - Retrying connect to server: asf904.gq1.ygridcore.net/67.195.81.148:0. Already tried 2 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS) 2015-05-12 06:28:59,417 INFO [AsyncDispatcher event handler] ipc.Client (Client.java:handleConnectionFailure(882)) - Retrying connect to server: asf904.gq1.ygridcore.net/67.195.81.148:0. Already tried 3 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS) 2015-05-12 06:29:00,418 INFO [AsyncDispatcher event handler] ipc.Client (Client.java:handleConnectionFailure(882)) - Retrying connect to server: asf904.gq1.ygridcore.net/67.195.81.148:0. Already tried 4 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS) 2015-05-12 06:29:01,419 INFO [AsyncDispatcher event handler] ipc.Client (Client.java:handleConnectionFailure(882)) - Retrying connect to server: asf904.gq1.ygridcore.net/67.195.81.148:0. Already tried 5 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS) 2015-05-12 06:29:02,420 INFO [AsyncDispatcher event handler] ipc.Client (Client.java:handleConnectionFailure(882)) - Retrying connect to server: asf904.gq1.ygridcore.net/67.195.81.148:0. Already tried 6 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS) 2015-05-12 06:29:03,420 INFO [AsyncDispatcher event handler] ipc.Client (Client.java:handleConnectionFailure(882)) - Retrying connect to server: asf904.gq1.ygridcore.net/67.195.81.148:0. Already tried 7 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS) 2015-05-12 06:29:04,421 INFO [AsyncDispatcher event handler] ipc.Client (Client.java:handleConnectionFailure(882)) - Retrying connect to server: asf904.gq1.ygridcore.net/67.195.81.148:0. Already tried 8 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS) 2015-05-12 06:29:05,422 INFO [AsyncDispatcher event handler] ipc.Client (Client.java:handleConnectionFailure(882)) - Retrying connect to server: asf904.gq1.ygridcore.net/67.195.81.148:0. 
Already tried 9 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS) 2015-05-12 06:29:05,424 ERROR [AsyncDispatcher event handler] collector.NodeTimelineCollectorManager (NodeTimelineCollectorManager.java:postPut(121)) - Failed to communicate with NM Collector Service for application_1431412130291_0001 2015-05-12 06:29:05,425 WARN [AsyncDispatcher event handler] containermanager.AuxServices (AuxServices.java:logWarningWhenAuxServiceThrowExceptions(261)) - The auxService name is timeline_collector and it got an error at event: CONTAINER_INIT org.apache.hadoop.yarn.exceptions.YarnRuntimeException: org.apache.hadoop.yarn.exceptions.YarnRuntimeException: java.net.ConnectException: Call From asf904.gq1.ygridcore.net/67.195.81.148 to asf904.gq1.ygridcore.net:0 failed on connection exception: java.net.ConnectException: Connection refused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused at org.apache.hadoop.yarn.server.timelineservice.collector.TimelineCollectorManager.putIfAbsent(TimelineCollectorManager.java:97) at org.apache.hadoop.yarn.server.timelineservice.collector.PerNodeTimelineCollectorsAuxService.addApplication(PerNodeTimelineCollectorsAuxService.java:99) at org.apache.hadoop.yarn.server.timelineservice.collector.PerNodeTimelineCollectorsAuxService.initializeContainer(PerNodeTimelineCollectorsAuxService.java:126)
[jira] [Updated] (YARN-3613) TestContainerManagerSecurity should init and start Yarn cluster in setup instead of individual methods
[ https://issues.apache.org/jira/browse/YARN-3613?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Karthik Kambatla updated YARN-3613: --- Attachment: yarn-3613-2.patch At commit time, modified the comment that precedes call to {{testContainerTokenWithEpoch}}. Attaching the updated diff here. TestContainerManagerSecurity should init and start Yarn cluster in setup instead of individual methods -- Key: YARN-3613 URL: https://issues.apache.org/jira/browse/YARN-3613 Project: Hadoop YARN Issue Type: Improvement Components: test Affects Versions: 2.7.0 Reporter: Karthik Kambatla Assignee: nijel Priority: Minor Labels: newbie Attachments: YARN-3613-1.patch, yarn-3613-2.patch In TestContainerManagerSecurity, individual tests init and start Yarn cluster. This duplication can be avoided by moving that to setup. Further, one could merge the two @Test methods to avoid bringing up another mini-cluster. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3529) Add miniHBase cluster and Phoenix support to ATS v2 unit tests
[ https://issues.apache.org/jira/browse/YARN-3529?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Li Lu updated YARN-3529: Attachment: YARN-3529-YARN-2928.003.patch Thanks [~zjshen]! I forgot to move this part. Nice catch! Add miniHBase cluster and Phoenix support to ATS v2 unit tests -- Key: YARN-3529 URL: https://issues.apache.org/jira/browse/YARN-3529 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Li Lu Assignee: Li Lu Attachments: AbstractMiniHBaseClusterTest.java, YARN-3529-YARN-2928.000.patch, YARN-3529-YARN-2928.001.patch, YARN-3529-YARN-2928.002.patch, YARN-3529-YARN-2928.003.patch, output_minicluster2.txt After we have our HBase and Phoenix writer implementations, we may want to find a way to set up HBase and Phoenix in our unit tests. We need to do this integration before the branch got merged back to trunk. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3629) NodeID is always printed as null in node manager initialization log.
[ https://issues.apache.org/jira/browse/YARN-3629?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14540225#comment-14540225 ] Hudson commented on YARN-3629: -- FAILURE: Integrated in Hadoop-trunk-Commit #7807 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/7807/]) YARN-3629. NodeID is always printed as null in node manager (devaraj: rev 5c2f05cd9bad9bf9beb0f4ca18f4ae1bc3e84499) * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/NodeStatusUpdaterImpl.java NodeID is always printed as null in node manager initialization log. -- Key: YARN-3629 URL: https://issues.apache.org/jira/browse/YARN-3629 Project: Hadoop YARN Issue Type: Bug Reporter: nijel Assignee: nijel Fix For: 2.8.0 Attachments: YARN-3629-1.patch In the Node Manager log during startup the following line is printed: 2015-05-12 11:20:02,347 INFO org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl: Initialized nodemanager for *null* : physical-memory=4096 virtual-memory=8602 virtual-cores=8 This line is printed from NodeStatusUpdaterImpl.serviceInit, but the nodeId assignment happens only in NodeStatusUpdaterImpl.serviceStart:
{code}
protected void serviceStart() throws Exception {
  // NodeManager is the last service to start, so NodeId is available.
  this.nodeId = this.context.getNodeId();
{code}
Assigning the node id in serviceInit is not feasible since it is generated by ContainerManagerImpl.serviceStart. The log can be moved to serviceStart to give the right information to the user. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
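For illustration only, a minimal standalone sketch of the init-vs-start behavior described above (the class and values are invented; this is not the actual NodeStatusUpdaterImpl patch):
{code}
// Toy example: logging the node id from init() prints null because the id is
// only assigned when the service starts, mirroring the JIRA description.
public class InitVsStartLoggingExample {
  private String nodeId;                 // set only when the service starts

  void serviceInit() {
    // Logging here reproduces "Initialized nodemanager for null : ..."
    System.out.println("Initialized nodemanager for " + nodeId);
  }

  void serviceStart() {
    nodeId = "host-1:45454";             // in YARN this comes from context.getNodeId()
    // Moving the message here prints the real node id.
    System.out.println("Initialized nodemanager for " + nodeId);
  }

  public static void main(String[] args) {
    InitVsStartLoggingExample s = new InitVsStartLoggingExample();
    s.serviceInit();    // Initialized nodemanager for null
    s.serviceStart();   // Initialized nodemanager for host-1:45454
  }
}
{code}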
[jira] [Commented] (YARN-3625) RollingLevelDBTimelineStore Incorrectly Forbids Related Entity in Same Put
[ https://issues.apache.org/jira/browse/YARN-3625?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14540263#comment-14540263 ] Jonathan Eagles commented on YARN-3625: --- [~zjshen], this is a small bug that was found. The end result is that related entities are sometimes missing. Since a domain is now required, it is safe to treat a missing domain as an entity that is permitted to be related to. The checkstyle issue is pre-existing for this method. Jon RollingLevelDBTimelineStore Incorrectly Forbids Related Entity in Same Put -- Key: YARN-3625 URL: https://issues.apache.org/jira/browse/YARN-3625 Project: Hadoop YARN Issue Type: Bug Reporter: Jonathan Eagles Assignee: Jonathan Eagles Attachments: YARN-3625.1.patch, YARN-3625.2.patch RollingLevelDBTimelineStore batches all entities in the same put to improve performance. However, this causes an error when relating to an entity in the same put. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-3633) With Fair Scheduler, cluster can logjam when there are too many queues
Rohit Agarwal created YARN-3633: --- Summary: With Fair Scheduler, cluster can logjam when there are too many queues Key: YARN-3633 URL: https://issues.apache.org/jira/browse/YARN-3633 Project: Hadoop YARN Issue Type: Bug Components: fairscheduler Affects Versions: 2.6.0 Reporter: Rohit Agarwal Assignee: Rohit Agarwal Priority: Critical It's possible to logjam a cluster by submitting many applications at once in different queues. For example, let's say there is a cluster with 20GB of total memory. Let's say 4 users submit applications at the same time. The fair share of each queue is 5GB. Let's say that maxAMShare is 0.5. So, each queue has at most 2.5GB memory for AMs. If all the users requested AMs of size 3GB - the cluster logjams. Nothing gets scheduled even when 20GB of resources are available. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
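For illustration only, a tiny standalone sketch of the arithmetic in this scenario (numbers taken from the description above; this is not Fair Scheduler code):
{code}
// Each queue's fair share is 20 GB / 4 = 5 GB, and maxAMShare of 0.5 leaves
// only 2.5 GB of AM headroom per queue, so a 3 GB AM can never start anywhere.
public class FairSchedulerAmLogjamExample {
  public static void main(String[] args) {
    double clusterMemGb = 20.0;
    int queues = 4;
    double maxAMShare = 0.5;
    double amSizeGb = 3.0;

    double fairShareGb = clusterMemGb / queues;        // 5 GB per queue
    double amHeadroomGb = maxAMShare * fairShareGb;    // 2.5 GB per queue for AMs

    // Nothing is scheduled even though the whole 20 GB cluster is idle.
    System.out.println("AM fits? " + (amSizeGb <= amHeadroomGb));  // false
  }
}
{code}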
[jira] [Commented] (YARN-3411) [Storage implementation] explore the native HBase write schema for storage
[ https://issues.apache.org/jira/browse/YARN-3411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14540303#comment-14540303 ] Vrushali C commented on YARN-3411: -- bq. Also comments on latest (v5) patch Thanks [~djp] for the review, but the latest patch is v6, so some of those code lines are no longer there. I will still make the other changes: the class rename, making the table name consistent with the Phoenix writer table names, the code changes for creating the connection, etc. I am working on updating the patch with your review suggestions and with the patch in YARN-3529, which will help me build. [Storage implementation] explore the native HBase write schema for storage -- Key: YARN-3411 URL: https://issues.apache.org/jira/browse/YARN-3411 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Sangjin Lee Assignee: Vrushali C Priority: Critical Attachments: ATSv2BackendHBaseSchemaproposal.pdf, YARN-3411.poc.2.txt, YARN-3411.poc.3.txt, YARN-3411.poc.4.txt, YARN-3411.poc.5.txt, YARN-3411.poc.6.txt, YARN-3411.poc.txt There is work that's in progress to implement the storage based on a Phoenix schema (YARN-3134). In parallel, we would like to explore an implementation based on a native HBase schema for the write path. Such a schema does not exclude using Phoenix, especially for reads and offline queries. Once we have basic implementations of both options, we could evaluate them in terms of performance, scalability, usability, etc. and make a call. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3411) [Storage implementation] explore the native HBase write schema for storage
[ https://issues.apache.org/jira/browse/YARN-3411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14540373#comment-14540373 ] Junping Du commented on YARN-3411: -- Sure. I started this round of review several days ago but could not finish it until today. Sorry for missing the new version of the patch; I will check it soon. [Storage implementation] explore the native HBase write schema for storage -- Key: YARN-3411 URL: https://issues.apache.org/jira/browse/YARN-3411 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Sangjin Lee Assignee: Vrushali C Priority: Critical Attachments: ATSv2BackendHBaseSchemaproposal.pdf, YARN-3411.poc.2.txt, YARN-3411.poc.3.txt, YARN-3411.poc.4.txt, YARN-3411.poc.5.txt, YARN-3411.poc.6.txt, YARN-3411.poc.txt There is work that's in progress to implement the storage based on a Phoenix schema (YARN-3134). In parallel, we would like to explore an implementation based on a native HBase schema for the write path. Such a schema does not exclude using Phoenix, especially for reads and offline queries. Once we have basic implementations of both options, we could evaluate them in terms of performance, scalability, usability, etc. and make a call. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3539) Compatibility doc to state that ATS v1 is a stable REST API
[ https://issues.apache.org/jira/browse/YARN-3539?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14540113#comment-14540113 ] Hadoop QA commented on YARN-3539: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | pre-patch | 17m 38s | Pre-patch trunk compilation is healthy. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:red}-1{color} | tests included | 0m 0s | The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. | | {color:green}+1{color} | javac | 7m 32s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 9m 41s | There were no new javadoc warning messages. | | {color:green}+1{color} | release audit | 0m 23s | The applied patch does not increase the total number of release audit warnings. | | {color:green}+1{color} | site | 2m 59s | Site still builds. | | {color:green}+1{color} | checkstyle | 3m 31s | There were no new checkstyle issues. | | {color:red}-1{color} | whitespace | 1m 38s | The patch has 18 line(s) that end in whitespace. Use git apply --whitespace=fix. | | {color:green}+1{color} | install | 1m 33s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 31s | The patch built with eclipse:eclipse. | | {color:green}+1{color} | findbugs | 5m 15s | The patch does not introduce any new Findbugs (version 2.0.3) warnings. | | {color:green}+1{color} | common tests | 23m 28s | Tests passed in hadoop-common. | | {color:green}+1{color} | yarn tests | 0m 24s | Tests passed in hadoop-yarn-api. | | {color:green}+1{color} | yarn tests | 1m 56s | Tests passed in hadoop-yarn-common. | | {color:green}+1{color} | yarn tests | 0m 24s | Tests passed in hadoop-yarn-server-common. | | | | 76m 58s | | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12732244/YARN-3539-010.patch | | Optional Tests | site javadoc javac unit findbugs checkstyle | | git revision | trunk / 6d5da94 | | whitespace | https://builds.apache.org/job/PreCommit-YARN-Build/7886/artifact/patchprocess/whitespace.txt | | hadoop-common test log | https://builds.apache.org/job/PreCommit-YARN-Build/7886/artifact/patchprocess/testrun_hadoop-common.txt | | hadoop-yarn-api test log | https://builds.apache.org/job/PreCommit-YARN-Build/7886/artifact/patchprocess/testrun_hadoop-yarn-api.txt | | hadoop-yarn-common test log | https://builds.apache.org/job/PreCommit-YARN-Build/7886/artifact/patchprocess/testrun_hadoop-yarn-common.txt | | hadoop-yarn-server-common test log | https://builds.apache.org/job/PreCommit-YARN-Build/7886/artifact/patchprocess/testrun_hadoop-yarn-server-common.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/7886/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf905.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/7886/console | This message was automatically generated. 
Compatibility doc to state that ATS v1 is a stable REST API --- Key: YARN-3539 URL: https://issues.apache.org/jira/browse/YARN-3539 Project: Hadoop YARN Issue Type: Improvement Components: documentation Affects Versions: 2.7.0 Reporter: Steve Loughran Assignee: Steve Loughran Labels: BB2015-05-TBR Attachments: HADOOP-11826-001.patch, HADOOP-11826-002.patch, TimelineServer.html, YARN-3539-003.patch, YARN-3539-004.patch, YARN-3539-005.patch, YARN-3539-006.patch, YARN-3539-007.patch, YARN-3539-008.patch, YARN-3539-009.patch, YARN-3539-010.patch, timeline_get_api_examples.txt The ATS v2 discussion and YARN-2423 have raised the question: how stable are the ATSv1 APIs? The existing compatibility document actually states that the History Server is [a stable REST API|http://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/Compatibility.html#REST_APIs], which effectively means that ATSv1 has already been declared as a stable API. Clarify this by patching the compatibility document appropriately -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-1297) Miscellaneous Fair Scheduler speedups
[ https://issues.apache.org/jira/browse/YARN-1297?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14540184#comment-14540184 ] Hadoop QA commented on YARN-1297: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | pre-patch | 14m 42s | Pre-patch trunk compilation is healthy. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:red}-1{color} | tests included | 0m 0s | The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. | | {color:green}+1{color} | javac | 7m 31s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 9m 42s | There were no new javadoc warning messages. | | {color:green}+1{color} | release audit | 0m 22s | The applied patch does not increase the total number of release audit warnings. | | {color:red}-1{color} | checkstyle | 1m 14s | The applied patch generated 2 new checkstyle issues (total was 180, now 179). | | {color:green}+1{color} | whitespace | 0m 2s | The patch has no lines that end in whitespace. | | {color:green}+1{color} | install | 1m 35s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 33s | The patch built with eclipse:eclipse. | | {color:green}+1{color} | findbugs | 1m 16s | The patch does not introduce any new Findbugs (version 2.0.3) warnings. | | {color:red}-1{color} | yarn tests | 52m 19s | Tests failed in hadoop-yarn-server-resourcemanager. | | | | 89m 20s | | \\ \\ || Reason || Tests || | Failed unit tests | hadoop.yarn.server.resourcemanager.scheduler.fair.TestFairScheduler | | | hadoop.yarn.server.resourcemanager.scheduler.fair.TestAppRunnability | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12732250/YARN-1297.4.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / 6d5da94 | | checkstyle | https://builds.apache.org/job/PreCommit-YARN-Build/7887/artifact/patchprocess/diffcheckstylehadoop-yarn-server-resourcemanager.txt | | hadoop-yarn-server-resourcemanager test log | https://builds.apache.org/job/PreCommit-YARN-Build/7887/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/7887/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf906.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/7887/console | This message was automatically generated. Miscellaneous Fair Scheduler speedups - Key: YARN-1297 URL: https://issues.apache.org/jira/browse/YARN-1297 Project: Hadoop YARN Issue Type: Improvement Components: fairscheduler Reporter: Sandy Ryza Assignee: Arun Suresh Attachments: YARN-1297-1.patch, YARN-1297-2.patch, YARN-1297.3.patch, YARN-1297.4.patch, YARN-1297.4.patch, YARN-1297.patch, YARN-1297.patch I ran the Fair Scheduler's core scheduling loop through a profiler tool and identified a bunch of minimally invasive changes that can shave off a few milliseconds. The main one is demoting a couple INFO log messages to DEBUG, which brought my benchmark down from 16000 ms to 6000. A few others (which had way less of an impact) were * Most of the time in comparisons was being spent in Math.signum. 
I switched this to direct ifs and elses and it halved the percent of time spent in comparisons. * I removed some unnecessary instantiations of Resource objects * I made it so that queues' usage wasn't calculated from the applications up each time getResourceUsage was called. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
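As an aside, a minimal standalone illustration of the signum-to-branches change described above (shape only; this is not the actual YARN-1297 patch):
{code}
// Both methods return -1, 0, or 1; the second avoids the Math.signum call
// that dominated comparison time in the profile described above.
public class SignumVsIfCompare {
  // Original style: derive the comparison result through Math.signum.
  static int compareWithSignum(double a, double b) {
    return (int) Math.signum(a - b);
  }

  // Replacement style: direct ifs and elses, no signum call.
  static int compareWithIfs(double a, double b) {
    if (a < b) {
      return -1;
    } else if (a > b) {
      return 1;
    } else {
      return 0;
    }
  }

  public static void main(String[] args) {
    System.out.println(compareWithSignum(1.5, 2.0) + " " + compareWithIfs(1.5, 2.0)); // -1 -1
    System.out.println(compareWithSignum(2.0, 2.0) + " " + compareWithIfs(2.0, 2.0)); //  0  0
  }
}
{code}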
[jira] [Commented] (YARN-3411) [Storage implementation] explore the native HBase write schema for storage
[ https://issues.apache.org/jira/browse/YARN-3411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14540220#comment-14540220 ] Junping Du commented on YARN-3411: -- Also comments on latest (v5) patch:
{code}
+public class CreateSchema {
{code}
Can we rename it to a more concrete name, something like TimelineSchemaCreator?
{code}
+  private static int createTimelineEntityTable() {
+    try {
+      Configuration config = HBaseConfiguration.create();
+      // add the hbase configuration details from classpath
+      config.addResource("hbase-site.xml");
+      Connection conn = ConnectionFactory.createConnection(config);
+      Admin admin = conn.getAdmin();
...
{code}
All of this code should be reusable for creating the other tables. Maybe we should move it out of createTimelineEntityTable() and make it a static part of the class?
{code}
+    if (admin.tableExists(table)) {
+      // do not disable / delete existing table
+      // similar to the approach taken by map-reduce jobs when
+      // output directory exists
+      LOG.error("Table " + table.getNameAsString() + " already exists.");
+      return 1;
+    }
{code}
Should we throw an exception here so the user is notified of the failure reason immediately?
{code}
+    // TTL is 30 days, need to make it configurable perhaps
+    cf3.setTimeToLive(2592000);
{code}
We shouldn't have a hard-coded value here. At least add a TODO in a comment to fix it later. In HBaseTimelineWriterImpl.java,
{code}
+    // TODO right now using a default table name
+    // change later to use a config driven table name
+    entityTableName = TableName
+        .valueOf(EntityTableDetails.DEFAULT_ENTITY_TABLE_NAME);
{code}
Shall we be consistent with the table name of the Phoenix writer if we haven't made it configurable? Or do we intend to differ for some reason?
{code}
+    if (entityPuts.size() > 0) {
+      LOG.info("Storing " + entityPuts.size() + " to "
+          + this.entityTableName.getNameAsString());
+      entityTable.put(entityPuts);
+    } else {
+      LOG.warn("empty entity object?");
+    }
{code}
The first log should be at DEBUG level and wrapped in an if block using LOG.isDebugEnabled(), which helps performance. [Storage implementation] explore the native HBase write schema for storage -- Key: YARN-3411 URL: https://issues.apache.org/jira/browse/YARN-3411 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Sangjin Lee Assignee: Vrushali C Priority: Critical Attachments: ATSv2BackendHBaseSchemaproposal.pdf, YARN-3411.poc.2.txt, YARN-3411.poc.3.txt, YARN-3411.poc.4.txt, YARN-3411.poc.5.txt, YARN-3411.poc.6.txt, YARN-3411.poc.txt There is work that's in progress to implement the storage based on a Phoenix schema (YARN-3134). In parallel, we would like to explore an implementation based on a native HBase schema for the write path. Such a schema does not exclude using Phoenix, especially for reads and offline queries. Once we have basic implementations of both options, we could evaluate them in terms of performance, scalability, usability, etc. and make a call. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
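As an aside, a minimal sketch of the guarded DEBUG logging suggested in the last review point (illustrative only; the class and method names here are not from the actual patch):
{code}
import org.apache.commons.logging.Log;
import org.apache.commons.logging.LogFactory;

public class GuardedDebugLogExample {
  private static final Log LOG = LogFactory.getLog(GuardedDebugLogExample.class);

  static void storePuts(int putCount, String tableName) {
    // Demote the per-write message to DEBUG and guard it so the string
    // concatenation is skipped entirely when DEBUG logging is disabled.
    if (LOG.isDebugEnabled()) {
      LOG.debug("Storing " + putCount + " puts to " + tableName);
    }
  }
}
{code}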
[jira] [Commented] (YARN-3635) Get-queue-mapping should be a common interface of YarnScheduler
[ https://issues.apache.org/jira/browse/YARN-3635?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14540371#comment-14540371 ] Wangda Tan commented on YARN-3635: -- +[~vinodkv], [~kasha], [~jlowe], [~jianhe] Get-queue-mapping should be a common interface of YarnScheduler --- Key: YARN-3635 URL: https://issues.apache.org/jira/browse/YARN-3635 Project: Hadoop YARN Issue Type: Bug Components: scheduler Reporter: Wangda Tan Assignee: Wangda Tan Currently, both the fair and capacity schedulers support queue mapping, which allows the scheduler to change the queue of an application after it has been submitted. One issue with doing this inside a specific scheduler is: if the queue after mapping has a different maximum_allocation/default-node-label-expression from the original queue, {{validateAndCreateResourceRequest}} in RMAppManager checks the wrong queue. I propose to make queue mapping a common interface of the scheduler, and have RMAppManager set the queue after mapping before doing validations. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-3635) Get-queue-mapping should be a common interface of YarnScheduler
Wangda Tan created YARN-3635: Summary: Get-queue-mapping should be a common interface of YarnScheduler Key: YARN-3635 URL: https://issues.apache.org/jira/browse/YARN-3635 Project: Hadoop YARN Issue Type: Bug Components: scheduler Reporter: Wangda Tan Assignee: Wangda Tan Currently, both the fair and capacity schedulers support queue mapping, which allows the scheduler to change the queue of an application after it has been submitted. One issue with doing this inside a specific scheduler is: if the queue after mapping has a different maximum_allocation/default-node-label-expression from the original queue, {{validateAndCreateResourceRequest}} in RMAppManager checks the wrong queue. I propose to make queue mapping a common interface of the scheduler, and have RMAppManager set the queue after mapping before doing validations. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
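A purely hypothetical sketch of what such a scheduler-agnostic queue-mapping hook could look like (all names here are invented for illustration and do not come from any attached patch):
{code}
public interface QueuePlacementProvider {
  /**
   * Resolve the queue an application would actually land in after the
   * scheduler's queue-mapping rules are applied, so that a caller such as
   * RMAppManager can validate resource requests against the mapped queue
   * (maximum allocation, default node label expression, ...) rather than
   * the queue named at submission time.
   */
  String getMappedQueueForApp(String user, String requestedQueue);
}
{code}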
[jira] [Updated] (YARN-3632) Ordering policy should be allowed to reorder an application when demand changes
[ https://issues.apache.org/jira/browse/YARN-3632?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Craig Welch updated YARN-3632: -- Attachment: YARN-3632.0.patch Ordering policy should be allowed to reorder an application when demand changes --- Key: YARN-3632 URL: https://issues.apache.org/jira/browse/YARN-3632 Project: Hadoop YARN Issue Type: Bug Reporter: Craig Welch Assignee: Craig Welch Attachments: YARN-3632.0.patch At present, ordering policies have the option to have an application re-ordered (for allocation and preemption) when a container is allocated to or recovered from the application. Some ordering policies may also need to reorder when demand changes, if demand is part of the ordering comparison; this needs to be made available (and used by the FairOrderingPolicy when sizeBasedWeight is true). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
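A self-contained illustration of why such a reorder hook is needed when the comparison key (demand) changes, using plain Java collections (this is not YARN's OrderingPolicy code):
{code}
import java.util.Comparator;
import java.util.TreeSet;

public class ReorderOnDemandChangeExample {
  static class App {
    final String id;
    int demand;
    App(String id, int demand) { this.id = id; this.demand = demand; }
  }

  public static void main(String[] args) {
    Comparator<App> byDemandDesc = new Comparator<App>() {
      @Override
      public int compare(App a, App b) {
        if (a.demand != b.demand) {
          return b.demand - a.demand;   // higher demand first (demo values only)
        }
        return a.id.compareTo(b.id);
      }
    };
    TreeSet<App> order = new TreeSet<App>(byDemandDesc);

    App a1 = new App("app1", 10);
    App a2 = new App("app2", 5);
    order.add(a1);
    order.add(a2);

    // The demand of app2 changes, but a sorted set never resorts existing
    // entries on its own; the policy has to be told to remove and re-add it,
    // which is the "reorder when demand changes" described above.
    order.remove(a2);
    a2.demand = 20;
    order.add(a2);

    System.out.println(order.first().id);   // app2
  }
}
{code}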
[jira] [Commented] (YARN-3627) Preemption not triggered in Fair scheduler when maxResources is set on parent queue
[ https://issues.apache.org/jira/browse/YARN-3627?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14540111#comment-14540111 ] Karthik Kambatla commented on YARN-3627: I see this as a duplicate of or closely related to YARN-3405. [~bibinchundatt] - are you able to try out the patch there and see if it solves the issue here? I'll try and get to YARN-3405 later this week. Preemption not triggered in Fair scheduler when maxResources is set on parent queue --- Key: YARN-3627 URL: https://issues.apache.org/jira/browse/YARN-3627 Project: Hadoop YARN Issue Type: Bug Components: fairscheduler, scheduler Environment: Suse 11 SP3, 2 NM Reporter: Bibin A Chundatt Consider the below fair scheduler configuration:
Root (10GB cluster resource)
-- Q1 (maxResources 4GB)
   Q1.1 (maxResources 4GB)
   Q1.2 (maxResources 4GB)
-- Q2 (maxResources 6GB)
No applications are running in Q2. Submit one application to Q1.1 with 50 maps; 4GB gets allocated to Q1.1. Now submit an application to Q1.2; it will always be starving for memory. Preemption never gets triggered since yarn.scheduler.fair.preemption.cluster-utilization-threshold = 0.8 and the cluster utilization is below 0.8. *FairScheduler.java*
{code}
private boolean shouldAttemptPreemption() {
  if (preemptionEnabled) {
    return (preemptionUtilizationThreshold < Math.max(
        (float) rootMetrics.getAllocatedMB() / clusterResource.getMemory(),
        (float) rootMetrics.getAllocatedVirtualCores() /
            clusterResource.getVirtualCores()));
  }
  return false;
}
{code}
Are we supposed to configure maxResources as 0mb and 0 cores in a running cluster so that all queues can always take the full cluster resources if available? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
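For illustration, a standalone reproduction of the quoted check with the numbers from this scenario (not YARN code; the method below only mirrors the logic quoted above):
{code}
public class PreemptionThresholdExample {
  static boolean shouldAttemptPreemption(boolean preemptionEnabled,
      float threshold, long allocatedMb, long clusterMb,
      int allocatedVcores, int clusterVcores) {
    if (preemptionEnabled) {
      return threshold < Math.max(
          (float) allocatedMb / clusterMb,
          (float) allocatedVcores / clusterVcores);
    }
    return false;
  }

  public static void main(String[] args) {
    // Q1.1 holds 4GB of a 10GB cluster; utilization 0.4 stays below the 0.8
    // default threshold, so preemption is never attempted even though Q1.2
    // is starving. The vcore numbers here are made up for the example.
    System.out.println(
        shouldAttemptPreemption(true, 0.8f, 4096, 10240, 4, 10));  // false
  }
}
{code}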
[jira] [Updated] (YARN-3629) NodeID is always printed as null in node manager initialization log.
[ https://issues.apache.org/jira/browse/YARN-3629?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Devaraj K updated YARN-3629: Target Version/s: 2.8.0 Hadoop Flags: Reviewed Thanks [~nijel] for your contribution. +1, patch looks good to me. NodeID is always printed as null in node manager initialization log. -- Key: YARN-3629 URL: https://issues.apache.org/jira/browse/YARN-3629 Project: Hadoop YARN Issue Type: Bug Reporter: nijel Assignee: nijel Attachments: YARN-3629-1.patch In the Node Manager log during startup the following line is printed: 2015-05-12 11:20:02,347 INFO org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl: Initialized nodemanager for *null* : physical-memory=4096 virtual-memory=8602 virtual-cores=8 This line is printed from NodeStatusUpdaterImpl.serviceInit, but the nodeId assignment happens only in NodeStatusUpdaterImpl.serviceStart:
{code}
protected void serviceStart() throws Exception {
  // NodeManager is the last service to start, so NodeId is available.
  this.nodeId = this.context.getNodeId();
{code}
Assigning the node id in serviceInit is not feasible since it is generated by ContainerManagerImpl.serviceStart. The log can be moved to serviceStart to give the right information to the user. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3579) getLabelsToNodes in CommonNodeLabelsManager should support NodeLabel instead of label name as String
[ https://issues.apache.org/jira/browse/YARN-3579?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14540210#comment-14540210 ] Hadoop QA commented on YARN-3579: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | pre-patch | 14m 42s | Pre-patch trunk compilation is healthy. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 2 new or modified test files. | | {color:green}+1{color} | javac | 7m 34s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 9m 32s | There were no new javadoc warning messages. | | {color:green}+1{color} | release audit | 0m 22s | The applied patch does not increase the total number of release audit warnings. | | {color:red}-1{color} | checkstyle | 0m 55s | The applied patch generated 1 new checkstyle issues (total was 33, now 33). | | {color:red}-1{color} | whitespace | 0m 1s | The patch has 15 line(s) that end in whitespace. Use git apply --whitespace=fix. | | {color:green}+1{color} | install | 1m 33s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 33s | The patch built with eclipse:eclipse. | | {color:red}-1{color} | findbugs | 1m 28s | The patch appears to introduce 1 new Findbugs (version 2.0.3) warnings. | | {color:green}+1{color} | yarn tests | 1m 57s | Tests passed in hadoop-yarn-common. | | | | 38m 42s | | \\ \\ || Reason || Tests || | FindBugs | module:hadoop-yarn-common | | | Comparison of String objects using == or != in org.apache.hadoop.yarn.nodelabels.CommonNodeLabelsManager.createNodeLabelFromLabelNames(Set) At CommonNodeLabelsManager.java:== or != in org.apache.hadoop.yarn.nodelabels.CommonNodeLabelsManager.createNodeLabelFromLabelNames(Set) At CommonNodeLabelsManager.java:[line 1014] | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12732271/0002-YARN-3579.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / f4e2b3c | | checkstyle | https://builds.apache.org/job/PreCommit-YARN-Build/7892/artifact/patchprocess/diffcheckstylehadoop-yarn-common.txt | | whitespace | https://builds.apache.org/job/PreCommit-YARN-Build/7892/artifact/patchprocess/whitespace.txt | | Findbugs warnings | https://builds.apache.org/job/PreCommit-YARN-Build/7892/artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-common.html | | hadoop-yarn-common test log | https://builds.apache.org/job/PreCommit-YARN-Build/7892/artifact/patchprocess/testrun_hadoop-yarn-common.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/7892/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf905.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/7892/console | This message was automatically generated. getLabelsToNodes in CommonNodeLabelsManager should support NodeLabel instead of label name as String Key: YARN-3579 URL: https://issues.apache.org/jira/browse/YARN-3579 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Affects Versions: 2.6.0 Reporter: Sunil G Assignee: Sunil G Priority: Minor Attachments: 0001-YARN-3579.patch, 0002-YARN-3579.patch CommonNodeLabelsManager#getLabelsToNodes returns label name as string. 
It does not pass information such as exclusivity back to the REST interface APIs. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
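A purely hypothetical signature sketch of the direction described above; NodeLabelInfo below is only a stand-in for org.apache.hadoop.yarn.api.records.NodeLabel, and none of this comes from the attached patches:
{code}
import java.util.Map;
import java.util.Set;

public interface LabelsToNodesView {
  /** Existing style: keyed by the label name only, so exclusivity is lost. */
  Map<String, Set<String>> getLabelsToNodesByName();

  /** Proposed style: keyed by an object that also carries exclusivity. */
  Map<NodeLabelInfo, Set<String>> getLabelsToNodes();

  /** Minimal stand-in for a NodeLabel-like record. */
  class NodeLabelInfo {
    public final String name;
    public final boolean exclusive;
    public NodeLabelInfo(String name, boolean exclusive) {
      this.name = name;
      this.exclusive = exclusive;
    }
  }
}
{code}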
[jira] [Commented] (YARN-2942) Aggregated Log Files should be combined
[ https://issues.apache.org/jira/browse/YARN-2942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14540288#comment-14540288 ] Robert Kanter commented on YARN-2942: - Thanks [~jlowe] for your feedback. It's good to get more views on this. {quote} If I understand them correctly they both propose that the NMs upload the original per-node aggregated log to HDFS and then something (either the NMs or the RM) later comes along and creates the aggregate-of-aggregates log{quote} Yes. That's correct. {quote}However I didn't see details on solving the race condition where a log reader comes along, sees from the index file that the desired log isn't in the aggregate-of-aggregates, then opens the log and reads from it just as the log is deleted by the entity appending to the aggregate-of-aggregates.{quote} That's a good point. I hadn't thought of that issue. Thinking about it now, I think there are a few options here:
- We could simply have the reader try again if it runs into a problem
- We could have the last NM delete the aggregated log files, so that it's less likely that this situation can occur
- Each NM could wait some amount of time (e.g. a few mins) after appending its log file before deleting the original file, so that it's less likely that this situation can occur
{quote}We have an internal solution where we create per-application har files of the logs{quote} Can you give some more details on this? Is it something you can share? If you've already solved this issue, then perhaps we can just use that. Though doesn't creating har files require running an MR job? {quote}Another issue from log aggregation we've seen in practice is that the proposals don't address the significant write load the per-node aggregate files place on the namenode.{quote} That's a good point. Shortly after a job finishes, all of the involved NMs would upload their log files around the same time, which puts stress on the NN. Having the NM report its current aggregation progress to the RM was recently added by YARN-1376 and related JIRAs. Having the RM coordinate the aggregation is similar to my design with ZK, but instead of a ZK lock, the RM orchestrates things. I like the idea of getting rid of the original aggregation and having the NMs all write to HDFS once, in the combined file directly. We'd have to implement your last bullet point to have the NMs serve the logs in the meantime, as I don't think that's there today. I'll try to flesh this design out a bit more and see where it goes. Unless we should use har files; though that adds an MR dependency. Aggregated Log Files should be combined --- Key: YARN-2942 URL: https://issues.apache.org/jira/browse/YARN-2942 Project: Hadoop YARN Issue Type: New Feature Affects Versions: 2.6.0 Reporter: Robert Kanter Assignee: Robert Kanter Attachments: CombinedAggregatedLogsProposal_v3.pdf, CombinedAggregatedLogsProposal_v6.pdf, CombinedAggregatedLogsProposal_v7.pdf, CompactedAggregatedLogsProposal_v1.pdf, CompactedAggregatedLogsProposal_v2.pdf, ConcatableAggregatedLogsProposal_v4.pdf, ConcatableAggregatedLogsProposal_v5.pdf, YARN-2942-preliminary.001.patch, YARN-2942-preliminary.002.patch, YARN-2942.001.patch, YARN-2942.002.patch, YARN-2942.003.patch Turning on log aggregation allows users to easily store container logs in HDFS and subsequently view them in the YARN web UIs from a central place. Currently, there is a separate log file for each Node Manager. 
This can be a problem for HDFS if you have a cluster with many nodes as you’ll slowly start accumulating many (possibly small) files per YARN application. The current “solution” for this problem is to configure YARN (actually the JHS) to automatically delete these files after some amount of time. We should improve this by compacting the per-node aggregated log files into one log file per application. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3613) TestContainerManagerSecurity should init and start Yarn cluster in setup instead of individual methods
[ https://issues.apache.org/jira/browse/YARN-3613?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14540337#comment-14540337 ] Hudson commented on YARN-3613: -- FAILURE: Integrated in Hadoop-trunk-Commit #7808 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/7808/]) YARN-3613. TestContainerManagerSecurity should init and start Yarn cluster in setup instead of individual methods. (nijel via kasha) (kasha: rev fe0df596271340788095cb43a1944e19ac4c2cf7) * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-tests/src/test/java/org/apache/hadoop/yarn/server/TestContainerManagerSecurity.java TestContainerManagerSecurity should init and start Yarn cluster in setup instead of individual methods -- Key: YARN-3613 URL: https://issues.apache.org/jira/browse/YARN-3613 Project: Hadoop YARN Issue Type: Improvement Components: test Affects Versions: 2.7.0 Reporter: Karthik Kambatla Assignee: nijel Priority: Minor Labels: newbie Fix For: 2.8.0 Attachments: YARN-3613-1.patch, yarn-3613-2.patch In TestContainerManagerSecurity, individual tests init and start Yarn cluster. This duplication can be avoided by moving that to setup. Further, one could merge the two @Test methods to avoid bringing up another mini-cluster. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2556) Tool to measure the performance of the timeline server
[ https://issues.apache.org/jira/browse/YARN-2556?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14540377#comment-14540377 ] Hadoop QA commented on YARN-2556: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | pre-patch | 5m 20s | Pre-patch trunk compilation is healthy. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 2 new or modified test files. | | {color:green}+1{color} | javac | 7m 46s | There were no new javac warning messages. | | {color:green}+1{color} | release audit | 0m 20s | The applied patch does not increase the total number of release audit warnings. | | {color:green}+1{color} | checkstyle | 0m 33s | There were no new checkstyle issues. | | {color:red}-1{color} | whitespace | 0m 0s | The patch has 1 line(s) that end in whitespace. Use git apply --whitespace=fix. | | {color:green}+1{color} | install | 1m 32s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 32s | The patch built with eclipse:eclipse. | | {color:green}+1{color} | findbugs | 0m 41s | The patch does not introduce any new Findbugs (version 2.0.3) warnings. | | {color:red}-1{color} | mapreduce tests | 109m 52s | Tests failed in hadoop-mapreduce-client-jobclient. | | | | 126m 40s | | \\ \\ || Reason || Tests || | Timed out tests | org.apache.hadoop.mapred.TestMRIntermediateDataEncryption | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12732269/YARN-2556.5.patch | | Optional Tests | javac unit findbugs checkstyle | | git revision | trunk / f4e2b3c | | whitespace | https://builds.apache.org/job/PreCommit-YARN-Build/7891/artifact/patchprocess/whitespace.txt | | hadoop-mapreduce-client-jobclient test log | https://builds.apache.org/job/PreCommit-YARN-Build/7891/artifact/patchprocess/testrun_hadoop-mapreduce-client-jobclient.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/7891/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf903.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/7891/console | This message was automatically generated. Tool to measure the performance of the timeline server -- Key: YARN-2556 URL: https://issues.apache.org/jira/browse/YARN-2556 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Jonathan Eagles Assignee: Chang Li Labels: BB2015-05-TBR Attachments: YARN-2556-WIP.patch, YARN-2556-WIP.patch, YARN-2556.1.patch, YARN-2556.2.patch, YARN-2556.3.patch, YARN-2556.4.patch, YARN-2556.5.patch, YARN-2556.patch, yarn2556.patch, yarn2556.patch, yarn2556_wip.patch We need to be able to understand the capacity model for the timeline server to give users the tools they need to deploy a timeline server with the correct capacity. I propose we create a mapreduce job that can measure timeline server write and read performance. Transactions per second, I/O for both read and write would be a good start. This could be done as an example or test job that could be tied into gridmix. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3069) Document missing properties in yarn-default.xml
[ https://issues.apache.org/jira/browse/YARN-3069?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14540411#comment-14540411 ] Hadoop QA commented on YARN-3069: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | pre-patch | 14m 41s | Pre-patch trunk compilation is healthy. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 1 new or modified test files. | | {color:green}+1{color} | javac | 7m 33s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 9m 38s | There were no new javadoc warning messages. | | {color:green}+1{color} | release audit | 0m 23s | The applied patch does not increase the total number of release audit warnings. | | {color:green}+1{color} | checkstyle | 0m 51s | There were no new checkstyle issues. | | {color:red}-1{color} | whitespace | 0m 1s | The patch has 1 line(s) that end in whitespace. Use git apply --whitespace=fix. | | {color:green}+1{color} | install | 1m 32s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 33s | The patch built with eclipse:eclipse. | | {color:green}+1{color} | findbugs | 1m 23s | The patch does not introduce any new Findbugs (version 2.0.3) warnings. | | {color:green}+1{color} | yarn tests | 1m 55s | Tests passed in hadoop-yarn-common. | | | | 38m 35s | | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12732298/YARN-3069.008.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / fe0df59 | | whitespace | https://builds.apache.org/job/PreCommit-YARN-Build/7894/artifact/patchprocess/whitespace.txt | | hadoop-yarn-common test log | https://builds.apache.org/job/PreCommit-YARN-Build/7894/artifact/patchprocess/testrun_hadoop-yarn-common.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/7894/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf909.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/7894/console | This message was automatically generated. Document missing properties in yarn-default.xml --- Key: YARN-3069 URL: https://issues.apache.org/jira/browse/YARN-3069 Project: Hadoop YARN Issue Type: Bug Components: documentation Reporter: Ray Chiang Assignee: Ray Chiang Labels: BB2015-05-TBR, supportability Attachments: YARN-3069.001.patch, YARN-3069.002.patch, YARN-3069.003.patch, YARN-3069.004.patch, YARN-3069.005.patch, YARN-3069.006.patch, YARN-3069.007.patch, YARN-3069.008.patch The following properties are currently not defined in yarn-default.xml. These properties should either be A) documented in yarn-default.xml OR B) listed as an exception (with comments, e.g. for internal use) in the TestYarnConfigurationFields unit test Any comments for any of the properties below are welcome. 
org.apache.hadoop.yarn.server.sharedcachemanager.RemoteAppChecker org.apache.hadoop.yarn.server.sharedcachemanager.store.InMemorySCMStore security.applicationhistory.protocol.acl yarn.app.container.log.backups yarn.app.container.log.dir yarn.app.container.log.filesize yarn.client.app-submission.poll-interval yarn.client.application-client-protocol.poll-timeout-ms yarn.is.minicluster yarn.log.server.url yarn.minicluster.control-resource-monitoring yarn.minicluster.fixed.ports yarn.minicluster.use-rpc yarn.node-labels.fs-store.retry-policy-spec yarn.node-labels.fs-store.root-dir yarn.node-labels.manager-class yarn.nodemanager.container-executor.os.sched.priority.adjustment yarn.nodemanager.container-monitor.process-tree.class yarn.nodemanager.disk-health-checker.enable yarn.nodemanager.docker-container-executor.image-name yarn.nodemanager.linux-container-executor.cgroups.delete-timeout-ms yarn.nodemanager.linux-container-executor.group yarn.nodemanager.log.deletion-threads-count yarn.nodemanager.user-home-dir yarn.nodemanager.webapp.https.address yarn.nodemanager.webapp.spnego-keytab-file yarn.nodemanager.webapp.spnego-principal yarn.nodemanager.windows-secure-container-executor.group yarn.resourcemanager.configuration.file-system-based-store yarn.resourcemanager.delegation-token-renewer.thread-count yarn.resourcemanager.delegation.key.update-interval yarn.resourcemanager.delegation.token.max-lifetime yarn.resourcemanager.delegation.token.renew-interval
[jira] [Commented] (YARN-3633) With Fair Scheduler, cluster can logjam when there are too many queues
[ https://issues.apache.org/jira/browse/YARN-3633?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14540491#comment-14540491 ] Rohit Agarwal commented on YARN-3633: - So, in essence the problem is that when there are too many queues, the fair share of each queue gets low and thus the maxAMShare, which is calculated from the fairShare of each queue, gets too low to run any container. I propose the following solution: Instead of setting {code} maxAMShare = 0.5*fairShare {code} we set it to {code} maxAMShare = max(0.5*fairShare, SomeMinimumSizeEnoughToRunOneContainer) {code} And then add a cluster-wide maxAMShare to be {{0.5*totalClusterCapacity}} All these ratios/values can be configurable. So, in the scenario described in the JIRA, we would still run AMs in some queues but we won't overrun the cluster with AMs because it will hit the cluster-wide limit. If this proposal sounds reasonable, I can start working on this. However, I am not sure how this would interact with preemption. With Fair Scheduler, cluster can logjam when there are too many queues -- Key: YARN-3633 URL: https://issues.apache.org/jira/browse/YARN-3633 Project: Hadoop YARN Issue Type: Bug Components: fairscheduler Affects Versions: 2.6.0 Reporter: Rohit Agarwal Assignee: Rohit Agarwal Priority: Critical It's possible to logjam a cluster by submitting many applications at once in different queues. For example, let's say there is a cluster with 20GB of total memory. Let's say 4 users submit applications at the same time. The fair share of each queue is 5GB. Let's say that maxAMShare is 0.5. So, each queue has at most 2.5GB memory for AMs. If all the users requested AMs of size 3GB - the cluster logjams. Nothing gets scheduled even when 20GB of resources are available. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
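For illustration, a rough standalone sketch of the arithmetic behind the proposal above (values are illustrative, and the minimum-size stand-in is a placeholder for "SomeMinimumSizeEnoughToRunOneContainer"; this is not scheduler code):
{code}
public class MaxAmShareProposalSketch {
  public static void main(String[] args) {
    double clusterMemGb = 20.0;
    double fairShareGb = 5.0;        // per-queue fair share from the example
    double minForOneAmGb = 3.0;      // placeholder: enough to run one AM container

    double currentLimit = 0.5 * fairShareGb;                           // 2.5 GB: no 3 GB AM fits
    double proposedLimit = Math.max(0.5 * fairShareGb, minForOneAmGb); // 3 GB: one AM can start
    double clusterWideCap = 0.5 * clusterMemGb;                        // 10 GB cap across all queues

    System.out.println("current per-queue AM limit:  " + currentLimit);
    System.out.println("proposed per-queue AM limit: " + proposedLimit);
    System.out.println("cluster-wide AM cap:         " + clusterWideCap);
  }
}
{code}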
[jira] [Commented] (YARN-3529) Add miniHBase cluster and Phoenix support to ATS v2 unit tests
[ https://issues.apache.org/jira/browse/YARN-3529?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14540497#comment-14540497 ] Hadoop QA commented on YARN-3529: - \\ \\ | (/) *{color:green}+1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | pre-patch | 15m 25s | Pre-patch YARN-2928 compilation is healthy. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 1 new or modified test files. | | {color:green}+1{color} | javac | 7m 58s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 11m 11s | There were no new javadoc warning messages. | | {color:green}+1{color} | release audit | 0m 26s | The applied patch does not increase the total number of release audit warnings. | | {color:green}+1{color} | checkstyle | 0m 16s | There were no new checkstyle issues. | | {color:green}+1{color} | whitespace | 0m 0s | The patch has no lines that end in whitespace. | | {color:green}+1{color} | install | 1m 56s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 47s | The patch built with eclipse:eclipse. | | {color:green}+1{color} | findbugs | 0m 44s | The patch does not introduce any new Findbugs (version 2.0.3) warnings. | | {color:green}+1{color} | yarn tests | 1m 3s | Tests passed in hadoop-yarn-server-timelineservice. | | | | 39m 52s | | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12732304/YARN-3529-YARN-2928.003.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | YARN-2928 / b3b791b | | hadoop-yarn-server-timelineservice test log | https://builds.apache.org/job/PreCommit-YARN-Build/7895/artifact/patchprocess/testrun_hadoop-yarn-server-timelineservice.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/7895/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf903.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/7895/console | This message was automatically generated. Add miniHBase cluster and Phoenix support to ATS v2 unit tests -- Key: YARN-3529 URL: https://issues.apache.org/jira/browse/YARN-3529 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Li Lu Assignee: Li Lu Attachments: AbstractMiniHBaseClusterTest.java, YARN-3529-YARN-2928.000.patch, YARN-3529-YARN-2928.001.patch, YARN-3529-YARN-2928.002.patch, YARN-3529-YARN-2928.003.patch, output_minicluster2.txt After we have our HBase and Phoenix writer implementations, we may want to find a way to set up HBase and Phoenix in our unit tests. We need to do this integration before the branch got merged back to trunk. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3634) TestMRTimelineEventHandling and TestApplication are broken
[ https://issues.apache.org/jira/browse/YARN-3634?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sangjin Lee updated YARN-3634: -- Attachment: YARN-3634-YARN-2928.002.patch Patch v.2 posted. - fixed the findbugs issue - fixed the TestApplication tests (existing failure) TestMRTimelineEventHandling and TestApplication are broken -- Key: YARN-3634 URL: https://issues.apache.org/jira/browse/YARN-3634 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Affects Versions: YARN-2928 Reporter: Sangjin Lee Assignee: Sangjin Lee Attachments: YARN-3634-YARN-2928.001.patch, YARN-3634-YARN-2928.002.patch TestMRTimelineEventHandling is broken. Relevant error message: {noformat} 2015-05-12 06:28:56,415 INFO [AsyncDispatcher event handler] ipc.Client (Client.java:handleConnectionFailure(882)) - Retrying connect to server: asf904.gq1.ygridcore.net/67.195.81.148:0. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS) 2015-05-12 06:28:57,416 INFO [AsyncDispatcher event handler] ipc.Client (Client.java:handleConnectionFailure(882)) - Retrying connect to server: asf904.gq1.ygridcore.net/67.195.81.148:0. Already tried 1 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS) 2015-05-12 06:28:58,416 INFO [AsyncDispatcher event handler] ipc.Client (Client.java:handleConnectionFailure(882)) - Retrying connect to server: asf904.gq1.ygridcore.net/67.195.81.148:0. Already tried 2 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS) 2015-05-12 06:28:59,417 INFO [AsyncDispatcher event handler] ipc.Client (Client.java:handleConnectionFailure(882)) - Retrying connect to server: asf904.gq1.ygridcore.net/67.195.81.148:0. Already tried 3 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS) 2015-05-12 06:29:00,418 INFO [AsyncDispatcher event handler] ipc.Client (Client.java:handleConnectionFailure(882)) - Retrying connect to server: asf904.gq1.ygridcore.net/67.195.81.148:0. Already tried 4 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS) 2015-05-12 06:29:01,419 INFO [AsyncDispatcher event handler] ipc.Client (Client.java:handleConnectionFailure(882)) - Retrying connect to server: asf904.gq1.ygridcore.net/67.195.81.148:0. Already tried 5 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS) 2015-05-12 06:29:02,420 INFO [AsyncDispatcher event handler] ipc.Client (Client.java:handleConnectionFailure(882)) - Retrying connect to server: asf904.gq1.ygridcore.net/67.195.81.148:0. Already tried 6 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS) 2015-05-12 06:29:03,420 INFO [AsyncDispatcher event handler] ipc.Client (Client.java:handleConnectionFailure(882)) - Retrying connect to server: asf904.gq1.ygridcore.net/67.195.81.148:0. Already tried 7 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS) 2015-05-12 06:29:04,421 INFO [AsyncDispatcher event handler] ipc.Client (Client.java:handleConnectionFailure(882)) - Retrying connect to server: asf904.gq1.ygridcore.net/67.195.81.148:0. 
Already tried 8 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS) 2015-05-12 06:29:05,422 INFO [AsyncDispatcher event handler] ipc.Client (Client.java:handleConnectionFailure(882)) - Retrying connect to server: asf904.gq1.ygridcore.net/67.195.81.148:0. Already tried 9 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS) 2015-05-12 06:29:05,424 ERROR [AsyncDispatcher event handler] collector.NodeTimelineCollectorManager (NodeTimelineCollectorManager.java:postPut(121)) - Failed to communicate with NM Collector Service for application_1431412130291_0001 2015-05-12 06:29:05,425 WARN [AsyncDispatcher event handler] containermanager.AuxServices (AuxServices.java:logWarningWhenAuxServiceThrowExceptions(261)) - The auxService name is timeline_collector and it got an error at event: CONTAINER_INIT org.apache.hadoop.yarn.exceptions.YarnRuntimeException: org.apache.hadoop.yarn.exceptions.YarnRuntimeException: java.net.ConnectException: Call From asf904.gq1.ygridcore.net/67.195.81.148 to asf904.gq1.ygridcore.net:0 failed on connection exception: java.net.ConnectException: Connection refused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused at
[jira] [Commented] (YARN-2942) Aggregated Log Files should be combined
[ https://issues.apache.org/jira/browse/YARN-2942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14540407#comment-14540407 ] Jason Lowe commented on YARN-2942: -- bq. Can you give some more details on this? Is it something you can share? It's a hack to help mitigate the log aggregation namespace scaling issues on our large clusters. Essentially it's a periodic process that runs an Oozie workflow that does the following: # determines which applications are good candidates for log archiving (i.e.: lots of files and total size is not that big) # runs a streaming job with a shell script that uses the list of applications to aggregate as input # for each application it runs a local-mode archive job to archive the log contents # when the archive has been created it swaps out the application directory with a symlink into the har archive The symlink makes the archive transparent to the readers. Both the JHS and the yarn logs command use FileContext and just work with the symlink into the har without modifications. So yes, we are running a MapReduce job to archive the logs, which itself will create more logs. However, it processes many application logs for each archiving job. If there is sufficient interest we can pursue how to share it, but the script is specific to how we configure our nodes and clusters and relies on unsupported symlinks. I'm hoping the outcome of this JIRA allows us to move away from the need for it. bq. We'd have to implement your last bullet point to have the NMs serve the logs in the meantime, as I don't think that's there today. That feature is indeed there today. Links to the app logs on the NM will try to serve the local app logs first, then redirect to the log server if the local logs are unavailable. See NMController and ContainerLogsPage. It only becomes an issue when things link to the aggregated log server directly before the NM has finished aggregating them. Aggregated Log Files should be combined --- Key: YARN-2942 URL: https://issues.apache.org/jira/browse/YARN-2942 Project: Hadoop YARN Issue Type: New Feature Affects Versions: 2.6.0 Reporter: Robert Kanter Assignee: Robert Kanter Attachments: CombinedAggregatedLogsProposal_v3.pdf, CombinedAggregatedLogsProposal_v6.pdf, CombinedAggregatedLogsProposal_v7.pdf, CompactedAggregatedLogsProposal_v1.pdf, CompactedAggregatedLogsProposal_v2.pdf, ConcatableAggregatedLogsProposal_v4.pdf, ConcatableAggregatedLogsProposal_v5.pdf, YARN-2942-preliminary.001.patch, YARN-2942-preliminary.002.patch, YARN-2942.001.patch, YARN-2942.002.patch, YARN-2942.003.patch Turning on log aggregation allows users to easily store container logs in HDFS and subsequently view them in the YARN web UIs from a central place. Currently, there is a separate log file for each Node Manager. This can be a problem for HDFS if you have a cluster with many nodes as you’ll slowly start accumulating many (possibly small) files per YARN application. The current “solution” for this problem is to configure YARN (actually the JHS) to automatically delete these files after some amount of time. We should improve this by compacting the per-node aggregated log files into one log file per application. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
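As a rough illustration of the symlink swap described above (this is not the actual script; the paths and har name are hypothetical), a FileContext-based sketch might look like:
{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileContext;
import org.apache.hadoop.fs.Options;
import org.apache.hadoop.fs.Path;

public class LogArchiveSwapSketch {
  public static void main(String[] args) throws Exception {
    FileContext fc = FileContext.getFileContext(new Configuration());

    // Hypothetical locations: the per-app aggregated log dir and the entry in
    // the har that now contains its contents.
    Path appLogDir = new Path("/tmp/logs/user/logs/application_1431412130291_0001");
    Path harEntry  = new Path("har:///tmp/logs/user/logs-archive/logs.har/application_1431412130291_0001");
    Path backup    = new Path(appLogDir + ".orig");

    // Move the original directory aside, then drop a symlink in its place so
    // readers (JHS, 'yarn logs') transparently follow it into the archive.
    fc.rename(appLogDir, backup, Options.Rename.NONE);
    fc.createSymlink(harEntry, appLogDir, false);
  }
}
{code}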
[jira] [Updated] (YARN-3624) ApplicationHistoryServer reverses the order of the filters it gets
[ https://issues.apache.org/jira/browse/YARN-3624?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mit Desai updated YARN-3624: Target Version/s: 2.7.1 (was: 2.6.1) ApplicationHistoryServer reverses the order of the filters it gets -- Key: YARN-3624 URL: https://issues.apache.org/jira/browse/YARN-3624 Project: Hadoop YARN Issue Type: Bug Components: timelineserver Affects Versions: 2.6.0 Reporter: Mit Desai Assignee: Mit Desai Attachments: YARN-3624.patch AppliactionHistoryServer should not alter the order in which it gets the filter chain. Additional filters should be added at the end of the chain. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3044) [Event producers] Implement RM writing app lifecycle events to ATS
[ https://issues.apache.org/jira/browse/YARN-3044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14540511#comment-14540511 ] Naganarasimha G R commented on YARN-3044: - bq. Sorry to put my comments at last minute. No problem, better late than never :) bq. 1. I incline to only having ContainerEntity, but RM and NM may put different info/event about it based on their knowledge. +1 for different events; this would be sufficient to capture the difference in the times displayed when published from the RM and the NM (earlier we had a separate container entity for this reason). [~djp], please let me know if anything else can differ when published from the RM and NM that needs to be captured separately. bq. 2. Should v1 and v2 publisher only differentiate at publishEvent, however, it seems that we duplicate code more than that. And perhaps defining and implementing SystemMetricsEvent.toTimelineEvent can further cleanup the code. Maybe I did not get this clearly, but AFAIK the packages and classes for the timeline events/entities are different and the way we publish entities is also different, so though the code looks duplicated I think nothing further can be reduced/cleaned up here. bq. 3. I saw v2 is going to send config, but where the config is coming from. Did we conclude who and how to send the config? IAC, sending config seems to be half done. Well, I had raised config-related queries earlier; as they did not get concluded, I was planning to get this done as part of a new jira. AFAIK the intention in ATS is to collect the app-side configs rather than the server-side ones, and the RM will not be aware of app configs, so my initial idea was to support an additional interface in the client to publish application-specific configs. Correct me if I am wrong, and also let me know whether it's OK to handle configs in another jira. {quote} And we can use entity.addConfigs(event.getConfig());. No need to iterate over config collection and put each config one-by-one. 4. yarn.system-metrics-publisher.rm.publish.container-metrics - yarn.rm.system-metrics-publisher.emit-container-events? 5. Methods/innner classes in SystemMetricsPublisher don't need to be changed to public. Default is enough to access them? {quote} Will get these corrected. bq. Moreover, I also think we should not have yarn.system-metrics-publisher.enabled too, and reuse the existing config. And it's not limited to RM metrics publisher, but all existing ATS service. IMHO, the better practice is to reuse the existing config. And we can have a global config (or env var) timeline-service.version to determine the service is enabled with v1 or v2 implementation. Anyway, it's a separate problem, I'll file a separate jira for it. {{yarn.system-metrics-publisher.rm.publish.container-metrics}} was added specifically to ensure that container lifecycle metrics are not always emitted from the RM, and are published only when required.
Initially in YARN-3034 [discussions|https://issues.apache.org/jira/browse/YARN-3034?focusedCommentId=14359174page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14359174] we wanted to proceed with a single config like {{yarn.system-metrics-publisher.enabled}} (as the existing one, {{yarn.resourcemanager.system-metrics-publisher.enabled}}, was specific to the RM) and to have {{yarn.timeline-service.version}}, but you had [commented|https://issues.apache.org/jira/browse/YARN-3034?focusedCommentId=14376575page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14376575] to have the single config {{yarn.system-metrics-publisher.enabled}}, and hence I had modified it accordingly. [Event producers] Implement RM writing app lifecycle events to ATS -- Key: YARN-3044 URL: https://issues.apache.org/jira/browse/YARN-3044 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Sangjin Lee Assignee: Naganarasimha G R Labels: BB2015-05-TBR Attachments: YARN-3044-YARN-2928.004.patch, YARN-3044-YARN-2928.005.patch, YARN-3044-YARN-2928.006.patch, YARN-3044-YARN-2928.007.patch, YARN-3044.20150325-1.patch, YARN-3044.20150406-1.patch, YARN-3044.20150416-1.patch Per design in YARN-2928, implement RM writing app lifecycle events to ATS. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
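As a side note on the {{addConfigs}} usage mentioned in the comment above, here is a minimal sketch assuming the ATS v2 {{TimelineEntity}} class on the YARN-2928 branch; the id, type, and config values are made up for illustration:
{code}
import java.util.HashMap;
import java.util.Map;
import org.apache.hadoop.yarn.api.records.timelineservice.TimelineEntity;

public class AddConfigsSketch {
  public static void main(String[] args) {
    TimelineEntity entity = new TimelineEntity();
    entity.setType("YARN_APPLICATION");              // illustrative type
    entity.setId("application_1431412130291_0001");  // illustrative id

    // Hypothetical app-side configs gathered elsewhere (e.g. from the event).
    Map<String, String> configs = new HashMap<>();
    configs.put("mapreduce.map.memory.mb", "2048");
    configs.put("mapreduce.reduce.memory.mb", "4096");

    // Bulk-add instead of iterating and putting each config one by one.
    entity.addConfigs(configs);
  }
}
{code}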
[jira] [Updated] (YARN-3634) TestMRTimelineEventHandling and TestApplication are broken
[ https://issues.apache.org/jira/browse/YARN-3634?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sangjin Lee updated YARN-3634: -- Description: TestMRTimelineEventHandling is broken. Relevant error message: {noformat} 2015-05-12 06:28:56,415 INFO [AsyncDispatcher event handler] ipc.Client (Client.java:handleConnectionFailure(882)) - Retrying connect to server: asf904.gq1.ygridcore.net/67.195.81.148:0. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS) 2015-05-12 06:28:57,416 INFO [AsyncDispatcher event handler] ipc.Client (Client.java:handleConnectionFailure(882)) - Retrying connect to server: asf904.gq1.ygridcore.net/67.195.81.148:0. Already tried 1 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS) 2015-05-12 06:28:58,416 INFO [AsyncDispatcher event handler] ipc.Client (Client.java:handleConnectionFailure(882)) - Retrying connect to server: asf904.gq1.ygridcore.net/67.195.81.148:0. Already tried 2 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS) 2015-05-12 06:28:59,417 INFO [AsyncDispatcher event handler] ipc.Client (Client.java:handleConnectionFailure(882)) - Retrying connect to server: asf904.gq1.ygridcore.net/67.195.81.148:0. Already tried 3 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS) 2015-05-12 06:29:00,418 INFO [AsyncDispatcher event handler] ipc.Client (Client.java:handleConnectionFailure(882)) - Retrying connect to server: asf904.gq1.ygridcore.net/67.195.81.148:0. Already tried 4 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS) 2015-05-12 06:29:01,419 INFO [AsyncDispatcher event handler] ipc.Client (Client.java:handleConnectionFailure(882)) - Retrying connect to server: asf904.gq1.ygridcore.net/67.195.81.148:0. Already tried 5 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS) 2015-05-12 06:29:02,420 INFO [AsyncDispatcher event handler] ipc.Client (Client.java:handleConnectionFailure(882)) - Retrying connect to server: asf904.gq1.ygridcore.net/67.195.81.148:0. Already tried 6 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS) 2015-05-12 06:29:03,420 INFO [AsyncDispatcher event handler] ipc.Client (Client.java:handleConnectionFailure(882)) - Retrying connect to server: asf904.gq1.ygridcore.net/67.195.81.148:0. Already tried 7 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS) 2015-05-12 06:29:04,421 INFO [AsyncDispatcher event handler] ipc.Client (Client.java:handleConnectionFailure(882)) - Retrying connect to server: asf904.gq1.ygridcore.net/67.195.81.148:0. Already tried 8 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS) 2015-05-12 06:29:05,422 INFO [AsyncDispatcher event handler] ipc.Client (Client.java:handleConnectionFailure(882)) - Retrying connect to server: asf904.gq1.ygridcore.net/67.195.81.148:0. 
Already tried 9 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS) 2015-05-12 06:29:05,424 ERROR [AsyncDispatcher event handler] collector.NodeTimelineCollectorManager (NodeTimelineCollectorManager.java:postPut(121)) - Failed to communicate with NM Collector Service for application_1431412130291_0001 2015-05-12 06:29:05,425 WARN [AsyncDispatcher event handler] containermanager.AuxServices (AuxServices.java:logWarningWhenAuxServiceThrowExceptions(261)) - The auxService name is timeline_collector and it got an error at event: CONTAINER_INIT org.apache.hadoop.yarn.exceptions.YarnRuntimeException: org.apache.hadoop.yarn.exceptions.YarnRuntimeException: java.net.ConnectException: Call From asf904.gq1.ygridcore.net/67.195.81.148 to asf904.gq1.ygridcore.net:0 failed on connection exception: java.net.ConnectException: Connection refused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused at org.apache.hadoop.yarn.server.timelineservice.collector.TimelineCollectorManager.putIfAbsent(TimelineCollectorManager.java:97) at org.apache.hadoop.yarn.server.timelineservice.collector.PerNodeTimelineCollectorsAuxService.addApplication(PerNodeTimelineCollectorsAuxService.java:99) at org.apache.hadoop.yarn.server.timelineservice.collector.PerNodeTimelineCollectorsAuxService.initializeContainer(PerNodeTimelineCollectorsAuxService.java:126) at org.apache.hadoop.yarn.server.nodemanager.containermanager.AuxServices.handle(AuxServices.java:226) at org.apache.hadoop.yarn.server.nodemanager.containermanager.AuxServices.handle(AuxServices.java:49) at
[jira] [Updated] (YARN-2556) Tool to measure the performance of the timeline server
[ https://issues.apache.org/jira/browse/YARN-2556?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chang Li updated YARN-2556: --- Attachment: YARN-2556.7.patch Tool to measure the performance of the timeline server -- Key: YARN-2556 URL: https://issues.apache.org/jira/browse/YARN-2556 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Jonathan Eagles Assignee: Chang Li Labels: BB2015-05-TBR Attachments: YARN-2556-WIP.patch, YARN-2556-WIP.patch, YARN-2556.1.patch, YARN-2556.2.patch, YARN-2556.3.patch, YARN-2556.4.patch, YARN-2556.5.patch, YARN-2556.6.patch, YARN-2556.7.patch, YARN-2556.patch, yarn2556.patch, yarn2556.patch, yarn2556_wip.patch We need to be able to understand the capacity model for the timeline server to give users the tools they need to deploy a timeline server with the correct capacity. I propose we create a mapreduce job that can measure timeline server write and read performance. Transactions per second, I/O for both read and write would be a good start. This could be done as an example or test job that could be tied into gridmix. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3635) Get-queue-mapping should be a common interface of YarnScheduler
[ https://issues.apache.org/jira/browse/YARN-3635?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinod Kumar Vavilapalli updated YARN-3635: -- Issue Type: Sub-task (was: Bug) Parent: YARN-1317 Get-queue-mapping should be a common interface of YarnScheduler --- Key: YARN-3635 URL: https://issues.apache.org/jira/browse/YARN-3635 Project: Hadoop YARN Issue Type: Sub-task Components: scheduler Reporter: Wangda Tan Assignee: Wangda Tan Currently, both of fair/capacity scheduler support queue mapping, which makes scheduler can change queue of an application after submitted to scheduler. One issue of doing this in specific scheduler is: If the queue after mapping has different maximum_allocation/default-node-label-expression of the original queue, {{validateAndCreateResourceRequest}} in RMAppManager checks the wrong queue. I propose to make the queue mapping as a common interface of scheduler, and RMAppManager set the queue after mapping before doing validations. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2421) CapacityScheduler still allocates containers to an app in the FINISHING state
[ https://issues.apache.org/jira/browse/YARN-2421?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chang Li updated YARN-2421: --- Attachment: YARN-2421.6.patch CapacityScheduler still allocates containers to an app in the FINISHING state - Key: YARN-2421 URL: https://issues.apache.org/jira/browse/YARN-2421 Project: Hadoop YARN Issue Type: Bug Components: scheduler Reporter: Thomas Graves Assignee: Chang Li Attachments: YARN-2421.4.patch, YARN-2421.5.patch, YARN-2421.6.patch, yarn2421.patch, yarn2421.patch, yarn2421.patch I saw an instance of a bad application master where it unregistered with the RM but then continued to call into allocate. The RMAppAttempt went to the FINISHING state, but the capacity scheduler kept allocating it containers. We should probably have the capacity scheduler check that the application isn't in one of the terminal states before giving it containers. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3635) Get-queue-mapping should be a common interface of YarnScheduler
[ https://issues.apache.org/jira/browse/YARN-3635?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14540586#comment-14540586 ] Vinod Kumar Vavilapalli commented on YARN-3635: --- +10 for this. In addition to the interfaces themselves, I'd like us to consolidate the concrete mapping-rules that we have in each scheduler. Ideally, we only need one set of rules acceptable by all schedulers. If not that, I'd live with ~80% common rules. BTW, thematically this fits into YARN-1317, making it a sub-task. Get-queue-mapping should be a common interface of YarnScheduler --- Key: YARN-3635 URL: https://issues.apache.org/jira/browse/YARN-3635 Project: Hadoop YARN Issue Type: Bug Components: scheduler Reporter: Wangda Tan Assignee: Wangda Tan Currently, both of fair/capacity scheduler support queue mapping, which makes scheduler can change queue of an application after submitted to scheduler. One issue of doing this in specific scheduler is: If the queue after mapping has different maximum_allocation/default-node-label-expression of the original queue, {{validateAndCreateResourceRequest}} in RMAppManager checks the wrong queue. I propose to make the queue mapping as a common interface of scheduler, and RMAppManager set the queue after mapping before doing validations. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
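To make the proposal concrete, here is a purely hypothetical sketch of what a scheduler-agnostic mapping hook might look like; none of these names exist in the codebase today, they only illustrate the shape of the interface being discussed:
{code}
// Hypothetical interface; not part of YarnScheduler today.
public interface QueueMappingResolver {
  /**
   * Returns the queue the application should actually be placed in after
   * applying the scheduler's user/group mapping rules, or the submitted
   * queue unchanged if no rule matches.
   */
  String resolveQueue(String user, String submittedQueue);
}

// RMAppManager could then validate against the mapped queue before building
// resource requests, e.g. (illustrative only):
//   String queue = scheduler.getQueueMappingResolver()
//                           .resolveQueue(user, submissionContext.getQueue());
//   validateAndCreateResourceRequest(..., queue, ...);
{code}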
[jira] [Updated] (YARN-2369) Environment variable handling assumes values should be appended
[ https://issues.apache.org/jira/browse/YARN-2369?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dustin Cote updated YARN-2369: -- Attachment: YARN-2369-3.patch [~vinodkv] here's v3 of the patch. I've got a new unit test in this one and I'm using MRJobConfig now for the property (now with a new and improved name). I think I've trimmed down the lines, but if something looks misplaced please let me know. Thanks! Environment variable handling assumes values should be appended --- Key: YARN-2369 URL: https://issues.apache.org/jira/browse/YARN-2369 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.2.0 Reporter: Jason Lowe Assignee: Dustin Cote Attachments: YARN-2369-1.patch, YARN-2369-2.patch, YARN-2369-3.patch When processing environment variables for a container context the code assumes that the value should be appended to any pre-existing value in the environment. This may be desired behavior for handling path-like environment variables such as PATH, LD_LIBRARY_PATH, CLASSPATH, etc. but it is a non-intuitive and harmful way to handle any variable that does not have path-like semantics. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
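For readers unfamiliar with the issue, a small self-contained sketch (not the actual YARN code path) contrasting the append and replace behaviours for an environment map:
{code}
import java.io.File;
import java.util.HashMap;
import java.util.Map;

public class EnvVarHandlingSketch {
  // Path-like variables: appending with the path separator is desirable.
  static void appendToEnv(Map<String, String> env, String name, String value) {
    String existing = env.get(name);
    env.put(name, existing == null ? value : existing + File.pathSeparator + value);
  }

  // Scalar variables: the container-specified value should simply win.
  static void replaceInEnv(Map<String, String> env, String name, String value) {
    env.put(name, value);
  }

  public static void main(String[] args) {
    Map<String, String> env = new HashMap<>();
    env.put("CLASSPATH", "/existing/jars/*");
    env.put("JAVA_OPTS", "-Xmx1g");

    appendToEnv(env, "CLASSPATH", "/app/jars/*"); // fine: path-like semantics
    replaceInEnv(env, "JAVA_OPTS", "-Xmx2g");     // appending here would be harmful

    System.out.println(env);
  }
}
{code}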
[jira] [Commented] (YARN-1317) Make Queue, QueueACLs and QueueMetrics first class citizens in YARN
[ https://issues.apache.org/jira/browse/YARN-1317?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14540588#comment-14540588 ] Vinod Kumar Vavilapalli commented on YARN-1317: --- Also added YARN-3635 as a sub-task - the goal is to make mapping-rules a first class citizen. Make Queue, QueueACLs and QueueMetrics first class citizens in YARN --- Key: YARN-1317 URL: https://issues.apache.org/jira/browse/YARN-1317 Project: Hadoop YARN Issue Type: Improvement Reporter: Vinod Kumar Vavilapalli Assignee: Vinod Kumar Vavilapalli Today, we are duplicating the exact same code in all the schedulers. Queue is a top class concept - clientService, web-services etc already recognize queue as a top level concept. We need to move Queue, QueueMetrics and QueueACLs to be top level. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-160) nodemanagers should obtain cpu/memory values from underlying OS
[ https://issues.apache.org/jira/browse/YARN-160?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14540416#comment-14540416 ] Hadoop QA commented on YARN-160: \\ \\ | (/) *{color:green}+1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | pre-patch | 14m 40s | Pre-patch trunk compilation is healthy. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 5 new or modified test files. | | {color:green}+1{color} | javac | 7m 34s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 9m 42s | There were no new javadoc warning messages. | | {color:green}+1{color} | release audit | 0m 22s | The applied patch does not increase the total number of release audit warnings. | | {color:green}+1{color} | checkstyle | 2m 44s | There were no new checkstyle issues. | | {color:green}+1{color} | whitespace | 0m 27s | The patch has no lines that end in whitespace. | | {color:green}+1{color} | install | 1m 33s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 32s | The patch built with eclipse:eclipse. | | {color:green}+1{color} | findbugs | 4m 31s | The patch does not introduce any new Findbugs (version 2.0.3) warnings. | | {color:green}+1{color} | tools/hadoop tests | 14m 47s | Tests passed in hadoop-gridmix. | | {color:green}+1{color} | yarn tests | 0m 24s | Tests passed in hadoop-yarn-api. | | {color:green}+1{color} | yarn tests | 1m 57s | Tests passed in hadoop-yarn-common. | | {color:green}+1{color} | yarn tests | 6m 3s | Tests passed in hadoop-yarn-server-nodemanager. | | | | 65m 21s | | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12732288/YARN-160.006.patch | | Optional Tests | javac unit findbugs checkstyle javadoc | | git revision | trunk / 5c2f05c | | hadoop-gridmix test log | https://builds.apache.org/job/PreCommit-YARN-Build/7893/artifact/patchprocess/testrun_hadoop-gridmix.txt | | hadoop-yarn-api test log | https://builds.apache.org/job/PreCommit-YARN-Build/7893/artifact/patchprocess/testrun_hadoop-yarn-api.txt | | hadoop-yarn-common test log | https://builds.apache.org/job/PreCommit-YARN-Build/7893/artifact/patchprocess/testrun_hadoop-yarn-common.txt | | hadoop-yarn-server-nodemanager test log | https://builds.apache.org/job/PreCommit-YARN-Build/7893/artifact/patchprocess/testrun_hadoop-yarn-server-nodemanager.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/7893/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf907.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/7893/console | This message was automatically generated. 
nodemanagers should obtain cpu/memory values from underlying OS --- Key: YARN-160 URL: https://issues.apache.org/jira/browse/YARN-160 Project: Hadoop YARN Issue Type: Improvement Components: nodemanager Affects Versions: 2.0.3-alpha Reporter: Alejandro Abdelnur Assignee: Varun Vasudev Labels: BB2015-05-TBR Attachments: YARN-160.005.patch, YARN-160.006.patch, apache-yarn-160.0.patch, apache-yarn-160.1.patch, apache-yarn-160.2.patch, apache-yarn-160.3.patch As mentioned in YARN-2 *NM memory and CPU configs* Currently these values come from the NM's config; we should be able to obtain them from the OS (i.e., in the case of Linux, from /proc/meminfo and /proc/cpuinfo). As this is highly OS-dependent, we should have an interface that obtains this information. In addition, implementations of this interface should be able to specify a mem/cpu offset (the amount of mem/cpu not to be made available as YARN resources); this would allow reserving mem/cpu for the OS and other services outside of YARN containers. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
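As a purely illustrative sketch of the kind of interface described (the names below are hypothetical and not the classes introduced by the patch; a Linux implementation would read /proc/meminfo and /proc/cpuinfo):
{code}
// Illustrative only; the real patch defines its own plugin classes.
public interface NodeResourceProbe {
  /** Total physical memory in MB as reported by the OS. */
  long getPhysicalMemoryMB();

  /** Number of processors as reported by the OS. */
  int getNumProcessors();
}

/** Wraps a probe and subtracts a reserved offset for the OS and other daemons. */
class OffsetNodeResource {
  private final NodeResourceProbe probe;
  private final long memoryOffsetMB;
  private final int cpuOffset;

  OffsetNodeResource(NodeResourceProbe probe, long memoryOffsetMB, int cpuOffset) {
    this.probe = probe;
    this.memoryOffsetMB = memoryOffsetMB;
    this.cpuOffset = cpuOffset;
  }

  long availableMemoryMB() {
    return Math.max(0, probe.getPhysicalMemoryMB() - memoryOffsetMB);
  }

  int availableVcores() {
    return Math.max(0, probe.getNumProcessors() - cpuOffset);
  }
}
{code}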
[jira] [Updated] (YARN-2556) Tool to measure the performance of the timeline server
[ https://issues.apache.org/jira/browse/YARN-2556?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chang Li updated YARN-2556: --- Attachment: YARN-2556.6.patch Tool to measure the performance of the timeline server -- Key: YARN-2556 URL: https://issues.apache.org/jira/browse/YARN-2556 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Jonathan Eagles Assignee: Chang Li Labels: BB2015-05-TBR Attachments: YARN-2556-WIP.patch, YARN-2556-WIP.patch, YARN-2556.1.patch, YARN-2556.2.patch, YARN-2556.3.patch, YARN-2556.4.patch, YARN-2556.5.patch, YARN-2556.6.patch, YARN-2556.patch, yarn2556.patch, yarn2556.patch, yarn2556_wip.patch We need to be able to understand the capacity model for the timeline server to give users the tools they need to deploy a timeline server with the correct capacity. I propose we create a mapreduce job that can measure timeline server write and read performance. Transactions per second, I/O for both read and write would be a good start. This could be done as an example or test job that could be tied into gridmix. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2556) Tool to measure the performance of the timeline server
[ https://issues.apache.org/jira/browse/YARN-2556?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14540581#comment-14540581 ] Hadoop QA commented on YARN-2556: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | pre-patch | 5m 13s | Pre-patch trunk compilation is healthy. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 1 new or modified test files. | | {color:red}-1{color} | javac | 4m 43s | The patch appears to cause the build to fail. | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12732342/YARN-2556.6.patch | | Optional Tests | javac unit findbugs checkstyle | | git revision | trunk / f24452d | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/7897/console | This message was automatically generated. Tool to measure the performance of the timeline server -- Key: YARN-2556 URL: https://issues.apache.org/jira/browse/YARN-2556 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Jonathan Eagles Assignee: Chang Li Labels: BB2015-05-TBR Attachments: YARN-2556-WIP.patch, YARN-2556-WIP.patch, YARN-2556.1.patch, YARN-2556.2.patch, YARN-2556.3.patch, YARN-2556.4.patch, YARN-2556.5.patch, YARN-2556.6.patch, YARN-2556.patch, yarn2556.patch, yarn2556.patch, yarn2556_wip.patch We need to be able to understand the capacity model for the timeline server to give users the tools they need to deploy a timeline server with the correct capacity. I propose we create a mapreduce job that can measure timeline server write and read performance. Transactions per second, I/O for both read and write would be a good start. This could be done as an example or test job that could be tied into gridmix. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3634) TestMRTimelineEventHandling and TestApplication are broken
[ https://issues.apache.org/jira/browse/YARN-3634?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sangjin Lee updated YARN-3634: -- Summary: TestMRTimelineEventHandling and TestApplication are broken (was: TestMRTimelineEventHandling is broken due to timing issues) TestMRTimelineEventHandling and TestApplication are broken -- Key: YARN-3634 URL: https://issues.apache.org/jira/browse/YARN-3634 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Affects Versions: YARN-2928 Reporter: Sangjin Lee Assignee: Sangjin Lee Attachments: YARN-3634-YARN-2928.001.patch TestMRTimelineEventHandling is broken. Relevant error message: {noformat} 2015-05-12 06:28:56,415 INFO [AsyncDispatcher event handler] ipc.Client (Client.java:handleConnectionFailure(882)) - Retrying connect to server: asf904.gq1.ygridcore.net/67.195.81.148:0. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS) 2015-05-12 06:28:57,416 INFO [AsyncDispatcher event handler] ipc.Client (Client.java:handleConnectionFailure(882)) - Retrying connect to server: asf904.gq1.ygridcore.net/67.195.81.148:0. Already tried 1 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS) 2015-05-12 06:28:58,416 INFO [AsyncDispatcher event handler] ipc.Client (Client.java:handleConnectionFailure(882)) - Retrying connect to server: asf904.gq1.ygridcore.net/67.195.81.148:0. Already tried 2 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS) 2015-05-12 06:28:59,417 INFO [AsyncDispatcher event handler] ipc.Client (Client.java:handleConnectionFailure(882)) - Retrying connect to server: asf904.gq1.ygridcore.net/67.195.81.148:0. Already tried 3 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS) 2015-05-12 06:29:00,418 INFO [AsyncDispatcher event handler] ipc.Client (Client.java:handleConnectionFailure(882)) - Retrying connect to server: asf904.gq1.ygridcore.net/67.195.81.148:0. Already tried 4 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS) 2015-05-12 06:29:01,419 INFO [AsyncDispatcher event handler] ipc.Client (Client.java:handleConnectionFailure(882)) - Retrying connect to server: asf904.gq1.ygridcore.net/67.195.81.148:0. Already tried 5 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS) 2015-05-12 06:29:02,420 INFO [AsyncDispatcher event handler] ipc.Client (Client.java:handleConnectionFailure(882)) - Retrying connect to server: asf904.gq1.ygridcore.net/67.195.81.148:0. Already tried 6 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS) 2015-05-12 06:29:03,420 INFO [AsyncDispatcher event handler] ipc.Client (Client.java:handleConnectionFailure(882)) - Retrying connect to server: asf904.gq1.ygridcore.net/67.195.81.148:0. Already tried 7 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS) 2015-05-12 06:29:04,421 INFO [AsyncDispatcher event handler] ipc.Client (Client.java:handleConnectionFailure(882)) - Retrying connect to server: asf904.gq1.ygridcore.net/67.195.81.148:0. 
Already tried 8 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS) 2015-05-12 06:29:05,422 INFO [AsyncDispatcher event handler] ipc.Client (Client.java:handleConnectionFailure(882)) - Retrying connect to server: asf904.gq1.ygridcore.net/67.195.81.148:0. Already tried 9 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS) 2015-05-12 06:29:05,424 ERROR [AsyncDispatcher event handler] collector.NodeTimelineCollectorManager (NodeTimelineCollectorManager.java:postPut(121)) - Failed to communicate with NM Collector Service for application_1431412130291_0001 2015-05-12 06:29:05,425 WARN [AsyncDispatcher event handler] containermanager.AuxServices (AuxServices.java:logWarningWhenAuxServiceThrowExceptions(261)) - The auxService name is timeline_collector and it got an error at event: CONTAINER_INIT org.apache.hadoop.yarn.exceptions.YarnRuntimeException: org.apache.hadoop.yarn.exceptions.YarnRuntimeException: java.net.ConnectException: Call From asf904.gq1.ygridcore.net/67.195.81.148 to asf904.gq1.ygridcore.net:0 failed on connection exception: java.net.ConnectException: Connection refused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused at
[jira] [Commented] (YARN-3634) TestMRTimelineEventHandling is broken due to timing issues
[ https://issues.apache.org/jira/browse/YARN-3634?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14540526#comment-14540526 ] Hadoop QA commented on YARN-3634: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | pre-patch | 15m 0s | Pre-patch YARN-2928 compilation is healthy. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:red}-1{color} | tests included | 0m 0s | The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. | | {color:green}+1{color} | javac | 7m 41s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 9m 46s | There were no new javadoc warning messages. | | {color:green}+1{color} | release audit | 0m 23s | The applied patch does not increase the total number of release audit warnings. | | {color:green}+1{color} | checkstyle | 0m 32s | There were no new checkstyle issues. | | {color:green}+1{color} | whitespace | 0m 0s | The patch has no lines that end in whitespace. | | {color:green}+1{color} | install | 1m 41s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 39s | The patch built with eclipse:eclipse. | | {color:red}-1{color} | findbugs | 1m 39s | The patch appears to introduce 1 new Findbugs (version 2.0.3) warnings. | | {color:red}-1{color} | yarn tests | 5m 53s | Tests failed in hadoop-yarn-server-nodemanager. | | {color:green}+1{color} | yarn tests | 0m 24s | Tests passed in hadoop-yarn-server-timelineservice. | | | | 43m 42s | | \\ \\ || Reason || Tests || | FindBugs | module:hadoop-yarn-server-nodemanager | | | Boxing/unboxing to parse a primitive org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl.startContainerInternal(NMTokenIdentifier, ContainerTokenIdentifier, StartContainerRequest) At ContainerManagerImpl.java:org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl.startContainerInternal(NMTokenIdentifier, ContainerTokenIdentifier, StartContainerRequest) At ContainerManagerImpl.java:[line 881] | | Failed unit tests | hadoop.yarn.server.nodemanager.containermanager.application.TestApplication | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12732309/YARN-3634-YARN-2928.001.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | YARN-2928 / b3b791b | | Findbugs warnings | https://builds.apache.org/job/PreCommit-YARN-Build/7896/artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-nodemanager.html | | hadoop-yarn-server-nodemanager test log | https://builds.apache.org/job/PreCommit-YARN-Build/7896/artifact/patchprocess/testrun_hadoop-yarn-server-nodemanager.txt | | hadoop-yarn-server-timelineservice test log | https://builds.apache.org/job/PreCommit-YARN-Build/7896/artifact/patchprocess/testrun_hadoop-yarn-server-timelineservice.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/7896/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf907.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/7896/console | This message was automatically generated. 
TestMRTimelineEventHandling is broken due to timing issues -- Key: YARN-3634 URL: https://issues.apache.org/jira/browse/YARN-3634 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Affects Versions: YARN-2928 Reporter: Sangjin Lee Assignee: Sangjin Lee Attachments: YARN-3634-YARN-2928.001.patch TestMRTimelineEventHandling is broken. Relevant error message: {noformat} 2015-05-12 06:28:56,415 INFO [AsyncDispatcher event handler] ipc.Client (Client.java:handleConnectionFailure(882)) - Retrying connect to server: asf904.gq1.ygridcore.net/67.195.81.148:0. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS) 2015-05-12 06:28:57,416 INFO [AsyncDispatcher event handler] ipc.Client (Client.java:handleConnectionFailure(882)) - Retrying connect to server: asf904.gq1.ygridcore.net/67.195.81.148:0. Already tried 1 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS) 2015-05-12 06:28:58,416 INFO [AsyncDispatcher event handler] ipc.Client (Client.java:handleConnectionFailure(882)) - Retrying connect to server: asf904.gq1.ygridcore.net/67.195.81.148:0. Already tried
[jira] [Commented] (YARN-3625) RollingLevelDBTimelineStore Incorrectly Forbids Related Entity in Same Put
[ https://issues.apache.org/jira/browse/YARN-3625?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14540730#comment-14540730 ] Zhijie Shen commented on YARN-3625: --- Hi, Jonathan! Would you mind giving me an example to help me understand why the entity exists but the entity's domain is missing? BTW, there's special logic in LeveldbTimelineStore, where domains were implemented after the first version of the store was done. So we need to be compatible with the existing db data, which doesn't have domains. For RollingLevelDBTimelineStore, this shouldn't be a problem, right? We don't need the special treatment, nor the test case for it. RollingLevelDBTimelineStore Incorrectly Forbids Related Entity in Same Put -- Key: YARN-3625 URL: https://issues.apache.org/jira/browse/YARN-3625 Project: Hadoop YARN Issue Type: Bug Reporter: Jonathan Eagles Assignee: Jonathan Eagles Attachments: YARN-3625.1.patch, YARN-3625.2.patch RollingLevelDBTimelineStore batches all entities in the same put to improve performance. This causes an error when relating to an entity in the same put however. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3505) Node's Log Aggregation Report with SUCCEED should not cached in RMApps
[ https://issues.apache.org/jira/browse/YARN-3505?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuan Gong updated YARN-3505: Attachment: YARN-3505.5.patch new patch addressed all the latest comments Node's Log Aggregation Report with SUCCEED should not cached in RMApps -- Key: YARN-3505 URL: https://issues.apache.org/jira/browse/YARN-3505 Project: Hadoop YARN Issue Type: Sub-task Components: log-aggregation Affects Versions: 2.8.0 Reporter: Junping Du Assignee: Xuan Gong Priority: Critical Attachments: YARN-3505.1.patch, YARN-3505.2.patch, YARN-3505.2.rebase.patch, YARN-3505.3.patch, YARN-3505.4.patch, YARN-3505.5.patch Per discussions in YARN-1402, we shouldn't cache all node's log aggregation reports in RMApps for always, especially for those finished with SUCCEED. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3362) Add node label usage in RM CapacityScheduler web UI
[ https://issues.apache.org/jira/browse/YARN-3362?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14540756#comment-14540756 ] Wangda Tan commented on YARN-3362: -- [~Naganarasimha], thanks for updating; the latest result looks great! About the queue-hierarchy discussion, I think one alternative could be to keep the hierarchical queue names but align the usage bars, like the following:
{code}
root        [-- 100% used]
  - a       [- 60% used]
    - a1    [- 40% used]
    - a2    [---]
  - b       [-]
    - b1    [-]
      - b11 [--]
{code}
This could also help with comparing queues' resources without needing an extra button to hide/show the queue hierarchy. Add node label usage in RM CapacityScheduler web UI --- Key: YARN-3362 URL: https://issues.apache.org/jira/browse/YARN-3362 Project: Hadoop YARN Issue Type: Sub-task Components: capacityscheduler, resourcemanager, webapp Reporter: Wangda Tan Assignee: Naganarasimha G R Attachments: 2015.05.06 Folded Queues.png, 2015.05.06 Queue Expanded.png, 2015.05.07_3362_Queue_Hierarchy.png, 2015.05.10_3362_Queue_Hierarchy.png, 2015.05.12_3362_Queue_Hierarchy.png, CSWithLabelsView.png, No-space-between-Active_user_info-and-next-queues.png, Screen Shot 2015-04-29 at 11.42.17 AM.png, YARN-3362.20150428-3-modified.patch, YARN-3362.20150428-3.patch, YARN-3362.20150506-1.patch, YARN-3362.20150507-1.patch, YARN-3362.20150510-1.patch, YARN-3362.20150511-1.patch, YARN-3362.20150512-1.patch, capacity-scheduler.xml We don't have node label usage in the RM CapacityScheduler web UI now; without this, it is hard for users to understand what happened to nodes that have labels assigned to them. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-1197) Support changing resources of an allocated container
[ https://issues.apache.org/jira/browse/YARN-1197?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14540794#comment-14540794 ] Wangda Tan commented on YARN-1197: -- [~mding], thanks for your interest in this ticket; some comments: bq. For JVM based containers (e.g., container running HBase), it is not possible right now to change the heap size of JVM without restarting the Java process. Even if we can implement a wrapper in the container to relaunch a Java process when resource is changed for a container, we still need to implement an interface between node manager and container to trigger the relaunch action. Good point; this is one thing we noted as well. I don't think there's any easy solution for shrinking a JVM. Relaunching the container could be one method, but it would be hard to make a generic container wrapper since killing and relaunching loses in-memory data. But since shrinking memory is a proactive action, when a process wants to shrink its resources it can use its own container wrapper to relaunch the process if it has some data-recovery mechanism. Support changing resources of an allocated container Key: YARN-1197 URL: https://issues.apache.org/jira/browse/YARN-1197 Project: Hadoop YARN Issue Type: Task Components: api, nodemanager, resourcemanager Affects Versions: 2.1.0-beta Reporter: Wangda Tan Attachments: mapreduce-project.patch.ver.1, tools-project.patch.ver.1, yarn-1197-scheduler-v1.pdf, yarn-1197-v2.pdf, yarn-1197-v3.pdf, yarn-1197-v4.pdf, yarn-1197-v5.pdf, yarn-1197.pdf, yarn-api-protocol.patch.ver.1, yarn-pb-impl.patch.ver.1, yarn-server-common.patch.ver.1, yarn-server-nodemanager.patch.ver.1, yarn-server-resourcemanager.patch.ver.1 The current YARN resource management logic assumes the resource allocated to a container is fixed during its lifetime. When users want to change the resource of an allocated container, the only way is to release it and allocate a new container with the expected size. Allowing run-time changes to the resources of an allocated container will give us better control of resource usage on the application side. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3625) RollingLevelDBTimelineStore Incorrectly Forbids Related Entity in Same Put
[ https://issues.apache.org/jira/browse/YARN-3625?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14540808#comment-14540808 ] Jonathan Eagles commented on YARN-3625: --- [~zjshen], one difference between Leveldb and RollingLevelDB is in the way that batch writes are done. In LeveldbTimelineStore, each entity is processed and written to the db before the next. In RollingLevelDBTimelineStore, all entities in the put are processed one after the other, but they are all written together in one batch. This creates a temporary inconsistency for RollingLevelDB where related entities in the same put have their start time in the db, but nothing else, until the last entity in the put is processed. To handle this scenario, I relax the domain check: if a domain is non-existent, we treat it as the temporary state in which the domain for the related entity has been staged to be written but has not yet been written to the database. Jon RollingLevelDBTimelineStore Incorrectly Forbids Related Entity in Same Put -- Key: YARN-3625 URL: https://issues.apache.org/jira/browse/YARN-3625 Project: Hadoop YARN Issue Type: Bug Reporter: Jonathan Eagles Assignee: Jonathan Eagles Attachments: YARN-3625.1.patch, YARN-3625.2.patch RollingLevelDBTimelineStore batches all entities in the same put to improve performance. This causes an error when relating to an entity in the same put however. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
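A rough sketch of the relaxed check being described, with hypothetical method names (the real store code is considerably more involved):
{code}
public class DomainCheckSketch {
  /**
   * Illustrative version of the relaxed check: if the related entity's domain
   * has not been written yet (it may be staged in the same batched put), do
   * not reject the relation; otherwise the domains must match.
   */
  static boolean domainsCompatible(String relatedEntityDomain, String newEntityDomain) {
    if (relatedEntityDomain == null) {
      return true; // pending in the same batch, nothing to compare against yet
    }
    return relatedEntityDomain.equals(newEntityDomain);
  }

  public static void main(String[] args) {
    System.out.println(domainsCompatible(null, "DEFAULT"));      // true: staged, not yet written
    System.out.println(domainsCompatible("DEFAULT", "DEFAULT")); // true: same domain
    System.out.println(domainsCompatible("A", "B"));             // false: conflicting domains
  }
}
{code}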
[jira] [Commented] (YARN-2369) Environment variable handling assumes values should be appended
[ https://issues.apache.org/jira/browse/YARN-2369?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14540781#comment-14540781 ] Hadoop QA commented on YARN-2369: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | pre-patch | 17m 3s | Pre-patch trunk compilation is healthy. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 1 new or modified test files. | | {color:green}+1{color} | javac | 9m 0s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 12m 17s | There were no new javadoc warning messages. | | {color:green}+1{color} | release audit | 0m 27s | The applied patch does not increase the total number of release audit warnings. | | {color:red}-1{color} | checkstyle | 3m 18s | The applied patch generated 7 new checkstyle issues (total was 176, now 182). | | {color:red}-1{color} | checkstyle | 3m 56s | The applied patch generated 15 new checkstyle issues (total was 509, now 524). | | {color:red}-1{color} | checkstyle | 4m 32s | The applied patch generated 2 new checkstyle issues (total was 211, now 213). | | {color:red}-1{color} | whitespace | 0m 1s | The patch has 14 line(s) that end in whitespace. Use git apply --whitespace=fix. | | {color:green}+1{color} | install | 1m 58s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 36s | The patch built with eclipse:eclipse. | | {color:green}+1{color} | findbugs | 8m 2s | The patch does not introduce any new Findbugs (version 2.0.3) warnings. | | {color:green}+1{color} | common tests | 24m 45s | Tests passed in hadoop-common. | | {color:green}+1{color} | mapreduce tests | 0m 46s | Tests passed in hadoop-mapreduce-client-common. | | {color:green}+1{color} | mapreduce tests | 1m 36s | Tests passed in hadoop-mapreduce-client-core. | | {color:green}+1{color} | yarn tests | 0m 23s | Tests passed in hadoop-yarn-api. | | {color:green}+1{color} | yarn tests | 1m 57s | Tests passed in hadoop-yarn-common. 
| | | | 83m 46s | | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12732347/YARN-2369-3.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / f24452d | | checkstyle | https://builds.apache.org/job/PreCommit-YARN-Build/7898/artifact/patchprocess/diffcheckstylehadoop-common.txt https://builds.apache.org/job/PreCommit-YARN-Build/7898/artifact/patchprocess/diffcheckstylehadoop-mapreduce-client-core.txt https://builds.apache.org/job/PreCommit-YARN-Build/7898/artifact/patchprocess/diffcheckstylehadoop-yarn-api.txt | | whitespace | https://builds.apache.org/job/PreCommit-YARN-Build/7898/artifact/patchprocess/whitespace.txt | | hadoop-common test log | https://builds.apache.org/job/PreCommit-YARN-Build/7898/artifact/patchprocess/testrun_hadoop-common.txt | | hadoop-mapreduce-client-common test log | https://builds.apache.org/job/PreCommit-YARN-Build/7898/artifact/patchprocess/testrun_hadoop-mapreduce-client-common.txt | | hadoop-mapreduce-client-core test log | https://builds.apache.org/job/PreCommit-YARN-Build/7898/artifact/patchprocess/testrun_hadoop-mapreduce-client-core.txt | | hadoop-yarn-api test log | https://builds.apache.org/job/PreCommit-YARN-Build/7898/artifact/patchprocess/testrun_hadoop-yarn-api.txt | | hadoop-yarn-common test log | https://builds.apache.org/job/PreCommit-YARN-Build/7898/artifact/patchprocess/testrun_hadoop-yarn-common.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/7898/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf904.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/7898/console | This message was automatically generated. Environment variable handling assumes values should be appended --- Key: YARN-2369 URL: https://issues.apache.org/jira/browse/YARN-2369 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.2.0 Reporter: Jason Lowe Assignee: Dustin Cote Attachments: YARN-2369-1.patch, YARN-2369-2.patch, YARN-2369-3.patch When processing environment variables for a container context the code assumes that the value should be appended to any pre-existing value in the environment. This may be desired behavior for handling path-like environment variables such as PATH, LD_LIBRARY_PATH, CLASSPATH, etc. but it is a non-intuitive and harmful way to handle any variable that does not have path-like semantics.
[jira] [Commented] (YARN-3044) [Event producers] Implement RM writing app lifecycle events to ATS
[ https://issues.apache.org/jira/browse/YARN-3044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14540783#comment-14540783 ] Zhijie Shen commented on YARN-3044: --- bq. Well i had raised config related queries earlier as it dint get concluded was planning to get it done as part of a new jira, AFAIK intention in ATS is to collect the App side configs than server side ones. And RM will not be aware of App configs, so my initial idea was to support additional interface in the client to publish Application specific configs. Correct me if i am wrong and also inform whether its ok to handle configs in another jira. So can we undo the code change related to config for this jira? bq. May be i did not get this clearly, but AFAIK the packages and classes for the Timeline events entities are different and the way we publish entities is also different, so though the code looks duplicated i think nutting further to be reduced/cleaned up here. In general, I'm suggesting the code style in MAPREDUCE-6335, which seems clearer. However, I'm okay with keeping the current code if it's complex to refactor. bq. but you had commented to have single config yarn.system-metrics-publisher.enabled and hence i had remodified. Right, but it turns out that the RM and NM don't seem to be able to unify the configs (such as the new config added here) between them, and more than one feature requires some kind of version flag to differentiate the behavior. Never mind, I'll take care of it separately. [Event producers] Implement RM writing app lifecycle events to ATS -- Key: YARN-3044 URL: https://issues.apache.org/jira/browse/YARN-3044 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Sangjin Lee Assignee: Naganarasimha G R Labels: BB2015-05-TBR Attachments: YARN-3044-YARN-2928.004.patch, YARN-3044-YARN-2928.005.patch, YARN-3044-YARN-2928.006.patch, YARN-3044-YARN-2928.007.patch, YARN-3044.20150325-1.patch, YARN-3044.20150406-1.patch, YARN-3044.20150416-1.patch Per design in YARN-2928, implement RM writing app lifecycle events to ATS. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3557) Support Intel Trusted Execution Technology(TXT) in YARN scheduler
[ https://issues.apache.org/jira/browse/YARN-3557?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14540800#comment-14540800 ] Wangda Tan commented on YARN-3557: -- [~dian.fu], thanks for sharing your idea about this. It is definitely an interesting idea, but I don't have an immediate opinion on whether we should do it or not. We can continue discussing it along with the design of YARN-3409. Support Intel Trusted Execution Technology(TXT) in YARN scheduler - Key: YARN-3557 URL: https://issues.apache.org/jira/browse/YARN-3557 Project: Hadoop YARN Issue Type: New Feature Reporter: Dian Fu Attachments: Support TXT in YARN high level design doc.pdf Intel TXT defines platform-level enhancements that provide the building blocks for creating trusted platforms. A TXT-aware YARN scheduler can schedule security-sensitive jobs on TXT-enabled nodes only. YARN-2492 provides the capability to restrict YARN applications to run only on cluster nodes that have a specified node label. This is a good mechanism that can be utilized for a TXT-aware YARN scheduler. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3634) TestMRTimelineEventHandling and TestApplication are broken
[ https://issues.apache.org/jira/browse/YARN-3634?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14540693#comment-14540693 ] Hadoop QA commented on YARN-3634: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | pre-patch | 15m 27s | Pre-patch YARN-2928 compilation is healthy. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 1 new or modified test files. | | {color:green}+1{color} | javac | 7m 48s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 9m 54s | There were no new javadoc warning messages. | | {color:green}+1{color} | release audit | 0m 22s | The applied patch does not increase the total number of release audit warnings. | | {color:green}+1{color} | checkstyle | 0m 34s | There were no new checkstyle issues. | | {color:red}-1{color} | whitespace | 0m 0s | The patch has 1 line(s) that end in whitespace. Use git apply --whitespace=fix. | | {color:green}+1{color} | install | 1m 42s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 39s | The patch built with eclipse:eclipse. | | {color:green}+1{color} | findbugs | 1m 37s | The patch does not introduce any new Findbugs (version 2.0.3) warnings. | | {color:green}+1{color} | yarn tests | 6m 2s | Tests passed in hadoop-yarn-server-nodemanager. | | {color:green}+1{color} | yarn tests | 0m 25s | Tests passed in hadoop-yarn-server-timelineservice. | | | | 44m 33s | | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12732348/YARN-3634-YARN-2928.002.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | YARN-2928 / b3b791b | | whitespace | https://builds.apache.org/job/PreCommit-YARN-Build/7899/artifact/patchprocess/whitespace.txt | | hadoop-yarn-server-nodemanager test log | https://builds.apache.org/job/PreCommit-YARN-Build/7899/artifact/patchprocess/testrun_hadoop-yarn-server-nodemanager.txt | | hadoop-yarn-server-timelineservice test log | https://builds.apache.org/job/PreCommit-YARN-Build/7899/artifact/patchprocess/testrun_hadoop-yarn-server-timelineservice.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/7899/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf903.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/7899/console | This message was automatically generated. TestMRTimelineEventHandling and TestApplication are broken -- Key: YARN-3634 URL: https://issues.apache.org/jira/browse/YARN-3634 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Affects Versions: YARN-2928 Reporter: Sangjin Lee Assignee: Sangjin Lee Attachments: YARN-3634-YARN-2928.001.patch, YARN-3634-YARN-2928.002.patch TestMRTimelineEventHandling is broken. Relevant error message: {noformat} 2015-05-12 06:28:56,415 INFO [AsyncDispatcher event handler] ipc.Client (Client.java:handleConnectionFailure(882)) - Retrying connect to server: asf904.gq1.ygridcore.net/67.195.81.148:0. 
Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS) 2015-05-12 06:28:57,416 INFO [AsyncDispatcher event handler] ipc.Client (Client.java:handleConnectionFailure(882)) - Retrying connect to server: asf904.gq1.ygridcore.net/67.195.81.148:0. Already tried 1 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS) 2015-05-12 06:28:58,416 INFO [AsyncDispatcher event handler] ipc.Client (Client.java:handleConnectionFailure(882)) - Retrying connect to server: asf904.gq1.ygridcore.net/67.195.81.148:0. Already tried 2 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS) 2015-05-12 06:28:59,417 INFO [AsyncDispatcher event handler] ipc.Client (Client.java:handleConnectionFailure(882)) - Retrying connect to server: asf904.gq1.ygridcore.net/67.195.81.148:0. Already tried 3 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS) 2015-05-12 06:29:00,418 INFO [AsyncDispatcher event handler] ipc.Client (Client.java:handleConnectionFailure(882)) - Retrying connect to server: asf904.gq1.ygridcore.net/67.195.81.148:0. Already tried 4 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS) 2015-05-12
[jira] [Commented] (YARN-3591) Resource Localisation on a bad disk causes subsequent containers failure
[ https://issues.apache.org/jira/browse/YARN-3591?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14540715#comment-14540715 ] zhihai xu commented on YARN-3591: - Hi [~lavkesh], thanks for working on this issue. It looks like a good catch. The parent directory is generated by {{uniqueNumberGenerator}} for each LocalizedResource, so most likely fileList.length will be one. Some comments about your patch: {{getParentFile}} may return null; should we check whether it is null to avoid an NPE? Can we add comments in the code about the change? Resource Localisation on a bad disk causes subsequent containers failure - Key: YARN-3591 URL: https://issues.apache.org/jira/browse/YARN-3591 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.6.0 Reporter: Lavkesh Lahngir Attachments: 0001-YARN-3591.1.patch, 0001-YARN-3591.patch It happens when a resource is localised on a disk and, after localising, that disk goes bad. The NM keeps paths for localised resources in memory. At the time of a resource request, isResourcePresent(rsrc) is called, which calls file.exists() on the localised path. In some cases when the disk has gone bad, inodes are still cached and file.exists() returns true, but at the time of reading, the file will not open. Note: file.exists() actually calls stat64 natively, which returns true because it was able to find inode information from the OS. The proposal is to call file.list() on the parent path of the resource, which calls open() natively. If the disk is good it should return an array of paths with length at least 1. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
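For reference, a minimal sketch of the proposed check, assuming a hypothetical helper in the NM localization code (field and method names below are illustrative, not the actual patch):
{code}
// Illustrative sketch only: validate a localized resource by listing its
// parent directory. list() forces a native open() of the directory, so it
// fails on a bad disk, unlike File.exists(), which can succeed from cached
// inode information.
private boolean isResourcePresent(LocalizedResource rsrc) {
  File file = new File(rsrc.getLocalPath().toUri().getRawPath());
  File parent = file.getParentFile();
  if (parent == null) {
    // Guard against a null parent to avoid the NPE mentioned in the review.
    return false;
  }
  String[] children = parent.list();
  // The parent directory is generated per resource by uniqueNumberGenerator,
  // so a healthy disk should report at least one entry (the resource itself).
  return children != null && children.length >= 1;
}
{code}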
[jira] [Updated] (YARN-3634) TestMRTimelineEventHandling and TestApplication are broken
[ https://issues.apache.org/jira/browse/YARN-3634?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sangjin Lee updated YARN-3634: -- Attachment: YARN-3634-YARN-2928.003.patch Patch v3. posted - fixed whitespace TestMRTimelineEventHandling and TestApplication are broken -- Key: YARN-3634 URL: https://issues.apache.org/jira/browse/YARN-3634 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Affects Versions: YARN-2928 Reporter: Sangjin Lee Assignee: Sangjin Lee Attachments: YARN-3634-YARN-2928.001.patch, YARN-3634-YARN-2928.002.patch, YARN-3634-YARN-2928.003.patch TestMRTimelineEventHandling is broken. Relevant error message: {noformat} 2015-05-12 06:28:56,415 INFO [AsyncDispatcher event handler] ipc.Client (Client.java:handleConnectionFailure(882)) - Retrying connect to server: asf904.gq1.ygridcore.net/67.195.81.148:0. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS) 2015-05-12 06:28:57,416 INFO [AsyncDispatcher event handler] ipc.Client (Client.java:handleConnectionFailure(882)) - Retrying connect to server: asf904.gq1.ygridcore.net/67.195.81.148:0. Already tried 1 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS) 2015-05-12 06:28:58,416 INFO [AsyncDispatcher event handler] ipc.Client (Client.java:handleConnectionFailure(882)) - Retrying connect to server: asf904.gq1.ygridcore.net/67.195.81.148:0. Already tried 2 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS) 2015-05-12 06:28:59,417 INFO [AsyncDispatcher event handler] ipc.Client (Client.java:handleConnectionFailure(882)) - Retrying connect to server: asf904.gq1.ygridcore.net/67.195.81.148:0. Already tried 3 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS) 2015-05-12 06:29:00,418 INFO [AsyncDispatcher event handler] ipc.Client (Client.java:handleConnectionFailure(882)) - Retrying connect to server: asf904.gq1.ygridcore.net/67.195.81.148:0. Already tried 4 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS) 2015-05-12 06:29:01,419 INFO [AsyncDispatcher event handler] ipc.Client (Client.java:handleConnectionFailure(882)) - Retrying connect to server: asf904.gq1.ygridcore.net/67.195.81.148:0. Already tried 5 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS) 2015-05-12 06:29:02,420 INFO [AsyncDispatcher event handler] ipc.Client (Client.java:handleConnectionFailure(882)) - Retrying connect to server: asf904.gq1.ygridcore.net/67.195.81.148:0. Already tried 6 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS) 2015-05-12 06:29:03,420 INFO [AsyncDispatcher event handler] ipc.Client (Client.java:handleConnectionFailure(882)) - Retrying connect to server: asf904.gq1.ygridcore.net/67.195.81.148:0. Already tried 7 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS) 2015-05-12 06:29:04,421 INFO [AsyncDispatcher event handler] ipc.Client (Client.java:handleConnectionFailure(882)) - Retrying connect to server: asf904.gq1.ygridcore.net/67.195.81.148:0. 
Already tried 8 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS) 2015-05-12 06:29:05,422 INFO [AsyncDispatcher event handler] ipc.Client (Client.java:handleConnectionFailure(882)) - Retrying connect to server: asf904.gq1.ygridcore.net/67.195.81.148:0. Already tried 9 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS) 2015-05-12 06:29:05,424 ERROR [AsyncDispatcher event handler] collector.NodeTimelineCollectorManager (NodeTimelineCollectorManager.java:postPut(121)) - Failed to communicate with NM Collector Service for application_1431412130291_0001 2015-05-12 06:29:05,425 WARN [AsyncDispatcher event handler] containermanager.AuxServices (AuxServices.java:logWarningWhenAuxServiceThrowExceptions(261)) - The auxService name is timeline_collector and it got an error at event: CONTAINER_INIT org.apache.hadoop.yarn.exceptions.YarnRuntimeException: org.apache.hadoop.yarn.exceptions.YarnRuntimeException: java.net.ConnectException: Call From asf904.gq1.ygridcore.net/67.195.81.148 to asf904.gq1.ygridcore.net:0 failed on connection exception: java.net.ConnectException: Connection refused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused at
[jira] [Commented] (YARN-3529) Add miniHBase cluster and Phoenix support to ATS v2 unit tests
[ https://issues.apache.org/jira/browse/YARN-3529?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14540633#comment-14540633 ] Zhijie Shen commented on YARN-3529: --- +1 for the last patch. Will commit it. Noticed there are performance-related properties used for the test case. We should evaluate whether they could help POC performance too. We can deal with that later.
{code}
props.put(QueryServices.QUEUE_SIZE_ATTRIB, Integer.toString(5000));
props.put(IndexWriterUtils.HTABLE_THREAD_KEY, Integer.toString(100));
// Make a small batch size to test multiple calls to reserve sequences
props.put(QueryServices.SEQUENCE_CACHE_SIZE_ATTRIB, Long.toString(BATCH_SIZE));
{code}
Add miniHBase cluster and Phoenix support to ATS v2 unit tests -- Key: YARN-3529 URL: https://issues.apache.org/jira/browse/YARN-3529 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Li Lu Assignee: Li Lu Attachments: AbstractMiniHBaseClusterTest.java, YARN-3529-YARN-2928.000.patch, YARN-3529-YARN-2928.001.patch, YARN-3529-YARN-2928.002.patch, YARN-3529-YARN-2928.003.patch, output_minicluster2.txt After we have our HBase and Phoenix writer implementations, we may want to find a way to set up HBase and Phoenix in our unit tests. We need to do this integration before the branch gets merged back to trunk. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Moved] (YARN-3636) Abstraction for LocalDirAllocator
[ https://issues.apache.org/jira/browse/YARN-3636?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Colin Patrick McCabe moved HADOOP-11905 to YARN-3636: - Component/s: (was: fs) Fix Version/s: (was: 2.7.1) Assignee: (was: Kannan Rajah) Affects Version/s: (was: 2.5.2) 2.5.2 Issue Type: New Feature (was: Bug) Key: YARN-3636 (was: HADOOP-11905) Project: Hadoop YARN (was: Hadoop Common) Abstraction for LocalDirAllocator - Key: YARN-3636 URL: https://issues.apache.org/jira/browse/YARN-3636 Project: Hadoop YARN Issue Type: New Feature Affects Versions: 2.5.2 Reporter: Kannan Rajah Labels: BB2015-05-TBR Attachments: 0001-Abstraction-for-local-disk-path-allocation.patch There are 2 abstractions used to write data to local disk. LocalDirAllocator: Allocate paths from a set of configured local directories. LocalFileSystem/RawLocalFileSystem: Read/write using java.io.* and java.nio.*. In the current implementation, local disk is managed by the guest OS and not HDFS. The proposal is to provide a new abstraction that encapsulates the above 2 abstractions and hides who manages the local disks. This enables us to provide an alternate implementation where a DFS can manage the local disks and they can be accessed using HDFS APIs. This means the DFS maintains a namespace for node-local directories and can create paths that are guaranteed to be present on a specific node. Here is an example use case for Shuffle: When a mapper writes intermediate data using this new implementation, it will continue to write to local disk. When a reducer needs to access data from a remote node, it can use HDFS APIs with a path that points to that node’s local namespace instead of having to use an HTTP server to transfer the data across nodes. New Abstractions 1. LocalDiskPathAllocator Interface to get file/directory paths from the local disk namespace. This contains all the APIs that are currently supported by LocalDirAllocator. So we just need to change LocalDirAllocator to implement this new interface. 2. LocalDiskUtil Helper class to get a handle to LocalDiskPathAllocator and the FileSystem that is used to manage those paths. By default, it will return LocalDirAllocator and LocalFileSystem. A supporting DFS can return DFSLocalDirAllocator and an instance of DFS. 3. DFSLocalDirAllocator This is a generic implementation. An allocator is created for a specific node. It uses the Configuration object to get the user-configured base directory and appends the node hostname to it. Hence the returned paths are within the node-local namespace. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
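To make the proposal concrete, here is a hedged sketch of how the new abstractions could fit together. Only LocalDirAllocator's existing methods are real; the interface and helper follow the names in the description above, and their bodies are illustrative assumptions:
{code}
// Hypothetical interface carrying the path-allocation APIs that
// LocalDirAllocator already exposes today.
public interface LocalDiskPathAllocator {
  Path getLocalPathForWrite(String pathStr, long size, Configuration conf)
      throws IOException;
  Path getLocalPathToRead(String pathStr, Configuration conf)
      throws IOException;
}

// Hypothetical helper that hides who manages the local disks.
public final class LocalDiskUtil {
  public static LocalDiskPathAllocator getPathAllocator(
      String contextCfgItemName) {
    // Default: disks managed by the guest OS. Assumes LocalDirAllocator is
    // changed to implement LocalDiskPathAllocator, as the description proposes.
    // A supporting DFS could instead return a DFSLocalDirAllocator bound to
    // this node's hostname, so returned paths live in the node-local namespace.
    return new LocalDirAllocator(contextCfgItemName);
  }
}
{code}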
[jira] [Commented] (YARN-2421) CapacityScheduler still allocates containers to an app in the FINISHING state
[ https://issues.apache.org/jira/browse/YARN-2421?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14540829#comment-14540829 ] Hadoop QA commented on YARN-2421: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | pre-patch | 14m 38s | Pre-patch trunk compilation is healthy. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 1 new or modified test files. | | {color:green}+1{color} | javac | 7m 35s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 9m 41s | There were no new javadoc warning messages. | | {color:green}+1{color} | release audit | 0m 22s | The applied patch does not increase the total number of release audit warnings. | | {color:red}-1{color} | checkstyle | 0m 46s | The applied patch generated 2 new checkstyle issues (total was 30, now 31). | | {color:green}+1{color} | whitespace | 0m 0s | The patch has no lines that end in whitespace. | | {color:green}+1{color} | install | 1m 33s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 33s | The patch built with eclipse:eclipse. | | {color:green}+1{color} | findbugs | 1m 14s | The patch does not introduce any new Findbugs (version 2.0.3) warnings. | | {color:green}+1{color} | yarn tests | 52m 11s | Tests passed in hadoop-yarn-server-resourcemanager. | | | | 88m 37s | | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12732346/YARN-2421.6.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / f24452d | | checkstyle | https://builds.apache.org/job/PreCommit-YARN-Build/7900/artifact/patchprocess/diffcheckstylehadoop-yarn-server-resourcemanager.txt | | hadoop-yarn-server-resourcemanager test log | https://builds.apache.org/job/PreCommit-YARN-Build/7900/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/7900/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf909.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/7900/console | This message was automatically generated. CapacityScheduler still allocates containers to an app in the FINISHING state - Key: YARN-2421 URL: https://issues.apache.org/jira/browse/YARN-2421 Project: Hadoop YARN Issue Type: Bug Components: scheduler Reporter: Thomas Graves Assignee: Chang Li Attachments: YARN-2421.4.patch, YARN-2421.5.patch, YARN-2421.6.patch, yarn2421.patch, yarn2421.patch, yarn2421.patch I saw an instance of a bad application master where it unregistered with the RM but then continued to call into allocate. The RMAppAttempt went to the FINISHING state, but the capacity scheduler kept allocating it containers. We should probably have the capacity scheduler check that the application isn't in one of the terminal states before giving it containers. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-3637) Handle localization sym-linking correctly at the YARN level
Chris Trezzo created YARN-3637: -- Summary: Handle localization sym-linking correctly at the YARN level Key: YARN-3637 URL: https://issues.apache.org/jira/browse/YARN-3637 Project: Hadoop YARN Issue Type: Sub-task Reporter: Chris Trezzo Assignee: Chris Trezzo The shared cache needs to handle resource sym-linking at the YARN layer. Currently, we let the application layer (i.e. mapreduce) handle this, but it is probably better for all applications if it is handled transparently. Here is the scenario: Imagine two separate jars (with unique checksums) that have the same name job.jar. They are stored in the shared cache as two separate resources: checksum1/job.jar checksum2/job.jar A new application tries to use both of these resources, but internally refers to them by different names: foo.jar maps to checksum1 bar.jar maps to checksum2 When the shared cache returns the paths to the resources, both resources have the same name (i.e. job.jar). Because of this, when the resources are localized, one of them clobbers the other. This is because both symlinks in the container_id directory have the same name (i.e. job.jar) even though they point to two separate resource directories. Originally we tackled this in the MapReduce client by using the fragment portion of the resource URL. This, however, seems like something that should be solved at the YARN layer. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
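For illustration, a hedged sketch of the workaround available to applications today: pick distinct symlink names for the two cached copies when building the container's local resources (the paths, sizes and timestamps below are made up; the proposal is for YARN to handle this transparently instead):
{code}
// Illustrative only: the map key is the symlink name created in the container
// directory, so giving each cached job.jar a distinct key avoids the clobber.
// The MapReduce client currently derives these names from the fragment portion
// of the resource URL.
long size1 = 0, ts1 = 0, size2 = 0, ts2 = 0;  // placeholder sizes/timestamps
Map<String, LocalResource> localResources = new HashMap<>();
localResources.put("foo.jar", LocalResource.newInstance(
    ConverterUtils.getYarnUrlFromURI(
        new URI("hdfs:///sharedcache/checksum1/job.jar")),
    LocalResourceType.FILE, LocalResourceVisibility.PUBLIC, size1, ts1));
localResources.put("bar.jar", LocalResource.newInstance(
    ConverterUtils.getYarnUrlFromURI(
        new URI("hdfs:///sharedcache/checksum2/job.jar")),
    LocalResourceType.FILE, LocalResourceVisibility.PUBLIC, size2, ts2));
{code}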
[jira] [Commented] (YARN-3529) Add miniHBase cluster and Phoenix support to ATS v2 unit tests
[ https://issues.apache.org/jira/browse/YARN-3529?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14540844#comment-14540844 ] Li Lu commented on YARN-3529: - Thanks [~zjshen] for the review and commit! [~vrushalic] if you observe any problems with the new pom settings, please feel free to reopen it. Thanks! Add miniHBase cluster and Phoenix support to ATS v2 unit tests -- Key: YARN-3529 URL: https://issues.apache.org/jira/browse/YARN-3529 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Li Lu Assignee: Li Lu Fix For: YARN-2928 Attachments: AbstractMiniHBaseClusterTest.java, YARN-3529-YARN-2928.000.patch, YARN-3529-YARN-2928.001.patch, YARN-3529-YARN-2928.002.patch, YARN-3529-YARN-2928.003.patch, output_minicluster2.txt After we have our HBase and Phoenix writer implementations, we may want to find a way to set up HBase and Phoenix in our unit tests. We need to do this integration before the branch got merged back to trunk. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2421) CapacityScheduler still allocates containers to an app in the FINISHING state
[ https://issues.apache.org/jira/browse/YARN-2421?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chang Li updated YARN-2421: --- Attachment: YARN-2421.7.patch CapacityScheduler still allocates containers to an app in the FINISHING state - Key: YARN-2421 URL: https://issues.apache.org/jira/browse/YARN-2421 Project: Hadoop YARN Issue Type: Bug Components: scheduler Reporter: Thomas Graves Assignee: Chang Li Attachments: YARN-2421.4.patch, YARN-2421.5.patch, YARN-2421.6.patch, YARN-2421.7.patch, yarn2421.patch, yarn2421.patch, yarn2421.patch I saw an instance of a bad application master where it unregistered with the RM but then continued to call into allocate. The RMAppAttempt went to the FINISHING state, but the capacity scheduler kept allocating it containers. We should probably have the capacity scheduler check that the application isn't in one of the terminal states before giving it containers. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3634) TestMRTimelineEventHandling is broken due to timing issues
[ https://issues.apache.org/jira/browse/YARN-3634?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14540328#comment-14540328 ] Sangjin Lee commented on YARN-3634: --- It is not so much caused by YARN-3562 as uncovered by it. First, the NMCollectorService does not update the config upon binding to the port. Second, the NodeTimelineCollectorManager reads the NMCollectorService too early (serviceInit) so that it does not get the updated address. The fix is to update the config when NMCollectorService binds to a port and initialize the address as late as possible in the NodeTimelineCollectorManager. TestMRTimelineEventHandling is broken due to timing issues -- Key: YARN-3634 URL: https://issues.apache.org/jira/browse/YARN-3634 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Affects Versions: YARN-2928 Reporter: Sangjin Lee Assignee: Sangjin Lee TestMRTimelineEventHandling is broken. Relevant error message: {noformat} 2015-05-12 06:28:56,415 INFO [AsyncDispatcher event handler] ipc.Client (Client.java:handleConnectionFailure(882)) - Retrying connect to server: asf904.gq1.ygridcore.net/67.195.81.148:0. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS) 2015-05-12 06:28:57,416 INFO [AsyncDispatcher event handler] ipc.Client (Client.java:handleConnectionFailure(882)) - Retrying connect to server: asf904.gq1.ygridcore.net/67.195.81.148:0. Already tried 1 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS) 2015-05-12 06:28:58,416 INFO [AsyncDispatcher event handler] ipc.Client (Client.java:handleConnectionFailure(882)) - Retrying connect to server: asf904.gq1.ygridcore.net/67.195.81.148:0. Already tried 2 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS) 2015-05-12 06:28:59,417 INFO [AsyncDispatcher event handler] ipc.Client (Client.java:handleConnectionFailure(882)) - Retrying connect to server: asf904.gq1.ygridcore.net/67.195.81.148:0. Already tried 3 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS) 2015-05-12 06:29:00,418 INFO [AsyncDispatcher event handler] ipc.Client (Client.java:handleConnectionFailure(882)) - Retrying connect to server: asf904.gq1.ygridcore.net/67.195.81.148:0. Already tried 4 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS) 2015-05-12 06:29:01,419 INFO [AsyncDispatcher event handler] ipc.Client (Client.java:handleConnectionFailure(882)) - Retrying connect to server: asf904.gq1.ygridcore.net/67.195.81.148:0. Already tried 5 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS) 2015-05-12 06:29:02,420 INFO [AsyncDispatcher event handler] ipc.Client (Client.java:handleConnectionFailure(882)) - Retrying connect to server: asf904.gq1.ygridcore.net/67.195.81.148:0. Already tried 6 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS) 2015-05-12 06:29:03,420 INFO [AsyncDispatcher event handler] ipc.Client (Client.java:handleConnectionFailure(882)) - Retrying connect to server: asf904.gq1.ygridcore.net/67.195.81.148:0. 
Already tried 7 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS) 2015-05-12 06:29:04,421 INFO [AsyncDispatcher event handler] ipc.Client (Client.java:handleConnectionFailure(882)) - Retrying connect to server: asf904.gq1.ygridcore.net/67.195.81.148:0. Already tried 8 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS) 2015-05-12 06:29:05,422 INFO [AsyncDispatcher event handler] ipc.Client (Client.java:handleConnectionFailure(882)) - Retrying connect to server: asf904.gq1.ygridcore.net/67.195.81.148:0. Already tried 9 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS) 2015-05-12 06:29:05,424 ERROR [AsyncDispatcher event handler] collector.NodeTimelineCollectorManager (NodeTimelineCollectorManager.java:postPut(121)) - Failed to communicate with NM Collector Service for application_1431412130291_0001 2015-05-12 06:29:05,425 WARN [AsyncDispatcher event handler] containermanager.AuxServices (AuxServices.java:logWarningWhenAuxServiceThrowExceptions(261)) - The auxService name is timeline_collector and it got an error at event: CONTAINER_INIT org.apache.hadoop.yarn.exceptions.YarnRuntimeException: org.apache.hadoop.yarn.exceptions.YarnRuntimeException:
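Regarding the fix described in the comment above, a hedged sketch of the general pattern (illustrative, not the committed patch; the config key string is an assumption):
{code}
// Illustrative sketch: once the collector service's RPC server has bound
// (possibly to an ephemeral port configured as 0), publish the real address
// back into the configuration so NodeTimelineCollectorManager can look it up
// as late as possible instead of reading a stale value during serviceInit.
@Override
protected void serviceStart() throws Exception {
  server.start();  // assume 'server' was created earlier with port 0
  InetSocketAddress bound = NetUtils.getConnectAddress(server);
  // Placeholder key for the NM collector service address.
  getConfig().setSocketAddr("yarn.nodemanager.collector-service.address", bound);
  super.serviceStart();
}
{code}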
[jira] [Updated] (YARN-3634) TestMRTimelineEventHandling is broken due to timing issues
[ https://issues.apache.org/jira/browse/YARN-3634?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sangjin Lee updated YARN-3634: -- Attachment: YARN-3634-YARN-2928.001.patch Patch v.1 posted. TestMRTimelineEventHandling is broken due to timing issues -- Key: YARN-3634 URL: https://issues.apache.org/jira/browse/YARN-3634 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Affects Versions: YARN-2928 Reporter: Sangjin Lee Assignee: Sangjin Lee Attachments: YARN-3634-YARN-2928.001.patch TestMRTimelineEventHandling is broken. Relevant error message: {noformat} 2015-05-12 06:28:56,415 INFO [AsyncDispatcher event handler] ipc.Client (Client.java:handleConnectionFailure(882)) - Retrying connect to server: asf904.gq1.ygridcore.net/67.195.81.148:0. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS) 2015-05-12 06:28:57,416 INFO [AsyncDispatcher event handler] ipc.Client (Client.java:handleConnectionFailure(882)) - Retrying connect to server: asf904.gq1.ygridcore.net/67.195.81.148:0. Already tried 1 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS) 2015-05-12 06:28:58,416 INFO [AsyncDispatcher event handler] ipc.Client (Client.java:handleConnectionFailure(882)) - Retrying connect to server: asf904.gq1.ygridcore.net/67.195.81.148:0. Already tried 2 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS) 2015-05-12 06:28:59,417 INFO [AsyncDispatcher event handler] ipc.Client (Client.java:handleConnectionFailure(882)) - Retrying connect to server: asf904.gq1.ygridcore.net/67.195.81.148:0. Already tried 3 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS) 2015-05-12 06:29:00,418 INFO [AsyncDispatcher event handler] ipc.Client (Client.java:handleConnectionFailure(882)) - Retrying connect to server: asf904.gq1.ygridcore.net/67.195.81.148:0. Already tried 4 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS) 2015-05-12 06:29:01,419 INFO [AsyncDispatcher event handler] ipc.Client (Client.java:handleConnectionFailure(882)) - Retrying connect to server: asf904.gq1.ygridcore.net/67.195.81.148:0. Already tried 5 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS) 2015-05-12 06:29:02,420 INFO [AsyncDispatcher event handler] ipc.Client (Client.java:handleConnectionFailure(882)) - Retrying connect to server: asf904.gq1.ygridcore.net/67.195.81.148:0. Already tried 6 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS) 2015-05-12 06:29:03,420 INFO [AsyncDispatcher event handler] ipc.Client (Client.java:handleConnectionFailure(882)) - Retrying connect to server: asf904.gq1.ygridcore.net/67.195.81.148:0. Already tried 7 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS) 2015-05-12 06:29:04,421 INFO [AsyncDispatcher event handler] ipc.Client (Client.java:handleConnectionFailure(882)) - Retrying connect to server: asf904.gq1.ygridcore.net/67.195.81.148:0. 
Already tried 8 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS) 2015-05-12 06:29:05,422 INFO [AsyncDispatcher event handler] ipc.Client (Client.java:handleConnectionFailure(882)) - Retrying connect to server: asf904.gq1.ygridcore.net/67.195.81.148:0. Already tried 9 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS) 2015-05-12 06:29:05,424 ERROR [AsyncDispatcher event handler] collector.NodeTimelineCollectorManager (NodeTimelineCollectorManager.java:postPut(121)) - Failed to communicate with NM Collector Service for application_1431412130291_0001 2015-05-12 06:29:05,425 WARN [AsyncDispatcher event handler] containermanager.AuxServices (AuxServices.java:logWarningWhenAuxServiceThrowExceptions(261)) - The auxService name is timeline_collector and it got an error at event: CONTAINER_INIT org.apache.hadoop.yarn.exceptions.YarnRuntimeException: org.apache.hadoop.yarn.exceptions.YarnRuntimeException: java.net.ConnectException: Call From asf904.gq1.ygridcore.net/67.195.81.148 to asf904.gq1.ygridcore.net:0 failed on connection exception: java.net.ConnectException: Connection refused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused at
[jira] [Commented] (YARN-3635) Get-queue-mapping should be a common interface of YarnScheduler
[ https://issues.apache.org/jira/browse/YARN-3635?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14540393#comment-14540393 ] Karthik Kambatla commented on YARN-3635: I am in favor of doing this, but would like to be extra careful. Can we make sure either [~sandyr] or I get to review this before it is committed? Get-queue-mapping should be a common interface of YarnScheduler --- Key: YARN-3635 URL: https://issues.apache.org/jira/browse/YARN-3635 Project: Hadoop YARN Issue Type: Bug Components: scheduler Reporter: Wangda Tan Assignee: Wangda Tan Currently, both the fair and capacity schedulers support queue mapping, which means the scheduler can change the queue of an application after it has been submitted. One issue with doing this inside a specific scheduler is: if the queue after mapping has a different maximum_allocation/default-node-label-expression from the original queue, {{validateAndCreateResourceRequest}} in RMAppManager checks the wrong queue. I propose to make queue mapping a common interface of the scheduler, and have RMAppManager set the queue after mapping before doing validations. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
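To sketch the idea (hedged; the method name and signature are illustrative, not an agreed design), the common interface could expose something like the following, which RMAppManager would call before {{validateAndCreateResourceRequest}} so that maximum-allocation and default-node-label-expression checks run against the mapped queue:
{code}
// Illustrative only: a possible addition to the scheduler interface.
/**
 * @param requestedQueue the queue name given at submission time
 * @param user           the submitting user
 * @return the queue the application will actually be placed in after any
 *         configured queue-mapping rules are applied, or the requested
 *         queue if no mapping applies
 */
String getQueueAfterMapping(String requestedQueue, String user);
{code}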
[jira] [Commented] (YARN-1197) Support changing resources of an allocated container
[ https://issues.apache.org/jira/browse/YARN-1197?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14541154#comment-14541154 ] Karthik Kambatla commented on YARN-1197: bq. We thought about launching the JVM based container with -Xmx set to the physical memory of the node, and use cgroup memory control to enforce the resource limit, but we don't think LCE supports memory isolation right now. We cannot use YARN's default memory enforcement as we don't want long running services to be killed. A JVM with a larger value for Xmx will *likely* be less aggressive with GC. Any resultant increase in heap size might or might not be a good thing. If you think this is something viable that people care about, we could consider adding a memory-enforcement option to LCE. Support changing resources of an allocated container Key: YARN-1197 URL: https://issues.apache.org/jira/browse/YARN-1197 Project: Hadoop YARN Issue Type: Task Components: api, nodemanager, resourcemanager Affects Versions: 2.1.0-beta Reporter: Wangda Tan Attachments: mapreduce-project.patch.ver.1, tools-project.patch.ver.1, yarn-1197-scheduler-v1.pdf, yarn-1197-v2.pdf, yarn-1197-v3.pdf, yarn-1197-v4.pdf, yarn-1197-v5.pdf, yarn-1197.pdf, yarn-api-protocol.patch.ver.1, yarn-pb-impl.patch.ver.1, yarn-server-common.patch.ver.1, yarn-server-nodemanager.patch.ver.1, yarn-server-resourcemanager.patch.ver.1 The current YARN resource management logic assumes the resource allocated to a container is fixed during its lifetime. When users want to change the resource of an allocated container, the only way is to release it and allocate a new container with the expected size. Allowing run-time changes to the resources of an allocated container will give us better control of resource usage on the application side. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3627) Preemption not triggered in Fair scheduler when maxResources is set on parent queue
[ https://issues.apache.org/jira/browse/YARN-3627?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14541201#comment-14541201 ] Bibin A Chundatt commented on YARN-3627: [~kasha] seems related to YARN-3405. Will try the patch soon. Would be great if YARN-3405 gets resolved. Preemption not triggered in Fair scheduler when maxResources is set on parent queue --- Key: YARN-3627 URL: https://issues.apache.org/jira/browse/YARN-3627 Project: Hadoop YARN Issue Type: Bug Components: fairscheduler, scheduler Environment: Suse 11 SP3, 2 NM Reporter: Bibin A Chundatt Consider the below scenario of fair scheduler configuration: Root (10Gb cluster resource) --Q1 (maxResources 4gb) Q1.1 (maxResources 4gb) Q1.2 (maxResources 4gb) --Q2 (maxResources 6GB) No applications are running in Q2. Submit one application to Q1.1 with 50 maps; 4Gb gets allocated to Q1.1. Now submit an application to Q1.2; it will always be starving for memory. Preemption will never get triggered since yarn.scheduler.fair.preemption.cluster-utilization-threshold = .8 and the cluster utilization is below .8. *Fairscheduler.java*
{code}
private boolean shouldAttemptPreemption() {
  if (preemptionEnabled) {
    return (preemptionUtilizationThreshold < Math.max(
        (float) rootMetrics.getAllocatedMB() / clusterResource.getMemory(),
        (float) rootMetrics.getAllocatedVirtualCores() /
            clusterResource.getVirtualCores()));
  }
  return false;
}
{code}
Are we supposed to configure, in a running cluster, maxResources of 0mb and 0 cores so that all queues can always take the full cluster resources if available? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3127) Apphistory url crashes when RM switches with ATS enabled
[ https://issues.apache.org/jira/browse/YARN-3127?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bibin A Chundatt updated YARN-3127: --- Priority: Critical (was: Major) Apphistory url crashes when RM switches with ATS enabled Key: YARN-3127 URL: https://issues.apache.org/jira/browse/YARN-3127 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager, timelineserver Affects Versions: 2.6.0 Environment: RM HA with ATS Reporter: Bibin A Chundatt Assignee: Naganarasimha G R Priority: Critical Attachments: YARN-3127.20150213-1.patch, YARN-3127.20150329-1.patch 1.Start RM with HA and ATS configured and run some yarn applications 2.Once applications have finished successfully, start the timeline server 3.Now fail over HA from active to standby 4.Access timeline server URL IP:PORT/applicationhistory Result: The application history URL fails with the below info {quote} 2015-02-03 20:28:09,511 ERROR org.apache.hadoop.yarn.webapp.View: Failed to read the applications. java.lang.reflect.UndeclaredThrowableException at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1643) at org.apache.hadoop.yarn.server.webapp.AppsBlock.render(AppsBlock.java:80) at org.apache.hadoop.yarn.webapp.view.HtmlBlock.render(HtmlBlock.java:67) at org.apache.hadoop.yarn.webapp.view.HtmlBlock.renderPartial(HtmlBlock.java:77) at org.apache.hadoop.yarn.webapp.View.render(View.java:235) at org.apache.hadoop.yarn.webapp.view.HtmlPage$Page.subView(HtmlPage.java:49) ... Caused by: org.apache.hadoop.yarn.exceptions.ApplicationAttemptNotFoundException: The entity for application attempt appattempt_1422972608379_0001_01 doesn't exist in the timeline store at org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryManagerOnTimelineStore.getApplicationAttempt(ApplicationHistoryManagerOnTimelineStore.java:151) at org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryManagerOnTimelineStore.generateApplicationReport(ApplicationHistoryManagerOnTimelineStore.java:499) at org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryManagerOnTimelineStore.getAllApplications(ApplicationHistoryManagerOnTimelineStore.java:108) at org.apache.hadoop.yarn.server.webapp.AppsBlock$1.run(AppsBlock.java:84) at org.apache.hadoop.yarn.server.webapp.AppsBlock$1.run(AppsBlock.java:81) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:415) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628) ... 51 more 2015-02-03 20:28:09,512 ERROR org.apache.hadoop.yarn.webapp.Dispatcher: error handling URI: /applicationhistory org.apache.hadoop.yarn.webapp.WebAppException: Error rendering block: nestLevel=6 expected 5 at org.apache.hadoop.yarn.webapp.view.HtmlBlock.render(HtmlBlock.java:69) at org.apache.hadoop.yarn.webapp.view.HtmlBlock.renderPartial(HtmlBlock.java:77) {quote} Behaviour of AHS with a file-based history store: - Apphistory url is working - No attempt entries are shown for each application. Based on initial analysis, when RM switches, application attempts from the state store are not replayed but only applications are. So when the /applicationhistory url is accessed it tries all attempt ids and fails -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3628) ContainerMetrics should support always-flush mode.
[ https://issues.apache.org/jira/browse/YARN-3628?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14541212#comment-14541212 ] Hadoop QA commented on YARN-3628: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | pre-patch | 14m 42s | Pre-patch trunk compilation is healthy. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 1 new or modified test files. | | {color:green}+1{color} | javac | 7m 33s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 9m 36s | There were no new javadoc warning messages. | | {color:green}+1{color} | release audit | 0m 23s | The applied patch does not increase the total number of release audit warnings. | | {color:green}+1{color} | checkstyle | 1m 37s | There were no new checkstyle issues. | | {color:red}-1{color} | whitespace | 0m 0s | The patch has 1 line(s) that end in whitespace. Use git apply --whitespace=fix. | | {color:green}+1{color} | install | 1m 33s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 33s | The patch built with eclipse:eclipse. | | {color:green}+1{color} | findbugs | 2m 23s | The patch does not introduce any new Findbugs (version 2.0.3) warnings. | | {color:green}+1{color} | yarn tests | 0m 25s | Tests passed in hadoop-yarn-api. | | {color:green}+1{color} | yarn tests | 6m 1s | Tests passed in hadoop-yarn-server-nodemanager. | | | | 44m 49s | | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12732440/YARN-3628.000.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / f24452d | | whitespace | https://builds.apache.org/job/PreCommit-YARN-Build/7907/artifact/patchprocess/whitespace.txt | | hadoop-yarn-api test log | https://builds.apache.org/job/PreCommit-YARN-Build/7907/artifact/patchprocess/testrun_hadoop-yarn-api.txt | | hadoop-yarn-server-nodemanager test log | https://builds.apache.org/job/PreCommit-YARN-Build/7907/artifact/patchprocess/testrun_hadoop-yarn-server-nodemanager.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/7907/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf907.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/7907/console | This message was automatically generated. ContainerMetrics should support always-flush mode. -- Key: YARN-3628 URL: https://issues.apache.org/jira/browse/YARN-3628 Project: Hadoop YARN Issue Type: Bug Reporter: zhihai xu Assignee: zhihai xu Attachments: YARN-3628.000.patch ContainerMetrics should support always-flush mode. It will be good to set ContainerMetrics as always-flush mode if yarn.nodemanager.container-metrics.period-ms is configured as 0. Currently both 0 and -1 mean flush on completion. Also the current default value for yarn.nodemanager.container-metrics.period-ms is -1 and the default value for yarn.nodemanager.container-metrics.enable is true. So the empty content is shown for the active container metrics until it is finished. The default value for yarn.nodemanager.container-metrics.period-ms should not be -1. flushOnPeriod is always false if flushPeriodMs is -1, the content will only be shown when the container is finished. 
{code}
if (finished || flushOnPeriod) {
  registry.snapshot(collector.addRecord(registry.info()), all);
}
{code}
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
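A hedged sketch of what the always-flush behaviour could look like, assuming a flag derived from the configured period (names are illustrative, not the actual patch):
{code}
// Illustrative only: treat a configured period of 0 as "flush every record",
// while keeping -1 as "flush on completion".
boolean flushAlways = (flushPeriodMs == 0);
if (finished || flushOnPeriod || flushAlways) {
  registry.snapshot(collector.addRecord(registry.info()), all);
}
{code}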
[jira] [Commented] (YARN-3629) NodeID is always printed as null in node manager initialization log.
[ https://issues.apache.org/jira/browse/YARN-3629?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14541225#comment-14541225 ] nijel commented on YARN-3629: - Thanks [~devaraj.k] for reviewing and committing the patch NodeID is always printed as null in node manager initialization log. -- Key: YARN-3629 URL: https://issues.apache.org/jira/browse/YARN-3629 Project: Hadoop YARN Issue Type: Bug Reporter: nijel Assignee: nijel Fix For: 2.8.0 Attachments: YARN-3629-1.patch In the Node Manager log during startup the following line is printed: 2015-05-12 11:20:02,347 INFO org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl: Initialized nodemanager for *null* : physical-memory=4096 virtual-memory=8602 virtual-cores=8 This line is printed from NodeStatusUpdaterImpl.serviceInit, but the nodeId assignment happens only in NodeStatusUpdaterImpl.serviceStart:
{code}
protected void serviceStart() throws Exception {
  // NodeManager is the last service to start, so NodeId is available.
  this.nodeId = this.context.getNodeId();
{code}
Assigning the node id in serviceInit is not feasible since it is generated by ContainerManagerImpl.serviceStart. The log can be moved to serviceStart to give the right information to the user. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
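A hedged sketch of the suggested move (illustrative only; the memory and core fields are placeholders for whatever NodeStatusUpdaterImpl actually holds):
{code}
// Illustrative only: log from serviceStart(), where the NodeId has already
// been assigned, instead of from serviceInit().
@Override
protected void serviceStart() throws Exception {
  // NodeManager is the last service to start, so NodeId is available.
  this.nodeId = this.context.getNodeId();
  LOG.info("Initialized nodemanager for " + nodeId
      + " : physical-memory=" + memoryMb
      + " virtual-memory=" + virtualMemoryMb
      + " virtual-cores=" + virtualCores);
  // ... rest of serviceStart() unchanged ...
}
{code}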
[jira] [Commented] (YARN-1197) Support changing resources of an allocated container
[ https://issues.apache.org/jira/browse/YARN-1197?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14541233#comment-14541233 ] MENG DING commented on YARN-1197: - Thanks guys for the comments. Yes, I believe a memory-enforcement option for LCE is definitely a desirable feature and the proper way to handle memory enforcement for long-running services. Looks like YARN-2793 is related, and YARN-3 already had a patch for this? Then we also need the capability to dynamically update the cgroup that a process runs under, which I believe is not supported today either, right? Support changing resources of an allocated container Key: YARN-1197 URL: https://issues.apache.org/jira/browse/YARN-1197 Project: Hadoop YARN Issue Type: Task Components: api, nodemanager, resourcemanager Affects Versions: 2.1.0-beta Reporter: Wangda Tan Attachments: mapreduce-project.patch.ver.1, tools-project.patch.ver.1, yarn-1197-scheduler-v1.pdf, yarn-1197-v2.pdf, yarn-1197-v3.pdf, yarn-1197-v4.pdf, yarn-1197-v5.pdf, yarn-1197.pdf, yarn-api-protocol.patch.ver.1, yarn-pb-impl.patch.ver.1, yarn-server-common.patch.ver.1, yarn-server-nodemanager.patch.ver.1, yarn-server-resourcemanager.patch.ver.1 The current YARN resource management logic assumes the resource allocated to a container is fixed during its lifetime. When users want to change the resource of an allocated container, the only way is to release it and allocate a new container with the expected size. Allowing run-time changes to the resources of an allocated container will give us better control of resource usage on the application side. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
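For context, a hedged sketch of what dynamically updating a running container's cgroup memory limit could look like on a cgroups-v1 mount (the mount point, hierarchy name and helper method are assumptions; this is not existing LCE functionality):
{code}
// Illustrative only: rewrite memory.limit_in_bytes of the cgroup the container
// was placed in, without restarting the container.
void updateContainerMemoryLimit(String containerId, long newLimitBytes)
    throws IOException {
  // Assumes the NM created /sys/fs/cgroup/memory/hadoop-yarn/<containerId>.
  java.nio.file.Path limitFile = java.nio.file.Paths.get(
      "/sys/fs/cgroup/memory/hadoop-yarn", containerId, "memory.limit_in_bytes");
  java.nio.file.Files.write(limitFile,
      Long.toString(newLimitBytes)
          .getBytes(java.nio.charset.StandardCharsets.UTF_8));
}
{code}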