[jira] [Commented] (YARN-3431) Sub resources of timeline entity needs to be passed to a separate endpoint.

2015-04-16 Thread Li Lu (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3431?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14499002#comment-14499002
 ] 

Li Lu commented on YARN-3431:
-

Hi [~zjshen], thanks for the work! I reviewed your v3 patch. 

A general comment about the idea, if I understand it correctly: we're now 
requiring subclasses to decide on a strategy for encapsulating their extended 
information into fields of TimelineEntity, and we will use TimelineEntity as the 
standard object type for web services and storage. This introduces conversion 
problems for subclasses: we need to provide logic to rebuild the subclasses 
based on their entity types. In this patch, the rebuild process is implemented 
in TimelineCollectorWebService.processTimelineEntities. 
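To make the concern concrete, here is a rough sketch of the kind of type-based 
rebuild I'm referring to (the type string and the copy-style constructor are 
assumptions for illustration, not code from the v3 patch):
{code}
// Illustrative only, not the patch code: reconstruct a subclass instance from a
// plain TimelineEntity using nothing but its type field.
private static TimelineEntity rebuild(TimelineEntity entity) {
  if ("YARN_APPLICATION".equals(entity.getType())) {
    // assumes ApplicationEntity can be built from the generic entity
    return new ApplicationEntity(entity);
  }
  // unknown or generic types are passed through unchanged
  return entity;
}
{code}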

I have some questions:
# My main problem is with the "{{prototype}}" field of TimelineEntity. First, 
the name is a bit awkward to me: it suggests that a class has a prototype of 
exactly the same type, which reads oddly. Second, since all extended information 
will be stored in TimelineEntity, the only difference between subclass instances 
will be their type fields. If so, do we still need a separate "prototype" 
section for web services? Third, I searched the whole patch and it seems the 
only place this prototype field is written is the constructor of TimelineEntity, 
where it simply stores the incoming entity's prototype. Overall I'm a little 
confused about this field. 
# For {{HierarchicalTimelineEntity}}, it seems we're not adding any special 
tags when we {{addIsRelatedToEntity()}} in {{setParent()}}. We're also 
requiring that the keySet of isRelatedToEntities have only one key. Are we 
prohibiting users from using {{isRelatedToEntities}} in 
{{HierarchicalTimelineEntity}} entirely to avoid problems? 
# {{processTimelineEntities}} is now called in {{TimelineCollectorWebService}}, 
in {{putEntities}}. From a storage-layer perspective, I'm not sure we really 
need the subclass information there. We definitely need the logic of 
{{processTimelineEntities}} on the reader side, and maybe in our timeline 
collector implementation. 
# There are two ".*" imports in this patch, one in 
TestTimelineServiceClientIntegration and the other in 
TimelineCollectorWebService. Maybe we should list them explicitly? 

> Sub resources of timeline entity needs to be passed to a separate endpoint.
> ---
>
> Key: YARN-3431
> URL: https://issues.apache.org/jira/browse/YARN-3431
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Zhijie Shen
>Assignee: Zhijie Shen
> Attachments: YARN-3431.1.patch, YARN-3431.2.patch, YARN-3431.3.patch
>
>
> We have TimelineEntity and some other entities as subclasses that inherit from 
> it. However, we only have a single endpoint, which consumes TimelineEntity 
> rather than its subclasses, and this endpoint checks that the incoming request 
> body contains exactly a TimelineEntity object. JSON data serialized from a 
> sub-class object is not treated as a TimelineEntity object and won't be 
> deserialized into the corresponding sub-class object, which causes 
> deserialization failures, as discussed in YARN-3334: 
> https://issues.apache.org/jira/browse/YARN-3334?focusedCommentId=14391059&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14391059.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3500) Optimize ResourceManager Web loading speed

2015-04-16 Thread Naganarasimha G R (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3500?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14499183#comment-14499183
 ] 

Naganarasimha G R commented on YARN-3500:
-

I think the submitter raised the same issue twice by mistake: this one and YARN-3499.

> Optimize ResourceManager Web loading speed
> --
>
> Key: YARN-3500
> URL: https://issues.apache.org/jira/browse/YARN-3500
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: resourcemanager
>Reporter: Peter Shi
>Priority: Minor
>
> After running 10k jobs, the ResourceManager web UI becomes slow to load. Since 
> the server side sends information for all 10k jobs in one response, parsing and 
> rendering the page takes a long time. The paging logic is currently done on the 
> browser side. 
> This issue moves the paging logic to the server side so that loading is fast.
> Loading 10k jobs takes 55 seconds; loading 2k takes 7 seconds



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3021) YARN's delegation-token handling disallows certain trust setups to operate properly over DistCp

2015-04-16 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3021?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14499160#comment-14499160
 ] 

Hudson commented on YARN-3021:
--

FAILURE: Integrated in Hadoop-trunk-Commit #7602 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/7602/])
YARN-3021. YARN's delegation-token handling disallows certain trust setups to 
operate properly over DistCp. Contributed by Yongjun Zhang (jianhe: rev 
bb6dde68f19be1885a9e7f7949316a03825b6f3e)
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/security/TokenCache.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/security/DelegationTokenRenewer.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/security/TestDelegationTokenRenewer.java
* 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/MRJobConfig.java


> YARN's delegation-token handling disallows certain trust setups to operate 
> properly over DistCp
> ---
>
> Key: YARN-3021
> URL: https://issues.apache.org/jira/browse/YARN-3021
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: security
>Affects Versions: 2.3.0
>Reporter: Harsh J
>Assignee: Yongjun Zhang
> Fix For: 2.8.0
>
> Attachments: YARN-3021.001.patch, YARN-3021.002.patch, 
> YARN-3021.003.patch, YARN-3021.004.patch, YARN-3021.005.patch, 
> YARN-3021.006.patch, YARN-3021.007.patch, YARN-3021.007.patch, 
> YARN-3021.007.patch, YARN-3021.patch
>
>
> Consider this scenario of 3 realms: A, B and COMMON, where A trusts COMMON 
> and B trusts COMMON (one-way trusts in both cases), and both A and B run 
> HDFS + YARN clusters.
> Now if one logs in with a COMMON credential and runs a job on A's YARN that 
> needs to access B's HDFS (such as a DistCp), the operation fails in the RM, 
> as it attempts a renewDelegationToken(…) synchronously during application 
> submission (to validate the managed token before it adds it to a scheduler 
> for automatic renewal). The call obviously fails because the B realm will not 
> trust A's credentials (here, the RM's principal is the renewer).
> In the 1.x JobTracker the same call is present, but it is done asynchronously, 
> and once the renewal attempt failed we simply ceased to schedule any further 
> renewal attempts rather than failing the job immediately.
> We should change the logic such that we attempt the renewal but go easy on 
> the failure and only skip the scheduling, rather than bubble an error back 
> to the client and fail the app submission. This way the old behaviour is 
> retained.
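(For illustration, the proposed "go easy on the failure" behaviour amounts to 
something like the sketch below; the method names are placeholders, not the 
actual DelegationTokenRenewer code.)
{code}
// Illustrative sketch: if renewal fails, log and skip scheduling the token for
// automatic renewal instead of failing app submission.
try {
  renewToken(token, rmPrincipal);            // may fail across one-way trusts
  scheduleForAutomaticRenewal(token);
} catch (IOException e) {
  LOG.warn("Could not renew token; skipping automatic renewal for it", e);
  // do not rethrow: app submission proceeds, matching the 1.x JobTracker behaviour
}
{code}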



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2003) Support to process Job priority from Submission Context in AppAttemptAddedSchedulerEvent [RM side]

2015-04-16 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2003?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14498992#comment-14498992
 ] 

Wangda Tan commented on YARN-2003:
--

Some comments:
1) There's no need to change LeafQueue/ParentQueue; changing AbstractCSQueue 
should be enough. The same applies to the Fair queue.
2) How about adding an update-application-priority API to the scheduler?
3) The patch needs a rebase.

> Support to process Job priority from Submission Context in 
> AppAttemptAddedSchedulerEvent [RM side]
> --
>
> Key: YARN-2003
> URL: https://issues.apache.org/jira/browse/YARN-2003
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Sunil G
>Assignee: Sunil G
> Attachments: 0001-YARN-2003.patch, 0002-YARN-2003.patch, 
> 0003-YARN-2003.patch, 0004-YARN-2003.patch, 0005-YARN-2003.patch, 
> 0006-YARN-2003.patch
>
>
> AppAttemptAddedSchedulerEvent should be able to receive the Job Priority from 
> Submission Context and store.
> Later this can be used by Scheduler.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2605) [RM HA] Rest api endpoints doing redirect incorrectly

2015-04-16 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2605?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14499058#comment-14499058
 ] 

Hadoop QA commented on YARN-2605:
-

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12726001/YARN-2605.1.patch
  against trunk revision 4308910.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/7367//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/7367//console

This message is automatically generated.

> [RM HA] Rest api endpoints doing redirect incorrectly
> -
>
> Key: YARN-2605
> URL: https://issues.apache.org/jira/browse/YARN-2605
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Affects Versions: 2.4.0
>Reporter: bc Wong
>Assignee: Xuan Gong
>  Labels: newbie
> Attachments: YARN-2605.1.patch
>
>
> The standby RM's webui tries to do a redirect via meta-refresh. That is fine 
> for pages designed to be viewed by web browsers. But the API endpoints 
> shouldn't do that. Most programmatic HTTP clients do not do meta-refresh. I'd 
> suggest HTTP 303, or return a well-defined error message (json or xml) 
> stating the standby status and giving a link to the active RM.
> The standby RM is returning this today:
> {noformat}
> $ curl -i http://bcsec-1.ent.cloudera.com:8088/ws/v1/cluster/metrics
> HTTP/1.1 200 OK
> Cache-Control: no-cache
> Expires: Thu, 25 Sep 2014 18:34:53 GMT
> Date: Thu, 25 Sep 2014 18:34:53 GMT
> Pragma: no-cache
> Expires: Thu, 25 Sep 2014 18:34:53 GMT
> Date: Thu, 25 Sep 2014 18:34:53 GMT
> Pragma: no-cache
> Content-Type: text/plain; charset=UTF-8
> Refresh: 3; url=http://bcsec-2.ent.cloudera.com:8088/ws/v1/cluster/metrics
> Content-Length: 117
> Server: Jetty(6.1.26)
> This is standby RM. Redirecting to the current active RM: 
> http://bcsec-2.ent.cloudera.com:8088/ws/v1/cluster/metrics
> {noformat}
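(A minimal sketch of the suggested API behaviour, written against the plain 
Servlet API rather than the RM's actual web framework; the JSON body shape is 
just an example.)
{code}
// Sketch only: answer REST calls on the standby RM with 303 See Other plus a
// Location header, instead of a meta-refresh page.
void redirectToActiveRM(HttpServletResponse resp, String activeUrl)
    throws IOException {
  resp.setStatus(HttpServletResponse.SC_SEE_OTHER); // 303
  resp.setHeader("Location", activeUrl);
  resp.setContentType("application/json");
  resp.getWriter().write(
      "{\"standby\":true,\"activeRM\":\"" + activeUrl + "\"}");
}
{code}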



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-1402) Related Web UI, CLI changes on exposing client API to check log aggregation status

2015-04-16 Thread Xuan Gong (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1402?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14499101#comment-14499101
 ] 

Xuan Gong commented on YARN-1402:
-

bq. Also, there is one important thing we need to think about seriously here: do we 
need to clean up nodes' log aggregation reports in RMApps? I am afraid we have 
to, because RMApp will always stay in RMContext, and each RMApp could contain 
thousands of node reports in a large cluster, which could occupy a lot of 
ResourceManager memory. Thoughts?

This is a good point. The reports also need to be used when generating the 
RMApp log aggregation web UI, so we cannot simply delete them. 

Uploaded a new patch and addressed all other comments.

> Related Web UI, CLI changes on exposing client API to check log aggregation 
> status
> --
>
> Key: YARN-1402
> URL: https://issues.apache.org/jira/browse/YARN-1402
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Xuan Gong
>Assignee: Xuan Gong
> Attachments: YARN-1402.1.patch, YARN-1402.2.patch, 
> YARN-1402.3.1.patch, YARN-1402.3.2.patch, YARN-1402.3.patch, YARN-1402.4.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-1402) Related Web UI, CLI changes on exposing client API to check log aggregation status

2015-04-16 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1402?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14499171#comment-14499171
 ] 

Hadoop QA commented on YARN-1402:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12726037/YARN-1402.4.patch
  against trunk revision 4308910.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 5 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager
 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:

  org.apache.hadoop.yarn.server.resourcemanager.TestRMRestart

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/7370//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/7370//console

This message is automatically generated.

> Related Web UI, CLI changes on exposing client API to check log aggregation 
> status
> --
>
> Key: YARN-1402
> URL: https://issues.apache.org/jira/browse/YARN-1402
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Xuan Gong
>Assignee: Xuan Gong
> Attachments: YARN-1402.1.patch, YARN-1402.2.patch, 
> YARN-1402.3.1.patch, YARN-1402.3.2.patch, YARN-1402.3.patch, YARN-1402.4.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3493) RM fails to come up with error "Failed to load/recover state" when mem settings are changed

2015-04-16 Thread Jian He (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3493?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14499142#comment-14499142
 ] 

Jian He commented on YARN-3493:
---

bq. This causes an InvalidResourceRequestException to be thrown to the AM, which 
the AM does not expect.
[~rohithsharma], thanks for your review!
 I agree. I think this can also happen in general if the mem settings are 
changed while the AM is running. 

Uploaded a new patch to fix the test failures. The failure is because the node 
label expression in the AM resource request is not normalized on recovery, which 
causes an NPE.
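(For context, the normalization meant here is roughly the following sketch, 
assuming amReq is the recovered AM ResourceRequest and RMNodeLabelsManager.NO_LABEL 
is the default; this is not the actual patch code.)
{code}
// Sketch only: default a null node label expression on the recovered AM resource
// request before validation, so validation does not hit an NPE.
if (amReq.getNodeLabelExpression() == null) {
  amReq.setNodeLabelExpression(RMNodeLabelsManager.NO_LABEL);
}
{code}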

> RM fails to come up with error "Failed to load/recover state" when  mem 
> settings are changed
> 
>
> Key: YARN-3493
> URL: https://issues.apache.org/jira/browse/YARN-3493
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn
>Affects Versions: 2.7.0
>Reporter: Sumana Sathish
>Assignee: Jian He
>Priority: Critical
> Attachments: YARN-3493.1.patch, YARN-3493.2.patch, 
> yarn-yarn-resourcemanager.log.zip
>
>
> RM fails to come up for the following case:
> 1. Change yarn.nodemanager.resource.memory-mb and 
> yarn.scheduler.maximum-allocation-mb to 4000 in yarn-site.xml
> 2. Start a randomtextwriter job with mapreduce.map.memory.mb=4000 in 
> background and wait for the job to reach running state
> 3. Restore yarn-site.xml to have yarn.scheduler.maximum-allocation-mb to 2048 
> before the above job completes
> 4. Restart RM
> 5. RM fails to come up with the below error
> {code:title= RM error for Mem settings changed}
>  - RM app submission failed in validating AM resource request for application 
> application_1429094976272_0008
> org.apache.hadoop.yarn.exceptions.InvalidResourceRequestException: Invalid 
> resource request, requested memory < 0, or requested memory > max configured, 
> requestedMemory=3072, maxMemory=2048
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils.validateResourceRequest(SchedulerUtils.java:204)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.validateAndCreateResourceRequest(RMAppManager.java:385)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.createAndPopulateNewRMApp(RMAppManager.java:328)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recoverApplication(RMAppManager.java:317)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recover(RMAppManager.java:422)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.recover(ResourceManager.java:1187)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceStart(ResourceManager.java:574)
> at 
> org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.startActiveServices(ResourceManager.java:994)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1035)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1031)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:415)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.transitionToActive(ResourceManager.java:1031)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.serviceStart(ResourceManager.java:1071)
> at 
> org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.main(ResourceManager.java:1208)
> 2015-04-15 13:19:18,623 ERROR resourcemanager.ResourceManager 
> (ResourceManager.java:serviceStart(579)) - Failed to load/recover state
> org.apache.hadoop.yarn.exceptions.InvalidResourceRequestException: Invalid 
> resource request, requested memory < 0, or requested memory > max configured, 
> requestedMemory=3072, maxMemory=2048
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils.validateResourceRequest(SchedulerUtils.java:204)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.validateAndCreateResourceRequest(RMAppManager.java:385)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.createAndPopulateNewRMApp(RMAppManager.java:328)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recoverApplication(RMAppManager.java:317)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.re

[jira] [Commented] (YARN-3051) [Storage abstraction] Create backing storage read interface for ATS readers

2015-04-16 Thread Zhijie Shen (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3051?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14498928#comment-14498928
 ] 

Zhijie Shen commented on YARN-3051:
---

Thanks for summarizing the proposal, Sangjin! Just thinking out loud: my major 
concern about this proposal is compatibility. Previously, in v1, a timeline entity 
is globally unique, such that when fetching a single entity users only needed to 
provide the entity type and entity id. Under the proposal, additional context is 
required to locate one entity, and theoretically an entity type and id alone may 
refer to multiple entities. That probably makes it difficult to stay compatible 
with existing use cases.

/cc [~hitesh]



> [Storage abstraction] Create backing storage read interface for ATS readers
> ---
>
> Key: YARN-3051
> URL: https://issues.apache.org/jira/browse/YARN-3051
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Sangjin Lee
>Assignee: Varun Saxena
> Attachments: YARN-3051.wip.patch, YARN-3051_temp.patch
>
>
> Per design in YARN-2928, create backing storage read interface that can be 
> implemented by multiple backing storage implementations.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3481) Report NM aggregated container resource utilization in heartbeat

2015-04-16 Thread Inigo Goiri (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3481?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14499137#comment-14499137
 ] 

Inigo Goiri commented on YARN-3481:
---

I could cover the first one (creating a ResourceUtilization) in this JIRA. It 
replicates some of the code of Resource, but I guess that is fine.

For the second one, I think we should tackle it in a different JIRA, as it has 
a deeper impact on other parts of the code. To minimize that impact, we could 
keep the current interfaces (with ints), change the stored type from int to 
double, and add interfaces with doubles. Obviously this is not super clean, 
but it minimizes possible problems.

I'm open to implementing either one.
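As a rough illustration of the second option (the class and method names below 
are made up for the example, not a proposed API):
{code}
// Illustrative only: keep the existing int getters/setters for compatibility
// while storing the value as a double and exposing double variants.
public class UtilizationValue {
  private double memory; // stored as a double internally

  public int getMemory() { return (int) memory; }         // legacy int API
  public void setMemory(int memory) { this.memory = memory; }

  public double getMemoryAsDouble() { return memory; }    // new double API
  public void setMemoryAsDouble(double memory) { this.memory = memory; }
}
{code}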

> Report NM aggregated container resource utilization in heartbeat
> 
>
> Key: YARN-3481
> URL: https://issues.apache.org/jira/browse/YARN-3481
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: nodemanager, resourcemanager
>Affects Versions: 2.7.0
>Reporter: Inigo Goiri
>Priority: Minor
>   Original Estimate: 336h
>  Remaining Estimate: 336h
>
> To allow the RM take better scheduling decisions, it should be aware of the 
> actual utilization of the containers. The NM would aggregate the 
> ContainerMetrics and report it in every heartbeat.
> Related to YARN-1012 but aggregated to reduce the heartbeat overhead.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2740) ResourceManager side should properly handle node label modifications when distributed node label configuration enabled

2015-04-16 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2740?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14499099#comment-14499099
 ] 

Hadoop QA commented on YARN-2740:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  
http://issues.apache.org/jira/secure/attachment/12726016/YARN-2740.20150417-1.patch
  against trunk revision 4308910.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 7 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:

  
org.apache.hadoop.yarn.server.resourcemanager.applicationsmanager.TestAMRestart

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/7369//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/7369//console

This message is automatically generated.

> ResourceManager side should properly handle node label modifications when 
> distributed node label configuration enabled
> --
>
> Key: YARN-2740
> URL: https://issues.apache.org/jira/browse/YARN-2740
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Wangda Tan
>Assignee: Naganarasimha G R
> Fix For: 2.8.0
>
> Attachments: YARN-2740-20141024-1.patch, YARN-2740.20150320-1.patch, 
> YARN-2740.20150327-1.patch, YARN-2740.20150411-1.patch, 
> YARN-2740.20150411-2.patch, YARN-2740.20150411-3.patch, 
> YARN-2740.20150417-1.patch
>
>
> According to YARN-2495, when distributed node label configuration is enabled:
> - RMAdmin / REST API should reject change labels on node operations.
> - CommonNodeLabelsManager shouldn't persist labels on nodes when NM do 
> heartbeat.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2004) Priority scheduling support in Capacity scheduler

2015-04-16 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14499028#comment-14499028
 ] 

Wangda Tan commented on YARN-2004:
--

Some comments:
1) I noticed the default of default-priority is -1; do you think we should limit 
priority >= 0? With the existing interface in the queue, we don't limit the 
"lowest" priority, so maybe we should limit it ourselves.
2) Beyond priority settings on the queue, do you think we should have a per-user 
priority setting? If we don't limit users' priorities, we will end up with all 
users asking for the max priority in the queue. Also, users' defaults could 
differ; a CEO's "default" may be the max priority. But this needs input from 
real-world use cases. ([~jlowe], thoughts?)
3) The null check in the app priority comparator still exists; did you mean to 
remove it?
bq.  i can remove NULL check. Will only have a direct compareTo check for 
priority.
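(For reference, the comparator change described in this JIRA could look roughly 
like the sketch below; it assumes the scheduler app exposes a getPriority() as 
this patch series proposes, with a larger value meaning higher priority.)
{code}
// Sketch only (Java 7 style), not the patch code: compare by priority when both
// apps carry one, otherwise fall back to the existing application-id ordering.
Comparator<FiCaSchedulerApp> applicationComparator =
    new Comparator<FiCaSchedulerApp>() {
      @Override
      public int compare(FiCaSchedulerApp a1, FiCaSchedulerApp a2) {
        if (a1.getPriority() != null && a2.getPriority() != null) {
          // assumed: larger priority value means higher priority
          int byPriority = Integer.compare(
              a2.getPriority().getPriority(), a1.getPriority().getPriority());
          if (byPriority != 0) {
            return byPriority;
          }
        }
        return a1.getApplicationAttemptId().getApplicationId().compareTo(
            a2.getApplicationAttemptId().getApplicationId());
      }
    };
{code}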

> Priority scheduling support in Capacity scheduler
> -
>
> Key: YARN-2004
> URL: https://issues.apache.org/jira/browse/YARN-2004
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: capacityscheduler
>Reporter: Sunil G
>Assignee: Sunil G
> Attachments: 0001-YARN-2004.patch, 0002-YARN-2004.patch, 
> 0003-YARN-2004.patch, 0004-YARN-2004.patch, 0005-YARN-2004.patch
>
>
> Based on the priority of the application, Capacity Scheduler should be able 
> to give preference to application while doing scheduling.
> Comparator applicationComparator can be changed as below.   
> 
> 1.Check for Application priority. If priority is available, then return 
> the highest priority job.
> 2.Otherwise continue with existing logic such as App ID comparison and 
> then TimeStamp comparison.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2003) Support to process Job priority from Submission Context in AppAttemptAddedSchedulerEvent [RM side]

2015-04-16 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2003?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14499009#comment-14499009
 ] 

Wangda Tan commented on YARN-2003:
--

Another comment is about authenticateApplicationPriority; I'm wondering if we 
really need it.

Some other fields like ACLs and queue state are checked in 
LeafQueue.submitApplication, so I think the queue's application priority setting 
may be better handled within the queue.

Thoughts?

> Support to process Job priority from Submission Context in 
> AppAttemptAddedSchedulerEvent [RM side]
> --
>
> Key: YARN-2003
> URL: https://issues.apache.org/jira/browse/YARN-2003
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Sunil G
>Assignee: Sunil G
> Attachments: 0001-YARN-2003.patch, 0002-YARN-2003.patch, 
> 0003-YARN-2003.patch, 0004-YARN-2003.patch, 0005-YARN-2003.patch, 
> 0006-YARN-2003.patch
>
>
> AppAttemptAddedSchedulerEvent should be able to receive the Job Priority from 
> Submission Context and store.
> Later this can be used by Scheduler.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3481) Report NM aggregated container resource utilization in heartbeat

2015-04-16 Thread Carlo Curino (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3481?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14499037#comment-14499037
 ] 

Carlo Curino commented on YARN-3481:


Creating a new ResourceUtilization class would, I think, be particularly 
relevant if we start tracking more resources than YARN enforces. I.e., if YARN 
only enforces memory and CPU and we care to monitor disk queues, disk bandwidth 
for writes/reads, disk IOPS, network bandwidth, CPU IO-wait time, etc., then a 
new object is probably a good way to go.

If the set of resources we will monitor and enforce is the same, I would vote 
for evolving Resource to express everything as doubles (also RAM). I stumbled on 
limitations of Resource in the context of the "reservation" work, where I was 
tracking cpu-over-time and running out of the range of Integer (e.g., counting 
memory over time for large reservations). This would allow us to simplify that 
code too (removing local classes used only to handle integral resources).

> Report NM aggregated container resource utilization in heartbeat
> 
>
> Key: YARN-3481
> URL: https://issues.apache.org/jira/browse/YARN-3481
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: nodemanager, resourcemanager
>Affects Versions: 2.7.0
>Reporter: Inigo Goiri
>Priority: Minor
>   Original Estimate: 336h
>  Remaining Estimate: 336h
>
> To allow the RM take better scheduling decisions, it should be aware of the 
> actual utilization of the containers. The NM would aggregate the 
> ContainerMetrics and report it in every heartbeat.
> Related to YARN-1012 but aggregated to reduce the heartbeat overhead.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3390) Reuse TimelineCollectorManager for RM

2015-04-16 Thread Li Lu (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3390?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14499039#comment-14499039
 ] 

Li Lu commented on YARN-3390:
-

I skimmed through both patches. The two patches have significant overlap. 
Personally I'm inclined to focus on YARN-3437 for now, since it's critical to 
the writer performance benchmark, which in turn blocks the writer 
implementations. The writer implementation is on our critical path to deliver 
an end-to-end preview of timeline v2, so I'd prefer to give 3437 higher 
priority for now.

For the code, are there any special reasons why ConcurrentHashMap's 
fine-grained locking is not sufficient for the collector manager? Finer-grained 
locking is generally attractive when we do not need strong consistency 
semantics such as snapshots or iteration. 
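To illustrate the fine-grained locking point, a sketch of what I have in mind 
(not the patch code):
{code}
// Sketch only: with a ConcurrentHashMap keyed by ApplicationId, putIfAbsent gives
// per-key atomicity without synchronizing the whole collector manager.
private final ConcurrentMap<ApplicationId, TimelineCollector> collectors =
    new ConcurrentHashMap<ApplicationId, TimelineCollector>();

TimelineCollector addIfAbsent(ApplicationId appId, TimelineCollector collector) {
  TimelineCollector existing = collectors.putIfAbsent(appId, collector);
  return existing != null ? existing : collector;
}
{code}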

> Reuse TimelineCollectorManager for RM
> -
>
> Key: YARN-3390
> URL: https://issues.apache.org/jira/browse/YARN-3390
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Zhijie Shen
>Assignee: Zhijie Shen
> Attachments: YARN-3390.1.patch
>
>
> RMTimelineCollector should have the context info of each app whose entity  
> has been put



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-1402) Related Web UI, CLI changes on exposing client API to check log aggregation status

2015-04-16 Thread Junping Du (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1402?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14499000#comment-14499000
 ] 

Junping Du commented on YARN-1402:
--

Thanks [~xgong] for updating the patch!
bq. No need to do it. We only call getLogAggregationStatusForAppReport() if 
needed.
Synced offline with Xuan: if we updated the LogAggregationStatus of an app on 
every node's LA report, that would create a race condition between status 
updates from different node reports, so an additional lock would be needed, 
which is completely unnecessary. So I am good with the current approach. 

Some other minor comments on the updated patch:
{code}
+public enum LogAggregationStatus {
+  DISABLED,
+  NOT_START,
+  RUNNING,
+  SUCCEEDED,
+  FAILED,
+  TIME_OUT
+}
{code}
We should document the semantics of LogAggregationStatus somewhere, at least for 
FAILED and TIME_OUT, or other developers could get confused here.
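For example, something along these lines (the wording is only a suggestion, not 
taken from the patch):
{code}
public enum LogAggregationStatus {
  /** Log aggregation is disabled for the cluster or the application. */
  DISABLED,
  /** Log aggregation has not started for this application yet. */
  NOT_START,
  /** Log aggregation is in progress. */
  RUNNING,
  /** All node managers reported successful aggregation. */
  SUCCEEDED,
  /** At least one node manager reported a failed aggregation. */
  FAILED,
  /** Some node managers did not report a final status before the timeout. */
  TIME_OUT
}
{code}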

{code}
+  private static final String LOGAGGREGATION_STATUS_PREFIX = "LOG_";
{code}
LOGAGGREGATION_STATUS_PREFIX => LOG_AGGREGATION_STATUS_PREFIX?

{code}
this.logAggregationStatusForAppReport== LogAggregationStatus.FAILED
{code}
add a space between logAggregationStatusForAppReport and ==

For getLogAggregationStatusForAppReport(), we should add a comment explaining why 
we don't fail early (if the app hasn't completed), for the reason we discussed 
offline.

{code}
+  case NOT_START:
+logNotStartCount ++;
...
+logCompletedCount ++;
{code}
Omit the space between logNotStartCount and ++ (and in some other places) to 
conform to the general code conventions.

Also, there is one important thing we need to think about seriously here: do we need 
to clean up nodes' log aggregation reports in RMApps? I am afraid we have to, 
because RMApp will always stay in RMContext, and each RMApp could contain 
thousands of node reports in a large cluster, which could occupy a lot of 
ResourceManager memory. Thoughts?

> Related Web UI, CLI changes on exposing client API to check log aggregation 
> status
> --
>
> Key: YARN-1402
> URL: https://issues.apache.org/jira/browse/YARN-1402
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Xuan Gong
>Assignee: Xuan Gong
> Attachments: YARN-1402.1.patch, YARN-1402.2.patch, 
> YARN-1402.3.1.patch, YARN-1402.3.2.patch, YARN-1402.3.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3476) Nodemanager can fail to delete local logs if log aggregation fails

2015-04-16 Thread Rohith (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3476?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14498951#comment-14498951
 ] 

Rohith commented on YARN-3476:
--

Thanks [~sunilg] for sharing your thoughts.
If we go with retention logic/time, then considering NM recovery the retention 
state should be stored in the state store, and the NM would need to support 
state-store updates in the log aggregation service, similar to 
NonAggregatingLogHandler.

[~jlowe], I attached a patch with the straightforward fix of handling the 
exception and doing the post-aggregation cleanup. Kindly share your opinion on 
the 2 approaches, i.e. 1. handling the exception and 2. retention logic
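(Roughly, approach 1 amounts to the sketch below; the upload and cleanup method 
names are only indicative of the NM's existing steps, not the exact patch.)
{code}
// Sketch of approach 1: catch the exception from the TFile upload so that the
// post-aggregation cleanup (local log deletion) still runs.
try {
  uploadLogsForContainers();        // may throw IllegalStateException from TFile
} catch (Exception e) {
  LOG.error("Log aggregation failed; local logs will still be cleaned up", e);
} finally {
  doAppLogAggregationPostCleanUp(); // delete local logs regardless of the upload result
}
{code}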

> Nodemanager can fail to delete local logs if log aggregation fails
> --
>
> Key: YARN-3476
> URL: https://issues.apache.org/jira/browse/YARN-3476
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: log-aggregation, nodemanager
>Affects Versions: 2.6.0
>Reporter: Jason Lowe
>Assignee: Rohith
> Attachments: 0001-YARN-3476.patch
>
>
> If log aggregation encounters an error trying to upload the file then the 
> underlying TFile can throw an illegalstateexception which will bubble up 
> through the top of the thread and prevent the application logs from being 
> deleted.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3493) RM fails to come up with error "Failed to load/recover state" when mem settings are changed

2015-04-16 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3493?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14499200#comment-14499200
 ] 

Hadoop QA commented on YARN-3493:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12726049/YARN-3493.3.patch
  against trunk revision 9595cc0.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:

  
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestNodeLabelContainerAllocation
  
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestWorkPreservingRMRestartForNodeLabel

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/7371//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/7371//console

This message is automatically generated.

> RM fails to come up with error "Failed to load/recover state" when  mem 
> settings are changed
> 
>
> Key: YARN-3493
> URL: https://issues.apache.org/jira/browse/YARN-3493
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn
>Affects Versions: 2.7.0
>Reporter: Sumana Sathish
>Assignee: Jian He
>Priority: Critical
> Attachments: YARN-3493.1.patch, YARN-3493.2.patch, YARN-3493.3.patch, 
> yarn-yarn-resourcemanager.log.zip
>
>
> RM fails to come up for the following case:
> 1. Change yarn.nodemanager.resource.memory-mb and 
> yarn.scheduler.maximum-allocation-mb to 4000 in yarn-site.xml
> 2. Start a randomtextwriter job with mapreduce.map.memory.mb=4000 in 
> background and wait for the job to reach running state
> 3. Restore yarn-site.xml to have yarn.scheduler.maximum-allocation-mb to 2048 
> before the above job completes
> 4. Restart RM
> 5. RM fails to come up with the below error
> {code:title= RM error for Mem settings changed}
>  - RM app submission failed in validating AM resource request for application 
> application_1429094976272_0008
> org.apache.hadoop.yarn.exceptions.InvalidResourceRequestException: Invalid 
> resource request, requested memory < 0, or requested memory > max configured, 
> requestedMemory=3072, maxMemory=2048
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils.validateResourceRequest(SchedulerUtils.java:204)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.validateAndCreateResourceRequest(RMAppManager.java:385)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.createAndPopulateNewRMApp(RMAppManager.java:328)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recoverApplication(RMAppManager.java:317)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recover(RMAppManager.java:422)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.recover(ResourceManager.java:1187)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceStart(ResourceManager.java:574)
> at 
> org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.startActiveServices(ResourceManager.java:994)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1035)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1031)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:415)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.transitionToActive(ResourceManager.java:1031)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.serviceStart(ResourceManag

[jira] [Commented] (YARN-1963) Support priorities across applications within the same queue

2015-04-16 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1963?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14498986#comment-14498986
 ] 

Wangda Tan commented on YARN-1963:
--

Agreed that we need to make the basic functionality work. I suggest keeping the 
simple tight-packed alias for now; we should try to get both in, but the 
label-based approach shouldn't block the int-based priority development.

> Support priorities across applications within the same queue 
> -
>
> Key: YARN-1963
> URL: https://issues.apache.org/jira/browse/YARN-1963
> Project: Hadoop YARN
>  Issue Type: New Feature
>  Components: api, resourcemanager
>Reporter: Arun C Murthy
>Assignee: Sunil G
> Attachments: 0001-YARN-1963-prototype.patch, YARN Application 
> Priorities Design.pdf, YARN Application Priorities Design_01.pdf
>
>
> It will be very useful to support priorities among applications within the 
> same queue, particularly in production scenarios. It allows for finer-grained 
> controls without having to force admins to create a multitude of queues, plus 
> allows existing applications to continue using existing queues which are 
> usually part of institutional memory.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-3500) Optimize ResourceManager Web loading speed

2015-04-16 Thread Peter Shi (JIRA)
Peter Shi created YARN-3500:
---

 Summary: Optimize ResourceManager Web loading speed
 Key: YARN-3500
 URL: https://issues.apache.org/jira/browse/YARN-3500
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: resourcemanager
Reporter: Peter Shi
Priority: Minor


After running 10k jobs, the ResourceManager web UI becomes slow to load. Since the 
server side sends information for all 10k jobs in one response, parsing and 
rendering the page takes a long time. The paging logic is currently done on the 
browser side. 
This issue moves the paging logic to the server side so that loading is fast.
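(Roughly, the server-side paging amounts to something like the following sketch; 
the parameter names and the use of AppInfo here are illustrative, not an actual 
RM web-services change.)
{code}
// Illustrative only: return one page of apps from the server, driven by
// offset/limit query parameters, instead of the full 10k list.
List<AppInfo> page(List<AppInfo> allApps, int offset, int limit) {
  int from = Math.min(Math.max(offset, 0), allApps.size());
  int to = Math.min(from + limit, allApps.size());
  return allApps.subList(from, to);
}
{code}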



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2740) ResourceManager side should properly handle node label modifications when distributed node label configuration enabled

2015-04-16 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2740?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14499033#comment-14499033
 ] 

Wangda Tan commented on YARN-2740:
--

Thanks for your quick update [~Naganarasimha].

bq. Actually dint get completely your opinion about throw exception when 
distributedConfiguration enabled for removeClusterNodeLabels; Did you want to 
throw ? 
I think it's better to prevent the admin from removing a cluster node label when 
distributed configuration is enabled.

bq. If you want to allow the admin to remove cluster node labels, then there is one 
case where I can see a potential problem: assume an NM informs the RM of a valid 
node label "x" through heartbeat/registration, and then the admin removes x from 
the cluster node labels. This is not communicated back to the NM, and the NM will 
not send labels as part of the heartbeat unless its labels change, so the NM is 
not aware of "x" being removed at all. I agree we need to allow the admin to 
control valid partitions, but in that case we need to add some logic in the RM to 
ask the NM to resubmit its labels. Please provide your views. 
I think it's not a big problem. The NM doesn't need to know that "x" was removed; 
the logic should be: the NM reports labels, the RM allocates according to labels, 
and the NM should just move on if adding the label failed, as we did in YARN-2495. 
My opinion here is to not add extra RM->NM communication.

I will review the patch again once we decide on the above.



> ResourceManager side should properly handle node label modifications when 
> distributed node label configuration enabled
> --
>
> Key: YARN-2740
> URL: https://issues.apache.org/jira/browse/YARN-2740
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Wangda Tan
>Assignee: Naganarasimha G R
> Fix For: 2.8.0
>
> Attachments: YARN-2740-20141024-1.patch, YARN-2740.20150320-1.patch, 
> YARN-2740.20150327-1.patch, YARN-2740.20150411-1.patch, 
> YARN-2740.20150411-2.patch, YARN-2740.20150411-3.patch, 
> YARN-2740.20150417-1.patch
>
>
> According to YARN-2495, when distributed node label configuration is enabled:
> - RMAdmin / REST API should reject change labels on node operations.
> - CommonNodeLabelsManager shouldn't persist labels on nodes when NM do 
> heartbeat.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-1402) Related Web UI, CLI changes on exposing client API to check log aggregation status

2015-04-16 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1402?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14498946#comment-14498946
 ] 

Hadoop QA commented on YARN-1402:
-

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12725986/YARN-1402.3.2.patch
  against trunk revision 75bbcc8.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 5 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager
 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/7366//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/7366//console

This message is automatically generated.

> Related Web UI, CLI changes on exposing client API to check log aggregation 
> status
> --
>
> Key: YARN-1402
> URL: https://issues.apache.org/jira/browse/YARN-1402
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Xuan Gong
>Assignee: Xuan Gong
> Attachments: YARN-1402.1.patch, YARN-1402.2.patch, 
> YARN-1402.3.1.patch, YARN-1402.3.2.patch, YARN-1402.3.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-1402) Related Web UI, CLI changes on exposing client API to check log aggregation status

2015-04-16 Thread Xuan Gong (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1402?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuan Gong updated YARN-1402:

Attachment: YARN-1402.4.patch

> Related Web UI, CLI changes on exposing client API to check log aggregation 
> status
> --
>
> Key: YARN-1402
> URL: https://issues.apache.org/jira/browse/YARN-1402
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Xuan Gong
>Assignee: Xuan Gong
> Attachments: YARN-1402.1.patch, YARN-1402.2.patch, 
> YARN-1402.3.1.patch, YARN-1402.3.2.patch, YARN-1402.3.patch, YARN-1402.4.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (YARN-2605) [RM HA] Rest api endpoints doing redirect incorrectly

2015-04-16 Thread Xuan Gong (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2605?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuan Gong reassigned YARN-2605:
---

Assignee: Xuan Gong  (was: Anubhav Dhoot)

> [RM HA] Rest api endpoints doing redirect incorrectly
> -
>
> Key: YARN-2605
> URL: https://issues.apache.org/jira/browse/YARN-2605
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Affects Versions: 2.4.0
>Reporter: bc Wong
>Assignee: Xuan Gong
>  Labels: newbie
> Attachments: YARN-2605.1.patch
>
>
> The standby RM's webui tries to do a redirect via meta-refresh. That is fine 
> for pages designed to be viewed by web browsers. But the API endpoints 
> shouldn't do that. Most programmatic HTTP clients do not do meta-refresh. I'd 
> suggest HTTP 303, or return a well-defined error message (json or xml) 
> stating the standby status and giving a link to the active RM.
> The standby RM is returning this today:
> {noformat}
> $ curl -i http://bcsec-1.ent.cloudera.com:8088/ws/v1/cluster/metrics
> HTTP/1.1 200 OK
> Cache-Control: no-cache
> Expires: Thu, 25 Sep 2014 18:34:53 GMT
> Date: Thu, 25 Sep 2014 18:34:53 GMT
> Pragma: no-cache
> Expires: Thu, 25 Sep 2014 18:34:53 GMT
> Date: Thu, 25 Sep 2014 18:34:53 GMT
> Pragma: no-cache
> Content-Type: text/plain; charset=UTF-8
> Refresh: 3; url=http://bcsec-2.ent.cloudera.com:8088/ws/v1/cluster/metrics
> Content-Length: 117
> Server: Jetty(6.1.26)
> This is standby RM. Redirecting to the current active RM: 
> http://bcsec-2.ent.cloudera.com:8088/ws/v1/cluster/metrics
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-1402) Related Web UI, CLI changes on exposing client API to check log aggregation status

2015-04-16 Thread Xuan Gong (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1402?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14499197#comment-14499197
 ] 

Xuan Gong commented on YARN-1402:
-

The test case failure is not related.

> Related Web UI, CLI changes on exposing client API to check log aggregation 
> status
> --
>
> Key: YARN-1402
> URL: https://issues.apache.org/jira/browse/YARN-1402
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Xuan Gong
>Assignee: Xuan Gong
> Attachments: YARN-1402.1.patch, YARN-1402.2.patch, 
> YARN-1402.3.1.patch, YARN-1402.3.2.patch, YARN-1402.3.patch, YARN-1402.4.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3046) [Event producers] Implement MapReduce AM writing some MR metrics to ATS

2015-04-16 Thread Robert Kanter (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3046?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14499087#comment-14499087
 ] 

Robert Kanter commented on YARN-3046:
-

LGTM +1
But let's wait on any additional comments from [~zjshen]

One minor thing: Is there a JIRA for this TODO?
{noformat}
// TODO remove threadPool after adding non-blocking call in TimelineClient
{noformat}

> [Event producers] Implement MapReduce AM writing some MR metrics to ATS
> ---
>
> Key: YARN-3046
> URL: https://issues.apache.org/jira/browse/YARN-3046
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Sangjin Lee
>Assignee: Junping Du
> Attachments: YARN-3046-no-test-v2.patch, YARN-3046-no-test.patch, 
> YARN-3046-v1-rebase.patch, YARN-3046-v1.patch, YARN-3046-v2.patch
>
>
> Per design in YARN-2928, select a handful of MR metrics (e.g. HDFS bytes 
> written) and have the MR AM write the framework-specific metrics to ATS.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3500) Optimize ResourceManager Web loading speed

2015-04-16 Thread Peter Shi (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3500?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peter Shi updated YARN-3500:

Description: 
After running 10k jobs, the ResourceManager web UI becomes slow to load. Since the 
server side sends information for all 10k jobs in one response, parsing and 
rendering the page takes a long time. The paging logic is currently done on the 
browser side. 
This issue moves the paging logic to the server side so that loading is fast.

Loading 10k jobs takes 55 seconds; loading 2k takes 7 seconds

  was:
After running 10k jobs, the ResourceManager web UI becomes slow to load. Since the 
server side sends information for all 10k jobs in one response, parsing and 
rendering the page takes a long time. The paging logic is currently done on the 
browser side. 
This issue moves the paging logic to the server side so that loading is fast.


> Optimize ResourceManager Web loading speed
> --
>
> Key: YARN-3500
> URL: https://issues.apache.org/jira/browse/YARN-3500
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: resourcemanager
>Reporter: Peter Shi
>Priority: Minor
>
> After running 10k jobs, the ResourceManager web UI becomes slow to load. Since 
> the server side sends information for all 10k jobs in one response, parsing and 
> rendering the page takes a long time. The paging logic is currently done on the 
> browser side. 
> This issue moves the paging logic to the server side so that loading is fast.
> Loading 10k jobs takes 55 seconds; loading 2k takes 7 seconds



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3410) YARN admin should be able to remove individual application records from RMStateStore

2015-04-16 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14499045#comment-14499045
 ] 

Wangda Tan commented on YARN-3410:
--

Patch generally looks good, but I suggest improving the parameter checking a 
little:

When args.length >= 1 and args[0] == remove-app, also check that args.length == 2 
so the user gets a more precise error when the arguments don't match.
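Something along these lines (a sketch only; the option constant and helper method 
are placeholders, not the patch code):
{code}
// Sketch only; REMOVE_APP_OPTION and removeApplicationFromStateStore are
// placeholders for the option string and helper used by the patch.
if (args.length >= 1 && REMOVE_APP_OPTION.equals(args[0])) {
  if (args.length != 2) {
    System.err.println("Usage: " + REMOVE_APP_OPTION + " <applicationId>");
    return -1;
  }
  removeApplicationFromStateStore(args[1]);
  return 0;
}
{code}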

> YARN admin should be able to remove individual application records from 
> RMStateStore
> 
>
> Key: YARN-3410
> URL: https://issues.apache.org/jira/browse/YARN-3410
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: resourcemanager, yarn
>Reporter: Wangda Tan
>Assignee: Rohith
>Priority: Critical
> Attachments: 0001-YARN-3410-v1.patch, 0001-YARN-3410.patch, 
> 0001-YARN-3410.patch, 0002-YARN-3410.patch
>
>
> When RM state store entered an unexpected state, one example is YARN-2340, 
> when an attempt is not in final state but app already completed, RM can never 
> get up unless format RMStateStore.
> I think we should support remove individual application records from 
> RMStateStore to unblock RM admin make choice of either waiting for a fix or 
> format state store.
> In addition, RM should be able to report all fatal errors (which will 
> shutdown RM) when doing app recovery, this can save admin some time to remove 
> apps in bad state.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2696) Queue sorting in CapacityScheduler should consider node label

2015-04-16 Thread Wangda Tan (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2696?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wangda Tan updated YARN-2696:
-
Attachment: YARN-2696.4.patch

bq. could we add a validation for clusterResource/queueCapacity updates when 
container is released
Addressed (ver.4)


> Queue sorting in CapacityScheduler should consider node label
> -
>
> Key: YARN-2696
> URL: https://issues.apache.org/jira/browse/YARN-2696
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: capacityscheduler, resourcemanager
>Reporter: Wangda Tan
>Assignee: Wangda Tan
> Attachments: YARN-2696.1.patch, YARN-2696.2.patch, YARN-2696.3.patch, 
> YARN-2696.4.patch
>
>
> In the past, when trying to allocate containers under a parent queue in 
> CapacityScheduler, the parent queue would choose child queues by their used 
> resource, from smallest to largest. 
> Now that CapacityScheduler supports node labels, we should also consider the used 
> resource in child queues per node label when allocating resources.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2740) ResourceManager side should properly handle node label modifications when distributed node label configuration enabled

2015-04-16 Thread Naganarasimha G R (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2740?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Naganarasimha G R updated YARN-2740:

Attachment: YARN-2740.20150417-1.patch

Updated the patch addressing [~wangda]'s review comments.

> ResourceManager side should properly handle node label modifications when 
> distributed node label configuration enabled
> --
>
> Key: YARN-2740
> URL: https://issues.apache.org/jira/browse/YARN-2740
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Wangda Tan
>Assignee: Naganarasimha G R
> Fix For: 2.8.0
>
> Attachments: YARN-2740-20141024-1.patch, YARN-2740.20150320-1.patch, 
> YARN-2740.20150327-1.patch, YARN-2740.20150411-1.patch, 
> YARN-2740.20150411-2.patch, YARN-2740.20150411-3.patch, 
> YARN-2740.20150417-1.patch
>
>
> According to YARN-2495, when distributed node label configuration is enabled:
> - RMAdmin / REST API should reject change labels on node operations.
> - CommonNodeLabelsManager shouldn't persist labels on nodes when NM do 
> heartbeat.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3493) RM fails to come up with error "Failed to load/recover state" when mem settings are changed

2015-04-16 Thread Jian He (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3493?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jian He updated YARN-3493:
--
Attachment: YARN-3493.3.patch

> RM fails to come up with error "Failed to load/recover state" when  mem 
> settings are changed
> 
>
> Key: YARN-3493
> URL: https://issues.apache.org/jira/browse/YARN-3493
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn
>Affects Versions: 2.7.0
>Reporter: Sumana Sathish
>Assignee: Jian He
>Priority: Critical
> Attachments: YARN-3493.1.patch, YARN-3493.2.patch, YARN-3493.3.patch, 
> yarn-yarn-resourcemanager.log.zip
>
>
> RM fails to come up for the following case:
> 1. Change yarn.nodemanager.resource.memory-mb and 
> yarn.scheduler.maximum-allocation-mb to 4000 in yarn-site.xml
> 2. Start a randomtextwriter job with mapreduce.map.memory.mb=4000 in 
> background and wait for the job to reach running state
> 3. Restore yarn-site.xml so that yarn.scheduler.maximum-allocation-mb is set back to 
> 2048 before the above job completes
> 4. Restart RM
> 5. RM fails to come up with the below error
> {code:title= RM error for Mem settings changed}
>  - RM app submission failed in validating AM resource request for application 
> application_1429094976272_0008
> org.apache.hadoop.yarn.exceptions.InvalidResourceRequestException: Invalid 
> resource request, requested memory < 0, or requested memory > max configured, 
> requestedMemory=3072, maxMemory=2048
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils.validateResourceRequest(SchedulerUtils.java:204)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.validateAndCreateResourceRequest(RMAppManager.java:385)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.createAndPopulateNewRMApp(RMAppManager.java:328)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recoverApplication(RMAppManager.java:317)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recover(RMAppManager.java:422)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.recover(ResourceManager.java:1187)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceStart(ResourceManager.java:574)
> at 
> org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.startActiveServices(ResourceManager.java:994)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1035)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1031)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:415)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.transitionToActive(ResourceManager.java:1031)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.serviceStart(ResourceManager.java:1071)
> at 
> org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.main(ResourceManager.java:1208)
> 2015-04-15 13:19:18,623 ERROR resourcemanager.ResourceManager 
> (ResourceManager.java:serviceStart(579)) - Failed to load/recover state
> org.apache.hadoop.yarn.exceptions.InvalidResourceRequestException: Invalid 
> resource request, requested memory < 0, or requested memory > max configured, 
> requestedMemory=3072, maxMemory=2048
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils.validateResourceRequest(SchedulerUtils.java:204)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.validateAndCreateResourceRequest(RMAppManager.java:385)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.createAndPopulateNewRMApp(RMAppManager.java:328)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recoverApplication(RMAppManager.java:317)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recover(RMAppManager.java:422)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.recover(ResourceManager.java:1187)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceStart(ResourceManager.java:574)
> at 
> org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
> at 
> org.apache.hadoop.yarn.server.res

[jira] [Commented] (YARN-2696) Queue sorting in CapacityScheduler should consider node label

2015-04-16 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2696?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14499069#comment-14499069
 ] 

Hadoop QA commented on YARN-2696:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12726010/YARN-2696.4.patch
  against trunk revision 4308910.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 7 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:

  
org.apache.hadoop.yarn.server.resourcemanager.applicationsmanager.TestAMRestart

  The following test timeouts occurred in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:

org.apache.hadoop.yarn.server.resourcemanager.TestRMRestart

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/7368//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/7368//console

This message is automatically generated.

> Queue sorting in CapacityScheduler should consider node label
> -
>
> Key: YARN-2696
> URL: https://issues.apache.org/jira/browse/YARN-2696
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: capacityscheduler, resourcemanager
>Reporter: Wangda Tan
>Assignee: Wangda Tan
> Attachments: YARN-2696.1.patch, YARN-2696.2.patch, YARN-2696.3.patch, 
> YARN-2696.4.patch
>
>
> In the past, when trying to allocate containers under a parent queue in 
> CapacityScheduler, the parent queue would choose child queues by their used 
> resource, from smallest to largest. 
> Now that CapacityScheduler supports node labels, we should also consider the used 
> resource in child queues per node label when allocating resources.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3487) CapacityScheduler scheduler lock obtained unnecessarily

2015-04-16 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14499038#comment-14499038
 ] 

Wangda Tan commented on YARN-3487:
--

[~sunilg], sorry, I didn't get what you meant, could you elaborate on it?

> CapacityScheduler scheduler lock obtained unnecessarily
> ---
>
> Key: YARN-3487
> URL: https://issues.apache.org/jira/browse/YARN-3487
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacityscheduler
>Affects Versions: 2.6.0
>Reporter: Jason Lowe
>Assignee: Jason Lowe
>Priority: Critical
> Attachments: YARN-3487.001.patch, YARN-3487.002.patch
>
>
> Recently saw a significant slowdown of applications on a large cluster, and 
> we noticed there were a large number of blocked threads on the RM.  Most of 
> the blocked threads were waiting for the CapacityScheduler lock while calling 
> getQueueInfo.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2605) [RM HA] Rest api endpoints doing redirect incorrectly

2015-04-16 Thread Xuan Gong (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2605?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuan Gong updated YARN-2605:

Attachment: YARN-2605.1.patch

> [RM HA] Rest api endpoints doing redirect incorrectly
> -
>
> Key: YARN-2605
> URL: https://issues.apache.org/jira/browse/YARN-2605
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Affects Versions: 2.4.0
>Reporter: bc Wong
>Assignee: Anubhav Dhoot
>  Labels: newbie
> Attachments: YARN-2605.1.patch
>
>
> The standby RM's web UI tries to do a redirect via meta-refresh. That is fine 
> for pages designed to be viewed by web browsers, but the API endpoints 
> shouldn't do that, since most programmatic HTTP clients do not follow meta-refresh. I'd 
> suggest HTTP 303, or returning a well-defined error message (json or xml) 
> stating the standby status and a link to the active RM.
> The standby RM is returning this today:
> {noformat}
> $ curl -i http://bcsec-1.ent.cloudera.com:8088/ws/v1/cluster/metrics
> HTTP/1.1 200 OK
> Cache-Control: no-cache
> Expires: Thu, 25 Sep 2014 18:34:53 GMT
> Date: Thu, 25 Sep 2014 18:34:53 GMT
> Pragma: no-cache
> Expires: Thu, 25 Sep 2014 18:34:53 GMT
> Date: Thu, 25 Sep 2014 18:34:53 GMT
> Pragma: no-cache
> Content-Type: text/plain; charset=UTF-8
> Refresh: 3; url=http://bcsec-2.ent.cloudera.com:8088/ws/v1/cluster/metrics
> Content-Length: 117
> Server: Jetty(6.1.26)
> This is standby RM. Redirecting to the current active RM: 
> http://bcsec-2.ent.cloudera.com:8088/ws/v1/cluster/metrics
> {noformat}
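
As a rough illustration of the suggested behaviour (sketch only, not the attached 
patch; request, response and activeRMWebAppURL are assumed to be in scope, e.g. 
inside a servlet filter on the standby RM):

{code}
// Sketch: answer programmatic REST clients with HTTP 303 and a Location header
// instead of the meta-refresh page; browsers keep the existing behaviour.
if (request.getRequestURI().startsWith("/ws/")) {            // API endpoints only
  response.setStatus(HttpServletResponse.SC_SEE_OTHER);      // 303 See Other
  response.setHeader("Location", activeRMWebAppURL + request.getRequestURI());
  return;
}
chain.doFilter(request, response);                           // fall through for the web UI
{code}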



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3134) [Storage implementation] Exploiting the option of using Phoenix to access HBase backend

2015-04-16 Thread Junping Du (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14498924#comment-14498924
 ] 

Junping Du commented on YARN-3134:
--

Thanks [~gtCarrera9] for delivering a patch here! 
I've just started to look at the patch. Some initial comments so far:

{code}
+  String sql = "CREATE TABLE IF NOT EXISTS " + ENTITY_TABLE_NAME
+  + "(cluster VARCHAR NOT NULL, user VARCHAR NOT NULL, "
+  + "flow VARCHAR NOT NULL, run UNSIGNED_LONG NOT NULL, "
+  + "appid VARCHAR NOT NULL, type VARCHAR NOT NULL, "
+  + "entityid VARCHAR NOT NULL, "
+  + "creationtime UNSIGNED_LONG, modifiedtime UNSIGNED_LONG, "
...
+  stmt.executeUpdate(sql);
+  stmt.close();
+  conn.commit();
{code}
Putting raw SQL statements in the code this way is a bit of a headache to me, as it 
makes the code difficult to debug and maintain in the future. Given that we could have 
more tables in the pipeline, we may want to refactor this in some way to be more 
maintainable.
BTW, I don't think HBase supports atomic operations across multiple tables. Here we 
create 3 tables but issue only one commit, which means that if creating the 2nd table 
fails, the 1st table will still have been created and committed and won't be rolled 
back. Such partial success after a commit doesn't sound like good practice to me.
An additional problem is that we don't close the connection here, but we need to.

{code}
+  private class TimelineEntityCtxt {
{code}
TimelineEntityCtxt => TimelineEntityContext; better not to abbreviate names (except 
in very obvious cases, like conf for configuration) in class and method names. It also 
looks exactly the same as TimelineCollectorContext.java. Can we reuse that class 
instead of creating a duplicate?

{code}
+  private  int setStringsForCf(
{code}
What does Cf mean? As I mentioned above, please don't abbreviate words in method 
names; it hurts the code's readability.

{code}
+  private int setStringsForPk(PreparedStatement ps, String clusterId, String 
userId,
{code}
setStringsForPk => setStringsForPrimaryKeys

{code}
+  ResultSet executeQuery(String sql) {
+ResultSet rs = null;
+try {
+  Statement stmt = conn.createStatement();
+  rs = stmt.executeQuery(sql);
+} catch (SQLException se) {
+  LOG.error("SQL exception! " + se.getLocalizedMessage());
+}
+return rs;
+  }
{code}
Does getLocalizedMessage contain enough info (at least the SQL statement being 
executed)? If not, I would prefer that we include the raw SQL statement in the error 
message when the exception is thrown.
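
For instance, a minimal sketch keeping the names from the snippet above:

{code}
// Sketch only: log the failing SQL together with the exception so the
// failure is actually debuggable.
ResultSet executeQuery(String sql) {
  try {
    Statement stmt = conn.createStatement();
    return stmt.executeQuery(sql);
  } catch (SQLException se) {
    LOG.error("Failed to execute SQL statement: " + sql, se);
    return null;
  }
}
{code}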

{code}
+// Execute and close
+psConfigInfo.execute();
+psConfigInfo.close();
{code}
There are many places like this where we forget to put closeable resources in a 
finally block. We should close them even when an exception is thrown.
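
For example, try-with-resources does this automatically (sketch only; sqlConfigInfo 
and the parameters are illustrative):

{code}
// Sketch: the statement is closed even if execute() throws, unlike an explicit
// close() call that is skipped on exceptions; the connection deserves the same.
try (PreparedStatement psConfigInfo = conn.prepareStatement(sqlConfigInfo)) {
  setStringsForPrimaryKeys(psConfigInfo, clusterId, userId, flowId, appId);
  psConfigInfo.execute();
}
{code}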

More comments may come later.

> [Storage implementation] Exploiting the option of using Phoenix to access 
> HBase backend
> ---
>
> Key: YARN-3134
> URL: https://issues.apache.org/jira/browse/YARN-3134
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Zhijie Shen
>Assignee: Li Lu
> Attachments: YARN-3134-040915_poc.patch, YARN-3134-041015_poc.patch, 
> YARN-3134-041415_poc.patch, YARN-3134DataSchema.pdf
>
>
> Quote the introduction on Phoenix web page:
> {code}
> Apache Phoenix is a relational database layer over HBase delivered as a 
> client-embedded JDBC driver targeting low latency queries over HBase data. 
> Apache Phoenix takes your SQL query, compiles it into a series of HBase 
> scans, and orchestrates the running of those scans to produce regular JDBC 
> result sets. The table metadata is stored in an HBase table and versioned, 
> such that snapshot queries over prior versions will automatically use the 
> correct schema. Direct use of the HBase API, along with coprocessors and 
> custom filters, results in performance on the order of milliseconds for small 
> queries, or seconds for tens of millions of rows.
> {code}
> It may simplify how our implementation reads/writes data from/to HBase, and make it 
> easy to build indexes and compose complex queries.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3482) Report NM available resources in heartbeat

2015-04-16 Thread Inigo Goiri (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14498926#comment-14498926
 ] 

Inigo Goiri commented on YARN-3482:
---

After some testing, I figured that it might be better to report the resources 
utilized by the whole machine.
With this and the containers' utilization, we can estimate the utilization of 
external processes.
That way, there's no need for an external interface, and the scheduler could 
make the right decisions using the node utilization.
Thoughts?
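
As a rough illustration of the idea (type and accessor names are assumptions, 
borrowing the ResourceUtilization notion discussed in YARN-3481):

{code}
// Sketch: derive what non-YARN processes consume from the two reported values.
ResourceUtilization node = nodeStatus.getNodeUtilization();              // whole machine
ResourceUtilization containers = nodeStatus.getContainersUtilization();  // sum over containers
float externalCpu = node.getCpu() - containers.getCpu();
int externalMemoryMB = node.getPhysicalMemoryMB() - containers.getPhysicalMemoryMB();
{code}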

> Report NM available resources in heartbeat
> --
>
> Key: YARN-3482
> URL: https://issues.apache.org/jira/browse/YARN-3482
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: nodemanager, resourcemanager
>Affects Versions: 2.7.0
>Reporter: Inigo Goiri
>   Original Estimate: 504h
>  Remaining Estimate: 504h
>
> NMs are usually collocated with other processes like HDFS, Impala or HBase. 
> To manage this scenario correctly, YARN should be aware of the actual 
> available resources. The proposal is to have an interface to dynamically 
> change the available resources and report this to the RM in every heartbeat.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3021) YARN's delegation-token handling disallows certain trust setups to operate properly over DistCp

2015-04-16 Thread Yongjun Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3021?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14498773#comment-14498773
 ] 

Yongjun Zhang commented on YARN-3021:
-

Thanks a lot [~jianhe]!


> YARN's delegation-token handling disallows certain trust setups to operate 
> properly over DistCp
> ---
>
> Key: YARN-3021
> URL: https://issues.apache.org/jira/browse/YARN-3021
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: security
>Affects Versions: 2.3.0
>Reporter: Harsh J
>Assignee: Yongjun Zhang
> Attachments: YARN-3021.001.patch, YARN-3021.002.patch, 
> YARN-3021.003.patch, YARN-3021.004.patch, YARN-3021.005.patch, 
> YARN-3021.006.patch, YARN-3021.007.patch, YARN-3021.007.patch, 
> YARN-3021.007.patch, YARN-3021.patch
>
>
> Consider this scenario of 3 realms: A, B and COMMON, where A trusts COMMON, 
> and B trusts COMMON (one way trusts both), and both A and B run HDFS + YARN 
> clusters.
> Now if one logs in with a COMMON credential, and runs a job on A's YARN that 
> needs to access B's HDFS (such as a DistCp), the operation fails in the RM, 
> as it attempts a renewDelegationToken(…) synchronously during application 
> submission (to validate the managed token before it adds it to a scheduler 
> for automatic renewal). The call obviously fails cause B realm will not trust 
> A's credentials (here, the RM's principal is the renewer).
> In the 1.x JobTracker the same call is present, but it is done asynchronously 
> and once the renewal attempt failed we simply ceased to schedule any further 
> attempts of renewals, rather than fail the job immediately.
> We should change the logic such that we attempt the renewal but go easy on 
> the failure and skip the scheduling alone, rather than bubble back an error 
> to the client, failing the app submission. This way the old behaviour is 
> retained.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-1402) Related Web UI, CLI changes on exposing client API to check log aggregation status

2015-04-16 Thread Xuan Gong (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1402?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuan Gong updated YARN-1402:

Attachment: YARN-1402.3.2.patch

> Related Web UI, CLI changes on exposing client API to check log aggregation 
> status
> --
>
> Key: YARN-1402
> URL: https://issues.apache.org/jira/browse/YARN-1402
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Xuan Gong
>Assignee: Xuan Gong
> Attachments: YARN-1402.1.patch, YARN-1402.2.patch, 
> YARN-1402.3.1.patch, YARN-1402.3.2.patch, YARN-1402.3.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3481) Report NM aggregated container resource utilization in heartbeat

2015-04-16 Thread Inigo Goiri (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3481?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14498760#comment-14498760
 ] 

Inigo Goiri commented on YARN-3481:
---

I started implementing this and I hit a philosophical issue: how to report the 
utilization?

Initially, I thought about using the Resource class (with the ResourceProto for 
transfer), but it has an issue with accuracy. If we have one container using 10% 
of a CPU, Resource only allows specifying 0 or 1 for the VCores.

There are multiple possible approaches to solve this issue:
* Modify Resource to define VCores as a double. The problem with this is that 
we'd need to change many interfaces.
* Modify Resource to store milliVCores internally and keep all the public 
interfaces in VCores.
* Create a new type called ResourceUtilization that would have a float instead 
of an int. We would use this new type to send utilization data. This new class 
would also be suitable for sending other utilizations like disk queue length, etc.
* Keep using Resource as is, but when working with utilization, put milliVCores 
there. In this case, we would have weird semantics for Resource, where 
sometimes we send milliVCores and other times we send VCores.
* Define 1 VCore as 0.001 CPUs in the cluster. The problem with this is that 
applications would have to change how many VCores they ask for.

Note that YARN-3122 is storing a metric called milliVCores for this.

I would like to see what people think is the best option. Ideas?
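
For concreteness, a tentative sketch of the third option (all names are placeholders, 
not an agreed API):

{code}
// Tentative sketch: a dedicated ResourceUtilization type with fractional CPU.
public class ResourceUtilization {
  private final int physicalMemoryMB;
  private final float cpu;   // fraction of vcores in use, e.g. 0.1f for 10% of one CPU

  public ResourceUtilization(int physicalMemoryMB, float cpu) {
    this.physicalMemoryMB = physicalMemoryMB;
    this.cpu = cpu;
  }

  public int getPhysicalMemoryMB() { return physicalMemoryMB; }
  public float getCpu() { return cpu; }
}
{code}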

> Report NM aggregated container resource utilization in heartbeat
> 
>
> Key: YARN-3481
> URL: https://issues.apache.org/jira/browse/YARN-3481
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: nodemanager, resourcemanager
>Affects Versions: 2.7.0
>Reporter: Inigo Goiri
>Priority: Minor
>   Original Estimate: 336h
>  Remaining Estimate: 336h
>
> To allow the RM take better scheduling decisions, it should be aware of the 
> actual utilization of the containers. The NM would aggregate the 
> ContainerMetrics and report it in every heartbeat.
> Related to YARN-1012 but aggregated to reduce the heartbeat overhead.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-1402) Related Web UI, CLI changes on exposing client API to check log aggregation status

2015-04-16 Thread Xuan Gong (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1402?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuan Gong updated YARN-1402:

Attachment: YARN-1402.3.1.patch

> Related Web UI, CLI changes on exposing client API to check log aggregation 
> status
> --
>
> Key: YARN-1402
> URL: https://issues.apache.org/jira/browse/YARN-1402
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Xuan Gong
>Assignee: Xuan Gong
> Attachments: YARN-1402.1.patch, YARN-1402.2.patch, 
> YARN-1402.3.1.patch, YARN-1402.3.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3493) RM fails to come up with error "Failed to load/recover state" when mem settings are changed

2015-04-16 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3493?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14498754#comment-14498754
 ] 

Hadoop QA commented on YARN-3493:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12725743/YARN-3493.2.patch
  against trunk revision 80a2a12.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The following test timeouts occurred in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:

org.apache.hadoop.yarn.server.resourcemanager.TestWorkPreservingRMRestart
org.apache.hadoop.yarn.server.resourcemanager.applicationsmanager.TestAMRestart

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/7365//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/7365//console

This message is automatically generated.

> RM fails to come up with error "Failed to load/recover state" when  mem 
> settings are changed
> 
>
> Key: YARN-3493
> URL: https://issues.apache.org/jira/browse/YARN-3493
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn
>Affects Versions: 2.7.0
>Reporter: Sumana Sathish
>Assignee: Jian He
>Priority: Critical
> Attachments: YARN-3493.1.patch, YARN-3493.2.patch, 
> yarn-yarn-resourcemanager.log.zip
>
>
> RM fails to come up for the following case:
> 1. Change yarn.nodemanager.resource.memory-mb and 
> yarn.scheduler.maximum-allocation-mb to 4000 in yarn-site.xml
> 2. Start a randomtextwriter job with mapreduce.map.memory.mb=4000 in 
> background and wait for the job to reach running state
> 3. Restore yarn-site.xml so that yarn.scheduler.maximum-allocation-mb is set back to 
> 2048 before the above job completes
> 4. Restart RM
> 5. RM fails to come up with the below error
> {code:title= RM error for Mem settings changed}
>  - RM app submission failed in validating AM resource request for application 
> application_1429094976272_0008
> org.apache.hadoop.yarn.exceptions.InvalidResourceRequestException: Invalid 
> resource request, requested memory < 0, or requested memory > max configured, 
> requestedMemory=3072, maxMemory=2048
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils.validateResourceRequest(SchedulerUtils.java:204)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.validateAndCreateResourceRequest(RMAppManager.java:385)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.createAndPopulateNewRMApp(RMAppManager.java:328)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recoverApplication(RMAppManager.java:317)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recover(RMAppManager.java:422)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.recover(ResourceManager.java:1187)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceStart(ResourceManager.java:574)
> at 
> org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.startActiveServices(ResourceManager.java:994)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1035)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1031)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:415)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.transitionToActive(ResourceManager.java:1031)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.serviceStart(ResourceManager.java:1071)
> at 
> org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)

[jira] [Commented] (YARN-1402) Related Web UI, CLI changes on exposing client API to check log aggregation status

2015-04-16 Thread Xuan Gong (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1402?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14498752#comment-14498752
 ] 

Xuan Gong commented on YARN-1402:
-

bq. if only one container (other containers haven't report for some reasons, 
e.g. pending on scheduled by RM) has a report to RM which completed, we 
shouldn't mark log aggregation as completed for this app. We should check app's 
status also?

Good point. We should also check whether the app is in a final state.

bq. Shall we return app's LA status still as running even some node LA (log 
aggregation) report already get failed. I don't think app's LA status can 
finish successfully later in this case. So may be return failed earlier could 
be a good choice?

Changed LogAggregationStatus.FINISHED to LogAggregationStatus.SUCCEEDED.
In my understanding, FAILED and SUCCEEDED are the final states, meaning that log 
aggregation has stopped. Based on whether the aggregated logs are complete or 
incomplete, we report either SUCCEEDED or FAILED.

bq. I think a better way to generate app's log aggregation status is not when 
do get() but when each node's LA report do update (in method of 
aggregateLogReport()). What do you think?

No need to do that. We only call getLogAggregationStatusForAppReport() when needed.
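
Roughly, the roll-up I have in mind looks like the following (sketch only; field and 
helper names are assumptions):

{code}
// Sketch: derive the app-level log aggregation status from per-node reports,
// never reporting a final status while the app itself is still running.
LogAggregationStatus getLogAggregationStatusForAppReport() {
  if (!isAppInFinalState()) {
    return LogAggregationStatus.RUNNING;
  }
  boolean anyFailed = false;
  for (LogAggregationReport report : perNodeReports.values()) {
    LogAggregationStatus status = report.getLogAggregationStatus();
    if (status == LogAggregationStatus.FAILED) {
      anyFailed = true;
    } else if (status != LogAggregationStatus.SUCCEEDED) {
      return LogAggregationStatus.RUNNING;   // some node is still aggregating
    }
  }
  return anyFailed ? LogAggregationStatus.FAILED : LogAggregationStatus.SUCCEEDED;
}
{code}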


> Related Web UI, CLI changes on exposing client API to check log aggregation 
> status
> --
>
> Key: YARN-1402
> URL: https://issues.apache.org/jira/browse/YARN-1402
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Xuan Gong
>Assignee: Xuan Gong
> Attachments: YARN-1402.1.patch, YARN-1402.2.patch, YARN-1402.3.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-1402) Related Web UI, CLI changes on exposing client API to check log aggregation status

2015-04-16 Thread Xuan Gong (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1402?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuan Gong updated YARN-1402:

Attachment: YARN-1402.3.patch

> Related Web UI, CLI changes on exposing client API to check log aggregation 
> status
> --
>
> Key: YARN-1402
> URL: https://issues.apache.org/jira/browse/YARN-1402
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Xuan Gong
>Assignee: Xuan Gong
> Attachments: YARN-1402.1.patch, YARN-1402.2.patch, YARN-1402.3.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3366) Outbound network bandwidth : classify/shape traffic originating from YARN containers

2015-04-16 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3366?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14498692#comment-14498692
 ] 

Hadoop QA commented on YARN-3366:
-

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12725923/YARN-3366.006.patch
  against trunk revision 80a2a12.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 3 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/7364//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/7364//console

This message is automatically generated.

> Outbound network bandwidth : classify/shape traffic originating from YARN 
> containers
> 
>
> Key: YARN-3366
> URL: https://issues.apache.org/jira/browse/YARN-3366
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Sidharta Seethana
>Assignee: Sidharta Seethana
> Attachments: YARN-3366.001.patch, YARN-3366.002.patch, 
> YARN-3366.003.patch, YARN-3366.004.patch, YARN-3366.005.patch, 
> YARN-3366.006.patch
>
>
> In order to be able to isolate based on/enforce outbound traffic bandwidth 
> limits, we need  a mechanism to classify/shape network traffic in the 
> nodemanager. For more information on the design, please see the attached 
> design document in the parent JIRA.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3021) YARN's delegation-token handling disallows certain trust setups to operate properly over DistCp

2015-04-16 Thread Jian He (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3021?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14498674#comment-14498674
 ] 

Jian He commented on YARN-3021:
---

[~yzhangal], I think the failure is not related. 
Patch looks good, +1. 
I'll commit this today if there are no comments from others.

> YARN's delegation-token handling disallows certain trust setups to operate 
> properly over DistCp
> ---
>
> Key: YARN-3021
> URL: https://issues.apache.org/jira/browse/YARN-3021
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: security
>Affects Versions: 2.3.0
>Reporter: Harsh J
>Assignee: Yongjun Zhang
> Attachments: YARN-3021.001.patch, YARN-3021.002.patch, 
> YARN-3021.003.patch, YARN-3021.004.patch, YARN-3021.005.patch, 
> YARN-3021.006.patch, YARN-3021.007.patch, YARN-3021.007.patch, 
> YARN-3021.007.patch, YARN-3021.patch
>
>
> Consider this scenario of 3 realms: A, B and COMMON, where A trusts COMMON, 
> and B trusts COMMON (one way trusts both), and both A and B run HDFS + YARN 
> clusters.
> Now if one logs in with a COMMON credential, and runs a job on A's YARN that 
> needs to access B's HDFS (such as a DistCp), the operation fails in the RM, 
> as it attempts a renewDelegationToken(…) synchronously during application 
> submission (to validate the managed token before it adds it to a scheduler 
> for automatic renewal). The call obviously fails cause B realm will not trust 
> A's credentials (here, the RM's principal is the renewer).
> In the 1.x JobTracker the same call is present, but it is done asynchronously 
> and once the renewal attempt failed we simply ceased to schedule any further 
> attempts of renewals, rather than fail the job immediately.
> We should change the logic such that we attempt the renewal but go easy on 
> the failure and skip the scheduling alone, rather than bubble back an error 
> to the client, failing the app submission. This way the old behaviour is 
> retained.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3494) Expose AM resource limit and user limit in QueueMetrics

2015-04-16 Thread Jian He (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3494?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jian He updated YARN-3494:
--
Issue Type: Improvement  (was: Bug)

> Expose AM resource limit and user limit in QueueMetrics 
> 
>
> Key: YARN-3494
> URL: https://issues.apache.org/jira/browse/YARN-3494
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Jian He
>Assignee: Rohith
>
> Now we have the AM resource limit and user limit shown on the web UI, it 
> would be useful to expose them in the QueueMetrics as well.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3494) Expose AM resource limit and user limit in QueueMetrics

2015-04-16 Thread Jian He (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3494?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14498662#comment-14498662
 ] 

Jian He commented on YARN-3494:
---

I think the AM resource usage and limit can be in QueueMetrics, since FS also 
has the AM resource limit notion.
We can put the user usage and limit in CSQueueMetrics, similarly to how FS has the 
FSQueueMetrics class for its FS-specific queue metrics.

> Expose AM resource limit and user limit in QueueMetrics 
> 
>
> Key: YARN-3494
> URL: https://issues.apache.org/jira/browse/YARN-3494
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Jian He
>Assignee: Rohith
>
> Now we have the AM resource limit and user limit shown on the web UI, it 
> would be useful to expose them in the QueueMetrics as well.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3463) Integrate OrderingPolicy Framework with CapacityScheduler

2015-04-16 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3463?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14498661#comment-14498661
 ] 

Hadoop QA commented on YARN-3463:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12725950/YARN-3463.67.patch
  against trunk revision 80a2a12.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 3 new 
or modified test files.

  {color:red}-1 javac{color}.  The applied patch generated 1154 javac 
compiler warnings (more than the trunk's current 1153 warnings).

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:

  org.apache.hadoop.yarn.server.resourcemanager.TestRMRestart
  
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.TestAllocationFileLoaderService

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/7363//testReport/
Javac warnings: 
https://builds.apache.org/job/PreCommit-YARN-Build/7363//artifact/patchprocess/diffJavacWarnings.txt
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/7363//console

This message is automatically generated.

> Integrate OrderingPolicy Framework with CapacityScheduler
> -
>
> Key: YARN-3463
> URL: https://issues.apache.org/jira/browse/YARN-3463
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: capacityscheduler
>Reporter: Craig Welch
>Assignee: Craig Welch
> Attachments: YARN-3463.50.patch, YARN-3463.61.patch, 
> YARN-3463.64.patch, YARN-3463.65.patch, YARN-3463.66.patch, YARN-3463.67.patch
>
>
> Integrate the OrderingPolicy Framework with the CapacityScheduler



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3463) Integrate OrderingPolicy Framework with CapacityScheduler

2015-04-16 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3463?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14498603#comment-14498603
 ] 

Wangda Tan commented on YARN-3463:
--

bq. No, applications still start as they did before, no reason to change it
Maybe not; for example, when the priority comparator is enabled, which apps to 
activate could be determined by the priority of the app instead of submission order. But 
it's a future requirement, so we can postpone the changes until they're really 
needed.

bq. No, this comes from the interface definition and it's needed to enable 
another scheduler, say FS, be able to use the same code with their derived 
application types of choice
I don't think FS will use the CapacitySchedulerConfiguration, but it's not a big 
deal; let's keep it as-is.

bq. So this configuration is not for defining the policy, that is 
"ORDERING_POLICY" (which is where you would have "fair", "fifo", etc), this is 
for configuration elements which may modify the behavior of the policy, such as 
sizeBasedWeight, & it's needed for that purpose.
Hmm.. that could be a problem after taking a look at YARN-3319. IMO, configuring the 
OrderingPolicy should follow the style in which we configure other modules, which 
is via capacity-scheduler.xml in CS and fair-scheduler.xml in FS. For 
individual options such as sizeBasedWeight, it's better to make each a 
separate option instead of putting all of them together. And the 
configure(String config) method on OrderingPolicy could be problematic (I missed this 
point while reviewing YARN-3318); how about changing it to take a Map so we can 
explicitly pass option key=value pairs to configure the OrderingPolicy?
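
For illustration, the suggested signature could look roughly like this (sketch only, 
and the option key name is just an example):

{code}
import java.util.Map;

// Sketch: configure the policy from explicit key/value options parsed out of
// capacity-scheduler.xml, instead of one opaque configuration string.
public interface OrderingPolicy<S> {
  void configure(Map<String, String> options);   // e.g. {"size-based-weight" -> "true"}
}
{code}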

bq. I'm actually not getting any there, I think because this is used for mocking
I can see them while using Eclipse; you can suppress them to avoid the javac warning.

> Integrate OrderingPolicy Framework with CapacityScheduler
> -
>
> Key: YARN-3463
> URL: https://issues.apache.org/jira/browse/YARN-3463
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: capacityscheduler
>Reporter: Craig Welch
>Assignee: Craig Welch
> Attachments: YARN-3463.50.patch, YARN-3463.61.patch, 
> YARN-3463.64.patch, YARN-3463.65.patch, YARN-3463.66.patch, YARN-3463.67.patch
>
>
> Integrate the OrderingPolicy Framework with the CapacityScheduler



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3366) Outbound network bandwidth : classify/shape traffic originating from YARN containers

2015-04-16 Thread Sidharta Seethana (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3366?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14498601#comment-14498601
 ] 

Sidharta Seethana commented on YARN-3366:
-

None of the -1s are valid for this patch. It appears that the build was triggered 
incorrectly. This patch has no RM changes at all.

> Outbound network bandwidth : classify/shape traffic originating from YARN 
> containers
> 
>
> Key: YARN-3366
> URL: https://issues.apache.org/jira/browse/YARN-3366
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Sidharta Seethana
>Assignee: Sidharta Seethana
> Attachments: YARN-3366.001.patch, YARN-3366.002.patch, 
> YARN-3366.003.patch, YARN-3366.004.patch, YARN-3366.005.patch, 
> YARN-3366.006.patch
>
>
> In order to be able to isolate based on/enforce outbound traffic bandwidth 
> limits, we need  a mechanism to classify/shape network traffic in the 
> nodemanager. For more information on the design, please see the attached 
> design document in the parent JIRA.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3021) YARN's delegation-token handling disallows certain trust setups to operate properly over DistCp

2015-04-16 Thread Yongjun Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3021?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14498565#comment-14498565
 ] 

Yongjun Zhang commented on YARN-3021:
-

I don't see 
{code}
java.lang.AssertionError: AppAttempt state is not correct (timedout) 
expected: but was:
at org.junit.Assert.fail(Assert.java:88)
{code}
reported in YARN-2483 here, so the failure here may be for a different reason.

The same patch passed in a previous jenkins run, which indicates some flakiness 
in the failed test. I will kick off another jenkins run.


> YARN's delegation-token handling disallows certain trust setups to operate 
> properly over DistCp
> ---
>
> Key: YARN-3021
> URL: https://issues.apache.org/jira/browse/YARN-3021
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: security
>Affects Versions: 2.3.0
>Reporter: Harsh J
>Assignee: Yongjun Zhang
> Attachments: YARN-3021.001.patch, YARN-3021.002.patch, 
> YARN-3021.003.patch, YARN-3021.004.patch, YARN-3021.005.patch, 
> YARN-3021.006.patch, YARN-3021.007.patch, YARN-3021.007.patch, 
> YARN-3021.007.patch, YARN-3021.patch
>
>
> Consider this scenario of 3 realms: A, B and COMMON, where A trusts COMMON, 
> and B trusts COMMON (one way trusts both), and both A and B run HDFS + YARN 
> clusters.
> Now if one logs in with a COMMON credential, and runs a job on A's YARN that 
> needs to access B's HDFS (such as a DistCp), the operation fails in the RM, 
> as it attempts a renewDelegationToken(…) synchronously during application 
> submission (to validate the managed token before it adds it to a scheduler 
> for automatic renewal). The call obviously fails cause B realm will not trust 
> A's credentials (here, the RM's principal is the renewer).
> In the 1.x JobTracker the same call is present, but it is done asynchronously 
> and once the renewal attempt failed we simply ceased to schedule any further 
> attempts of renewals, rather than fail the job immediately.
> We should change the logic such that we attempt the renewal but go easy on 
> the failure and skip the scheduling alone, rather than bubble back an error 
> to the client, failing the app submission. This way the old behaviour is 
> retained.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2740) ResourceManager side should properly handle node label modifications when distributed node label configuration enabled

2015-04-16 Thread Naganarasimha G R (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2740?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14498528#comment-14498528
 ] 

Naganarasimha G R commented on YARN-2740:
-

Thanks for the review [~wangda],
bq. Should we throw exception when distributedConfiguration enabled for 
removeClusterNodeLabels? remove will change labels on node, after removed, node 
heartbeat with the removed partition will be identified as error, it seems 
reasonable to me. Admin should be able to control "valid-partitions" in the 
cluster.
I didn't completely get your opinion about throwing an exception when 
distributed configuration is enabled for removeClusterNodeLabels; did you want to 
throw? If you want to allow the admin to remove cluster node labels, then there is 
one case where I can see a potential problem: assume the NM informs the RM of a valid 
node label "x" through heartbeat/registration, and then the admin removes x from the 
cluster node labels. This is not communicated back to the NM, and the NM will not send 
labels as part of the heartbeat *unless there is a change in labels on the NM side*, 
so the NM is not aware of "x" being removed at all. I agree we need to allow the admin 
to control valid partitions, but in that case we need to add some logic in the RM to 
request that the NM resubmit its labels. Please provide your views. 
Will correct the other issues as part of the next patch.
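
One possible shape for that RM-side logic, purely as a sketch (every name here is 
hypothetical, including the response flag):

{code}
// Hypothetical sketch: when the RM removes a cluster node label, ask the NMs
// that last reported it to resend their labels on the next heartbeat.
for (NodeId nodeId : nodesThatReported(removedLabel)) {        // hypothetical lookup
  NodeHeartbeatResponse response = pendingResponseFor(nodeId); // hypothetical accessor
  response.setResendNodeLabelsRequested(true);                 // hypothetical flag
}
{code}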

> ResourceManager side should properly handle node label modifications when 
> distributed node label configuration enabled
> --
>
> Key: YARN-2740
> URL: https://issues.apache.org/jira/browse/YARN-2740
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Wangda Tan
>Assignee: Naganarasimha G R
> Fix For: 2.8.0
>
> Attachments: YARN-2740-20141024-1.patch, YARN-2740.20150320-1.patch, 
> YARN-2740.20150327-1.patch, YARN-2740.20150411-1.patch, 
> YARN-2740.20150411-2.patch, YARN-2740.20150411-3.patch
>
>
> According to YARN-2495, when distributed node label configuration is enabled:
> - RMAdmin / REST API should reject change labels on node operations.
> - CommonNodeLabelsManager shouldn't persist labels on nodes when NM do 
> heartbeat.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3136) getTransferredContainers can be a bottleneck during AM registration

2015-04-16 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3136?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14498524#comment-14498524
 ] 

Hadoop QA commented on YARN-3136:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12725922/00011-YARN-3136.patch
  against trunk revision 1fa8075.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:red}-1 tests included{color}.  The patch doesn't appear to include 
any new or modified tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:red}-1 findbugs{color}.  The patch appears to introduce 2 new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:

  
org.apache.hadoop.yarn.server.resourcemanager.TestApplicationACLs
  
org.apache.hadoop.yarn.server.resourcemanager.security.TestAMRMTokens
  
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.TestAllocationFileLoaderService
  
org.apache.hadoop.yarn.server.resourcemanager.TestKillApplicationWithRMHA
  
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.TestFairSchedulerQueueACLs
  org.apache.hadoop.yarn.server.resourcemanager.TestRMRestart
  
org.apache.hadoop.yarn.server.resourcemanager.TestSubmitApplicationWithRMHA
  
org.apache.hadoop.yarn.server.resourcemanager.TestAMAuthorization

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/7360//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-YARN-Build/7360//artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/7360//console

This message is automatically generated.

> getTransferredContainers can be a bottleneck during AM registration
> ---
>
> Key: YARN-3136
> URL: https://issues.apache.org/jira/browse/YARN-3136
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: scheduler
>Affects Versions: 2.6.0
>Reporter: Jason Lowe
>Assignee: Sunil G
> Attachments: 0001-YARN-3136.patch, 00010-YARN-3136.patch, 
> 00011-YARN-3136.patch, 0002-YARN-3136.patch, 0003-YARN-3136.patch, 
> 0004-YARN-3136.patch, 0005-YARN-3136.patch, 0006-YARN-3136.patch, 
> 0007-YARN-3136.patch, 0008-YARN-3136.patch, 0009-YARN-3136.patch
>
>
> While examining RM stack traces on a busy cluster I noticed a pattern of AMs 
> stuck waiting for the scheduler lock trying to call getTransferredContainers. 
>  The scheduler lock is highly contended, especially on a large cluster with 
> many nodes heartbeating, and it would be nice if we could find a way to 
> eliminate the need to grab this lock during this call.  We've already done 
> similar work during AM allocate calls to make sure they don't needlessly grab 
> the scheduler lock, and it would be good to do so here as well, if possible.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3463) Integrate OrderingPolicy Framework with CapacityScheduler

2015-04-16 Thread Craig Welch (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3463?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Craig Welch updated YARN-3463:
--
Attachment: YARN-3463.67.patch

rm unneeded capached change

> Integrate OrderingPolicy Framework with CapacityScheduler
> -
>
> Key: YARN-3463
> URL: https://issues.apache.org/jira/browse/YARN-3463
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: capacityscheduler
>Reporter: Craig Welch
>Assignee: Craig Welch
> Attachments: YARN-3463.50.patch, YARN-3463.61.patch, 
> YARN-3463.64.patch, YARN-3463.65.patch, YARN-3463.66.patch, YARN-3463.67.patch
>
>
> Integrate the OrderingPolicy Framework with the CapacityScheduler



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3366) Outbound network bandwidth : classify/shape traffic originating from YARN containers

2015-04-16 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3366?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14498510#comment-14498510
 ] 

Hadoop QA commented on YARN-3366:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12725923/YARN-3366.006.patch
  against trunk revision 1fa8075.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:red}-1 tests included{color}.  The patch doesn't appear to include 
any new or modified tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:red}-1 findbugs{color}.  The patch appears to introduce 2 new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:

  
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestCapacitySchedulerQueueACLs
  
org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesFairScheduler
  
org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesApps
  
org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesNodes
  
org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesAppsModification
  
org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesCapacitySched
  
org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServices
  
org.apache.hadoop.yarn.server.resourcemanager.TestKillApplicationWithRMHA
  
org.apache.hadoop.yarn.server.resourcemanager.recovery.TestZKRMStateStoreZKClientConnections
  
org.apache.hadoop.yarn.server.resourcemanager.recovery.TestZKRMStateStore
  
org.apache.hadoop.yarn.server.resourcemanager.recovery.TestLeveldbRMStateStore
  
org.apache.hadoop.yarn.server.resourcemanager.recovery.TestZKRMStateStorePerf
  
org.apache.hadoop.yarn.server.resourcemanager.security.TestClientToAMTokens
  
org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesDelegationTokens
  org.apache.hadoop.yarn.server.resourcemanager.TestRMRestart
  
org.apache.hadoop.yarn.server.resourcemanager.TestRMAdminService
  
org.apache.hadoop.yarn.server.resourcemanager.TestSubmitApplicationWithRMHA
  
org.apache.hadoop.yarn.server.resourcemanager.TestClientRMTokens
  
org.apache.hadoop.yarn.server.resourcemanager.recovery.TestFSRMStateStore

  The following test timeouts occurred in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:

org.apache.hadoop.yarn.server.resourcemanager.TestSubmitApplicationWithRMHA
org.apache.hadoop.yarn.server.resourcemanager.TestRMRestart
org.apache.hadoop.yarn.server.resourcemanager.TestAMAuthorization
org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesApps

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/7359//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-YARN-Build/7359//artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/7359//console

This message is automatically generated.

> Outbound network bandwidth : classify/shape traffic originating from YARN 
> containers
> 
>
> Key: YARN-3366
> URL: https://issues.apache.org/jira/browse/YARN-3366
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Sidharta Seethana
>Assignee: Sidharta Seethana
> Attachments: YARN-3366.001.patch, YARN-3366.002.patch, 
> YARN-3366.003.patch, YARN-3366.004.patch, YARN-3366.005.patch, 
> YARN-3366.006.patch
>
>
> In order to be able to isolate based on/enforce outbound traffic bandwidth 
> limits, we need  a mechanism to classify/shape network traffic in the 
> nodemanager. For more information on the design, please see the attached 
> design document in the parent JIRA.

[jira] [Commented] (YARN-2696) Queue sorting in CapacityScheduler should consider node label

2015-04-16 Thread Jian He (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2696?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14498502#comment-14498502
 ] 

Jian He commented on YARN-2696:
---

Could we add a validation for clusterResource/queueCapacity updates when a 
container is released?

> Queue sorting in CapacityScheduler should consider node label
> -
>
> Key: YARN-2696
> URL: https://issues.apache.org/jira/browse/YARN-2696
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: capacityscheduler, resourcemanager
>Reporter: Wangda Tan
>Assignee: Wangda Tan
> Attachments: YARN-2696.1.patch, YARN-2696.2.patch, YARN-2696.3.patch
>
>
> In the past, when trying to allocate containers under a parent queue in 
> CapacityScheduler, the parent queue would choose child queues by their used 
> resource, from smallest to largest. 
> Now that we support node labels in CapacityScheduler, we should also consider 
> the used resource in child queues by node label when allocating resources.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2976) Invalid docs for specifying yarn.nodemanager.docker-container-executor.exec-name

2015-04-16 Thread Hitesh Shah (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2976?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14498493#comment-14498493
 ] 

Hitesh Shah commented on YARN-2976:
---

Agreed. The newer one would take precedence.

> Invalid docs for specifying 
> yarn.nodemanager.docker-container-executor.exec-name
> 
>
> Key: YARN-2976
> URL: https://issues.apache.org/jira/browse/YARN-2976
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: documentation
>Affects Versions: 2.6.0
>Reporter: Hitesh Shah
>Assignee: Vijay Bhat
>Priority: Minor
>
> Docs on 
> http://hadoop.apache.org/docs/stable/hadoop-yarn/hadoop-yarn-site/DockerContainerExecutor.html
>  mention setting "docker -H=tcp://0.0.0.0:4243" for 
> yarn.nodemanager.docker-container-executor.exec-name. 
> However, the actual implementation does a fileExists for the specified value. 
> Either the docs need to be fixed or the impl changed to allow relative paths 
> or commands with additional args



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3463) Integrate OrderingPolicy Framework with CapacityScheduler

2015-04-16 Thread Craig Welch (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3463?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14498492#comment-14498492
 ] 

Craig Welch commented on YARN-3463:
---

bq. should we make pendingApplications to use customized comparator
No, applications still start as they did before; there is no reason to change it.

bq. can we make it not use generic type now for simpler
No, this comes from the interface definition, and it's needed to enable another 
scheduler, say FS, to use the same code with its derived application types of 
choice.

bq. I think we may carefully add ORDERING_POLICY_CONFIG, since this will be a 
public config. I understand the reason to add the policy_config is to support 
policy=fair, config=fair+fifo usecase
This configuration is not for defining the policy; that is "ORDERING_POLICY" 
(which is where you would have "fair", "fifo", etc.). It is for configuration 
elements that may modify the behavior of the policy, such as sizeBasedWeight, 
and it's needed for that purpose.

bq. Why change disposableLeafQueue.getApplications().size()
I think this is left over from earlier changes & is no longer needed, will 
remove.

bq. Suppress generic warning? Javac warning?
I'm actually not getting any there; I think that's because this is used for mocking.

> Integrate OrderingPolicy Framework with CapacityScheduler
> -
>
> Key: YARN-3463
> URL: https://issues.apache.org/jira/browse/YARN-3463
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: capacityscheduler
>Reporter: Craig Welch
>Assignee: Craig Welch
> Attachments: YARN-3463.50.patch, YARN-3463.61.patch, 
> YARN-3463.64.patch, YARN-3463.65.patch, YARN-3463.66.patch
>
>
> Integrate the OrderingPolicy Framework with the CapacityScheduler



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3021) YARN's delegation-token handling disallows certain trust setups to operate properly over DistCp

2015-04-16 Thread Yongjun Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3021?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14498488#comment-14498488
 ] 

Yongjun Zhang commented on YARN-3021:
-

The test failure is likely YARN-2483.


> YARN's delegation-token handling disallows certain trust setups to operate 
> properly over DistCp
> ---
>
> Key: YARN-3021
> URL: https://issues.apache.org/jira/browse/YARN-3021
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: security
>Affects Versions: 2.3.0
>Reporter: Harsh J
>Assignee: Yongjun Zhang
> Attachments: YARN-3021.001.patch, YARN-3021.002.patch, 
> YARN-3021.003.patch, YARN-3021.004.patch, YARN-3021.005.patch, 
> YARN-3021.006.patch, YARN-3021.007.patch, YARN-3021.007.patch, 
> YARN-3021.007.patch, YARN-3021.patch
>
>
> Consider this scenario of 3 realms: A, B and COMMON, where A trusts COMMON, 
> and B trusts COMMON (one way trusts both), and both A and B run HDFS + YARN 
> clusters.
> Now if one logs in with a COMMON credential, and runs a job on A's YARN that 
> needs to access B's HDFS (such as a DistCp), the operation fails in the RM, 
> as it attempts a renewDelegationToken(…) synchronously during application 
> submission (to validate the managed token before it adds it to a scheduler 
> for automatic renewal). The call obviously fails because B realm will not trust 
> A's credentials (here, the RM's principal is the renewer).
> In the 1.x JobTracker the same call is present, but it is done asynchronously 
> and once the renewal attempt failed we simply ceased to schedule any further 
> attempts of renewals, rather than fail the job immediately.
> We should change the logic such that we attempt the renewal but go easy on 
> the failure and skip the scheduling alone, rather than bubble back an error 
> to the client, failing the app submission. This way the old behaviour is 
> retained.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3497) ContainerManagementProtocolProxy modifies IPC timeout conf without making a copy

2015-04-16 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3497?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14498486#comment-14498486
 ] 

Hadoop QA commented on YARN-3497:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12725933/YARN-3497.001.patch
  against trunk revision 1fa8075.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:red}-1 tests included{color}.  The patch doesn't appear to include 
any new or modified tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/7362//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/7362//console

This message is automatically generated.

> ContainerManagementProtocolProxy modifies IPC timeout conf without making a 
> copy
> 
>
> Key: YARN-3497
> URL: https://issues.apache.org/jira/browse/YARN-3497
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: client
>Affects Versions: 2.6.0
>Reporter: Jason Lowe
>Assignee: Jason Lowe
> Attachments: YARN-3497.001.patch
>
>
> yarn-client's ContainerManagementProtocolProxy is updating 
> ipc.client.connection.maxidletime in the conf passed in without making a copy 
> of it. That modification "leaks" into other systems using the same conf and 
> can cause them to setup RPC connections with a timeout of zero as well.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2976) Invalid docs for specifying yarn.nodemanager.docker-container-executor.exec-name

2015-04-16 Thread Vijay Bhat (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2976?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14498487#comment-14498487
 ] 

Vijay Bhat commented on YARN-2976:
--

[~hitesh], yes, that makes sense. I can call the new config property 
yarn.nodemanager.docker-container-executor.exec-cmd. I would think that this 
setting should take precedence over 
yarn.nodemanager.docker-container-executor.exec-name in case both are present; 
what do you think?
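
A minimal sketch of the precedence being proposed (the "exec-cmd" property name 
is only a suggestion from this thread, and the snippet is illustrative rather 
than taken from any attached patch):

{code}
// Illustrative only: prefer the proposed command-style property when set,
// otherwise fall back to the existing exec-name property.
String dockerExecutor =
    conf.get("yarn.nodemanager.docker-container-executor.exec-cmd");
if (dockerExecutor == null || dockerExecutor.trim().isEmpty()) {
  dockerExecutor =
      conf.get("yarn.nodemanager.docker-container-executor.exec-name");
}
{code}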

> Invalid docs for specifying 
> yarn.nodemanager.docker-container-executor.exec-name
> 
>
> Key: YARN-2976
> URL: https://issues.apache.org/jira/browse/YARN-2976
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: documentation
>Affects Versions: 2.6.0
>Reporter: Hitesh Shah
>Assignee: Vijay Bhat
>Priority: Minor
>
> Docs on 
> http://hadoop.apache.org/docs/stable/hadoop-yarn/hadoop-yarn-site/DockerContainerExecutor.html
>  mention setting "docker -H=tcp://0.0.0.0:4243" for 
> yarn.nodemanager.docker-container-executor.exec-name. 
> However, the actual implementation does a fileExists for the specified value. 
> Either the docs need to be fixed or the impl changed to allow relative paths 
> or commands with additional args



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3448) Add Rolling Time To Lives Level DB Plugin Capabilities

2015-04-16 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3448?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14498480#comment-14498480
 ] 

Hadoop QA commented on YARN-3448:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12725932/YARN-3448.10.patch
  against trunk revision 1fa8075.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 4 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:red}-1 findbugs{color}.  The patch appears to introduce 1 new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/7361//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-YARN-Build/7361//artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-applicationhistoryservice.html
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/7361//console

This message is automatically generated.

> Add Rolling Time To Lives Level DB Plugin Capabilities
> --
>
> Key: YARN-3448
> URL: https://issues.apache.org/jira/browse/YARN-3448
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Jonathan Eagles
>Assignee: Jonathan Eagles
> Attachments: YARN-3448.1.patch, YARN-3448.10.patch, 
> YARN-3448.2.patch, YARN-3448.3.patch, YARN-3448.4.patch, YARN-3448.5.patch, 
> YARN-3448.7.patch, YARN-3448.8.patch, YARN-3448.9.patch
>
>
> For large applications, the majority of the time in LeveldbTimelineStore is 
> spent deleting old entities one record at a time. An exclusive write lock is 
> held during the entire deletion phase, which in practice can be hours. If we 
> are to relax some of the consistency constraints, other performance enhancing 
> techniques can be employed to maximize the throughput and minimize locking 
> time.
> Split the 5 sections of the leveldb database (domain, owner, start time, 
> entity, index) into 5 separate databases. This allows each database to 
> maximize the read cache effectiveness based on the unique usage patterns of 
> each database. With 5 separate databases each lookup is much faster. This can 
> also help with I/O to have the entity and index databases on separate disks.
> Rolling DBs for entity and index DBs. 99.9% of the data are in these two 
> sections, with at least a 4:1 ratio (index to entity) for Tez. We replace DB 
> record removal with file system removal if we create a rolling set of 
> databases that age out and can be efficiently removed. To do this we must 
> place a constraint to always place an entity's events into its correct 
> rolling db instance based on start time. This allows us to stitch the data 
> back together while reading and to do artificial paging.
> Relax the synchronous writes constraints. If we are willing to accept losing 
> some records that were not flushed by the operating system during a crash, we 
> can use async writes that can be much faster.
> Prefer sequential writes. Sequential writes can be several times faster than 
> random writes. Spend some small effort arranging the writes in such a way 
> that will trend towards sequential write performance over random write 
> performance.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2976) Invalid docs for specifying yarn.nodemanager.docker-container-executor.exec-name

2015-04-16 Thread Hitesh Shah (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2976?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14498470#comment-14498470
 ] 

Hitesh Shah commented on YARN-2976:
---

Typo: meant to say that it *does* clash with the current config property name. 

> Invalid docs for specifying 
> yarn.nodemanager.docker-container-executor.exec-name
> 
>
> Key: YARN-2976
> URL: https://issues.apache.org/jira/browse/YARN-2976
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: documentation
>Affects Versions: 2.6.0
>Reporter: Hitesh Shah
>Assignee: Vijay Bhat
>Priority: Minor
>
> Docs on 
> http://hadoop.apache.org/docs/stable/hadoop-yarn/hadoop-yarn-site/DockerContainerExecutor.html
>  mention setting "docker -H=tcp://0.0.0.0:4243" for 
> yarn.nodemanager.docker-container-executor.exec-name. 
> However, the actual implementation does a fileExists for the specified value. 
> Either the docs need to be fixed or the impl changed to allow relative paths 
> or commands with additional args



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2976) Invalid docs for specifying yarn.nodemanager.docker-container-executor.exec-name

2015-04-16 Thread Hitesh Shah (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2976?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14498466#comment-14498466
 ] 

Hitesh Shah commented on YARN-2976:
---

The latter definitely makes more sense but it does not clash with the config 
property name. Maybe we can deprecate the old one in favor of the newer config 
property which supports a flexible command (relative path, args, etc)? 
For the old/current one, we can fix the docs to say that it does a file-exists 
check and does not support additional args? 

> Invalid docs for specifying 
> yarn.nodemanager.docker-container-executor.exec-name
> 
>
> Key: YARN-2976
> URL: https://issues.apache.org/jira/browse/YARN-2976
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: documentation
>Affects Versions: 2.6.0
>Reporter: Hitesh Shah
>Assignee: Vijay Bhat
>Priority: Minor
>
> Docs on 
> http://hadoop.apache.org/docs/stable/hadoop-yarn/hadoop-yarn-site/DockerContainerExecutor.html
>  mention setting "docker -H=tcp://0.0.0.0:4243" for 
> yarn.nodemanager.docker-container-executor.exec-name. 
> However, the actual implementation does a fileExists for the specified value. 
> Either the docs need to be fixed or the impl changed to allow relative paths 
> or commands with additional args



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3021) YARN's delegation-token handling disallows certain trust setups to operate properly over DistCp

2015-04-16 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3021?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14498416#comment-14498416
 ] 

Hadoop QA commented on YARN-3021:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12725911/YARN-3021.007.patch
  against trunk revision 1fa8075.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:

  
org.apache.hadoop.yarn.server.resourcemanager.applicationsmanager.TestAMRestart

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/7358//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/7358//console

This message is automatically generated.

> YARN's delegation-token handling disallows certain trust setups to operate 
> properly over DistCp
> ---
>
> Key: YARN-3021
> URL: https://issues.apache.org/jira/browse/YARN-3021
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: security
>Affects Versions: 2.3.0
>Reporter: Harsh J
>Assignee: Yongjun Zhang
> Attachments: YARN-3021.001.patch, YARN-3021.002.patch, 
> YARN-3021.003.patch, YARN-3021.004.patch, YARN-3021.005.patch, 
> YARN-3021.006.patch, YARN-3021.007.patch, YARN-3021.007.patch, 
> YARN-3021.007.patch, YARN-3021.patch
>
>
> Consider this scenario of 3 realms: A, B and COMMON, where A trusts COMMON, 
> and B trusts COMMON (one way trusts both), and both A and B run HDFS + YARN 
> clusters.
> Now if one logs in with a COMMON credential, and runs a job on A's YARN that 
> needs to access B's HDFS (such as a DistCp), the operation fails in the RM, 
> as it attempts a renewDelegationToken(…) synchronously during application 
> submission (to validate the managed token before it adds it to a scheduler 
> for automatic renewal). The call obviously fails because B realm will not trust 
> A's credentials (here, the RM's principal is the renewer).
> In the 1.x JobTracker the same call is present, but it is done asynchronously 
> and once the renewal attempt failed we simply ceased to schedule any further 
> attempts of renewals, rather than fail the job immediately.
> We should change the logic such that we attempt the renewal but go easy on 
> the failure and skip the scheduling alone, rather than bubble back an error 
> to the client, failing the app submission. This way the old behaviour is 
> retained.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3497) ContainerManagementProtocolProxy modifies IPC timeout conf without making a copy

2015-04-16 Thread Jason Lowe (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3497?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Lowe updated YARN-3497:
-
Attachment: YARN-3497.001.patch

This causes the Tez AM to reconnect to the RM on every AM heartbeat.

Patch to create a copy of the configuration when initializing the proxy cache.
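
A minimal sketch of the defensive-copy idea described above, assuming Hadoop's 
Configuration copy constructor; this is illustrative and not the literal patch:

{code}
// Copy the incoming conf before overriding the IPC idle timeout so the
// change does not leak into other users of the same Configuration object.
Configuration copy = new Configuration(conf);
copy.setInt("ipc.client.connection.maxidletime", 0);
this.conf = copy;
{code}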

> ContainerManagementProtocolProxy modifies IPC timeout conf without making a 
> copy
> 
>
> Key: YARN-3497
> URL: https://issues.apache.org/jira/browse/YARN-3497
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: client
>Affects Versions: 2.6.0
>Reporter: Jason Lowe
>Assignee: Jason Lowe
> Attachments: YARN-3497.001.patch
>
>
> yarn-client's ContainerManagementProtocolProxy is updating 
> ipc.client.connection.maxidletime in the conf passed in without making a copy 
> of it. That modification "leaks" into other systems using the same conf and 
> can cause them to setup RPC connections with a timeout of zero as well.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3448) Add Rolling Time To Lives Level DB Plugin Capabilities

2015-04-16 Thread Jonathan Eagles (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3448?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Eagles updated YARN-3448:
--
Attachment: YARN-3448.10.patch

> Add Rolling Time To Lives Level DB Plugin Capabilities
> --
>
> Key: YARN-3448
> URL: https://issues.apache.org/jira/browse/YARN-3448
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Jonathan Eagles
>Assignee: Jonathan Eagles
> Attachments: YARN-3448.1.patch, YARN-3448.10.patch, 
> YARN-3448.2.patch, YARN-3448.3.patch, YARN-3448.4.patch, YARN-3448.5.patch, 
> YARN-3448.7.patch, YARN-3448.8.patch, YARN-3448.9.patch
>
>
> For large applications, the majority of the time in LeveldbTimelineStore is 
> spent deleting old entities one record at a time. An exclusive write lock is 
> held during the entire deletion phase, which in practice can be hours. If we 
> are to relax some of the consistency constraints, other performance enhancing 
> techniques can be employed to maximize the throughput and minimize locking 
> time.
> Split the 5 sections of the leveldb database (domain, owner, start time, 
> entity, index) into 5 separate databases. This allows each database to 
> maximize the read cache effectiveness based on the unique usage patterns of 
> each database. With 5 separate databases each lookup is much faster. This can 
> also help with I/O to have the entity and index databases on separate disks.
> Rolling DBs for entity and index DBs. 99.9% of the data are in these two 
> sections, with at least a 4:1 ratio (index to entity) for Tez. We replace DB 
> record removal with file system removal if we create a rolling set of 
> databases that age out and can be efficiently removed. To do this we must 
> place a constraint to always place an entity's events into its correct 
> rolling db instance based on start time. This allows us to stitch the data 
> back together while reading and to do artificial paging.
> Relax the synchronous writes constraints. If we are willing to accept losing 
> some records that were not flushed by the operating system during a crash, we 
> can use async writes that can be much faster.
> Prefer sequential writes. Sequential writes can be several times faster than 
> random writes. Spend some small effort arranging the writes in such a way 
> that will trend towards sequential write performance over random write 
> performance.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2976) Invalid docs for specifying yarn.nodemanager.docker-container-executor.exec-name

2015-04-16 Thread Vijay Bhat (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2976?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14498389#comment-14498389
 ] 

Vijay Bhat commented on YARN-2976:
--

[~hitesh], I can take this on. Would your preference be for fixing the 
documentation or extending the current behavior? I would think it makes more 
sense to have a flexible configuration.

> Invalid docs for specifying 
> yarn.nodemanager.docker-container-executor.exec-name
> 
>
> Key: YARN-2976
> URL: https://issues.apache.org/jira/browse/YARN-2976
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: documentation
>Affects Versions: 2.6.0
>Reporter: Hitesh Shah
>Assignee: Vijay Bhat
>Priority: Minor
>
> Docs on 
> http://hadoop.apache.org/docs/stable/hadoop-yarn/hadoop-yarn-site/DockerContainerExecutor.html
>  mention setting "docker -H=tcp://0.0.0.0:4243" for 
> yarn.nodemanager.docker-container-executor.exec-name. 
> However, the actual implementation does a fileExists for the specified value. 
> Either the docs need to be fixed or the impl changed to allow relative paths 
> or commands with additional args



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-3498) Improve Timeline Server Memory Usage

2015-04-16 Thread Jonathan Eagles (JIRA)
Jonathan Eagles created YARN-3498:
-

 Summary: Improve Timeline Server Memory Usage
 Key: YARN-3498
 URL: https://issues.apache.org/jira/browse/YARN-3498
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Jonathan Eagles
Assignee: Jonathan Eagles






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3476) Nodemanager can fail to delete local logs if log aggregation fails

2015-04-16 Thread Sunil G (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3476?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14498387#comment-14498387
 ] 

Sunil G commented on YARN-3476:
---

Hi [~jlowe] and [~rohithsharma],

Retention logic to handle this error may become more complex when multiple 
failures are seen during aggregation across applications. If this happens 
rarely, a strong retention logic with a timer is helpful.

On a generic level, considering more failures, a cleanup after aggregation can 
save the disk. This is acceptable since we already encountered an error, and 
there may not be real pressure to provide 100% complete logs when aggregation 
fails.

> Nodemanager can fail to delete local logs if log aggregation fails
> --
>
> Key: YARN-3476
> URL: https://issues.apache.org/jira/browse/YARN-3476
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: log-aggregation, nodemanager
>Affects Versions: 2.6.0
>Reporter: Jason Lowe
>Assignee: Rohith
> Attachments: 0001-YARN-3476.patch
>
>
> If log aggregation encounters an error trying to upload the file then the 
> underlying TFile can throw an illegalstateexception which will bubble up 
> through the top of the thread and prevent the application logs from being 
> deleted.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (YARN-2976) Invalid docs for specifying yarn.nodemanager.docker-container-executor.exec-name

2015-04-16 Thread Vijay Bhat (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2976?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vijay Bhat reassigned YARN-2976:


Assignee: Vijay Bhat

> Invalid docs for specifying 
> yarn.nodemanager.docker-container-executor.exec-name
> 
>
> Key: YARN-2976
> URL: https://issues.apache.org/jira/browse/YARN-2976
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: documentation
>Affects Versions: 2.6.0
>Reporter: Hitesh Shah
>Assignee: Vijay Bhat
>Priority: Minor
>
> Docs on 
> http://hadoop.apache.org/docs/stable/hadoop-yarn/hadoop-yarn-site/DockerContainerExecutor.html
>  mention setting "docker -H=tcp://0.0.0.0:4243" for 
> yarn.nodemanager.docker-container-executor.exec-name. 
> However, the actual implementation does a fileExists for the specified value. 
> Either the docs need to be fixed or the impl changed to allow relative paths 
> or commands with additional args



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-3497) ContainerManagementProtocolProxy modifies IPC timeout conf without making a copy

2015-04-16 Thread Jason Lowe (JIRA)
Jason Lowe created YARN-3497:


 Summary: ContainerManagementProtocolProxy modifies IPC timeout 
conf without making a copy
 Key: YARN-3497
 URL: https://issues.apache.org/jira/browse/YARN-3497
 Project: Hadoop YARN
  Issue Type: Bug
  Components: client
Affects Versions: 2.6.0
Reporter: Jason Lowe
Assignee: Jason Lowe


yarn-client's ContainerManagementProtocolProxy is updating 
ipc.client.connection.maxidletime in the conf passed in without making a copy 
of it. That modification "leaks" into other systems using the same conf and can 
cause them to setup RPC connections with a timeout of zero as well.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2740) ResourceManager side should properly handle node label modifications when distributed node label configuration enabled

2015-04-16 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2740?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14498372#comment-14498372
 ] 

Wangda Tan commented on YARN-2740:
--

1. CommonNodeLabelsManager:
1.1) {{@VisibleForTesting}} for isDistributedNodeLabelConfiguration is not 
necessary?

1.2) How to set isDistributedNodeLabelConfiguration:
It's better to add a helper method in YarnConfiguration (see the sketch at the 
end of this comment) instead of doing the following check:
{code}
isDistributedNodeLabelConfiguration =
    YarnConfiguration.DISTRIBUTED_NODELABEL_CONFIGURATION_TYPE.equals(conf
        .get(YarnConfiguration.NODELABEL_CONFIGURATION_TYPE,
            YarnConfiguration.DEFAULT_NODELABEL_CONFIGURATION_TYPE));
{code}

I noticed that ResourceTrackerService can leverage the helper method I suggest too:
{code}
String nodeLabelConfigurationType =
    conf.get(YarnConfiguration.NODELABEL_CONFIGURATION_TYPE,
        YarnConfiguration.DEFAULT_NODELABEL_CONFIGURATION_TYPE);

isDistributesNodeLabelsConf =
    YarnConfiguration.DISTRIBUTED_NODELABEL_CONFIGURATION_TYPE
        .equals(nodeLabelConfigurationType);
{code}

1.3) Better to add a brief comment about why we don't send the message to the store.

2. FileSystemNodeLabelStore:
Add a note about why ignoreNodeLabelsMapping is needed

3. AdminService:
Should we throw an exception when distributedConfiguration is enabled for 
removeClusterNodeLabels? Removing will change labels on nodes; after removal, a 
node heartbeat carrying the removed partition will be identified as an error, 
which seems reasonable to me. The admin should be able to control the 
"valid-partitions" in the cluster.

Same as RMWebServices.

Test looks good to me, but may need to change if you agree with 2/3.
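
A possible shape for the helper suggested in 1.2 (the method name is an 
assumption; only the constants come from the snippets above):

{code}
// In YarnConfiguration -- centralize the distributed node-label check so
// CommonNodeLabelsManager and ResourceTrackerService don't repeat it.
public static boolean isDistributedNodeLabelConfiguration(Configuration conf) {
  return YarnConfiguration.DISTRIBUTED_NODELABEL_CONFIGURATION_TYPE.equals(
      conf.get(YarnConfiguration.NODELABEL_CONFIGURATION_TYPE,
          YarnConfiguration.DEFAULT_NODELABEL_CONFIGURATION_TYPE));
}
{code}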

> ResourceManager side should properly handle node label modifications when 
> distributed node label configuration enabled
> --
>
> Key: YARN-2740
> URL: https://issues.apache.org/jira/browse/YARN-2740
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Wangda Tan
>Assignee: Naganarasimha G R
> Fix For: 2.8.0
>
> Attachments: YARN-2740-20141024-1.patch, YARN-2740.20150320-1.patch, 
> YARN-2740.20150327-1.patch, YARN-2740.20150411-1.patch, 
> YARN-2740.20150411-2.patch, YARN-2740.20150411-3.patch
>
>
> According to YARN-2495, when distributed node label configuration is enabled:
> - RMAdmin / REST API should reject change labels on node operations.
> - CommonNodeLabelsManager shouldn't persist labels on nodes when NM do 
> heartbeat.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3411) [Storage implementation] explore the native HBase write schema for storage

2015-04-16 Thread Vrushali C (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14498366#comment-14498366
 ] 

Vrushali C commented on YARN-3411:
--

Thanks [~djp] ! 
bq. Just quickly go through the poc patch which is good but only have 
EntityTable so far. Do we have plan to split other tables to other JIRAs?

Yes, we can have JIRAs for other tables as we add those functionalities. 
Right now, the PoC is focused only on entity writes, hence this patch has only 
the entity-table-related stuff.

bq. Some quick comments on poc patch is we should reuse many operations here 
like split() or join() in other classes, so better to create a utility class 
with putting common methods to share.
Absolutely agreed, I am refining the patch. With hRaven we have a bunch of such 
utility classes.  I was trying to see how many I can put in, since it's not 
confirmed that this would be the way to go. I did not want to mix up too much 
code. But I will be uploading a refined patch + some more changes like Metric 
writing soon. 

> [Storage implementation] explore the native HBase write schema for storage
> --
>
> Key: YARN-3411
> URL: https://issues.apache.org/jira/browse/YARN-3411
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Sangjin Lee
>Assignee: Vrushali C
>Priority: Critical
> Attachments: ATSv2BackendHBaseSchemaproposal.pdf, YARN-3411.poc.txt
>
>
> There is work that's in progress to implement the storage based on a Phoenix 
> schema (YARN-3134).
> In parallel, we would like to explore an implementation based on a native 
> HBase schema for the write path. Such a schema does not exclude using 
> Phoenix, especially for reads and offline queries.
> Once we have basic implementations of both options, we could evaluate them in 
> terms of performance, scalability, usability, etc. and make a call.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3366) Outbound network bandwidth : classify/shape traffic originating from YARN containers

2015-04-16 Thread Sidharta Seethana (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3366?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sidharta Seethana updated YARN-3366:

Attachment: YARN-3366.006.patch

hi [~djp] ,

Uploading a new patch based on your feedback.

Thanks,
-Sidharta

> Outbound network bandwidth : classify/shape traffic originating from YARN 
> containers
> 
>
> Key: YARN-3366
> URL: https://issues.apache.org/jira/browse/YARN-3366
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Sidharta Seethana
>Assignee: Sidharta Seethana
> Attachments: YARN-3366.001.patch, YARN-3366.002.patch, 
> YARN-3366.003.patch, YARN-3366.004.patch, YARN-3366.005.patch, 
> YARN-3366.006.patch
>
>
> In order to be able to isolate based on/enforce outbound traffic bandwidth 
> limits, we need  a mechanism to classify/shape network traffic in the 
> nodemanager. For more information on the design, please see the attached 
> design document in the parent JIRA.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3136) getTransferredContainers can be a bottleneck during AM registration

2015-04-16 Thread Sunil G (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3136?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sunil G updated YARN-3136:
--
Attachment: 00011-YARN-3136.patch

Thank you, [~jianhe].
I have added the check in the exclude file. Will kick off Jenkins with this patch.

> getTransferredContainers can be a bottleneck during AM registration
> ---
>
> Key: YARN-3136
> URL: https://issues.apache.org/jira/browse/YARN-3136
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: scheduler
>Affects Versions: 2.6.0
>Reporter: Jason Lowe
>Assignee: Sunil G
> Attachments: 0001-YARN-3136.patch, 00010-YARN-3136.patch, 
> 00011-YARN-3136.patch, 0002-YARN-3136.patch, 0003-YARN-3136.patch, 
> 0004-YARN-3136.patch, 0005-YARN-3136.patch, 0006-YARN-3136.patch, 
> 0007-YARN-3136.patch, 0008-YARN-3136.patch, 0009-YARN-3136.patch
>
>
> While examining RM stack traces on a busy cluster I noticed a pattern of AMs 
> stuck waiting for the scheduler lock trying to call getTransferredContainers. 
>  The scheduler lock is highly contended, especially on a large cluster with 
> many nodes heartbeating, and it would be nice if we could find a way to 
> eliminate the need to grab this lock during this call.  We've already done 
> similar work during AM allocate calls to make sure they don't needlessly grab 
> the scheduler lock, and it would be good to do so here as well, if possible.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3487) CapacityScheduler scheduler lock obtained unnecessarily

2015-04-16 Thread Sunil G (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14498324#comment-14498324
 ] 

Sunil G commented on YARN-3487:
---

Hi [~leftnoteasy] and [~jlowe]
While reloading the CS config, if a queue is removed, even though 'queues' is a 
concurrent map, the delete operation may still be in progress. Here we may try 
checkAccess and have it pass successfully.


> CapacityScheduler scheduler lock obtained unnecessarily
> ---
>
> Key: YARN-3487
> URL: https://issues.apache.org/jira/browse/YARN-3487
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacityscheduler
>Affects Versions: 2.6.0
>Reporter: Jason Lowe
>Assignee: Jason Lowe
>Priority: Critical
> Attachments: YARN-3487.001.patch, YARN-3487.002.patch
>
>
> Recently saw a significant slowdown of applications on a large cluster, and 
> we noticed there were a large number of blocked threads on the RM.  Most of 
> the blocked threads were waiting for the CapacityScheduler lock while calling 
> getQueueInfo.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3463) Integrate OrderingPolicy Framework with CapacityScheduler

2015-04-16 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3463?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14498326#comment-14498326
 ] 

Wangda Tan commented on YARN-3463:
--

Some comments:

LeafQueue:
- activeApplications now becomes orderingPolicy; should we make 
pendingApplications use a customized comparator as well?
- getOrderingPolicy is located in CapacityScheduler, so the SchedulerEntity can 
be FiCaSchedulerApp; can we make it not use a generic type now, for simplicity?

CapacitySchedulerConfiguration:
- I think we should be careful about adding ORDERING_POLICY_CONFIG, since this 
will be a public config. I understand the reason to add the policy_config is to 
support the policy=fair, config=fair+fifo use case, but I'm not sure if we 
really need that. For now, to satisfy fair scheduling only, I suggest supporting 
the policy only, with the config="fair+fifo" set directly; we can discuss and 
add configurable items in the future. And since the ordering_policy could 
potentially change (the ordering_policy is a new feature and hasn't been under 
extensive review), it's better to add a note in the upcoming doc indicating 
that it's an experimental option.

CapacityScheduler:
- Why change this?
{code}
-  + " is not empty " + disposableLeafQueue.getApplications().size()
+  + " is not empty " + disposableLeafQueue.getNumActiveApplications()
{code}

TestProportional...Policy:
- Suppress generic warning?
{code}
OrderingPolicy so = mock(OrderingPolicy.class);
when(so.getPreemptionIterator()).thenAnswer(new Answer() {
{code}

Javac warning?

> Integrate OrderingPolicy Framework with CapacityScheduler
> -
>
> Key: YARN-3463
> URL: https://issues.apache.org/jira/browse/YARN-3463
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: capacityscheduler
>Reporter: Craig Welch
>Assignee: Craig Welch
> Attachments: YARN-3463.50.patch, YARN-3463.61.patch, 
> YARN-3463.64.patch, YARN-3463.65.patch, YARN-3463.66.patch
>
>
> Integrate the OrderingPolicy Framework with the CapacityScheduler



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3431) Sub resources of timeline entity needs to be passed to a separate endpoint.

2015-04-16 Thread Zhijie Shen (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3431?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14498321#comment-14498321
 ] 

Zhijie Shen commented on YARN-3431:
---

bq. There is always exception get thrown here. We should find some way to 
figure it out. e.g. adding a boolean value of check type?

It won't always throw an exception. In this example, the exception will be 
thrown only when you use a TimelineEntity object whose type is not YARN_CLUSTER 
to construct a ClusterEntity, because that is a logically invalid construction.

bq. If so, I don't understand what benefit we gain comparing with type casting 
directly. Am I missing something here?

At the web service side, we will only receive the generic TimelineEntity 
object, not its subclass object. We can't do a cast; instead we use the generic 
object to construct the subclass object again.
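
To make the construction rule concrete, here is a rough sketch under the 
assumption that the subclass constructor copies the generic entity and then 
validates its type (everything except the YARN_CLUSTER type mentioned above is 
illustrative, not taken from the patch):

{code}
// Rebuild a subclass from the generic entity received by the web service.
// The check only rejects logically invalid constructions, e.g. building a
// ClusterEntity from an entity whose type is not YARN_CLUSTER.
public ClusterEntity(TimelineEntity entity) {
  super(entity);
  if (!"YARN_CLUSTER".equals(entity.getType())) {
    throw new IllegalArgumentException(
        "Incompatible entity type: " + entity.getType());
  }
}
{code}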

> Sub resources of timeline entity needs to be passed to a separate endpoint.
> ---
>
> Key: YARN-3431
> URL: https://issues.apache.org/jira/browse/YARN-3431
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Zhijie Shen
>Assignee: Zhijie Shen
> Attachments: YARN-3431.1.patch, YARN-3431.2.patch, YARN-3431.3.patch
>
>
> We have TimelineEntity and some other entities as subclass that inherit from 
> it. However, we only have a single endpoint, which consume TimelineEntity 
> rather than sub-classes and this endpoint will check the incoming request 
> body contains exactly TimelineEntity object. However, the json data which is 
> serialized from sub-class object seems not to be treated as an TimelineEntity 
> object, and won't be deserialized into the corresponding sub-class object 
> which cause deserialization failure as some discussions in YARN-3334 : 
> https://issues.apache.org/jira/browse/YARN-3334?focusedCommentId=14391059&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14391059.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3136) getTransferredContainers can be a bottleneck during AM registration

2015-04-16 Thread Jian He (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3136?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14498322#comment-14498322
 ] 

Jian He commented on YARN-3136:
---

Hi [~sunilg], for that I think we can add these two in the 
hadoop-yarn/dev-support/findbugs-exclude.xml to suppress them

> getTransferredContainers can be a bottleneck during AM registration
> ---
>
> Key: YARN-3136
> URL: https://issues.apache.org/jira/browse/YARN-3136
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: scheduler
>Affects Versions: 2.6.0
>Reporter: Jason Lowe
>Assignee: Sunil G
> Attachments: 0001-YARN-3136.patch, 00010-YARN-3136.patch, 
> 0002-YARN-3136.patch, 0003-YARN-3136.patch, 0004-YARN-3136.patch, 
> 0005-YARN-3136.patch, 0006-YARN-3136.patch, 0007-YARN-3136.patch, 
> 0008-YARN-3136.patch, 0009-YARN-3136.patch
>
>
> While examining RM stack traces on a busy cluster I noticed a pattern of AMs 
> stuck waiting for the scheduler lock trying to call getTransferredContainers. 
>  The scheduler lock is highly contended, especially on a large cluster with 
> many nodes heartbeating, and it would be nice if we could find a way to 
> eliminate the need to grab this lock during this call.  We've already done 
> similar work during AM allocate calls to make sure they don't needlessly grab 
> the scheduler lock, and it would be good to do so here as well, if possible.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3491) PublicLocalizer#addResource is too slow.

2015-04-16 Thread zhihai xu (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3491?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14498312#comment-14498312
 ] 

zhihai xu commented on YARN-3491:
-

Yes, I agree that storing asynchronously is going to be a bit dangerous.
Yes, I will do more profiling in PublicLocalizer#addResource to get detailed 
information about the time spent in each sub-code segment.
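
As a rough illustration of the kind of per-segment timing that could be added 
(purely an assumption, not part of any attached patch; it presumes 
java.util.concurrent.TimeUnit and the class's LOG are in scope):

{code}
// Time an individual sub-code segment inside PublicLocalizer#addResource.
long start = System.nanoTime();
// ... sub-code segment under measurement ...
long elapsedMs = TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
LOG.debug("addResource segment took " + elapsedMs + " ms");
{code}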


> PublicLocalizer#addResource is too slow.
> 
>
> Key: YARN-3491
> URL: https://issues.apache.org/jira/browse/YARN-3491
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: nodemanager
>Affects Versions: 2.7.0
>Reporter: zhihai xu
>Assignee: zhihai xu
>Priority: Critical
>
> Improve the public resource localization to do both FSDownload submission to 
> the thread pool and completed localization handling in one thread 
> (PublicLocalizer).
> Currently FSDownload submission to the thread pool is done in 
> PublicLocalizer#addResource which is running in Dispatcher thread and 
> completed localization handling is done in PublicLocalizer#run which is 
> running in PublicLocalizer thread.
> Because PublicLocalizer#addResource is time consuming, the thread pool can't 
> be fully utilized. Instead of doing public resource localization in 
> parallel(multithreading), public resource localization is serialized most of 
> the time.
> Also there are two more benefits with this change:
> 1. The Dispatcher thread won't be blocked by PublicLocalizer#addResource . 
> Dispatcher thread handles most of time critical events at Node manager.
> 2. don't need synchronization on HashMap (pending).
> Because pending will be only accessed in PublicLocalizer thread.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3494) Expose AM resource limit and user limit in QueueMetrics

2015-04-16 Thread Sunil G (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3494?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14498306#comment-14498306
 ] 

Sunil G commented on YARN-3494:
---

Could we define the metric in QueueMetrics, but have its setter updated from 
CSQueueUtils? It can be used only by CS for now.
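
A hedged sketch of what that could look like (the metric and method names are 
assumptions): a gauge defined in QueueMetrics whose value is pushed from 
CSQueueUtils.

{code}
// In QueueMetrics: a gauge for the AM resource limit, updated externally.
@Metric("AM resource limit in MB")
MutableGaugeLong amResourceLimitMB;

public void setAMResourceLimitMB(long limitMB) {
  amResourceLimitMB.set(limitMB);
}

// In CSQueueUtils (CS-only for now), push the freshly computed limit, e.g.:
// queueMetrics.setAMResourceLimitMB(amLimitInMB);
{code}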


> Expose AM resource limit and user limit in QueueMetrics 
> 
>
> Key: YARN-3494
> URL: https://issues.apache.org/jira/browse/YARN-3494
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Jian He
>Assignee: Rohith
>
> Now we have the AM resource limit and user limit shown on the web UI, it 
> would be useful to expose them in the QueueMetrics as well.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3411) [Storage implementation] explore the native HBase write schema for storage

2015-04-16 Thread Junping Du (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14498278#comment-14498278
 ] 

Junping Du commented on YARN-3411:
--

I just quickly went through the PoC patch, which is good but only has the 
EntityTable so far. Do we have a plan to split the other tables into other 
JIRAs? I would support that, because mid-size patches (not too large, not too 
small) can make the development/review iteration move faster. 
A quick comment on the PoC patch: we should reuse many operations here, like 
split() or join(), in other classes, so it is better to create a utility class 
with the common methods to share.
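
A minimal sketch of such a shared utility, assuming a simple string row-key 
scheme (the class name, separator, and methods are all assumptions for 
illustration):

{code}
// Shared row-key helpers so each table class doesn't re-implement them.
public final class TimelineKeyUtils {
  private static final String SEPARATOR = "!";

  private TimelineKeyUtils() { }

  public static String join(String... parts) {
    StringBuilder sb = new StringBuilder();
    for (int i = 0; i < parts.length; i++) {
      if (i > 0) {
        sb.append(SEPARATOR);
      }
      sb.append(parts[i]);
    }
    return sb.toString();
  }

  public static String[] split(String key) {
    return key.split(SEPARATOR);
  }
}
{code}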

> [Storage implementation] explore the native HBase write schema for storage
> --
>
> Key: YARN-3411
> URL: https://issues.apache.org/jira/browse/YARN-3411
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Sangjin Lee
>Assignee: Vrushali C
>Priority: Critical
> Attachments: ATSv2BackendHBaseSchemaproposal.pdf, YARN-3411.poc.txt
>
>
> There is work that's in progress to implement the storage based on a Phoenix 
> schema (YARN-3134).
> In parallel, we would like to explore an implementation based on a native 
> HBase schema for the write path. Such a schema does not exclude using 
> Phoenix, especially for reads and offline queries.
> Once we have basic implementations of both options, we could evaluate them in 
> terms of performance, scalability, usability, etc. and make a call.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3366) Outbound network bandwidth : classify/shape traffic originating from YARN containers

2015-04-16 Thread Sidharta Seethana (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3366?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14498275#comment-14498275
 ] 

Sidharta Seethana commented on YARN-3366:
-

hi [~djp] ,

Thanks for review. Responses inline :
1. About the YarnConfiguration.java changes - the new configs are designated 
@Private - this is an alpha/preview feature and things could change in 
subsequent releases. Unfortunately, there doesn't seem to be a better way to 
handle "preview" configurations. Please correct me if I am wrong.
2. About making the interface config an array : At this point we only support 
traffic shaping on *one* interface (the primary interface being used for 
intra-cluster traffic). We could consider adding support for multiple 
interfaces in the future, but it is not supported for the time being. 
3. About empty resource handler chain : If there are no resource handlers 
configured, all the 'hooks' - bootstrap, preStart etc are no-ops. No exceptions 
are thrown. 
4. About TC_MODIFY_STATE - yes, I'll add the break. Good catch, thanks. 
5. About null check missing - I'll add this. I think this got deleted by 
accident - not sure how.
6. About returning a boolean - sure, I'll remove the unused boolean return
7. About MAX_CONTAINER_COUNT: Unfortunately, we cannot make this dynamic for 
the time being. We'll need to compute the bandwidth assigned at 
bootstrap/startup time. Once we have scheduling support for this resource type, 
this computation will go away. I will, however, reduce the max count to 50. 
Please note that, unless strict usage is enabled, this is a soft limit (see the 
rough arithmetic sketch after this list); a container is allowed to use more 
bandwidth if it is available.
8. postComplete returning an op instead of null : this is by design. We 
currently do not have a way of batching operations in the container-executor 
binary apart from launch container - i.e a separate invocation is necessary 
anyway. Another reason for executing this op inline is that this is not in the 
performance critical "launch container" path.
9. Debug log lines: Sure, I'll merge the lines. I think we should create a 
document somewhere with such unwritten rules (or add this to the contributor 
document) so that this becomes a documented, required convention. 
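
A rough arithmetic illustration of the per-container ceiling described in item 
7 (the total bandwidth figure is an assumed example; only the container count 
of 50 comes from the comment above):

{code}
// Computed once at bootstrap; a soft limit unless strict usage is enabled.
int totalOutboundMbit = 500;   // assumed NM-wide outbound budget for YARN
int maxContainerCount = 50;    // max count mentioned above
int perContainerMbit = totalOutboundMbit / maxContainerCount; // = 10 Mbit
{code}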




> Outbound network bandwidth : classify/shape traffic originating from YARN 
> containers
> 
>
> Key: YARN-3366
> URL: https://issues.apache.org/jira/browse/YARN-3366
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Sidharta Seethana
>Assignee: Sidharta Seethana
> Attachments: YARN-3366.001.patch, YARN-3366.002.patch, 
> YARN-3366.003.patch, YARN-3366.004.patch, YARN-3366.005.patch
>
>
> In order to be able to isolate based on/enforce outbound traffic bandwidth 
> limits, we need  a mechanism to classify/shape network traffic in the 
> nodemanager. For more information on the design, please see the attached 
> design document in the parent JIRA.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3021) YARN's delegation-token handling disallows certain trust setups to operate properly over DistCp

2015-04-16 Thread Yongjun Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3021?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14498253#comment-14498253
 ] 

Yongjun Zhang commented on YARN-3021:
-

Hi [~jianhe],

Thanks for looking at it again, and sorry for the late response; I was out for 
some time myself too.

It turned out that the same patch 007 applies for me with today's trunk, and I 
uploaded it again.



> YARN's delegation-token handling disallows certain trust setups to operate 
> properly over DistCp
> ---
>
> Key: YARN-3021
> URL: https://issues.apache.org/jira/browse/YARN-3021
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: security
>Affects Versions: 2.3.0
>Reporter: Harsh J
>Assignee: Yongjun Zhang
> Attachments: YARN-3021.001.patch, YARN-3021.002.patch, 
> YARN-3021.003.patch, YARN-3021.004.patch, YARN-3021.005.patch, 
> YARN-3021.006.patch, YARN-3021.007.patch, YARN-3021.007.patch, 
> YARN-3021.007.patch, YARN-3021.patch
>
>
> Consider this scenario of 3 realms: A, B and COMMON, where A trusts COMMON, 
> and B trusts COMMON (one way trusts both), and both A and B run HDFS + YARN 
> clusters.
> Now if one logs in with a COMMON credential, and runs a job on A's YARN that 
> needs to access B's HDFS (such as a DistCp), the operation fails in the RM, 
> as it attempts a renewDelegationToken(…) synchronously during application 
> submission (to validate the managed token before it adds it to a scheduler 
> for automatic renewal). The call obviously fails because B realm will not trust 
> A's credentials (here, the RM's principal is the renewer).
> In the 1.x JobTracker the same call is present, but it is done asynchronously 
> and once the renewal attempt failed we simply ceased to schedule any further 
> attempts of renewals, rather than fail the job immediately.
> We should change the logic such that we attempt the renewal but go easy on 
> the failure and skip the scheduling alone, rather than bubble back an error 
> to the client, failing the app submission. This way the old behaviour is 
> retained.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3021) YARN's delegation-token handling disallows certain trust setups to operate properly over DistCp

2015-04-16 Thread Yongjun Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3021?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yongjun Zhang updated YARN-3021:

Attachment: YARN-3021.007.patch

> YARN's delegation-token handling disallows certain trust setups to operate 
> properly over DistCp
> ---
>
> Key: YARN-3021
> URL: https://issues.apache.org/jira/browse/YARN-3021
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: security
>Affects Versions: 2.3.0
>Reporter: Harsh J
>Assignee: Yongjun Zhang
> Attachments: YARN-3021.001.patch, YARN-3021.002.patch, 
> YARN-3021.003.patch, YARN-3021.004.patch, YARN-3021.005.patch, 
> YARN-3021.006.patch, YARN-3021.007.patch, YARN-3021.007.patch, 
> YARN-3021.007.patch, YARN-3021.patch
>
>
> Consider this scenario of 3 realms: A, B and COMMON, where A trusts COMMON 
> and B trusts COMMON (one-way trusts in both cases), and both A and B run 
> HDFS + YARN clusters.
> Now if one logs in with a COMMON credential and runs a job on A's YARN that 
> needs to access B's HDFS (such as a DistCp), the operation fails in the RM, 
> as it attempts a renewDelegationToken(…) synchronously during application 
> submission (to validate the managed token before it adds it to a scheduler 
> for automatic renewal). The call obviously fails because the B realm will not 
> trust A's credentials (here, the RM's principal is the renewer).
> In the 1.x JobTracker the same call is present, but it is done asynchronously, 
> and once the renewal attempt failed we simply ceased to schedule any further 
> renewal attempts, rather than failing the job immediately.
> We should change the logic so that we attempt the renewal but tolerate the 
> failure and only skip the scheduling, rather than bubbling an error back to 
> the client and failing the app submission. This way the old behaviour is 
> retained.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3326) Support RESTful API for getLabelsToNodes

2015-04-16 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3326?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14498183#comment-14498183
 ] 

Hudson commented on YARN-3326:
--

SUCCESS: Integrated in Hadoop-Mapreduce-trunk #2115 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2115/])
YARN-3326. Support RESTful API for getLabelsToNodes. Contributed by 
Naganarasimha G R. (ozawa: rev e48cedc663b8a26fd62140c8e2907f9b4edd9785)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/dao/LabelsToNodesInfo.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/TestRMWebServicesNodeLabels.java
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/RMWebServices.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/NodeIDsInfo.java


> Support RESTful API for getLabelsToNodes 
> -
>
> Key: YARN-3326
> URL: https://issues.apache.org/jira/browse/YARN-3326
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Affects Versions: 2.6.0
>Reporter: Naganarasimha G R
>Assignee: Naganarasimha G R
>Priority: Minor
> Fix For: 2.8.0
>
> Attachments: YARN-3326.20150310-1.patch, YARN-3326.20150407-1.patch, 
> YARN-3326.20150408-1.patch
>
>
> REST API support to retrieve the LabelsToNodes mapping



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3354) Container should contains node-labels asked by original ResourceRequests

2015-04-16 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3354?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14498178#comment-14498178
 ] 

Hudson commented on YARN-3354:
--

SUCCESS: Integrated in Hadoop-Mapreduce-trunk #2115 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2115/])
YARN-3354. Add node label expression in ContainerTokenIdentifier to support RM 
recovery. Contributed by Wangda Tan (jianhe: rev 
1b89a3e173f8e905074ed6714a7be5c003c0e2c4)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/proto/yarn_server_common_service_protos.proto
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/common/fica/FiCaSchedulerApp.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestRMRestart.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/security/RMContainerTokenSecretManager.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/TestContainerManager.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/container/ContainerImpl.java
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/api/protocolrecords/NMContainerStatus.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmcontainer/RMContainerImpl.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/proto/server/yarn_security_token.proto
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestContainerAllocation.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/SchedulerApplicationAttempt.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmcontainer/RMContainer.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestWorkPreservingRMRestartForNodeLabel.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/AbstractYarnScheduler.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/security/ContainerTokenIdentifier.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/api/protocolrecords/impl/pb/NMContainerStatusPBImpl.java


> Container should contains node-labels asked by original ResourceRequests
> 
>
> Key: YARN-3354
> URL: https://issues.apache.org/jira/browse/YARN-3354
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: api, capacityscheduler, nodemanager, resourcemanager
>Reporter: Wangda Tan
>Assignee: Wangda Tan
> Fix For: 2.8.0
>
> Attachments: YARN-3354.1.patch, YARN-3354.2.patch
>
>
> We proposed non-exclusive node labels in YARN-3214, which lets non-labeled 
> resource requests be allocated on labeled nodes that have idle resources.
> To make preemption work, we need to know an allocated container's original 
> node label: when labeled resource requests come back, we need to kill 
> non-labeled containers running on labeled nodes.
> This requires adding node labels to Container; also, the NM needs to store 
> this information and send it back to the RM when the RM restarts, to recover 
> the original container.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3404) View the queue name to YARN Application page

2015-04-16 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3404?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14498179#comment-14498179
 ] 

Hudson commented on YARN-3404:
--

SUCCESS: Integrated in Hadoop-Mapreduce-trunk #2115 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2115/])
YARN-3404. Display queue name on application page. Contributed by Ryu Kobayashi 
(jianhe: rev b2e6cf607f1712d103520ca6b3ff21ecc07cd265)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/webapp/AppBlock.java
* hadoop-yarn-project/CHANGES.txt


> View the queue name to YARN Application page
> 
>
> Key: YARN-3404
> URL: https://issues.apache.org/jira/browse/YARN-3404
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Ryu Kobayashi
>Assignee: Ryu Kobayashi
>Priority: Minor
> Fix For: 2.8.0
>
> Attachments: YARN-3404.1.patch, YARN-3404.2.patch, YARN-3404.3.patch, 
> YARN-3404.4.patch, screenshot.png
>
>
> We want to display the name of the queue that is used on the YARN Application 
> page.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3318) Create Initial OrderingPolicy Framework and FifoOrderingPolicy

2015-04-16 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14498180#comment-14498180
 ] 

Hudson commented on YARN-3318:
--

SUCCESS: Integrated in Hadoop-Mapreduce-trunk #2115 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2115/])
YARN-3318. Create Initial OrderingPolicy Framework and FifoOrderingPolicy. 
(Craig Welch via wangda) (wangda: rev 5004e753322084e42dfda4be1d2db66677f86a1e)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/policy/TestFifoOrderingPolicy.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/ResourceUsage.java
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/policy/AbstractComparatorOrderingPolicy.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/policy/SchedulableEntity.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/policy/FifoComparator.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/policy/MockSchedulableEntity.java
* hadoop-yarn-project/hadoop-yarn/dev-support/findbugs-exclude.xml
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/policy/OrderingPolicy.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/policy/FifoOrderingPolicy.java


> Create Initial OrderingPolicy Framework and FifoOrderingPolicy
> --
>
> Key: YARN-3318
> URL: https://issues.apache.org/jira/browse/YARN-3318
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: scheduler
>Reporter: Craig Welch
>Assignee: Craig Welch
> Fix For: 2.8.0
>
> Attachments: YARN-3318.13.patch, YARN-3318.14.patch, 
> YARN-3318.17.patch, YARN-3318.34.patch, YARN-3318.35.patch, 
> YARN-3318.36.patch, YARN-3318.39.patch, YARN-3318.45.patch, 
> YARN-3318.47.patch, YARN-3318.48.patch, YARN-3318.52.patch, 
> YARN-3318.53.patch, YARN-3318.56.patch, YARN-3318.57.patch, 
> YARN-3318.58.patch, YARN-3318.59.patch, YARN-3318.60.patch, YARN-3318.61.patch
>
>
> Create the initial framework required for using OrderingPolicies and an 
> initial FifoOrderingPolicy



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3431) Sub resources of timeline entity needs to be passed to a separate endpoint.

2015-04-16 Thread Junping Du (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3431?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14498169#comment-14498169
 ] 

Junping Du commented on YARN-3431:
--

Thanks [~zjshen] for delivering an updated patch to fix it! 
The solution here looks good to me in general. Some comments:
In HierarchicalTimelineEntity.java,
{code}
+  public ClusterEntity(TimelineEntity entity) {
+    super(entity);
+    if (!entity.getType().equals(TimelineEntityType.YARN_CLUSTER.toString())) {
+      throw new IllegalArgumentException("Incompatible entity type: " + getId());
+    }
+  }
{code}
This looks like a serious bug: for a subclass of HierarchicalTimelineEntity, 
construction runs the type check in both the subclass and the parent class, so 
an exception is always thrown. We should find a way around this, e.g. a boolean 
flag that controls whether the type check runs (see the sketch below)?
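For illustration only (not from the patch; the class names are simplified 
stand-ins), one shape such a flag could take is a protected constructor 
parameter that lets the most specific subclass own the type check:
{code}
// Sketch only: simplified stand-ins, not the real TimelineEntity hierarchy.
class BaseEntitySketch {
  private final String type;

  // checkType=false lets a subclass disable this level's check so that only
  // the most specific constructor validates the type, avoiding the double
  // check that always throws.
  protected BaseEntitySketch(String type, String expectedType, boolean checkType) {
    this.type = type;
    if (checkType && !type.equals(expectedType)) {
      throw new IllegalArgumentException("Incompatible entity type: " + type);
    }
  }

  public String getType() {
    return type;
  }
}

class ClusterEntitySketch extends BaseEntitySketch {
  public ClusterEntitySketch(String type) {
    // Parent-level check disabled; this class performs the only check.
    super(type, null, false);
    if (!"YARN_CLUSTER".equals(getType())) {
      throw new IllegalArgumentException("Incompatible entity type: " + getType());
    }
  }
}
{code}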

{code}
+  public TimelineEntity(TimelineEntity entity) {
+    prototype = entity.getPrototype();
+  }
...
+  protected TimelineEntity getPrototype() {
+    return prototype == null ? this : prototype;
+  }
{code}
I think the prototype TimelineEntity is here so that a TimelineEntity object 
can be created from a subclass object of TimelineEntity, isn't it? If so, I 
don't understand what benefit we gain compared with casting directly (see the 
toy example below). Am I missing something here? 
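To make the casting question concrete, here is a toy sketch (simplified names, 
not the real classes) of why a copy constructor works where a direct downcast 
would not: the deserialized object is only ever a base-class instance, so a 
cast to the subclass fails at runtime, while rebuilding via the constructor 
succeeds.
{code}
// Toy illustration; GenericEntity / ClusterView are stand-ins, not YARN classes.
class GenericEntity {
  private final String type;
  GenericEntity(String type) { this.type = type; }
  GenericEntity(GenericEntity other) { this.type = other.type; }
  String getType() { return type; }
}

class ClusterView extends GenericEntity {
  // Rebuilds a typed view from whatever the JSON layer produced.
  ClusterView(GenericEntity entity) {
    super(entity);
    if (!"YARN_CLUSTER".equals(entity.getType())) {
      throw new IllegalArgumentException("Incompatible entity type: " + entity.getType());
    }
  }
}

public class CastVsRebuild {
  public static void main(String[] args) {
    GenericEntity fromJson = new GenericEntity("YARN_CLUSTER");
    ClusterView rebuilt = new ClusterView(fromJson);   // works
    System.out.println(rebuilt.getType());
    // ClusterView cast = (ClusterView) fromJson;      // would throw ClassCastException
  }
}
{code}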


> Sub resources of timeline entity needs to be passed to a separate endpoint.
> ---
>
> Key: YARN-3431
> URL: https://issues.apache.org/jira/browse/YARN-3431
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Zhijie Shen
>Assignee: Zhijie Shen
> Attachments: YARN-3431.1.patch, YARN-3431.2.patch, YARN-3431.3.patch
>
>
> We have TimelineEntity and some other entities as subclass that inherit from 
> it. However, we only have a single endpoint, which consume TimelineEntity 
> rather than sub-classes and this endpoint will check the incoming request 
> body contains exactly TimelineEntity object. However, the json data which is 
> serialized from sub-class object seems not to be treated as an TimelineEntity 
> object, and won't be deserialized into the corresponding sub-class object 
> which cause deserialization failure as some discussions in YARN-3334 : 
> https://issues.apache.org/jira/browse/YARN-3334?focusedCommentId=14391059&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14391059.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3448) Add Rolling Time To Lives Level DB Plugin Capabilities

2015-04-16 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3448?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14498146#comment-14498146
 ] 

Hadoop QA commented on YARN-3448:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12725873/YARN-3448.9.patch
  against trunk revision 2e8ea78.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 4 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:red}-1 findbugs{color}.  The patch appears to introduce 1 new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/7357//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-YARN-Build/7357//artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-applicationhistoryservice.html
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/7357//console

This message is automatically generated.

> Add Rolling Time To Lives Level DB Plugin Capabilities
> --
>
> Key: YARN-3448
> URL: https://issues.apache.org/jira/browse/YARN-3448
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Jonathan Eagles
>Assignee: Jonathan Eagles
> Attachments: YARN-3448.1.patch, YARN-3448.2.patch, YARN-3448.3.patch, 
> YARN-3448.4.patch, YARN-3448.5.patch, YARN-3448.7.patch, YARN-3448.8.patch, 
> YARN-3448.9.patch
>
>
> For large applications, the majority of the time in LeveldbTimelineStore is 
> spent deleting old entities one record at a time. An exclusive write lock is 
> held during the entire deletion phase, which in practice can be hours. If we 
> are to relax some of the consistency constraints, other performance enhancing 
> techniques can be employed to maximize throughput and minimize locking time.
> Split the 5 sections of the leveldb database (domain, owner, start time, 
> entity, index) into 5 separate databases. This allows each database to 
> maximize the read cache effectiveness based on the unique usage patterns of 
> each database. With 5 separate databases each lookup is much faster. This can 
> also help with I/O by placing the entity and index databases on separate disks.
> Rolling DBs for entity and index DBs. 99.9% of the data is in these two 
> sections, at roughly a 4:1 ratio (index to entity), at least for Tez. We can 
> replace record-at-a-time DB removal with file system removal if we create a 
> rolling set of databases that age out and can be efficiently removed. To do 
> this we must place a constraint to always place an entity's events into its 
> correct rolling DB instance based on start time. This allows us to stitch the 
> data back together while reading and to do artificial paging.
> Relax the synchronous writes constraints. If we are willing to accept losing 
> some records that were not flushed by the operating system during a crash, we 
> can use async writes, which can be much faster.
> Prefer sequential writes. Sequential writes can be several times faster than 
> random writes. Spend some small effort arranging the writes in such a way 
> that they trend towards sequential write performance over random write 
> performance.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3354) Container should contains node-labels asked by original ResourceRequests

2015-04-16 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3354?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14498125#comment-14498125
 ] 

Hudson commented on YARN-3354:
--

FAILURE: Integrated in Hadoop-Mapreduce-trunk-Java8 #166 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/166/])
YARN-3354. Add node label expression in ContainerTokenIdentifier to support RM 
recovery. Contributed by Wangda Tan (jianhe: rev 
1b89a3e173f8e905074ed6714a7be5c003c0e2c4)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/security/ContainerTokenIdentifier.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/proto/yarn_server_common_service_protos.proto
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/SchedulerApplicationAttempt.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmcontainer/RMContainer.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestWorkPreservingRMRestartForNodeLabel.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestContainerAllocation.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestRMRestart.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/proto/server/yarn_security_token.proto
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/AbstractYarnScheduler.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/TestContainerManager.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmcontainer/RMContainerImpl.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/api/protocolrecords/NMContainerStatus.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/container/ContainerImpl.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/security/RMContainerTokenSecretManager.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/api/protocolrecords/impl/pb/NMContainerStatusPBImpl.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/common/fica/FiCaSchedulerApp.java


> Container should contains node-labels asked by original ResourceRequests
> 
>
> Key: YARN-3354
> URL: https://issues.apache.org/jira/browse/YARN-3354
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: api, capacityscheduler, nodemanager, resourcemanager
>Reporter: Wangda Tan
>Assignee: Wangda Tan
> Fix For: 2.8.0
>
> Attachments: YARN-3354.1.patch, YARN-3354.2.patch
>
>
> We proposed non-exclusive node labels in YARN-3214, which lets non-labeled 
> resource requests be allocated on labeled nodes that have idle resources.
> To make preemption work, we need to know an allocated container's original 
> node label: when labeled resource requests come back, we need to kill 
> non-labeled containers running on labeled nodes.
> This requires adding node labels to Container; also, the NM needs to store 
> this information and send it back to the RM when the RM restarts, to recover 
> the original container.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3326) Support RESTful API for getLabelsToNodes

2015-04-16 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3326?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14498129#comment-14498129
 ] 

Hudson commented on YARN-3326:
--

FAILURE: Integrated in Hadoop-Mapreduce-trunk-Java8 #166 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/166/])
YARN-3326. Support RESTful API for getLabelsToNodes. Contributed by 
Naganarasimha G R. (ozawa: rev e48cedc663b8a26fd62140c8e2907f9b4edd9785)
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/TestRMWebServicesNodeLabels.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/RMWebServices.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/dao/LabelsToNodesInfo.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/NodeIDsInfo.java


> Support RESTful API for getLabelsToNodes 
> -
>
> Key: YARN-3326
> URL: https://issues.apache.org/jira/browse/YARN-3326
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Affects Versions: 2.6.0
>Reporter: Naganarasimha G R
>Assignee: Naganarasimha G R
>Priority: Minor
> Fix For: 2.8.0
>
> Attachments: YARN-3326.20150310-1.patch, YARN-3326.20150407-1.patch, 
> YARN-3326.20150408-1.patch
>
>
> REST API support to retrieve the LabelsToNodes mapping



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3404) View the queue name to YARN Application page

2015-04-16 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3404?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14498126#comment-14498126
 ] 

Hudson commented on YARN-3404:
--

FAILURE: Integrated in Hadoop-Mapreduce-trunk-Java8 #166 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/166/])
YARN-3404. Display queue name on application page. Contributed by Ryu Kobayashi 
(jianhe: rev b2e6cf607f1712d103520ca6b3ff21ecc07cd265)
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/webapp/AppBlock.java


> View the queue name to YARN Application page
> 
>
> Key: YARN-3404
> URL: https://issues.apache.org/jira/browse/YARN-3404
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Ryu Kobayashi
>Assignee: Ryu Kobayashi
>Priority: Minor
> Fix For: 2.8.0
>
> Attachments: YARN-3404.1.patch, YARN-3404.2.patch, YARN-3404.3.patch, 
> YARN-3404.4.patch, screenshot.png
>
>
> We want to display the name of the queue that is used on the YARN Application 
> page.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3318) Create Initial OrderingPolicy Framework and FifoOrderingPolicy

2015-04-16 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14498127#comment-14498127
 ] 

Hudson commented on YARN-3318:
--

FAILURE: Integrated in Hadoop-Mapreduce-trunk-Java8 #166 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/166/])
YARN-3318. Create Initial OrderingPolicy Framework and FifoOrderingPolicy. 
(Craig Welch via wangda) (wangda: rev 5004e753322084e42dfda4be1d2db66677f86a1e)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/policy/OrderingPolicy.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/policy/TestFifoOrderingPolicy.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/policy/FifoComparator.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/ResourceUsage.java
* hadoop-yarn-project/hadoop-yarn/dev-support/findbugs-exclude.xml
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/policy/MockSchedulableEntity.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/policy/FifoOrderingPolicy.java
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/policy/SchedulableEntity.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/policy/AbstractComparatorOrderingPolicy.java


> Create Initial OrderingPolicy Framework and FifoOrderingPolicy
> --
>
> Key: YARN-3318
> URL: https://issues.apache.org/jira/browse/YARN-3318
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: scheduler
>Reporter: Craig Welch
>Assignee: Craig Welch
> Fix For: 2.8.0
>
> Attachments: YARN-3318.13.patch, YARN-3318.14.patch, 
> YARN-3318.17.patch, YARN-3318.34.patch, YARN-3318.35.patch, 
> YARN-3318.36.patch, YARN-3318.39.patch, YARN-3318.45.patch, 
> YARN-3318.47.patch, YARN-3318.48.patch, YARN-3318.52.patch, 
> YARN-3318.53.patch, YARN-3318.56.patch, YARN-3318.57.patch, 
> YARN-3318.58.patch, YARN-3318.59.patch, YARN-3318.60.patch, YARN-3318.61.patch
>
>
> Create the initial framework required for using OrderingPolicies and an 
> initial FifoOrderingPolicy



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3366) Outbound network bandwidth : classify/shape traffic originating from YARN containers

2015-04-16 Thread Junping Du (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3366?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14498100#comment-14498100
 ] 

Junping Du commented on YARN-3366:
--

Thanks [~sidharta-s] for delivering the patch! A few comments so far:
In YarnConfiguration.java,
{code}
+  /** This setting controls if resource handling for network bandwidth is enabled **/
+  /* Work in progress: This configuration parameter may be changed/removed in the future */
+  @Private
+  public static final String NM_NETWORK_RESOURCE_ENABLED =
+      NM_NETWORK_RESOURCE_PREFIX + "enabled";
+  /** Network as a resource is disabled by default **/
+  public static final boolean DEFAULT_NM_NETWORK_RESOURCE_ENABLED = false;
{code}
Why are we explicitly saying "Work in progress: This configuration parameter 
may be changed/removed in the future" here and in other places? Once these 
configuration properties go in and are released, they can only be deprecated; 
they cannot be removed without a major release.

{code}
+  public static final String NM_NETWORK_RESOURCE_INTERFACE =
+      NM_NETWORK_RESOURCE_PREFIX + "interface";
+  public static final String DEFAULT_NM_NETWORK_RESOURCE_INTERFACE = "eth0";
{code}
Shall we support multiple network interfaces? I know the user can do something 
like NIC teaming at the OS configuration layer, but YARN shouldn't assume the 
user has to do this, should it? If so, it would be better to change this to a 
String array; see the sketch below.
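A minimal sketch of the String-array variant, using Configuration's existing 
comma-separated list support; the property name below is a hypothetical 
placeholder (the real constant is built from NM_NETWORK_RESOURCE_PREFIX in the 
patch):
{code}
import org.apache.hadoop.conf.Configuration;

public class NetworkInterfacesSketch {
  // Hypothetical property name, for illustration only.
  static final String NM_NETWORK_RESOURCE_INTERFACES =
      "yarn.nodemanager.resource.network.interfaces";

  public static void main(String[] args) {
    Configuration conf = new Configuration();
    // getStrings() splits a comma-separated value, e.g. "eth0,eth1".
    String[] interfaces = conf.getStrings(NM_NETWORK_RESOURCE_INTERFACES, "eth0");
    for (String iface : interfaces) {
      System.out.println("Would shape traffic on interface: " + iface);
    }
  }
}
{code}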

In LinuxContainerExecutor.java,
{code}
+    try {
+      resourceHandlerChain = ResourceHandlerModule
+          .getConfiguredResourceHandlerChain(super.getConf());
+      if (resourceHandlerChain != null) {
+        resourceHandlerChain.bootstrap(super.getConf());
+      }
+    } catch (ResourceHandlerException e) {
+      LOG.error("Failed to bootstrap configured resource subsystems! ", e);
+      throw new IOException("Failed to bootstrap configured resource subsystems!");
+    }
{code}
If "NM_NETWORK_RESOURCE_ENABLED" = false, the resourceHandlerChain will still 
be initiated with empty handler but not null. So 
resourceHandlerChain.bootstrap() still get called which is not necessary and 
possible get exception thrown out. I think we should make sure we don't involve 
any operations if all resources configuration are disabled (NETWORK_RESOURCE 
only so far).
In this case, may be we should make resourceHandlerChain to be null when 
NM_NETWORK_RESOURCE_ENABLED is false (assume other resources haven't onboard so 
far)? Also, other operations like: postComplete, reacquireContainer, etc. has 
the same issue.
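A sketch of that guard, using simplified stand-in types rather than the real 
ResourceHandler interfaces:
{code}
import java.io.IOException;
import java.util.List;

// Stand-ins for the NM resource handler types, for illustration only.
interface HandlerSketch {
  void bootstrap() throws IOException;
}

class HandlerChainSketch {
  private final List<HandlerSketch> handlers;

  HandlerChainSketch(List<HandlerSketch> handlers) { this.handlers = handlers; }

  boolean isEmpty() { return handlers.isEmpty(); }

  void bootstrap() throws IOException {
    for (HandlerSketch handler : handlers) {
      handler.bootstrap();
    }
  }
}

class ExecutorInitSketch {
  // Skip bootstrap entirely when no resource handler is configured, i.e. when
  // every resource feature (only network so far) is disabled.
  static void init(HandlerChainSketch chain) throws IOException {
    if (chain == null || chain.isEmpty()) {
      return;
    }
    chain.bootstrap();
  }
}
{code}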

{code}
+      for (PrivilegedOperation op : ops) {
+        switch (op.getOperationType()) {
+          case ADD_PID_TO_CGROUP:
+            resourceOps.add(op);
+            break;
+          case TC_MODIFY_STATE:
+            tcCommandFile = op.getArguments().get(0);
+          default:
+            LOG.warn("PrivilegedOperation type unsupported in launch: "
+                + op.getOperationType());
+            continue;
+        }
+      }
{code}
Are we missing a break in the TC_MODIFY_STATE case? E.g. something like the 
sketch below.
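Presumably the intent is the following shape (stand-in values, just to show 
where the break would go):
{code}
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class SwitchBreakSketch {
  enum OpType { ADD_PID_TO_CGROUP, TC_MODIFY_STATE, LAUNCH_CONTAINER }

  public static void main(String[] args) {
    List<OpType> resourceOps = new ArrayList<OpType>();
    String tcCommandFile = null;

    for (OpType op : Arrays.asList(OpType.values())) {
      switch (op) {
        case ADD_PID_TO_CGROUP:
          resourceOps.add(op);
          break;
        case TC_MODIFY_STATE:
          tcCommandFile = "tc-commands";   // placeholder for op.getArguments().get(0)
          break;                           // without this break we fall into the warning
        default:
          System.out.println("PrivilegedOperation type unsupported in launch: " + op);
      }
    }
    System.out.println(resourceOps + ", tcCommandFile=" + tcCommandFile);
  }
}
{code}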

In ResourceHandlerModule.java,
{code}
+    if (cGroupsHandler == null) {
+      synchronized (CGroupsHandler.class) {
+        cGroupsHandler = new CGroupsHandlerImpl(conf,
+            PrivilegedOperationExecutor.getInstance(conf));
+      }
+    }
{code}
We're also missing a null check inside the synchronized (CGroupsHandler.class) 
block, as in the usual double-checked locking pattern sketched below.
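I.e. the standard double-checked locking shape, which also needs the field to 
be volatile (sketch with a placeholder type):
{code}
class CGroupsHandlerHolderSketch {
  // volatile is required for safe publication under double-checked locking.
  private static volatile Object cGroupsHandler;

  static Object getCGroupsHandler() {
    if (cGroupsHandler == null) {
      synchronized (CGroupsHandlerHolderSketch.class) {
        // Re-check inside the lock: another thread may have initialized the
        // handler between the first check and acquiring the monitor.
        if (cGroupsHandler == null) {
          cGroupsHandler = new Object(); // stand-in for new CGroupsHandlerImpl(conf, ...)
        }
      }
    }
    return cGroupsHandler;
  }
}
{code}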


{code}
+  private static boolean addHandlerIfNotNull(List<ResourceHandler> handlerList,
+      ResourceHandler handler) {
+    return (handler != null) && handlerList.add(handler);
+  }
{code}
Returning a boolean value here is not necessary? A void variant (sketched 
below) would be simpler.
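For example (sketch only):
{code}
import java.util.List;

class AddIfNotNullSketch {
  // The boolean result in the patch only exists to thread add()'s return value
  // out through &&; a plain guarded add is clearer.
  static <T> void addHandlerIfNotNull(List<T> handlerList, T handler) {
    if (handler != null) {
      handlerList.add(handler);
    }
  }
}
{code}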

In TrafficControlBandwidthHandlerImpl.java,
{code}
+  //In the absence of 'scheduling' support, we'll 'infer' the guaranteed
+  //outbound bandwidth for each container based on this number. This will
+  //likely go away once we add support on the RM for this resource type.
+  private static final int MAX_CONTAINER_COUNT = 100;
...
+    containerBandwidthMbit = (int) Math.ceil((double) yarnBandwidthMbit /
+        MAX_CONTAINER_COUNT);
{code}
Can we calculate containerBandwidthMbit dynamically from the number of 
containers currently running on the NM (a rough sketch follows below)? If not, 
setting MAX_CONTAINER_COUNT to 100 sounds too large to me, which could make 
containerBandwidthMbit smaller than necessary. Typically a powerful machine in 
a production environment runs 10 - 50 containers, so maybe we could set 
something like 50 here? 
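A rough sketch of the dynamic alternative (hypothetical helper, not in the 
patch), with a minimum share so the per-container value never drops below a 
floor when many containers are running:
{code}
public class BandwidthShareSketch {
  // Divide the YARN-managed bandwidth by the live container count, but never
  // hand out less than a configured floor per container.
  static int containerBandwidthMbit(int yarnBandwidthMbit, int runningContainers,
      int minShareMbit) {
    int containers = Math.max(1, runningContainers);
    int share = (int) Math.ceil((double) yarnBandwidthMbit / containers);
    return Math.max(share, minShareMbit);
  }

  public static void main(String[] args) {
    // e.g. 1000 Mbit managed by YARN, 25 running containers, 10 Mbit floor -> 40 Mbit each
    System.out.println(containerBandwidthMbit(1000, 25, 10));
  }
}
{code}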

For postComplete(ContainerId containerId), should we return op instead of null?

In TrafficController.java,
{code}
+    if (LOG.isDebugEnabled()) {
+      LOG.debug("TC state: ");
+      LOG.debug(output);
+    }
...
+    if (LOG.isDebugEnabled()) {
+      LOG.debug("classId -> bytes sent");
+      LOG.debug(classIdBytesStats);
+    }
{code}
Can we merge the two LOG.debug() calls into one line with %n if we want a new 
line?
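E.g. something along these lines (sketch assuming a commons-logging Log, 
matching the LOG.debug / isDebugEnabled usage quoted above; %n expands via 
String.format):
{code}
import org.apache.commons.logging.Log;
import org.apache.commons.logging.LogFactory;

class MergedDebugLogSketch {
  private static final Log LOG = LogFactory.getLog(MergedDebugLogSketch.class);

  static void logTcState(String output) {
    if (LOG.isDebugEnabled()) {
      // One call instead of two; %n becomes the platform line separator.
      LOG.debug(String.format("TC state:%n%s", output));
    }
  }
}
{code}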

[jira] [Commented] (YARN-3491) PublicLocalizer#addResource is too slow.

2015-04-16 Thread Jason Lowe (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3491?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14498092#comment-14498092
 ] 

Jason Lowe commented on YARN-3491:
--

Storing asynchronously is going to be a bit dangerous -- we do not want to 
create a situation where a resource has started localizing but we haven't 
recorded the fact that we started it.  Theoretically we could end up doing a 
recovery where we leak a resource or fail to realize a localization started but 
did not complete and we need to clean it up.

I think it's best at this point to have some hard evidence from a profiler or 
targeted log statements around the suspected code where all the time is being 
spent in the NM rather than guessing.

> PublicLocalizer#addResource is too slow.
> 
>
> Key: YARN-3491
> URL: https://issues.apache.org/jira/browse/YARN-3491
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: nodemanager
>Affects Versions: 2.7.0
>Reporter: zhihai xu
>Assignee: zhihai xu
>Priority: Critical
>
> Improve the public resource localization to do both FSDownload submission to 
> the thread pool and completed localization handling in one thread 
> (PublicLocalizer).
> Currently FSDownload submission to the thread pool is done in 
> PublicLocalizer#addResource which is running in Dispatcher thread and 
> completed localization handling is done in PublicLocalizer#run which is 
> running in PublicLocalizer thread.
> Because PublicLocalizer#addResource is time consuming, the thread pool can't 
> be fully utilized. Instead of doing public resource localization in parallel 
> (multithreading), public resource localization is serialized most of the time.
> There are also two more benefits with this change:
> 1. The Dispatcher thread won't be blocked by PublicLocalizer#addResource; the 
> Dispatcher thread handles most of the time-critical events at the NodeManager.
> 2. No synchronization is needed on the HashMap (pending), because pending 
> will only be accessed in the PublicLocalizer thread.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3496) Add a configuration to disable/enable storing localization state in NMLeveldbStateStore

2015-04-16 Thread Jason Lowe (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3496?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14498088#comment-14498088
 ] 

Jason Lowe commented on YARN-3496:
--

How are we going to support recovery of containers without tracking which 
resources we have localized?  We cannot re-localize resources for already 
running containers -- they're already running on the existing localized 
resources.  If we fail to track those resources then we leak them on the local 
filesystem, potentially colliding with them later.  So from my perspective 
there already is a way to enable/disable this behavior, and that's 
enabling/disabling NM recovery.  I don't see how we can support recovery 
without storing localization state.

Do we have hard evidence storing to leveldb is a huge NM performance impact, or 
is this theoretical?  In our testing we have not seen enabling NM recovery as 
having a significant impact on NM performance.

> Add a configuration to disable/enable storing localization state in 
> NMLeveldbStateStore
> ---
>
> Key: YARN-3496
> URL: https://issues.apache.org/jira/browse/YARN-3496
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: nodemanager
>Affects Versions: 2.7.0
>Reporter: zhihai xu
>Assignee: zhihai xu
>
> Add a configuration to disable/enable storing localization state in 
> NMLeveldbStateStore.
> Storing localization state in LevelDB may have some overhead, which may 
> affect NM performance.
> It would be better to have a configuration to disable/enable it.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3448) Add Rolling Time To Lives Level DB Plugin Capabilities

2015-04-16 Thread Jonathan Eagles (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3448?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Eagles updated YARN-3448:
--
Attachment: YARN-3448.9.patch

> Add Rolling Time To Lives Level DB Plugin Capabilities
> --
>
> Key: YARN-3448
> URL: https://issues.apache.org/jira/browse/YARN-3448
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Jonathan Eagles
>Assignee: Jonathan Eagles
> Attachments: YARN-3448.1.patch, YARN-3448.2.patch, YARN-3448.3.patch, 
> YARN-3448.4.patch, YARN-3448.5.patch, YARN-3448.7.patch, YARN-3448.8.patch, 
> YARN-3448.9.patch
>
>
> For large applications, the majority of the time in LeveldbTimelineStore is 
> spent deleting old entities one record at a time. An exclusive write lock is 
> held during the entire deletion phase, which in practice can be hours. If we 
> are to relax some of the consistency constraints, other performance enhancing 
> techniques can be employed to maximize throughput and minimize locking time.
> Split the 5 sections of the leveldb database (domain, owner, start time, 
> entity, index) into 5 separate databases. This allows each database to 
> maximize the read cache effectiveness based on the unique usage patterns of 
> each database. With 5 separate databases each lookup is much faster. This can 
> also help with I/O by placing the entity and index databases on separate disks.
> Rolling DBs for entity and index DBs. 99.9% of the data is in these two 
> sections, at roughly a 4:1 ratio (index to entity), at least for Tez. We can 
> replace record-at-a-time DB removal with file system removal if we create a 
> rolling set of databases that age out and can be efficiently removed. To do 
> this we must place a constraint to always place an entity's events into its 
> correct rolling DB instance based on start time. This allows us to stitch the 
> data back together while reading and to do artificial paging.
> Relax the synchronous writes constraints. If we are willing to accept losing 
> some records that were not flushed by the operating system during a crash, we 
> can use async writes, which can be much faster.
> Prefer sequential writes. Sequential writes can be several times faster than 
> random writes. Spend some small effort arranging the writes in such a way 
> that they trend towards sequential write performance over random write 
> performance.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

