[jira] [Commented] (YARN-2807) Option --forceactive not works as described in usage of yarn rmadmin -transitionToActive
[ https://issues.apache.org/jira/browse/YARN-2807?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14277500#comment-14277500 ] Xuan Gong commented on YARN-2807: - Committed to trunk/branch-2. Thanks, Masatake Iwasaki Option --forceactive not works as described in usage of yarn rmadmin -transitionToActive Key: YARN-2807 URL: https://issues.apache.org/jira/browse/YARN-2807 Project: Hadoop YARN Issue Type: Sub-task Components: documentation, resourcemanager Reporter: Wangda Tan Assignee: Masatake Iwasaki Priority: Minor Fix For: 2.7.0 Attachments: YARN-2807.1.patch, YARN-2807.2.patch, YARN-2807.3.patch, YARN-2807.4.patch Currently the help message of yarn rmadmin -transitionToActive is: {code} transitionToActive: incorrect number of arguments Usage: HAAdmin [-transitionToActive serviceId [--forceactive]] {code} But --forceactive does not work as expected. When transitioning the RM state with --forceactive: {code} yarn rmadmin -transitionToActive rm2 --forceactive Automatic failover is enabled for org.apache.hadoop.yarn.client.RMHAServiceTarget@64c9f31e Refusing to manually manage HA state, since it may cause a split-brain scenario or other incorrect state. If you are very sure you know what you are doing, please specify the forcemanual flag. {code} As shown above, we still cannot transitionToActive when automatic failover is enabled, even with --forceactive. The option that does work is {{--forcemanual}}, but no place in the usage describes this option. I think we should fix this. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
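Per the error message quoted in the description, the flag that actually overrides the automatic-failover guard is {{--forcemanual}}. A minimal sketch of the invocation that should succeed, assuming the same target service id (rm2) as in the report:
{code}
# --forceactive is rejected while automatic failover is on; --forcemanual is the
# override the error message asks for. Use with care: it bypasses the check that
# normally prevents manual HA management (and possible split-brain).
yarn rmadmin -transitionToActive --forcemanual rm2
{code}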
[jira] [Commented] (YARN-2928) Application Timeline Server (ATS) next gen: phase 1
[ https://issues.apache.org/jira/browse/YARN-2928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14277501#comment-14277501 ] Sangjin Lee commented on YARN-2928: --- I agree with Karthik's comments. Application Timeline Server (ATS) next gen: phase 1 --- Key: YARN-2928 URL: https://issues.apache.org/jira/browse/YARN-2928 Project: Hadoop YARN Issue Type: New Feature Components: timelineserver Reporter: Sangjin Lee Assignee: Sangjin Lee Priority: Critical Attachments: ATSv2.rev1.pdf, ATSv2.rev2.pdf We have the application timeline server implemented in yarn per YARN-1530 and YARN-321. Although it is a great feature, we have recognized several critical issues and features that need to be addressed. This JIRA proposes the design and implementation changes to address those. This is phase 1 of this effort. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3062) timelineserver gives inconsistent data for otherinfo field based on the filter param
[ https://issues.apache.org/jira/browse/YARN-3062?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hitesh Shah updated YARN-3062: -- Affects Version/s: 2.4.0 2.5.0 2.6.0 timelineserver gives inconsistent data for otherinfo field based on the filter param Key: YARN-3062 URL: https://issues.apache.org/jira/browse/YARN-3062 Project: Hadoop YARN Issue Type: Bug Components: timelineserver Affects Versions: 2.4.0, 2.5.0, 2.6.0 Reporter: Prakash Ramachandran Attachments: withfilter.json, withoutfilter.json When otherinfo field gets updated, in some cases the data returned for an entity is dependent on the filter usage. for ex in the attached files for the - entity: vertex_1421164610335_0020_1_01, - entitytype: TEZ_VERTEX_ID, for the otherinfo.numTasks, got updated from 1009 to 253 - using {code}http://machine:8188/ws/v1/timeline/TEZ_VERTEX_ID/vertex_1421164610335_0020_1_01/ {code} gives the updated value: 253 - using {code}http://cn042-10:8188/ws/v1/timeline/TEZ_VERTEX_ID?limit=11primaryFilter=TEZ_DAG_ID%3Adag_1421164610335_0020_1{code} gives the old value: 1009 for the otherinfo.status field, which gets updated, both of them show the updated value. TEZ-1942 has more details. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2807) Option --forceactive not works as described in usage of yarn rmadmin -transitionToActive
[ https://issues.apache.org/jira/browse/YARN-2807?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14277452#comment-14277452 ] Xuan Gong commented on YARN-2807: - +1 LGTM. Will commit it. Option --forceactive not works as described in usage of yarn rmadmin -transitionToActive Key: YARN-2807 URL: https://issues.apache.org/jira/browse/YARN-2807 Project: Hadoop YARN Issue Type: Sub-task Components: documentation, resourcemanager Reporter: Wangda Tan Assignee: Masatake Iwasaki Priority: Minor Attachments: YARN-2807.1.patch, YARN-2807.2.patch, YARN-2807.3.patch, YARN-2807.4.patch Currently the help message of yarn rmadmin -transitionToActive is: {code} transitionToActive: incorrect number of arguments Usage: HAAdmin [-transitionToActive serviceId [--forceactive]] {code} But the --forceactive not works as expected. When transition RM state with --forceactive: {code} yarn rmadmin -transitionToActive rm2 --forceactive Automatic failover is enabled for org.apache.hadoop.yarn.client.RMHAServiceTarget@64c9f31e Refusing to manually manage HA state, since it may cause a split-brain scenario or other incorrect state. If you are very sure you know what you are doing, please specify the forcemanual flag. {code} As shown above, we still cannot transitionToActive when automatic failover is enabled with --forceactive. The option can work is: {{--forcemanual}}, there's no place in usage describes this option. I think we should fix this -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2928) Application Timeline Server (ATS) next gen: phase 1
[ https://issues.apache.org/jira/browse/YARN-2928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14277491#comment-14277491 ] Robert Kanter commented on YARN-2928: - +1 to starting with a clean slate. If we need to use something from the original ATS, it's pretty easy to import or copy/paste it. Application Timeline Server (ATS) next gen: phase 1 --- Key: YARN-2928 URL: https://issues.apache.org/jira/browse/YARN-2928 Project: Hadoop YARN Issue Type: New Feature Components: timelineserver Reporter: Sangjin Lee Assignee: Sangjin Lee Priority: Critical Attachments: ATSv2.rev1.pdf, ATSv2.rev2.pdf We have the application timeline server implemented in yarn per YARN-1530 and YARN-321. Although it is a great feature, we have recognized several critical issues and features that need to be addressed. This JIRA proposes the design and implementation changes to address those. This is phase 1 of this effort. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2928) Application Timeline Server (ATS) next gen: phase 1
[ https://issues.apache.org/jira/browse/YARN-2928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14277508#comment-14277508 ] Sangjin Lee commented on YARN-2928: --- bq. In the following, depending on how the writer is implemented, we may want to preserve the outstanding timeline data that is received by ATS companion but is still not be persisted into the storage backend. IAC, it seem to be the common requirement no matter it's per-node (e.g., restarting) or per-app (e.g., crashing). The point taken about the need to recover the in-memory data in either approach. I am fine with starting with the per-node companion approach with the understanding that at some point we need to have at least an option of the per-app companion. We can reword YARN-3033 to do that. What do others think? [~jrottinghuis]? Application Timeline Server (ATS) next gen: phase 1 --- Key: YARN-2928 URL: https://issues.apache.org/jira/browse/YARN-2928 Project: Hadoop YARN Issue Type: New Feature Components: timelineserver Reporter: Sangjin Lee Assignee: Sangjin Lee Priority: Critical Attachments: ATSv2.rev1.pdf, ATSv2.rev2.pdf We have the application timeline server implemented in yarn per YARN-1530 and YARN-321. Although it is a great feature, we have recognized several critical issues and features that need to be addressed. This JIRA proposes the design and implementation changes to address those. This is phase 1 of this effort. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3062) timelineserver gives inconsistent data for otherinfo field based on the filter param
[ https://issues.apache.org/jira/browse/YARN-3062?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14277693#comment-14277693 ] Hitesh Shah commented on YARN-3062: --- bq. Do you keep primaryFilters field set to the same filters when the entity is created? I understand it may be counter intuitive, but to post again to the existing entity and make all the records indexed by primaryFilters be updated to, you need to make sure primaryFilters field is properly set. bq. It doesn't update, but append. Say you have primaryFilter key1:value1. Then you update key1:value2. Finally you will get key1:[value1, value2]. Where is all of the above documented? timelineserver gives inconsistent data for otherinfo field based on the filter param Key: YARN-3062 URL: https://issues.apache.org/jira/browse/YARN-3062 Project: Hadoop YARN Issue Type: Bug Components: timelineserver Affects Versions: 2.4.0, 2.5.0, 2.6.0 Reporter: Prakash Ramachandran Attachments: withfilter.json, withoutfilter.json When otherinfo field gets updated, in some cases the data returned for an entity is dependent on the filter usage. for ex in the attached files for the - entity: vertex_1421164610335_0020_1_01, - entitytype: TEZ_VERTEX_ID, for the otherinfo.numTasks, got updated from 1009 to 253 - using {code}http://machine:8188/ws/v1/timeline/TEZ_VERTEX_ID/vertex_1421164610335_0020_1_01/ {code} gives the updated value: 253 - using {code}http://cn042-10:8188/ws/v1/timeline/TEZ_VERTEX_ID?limit=11primaryFilter=TEZ_DAG_ID%3Adag_1421164610335_0020_1{code} gives the old value: 1009 for the otherinfo.status field, which gets updated, both of them show the updated value. TEZ-1942 has more details. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3019) Make work-preserving-recovery the default mechanism for RM recovery
[ https://issues.apache.org/jira/browse/YARN-3019?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14277478#comment-14277478 ] Jian He commented on YARN-3019: --- thanks for committing, Junping ! Make work-preserving-recovery the default mechanism for RM recovery --- Key: YARN-3019 URL: https://issues.apache.org/jira/browse/YARN-3019 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Jian He Assignee: Jian He Attachments: YARN-3019.1.patch The proposal is to set yarn.resourcemanager.work-preserving-recovery.enabled to true by default to flip recovery mode to work-preserving recovery from non-work-preserving recovery. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
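For clusters on releases where this is not yet the default behavior, a minimal yarn-site.xml sketch that opts in explicitly (the property name is taken from the description above; RM recovery as a whole is still gated by yarn.resourcemanager.recovery.enabled):
{code}
<property>
  <!-- Flip RM recovery from non-work-preserving to work-preserving mode. -->
  <name>yarn.resourcemanager.work-preserving-recovery.enabled</name>
  <value>true</value>
</property>
{code}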
[jira] [Commented] (YARN-2928) Application Timeline Server (ATS) next gen: phase 1
[ https://issues.apache.org/jira/browse/YARN-2928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14277567#comment-14277567 ] Zhijie Shen commented on YARN-2928: --- bq. My vote is to start from a clean slate with a new source project Hm... It makes sense to me. Application Timeline Server (ATS) next gen: phase 1 --- Key: YARN-2928 URL: https://issues.apache.org/jira/browse/YARN-2928 Project: Hadoop YARN Issue Type: New Feature Components: timelineserver Reporter: Sangjin Lee Assignee: Sangjin Lee Priority: Critical Attachments: ATSv2.rev1.pdf, ATSv2.rev2.pdf We have the application timeline server implemented in yarn per YARN-1530 and YARN-321. Although it is a great feature, we have recognized several critical issues and features that need to be addressed. This JIRA proposes the design and implementation changes to address those. This is phase 1 of this effort. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2807) Option --forceactive not works as described in usage of yarn rmadmin -transitionToActive
[ https://issues.apache.org/jira/browse/YARN-2807?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14277483#comment-14277483 ] Hudson commented on YARN-2807: -- FAILURE: Integrated in Hadoop-trunk-Commit #6859 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/6859/]) YARN-2807. Option --forceactive not works as described in usage of (xgong: rev d15cbae73c7ae22d5d60d8cba16cba565e8e8b20) * hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/ha/HAAdmin.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/apt/ResourceManagerHA.apt.vm * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/apt/YarnCommands.apt.vm Option --forceactive not works as described in usage of yarn rmadmin -transitionToActive Key: YARN-2807 URL: https://issues.apache.org/jira/browse/YARN-2807 Project: Hadoop YARN Issue Type: Sub-task Components: documentation, resourcemanager Reporter: Wangda Tan Assignee: Masatake Iwasaki Priority: Minor Attachments: YARN-2807.1.patch, YARN-2807.2.patch, YARN-2807.3.patch, YARN-2807.4.patch Currently the help message of yarn rmadmin -transitionToActive is: {code} transitionToActive: incorrect number of arguments Usage: HAAdmin [-transitionToActive serviceId [--forceactive]] {code} But the --forceactive not works as expected. When transition RM state with --forceactive: {code} yarn rmadmin -transitionToActive rm2 --forceactive Automatic failover is enabled for org.apache.hadoop.yarn.client.RMHAServiceTarget@64c9f31e Refusing to manually manage HA state, since it may cause a split-brain scenario or other incorrect state. If you are very sure you know what you are doing, please specify the forcemanual flag. {code} As shown above, we still cannot transitionToActive when automatic failover is enabled with --forceactive. The option can work is: {{--forcemanual}}, there's no place in usage describes this option. I think we should fix this -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3062) timelineserver gives inconsistent data for otherinfo field based on the filter param
[ https://issues.apache.org/jira/browse/YARN-3062?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14277603#comment-14277603 ] Zhijie Shen commented on YARN-3062: --- bq. if I have 2 primaryFilters while creating the entity, any update to entity has to use the TimelineEntity.addPrimaryFilter for both primaryFilters? Yes bq. can a new primaryfilter be added in an update You can, but when you filter with 3rd primaryfilter, you will miss the information that is posted before. It's a limitation of primaryfilter. The recommended way to use primaryfilter is to come up with all filters you may want to use. bq. can the primaryFilter value be changed in an update? It doesn't update, but append. Say you have primaryFilter key1:value1. Then you update key1:value2. Finally you will get key1:[value1, value2]. timelineserver gives inconsistent data for otherinfo field based on the filter param Key: YARN-3062 URL: https://issues.apache.org/jira/browse/YARN-3062 Project: Hadoop YARN Issue Type: Bug Components: timelineserver Affects Versions: 2.4.0, 2.5.0, 2.6.0 Reporter: Prakash Ramachandran Attachments: withfilter.json, withoutfilter.json When otherinfo field gets updated, in some cases the data returned for an entity is dependent on the filter usage. for ex in the attached files for the - entity: vertex_1421164610335_0020_1_01, - entitytype: TEZ_VERTEX_ID, for the otherinfo.numTasks, got updated from 1009 to 253 - using {code}http://machine:8188/ws/v1/timeline/TEZ_VERTEX_ID/vertex_1421164610335_0020_1_01/ {code} gives the updated value: 253 - using {code}http://cn042-10:8188/ws/v1/timeline/TEZ_VERTEX_ID?limit=11primaryFilter=TEZ_DAG_ID%3Adag_1421164610335_0020_1{code} gives the old value: 1009 for the otherinfo.status field, which gets updated, both of them show the updated value. TEZ-1942 has more details. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
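To make the update semantics described above concrete, a minimal sketch against the ATS v1 Java client, using the entity and filter names from this report (error handling and service lifecycle details omitted). The point is that an update must carry the same primary filters as the original post; otherwise only the un-filtered record picks up the new otherinfo value:
{code}
import org.apache.hadoop.yarn.api.records.timeline.TimelineEntity;
import org.apache.hadoop.yarn.client.api.TimelineClient;
import org.apache.hadoop.yarn.conf.YarnConfiguration;

public class TimelineUpdateSketch {
  public static void main(String[] args) throws Exception {
    TimelineClient client = TimelineClient.createTimelineClient();
    client.init(new YarnConfiguration());
    client.start();

    TimelineEntity update = new TimelineEntity();
    update.setEntityType("TEZ_VERTEX_ID");
    update.setEntityId("vertex_1421164610335_0020_1_01");
    // Re-attach every primary filter the entity was originally posted with;
    // otherwise the filter-indexed copy keeps the stale otherinfo value (1009).
    update.addPrimaryFilter("TEZ_DAG_ID", "dag_1421164610335_0020_1");
    update.addOtherInfo("numTasks", 253);
    client.putEntities(update);

    client.stop();
  }
}
{code}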
[jira] [Commented] (YARN-3062) timelineserver gives inconsistent data for otherinfo field based on the filter param
[ https://issues.apache.org/jira/browse/YARN-3062?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14277573#comment-14277573 ] Prakash Ramachandran commented on YARN-3062: thanks for the clarification.[~zjshen] could be the reason, will check and update. so to clarify * if I have 2 primaryFilters while creating the entity, any update to entity has to use the TimelineEntity.addPrimaryFilter for both primaryFilters? * can a new primaryfilter be added in an update. (2 primaryfilters at entity creation time, add a new filter in update to make 3 filters, every subsequent update sets all 3 filters) * can the primaryFilter value be changed in an update? timelineserver gives inconsistent data for otherinfo field based on the filter param Key: YARN-3062 URL: https://issues.apache.org/jira/browse/YARN-3062 Project: Hadoop YARN Issue Type: Bug Components: timelineserver Affects Versions: 2.4.0, 2.5.0, 2.6.0 Reporter: Prakash Ramachandran Attachments: withfilter.json, withoutfilter.json When otherinfo field gets updated, in some cases the data returned for an entity is dependent on the filter usage. for ex in the attached files for the - entity: vertex_1421164610335_0020_1_01, - entitytype: TEZ_VERTEX_ID, for the otherinfo.numTasks, got updated from 1009 to 253 - using {code}http://machine:8188/ws/v1/timeline/TEZ_VERTEX_ID/vertex_1421164610335_0020_1_01/ {code} gives the updated value: 253 - using {code}http://cn042-10:8188/ws/v1/timeline/TEZ_VERTEX_ID?limit=11primaryFilter=TEZ_DAG_ID%3Adag_1421164610335_0020_1{code} gives the old value: 1009 for the otherinfo.status field, which gets updated, both of them show the updated value. TEZ-1942 has more details. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2933) Capacity Scheduler preemption policy should only consider capacity without labels temporarily
[ https://issues.apache.org/jira/browse/YARN-2933?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14277516#comment-14277516 ] Hadoop QA commented on YARN-2933: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12692282/YARN-2933-8.patch against trunk revision d336d13. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: org.apache.hadoop.yarn.server.resourcemanager.security.TestRMDelegationTokens org.apache.hadoop.yarn.server.resourcemanager.TestRMRestart org.apache.hadoop.yarn.server.resourcemanager.TestContainerResourceUsage The following test timeouts occurred in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: org.apache.hadoop.yarn.server.resourcemanager.applicationsmanager.TestAMRestart Test results: https://builds.apache.org/job/PreCommit-YARN-Build/6332//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6332//console This message is automatically generated. Capacity Scheduler preemption policy should only consider capacity without labels temporarily - Key: YARN-2933 URL: https://issues.apache.org/jira/browse/YARN-2933 Project: Hadoop YARN Issue Type: Sub-task Components: capacityscheduler Reporter: Wangda Tan Assignee: Mayank Bansal Attachments: YARN-2933-1.patch, YARN-2933-2.patch, YARN-2933-3.patch, YARN-2933-4.patch, YARN-2933-5.patch, YARN-2933-6.patch, YARN-2933-7.patch, YARN-2933-8.patch Currently, we have capacity enforcement on each queue for each label in CapacityScheduler, but we don't have preemption policy to support that. YARN-2498 is targeting to support preemption respect node labels, but we have some gaps in code base, like queues/FiCaScheduler should be able to get usedResource/pendingResource, etc. by label. These items potentially need to refactor CS which we need spend some time carefully think about. For now, what immediately we can do is allow calculate ideal_allocation and preempt containers only for resources on nodes without labels, to avoid regression like: A cluster has some nodes with labels and some not, assume queueA isn't satisfied for resource without label, but for now, preemption policy may preempt resource from nodes with labels for queueA, that is not correct. Again, it is just a short-term enhancement, YARN-2498 will consider preemption respecting node-labels for Capacity Scheduler which is our final target. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3062) timelineserver gives inconsistent data for otherinfo field based on the filter param
[ https://issues.apache.org/jira/browse/YARN-3062?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14277468#comment-14277468 ] Zhijie Shen commented on YARN-3062: --- [~pramachandran], how did you update the entity? Do you keep *primaryFilters* field set to the same filters when the entity is created? I understand it may be counter intuitive, but to post again to the existing entity and make all the records indexed by *primaryFilters* be updated to, you need to make sure *primaryFilters* field is properly set. timelineserver gives inconsistent data for otherinfo field based on the filter param Key: YARN-3062 URL: https://issues.apache.org/jira/browse/YARN-3062 Project: Hadoop YARN Issue Type: Bug Components: timelineserver Affects Versions: 2.4.0, 2.5.0, 2.6.0 Reporter: Prakash Ramachandran Attachments: withfilter.json, withoutfilter.json When otherinfo field gets updated, in some cases the data returned for an entity is dependent on the filter usage. for ex in the attached files for the - entity: vertex_1421164610335_0020_1_01, - entitytype: TEZ_VERTEX_ID, for the otherinfo.numTasks, got updated from 1009 to 253 - using {code}http://machine:8188/ws/v1/timeline/TEZ_VERTEX_ID/vertex_1421164610335_0020_1_01/ {code} gives the updated value: 253 - using {code}http://cn042-10:8188/ws/v1/timeline/TEZ_VERTEX_ID?limit=11primaryFilter=TEZ_DAG_ID%3Adag_1421164610335_0020_1{code} gives the old value: 1009 for the otherinfo.status field, which gets updated, both of them show the updated value. TEZ-1942 has more details. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2928) Application Timeline Server (ATS) next gen: phase 1
[ https://issues.apache.org/jira/browse/YARN-2928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14277597#comment-14277597 ] Zhijie Shen commented on YARN-2928: --- One correlated issue I want to raise here is the aggregated log service. Currently, only the JHS serves the aggregated logs for completed MR jobs and controls the log file retention. It doesn't cover the other workloads on YARN, nor the long-running services that never end. We thought of making ATS the hub to serve aggregated logs before, but didn't achieve it in the time frame of ATS current gen. Therefore, though the aggregated log service is not part of the major goal of ATS next gen - scalability - I hope we take it into account in the future when designing the reader and GUI. Application Timeline Server (ATS) next gen: phase 1 --- Key: YARN-2928 URL: https://issues.apache.org/jira/browse/YARN-2928 Project: Hadoop YARN Issue Type: New Feature Components: timelineserver Reporter: Sangjin Lee Assignee: Sangjin Lee Priority: Critical Attachments: ATSv2.rev1.pdf, ATSv2.rev2.pdf We have the application timeline server implemented in yarn per YARN-1530 and YARN-321. Although it is a great feature, we have recognized several critical issues and features that need to be addressed. This JIRA proposes the design and implementation changes to address those. This is phase 1 of this effort. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3026) Move application-specific container allocation logic from LeafQueue to FiCaSchedulerApp
[ https://issues.apache.org/jira/browse/YARN-3026?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wangda Tan updated YARN-3026: - Attachment: YARN-3026.2.patch Rebased against trunk and fixed findbugs warning. Move application-specific container allocation logic from LeafQueue to FiCaSchedulerApp --- Key: YARN-3026 URL: https://issues.apache.org/jira/browse/YARN-3026 Project: Hadoop YARN Issue Type: Task Components: capacityscheduler Reporter: Wangda Tan Assignee: Wangda Tan Attachments: YARN-3026.1.patch, YARN-3026.2.patch Have a discussion with [~vinodkv] and [~jianhe]: In existing Capacity Scheduler, all allocation logics of and under LeafQueue are located in LeafQueue.java in implementation. To make a cleaner scope of LeafQueue, we'd better move some of them to FiCaSchedulerApp. Ideal scope of LeafQueue should be: when a LeafQueue receives some resources from ParentQueue (like 15% of cluster resource), and it distributes resources to children apps, and it should be agnostic to internal logic of children apps (like delayed-scheduling, etc.). IAW, LeafQueue shouldn't decide how application allocating container from given resources. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Comment Edited] (YARN-2928) Application Timeline Server (ATS) next gen: phase 1
[ https://issues.apache.org/jira/browse/YARN-2928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=1422#comment-1422 ] Zhijie Shen edited comment on YARN-2928 at 1/14/15 10:23 PM: - how about timelineserver or yarntimelineserver because it doesn't limit to application data only, and the other sub modules all end in "er". bq. Also, any volunteers for creating the project skeleton? I can help on creating the new module. was (Author: zjshen): how about timelineservice or yarntimelineservice because it doesn't limit to application data only. bq. Also, any volunteers for creating the project skeleton? I can help on creating the new module. Application Timeline Server (ATS) next gen: phase 1 --- Key: YARN-2928 URL: https://issues.apache.org/jira/browse/YARN-2928 Project: Hadoop YARN Issue Type: New Feature Components: timelineserver Reporter: Sangjin Lee Assignee: Sangjin Lee Priority: Critical Attachments: ATSv2.rev1.pdf, ATSv2.rev2.pdf We have the application timeline server implemented in yarn per YARN-1530 and YARN-321. Although it is a great feature, we have recognized several critical issues and features that need to be addressed. This JIRA proposes the design and implementation changes to address those. This is phase 1 of this effort. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2217) Shared cache client side changes
[ https://issues.apache.org/jira/browse/YARN-2217?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Trezzo updated YARN-2217: --- Attachment: YARN-2217-trunk-v8.patch [~kasha] Attached is V8. I am not really sure what the issue is. It was complaining about a NoSuchMethodException so that seems like a classpath issue. I am not sure how to see what classpath the jenkins job used on the jenkins slave. For now I have changed the syntax of the expected exception in the error test cases to use the annotation based syntax. Let's see if the build likes that better. Shared cache client side changes Key: YARN-2217 URL: https://issues.apache.org/jira/browse/YARN-2217 Project: Hadoop YARN Issue Type: Sub-task Reporter: Chris Trezzo Assignee: Chris Trezzo Attachments: YARN-2217-trunk-v1.patch, YARN-2217-trunk-v2.patch, YARN-2217-trunk-v3.patch, YARN-2217-trunk-v4.patch, YARN-2217-trunk-v5.patch, YARN-2217-trunk-v6.patch, YARN-2217-trunk-v7.patch, YARN-2217-trunk-v8.patch Implement the client side changes for the shared cache. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
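For readers unfamiliar with the phrasing, the "annotation based syntax" mentioned above refers to JUnit 4's declarative expected-exception form; a generic sketch follows (the class, test name, and thrown exception below are placeholders, not the actual YARN-2217 test code):
{code}
import java.io.IOException;
import org.junit.Test;

public class ExpectedExceptionSyntaxSketch {
  // Instead of wrapping the call in try/catch and calling fail(), declare the
  // expected exception type on the annotation and let JUnit enforce it.
  @Test(expected = IOException.class)
  public void testCallThatShouldFail() throws Exception {
    throw new IOException("placeholder for the client call under test");
  }
}
{code}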
[jira] [Assigned] (YARN-2928) Application Timeline Server (ATS) next gen: phase 1
[ https://issues.apache.org/jira/browse/YARN-2928?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinod Kumar Vavilapalli reassigned YARN-2928: - Assignee: Vinod Kumar Vavilapalli (was: Sangjin Lee) Application Timeline Server (ATS) next gen: phase 1 --- Key: YARN-2928 URL: https://issues.apache.org/jira/browse/YARN-2928 Project: Hadoop YARN Issue Type: New Feature Components: timelineserver Reporter: Sangjin Lee Assignee: Vinod Kumar Vavilapalli Priority: Critical Attachments: ATSv2.rev1.pdf, ATSv2.rev2.pdf We have the application timeline server implemented in yarn per YARN-1530 and YARN-321. Although it is a great feature, we have recognized several critical issues and features that need to be addressed. This JIRA proposes the design and implementation changes to address those. This is phase 1 of this effort. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3063) Bootstrap TimelineServer Next Gen Module
[ https://issues.apache.org/jira/browse/YARN-3063?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhijie Shen updated YARN-3063: -- Attachment: YARN-3063.1.patch Bootstrap TimelineServer Next Gen Module Key: YARN-3063 URL: https://issues.apache.org/jira/browse/YARN-3063 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Zhijie Shen Assignee: Zhijie Shen Attachments: YARN-3063.1.patch Based on the discussion on the umbrella Jira, we need to create a new sub-module for TS next gen. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3055) The token is not renewed properly if it's shared by jobs (oozie) in DelegationTokenRenewer
[ https://issues.apache.org/jira/browse/YARN-3055?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14277879#comment-14277879 ] Jason Lowe commented on YARN-3055: -- Agree with Jian. I believe the concern being raised here is not specific to the changes in YARN-2704 and YARN-2964 but a long-standing issue with YARN's handling of delegation tokens shared between jobs. The token is not renewed properly if it's shared by jobs (oozie) in DelegationTokenRenewer -- Key: YARN-3055 URL: https://issues.apache.org/jira/browse/YARN-3055 Project: Hadoop YARN Issue Type: Bug Components: security Reporter: Yi Liu Assignee: Yi Liu Attachments: YARN-3055.001.patch, YARN-3055.002.patch After YARN-2964, there is only one timer to renew the token if it's shared by jobs. In {{removeApplicationFromRenewal}}, when going to remove a token, and the token is shared by other jobs, we will not cancel the token. Meanwhile, we should not cancel the _timerTask_, also we should not remove it from {{allTokens}}. Otherwise for the existing submitted applications which share this token will not get renew any more, and for new submitted applications which share this token, the token will be renew immediately. For example, we have 3 applications: app1, app2, app3. And they share the token1. See following scenario: *1).* app1 is submitted firstly, then app2, and then app3. In this case, there is only one token renewal timer for token1, and is scheduled when app1 is submitted *2).* app1 is finished, then the renewal timer is cancelled. token1 will not be renewed any more, but app2 and app3 still use it, so there is problem. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
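One way to picture the bookkeeping the description calls for (purely illustrative, not the actual DelegationTokenRenewer code): track which applications still reference each shared token and only cancel the renewal timer once that set is empty, so the app1/app2/app3 scenario above keeps token1 alive until all three finish:
{code}
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

public class SharedTokenRefCountSketch {
  // token identifier -> applications still using it (illustrative types)
  private final Map<String, Set<String>> appsPerToken = new HashMap<String, Set<String>>();

  public synchronized void register(String token, String appId) {
    Set<String> apps = appsPerToken.get(token);
    if (apps == null) {
      apps = new HashSet<String>();
      appsPerToken.put(token, apps);
    }
    apps.add(appId);
  }

  // Returns true only when no running application references the token any more,
  // i.e. only then is it safe to cancel the renewal timer and drop it from allTokens.
  public synchronized boolean unregister(String token, String appId) {
    Set<String> apps = appsPerToken.get(token);
    if (apps == null) {
      return true;
    }
    apps.remove(appId);
    if (apps.isEmpty()) {
      appsPerToken.remove(token);
      return true;
    }
    return false;
  }
}
{code}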
[jira] [Commented] (YARN-2139) [Umbrella] Support for Disk as a Resource in YARN
[ https://issues.apache.org/jira/browse/YARN-2139?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14277969#comment-14277969 ] Swapnil Daingade commented on YARN-2139: Had a look at the latest design doc and was wondering if it would be possible to make the isolation part separate and optional from the avoiding over-allocation part. Enforcing isolation using Cgroups may not always work, especially in cases where HDFS is not the default dfs. [Umbrella] Support for Disk as a Resource in YARN -- Key: YARN-2139 URL: https://issues.apache.org/jira/browse/YARN-2139 Project: Hadoop YARN Issue Type: New Feature Reporter: Wei Yan Attachments: Disk_IO_Isolation_Scheduling_3.pdf, Disk_IO_Scheduling_Design_1.pdf, Disk_IO_Scheduling_Design_2.pdf, YARN-2139-prototype-2.patch, YARN-2139-prototype.patch YARN should consider disk as another resource for (1) scheduling tasks on nodes, (2) isolation at runtime, (3) spindle locality. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2861) Timeline DT secret manager should not reuse the RM's configs.
[ https://issues.apache.org/jira/browse/YARN-2861?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14277833#comment-14277833 ] Jian He commented on YARN-2861: --- looks good, +1 Timeline DT secret manager should not reuse the RM's configs. - Key: YARN-2861 URL: https://issues.apache.org/jira/browse/YARN-2861 Project: Hadoop YARN Issue Type: Bug Reporter: Zhijie Shen Assignee: Zhijie Shen Attachments: YARN-2861.1.patch, YARN-2861.2.patch This is the configs for RM DT secret manager. We should create separate ones for timeline DT only. {code} @Override protected void serviceInit(Configuration conf) throws Exception { long secretKeyInterval = conf.getLong(YarnConfiguration.DELEGATION_KEY_UPDATE_INTERVAL_KEY, YarnConfiguration.DELEGATION_KEY_UPDATE_INTERVAL_DEFAULT); long tokenMaxLifetime = conf.getLong(YarnConfiguration.DELEGATION_TOKEN_MAX_LIFETIME_KEY, YarnConfiguration.DELEGATION_TOKEN_MAX_LIFETIME_DEFAULT); long tokenRenewInterval = conf.getLong(YarnConfiguration.DELEGATION_TOKEN_RENEW_INTERVAL_KEY, YarnConfiguration.DELEGATION_TOKEN_RENEW_INTERVAL_DEFAULT); secretManager = new TimelineDelegationTokenSecretManager(secretKeyInterval, tokenMaxLifetime, tokenRenewInterval, 360); secretManager.startThreads(); serviceAddr = TimelineUtils.getTimelineTokenServiceAddress(getConfig()); super.init(conf); } {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
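A rough sketch of the separation being asked for: read timeline-scoped keys with their own defaults instead of the RM delegation-token keys (the key names and default values below are illustrative placeholders, not necessarily what the attached patches introduce):
{code}
import org.apache.hadoop.conf.Configuration;

public class TimelineDtConfigSketch {
  // Illustrative only: timeline-specific keys instead of the RM DT keys above.
  static long[] readTimelineDtIntervals(Configuration conf) {
    long secretKeyInterval = conf.getLong(
        "yarn.timeline-service.delegation.key-update-interval", 24 * 60 * 60 * 1000L);
    long tokenMaxLifetime = conf.getLong(
        "yarn.timeline-service.delegation.token.max-lifetime", 7 * 24 * 60 * 60 * 1000L);
    long tokenRenewInterval = conf.getLong(
        "yarn.timeline-service.delegation.token.renew-interval", 24 * 60 * 60 * 1000L);
    // The secret manager would then be constructed from these values exactly as
    // in the serviceInit snippet quoted in the description.
    return new long[] {secretKeyInterval, tokenMaxLifetime, tokenRenewInterval};
  }
}
{code}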
[jira] [Commented] (YARN-3063) Bootstrap TimelineServer Next Gen Module
[ https://issues.apache.org/jira/browse/YARN-3063?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14277940#comment-14277940 ] Hadoop QA commented on YARN-3063: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12692368/YARN-3063.1.patch against trunk revision 6464a89. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-timelineserver. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/6335//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6335//console This message is automatically generated. Bootstrap TimelineServer Next Gen Module Key: YARN-3063 URL: https://issues.apache.org/jira/browse/YARN-3063 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Zhijie Shen Assignee: Zhijie Shen Attachments: YARN-3063.1.patch Based on the discussion on the umbrella Jira, we need to create a new sub-module for TS next gen. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3063) Bootstrap TimelineServer Next Gen Module
[ https://issues.apache.org/jira/browse/YARN-3063?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14278056#comment-14278056 ] Hadoop QA commented on YARN-3063: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12692396/YARN-3063.2.patch against trunk revision 6464a89. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-timelineservice. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/6336//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6336//console This message is automatically generated. Bootstrap TimelineServer Next Gen Module Key: YARN-3063 URL: https://issues.apache.org/jira/browse/YARN-3063 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Zhijie Shen Assignee: Zhijie Shen Attachments: YARN-3063.1.patch, YARN-3063.2.patch Based on the discussion on the umbrella Jira, we need to create a new sub-module for TS next gen. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2928) Application Timeline Server (ATS) next gen: phase 1
[ https://issues.apache.org/jira/browse/YARN-2928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14277888#comment-14277888 ] Robert Kanter commented on YARN-2928: - For the aggregated log service, I'm guessing we're talking about the viewer only and not the aggregator service itself (which runs in the NodeManagers)? The viewer is just one of those HTML Block Java files. Anyway, I agree that the Timeline Service reader should service log files; we can reuse that code. On a related note, the JHS currently has a service that deletes aggregated log files after some amount of time. If we eventually get rid of the JHS, we'll have to move this somewhere else. Application Timeline Server (ATS) next gen: phase 1 --- Key: YARN-2928 URL: https://issues.apache.org/jira/browse/YARN-2928 Project: Hadoop YARN Issue Type: New Feature Components: timelineserver Reporter: Sangjin Lee Assignee: Vinod Kumar Vavilapalli Priority: Critical Attachments: ATSv2.rev1.pdf, ATSv2.rev2.pdf We have the application timeline server implemented in yarn per YARN-1530 and YARN-321. Although it is a great feature, we have recognized several critical issues and features that need to be addressed. This JIRA proposes the design and implementation changes to address those. This is phase 1 of this effort. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3063) Bootstrap TimelineServer Next Gen Module
[ https://issues.apache.org/jira/browse/YARN-3063?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14277889#comment-14277889 ] Sangjin Lee commented on YARN-3063: --- Thanks for the prompt patch [~zjshen]! Some quick comments: - Vinod suggested timeline *service* as opposed to timeline server; perhaps we should change timeline server in names to timeline service? -- let's also change the package name to org.apache.hadoop.yarn.server.timelineservice - how about adding applicationhistoryservice as a dependency? the idea is to depend on it and start using it soon Bootstrap TimelineServer Next Gen Module Key: YARN-3063 URL: https://issues.apache.org/jira/browse/YARN-3063 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Zhijie Shen Assignee: Zhijie Shen Attachments: YARN-3063.1.patch Based on the discussion on the umbrella Jira, we need to create a new sub-module for TS next gen. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3063) Bootstrap TimelineServer Next Gen Module
[ https://issues.apache.org/jira/browse/YARN-3063?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14278003#comment-14278003 ] Zhijie Shen commented on YARN-3063: --- [~sjlee0] Thanks for review! bq. Vinod suggested timeline service as opposed to timeline server; perhaps we should change timeline server in names to timeline service? I noticed that. It makes sense to me. Updated the name accordingly. bq. how about adding applicationhistoryservice as a dependency? the idea is to depend on it and start using it soon Sounds good. Added applicationhistoryservice as a dependency. Later on, we can add more dependency on demand. In addition, I fixed some project dependency issues and change the default class to TimelineAggregator. Bootstrap TimelineServer Next Gen Module Key: YARN-3063 URL: https://issues.apache.org/jira/browse/YARN-3063 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Zhijie Shen Assignee: Zhijie Shen Attachments: YARN-3063.1.patch, YARN-3063.2.patch Based on the discussion on the umbrella Jira, we need to create a new sub-module for TS next gen. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2928) Application Timeline Server (ATS) next gen: phase 1
[ https://issues.apache.org/jira/browse/YARN-2928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14277761#comment-14277761 ] Sangjin Lee commented on YARN-2928: --- Let's settle on the name of the project. My vote was applicationtimelineservice, but I'm open to suggestions. :) And we can change our minds about this of course. Also, any volunteers for creating the project skeleton? I'm not sure if I have commit privileges for the branch yet, but if so I can do it as well. Application Timeline Server (ATS) next gen: phase 1 --- Key: YARN-2928 URL: https://issues.apache.org/jira/browse/YARN-2928 Project: Hadoop YARN Issue Type: New Feature Components: timelineserver Reporter: Sangjin Lee Assignee: Sangjin Lee Priority: Critical Attachments: ATSv2.rev1.pdf, ATSv2.rev2.pdf We have the application timeline server implemented in yarn per YARN-1530 and YARN-321. Although it is a great feature, we have recognized several critical issues and features that need to be addressed. This JIRA proposes the design and implementation changes to address those. This is phase 1 of this effort. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3062) timelineserver gives inconsistent data for otherinfo field based on the filter param
[ https://issues.apache.org/jira/browse/YARN-3062?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=1427#comment-1427 ] Zhijie Shen commented on YARN-3062: --- Unfortunately, we're still lacking the documentation. timelineserver gives inconsistent data for otherinfo field based on the filter param Key: YARN-3062 URL: https://issues.apache.org/jira/browse/YARN-3062 Project: Hadoop YARN Issue Type: Bug Components: timelineserver Affects Versions: 2.4.0, 2.5.0, 2.6.0 Reporter: Prakash Ramachandran Attachments: withfilter.json, withoutfilter.json When otherinfo field gets updated, in some cases the data returned for an entity is dependent on the filter usage. for ex in the attached files for the - entity: vertex_1421164610335_0020_1_01, - entitytype: TEZ_VERTEX_ID, for the otherinfo.numTasks, got updated from 1009 to 253 - using {code}http://machine:8188/ws/v1/timeline/TEZ_VERTEX_ID/vertex_1421164610335_0020_1_01/ {code} gives the updated value: 253 - using {code}http://cn042-10:8188/ws/v1/timeline/TEZ_VERTEX_ID?limit=11primaryFilter=TEZ_DAG_ID%3Adag_1421164610335_0020_1{code} gives the old value: 1009 for the otherinfo.status field, which gets updated, both of them show the updated value. TEZ-1942 has more details. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2928) Application Timeline Server (ATS) next gen: phase 1
[ https://issues.apache.org/jira/browse/YARN-2928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14277868#comment-14277868 ] Vinod Kumar Vavilapalli commented on YARN-2928: --- I've been calling it timelineservice everywhere I go - the word server has a specific connotation. Application Timeline Server (ATS) next gen: phase 1 --- Key: YARN-2928 URL: https://issues.apache.org/jira/browse/YARN-2928 Project: Hadoop YARN Issue Type: New Feature Components: timelineserver Reporter: Sangjin Lee Assignee: Sangjin Lee Priority: Critical Attachments: ATSv2.rev1.pdf, ATSv2.rev2.pdf We have the application timeline server implemented in yarn per YARN-1530 and YARN-321. Although it is a great feature, we have recognized several critical issues and features that need to be addressed. This JIRA proposes the design and implementation changes to address those. This is phase 1 of this effort. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2928) Application Timeline Server (ATS) next gen: phase 1
[ https://issues.apache.org/jira/browse/YARN-2928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14277872#comment-14277872 ] Sangjin Lee commented on YARN-2928: --- +1 with that. Application Timeline Server (ATS) next gen: phase 1 --- Key: YARN-2928 URL: https://issues.apache.org/jira/browse/YARN-2928 Project: Hadoop YARN Issue Type: New Feature Components: timelineserver Reporter: Sangjin Lee Assignee: Sangjin Lee Priority: Critical Attachments: ATSv2.rev1.pdf, ATSv2.rev2.pdf We have the application timeline server implemented in yarn per YARN-1530 and YARN-321. Although it is a great feature, we have recognized several critical issues and features that need to be addressed. This JIRA proposes the design and implementation changes to address those. This is phase 1 of this effort. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2194) Add Cgroup support for RedHat 7
[ https://issues.apache.org/jira/browse/YARN-2194?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] bc Wong updated YARN-2194: -- Description:In previous versions of RedHat, we can build custom cgroup hierarchies with use of the cgconfig command from the libcgroup package. From RedHat 7, package libcgroup is deprecated and it is not recommended to use it since it can easily create conflicts with the default cgroup hierarchy. The systemd is provided and recommended for cgroup management. We need to add support for this. (was: In previous versions of RedHat, we can build custom cgroup hierarchies with use of the cgconfig command from the libcgroup package. From RedHat 7, package libcgroup is deprecated and it is not recommended to use it since it can easily create conflicts with the default cgroup hierarchy. The systemd is provided and recommended for cgroup management. We need to add support for this.) Add Cgroup support for RedHat 7 --- Key: YARN-2194 URL: https://issues.apache.org/jira/browse/YARN-2194 Project: Hadoop YARN Issue Type: Improvement Reporter: Wei Yan Assignee: Wei Yan Attachments: YARN-2194-1.patch In previous versions of RedHat, we can build custom cgroup hierarchies with use of the cgconfig command from the libcgroup package. From RedHat 7, package libcgroup is deprecated and it is not recommended to use it since it can easily create conflicts with the default cgroup hierarchy. The systemd is provided and recommended for cgroup management. We need to add support for this. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2928) Application Timeline Server (ATS) next gen: phase 1
[ https://issues.apache.org/jira/browse/YARN-2928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14277880#comment-14277880 ] Vinod Kumar Vavilapalli commented on YARN-2928: --- [~sjlee0] bq. (1) While it may be faster to allocate with the per-node companions, capacity-wise you would end up spending more capacity with the per-node approach. Since these per-node companions are always up although they may be idle for large amount of time. So if capacity is a concern you may lose out. Under what circumstances would per-node companions be more advantageous in terms of capacity? Agreed, we will have to carve out some capacity for the per-node companions. I see some sort of static allocation like 1GB similar to NodeManager. I've never seen anyone change the NM capacity as it usually simply forgets things or persists state to local store. The per-node agent can also take the same approach - a limited heap, and forget or spill over to the Timeline Storage (e.g. HBase). Only when we want to utilize some memory for short term aggregations, capacity will be a concern. The other point is that we anyways have to carve out this capacity for things like YARN-2965. bq. (2) I do have a question about the work-preserving aspect of the per-node ATS companion. One implication of making this a per-node thing (i.e. long-running) is that we need to handle the work-preserving restart. What if we need to restart the ATS companion? Since other YARN daemons (RM and NM) allow for work-preserving restarts, we cannot have the ATS companion break that. So that seems to be a requirement? Yes, recoverability is a requirement for ALA. I'd design it such that it is the responsibility of each app's aggregator (living inside the node agent) instead of of the node-agent itself. bq. (3) We still need to handle the lifecycle management aspects of it. Previously we said that when RM allocates an AM it would tell the NM so the NM could spawn the special container. With the per-node approach, the RM would still need to tell the NM so that the NM can talk to the per-node ATS companion to initialize the data structure for the given app. Yes again. That doesn't change. And it would exactly work the way you said - at no place in the system will it be assumed that the aggregator is running per node - except for the final 'launcher' who launches the aggregator. bq. These are quick observations. While I do see value in the per-node approach, it's not totally clear how much work it would save over the per-app approach given these observations. What do you think? Like I mentioned, it won't save anything. It does two things in my mind (1) Let us focus on the wire up first without thinking about scheduling aspects in RM and (2) Let's us figure out other parallel efforts like YARN-1012, YARN-2965, YARN-2984, YARN-2141 can be unified in terms of per-node stats collection. Application Timeline Server (ATS) next gen: phase 1 --- Key: YARN-2928 URL: https://issues.apache.org/jira/browse/YARN-2928 Project: Hadoop YARN Issue Type: New Feature Components: timelineserver Reporter: Sangjin Lee Assignee: Vinod Kumar Vavilapalli Priority: Critical Attachments: ATSv2.rev1.pdf, ATSv2.rev2.pdf We have the application timeline server implemented in yarn per YARN-1530 and YARN-321. Although it is a great feature, we have recognized several critical issues and features that need to be addressed. This JIRA proposes the design and implementation changes to address those. This is phase 1 of this effort. 
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2928) Application Timeline Server (ATS) next gen: phase 1
[ https://issues.apache.org/jira/browse/YARN-2928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14277913#comment-14277913 ] Sangjin Lee commented on YARN-2928: --- I agree with the aggregated log viewer being part of ATS. Probably not for the first phase, but eventually. I'll update the doc accordingly. Speaking of which, I think we may want to standardize the names of the components, especially the one we launch to write data. We've used several names to refer to the same thing, and it'd be good if we just settle on one name so there is no confusion. May sound like a small thing, but it'd help discussing things rapidly. We used: ATS writer, ATS writer companion, aggregator, and ALA. I'm not married to any names here. How about Timeline aggregator? I'm open to suggestions. Application Timeline Server (ATS) next gen: phase 1 --- Key: YARN-2928 URL: https://issues.apache.org/jira/browse/YARN-2928 Project: Hadoop YARN Issue Type: New Feature Components: timelineserver Reporter: Sangjin Lee Assignee: Vinod Kumar Vavilapalli Priority: Critical Attachments: ATSv2.rev1.pdf, ATSv2.rev2.pdf We have the application timeline server implemented in yarn per YARN-1530 and YARN-321. Although it is a great feature, we have recognized several critical issues and features that need to be addressed. This JIRA proposes the design and implementation changes to address those. This is phase 1 of this effort. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2933) Capacity Scheduler preemption policy should only consider capacity without labels temporarily
[ https://issues.apache.org/jira/browse/YARN-2933?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14277786#comment-14277786 ] Wangda Tan commented on YARN-2933: -- In addition to Jian's comment, it's better to use enum type instead of int in mockContainer, which can avoid call getValue() from enum. Capacity Scheduler preemption policy should only consider capacity without labels temporarily - Key: YARN-2933 URL: https://issues.apache.org/jira/browse/YARN-2933 Project: Hadoop YARN Issue Type: Sub-task Components: capacityscheduler Reporter: Wangda Tan Assignee: Mayank Bansal Attachments: YARN-2933-1.patch, YARN-2933-2.patch, YARN-2933-3.patch, YARN-2933-4.patch, YARN-2933-5.patch, YARN-2933-6.patch, YARN-2933-7.patch, YARN-2933-8.patch Currently, we have capacity enforcement on each queue for each label in CapacityScheduler, but we don't have preemption policy to support that. YARN-2498 is targeting to support preemption respect node labels, but we have some gaps in code base, like queues/FiCaScheduler should be able to get usedResource/pendingResource, etc. by label. These items potentially need to refactor CS which we need spend some time carefully think about. For now, what immediately we can do is allow calculate ideal_allocation and preempt containers only for resources on nodes without labels, to avoid regression like: A cluster has some nodes with labels and some not, assume queueA isn't satisfied for resource without label, but for now, preemption policy may preempt resource from nodes with labels for queueA, that is not correct. Again, it is just a short-term enhancement, YARN-2498 will consider preemption respecting node-labels for Capacity Scheduler which is our final target. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2928) Application Timeline Server (ATS) next gen: phase 1
[ https://issues.apache.org/jira/browse/YARN-2928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14277808#comment-14277808 ] Sangjin Lee commented on YARN-2928: --- timelineserver sounds like a good name to me. Thanks! Application Timeline Server (ATS) next gen: phase 1 --- Key: YARN-2928 URL: https://issues.apache.org/jira/browse/YARN-2928 Project: Hadoop YARN Issue Type: New Feature Components: timelineserver Reporter: Sangjin Lee Assignee: Sangjin Lee Priority: Critical Attachments: ATSv2.rev1.pdf, ATSv2.rev2.pdf We have the application timeline server implemented in yarn per YARN-1530 and YARN-321. Although it is a great feature, we have recognized several critical issues and features that need to be addressed. This JIRA proposes the design and implementation changes to address those. This is phase 1 of this effort. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2928) Application Timeline Server (ATS) next gen: phase 1
[ https://issues.apache.org/jira/browse/YARN-2928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14277873#comment-14277873 ] Vinod Kumar Vavilapalli commented on YARN-2928: --- [~zjshen], bq. One correlated issue I want to raise here is aggregated log service. [..] Therefore, though aggregated log service is [not] the part of the major goal of ATS next gen - scalability, I hope we'd better take into account in the future, when designing the reader and GUI. +1, we need a home for log viewer service and we were veering towards the Timeline Service itself. Application Timeline Server (ATS) next gen: phase 1 --- Key: YARN-2928 URL: https://issues.apache.org/jira/browse/YARN-2928 Project: Hadoop YARN Issue Type: New Feature Components: timelineserver Reporter: Sangjin Lee Assignee: Sangjin Lee Priority: Critical Attachments: ATSv2.rev1.pdf, ATSv2.rev2.pdf We have the application timeline server implemented in yarn per YARN-1530 and YARN-321. Although it is a great feature, we have recognized several critical issues and features that need to be addressed. This JIRA proposes the design and implementation changes to address those. This is phase 1 of this effort. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3063) Bootstrap TimelineServer Next Gen Module
[ https://issues.apache.org/jira/browse/YARN-3063?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhijie Shen updated YARN-3063: -- Target Version/s: YARN-2928 Bootstrap TimelineServer Next Gen Module Key: YARN-3063 URL: https://issues.apache.org/jira/browse/YARN-3063 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Zhijie Shen Assignee: Zhijie Shen Attachments: YARN-3063.1.patch Based on the discussion on the umbrella Jira, we need to create a new sub-module for TS next gen. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3062) timelineserver gives inconsistent data for otherinfo field based on the filter param
[ https://issues.apache.org/jira/browse/YARN-3062?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhijie Shen updated YARN-3062: -- Target Version/s: (was: YARN-2928) timelineserver gives inconsistent data for otherinfo field based on the filter param Key: YARN-3062 URL: https://issues.apache.org/jira/browse/YARN-3062 Project: Hadoop YARN Issue Type: Bug Components: timelineserver Affects Versions: 2.4.0, 2.5.0, 2.6.0 Reporter: Prakash Ramachandran Attachments: withfilter.json, withoutfilter.json When otherinfo field gets updated, in some cases the data returned for an entity is dependent on the filter usage. for ex in the attached files for the - entity: vertex_1421164610335_0020_1_01, - entitytype: TEZ_VERTEX_ID, for the otherinfo.numTasks, got updated from 1009 to 253 - using {code}http://machine:8188/ws/v1/timeline/TEZ_VERTEX_ID/vertex_1421164610335_0020_1_01/ {code} gives the updated value: 253 - using {code}http://cn042-10:8188/ws/v1/timeline/TEZ_VERTEX_ID?limit=11primaryFilter=TEZ_DAG_ID%3Adag_1421164610335_0020_1{code} gives the old value: 1009 for the otherinfo.status field, which gets updated, both of them show the updated value. TEZ-1942 has more details. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3062) timelineserver gives inconsistent data for otherinfo field based on the filter param
[ https://issues.apache.org/jira/browse/YARN-3062?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhijie Shen updated YARN-3062: -- Target Version/s: YARN-2928 timelineserver gives inconsistent data for otherinfo field based on the filter param Key: YARN-3062 URL: https://issues.apache.org/jira/browse/YARN-3062 Project: Hadoop YARN Issue Type: Bug Components: timelineserver Affects Versions: 2.4.0, 2.5.0, 2.6.0 Reporter: Prakash Ramachandran Attachments: withfilter.json, withoutfilter.json When otherinfo field gets updated, in some cases the data returned for an entity is dependent on the filter usage. for ex in the attached files for the - entity: vertex_1421164610335_0020_1_01, - entitytype: TEZ_VERTEX_ID, for the otherinfo.numTasks, got updated from 1009 to 253 - using {code}http://machine:8188/ws/v1/timeline/TEZ_VERTEX_ID/vertex_1421164610335_0020_1_01/ {code} gives the updated value: 253 - using {code}http://cn042-10:8188/ws/v1/timeline/TEZ_VERTEX_ID?limit=11primaryFilter=TEZ_DAG_ID%3Adag_1421164610335_0020_1{code} gives the old value: 1009 for the otherinfo.status field, which gets updated, both of them show the updated value. TEZ-1942 has more details. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2791) Add Disk as a resource for scheduling
[ https://issues.apache.org/jira/browse/YARN-2791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14277979#comment-14277979 ] Swapnil Daingade commented on YARN-2791: Hi [~kasha] and [~vinodkv], Went over the latest design doc for YARN-2139 and posted my comments. Add Disk as a resource for scheduling - Key: YARN-2791 URL: https://issues.apache.org/jira/browse/YARN-2791 Project: Hadoop YARN Issue Type: Sub-task Components: scheduler Affects Versions: 2.5.1 Reporter: Swapnil Daingade Assignee: Yuliya Feldman Attachments: DiskDriveAsResourceInYARN.pdf Currently, the number of disks present on a node is not considered a factor while scheduling containers on that node. Having a large amount of memory on a node can lead to a high number of containers being launched on that node, all of which compete for I/O bandwidth. This multiplexing of I/O across containers can lead to slower overall progress and sub-optimal resource utilization as containers starved for I/O bandwidth hold on to other resources like cpu and memory. This problem can be solved by considering disk as a resource and including it in deciding how many containers can be concurrently run on a node. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2928) Application Timeline Server (ATS) next gen: phase 1
[ https://issues.apache.org/jira/browse/YARN-2928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14277977#comment-14277977 ] Zhijie Shen commented on YARN-2928: --- +1 for Timeline Aggregator Application Timeline Server (ATS) next gen: phase 1 --- Key: YARN-2928 URL: https://issues.apache.org/jira/browse/YARN-2928 Project: Hadoop YARN Issue Type: New Feature Components: timelineserver Reporter: Sangjin Lee Assignee: Vinod Kumar Vavilapalli Priority: Critical Attachments: ATSv2.rev1.pdf, ATSv2.rev2.pdf We have the application timeline server implemented in yarn per YARN-1530 and YARN-321. Although it is a great feature, we have recognized several critical issues and features that need to be addressed. This JIRA proposes the design and implementation changes to address those. This is phase 1 of this effort. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2933) Capacity Scheduler preemption policy should only consider capacity without labels temporarily
[ https://issues.apache.org/jira/browse/YARN-2933?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14277784#comment-14277784 ] Jian He commented on YARN-2933: --- - Looks good overall; should we use priority.AMCONTAINER here?
{code}
if (setAMContainer && i == 0) {
  cLive.add(mockContainer(appAttId, cAlloc, unit, priority.CONTAINER.getValue()));
  if (priority.CONTAINER.getValue() == cpriority) {
    when(mC.isAMContainer()).thenReturn(true);
  }
{code}
Capacity Scheduler preemption policy should only consider capacity without labels temporarily - Key: YARN-2933 URL: https://issues.apache.org/jira/browse/YARN-2933 Project: Hadoop YARN Issue Type: Sub-task Components: capacityscheduler Reporter: Wangda Tan Assignee: Mayank Bansal Attachments: YARN-2933-1.patch, YARN-2933-2.patch, YARN-2933-3.patch, YARN-2933-4.patch, YARN-2933-5.patch, YARN-2933-6.patch, YARN-2933-7.patch, YARN-2933-8.patch Currently, we have capacity enforcement on each queue for each label in CapacityScheduler, but we don't have a preemption policy to support that. YARN-2498 is targeting to support preemption respecting node labels, but we have some gaps in the code base, like queues/FiCaScheduler should be able to get usedResource/pendingResource, etc. by label. These items potentially require refactoring CS, which we need to spend some time thinking about carefully. For now, what we can do immediately is calculate ideal_allocation and preempt containers only for resources on nodes without labels, to avoid regressions like: a cluster has some nodes with labels and some without; assume queueA isn't satisfied for resources without labels, but for now, the preemption policy may preempt resources from nodes with labels for queueA, which is not correct. Again, it is just a short-term enhancement; YARN-2498 will consider preemption respecting node-labels for the Capacity Scheduler, which is our final target. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
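As a side note on the two review comments above, here is a minimal, self-contained sketch of what passing the priority enum itself (rather than its int value) into the mock helper could look like. The enum values and the mockContainer signature below are illustrative assumptions, not the actual test code from the YARN-2933 patches.
{code}
// Illustrative sketch only; names are hypothetical, not the real test helper.
public class MockContainerEnumSketch {
  enum ContainerPriority { AMCONTAINER, CONTAINER }

  static final class FakeContainer {
    final ContainerPriority priority;
    FakeContainer(ContainerPriority priority) { this.priority = priority; }
    boolean isAMContainer() { return priority == ContainerPriority.AMCONTAINER; }
  }

  // Taking the enum directly removes the need for priority.X.getValue() at call sites.
  static FakeContainer mockContainer(ContainerPriority priority) {
    return new FakeContainer(priority);
  }

  public static void main(String[] args) {
    FakeContainer am = mockContainer(ContainerPriority.AMCONTAINER);
    System.out.println("isAMContainer = " + am.isAMContainer()); // prints true
  }
}
{code}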
[jira] [Created] (YARN-3063) Bootstrap TimelineServer Next Gen Module
Zhijie Shen created YARN-3063: - Summary: Bootstrap TimelineServer Next Gen Module Key: YARN-3063 URL: https://issues.apache.org/jira/browse/YARN-3063 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Zhijie Shen Assignee: Zhijie Shen Based on the discussion on the umbrella Jira, we need to create a new sub-module for TS next gen. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-1055) Handle app recovery differently for AM failures and RM restart
[ https://issues.apache.org/jira/browse/YARN-1055?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14277951#comment-14277951 ] Jian He commented on YARN-1055: --- With work-preserving RM restart, the max-attempts config is not required any more. We may close this out? Handle app recovery differently for AM failures and RM restart -- Key: YARN-1055 URL: https://issues.apache.org/jira/browse/YARN-1055 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Affects Versions: 2.1.0-beta Reporter: Karthik Kambatla Ideally, we would like to tolerate container, AM, and RM failures. App recovery for AM and RM currently relies on the max-attempts config; tolerating AM failures requires it to be > 1 and tolerating RM failure/restart requires it to be >= 1. We should handle these two differently, with two separate configs. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2861) Timeline DT secret manager should not reuse the RM's configs.
[ https://issues.apache.org/jira/browse/YARN-2861?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14277962#comment-14277962 ] Hadoop QA commented on YARN-2861: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12692169/YARN-2861.2.patch against trunk revision 6464a89. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: org.apache.hadoop.yarn.server.resourcemanager.TestRMRestart org.apache.hadoop.yarn.server.resourcemanager.TestContainerResourceUsage The following test timeouts occurred in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: org.apache.hadoop.yarn.server.resourcemanager.applicationsmanager.TestAMRestart Test results: https://builds.apache.org/job/PreCommit-YARN-Build/6334//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6334//console This message is automatically generated. Timeline DT secret manager should not reuse the RM's configs. - Key: YARN-2861 URL: https://issues.apache.org/jira/browse/YARN-2861 Project: Hadoop YARN Issue Type: Bug Reporter: Zhijie Shen Assignee: Zhijie Shen Attachments: YARN-2861.1.patch, YARN-2861.2.patch These are the configs for the RM DT secret manager. We should create separate ones for timeline DT only.
{code}
@Override
protected void serviceInit(Configuration conf) throws Exception {
  long secretKeyInterval =
      conf.getLong(YarnConfiguration.DELEGATION_KEY_UPDATE_INTERVAL_KEY,
          YarnConfiguration.DELEGATION_KEY_UPDATE_INTERVAL_DEFAULT);
  long tokenMaxLifetime =
      conf.getLong(YarnConfiguration.DELEGATION_TOKEN_MAX_LIFETIME_KEY,
          YarnConfiguration.DELEGATION_TOKEN_MAX_LIFETIME_DEFAULT);
  long tokenRenewInterval =
      conf.getLong(YarnConfiguration.DELEGATION_TOKEN_RENEW_INTERVAL_KEY,
          YarnConfiguration.DELEGATION_TOKEN_RENEW_INTERVAL_DEFAULT);
  secretManager = new TimelineDelegationTokenSecretManager(secretKeyInterval,
      tokenMaxLifetime, tokenRenewInterval, 360);
  secretManager.startThreads();
  serviceAddr = TimelineUtils.getTimelineTokenServiceAddress(getConfig());
  super.init(conf);
}
{code}
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
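For illustration only, a minimal sketch of the direction the YARN-2861 description suggests: reading dedicated timeline delegation-token settings instead of the RM delegation-token keys quoted above. The property names and defaults below are hypothetical placeholders, not the keys introduced by the actual patch.
{code}
import org.apache.hadoop.conf.Configuration;

// Sketch only: hypothetical timeline-specific keys, read separately from the
// yarn.resourcemanager delegation-token settings shown in the description.
public class TimelineDtConfigSketch {
  public static void main(String[] args) {
    Configuration conf = new Configuration();
    long keyUpdateInterval = conf.getLong(
        "yarn.timeline-service.delegation.key-update-interval-ms",
        24L * 60 * 60 * 1000);          // assumed 1-day default
    long tokenMaxLifetime = conf.getLong(
        "yarn.timeline-service.delegation.token.max-lifetime-ms",
        7L * 24 * 60 * 60 * 1000);      // assumed 7-day default
    long tokenRenewInterval = conf.getLong(
        "yarn.timeline-service.delegation.token.renew-interval-ms",
        24L * 60 * 60 * 1000);          // assumed 1-day default
    System.out.println(keyUpdateInterval + " / " + tokenMaxLifetime
        + " / " + tokenRenewInterval);
  }
}
{code}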
[jira] [Commented] (YARN-2928) Application Timeline Server (ATS) next gen: phase 1
[ https://issues.apache.org/jira/browse/YARN-2928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=1422#comment-1422 ] Zhijie Shen commented on YARN-2928: --- How about timelineservice or yarntimelineservice, since it isn't limited to application data only? bq. Also, any volunteers for creating the project skeleton? I can help with creating the new module. Application Timeline Server (ATS) next gen: phase 1 --- Key: YARN-2928 URL: https://issues.apache.org/jira/browse/YARN-2928 Project: Hadoop YARN Issue Type: New Feature Components: timelineserver Reporter: Sangjin Lee Assignee: Sangjin Lee Priority: Critical Attachments: ATSv2.rev1.pdf, ATSv2.rev2.pdf We have the application timeline server implemented in yarn per YARN-1530 and YARN-321. Although it is a great feature, we have recognized several critical issues and features that need to be addressed. This JIRA proposes the design and implementation changes to address those. This is phase 1 of this effort. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3026) Move application-specific container allocation logic from LeafQueue to FiCaSchedulerApp
[ https://issues.apache.org/jira/browse/YARN-3026?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14277854#comment-14277854 ] Hadoop QA commented on YARN-3026: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12692338/YARN-3026.2.patch against trunk revision 7fe0f25. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 4 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: org.apache.hadoop.yarn.server.resourcemanager.TestRMRestart org.apache.hadoop.yarn.server.resourcemanager.TestContainerResourceUsage The following test timeouts occurred in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: org.apache.hadoop.yarn.server.resourcemanager.applicationsmanager.TestAMRestart Test results: https://builds.apache.org/job/PreCommit-YARN-Build/6333//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6333//console This message is automatically generated. Move application-specific container allocation logic from LeafQueue to FiCaSchedulerApp --- Key: YARN-3026 URL: https://issues.apache.org/jira/browse/YARN-3026 Project: Hadoop YARN Issue Type: Task Components: capacityscheduler Reporter: Wangda Tan Assignee: Wangda Tan Attachments: YARN-3026.1.patch, YARN-3026.2.patch Have a discussion with [~vinodkv] and [~jianhe]: In existing Capacity Scheduler, all allocation logics of and under LeafQueue are located in LeafQueue.java in implementation. To make a cleaner scope of LeafQueue, we'd better move some of them to FiCaSchedulerApp. Ideal scope of LeafQueue should be: when a LeafQueue receives some resources from ParentQueue (like 15% of cluster resource), and it distributes resources to children apps, and it should be agnostic to internal logic of children apps (like delayed-scheduling, etc.). IAW, LeafQueue shouldn't decide how application allocating container from given resources. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
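To make the intent in the YARN-3026 description above concrete, here is a small, self-contained sketch of the separation of responsibilities it describes: the queue offers resources and stays agnostic to how each application turns the offer into containers. The class and method names are illustrative assumptions, not the actual CapacityScheduler code.
{code}
// Illustrative sketch only; not LeafQueue/FiCaSchedulerApp from the patch.
public class QueueAppSeparationSketch {
  interface SchedulableApp {
    // App-internal policies (e.g. delay scheduling) live behind this call.
    int assignContainers(int offeredMb);
  }

  static int distribute(int queueShareMb, Iterable<SchedulableApp> apps) {
    int assigned = 0;
    for (SchedulableApp app : apps) {
      // The queue only tracks how much of its share remains; it does not
      // decide how a given application allocates containers from it.
      assigned += app.assignContainers(queueShareMb - assigned);
    }
    return assigned;
  }

  public static void main(String[] args) {
    SchedulableApp app1 = offered -> Math.min(offered, 1024); // takes at most 1 GB
    SchedulableApp app2 = offered -> Math.min(offered, 2048); // takes at most 2 GB
    System.out.println("assigned MB = "
        + distribute(4096, java.util.Arrays.asList(app1, app2))); // 3072
  }
}
{code}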
[jira] [Commented] (YARN-2797) TestWorkPreservingRMRestart should use ParametrizedSchedulerTestBase
[ https://issues.apache.org/jira/browse/YARN-2797?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14277966#comment-14277966 ] Robert Kanter commented on YARN-2797: - +1 TestWorkPreservingRMRestart should use ParametrizedSchedulerTestBase Key: YARN-2797 URL: https://issues.apache.org/jira/browse/YARN-2797 Project: Hadoop YARN Issue Type: Improvement Affects Versions: 2.5.1 Reporter: Karthik Kambatla Assignee: Karthik Kambatla Priority: Minor Attachments: yarn-2797-1.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-3065) TestNodeManagerResync errors
Karthik Kambatla created YARN-3065: -- Summary: TestNodeManagerResync errors Key: YARN-3065 URL: https://issues.apache.org/jira/browse/YARN-3065 Project: Hadoop YARN Issue Type: Bug Components: nodemanager, test Affects Versions: 2.6.0 Reporter: Karthik Kambatla TestNodeManagerResync started failing recently, mostly due to a test timeout. See attachment for a sample test output. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3065) TestNodeManagerResync errors
[ https://issues.apache.org/jira/browse/YARN-3065?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Karthik Kambatla updated YARN-3065: --- Attachment: org.apache.hadoop.yarn.server.nodemanager.TestNodeManagerResync.txt TestNodeManagerResync errors Key: YARN-3065 URL: https://issues.apache.org/jira/browse/YARN-3065 Project: Hadoop YARN Issue Type: Bug Components: nodemanager, test Affects Versions: 2.6.0 Reporter: Karthik Kambatla Attachments: org.apache.hadoop.yarn.server.nodemanager.TestNodeManagerResync.txt TestNodeManagerResync started failing recently, mostly due to a test timeout. See attachment for a sample test output. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2217) Shared cache client side changes
[ https://issues.apache.org/jira/browse/YARN-2217?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14278072#comment-14278072 ] Hadoop QA commented on YARN-2217: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12692399/YARN-2217-trunk-v8.patch against trunk revision 6464a89. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client: org.apache.hadoop.yarn.client.cli.TestRMAdminCLI Test results: https://builds.apache.org/job/PreCommit-YARN-Build/6337//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6337//console This message is automatically generated. Shared cache client side changes Key: YARN-2217 URL: https://issues.apache.org/jira/browse/YARN-2217 Project: Hadoop YARN Issue Type: Sub-task Reporter: Chris Trezzo Assignee: Chris Trezzo Attachments: YARN-2217-trunk-v1.patch, YARN-2217-trunk-v2.patch, YARN-2217-trunk-v3.patch, YARN-2217-trunk-v4.patch, YARN-2217-trunk-v5.patch, YARN-2217-trunk-v6.patch, YARN-2217-trunk-v7.patch, YARN-2217-trunk-v8.patch Implement the client side changes for the shared cache. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-3064) TestRMRestart/TestContainerResourceUsage failure with allocation timeout in trunk
Wangda Tan created YARN-3064: Summary: TestRMRestart/TestContainerResourceUsage failure with allocation timeout in trunk Key: YARN-3064 URL: https://issues.apache.org/jira/browse/YARN-3064 Project: Hadoop YARN Issue Type: Bug Components: scheduler Reporter: Wangda Tan Priority: Critical Noticed consistent tests failure, see: https://builds.apache.org/job/PreCommit-YARN-Build/6332//testReport/ Logs like: {code} Error Message Attempt state is not correct (timedout) expected:ALLOCATED but was:SCHEDULED Stacktrace java.lang.AssertionError: Attempt state is not correct (timedout) expected:ALLOCATED but was:SCHEDULED at org.junit.Assert.fail(Assert.java:88) at org.junit.Assert.failNotEquals(Assert.java:743) at org.junit.Assert.assertEquals(Assert.java:118) at org.apache.hadoop.yarn.server.resourcemanager.MockRM.waitForState(MockRM.java:152) at org.apache.hadoop.yarn.server.resourcemanager.TestRMRestart.testQueueMetricsOnRMRestart(TestRMRestart.java:1794) {code} I can reproduce it in local environment. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
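For readers unfamiliar with the failure mode in the log above, the assertion comes from a wait-and-check helper that times out if the attempt never reaches the expected state. A generic sketch of that polling pattern is shown below; it is an illustration, not the MockRM implementation.
{code}
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;

// Generic illustration of a waitForState-style helper: poll until the observed
// state matches the expected one, or give up after a timeout (which is what
// produces the "(timedout)" assertion message quoted above).
public class WaitForStateSketch {
  static boolean waitForState(Supplier<String> currentState, String expected,
                              long timeoutMs) throws InterruptedException {
    long deadline = System.currentTimeMillis() + timeoutMs;
    while (System.currentTimeMillis() < deadline) {
      if (expected.equals(currentState.get())) {
        return true;
      }
      TimeUnit.MILLISECONDS.sleep(100);
    }
    return false;
  }

  public static void main(String[] args) throws InterruptedException {
    // A state supplier that never leaves SCHEDULED reproduces the timeout case.
    System.out.println(waitForState(() -> "SCHEDULED", "ALLOCATED", 1000)); // false
  }
}
{code}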
[jira] [Updated] (YARN-3064) TestRMRestart/TestContainerResourceUsage/TestNodeManagerResync failure with allocation timeout in trunk
[ https://issues.apache.org/jira/browse/YARN-3064?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jian He updated YARN-3064: -- Summary: TestRMRestart/TestContainerResourceUsage/TestNodeManagerResync failure with allocation timeout in trunk (was: TestRMRestart/TestContainerResourceUsage failure with allocation timeout in trunk) TestRMRestart/TestContainerResourceUsage/TestNodeManagerResync failure with allocation timeout in trunk --- Key: YARN-3064 URL: https://issues.apache.org/jira/browse/YARN-3064 Project: Hadoop YARN Issue Type: Bug Components: scheduler Reporter: Wangda Tan Assignee: Jian He Priority: Critical Noticed consistent tests failure, see: https://builds.apache.org/job/PreCommit-YARN-Build/6332//testReport/ Logs like: {code} Error Message Attempt state is not correct (timedout) expected:ALLOCATED but was:SCHEDULED Stacktrace java.lang.AssertionError: Attempt state is not correct (timedout) expected:ALLOCATED but was:SCHEDULED at org.junit.Assert.fail(Assert.java:88) at org.junit.Assert.failNotEquals(Assert.java:743) at org.junit.Assert.assertEquals(Assert.java:118) at org.apache.hadoop.yarn.server.resourcemanager.MockRM.waitForState(MockRM.java:152) at org.apache.hadoop.yarn.server.resourcemanager.TestRMRestart.testQueueMetricsOnRMRestart(TestRMRestart.java:1794) {code} I can reproduce it in local environment. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3064) TestRMRestart/TestContainerResourceUsage/TestNodeManagerResync failure with allocation timeout in trunk
[ https://issues.apache.org/jira/browse/YARN-3064?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14278276#comment-14278276 ] Jian He commented on YARN-3064: --- Caused by YARN-3019. Some tests are still based on the non-work-preserving recovery mechanism. Uploaded a patch to fix the tests. TestRMRestart/TestContainerResourceUsage/TestNodeManagerResync failure with allocation timeout in trunk --- Key: YARN-3064 URL: https://issues.apache.org/jira/browse/YARN-3064 Project: Hadoop YARN Issue Type: Bug Components: scheduler Reporter: Wangda Tan Assignee: Jian He Priority: Critical Attachments: YARN-3064.1.patch Noticed consistent tests failure, see: https://builds.apache.org/job/PreCommit-YARN-Build/6332//testReport/ Logs like: {code} Error Message Attempt state is not correct (timedout) expected:ALLOCATED but was:SCHEDULED Stacktrace java.lang.AssertionError: Attempt state is not correct (timedout) expected:ALLOCATED but was:SCHEDULED at org.junit.Assert.fail(Assert.java:88) at org.junit.Assert.failNotEquals(Assert.java:743) at org.junit.Assert.assertEquals(Assert.java:118) at org.apache.hadoop.yarn.server.resourcemanager.MockRM.waitForState(MockRM.java:152) at org.apache.hadoop.yarn.server.resourcemanager.TestRMRestart.testQueueMetricsOnRMRestart(TestRMRestart.java:1794) {code} I can reproduce it in local environment. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2217) Shared cache client side changes
[ https://issues.apache.org/jira/browse/YARN-2217?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14278287#comment-14278287 ] Karthik Kambatla commented on YARN-2217: v8 patch looked mostly good, but for one nit: SharedCacheClientImpl#stopClientProxy checks for scmClient not null but doesn't set it to null. Posted v9 that fixes it. I am +1 on the v9 patch. Will commit it if Jenkins doesn't complain of anything. Shared cache client side changes Key: YARN-2217 URL: https://issues.apache.org/jira/browse/YARN-2217 Project: Hadoop YARN Issue Type: Sub-task Reporter: Chris Trezzo Assignee: Chris Trezzo Attachments: YARN-2217-trunk-v1.patch, YARN-2217-trunk-v2.patch, YARN-2217-trunk-v3.patch, YARN-2217-trunk-v4.patch, YARN-2217-trunk-v5.patch, YARN-2217-trunk-v6.patch, YARN-2217-trunk-v7.patch, YARN-2217-trunk-v8.patch, YARN-2217-trunk-v9.patch Implement the client side changes for the shared cache. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
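The nit in the comment above is a common stop/cleanup pattern; a generic sketch is shown below. The field and method names mirror the ones mentioned in the comment, but the class body is illustrative only, not the actual SharedCacheClientImpl code.
{code}
// Illustrative sketch only: after stopping a proxy, also clear the reference so
// a repeated stop (or a later restart) behaves predictably.
public class StopProxySketch {
  private Object scmClient; // stands in for the real protocol proxy

  void stopClientProxy() {
    if (scmClient != null) {
      // In the real client this is where the RPC proxy would be stopped.
      scmClient = null; // the step the review comment says was missing
    }
  }
}
{code}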
[jira] [Commented] (YARN-3026) Move application-specific container allocation logic from LeafQueue to FiCaSchedulerApp
[ https://issues.apache.org/jira/browse/YARN-3026?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14278073#comment-14278073 ] Wangda Tan commented on YARN-3026: -- Test failures should not relate to this patch, filed YARN-3064 to track it. Move application-specific container allocation logic from LeafQueue to FiCaSchedulerApp --- Key: YARN-3026 URL: https://issues.apache.org/jira/browse/YARN-3026 Project: Hadoop YARN Issue Type: Task Components: capacityscheduler Reporter: Wangda Tan Assignee: Wangda Tan Attachments: YARN-3026.1.patch, YARN-3026.2.patch Have a discussion with [~vinodkv] and [~jianhe]: In existing Capacity Scheduler, all allocation logics of and under LeafQueue are located in LeafQueue.java in implementation. To make a cleaner scope of LeafQueue, we'd better move some of them to FiCaSchedulerApp. Ideal scope of LeafQueue should be: when a LeafQueue receives some resources from ParentQueue (like 15% of cluster resource), and it distributes resources to children apps, and it should be agnostic to internal logic of children apps (like delayed-scheduling, etc.). IAW, LeafQueue shouldn't decide how application allocating container from given resources. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (YARN-3064) TestRMRestart/TestContainerResourceUsage failure with allocation timeout in trunk
[ https://issues.apache.org/jira/browse/YARN-3064?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jian He reassigned YARN-3064: - Assignee: Jian He TestRMRestart/TestContainerResourceUsage failure with allocation timeout in trunk - Key: YARN-3064 URL: https://issues.apache.org/jira/browse/YARN-3064 Project: Hadoop YARN Issue Type: Bug Components: scheduler Reporter: Wangda Tan Assignee: Jian He Priority: Critical Noticed consistent tests failure, see: https://builds.apache.org/job/PreCommit-YARN-Build/6332//testReport/ Logs like: {code} Error Message Attempt state is not correct (timedout) expected:ALLOCATED but was:SCHEDULED Stacktrace java.lang.AssertionError: Attempt state is not correct (timedout) expected:ALLOCATED but was:SCHEDULED at org.junit.Assert.fail(Assert.java:88) at org.junit.Assert.failNotEquals(Assert.java:743) at org.junit.Assert.assertEquals(Assert.java:118) at org.apache.hadoop.yarn.server.resourcemanager.MockRM.waitForState(MockRM.java:152) at org.apache.hadoop.yarn.server.resourcemanager.TestRMRestart.testQueueMetricsOnRMRestart(TestRMRestart.java:1794) {code} I can reproduce it in local environment. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2984) Metrics for container's actual memory usage
[ https://issues.apache.org/jira/browse/YARN-2984?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14278251#comment-14278251 ] Karthik Kambatla commented on YARN-2984: The test failure is unrelated. It fails on trunk as well, filed YARN-3065 to fix it. Metrics for container's actual memory usage --- Key: YARN-2984 URL: https://issues.apache.org/jira/browse/YARN-2984 Project: Hadoop YARN Issue Type: Sub-task Components: nodemanager Affects Versions: 2.6.0 Reporter: Karthik Kambatla Assignee: Karthik Kambatla Attachments: yarn-2984-1.patch, yarn-2984-2.patch, yarn-2984-3.patch, yarn-2984-prelim.patch It would be nice to capture resource usage per container, for a variety of reasons. This JIRA is to track memory usage. YARN-2965 tracks the resource usage on the node, and the two implementations should reuse code as much as possible. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
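As background for the per-container memory tracking described above, one simple (Linux-only) way to sample a process's resident memory is to read VmRSS from /proc; a minimal sketch follows. This is an illustration of the general idea, not the NodeManager's implementation.
{code}
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;

// Illustration only: reads the resident set size of the current JVM process
// from /proc/self/status (Linux-specific).
public class RssSampleSketch {
  public static void main(String[] args) throws IOException {
    for (String line : Files.readAllLines(Paths.get("/proc/self/status"))) {
      if (line.startsWith("VmRSS:")) {
        System.out.println("resident set size = "
            + line.substring("VmRSS:".length()).trim());
      }
    }
  }
}
{code}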
[jira] [Commented] (YARN-3063) Bootstrap TimelineServer Next Gen Module
[ https://issues.apache.org/jira/browse/YARN-3063?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14278273#comment-14278273 ] Zhijie Shen commented on YARN-3063: --- [~sjlee0], thanks for review. Will commit it to branch YARN-2928. Bootstrap TimelineServer Next Gen Module Key: YARN-3063 URL: https://issues.apache.org/jira/browse/YARN-3063 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Zhijie Shen Assignee: Zhijie Shen Attachments: YARN-3063.1.patch, YARN-3063.2.patch Based on the discussion on the umbrella Jira, we need to create a new sub-module for TS next gen. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3064) TestRMRestart/TestContainerResourceUsage/TestNodeManagerResync failure with allocation timeout in trunk
[ https://issues.apache.org/jira/browse/YARN-3064?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jian He updated YARN-3064: -- Attachment: YARN-3064.1.patch TestRMRestart/TestContainerResourceUsage/TestNodeManagerResync failure with allocation timeout in trunk --- Key: YARN-3064 URL: https://issues.apache.org/jira/browse/YARN-3064 Project: Hadoop YARN Issue Type: Bug Components: scheduler Reporter: Wangda Tan Assignee: Jian He Priority: Critical Attachments: YARN-3064.1.patch Noticed consistent tests failure, see: https://builds.apache.org/job/PreCommit-YARN-Build/6332//testReport/ Logs like: {code} Error Message Attempt state is not correct (timedout) expected:ALLOCATED but was:SCHEDULED Stacktrace java.lang.AssertionError: Attempt state is not correct (timedout) expected:ALLOCATED but was:SCHEDULED at org.junit.Assert.fail(Assert.java:88) at org.junit.Assert.failNotEquals(Assert.java:743) at org.junit.Assert.assertEquals(Assert.java:118) at org.apache.hadoop.yarn.server.resourcemanager.MockRM.waitForState(MockRM.java:152) at org.apache.hadoop.yarn.server.resourcemanager.TestRMRestart.testQueueMetricsOnRMRestart(TestRMRestart.java:1794) {code} I can reproduce it in local environment. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3063) Bootstrap TimelineServer Next Gen Module
[ https://issues.apache.org/jira/browse/YARN-3063?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14278118#comment-14278118 ] Sangjin Lee commented on YARN-3063: --- Thanks [~zjshen]! LGTM. Bootstrap TimelineServer Next Gen Module Key: YARN-3063 URL: https://issues.apache.org/jira/browse/YARN-3063 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Zhijie Shen Assignee: Zhijie Shen Attachments: YARN-3063.1.patch, YARN-3063.2.patch Based on the discussion on the umbrella Jira, we need to create a new sub-module for TS next gen. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2217) Shared cache client side changes
[ https://issues.apache.org/jira/browse/YARN-2217?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Karthik Kambatla updated YARN-2217: --- Attachment: YARN-2217-trunk-v9.patch Shared cache client side changes Key: YARN-2217 URL: https://issues.apache.org/jira/browse/YARN-2217 Project: Hadoop YARN Issue Type: Sub-task Reporter: Chris Trezzo Assignee: Chris Trezzo Attachments: YARN-2217-trunk-v1.patch, YARN-2217-trunk-v2.patch, YARN-2217-trunk-v3.patch, YARN-2217-trunk-v4.patch, YARN-2217-trunk-v5.patch, YARN-2217-trunk-v6.patch, YARN-2217-trunk-v7.patch, YARN-2217-trunk-v8.patch, YARN-2217-trunk-v9.patch Implement the client side changes for the shared cache. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2990) FairScheduler's delay-scheduling always waits for node-local and rack-local delays, even for off-rack-only requests
[ https://issues.apache.org/jira/browse/YARN-2990?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14278300#comment-14278300 ] Karthik Kambatla commented on YARN-2990: Looked more closely; unfortunately, this appears to be by design. Each application has an allowed locality level - initially node-local - which transitions to rack-local and off-switch after the corresponding delays. Instead, it might be better to track this allowed locality level per {{ResourceRequest}}. I propose: # In the short-term, to address the case where the AM has to go through the node-local and rack-local delays, we could start with a default locality level of off-switch and reset it to node-local after the AM is allocated. # In the long-term, let us augment ResourceRequest to include the allowed locality level. Thoughts? FairScheduler's delay-scheduling always waits for node-local and rack-local delays, even for off-rack-only requests --- Key: YARN-2990 URL: https://issues.apache.org/jira/browse/YARN-2990 Project: Hadoop YARN Issue Type: Bug Components: fairscheduler Affects Versions: 2.6.0 Reporter: Karthik Kambatla Assignee: Karthik Kambatla Attachments: yarn-2990-test.patch Looking at the FairScheduler, it appears the node/rack locality delays are used for all requests, even those that are only off-rack. More details in comments. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
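To illustrate the "allowed locality level" mechanism discussed above, here is a simplified, self-contained model of how a delay-scheduling level relaxes from node-local to rack-local to off-switch as the configured delays expire. It is a sketch of the concept only, not the FairScheduler's actual per-application logic.
{code}
// Simplified model of delay scheduling's allowed locality level; illustrative only.
public class AllowedLocalitySketch {
  enum Locality { NODE_LOCAL, RACK_LOCAL, OFF_SWITCH }

  static Locality allowedLevel(long msSinceLastAssignment,
                               long nodeLocalDelayMs, long rackLocalDelayMs) {
    if (msSinceLastAssignment < nodeLocalDelayMs) {
      return Locality.NODE_LOCAL;
    }
    if (msSinceLastAssignment < nodeLocalDelayMs + rackLocalDelayMs) {
      return Locality.RACK_LOCAL;
    }
    return Locality.OFF_SWITCH;
  }

  public static void main(String[] args) {
    // With 5s node-local and 5s rack-local delays, an off-rack-only request
    // still waits ~10s before it is allowed to go off-switch.
    System.out.println(allowedLevel(7_000, 5_000, 5_000));  // RACK_LOCAL
    System.out.println(allowedLevel(12_000, 5_000, 5_000)); // OFF_SWITCH
  }
}
{code}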
[jira] [Commented] (YARN-2896) Server side PB changes for Priority Label Manager and Admin CLI support
[ https://issues.apache.org/jira/browse/YARN-2896?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14276967#comment-14276967 ] Hadoop QA commented on YARN-2896: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12691683/0003-YARN-2896.patch against trunk revision d336d13. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/6331//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6331//console This message is automatically generated. Server side PB changes for Priority Label Manager and Admin CLI support --- Key: YARN-2896 URL: https://issues.apache.org/jira/browse/YARN-2896 Project: Hadoop YARN Issue Type: Sub-task Components: api, resourcemanager Reporter: Sunil G Assignee: Sunil G Attachments: 0001-YARN-2896.patch, 0002-YARN-2896.patch, 0003-YARN-2896.patch Common changes: * PB support changes required for Admin APIs * PB support for File System store (Priority Label Store) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3019) Make work-preserving-recovery the default mechanism for RM recovery
[ https://issues.apache.org/jira/browse/YARN-3019?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14276978#comment-14276978 ] Hudson commented on YARN-3019: -- FAILURE: Integrated in Hadoop-Hdfs-trunk #2005 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/2005/]) YARN-3019. Make work-preserving-recovery the default mechanism for RM recovery. (Contributed by Jian He) (junping_du: rev f92e5038000a012229c304bc6e5281411eff2883) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/conf/YarnConfiguration.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/resources/yarn-default.xml * hadoop-yarn-project/CHANGES.txt Make work-preserving-recovery the default mechanism for RM recovery --- Key: YARN-3019 URL: https://issues.apache.org/jira/browse/YARN-3019 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Jian He Assignee: Jian He Attachments: YARN-3019.1.patch The proposal is to set yarn.resourcemanager.work-preserving-recovery.enabled to true by default to flip recovery mode to work-preserving recovery from non-work-preserving recovery. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
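For reference, a minimal sketch of reading the flag this change flips; the property name is taken from the description above, and after YARN-3019 the shipped default in yarn-default.xml is true. The surrounding class is illustrative only.
{code}
import org.apache.hadoop.conf.Configuration;

// Sketch only: read the work-preserving recovery flag through the Hadoop
// Configuration API, with true as the post-YARN-3019 default.
public class WorkPreservingRecoveryFlagSketch {
  public static void main(String[] args) {
    Configuration conf = new Configuration();
    boolean workPreserving = conf.getBoolean(
        "yarn.resourcemanager.work-preserving-recovery.enabled", true);
    System.out.println("work-preserving recovery enabled = " + workPreserving);
  }
}
{code}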
[jira] [Commented] (YARN-3019) Make work-preserving-recovery the default mechanism for RM recovery
[ https://issues.apache.org/jira/browse/YARN-3019?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14277057#comment-14277057 ] Hudson commented on YARN-3019: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk #2024 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2024/]) YARN-3019. Make work-preserving-recovery the default mechanism for RM recovery. (Contributed by Jian He) (junping_du: rev f92e5038000a012229c304bc6e5281411eff2883) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/resources/yarn-default.xml * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/conf/YarnConfiguration.java * hadoop-yarn-project/CHANGES.txt Make work-preserving-recovery the default mechanism for RM recovery --- Key: YARN-3019 URL: https://issues.apache.org/jira/browse/YARN-3019 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Jian He Assignee: Jian He Attachments: YARN-3019.1.patch The proposal is to set yarn.resourcemanager.work-preserving-recovery.enabled to true by default to flip recovery mode to work-preserving recovery from non-work-preserving recovery. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2637) maximum-am-resource-percent could be respected for both LeafQueue/User when trying to activate applications.
[ https://issues.apache.org/jira/browse/YARN-2637?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14277058#comment-14277058 ] Hudson commented on YARN-2637: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk #2024 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2024/]) YARN-2637. Fixed max-am-resource-percent calculation in CapacityScheduler when activating applications. Contributed by Craig Welch (jianhe: rev c53420f58364b11fbda1dace7679d45534533382) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestReservations.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestApplicationLimits.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/dao/CapacitySchedulerLeafQueueInfo.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CSQueueUtils.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/applicationsmanager/MockAsm.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/reservation/TestCapacitySchedulerPlanFollower.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/LeafQueue.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/RMActiveServiceContext.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/RMAppImpl.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fifo/TestFifoScheduler.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/applicationsmanager/TestAMRMRPCNodeUpdates.java * hadoop-yarn-project/hadoop-yarn/dev-support/findbugs-exclude.xml * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/MockRMApp.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/RMApp.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestLeafQueue.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/common/fica/FiCaSchedulerApp.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/CapacitySchedulerPage.java * 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/TestRMWebServicesCapacitySched.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/RMContextImpl.java maximum-am-resource-percent could be respected for both LeafQueue/User when trying to activate applications. Key: YARN-2637 URL: https://issues.apache.org/jira/browse/YARN-2637 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.6.0 Reporter: Wangda Tan Assignee: Craig Welch Priority: Critical Fix For: 2.7.0 Attachments: YARN-2637.0.patch, YARN-2637.1.patch, YARN-2637.12.patch, YARN-2637.13.patch, YARN-2637.15.patch, YARN-2637.16.patch, YARN-2637.17.patch, YARN-2637.18.patch, YARN-2637.19.patch, YARN-2637.2.patch, YARN-2637.20.patch, YARN-2637.21.patch, YARN-2637.22.patch, YARN-2637.23.patch, YARN-2637.25.patch, YARN-2637.26.patch, YARN-2637.27.patch, YARN-2637.28.patch, YARN-2637.29.patch, YARN-2637.30.patch,
[jira] [Commented] (YARN-3061) NPE in RM AppBlock render
[ https://issues.apache.org/jira/browse/YARN-3061?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14276945#comment-14276945 ] Steve Loughran commented on YARN-3061: -- {code} 2015-01-14 14:25:46,894 ERROR webapp.Dispatcher (Dispatcher.java:service(162)) - error handling URI: /cluster/app/application_1420734007650_0010 java.lang.reflect.InvocationTargetException at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at org.apache.hadoop.yarn.webapp.Dispatcher.service(Dispatcher.java:153) at javax.servlet.http.HttpServlet.service(HttpServlet.java:820) at com.google.inject.servlet.ServletDefinition.doService(ServletDefinition.java:263) at com.google.inject.servlet.ServletDefinition.service(ServletDefinition.java:178) at com.google.inject.servlet.ManagedServletPipeline.service(ManagedServletPipeline.java:91) at com.google.inject.servlet.FilterChainInvocation.doFilter(FilterChainInvocation.java:62) at com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:900) at com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:834) at org.apache.hadoop.yarn.server.resourcemanager.webapp.RMWebAppFilter.doFilter(RMWebAppFilter.java:84) at com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:795) at com.google.inject.servlet.FilterDefinition.doFilter(FilterDefinition.java:163) at com.google.inject.servlet.FilterChainInvocation.doFilter(FilterChainInvocation.java:58) at com.google.inject.servlet.ManagedFilterPipeline.dispatch(ManagedFilterPipeline.java:118) at com.google.inject.servlet.GuiceFilter.doFilter(GuiceFilter.java:113) at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212) at org.apache.hadoop.security.authentication.server.AuthenticationFilter.doFilter(AuthenticationFilter.java:572) at org.apache.hadoop.security.authentication.server.AuthenticationFilter.doFilter(AuthenticationFilter.java:542) at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212) at org.apache.hadoop.http.HttpServer2$QuotingInputFilter.doFilter(HttpServer2.java:1224) at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212) at org.apache.hadoop.http.NoCacheFilter.doFilter(NoCacheFilter.java:45) at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212) at org.apache.hadoop.http.NoCacheFilter.doFilter(NoCacheFilter.java:45) at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212) at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:399) at org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216) at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:182) at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:767) at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:450) at org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:230) at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152) at org.mortbay.jetty.Server.handle(Server.java:326) at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:542) at 
org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:928) at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:549) at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:212) at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:404) at org.mortbay.io.nio.SelectChannelEndPoint.run(SelectChannelEndPoint.java:410) at org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:582) Caused by: java.lang.NullPointerException at org.apache.hadoop.yarn.server.resourcemanager.webapp.AppBlock.render(AppBlock.java:116) at org.apache.hadoop.yarn.webapp.view.HtmlBlock.render(HtmlBlock.java:67) at org.apache.hadoop.yarn.webapp.view.HtmlBlock.renderPartial(HtmlBlock.java:77) at org.apache.hadoop.yarn.webapp.View.render(View.java:235) at org.apache.hadoop.yarn.webapp.view.HtmlPage$Page.subView(HtmlPage.java:49) at org.apache.hadoop.yarn.webapp.hamlet.HamletImpl$EImp._v(HamletImpl.java:117) at org.apache.hadoop.yarn.webapp.hamlet.Hamlet$TD._(Hamlet.java:845)
[jira] [Created] (YARN-3061) NPE in RM AppBlock render
Steve Loughran created YARN-3061: Summary: NPE in RM AppBlock render Key: YARN-3061 URL: https://issues.apache.org/jira/browse/YARN-3061 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.6.0 Reporter: Steve Loughran Priority: Minor An RM (running in a VM which did a sleep/resume) overnight no longer launches apps, and when you try to look at it, the Web UI says 500 "look at the logs", which show a stack trace -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2637) maximum-am-resource-percent could be respected for both LeafQueue/User when trying to activate applications.
[ https://issues.apache.org/jira/browse/YARN-2637?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14276952#comment-14276952 ] Hudson commented on YARN-2637: -- FAILURE: Integrated in Hadoop-Hdfs-trunk-Java8 #70 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/70/]) YARN-2637. Fixed max-am-resource-percent calculation in CapacityScheduler when activating applications. Contributed by Craig Welch (jianhe: rev c53420f58364b11fbda1dace7679d45534533382) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestApplicationLimits.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/MockRMApp.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/CapacitySchedulerPage.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/applicationsmanager/MockAsm.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestReservations.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/RMAppImpl.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/LeafQueue.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fifo/TestFifoScheduler.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/RMApp.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestLeafQueue.java * hadoop-yarn-project/hadoop-yarn/dev-support/findbugs-exclude.xml * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/TestRMWebServicesCapacitySched.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/reservation/TestCapacitySchedulerPlanFollower.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CSQueueUtils.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/applicationsmanager/TestAMRMRPCNodeUpdates.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/RMContextImpl.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/common/fica/FiCaSchedulerApp.java * 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/dao/CapacitySchedulerLeafQueueInfo.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/RMActiveServiceContext.java maximum-am-resource-percent could be respected for both LeafQueue/User when trying to activate applications. Key: YARN-2637 URL: https://issues.apache.org/jira/browse/YARN-2637 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.6.0 Reporter: Wangda Tan Assignee: Craig Welch Priority: Critical Fix For: 2.7.0 Attachments: YARN-2637.0.patch, YARN-2637.1.patch, YARN-2637.12.patch, YARN-2637.13.patch, YARN-2637.15.patch, YARN-2637.16.patch, YARN-2637.17.patch, YARN-2637.18.patch, YARN-2637.19.patch, YARN-2637.2.patch, YARN-2637.20.patch, YARN-2637.21.patch, YARN-2637.22.patch, YARN-2637.23.patch, YARN-2637.25.patch, YARN-2637.26.patch, YARN-2637.27.patch, YARN-2637.28.patch, YARN-2637.29.patch, YARN-2637.30.patch, YARN-2637.31.patch,
[jira] [Commented] (YARN-3019) Make work-preserving-recovery the default mechanism for RM recovery
[ https://issues.apache.org/jira/browse/YARN-3019?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14276951#comment-14276951 ] Hudson commented on YARN-3019: -- FAILURE: Integrated in Hadoop-Hdfs-trunk-Java8 #70 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/70/]) YARN-3019. Make work-preserving-recovery the default mechanism for RM recovery. (Contributed by Jian He) (junping_du: rev f92e5038000a012229c304bc6e5281411eff2883) * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/conf/YarnConfiguration.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/resources/yarn-default.xml Make work-preserving-recovery the default mechanism for RM recovery --- Key: YARN-3019 URL: https://issues.apache.org/jira/browse/YARN-3019 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Jian He Assignee: Jian He Attachments: YARN-3019.1.patch The proposal is to set yarn.resourcemanager.work-preserving-recovery.enabled to true by default to flip recovery mode to work-preserving recovery from non-work-preserving recovery. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3019) Make work-preserving-recovery the default mechanism for RM recovery
[ https://issues.apache.org/jira/browse/YARN-3019?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14277038#comment-14277038 ] Hudson commented on YARN-3019: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk-Java8 #74 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/74/]) YARN-3019. Make work-preserving-recovery the default mechanism for RM recovery. (Contributed by Jian He) (junping_du: rev f92e5038000a012229c304bc6e5281411eff2883) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/resources/yarn-default.xml * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/conf/YarnConfiguration.java Make work-preserving-recovery the default mechanism for RM recovery --- Key: YARN-3019 URL: https://issues.apache.org/jira/browse/YARN-3019 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Jian He Assignee: Jian He Attachments: YARN-3019.1.patch The proposal is to set yarn.resourcemanager.work-preserving-recovery.enabled to true by default to flip recovery mode to work-preserving recovery from non-work-preserving recovery. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2928) Application Timeline Server (ATS) next gen: phase 1
[ https://issues.apache.org/jira/browse/YARN-2928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14277071#comment-14277071 ] Karthik Kambatla commented on YARN-2928: Starting from a clean slate sounds reasonable. I would punt on copying source during the development though, just add a dependency on applicationhistoryservice and use the necessary classes. Once phase 1 dev is mostly done, we will be able to make a call whether to merge the modules or copy the requirements over. Application Timeline Server (ATS) next gen: phase 1 --- Key: YARN-2928 URL: https://issues.apache.org/jira/browse/YARN-2928 Project: Hadoop YARN Issue Type: New Feature Components: timelineserver Reporter: Sangjin Lee Assignee: Sangjin Lee Priority: Critical Attachments: ATSv2.rev1.pdf, ATSv2.rev2.pdf We have the application timeline server implemented in yarn per YARN-1530 and YARN-321. Although it is a great feature, we have recognized several critical issues and features that need to be addressed. This JIRA proposes the design and implementation changes to address those. This is phase 1 of this effort. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3061) NPE in RM AppBlock render
[ https://issues.apache.org/jira/browse/YARN-3061?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14276956#comment-14276956 ] Steve Loughran commented on YARN-3061: -- in the source {{RMAppAttemptMetrics attemptMetrics = rmApp.getCurrentAppAttempt().getRMAppAttemptMetrics();}} clearly the app failed *before any app attempt was created* The root cause looks like some token renewal thing probably caused by the VM save/resume, related to kerberos renewal by the look of things {code} org.apache.slider.funtest.lifecycle.AgentWebPagesIT testAgentWeb(org.apache.slider.funtest.lifecycle.AgentWebPagesIT) Time elapsed: 194.768 sec FAILURE! java.lang.AssertionError: Application Launch Failure, exit code 65 Failed to renew token: Kind: TIMELINE_DELEGATION_TOKEN, Service: 192.168.1.134:8188, Ident: (owner=stevel, renewer=yarn, realUser=, issueDate=1421245210012, maxDate=1421850010012, sequenceNumber=11, masterKeyId=6) at org.junit.Assert.fail(Assert.java:88) at org.apache.slider.funtest.framework.CommandTestBase.createTemplatedSliderApplication(CommandTestBase.groovy:691) at org.apache.slider.funtest.lifecycle.AgentWebPagesIT.testAgentWeb(AgentWebPagesIT.groovy:76) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47) at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12) at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44) at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17) at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26) at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27) at org.junit.rules.ExternalResource$1.evaluate(ExternalResource.java:48) at org.junit.internal.runners.statements.FailOnTimeout$StatementThread.run(FailOnTimeout.java:74) {code} Server side {code} 2015-01-14 14:20:16,993 ERROR metrics.SystemMetricsPublisher (SystemMetricsPublisher.java:putEntity(427)) - Error when publishing entity [YARN_APPLICATION,application_1420734007650_0010] org.apache.hadoop.yarn.exceptions.YarnException: Failed to get the response from the timeline server. 
at org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl.doPosting(TimelineClientImpl.java:339) at org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl.putEntities(TimelineClientImpl.java:301) at org.apache.hadoop.yarn.server.resourcemanager.metrics.SystemMetricsPublisher.putEntity(SystemMetricsPublisher.java:425) at org.apache.hadoop.yarn.server.resourcemanager.metrics.SystemMetricsPublisher.publishApplicationCreatedEvent(SystemMetricsPublisher.java:258) at org.apache.hadoop.yarn.server.resourcemanager.metrics.SystemMetricsPublisher.handleSystemMetricsEvent(SystemMetricsPublisher.java:213) at org.apache.hadoop.yarn.server.resourcemanager.metrics.SystemMetricsPublisher$ForwardingEventHandler.handle(SystemMetricsPublisher.java:442) at org.apache.hadoop.yarn.server.resourcemanager.metrics.SystemMetricsPublisher$ForwardingEventHandler.handle(SystemMetricsPublisher.java:437) at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:173) at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:106) at java.lang.Thread.run(Thread.java:745) 2015-01-14 14:20:35,026 INFO impl.TimelineClientImpl (TimelineClientImpl.java:serviceInit(285)) - Timeline service address: http://devix.cotham.uk:8188/ws/v1/timeline/ 2015-01-14 14:20:35,766 WARN security.DelegationTokenRenewer (DelegationTokenRenewer.java:handleDTRenewerAppSubmitEvent(785)) - Unable to add the application to the delegation token renewer. java.io.IOException: Failed to renew token: Kind: TIMELINE_DELEGATION_TOKEN, Service: 192.168.1.134:8188, Ident: (owner=stevel, renewer=yarn, realUser=, issueDate=1421245210012, maxDate=1421850010012, sequenceNumber=11, masterKeyId=6) at org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer.handleAppSubmitEvent(DelegationTokenRenewer.java:427) at org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer.access$700(DelegationTokenRenewer.java:78) at org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer$DelegationTokenRenewerRunnable.handleDTRenewerAppSubmitEvent(DelegationTokenRenewer.java:781) at
[jira] [Assigned] (YARN-3061) NPE in RM AppBlock render
[ https://issues.apache.org/jira/browse/YARN-3061?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Varun Saxena reassigned YARN-3061: -- Assignee: Varun Saxena NPE in RM AppBlock render - Key: YARN-3061 URL: https://issues.apache.org/jira/browse/YARN-3061 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.6.0 Reporter: Steve Loughran Assignee: Varun Saxena Priority: Minor An RM (running in a VM which did a sleep/resume) overnight no longer launches apps, and when you try to look at an app, the web UI returns a 500 telling you to look at the logs, which show a stack trace -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2637) maximum-am-resource-percent could be respected for both LeafQueue/User when trying to activate applications.
[ https://issues.apache.org/jira/browse/YARN-2637?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14276979#comment-14276979 ] Hudson commented on YARN-2637: -- FAILURE: Integrated in Hadoop-Hdfs-trunk #2005 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/2005/]) YARN-2637. Fixed max-am-resource-percent calculation in CapacityScheduler when activating applications. Contributed by Craig Welch (jianhe: rev c53420f58364b11fbda1dace7679d45534533382) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestReservations.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/MockRMApp.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/RMAppImpl.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/RMContextImpl.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestLeafQueue.java * hadoop-yarn-project/hadoop-yarn/dev-support/findbugs-exclude.xml * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/applicationsmanager/TestAMRMRPCNodeUpdates.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CSQueueUtils.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/TestRMWebServicesCapacitySched.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/applicationsmanager/MockAsm.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/dao/CapacitySchedulerLeafQueueInfo.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/CapacitySchedulerPage.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/common/fica/FiCaSchedulerApp.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/RMActiveServiceContext.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/RMApp.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/reservation/TestCapacitySchedulerPlanFollower.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/LeafQueue.java * 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestApplicationLimits.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fifo/TestFifoScheduler.java maximum-am-resource-percent could be respected for both LeafQueue/User when trying to activate applications. Key: YARN-2637 URL: https://issues.apache.org/jira/browse/YARN-2637 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.6.0 Reporter: Wangda Tan Assignee: Craig Welch Priority: Critical Fix For: 2.7.0 Attachments: YARN-2637.0.patch, YARN-2637.1.patch, YARN-2637.12.patch, YARN-2637.13.patch, YARN-2637.15.patch, YARN-2637.16.patch, YARN-2637.17.patch, YARN-2637.18.patch, YARN-2637.19.patch, YARN-2637.2.patch, YARN-2637.20.patch, YARN-2637.21.patch, YARN-2637.22.patch, YARN-2637.23.patch, YARN-2637.25.patch, YARN-2637.26.patch, YARN-2637.27.patch, YARN-2637.28.patch, YARN-2637.29.patch, YARN-2637.30.patch, YARN-2637.31.patch,
[jira] [Commented] (YARN-3061) NPE in RM AppBlock render
[ https://issues.apache.org/jira/browse/YARN-3061?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14276997#comment-14276997 ] Varun Saxena commented on YARN-3061: [~ste...@apache.org], this seems to have been fixed by YARN-2414 NPE in RM AppBlock render - Key: YARN-3061 URL: https://issues.apache.org/jira/browse/YARN-3061 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.6.0 Reporter: Steve Loughran Assignee: Varun Saxena Priority: Minor An RM (running in a VM which did a sleep/resume) overnight no longer launches apps, and when you try to look at an app, the web UI returns a 500 telling you to look at the logs, which show a stack trace -- This message was sent by Atlassian JIRA (v6.3.4#6332)
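For context on the NPE itself: the line Steve quotes chains getCurrentAppAttempt() into getRMAppAttemptMetrics(), so an app that fails before any attempt exists dereferences null. A minimal sketch of the kind of guard that avoids this is below; it is an illustration, not the YARN-2414 patch, and the attempt-side package paths are assumptions.
{code}
// Hedged sketch of a null guard, not the committed fix. Package locations
// follow the RM source tree listed in the build output above; treat the
// exact paths as assumptions.
import org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMApp;
import org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttempt;
import org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptMetrics;

final class AttemptMetricsLookup {
  // Returns null instead of throwing when the app failed before any attempt
  // was created, so the page can render without per-attempt metrics.
  static RMAppAttemptMetrics currentAttemptMetrics(RMApp rmApp) {
    RMAppAttempt attempt = rmApp.getCurrentAppAttempt();
    return attempt == null ? null : attempt.getRMAppAttemptMetrics();
  }
}
{code}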
[jira] [Commented] (YARN-2637) maximum-am-resource-percent could be respected for both LeafQueue/User when trying to activate applications.
[ https://issues.apache.org/jira/browse/YARN-2637?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14277039#comment-14277039 ] Hudson commented on YARN-2637: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk-Java8 #74 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/74/]) YARN-2637. Fixed max-am-resource-percent calculation in CapacityScheduler when activating applications. Contributed by Craig Welch (jianhe: rev c53420f58364b11fbda1dace7679d45534533382) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestReservations.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/reservation/TestCapacitySchedulerPlanFollower.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/dao/CapacitySchedulerLeafQueueInfo.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/RMApp.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fifo/TestFifoScheduler.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestLeafQueue.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/RMAppImpl.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CSQueueUtils.java * hadoop-yarn-project/hadoop-yarn/dev-support/findbugs-exclude.xml * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/LeafQueue.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/applicationsmanager/MockAsm.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/MockRMApp.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/RMActiveServiceContext.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestApplicationLimits.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/TestRMWebServicesCapacitySched.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/common/fica/FiCaSchedulerApp.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/RMContextImpl.java * 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/CapacitySchedulerPage.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/applicationsmanager/TestAMRMRPCNodeUpdates.java maximum-am-resource-percent could be respected for both LeafQueue/User when trying to activate applications. Key: YARN-2637 URL: https://issues.apache.org/jira/browse/YARN-2637 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.6.0 Reporter: Wangda Tan Assignee: Craig Welch Priority: Critical Fix For: 2.7.0 Attachments: YARN-2637.0.patch, YARN-2637.1.patch, YARN-2637.12.patch, YARN-2637.13.patch, YARN-2637.15.patch, YARN-2637.16.patch, YARN-2637.17.patch, YARN-2637.18.patch, YARN-2637.19.patch, YARN-2637.2.patch, YARN-2637.20.patch, YARN-2637.21.patch, YARN-2637.22.patch, YARN-2637.23.patch, YARN-2637.25.patch, YARN-2637.26.patch, YARN-2637.27.patch, YARN-2637.28.patch, YARN-2637.29.patch, YARN-2637.30.patch,
[jira] [Commented] (YARN-2861) Timeline DT secret manager should not reuse the RM's configs.
[ https://issues.apache.org/jira/browse/YARN-2861?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14278376#comment-14278376 ] Zhijie Shen commented on YARN-2861: --- The test failure is not related; it is reported in YARN-3064. Timeline DT secret manager should not reuse the RM's configs. - Key: YARN-2861 URL: https://issues.apache.org/jira/browse/YARN-2861 Project: Hadoop YARN Issue Type: Bug Reporter: Zhijie Shen Assignee: Zhijie Shen Attachments: YARN-2861.1.patch, YARN-2861.2.patch These are the configs for the RM DT secret manager. We should create separate ones for the timeline DT only.
{code}
@Override
protected void serviceInit(Configuration conf) throws Exception {
  long secretKeyInterval =
      conf.getLong(YarnConfiguration.DELEGATION_KEY_UPDATE_INTERVAL_KEY,
          YarnConfiguration.DELEGATION_KEY_UPDATE_INTERVAL_DEFAULT);
  long tokenMaxLifetime =
      conf.getLong(YarnConfiguration.DELEGATION_TOKEN_MAX_LIFETIME_KEY,
          YarnConfiguration.DELEGATION_TOKEN_MAX_LIFETIME_DEFAULT);
  long tokenRenewInterval =
      conf.getLong(YarnConfiguration.DELEGATION_TOKEN_RENEW_INTERVAL_KEY,
          YarnConfiguration.DELEGATION_TOKEN_RENEW_INTERVAL_DEFAULT);
  secretManager = new TimelineDelegationTokenSecretManager(secretKeyInterval,
      tokenMaxLifetime, tokenRenewInterval, 360);
  secretManager.startThreads();
  serviceAddr = TimelineUtils.getTimelineTokenServiceAddress(getConfig());
  super.init(conf);
}
{code}
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
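To make "separate configs for the timeline DT" concrete, the sketch below reads timeline-specific keys instead of the RM DT keys shown above. The property names and defaults here are hypothetical placeholders chosen for illustration, not the keys added by the patch.
{code}
import org.apache.hadoop.conf.Configuration;

public class TimelineDtConfigSketch {
  // Hypothetical, illustration-only property names (not the real keys).
  static final String KEY_UPDATE_INTERVAL =
      "yarn.timeline-service.delegation.key-update-interval";
  static final String TOKEN_MAX_LIFETIME =
      "yarn.timeline-service.delegation.token.max-lifetime";
  static final String TOKEN_RENEW_INTERVAL =
      "yarn.timeline-service.delegation.token.renew-interval";

  public static void main(String[] args) {
    Configuration conf = new Configuration();
    // Read timeline-specific values instead of reusing the RM DT keys quoted
    // in the description; the default values below are placeholders too.
    long secretKeyInterval = conf.getLong(KEY_UPDATE_INTERVAL, 24L * 60 * 60 * 1000);
    long tokenMaxLifetime = conf.getLong(TOKEN_MAX_LIFETIME, 7L * 24 * 60 * 60 * 1000);
    long tokenRenewInterval = conf.getLong(TOKEN_RENEW_INTERVAL, 24L * 60 * 60 * 1000);
    System.out.printf("keyUpdate=%d maxLife=%d renew=%d%n",
        secretKeyInterval, tokenMaxLifetime, tokenRenewInterval);
  }
}
{code}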
[jira] [Commented] (YARN-2217) Shared cache client side changes
[ https://issues.apache.org/jira/browse/YARN-2217?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14278326#comment-14278326 ] Hadoop QA commented on YARN-2217: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12692454/YARN-2217-trunk-v9.patch against trunk revision 5805dc0. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client: org.apache.hadoop.yarn.client.TestResourceTrackerOnHA org.apache.hadoop.yarn.client.cli.TestRMAdminCLI org.apache.hadoop.yarn.client.TestApplicationClientProtocolOnHA Test results: https://builds.apache.org/job/PreCommit-YARN-Build/6339//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6339//console This message is automatically generated. Shared cache client side changes Key: YARN-2217 URL: https://issues.apache.org/jira/browse/YARN-2217 Project: Hadoop YARN Issue Type: Sub-task Reporter: Chris Trezzo Assignee: Chris Trezzo Attachments: YARN-2217-trunk-v1.patch, YARN-2217-trunk-v2.patch, YARN-2217-trunk-v3.patch, YARN-2217-trunk-v4.patch, YARN-2217-trunk-v5.patch, YARN-2217-trunk-v6.patch, YARN-2217-trunk-v7.patch, YARN-2217-trunk-v8.patch, YARN-2217-trunk-v9.patch Implement the client side changes for the shared cache. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3064) TestRMRestart/TestContainerResourceUsage/TestNodeManagerResync failure with allocation timeout in trunk
[ https://issues.apache.org/jira/browse/YARN-3064?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14278355#comment-14278355 ] Hadoop QA commented on YARN-3064: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12692450/YARN-3064.1.patch against trunk revision 5805dc0. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 3 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The following test timeouts occurred in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: org.apache.hadoop.yarn.server.resourcemanager.applicationsmanager.TestAMRestart Test results: https://builds.apache.org/job/PreCommit-YARN-Build/6338//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6338//console This message is automatically generated. TestRMRestart/TestContainerResourceUsage/TestNodeManagerResync failure with allocation timeout in trunk --- Key: YARN-3064 URL: https://issues.apache.org/jira/browse/YARN-3064 Project: Hadoop YARN Issue Type: Bug Components: scheduler Reporter: Wangda Tan Assignee: Jian He Priority: Critical Attachments: YARN-3064.1.patch Noticed consistent tests failure, see: https://builds.apache.org/job/PreCommit-YARN-Build/6332//testReport/ Logs like: {code} Error Message Attempt state is not correct (timedout) expected:ALLOCATED but was:SCHEDULED Stacktrace java.lang.AssertionError: Attempt state is not correct (timedout) expected:ALLOCATED but was:SCHEDULED at org.junit.Assert.fail(Assert.java:88) at org.junit.Assert.failNotEquals(Assert.java:743) at org.junit.Assert.assertEquals(Assert.java:118) at org.apache.hadoop.yarn.server.resourcemanager.MockRM.waitForState(MockRM.java:152) at org.apache.hadoop.yarn.server.resourcemanager.TestRMRestart.testQueueMetricsOnRMRestart(TestRMRestart.java:1794) {code} I can reproduce it in local environment. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
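For readers unfamiliar with the "timedout" assertions above: MockRM.waitForState is essentially a bounded poll on the attempt state, and the tests fail when ALLOCATED is not reached before the deadline. A generic sketch of that polling pattern (not the MockRM code itself) is:
{code}
import java.util.function.Supplier;

public final class WaitFor {
  // Poll until the supplier returns the expected value or the deadline passes.
  public static <T> boolean waitFor(Supplier<T> current, T expected, long timeoutMs)
      throws InterruptedException {
    long deadline = System.currentTimeMillis() + timeoutMs;
    while (System.currentTimeMillis() < deadline) {
      if (expected.equals(current.get())) {
        return true;
      }
      Thread.sleep(100); // re-check periodically, like the test helper does
    }
    return expected.equals(current.get()); // one last check after timing out
  }
}
{code}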
[jira] [Commented] (YARN-2217) Shared cache client side changes
[ https://issues.apache.org/jira/browse/YARN-2217?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14278127#comment-14278127 ] Chris Trezzo commented on YARN-2217: The last test failure is unrelated. The patch should be good to go! Shared cache client side changes Key: YARN-2217 URL: https://issues.apache.org/jira/browse/YARN-2217 Project: Hadoop YARN Issue Type: Sub-task Reporter: Chris Trezzo Assignee: Chris Trezzo Attachments: YARN-2217-trunk-v1.patch, YARN-2217-trunk-v2.patch, YARN-2217-trunk-v3.patch, YARN-2217-trunk-v4.patch, YARN-2217-trunk-v5.patch, YARN-2217-trunk-v6.patch, YARN-2217-trunk-v7.patch, YARN-2217-trunk-v8.patch Implement the client side changes for the shared cache. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3065) TestNodeManagerResync errors
[ https://issues.apache.org/jira/browse/YARN-3065?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14278268#comment-14278268 ] Jian He commented on YARN-3065: --- fixing as part of YARN-3064, closing as a dup. TestNodeManagerResync errors Key: YARN-3065 URL: https://issues.apache.org/jira/browse/YARN-3065 Project: Hadoop YARN Issue Type: Bug Components: nodemanager, test Affects Versions: 2.6.0 Reporter: Karthik Kambatla Attachments: org.apache.hadoop.yarn.server.nodemanager.TestNodeManagerResync.txt TestNodeManagerResync started failing recently, mostly due to a test timeout. See attachment for a sample test output. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (YARN-3065) TestNodeManagerResync errors
[ https://issues.apache.org/jira/browse/YARN-3065?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jian He resolved YARN-3065. --- Resolution: Duplicate TestNodeManagerResync errors Key: YARN-3065 URL: https://issues.apache.org/jira/browse/YARN-3065 Project: Hadoop YARN Issue Type: Bug Components: nodemanager, test Affects Versions: 2.6.0 Reporter: Karthik Kambatla Attachments: org.apache.hadoop.yarn.server.nodemanager.TestNodeManagerResync.txt TestNodeManagerResync started failing recently, mostly due to a test timeout. See attachment for a sample test output. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2962) ZKRMStateStore: Limit the number of znodes under a znode
[ https://issues.apache.org/jira/browse/YARN-2962?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14277106#comment-14277106 ] Karthik Kambatla commented on YARN-2962: The proposed approach sounds reasonable. How about adding a config that controls the number of digits (decimal places) to get a 2:2 or 1:3 split? ZKRMStateStore: Limit the number of znodes under a znode Key: YARN-2962 URL: https://issues.apache.org/jira/browse/YARN-2962 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.6.0 Reporter: Karthik Kambatla Assignee: Varun Saxena Priority: Critical We ran into this issue where we were hitting the default ZK server message size configs, primarily because the message had too many znodes even though individually they were all small. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2962) ZKRMStateStore: Limit the number of znodes under a znode
[ https://issues.apache.org/jira/browse/YARN-2962?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14277112#comment-14277112 ] Varun Saxena commented on YARN-2962: Yup, that sounds good. ZKRMStateStore: Limit the number of znodes under a znode Key: YARN-2962 URL: https://issues.apache.org/jira/browse/YARN-2962 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.6.0 Reporter: Karthik Kambatla Assignee: Varun Saxena Priority: Critical We ran into this issue where we were hitting the default ZK server message size configs, primarily because the message had too many znodes even though individually they were all small. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
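To illustrate the digit-split idea being discussed, the sketch below derives a parent/child znode path from an application id so that no single parent accumulates every app znode. The split width, the config-style constant, and the root path are assumptions for illustration, not the eventual YARN-2962 layout.
{code}
public class ZnodeSplitSketch {
  // Hypothetical knob: how many trailing digits stay in the leaf znode name.
  static final int SPLIT_DIGITS = 2; // a "2:2" split of a 4-digit sequence

  static String znodePathFor(String appIdStr, String rootPath) {
    // appIdStr looks like application_1421164610335_0020
    int underscore = appIdStr.lastIndexOf('_');
    String seq = appIdStr.substring(underscore + 1);              // "0020"
    String prefix = seq.substring(0, seq.length() - SPLIT_DIGITS); // "00"
    String suffix = seq.substring(seq.length() - SPLIT_DIGITS);    // "20"
    // The parent groups up to 10^SPLIT_DIGITS apps; the leaf holds the rest.
    return rootPath + "/" + appIdStr.substring(0, underscore + 1) + prefix
        + "/" + suffix;
  }

  public static void main(String[] args) {
    System.out.println(znodePathFor("application_1421164610335_0020",
        "/rmstore/ZKRMStateRoot/RMAppRoot"));
    // -> /rmstore/ZKRMStateRoot/RMAppRoot/application_1421164610335_00/20
  }
}
{code}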
[jira] [Commented] (YARN-3061) NPE in RM AppBlock render
[ https://issues.apache.org/jira/browse/YARN-3061?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14277173#comment-14277173 ] Steve Loughran commented on YARN-3061: -- You're right; closing as a duplicate. NPE in RM AppBlock render - Key: YARN-3061 URL: https://issues.apache.org/jira/browse/YARN-3061 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.6.0 Reporter: Steve Loughran Assignee: Varun Saxena Priority: Minor Fix For: 2.7.0 An RM (running in a VM which did a sleep/resume) overnight no longer launches apps, and when you try to look at an app, the web UI returns a 500 telling you to look at the logs, which show a stack trace -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (YARN-3061) NPE in RM AppBlock render
[ https://issues.apache.org/jira/browse/YARN-3061?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steve Loughran resolved YARN-3061. -- Resolution: Duplicate Fix Version/s: 2.7.0 NPE in RM AppBlock render - Key: YARN-3061 URL: https://issues.apache.org/jira/browse/YARN-3061 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.6.0 Reporter: Steve Loughran Assignee: Varun Saxena Priority: Minor Fix For: 2.7.0 An RM (running in a VM which did a sleep/resume) overnight no longer launches apps, and when you try to look at an app, the web UI returns a 500 telling you to look at the logs, which show a stack trace -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2962) ZKRMStateStore: Limit the number of znodes under a znode
[ https://issues.apache.org/jira/browse/YARN-2962?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14277099#comment-14277099 ] Varun Saxena commented on YARN-2962: [~kasha], your views on this? ZKRMStateStore: Limit the number of znodes under a znode Key: YARN-2962 URL: https://issues.apache.org/jira/browse/YARN-2962 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.6.0 Reporter: Karthik Kambatla Assignee: Varun Saxena Priority: Critical We ran into this issue where we were hitting the default ZK server message size configs, primarily because the message had too many znodes even though individually they were all small. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3054) Preempt policy in FairScheduler may cause mapreduce job never finish
[ https://issues.apache.org/jira/browse/YARN-3054?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14277174#comment-14277174 ] Wei Yan commented on YARN-3054: --- Hi, [~peng.zhang]. First, FairScheduler checks whether the usage is over the fair share. {code} private boolean preemptContainerPreCheck() { return parent.getPolicy().checkIfUsageOverFairShare(getResourceUsage(), getFairShare()); } {code} bq. Mapreduce jobs can get additional resources when others are idle. I'm not sure what you mean by idle here. But in YARN, one queue can take more than its fair share of resources if the other queues are not using them. And in FairScheduler, each queue has a steady fairshare and a dynamic fairshare. For example, if we have two queues (Q1 and Q2), both with weight 1, Q1's steady share is 50% and Q2's is also 50%. Assuming only Q1 has jobs and no job is submitted to Q2, Q1's dynamic fairshare is 100% and Q2's is 0. The dynamic fairshare calculation only considers active queues. bq. Mapreduce jobs for one user in one queue can still progress with its min share when others preempt resources back. As I said above, each queue is guaranteed its minshare and fairshare. That means some jobs can still move on. We cannot assign a minshare to each job; otherwise, submitting multiple concurrent jobs could take over the cluster. Preempt policy in FairScheduler may cause mapreduce job never finish Key: YARN-3054 URL: https://issues.apache.org/jira/browse/YARN-3054 Project: Hadoop YARN Issue Type: Bug Components: fairscheduler Affects Versions: 2.6.0 Reporter: Peng Zhang The preemption policy is tied to the scheduling policy now. Using the scheduling policy's comparator to find preemption candidates cannot guarantee that a subset of containers is never preempted, and this may cause tasks to be preempted repeatedly before they finish, so the job cannot make any progress. I think preemption in YARN should provide the assurances below: 1. Mapreduce jobs can get additional resources when others are idle; 2. Mapreduce jobs for one user in one queue can still progress with its min share when others preempt resources back. Maybe always preempting the latest app and container can achieve this? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
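A small worked example of the steady versus dynamic fair-share point above, as plain arithmetic rather than FairScheduler code: steady share divides capacity across all queues, while dynamic share divides it across active queues only.
{code}
import java.util.Arrays;
import java.util.List;

public class FairShareSketch {
  // Dynamic fair share: divide capacity only among queues that are active;
  // inactive queues get 0. Steady share would divide among all queues.
  static double[] dynamicShares(List<Boolean> active, double clusterCapacity) {
    long activeCount = active.stream().filter(a -> a).count();
    double[] shares = new double[active.size()];
    for (int i = 0; i < active.size(); i++) {
      shares[i] = (active.get(i) && activeCount > 0)
          ? clusterCapacity / activeCount
          : 0.0;
    }
    return shares;
  }

  public static void main(String[] args) {
    // Two equally weighted queues; only Q1 has running jobs.
    double[] shares = dynamicShares(Arrays.asList(true, false), 100.0);
    System.out.printf("Q1 dynamic share = %.0f%%, Q2 = %.0f%%%n",
        shares[0], shares[1]);
    // Steady shares would stay at 50% / 50% regardless of activity.
  }
}
{code}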
[jira] [Commented] (YARN-2919) Potential race between renew and cancel in DelegationTokenRenwer
[ https://issues.apache.org/jira/browse/YARN-2919?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14277142#comment-14277142 ] Karthik Kambatla commented on YARN-2919: Sorry for the delay here. The approach you suggest sounds about right. I might have more concrete comments based on the patch. Potential race between renew and cancel in DelegationTokenRenwer - Key: YARN-2919 URL: https://issues.apache.org/jira/browse/YARN-2919 Project: Hadoop YARN Issue Type: Bug Components: security Affects Versions: 2.6.0 Reporter: Karthik Kambatla Assignee: Naganarasimha G R Priority: Critical Attachments: YARN-2919.20141209-1.patch YARN-2874 fixes a deadlock in DelegationTokenRenewer, but there is still a race because of which a renewal in flight isn't interrupted by a cancel. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-3062) timelineserver gives inconsistent data for otherinfo field based on the filter param
Prakash Ramachandran created YARN-3062: -- Summary: timelineserver gives inconsistent data for otherinfo field based on the filter param Key: YARN-3062 URL: https://issues.apache.org/jira/browse/YARN-3062 Project: Hadoop YARN Issue Type: Bug Components: timelineserver Reporter: Prakash Ramachandran When otherinfo field gets updated, in some cases the data returned for an entity is dependent on the filter usage. for ex in the attached files for the - entity: vertex_1421164610335_0020_1_01, - entitytype: TEZ_VERTEX_ID, for the otherinfo.numTasks, got updated from 1009 to 253 - using {code}http://machine:8188/ws/v1/timeline/TEZ_VERTEX_ID/vertex_1421164610335_0020_1_01/ {code} gives the updated value: 1009 - using {code}http://cn042-10:8188/ws/v1/timeline/TEZ_VERTEX_ID?limit=11primaryFilter=TEZ_DAG_ID%3Adag_1421164610335_0020_1{code} gives the old value: 253 for the otherinfo.status field, which gets updated, both of them show the updated value. TEZ-1942 has more details. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2933) Capacity Scheduler preemption policy should only consider capacity without labels temporarily
[ https://issues.apache.org/jira/browse/YARN-2933?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mayank Bansal updated YARN-2933: Attachment: YARN-2933-8.patch Capacity Scheduler preemption policy should only consider capacity without labels temporarily - Key: YARN-2933 URL: https://issues.apache.org/jira/browse/YARN-2933 Project: Hadoop YARN Issue Type: Sub-task Components: capacityscheduler Reporter: Wangda Tan Assignee: Mayank Bansal Attachments: YARN-2933-1.patch, YARN-2933-2.patch, YARN-2933-3.patch, YARN-2933-4.patch, YARN-2933-5.patch, YARN-2933-6.patch, YARN-2933-7.patch, YARN-2933-8.patch Currently, we have capacity enforcement on each queue for each label in CapacityScheduler, but we don't have a preemption policy to support that. YARN-2498 is targeting support for preemption that respects node labels, but we have some gaps in the code base; for example, queues/FiCaScheduler should be able to get usedResource/pendingResource, etc. by label. These items potentially require refactoring CS, which we need to spend some time thinking about carefully. For now, what we can do immediately is calculate ideal_allocation and preempt containers only for resources on nodes without labels, to avoid regressions like the following: a cluster has some nodes with labels and some without; assume queueA isn't satisfied for resources without labels, but for now the preemption policy may preempt resources from nodes with labels for queueA, which is not correct. Again, this is just a short-term enhancement; YARN-2498 will consider preemption respecting node labels for the Capacity Scheduler, which is our final target. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2933) Capacity Scheduler preemption policy should only consider capacity without labels temporarily
[ https://issues.apache.org/jira/browse/YARN-2933?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14277328#comment-14277328 ] Mayank Bansal commented on YARN-2933: - Thanks [~wangda] for the review. bq. 1) ProportionalCapacityPreemptionPolicy.setNodeLabels is too simple to be a method, it's better to remove it. Getters and setters are usually simple, but it's good practice to have them. I think we should keep it. bq. 2) It's better to use enum here instead of integer Done. Thanks, Mayank Capacity Scheduler preemption policy should only consider capacity without labels temporarily - Key: YARN-2933 URL: https://issues.apache.org/jira/browse/YARN-2933 Project: Hadoop YARN Issue Type: Sub-task Components: capacityscheduler Reporter: Wangda Tan Assignee: Mayank Bansal Attachments: YARN-2933-1.patch, YARN-2933-2.patch, YARN-2933-3.patch, YARN-2933-4.patch, YARN-2933-5.patch, YARN-2933-6.patch, YARN-2933-7.patch, YARN-2933-8.patch Currently, we have capacity enforcement on each queue for each label in CapacityScheduler, but we don't have a preemption policy to support that. YARN-2498 is targeting support for preemption that respects node labels, but we have some gaps in the code base; for example, queues/FiCaScheduler should be able to get usedResource/pendingResource, etc. by label. These items potentially require refactoring CS, which we need to spend some time thinking about carefully. For now, what we can do immediately is calculate ideal_allocation and preempt containers only for resources on nodes without labels, to avoid regressions like the following: a cluster has some nodes with labels and some without; assume queueA isn't satisfied for resources without labels, but for now the preemption policy may preempt resources from nodes with labels for queueA, which is not correct. Again, this is just a short-term enhancement; YARN-2498 will consider preemption respecting node labels for the Capacity Scheduler, which is our final target. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
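As a rough illustration of the short-term behaviour described in this issue, the interim policy computes its totals only from nodes that carry no label. The Node type and the memory-only notion of capacity below are simplifications for the sketch, not CapacityScheduler internals.
{code}
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Set;

public class NoLabelCapacitySketch {
  static final class Node {
    final long memoryMb;
    final Set<String> labels;
    Node(long memoryMb, Set<String> labels) {
      this.memoryMb = memoryMb;
      this.labels = labels;
    }
  }

  // Sum only the memory of nodes without any label; ideal_allocation and
  // preemption in the interim policy are computed against this total.
  static long noLabelCapacity(List<Node> nodes) {
    return nodes.stream()
        .filter(n -> n.labels.isEmpty())
        .mapToLong(n -> n.memoryMb)
        .sum();
  }

  public static void main(String[] args) {
    List<Node> cluster = Arrays.asList(
        new Node(8192, Collections.<String>emptySet()),  // unlabeled, counted
        new Node(8192, Collections.singleton("gpu")));   // labeled, excluded
    System.out.println("capacity considered for preemption (MB): "
        + noLabelCapacity(cluster));                     // 8192
  }
}
{code}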
[jira] [Updated] (YARN-3062) timelineserver gives inconsistent data for otherinfo field based on the filter param
[ https://issues.apache.org/jira/browse/YARN-3062?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prakash Ramachandran updated YARN-3062: --- Attachment: withoutfilter.json withfilter.json attaching sample output files. timelineserver gives inconsistent data for otherinfo field based on the filter param Key: YARN-3062 URL: https://issues.apache.org/jira/browse/YARN-3062 Project: Hadoop YARN Issue Type: Bug Components: timelineserver Reporter: Prakash Ramachandran Attachments: withfilter.json, withoutfilter.json When otherinfo field gets updated, in some cases the data returned for an entity is dependent on the filter usage. for ex in the attached files for the - entity: vertex_1421164610335_0020_1_01, - entitytype: TEZ_VERTEX_ID, for the otherinfo.numTasks, got updated from 1009 to 253 - using {code}http://machine:8188/ws/v1/timeline/TEZ_VERTEX_ID/vertex_1421164610335_0020_1_01/ {code} gives the updated value: 1009 - using {code}http://cn042-10:8188/ws/v1/timeline/TEZ_VERTEX_ID?limit=11primaryFilter=TEZ_DAG_ID%3Adag_1421164610335_0020_1{code} gives the old value: 253 for the otherinfo.status field, which gets updated, both of them show the updated value. TEZ-1942 has more details. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3062) timelineserver gives inconsistent data for otherinfo field based on the filter param
[ https://issues.apache.org/jira/browse/YARN-3062?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prakash Ramachandran updated YARN-3062: --- Description: When otherinfo field gets updated, in some cases the data returned for an entity is dependent on the filter usage. for ex in the attached files for the - entity: vertex_1421164610335_0020_1_01, - entitytype: TEZ_VERTEX_ID, for the otherinfo.numTasks, got updated from 1009 to 253 - using {code}http://machine:8188/ws/v1/timeline/TEZ_VERTEX_ID/vertex_1421164610335_0020_1_01/ {code} gives the updated value: 253 - using {code}http://cn042-10:8188/ws/v1/timeline/TEZ_VERTEX_ID?limit=11primaryFilter=TEZ_DAG_ID%3Adag_1421164610335_0020_1{code} gives the old value: 1009 for the otherinfo.status field, which gets updated, both of them show the updated value. TEZ-1942 has more details. was: When otherinfo field gets updated, in some cases the data returned for an entity is dependent on the filter usage. for ex in the attached files for the - entity: vertex_1421164610335_0020_1_01, - entitytype: TEZ_VERTEX_ID, for the otherinfo.numTasks, got updated from 1009 to 253 - using {code}http://machine:8188/ws/v1/timeline/TEZ_VERTEX_ID/vertex_1421164610335_0020_1_01/ {code} gives the updated value: 1009 - using {code}http://cn042-10:8188/ws/v1/timeline/TEZ_VERTEX_ID?limit=11primaryFilter=TEZ_DAG_ID%3Adag_1421164610335_0020_1{code} gives the old value: 253 for the otherinfo.status field, which gets updated, both of them show the updated value. TEZ-1942 has more details. timelineserver gives inconsistent data for otherinfo field based on the filter param Key: YARN-3062 URL: https://issues.apache.org/jira/browse/YARN-3062 Project: Hadoop YARN Issue Type: Bug Components: timelineserver Reporter: Prakash Ramachandran Attachments: withfilter.json, withoutfilter.json When otherinfo field gets updated, in some cases the data returned for an entity is dependent on the filter usage. for ex in the attached files for the - entity: vertex_1421164610335_0020_1_01, - entitytype: TEZ_VERTEX_ID, for the otherinfo.numTasks, got updated from 1009 to 253 - using {code}http://machine:8188/ws/v1/timeline/TEZ_VERTEX_ID/vertex_1421164610335_0020_1_01/ {code} gives the updated value: 253 - using {code}http://cn042-10:8188/ws/v1/timeline/TEZ_VERTEX_ID?limit=11primaryFilter=TEZ_DAG_ID%3Adag_1421164610335_0020_1{code} gives the old value: 1009 for the otherinfo.status field, which gets updated, both of them show the updated value. TEZ-1942 has more details. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2861) Timeline DT secret manager should not reuse the RM's configs.
[ https://issues.apache.org/jira/browse/YARN-2861?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14276625#comment-14276625 ] Hadoop QA commented on YARN-2861: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12692169/YARN-2861.2.patch against trunk revision f92e503. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: org.apache.hadoop.yarn.server.resourcemanager.TestRMRestart org.apache.hadoop.yarn.server.resourcemanager.TestContainerResourceUsage The following test timeouts occurred in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: org.apache.hadoop.yarn.server.resourcemanager.applicationsmanager.TestAMRestart Test results: https://builds.apache.org/job/PreCommit-YARN-Build/6330//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6330//console This message is automatically generated. Timeline DT secret manager should not reuse the RM's configs. - Key: YARN-2861 URL: https://issues.apache.org/jira/browse/YARN-2861 Project: Hadoop YARN Issue Type: Bug Reporter: Zhijie Shen Assignee: Zhijie Shen Attachments: YARN-2861.1.patch, YARN-2861.2.patch This is the configs for RM DT secret manager. We should create separate ones for timeline DT only. {code} @Override protected void serviceInit(Configuration conf) throws Exception { long secretKeyInterval = conf.getLong(YarnConfiguration.DELEGATION_KEY_UPDATE_INTERVAL_KEY, YarnConfiguration.DELEGATION_KEY_UPDATE_INTERVAL_DEFAULT); long tokenMaxLifetime = conf.getLong(YarnConfiguration.DELEGATION_TOKEN_MAX_LIFETIME_KEY, YarnConfiguration.DELEGATION_TOKEN_MAX_LIFETIME_DEFAULT); long tokenRenewInterval = conf.getLong(YarnConfiguration.DELEGATION_TOKEN_RENEW_INTERVAL_KEY, YarnConfiguration.DELEGATION_TOKEN_RENEW_INTERVAL_DEFAULT); secretManager = new TimelineDelegationTokenSecretManager(secretKeyInterval, tokenMaxLifetime, tokenRenewInterval, 360); secretManager.startThreads(); serviceAddr = TimelineUtils.getTimelineTokenServiceAddress(getConfig()); super.init(conf); } {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2637) maximum-am-resource-percent could be respected for both LeafQueue/User when trying to activate applications.
[ https://issues.apache.org/jira/browse/YARN-2637?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14276766#comment-14276766 ] Hudson commented on YARN-2637: -- FAILURE: Integrated in Hadoop-Yarn-trunk-Java8 #73 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/73/]) YARN-2637. Fixed max-am-resource-percent calculation in CapacityScheduler when activating applications. Contributed by Craig Welch (jianhe: rev c53420f58364b11fbda1dace7679d45534533382) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestApplicationLimits.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/RMActiveServiceContext.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fifo/TestFifoScheduler.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestReservations.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/RMContextImpl.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/reservation/TestCapacitySchedulerPlanFollower.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/RMApp.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/common/fica/FiCaSchedulerApp.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/dao/CapacitySchedulerLeafQueueInfo.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/applicationsmanager/MockAsm.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestLeafQueue.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/applicationsmanager/TestAMRMRPCNodeUpdates.java * hadoop-yarn-project/hadoop-yarn/dev-support/findbugs-exclude.xml * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/RMAppImpl.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CSQueueUtils.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/LeafQueue.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/CapacitySchedulerPage.java * 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/TestRMWebServicesCapacitySched.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/MockRMApp.java * hadoop-yarn-project/CHANGES.txt maximum-am-resource-percent could be respected for both LeafQueue/User when trying to activate applications. Key: YARN-2637 URL: https://issues.apache.org/jira/browse/YARN-2637 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.6.0 Reporter: Wangda Tan Assignee: Craig Welch Priority: Critical Fix For: 2.7.0 Attachments: YARN-2637.0.patch, YARN-2637.1.patch, YARN-2637.12.patch, YARN-2637.13.patch, YARN-2637.15.patch, YARN-2637.16.patch, YARN-2637.17.patch, YARN-2637.18.patch, YARN-2637.19.patch, YARN-2637.2.patch, YARN-2637.20.patch, YARN-2637.21.patch, YARN-2637.22.patch, YARN-2637.23.patch, YARN-2637.25.patch, YARN-2637.26.patch, YARN-2637.27.patch, YARN-2637.28.patch, YARN-2637.29.patch, YARN-2637.30.patch, YARN-2637.31.patch,
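For readers who only want the gist of the YARN-2637 change being integrated above: maximum-am-resource-percent caps how much of a queue can be occupied by ApplicationMasters, and the fix makes that check hold per user as well as per LeafQueue when activating applications. The numbers and the per-user fraction in the sketch below are assumptions used purely to show the arithmetic, not values from the patch.
{code}
public class AmLimitSketch {
  public static void main(String[] args) {
    double queueCapacityMb = 100_000;     // assumed queue capacity
    double maxAmResourcePercent = 0.1;    // maximum-am-resource-percent
    double perUserAmFraction = 0.5;       // assumed per-user slice of the queue AM limit

    // Queue-wide AM head-room, and a per-user cap derived from it.
    double queueAmLimitMb = queueCapacityMb * maxAmResourcePercent;
    double userAmLimitMb = queueAmLimitMb * perUserAmFraction;

    double usedAmQueueMb = 9_000;  // AM memory already used in the queue
    double usedAmUserMb = 4_500;   // AM memory already used by this user
    double newAmMb = 2_048;        // AM size of the application being activated

    // An application is activated only if both limits still hold.
    boolean fitsQueue = usedAmQueueMb + newAmMb <= queueAmLimitMb;
    boolean fitsUser = usedAmUserMb + newAmMb <= userAmLimitMb;
    System.out.println("activate = " + (fitsQueue && fitsUser));
  }
}
{code}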