[jira] [Commented] (YARN-2053) Slider AM fails to restart: NPE in RegisterApplicationMasterResponseProto$Builder.addAllNmTokensFromPreviousAttempts

2014-05-15 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2053?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13998324#comment-13998324
 ] 

Wangda Tan commented on YARN-2053:
--

Sure, I'll do that, thanks for review!

 Slider AM fails to restart: NPE in 
 RegisterApplicationMasterResponseProto$Builder.addAllNmTokensFromPreviousAttempts
 

 Key: YARN-2053
 URL: https://issues.apache.org/jira/browse/YARN-2053
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.4.0
Reporter: Sumit Mohanty
Assignee: Wangda Tan
 Attachments: YARN-2053.patch, 
 yarn-yarn-nodemanager-c6403.ambari.apache.org.log.bak, 
 yarn-yarn-resourcemanager-c6403.ambari.apache.org.log.bak


 Slider AppMaster restart fails with the following:
 {code}
 org.apache.hadoop.yarn.proto.YarnServiceProtos$RegisterApplicationMasterResponseProto$Builder.addAllNmTokensFromPreviousAttempts(YarnServiceProtos.java:2700)
 {code}
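 
 For illustration only (not necessarily what the attached patch does): protobuf-generated 
 builders throw NullPointerException when an addAll* method such as 
 addAllNmTokensFromPreviousAttempts is handed a null Iterable, so the caller has to 
 substitute an empty list first. A self-contained sketch of that guard; all names here 
 are made up:
 {code}
 import java.util.Collections;
 import java.util.List;
 
 public class NullSafeAddAll {
   // Protobuf builders reject null Iterables in addAll*(...), so convert null to empty first.
   static <T> List<T> nullToEmpty(List<T> values) {
     return values == null ? Collections.<T>emptyList() : values;
   }
 
   public static void main(String[] args) {
     List<String> nmTokensFromPreviousAttempts = null; // e.g. nothing recovered after restart
     // builder.addAllNmTokensFromPreviousAttempts(nullToEmpty(nmTokensFromPreviousAttempts));
     System.out.println(nullToEmpty(nmTokensFromPreviousAttempts)); // prints []
   }
 }
 {code}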



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2054) Poor defaults for YARN ZK configs for retries and retry-inteval

2014-05-15 Thread Sandy Ryza (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2054?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13998337#comment-13998337
 ] 

Sandy Ryza commented on YARN-2054:
--

+1

 Poor defaults for YARN ZK configs for retries and retry-inteval
 ---

 Key: YARN-2054
 URL: https://issues.apache.org/jira/browse/YARN-2054
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.4.0
Reporter: Karthik Kambatla
Assignee: Karthik Kambatla
 Attachments: yarn-2054-1.patch


 Currently, we have the following default values:
 # yarn.resourcemanager.zk-num-retries - 500
 # yarn.resourcemanager.zk-retry-interval-ms - 2000
 This leads to a cumulative 1000 seconds before the RM gives up trying to 
 connect to the ZK. 
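 
 A minimal sketch of that arithmetic, reading the two properties quoted above through a 
 plain Configuration lookup (illustrative only; the RM uses its own constants for these keys):
 {code}
 import org.apache.hadoop.conf.Configuration;
 
 public class ZkRetryBudget {
   public static void main(String[] args) {
     Configuration conf = new Configuration();
     // Property names and defaults are the ones quoted in the description above.
     int numRetries = conf.getInt("yarn.resourcemanager.zk-num-retries", 500);
     long retryIntervalMs = conf.getLong("yarn.resourcemanager.zk-retry-interval-ms", 2000);
     // 500 retries * 2000 ms = 1,000,000 ms, i.e. the ~1000 seconds (over 16 minutes) above.
     System.out.println("Worst-case wait before giving up on ZK: "
         + (numRetries * retryIntervalMs) / 1000 + " seconds");
   }
 }
 {code}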



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-1957) ProportionalCapacitPreemptionPolicy handling of corner cases...

2014-05-15 Thread Vinod Kumar Vavilapalli (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1957?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kumar Vavilapalli updated YARN-1957:
--

Issue Type: Sub-task  (was: Bug)
Parent: YARN-45

 ProportionalCapacitPreemptionPolicy handling of corner cases...
 ---

 Key: YARN-1957
 URL: https://issues.apache.org/jira/browse/YARN-1957
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Affects Versions: 2.4.0
Reporter: Carlo Curino
Assignee: Carlo Curino
  Labels: capacity-scheduler, preemption
 Attachments: YARN-1957.patch, YARN-1957.patch, YARN-1957_test.patch


 The current version of ProportionalCapacityPreemptionPolicy should be 
 improved to deal with the following two scenarios:
 1) when rebalancing over-capacity allocations, it potentially preempts 
 without considering the maxCapacity constraints of a queue (i.e., preempting 
 possibly more than strictly necessary)
 2) a zero capacity queue is preempted even if there is no demand (consistent 
 with the old use of zero capacity to disable queues)
 The proposed patch fixes both issues and introduces a few new test cases.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2017) Merge some of the common lib code in schedulers

2014-05-15 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2017?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13997192#comment-13997192
 ] 

Wangda Tan commented on YARN-2017:
--

bq. On a second thought, user might pass in a resource request with null 
capability. I would prefer not changing it. In fact, we can add many other null 
checks in many places. Changed the patch back.
I think a null capability should be checked by ApplicationMasterService, which should 
throw an exception before the request is passed in. So either doing or not doing the 
null check here should be fine :)

 Merge some of the common lib code in schedulers
 ---

 Key: YARN-2017
 URL: https://issues.apache.org/jira/browse/YARN-2017
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Reporter: Jian He
Assignee: Jian He
 Attachments: YARN-2017.1.patch, YARN-2017.2.patch, YARN-2017.3.patch


 A bunch of the same code is repeated among schedulers, e.g. between 
 FicaSchedulerNode and FSSchedulerNode. It would be good to merge and share it in 
 a common base.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2054) Poor defaults for YARN ZK configs for retries and retry-inteval

2014-05-15 Thread Vinod Kumar Vavilapalli (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2054?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13998423#comment-13998423
 ] 

Vinod Kumar Vavilapalli commented on YARN-2054:
---

Sounds related to YARN-1878, though not exactly.

If we want these configs to match up with yarn.resourcemanager.zk-timeout-ms, 
and (as YARN-1878 is trying) that value can change, do we need to somehow link 
them dynamically?

 Poor defaults for YARN ZK configs for retries and retry-inteval
 ---

 Key: YARN-2054
 URL: https://issues.apache.org/jira/browse/YARN-2054
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.4.0
Reporter: Karthik Kambatla
Assignee: Karthik Kambatla
 Attachments: yarn-2054-1.patch


 Currently, we have the following default values:
 # yarn.resourcemanager.zk-num-retries - 500
 # yarn.resourcemanager.zk-retry-interval-ms - 2000
 This leads to a cumulative 1000 seconds before the RM gives up trying to 
 connect to the ZK. 



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-1916) Leveldb timeline store applies secondary filters incorrectly

2014-05-15 Thread Zhijie Shen (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1916?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhijie Shen updated YARN-1916:
--

Issue Type: Sub-task  (was: Bug)
Parent: YARN-1530

 Leveldb timeline store applies secondary filters incorrectly
 

 Key: YARN-1916
 URL: https://issues.apache.org/jira/browse/YARN-1916
 Project: Hadoop YARN
  Issue Type: Sub-task
Affects Versions: 2.4.0
Reporter: Billie Rinaldi
Assignee: Billie Rinaldi
 Attachments: YARN-1916.1.patch


 When applying a secondary filter (fieldname:fieldvalue) in a get entities 
 query, LeveldbTimelineStore retrieves entities that do not have the specified 
 fieldname, in addition to correctly retrieving entities that have the 
 fieldname with the specified fieldvalue.  It should not return entities that 
 do not have the fieldname.
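 
 A minimal sketch of the intended matching rule (illustrative only, not the 
 LeveldbTimelineStore code): an entity matches a secondary filter only when it actually 
 carries the field name and one of that field's values equals the requested value.
 {code}
 import java.util.Map;
 import java.util.Set;
 
 public class SecondaryFilterSketch {
   static boolean matches(Map<String, Set<Object>> fields, String fieldname, Object fieldvalue) {
     Set<Object> values = fields.get(fieldname);
     return values != null && values.contains(fieldvalue); // absent fieldname => no match
   }
 }
 {code}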



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2022) Preempting an Application Master container can be kept as least priority when multiple applications are marked for preemption by ProportionalCapacityPreemptionPolicy

2014-05-15 Thread Sunil G (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2022?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13995057#comment-13995057
 ] 

Sunil G commented on YARN-2022:
---

Thank you very much Carlo for the review. 
As per your concern about AM Container priority, I am using a static final 
variable named AM_CONTAINER_PRIORITY from RMAppAttemptImpl to check whether a 
container is AM or not.
As per my code review, this variable is not set by the user [the RM only uses it 
to create the AM container ResourceRequest]. Hence there is not much problem in 
using it.

Secondly, for the corner cases, I agree with your point. In a specific corner 
case it is possible that AM containers alone can take over 100% of a queue.
1. maximum-am-resource-percent is at the cluster level, and from it we can get the 
maximum number of runnable applications. The actual count of running applications 
can also be fetched from all leaf queues.
   With these two, a checkpoint can be derived as you have mentioned.
2. user-limit-factor sets a per-user quota of the total resources for each user. 
If preemption has to be done among applications, currently only the application 
timestamp is considered [in reverse order].
   So how can this factor help in defining a checkpoint for saving the AM? Could 
you please share your thoughts on this point.

I will work on defining a checkpoint for saving the AM and will update. Meanwhile, 
please check whether my explanation is in line with your thoughts. Thank you.
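
A minimal sketch of the check described above, assuming the RMAppAttemptImpl.AM_CONTAINER_PRIORITY 
constant named in the comment (the class name AmContainerCheck is illustrative):
{code}
import org.apache.hadoop.yarn.api.records.Container;
import org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl;

public class AmContainerCheck {
  // The RM requests every AM container at a fixed, RM-internal priority, so comparing a
  // container's priority against that constant identifies AM containers.
  static boolean isAmContainer(Container container) {
    return container.getPriority() != null
        && container.getPriority().getPriority()
           == RMAppAttemptImpl.AM_CONTAINER_PRIORITY.getPriority();
  }
}
{code}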


 Preempting an Application Master container can be kept as least priority when 
 multiple applications are marked for preemption by 
 ProportionalCapacityPreemptionPolicy
 -

 Key: YARN-2022
 URL: https://issues.apache.org/jira/browse/YARN-2022
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: resourcemanager
Affects Versions: 2.4.0
Reporter: Sunil G
Assignee: Sunil G
 Attachments: Yarn-2022.1.patch


 Cluster Size = 16GB [2NM's]
 Queue A Capacity = 50%
 Queue B Capacity = 50%
 Consider there are 3 applications running in Queue A which has taken the full 
 cluster capacity. 
 J1 = 2GB AM + 1GB * 4 Maps
 J2 = 2GB AM + 1GB * 4 Maps
 J3 = 2GB AM + 1GB * 2 Maps
 Another Job J4 is submitted in Queue B [J4 needs a 2GB AM + 1GB * 2 Maps ].
 Currently in this scenario, Job J3 will get killed, including its AM.
 It would be better if the AM could be given the least priority among multiple 
 applications. In this same scenario, map tasks from J3 and J2 can be preempted.
 Later, when the cluster is free, maps can be allocated to these Jobs.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Comment Edited] (YARN-1368) Common work to re-populate containers’ state into scheduler

2014-05-15 Thread Alejandro Abdelnur (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1368?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13998164#comment-13998164
 ] 

Alejandro Abdelnur edited comment on YARN-1368 at 5/15/14 5:08 AM:
---

[~jianhe], I understand the patch is taking a different approach, which is 
based on the work Anubhav started. Instead of hijacking the JIRA, the correct way 
should have been proposing -to the assignee/author of the original patch- the 
changes and offering to contribute/breakdown tasks. Please do so next time.



was (Author: tucu00):
[~ jianhe], I understand the patch is taking a different approach, which is 
based on the work Anubhav started. Instead hijacking the JIRA, the correct way 
should have been proposing -to the assignee/author of the original patch- the 
changes and offering to contribute/breakdown tasks. Please do so next time.


 Common work to re-populate containers’ state into scheduler
 ---

 Key: YARN-1368
 URL: https://issues.apache.org/jira/browse/YARN-1368
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Bikas Saha
Assignee: Jian He
 Attachments: YARN-1368.1.patch, YARN-1368.2.patch, 
 YARN-1368.combined.001.patch, YARN-1368.preliminary.patch


 YARN-1367 adds support for the NM to tell the RM about all currently running 
 containers upon registration. The RM needs to send this information to the 
 schedulers along with the NODE_ADDED_EVENT so that the schedulers can recover 
 the current allocation state of the cluster.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1957) ProportionalCapacitPreemptionPolicy handling of corner cases...

2014-05-15 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1957?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13998212#comment-13998212
 ] 

Hudson commented on YARN-1957:
--

FAILURE: Integrated in Hadoop-Hdfs-trunk #1753 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/1753/])
YARN-1957. Consider the max capacity of the queue when computing the ideal
capacity for preemption. Contributed by Carlo Curino (cdouglas: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1594414)
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/monitor/capacity/ProportionalCapacityPreemptionPolicy.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/monitor/capacity/TestProportionalCapacityPreemptionPolicy.java


 ProportionalCapacitPreemptionPolicy handling of corner cases...
 ---

 Key: YARN-1957
 URL: https://issues.apache.org/jira/browse/YARN-1957
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Affects Versions: 2.4.0
Reporter: Carlo Curino
Assignee: Carlo Curino
  Labels: capacity-scheduler, preemption
 Fix For: 3.0.0, 2.5.0, 2.4.1

 Attachments: YARN-1957.patch, YARN-1957.patch, YARN-1957_test.patch


 The current version of ProportionalCapacityPreemptionPolicy should be 
 improved to deal with the following two scenarios:
 1) when rebalancing over-capacity allocations, it potentially preempts 
 without considering the maxCapacity constraints of a queue (i.e., preempting 
 possibly more than strictly necessary)
 2) a zero capacity queue is preempted even if there is no demand (consistent 
 with the old use of zero capacity to disable queues)
 The proposed patch fixes both issues and introduces a few new test cases.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1982) Rename the daemon name to timelineserver

2014-05-15 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1982?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13998228#comment-13998228
 ] 

Hudson commented on YARN-1982:
--

FAILURE: Integrated in Hadoop-Hdfs-trunk #1753 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/1753/])
YARN-1982. Renamed the daemon name to be TimelineServer instead of History 
Server and deprecated the old usage. Contributed by Zhijie Shen. (vinodkv: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1593748)
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/bin/yarn
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/bin/yarn.cmd
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/conf/yarn-env.sh


 Rename the daemon name to timelineserver
 

 Key: YARN-1982
 URL: https://issues.apache.org/jira/browse/YARN-1982
 Project: Hadoop YARN
  Issue Type: Sub-task
Affects Versions: 3.0.0, 2.4.0
Reporter: Zhijie Shen
Assignee: Zhijie Shen
  Labels: cli
 Fix For: 2.5.0

 Attachments: YARN-1982.1.patch


 Nowadays, it's confusing that we call the new component timeline server, but 
 we use
 {code}
 yarn historyserver
 yarn-daemon.sh start historyserver
 {code}
 to start the daemon.
 Before the confusion keeps being propagated, we'd better modify the command 
 line ASAP.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1975) Used resources shows escaped html in CapacityScheduler and FairScheduler page

2014-05-15 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1975?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13998210#comment-13998210
 ] 

Hudson commented on YARN-1975:
--

FAILURE: Integrated in Hadoop-Hdfs-trunk #1753 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/1753/])
YARN-1975. Fix yarn application CLI to print the scheme of the tracking url of 
failed/killed applications. Contributed by Junping Du (jianhe: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1593874)
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/attempt/RMAppAttemptImpl.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/attempt/TestRMAppAttemptTransitions.java


 Used resources shows escaped html in CapacityScheduler and FairScheduler page
 -

 Key: YARN-1975
 URL: https://issues.apache.org/jira/browse/YARN-1975
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 3.0.0, 2.4.0
Reporter: Nathan Roberts
Assignee: Mit Desai
 Fix For: 3.0.0, 2.4.1

 Attachments: YARN-1975.patch, screenshot-1975.png


 Used resources displays as &lt;memory:, vCores&gt; with the capacity 
 scheduler



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1976) Tracking url missing http protocol for FAILED application

2014-05-15 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1976?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13998173#comment-13998173
 ] 

Hudson commented on YARN-1976:
--

FAILURE: Integrated in Hadoop-Mapreduce-trunk #1779 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1779/])
YARN-1976. Fix CHANGES.txt for YARN-1976. (jianhe: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1594123)
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt


 Tracking url missing http protocol for FAILED application
 -

 Key: YARN-1976
 URL: https://issues.apache.org/jira/browse/YARN-1976
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Yesha Vora
Assignee: Junping Du
 Fix For: 2.4.1

 Attachments: YARN-1976-v2.patch, YARN-1976.patch


 Run yarn application -list -appStates FAILED; it does not print the http 
 protocol scheme the way it does for FINISHED apps.
 {noformat}
 -bash-4.1$ yarn application -list -appStates FINISHED,FAILED,KILLED
 14/04/15 23:55:07 INFO client.RMProxy: Connecting to ResourceManager at host
 Total number of applications (application-types: [] and states: [FINISHED, 
 FAILED, KILLED]):4
 Application-Id  Application-Name  Application-Type  User  Queue  State  Final-State  Progress  Tracking-URL
 application_1397598467870_0004   Sleep job   
 MAPREDUCEhrt_qa defaultFINISHED   
 SUCCEEDED 100% 
 http://host:19888/jobhistory/job/job_1397598467870_0004
 application_1397598467870_0003   Sleep job   
 MAPREDUCEhrt_qa defaultFINISHED   
 SUCCEEDED 100% 
 http://host:19888/jobhistory/job/job_1397598467870_0003
 application_1397598467870_0002   Sleep job   
 MAPREDUCEhrt_qa default  FAILED   
FAILED 100% 
 host:8088/cluster/app/application_1397598467870_0002
 application_1397598467870_0001  word count   
 MAPREDUCEhrt_qa defaultFINISHED   
 SUCCEEDED 100% 
 http://host:19888/jobhistory/job/job_1397598467870_0001
 {noformat}
 It only prints 'host:8088/cluster/app/application_1397598467870_0002' instead of 
 'http://host:8088/cluster/app/application_1397598467870_0002' 
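 
 A minimal sketch of the normalization being asked for (not necessarily the committed fix); 
 the helper name is made up:
 {code}
 public class TrackingUrl {
   // Prepend a scheme when the stored tracking URL is bare ("host:8088/..."),
   // so FAILED apps print the same way FINISHED apps do.
   static String withScheme(String trackingUrl) {
     if (trackingUrl == null || trackingUrl.isEmpty()) {
       return trackingUrl;
     }
     return trackingUrl.contains("://") ? trackingUrl : "http://" + trackingUrl;
   }
 
   public static void main(String[] args) {
     System.out.println(withScheme("host:8088/cluster/app/application_1397598467870_0002"));
   }
 }
 {code}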



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2042) String shouldn't be compared using == in QueuePlacementRule#NestedUserQueue#getQueueForApp()

2014-05-15 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2042?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13998167#comment-13998167
 ] 

Hudson commented on YARN-2042:
--

FAILURE: Integrated in Hadoop-Mapreduce-trunk #1779 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1779/])
YARN-2042. String shouldn't be compared using == in 
QueuePlacementRule#NestedUserQueue#getQueueForApp (Chen He via Sandy Ryza) 
(sandy: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1594482)
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/QueuePlacementRule.java


 String shouldn't be compared using == in 
 QueuePlacementRule#NestedUserQueue#getQueueForApp()
 

 Key: YARN-2042
 URL: https://issues.apache.org/jira/browse/YARN-2042
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Ted Yu
Assignee: Chen He
Priority: Minor
 Attachments: YARN-2042.patch


 {code}
   if (queueName != null && queueName != "") {
 {code}
 queueName.isEmpty() should be used instead of comparing against "".
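 
 A small self-contained demo of why the reference comparison is wrong and the suggested 
 check is right:
 {code}
 public class QueueNameCheck {
   public static void main(String[] args) {
     String queueName = new String(""); // a distinct empty-string instance
     // Buggy pattern: == compares object identity, so this is true even though the name is empty.
     System.out.println(queueName != null && queueName != "");      // true (wrong intent)
     // Suggested pattern: check the contents explicitly.
     System.out.println(queueName != null && !queueName.isEmpty()); // false (intended result)
   }
 }
 {code}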



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-1937) Access control of per-framework data

2014-05-15 Thread Zhijie Shen (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1937?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhijie Shen updated YARN-1937:
--

Attachment: YARN-1937.1.patch

I created a patch to add a TimelineACLsManager, which will check whether the 
query user is the owner of the timeline entity; if so, he can retrieve the 
entity or the events of this entity; otherwise, he cannot access the 
corresponding timeline data.

To support the ACLs, I need to record the owner information of the timeline 
data when it is posted. I leverage the primary filter to store the owner 
information by reserving the timeline system filter key. Of course the system 
information will be masked before returning the timeline data back to the user.

I'm uploading the preliminary patch to demonstrate the idea, and will work on the 
test cases and complete local testing.

It is worth mentioning that:

1. I do access control at the granularity of timeline entity. We can definitely 
explore more fine-grained control, but I prefer keeping the thing simple 
initially.

2. Initially, I'm going to support access control where only the owner can 
access his timeline data. In the future, we can extend it to allow admins and a 
configured user/group list. Will file a separate ticket for the follow-up work.
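
A minimal sketch of the owner-only check described above; the names are illustrative and 
not the actual TimelineACLsManager API:
{code}
import org.apache.hadoop.security.UserGroupInformation;

public class OwnerOnlyAclSketch {
  // Only the user recorded as the entity's owner (stored via a reserved "system" primary
  // filter at put time) may read the entity back.
  static boolean checkAccess(UserGroupInformation caller, String entityOwner) {
    return caller != null && entityOwner != null
        && entityOwner.equals(caller.getShortUserName());
  }
}
{code}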

 Access control of per-framework data
 

 Key: YARN-1937
 URL: https://issues.apache.org/jira/browse/YARN-1937
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Zhijie Shen
Assignee: Zhijie Shen
 Attachments: YARN-1937.1.patch






--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-2017) Merge some of the common lib code in schedulers

2014-05-15 Thread Jian He (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2017?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jian He updated YARN-2017:
--

Attachment: YARN-2017.4.patch

Rebased the patch

 Merge some of the common lib code in schedulers
 ---

 Key: YARN-2017
 URL: https://issues.apache.org/jira/browse/YARN-2017
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Reporter: Jian He
Assignee: Jian He
 Attachments: YARN-2017.1.patch, YARN-2017.2.patch, YARN-2017.3.patch, 
 YARN-2017.4.patch


 A bunch of the same code is repeated among schedulers, e.g. between 
 FicaSchedulerNode and FSSchedulerNode. It would be good to merge and share it in 
 a common base.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-570) Time strings are formated in different timezone

2014-05-15 Thread Akira AJISAKA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-570?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13997193#comment-13997193
 ] 

Akira AJISAKA commented on YARN-570:


Attached a patch. With the patch, yarn.util.Times.format() renders as Wed May 
14 10:24:29 JST 2014, which is consistent with MapReduce jobhistoryserver 
WebUI.
bq. Can you update format() as well to print in the same style, if you agree?
The format of JavaScript {{Date.toLocaleString()}} varies by the browser. In my 
environment: 
{code}
Chrome: 2014/5/14 10:25:08
Safari: 2014年5月14日 10:25:08 JST
{code}
Therefore, it's impossible to update {{format()}} to print in the same style.
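
For reference, the style quoted above (Wed May 14 10:24:29 JST 2014) roughly corresponds to 
this SimpleDateFormat pattern rendered in the server's local time zone; a sketch only, not 
necessarily the exact code in the patch:
{code}
import java.text.SimpleDateFormat;
import java.util.Date;

public class LocalTimeFormat {
  public static void main(String[] args) {
    SimpleDateFormat fmt = new SimpleDateFormat("EEE MMM dd HH:mm:ss zzz yyyy");
    // Prints the current time in the same style as the patched Times.format() output above.
    System.out.println(fmt.format(new Date()));
  }
}
{code}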

 Time strings are formated in different timezone
 ---

 Key: YARN-570
 URL: https://issues.apache.org/jira/browse/YARN-570
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Peng Zhang
Assignee: Akira AJISAKA
 Attachments: MAPREDUCE-5141.patch, YARN-570.2.patch


 Time strings on different page are displayed in different timezone.
 If it is rendered by renderHadoopDate() in yarn.dt.plugins.js, it appears as 
 Wed, 10 Apr 2013 08:29:56 GMT
 If it is formatted by format() in yarn.util.Times, it appears as 10-Apr-2013 
 16:29:56
 Same value, but different timezone.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Assigned] (YARN-1680) availableResources sent to applicationMaster in heartbeat should exclude blacklistedNodes free memory.

2014-05-15 Thread Chen He (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1680?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chen He reassigned YARN-1680:
-

Assignee: Chen He

 availableResources sent to applicationMaster in heartbeat should exclude 
 blacklistedNodes free memory.
 --

 Key: YARN-1680
 URL: https://issues.apache.org/jira/browse/YARN-1680
 Project: Hadoop YARN
  Issue Type: Sub-task
Affects Versions: 2.2.0, 2.3.0
 Environment: SuSE 11 SP2 + Hadoop-2.3 
Reporter: Rohith
Assignee: Chen He

 There are 4 NodeManagers with 8GB each, so total cluster capacity is 32GB. Cluster 
 slow start is set to 1.
 A job is running whose reducer tasks occupy 29GB of the cluster. One NodeManager (NM-4) 
 became unstable (3 maps got killed), so the MRAppMaster blacklisted the unstable 
 NodeManager (NM-4). All reducer tasks are now running in the cluster.
 The MRAppMaster does not preempt the reducers because, for the reducer preemption 
 calculation, the headroom still includes the blacklisted node's memory. This makes 
 jobs hang forever (the ResourceManager does not assign any new containers on 
 blacklisted nodes, but the availableResources it returns still counts the whole 
 cluster's free memory). 
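 
 A hypothetical sketch of the adjustment the report asks for, using the YARN Resources 
 helper (illustrative only, not the actual RM/MRAppMaster code):
 {code}
 import org.apache.hadoop.yarn.api.records.Resource;
 import org.apache.hadoop.yarn.util.resource.Resources;
 
 public class HeadroomSketch {
   // Subtract the free capacity of nodes the AM has blacklisted from the headroom
   // the RM reports in the heartbeat, so preemption math reflects usable space only.
   static Resource adjustedHeadroom(Resource clusterHeadroom, Resource blacklistedFree) {
     return Resources.subtract(clusterHeadroom, blacklistedFree);
   }
 }
 {code}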



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Assigned] (YARN-896) Roll up for long-lived services in YARN

2014-05-15 Thread Xuan Gong (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-896?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuan Gong reassigned YARN-896:
--

Assignee: Xuan Gong

 Roll up for long-lived services in YARN
 ---

 Key: YARN-896
 URL: https://issues.apache.org/jira/browse/YARN-896
 Project: Hadoop YARN
  Issue Type: New Feature
Reporter: Robert Joseph Evans
Assignee: Xuan Gong

 YARN is intended to be general purpose, but it is missing some features to be 
 able to truly support long lived applications and long lived containers.
 This ticket is intended to
  # discuss what is needed to support long lived processes
  # track the resulting JIRA.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-2053) Slider AM fails to restart: NPE in RegisterApplicationMasterResponseProto$Builder.addAllNmTokensFromPreviousAttempts

2014-05-15 Thread Vinod Kumar Vavilapalli (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2053?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kumar Vavilapalli updated YARN-2053:
--

Issue Type: Sub-task  (was: Bug)
Parent: YARN-1489

 Slider AM fails to restart: NPE in 
 RegisterApplicationMasterResponseProto$Builder.addAllNmTokensFromPreviousAttempts
 

 Key: YARN-2053
 URL: https://issues.apache.org/jira/browse/YARN-2053
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Affects Versions: 2.4.0
Reporter: Sumit Mohanty
Assignee: Wangda Tan
 Attachments: YARN-2053.patch, 
 yarn-yarn-nodemanager-c6403.ambari.apache.org.log.bak, 
 yarn-yarn-resourcemanager-c6403.ambari.apache.org.log.bak


 Slider AppMaster restart fails with the following:
 {code}
 org.apache.hadoop.yarn.proto.YarnServiceProtos$RegisterApplicationMasterResponseProto$Builder.addAllNmTokensFromPreviousAttempts(YarnServiceProtos.java:2700)
 {code}



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-2052) ContainerId creation after work preserving restart is broken

2014-05-15 Thread Bikas Saha (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2052?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bikas Saha updated YARN-2052:
-

Description: Container ids are made unique by using the app identifier and 
appending a monotonically increasing sequence number to it. Since container 
creation is a high churn activity the RM does not store the sequence number per 
app. So after restart it does not know what the new sequence number should be 
for new allocations.

 ContainerId creation after work preserving restart is broken
 

 Key: YARN-2052
 URL: https://issues.apache.org/jira/browse/YARN-2052
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Reporter: Tsuyoshi OZAWA

 Container ids are made unique by using the app identifier and appending a 
 monotonically increasing sequence number to it. Since container creation is a 
 high churn activity the RM does not store the sequence number per app. So 
 after restart it does not know what the new sequence number should be for new 
 allocations.
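 
 A small illustration of the id scheme described above (illustrative fields, not the RM's 
 actual code), which shows why the counter's value is lost across a restart unless it is 
 persisted somewhere:
 {code}
 import java.util.concurrent.atomic.AtomicInteger;
 import org.apache.hadoop.yarn.api.records.ApplicationAttemptId;
 import org.apache.hadoop.yarn.api.records.ContainerId;
 
 public class ContainerIdSketch {
   // Container ids = app attempt id + a monotonically increasing, in-memory counter.
   private final AtomicInteger containerIdCounter = new AtomicInteger(0);
 
   ContainerId nextContainerId(ApplicationAttemptId appAttemptId) {
     return ContainerId.newInstance(appAttemptId, containerIdCounter.incrementAndGet());
   }
 }
 {code}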



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-2022) Preempting an Application Master container can be kept as least priority when multiple applications are marked for preemption by ProportionalCapacityPreemptionPolicy

2014-05-15 Thread Vinod Kumar Vavilapalli (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2022?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kumar Vavilapalli updated YARN-2022:
--

Issue Type: Sub-task  (was: Improvement)
Parent: YARN-45

 Preempting an Application Master container can be kept as least priority when 
 multiple applications are marked for preemption by 
 ProportionalCapacityPreemptionPolicy
 -

 Key: YARN-2022
 URL: https://issues.apache.org/jira/browse/YARN-2022
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Affects Versions: 2.4.0
Reporter: Sunil G
Assignee: Sunil G
 Attachments: Yarn-2022.1.patch


 Cluster Size = 16GB [2NM's]
 Queue A Capacity = 50%
 Queue B Capacity = 50%
 Consider there are 3 applications running in Queue A which has taken the full 
 cluster capacity. 
 J1 = 2GB AM + 1GB * 4 Maps
 J2 = 2GB AM + 1GB * 4 Maps
 J3 = 2GB AM + 1GB * 2 Maps
 Another Job J4 is submitted in Queue B [J4 needs a 2GB AM + 1GB * 2 Maps ].
 Currently in this scenario, Job J3 will get killed, including its AM.
 It would be better if the AM could be given the least priority among multiple 
 applications. In this same scenario, map tasks from J3 and J2 can be preempted.
 Later, when the cluster is free, maps can be allocated to these Jobs.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Assigned] (YARN-2034) Description for yarn.nodemanager.localizer.cache.target-size-mb is incorrect

2014-05-15 Thread Chen He (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2034?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chen He reassigned YARN-2034:
-

Assignee: Chen He

 Description for yarn.nodemanager.localizer.cache.target-size-mb is incorrect
 

 Key: YARN-2034
 URL: https://issues.apache.org/jira/browse/YARN-2034
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager
Affects Versions: 0.23.10, 2.4.0
Reporter: Jason Lowe
Assignee: Chen He
Priority: Minor
 Attachments: YARN-2034.patch


 The description in yarn-default.xml for 
 yarn.nodemanager.localizer.cache.target-size-mb says that it is a setting per 
 local directory, but according to the code it's a setting for the entire node.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-1986) In Fifo Scheduler, node heartbeat in between creating app and attempt causes NPE

2014-05-15 Thread Sandy Ryza (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1986?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sandy Ryza updated YARN-1986:
-

Summary: In Fifo Scheduler, node heartbeat in between creating app and 
attempt causes NPE  (was: After upgrade from 2.2.0 to 2.4.0, NPE on first job 
start.)

 In Fifo Scheduler, node heartbeat in between creating app and attempt causes 
 NPE
 

 Key: YARN-1986
 URL: https://issues.apache.org/jira/browse/YARN-1986
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 2.4.0
Reporter: Jon Bringhurst
Assignee: Sandy Ryza
Priority: Critical
 Attachments: YARN-1986-2.patch, YARN-1986-3.patch, 
 YARN-1986-testcase.patch, YARN-1986.patch


 After upgrade from 2.2.0 to 2.4.0, NPE on first job start.
 -After RM was restarted, the job runs without a problem.-
 {noformat}
 19:11:13,441 FATAL ResourceManager:600 - Error in handling event type 
 NODE_UPDATE to the scheduler
 java.lang.NullPointerException
   at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fifo.FifoScheduler.assignContainers(FifoScheduler.java:462)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fifo.FifoScheduler.nodeUpdate(FifoScheduler.java:714)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fifo.FifoScheduler.handle(FifoScheduler.java:743)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fifo.FifoScheduler.handle(FifoScheduler.java:104)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$SchedulerEventDispatcher$EventProcessor.run(ResourceManager.java:591)
   at java.lang.Thread.run(Thread.java:744)
 19:11:13,443  INFO ResourceManager:604 - Exiting, bbye..
 {noformat}



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-1365) ApplicationMasterService to allow Register and Unregister of an app that was running before restart

2014-05-15 Thread Anubhav Dhoot (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1365?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anubhav Dhoot updated YARN-1365:


Attachment: YARN-1365.initial.patch

This is a change from the prototype that allows applications to register after an 
RM restart. Still need to add unit tests.

 ApplicationMasterService to allow Register and Unregister of an app that was 
 running before restart
 ---

 Key: YARN-1365
 URL: https://issues.apache.org/jira/browse/YARN-1365
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Reporter: Bikas Saha
Assignee: Anubhav Dhoot
 Attachments: YARN-1365.initial.patch


 For an application that was running before restart, the 
 ApplicationMasterService currently throws an exception when the app tries to 
 make the initial register or final unregister call. These should succeed and 
 the RMApp state machine should transition to completed like normal. 
 Unregistration should succeed for an app that the RM considers complete since 
 the RM may have died after saving completion in the store but before 
 notifying the AM that the AM is free to exit.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Created] (YARN-2034) Description for yarn.nodemanager.localizer.cache.target-size-mb is incorrect

2014-05-15 Thread Jason Lowe (JIRA)
Jason Lowe created YARN-2034:


 Summary: Description for 
yarn.nodemanager.localizer.cache.target-size-mb is incorrect
 Key: YARN-2034
 URL: https://issues.apache.org/jira/browse/YARN-2034
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager
Affects Versions: 2.4.0, 0.23.10
Reporter: Jason Lowe
Priority: Minor


The description for yarn.nodemanager.localizer.cache.target-size-mb says that 
it is a setting per local directory, but according to the code it's a setting 
for the entire node.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Comment Edited] (YARN-1368) Common work to re-populate containers’ state into scheduler

2014-05-15 Thread Alejandro Abdelnur (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1368?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13998164#comment-13998164
 ] 

Alejandro Abdelnur edited comment on YARN-1368 at 5/15/14 5:09 AM:
---

[~jianhe], I understand the patch is taking a different approach, which is 
based on the work Anubhav started. Instead of hijacking the JIRA, the correct way 
should have been proposing [to the assignee/author of the original patch] the 
changes and offering to contribute/breakdown tasks. Please do so next time.



was (Author: tucu00):
[~jianhe], I understand the patch is taking a different approach, which is 
based on the work Anubhav started. Instead hijacking the JIRA, the correct way 
should have been proposing -to the assignee/author of the original patch- the 
changes and offering to contribute/breakdown tasks. Please do so next time.


 Common work to re-populate containers’ state into scheduler
 ---

 Key: YARN-1368
 URL: https://issues.apache.org/jira/browse/YARN-1368
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Bikas Saha
Assignee: Jian He
 Attachments: YARN-1368.1.patch, YARN-1368.2.patch, 
 YARN-1368.combined.001.patch, YARN-1368.preliminary.patch


 YARN-1367 adds support for the NM to tell the RM about all currently running 
 containers upon registration. The RM needs to send this information to the 
 schedulers along with the NODE_ADDED_EVENT so that the schedulers can recover 
 the current allocation state of the cluster.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1962) Timeline server is enabled by default

2014-05-15 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1962?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13998192#comment-13998192
 ] 

Hudson commented on YARN-1962:
--

FAILURE: Integrated in Hadoop-Mapreduce-trunk #1779 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1779/])
YARN-1962. Changed Timeline Service client configuration to be off by default 
given the non-readiness of the feature yet. Contributed by Mohammad Kamrul 
Islam. (vinodkv: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1593750)
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/conf/YarnConfiguration.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/api/impl/TestTimelineClient.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/resources/yarn-default.xml


 Timeline server is enabled by default
 -

 Key: YARN-1962
 URL: https://issues.apache.org/jira/browse/YARN-1962
 Project: Hadoop YARN
  Issue Type: Sub-task
Affects Versions: 2.4.0
Reporter: Mohammad Kamrul Islam
Assignee: Mohammad Kamrul Islam
 Fix For: 2.4.1

 Attachments: YARN-1962.1.patch, YARN-1962.2.patch


 Since the Timeline server is not mature and secured yet, enabling it by default 
 might create some confusion.
 We were playing with 2.4.0 and found a lot of exceptions for the distributed 
 shell example related to connection-refused errors. Btw, we didn't run the TS 
 because it is not secured yet.
 Although it is possible to explicitly turn it off through the yarn-site config, 
 in my opinion this extra step for this new service is not worth it at this 
 point.
 This JIRA is to turn it off by default.
 If there is agreement, I can put up a simple patch for this.
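 
 For reference, a sketch of turning the client off explicitly; the property name is assumed 
 from yarn-default.xml, and in practice this would be set in yarn-site.xml rather than in code:
 {code}
 import org.apache.hadoop.conf.Configuration;
 
 public class DisableTimelineClient {
   public static void main(String[] args) {
     Configuration conf = new Configuration();
     // Assumed property: keeps clients from posting to a timeline server that is not running.
     conf.setBoolean("yarn.timeline-service.enabled", false);
     System.out.println(conf.getBoolean("yarn.timeline-service.enabled", true)); // false
   }
 }
 {code}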
 {noformat}
 14/04/17 23:24:33 ERROR impl.TimelineClientImpl: Failed to get the response 
 from the timeline server.
 com.sun.jersey.api.client.ClientHandlerException: java.net.ConnectException: 
 Connection refused
   at 
 com.sun.jersey.client.urlconnection.URLConnectionClientHandler.handle(URLConnectionClientHandler.java:149)
   at com.sun.jersey.api.client.Client.handle(Client.java:648)
   at com.sun.jersey.api.client.WebResource.handle(WebResource.java:670)
   at com.sun.jersey.api.client.WebResource.access$200(WebResource.java:74)
   at 
 com.sun.jersey.api.client.WebResource$Builder.post(WebResource.java:563)
   at 
 org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl.doPostingEntities(TimelineClientImpl.java:131)
   at 
 org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl.putEntities(TimelineClientImpl.java:104)
   at 
 org.apache.hadoop.yarn.applications.distributedshell.ApplicationMaster.publishApplicationAttemptEvent(ApplicationMaster.java:1072)
   at 
 org.apache.hadoop.yarn.applications.distributedshell.ApplicationMaster.run(ApplicationMaster.java:515)
   at 
 org.apache.hadoop.yarn.applications.distributedshell.ApplicationMaster.main(ApplicationMaster.java:281)
 Caused by: java.net.ConnectException: Connection refused
   at java.net.PlainSocketImpl.socketConnect(Native Method)
   at 
 java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:339)
   at 
 java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:198)
   at 
 java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:182)
   at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
   at java.net.Socket.connect(Socket.java:579)
   at java.net.Socket.connect(Socket.java:528)
   at sun.net.NetworkClient.doConnect(NetworkClient.java:180)
   at sun.net.www.http.HttpClient.openServer(HttpClient.java:432)
   at sun.net.www.http.HttpClient.openServer(HttpClient.java:527)
   at sun.net.www.http.HttpClient.in14/04/17 23:24:33 ERROR 
 impl.TimelineClientImpl: Failed to get the response from the timeline server.
 com.sun.jersey.api.client.ClientHandlerException: java.net.ConnectException: 
 Connection refused
   at 
 com.sun.jersey.client.urlconnection.URLConnectionClientHandler.handle(URLConnectionClientHandler.java:149)
   at com.sun.jersey.api.client.Client.handle(Client.java:648)
   at com.sun.jersey.api.client.WebResource.handle(WebResource.java:670)
   at com.sun.jersey.api.client.WebResource.access$200(WebResource.java:74)
   at 
 com.sun.jersey.api.client.WebResource$Builder.post(WebResource.java:563)
   at 
 org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl.doPostingEntities(TimelineClientImpl.java:131)
   at 
 org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl.putEntities(TimelineClientImpl.java:104)
   

[jira] [Commented] (YARN-2042) String shouldn't be compared using == in QueuePlacementRule#NestedUserQueue#getQueueForApp()

2014-05-15 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2042?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13998205#comment-13998205
 ] 

Hudson commented on YARN-2042:
--

FAILURE: Integrated in Hadoop-Hdfs-trunk #1753 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/1753/])
YARN-2042. String shouldn't be compared using == in 
QueuePlacementRule#NestedUserQueue#getQueueForApp (Chen He via Sandy Ryza) 
(sandy: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1594482)
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/QueuePlacementRule.java


 String shouldn't be compared using == in 
 QueuePlacementRule#NestedUserQueue#getQueueForApp()
 

 Key: YARN-2042
 URL: https://issues.apache.org/jira/browse/YARN-2042
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Ted Yu
Assignee: Chen He
Priority: Minor
 Attachments: YARN-2042.patch


 {code}
   if (queueName != null && queueName != "") {
 {code}
 queueName.isEmpty() should be used instead of comparing against "".



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2061) Revisit logging levels in ZKRMStateStore

2014-05-15 Thread Karthik Kambatla (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2061?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13998251#comment-13998251
 ] 

Karthik Kambatla commented on YARN-2061:


# After loading state corresponding to one application.
{code}
LOG.info("Done Loading applications from ZK state store");
{code}


 Revisit logging levels in ZKRMStateStore 
 -

 Key: YARN-2061
 URL: https://issues.apache.org/jira/browse/YARN-2061
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: resourcemanager
Affects Versions: 2.4.0
Reporter: Karthik Kambatla
Assignee: Ray Chiang
Priority: Minor
  Labels: newbie

 ZKRMStateStore has a few places where it is logging at the INFO level. We 
 should change these to DEBUG or TRACE level messages.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1987) Wrapper for leveldb DBIterator to aid in handling database exceptions

2014-05-15 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1987?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13998233#comment-13998233
 ] 

Hudson commented on YARN-1987:
--

FAILURE: Integrated in Hadoop-Hdfs-trunk #1753 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/1753/])
YARN-1987. Wrapper for leveldb DBIterator to aid in handling database 
exceptions. (Jason Lowe via kasha) (kasha: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1593757)
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/pom.xml
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/utils/LeveldbIterator.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/test/java/org/apache/hadoop/yarn/server/utils
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/test/java/org/apache/hadoop/yarn/server/utils/TestLeveldbIterator.java


 Wrapper for leveldb DBIterator to aid in handling database exceptions
 -

 Key: YARN-1987
 URL: https://issues.apache.org/jira/browse/YARN-1987
 Project: Hadoop YARN
  Issue Type: Improvement
Affects Versions: 2.4.0
Reporter: Jason Lowe
Assignee: Jason Lowe
 Fix For: 2.5.0

 Attachments: YARN-1987.patch, YARN-1987v2.patch


 Per discussions in YARN-1984 and MAPREDUCE-5652, it would be nice to have a 
 utility wrapper around leveldb's DBIterator to translate the raw 
 RuntimeExceptions it can throw into DBExceptions to make it easier to handle 
 database errors while iterating.
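 
 A hypothetical sketch of the wrapper idea (not the actual LeveldbIterator class added by 
 the patch): translate leveldb's raw RuntimeExceptions into DBException so callers can 
 handle database errors uniformly while iterating.
 {code}
 import java.util.Map;
 import org.iq80.leveldb.DBException;
 import org.iq80.leveldb.DBIterator;
 
 public class IteratorWrapperSketch {
   private final DBIterator iter;
 
   IteratorWrapperSketch(DBIterator iter) {
     this.iter = iter;
   }
 
   Map.Entry<byte[], byte[]> next() {
     try {
       return iter.next();
     } catch (DBException e) {
       throw e;                                    // already the type callers expect
     } catch (RuntimeException e) {
       throw new DBException(e.getMessage(), e);   // wrap raw leveldb runtime errors
     }
   }
 }
 {code}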



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1957) ProportionalCapacitPreemptionPolicy handling of corner cases...

2014-05-15 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1957?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13998174#comment-13998174
 ] 

Hudson commented on YARN-1957:
--

FAILURE: Integrated in Hadoop-Mapreduce-trunk #1779 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1779/])
YARN-1957. Consider the max capacity of the queue when computing the ideal
capacity for preemption. Contributed by Carlo Curino (cdouglas: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1594414)
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/monitor/capacity/ProportionalCapacityPreemptionPolicy.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/monitor/capacity/TestProportionalCapacityPreemptionPolicy.java


 ProportionalCapacitPreemptionPolicy handling of corner cases...
 ---

 Key: YARN-1957
 URL: https://issues.apache.org/jira/browse/YARN-1957
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Affects Versions: 2.4.0
Reporter: Carlo Curino
Assignee: Carlo Curino
  Labels: capacity-scheduler, preemption
 Fix For: 3.0.0, 2.5.0, 2.4.1

 Attachments: YARN-1957.patch, YARN-1957.patch, YARN-1957_test.patch


 The current version of ProportionalCapacityPreemptionPolicy should be 
 improved to deal with the following two scenarios:
 1) when rebalancing over-capacity allocations, it potentially preempts 
 without considering the maxCapacity constraints of a queue (i.e., preempting 
 possibly more than strictly necessary)
 2) a zero capacity queue is preempted even if there is no demand (consistent 
 with the old use of zero capacity to disable queues)
 The proposed patch fixes both issues and introduces a few new test cases.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1751) Improve MiniYarnCluster and LogCLIHelpers for log aggregation testing

2014-05-15 Thread Jason Lowe (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1751?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13993565#comment-13993565
 ] 

Jason Lowe commented on YARN-1751:
--

Despite them both being small changes, I think these should be separate JIRAs 
since they're otherwise unrelated changes for different problems and can stand 
on their own.  We can morph this JIRA into one of them and file a new one to 
cover the other.

For the LogCLIHelpers change, I think it should be calling 
FileContext.getFileContext(remoteAppLogDir.toUri(), conf) in case the 
remoteAppLogDir is not on the default filesystem.  There's also the question of 
whether it should guard against a null conf, since oddly despite LogCLIHelpers 
being Configurable it isn't using the config until after this change.  I think 
I'm leaning towards leaving it null and letting the NPE occur so callers will 
fix it.  We've had lots of performance problems and other weirdness in the past 
when code forgot to pass down a custom config and things sorta worked with the 
default one.
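
A minimal sketch of the suggested call (the helper name is made up):
{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileContext;
import org.apache.hadoop.fs.Path;

public class RemoteLogFileContext {
  // Resolve the FileContext against the remote log dir's own URI and the supplied conf,
  // instead of assuming the default filesystem.
  static FileContext getLogFileContext(Path remoteAppLogDir, Configuration conf)
      throws Exception {
    return FileContext.getFileContext(remoteAppLogDir.toUri(), conf);
  }
}
{code}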

+1 for the MiniYarnCluster change.



 Improve MiniYarnCluster and LogCLIHelpers for log aggregation testing
 -

 Key: YARN-1751
 URL: https://issues.apache.org/jira/browse/YARN-1751
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: nodemanager
Reporter: Ming Ma
Assignee: Ming Ma
 Attachments: YARN-1751-trunk.patch


 MiniYarnCluster specifies individual remote log aggregation root dir for each 
 NM. Test code that uses MiniYarnCluster won't be able to get the value of log 
 aggregation root dir. The following code isn't necessary in MiniYarnCluster.
   File remoteLogDir =
   new File(testWorkDir, MiniYARNCluster.this.getName()
   + "-remoteLogDir-nm-" + index);
   remoteLogDir.mkdir();
   config.set(YarnConfiguration.NM_REMOTE_APP_LOG_DIR,
   remoteLogDir.getAbsolutePath());
 In LogCLIHelpers.java, dumpAllContainersLogs should pass its conf object to 
 FileContext.getFileContext() call.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2064) MR job successful but Note: Container killed by the ApplicationMaster.

2014-05-15 Thread Jian He (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2064?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13998467#comment-13998467
 ] 

Jian He commented on YARN-2064:
---

Hi [~dlim1234], the message is only saying that some containers (map or 
reduce tasks) were killed by the AM during its runtime. As long as you 
can see the SUCCEEDED state on the RM web UI, the job should be successful. You 
can also use yarn application -status to query the app status from the CLI.

Also, please ask such questions on the Hadoop user mailing list rather than 
here next time. The JIRA site is meant for reporting issues, not for 
answering general questions. Thanks.

 MR job successful but Note: Container killed by the ApplicationMaster.
 --

 Key: YARN-2064
 URL: https://issues.apache.org/jira/browse/YARN-2064
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager, resourcemanager, scheduler
Reporter: dlim

 Hi, just a short question for everyone.
 I have MR jobs running on YARN; normally, for small jobs, they succeed without any 
 note on the URL page.
 However, when running a long-running job, it ends with successful status but 
 with the note: Container killed by the ApplicationMaster.
 The job is still running and I hesitate to kill it. Does anyone know if it is 
 actually successful or not? I know there is a previous post on this, but 
 the answers are not so clear to me.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1864) Fair Scheduler Dynamic Hierarchical User Queues

2014-05-15 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1864?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13992689#comment-13992689
 ] 

Hudson commented on YARN-1864:
--

SUCCESS: Integrated in Hadoop-trunk-Commit #5597 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/5597/])
YARN-1864. Add missing file FSQueueType.java (sandy: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1593191)
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FSQueueType.java
YARN-1864. Fair Scheduler Dynamic Hierarchical User Queues (Ashwin Shankar via 
Sandy Ryza) (sandy: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1593190)
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/AllocationConfiguration.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/AllocationFileLoaderService.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/QueueManager.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/QueuePlacementPolicy.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/QueuePlacementRule.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/TestAllocationFileLoaderService.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/TestFairScheduler.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/TestQueueManager.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/TestQueuePlacementPolicy.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/apt/FairScheduler.apt.vm


 Fair Scheduler Dynamic Hierarchical User Queues
 ---

 Key: YARN-1864
 URL: https://issues.apache.org/jira/browse/YARN-1864
 Project: Hadoop YARN
  Issue Type: New Feature
  Components: scheduler
Reporter: Ashwin Shankar
Assignee: Ashwin Shankar
  Labels: scheduler
 Fix For: 2.5.0

 Attachments: YARN-1864-v1.txt, YARN-1864-v2.txt, YARN-1864-v3.txt, 
 YARN-1864-v4.txt, YARN-1864-v5.txt, YARN-1864-v6.txt, YARN-1864-v6.txt


 In Fair Scheduler, we want to be able to create user queues under any parent 
 queue in the hierarchy. For example, say user1 submits a job to a parent queue 
 called root.allUserQueues; we want to be able to create a new queue called 
 root.allUserQueues.user1 and run user1's job in it. Any further jobs submitted 
 by this user to root.allUserQueues will run in this newly created 
 root.allUserQueues.user1.
 This is very similar to the 'user-as-default' feature in Fair Scheduler, which 
 creates user queues under the root queue. But we want the ability to create user 
 queues under ANY parent queue.
 Why do we want this?
 1. Preemption: these dynamically created user queues can preempt each other 
 if their fair share is not met, so there is fairness among users.
 User queues can also preempt other non-user leaf queues if they are below their 
 fair share.
 2. Allocation to user queues: we want all the (ad hoc) user queries to consume 
 only a fraction of the resources in the shared cluster. With this 
 feature, we could do that by giving a fair share to the parent user queue, 
 which is then redistributed to all the dynamically created user queues.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2061) Revisit logging levels in ZKRMStateStore

2014-05-15 Thread Ray Chiang (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2061?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13998322#comment-13998322
 ] 

Ray Chiang commented on YARN-2061:
--

One minor question.  Looking at the Apache Commons Log interface, it looks like 
the API expects the developer to always call the is*Enabled() API before calling 
the actual Log.* method, but that pattern is not used consistently in this class.  
Should I add that as well?
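
For reference, a minimal sketch of the guard pattern being discussed, using the 
Commons Logging API; the class and message below are made up for illustration, 
not taken from ZKRMStateStore:
{code}
import org.apache.commons.logging.Log;
import org.apache.commons.logging.LogFactory;

public class GuardedLogging {
  private static final Log LOG = LogFactory.getLog(GuardedLogging.class);

  void storeState(String path) {
    // Guard the call so the message string is only built when DEBUG is enabled.
    if (LOG.isDebugEnabled()) {
      LOG.debug("Storing state at " + path);
    }
  }
}
{code}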


 Revisit logging levels in ZKRMStateStore 
 -

 Key: YARN-2061
 URL: https://issues.apache.org/jira/browse/YARN-2061
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: resourcemanager
Affects Versions: 2.4.0
Reporter: Karthik Kambatla
Assignee: Ray Chiang
Priority: Minor
  Labels: newbie

 ZKRMStateStore has a few places where it is logging at the INFO level. We 
 should change these to DEBUG or TRACE level messages.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2053) Slider AM fails to restart: NPE in RegisterApplicationMasterResponseProto$Builder.addAllNmTokensFromPreviousAttempts

2014-05-15 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2053?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13997623#comment-13997623
 ] 

Wangda Tan commented on YARN-2053:
--

And I think this should be marked as a critical or blocker bug. Agree?

 Slider AM fails to restart: NPE in 
 RegisterApplicationMasterResponseProto$Builder.addAllNmTokensFromPreviousAttempts
 

 Key: YARN-2053
 URL: https://issues.apache.org/jira/browse/YARN-2053
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.4.0
Reporter: Sumit Mohanty
Assignee: Wangda Tan
 Attachments: YARN-2053.patch, 
 yarn-yarn-nodemanager-c6403.ambari.apache.org.log.bak, 
 yarn-yarn-resourcemanager-c6403.ambari.apache.org.log.bak


 Slider AppMaster restart fails with the following:
 {code}
 org.apache.hadoop.yarn.proto.YarnServiceProtos$RegisterApplicationMasterResponseProto$Builder.addAllNmTokensFromPreviousAttempts(YarnServiceProtos.java:2700)
 {code}



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2042) String shouldn't be compared using == in QueuePlacementRule#NestedUserQueue#getQueueForApp()

2014-05-15 Thread Henry Saputra (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2042?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13996665#comment-13996665
 ] 

Henry Saputra commented on YARN-2042:
-

+1 for the patch

 String shouldn't be compared using == in 
 QueuePlacementRule#NestedUserQueue#getQueueForApp()
 

 Key: YARN-2042
 URL: https://issues.apache.org/jira/browse/YARN-2042
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Ted Yu
Assignee: Chen He
Priority: Minor
 Attachments: YARN-2042.patch


 {code}
   if (queueName != null && queueName != "") {
 {code}
 queueName.isEmpty() should be used instead of comparing against "".
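
 A minimal sketch of the suggested check, reusing the variable name from the 
 snippet above (surrounding code is hypothetical):
 {code}
 // Use isEmpty() instead of reference comparison against "".
 if (queueName != null && !queueName.isEmpty()) {
   // ... use the non-empty queue name
 }
 {code}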



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-1937) Add entity-level access control of the timeline data for owners only

2014-05-15 Thread Zhijie Shen (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1937?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhijie Shen updated YARN-1937:
--

Attachment: YARN-1937.2.patch

Uploaded a new patch:

1. Prevent a timeline entity from being modified by other users (via re-put of 
a timeline entity).

2. Isolate the exception when checking access for collection operations 
(getEntities/Events).

3. Add corresponding test cases to verify the ACL behavior.

4. Fix a related bug in MemoryTimelineStore, which didn't do a deep copy 
before returning an object.

 Add entity-level access control of the timeline data for owners only
 

 Key: YARN-1937
 URL: https://issues.apache.org/jira/browse/YARN-1937
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Zhijie Shen
Assignee: Zhijie Shen
 Attachments: YARN-1937.1.patch, YARN-1937.2.patch






--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2040) Recover information about finished containers

2014-05-15 Thread Karthik Kambatla (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2040?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13993959#comment-13993959
 ] 

Karthik Kambatla commented on YARN-2040:


[~jlowe] - please close this as duplicate if any of the other sub-tasks are 
already handling this. Thanks.

 Recover information about finished containers
 -

 Key: YARN-2040
 URL: https://issues.apache.org/jira/browse/YARN-2040
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: nodemanager
Affects Versions: 2.4.0
Reporter: Karthik Kambatla

 The NM should store and recover information about finished containers as well.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-570) Time strings are formated in different timezone

2014-05-15 Thread Akira AJISAKA (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-570?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Akira AJISAKA updated YARN-570:
---

Attachment: YARN-570.2.patch

 Time strings are formated in different timezone
 ---

 Key: YARN-570
 URL: https://issues.apache.org/jira/browse/YARN-570
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Peng Zhang
Assignee: Akira AJISAKA
 Attachments: MAPREDUCE-5141.patch, YARN-570.2.patch


 Time strings on different pages are displayed in different timezones.
 If it is rendered by renderHadoopDate() in yarn.dt.plugins.js, it appears as 
 Wed, 10 Apr 2013 08:29:56 GMT
 If it is formatted by format() in yarn.util.Times, it appears as 10-Apr-2013 
 16:29:56
 Same value, but different timezone.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-941) RM Should have a way to update the tokens it has for a running application

2014-05-15 Thread Xuan Gong (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13998392#comment-13998392
 ] 

Xuan Gong commented on YARN-941:


I am starting to work on it. And will provide a proposal soon.

 RM Should have a way to update the tokens it has for a running application
 --

 Key: YARN-941
 URL: https://issues.apache.org/jira/browse/YARN-941
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Robert Joseph Evans
Assignee: Xuan Gong

 When an application is submitted to the RM it includes with it a set of 
 tokens that the RM will renew on behalf of the application, that will be 
 passed to the AM when the application is launched, and will be used when 
 launching the application to access HDFS to download files on behalf of the 
 application.
 For long lived applications/services these tokens can expire, and then the 
 tokens that the AM has will be invalid, and the tokens that the RM had will 
 also not work to launch a new AM.
 We need to provide an API that will allow the RM to replace the current 
 tokens for this application with a new set.  To avoid any real race issues, I 
 think this API should be something that the AM calls, so that the client can 
 connect to the AM with a new set of tokens it got using kerberos, then the AM 
 can inform the RM of the new set of tokens and quickly update its tokens 
 internally to use these new ones.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-2034) Description for yarn.nodemanager.localizer.cache.target-size-mb is incorrect

2014-05-15 Thread Jason Lowe (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2034?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Lowe updated YARN-2034:
-

Description: The description in yarn-default.xml for 
yarn.nodemanager.localizer.cache.target-size-mb says that it is a setting per 
local directory, but according to the code it's a setting for the entire node.  
(was: The description for yarn.nodemanager.localizer.cache.target-size-mb says 
that it is a setting per local directory, but according to the code it's a 
setting for the entire node.)

 Description for yarn.nodemanager.localizer.cache.target-size-mb is incorrect
 

 Key: YARN-2034
 URL: https://issues.apache.org/jira/browse/YARN-2034
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager
Affects Versions: 0.23.10, 2.4.0
Reporter: Jason Lowe
Priority: Minor

 The description in yarn-default.xml for 
 yarn.nodemanager.localizer.cache.target-size-mb says that it is a setting per 
 local directory, but according to the code it's a setting for the entire node.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2018) TestClientRMService.testTokenRenewalWrongUser fails after HADOOP-10562

2014-05-15 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2018?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13992833#comment-13992833
 ] 

Hudson commented on YARN-2018:
--

FAILURE: Integrated in Hadoop-Mapreduce-trunk #1777 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1777/])
YARN-2018. TestClientRMService.testTokenRenewalWrongUser fails after 
HADOOP-10562. (Contributed by Ming Ma) (arp: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1592783)
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestClientRMService.java


 TestClientRMService.testTokenRenewalWrongUser fails after HADOOP-10562  
 

 Key: YARN-2018
 URL: https://issues.apache.org/jira/browse/YARN-2018
 Project: Hadoop YARN
  Issue Type: Test
Affects Versions: 2.5.0
Reporter: Tsuyoshi OZAWA
Assignee: Ming Ma
 Attachments: YARN-2018.patch


 The test failure is observed on YARN-1945 and YARN-1861.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2054) Poor defaults for YARN ZK configs for retries and retry-inteval

2014-05-15 Thread Xuan Gong (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2054?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13998439#comment-13998439
 ] 

Xuan Gong commented on YARN-2054:
-

Agree with [~jianhe]. 

 Poor defaults for YARN ZK configs for retries and retry-inteval
 ---

 Key: YARN-2054
 URL: https://issues.apache.org/jira/browse/YARN-2054
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.4.0
Reporter: Karthik Kambatla
Assignee: Karthik Kambatla
 Attachments: yarn-2054-1.patch


 Currently, we have the following default values:
 # yarn.resourcemanager.zk-num-retries - 500
 # yarn.resourcemanager.zk-retry-interval-ms - 2000
 This leads to a cumulative 1000 seconds (500 retries x 2000 ms) before the RM 
 gives up trying to connect to ZK. 



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1986) In Fifo Scheduler, node heartbeat in between creating app and attempt causes NPE

2014-05-15 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1986?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13998194#comment-13998194
 ] 

Hudson commented on YARN-1986:
--

FAILURE: Integrated in Hadoop-Mapreduce-trunk #1779 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1779/])
YARN-1986. In Fifo Scheduler, node heartbeat in between creating app and 
attempt causes NPE (Hong Zhiguo via Sandy Ryza) (sandy: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1594476)
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fifo/FifoScheduler.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestFifoScheduler.java


 In Fifo Scheduler, node heartbeat in between creating app and attempt causes 
 NPE
 

 Key: YARN-1986
 URL: https://issues.apache.org/jira/browse/YARN-1986
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 2.4.0
Reporter: Jon Bringhurst
Assignee: Hong Zhiguo
Priority: Critical
 Fix For: 2.4.1

 Attachments: YARN-1986-2.patch, YARN-1986-3.patch, 
 YARN-1986-testcase.patch, YARN-1986.patch


 After upgrade from 2.2.0 to 2.4.0, NPE on first job start.
 -After RM was restarted, the job runs without a problem.-
 {noformat}
 19:11:13,441 FATAL ResourceManager:600 - Error in handling event type 
 NODE_UPDATE to the scheduler
 java.lang.NullPointerException
   at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fifo.FifoScheduler.assignContainers(FifoScheduler.java:462)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fifo.FifoScheduler.nodeUpdate(FifoScheduler.java:714)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fifo.FifoScheduler.handle(FifoScheduler.java:743)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fifo.FifoScheduler.handle(FifoScheduler.java:104)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$SchedulerEventDispatcher$EventProcessor.run(ResourceManager.java:591)
   at java.lang.Thread.run(Thread.java:744)
 19:11:13,443  INFO ResourceManager:604 - Exiting, bbye..
 {noformat}



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1861) Both RM stuck in standby mode when automatic failover is enabled

2014-05-15 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1861?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13998227#comment-13998227
 ] 

Hudson commented on YARN-1861:
--

FAILURE: Integrated in Hadoop-Hdfs-trunk #1753 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/1753/])
YARN-1861. Fixed a bug in RM to reset leader-election on fencing that was 
causing both RMs to be stuck in standby mode when automatic failover is 
enabled. Contributed by Karthik Kambatla and Xuan Gong. (vinodkv: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1594356)
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/TestRMFailover.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/AdminService.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/EmbeddedElectorService.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ResourceManager.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-tests/src/test/java/org/apache/hadoop/yarn/server/MiniYARNCluster.java


 Both RM stuck in standby mode when automatic failover is enabled
 

 Key: YARN-1861
 URL: https://issues.apache.org/jira/browse/YARN-1861
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Affects Versions: 2.4.0
Reporter: Arpit Gupta
Assignee: Karthik Kambatla
Priority: Blocker
 Fix For: 2.4.1

 Attachments: YARN-1861.2.patch, YARN-1861.3.patch, YARN-1861.4.patch, 
 YARN-1861.5.patch, YARN-1861.7.patch, yarn-1861-1.patch, yarn-1861-6.patch


 In our HA tests we noticed that the tests got stuck because both RM's got 
 into standby state and no one became active.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1368) Common work to re-populate containers’ state into scheduler

2014-05-15 Thread Alejandro Abdelnur (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1368?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13998164#comment-13998164
 ] 

Alejandro Abdelnur commented on YARN-1368:
--

[~jianhe], I understand the patch is taking a different approach, which is 
based on the work Anubhav started. Instead of hijacking the JIRA, the correct 
way would have been to propose the changes -to the assignee/author of the 
original patch- and offer to contribute/break down tasks. Please do so next time.


 Common work to re-populate containers’ state into scheduler
 ---

 Key: YARN-1368
 URL: https://issues.apache.org/jira/browse/YARN-1368
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Bikas Saha
Assignee: Jian He
 Attachments: YARN-1368.1.patch, YARN-1368.2.patch, 
 YARN-1368.combined.001.patch, YARN-1368.preliminary.patch


 YARN-1367 adds support for the NM to tell the RM about all currently running 
 containers upon registration. The RM needs to send this information to the 
 schedulers along with the NODE_ADDED_EVENT so that the schedulers can recover 
 the current allocation state of the cluster.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1751) Improve MiniYarnCluster for log aggregation testing

2014-05-15 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1751?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13998171#comment-13998171
 ] 

Hudson commented on YARN-1751:
--

FAILURE: Integrated in Hadoop-Mapreduce-trunk #1779 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1779/])
YARN-1751. Improve MiniYarnCluster for log aggregation testing. Contributed by 
Ming Ma (jlowe: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1594275)
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-tests/src/test/java/org/apache/hadoop/yarn/server/MiniYARNCluster.java


 Improve MiniYarnCluster for log aggregation testing
 ---

 Key: YARN-1751
 URL: https://issues.apache.org/jira/browse/YARN-1751
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: nodemanager
Reporter: Ming Ma
Assignee: Ming Ma
 Fix For: 3.0.0, 2.5.0

 Attachments: YARN-1751-trunk.patch, YARN-1751.patch


 MiniYarnCluster specifies an individual remote log aggregation root dir for each 
 NM. Test code that uses MiniYarnCluster won't be able to get the value of the 
 log aggregation root dir. The following code isn't necessary in MiniYarnCluster.
   File remoteLogDir =
   new File(testWorkDir, MiniYARNCluster.this.getName()
   + "-remoteLogDir-nm-" + index);
   remoteLogDir.mkdir();
   config.set(YarnConfiguration.NM_REMOTE_APP_LOG_DIR,
   remoteLogDir.getAbsolutePath());
 In LogCLIHelpers.java, dumpAllContainersLogs should pass its conf object to 
 the FileContext.getFileContext() call.
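
 A minimal illustration of the suggested change; the wrapper class and method 
 below are hypothetical, not the actual LogCLIHelpers code:
 {code}
 import org.apache.hadoop.conf.Configuration;
 import org.apache.hadoop.fs.FileContext;
 import org.apache.hadoop.fs.UnsupportedFileSystemException;

 public class FileContextConfExample {
   // Passing the caller's conf (rather than calling getFileContext() with no
   // arguments) ensures settings such as the remote log root dir are honored.
   static FileContext contextFor(Configuration conf)
       throws UnsupportedFileSystemException {
     return FileContext.getFileContext(conf);
   }
 }
 {code}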



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2016) Yarn getApplicationRequest start time range is not honored

2014-05-15 Thread Junping Du (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2016?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13998507#comment-13998507
 ] 

Junping Du commented on YARN-2016:
--

bq.  It would be good to have a unit test as I mentioned before. The test case 
I uploaded was specific to one issue, but tests covering the directions of the 
wire transfers and the like would also be good. Maybe that is something I will 
consider adding.
[~venkatnrangan], you are right that an end-to-end functional test (covering the 
whole client, wire and server path) like your demo test code is also very 
helpful. It would be great if you could file a JIRA and contribute it. I will 
help to review it. Thanks!


 Yarn getApplicationRequest start time range is not honored
 --

 Key: YARN-2016
 URL: https://issues.apache.org/jira/browse/YARN-2016
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.4.0
Reporter: Venkat Ranganathan
Assignee: Junping Du
 Fix For: 2.4.1

 Attachments: YARN-2016.patch, YarnTest.java


 When we query for the previous applications by creating an instance of 
 GetApplicationsRequest and setting the start time range and application tag, 
 we see that the start range provided is not honored and all applications with 
 the tag are returned
 Attaching a reproducer.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-556) RM Restart phase 2 - Work preserving restart

2014-05-15 Thread Tsuyoshi OZAWA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-556?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13998513#comment-13998513
 ] 

Tsuyoshi OZAWA commented on YARN-556:
-

{code}
Oh. Forgot to mention that. Anubhav Dhoot offered to split up the prototype 
into multiple patches, one for each of the sub-tasks. If I understand right, 
his prototype covers almost all the sub-tasks already created.
{code}

[~adhoot], thanks for your great work. I noticed that you attached a patch on 
YARN-1367. I'll comment there about the patch.

 RM Restart phase 2 - Work preserving restart
 

 Key: YARN-556
 URL: https://issues.apache.org/jira/browse/YARN-556
 Project: Hadoop YARN
  Issue Type: New Feature
  Components: resourcemanager
Reporter: Bikas Saha
Assignee: Bikas Saha
 Attachments: Work Preserving RM Restart.pdf, 
 WorkPreservingRestartPrototype.001.patch


 YARN-128 covered storing the state needed for the RM to recover critical 
 information. This umbrella jira will track changes needed to recover the 
 running state of the cluster so that work can be preserved across RM restarts.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-2055) Preemption: Jobs are failing due to AMs are getting launched and killed multiple times

2014-05-15 Thread Vinod Kumar Vavilapalli (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2055?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kumar Vavilapalli updated YARN-2055:
--

Target Version/s: 2.5.0
   Fix Version/s: (was: 2.1.0-beta)

 Preemption: Jobs are failing due to AMs are getting launched and killed 
 multiple times
 --

 Key: YARN-2055
 URL: https://issues.apache.org/jira/browse/YARN-2055
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Affects Versions: 2.4.0
Reporter: Mayank Bansal

 If Queue A does not have enough capacity to run the AM, the AM will borrow 
 capacity from queue B. In that case the AM will be killed when queue B 
 reclaims its capacity, then launched and killed again, and the job will 
 eventually fail.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2017) Merge some of the common lib code in schedulers

2014-05-15 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2017?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13998321#comment-13998321
 ] 

Hadoop QA commented on YARN-2017:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12644678/YARN-2017.3.patch
  against trunk revision .

{color:red}-1 patch{color}.  The patch command could not apply the patch.

Console output: https://builds.apache.org/job/PreCommit-YARN-Build/3745//console

This message is automatically generated.

 Merge some of the common lib code in schedulers
 ---

 Key: YARN-2017
 URL: https://issues.apache.org/jira/browse/YARN-2017
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Reporter: Jian He
Assignee: Jian He
 Attachments: YARN-2017.1.patch, YARN-2017.2.patch, YARN-2017.3.patch


 A bunch of the same code is repeated among the schedulers, e.g. between 
 FicaSchedulerNode and FSSchedulerNode. It would be good to merge and share it 
 in a common base.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1550) NPE in FairSchedulerAppsBlock#render

2014-05-15 Thread Karthik Kambatla (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1550?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13997742#comment-13997742
 ] 

Karthik Kambatla commented on YARN-1550:


The patch doesn't apply anymore. [~fengshen] - mind updating the patch against 
latest trunk? 

 NPE in FairSchedulerAppsBlock#render
 

 Key: YARN-1550
 URL: https://issues.apache.org/jira/browse/YARN-1550
 Project: Hadoop YARN
  Issue Type: Bug
  Components: scheduler
Affects Versions: 2.2.0
Reporter: caolong
Priority: Critical
 Fix For: 2.2.1

 Attachments: YARN-1550.patch


 Three steps:
 1. Set a breakpoint in RMAppManager#submitApplication after this code:
 if (rmContext.getRMApps().putIfAbsent(applicationId, application) !=
 null) {
   String message = "Application with id " + applicationId
   + " is already present! Cannot add a duplicate!";
   LOG.warn(message);
   throw RPCUtil.getRemoteException(message);
 }
 2. Submit one application: hadoop jar 
 ~/hadoop-current/share/hadoop/mapreduce/hadoop-mapreduce-client-jobclient-2.0.0-ydh2.2.0-tests.jar
  sleep -Dhadoop.job.ugi=test2,#11 -Dmapreduce.job.queuename=p1 -m 1 -mt 1 
 -r 1
 3. Go to the page http://ip:50030/cluster/scheduler and observe the 500 ERROR.
 the log:
 {noformat}
 2013-12-30 11:51:43,795 ERROR org.apache.hadoop.yarn.webapp.Dispatcher: error 
 handling URI: /cluster/scheduler
 java.lang.reflect.InvocationTargetException
   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
 
 Caused by: java.lang.NullPointerException
   at 
 org.apache.hadoop.yarn.server.resourcemanager.webapp.FairSchedulerAppsBlock.render(FairSchedulerAppsBlock.java:96)
   at 
 org.apache.hadoop.yarn.webapp.view.HtmlBlock.render(HtmlBlock.java:66)
   at 
 org.apache.hadoop.yarn.webapp.view.HtmlBlock.renderPartial(HtmlBlock.java:76)
 {noformat}



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-2048) List all of the containers of an application from the yarn web

2014-05-15 Thread Min Zhou (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2048?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Min Zhou updated YARN-2048:
---

Affects Version/s: 2.5.0
   2.3.0
   2.4.0

 List all of the containers of an application from the yarn web
 --

 Key: YARN-2048
 URL: https://issues.apache.org/jira/browse/YARN-2048
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: resourcemanager, webapp
Affects Versions: 2.3.0, 2.4.0, 2.5.0
Reporter: Min Zhou
 Attachments: YARN-2048-trunk-v1.patch


 Currently, YARN doesn't provide a way to list all of the containers of an 
 application from its web UI. This kind of information is needed by the 
 application user: they want to conveniently know how many containers their 
 application has already acquired and which nodes those containers were 
 launched on. They also want to view the logs of each container of an 
 application.
 One approach is to maintain a container list in RMAppImpl and expose this info 
 on the Application page. I will submit a patch soon.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1489) [Umbrella] Work-preserving ApplicationMaster restart

2014-05-15 Thread Karthik Kambatla (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1489?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13993962#comment-13993962
 ] 

Karthik Kambatla commented on YARN-1489:


Created a couple of sub-tasks based on an offline discussion with Anubhav, 
Bikas, Jian and Vinod.

 [Umbrella] Work-preserving ApplicationMaster restart
 

 Key: YARN-1489
 URL: https://issues.apache.org/jira/browse/YARN-1489
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Vinod Kumar Vavilapalli
Assignee: Vinod Kumar Vavilapalli
 Attachments: Work preserving AM restart.pdf


 Today if AMs go down,
  - RM kills all the containers of that ApplicationAttempt
  - New ApplicationAttempt doesn't know where the previous containers are 
 running
  - Old running containers don't know where the new AM is running.
 We need to fix this to enable work-preserving AM restart. The latter two 
 can potentially be done at the app level, but it is good to have a common 
 solution for all apps wherever possible.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2036) Document yarn.resourcemanager.hostname in ClusterSetup

2014-05-15 Thread Karthik Kambatla (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2036?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13993888#comment-13993888
 ] 

Karthik Kambatla commented on YARN-2036:


Looks good to me. +1, pending Jenkins.

 Document yarn.resourcemanager.hostname in ClusterSetup
 --

 Key: YARN-2036
 URL: https://issues.apache.org/jira/browse/YARN-2036
 Project: Hadoop YARN
  Issue Type: Bug
  Components: documentation
Affects Versions: 2.4.0
Reporter: Karthik Kambatla
Assignee: Ray Chiang
Priority: Minor
 Fix For: 2.5.0

 Attachments: YARN2036-01.patch, YARN2036-02.patch


 ClusterSetup doesn't talk about yarn.resourcemanager.hostname - most people 
 should just be able to use that directly.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Assigned] (YARN-1039) Add parameter for YARN resource requests to indicate long lived

2014-05-15 Thread Xuan Gong (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1039?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuan Gong reassigned YARN-1039:
---

Assignee: Xuan Gong

 Add parameter for YARN resource requests to indicate long lived
 -

 Key: YARN-1039
 URL: https://issues.apache.org/jira/browse/YARN-1039
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Affects Versions: 3.0.0, 2.1.1-beta
Reporter: Steve Loughran
Assignee: Xuan Gong
Priority: Minor

 A container request could support a new parameter, "long-lived". This could be 
 used by a scheduler that would know not to host the service on a transient 
 (cloud: spot priced) node.
 Schedulers could also decide whether or not to allocate multiple long-lived 
 containers on the same node.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1039) Add parameter for YARN resource requests to indicate long lived

2014-05-15 Thread Xuan Gong (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1039?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13998396#comment-13998396
 ] 

Xuan Gong commented on YARN-1039:
-

Start to work on it. Will provide a proposal soon.

 Add parameter for YARN resource requests to indicate long lived
 -

 Key: YARN-1039
 URL: https://issues.apache.org/jira/browse/YARN-1039
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Affects Versions: 3.0.0, 2.1.1-beta
Reporter: Steve Loughran
Assignee: Xuan Gong
Priority: Minor

 A container request could support a new parameter, "long-lived". This could be 
 used by a scheduler that would know not to host the service on a transient 
 (cloud: spot priced) node.
 Schedulers could also decide whether or not to allocate multiple long-lived 
 containers on the same node.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-2053) Slider AM fails to restart: NPE in RegisterApplicationMasterResponseProto$Builder.addAllNmTokensFromPreviousAttempts

2014-05-15 Thread Wangda Tan (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2053?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wangda Tan updated YARN-2053:
-

Attachment: YARN-2053.patch

 Slider AM fails to restart: NPE in 
 RegisterApplicationMasterResponseProto$Builder.addAllNmTokensFromPreviousAttempts
 

 Key: YARN-2053
 URL: https://issues.apache.org/jira/browse/YARN-2053
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.4.0
Reporter: Sumit Mohanty
 Attachments: YARN-2053.patch, 
 yarn-yarn-nodemanager-c6403.ambari.apache.org.log.bak, 
 yarn-yarn-resourcemanager-c6403.ambari.apache.org.log.bak


 Slider AppMaster restart fails with the following:
 {code}
 org.apache.hadoop.yarn.proto.YarnServiceProtos$RegisterApplicationMasterResponseProto$Builder.addAllNmTokensFromPreviousAttempts(YarnServiceProtos.java:2700)
 {code}



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1227) Update Single Cluster doc to use yarn.resourcemanager.hostname

2014-05-15 Thread Akira AJISAKA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1227?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13992678#comment-13992678
 ] 

Akira AJISAKA commented on YARN-1227:
-

Single Cluster doc was updated in HADOOP-10139 to set the minimal 
configuration, and that's why yarn.resourcemanager.address, 
yarn.resourcemanager.scheduler.address, etc., were removed.

 Update Single Cluster doc to use yarn.resourcemanager.hostname
 --

 Key: YARN-1227
 URL: https://issues.apache.org/jira/browse/YARN-1227
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: documentation
Affects Versions: 2.1.0-beta
Reporter: Sandy Ryza
Assignee: Ray Chiang
  Labels: newbie

 Now that yarn.resourcemanager.hostname can be used in place or 
 yarn.resourcemanager.address, yarn.resourcemanager.scheduler.address, etc., 
 we should update the doc to use it.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2022) Preempting an Application Master container can be kept as least priority when multiple applications are marked for preemption by ProportionalCapacityPreemptionPolicy

2014-05-15 Thread Carlo Curino (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2022?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13996282#comment-13996282
 ] 

Carlo Curino commented on YARN-2022:


Sunil,

The problem with AM_CONTAINER_PRIORITY is that it is just a shortcut for 
setting Priority = 0; the user can easily do so from their own code, and unless 
there are explicit checks that prevent ResourceRequests from assigning priority 
0 to all of their containers, we have no defense against user abuse. The two 
options I see are:
 * we track which container is the AM by some means other than Priority and 
protect the AM container from preemption whenever possible 
 * we assign a quota of protected-from-preemption containers, and save 
whichever containers have the lowest priority and fit within the quota. This 
way the user can specify multiple containers at Priority=0 (think a 
replicated-AM or some other critical service for the job) and we will save as 
many of those as fit in the quota.

I think we are agreeing on max-am-percentage... the final goal is to make sure 
that after preemption the max-am-resource-percent is respected (i.e., no more 
than a certain amount of the queue is dedicated to AMs).

The problem with user-limit-factor goes like this:  
 * Given a queue A of capacity: 10%, max-capacity = 50%, and user-limit-factor 
= 2 (i.e., a single user can go up to 20% of total resources)
 * Only one user is active in this queue and it gets 20% of resources (this 
also require low activity in other queues)
 * The overall cluster capacity is reduced (e.g., a failing rack) or a refresh 
of the queues has reduced this queue's capacity 
 * The LeafQueue scheduler keeps skipping the scheduling for this user (since 
he is now over its user-limit-factor) although no other user in the cluster is 
asking for resources
  * If we ever get to this situation with the user holding only AMs the system 
is completely wedged, with the AMs waiting for more containers, and the system 
systematically skipping this user (as he is above its user-limit-factor).
 
If preemption proceeds systematically killing resources *including* AMs, the 
chances of this happening are rather low (the head of the queue is only AMs, 
while the tail contained AMs and other containers), but as we save AMs from 
preemption, this bad corner case is maybe a little more likely to happen. 

What I am trying to get at with my comments is that as we try to evolve 
preemption further, we should look at all the invariants of a queue, and try to 
make sure that our preemption policy can re-establish not only the capacity 
invariant but also all other invariants. The CS relies on those invariants 
heavily, and misbehaves if they are violated.  An example of this is YARN-1957, 
where we introduce better handling for max-capacity and zero-size queues.

The changes you are proposing are not creating the problem, just making it 
more likely to happen in practice. A well tuned CS and reasonable load are 
unlikely to trigger this, but we should build for robustness as much as 
possible, since we cannot rely on users to understand these internals and tune 
the CS defensively.

[~acmurthy] any thoughts on this?


 Preempting an Application Master container can be kept as least priority when 
 multiple applications are marked for preemption by 
 ProportionalCapacityPreemptionPolicy
 -

 Key: YARN-2022
 URL: https://issues.apache.org/jira/browse/YARN-2022
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: resourcemanager
Affects Versions: 2.4.0
Reporter: Sunil G
Assignee: Sunil G
 Attachments: Yarn-2022.1.patch


 Cluster Size = 16GB [2NM's]
 Queue A Capacity = 50%
 Queue B Capacity = 50%
 Consider there are 3 applications running in Queue A which has taken the full 
 cluster capacity. 
 J1 = 2GB AM + 1GB * 4 Maps
 J2 = 2GB AM + 1GB * 4 Maps
 J3 = 2GB AM + 1GB * 2 Maps
 Another Job J4 is submitted in Queue B [J4 needs a 2GB AM + 1GB * 2 Maps ].
 Currently in this scenario, job J3 will get killed, including its AM.
 It would be better if the AM could be given the least preemption priority among 
 multiple applications. In this same scenario, map tasks from J3 and J2 could be 
 preempted instead.
 Later, when the cluster is free, maps can be allocated to these jobs.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2048) List all of the containers of an application from the yarn web

2014-05-15 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2048?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13996122#comment-13996122
 ] 

Wangda Tan commented on YARN-2048:
--

Thanks [~zjshen] and [~coderplay] for the explanation. Now I understand the 
context.

 List all of the containers of an application from the yarn web
 --

 Key: YARN-2048
 URL: https://issues.apache.org/jira/browse/YARN-2048
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: resourcemanager, webapp
Affects Versions: 2.3.0, 2.4.0, 2.5.0
Reporter: Min Zhou
 Attachments: YARN-2048-trunk-v1.patch


 Currently, YARN doesn't provide a way to list all of the containers of an 
 application from its web UI. This kind of information is needed by the 
 application user: they want to conveniently know how many containers their 
 application has already acquired and which nodes those containers were 
 launched on. They also want to view the logs of each container of an 
 application.
 One approach is to maintain a container list in RMAppImpl and expose this info 
 on the Application page. I will submit a patch soon.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1702) Expose kill app functionality as part of RM web services

2014-05-15 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1702?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13992718#comment-13992718
 ] 

Hadoop QA commented on YARN-1702:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  
http://issues.apache.org/jira/secure/attachment/12643931/apache-yarn-1702.9.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 2 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:red}-1 release audit{color}.  The applied patch generated 1 
release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/3719//testReport/
Release audit warnings: 
https://builds.apache.org/job/PreCommit-YARN-Build/3719//artifact/trunk/patchprocess/patchReleaseAuditProblems.txt
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/3719//console

This message is automatically generated.

 Expose kill app functionality as part of RM web services
 

 Key: YARN-1702
 URL: https://issues.apache.org/jira/browse/YARN-1702
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Varun Vasudev
Assignee: Varun Vasudev
 Attachments: apache-yarn-1702.2.patch, apache-yarn-1702.3.patch, 
 apache-yarn-1702.4.patch, apache-yarn-1702.5.patch, apache-yarn-1702.7.patch, 
 apache-yarn-1702.8.patch, apache-yarn-1702.9.patch


 Expose functionality to kill an app via the ResourceManager web services API.
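
 For illustration, a hedged sketch of how a client might invoke such an 
 endpoint; the URL shape (PUT .../ws/v1/cluster/apps/{appid}/state with a JSON 
 body) and the host and application id are assumptions, not taken from the 
 attached patches:
 {code}
 import java.io.OutputStream;
 import java.net.HttpURLConnection;
 import java.net.URL;

 public class KillAppViaRmWebServices {
   public static void main(String[] args) throws Exception {
     // Hypothetical RM address and application id.
     URL url = new URL("http://rm-host:8088/ws/v1/cluster/apps/"
         + "application_1397598467870_0002/state");
     HttpURLConnection conn = (HttpURLConnection) url.openConnection();
     conn.setRequestMethod("PUT");
     conn.setRequestProperty("Content-Type", "application/json");
     conn.setDoOutput(true);
     byte[] body = "{\"state\":\"KILLED\"}".getBytes("UTF-8");
     OutputStream out = conn.getOutputStream();
     out.write(body);
     out.close();
     System.out.println("HTTP response code: " + conn.getResponseCode());
     conn.disconnect();
   }
 }
 {code}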



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1981) Nodemanager version is not updated when a node reconnects

2014-05-15 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1981?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13998229#comment-13998229
 ] 

Hudson commented on YARN-1981:
--

FAILURE: Integrated in Hadoop-Hdfs-trunk #1753 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/1753/])
YARN-1981. Nodemanager version is not updated when a node reconnects (Jason 
Lowe via jeagles) (jeagles: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1594358)
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmnode/RMNodeImpl.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestRMNodeTransitions.java


 Nodemanager version is not updated when a node reconnects
 -

 Key: YARN-1981
 URL: https://issues.apache.org/jira/browse/YARN-1981
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.4.0
Reporter: Jason Lowe
Assignee: Jason Lowe
 Fix For: 3.0.0, 2.5.0

 Attachments: YARN-1981.patch


 When a nodemanager is quickly restarted and happens to change versions during 
 the restart (e.g.: rolling upgrade scenario) the NM version as reported by 
 the RM is not updated.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1976) Tracking url missing http protocol for FAILED application

2014-05-15 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1976?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13998211#comment-13998211
 ] 

Hudson commented on YARN-1976:
--

FAILURE: Integrated in Hadoop-Hdfs-trunk #1753 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/1753/])
YARN-1976. Fix CHANGES.txt for YARN-1976. (jianhe: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1594123)
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt


 Tracking url missing http protocol for FAILED application
 -

 Key: YARN-1976
 URL: https://issues.apache.org/jira/browse/YARN-1976
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Yesha Vora
Assignee: Junping Du
 Fix For: 2.4.1

 Attachments: YARN-1976-v2.patch, YARN-1976.patch


 Run yarn application -list -appStates FAILED. It does not print the http 
 protocol prefix the way it does for FINISHED apps.
 {noformat}
 -bash-4.1$ yarn application -list -appStates FINISHED,FAILED,KILLED
 14/04/15 23:55:07 INFO client.RMProxy: Connecting to ResourceManager at host
 Total number of applications (application-types: [] and states: [FINISHED, 
 FAILED, KILLED]):4
 Application-IdApplication-Name
 Application-Type  User   Queue   State
  Final-State ProgressTracking-URL
 application_1397598467870_0004   Sleep job   
 MAPREDUCEhrt_qa defaultFINISHED   
 SUCCEEDED 100% 
 http://host:19888/jobhistory/job/job_1397598467870_0004
 application_1397598467870_0003   Sleep job   
 MAPREDUCEhrt_qa defaultFINISHED   
 SUCCEEDED 100% 
 http://host:19888/jobhistory/job/job_1397598467870_0003
 application_1397598467870_0002   Sleep job   
 MAPREDUCEhrt_qa default  FAILED   
FAILED 100% 
 host:8088/cluster/app/application_1397598467870_0002
 application_1397598467870_0001  word count   
 MAPREDUCEhrt_qa defaultFINISHED   
 SUCCEEDED 100% 
 http://host:19888/jobhistory/job/job_1397598467870_0001
 {noformat}
 It only prints 'host:8088/cluster/app/application_1397598467870_0002' instead 
 of 'http://host:8088/cluster/app/application_1397598467870_0002'. 



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1362) Distinguish between nodemanager shutdown for decommission vs shutdown for restart

2014-05-15 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1362?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13998225#comment-13998225
 ] 

Hudson commented on YARN-1362:
--

FAILURE: Integrated in Hadoop-Hdfs-trunk #1753 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/1753/])
YARN-1362. Distinguish between nodemanager shutdown for decommission vs 
shutdown for restart. (Contributed by Jason Lowe) (junping_du: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1594421)
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/Context.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/NodeManager.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/NodeStatusUpdaterImpl.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/TestNodeStatusUpdater.java


 Distinguish between nodemanager shutdown for decommission vs shutdown for 
 restart
 -

 Key: YARN-1362
 URL: https://issues.apache.org/jira/browse/YARN-1362
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: nodemanager
Affects Versions: 2.3.0
Reporter: Jason Lowe
Assignee: Jason Lowe
 Fix For: 2.5.0

 Attachments: YARN-1362.patch


 When a nodemanager shuts down it needs to determine if it is likely to be 
 restarted.  If a restart is likely then it needs to preserve container 
 directories, logs, distributed cache entries, etc.  If it is being shutdown 
 more permanently (e.g.: like a decommission) then the nodemanager should 
 cleanup directories and logs.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2027) YARN ignores host-specific resource requests

2014-05-15 Thread Sandy Ryza (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2027?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13997750#comment-13997750
 ] 

Sandy Ryza commented on YARN-2027:
--

Including a rack in your request will allow containers to go anywhere on the 
rack, even when relaxLocality is set to false.

From the AMRMClient.ContainerRequest doc: "If locality relaxation is disabled, 
then only within the same request, a node and its rack may be specified 
together. This allows for a specific rack with a preference for a specific 
node within that rack."

So try passing in the rack list as null instead of 
List("/default-rack").toArray[String].
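
For illustration, a minimal Java sketch of a host-specific request with 
locality relaxation disabled, using the AMRMClient.ContainerRequest constructor 
that takes a relaxLocality flag; the host name, sizes and class name are made up:
{code}
import org.apache.hadoop.yarn.api.records.Priority;
import org.apache.hadoop.yarn.api.records.Resource;
import org.apache.hadoop.yarn.client.api.AMRMClient.ContainerRequest;
import org.apache.hadoop.yarn.util.Records;

public class HostSpecificRequest {
  static ContainerRequest requestOn(String host, int memMb, int cores) {
    Resource capability = Records.newRecord(Resource.class);
    capability.setMemory(memMb);
    capability.setVirtualCores(cores);
    Priority priority = Records.newRecord(Priority.class);
    priority.setPriority(0);
    // Nodes are listed explicitly, racks are left null, and relaxLocality is
    // false, so the request can only be satisfied on the named host.
    return new ContainerRequest(capability, new String[] { host },
        null /* racks */, priority, false /* relaxLocality */);
  }
}
{code}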


 YARN ignores host-specific resource requests
 

 Key: YARN-2027
 URL: https://issues.apache.org/jira/browse/YARN-2027
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager, scheduler
Affects Versions: 2.4.0
 Environment: RHEL 6.1
 YARN 2.4
Reporter: Chris Riccomini

 YARN appears to be ignoring host-level ContainerRequests.
 I am creating a container request with code that pretty closely mirrors the 
 DistributedShell code:
 {code}
   protected def requestContainers(memMb: Int, cpuCores: Int, containers: Int) 
 {
 info("Requesting %d container(s) with %dmb of memory" format (containers, 
 memMb))
 val capability = Records.newRecord(classOf[Resource])
 val priority = Records.newRecord(classOf[Priority])
 priority.setPriority(0)
 capability.setMemory(memMb)
 capability.setVirtualCores(cpuCores)
 // Specifying a host in the String[] host parameter here seems to do 
 nothing. Setting relaxLocality to false also doesn't help.
 (0 until containers).foreach(idx => amClient.addContainerRequest(new 
 ContainerRequest(capability, null, null, priority)))
   }
 {code}
 When I run this code with a specific host in the ContainerRequest, YARN does 
 not honor the request. Instead, it puts the container on an arbitrary host. 
 This appears to be true for both the FifoScheduler and the CapacityScheduler.
 Currently, we are running the CapacityScheduler with the following settings:
 {noformat}
 <configuration>
   <property>
     <name>yarn.scheduler.capacity.maximum-applications</name>
     <value>1</value>
     <description>
       Maximum number of applications that can be pending and running.
     </description>
   </property>
   <property>
     <name>yarn.scheduler.capacity.maximum-am-resource-percent</name>
     <value>0.1</value>
     <description>
       Maximum percent of resources in the cluster which can be used to run
       application masters i.e. controls number of concurrent running
       applications.
     </description>
   </property>
   <property>
     <name>yarn.scheduler.capacity.resource-calculator</name>
     <value>org.apache.hadoop.yarn.util.resource.DefaultResourceCalculator</value>
     <description>
       The ResourceCalculator implementation to be used to compare
       Resources in the scheduler.
       The default i.e. DefaultResourceCalculator only uses Memory while
       DominantResourceCalculator uses dominant-resource to compare
       multi-dimensional resources such as Memory, CPU etc.
     </description>
   </property>
   <property>
     <name>yarn.scheduler.capacity.root.queues</name>
     <value>default</value>
     <description>
       The queues at the this level (root is the root queue).
     </description>
   </property>
   <property>
     <name>yarn.scheduler.capacity.root.default.capacity</name>
     <value>100</value>
     <description>Samza queue target capacity.</description>
   </property>
   <property>
     <name>yarn.scheduler.capacity.root.default.user-limit-factor</name>
     <value>1</value>
     <description>
       Default queue user limit a percentage from 0.0 to 1.0.
     </description>
   </property>
   <property>
     <name>yarn.scheduler.capacity.root.default.maximum-capacity</name>
     <value>100</value>
     <description>
       The maximum capacity of the default queue.
     </description>
   </property>
   <property>
     <name>yarn.scheduler.capacity.root.default.state</name>
     <value>RUNNING</value>
     <description>
       The state of the default queue. State can be one of RUNNING or STOPPED.
     </description>
   </property>
   <property>
     <name>yarn.scheduler.capacity.root.default.acl_submit_applications</name>
     <value>*</value>
     <description>
       The ACL of who can submit jobs to the default queue.
     </description>
   </property>
   <property>
     <name>yarn.scheduler.capacity.root.default.acl_administer_queue</name>
     <value>*</value>
     <description>
       The ACL of who can administer jobs on the default queue.
     </description>
   </property>
   <property>
     <name>yarn.scheduler.capacity.node-locality-delay</name>
     <value>40</value>
     <description>
       Number of missed scheduling opportunities after which the CapacityScheduler
       attempts to schedule rack-local containers.
       Typically 

[jira] [Updated] (YARN-1937) Add entity-level access control of the timeline data for owners only

2014-05-15 Thread Zhijie Shen (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1937?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhijie Shen updated YARN-1937:
--

Summary: Add entity-level access control of the timeline data for owners 
only  (was: Access control of per-framework data)

 Add entity-level access control of the timeline data for owners only
 

 Key: YARN-1937
 URL: https://issues.apache.org/jira/browse/YARN-1937
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Zhijie Shen
Assignee: Zhijie Shen
 Attachments: YARN-1937.1.patch






--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Created] (YARN-2059) Extend access control for admin and configured user/group list

2014-05-15 Thread Zhijie Shen (JIRA)
Zhijie Shen created YARN-2059:
-

 Summary: Extend access control for admin and configured user/group 
list
 Key: YARN-2059
 URL: https://issues.apache.org/jira/browse/YARN-2059
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Zhijie Shen
Assignee: Zhijie Shen






--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-896) Roll up for long-lived services in YARN

2014-05-15 Thread Xuan Gong (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-896?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuan Gong updated YARN-896:
---

Assignee: (was: Xuan Gong)

 Roll up for long-lived services in YARN
 ---

 Key: YARN-896
 URL: https://issues.apache.org/jira/browse/YARN-896
 Project: Hadoop YARN
  Issue Type: New Feature
Reporter: Robert Joseph Evans

 YARN is intended to be general purpose, but it is missing some features to be 
 able to truly support long lived applications and long lived containers.
 This ticket is intended to
  # discuss what is needed to support long lived processes
  # track the resulting JIRA.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1104) NMs to support rolling logs of stdout stderr

2014-05-15 Thread Xuan Gong (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1104?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13998397#comment-13998397
 ] 

Xuan Gong commented on YARN-1104:
-

Start to work on it. Will provide a proposal soon.

 NMs to support rolling logs of stdout  stderr
 --

 Key: YARN-1104
 URL: https://issues.apache.org/jira/browse/YARN-1104
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: nodemanager
Affects Versions: 2.1.0-beta
Reporter: Steve Loughran
Assignee: Xuan Gong

 Currently NMs stream the stdout and stderr streams of a container to a file. 
 For longer lived processes those files need to be rotated so that the log 
 doesn't overflow



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1986) After upgrade from 2.2.0 to 2.4.0, NPE on first job start.

2014-05-15 Thread Sandy Ryza (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1986?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13997224#comment-13997224
 ] 

Sandy Ryza commented on YARN-1986:
--

Sorry for being so slow on this.

+1 to the change.  I looked at the code for the fair and capacity schedulers 
and they don't seem to face the same issue.

 After upgrade from 2.2.0 to 2.4.0, NPE on first job start.
 --

 Key: YARN-1986
 URL: https://issues.apache.org/jira/browse/YARN-1986
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 2.4.0
Reporter: Jon Bringhurst
Assignee: Hong Zhiguo
Priority: Critical
 Attachments: YARN-1986-2.patch, YARN-1986-3.patch, 
 YARN-1986-testcase.patch, YARN-1986.patch


 After upgrade from 2.2.0 to 2.4.0, NPE on first job start.
 -After RM was restarted, the job runs without a problem.-
 {noformat}
 19:11:13,441 FATAL ResourceManager:600 - Error in handling event type 
 NODE_UPDATE to the scheduler
 java.lang.NullPointerException
   at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fifo.FifoScheduler.assignContainers(FifoScheduler.java:462)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fifo.FifoScheduler.nodeUpdate(FifoScheduler.java:714)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fifo.FifoScheduler.handle(FifoScheduler.java:743)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fifo.FifoScheduler.handle(FifoScheduler.java:104)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$SchedulerEventDispatcher$EventProcessor.run(ResourceManager.java:591)
   at java.lang.Thread.run(Thread.java:744)
 19:11:13,443  INFO ResourceManager:604 - Exiting, bbye..
 {noformat}



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Created] (YARN-2064) MR job successful but Note: Container killed by the ApplicationMaster.

2014-05-15 Thread dlim (JIRA)
dlim created YARN-2064:
--

 Summary: MR job successful but Note: Container killed by the 
ApplicationMaster.
 Key: YARN-2064
 URL: https://issues.apache.org/jira/browse/YARN-2064
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager, resourcemanager, scheduler
Reporter: dlim


Hi, just a short question for everyone.
I have an MR job running on YARN. Normally, for small jobs, it succeeds without any 
note on the web UI page.

However, when running a long-running job, it ends with a successful status but with 
the note: Container killed by the ApplicationMaster.

The job is still running and I hesitate to kill it. Does anyone know whether it is 
actually successful or not? I know there is a previous post on this, but the 
answers are not so clear to me.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-1993) Cross-site scripting vulnerability in TextView.java

2014-05-15 Thread Kenji Kikushima (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1993?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kenji Kikushima updated YARN-1993:
--

Attachment: YARN-1993.patch

For example, how about using StringEscapeUtils, as in this patch?
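
As a rough sketch of that idea (assuming commons-lang's StringEscapeUtils is on the 
classpath; the attached patch is the authoritative change), the echo() loop could 
escape each argument before writing it:

{code}
import org.apache.commons.lang.StringEscapeUtils;

// Escape each argument before it is written into the HTML response, so
// user-controlled strings cannot inject markup or script into the page.
for (Object s : args) {
  out.print(StringEscapeUtils.escapeHtml(String.valueOf(s)));
}
{code}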

 Cross-site scripting vulnerability in TextView.java
 ---

 Key: YARN-1993
 URL: https://issues.apache.org/jira/browse/YARN-1993
 Project: Hadoop YARN
  Issue Type: Bug
  Components: webapp
Reporter: Ted Yu
 Attachments: YARN-1993.patch


 In 
 hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/webapp/view/TextView.java
  , method echo() e.g. :
 {code}
 for (Object s : args) {
   out.print(s);
 }
 {code}
 Printing s to an HTML page allows cross-site scripting, because it is not 
 properly sanitized for the HTML attribute-name context.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Assigned] (YARN-1986) After upgrade from 2.2.0 to 2.4.0, NPE on first job start.

2014-05-15 Thread Sandy Ryza (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1986?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sandy Ryza reassigned YARN-1986:


Assignee: Sandy Ryza  (was: Hong Zhiguo)

 After upgrade from 2.2.0 to 2.4.0, NPE on first job start.
 --

 Key: YARN-1986
 URL: https://issues.apache.org/jira/browse/YARN-1986
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 2.4.0
Reporter: Jon Bringhurst
Assignee: Sandy Ryza
Priority: Critical
 Attachments: YARN-1986-2.patch, YARN-1986-3.patch, 
 YARN-1986-testcase.patch, YARN-1986.patch


 After upgrade from 2.2.0 to 2.4.0, NPE on first job start.
 -After RM was restarted, the job runs without a problem.-
 {noformat}
 19:11:13,441 FATAL ResourceManager:600 - Error in handling event type 
 NODE_UPDATE to the scheduler
 java.lang.NullPointerException
   at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fifo.FifoScheduler.assignContainers(FifoScheduler.java:462)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fifo.FifoScheduler.nodeUpdate(FifoScheduler.java:714)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fifo.FifoScheduler.handle(FifoScheduler.java:743)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fifo.FifoScheduler.handle(FifoScheduler.java:104)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$SchedulerEventDispatcher$EventProcessor.run(ResourceManager.java:591)
   at java.lang.Thread.run(Thread.java:744)
 19:11:13,443  INFO ResourceManager:604 - Exiting, bbye..
 {noformat}



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Resolved] (YARN-2064) MR job successful but Note: Container killed by the ApplicationMaster.

2014-05-15 Thread Jian He (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2064?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jian He resolved YARN-2064.
---

Resolution: Not a Problem

Closed this.

 MR job successful but Note: Container killed by the ApplicationMaster.
 --

 Key: YARN-2064
 URL: https://issues.apache.org/jira/browse/YARN-2064
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager, resourcemanager, scheduler
Reporter: dlim

 Hi, just a short question for everyone.
 I have an MR job running on YARN. Normally, for small jobs, it succeeds without any 
 note on the web UI page.
 However, when running a long-running job, it ends with a successful status but 
 with the note: Container killed by the ApplicationMaster.
 The job is still running and I hesitate to kill it. Does anyone know whether it is 
 actually successful or not? I know there is a previous post on this, but 
 the answers are not so clear to me.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1872) TestDistributedShell occasionally fails in trunk

2014-05-15 Thread Binglin Chang (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1872?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13998493#comment-13998493
 ] 

Binglin Chang commented on YARN-1872:
-

Hi, testDSShell fails with an assertion failure; I don't know whether it is relevant:

https://builds.apache.org/job/Hadoop-Yarn-trunk/561/consoleText

testDSShell(org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell)
  Time elapsed: 27.557 sec  <<< FAILURE!
java.lang.AssertionError: expected:<1> but was:<0>
at org.junit.Assert.fail(Assert.java:88)
at org.junit.Assert.failNotEquals(Assert.java:743)
at org.junit.Assert.assertEquals(Assert.java:118)
at org.junit.Assert.assertEquals(Assert.java:555)
at org.junit.Assert.assertEquals(Assert.java:542)
at 
org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell.testDSShell(TestDistributedShell.java:198)


Results :

Failed tests: 
  TestDistributedShell.testDSShell:198 expected:<1> but was:<0>

Tests run: 8, Failures: 1, Errors: 0, Skipped: 0
 

 TestDistributedShell occasionally fails in trunk
 

 Key: YARN-1872
 URL: https://issues.apache.org/jira/browse/YARN-1872
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Ted Yu
Assignee: Hong Zhiguo
 Attachments: TestDistributedShell.out, YARN-1872.patch


 From https://builds.apache.org/job/Hadoop-Yarn-trunk/520/console :
 TestDistributedShell#testDSShellWithCustomLogPropertyFile failed and 
 TestDistributedShell#testDSShell timed out.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Created] (YARN-2058) .gitignore should ignore .orig and .rej files

2014-05-15 Thread Karthik Kambatla (JIRA)
Karthik Kambatla created YARN-2058:
--

 Summary: .gitignore should ignore .orig and .rej files
 Key: YARN-2058
 URL: https://issues.apache.org/jira/browse/YARN-2058
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 2.4.0
Reporter: Karthik Kambatla
Assignee: Karthik Kambatla


.gitignore file should ignore .orig and .rej files



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2052) ContainerId creation after work preserving restart is broken

2014-05-15 Thread Tsuyoshi OZAWA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2052?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13997781#comment-13997781
 ] 

Tsuyoshi OZAWA commented on YARN-2052:
--

{quote}
 e.g. container_XXX_1000 after epoch 1. 
{quote}

This approach can be a compatible change. 
ConverterUtils.toContainerId(containerIdStr) works without any changes if the 
container id with the epoch is under Integer.MAX_VALUE. What happens if the id 
overflows? Container id collisions may occur. If we can handle that correctly, 
this approach is a simple and good choice. I'll spend some time thinking about this approach.
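
To make the overflow concern concrete, here is a tiny standalone sketch (the 
encoding below, folding the epoch into the numeric suffix, is purely illustrative 
and not the actual proposal):

{code}
public class EpochIdOverflowSketch {
  public static void main(String[] args) {
    // Illustrative encoding only: prefix the per-app sequence with the epoch.
    long epoch = 1;
    long sequence = 42;
    long encoded = epoch * 1000000L + sequence;

    // ConverterUtils.toContainerId(...) only works unchanged while this value
    // stays under Integer.MAX_VALUE; past that, an int parse wraps around and
    // two different containers could collide on the same id.
    System.out.println("fits in an int: " + (encoded <= Integer.MAX_VALUE));
  }
}
{code}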

 ContainerId creation after work preserving restart is broken
 

 Key: YARN-2052
 URL: https://issues.apache.org/jira/browse/YARN-2052
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Reporter: Tsuyoshi OZAWA

 Container ids are made unique by using the app identifier and appending a 
 monotonically increasing sequence number to it. Since container creation is a 
 high churn activity the RM does not store the sequence number per app. So 
 after restart it does not know what the new sequence number should be for new 
 allocations.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Resolved] (YARN-373) Allow an AM to reuse the resources allocated to container for a new container

2014-05-15 Thread Alejandro Abdelnur (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-373?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alejandro Abdelnur resolved YARN-373.
-

Resolution: Won't Fix

[doing self-clean up of JIRAs]

 Allow an AM to reuse the resources allocated to container for a new container
 -

 Key: YARN-373
 URL: https://issues.apache.org/jira/browse/YARN-373
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: resourcemanager
Affects Versions: 2.0.3-alpha
Reporter: Alejandro Abdelnur
Assignee: Alejandro Abdelnur

 When a container completes, instead of the corresponding resources being freed 
 up, it should be possible for the AM to reuse the assigned resources for a 
 new container.
 As part of the reallocation, the AM would notify the RM about the partial 
 resources being freed up, and the RM would make the necessary corrections on 
 the corresponding node.
 With this functionality, an AM can ensure it gets a container on the same 
 node where previous containers ran.
 This would allow getting rid of the ShuffleHandler as a service in the NMs and 
 running it as a regular container task of the corresponding AM. In this case, the 
 reallocation would reduce the CPU/MEM obtained for the original container to 
 what is needed for serving the shuffle. Note that in this example the MR 
 AM would only do this reallocation for one of the many tasks that may have 
 run on a particular node (as a single shuffle task could serve all the map 
 outputs from all map tasks run on that node).



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1981) Nodemanager version is not updated when a node reconnects

2014-05-15 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1981?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13998191#comment-13998191
 ] 

Hudson commented on YARN-1981:
--

FAILURE: Integrated in Hadoop-Mapreduce-trunk #1779 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1779/])
YARN-1981. Nodemanager version is not updated when a node reconnects (Jason 
Lowe via jeagles) (jeagles: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1594358)
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmnode/RMNodeImpl.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestRMNodeTransitions.java


 Nodemanager version is not updated when a node reconnects
 -

 Key: YARN-1981
 URL: https://issues.apache.org/jira/browse/YARN-1981
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.4.0
Reporter: Jason Lowe
Assignee: Jason Lowe
 Fix For: 3.0.0, 2.5.0

 Attachments: YARN-1981.patch


 When a nodemanager is quickly restarted and happens to change versions during 
 the restart (e.g.: rolling upgrade scenario) the NM version as reported by 
 the RM is not updated.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1362) Distinguish between nodemanager shutdown for decommission vs shutdown for restart

2014-05-15 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1362?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13998187#comment-13998187
 ] 

Hudson commented on YARN-1362:
--

FAILURE: Integrated in Hadoop-Mapreduce-trunk #1779 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1779/])
YARN-1362. Distinguish between nodemanager shutdown for decommission vs 
shutdown for restart. (Contributed by Jason Lowe) (junping_du: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1594421)
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/Context.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/NodeManager.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/NodeStatusUpdaterImpl.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/TestNodeStatusUpdater.java


 Distinguish between nodemanager shutdown for decommission vs shutdown for 
 restart
 -

 Key: YARN-1362
 URL: https://issues.apache.org/jira/browse/YARN-1362
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: nodemanager
Affects Versions: 2.3.0
Reporter: Jason Lowe
Assignee: Jason Lowe
 Fix For: 2.5.0

 Attachments: YARN-1362.patch


 When a nodemanager shuts down it needs to determine if it is likely to be 
 restarted.  If a restart is likely then it needs to preserve container 
 directories, logs, distributed cache entries, etc.  If it is being shutdown 
 more permanently (e.g.: like a decommission) then the nodemanager should 
 cleanup directories and logs.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2011) Fix typo and warning in TestLeafQueue

2014-05-15 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2011?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13998188#comment-13998188
 ] 

Hudson commented on YARN-2011:
--

FAILURE: Integrated in Hadoop-Mapreduce-trunk #1779 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1779/])
YARN-2011. Fix typo and warning in TestLeafQueue (Contributed by Chen He) 
(junping_du: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1593804)
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestLeafQueue.java


 Fix typo and warning in TestLeafQueue
 -

 Key: YARN-2011
 URL: https://issues.apache.org/jira/browse/YARN-2011
 Project: Hadoop YARN
  Issue Type: Test
Affects Versions: 2.4.0
Reporter: Chen He
Assignee: Chen He
Priority: Trivial
 Fix For: 2.5.0

 Attachments: YARN-2011-v2.patch, YARN-2011.patch


 a.assignContainers(clusterResource, node_0);
 assertEquals(2*GB, a.getUsedResources().getMemory());
 assertEquals(2*GB, app_0.getCurrentConsumption().getMemory());
 assertEquals(0*GB, app_1.getCurrentConsumption().getMemory());
 assertEquals(0*GB, app_0.getHeadroom().getMemory()); // User limit = 2G
 assertEquals(0*GB, app_0.getHeadroom().getMemory()); // User limit = 2G
 // Again one to user_0 since he hasn't exceeded user limit yet
 a.assignContainers(clusterResource, node_0);
 assertEquals(3*GB, a.getUsedResources().getMemory());
 assertEquals(2*GB, app_0.getCurrentConsumption().getMemory());
 assertEquals(1*GB, app_1.getCurrentConsumption().getMemory());
 assertEquals(0*GB, app_0.getHeadroom().getMemory()); // 3G - 2G
 assertEquals(0*GB, app_0.getHeadroom().getMemory()); // 3G - 2G



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1982) Rename the daemon name to timelineserver

2014-05-15 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1982?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13998190#comment-13998190
 ] 

Hudson commented on YARN-1982:
--

FAILURE: Integrated in Hadoop-Mapreduce-trunk #1779 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1779/])
YARN-1982. Renamed the daemon name to be TimelineServer instead of History 
Server and deprecated the old usage. Contributed by Zhijie Shen. (vinodkv: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1593748)
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/bin/yarn
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/bin/yarn.cmd
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/conf/yarn-env.sh


 Rename the daemon name to timelineserver
 

 Key: YARN-1982
 URL: https://issues.apache.org/jira/browse/YARN-1982
 Project: Hadoop YARN
  Issue Type: Sub-task
Affects Versions: 3.0.0, 2.4.0
Reporter: Zhijie Shen
Assignee: Zhijie Shen
  Labels: cli
 Fix For: 2.5.0

 Attachments: YARN-1982.1.patch


 Nowadays, it's confusing that we call the new component the timeline server, but 
 we use
 {code}
 yarn historyserver
 yarn-daemon.sh start historyserver
 {code}
 to start the daemon.
 Before the confusion propagates further, we'd better modify the command 
 line ASAP.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1861) Both RM stuck in standby mode when automatic failover is enabled

2014-05-15 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1861?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13998189#comment-13998189
 ] 

Hudson commented on YARN-1861:
--

FAILURE: Integrated in Hadoop-Mapreduce-trunk #1779 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1779/])
YARN-1861. Fixed a bug in RM to reset leader-election on fencing that was 
causing both RMs to be stuck in standby mode when automatic failover is 
enabled. Contributed by Karthik Kambatla and Xuan Gong. (vinodkv: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1594356)
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/TestRMFailover.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/AdminService.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/EmbeddedElectorService.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ResourceManager.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-tests/src/test/java/org/apache/hadoop/yarn/server/MiniYARNCluster.java


 Both RM stuck in standby mode when automatic failover is enabled
 

 Key: YARN-1861
 URL: https://issues.apache.org/jira/browse/YARN-1861
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Affects Versions: 2.4.0
Reporter: Arpit Gupta
Assignee: Karthik Kambatla
Priority: Blocker
 Fix For: 2.4.1

 Attachments: YARN-1861.2.patch, YARN-1861.3.patch, YARN-1861.4.patch, 
 YARN-1861.5.patch, YARN-1861.7.patch, yarn-1861-1.patch, yarn-1861-6.patch


 In our HA tests we noticed that the tests got stuck because both RM's got 
 into standby state and no one became active.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2017) Merge some of the common lib code in schedulers

2014-05-15 Thread Sandy Ryza (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2017?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13995898#comment-13995898
 ] 

Sandy Ryza commented on YARN-2017:
--

Thanks for working on this Jian.  A couple questions:

Why take out the header comment in SchedulerNode?

Can we use generics to avoid all the casting (and findbugs)?  I.e. class 
CapacityScheduler extends AbstractYarnScheduler<FiCaSchedulerApp, 
FiCaSchedulerNode>?
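
In sketch form, that suggestion would look something like the following (hypothetical 
and untested; the real type parameters on AbstractYarnScheduler may differ):

{code}
// Parameterizing the base class on the concrete app-attempt and node types
// lets the subclass work with typed objects instead of casting everywhere.
public class CapacityScheduler
    extends AbstractYarnScheduler<FiCaSchedulerApp, FiCaSchedulerNode> {
  // Methods inherited from the base class would then hand back
  // FiCaSchedulerApp / FiCaSchedulerNode directly, with no casts.
}
{code}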

 Merge some of the common lib code in schedulers
 ---

 Key: YARN-2017
 URL: https://issues.apache.org/jira/browse/YARN-2017
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Reporter: Jian He
Assignee: Jian He
 Attachments: YARN-2017.1.patch


 A bunch of the same code is repeated among schedulers, e.g. between 
 FiCaSchedulerNode and FSSchedulerNode. It would be good to merge and share it in a 
 common base.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2049) Delegation token stuff for the timeline sever

2014-05-15 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2049?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13995746#comment-13995746
 ] 

Hadoop QA commented on YARN-2049:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12644497/YARN-2049.1.patch
  against trunk revision .

{color:red}-1 patch{color}.  The patch command could not apply the patch.

Console output: https://builds.apache.org/job/PreCommit-YARN-Build/3738//console

This message is automatically generated.

 Delegation token stuff for the timeline sever
 -

 Key: YARN-2049
 URL: https://issues.apache.org/jira/browse/YARN-2049
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Zhijie Shen
Assignee: Zhijie Shen
 Attachments: YARN-2049.1.patch






--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-2017) Merge some of the common lib code in schedulers

2014-05-15 Thread Jian He (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2017?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jian He updated YARN-2017:
--

Attachment: YARN-2017.2.patch

 Merge some of the common lib code in schedulers
 ---

 Key: YARN-2017
 URL: https://issues.apache.org/jira/browse/YARN-2017
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Reporter: Jian He
Assignee: Jian He
 Attachments: YARN-2017.1.patch, YARN-2017.2.patch


 A bunch of the same code is repeated among schedulers, e.g. between 
 FiCaSchedulerNode and FSSchedulerNode. It would be good to merge and share it in a 
 common base.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-352) Inconsistent picture of how a container was killed when querying RM and NM in case of preemption

2014-05-15 Thread Vinod Kumar Vavilapalli (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-352?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kumar Vavilapalli updated YARN-352:
-

Issue Type: Sub-task  (was: Bug)
Parent: YARN-45

 Inconsistent picture of how a container was killed when querying RM and NM in 
 case of preemption
 

 Key: YARN-352
 URL: https://issues.apache.org/jira/browse/YARN-352
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Hitesh Shah

 When the RM preempts a container, it records the exit status as -100. 
 However, at the NM, it registers the preempted container's exit status as 
 simply killed externally via SIGTERM or SIGKILL.
 When the AM queries the RM and NM for the same container's status, it will 
 get 2 different values.
 When killing a container, the exit reason should likely be better defined via 
 an exit status code for the AM to act on, in addition to providing 
 diagnostic messages that can contain more detailed information (though 
 probably not programmatically interpretable by the AM).



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1751) Improve MiniYarnCluster for log aggregation testing

2014-05-15 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1751?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13995739#comment-13995739
 ] 

Hadoop QA commented on YARN-1751:
-

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12644486/YARN-1751.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-tests.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/3736//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/3736//console

This message is automatically generated.

 Improve MiniYarnCluster for log aggregation testing
 ---

 Key: YARN-1751
 URL: https://issues.apache.org/jira/browse/YARN-1751
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: nodemanager
Reporter: Ming Ma
Assignee: Ming Ma
 Attachments: YARN-1751-trunk.patch, YARN-1751.patch


 MiniYarnCluster specifies individual remote log aggregation root dir for each 
 NM. Test code that uses MiniYarnCluster won't be able to get the value of log 
 aggregation root dir. The following code isn't necessary in MiniYarnCluster.
    File remoteLogDir =
        new File(testWorkDir, MiniYARNCluster.this.getName()
            + "-remoteLogDir-nm-" + index);
    remoteLogDir.mkdir();
    config.set(YarnConfiguration.NM_REMOTE_APP_LOG_DIR,
        remoteLogDir.getAbsolutePath());
 In LogCLIHelpers.java, dumpAllContainersLogs should pass its conf object to 
 FileContext.getFileContext() call.
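
 In sketch form (assuming a getConf() accessor is available on LogCLIHelpers), the 
 suggested change amounts to:
 {code}
 // Pass the tool's own configuration so the FileContext resolves the remote
 // log directory's filesystem instead of always using the local default.
 FileContext fc = FileContext.getFileContext(getConf());
 {code}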



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-1937) Access control of per-framework data

2014-05-15 Thread Zhijie Shen (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1937?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhijie Shen updated YARN-1937:
--

Issue Type: Sub-task  (was: Bug)
Parent: YARN-1935

 Access control of per-framework data
 

 Key: YARN-1937
 URL: https://issues.apache.org/jira/browse/YARN-1937
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Zhijie Shen
Assignee: Zhijie Shen





--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2050) Fix LogCLIHelpers to create the correct FileContext

2014-05-15 Thread Jason Lowe (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2050?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13996685#comment-13996685
 ] 

Jason Lowe commented on YARN-2050:
--

bq. remoteAppLogDir.toUri().getScheme() returns null and 
AbstractFileSystem.createFileSystem doesn't like it if dumpAllContainersLogs 
calls FileContext.getFileContext(remoteAppLogDir.toUri())

Argh right, I forgot that FileContext is less-than-helpful in this regard.   It 
needs to be something like this:

{code}
  Path qualifiedLogDir = 
FileContext.getFileContext(getConf()).makeQualified(remoteAppLogDir);
  FileContext fc = FileContext.getFileContext(qualifiedLogDir.toUri(), 
getConf());
  nodeFiles = fc.listStatus(qualifiedLogDir);
{code}

This allows the code to handle cases where the remote log dir has been 
configured to be a different filesystem than the default filesystem.

 Fix LogCLIHelpers to create the correct FileContext
 ---

 Key: YARN-2050
 URL: https://issues.apache.org/jira/browse/YARN-2050
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Ming Ma
Assignee: Ming Ma
 Attachments: YARN-2050.patch


 LogCLIHelpers calls FileContext.getFileContext() without any parameters. Thus 
 the FileContext created isn't necessarily the FileContext for remote log.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2048) List all of the containers of an application from the yarn web

2014-05-15 Thread Zhijie Shen (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2048?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13996096#comment-13996096
 ] 

Zhijie Shen commented on YARN-2048:
---

bq. Seems Zhijie Shen's patch fetch containers from ApplicationContext.

Currently, the history web UI fetches data (app/attempt/container) from 
ApplicationContext, while the RM web UI does it from the RM context. My ultimate goal 
is to unify the history and RM web UIs, and to unify the data source with the RPC 
protocol.

 List all of the containers of an application from the yarn web
 --

 Key: YARN-2048
 URL: https://issues.apache.org/jira/browse/YARN-2048
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: resourcemanager, webapp
Affects Versions: 2.3.0, 2.4.0, 2.5.0
Reporter: Min Zhou
 Attachments: YARN-2048-trunk-v1.patch


 Currently, YARN doesn't provide a way to list all of the containers of an 
 application from its web UI. This kind of information is needed by 
 application users. They can conveniently know how many containers their 
 applications have already acquired, as well as which nodes those containers were 
 launched on. They also want to view the logs of each container of an 
 application.
 One approach is to maintain a container list in RMAppImpl and expose this info 
 on the Application page. I will submit a patch soon.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2001) Threshold for RM to accept requests from AM after failover

2014-05-15 Thread Tsuyoshi OZAWA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2001?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13996348#comment-13996348
 ] 

Tsuyoshi OZAWA commented on YARN-2001:
--

Created YARN-2052 for tracking container id discussion to make it easier to 
track. 

 Threshold for RM to accept requests from AM after failover
 --

 Key: YARN-2001
 URL: https://issues.apache.org/jira/browse/YARN-2001
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Reporter: Jian He
Assignee: Jian He

 After failover, RM may require a certain threshold to determine whether it’s 
 safe to make scheduling decisions and start accepting new container requests 
 from AMs. The threshold could be a certain number of nodes, i.e. RM waits 
 until a certain number of nodes have joined before accepting new container 
 requests. Or it could simply be a timeout; only after the timeout does the RM 
 accept new requests.
 NMs that join after the threshold can be treated as new NMs and instructed to 
 kill all their containers.
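
 A minimal sketch of that gating decision (illustrative only; the names and the 
 two-condition policy are assumptions, not RM code):
 {code}
 // Accept new AM container requests only once enough NMs have re-registered
 // after failover, or once a grace-period timeout has elapsed.
 boolean acceptNewRequests(int nodesRejoined, int nodeThreshold,
     long millisSinceFailover, long timeoutMillis) {
   return nodesRejoined >= nodeThreshold || millisSinceFailover >= timeoutMillis;
 }
 {code}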



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-182) Unnecessary Container killed by the ApplicationMaster message for successful containers

2014-05-15 Thread Jason Lowe (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-182?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13995087#comment-13995087
 ] 

Jason Lowe commented on YARN-182:
-

I don't believe this is related to YARN-903, rather it seems more likely to be 
related to MAPREDUCE-5465.  The MapReduce ApplicationMaster kills tasks as soon 
as they report success via the umbilical connection, and sometimes that kill 
arrives before the task exits on its own.  In those cases the containers will 
be marked as killed by the ApplicationMaster.

 Unnecessary Container killed by the ApplicationMaster message for 
 successful containers
 -

 Key: YARN-182
 URL: https://issues.apache.org/jira/browse/YARN-182
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager
Affects Versions: 2.0.1-alpha
Reporter: zhengqiu cai
Assignee: Omkar Vinit Joshi
  Labels: hadoop, usability
 Attachments: Log.txt


 I was running wordcount and the resourcemanager web UI showed the status as 
 FINISHED SUCCEEDED, but the log showed Container killed by the 
 ApplicationMaster.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Resolved] (YARN-2048) List all of the containers of an application from the yarn web

2014-05-15 Thread Min Zhou (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2048?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Min Zhou resolved YARN-2048.


Resolution: Duplicate

Duplicate of YARN-1809.

 List all of the containers of an application from the yarn web
 --

 Key: YARN-2048
 URL: https://issues.apache.org/jira/browse/YARN-2048
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: resourcemanager, webapp
Affects Versions: 2.3.0, 2.4.0, 2.5.0
Reporter: Min Zhou
 Attachments: YARN-2048-trunk-v1.patch


 Currently, YARN doesn't provide a way to list all of the containers of an 
 application from its web UI. This kind of information is needed by 
 application users. They can conveniently know how many containers their 
 applications have already acquired, as well as which nodes those containers were 
 launched on. They also want to view the logs of each container of an 
 application.
 One approach is to maintain a container list in RMAppImpl and expose this info 
 on the Application page. I will submit a patch soon.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-1927) Preemption message shouldn’t be created multiple times for same container-id in ProportionalCapacityPreemptionPolicy

2014-05-15 Thread Vinod Kumar Vavilapalli (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1927?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kumar Vavilapalli updated YARN-1927:
--

Issue Type: Sub-task  (was: Bug)
Parent: YARN-45

 Preemption message shouldn’t be created multiple times for same container-id 
 in ProportionalCapacityPreemptionPolicy
 

 Key: YARN-1927
 URL: https://issues.apache.org/jira/browse/YARN-1927
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: capacityscheduler
Affects Versions: 2.4.0
Reporter: Wangda Tan
Assignee: Wangda Tan
Priority: Minor
 Attachments: YARN-1927.patch


 Currently, after each editSchedule() call, a preemption message is 
 created and sent to the scheduler. ProportionalCapacityPreemptionPolicy should 
 only send the preemption message once for each container.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2048) List all of the containers of an application from the yarn web

2014-05-15 Thread Tsuyoshi OZAWA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2048?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13995553#comment-13995553
 ] 

Tsuyoshi OZAWA commented on YARN-2048:
--

+1 for the idea. Looking forward.

 List all of the containers of an application from the yarn web
 --

 Key: YARN-2048
 URL: https://issues.apache.org/jira/browse/YARN-2048
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: resourcemanager, webapp
Reporter: Min Zhou

 Currently, YARN doesn't provide a way to list all of the containers of an 
 application from its web UI. This kind of information is needed by 
 application users. They can conveniently know how many containers their 
 applications have already acquired, as well as which nodes those containers were 
 launched on. They also want to view the logs of each container of an 
 application.
 One approach is to maintain a container list in RMAppImpl and expose this info 
 on the Application page. I will submit a patch soon.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Created] (YARN-2033) Investigate merging generic-history into the Timeline Store

2014-05-15 Thread Vinod Kumar Vavilapalli (JIRA)
Vinod Kumar Vavilapalli created YARN-2033:
-

 Summary: Investigate merging generic-history into the Timeline 
Store
 Key: YARN-2033
 URL: https://issues.apache.org/jira/browse/YARN-2033
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Vinod Kumar Vavilapalli
Assignee: Vinod Kumar Vavilapalli


Having two different stores isn't amenable to generic insights on what's 
happening with applications. This is to investigate porting generic-history 
into the Timeline Store.

One goal is to try and retain most of the client side interfaces as close to 
what we have today.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-2016) Yarn getApplicationRequest start time range is not honored

2014-05-15 Thread Junping Du (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2016?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Junping Du updated YARN-2016:
-

Attachment: YARN-2016.patch

Fixed the issues in PBImpl and added a test to verify it works now.

 Yarn getApplicationRequest start time range is not honored
 --

 Key: YARN-2016
 URL: https://issues.apache.org/jira/browse/YARN-2016
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.4.0
Reporter: Venkat Ranganathan
Assignee: Junping Du
 Attachments: YARN-2016.patch, YarnTest.java


 When we query for the previous applications by creating an instance of 
 GetApplicationsRequest and setting the start time range and application tag, 
 we see that the start range provided is not honored and all applications with 
 the tag are returned.
 Attaching a reproducer.
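
 For reference, a minimal sketch of the reproducer shape (method names are taken from 
 the 2.4 GetApplicationsRequest API as I understand it, and the tag value is made up; 
 the attached YarnTest.java is the actual reproducer):
 {code}
 import java.util.Collections;
 import org.apache.hadoop.yarn.api.protocolrecords.GetApplicationsRequest;

 // Ask only for applications tagged "my-tag" that started in the last minute;
 // per this JIRA the start time range ends up being ignored.
 long now = System.currentTimeMillis();
 GetApplicationsRequest request = GetApplicationsRequest.newInstance();
 request.setStartRange(now - 60000L, now);
 request.setApplicationTags(Collections.singleton("my-tag"));
 {code}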



--
This message was sent by Atlassian JIRA
(v6.2#6252)


  1   2   >