[jira] [Commented] (YARN-2053) Slider AM fails to restart: NPE in RegisterApplicationMasterResponseProto$Builder.addAllNmTokensFromPreviousAttempts
[ https://issues.apache.org/jira/browse/YARN-2053?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13998324#comment-13998324 ] Wangda Tan commented on YARN-2053: -- Sure, I'll do that, thanks for review! Slider AM fails to restart: NPE in RegisterApplicationMasterResponseProto$Builder.addAllNmTokensFromPreviousAttempts Key: YARN-2053 URL: https://issues.apache.org/jira/browse/YARN-2053 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.4.0 Reporter: Sumit Mohanty Assignee: Wangda Tan Attachments: YARN-2053.patch, yarn-yarn-nodemanager-c6403.ambari.apache.org.log.bak, yarn-yarn-resourcemanager-c6403.ambari.apache.org.log.bak Slider AppMaster restart fails with the following: {code} org.apache.hadoop.yarn.proto.YarnServiceProtos$RegisterApplicationMasterResponseProto$Builder.addAllNmTokensFromPreviousAttempts(YarnServiceProtos.java:2700) {code} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2054) Poor defaults for YARN ZK configs for retries and retry-inteval
[ https://issues.apache.org/jira/browse/YARN-2054?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13998337#comment-13998337 ] Sandy Ryza commented on YARN-2054: -- +1 Poor defaults for YARN ZK configs for retries and retry-inteval --- Key: YARN-2054 URL: https://issues.apache.org/jira/browse/YARN-2054 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.4.0 Reporter: Karthik Kambatla Assignee: Karthik Kambatla Attachments: yarn-2054-1.patch Currently, we have the following default values: # yarn.resourcemanager.zk-num-retries - 500 # yarn.resourcemanager.zk-retry-interval-ms - 2000 This leads to a cumulative 1000 seconds before the RM gives up trying to connect to ZK. -- This message was sent by Atlassian JIRA (v6.2#6252)
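For context on the YARN-2054 defaults quoted above, a minimal arithmetic sketch (plain Java, not RM code) of how the two values combine into the retry window:
{code}
public class ZkRetryWindow {
  public static void main(String[] args) {
    // Default values quoted in the issue description
    int numRetries = 500;        // yarn.resourcemanager.zk-num-retries
    int retryIntervalMs = 2000;  // yarn.resourcemanager.zk-retry-interval-ms

    long totalMs = (long) numRetries * retryIntervalMs;
    // 500 * 2000 ms = 1,000,000 ms = 1,000 seconds before the RM gives up on ZK
    System.out.println("RM keeps retrying ZK for " + (totalMs / 1000) + " seconds");
  }
}
{code}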
[jira] [Updated] (YARN-1957) ProportionalCapacitPreemptionPolicy handling of corner cases...
[ https://issues.apache.org/jira/browse/YARN-1957?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinod Kumar Vavilapalli updated YARN-1957: -- Issue Type: Sub-task (was: Bug) Parent: YARN-45 ProportionalCapacitPreemptionPolicy handling of corner cases... --- Key: YARN-1957 URL: https://issues.apache.org/jira/browse/YARN-1957 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Affects Versions: 2.4.0 Reporter: Carlo Curino Assignee: Carlo Curino Labels: capacity-scheduler, preemption Attachments: YARN-1957.patch, YARN-1957.patch, YARN-1957_test.patch The current version of ProportionalCapacityPreemptionPolicy should be improved to deal with the following two scenarios: 1) when rebalancing over-capacity allocations, it potentially preempts without considering the maxCapacity constraints of a queue (i.e., preempting possibly more than strictly necessary) 2) a zero capacity queue is preempted even if there is no demand (consistent with the old use of zero capacity to disable queues) The proposed patch fixes both issues and introduces a few new test cases. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2017) Merge some of the common lib code in schedulers
[ https://issues.apache.org/jira/browse/YARN-2017?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13997192#comment-13997192 ] Wangda Tan commented on YARN-2017: -- bq. On a second thought, user might pass in a resource request with null capability. I would prefer not changing it. In fact, we can add many other null checks in many places. Changed the patch back. I think a null capability should be checked by ApplicationMasterService, which should throw an exception before it is passed in. So doing or not doing the null-pointer check should be fine :) Merge some of the common lib code in schedulers --- Key: YARN-2017 URL: https://issues.apache.org/jira/browse/YARN-2017 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Jian He Assignee: Jian He Attachments: YARN-2017.1.patch, YARN-2017.2.patch, YARN-2017.3.patch A bunch of the same code is repeated among schedulers, e.g. between FicaSchedulerNode and FSSchedulerNode. It's good to merge and share it in a common base. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2054) Poor defaults for YARN ZK configs for retries and retry-inteval
[ https://issues.apache.org/jira/browse/YARN-2054?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13998423#comment-13998423 ] Vinod Kumar Vavilapalli commented on YARN-2054: --- Sounds related to YARN-1878, though not exactly. If we want these configs to match up with yarn.resourcemanager.zk-timeout-ms and (as YARN-1878 is trying) if that can change, we need to somehow make them linked dynamically? Poor defaults for YARN ZK configs for retries and retry-inteval --- Key: YARN-2054 URL: https://issues.apache.org/jira/browse/YARN-2054 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.4.0 Reporter: Karthik Kambatla Assignee: Karthik Kambatla Attachments: yarn-2054-1.patch Currently, we have the following default values: # yarn.resourcemanager.zk-num-retries - 500 # yarn.resourcemanager.zk-retry-interval-ms - 2000 This leads to a cumulative 1000 seconds before the RM gives up trying to connect to ZK. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-1916) Leveldb timeline store applies secondary filters incorrectly
[ https://issues.apache.org/jira/browse/YARN-1916?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhijie Shen updated YARN-1916: -- Issue Type: Sub-task (was: Bug) Parent: YARN-1530 Leveldb timeline store applies secondary filters incorrectly Key: YARN-1916 URL: https://issues.apache.org/jira/browse/YARN-1916 Project: Hadoop YARN Issue Type: Sub-task Affects Versions: 2.4.0 Reporter: Billie Rinaldi Assignee: Billie Rinaldi Attachments: YARN-1916.1.patch When applying a secondary filter (fieldname:fieldvalue) in a get entities query, LeveldbTimelineStore retrieves entities that do not have the specified fieldname, in addition to correctly retrieving entities that have the fieldname with the specified fieldvalue. It should not return entities that do not have the fieldname. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2022) Preempting an Application Master container can be kept as least priority when multiple applications are marked for preemption by ProportionalCapacityPreemptionPolicy
[ https://issues.apache.org/jira/browse/YARN-2022?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13995057#comment-13995057 ] Sunil G commented on YARN-2022: --- Thank you very much Carlo for the review. As per your concern about the AM container priority, I am using a static final variable named AM_CONTAINER_PRIORITY from RMAppAttemptImpl to check whether a container is an AM container or not. As per my code review, this variable is not set by the user [the RM only uses it to create the AM container's Resource Request]. Hence there is not much of a problem in using it. Secondly, for the corner cases, I agree with your point. In a specific corner case it is possible that AMs can take over 100% of a queue. 1. maximum-am-resource-percent is at the cluster level and from it we can get the maximum number of runnable applications. The actual count of running applications can also be fetched from all leaf queues. With these two, a checkpoint can be derived as you have mentioned. 2. user-limit-factor sets a per-user limit quota among the total resources. If preemption has to be done among applications, currently only the application timestamp is considered [in reverse order]. So how can this factor help in providing a checkpoint for saving the AM? Could you please share your thoughts on this point. I will work on defining a checkpoint for saving the AM and will update. Meanwhile, please check whether my explanation is in line with your thoughts. Thank you. Preempting an Application Master container can be kept as least priority when multiple applications are marked for preemption by ProportionalCapacityPreemptionPolicy - Key: YARN-2022 URL: https://issues.apache.org/jira/browse/YARN-2022 Project: Hadoop YARN Issue Type: Improvement Components: resourcemanager Affects Versions: 2.4.0 Reporter: Sunil G Assignee: Sunil G Attachments: Yarn-2022.1.patch Cluster Size = 16GB [2NM's] Queue A Capacity = 50% Queue B Capacity = 50% Consider there are 3 applications running in Queue A which have taken the full cluster capacity. J1 = 2GB AM + 1GB * 4 Maps J2 = 2GB AM + 1GB * 4 Maps J3 = 2GB AM + 1GB * 2 Maps Another Job J4 is submitted in Queue B [J4 needs a 2GB AM + 1GB * 2 Maps ]. Currently in this scenario, Job J3 will get killed including its AM. It is better if the AM can be given least priority among multiple applications. In this same scenario, map tasks from J3 and J2 can be preempted. Later, when the cluster is free, maps can be allocated to these jobs. -- This message was sent by Atlassian JIRA (v6.2#6252)
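To illustrate the AM_CONTAINER_PRIORITY check discussed in the comment above, a rough sketch (hypothetical helper; the surrounding preemption-policy code is not shown and the exact accessors may differ):
{code}
import org.apache.hadoop.yarn.api.records.Priority;
import org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl;
import org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainer;

public class AmContainerCheck {
  // True if the container was allocated at the priority the RM reserves for
  // AM containers; only the RM sets this priority on the AM resource request.
  static boolean isAmContainer(RMContainer rmContainer) {
    Priority p = rmContainer.getContainer().getPriority();
    return RMAppAttemptImpl.AM_CONTAINER_PRIORITY.equals(p);
  }
}
{code}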
[jira] [Comment Edited] (YARN-1368) Common work to re-populate containers’ state into scheduler
[ https://issues.apache.org/jira/browse/YARN-1368?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13998164#comment-13998164 ] Alejandro Abdelnur edited comment on YARN-1368 at 5/15/14 5:08 AM: --- [~jianhe], I understand the patch is taking a different approach, which is based on the work Anubhav started. Instead hijacking the JIRA, the correct way should have been proposing -to the assignee/author of the original patch- the changes and offering to contribute/breakdown tasks. Please do so next time. was (Author: tucu00): [~ jianhe], I understand the patch is taking a different approach, which is based on the work Anubhav started. Instead hijacking the JIRA, the correct way should have been proposing -to the assignee/author of the original patch- the changes and offering to contribute/breakdown tasks. Please do so next time. Common work to re-populate containers’ state into scheduler --- Key: YARN-1368 URL: https://issues.apache.org/jira/browse/YARN-1368 Project: Hadoop YARN Issue Type: Sub-task Reporter: Bikas Saha Assignee: Jian He Attachments: YARN-1368.1.patch, YARN-1368.2.patch, YARN-1368.combined.001.patch, YARN-1368.preliminary.patch YARN-1367 adds support for the NM to tell the RM about all currently running containers upon registration. The RM needs to send this information to the schedulers along with the NODE_ADDED_EVENT so that the schedulers can recover the current allocation state of the cluster. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1957) ProportionalCapacitPreemptionPolicy handling of corner cases...
[ https://issues.apache.org/jira/browse/YARN-1957?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13998212#comment-13998212 ] Hudson commented on YARN-1957: -- FAILURE: Integrated in Hadoop-Hdfs-trunk #1753 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/1753/]) YARN-1957. Consider the max capacity of the queue when computing the ideal capacity for preemption. Contributed by Carlo Curino (cdouglas: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1594414) * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/monitor/capacity/ProportionalCapacityPreemptionPolicy.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/monitor/capacity/TestProportionalCapacityPreemptionPolicy.java ProportionalCapacitPreemptionPolicy handling of corner cases... --- Key: YARN-1957 URL: https://issues.apache.org/jira/browse/YARN-1957 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Affects Versions: 2.4.0 Reporter: Carlo Curino Assignee: Carlo Curino Labels: capacity-scheduler, preemption Fix For: 3.0.0, 2.5.0, 2.4.1 Attachments: YARN-1957.patch, YARN-1957.patch, YARN-1957_test.patch The current version of ProportionalCapacityPreemptionPolicy should be improved to deal with the following two scenarios: 1) when rebalancing over-capacity allocations, it potentially preempts without considering the maxCapacity constraints of a queue (i.e., preempting possibly more than strictly necessary) 2) a zero capacity queue is preempted even if there is no demand (consistent with the old use of zero capacity to disable queues) The proposed patch fixes both issues and introduces a few new test cases. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1982) Rename the daemon name to timelineserver
[ https://issues.apache.org/jira/browse/YARN-1982?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13998228#comment-13998228 ] Hudson commented on YARN-1982: -- FAILURE: Integrated in Hadoop-Hdfs-trunk #1753 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/1753/]) YARN-1982. Renamed the daemon name to be TimelineServer instead of History Server and deprecated the old usage. Contributed by Zhijie Shen. (vinodkv: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1593748) * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/bin/yarn * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/bin/yarn.cmd * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/conf/yarn-env.sh Rename the daemon name to timelineserver Key: YARN-1982 URL: https://issues.apache.org/jira/browse/YARN-1982 Project: Hadoop YARN Issue Type: Sub-task Affects Versions: 3.0.0, 2.4.0 Reporter: Zhijie Shen Assignee: Zhijie Shen Labels: cli Fix For: 2.5.0 Attachments: YARN-1982.1.patch Nowadays, it's confusing that we call the new component timeline server, but we use {code} yarn historyserver yarn-daemon.sh start historyserver {code} to start the daemon. Before the confusion propagates further, we'd better modify the command line ASAP. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1975) Used resources shows escaped html in CapacityScheduler and FairScheduler page
[ https://issues.apache.org/jira/browse/YARN-1975?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13998210#comment-13998210 ] Hudson commented on YARN-1975: -- FAILURE: Integrated in Hadoop-Hdfs-trunk #1753 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/1753/]) YARN-1975. Fix yarn application CLI to print the scheme of the tracking url of failed/killed applications. Contributed by Junping Du (jianhe: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1593874) * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/attempt/RMAppAttemptImpl.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/attempt/TestRMAppAttemptTransitions.java Used resources shows escaped html in CapacityScheduler and FairScheduler page - Key: YARN-1975 URL: https://issues.apache.org/jira/browse/YARN-1975 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 3.0.0, 2.4.0 Reporter: Nathan Roberts Assignee: Mit Desai Fix For: 3.0.0, 2.4.1 Attachments: YARN-1975.patch, screenshot-1975.png Used resources displays as the escaped text &lt;memory:, vCores&gt; with the capacity scheduler -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1976) Tracking url missing http protocol for FAILED application
[ https://issues.apache.org/jira/browse/YARN-1976?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13998173#comment-13998173 ] Hudson commented on YARN-1976: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk #1779 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1779/]) YARN-1976. Fix CHANGES.txt for YARN-1976. (jianhe: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1594123) * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt Tracking url missing http protocol for FAILED application - Key: YARN-1976 URL: https://issues.apache.org/jira/browse/YARN-1976 Project: Hadoop YARN Issue Type: Bug Reporter: Yesha Vora Assignee: Junping Du Fix For: 2.4.1 Attachments: YARN-1976-v2.patch, YARN-1976.patch Run yarn application -list -appStates FAILED, It does not print http protocol name like FINISHED apps. {noformat} -bash-4.1$ yarn application -list -appStates FINISHED,FAILED,KILLED 14/04/15 23:55:07 INFO client.RMProxy: Connecting to ResourceManager at host Total number of applications (application-types: [] and states: [FINISHED, FAILED, KILLED]):4 Application-IdApplication-Name Application-Type User Queue State Final-State ProgressTracking-URL application_1397598467870_0004 Sleep job MAPREDUCEhrt_qa defaultFINISHED SUCCEEDED 100% http://host:19888/jobhistory/job/job_1397598467870_0004 application_1397598467870_0003 Sleep job MAPREDUCEhrt_qa defaultFINISHED SUCCEEDED 100% http://host:19888/jobhistory/job/job_1397598467870_0003 application_1397598467870_0002 Sleep job MAPREDUCEhrt_qa default FAILED FAILED 100% host:8088/cluster/app/application_1397598467870_0002 application_1397598467870_0001 word count MAPREDUCEhrt_qa defaultFINISHED SUCCEEDED 100% http://host:19888/jobhistory/job/job_1397598467870_0001 {noformat} It only prints 'host:8088/cluster/app/application_1397598467870_0002' instead 'http://host:8088/cluster/app/application_1397598467870_0002' -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2042) String shouldn't be compared using == in QueuePlacementRule#NestedUserQueue#getQueueForApp()
[ https://issues.apache.org/jira/browse/YARN-2042?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13998167#comment-13998167 ] Hudson commented on YARN-2042: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk #1779 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1779/]) YARN-2042. String shouldn't be compared using == in QueuePlacementRule#NestedUserQueue#getQueueForApp (Chen He via Sandy Ryza) (sandy: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1594482) * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/QueuePlacementRule.java String shouldn't be compared using == in QueuePlacementRule#NestedUserQueue#getQueueForApp() Key: YARN-2042 URL: https://issues.apache.org/jira/browse/YARN-2042 Project: Hadoop YARN Issue Type: Bug Reporter: Ted Yu Assignee: Chen He Priority: Minor Attachments: YARN-2042.patch {code} if (queueName != null && queueName != "") { {code} queueName.isEmpty() should be used instead of comparing against "" -- This message was sent by Atlassian JIRA (v6.2#6252)
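As a side note on the YARN-2042 snippet above, a small standalone illustration (not the actual QueuePlacementRule code) of why == is the wrong comparison and what the suggested isEmpty() check looks like:
{code}
public class StringCompareExample {
  public static void main(String[] args) {
    String queueName = new String("");  // built at runtime, not interned

    // Reference comparison: checks object identity, not content
    System.out.println(queueName != "");                            // true

    // Suggested fix: compare content
    System.out.println(queueName != null && !queueName.isEmpty());  // false
  }
}
{code}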
[jira] [Updated] (YARN-1937) Access control of per-framework data
[ https://issues.apache.org/jira/browse/YARN-1937?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhijie Shen updated YARN-1937: -- Attachment: YARN-1937.1.patch I created a patch to add a TimelineACLsManager, which will check whether the querying user is the owner of the timeline entity; if he is, he can retrieve the entity or the events of this entity; otherwise, he cannot access the corresponding timeline data. To support the ACLs, I need to record the owner information of the timeline data when it is posted. I leverage the primary filter to store the owner information by reserving the timeline system filter key. Of course the system information will be masked before returning the timeline data back to the user. I uploaded the preliminary patch to demonstrate the idea, and will work on the test cases and complete local testing. It is worth mentioning that: 1. I do access control at the granularity of the timeline entity. We can definitely explore more fine-grained control, but I prefer keeping things simple initially. 2. Initially, I'm going to support access control such that only the owner can access his timeline data. In the future, we can extend it to allow admins and a configured user/group list. Will file a separate ticket for the follow-up work. Access control of per-framework data Key: YARN-1937 URL: https://issues.apache.org/jira/browse/YARN-1937 Project: Hadoop YARN Issue Type: Sub-task Reporter: Zhijie Shen Assignee: Zhijie Shen Attachments: YARN-1937.1.patch -- This message was sent by Atlassian JIRA (v6.2#6252)
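A minimal sketch of the owner-only check described in the comment above (the method and the reserved filter key name are assumptions for illustration; the actual TimelineACLsManager in the patch may differ):
{code}
import java.util.Set;
import org.apache.hadoop.security.UserGroupInformation;
import org.apache.hadoop.yarn.api.records.timeline.TimelineEntity;

public class OwnerOnlyAclSketch {
  // Hypothetical reserved system primary-filter key recording the owner at put time
  static final String SYSTEM_FILTER_OWNER = "SYSTEM_FILTER_OWNER";

  // Only the user who posted the entity may read it back
  static boolean canAccess(UserGroupInformation caller, TimelineEntity entity) {
    Set<Object> owners = entity.getPrimaryFilters().get(SYSTEM_FILTER_OWNER);
    if (owners == null || owners.isEmpty()) {
      return false;
    }
    return owners.iterator().next().toString().equals(caller.getShortUserName());
  }
}
{code}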
[jira] [Updated] (YARN-2017) Merge some of the common lib code in schedulers
[ https://issues.apache.org/jira/browse/YARN-2017?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jian He updated YARN-2017: -- Attachment: YARN-2017.4.patch Rebased the patch Merge some of the common lib code in schedulers --- Key: YARN-2017 URL: https://issues.apache.org/jira/browse/YARN-2017 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Jian He Assignee: Jian He Attachments: YARN-2017.1.patch, YARN-2017.2.patch, YARN-2017.3.patch, YARN-2017.4.patch A bunch of the same code is repeated among schedulers, e.g. between FicaSchedulerNode and FSSchedulerNode. It's good to merge and share it in a common base. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-570) Time strings are formated in different timezone
[ https://issues.apache.org/jira/browse/YARN-570?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13997193#comment-13997193 ] Akira AJISAKA commented on YARN-570: Attached a patch. With the patch, yarn.util.Times.format() renders as Wed May 14 10:24:29 JST 2014, which is consistent with the MapReduce jobhistoryserver WebUI. bq. Can you update format() as well to print in the same style, if you agree? The format of JavaScript {{Date.toLocaleString()}} varies by the browser. In my environment: {code} Chrome: 2014/5/14 10:25:08 Safari: 2014年5月14日 10:25:08 JST {code} Therefore, it's impossible to update {{format()}} to print in the same style. Time strings are formated in different timezone --- Key: YARN-570 URL: https://issues.apache.org/jira/browse/YARN-570 Project: Hadoop YARN Issue Type: Bug Reporter: Peng Zhang Assignee: Akira AJISAKA Attachments: MAPREDUCE-5141.patch, YARN-570.2.patch Time strings on different pages are displayed in different timezones. If it is rendered by renderHadoopDate() in yarn.dt.plugins.js, it appears as Wed, 10 Apr 2013 08:29:56 GMT. If it is formatted by format() in yarn.util.Times, it appears as 10-Apr-2013 16:29:56. Same value, but different timezone. -- This message was sent by Atlassian JIRA (v6.2#6252)
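For reference on the two rendering styles mentioned above, a small standalone sketch (illustrative only, not the actual yarn.util.Times.format() implementation) of a Date.toString()-like output versus the dd-MMM-yyyy style:
{code}
import java.text.SimpleDateFormat;
import java.util.Date;

public class TimeFormatExample {
  public static void main(String[] args) {
    long ts = System.currentTimeMillis();

    // Style similar to "Wed May 14 10:24:29 JST 2014" (server's local timezone)
    System.out.println(new Date(ts));

    // Style similar to the old "10-Apr-2013 16:29:56" output (also local timezone)
    SimpleDateFormat fmt = new SimpleDateFormat("dd-MMM-yyyy HH:mm:ss");
    System.out.println(fmt.format(new Date(ts)));
  }
}
{code}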
[jira] [Assigned] (YARN-1680) availableResources sent to applicationMaster in heartbeat should exclude blacklistedNodes free memory.
[ https://issues.apache.org/jira/browse/YARN-1680?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chen He reassigned YARN-1680: - Assignee: Chen He availableResources sent to applicationMaster in heartbeat should exclude blacklistedNodes free memory. -- Key: YARN-1680 URL: https://issues.apache.org/jira/browse/YARN-1680 Project: Hadoop YARN Issue Type: Sub-task Affects Versions: 2.2.0, 2.3.0 Environment: SuSE 11 SP2 + Hadoop-2.3 Reporter: Rohith Assignee: Chen He There are 4 NodeManagers with 8GB each. Total cluster capacity is 32GB. Cluster slow start is set to 1. A job is running whose reducer tasks occupy 29GB of the cluster. One NodeManager (NM-4) became unstable (3 maps got killed), so the MRAppMaster blacklisted the unstable NodeManager (NM-4). All reducer tasks are running in the cluster now. The MRAppMaster does not preempt the reducers because, for the reducer preemption calculation, the headroom includes the blacklisted nodes' memory. This makes jobs hang forever (the ResourceManager does not assign any new containers on blacklisted nodes, but the availableResources it returns considers the whole cluster's free memory). -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Assigned] (YARN-896) Roll up for long-lived services in YARN
[ https://issues.apache.org/jira/browse/YARN-896?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuan Gong reassigned YARN-896: -- Assignee: Xuan Gong Roll up for long-lived services in YARN --- Key: YARN-896 URL: https://issues.apache.org/jira/browse/YARN-896 Project: Hadoop YARN Issue Type: New Feature Reporter: Robert Joseph Evans Assignee: Xuan Gong YARN is intended to be general purpose, but it is missing some features to be able to truly support long lived applications and long lived containers. This ticket is intended to # discuss what is needed to support long lived processes # track the resulting JIRA. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-2053) Slider AM fails to restart: NPE in RegisterApplicationMasterResponseProto$Builder.addAllNmTokensFromPreviousAttempts
[ https://issues.apache.org/jira/browse/YARN-2053?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinod Kumar Vavilapalli updated YARN-2053: -- Issue Type: Sub-task (was: Bug) Parent: YARN-1489 Slider AM fails to restart: NPE in RegisterApplicationMasterResponseProto$Builder.addAllNmTokensFromPreviousAttempts Key: YARN-2053 URL: https://issues.apache.org/jira/browse/YARN-2053 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Affects Versions: 2.4.0 Reporter: Sumit Mohanty Assignee: Wangda Tan Attachments: YARN-2053.patch, yarn-yarn-nodemanager-c6403.ambari.apache.org.log.bak, yarn-yarn-resourcemanager-c6403.ambari.apache.org.log.bak Slider AppMaster restart fails with the following: {code} org.apache.hadoop.yarn.proto.YarnServiceProtos$RegisterApplicationMasterResponseProto$Builder.addAllNmTokensFromPreviousAttempts(YarnServiceProtos.java:2700) {code} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-2052) ContainerId creation after work preserving restart is broken
[ https://issues.apache.org/jira/browse/YARN-2052?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bikas Saha updated YARN-2052: - Description: Container ids are made unique by using the app identifier and appending a monotonically increasing sequence number to it. Since container creation is a high churn activity the RM does not store the sequence number per app. So after restart it does not know what the new sequence number should be for new allocations. ContainerId creation after work preserving restart is broken Key: YARN-2052 URL: https://issues.apache.org/jira/browse/YARN-2052 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Tsuyoshi OZAWA Container ids are made unique by using the app identifier and appending a monotonically increasing sequence number to it. Since container creation is a high churn activity the RM does not store the sequence number per app. So after restart it does not know what the new sequence number should be for new allocations. -- This message was sent by Atlassian JIRA (v6.2#6252)
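To make the description above concrete, a sketch using the public records API of how a container id is composed from the app attempt plus the monotonically increasing sequence number (illustrative only, not the RM's internal allocation code):
{code}
import org.apache.hadoop.yarn.api.records.ApplicationAttemptId;
import org.apache.hadoop.yarn.api.records.ApplicationId;
import org.apache.hadoop.yarn.api.records.ContainerId;

public class ContainerIdExample {
  public static void main(String[] args) {
    ApplicationId appId = ApplicationId.newInstance(System.currentTimeMillis(), 1);
    ApplicationAttemptId attemptId = ApplicationAttemptId.newInstance(appId, 1);

    // The last argument is the per-app monotonically increasing sequence number;
    // after a restart the RM no longer knows where this counter left off.
    ContainerId c1 = ContainerId.newInstance(attemptId, 1);
    ContainerId c2 = ContainerId.newInstance(attemptId, 2);
    System.out.println(c1 + " " + c2);
  }
}
{code}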
[jira] [Updated] (YARN-2022) Preempting an Application Master container can be kept as least priority when multiple applications are marked for preemption by ProportionalCapacityPreemptionPolicy
[ https://issues.apache.org/jira/browse/YARN-2022?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinod Kumar Vavilapalli updated YARN-2022: -- Issue Type: Sub-task (was: Improvement) Parent: YARN-45 Preempting an Application Master container can be kept as least priority when multiple applications are marked for preemption by ProportionalCapacityPreemptionPolicy - Key: YARN-2022 URL: https://issues.apache.org/jira/browse/YARN-2022 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Affects Versions: 2.4.0 Reporter: Sunil G Assignee: Sunil G Attachments: Yarn-2022.1.patch Cluster Size = 16GB [2NM's] Queue A Capacity = 50% Queue B Capacity = 50% Consider there are 3 applications running in Queue A which have taken the full cluster capacity. J1 = 2GB AM + 1GB * 4 Maps J2 = 2GB AM + 1GB * 4 Maps J3 = 2GB AM + 1GB * 2 Maps Another Job J4 is submitted in Queue B [J4 needs a 2GB AM + 1GB * 2 Maps ]. Currently in this scenario, Job J3 will get killed including its AM. It is better if the AM can be given least priority among multiple applications. In this same scenario, map tasks from J3 and J2 can be preempted. Later, when the cluster is free, maps can be allocated to these jobs. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Assigned] (YARN-2034) Description for yarn.nodemanager.localizer.cache.target-size-mb is incorrect
[ https://issues.apache.org/jira/browse/YARN-2034?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chen He reassigned YARN-2034: - Assignee: Chen He Description for yarn.nodemanager.localizer.cache.target-size-mb is incorrect Key: YARN-2034 URL: https://issues.apache.org/jira/browse/YARN-2034 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 0.23.10, 2.4.0 Reporter: Jason Lowe Assignee: Chen He Priority: Minor Attachments: YARN-2034.patch The description in yarn-default.xml for yarn.nodemanager.localizer.cache.target-size-mb says that it is a setting per local directory, but according to the code it's a setting for the entire node. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-1986) In Fifo Scheduler, node heartbeat in between creating app and attempt causes NPE
[ https://issues.apache.org/jira/browse/YARN-1986?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sandy Ryza updated YARN-1986: - Summary: In Fifo Scheduler, node heartbeat in between creating app and attempt causes NPE (was: After upgrade from 2.2.0 to 2.4.0, NPE on first job start.) In Fifo Scheduler, node heartbeat in between creating app and attempt causes NPE Key: YARN-1986 URL: https://issues.apache.org/jira/browse/YARN-1986 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.4.0 Reporter: Jon Bringhurst Assignee: Sandy Ryza Priority: Critical Attachments: YARN-1986-2.patch, YARN-1986-3.patch, YARN-1986-testcase.patch, YARN-1986.patch After upgrade from 2.2.0 to 2.4.0, NPE on first job start. -After RM was restarted, the job runs without a problem.- {noformat} 19:11:13,441 FATAL ResourceManager:600 - Error in handling event type NODE_UPDATE to the scheduler java.lang.NullPointerException at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fifo.FifoScheduler.assignContainers(FifoScheduler.java:462) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fifo.FifoScheduler.nodeUpdate(FifoScheduler.java:714) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fifo.FifoScheduler.handle(FifoScheduler.java:743) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fifo.FifoScheduler.handle(FifoScheduler.java:104) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$SchedulerEventDispatcher$EventProcessor.run(ResourceManager.java:591) at java.lang.Thread.run(Thread.java:744) 19:11:13,443 INFO ResourceManager:604 - Exiting, bbye.. {noformat} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-1365) ApplicationMasterService to allow Register and Unregister of an app that was running before restart
[ https://issues.apache.org/jira/browse/YARN-1365?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anubhav Dhoot updated YARN-1365: Attachment: YARN-1365.initial.patch This is a change from the prototype that allows applications to register after an RM restart. Still need to add unit tests ApplicationMasterService to allow Register and Unregister of an app that was running before restart --- Key: YARN-1365 URL: https://issues.apache.org/jira/browse/YARN-1365 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Bikas Saha Assignee: Anubhav Dhoot Attachments: YARN-1365.initial.patch For an application that was running before restart, the ApplicationMasterService currently throws an exception when the app tries to make the initial register or final unregister call. These should succeed and the RMApp state machine should transition to completed like normal. Unregistration should succeed for an app that the RM considers complete since the RM may have died after saving completion in the store but before notifying the AM that the AM is free to exit. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (YARN-2034) Description for yarn.nodemanager.localizer.cache.target-size-mb is incorrect
Jason Lowe created YARN-2034: Summary: Description for yarn.nodemanager.localizer.cache.target-size-mb is incorrect Key: YARN-2034 URL: https://issues.apache.org/jira/browse/YARN-2034 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 2.4.0, 0.23.10 Reporter: Jason Lowe Priority: Minor The description for yarn.nodemanager.localizer.cache.target-size-mb says that it is a setting per local directory, but according to the code it's a setting for the entire node. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Comment Edited] (YARN-1368) Common work to re-populate containers’ state into scheduler
[ https://issues.apache.org/jira/browse/YARN-1368?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13998164#comment-13998164 ] Alejandro Abdelnur edited comment on YARN-1368 at 5/15/14 5:09 AM: --- [~jianhe], I understand the patch is taking a different approach, which is based on the work Anubhav started. Instead hijacking the JIRA, the correct way should have been proposing [to the assignee/author of the original patch] the changes and offering to contribute/breakdown tasks. Please do so next time. was (Author: tucu00): [~jianhe], I understand the patch is taking a different approach, which is based on the work Anubhav started. Instead hijacking the JIRA, the correct way should have been proposing -to the assignee/author of the original patch- the changes and offering to contribute/breakdown tasks. Please do so next time. Common work to re-populate containers’ state into scheduler --- Key: YARN-1368 URL: https://issues.apache.org/jira/browse/YARN-1368 Project: Hadoop YARN Issue Type: Sub-task Reporter: Bikas Saha Assignee: Jian He Attachments: YARN-1368.1.patch, YARN-1368.2.patch, YARN-1368.combined.001.patch, YARN-1368.preliminary.patch YARN-1367 adds support for the NM to tell the RM about all currently running containers upon registration. The RM needs to send this information to the schedulers along with the NODE_ADDED_EVENT so that the schedulers can recover the current allocation state of the cluster. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1962) Timeline server is enabled by default
[ https://issues.apache.org/jira/browse/YARN-1962?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13998192#comment-13998192 ] Hudson commented on YARN-1962: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk #1779 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1779/]) YARN-1962. Changed Timeline Service client configuration to be off by default given the non-readiness of the feature yet. Contributed by Mohammad Kamrul Islam. (vinodkv: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1593750) * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/conf/YarnConfiguration.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/api/impl/TestTimelineClient.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/resources/yarn-default.xml Timeline server is enabled by default - Key: YARN-1962 URL: https://issues.apache.org/jira/browse/YARN-1962 Project: Hadoop YARN Issue Type: Sub-task Affects Versions: 2.4.0 Reporter: Mohammad Kamrul Islam Assignee: Mohammad Kamrul Islam Fix For: 2.4.1 Attachments: YARN-1962.1.patch, YARN-1962.2.patch Since the Timeline server is not mature and secure yet, enabling it by default might create some confusion. We were playing with 2.4.0 and found a lot of exceptions for the distributed shell example related to connection refused errors. Btw, we didn't run the TS because it is not secured yet. Although it is possible to explicitly turn it off through the yarn-site config, in my opinion this extra change for this new service is not worthwhile at this point. This JIRA is to turn it off by default. If there is an agreement, I can put up a simple patch for this. {noformat} 14/04/17 23:24:33 ERROR impl.TimelineClientImpl: Failed to get the response from the timeline server.
com.sun.jersey.api.client.ClientHandlerException: java.net.ConnectException: Connection refused at com.sun.jersey.client.urlconnection.URLConnectionClientHandler.handle(URLConnectionClientHandler.java:149) at com.sun.jersey.api.client.Client.handle(Client.java:648) at com.sun.jersey.api.client.WebResource.handle(WebResource.java:670) at com.sun.jersey.api.client.WebResource.access$200(WebResource.java:74) at com.sun.jersey.api.client.WebResource$Builder.post(WebResource.java:563) at org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl.doPostingEntities(TimelineClientImpl.java:131) at org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl.putEntities(TimelineClientImpl.java:104) at org.apache.hadoop.yarn.applications.distributedshell.ApplicationMaster.publishApplicationAttemptEvent(ApplicationMaster.java:1072) at org.apache.hadoop.yarn.applications.distributedshell.ApplicationMaster.run(ApplicationMaster.java:515) at org.apache.hadoop.yarn.applications.distributedshell.ApplicationMaster.main(ApplicationMaster.java:281) Caused by: java.net.ConnectException: Connection refused at java.net.PlainSocketImpl.socketConnect(Native Method) at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:339) at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:198) at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:182) at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392) at java.net.Socket.connect(Socket.java:579) at java.net.Socket.connect(Socket.java:528) at sun.net.NetworkClient.doConnect(NetworkClient.java:180) at sun.net.www.http.HttpClient.openServer(HttpClient.java:432) at sun.net.www.http.HttpClient.openServer(HttpClient.java:527) at sun.net.www.http.HttpClient.in14/04/17 23:24:33 ERROR impl.TimelineClientImpl: Failed to get the response from the timeline server. com.sun.jersey.api.client.ClientHandlerException: java.net.ConnectException: Connection refused at com.sun.jersey.client.urlconnection.URLConnectionClientHandler.handle(URLConnectionClientHandler.java:149) at com.sun.jersey.api.client.Client.handle(Client.java:648) at com.sun.jersey.api.client.WebResource.handle(WebResource.java:670) at com.sun.jersey.api.client.WebResource.access$200(WebResource.java:74) at com.sun.jersey.api.client.WebResource$Builder.post(WebResource.java:563) at org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl.doPostingEntities(TimelineClientImpl.java:131) at org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl.putEntities(TimelineClientImpl.java:104)
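If the default flips to off as proposed, a client that does want the timeline service would opt in explicitly; a minimal sketch (assuming the YarnConfiguration constants for yarn.timeline-service.enabled referenced by the patch):
{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.yarn.conf.YarnConfiguration;

public class TimelineToggleExample {
  public static void main(String[] args) {
    Configuration conf = new YarnConfiguration();
    // Check the flag before creating/posting with a TimelineClient so that an
    // unreachable timeline server does not produce the exceptions shown above.
    boolean enabled = conf.getBoolean(
        YarnConfiguration.TIMELINE_SERVICE_ENABLED,
        YarnConfiguration.DEFAULT_TIMELINE_SERVICE_ENABLED);
    System.out.println("timeline service enabled: " + enabled);
  }
}
{code}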
[jira] [Commented] (YARN-2042) String shouldn't be compared using == in QueuePlacementRule#NestedUserQueue#getQueueForApp()
[ https://issues.apache.org/jira/browse/YARN-2042?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13998205#comment-13998205 ] Hudson commented on YARN-2042: -- FAILURE: Integrated in Hadoop-Hdfs-trunk #1753 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/1753/]) YARN-2042. String shouldn't be compared using == in QueuePlacementRule#NestedUserQueue#getQueueForApp (Chen He via Sandy Ryza) (sandy: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1594482) * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/QueuePlacementRule.java String shouldn't be compared using == in QueuePlacementRule#NestedUserQueue#getQueueForApp() Key: YARN-2042 URL: https://issues.apache.org/jira/browse/YARN-2042 Project: Hadoop YARN Issue Type: Bug Reporter: Ted Yu Assignee: Chen He Priority: Minor Attachments: YARN-2042.patch {code} if (queueName != null && queueName != "") { {code} queueName.isEmpty() should be used instead of comparing against "" -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2061) Revisit logging levels in ZKRMStateStore
[ https://issues.apache.org/jira/browse/YARN-2061?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13998251#comment-13998251 ] Karthik Kambatla commented on YARN-2061: # After loading state corresponding to one application. {code} LOG.info("Done Loading applications from ZK state store"); {code} Revisit logging levels in ZKRMStateStore - Key: YARN-2061 URL: https://issues.apache.org/jira/browse/YARN-2061 Project: Hadoop YARN Issue Type: Improvement Components: resourcemanager Affects Versions: 2.4.0 Reporter: Karthik Kambatla Assignee: Ray Chiang Priority: Minor Labels: newbie ZKRMStateStore has a few places where it is logging at the INFO level. We should change these to DEBUG or TRACE level messages. -- This message was sent by Atlassian JIRA (v6.2#6252)
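As a reference for the kind of change being asked for here, a hedged before/after sketch (not the actual ZKRMStateStore code):
{code}
import org.apache.commons.logging.Log;
import org.apache.commons.logging.LogFactory;

public class LoggingLevelExample {
  private static final Log LOG = LogFactory.getLog(LoggingLevelExample.class);

  void loadApplicationState(String appId) {
    // Before: per-application progress logged at INFO, which floods the RM log
    // LOG.info("Done Loading applications from ZK state store");

    // After: demote to DEBUG and guard it so the message is only built when needed
    if (LOG.isDebugEnabled()) {
      LOG.debug("Done loading application " + appId + " from ZK state store");
    }
  }
}
{code}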
[jira] [Commented] (YARN-1987) Wrapper for leveldb DBIterator to aid in handling database exceptions
[ https://issues.apache.org/jira/browse/YARN-1987?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13998233#comment-13998233 ] Hudson commented on YARN-1987: -- FAILURE: Integrated in Hadoop-Hdfs-trunk #1753 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/1753/]) YARN-1987. Wrapper for leveldb DBIterator to aid in handling database exceptions. (Jason Lowe via kasha) (kasha: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1593757) * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/pom.xml * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/utils/LeveldbIterator.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/test/java/org/apache/hadoop/yarn/server/utils * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/test/java/org/apache/hadoop/yarn/server/utils/TestLeveldbIterator.java Wrapper for leveldb DBIterator to aid in handling database exceptions - Key: YARN-1987 URL: https://issues.apache.org/jira/browse/YARN-1987 Project: Hadoop YARN Issue Type: Improvement Affects Versions: 2.4.0 Reporter: Jason Lowe Assignee: Jason Lowe Fix For: 2.5.0 Attachments: YARN-1987.patch, YARN-1987v2.patch Per discussions in YARN-1984 and MAPREDUCE-5652, it would be nice to have a utility wrapper around leveldb's DBIterator to translate the raw RuntimeExceptions it can throw into DBExceptions to make it easier to handle database errors while iterating. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1957) ProportionalCapacitPreemptionPolicy handling of corner cases...
[ https://issues.apache.org/jira/browse/YARN-1957?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13998174#comment-13998174 ] Hudson commented on YARN-1957: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk #1779 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1779/]) YARN-1957. Consider the max capacity of the queue when computing the ideal capacity for preemption. Contributed by Carlo Curino (cdouglas: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1594414) * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/monitor/capacity/ProportionalCapacityPreemptionPolicy.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/monitor/capacity/TestProportionalCapacityPreemptionPolicy.java ProportionalCapacitPreemptionPolicy handling of corner cases... --- Key: YARN-1957 URL: https://issues.apache.org/jira/browse/YARN-1957 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Affects Versions: 2.4.0 Reporter: Carlo Curino Assignee: Carlo Curino Labels: capacity-scheduler, preemption Fix For: 3.0.0, 2.5.0, 2.4.1 Attachments: YARN-1957.patch, YARN-1957.patch, YARN-1957_test.patch The current version of ProportionalCapacityPreemptionPolicy should be improved to deal with the following two scenarios: 1) when rebalancing over-capacity allocations, it potentially preempts without considering the maxCapacity constraints of a queue (i.e., preempting possibly more than strictly necessary) 2) a zero capacity queue is preempted even if there is no demand (consistent with the old use of zero capacity to disable queues) The proposed patch fixes both issues and introduces a few new test cases. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1751) Improve MiniYarnCluster and LogCLIHelpers for log aggregation testing
[ https://issues.apache.org/jira/browse/YARN-1751?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13993565#comment-13993565 ] Jason Lowe commented on YARN-1751: -- Despite them both being small changes, I think these should be separate JIRAs since they're otherwise unrelated changes for different problems and can stand on their own. We can morph this JIRA into one of them and file a new one to cover the other. For the LogCLIHelpers change, I think it should be calling FileContext.getFileContext(remoteAppLogDir.toUri(), conf) in case the remoteAppLogDir is not on the default filesystem. There's also the question of whether it should guard against a null conf, since, oddly, despite LogCLIHelpers being Configurable it isn't using the config until after this change. I think I'm leaning towards leaving it null and letting the NPE occur so callers will fix it. We've had lots of performance problems and other weirdness in the past when code forgot to pass down a custom config and things sorta worked with the default one. +1 for the MiniYarnCluster change. Improve MiniYarnCluster and LogCLIHelpers for log aggregation testing - Key: YARN-1751 URL: https://issues.apache.org/jira/browse/YARN-1751 Project: Hadoop YARN Issue Type: Improvement Components: nodemanager Reporter: Ming Ma Assignee: Ming Ma Attachments: YARN-1751-trunk.patch MiniYarnCluster specifies an individual remote log aggregation root dir for each NM. Test code that uses MiniYarnCluster won't be able to get the value of the log aggregation root dir. The following code isn't necessary in MiniYarnCluster. File remoteLogDir = new File(testWorkDir, MiniYARNCluster.this.getName() + "-remoteLogDir-nm-" + index); remoteLogDir.mkdir(); config.set(YarnConfiguration.NM_REMOTE_APP_LOG_DIR, remoteLogDir.getAbsolutePath()); In LogCLIHelpers.java, dumpAllContainersLogs should pass its conf object to the FileContext.getFileContext() call. -- This message was sent by Atlassian JIRA (v6.2#6252)
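A minimal sketch of the LogCLIHelpers suggestion above, resolving the FileContext against the remote log dir's own URI instead of the default filesystem (illustrative only):
{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileContext;
import org.apache.hadoop.fs.Path;

public class RemoteLogDirFileContext {
  static FileContext contextFor(Path remoteAppLogDir, Configuration conf) throws Exception {
    // Resolve against the dir's own scheme/authority so a remote app log dir on a
    // non-default filesystem still works.
    return FileContext.getFileContext(remoteAppLogDir.toUri(), conf);
  }
}
{code}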
[jira] [Commented] (YARN-2064) MR job successful but Note: Container killed by the ApplicationMaster.
[ https://issues.apache.org/jira/browse/YARN-2064?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13998467#comment-13998467 ] Jian He commented on YARN-2064: --- Hi [~dlim1234], the message should be only saying some containers(map or reduce tasks) were killed by the AM during the runtime of AM. As long as you can see the SUCCEED state on the RM web UI, the job should be successful. You can also use yarn application -status to query the app status from CLI. Also, please ask such questions in Hadoop user group mailing list rather than here next time. JIRA site is supposed to be used for reporting issues not for answering general questions. thanks. MR job successful but Note: Container killed by the ApplicationMaster. -- Key: YARN-2064 URL: https://issues.apache.org/jira/browse/YARN-2064 Project: Hadoop YARN Issue Type: Bug Components: nodemanager, resourcemanager, scheduler Reporter: dlim Hi, just a short question for everyone I got MR job run on YARN, normally for small jobs, it succeeded without any note in the URL page. However, when running long-running job, it ends with successful status but with note: Container killed by the ApplicationMaster. The job is still running and i hesitate to kill it. Anyone know if it is actually successful or not ?? I know there is a previous post on this, but the answers are not so clear for me. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1864) Fair Scheduler Dynamic Hierarchical User Queues
[ https://issues.apache.org/jira/browse/YARN-1864?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13992689#comment-13992689 ] Hudson commented on YARN-1864: -- SUCCESS: Integrated in Hadoop-trunk-Commit #5597 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/5597/]) YARN-1864. Add missing file FSQueueType.java (sandy: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1593191) * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FSQueueType.java YARN-1864. Fair Scheduler Dynamic Hierarchical User Queues (Ashwin Shankar via Sandy Ryza) (sandy: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1593190) * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/AllocationConfiguration.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/AllocationFileLoaderService.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/QueueManager.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/QueuePlacementPolicy.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/QueuePlacementRule.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/TestAllocationFileLoaderService.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/TestFairScheduler.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/TestQueueManager.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/TestQueuePlacementPolicy.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/apt/FairScheduler.apt.vm Fair Scheduler Dynamic Hierarchical User Queues --- Key: YARN-1864 URL: https://issues.apache.org/jira/browse/YARN-1864 Project: Hadoop YARN Issue Type: New Feature Components: scheduler Reporter: Ashwin Shankar Assignee: Ashwin Shankar Labels: scheduler Fix For: 2.5.0 Attachments: YARN-1864-v1.txt, YARN-1864-v2.txt, YARN-1864-v3.txt, YARN-1864-v4.txt, YARN-1864-v5.txt, YARN-1864-v6.txt, YARN-1864-v6.txt In Fair Scheduler, we want to be able to create user queues under any parent queue in the hierarchy. For eg. 
Say user1 submits a job to a parent queue called root.allUserQueues; we want to be able to create a new queue called root.allUserQueues.user1 and run user1's job in it. Any further jobs submitted by this user to root.allUserQueues will be run in this newly created root.allUserQueues.user1. This is very similar to the 'user-as-default' feature in Fair Scheduler which creates user queues under the root queue. But we want the ability to create user queues under ANY parent queue. Why do we want this? 1. Preemption: these dynamically created user queues can preempt each other if their fair share is not met, so there is fairness among users. User queues can also preempt other non-user leaf queues if they are below their fair share. 2. Allocation to user queues: we want all the user queries (ad hoc) to consume only a fraction of the resources in the shared cluster. With this feature, we could do that by giving a fair share to the parent user queue, which is then redistributed to all the dynamically created user queues. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2061) Revisit logging levels in ZKRMStateStore
[ https://issues.apache.org/jira/browse/YARN-2061?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13998322#comment-13998322 ] Ray Chiang commented on YARN-2061: -- One minor question. Looking at the Apache Commons Log Interface, it looks like the API expects the developer to always call is*Enabled() API before calling the actual Log.* function, but that's not used consistently in this class. Should I add that as well? Revisit logging levels in ZKRMStateStore - Key: YARN-2061 URL: https://issues.apache.org/jira/browse/YARN-2061 Project: Hadoop YARN Issue Type: Improvement Components: resourcemanager Affects Versions: 2.4.0 Reporter: Karthik Kambatla Assignee: Ray Chiang Priority: Minor Labels: newbie ZKRMStateStore has a few places where it is logging at the INFO level. We should change these to DEBUG or TRACE level messages. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2053) Slider AM fails to restart: NPE in RegisterApplicationMasterResponseProto$Builder.addAllNmTokensFromPreviousAttempts
[ https://issues.apache.org/jira/browse/YARN-2053?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13997623#comment-13997623 ] Wangda Tan commented on YARN-2053: -- And I think this should be marked as critical or blocker bug, agree? Slider AM fails to restart: NPE in RegisterApplicationMasterResponseProto$Builder.addAllNmTokensFromPreviousAttempts Key: YARN-2053 URL: https://issues.apache.org/jira/browse/YARN-2053 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.4.0 Reporter: Sumit Mohanty Assignee: Wangda Tan Attachments: YARN-2053.patch, yarn-yarn-nodemanager-c6403.ambari.apache.org.log.bak, yarn-yarn-resourcemanager-c6403.ambari.apache.org.log.bak Slider AppMaster restart fails with the following: {code} org.apache.hadoop.yarn.proto.YarnServiceProtos$RegisterApplicationMasterResponseProto$Builder.addAllNmTokensFromPreviousAttempts(YarnServiceProtos.java:2700) {code} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2042) String shouldn't be compared using == in QueuePlacementRule#NestedUserQueue#getQueueForApp()
[ https://issues.apache.org/jira/browse/YARN-2042?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13996665#comment-13996665 ] Henry Saputra commented on YARN-2042: - +1 for the patch String shouldn't be compared using == in QueuePlacementRule#NestedUserQueue#getQueueForApp() Key: YARN-2042 URL: https://issues.apache.org/jira/browse/YARN-2042 Project: Hadoop YARN Issue Type: Bug Reporter: Ted Yu Assignee: Chen He Priority: Minor Attachments: YARN-2042.patch {code} if (queueName != null && queueName != "") { {code} queueName.isEmpty() should be used instead of comparing against "" -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-1937) Add entity-level access control of the timeline data for owners only
[ https://issues.apache.org/jira/browse/YARN-1937?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhijie Shen updated YARN-1937: -- Attachment: YARN-1937.2.patch Uploaded a new patch: 1. Prevent the timeline entity from being modified by other users (on re-put of a timeline entity) 2. Isolate the exception when checking access for collection operations (getEntities/Events) 3. Add corresponding test cases to verify ACL behavior 4. Fixed a related bug in MemoryTimelineStore, which didn't do a deep copy before returning an object. Add entity-level access control of the timeline data for owners only Key: YARN-1937 URL: https://issues.apache.org/jira/browse/YARN-1937 Project: Hadoop YARN Issue Type: Sub-task Reporter: Zhijie Shen Assignee: Zhijie Shen Attachments: YARN-1937.1.patch, YARN-1937.2.patch -- This message was sent by Atlassian JIRA (v6.2#6252)
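The deep-copy fix mentioned in item 4 above follows the usual defensive-copy pattern; the sketch below is a generic illustration with hypothetical types, not the actual MemoryTimelineStore change:
{code}
import java.util.HashMap;
import java.util.Map;

class InMemoryStore {
  private final Map<String, Map<String, String>> entities =
      new HashMap<String, Map<String, String>>();

  // Return a copy so callers cannot mutate the store's internal state.
  Map<String, String> getEntity(String id) {
    Map<String, String> stored = entities.get(id);
    return stored == null ? null : new HashMap<String, String>(stored);
  }
}
{code}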
[jira] [Commented] (YARN-2040) Recover information about finished containers
[ https://issues.apache.org/jira/browse/YARN-2040?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13993959#comment-13993959 ] Karthik Kambatla commented on YARN-2040: [~jlowe] - please close this as duplicate if any of the other sub-tasks are already handling this. Thanks. Recover information about finished containers - Key: YARN-2040 URL: https://issues.apache.org/jira/browse/YARN-2040 Project: Hadoop YARN Issue Type: Sub-task Components: nodemanager Affects Versions: 2.4.0 Reporter: Karthik Kambatla The NM should store and recover information about finished containers as well. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-570) Time strings are formatted in different timezones
[ https://issues.apache.org/jira/browse/YARN-570?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Akira AJISAKA updated YARN-570: --- Attachment: YARN-570.2.patch Time strings are formatted in different timezones --- Key: YARN-570 URL: https://issues.apache.org/jira/browse/YARN-570 Project: Hadoop YARN Issue Type: Bug Reporter: Peng Zhang Assignee: Akira AJISAKA Attachments: MAPREDUCE-5141.patch, YARN-570.2.patch Time strings on different pages are displayed in different timezones. If a time is rendered by renderHadoopDate() in yarn.dt.plugins.js, it appears as Wed, 10 Apr 2013 08:29:56 GMT. If it is formatted by format() in yarn.util.Times, it appears as 10-Apr-2013 16:29:56. Same value, but different timezones. -- This message was sent by Atlassian JIRA (v6.2#6252)
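The discrepancy comes down to the formatter's time zone: the JavaScript renderer prints GMT while the server-side formatter uses the node's local zone. A small, self-contained illustration (not the actual yarn.util.Times or yarn.dt.plugins.js code):
{code}
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.TimeZone;

public class TimeZoneDemo {
  public static void main(String[] args) {
    long ts = System.currentTimeMillis(); // any fixed instant
    SimpleDateFormat local = new SimpleDateFormat("dd-MMM-yyyy HH:mm:ss");
    SimpleDateFormat gmt = new SimpleDateFormat("EEE, dd MMM yyyy HH:mm:ss 'GMT'");
    gmt.setTimeZone(TimeZone.getTimeZone("GMT"));
    // Same instant, two different renderings depending on the formatter's zone.
    System.out.println(local.format(new Date(ts)));
    System.out.println(gmt.format(new Date(ts)));
  }
}
{code}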
[jira] [Commented] (YARN-941) RM Should have a way to update the tokens it has for a running application
[ https://issues.apache.org/jira/browse/YARN-941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13998392#comment-13998392 ] Xuan Gong commented on YARN-941: I am starting to work on it. And will provide a proposal soon. RM Should have a way to update the tokens it has for a running application -- Key: YARN-941 URL: https://issues.apache.org/jira/browse/YARN-941 Project: Hadoop YARN Issue Type: Sub-task Reporter: Robert Joseph Evans Assignee: Xuan Gong When an application is submitted to the RM it includes with it a set of tokens that the RM will renew on behalf of the application, that will be passed to the AM when the application is launched, and will be used when launching the application to access HDFS to download files on behalf of the application. For long lived applications/services these tokens can expire, and then the tokens that the AM has will be invalid, and the tokens that the RM had will also not work to launch a new AM. We need to provide an API that will allow the RM to replace the current tokens for this application with a new set. To avoid any real race issues, I think this API should be something that the AM calls, so that the client can connect to the AM with a new set of tokens it got using kerberos, then the AM can inform the RM of the new set of tokens and quickly update its tokens internally to use these new ones. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-2034) Description for yarn.nodemanager.localizer.cache.target-size-mb is incorrect
[ https://issues.apache.org/jira/browse/YARN-2034?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Lowe updated YARN-2034: - Description: The description in yarn-default.xml for yarn.nodemanager.localizer.cache.target-size-mb says that it is a setting per local directory, but according to the code it's a setting for the entire node. (was: The description for yarn.nodemanager.localizer.cache.target-size-mb says that it is a setting per local directory, but according to the code it's a setting for the entire node.) Description for yarn.nodemanager.localizer.cache.target-size-mb is incorrect Key: YARN-2034 URL: https://issues.apache.org/jira/browse/YARN-2034 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 0.23.10, 2.4.0 Reporter: Jason Lowe Priority: Minor The description in yarn-default.xml for yarn.nodemanager.localizer.cache.target-size-mb says that it is a setting per local directory, but according to the code it's a setting for the entire node. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2018) TestClientRMService.testTokenRenewalWrongUser fails after HADOOP-10562
[ https://issues.apache.org/jira/browse/YARN-2018?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13992833#comment-13992833 ] Hudson commented on YARN-2018: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk #1777 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1777/]) YARN-2018. TestClientRMService.testTokenRenewalWrongUser fails after HADOOP-10562. (Contributed by Ming Ma) (arp: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1592783) * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestClientRMService.java TestClientRMService.testTokenRenewalWrongUser fails after HADOOP-10562 Key: YARN-2018 URL: https://issues.apache.org/jira/browse/YARN-2018 Project: Hadoop YARN Issue Type: Test Affects Versions: 2.5.0 Reporter: Tsuyoshi OZAWA Assignee: Ming Ma Attachments: YARN-2018.patch The test failure is observed on YARN-1945 and YARN-1861. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2054) Poor defaults for YARN ZK configs for retries and retry-inteval
[ https://issues.apache.org/jira/browse/YARN-2054?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13998439#comment-13998439 ] Xuan Gong commented on YARN-2054: - Agree with [~jianhe]. Poor defaults for YARN ZK configs for retries and retry-inteval --- Key: YARN-2054 URL: https://issues.apache.org/jira/browse/YARN-2054 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.4.0 Reporter: Karthik Kambatla Assignee: Karthik Kambatla Attachments: yarn-2054-1.patch Currently, we have the following default values: # yarn.resourcemanager.zk-num-retries - 500 # yarn.resourcemanager.zk-retry-interval-ms - 2000 This leads to a cumulative 1000 seconds before the RM gives up trying to connect to ZK. -- This message was sent by Atlassian JIRA (v6.2#6252)
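For anyone hitting this before new defaults are picked, both properties can be overridden in yarn-site.xml; the values below are purely illustrative, not the defaults proposed in the patch:
{code}
<!-- yarn-site.xml (illustrative values): cap ZK connection retries so the RM
     gives up after roughly 2 minutes instead of ~1000 seconds -->
<property>
  <name>yarn.resourcemanager.zk-num-retries</name>
  <value>60</value>
</property>
<property>
  <name>yarn.resourcemanager.zk-retry-interval-ms</name>
  <value>2000</value>
</property>
{code}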
[jira] [Commented] (YARN-1986) In Fifo Scheduler, node heartbeat in between creating app and attempt causes NPE
[ https://issues.apache.org/jira/browse/YARN-1986?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13998194#comment-13998194 ] Hudson commented on YARN-1986: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk #1779 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1779/]) YARN-1986. In Fifo Scheduler, node heartbeat in between creating app and attempt causes NPE (Hong Zhiguo via Sandy Ryza) (sandy: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1594476) * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fifo/FifoScheduler.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestFifoScheduler.java In Fifo Scheduler, node heartbeat in between creating app and attempt causes NPE Key: YARN-1986 URL: https://issues.apache.org/jira/browse/YARN-1986 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.4.0 Reporter: Jon Bringhurst Assignee: Hong Zhiguo Priority: Critical Fix For: 2.4.1 Attachments: YARN-1986-2.patch, YARN-1986-3.patch, YARN-1986-testcase.patch, YARN-1986.patch After upgrade from 2.2.0 to 2.4.0, NPE on first job start. -After RM was restarted, the job runs without a problem.- {noformat} 19:11:13,441 FATAL ResourceManager:600 - Error in handling event type NODE_UPDATE to the scheduler java.lang.NullPointerException at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fifo.FifoScheduler.assignContainers(FifoScheduler.java:462) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fifo.FifoScheduler.nodeUpdate(FifoScheduler.java:714) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fifo.FifoScheduler.handle(FifoScheduler.java:743) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fifo.FifoScheduler.handle(FifoScheduler.java:104) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$SchedulerEventDispatcher$EventProcessor.run(ResourceManager.java:591) at java.lang.Thread.run(Thread.java:744) 19:11:13,443 INFO ResourceManager:604 - Exiting, bbye.. {noformat} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1861) Both RM stuck in standby mode when automatic failover is enabled
[ https://issues.apache.org/jira/browse/YARN-1861?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13998227#comment-13998227 ] Hudson commented on YARN-1861: -- FAILURE: Integrated in Hadoop-Hdfs-trunk #1753 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/1753/]) YARN-1861. Fixed a bug in RM to reset leader-election on fencing that was causing both RMs to be stuck in standby mode when automatic failover is enabled. Contributed by Karthik Kambatla and Xuan Gong. (vinodkv: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1594356) * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/TestRMFailover.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/AdminService.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/EmbeddedElectorService.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ResourceManager.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-tests/src/test/java/org/apache/hadoop/yarn/server/MiniYARNCluster.java Both RM stuck in standby mode when automatic failover is enabled Key: YARN-1861 URL: https://issues.apache.org/jira/browse/YARN-1861 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Affects Versions: 2.4.0 Reporter: Arpit Gupta Assignee: Karthik Kambatla Priority: Blocker Fix For: 2.4.1 Attachments: YARN-1861.2.patch, YARN-1861.3.patch, YARN-1861.4.patch, YARN-1861.5.patch, YARN-1861.7.patch, yarn-1861-1.patch, yarn-1861-6.patch In our HA tests we noticed that the tests got stuck because both RM's got into standby state and no one became active. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1368) Common work to re-populate containers’ state into scheduler
[ https://issues.apache.org/jira/browse/YARN-1368?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13998164#comment-13998164 ] Alejandro Abdelnur commented on YARN-1368: -- [~jianhe], I understand the patch is taking a different approach, which is based on the work Anubhav started. Instead of hijacking the JIRA, the correct way would have been to propose the changes to the assignee/author of the original patch and to offer to contribute/break down tasks. Please do so next time. Common work to re-populate containers’ state into scheduler --- Key: YARN-1368 URL: https://issues.apache.org/jira/browse/YARN-1368 Project: Hadoop YARN Issue Type: Sub-task Reporter: Bikas Saha Assignee: Jian He Attachments: YARN-1368.1.patch, YARN-1368.2.patch, YARN-1368.combined.001.patch, YARN-1368.preliminary.patch YARN-1367 adds support for the NM to tell the RM about all currently running containers upon registration. The RM needs to send this information to the schedulers along with the NODE_ADDED_EVENT so that the schedulers can recover the current allocation state of the cluster. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1751) Improve MiniYarnCluster for log aggregation testing
[ https://issues.apache.org/jira/browse/YARN-1751?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13998171#comment-13998171 ] Hudson commented on YARN-1751: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk #1779 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1779/]) YARN-1751. Improve MiniYarnCluster for log aggregation testing. Contributed by Ming Ma (jlowe: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1594275) * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-tests/src/test/java/org/apache/hadoop/yarn/server/MiniYARNCluster.java Improve MiniYarnCluster for log aggregation testing --- Key: YARN-1751 URL: https://issues.apache.org/jira/browse/YARN-1751 Project: Hadoop YARN Issue Type: Improvement Components: nodemanager Reporter: Ming Ma Assignee: Ming Ma Fix For: 3.0.0, 2.5.0 Attachments: YARN-1751-trunk.patch, YARN-1751.patch MiniYarnCluster specifies an individual remote log aggregation root dir for each NM. Test code that uses MiniYarnCluster won't be able to get the value of the log aggregation root dir. The following code isn't necessary in MiniYarnCluster: File remoteLogDir = new File(testWorkDir, MiniYARNCluster.this.getName() + "-remoteLogDir-nm-" + index); remoteLogDir.mkdir(); config.set(YarnConfiguration.NM_REMOTE_APP_LOG_DIR, remoteLogDir.getAbsolutePath()); In LogCLIHelpers.java, dumpAllContainersLogs should pass its conf object to the FileContext.getFileContext() call. -- This message was sent by Atlassian JIRA (v6.2#6252)
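The last point, passing the configuration into FileContext, amounts to roughly the following pattern; this is a hedged sketch, not the exact LogCLIHelpers diff:
{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileContext;

class LogDirAccess {
  // Before (drops any custom settings, e.g. the configured default filesystem):
  //   FileContext fc = FileContext.getFileContext();
  // After: hand the tool's own Configuration to FileContext.
  static FileContext open(Configuration conf) throws Exception {
    return FileContext.getFileContext(conf);
  }
}
{code}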
[jira] [Commented] (YARN-2016) Yarn getApplicationRequest start time range is not honored
[ https://issues.apache.org/jira/browse/YARN-2016?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13998507#comment-13998507 ] Junping Du commented on YARN-2016: -- bq. It would be good to have a unit test as I mentioned before. The test case I uploaded was specific to one issue, but tests covering the directions of the wire transfers and the like would also be useful. Maybe that is something I will consider adding. [~venkatnrangan], you are right that an end-to-end functional test (covering the whole process of client, wire and server) like your demo test code is also very helpful. It would be great if you could file a JIRA and contribute it. I will help to review it. Thanks! Yarn getApplicationRequest start time range is not honored -- Key: YARN-2016 URL: https://issues.apache.org/jira/browse/YARN-2016 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.4.0 Reporter: Venkat Ranganathan Assignee: Junping Du Fix For: 2.4.1 Attachments: YARN-2016.patch, YarnTest.java When we query for previous applications by creating an instance of GetApplicationsRequest and setting the start time range and application tag, we see that the start range provided is not honored and all applications with the tag are returned. Attaching a reproducer. -- This message was sent by Atlassian JIRA (v6.2#6252)
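As a rough sketch of what such a test would exercise when building the request (method names as recalled from the 2.4 client API; treat them as assumptions):
{code}
import java.util.Collections;
import org.apache.hadoop.yarn.api.protocolrecords.GetApplicationsRequest;

class AppQueryExample {
  static GetApplicationsRequest buildRequest(long submittedAfterMs) {
    GetApplicationsRequest request = GetApplicationsRequest.newInstance();
    // Only applications carrying this tag...
    request.setApplicationTags(Collections.singleton("my-workflow-tag"));
    // ...and started within this window should come back from
    // ApplicationClientProtocol#getApplications.
    request.setStartRange(submittedAfterMs, System.currentTimeMillis());
    return request;
  }
}
{code}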
[jira] [Commented] (YARN-556) RM Restart phase 2 - Work preserving restart
[ https://issues.apache.org/jira/browse/YARN-556?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13998513#comment-13998513 ] Tsuyoshi OZAWA commented on YARN-556: - {code} Oh. Forgot to mention that. Anubhav Dhoot offered to split up the prototype into multiple patches, one for each of the sub-tasks. If I understand right, his prototype covers almost all the sub-tasks already created. {code} [~adhoot], thanks for your great work. I noticed that you attached a patch on YARN-1367. I'll comment there about the patch. RM Restart phase 2 - Work preserving restart Key: YARN-556 URL: https://issues.apache.org/jira/browse/YARN-556 Project: Hadoop YARN Issue Type: New Feature Components: resourcemanager Reporter: Bikas Saha Assignee: Bikas Saha Attachments: Work Preserving RM Restart.pdf, WorkPreservingRestartPrototype.001.patch YARN-128 covered storing the state needed for the RM to recover critical information. This umbrella jira will track changes needed to recover the running state of the cluster so that work can be preserved across RM restarts. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-2055) Preemption: Jobs are failing due to AMs are getting launched and killed multiple times
[ https://issues.apache.org/jira/browse/YARN-2055?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinod Kumar Vavilapalli updated YARN-2055: -- Target Version/s: 2.5.0 Fix Version/s: (was: 2.1.0-beta) Preemption: Jobs are failing due to AMs are getting launched and killed multiple times -- Key: YARN-2055 URL: https://issues.apache.org/jira/browse/YARN-2055 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Affects Versions: 2.4.0 Reporter: Mayank Bansal If Queue A does not have enough capacity to run the AM, the AM will borrow capacity from Queue B. In that case the AM will be killed when Queue B reclaims its capacity, then launched and killed again, and the job will eventually fail. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2017) Merge some of the common lib code in schedulers
[ https://issues.apache.org/jira/browse/YARN-2017?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13998321#comment-13998321 ] Hadoop QA commented on YARN-2017: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12644678/YARN-2017.3.patch against trunk revision . {color:red}-1 patch{color}. The patch command could not apply the patch. Console output: https://builds.apache.org/job/PreCommit-YARN-Build/3745//console This message is automatically generated. Merge some of the common lib code in schedulers --- Key: YARN-2017 URL: https://issues.apache.org/jira/browse/YARN-2017 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Jian He Assignee: Jian He Attachments: YARN-2017.1.patch, YARN-2017.2.patch, YARN-2017.3.patch A bunch of same code is repeated among schedulers, e.g: between FicaSchedulerNode and FSSchedulerNode. It's good to merge and share them in a common base. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1550) NPE in FairSchedulerAppsBlock#render
[ https://issues.apache.org/jira/browse/YARN-1550?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13997742#comment-13997742 ] Karthik Kambatla commented on YARN-1550: The patch doesn't apply anymore. [~fengshen] - mind updating the patch against latest trunk? NPE in FairSchedulerAppsBlock#render Key: YARN-1550 URL: https://issues.apache.org/jira/browse/YARN-1550 Project: Hadoop YARN Issue Type: Bug Components: scheduler Affects Versions: 2.2.0 Reporter: caolong Priority: Critical Fix For: 2.2.1 Attachments: YARN-1550.patch Three steps: 1、debug at RMAppManager#submitApplication after the code if (rmContext.getRMApps().putIfAbsent(applicationId, application) != null) { String message = "Application with id " + applicationId + " is already present! Cannot add a duplicate!"; LOG.warn(message); throw RPCUtil.getRemoteException(message); } 2、submit one application: hadoop jar ~/hadoop-current/share/hadoop/mapreduce/hadoop-mapreduce-client-jobclient-2.0.0-ydh2.2.0-tests.jar sleep -Dhadoop.job.ugi=test2,#11 -Dmapreduce.job.queuename=p1 -m 1 -mt 1 -r 1 3、go to page http://ip:50030/cluster/scheduler and find a 500 ERROR! The log: {noformat} 2013-12-30 11:51:43,795 ERROR org.apache.hadoop.yarn.webapp.Dispatcher: error handling URI: /cluster/scheduler java.lang.reflect.InvocationTargetException at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) Caused by: java.lang.NullPointerException at org.apache.hadoop.yarn.server.resourcemanager.webapp.FairSchedulerAppsBlock.render(FairSchedulerAppsBlock.java:96) at org.apache.hadoop.yarn.webapp.view.HtmlBlock.render(HtmlBlock.java:66) at org.apache.hadoop.yarn.webapp.view.HtmlBlock.renderPartial(HtmlBlock.java:76) {noformat} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-2048) List all of the containers of an application from the yarn web
[ https://issues.apache.org/jira/browse/YARN-2048?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Min Zhou updated YARN-2048: --- Affects Version/s: 2.5.0 2.3.0 2.4.0 List all of the containers of an application from the yarn web -- Key: YARN-2048 URL: https://issues.apache.org/jira/browse/YARN-2048 Project: Hadoop YARN Issue Type: Improvement Components: resourcemanager, webapp Affects Versions: 2.3.0, 2.4.0, 2.5.0 Reporter: Min Zhou Attachments: YARN-2048-trunk-v1.patch Currently, YARN doesn't provide a way to list all of the containers of an application from its web UI. This kind of information is needed by the application user. They can conveniently know how many containers their applications have already acquired as well as which nodes those containers were launched on. They also want to view the logs of each container of an application. One approach is to maintain a container list in RMAppImpl and expose this info on the Application page. I will submit a patch soon. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1489) [Umbrella] Work-preserving ApplicationMaster restart
[ https://issues.apache.org/jira/browse/YARN-1489?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13993962#comment-13993962 ] Karthik Kambatla commented on YARN-1489: Created a couple of sub-tasks based on an offline discussion with Anubhav, Bikas, Jian and Vinod. [Umbrella] Work-preserving ApplicationMaster restart Key: YARN-1489 URL: https://issues.apache.org/jira/browse/YARN-1489 Project: Hadoop YARN Issue Type: Bug Reporter: Vinod Kumar Vavilapalli Assignee: Vinod Kumar Vavilapalli Attachments: Work preserving AM restart.pdf Today if AMs go down, - RM kills all the containers of that ApplicationAttempt - New ApplicationAttempt doesn't know where the previous containers are running - Old running containers don't know where the new AM is running. We need to fix this to enable work-preserving AM restart. The latter two can potentially be done at the app level, but it is good to have a common solution for all apps wherever possible. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2036) Document yarn.resourcemanager.hostname in ClusterSetup
[ https://issues.apache.org/jira/browse/YARN-2036?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13993888#comment-13993888 ] Karthik Kambatla commented on YARN-2036: Looks good to me. +1, pending Jenkins. Document yarn.resourcemanager.hostname in ClusterSetup -- Key: YARN-2036 URL: https://issues.apache.org/jira/browse/YARN-2036 Project: Hadoop YARN Issue Type: Bug Components: documentation Affects Versions: 2.4.0 Reporter: Karthik Kambatla Assignee: Ray Chiang Priority: Minor Fix For: 2.5.0 Attachments: YARN2036-01.patch, YARN2036-02.patch ClusterSetup doesn't talk about yarn.resourcemanager.hostname - most people should just be able to use that directly. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Assigned] (YARN-1039) Add parameter for YARN resource requests to indicate long lived
[ https://issues.apache.org/jira/browse/YARN-1039?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuan Gong reassigned YARN-1039: --- Assignee: Xuan Gong Add parameter for YARN resource requests to indicate long lived - Key: YARN-1039 URL: https://issues.apache.org/jira/browse/YARN-1039 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Affects Versions: 3.0.0, 2.1.1-beta Reporter: Steve Loughran Assignee: Xuan Gong Priority: Minor A container request could support a new parameter long-lived. This could be used by a scheduler that would know not to host the service on a transient (cloud: spot priced) node. Schedulers could also decide whether or not to allocate multiple long-lived containers on the same node -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1039) Add parameter for YARN resource requests to indicate long lived
[ https://issues.apache.org/jira/browse/YARN-1039?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13998396#comment-13998396 ] Xuan Gong commented on YARN-1039: - Start to work on it. Will provide a proposal soon. Add parameter for YARN resource requests to indicate long lived - Key: YARN-1039 URL: https://issues.apache.org/jira/browse/YARN-1039 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Affects Versions: 3.0.0, 2.1.1-beta Reporter: Steve Loughran Assignee: Xuan Gong Priority: Minor A container request could support a new parameter long-lived. This could be used by a scheduler that would know not to host the service on a transient (cloud: spot priced) node. Schedulers could also decide whether or not to allocate multiple long-lived containers on the same node -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-2053) Slider AM fails to restart: NPE in RegisterApplicationMasterResponseProto$Builder.addAllNmTokensFromPreviousAttempts
[ https://issues.apache.org/jira/browse/YARN-2053?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wangda Tan updated YARN-2053: - Attachment: YARN-2053.patch Slider AM fails to restart: NPE in RegisterApplicationMasterResponseProto$Builder.addAllNmTokensFromPreviousAttempts Key: YARN-2053 URL: https://issues.apache.org/jira/browse/YARN-2053 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.4.0 Reporter: Sumit Mohanty Attachments: YARN-2053.patch, yarn-yarn-nodemanager-c6403.ambari.apache.org.log.bak, yarn-yarn-resourcemanager-c6403.ambari.apache.org.log.bak Slider AppMaster restart fails with the following: {code} org.apache.hadoop.yarn.proto.YarnServiceProtos$RegisterApplicationMasterResponseProto$Builder.addAllNmTokensFromPreviousAttempts(YarnServiceProtos.java:2700) {code} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1227) Update Single Cluster doc to use yarn.resourcemanager.hostname
[ https://issues.apache.org/jira/browse/YARN-1227?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13992678#comment-13992678 ] Akira AJISAKA commented on YARN-1227: - The Single Cluster doc was updated in HADOOP-10139 to set the minimal configuration, and that's why yarn.resourcemanager.address, yarn.resourcemanager.scheduler.address, etc., were removed. Update Single Cluster doc to use yarn.resourcemanager.hostname -- Key: YARN-1227 URL: https://issues.apache.org/jira/browse/YARN-1227 Project: Hadoop YARN Issue Type: Improvement Components: documentation Affects Versions: 2.1.0-beta Reporter: Sandy Ryza Assignee: Ray Chiang Labels: newbie Now that yarn.resourcemanager.hostname can be used in place of yarn.resourcemanager.address, yarn.resourcemanager.scheduler.address, etc., we should update the doc to use it. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2022) Preempting an Application Master container can be kept as least priority when multiple applications are marked for preemption by ProportionalCapacityPreemptionPolicy
[ https://issues.apache.org/jira/browse/YARN-2022?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13996282#comment-13996282 ] Carlo Curino commented on YARN-2022: Sunil, The problem with AM_CONTAINER_PRIORITY is that it is just a shortcut for setting Priority = 0; the user can easily do so from their own code, and unless there are explicit checks that prevent a ResourceRequest from assigning priority = 0 to all of its containers, we have no defense against user abuses. The two options I see are: * we track which container is the AM by some means other than Priority and protect the AM container from preemption whenever possible * we assign a quota of protected-from-preemption containers, and save whichever containers have the lowest priority and fit within the quota. This way the user can specify multiple containers at Priority=0 (think a replicated AM or some other critical service for the job) and we will save as many of those as fit in the quota. I think we are agreeing on max-am-percentage... the final goal is to make sure that after preemption the max-am-resource-percent is respected (i.e., no more than a certain amount of the queue is dedicated to AMs). The problem with user-limit-factor goes like this: * Given a queue A of capacity: 10%, max-capacity = 50%, and user-limit-factor = 2 (i.e., a single user can go up to 20% of total resources) * Only one user is active in this queue and it gets 20% of resources (this also requires low activity in other queues) * The overall cluster capacity is reduced (e.g., a failing rack) or a refresh of the queues has reduced this queue's capacity * The LeafQueue scheduler keeps skipping the scheduling for this user (since the user is now over the user-limit-factor) although no other user in the cluster is asking for resources * If we ever get to this situation with the user holding only AMs, the system is completely wedged, with the AMs waiting for more containers, and the system systematically skipping this user (as the user is above the user-limit-factor). If preemption proceeds by systematically killing resources *including* AMs, the chances of this happening are rather low (the head of the queue is only AMs, while the tail contained AMs and other containers), but as we save AMs from preemption, this bad corner case is maybe a little more likely to happen. What I am trying to get at with my comments is that as we try to evolve preemption further, we should look at all the invariants of a queue, and try to make sure that our preemption policy can re-establish not only the capacity invariant but also all the other invariants. The CS relies on those invariants heavily, and misbehaves if they are violated. An example of this is YARN-1957, where we introduce better handling for max-capacity and zero-size queues. The changes you are proposing are not creating the problem, just making it more likely to happen in practice. A well tuned CS and reasonable load are unlikely to trigger this, but we should build for robustness as much as possible, since we cannot rely on users to understand these internals and tune the CS defensively. [~acmurthy] any thoughts on this? 
Preempting an Application Master container can be kept as least priority when multiple applications are marked for preemption by ProportionalCapacityPreemptionPolicy - Key: YARN-2022 URL: https://issues.apache.org/jira/browse/YARN-2022 Project: Hadoop YARN Issue Type: Improvement Components: resourcemanager Affects Versions: 2.4.0 Reporter: Sunil G Assignee: Sunil G Attachments: Yarn-2022.1.patch Cluster Size = 16GB [2 NMs] Queue A Capacity = 50% Queue B Capacity = 50% Consider there are 3 applications running in Queue A which have taken the full cluster capacity. J1 = 2GB AM + 1GB * 4 Maps J2 = 2GB AM + 1GB * 4 Maps J3 = 2GB AM + 1GB * 2 Maps Another job J4 is submitted in Queue B [J4 needs a 2GB AM + 1GB * 2 Maps]. Currently in this scenario, job J3 will get killed, including its AM. It would be better if the AM could be given the least priority among multiple applications. In this same scenario, map tasks from J3 and J2 could be preempted instead. Later, when the cluster is free, maps can be allocated to these jobs. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2048) List all of the containers of an application from the yarn web
[ https://issues.apache.org/jira/browse/YARN-2048?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13996122#comment-13996122 ] Wangda Tan commented on YARN-2048: -- Thanks [~zjshen] and [~coderplay] for the explanation. Now I understand their context. List all of the containers of an application from the yarn web -- Key: YARN-2048 URL: https://issues.apache.org/jira/browse/YARN-2048 Project: Hadoop YARN Issue Type: Improvement Components: resourcemanager, webapp Affects Versions: 2.3.0, 2.4.0, 2.5.0 Reporter: Min Zhou Attachments: YARN-2048-trunk-v1.patch Currently, YARN doesn't provide a way to list all of the containers of an application from its web UI. This kind of information is needed by the application user. They can conveniently know how many containers their applications have already acquired as well as which nodes those containers were launched on. They also want to view the logs of each container of an application. One approach is to maintain a container list in RMAppImpl and expose this info on the Application page. I will submit a patch soon. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1702) Expose kill app functionality as part of RM web services
[ https://issues.apache.org/jira/browse/YARN-1702?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13992718#comment-13992718 ] Hadoop QA commented on YARN-1702: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12643931/apache-yarn-1702.9.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 2 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:red}-1 release audit{color}. The applied patch generated 1 release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/3719//testReport/ Release audit warnings: https://builds.apache.org/job/PreCommit-YARN-Build/3719//artifact/trunk/patchprocess/patchReleaseAuditProblems.txt Console output: https://builds.apache.org/job/PreCommit-YARN-Build/3719//console This message is automatically generated. Expose kill app functionality as part of RM web services Key: YARN-1702 URL: https://issues.apache.org/jira/browse/YARN-1702 Project: Hadoop YARN Issue Type: Sub-task Reporter: Varun Vasudev Assignee: Varun Vasudev Attachments: apache-yarn-1702.2.patch, apache-yarn-1702.3.patch, apache-yarn-1702.4.patch, apache-yarn-1702.5.patch, apache-yarn-1702.7.patch, apache-yarn-1702.8.patch, apache-yarn-1702.9.patch Expose functionality to kill an app via the ResourceManager web services API. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1981) Nodemanager version is not updated when a node reconnects
[ https://issues.apache.org/jira/browse/YARN-1981?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13998229#comment-13998229 ] Hudson commented on YARN-1981: -- FAILURE: Integrated in Hadoop-Hdfs-trunk #1753 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/1753/]) YARN-1981. Nodemanager version is not updated when a node reconnects (Jason Lowe via jeagles) (jeagles: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1594358) * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmnode/RMNodeImpl.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestRMNodeTransitions.java Nodemanager version is not updated when a node reconnects - Key: YARN-1981 URL: https://issues.apache.org/jira/browse/YARN-1981 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.4.0 Reporter: Jason Lowe Assignee: Jason Lowe Fix For: 3.0.0, 2.5.0 Attachments: YARN-1981.patch When a nodemanager is quickly restarted and happens to change versions during the restart (e.g.: rolling upgrade scenario) the NM version as reported by the RM is not updated. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1976) Tracking url missing http protocol for FAILED application
[ https://issues.apache.org/jira/browse/YARN-1976?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13998211#comment-13998211 ] Hudson commented on YARN-1976: -- FAILURE: Integrated in Hadoop-Hdfs-trunk #1753 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/1753/]) YARN-1976. Fix CHANGES.txt for YARN-1976. (jianhe: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1594123) * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt Tracking url missing http protocol for FAILED application - Key: YARN-1976 URL: https://issues.apache.org/jira/browse/YARN-1976 Project: Hadoop YARN Issue Type: Bug Reporter: Yesha Vora Assignee: Junping Du Fix For: 2.4.1 Attachments: YARN-1976-v2.patch, YARN-1976.patch Run yarn application -list -appStates FAILED, It does not print http protocol name like FINISHED apps. {noformat} -bash-4.1$ yarn application -list -appStates FINISHED,FAILED,KILLED 14/04/15 23:55:07 INFO client.RMProxy: Connecting to ResourceManager at host Total number of applications (application-types: [] and states: [FINISHED, FAILED, KILLED]):4 Application-IdApplication-Name Application-Type User Queue State Final-State ProgressTracking-URL application_1397598467870_0004 Sleep job MAPREDUCEhrt_qa defaultFINISHED SUCCEEDED 100% http://host:19888/jobhistory/job/job_1397598467870_0004 application_1397598467870_0003 Sleep job MAPREDUCEhrt_qa defaultFINISHED SUCCEEDED 100% http://host:19888/jobhistory/job/job_1397598467870_0003 application_1397598467870_0002 Sleep job MAPREDUCEhrt_qa default FAILED FAILED 100% host:8088/cluster/app/application_1397598467870_0002 application_1397598467870_0001 word count MAPREDUCEhrt_qa defaultFINISHED SUCCEEDED 100% http://host:19888/jobhistory/job/job_1397598467870_0001 {noformat} It only prints 'host:8088/cluster/app/application_1397598467870_0002' instead 'http://host:8088/cluster/app/application_1397598467870_0002' -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1362) Distinguish between nodemanager shutdown for decommission vs shutdown for restart
[ https://issues.apache.org/jira/browse/YARN-1362?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13998225#comment-13998225 ] Hudson commented on YARN-1362: -- FAILURE: Integrated in Hadoop-Hdfs-trunk #1753 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/1753/]) YARN-1362. Distinguish between nodemanager shutdown for decommission vs shutdown for restart. (Contributed by Jason Lowe) (junping_du: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1594421) * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/Context.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/NodeManager.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/NodeStatusUpdaterImpl.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/TestNodeStatusUpdater.java Distinguish between nodemanager shutdown for decommission vs shutdown for restart - Key: YARN-1362 URL: https://issues.apache.org/jira/browse/YARN-1362 Project: Hadoop YARN Issue Type: Sub-task Components: nodemanager Affects Versions: 2.3.0 Reporter: Jason Lowe Assignee: Jason Lowe Fix For: 2.5.0 Attachments: YARN-1362.patch When a nodemanager shuts down it needs to determine if it is likely to be restarted. If a restart is likely then it needs to preserve container directories, logs, distributed cache entries, etc. If it is being shutdown more permanently (e.g.: like a decommission) then the nodemanager should cleanup directories and logs. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2027) YARN ignores host-specific resource requests
[ https://issues.apache.org/jira/browse/YARN-2027?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13997750#comment-13997750 ] Sandy Ryza commented on YARN-2027: -- Including a rack in your request will allow containers to go anywhere on the rack, even when relaxLocality is set to false. From the AMRMClient.ContainerRequest doc: "If locality relaxation is disabled, then only within the same request, a node and its rack may be specified together. This allows for a specific rack with a preference for a specific node within that rack." So try passing in the rack list as null instead of List("/default-rack").toArray[String]. YARN ignores host-specific resource requests Key: YARN-2027 URL: https://issues.apache.org/jira/browse/YARN-2027 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager, scheduler Affects Versions: 2.4.0 Environment: RHEL 6.1 YARN 2.4 Reporter: Chris Riccomini YARN appears to be ignoring host-level ContainerRequests. I am creating a container request with code that pretty closely mirrors the DistributedShell code: {code} protected def requestContainers(memMb: Int, cpuCores: Int, containers: Int) { info("Requesting %d container(s) with %dmb of memory" format (containers, memMb)) val capability = Records.newRecord(classOf[Resource]) val priority = Records.newRecord(classOf[Priority]) priority.setPriority(0) capability.setMemory(memMb) capability.setVirtualCores(cpuCores) // Specifying a host in the String[] host parameter here seems to do nothing. Setting relaxLocality to false also doesn't help. (0 until containers).foreach(idx => amClient.addContainerRequest(new ContainerRequest(capability, null, null, priority))) } {code} When I run this code with a specific host in the ContainerRequest, YARN does not honor the request. Instead, it puts the container on an arbitrary host. This appears to be true for both the FifoScheduler and the CapacityScheduler. Currently, we are running the CapacityScheduler with the following settings: {noformat}
<configuration>
  <property>
    <name>yarn.scheduler.capacity.maximum-applications</name>
    <value>1</value>
    <description> Maximum number of applications that can be pending and running. </description>
  </property>
  <property>
    <name>yarn.scheduler.capacity.maximum-am-resource-percent</name>
    <value>0.1</value>
    <description> Maximum percent of resources in the cluster which can be used to run application masters i.e. controls number of concurrent running applications. </description>
  </property>
  <property>
    <name>yarn.scheduler.capacity.resource-calculator</name>
    <value>org.apache.hadoop.yarn.util.resource.DefaultResourceCalculator</value>
    <description> The ResourceCalculator implementation to be used to compare Resources in the scheduler. The default i.e. DefaultResourceCalculator only uses Memory while DominantResourceCalculator uses dominant-resource to compare multi-dimensional resources such as Memory, CPU etc. </description>
  </property>
  <property>
    <name>yarn.scheduler.capacity.root.queues</name>
    <value>default</value>
    <description> The queues at the this level (root is the root queue). </description>
  </property>
  <property>
    <name>yarn.scheduler.capacity.root.default.capacity</name>
    <value>100</value>
    <description>Samza queue target capacity.</description>
  </property>
  <property>
    <name>yarn.scheduler.capacity.root.default.user-limit-factor</name>
    <value>1</value>
    <description> Default queue user limit a percentage from 0.0 to 1.0. </description>
  </property>
  <property>
    <name>yarn.scheduler.capacity.root.default.maximum-capacity</name>
    <value>100</value>
    <description> The maximum capacity of the default queue. </description>
  </property>
  <property>
    <name>yarn.scheduler.capacity.root.default.state</name>
    <value>RUNNING</value>
    <description> The state of the default queue. State can be one of RUNNING or STOPPED. </description>
  </property>
  <property>
    <name>yarn.scheduler.capacity.root.default.acl_submit_applications</name>
    <value>*</value>
    <description> The ACL of who can submit jobs to the default queue. </description>
  </property>
  <property>
    <name>yarn.scheduler.capacity.root.default.acl_administer_queue</name>
    <value>*</value>
    <description> The ACL of who can administer jobs on the default queue. </description>
  </property>
  <property>
    <name>yarn.scheduler.capacity.node-locality-delay</name>
    <value>40</value>
    <description> Number of missed scheduling opportunities after which the CapacityScheduler attempts to schedule rack-local containers. Typically
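Following Sandy's suggestion above, a node-specific request with locality relaxation disabled would look roughly like this in Java (the host name is a placeholder, and this is a sketch rather than the reporter's actual code):
{code}
import org.apache.hadoop.yarn.api.records.Priority;
import org.apache.hadoop.yarn.api.records.Resource;
import org.apache.hadoop.yarn.client.api.AMRMClient.ContainerRequest;
import org.apache.hadoop.yarn.util.Records;

class HostSpecificRequest {
  static ContainerRequest build(int memMb, int cores) {
    Resource capability = Records.newRecord(Resource.class);
    capability.setMemory(memMb);
    capability.setVirtualCores(cores);
    Priority priority = Records.newRecord(Priority.class);
    priority.setPriority(0);
    // Name the node, leave the rack list null, and disable locality relaxation
    // so the request stays pinned to that node.
    return new ContainerRequest(capability,
        new String[] {"desired-host.example.com"}, /* racks */ null,
        priority, /* relaxLocality */ false);
  }
}
{code}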
[jira] [Updated] (YARN-1937) Add entity-level access control of the timeline data for owners only
[ https://issues.apache.org/jira/browse/YARN-1937?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhijie Shen updated YARN-1937: -- Summary: Add entity-level access control of the timeline data for owners only (was: Access control of per-framework data) Add entity-level access control of the timeline data for owners only Key: YARN-1937 URL: https://issues.apache.org/jira/browse/YARN-1937 Project: Hadoop YARN Issue Type: Sub-task Reporter: Zhijie Shen Assignee: Zhijie Shen Attachments: YARN-1937.1.patch -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (YARN-2059) Extend access control for admin and configured user/group list
Zhijie Shen created YARN-2059: - Summary: Extend access control for admin and configured user/group list Key: YARN-2059 URL: https://issues.apache.org/jira/browse/YARN-2059 Project: Hadoop YARN Issue Type: Sub-task Reporter: Zhijie Shen Assignee: Zhijie Shen -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-896) Roll up for long-lived services in YARN
[ https://issues.apache.org/jira/browse/YARN-896?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuan Gong updated YARN-896: --- Assignee: (was: Xuan Gong) Roll up for long-lived services in YARN --- Key: YARN-896 URL: https://issues.apache.org/jira/browse/YARN-896 Project: Hadoop YARN Issue Type: New Feature Reporter: Robert Joseph Evans YARN is intended to be general purpose, but it is missing some features to be able to truly support long lived applications and long lived containers. This ticket is intended to # discuss what is needed to support long lived processes # track the resulting JIRA. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1104) NMs to support rolling logs of stdout stderr
[ https://issues.apache.org/jira/browse/YARN-1104?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13998397#comment-13998397 ] Xuan Gong commented on YARN-1104: - Start to work on it. Will provide a proposal soon. NMs to support rolling logs of stdout stderr -- Key: YARN-1104 URL: https://issues.apache.org/jira/browse/YARN-1104 Project: Hadoop YARN Issue Type: Sub-task Components: nodemanager Affects Versions: 2.1.0-beta Reporter: Steve Loughran Assignee: Xuan Gong Currently NMs stream the stdout and stderr streams of a container to a file. For longer lived processes those files need to be rotated so that the log doesn't overflow -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1986) After upgrade from 2.2.0 to 2.4.0, NPE on first job start.
[ https://issues.apache.org/jira/browse/YARN-1986?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13997224#comment-13997224 ] Sandy Ryza commented on YARN-1986: -- Sorry for being so slow on this. +1 to the change. I looked at the code for the fair and capacity schedulers and they don't seem to face the same issue. After upgrade from 2.2.0 to 2.4.0, NPE on first job start. -- Key: YARN-1986 URL: https://issues.apache.org/jira/browse/YARN-1986 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.4.0 Reporter: Jon Bringhurst Assignee: Hong Zhiguo Priority: Critical Attachments: YARN-1986-2.patch, YARN-1986-3.patch, YARN-1986-testcase.patch, YARN-1986.patch After upgrade from 2.2.0 to 2.4.0, NPE on first job start. -After RM was restarted, the job runs without a problem.- {noformat} 19:11:13,441 FATAL ResourceManager:600 - Error in handling event type NODE_UPDATE to the scheduler java.lang.NullPointerException at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fifo.FifoScheduler.assignContainers(FifoScheduler.java:462) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fifo.FifoScheduler.nodeUpdate(FifoScheduler.java:714) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fifo.FifoScheduler.handle(FifoScheduler.java:743) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fifo.FifoScheduler.handle(FifoScheduler.java:104) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$SchedulerEventDispatcher$EventProcessor.run(ResourceManager.java:591) at java.lang.Thread.run(Thread.java:744) 19:11:13,443 INFO ResourceManager:604 - Exiting, bbye.. {noformat} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (YARN-2064) MR job successful but Note: Container killed by the ApplicationMaster.
dlim created YARN-2064: -- Summary: MR job successful but Note: Container killed by the ApplicationMaster. Key: YARN-2064 URL: https://issues.apache.org/jira/browse/YARN-2064 Project: Hadoop YARN Issue Type: Bug Components: nodemanager, resourcemanager, scheduler Reporter: dlim Hi, just a short question for everyone. I have an MR job running on YARN; normally for small jobs it succeeds without any note on the URL page. However, when running a long-running job, it ends with successful status but with the note: Container killed by the ApplicationMaster. The job is still running and I hesitate to kill it. Does anyone know if it is actually successful or not? I know there is a previous post on this, but the answers are not so clear to me. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-1993) Cross-site scripting vulnerability in TextView.java
[ https://issues.apache.org/jira/browse/YARN-1993?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kenji Kikushima updated YARN-1993: -- Attachment: YARN-1993.patch For example, how about using StringEscapeUtils, as in this patch? Cross-site scripting vulnerability in TextView.java --- Key: YARN-1993 URL: https://issues.apache.org/jira/browse/YARN-1993 Project: Hadoop YARN Issue Type: Bug Components: webapp Reporter: Ted Yu Attachments: YARN-1993.patch In hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/webapp/view/TextView.java, method echo(), e.g.: {code} for (Object s : args) { out.print(s); } {code} Printing s to an HTML page allows cross-site scripting, because it is not properly sanitized for the HTML attribute context. -- This message was sent by Atlassian JIRA (v6.2#6252)
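The suggested fix boils down to escaping before printing; a minimal sketch using the commons-lang 2.x StringEscapeUtils that Hadoop already ships (not the actual patch contents):
{code}
import java.io.PrintWriter;
import org.apache.commons.lang.StringEscapeUtils;

class SafeEcho {
  // Escape each argument so user-controlled strings cannot inject markup.
  static void echo(PrintWriter out, Object... args) {
    for (Object s : args) {
      out.print(StringEscapeUtils.escapeHtml(String.valueOf(s)));
    }
  }
}
{code}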
[jira] [Assigned] (YARN-1986) After upgrade from 2.2.0 to 2.4.0, NPE on first job start.
[ https://issues.apache.org/jira/browse/YARN-1986?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sandy Ryza reassigned YARN-1986: Assignee: Sandy Ryza (was: Hong Zhiguo) After upgrade from 2.2.0 to 2.4.0, NPE on first job start. -- Key: YARN-1986 URL: https://issues.apache.org/jira/browse/YARN-1986 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.4.0 Reporter: Jon Bringhurst Assignee: Sandy Ryza Priority: Critical Attachments: YARN-1986-2.patch, YARN-1986-3.patch, YARN-1986-testcase.patch, YARN-1986.patch After upgrade from 2.2.0 to 2.4.0, NPE on first job start. -After RM was restarted, the job runs without a problem.- {noformat} 19:11:13,441 FATAL ResourceManager:600 - Error in handling event type NODE_UPDATE to the scheduler java.lang.NullPointerException at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fifo.FifoScheduler.assignContainers(FifoScheduler.java:462) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fifo.FifoScheduler.nodeUpdate(FifoScheduler.java:714) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fifo.FifoScheduler.handle(FifoScheduler.java:743) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fifo.FifoScheduler.handle(FifoScheduler.java:104) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$SchedulerEventDispatcher$EventProcessor.run(ResourceManager.java:591) at java.lang.Thread.run(Thread.java:744) 19:11:13,443 INFO ResourceManager:604 - Exiting, bbye.. {noformat} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Resolved] (YARN-2064) MR job successful but Note: Container killed by the ApplicationMaster.
[ https://issues.apache.org/jira/browse/YARN-2064?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jian He resolved YARN-2064. --- Resolution: Not a Problem Closed this. MR job successful but Note: Container killed by the ApplicationMaster. -- Key: YARN-2064 URL: https://issues.apache.org/jira/browse/YARN-2064 Project: Hadoop YARN Issue Type: Bug Components: nodemanager, resourcemanager, scheduler Reporter: dlim Hi, just a short question for everyone. I have an MR job running on YARN; normally for small jobs it succeeds without any note on the URL page. However, when running a long-running job, it ends with successful status but with the note: Container killed by the ApplicationMaster. The job is still running and I hesitate to kill it. Does anyone know if it is actually successful or not? I know there is a previous post on this, but the answers are not so clear to me. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1872) TestDistributedShell occasionally fails in trunk
[ https://issues.apache.org/jira/browse/YARN-1872?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13998493#comment-13998493 ] Binglin Chang commented on YARN-1872: - Hi, testDSShell fails with an assertion failure, don't know whether it is relevant: https://builds.apache.org/job/Hadoop-Yarn-trunk/561/consoleText testDSShell(org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell) Time elapsed: 27.557 sec FAILURE! java.lang.AssertionError: expected:<1> but was:<0> at org.junit.Assert.fail(Assert.java:88) at org.junit.Assert.failNotEquals(Assert.java:743) at org.junit.Assert.assertEquals(Assert.java:118) at org.junit.Assert.assertEquals(Assert.java:555) at org.junit.Assert.assertEquals(Assert.java:542) at org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell.testDSShell(TestDistributedShell.java:198) Results : Failed tests: TestDistributedShell.testDSShell:198 expected:<1> but was:<0> Tests run: 8, Failures: 1, Errors: 0, Skipped: 0 TestDistributedShell occasionally fails in trunk Key: YARN-1872 URL: https://issues.apache.org/jira/browse/YARN-1872 Project: Hadoop YARN Issue Type: Bug Reporter: Ted Yu Assignee: Hong Zhiguo Attachments: TestDistributedShell.out, YARN-1872.patch From https://builds.apache.org/job/Hadoop-Yarn-trunk/520/console : TestDistributedShell#testDSShellWithCustomLogPropertyFile failed and TestDistributedShell#testDSShell timed out. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (YARN-2058) .gitignore should ignore .orig and .rej files
Karthik Kambatla created YARN-2058: -- Summary: .gitignore should ignore .orig and .rej files Key: YARN-2058 URL: https://issues.apache.org/jira/browse/YARN-2058 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.4.0 Reporter: Karthik Kambatla Assignee: Karthik Kambatla .gitignore file should ignore .orig and .rej files -- This message was sent by Atlassian JIRA (v6.2#6252)
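For reference, the proposed entries would look like the following; where exactly they land inside the repository's .gitignore is an assumption:
{noformat}
# backup/reject files left behind by patch(1) and merge tools
*.orig
*.rej
{noformat}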
[jira] [Commented] (YARN-2052) ContainerId creation after work preserving restart is broken
[ https://issues.apache.org/jira/browse/YARN-2052?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13997781#comment-13997781 ] Tsuyoshi OZAWA commented on YARN-2052: -- {quote} e.g. container_XXX_1000 after epoch 1. {quote} This approach can be a compatible change. ConverterUtils.toContainerId(containerIdStr) works without any changes as long as the container id with the epoch stays under Integer.MAX_VALUE. What happens if the id overflows? Container id collisions could occur. If we can handle that correctly, this approach is a simple and good choice. I'll take some time to think about this approach. ContainerId creation after work preserving restart is broken Key: YARN-2052 URL: https://issues.apache.org/jira/browse/YARN-2052 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Tsuyoshi OZAWA Container ids are made unique by using the app identifier and appending a monotonically increasing sequence number to it. Since container creation is a high-churn activity, the RM does not store the sequence number per app. So after a restart it does not know what the new sequence number should be for new allocations. -- This message was sent by Atlassian JIRA (v6.2#6252)
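To make the epoch idea quoted above concrete, here is a toy sketch of folding a restart epoch into the container id's sequence number so that ids stay unique across RM restarts; the digit layout and constants are illustrative assumptions, not the design that was eventually committed:
{code}
// Toy sketch only: epoch 1 + sequence 1000 maps to 1001000, so ids issued by
// different RM "epochs" never collide. The six-digit span is an assumption.
public final class EpochContainerIdSketch {
  private static final long SEQUENCE_SPAN = 1000000L;

  public static long toUniqueId(long epoch, long sequenceNumber) {
    if (sequenceNumber >= SEQUENCE_SPAN) {
      // Spilling into the next epoch's range would cause exactly the
      // collision the comment above worries about, so fail loudly.
      throw new IllegalStateException("sequence number overflow: " + sequenceNumber);
    }
    return epoch * SEQUENCE_SPAN + sequenceNumber;
  }

  public static void main(String[] args) {
    System.out.println(toUniqueId(0, 1000)); // 1000
    System.out.println(toUniqueId(1, 1000)); // 1001000
  }
}
{code}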
[jira] [Resolved] (YARN-373) Allow an AM to reuse the resources allocated to container for a new container
[ https://issues.apache.org/jira/browse/YARN-373?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alejandro Abdelnur resolved YARN-373. - Resolution: Won't Fix [doing self-clean up of JIRAs] Allow an AM to reuse the resources allocated to container for a new container - Key: YARN-373 URL: https://issues.apache.org/jira/browse/YARN-373 Project: Hadoop YARN Issue Type: Improvement Components: resourcemanager Affects Versions: 2.0.3-alpha Reporter: Alejandro Abdelnur Assignee: Alejandro Abdelnur When a container completes, instead of the corresponding resources being freed up, it should be possible for the AM to reuse the assigned resources for a new container. As part of the reallocation, the AM would notify the RM about partial resources being freed up and the RM would make the necessary corrections on the corresponding node. With this functionality, an AM can ensure it gets a container on the same node where previous containers ran. This will allow getting rid of the ShuffleHandler as a service in the NMs and running it as a regular container task of the corresponding AM. In this case, the reallocation would reduce the CPU/MEM obtained for the original container to what is needed for serving the shuffle. Note that in this example the MR AM would only do this reallocation for one of the many tasks that may have run on a particular node (as a single shuffle task could serve all the map outputs from all map tasks run on that node). -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1981) Nodemanager version is not updated when a node reconnects
[ https://issues.apache.org/jira/browse/YARN-1981?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13998191#comment-13998191 ] Hudson commented on YARN-1981: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk #1779 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1779/]) YARN-1981. Nodemanager version is not updated when a node reconnects (Jason Lowe via jeagles) (jeagles: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1594358) * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmnode/RMNodeImpl.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestRMNodeTransitions.java Nodemanager version is not updated when a node reconnects - Key: YARN-1981 URL: https://issues.apache.org/jira/browse/YARN-1981 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.4.0 Reporter: Jason Lowe Assignee: Jason Lowe Fix For: 3.0.0, 2.5.0 Attachments: YARN-1981.patch When a nodemanager is quickly restarted and happens to change versions during the restart (e.g.: rolling upgrade scenario) the NM version as reported by the RM is not updated. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1362) Distinguish between nodemanager shutdown for decommission vs shutdown for restart
[ https://issues.apache.org/jira/browse/YARN-1362?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13998187#comment-13998187 ] Hudson commented on YARN-1362: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk #1779 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1779/]) YARN-1362. Distinguish between nodemanager shutdown for decommission vs shutdown for restart. (Contributed by Jason Lowe) (junping_du: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1594421) * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/Context.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/NodeManager.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/NodeStatusUpdaterImpl.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/TestNodeStatusUpdater.java Distinguish between nodemanager shutdown for decommission vs shutdown for restart - Key: YARN-1362 URL: https://issues.apache.org/jira/browse/YARN-1362 Project: Hadoop YARN Issue Type: Sub-task Components: nodemanager Affects Versions: 2.3.0 Reporter: Jason Lowe Assignee: Jason Lowe Fix For: 2.5.0 Attachments: YARN-1362.patch When a nodemanager shuts down it needs to determine if it is likely to be restarted. If a restart is likely then it needs to preserve container directories, logs, distributed cache entries, etc. If it is being shutdown more permanently (e.g.: like a decommission) then the nodemanager should cleanup directories and logs. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2011) Fix typo and warning in TestLeafQueue
[ https://issues.apache.org/jira/browse/YARN-2011?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13998188#comment-13998188 ] Hudson commented on YARN-2011: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk #1779 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1779/]) YARN-2011. Fix typo and warning in TestLeafQueue (Contributed by Chen He) (junping_du: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1593804) * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestLeafQueue.java Fix typo and warning in TestLeafQueue - Key: YARN-2011 URL: https://issues.apache.org/jira/browse/YARN-2011 Project: Hadoop YARN Issue Type: Test Affects Versions: 2.4.0 Reporter: Chen He Assignee: Chen He Priority: Trivial Fix For: 2.5.0 Attachments: YARN-2011-v2.patch, YARN-2011.patch a.assignContainers(clusterResource, node_0); assertEquals(2*GB, a.getUsedResources().getMemory()); assertEquals(2*GB, app_0.getCurrentConsumption().getMemory()); assertEquals(0*GB, app_1.getCurrentConsumption().getMemory()); assertEquals(0*GB, app_0.getHeadroom().getMemory()); // User limit = 2G assertEquals(0*GB, app_0.getHeadroom().getMemory()); // User limit = 2G // Again one to user_0 since he hasn't exceeded user limit yet a.assignContainers(clusterResource, node_0); assertEquals(3*GB, a.getUsedResources().getMemory()); assertEquals(2*GB, app_0.getCurrentConsumption().getMemory()); assertEquals(1*GB, app_1.getCurrentConsumption().getMemory()); assertEquals(0*GB, app_0.getHeadroom().getMemory()); // 3G - 2G assertEquals(0*GB, app_0.getHeadroom().getMemory()); // 3G - 2G -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1982) Rename the daemon name to timelineserver
[ https://issues.apache.org/jira/browse/YARN-1982?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13998190#comment-13998190 ] Hudson commented on YARN-1982: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk #1779 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1779/]) YARN-1982. Renamed the daemon name to be TimelineServer instead of History Server and deprecated the old usage. Contributed by Zhijie Shen. (vinodkv: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1593748) * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/bin/yarn * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/bin/yarn.cmd * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/conf/yarn-env.sh Rename the daemon name to timelineserver Key: YARN-1982 URL: https://issues.apache.org/jira/browse/YARN-1982 Project: Hadoop YARN Issue Type: Sub-task Affects Versions: 3.0.0, 2.4.0 Reporter: Zhijie Shen Assignee: Zhijie Shen Labels: cli Fix For: 2.5.0 Attachments: YARN-1982.1.patch Nowadays, it's confusing that we call the new component the timeline server, but we use {code} yarn historyserver yarn-daemon.sh start historyserver {code} to start the daemon. Before the confusion keeps propagating, we'd better modify the command line ASAP. -- This message was sent by Atlassian JIRA (v6.2#6252)
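Going by the commit message above, the renamed invocation should look roughly like the following, with the old historyserver form deprecated rather than removed; treat the exact syntax as an assumption:
{code}
yarn timelineserver
yarn-daemon.sh start timelineserver
{code}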
[jira] [Commented] (YARN-1861) Both RM stuck in standby mode when automatic failover is enabled
[ https://issues.apache.org/jira/browse/YARN-1861?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13998189#comment-13998189 ] Hudson commented on YARN-1861: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk #1779 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1779/]) YARN-1861. Fixed a bug in RM to reset leader-election on fencing that was causing both RMs to be stuck in standby mode when automatic failover is enabled. Contributed by Karthik Kambatla and Xuan Gong. (vinodkv: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1594356) * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/TestRMFailover.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/AdminService.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/EmbeddedElectorService.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ResourceManager.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-tests/src/test/java/org/apache/hadoop/yarn/server/MiniYARNCluster.java Both RM stuck in standby mode when automatic failover is enabled Key: YARN-1861 URL: https://issues.apache.org/jira/browse/YARN-1861 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Affects Versions: 2.4.0 Reporter: Arpit Gupta Assignee: Karthik Kambatla Priority: Blocker Fix For: 2.4.1 Attachments: YARN-1861.2.patch, YARN-1861.3.patch, YARN-1861.4.patch, YARN-1861.5.patch, YARN-1861.7.patch, yarn-1861-1.patch, yarn-1861-6.patch In our HA tests we noticed that the tests got stuck because both RM's got into standby state and no one became active. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2017) Merge some of the common lib code in schedulers
[ https://issues.apache.org/jira/browse/YARN-2017?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13995898#comment-13995898 ] Sandy Ryza commented on YARN-2017: -- Thanks for working on this Jian. A couple questions: Why take out the header comment in SchedulerNode? Can we use generics to avoid all the casting (and findbugs)? I.e. class CapacityScheduler extends AbstractYarnScheduler<FiCaSchedulerApp, FiCaSchedulerNode>? Merge some of the common lib code in schedulers --- Key: YARN-2017 URL: https://issues.apache.org/jira/browse/YARN-2017 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Jian He Assignee: Jian He Attachments: YARN-2017.1.patch A bunch of same code is repeated among schedulers, e.g: between FicaSchedulerNode and FSSchedulerNode. It's good to merge and share them in a common base. -- This message was sent by Atlassian JIRA (v6.2#6252)
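A self-contained toy illustration of the generics Sandy suggests: parameterizing the shared base class by the concrete node type removes the downcasts in each scheduler. The class names below are simplified stand-ins, not the actual YARN classes or the patch:
{code}
import java.util.HashMap;
import java.util.Map;

// Stand-ins for SchedulerNode and its scheduler-specific subclasses.
class Node { }
class CapacityNode extends Node { }

abstract class BaseScheduler<N extends Node> {
  protected final Map<String, N> nodes = new HashMap<String, N>();

  protected N getNode(String nodeId) {
    return nodes.get(nodeId); // already typed as N, so subclasses never cast
  }
}

class TypedCapacityScheduler extends BaseScheduler<CapacityNode> {
  CapacityNode pick(String nodeId) {
    return getNode(nodeId);   // without the type parameter this would need a cast
  }
}
{code}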
[jira] [Commented] (YARN-2049) Delegation token stuff for the timeline server
[ https://issues.apache.org/jira/browse/YARN-2049?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13995746#comment-13995746 ] Hadoop QA commented on YARN-2049: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12644497/YARN-2049.1.patch against trunk revision . {color:red}-1 patch{color}. The patch command could not apply the patch. Console output: https://builds.apache.org/job/PreCommit-YARN-Build/3738//console This message is automatically generated. Delegation token stuff for the timeline server - Key: YARN-2049 URL: https://issues.apache.org/jira/browse/YARN-2049 Project: Hadoop YARN Issue Type: Sub-task Reporter: Zhijie Shen Assignee: Zhijie Shen Attachments: YARN-2049.1.patch -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-2017) Merge some of the common lib code in schedulers
[ https://issues.apache.org/jira/browse/YARN-2017?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jian He updated YARN-2017: -- Attachment: YARN-2017.2.patch Merge some of the common lib code in schedulers --- Key: YARN-2017 URL: https://issues.apache.org/jira/browse/YARN-2017 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Jian He Assignee: Jian He Attachments: YARN-2017.1.patch, YARN-2017.2.patch A bunch of same code is repeated among schedulers, e.g: between FicaSchedulerNode and FSSchedulerNode. It's good to merge and share them in a common base. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-352) Inconsistent picture of how a container was killed when querying RM and NM in case of preemption
[ https://issues.apache.org/jira/browse/YARN-352?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinod Kumar Vavilapalli updated YARN-352: - Issue Type: Sub-task (was: Bug) Parent: YARN-45 Inconsistent picture of how a container was killed when querying RM and NM in case of preemption Key: YARN-352 URL: https://issues.apache.org/jira/browse/YARN-352 Project: Hadoop YARN Issue Type: Sub-task Reporter: Hitesh Shah When the RM preempts a container, it records the exit status as -100. However, the NM registers the preempted container's exit status as simply killed externally via SIGTERM or SIGKILL. When the AM queries the RM and NM for the same container's status, it will get two different values. When killing a container, the exit reason should be made more explicit via an exit status code the AM can act on, in addition to the diagnostic messages that can contain more detailed information (though probably not programmatically interpretable by the AM). -- This message was sent by Atlassian JIRA (v6.2#6252)
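To illustrate what a better-defined exit status would buy the AM, here is a hedged sketch of AM-side handling. It assumes a dedicated constant such as ContainerExitStatus.PREEMPTED (which appeared in later YARN releases), and the handler methods are hypothetical:
{code}
import org.apache.hadoop.yarn.api.records.ContainerExitStatus;
import org.apache.hadoop.yarn.api.records.ContainerStatus;

// Sketch: branch on a well-defined exit status instead of parsing diagnostics.
public class CompletedContainerHandlerSketch {
  void onContainerCompleted(ContainerStatus status) {
    int exitStatus = status.getExitStatus();
    if (exitStatus == ContainerExitStatus.PREEMPTED) {
      handlePreemption(status); // e.g. reschedule without counting a task failure
    } else if (exitStatus != ContainerExitStatus.SUCCESS) {
      handleFailure(status);    // diagnostics carry extra, human-readable detail
    }
  }

  void handlePreemption(ContainerStatus status) { /* hypothetical AM hook */ }
  void handleFailure(ContainerStatus status) { /* hypothetical AM hook */ }
}
{code}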
[jira] [Commented] (YARN-1751) Improve MiniYarnCluster for log aggregation testing
[ https://issues.apache.org/jira/browse/YARN-1751?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13995739#comment-13995739 ] Hadoop QA commented on YARN-1751: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12644486/YARN-1751.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-tests. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/3736//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/3736//console This message is automatically generated. Improve MiniYarnCluster for log aggregation testing --- Key: YARN-1751 URL: https://issues.apache.org/jira/browse/YARN-1751 Project: Hadoop YARN Issue Type: Improvement Components: nodemanager Reporter: Ming Ma Assignee: Ming Ma Attachments: YARN-1751-trunk.patch, YARN-1751.patch MiniYarnCluster specifies an individual remote log aggregation root dir for each NM. Test code that uses MiniYarnCluster won't be able to get the value of the log aggregation root dir. The following code isn't necessary in MiniYarnCluster: File remoteLogDir = new File(testWorkDir, MiniYARNCluster.this.getName() + "-remoteLogDir-nm-" + index); remoteLogDir.mkdir(); config.set(YarnConfiguration.NM_REMOTE_APP_LOG_DIR, remoteLogDir.getAbsolutePath()); In LogCLIHelpers.java, dumpAllContainersLogs should pass its conf object to the FileContext.getFileContext() call. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-1937) Access control of per-framework data
[ https://issues.apache.org/jira/browse/YARN-1937?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhijie Shen updated YARN-1937: -- Issue Type: Sub-task (was: Bug) Parent: YARN-1935 Access control of per-framework data Key: YARN-1937 URL: https://issues.apache.org/jira/browse/YARN-1937 Project: Hadoop YARN Issue Type: Sub-task Reporter: Zhijie Shen Assignee: Zhijie Shen -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2050) Fix LogCLIHelpers to create the correct FileContext
[ https://issues.apache.org/jira/browse/YARN-2050?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13996685#comment-13996685 ] Jason Lowe commented on YARN-2050: -- bq. remoteAppLogDir.toUri().getScheme() returns null and AbstractFileSystem.createFileSystem doesn't like it if dumpAllContainersLogs calls FileContext.getFileContext(remoteAppLogDir.toUri()) Argh right, I forgot that FileContext is less-than-helpful in this regard. It needs to be something like this: {code} Path qualifiedLogDir = FileContext.getFileContext(getConf()).makeQualified(remoteAppLogDir); FileContext fc = FileContext.getFileContext(qualifiedLogDir.toUri(), getConf()); nodeFiles = fc.listStatus(qualifiedLogDir); {code} This allows the code to handle cases where the remote log dir has been configured to be a different filesystem than the default filesystem. Fix LogCLIHelpers to create the correct FileContext --- Key: YARN-2050 URL: https://issues.apache.org/jira/browse/YARN-2050 Project: Hadoop YARN Issue Type: Bug Reporter: Ming Ma Assignee: Ming Ma Attachments: YARN-2050.patch LogCLIHelpers calls FileContext.getFileContext() without any parameters. Thus the FileContext created isn't necessarily the FileContext for remote log. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2048) List all of the containers of an application from the yarn web
[ https://issues.apache.org/jira/browse/YARN-2048?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13996096#comment-13996096 ] Zhijie Shen commented on YARN-2048: --- bq. Seems Zhijie Shen's patch fetch containers from ApplicationContext. Currently, the history web UI fetches data (app/attempt/container) from ApplicationContext, while the RM web UI does it from the RM context. My ultimate goal is to unify both the history and RM web UIs, and to unify the data source with the RPC protocol. List all of the containers of an application from the yarn web -- Key: YARN-2048 URL: https://issues.apache.org/jira/browse/YARN-2048 Project: Hadoop YARN Issue Type: Improvement Components: resourcemanager, webapp Affects Versions: 2.3.0, 2.4.0, 2.5.0 Reporter: Min Zhou Attachments: YARN-2048-trunk-v1.patch Currently, YARN doesn't provide a way to list all of the containers of an application from its web UI. This kind of information is needed by application users. They can conveniently see how many containers their applications have already acquired as well as which nodes those containers were launched on. They also want to view the logs of each container of an application. One approach is to maintain a container list in RMAppImpl and expose this info on the Application page. I will submit a patch soon. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2001) Threshold for RM to accept requests from AM after failover
[ https://issues.apache.org/jira/browse/YARN-2001?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13996348#comment-13996348 ] Tsuyoshi OZAWA commented on YARN-2001: -- Created YARN-2052 to track the container id discussion separately and make it easier to follow. Threshold for RM to accept requests from AM after failover -- Key: YARN-2001 URL: https://issues.apache.org/jira/browse/YARN-2001 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Jian He Assignee: Jian He After failover, RM may require a certain threshold to determine whether it’s safe to make scheduling decisions and start accepting new container requests from AMs. The threshold could be a certain number of nodes, i.e. RM waits until a certain number of nodes have joined before accepting new container requests. Or it could simply be a timeout; only after the timeout does the RM accept new requests. NMs joined after the threshold can be treated as new NMs and instructed to kill all their containers. -- This message was sent by Atlassian JIRA (v6.2#6252)
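As a rough illustration of the two options the description mentions (a node-count threshold or a timeout), here is a toy gate; the names and defaults are assumptions for illustration, not proposed configuration keys or RM code:
{code}
import java.util.concurrent.atomic.AtomicInteger;

// Toy sketch: after failover, hold new AM requests until either enough NMs
// have re-registered or a grace period has elapsed.
public class SchedulingGateSketch {
  private final int minNodes;     // e.g. a fraction of the pre-failover node count
  private final long graceMillis; // fallback timeout
  private final long startMillis = System.currentTimeMillis();
  private final AtomicInteger registeredNodes = new AtomicInteger();

  public SchedulingGateSketch(int minNodes, long graceMillis) {
    this.minNodes = minNodes;
    this.graceMillis = graceMillis;
  }

  public void onNodeRegistered() {
    registeredNodes.incrementAndGet();
  }

  public boolean acceptNewRequests() {
    return registeredNodes.get() >= minNodes
        || System.currentTimeMillis() - startMillis >= graceMillis;
  }
}
{code}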
[jira] [Commented] (YARN-182) Unnecessary Container killed by the ApplicationMaster message for successful containers
[ https://issues.apache.org/jira/browse/YARN-182?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13995087#comment-13995087 ] Jason Lowe commented on YARN-182: - I don't believe this is related to YARN-903; rather, it seems more likely to be related to MAPREDUCE-5465. The MapReduce ApplicationMaster kills tasks as soon as they report success via the umbilical connection, and sometimes that kill arrives before the task exits on its own. In those cases the containers will be marked as killed by the ApplicationMaster. Unnecessary Container killed by the ApplicationMaster message for successful containers - Key: YARN-182 URL: https://issues.apache.org/jira/browse/YARN-182 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 2.0.1-alpha Reporter: zhengqiu cai Assignee: Omkar Vinit Joshi Labels: hadoop, usability Attachments: Log.txt I was running wordcount and the resourcemanager web UI showed the status as FINISHED SUCCEEDED, but the log showed Container killed by the ApplicationMaster -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Resolved] (YARN-2048) List all of the containers of an application from the yarn web
[ https://issues.apache.org/jira/browse/YARN-2048?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Min Zhou resolved YARN-2048. Resolution: Duplicate Duplicate of YARN-1809 List all of the containers of an application from the yarn web -- Key: YARN-2048 URL: https://issues.apache.org/jira/browse/YARN-2048 Project: Hadoop YARN Issue Type: Improvement Components: resourcemanager, webapp Affects Versions: 2.3.0, 2.4.0, 2.5.0 Reporter: Min Zhou Attachments: YARN-2048-trunk-v1.patch Currently, YARN doesn't provide a way to list all of the containers of an application from its web UI. This kind of information is needed by application users. They can conveniently see how many containers their applications have already acquired as well as which nodes those containers were launched on. They also want to view the logs of each container of an application. One approach is to maintain a container list in RMAppImpl and expose this info on the Application page. I will submit a patch soon. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-1927) Preemption message shouldn’t be created multiple times for same container-id in ProportionalCapacityPreemptionPolicy
[ https://issues.apache.org/jira/browse/YARN-1927?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinod Kumar Vavilapalli updated YARN-1927: -- Issue Type: Sub-task (was: Bug) Parent: YARN-45 Preemption message shouldn’t be created multiple times for same container-id in ProportionalCapacityPreemptionPolicy Key: YARN-1927 URL: https://issues.apache.org/jira/browse/YARN-1927 Project: Hadoop YARN Issue Type: Sub-task Components: capacityscheduler Affects Versions: 2.4.0 Reporter: Wangda Tan Assignee: Wangda Tan Priority: Minor Attachments: YARN-1927.patch Currently, after each editSchedule() called, preemption message will be created and sent to scheduler. ProportionalCapacityPreemptionPolicy should only send preemption message once for each container. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2048) List all of the containers of an application from the yarn web
[ https://issues.apache.org/jira/browse/YARN-2048?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13995553#comment-13995553 ] Tsuyoshi OZAWA commented on YARN-2048: -- +1 for the idea. Looking forward to it. List all of the containers of an application from the yarn web -- Key: YARN-2048 URL: https://issues.apache.org/jira/browse/YARN-2048 Project: Hadoop YARN Issue Type: Improvement Components: resourcemanager, webapp Reporter: Min Zhou Currently, YARN doesn't provide a way to list all of the containers of an application from its web UI. This kind of information is needed by application users. They can conveniently see how many containers their applications have already acquired as well as which nodes those containers were launched on. They also want to view the logs of each container of an application. One approach is to maintain a container list in RMAppImpl and expose this info on the Application page. I will submit a patch soon. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (YARN-2033) Investigate merging generic-history into the Timeline Store
Vinod Kumar Vavilapalli created YARN-2033: - Summary: Investigate merging generic-history into the Timeline Store Key: YARN-2033 URL: https://issues.apache.org/jira/browse/YARN-2033 Project: Hadoop YARN Issue Type: Sub-task Reporter: Vinod Kumar Vavilapalli Assignee: Vinod Kumar Vavilapalli Having two different stores isn't amenable to generic insights on what's happening with applications. This is to investigate porting generic-history into the Timeline Store. One goal is to try to keep most of the client-side interfaces as close as possible to what we have today. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-2016) Yarn getApplicationRequest start time range is not honored
[ https://issues.apache.org/jira/browse/YARN-2016?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Junping Du updated YARN-2016: - Attachment: YARN-2016.patch Fix the issues in the PBImpl and add a test to verify it now works. Yarn getApplicationRequest start time range is not honored -- Key: YARN-2016 URL: https://issues.apache.org/jira/browse/YARN-2016 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.4.0 Reporter: Venkat Ranganathan Assignee: Junping Du Attachments: YARN-2016.patch, YarnTest.java When we query for the previous applications by creating an instance of GetApplicationsRequest and setting the start time range and application tag, we see that the start range provided is not honored and all applications with the tag are returned. Attaching a reproducer. -- This message was sent by Atlassian JIRA (v6.2#6252)
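For context, the kind of query being exercised looks roughly like the sketch below. The setter names are recalled from the public GetApplicationsRequest API and should be treated as assumptions rather than a verified reproduction of the attached YarnTest.java:
{code}
import java.util.Collections;
import org.apache.hadoop.yarn.api.protocolrecords.GetApplicationsRequest;

public class StartRangeQuerySketch {
  // Build a request filtered by both submission-time range and application tag.
  // YARN-2016 reports that the start range is ignored when a tag is also set.
  static GetApplicationsRequest buildRequest(long begin, long end, String tag) {
    GetApplicationsRequest request = GetApplicationsRequest.newInstance();
    request.setStartRange(begin, end);
    request.setApplicationTags(Collections.singleton(tag));
    return request;
  }
}
{code}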