[jira] [Commented] (YARN-4024) YARN RM should avoid unnecessary resolving IP when NMs doing heartbeat

2017-06-07 Thread Hong Zhiguo (JIRA)
[ https://issues.apache.org/jira/browse/YARN-4024?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16042057#comment-16042057 ] Hong Zhiguo commented on YARN-4024: --- [~maobaolong], this depends on the probability that

[jira] [Comment Edited] (YARN-4024) YARN RM should avoid unnecessary resolving IP when NMs doing heartbeat

2017-06-06 Thread Hong Zhiguo (JIRA)
[ https://issues.apache.org/jira/browse/YARN-4024?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16040075#comment-16040075 ] Hong Zhiguo edited comment on YARN-4024 at 6/7/17 3:23 AM: --- [~mao

[jira] [Commented] (YARN-4024) YARN RM should avoid unnecessary resolving IP when NMs doing heartbeat

2017-06-06 Thread Hong Zhiguo (JIRA)
[ https://issues.apache.org/jira/browse/YARN-4024?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16040075#comment-16040075 ] Hong Zhiguo commented on YARN-4024: --- [~maobaolong], we don't turn on log-aggregation to a

[jira] [Commented] (YARN-6319) race condition between deleting app dir and deleting container dir

2017-03-16 Thread Hong Zhiguo (JIRA)
[ https://issues.apache.org/jira/browse/YARN-6319?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15929330#comment-15929330 ] Hong Zhiguo commented on YARN-6319: --- [~haibochen], the post-callback will not linearize c

[jira] [Commented] (YARN-6319) race condition between deleting app dir and deleting container dir

2017-03-15 Thread Hong Zhiguo (JIRA)
[ https://issues.apache.org/jira/browse/YARN-6319?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15927351#comment-15927351 ] Hong Zhiguo commented on YARN-6319: --- [~haibochen], the CONTAINER_RESOURCES_CLEANEDUP even

[jira] [Commented] (YARN-6319) race condition between deleting app dir and deleting container dir

2017-03-14 Thread Hong Zhiguo (JIRA)
[ https://issues.apache.org/jira/browse/YARN-6319?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15925421#comment-15925421 ] Hong Zhiguo commented on YARN-6319: --- [~haibochen], thanks for your comments. I found solu

[jira] [Comment Edited] (YARN-6319) race condition between deleting app dir and deleting container dir

2017-03-14 Thread Hong Zhiguo (JIRA)
[ https://issues.apache.org/jira/browse/YARN-6319?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15923890#comment-15923890 ] Hong Zhiguo edited comment on YARN-6319 at 3/14/17 9:50 AM: One

[jira] [Comment Edited] (YARN-6319) race condition between deleting app dir and deleting container dir

2017-03-14 Thread Hong Zhiguo (JIRA)
[ https://issues.apache.org/jira/browse/YARN-6319?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15923890#comment-15923890 ] Hong Zhiguo edited comment on YARN-6319 at 3/14/17 9:49 AM: One

[jira] [Commented] (YARN-6319) race condition between deleting app dir and deleting container dir

2017-03-14 Thread Hong Zhiguo (JIRA)
[ https://issues.apache.org/jira/browse/YARN-6319?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15923890#comment-15923890 ] Hong Zhiguo commented on YARN-6319: --- One "serialize" solution: Add a post-callback to Fil
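
The "serialize" solution in the comment above runs the app-dir deletion only after the container-dir deletions have finished. A minimal sketch of that ordering follows. It is not the NM's DeletionService or the truncated post-callback patch. `OrderedCleanup` is a hypothetical name, and `CompletableFuture` stands in for the post-callback mechanism.

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Executor;

// Hypothetical sketch: the app dir is deleted only after every container dir
// deletion has completed, so the two can no longer race.
final class OrderedCleanup {
  static CompletableFuture<Void> cleanup(List<Runnable> containerDirDeletions,
                                         Runnable appDirDeletion,
                                         Executor executor) {
    // Container dirs may still be deleted concurrently with each other.
    CompletableFuture<?>[] pending = containerDirDeletions.stream()
        .map(task -> CompletableFuture.runAsync(task, executor))
        .toArray(CompletableFuture[]::new);
    // allOf acts as the "post-callback": it fires once all of them are done.
    return CompletableFuture.allOf(pending).thenRunAsync(appDirDeletion, executor);
  }
}
```

Container deletions still run in parallel with each other. Only the final app-dir step waits on them.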

[jira] [Commented] (YARN-6319) race condition between deleting app dir and deleting container dir

2017-03-14 Thread Hong Zhiguo (JIRA)
[ https://issues.apache.org/jira/browse/YARN-6319?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15923881#comment-15923881 ] Hong Zhiguo commented on YARN-6319: --- The "app dir cleanup" is triggered in ApplicationIm

[jira] [Updated] (YARN-6319) race condition between deleting app dir and deleting container dir

2017-03-13 Thread Hong Zhiguo (JIRA)
[ https://issues.apache.org/jira/browse/YARN-6319?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hong Zhiguo updated YARN-6319: -- Description: Last container (on one node) of one app completes |--> triggers async deletion of con

[jira] [Commented] (YARN-6319) race condition between deleting app dir and deleting container dir

2017-03-12 Thread Hong Zhiguo (JIRA)
[ https://issues.apache.org/jira/browse/YARN-6319?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15906816#comment-15906816 ] Hong Zhiguo commented on YARN-6319: --- The race condition could be reproduced by below scri

[jira] [Created] (YARN-6319) race condition between deleting app dir and deleting container dir

2017-03-10 Thread Hong Zhiguo (JIRA)
Hong Zhiguo created YARN-6319: - Summary: race condition between deleting app dir and deleting container dir Key: YARN-6319 URL: https://issues.apache.org/jira/browse/YARN-6319 Project: Hadoop YARN

[jira] [Commented] (YARN-2306) leak of reservation metrics (fair scheduler)

2016-04-14 Thread Hong Zhiguo (JIRA)
[ https://issues.apache.org/jira/browse/YARN-2306?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15242280#comment-15242280 ] Hong Zhiguo commented on YARN-2306: --- The patch is available; do you have any comments? >

[jira] [Commented] (YARN-4002) make ResourceTrackerService.nodeHeartbeat more concurrent

2016-03-29 Thread Hong Zhiguo (JIRA)
[ https://issues.apache.org/jira/browse/YARN-4002?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15217334#comment-15217334 ] Hong Zhiguo commented on YARN-4002: --- Including the return statement in the readlock critical se

[jira] [Updated] (YARN-4002) make ResourceTrackerService.nodeHeartbeat more concurrent

2016-03-20 Thread Hong Zhiguo (JIRA)
[ https://issues.apache.org/jira/browse/YARN-4002?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hong Zhiguo updated YARN-4002: -- Attachment: YARN-4002-rwlock-v2.patch Uploaded YARN-4002-rwlock-v2.patch for an improvement: make the rea

[jira] [Commented] (YARN-4002) make ResourceTrackerService.nodeHeartbeat more concurrent

2016-03-01 Thread Hong Zhiguo (JIRA)
[ https://issues.apache.org/jira/browse/YARN-4002?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15174795#comment-15174795 ] Hong Zhiguo commented on YARN-4002: --- Hi, [~rohithsharma], thanks for the refinement. But

[jira] [Commented] (YARN-4024) YARN RM should avoid unnecessary resolving IP when NMs doing heartbeat

2015-12-30 Thread Hong Zhiguo (JIRA)
[ https://issues.apache.org/jira/browse/YARN-4024?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15075626#comment-15075626 ] Hong Zhiguo commented on YARN-4024: --- Thanks for your good point. Yes I can do it. Should

[jira] [Updated] (YARN-4002) make ResourceTrackerService.nodeHeartbeat more concurrent

2015-12-03 Thread Hong Zhiguo (JIRA)
[ https://issues.apache.org/jira/browse/YARN-4002?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hong Zhiguo updated YARN-4002: -- Attachment: YARN-4002-rwlock.patch YARN-4002-lockless-read.patch 2 patches for the 2 propos
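
The read-write-lock proposal named in YARN-4002-rwlock.patch can be illustrated roughly as follows. This is a sketch that assumes the contended work in `nodeHeartbeat` is a host include/exclude check. Under that assumption, the check is read on every heartbeat and written only on a refresh. The truncated description does not show that detail, and `HostList`, `isValidNode` and `refresh` are names chosen here for illustration. Readers share the read lock, so concurrent heartbeats do not serialize on a single monitor.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical sketch: many heartbeat threads read the host lists concurrently,
// while an occasional refresh takes the exclusive write lock to swap them.
final class HostList {
  private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
  private Set<String> included = new HashSet<>();
  private Set<String> excluded = new HashSet<>();

  // Called from many RPC handler threads; holders of the read lock do not block each other.
  boolean isValidNode(String host) {
    lock.readLock().lock();
    try {
      // The result is computed and returned inside the critical section,
      // so both sets are read under the same lock acquisition.
      return (included.isEmpty() || included.contains(host)) && !excluded.contains(host);
    } finally {
      lock.readLock().unlock();
    }
  }

  // Called rarely; copies are built outside the lock so writers hold it only briefly.
  void refresh(Set<String> newIncluded, Set<String> newExcluded) {
    Set<String> inc = new HashSet<>(newIncluded);
    Set<String> exc = new HashSet<>(newExcluded);
    lock.writeLock().lock();
    try {
      included = inc;
      excluded = exc;
    } finally {
      lock.writeLock().unlock();
    }
  }
}
```

A lockless-read variant could instead publish an immutable snapshot of both sets through one `volatile` field. Whether YARN-4002-lockless-read.patch does exactly that is not visible from this listing.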

[jira] [Commented] (YARN-4002) make ResourceTrackerService.nodeHeartbeat more concurrent

2015-12-03 Thread Hong Zhiguo (JIRA)
[ https://issues.apache.org/jira/browse/YARN-4002?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15039597#comment-15039597 ] Hong Zhiguo commented on YARN-4002: --- I'm working on it. I've proposed 2 different solutio

[jira] [Updated] (YARN-4181) node blacklist for AM launching

2015-09-18 Thread Hong Zhiguo (JIRA)
[ https://issues.apache.org/jira/browse/YARN-4181?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hong Zhiguo updated YARN-4181: -- Description: In some cases, a node becomes problematic and most launching containers fail on this node, as

[jira] [Created] (YARN-4181) node blacklist for AM launching

2015-09-18 Thread Hong Zhiguo (JIRA)
Hong Zhiguo created YARN-4181: - Summary: node blacklist for AM launching Key: YARN-4181 URL: https://issues.apache.org/jira/browse/YARN-4181 Project: Hadoop YARN Issue Type: Bug Compone

[jira] [Commented] (YARN-4104) dryrun of schedule for diagnostic and tenant's complain

2015-09-01 Thread Hong Zhiguo (JIRA)
[ https://issues.apache.org/jira/browse/YARN-4104?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14726780#comment-14726780 ] Hong Zhiguo commented on YARN-4104: --- For better human readability, it's plain text. {code

[jira] [Commented] (YARN-4104) dryrun of schedule for diagnostic and tenant's complain

2015-09-01 Thread Hong Zhiguo (JIRA)
[ https://issues.apache.org/jira/browse/YARN-4104?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14726697#comment-14726697 ] Hong Zhiguo commented on YARN-4104: --- It only works for fair scheduler at this moment beca

[jira] [Updated] (YARN-4104) dryrun of schedule for diagnostic and tenant's complain

2015-09-01 Thread Hong Zhiguo (JIRA)
[ https://issues.apache.org/jira/browse/YARN-4104?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hong Zhiguo updated YARN-4104: -- Description: We have more than 1,000 queues and several hundred tenants in a busy cluster. We g

[jira] [Created] (YARN-4104) dryrun of schedule for diagnostic and tenant's complain

2015-09-01 Thread Hong Zhiguo (JIRA)
Hong Zhiguo created YARN-4104: - Summary: dryrun of schedule for diagnostic and tenant's complain Key: YARN-4104 URL: https://issues.apache.org/jira/browse/YARN-4104 Project: Hadoop YARN Issue Typ

[jira] [Commented] (YARN-4024) YARN RM should avoid unnecessary resolving IP when NMs doing heartbeat

2015-08-29 Thread Hong Zhiguo (JIRA)
[ https://issues.apache.org/jira/browse/YARN-4024?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14721144#comment-14721144 ] Hong Zhiguo commented on YARN-4024: --- Why doesn't Jenkins run against the latest patch? >

[jira] [Updated] (YARN-4024) YARN RM should avoid unnecessary resolving IP when NMs doing heartbeat

2015-08-23 Thread Hong Zhiguo (JIRA)
[ https://issues.apache.org/jira/browse/YARN-4024?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hong Zhiguo updated YARN-4024: -- Attachment: YARN-4024-v7.patch Thanks for your comments, [~adhoot], updated the patch. > YARN RM should

[jira] [Updated] (YARN-4024) YARN RM should avoid unnecessary resolving IP when NMs doing heartbeat

2015-08-20 Thread Hong Zhiguo (JIRA)
[ https://issues.apache.org/jira/browse/YARN-4024?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hong Zhiguo updated YARN-4024: -- Attachment: YARN-4024-v6.patch the findbugs warning is about "unchecked rawtypes" in AMLivelinessMonitor.

[jira] [Updated] (YARN-4024) YARN RM should avoid unnecessary resolving IP when NMs doing heartbeat

2015-08-19 Thread Hong Zhiguo (JIRA)
[ https://issues.apache.org/jira/browse/YARN-4024?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hong Zhiguo updated YARN-4024: -- Attachment: YARN-4024-v5.patch Thanks for your comments, [~sunilg] and [~leftnoteasy]. I updated patch v5

[jira] [Updated] (YARN-4024) YARN RM should avoid unnecessary resolving IP when NMs doing heartbeat

2015-08-18 Thread Hong Zhiguo (JIRA)
[ https://issues.apache.org/jira/browse/YARN-4024?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hong Zhiguo updated YARN-4024: -- Attachment: YARN-4024-v4.patch Thanks for your comments, [~leftnoteasy]. I didn't notice there's already

[jira] [Updated] (YARN-4024) YARN RM should avoid unnecessary resolving IP when NMs doing heartbeat

2015-08-18 Thread Hong Zhiguo (JIRA)
[ https://issues.apache.org/jira/browse/YARN-4024?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hong Zhiguo updated YARN-4024: -- Attachment: YARN-4024-draft-v3.patch YARN-4024-draft-v3.patch: fix the checkstyle warning and testcase fa

[jira] [Updated] (YARN-4024) YARN RM should avoid unnecessary resolving IP when NMs doing heartbeat

2015-08-17 Thread Hong Zhiguo (JIRA)
[ https://issues.apache.org/jira/browse/YARN-4024?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hong Zhiguo updated YARN-4024: -- Attachment: YARN-4024-draft-v2.patch updated the patch with flushing when node state is transiting betwee

[jira] [Commented] (YARN-4024) YARN RM should avoid unnecessary resolving IP when NMs doing heartbeat

2015-08-17 Thread Hong Zhiguo (JIRA)
[ https://issues.apache.org/jira/browse/YARN-4024?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14699285#comment-14699285 ] Hong Zhiguo commented on YARN-4024: --- In this patch, both positive and negative lookup res

[jira] [Updated] (YARN-4024) YARN RM should avoid unnecessary resolving IP when NMs doing heartbeat

2015-08-17 Thread Hong Zhiguo (JIRA)
[ https://issues.apache.org/jira/browse/YARN-4024?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hong Zhiguo updated YARN-4024: -- Attachment: YARN-4024-draft.patch Add a configuration option "yarn.resourcemanager.node-ip-cache.expiry
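
A node IP cache with an expiry, of the kind this draft patch configures, might look like the following minimal sketch. `NodeIpCache` and its injected resolver and clock are hypothetical. In a real RM the resolver would wrap a lookup such as `InetAddress.getByName`. The sketch caches failed lookups as well as successful ones. It also offers a `flush` hook like the one the draft-v2 update mentions for node state transitions. The actual option name and semantics are in the truncated text and are not reproduced here.

```java
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;
import java.util.function.LongSupplier;

// Hypothetical sketch (not the YARN-4024 patch): caches host -> IP lookups,
// including failed ones, and re-resolves an entry once it is expiryMs old.
final class NodeIpCache {
  private static final class Entry {
    final Optional<String> ip;  // empty means the lookup failed (negative result)
    final long resolvedAt;      // time of the lookup, from the injected clock
    Entry(Optional<String> ip, long resolvedAt) {
      this.ip = ip;
      this.resolvedAt = resolvedAt;
    }
  }

  private final ConcurrentHashMap<String, Entry> cache = new ConcurrentHashMap<>();
  private final Function<String, Optional<String>> resolver; // e.g. wraps InetAddress.getByName
  private final LongSupplier clock;                          // e.g. System::currentTimeMillis
  private final long expiryMs;

  NodeIpCache(Function<String, Optional<String>> resolver, LongSupplier clock, long expiryMs) {
    this.resolver = resolver;
    this.clock = clock;
    this.expiryMs = expiryMs;
  }

  // Benign race: two threads missing at the same time may both resolve the host.
  Optional<String> resolve(String host) {
    long now = clock.getAsLong();
    Entry e = cache.get(host);
    if (e == null || now - e.resolvedAt >= expiryMs) {
      e = new Entry(resolver.apply(host), now);
      cache.put(host, e);
    }
    return e.ip;
  }

  // Drop one host's entry (e.g. on a node state change) so the next lookup is fresh.
  void flush(String host) {
    cache.remove(host);
  }
}
```

Heartbeats from an already-resolved node then cost a map lookup instead of a DNS round trip until the entry expires or is flushed.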

[jira] [Commented] (YARN-4024) YARN RM should avoid unnecessary resolving IP when NMs doing heartbeat

2015-08-16 Thread Hong Zhiguo (JIRA)
[ https://issues.apache.org/jira/browse/YARN-4024?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14698929#comment-14698929 ] Hong Zhiguo commented on YARN-4024: --- Please ignore the last sentence "a better way is to

[jira] [Commented] (YARN-4024) YARN RM should avoid unnecessary resolving IP when NMs doing heartbeat

2015-08-16 Thread Hong Zhiguo (JIRA)
[ https://issues.apache.org/jira/browse/YARN-4024?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14698927#comment-14698927 ] Hong Zhiguo commented on YARN-4024: --- That's a good reason to have this cache. [~leftnotea

[jira] [Commented] (YARN-4024) YARN RM should avoid unnecessary resolving IP when NMs doing heartbeat

2015-08-13 Thread Hong Zhiguo (JIRA)
[ https://issues.apache.org/jira/browse/YARN-4024?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14695011#comment-14695011 ] Hong Zhiguo commented on YARN-4024: --- There's a DNS cache in InetAddress. What's the benefit

[jira] [Commented] (YARN-4024) YARN RM should avoid unnecessary resolving IP when NMs doing heartbeat

2015-08-09 Thread Hong Zhiguo (JIRA)
[ https://issues.apache.org/jira/browse/YARN-4024?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14679444#comment-14679444 ] Hong Zhiguo commented on YARN-4024: --- We did this one year ago in our 5k+ cluster. Can

[jira] [Updated] (YARN-4018) correct docker image name is rejected by DockerContainerExecutor

2015-08-04 Thread Hong Zhiguo (JIRA)
[ https://issues.apache.org/jira/browse/YARN-4018?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hong Zhiguo updated YARN-4018: -- Attachment: YARN-4018.patch > correct docker image name is rejected by DockerContainerExecutor >

[jira] [Updated] (YARN-4018) correct docker image name is rejected by DockerContainerExecutor

2015-08-04 Thread Hong Zhiguo (JIRA)
[ https://issues.apache.org/jira/browse/YARN-4018?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hong Zhiguo updated YARN-4018: -- Description: For example: "www.dockerbase.net/library/mongo" "www.dockerbase.net:5000/library/mongo:lates
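
To accept names like the two examples above, a validator has to allow three parts: an optional `host[:port]/` registry prefix, slash-separated lowercase path components, and an optional `:tag`. The pattern below is a simplified, hypothetical sketch of that shape. It is neither the regex in DockerContainerExecutor nor the one in YARN-4018.patch. It also does not cover every form Docker accepts, such as digests or `__` and repeated `-` separators.

```java
import java.util.regex.Pattern;

// Hypothetical, simplified validator for names such as
// "www.dockerbase.net:5000/library/mongo:latest".
final class DockerImageName {
  // One DNS-style label: alphanumerics, with hyphens allowed only inside.
  private static final String HOST_PART = "[a-zA-Z0-9](?:[a-zA-Z0-9-]*[a-zA-Z0-9])?";
  // Optional "registry.host[:port]/" prefix.
  private static final String REGISTRY =
      "(?:" + HOST_PART + "(?:\\." + HOST_PART + ")*(?::[0-9]+)?/)?";
  // Lowercase path component; single '.', '_' or '-' between alphanumeric runs.
  private static final String PATH_PART = "[a-z0-9]+(?:[._-][a-z0-9]+)*";
  private static final String PATH = PATH_PART + "(?:/" + PATH_PART + ")*";
  // Optional ":tag" of up to 128 characters.
  private static final String TAG = "(?::[A-Za-z0-9_][A-Za-z0-9_.-]{0,127})?";

  private static final Pattern IMAGE = Pattern.compile("^" + REGISTRY + PATH + TAG + "$");

  static boolean isValid(String name) {
    return name != null && IMAGE.matcher(name).matches();
  }
}
```

The registry prefix is recognized by the trailing `/`. A name like `mongo:latest` therefore falls through to path plus tag rather than being misread as a host and port.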

[jira] [Created] (YARN-4018) correct docker image name is rejected by DockerContainerExecutor

2015-08-04 Thread Hong Zhiguo (JIRA)
Hong Zhiguo created YARN-4018: - Summary: correct docker image name is rejected by DockerContainerExecutor Key: YARN-4018 URL: https://issues.apache.org/jira/browse/YARN-4018 Project: Hadoop YARN

[jira] [Updated] (YARN-4016) docker container is still running when app is killed

2015-08-04 Thread Hong Zhiguo (JIRA)
[ https://issues.apache.org/jira/browse/YARN-4016?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hong Zhiguo updated YARN-4016: -- Description: The docker_container_executor_session.sh is generated like below: {code} ### get the pid of

[jira] [Created] (YARN-4016) docker container is still running when app is killed

2015-08-04 Thread Hong Zhiguo (JIRA)
Hong Zhiguo created YARN-4016: - Summary: docker container is still running when app is killed Key: YARN-4016 URL: https://issues.apache.org/jira/browse/YARN-4016 Project: Hadoop YARN Issue Type:

[jira] [Commented] (YARN-3965) Add startup timestamp to nodemanager UI

2015-07-30 Thread Hong Zhiguo (JIRA)
[ https://issues.apache.org/jira/browse/YARN-3965?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14648666#comment-14648666 ] Hong Zhiguo commented on YARN-3965: --- Hi, [~jlowe], version 4 of the patch is uploaded wit

[jira] [Updated] (YARN-3965) Add startup timestamp to nodemanager UI

2015-07-30 Thread Hong Zhiguo (JIRA)
[ https://issues.apache.org/jira/browse/YARN-3965?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hong Zhiguo updated YARN-3965: -- Attachment: YARN-3965-4.patch > Add startup timestamp to nodemanager UI > ---

[jira] [Updated] (YARN-4002) make ResourceTrackerService.nodeHeartbeat more concurrent

2015-07-30 Thread Hong Zhiguo (JIRA)
[ https://issues.apache.org/jira/browse/YARN-4002?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hong Zhiguo updated YARN-4002: -- Description: We have multiple RPC threads to handle NodeHeartbeatRequest from NMs. By design the method

[jira] [Created] (YARN-4002) make ResourceTrackerService.nodeHeartbeat more concurrent

2015-07-30 Thread Hong Zhiguo (JIRA)
Hong Zhiguo created YARN-4002: - Summary: make ResourceTrackerService.nodeHeartbeat more concurrent Key: YARN-4002 URL: https://issues.apache.org/jira/browse/YARN-4002 Project: Hadoop YARN Issue T

[jira] [Created] (YARN-4001) normalizeHostName takes too much of execution time

2015-07-30 Thread Hong Zhiguo (JIRA)
Hong Zhiguo created YARN-4001: - Summary: normalizeHostName takes too much of execution time Key: YARN-4001 URL: https://issues.apache.org/jira/browse/YARN-4001 Project: Hadoop YARN Issue Type: Im

[jira] [Commented] (YARN-3965) Add starup timestamp for nodemanager

2015-07-30 Thread Hong Zhiguo (JIRA)
[ https://issues.apache.org/jira/browse/YARN-3965?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14647709#comment-14647709 ] Hong Zhiguo commented on YARN-3965: --- Made it private with a getter. Hi, [~zxu], [~jlowe], c

[jira] [Updated] (YARN-3965) Add starup timestamp for nodemanager

2015-07-30 Thread Hong Zhiguo (JIRA)
[ https://issues.apache.org/jira/browse/YARN-3965?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hong Zhiguo updated YARN-3965: -- Attachment: YARN-3965-3.patch > Add starup timestamp for nodemanager > --

[jira] [Commented] (YARN-3965) Add starup timestamp for nodemanager

2015-07-25 Thread Hong Zhiguo (JIRA)
[ https://issues.apache.org/jira/browse/YARN-3965?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14641473#comment-14641473 ] Hong Zhiguo commented on YARN-3965: --- Hi, [~zxu], thanks for your comments. Here comes my

[jira] [Updated] (YARN-3965) Add starup timestamp for nodemanager

2015-07-24 Thread Hong Zhiguo (JIRA)
[ https://issues.apache.org/jira/browse/YARN-3965?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hong Zhiguo updated YARN-3965: -- Attachment: YARN-3965-2.patch The first patch breaks TestNMWebServices.verifyNodeInfo. Corrected in this

[jira] [Updated] (YARN-3965) Add starup timestamp for nodemanager

2015-07-24 Thread Hong Zhiguo (JIRA)
[ https://issues.apache.org/jira/browse/YARN-3965?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hong Zhiguo updated YARN-3965: -- Attachment: YARN-3965.patch > Add starup timestamp for nodemanager >

[jira] [Commented] (YARN-3965) Add starup timestamp for nodemanager

2015-07-24 Thread Hong Zhiguo (JIRA)
[ https://issues.apache.org/jira/browse/YARN-3965?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14640195#comment-14640195 ] Hong Zhiguo commented on YARN-3965: --- The polling doesn't need to happen frequently. Only

[jira] [Assigned] (YARN-3965) Add starup timestamp for nodemanager

2015-07-23 Thread Hong Zhiguo (JIRA)
[ https://issues.apache.org/jira/browse/YARN-3965?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hong Zhiguo reassigned YARN-3965: - Assignee: Hong Zhiguo > Add starup timestamp for nodemanager > ---

[jira] [Created] (YARN-3965) Add starup timestamp for nodemanager

2015-07-23 Thread Hong Zhiguo (JIRA)
Hong Zhiguo created YARN-3965: - Summary: Add starup timestamp for nodemanager Key: YARN-3965 URL: https://issues.apache.org/jira/browse/YARN-3965 Project: Hadoop YARN Issue Type: Improvement

[jira] [Commented] (YARN-2545) RMApp should transit to FAILED when AM calls finishApplicationMaster with FAILED

2015-07-23 Thread Hong Zhiguo (JIRA)
[ https://issues.apache.org/jira/browse/YARN-2545?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14638610#comment-14638610 ] Hong Zhiguo commented on YARN-2545: --- RMAppEventType#ATTEMPT_FAILED is not suitable becaus

[jira] [Resolved] (YARN-1974) add args for DistributedShell to specify a set of nodes on which the tasks run

2015-07-16 Thread Hong Zhiguo (JIRA)
[ https://issues.apache.org/jira/browse/YARN-1974?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hong Zhiguo resolved YARN-1974. --- Resolution: Not A Problem > add args for DistributedShell to specify a set of nodes on which the tasks

[jira] [Commented] (YARN-2306) leak of reservation metrics (fair scheduler)

2015-07-16 Thread Hong Zhiguo (JIRA)
[ https://issues.apache.org/jira/browse/YARN-2306?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14630742#comment-14630742 ] Hong Zhiguo commented on YARN-2306: --- Updated the patch. I ran testReservationMetrics seve

[jira] [Commented] (YARN-2306) leak of reservation metrics (fair scheduler)

2015-07-16 Thread Hong Zhiguo (JIRA)
[ https://issues.apache.org/jira/browse/YARN-2306?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14630741#comment-14630741 ] Hong Zhiguo commented on YARN-2306: --- I checked the code of tearDown and it shows someone

[jira] [Updated] (YARN-2306) leak of reservation metrics (fair scheduler)

2015-07-16 Thread Hong Zhiguo (JIRA)
[ https://issues.apache.org/jira/browse/YARN-2306?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hong Zhiguo updated YARN-2306: -- Attachment: YARN-2306-3.patch > leak of reservation metrics (fair scheduler) > --

[jira] [Updated] (YARN-2306) leak of reservation metrics (fair scheduler)

2015-07-16 Thread Hong Zhiguo (JIRA)
[ https://issues.apache.org/jira/browse/YARN-2306?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hong Zhiguo updated YARN-2306: -- Attachment: YARN-2306.patch-3 > leak of reservation metrics (fair scheduler) > --

[jira] [Updated] (YARN-2306) leak of reservation metrics (fair scheduler)

2015-07-16 Thread Hong Zhiguo (JIRA)
[ https://issues.apache.org/jira/browse/YARN-2306?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hong Zhiguo updated YARN-2306: -- Attachment: (was: YARN-2306.patch-3) > leak of reservation metrics (fair scheduler) > ---

[jira] [Commented] (YARN-2306) leak of reservation metrics (fair scheduler)

2015-07-16 Thread Hong Zhiguo (JIRA)
[ https://issues.apache.org/jira/browse/YARN-2306?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14630694#comment-14630694 ] Hong Zhiguo commented on YARN-2306: --- Hi, [~rchiang], do you mean running the unit test in

[jira] [Commented] (YARN-2768) optimize FSAppAttempt.updateDemand by avoid clone of Resource which takes 85% of computing time of update thread

2015-07-16 Thread Hong Zhiguo (JIRA)
[ https://issues.apache.org/jira/browse/YARN-2768?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14630688#comment-14630688 ] Hong Zhiguo commented on YARN-2768: --- [~kasha], could you please review the patch? > opti

[jira] [Updated] (YARN-3897) "Too many links" in NM log dir

2015-07-16 Thread Hong Zhiguo (JIRA)
[ https://issues.apache.org/jira/browse/YARN-3897?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hong Zhiguo updated YARN-3897: -- Description: Users need to keep container logs for more than one day. On some nodes of our busy cluster, the

[jira] [Commented] (YARN-3897) "Too many links" in NM log dir

2015-07-08 Thread Hong Zhiguo (JIRA)
[ https://issues.apache.org/jira/browse/YARN-3897?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14618365#comment-14618365 ] Hong Zhiguo commented on YARN-3897: --- One solution is to have an extra layer of dirs as th
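
"Too many links" (EMLINK) is reported when a directory reaches its filesystem's subdirectory limit. On ext3, for example, a directory is capped at about 32,000 links. An extra directory layer spreads the entries across buckets. The sketch below assumes the entries are per-application directories directly under the NM log root and buckets them by a hash of the application ID. `BucketedLogDirs` and the three-digit bucket naming are hypothetical, not the layout proposed in the truncated comment.

```java
import java.nio.file.Path;

// Hypothetical layout: <logRoot>/<bucket>/<appId> instead of <logRoot>/<appId>,
// so each directory holds roughly (number of apps / numBuckets) entries.
final class BucketedLogDirs {
  private final Path logRoot;
  private final int numBuckets;

  BucketedLogDirs(Path logRoot, int numBuckets) {
    if (numBuckets <= 0) {
      throw new IllegalArgumentException("numBuckets must be positive");
    }
    this.logRoot = logRoot;
    this.numBuckets = numBuckets;
  }

  // Deterministic: String.hashCode is specified by the JLS, so the same appId
  // always maps to the same bucket and readers can recompute the path.
  Path appLogDir(String appId) {
    int bucket = Math.floorMod(appId.hashCode(), numBuckets); // never negative
    return logRoot.resolve(String.format("%03d", bucket)).resolve(appId);
  }
}
```

With 1,000 buckets, the root holds at most 1,000 subdirectories. Each bucket then needs about a thousandth of the applications before it reaches the same limit.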

[jira] [Updated] (YARN-3897) "Too many links" in NM log dir

2015-07-08 Thread Hong Zhiguo (JIRA)
[ https://issues.apache.org/jira/browse/YARN-3897?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hong Zhiguo updated YARN-3897: -- Description: Users need to keep container logs for more than one day. On some nodes of our busy cluster, the

[jira] [Created] (YARN-3897) "Too many links" in NM log dir

2015-07-08 Thread Hong Zhiguo (JIRA)
Hong Zhiguo created YARN-3897: - Summary: "Too many links" in NM log dir Key: YARN-3897 URL: https://issues.apache.org/jira/browse/YARN-3897 Project: Hadoop YARN Issue Type: Improvement

[jira] [Commented] (YARN-2768) optimize FSAppAttempt.updateDemand by avoid clone of Resource which takes 85% of computing time of update thread

2015-06-10 Thread Hong Zhiguo (JIRA)
[ https://issues.apache.org/jira/browse/YARN-2768?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14581349#comment-14581349 ] Hong Zhiguo commented on YARN-2768: --- [~kasha], the execution time displayed in the profili

[jira] [Commented] (YARN-3678) DelayedProcessKiller may kill other process other than container

2015-05-27 Thread Hong Zhiguo (JIRA)
[ https://issues.apache.org/jira/browse/YARN-3678?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14560748#comment-14560748 ] Hong Zhiguo commented on YARN-3678: --- the event sequence: call "SEND SIGTERM" -> pid rec

[jira] [Commented] (YARN-3678) DelayedProcessKiller may kill other process other than container

2015-05-27 Thread Hong Zhiguo (JIRA)
[ https://issues.apache.org/jira/browse/YARN-3678?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14560663#comment-14560663 ] Hong Zhiguo commented on YARN-3678: --- First, "stop container" happens frequently. Second,

[jira] [Commented] (YARN-3678) DelayedProcessKiller may kill other process other than container

2015-05-27 Thread Hong Zhiguo (JIRA)
[ https://issues.apache.org/jira/browse/YARN-3678?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14560578#comment-14560578 ] Hong Zhiguo commented on YARN-3678: --- We met the same issue on our production cluster last yea

[jira] [Commented] (YARN-3102) Decommisioned Nodes not listed in Web UI

2015-04-07 Thread Hong Zhiguo (JIRA)
[ https://issues.apache.org/jira/browse/YARN-3102?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14482891#comment-14482891 ] Hong Zhiguo commented on YARN-3102: --- I met the same problem. Hi, [~Naganarasimha], can I

[jira] [Updated] (YARN-2768) optimize FSAppAttempt.updateDemand by avoid clone of Resource which takes 85% of computing time of update thread

2014-10-29 Thread Hong Zhiguo (JIRA)
[ https://issues.apache.org/jira/browse/YARN-2768?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hong Zhiguo updated YARN-2768: -- Attachment: YARN-2768.patch Avoid the clone by adding a ternary operator Resources.multiplyAndAddTo. Afte
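
Fusing the multiply and the add means no temporary Resource is allocated per request. A stripped-down type shows the idea. `Res` and `DemandDemo` are hypothetical stand-ins, not Hadoop's `Resource`/`Resources` classes. The loop assumes demand is accumulated as capability times number of containers for each outstanding request.

```java
// Hypothetical minimal resource type; not org.apache.hadoop.yarn.api.records.Resource.
final class Res {
  long memoryMb;
  int vcores;

  Res(long memoryMb, int vcores) {
    this.memoryMb = memoryMb;
    this.vcores = vcores;
  }

  // Allocating form: returns a new temporary object on every call.
  static Res multiply(Res r, double by) {
    return new Res((long) (r.memoryMb * by), (int) (r.vcores * by));
  }

  // In-place add: lhs += rhs.
  static Res addTo(Res lhs, Res rhs) {
    lhs.memoryMb += rhs.memoryMb;
    lhs.vcores += rhs.vcores;
    return lhs;
  }

  // Fused three-operand form: lhs += rhs * by, with no temporary object.
  static Res multiplyAndAddTo(Res lhs, Res rhs, double by) {
    lhs.memoryMb += (long) (rhs.memoryMb * by);
    lhs.vcores += (int) (rhs.vcores * by);
    return lhs;
  }
}

final class DemandDemo {
  // Sums capability * numContainers over all outstanding requests.
  static Res demand(Res[] capabilities, int[] counts) {
    Res total = new Res(0, 0);
    for (int i = 0; i < capabilities.length; i++) {
      Res.multiplyAndAddTo(total, capabilities[i], counts[i]);
    }
    return total;
  }
}
```

The total matches `addTo(total, multiply(capability, count))` for each request. It skips the intermediate object, which is the allocation the attached profiling result points at.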

[jira] [Updated] (YARN-2768) optimize FSAppAttempt.updateDemand by avoid clone of Resource which takes 85% of computing time of update thread

2014-10-29 Thread Hong Zhiguo (JIRA)
[ https://issues.apache.org/jira/browse/YARN-2768?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hong Zhiguo updated YARN-2768: -- Description: See the attached picture of profiling result. The clone of Resource object within Resources

[jira] [Updated] (YARN-2768) optimize FSAppAttempt.updateDemand by avoid clone of Resource which takes 85% of computing time of update thread

2014-10-29 Thread Hong Zhiguo (JIRA)
[ https://issues.apache.org/jira/browse/YARN-2768?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hong Zhiguo updated YARN-2768: -- Attachment: profiling_FairScheduler_update.png > optimize FSAppAttempt.updateDemand by avoid clone of Res

[jira] [Created] (YARN-2768) optimize FSAppAttempt.updateDemand by avoid clone of Resource which takes 85% of computing time of update thread

2014-10-29 Thread Hong Zhiguo (JIRA)
Hong Zhiguo created YARN-2768: - Summary: optimize FSAppAttempt.updateDemand by avoid clone of Resource which takes 85% of computing time of update thread Key: YARN-2768 URL: https://issues.apache.org/jira/browse/YARN-

[jira] [Updated] (YARN-2761) potential race condition in SchedulingPolicy

2014-10-29 Thread Hong Zhiguo (JIRA)
[ https://issues.apache.org/jira/browse/YARN-2761?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hong Zhiguo updated YARN-2761: -- Attachment: YARN-2761.patch > potential race condition in SchedulingPolicy >

[jira] [Created] (YARN-2761) potential race condition in SchedulingPolicy

2014-10-27 Thread Hong Zhiguo (JIRA)
Hong Zhiguo created YARN-2761: - Summary: potential race condition in SchedulingPolicy Key: YARN-2761 URL: https://issues.apache.org/jira/browse/YARN-2761 Project: Hadoop YARN Issue Type: Bug

[jira] [Commented] (YARN-2545) RMApp should transit to FAILED when AM calls finishApplicationMaster with FAILED

2014-10-01 Thread Hong Zhiguo (JIRA)
[ https://issues.apache.org/jira/browse/YARN-2545?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14154946#comment-14154946 ] Hong Zhiguo commented on YARN-2545: --- How about the state of appAttempt? Should it finally

[jira] [Commented] (YARN-2545) RMApp should transit to FAILED when AM calls finishApplicationMaster with FAILED

2014-09-29 Thread Hong Zhiguo (JIRA)
[ https://issues.apache.org/jira/browse/YARN-2545?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14152804#comment-14152804 ] Hong Zhiguo commented on YARN-2545: --- [~leftnoteasy], [~jianhe], [~ozawa], please have a l

[jira] [Created] (YARN-2545) RMApp should transit to FAILED when AM calls finishApplicationMaster with FAILED

2014-09-12 Thread Hong Zhiguo (JIRA)
Hong Zhiguo created YARN-2545: - Summary: RMApp should transit to FAILED when AM calls finishApplicationMaster with FAILED Key: YARN-2545 URL: https://issues.apache.org/jira/browse/YARN-2545 Project: Hadoo

[jira] [Updated] (YARN-2306) leak of reservation metrics (fair scheduler)

2014-09-02 Thread Hong Zhiguo (JIRA)
[ https://issues.apache.org/jira/browse/YARN-2306?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hong Zhiguo updated YARN-2306: -- Attachment: YARN-2306-2.patch updated the patch with only the new unit test, since it seems this bug is f

[jira] [Commented] (YARN-1801) NPE in public localizer

2014-08-21 Thread Hong Zhiguo (JIRA)
[ https://issues.apache.org/jira/browse/YARN-1801?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14106555#comment-14106555 ] Hong Zhiguo commented on YARN-1801: --- I think YARN-1575 already fixed this NPE. We could m

[jira] [Updated] (YARN-2371) Wrong NMToken is issued when NM preserving restarts with containers running

2014-07-30 Thread Hong Zhiguo (JIRA)
[ https://issues.apache.org/jira/browse/YARN-2371?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hong Zhiguo updated YARN-2371: -- Issue Type: Sub-task (was: Bug) Parent: YARN-1489 > Wrong NMToken is issued when NM preserving

[jira] [Updated] (YARN-2371) Wrong NMToken is issued when NM preserving restarts with containers running

2014-07-30 Thread Hong Zhiguo (JIRA)
[ https://issues.apache.org/jira/browse/YARN-2371?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hong Zhiguo updated YARN-2371: -- Summary: Wrong NMToken is issued when NM preserving restarts with containers running (was: Wrong NMToke

[jira] [Updated] (YARN-2371) Wrong NMToken is issued when NM preserving restart with containers running

2014-07-30 Thread Hong Zhiguo (JIRA)
[ https://issues.apache.org/jira/browse/YARN-2371?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hong Zhiguo updated YARN-2371: -- Attachment: YARN-2371.patch > Wrong NMToken is issued when NM preserving restart with containers running

[jira] [Created] (YARN-2371) Wrong NMToken is issued when NM preserving restart with containers running

2014-07-29 Thread Hong Zhiguo (JIRA)
Hong Zhiguo created YARN-2371: - Summary: Wrong NMToken is issued when NM preserving restart with containers running Key: YARN-2371 URL: https://issues.apache.org/jira/browse/YARN-2371 Project: Hadoop YARN

[jira] [Updated] (YARN-2323) FairShareComparator creates too much Resource object

2014-07-20 Thread Hong Zhiguo (JIRA)
[ https://issues.apache.org/jira/browse/YARN-2323?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hong Zhiguo updated YARN-2323: -- Attachment: YARN-2323-2.patch patch revised according to [~sandyr]'s comments. > FairShareComparator cr

[jira] [Updated] (YARN-2323) FairShareComparator creates too much Resource object

2014-07-19 Thread Hong Zhiguo (JIRA)
[ https://issues.apache.org/jira/browse/YARN-2323?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hong Zhiguo updated YARN-2323: -- Attachment: YARN-2323.patch > FairShareComparator creates too much Resource object > ---

[jira] [Created] (YARN-2323) FairShareComparator creates too much Resource object

2014-07-19 Thread Hong Zhiguo (JIRA)
Hong Zhiguo created YARN-2323: - Summary: FairShareComparator creates too much Resource object Key: YARN-2323 URL: https://issues.apache.org/jira/browse/YARN-2323 Project: Hadoop YARN Issue Type:

[jira] [Updated] (YARN-2306) leak of reservation metrics (fair scheduler)

2014-07-17 Thread Hong Zhiguo (JIRA)
[ https://issues.apache.org/jira/browse/YARN-2306?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hong Zhiguo updated YARN-2306: -- Attachment: YARN-2306.patch > leak of reservation metrics (fair scheduler) > ---

[jira] [Commented] (YARN-2305) When a container is in reserved state then total cluster memory is displayed wrongly.

2014-07-17 Thread Hong Zhiguo (JIRA)
[ https://issues.apache.org/jira/browse/YARN-2305?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14064738#comment-14064738 ] Hong Zhiguo commented on YARN-2305: --- OK > When a container is in reserved state then tot

[jira] [Updated] (YARN-2306) leak of reservation metrics (fair scheduler)

2014-07-17 Thread Hong Zhiguo (JIRA)
[ https://issues.apache.org/jira/browse/YARN-2306?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hong Zhiguo updated YARN-2306: -- Summary: leak of reservation metrics (fair scheduler) (was: leak of reservation metrics) > leak of res

[jira] [Assigned] (YARN-2305) When a container is in reserved state then total cluster memory is displayed wrongly.

2014-07-17 Thread Hong Zhiguo (JIRA)
[ https://issues.apache.org/jira/browse/YARN-2305?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hong Zhiguo reassigned YARN-2305: - Assignee: Hong Zhiguo > When a container is in reserved state then total cluster memory is display