[jira] [Updated] (YARN-8695) ERROR: Container complete event for unknown container id

2018-08-21 Thread Jason Lowe (JIRA)
[ https://issues.apache.org/jira/browse/YARN-8695?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Lowe updated YARN-8695: - Priority: Minor (was: Major) Downgrading the priority since this has no impact on functionality. Does "c

[jira] [Commented] (YARN-8242) YARN NM: OOM error while reading back the state store on recovery

2018-08-17 Thread Jason Lowe (JIRA)
[ https://issues.apache.org/jira/browse/YARN-8242?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16584463#comment-16584463 ] Jason Lowe commented on YARN-8242: -- Thanks for updating the patch! +1 lgtm pending Jenki

[jira] [Commented] (YARN-7018) Interface for adding extra behavior to node heartbeats

2018-08-17 Thread Jason Lowe (JIRA)
[ https://issues.apache.org/jira/browse/YARN-7018?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16584453#comment-16584453 ] Jason Lowe commented on YARN-7018: -- Thanks for the POC patch! At a very high level it's

[jira] [Commented] (YARN-8640) Restore previous state in container-executor after failure

2018-08-17 Thread Jason Lowe (JIRA)
[ https://issues.apache.org/jira/browse/YARN-8640?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16584338#comment-16584338 ] Jason Lowe commented on YARN-8640: -- Thanks for updating the patches! +1 for the branch-2

[jira] [Commented] (YARN-8242) YARN NM: OOM error while reading back the state store on recovery

2018-08-17 Thread Jason Lowe (JIRA)
[ https://issues.apache.org/jira/browse/YARN-8242?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16584261#comment-16584261 ] Jason Lowe commented on YARN-8242: -- Thanks for updating the patch! When translating DBEx

[jira] [Commented] (YARN-8640) Restore previous state in container-executor after failure

2018-08-16 Thread Jason Lowe (JIRA)
[ https://issues.apache.org/jira/browse/YARN-8640?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16583079#comment-16583079 ] Jason Lowe commented on YARN-8640: -- Bummer, the 2.8 docker build appears to be busted. I

[jira] [Commented] (YARN-8656) container-executor should not write cgroup tasks files for docker containers

2018-08-16 Thread Jason Lowe (JIRA)
[ https://issues.apache.org/jira/browse/YARN-8656?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16582663#comment-16582663 ] Jason Lowe commented on YARN-8656: -- Thanks for updating the patch! +1 lgtm. I agree the

[jira] [Commented] (YARN-8672) TestContainerManager#testLocalingResourceWhileContainerRunning occasionally times out

2018-08-16 Thread Jason Lowe (JIRA)
[ https://issues.apache.org/jira/browse/YARN-8672?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16582661#comment-16582661 ] Jason Lowe commented on YARN-8672: -- Here's some sample output showing the localization fa

[jira] [Created] (YARN-8672) TestContainerManager#testLocalingResourceWhileContainerRunning occasionally times out

2018-08-16 Thread Jason Lowe (JIRA)
Jason Lowe created YARN-8672: Summary: TestContainerManager#testLocalingResourceWhileContainerRunning occasionally times out Key: YARN-8672 URL: https://issues.apache.org/jira/browse/YARN-8672 Project: H

[jira] [Commented] (YARN-6456) Allow administrators to set a single ContainerRuntime for all containers

2018-08-16 Thread Jason Lowe (JIRA)
[ https://issues.apache.org/jira/browse/YARN-6456?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16582594#comment-16582594 ] Jason Lowe commented on YARN-6456: -- [~ebadger] did the work related to this in our intern

[jira] [Commented] (YARN-8242) YARN NM: OOM error while reading back the state store on recovery

2018-08-15 Thread Jason Lowe (JIRA)
[ https://issues.apache.org/jira/browse/YARN-8242?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16581419#comment-16581419 ] Jason Lowe commented on YARN-8242: -- Thanks for updating the patch! bq. The problem/issue

[jira] [Updated] (YARN-8568) Replace the deprecated zk-address property in the HA config example in ResourceManagerHA.md

2018-08-14 Thread Jason Lowe (JIRA)
[ https://issues.apache.org/jira/browse/YARN-8568?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Lowe updated YARN-8568: - Fix Version/s: (was: 3.0.3) 3.0.4 Thanks [~rkanter]! I marked this as fixed in 3.0

[jira] [Commented] (YARN-8648) Container cgroups are leaked when using docker

2018-08-14 Thread Jason Lowe (JIRA)
[ https://issues.apache.org/jira/browse/YARN-8648?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16580081#comment-16580081 ] Jason Lowe commented on YARN-8648: -- bq. Is it worth breaking cgroups parameters temporari

[jira] [Commented] (YARN-8648) Container cgroups are leaked when using docker

2018-08-14 Thread Jason Lowe (JIRA)
[ https://issues.apache.org/jira/browse/YARN-8648?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16580025#comment-16580025 ] Jason Lowe commented on YARN-8648: -- +1 for the proposal to fix the cgroup leak by having

[jira] [Commented] (YARN-8656) container-executor should not write cgroup tasks files for docker containers

2018-08-14 Thread Jason Lowe (JIRA)
[ https://issues.apache.org/jira/browse/YARN-8656?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16580007#comment-16580007 ] Jason Lowe commented on YARN-8656: -- Thanks for the patch! bq. should I remove the resour

[jira] [Commented] (YARN-8568) Replace the deprecated zk-address property in the HA config example in ResourceManagerHA.md

2018-08-14 Thread Jason Lowe (JIRA)
[ https://issues.apache.org/jira/browse/YARN-8568?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16579963#comment-16579963 ] Jason Lowe commented on YARN-8568: -- bq. Committed to trunk, branch-3, and branch-3.1! [

[jira] [Updated] (YARN-8640) Restore previous state in container-executor after failure

2018-08-14 Thread Jason Lowe (JIRA)
[ https://issues.apache.org/jira/browse/YARN-8640?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Lowe updated YARN-8640: - Summary: Restore previous state in container-executor after failure (was: Restore previous state in conta

[jira] [Commented] (YARN-8640) Restore previous state in container-executor if write_exit_code_file_as_nm fails

2018-08-14 Thread Jason Lowe (JIRA)
[ https://issues.apache.org/jira/browse/YARN-8640?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16579944#comment-16579944 ] Jason Lowe commented on YARN-8640: -- Thanks for the patch! +1 lgtm. Committing this. >

[jira] [Commented] (YARN-8331) Race condition in NM container launched after done

2018-08-09 Thread Jason Lowe (JIRA)
[ https://issues.apache.org/jira/browse/YARN-8331?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16574989#comment-16574989 ] Jason Lowe commented on YARN-8331: -- Thanks for updating the patch! +1 lgtm. Committing t

[jira] [Commented] (YARN-8331) Race condition in NM container launched after done

2018-08-08 Thread Jason Lowe (JIRA)
[ https://issues.apache.org/jira/browse/YARN-8331?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16573432#comment-16573432 ] Jason Lowe commented on YARN-8331: -- Thanks for the patch, [~pradeepambati]! In Container

[jira] [Commented] (YARN-8609) NM oom because of large container statuses

2018-08-08 Thread Jason Lowe (JIRA)
[ https://issues.apache.org/jira/browse/YARN-8609?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16573232#comment-16573232 ] Jason Lowe commented on YARN-8609: -- This JIRA does mention all those things, and now it p

[jira] [Commented] (YARN-8609) NM oom because of large container statuses

2018-08-07 Thread Jason Lowe (JIRA)
[ https://issues.apache.org/jira/browse/YARN-8609?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16571732#comment-16571732 ] Jason Lowe commented on YARN-8609: -- bq. Indeed, it would not take up too much memory if r

[jira] [Commented] (YARN-8609) NM oom because of large container statuses

2018-08-06 Thread Jason Lowe (JIRA)
[ https://issues.apache.org/jira/browse/YARN-8609?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16570805#comment-16570805 ] Jason Lowe commented on YARN-8609: -- bq. As far as I know, there are two kinds of diagnost

[jira] [Commented] (YARN-8523) Interactive docker shell

2018-08-03 Thread Jason Lowe (JIRA)
[ https://issues.apache.org/jira/browse/YARN-8523?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16568291#comment-16568291 ] Jason Lowe commented on YARN-8523: -- Involving the nodemanager in the data path should con

[jira] [Commented] (YARN-8263) DockerClient still touches hadoop.tmp.dir

2018-08-02 Thread Jason Lowe (JIRA)
[ https://issues.apache.org/jira/browse/YARN-8263?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16566936#comment-16566936 ] Jason Lowe commented on YARN-8263: -- Thanks for updating the patch! +1 lgtm. There are n

[jira] [Commented] (YARN-8609) NM oom because of large container statuses

2018-08-01 Thread Jason Lowe (JIRA)
[ https://issues.apache.org/jira/browse/YARN-8609?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16565511#comment-16565511 ] Jason Lowe commented on YARN-8609: -- Thanks for the report and patch! IMHO any truncation

[jira] [Commented] (YARN-8263) DockerClient still touches hadoop.tmp.dir

2018-07-31 Thread Jason Lowe (JIRA)
[ https://issues.apache.org/jira/browse/YARN-8263?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16564136#comment-16564136 ] Jason Lowe commented on YARN-8263: -- Thanks for the patch! Patch looks good, but I think

[jira] [Assigned] (YARN-8242) YARN NM: OOM error while reading back the state store on recovery

2018-07-31 Thread Jason Lowe (JIRA)
[ https://issues.apache.org/jira/browse/YARN-8242?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Lowe reassigned YARN-8242: Assignee: Pradeep Ambati Thanks for picking this up, [~pradeepambati]! RecoveryIterator should ext

[jira] [Commented] (YARN-8330) An extra container got launched by RM for yarn-service

2018-07-23 Thread Jason Lowe (JIRA)
[ https://issues.apache.org/jira/browse/YARN-8330?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16553540#comment-16553540 ] Jason Lowe commented on YARN-8330: -- One of the envisioned use-cases of ATSv2 was to recor

[jira] [Commented] (YARN-8330) An extra container got launched by RM for yarn-service

2018-07-20 Thread Jason Lowe (JIRA)
[ https://issues.apache.org/jira/browse/YARN-8330?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16551338#comment-16551338 ] Jason Lowe commented on YARN-8330: -- To me this all depends upon the intent of how the ATS

[jira] [Commented] (YARN-8515) container-executor can crash with SIGPIPE after nodemanager restart

2018-07-12 Thread Jason Lowe (JIRA)
[ https://issues.apache.org/jira/browse/YARN-8515?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16542176#comment-16542176 ] Jason Lowe commented on YARN-8515: -- Thanks for the patch! +1 lgtm. I'll commit this tom

[jira] [Commented] (YARN-8518) test-container-executor test_is_empty() is broken

2018-07-12 Thread Jason Lowe (JIRA)
[ https://issues.apache.org/jira/browse/YARN-8518?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16541883#comment-16541883 ] Jason Lowe commented on YARN-8518: -- Ah, right. I verified that running mvn test in the n

[jira] [Commented] (YARN-8518) test-container-executor test_is_empty() is broken

2018-07-12 Thread Jason Lowe (JIRA)
[ https://issues.apache.org/jira/browse/YARN-8518?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16541832#comment-16541832 ] Jason Lowe commented on YARN-8518: -- It would also be good to understand why this wasn't c

[jira] [Commented] (YARN-8383) TimelineServer 1.5 start fails with NoClassDefFoundError

2018-07-10 Thread Jason Lowe (JIRA)
[ https://issues.apache.org/jira/browse/YARN-8383?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16538755#comment-16538755 ] Jason Lowe commented on YARN-8383: -- bq. change in HDFS is an incompatible change from bra

[jira] [Commented] (YARN-8383) TimelineServer 1.5 start fails with NoClassDefFoundError

2018-07-10 Thread Jason Lowe (JIRA)
[ https://issues.apache.org/jira/browse/YARN-8383?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16538699#comment-16538699 ] Jason Lowe commented on YARN-8383: -- bq. I think adding dependencies share/hadoop/yarn/lib

[jira] [Assigned] (YARN-8383) TimelineServer 1.5 start fails with NoClassDefFoundError

2018-07-09 Thread Jason Lowe (JIRA)
[ https://issues.apache.org/jira/browse/YARN-8383?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Lowe reassigned YARN-8383: Assignee: Jason Lowe Attaching a patch for branch-2.8 that also shades the fst jar so it gets the

[jira] [Updated] (YARN-8383) TimelineServer 1.5 start fails with NoClassDefFoundError

2018-07-09 Thread Jason Lowe (JIRA)
[ https://issues.apache.org/jira/browse/YARN-8383?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Lowe updated YARN-8383: - Attachment: YARN-8383.001-branch-2.8.patch > TimelineServer 1.5 start fails with NoClassDefFoundError > --

[jira] [Commented] (YARN-8507) EINVRES Request to https://bower.herokuapp.com/packages/ember-cli-shims failed with 502

2018-07-09 Thread Jason Lowe (JIRA)
[ https://issues.apache.org/jira/browse/YARN-8507?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16537331#comment-16537331 ] Jason Lowe commented on YARN-8507: -- This looks like a duplicate of YARN-8457, and the fix

[jira] [Commented] (YARN-8473) Containers being launched as app tears down can leave containers in NEW state

2018-07-09 Thread Jason Lowe (JIRA)
[ https://issues.apache.org/jira/browse/YARN-8473?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16537148#comment-16537148 ] Jason Lowe commented on YARN-8473: -- Thanks for the review, [~sunilg]! Feel free to commi

[jira] [Resolved] (YARN-8385) Clean local directories when a container is killed

2018-07-09 Thread Jason Lowe (JIRA)
[ https://issues.apache.org/jira/browse/YARN-8385?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Lowe resolved YARN-8385. -- Resolution: Invalid Closing this as invalid since YARN is deleting the container directory and leaving

[jira] [Commented] (YARN-8383) TimelineServer 1.5 start fails with NoClassDefFoundError

2018-07-06 Thread Jason Lowe (JIRA)
[ https://issues.apache.org/jira/browse/YARN-8383?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16535392#comment-16535392 ] Jason Lowe commented on YARN-8383: -- Sorry for the delay, as I was out on vacation and fin

[jira] [Commented] (YARN-8193) YARN RM hangs abruptly (stops allocating resources) when running successive applications.

2018-07-06 Thread Jason Lowe (JIRA)
[ https://issues.apache.org/jira/browse/YARN-8193?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16535019#comment-16535019 ] Jason Lowe commented on YARN-8193: -- bq. The build failed due to some reason not related t

[jira] [Updated] (YARN-8193) YARN RM hangs abruptly (stops allocating resources) when running successive applications.

2018-07-06 Thread Jason Lowe (JIRA)
[ https://issues.apache.org/jira/browse/YARN-8193?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Lowe updated YARN-8193: - Attachment: YARN-8193-branch-2-001.patch > YARN RM hangs abruptly (stops allocating resources) when runnin

[jira] [Resolved] (YARN-8462) Resource Manager shutdown with FATAL Exception

2018-07-06 Thread Jason Lowe (JIRA)
[ https://issues.apache.org/jira/browse/YARN-8462?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Lowe resolved YARN-8462. -- Resolution: Duplicate This is being handled by YARN-8193 with a new branch-2 patch posted there. > Reso

[jira] [Commented] (YARN-8385) Clean local directories when a container is killed

2018-07-05 Thread Jason Lowe (JIRA)
[ https://issues.apache.org/jira/browse/YARN-8385?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16534030#comment-16534030 ] Jason Lowe commented on YARN-8385: -- The container's working directories are cleaned up by

[jira] [Updated] (YARN-8473) Containers being launched as app tears down can leave containers in NEW state

2018-07-03 Thread Jason Lowe (JIRA)
[ https://issues.apache.org/jira/browse/YARN-8473?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Lowe updated YARN-8473: - Attachment: YARN-8473.003.patch > Containers being launched as app tears down can leave containers in NEW

[jira] [Commented] (YARN-8473) Containers being launched as app tears down can leave containers in NEW state

2018-07-03 Thread Jason Lowe (JIRA)
[ https://issues.apache.org/jira/browse/YARN-8473?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16531852#comment-16531852 ] Jason Lowe commented on YARN-8473: -- bq. In which case, container can come to this case?

[jira] [Commented] (YARN-8473) Containers being launched as app tears down can leave containers in NEW state

2018-07-03 Thread Jason Lowe (JIRA)
[ https://issues.apache.org/jira/browse/YARN-8473?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16531572#comment-16531572 ] Jason Lowe commented on YARN-8473: -- Fixed the unit test to remove the race condition when

[jira] [Updated] (YARN-8473) Containers being launched as app tears down can leave containers in NEW state

2018-07-03 Thread Jason Lowe (JIRA)
[ https://issues.apache.org/jira/browse/YARN-8473?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Lowe updated YARN-8473: - Attachment: YARN-8473.002.patch > Containers being launched as app tears down can leave containers in NEW

[jira] [Updated] (YARN-8473) Containers being launched as app tears down can leave containers in NEW state

2018-07-02 Thread Jason Lowe (JIRA)
[ https://issues.apache.org/jira/browse/YARN-8473?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Lowe updated YARN-8473: - Attachment: YARN-8473.001.patch > Containers being launched as app tears down can leave containers in NEW

[jira] [Commented] (YARN-8451) Multiple NM heartbeat thread created when a slow NM resync with RM

2018-06-29 Thread Jason Lowe (JIRA)
[ https://issues.apache.org/jira/browse/YARN-8451?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16528025#comment-16528025 ] Jason Lowe commented on YARN-8451: -- Thanks for updating the patch! +1 lgtm. Committing

[jira] [Commented] (YARN-8480) Add boolean option for resources

2018-06-29 Thread Jason Lowe (JIRA)
[ https://issues.apache.org/jira/browse/YARN-8480?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16527962#comment-16527962 ] Jason Lowe commented on YARN-8480: -- Could someone elaborate a bit more on the use-case fo

[jira] [Commented] (YARN-8473) Containers being launched as app tears down can leave containers in NEW state

2018-06-28 Thread Jason Lowe (JIRA)
[ https://issues.apache.org/jira/browse/YARN-8473?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16526815#comment-16526815 ] Jason Lowe commented on YARN-8473: -- Sample error transitions from the NM log: {noformat}

[jira] [Created] (YARN-8473) Containers being launched as app tears down can leave containers in NEW state

2018-06-28 Thread Jason Lowe (JIRA)
Jason Lowe created YARN-8473: Summary: Containers being launched as app tears down can leave containers in NEW state Key: YARN-8473 URL: https://issues.apache.org/jira/browse/YARN-8473 Project: Hadoop YAR

[jira] [Commented] (YARN-8451) Multiple NM heartbeat thread created when a slow NM resync with RM

2018-06-28 Thread Jason Lowe (JIRA)
[ https://issues.apache.org/jira/browse/YARN-8451?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16526650#comment-16526650 ] Jason Lowe commented on YARN-8451: -- Ah, sorry, I missed that there was a thread earlier a

[jira] [Commented] (YARN-8451) Multiple NM heartbeat thread created when a slow NM resync with RM

2018-06-28 Thread Jason Lowe (JIRA)
[ https://issues.apache.org/jira/browse/YARN-8451?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16526343#comment-16526343 ] Jason Lowe commented on YARN-8451: -- Thanks for the report and patch! Are we sure it's sa

[jira] [Resolved] (YARN-8462) Resource Manager shutdown with FATAL Exception

2018-06-26 Thread Jason Lowe (JIRA)
[ https://issues.apache.org/jira/browse/YARN-8462?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Lowe resolved YARN-8462. -- Resolution: Duplicate > Resource Manager shutdown with FATAL Exception > ---

[jira] [Commented] (YARN-8462) Resource Manager shutdown with FATAL Exception

2018-06-26 Thread Jason Lowe (JIRA)
[ https://issues.apache.org/jira/browse/YARN-8462?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16524162#comment-16524162 ] Jason Lowe commented on YARN-8462: -- Great, thanks Wangda! I'll mark this as a duplicate

[jira] [Commented] (YARN-8462) Resource Manager shutdown with FATAL Exception

2018-06-26 Thread Jason Lowe (JIRA)
[ https://issues.apache.org/jira/browse/YARN-8462?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16523790#comment-16523790 ] Jason Lowe commented on YARN-8462: -- It looks like the NPE is occurring here when getSched

[jira] [Commented] (YARN-8329) Docker client configuration can still be set incorrectly

2018-05-29 Thread Jason Lowe (JIRA)
[ https://issues.apache.org/jira/browse/YARN-8329?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16494133#comment-16494133 ] Jason Lowe commented on YARN-8329: -- Thanks for updating the patch! The unit test failure

[jira] [Commented] (YARN-4599) Set OOM control for memory cgroups

2018-05-29 Thread Jason Lowe (JIRA)
[ https://issues.apache.org/jira/browse/YARN-4599?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16493963#comment-16493963 ] Jason Lowe commented on YARN-4599: -- TestCGroupElasticMemoryController is failing precommi

[jira] [Commented] (YARN-8375) TestCGroupElasticMemoryController fails surefire build

2018-05-29 Thread Jason Lowe (JIRA)
[ https://issues.apache.org/jira/browse/YARN-8375?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16493962#comment-16493962 ] Jason Lowe commented on YARN-8375: -- Example precommit log at https://builds.apache.org/j

[jira] [Created] (YARN-8375) TestCGroupElasticMemoryController fails surefire build

2018-05-29 Thread Jason Lowe (JIRA)
Jason Lowe created YARN-8375: Summary: TestCGroupElasticMemoryController fails surefire build Key: YARN-8375 URL: https://issues.apache.org/jira/browse/YARN-8375 Project: Hadoop YARN Issue Type:

[jira] [Commented] (YARN-8359) Exclude containermanager.linux test classes on Windows

2018-05-29 Thread Jason Lowe (JIRA)
[ https://issues.apache.org/jira/browse/YARN-8359?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16493941#comment-16493941 ] Jason Lowe commented on YARN-8359: -- The unit test failure is unrelated. I attached a new

[jira] [Updated] (YARN-8359) Exclude containermanager.linux test classes on Windows

2018-05-29 Thread Jason Lowe (JIRA)
[ https://issues.apache.org/jira/browse/YARN-8359?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Lowe updated YARN-8359: - Attachment: YARN-8359.002.patch > Exclude containermanager.linux test classes on Windows > ---

[jira] [Updated] (YARN-8359) Exclude containermanager.linux test classes on Windows

2018-05-29 Thread Jason Lowe (JIRA)
[ https://issues.apache.org/jira/browse/YARN-8359?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Lowe updated YARN-8359: - Summary: Exclude containermanager.linux test classes on Windows (was: Disable containermanager.linux.runt

[jira] [Created] (YARN-8374) Upgrade objenesis dependency

2018-05-29 Thread Jason Lowe (JIRA)
Jason Lowe created YARN-8374: Summary: Upgrade objenesis dependency Key: YARN-8374 URL: https://issues.apache.org/jira/browse/YARN-8374 Project: Hadoop YARN Issue Type: Improvement Comp

[jira] [Assigned] (YARN-8359) Disable containermanager.linux.runtime.TEST to run on Windows

2018-05-29 Thread Jason Lowe (JIRA)
[ https://issues.apache.org/jira/browse/YARN-8359?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Lowe reassigned YARN-8359: Assignee: Jason Lowe > Disable containermanager.linux.runtime.TEST to run on Windows >

[jira] [Commented] (YARN-8359) Disable containermanager.linux.runtime.TEST to run on Windows

2018-05-29 Thread Jason Lowe (JIRA)
[ https://issues.apache.org/jira/browse/YARN-8359?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16493744#comment-16493744 ] Jason Lowe commented on YARN-8359: -- bq. Is this used somewhere else? Using the pom to ex

[jira] [Commented] (YARN-8359) Disable containermanager.linux.runtime.TEST to run on Windows

2018-05-25 Thread Jason Lowe (JIRA)
[ https://issues.apache.org/jira/browse/YARN-8359?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16491301#comment-16491301 ] Jason Lowe commented on YARN-8359: -- Attaching a patch that I _think_ does what I proposed.

[jira] [Updated] (YARN-8359) Disable containermanager.linux.runtime.TEST to run on Windows

2018-05-25 Thread Jason Lowe (JIRA)
[ https://issues.apache.org/jira/browse/YARN-8359?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Lowe updated YARN-8359: - Attachment: YARN-8359.001.patch > Disable containermanager.linux.runtime.TEST to run on Windows > -

[jira] [Commented] (YARN-8359) Disable containermanager.linux.runtime.TEST to run on Windows

2018-05-25 Thread Jason Lowe (JIRA)
[ https://issues.apache.org/jira/browse/YARN-8359?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16491112#comment-16491112 ] Jason Lowe commented on YARN-8359: -- Since this part of the source tree is specific to the

[jira] [Commented] (YARN-8359) Disable containermanager.linux.runtime.TEST to run on Windows

2018-05-24 Thread Jason Lowe (JIRA)
[ https://issues.apache.org/jira/browse/YARN-8359?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16489807#comment-16489807 ] Jason Lowe commented on YARN-8359: -- The java package is containermanager.*linux*.runtime,

[jira] [Updated] (YARN-8068) Application Priority field causes NPE in app timeline publish when Hadoop 2.7 based clients to 2.8+

2018-05-24 Thread Jason Lowe (JIRA)
[ https://issues.apache.org/jira/browse/YARN-8068?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Lowe updated YARN-8068: - Fix Version/s: 2.9.2 2.10.0 Thanks, [~sunilg]! This fix needs to go everywhere YARN-586

[jira] [Resolved] (YARN-8358) ResourceManager restart fail to recover due to TimelineServiceV1Publisher NPE

2018-05-24 Thread Jason Lowe (JIRA)
[ https://issues.apache.org/jira/browse/YARN-8358?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Lowe resolved YARN-8358. -- Resolution: Duplicate > ResourceManager restart fail to recover due to TimelineServiceV1Publisher NPE > -

[jira] [Commented] (YARN-8358) ResourceManager restart fail to recover due to TimelineServiceV1Publisher NPE

2018-05-24 Thread Jason Lowe (JIRA)
[ https://issues.apache.org/jira/browse/YARN-8358?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16489504#comment-16489504 ] Jason Lowe commented on YARN-8358: -- This looks like a duplicate of YARN-8068. Unfortunate

[jira] [Commented] (YARN-8292) Fix the dominant resource preemption cannot happen when some of the resource vector becomes negative

2018-05-24 Thread Jason Lowe (JIRA)
[ https://issues.apache.org/jira/browse/YARN-8292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16489192#comment-16489192 ] Jason Lowe commented on YARN-8292: -- Thanks for updating the patch! I agree with Eric's co

[jira] [Commented] (YARN-8356) yarn.log-aggregation why can auto clean

2018-05-24 Thread Jason Lowe (JIRA)
[ https://issues.apache.org/jira/browse/YARN-8356?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16489063#comment-16489063 ] Jason Lowe commented on YARN-8356: -- I agree this looks very likely to be a duplicate of YA

[jira] [Commented] (YARN-8279) AggregationLogDeletionService does not honor yarn.log-aggregation.IndexedFormat.remote-app-log-dir-suffix

2018-05-24 Thread Jason Lowe (JIRA)
[ https://issues.apache.org/jira/browse/YARN-8279?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16489061#comment-16489061 ] Jason Lowe commented on YARN-8279: -- Did this really occur in Hadoop 2.7.3? LogAggregation

[jira] [Commented] (YARN-8352) AM should retry on a different node after the previous application attempt fail

2018-05-24 Thread Jason Lowe (JIRA)
[ https://issues.apache.org/jira/browse/YARN-8352?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16488969#comment-16488969 ] Jason Lowe commented on YARN-8352: -- Is this a duplicate of YARN-2005? This is filed again

[jira] [Commented] (YARN-8346) Upgrading to 3.1 kills running containers with error "Opportunistic container queue is full"

2018-05-24 Thread Jason Lowe (JIRA)
[ https://issues.apache.org/jira/browse/YARN-8346?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16488962#comment-16488962 ] Jason Lowe commented on YARN-8346: -- Thanks for the reviews! Does this need to go into bra

[jira] [Commented] (YARN-8356) yarn.log-aggregation why can auto clean

2018-05-24 Thread Jason Lowe (JIRA)
[ https://issues.apache.org/jira/browse/YARN-8356?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16488922#comment-16488922 ] Jason Lowe commented on YARN-8356: -- Couple of questions to clarify the setup: Is yarn.nod

[jira] [Commented] (YARN-8338) TimelineService V1.5 doesn't come up after HADOOP-15406

2018-05-24 Thread Jason Lowe (JIRA)
[ https://issues.apache.org/jira/browse/YARN-8338?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16488908#comment-16488908 ] Jason Lowe commented on YARN-8338: -- My personal preference would be to use objenesis 1.0 i

[jira] [Commented] (YARN-8292) Fix the dominant resource preemption cannot happen when some of the resource vector becomes negative

2018-05-23 Thread Jason Lowe (JIRA)
[ https://issues.apache.org/jira/browse/YARN-8292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16488092#comment-16488092 ] Jason Lowe commented on YARN-8292: -- Thanks for updating the patch! The TestPreemptionForQ

[jira] [Updated] (YARN-8346) Upgrading to 3.1 kills running containers with error "Opportunistic container queue is full"

2018-05-23 Thread Jason Lowe (JIRA)
[ https://issues.apache.org/jira/browse/YARN-8346?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Lowe updated YARN-8346: - Attachment: YARN-8346.001.patch > Upgrading to 3.1 kills running containers with error "Opportunistic conta

[jira] [Assigned] (YARN-8346) Upgrading to 3.1 kills running containers with error "Opportunistic container queue is full"

2018-05-23 Thread Jason Lowe (JIRA)
[ https://issues.apache.org/jira/browse/YARN-8346?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Lowe reassigned YARN-8346: Assignee: Jason Lowe > Upgrading to 3.1 kills running containers with error "Opportunistic container

[jira] [Commented] (YARN-8346) Upgrading to 3.1 kills running containers with error "Opportunistic container queue is full"

2018-05-23 Thread Jason Lowe (JIRA)
[ https://issues.apache.org/jira/browse/YARN-8346?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16487741#comment-16487741 ] Jason Lowe commented on YARN-8346: -- I should have a patch up later today. I already verif

[jira] [Commented] (YARN-8338) TimelineService V1.5 doesn't come up after HADOOP-15406

2018-05-23 Thread Jason Lowe (JIRA)
[ https://issues.apache.org/jira/browse/YARN-8338?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16487738#comment-16487738 ] Jason Lowe commented on YARN-8338: -- bq. This comment says that if android is not used, we

[jira] [Commented] (YARN-8346) Upgrading to 3.1 kills running containers with error "Opportunistic container queue is full"

2018-05-23 Thread Jason Lowe (JIRA)
[ https://issues.apache.org/jira/browse/YARN-8346?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16487557#comment-16487557 ] Jason Lowe commented on YARN-8346: -- IIUC the issue isn't the queue length setting but rath

[jira] [Commented] (YARN-8292) Fix the dominant resource preemption cannot happen when some of the resource vector becomes negative

2018-05-22 Thread Jason Lowe (JIRA)
[ https://issues.apache.org/jira/browse/YARN-8292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16484674#comment-16484674 ] Jason Lowe commented on YARN-8292: -- Thanks for updating the patch! Why does isAnyMajorRes

[jira] [Commented] (YARN-8338) TimelineService V1.5 doesn't come up after HADOOP-15406

2018-05-22 Thread Jason Lowe (JIRA)
[ https://issues.apache.org/jira/browse/YARN-8338?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16484398#comment-16484398 ] Jason Lowe commented on YARN-8338: -- To clarify, the reason the removal of the objenesis de

[jira] [Commented] (YARN-8338) TimelineService V1.5 doesn't come up after HADOOP-15406

2018-05-22 Thread Jason Lowe (JIRA)
[ https://issues.apache.org/jira/browse/YARN-8338?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16484390#comment-16484390 ] Jason Lowe commented on YARN-8338: -- Thanks for the patch! Yeah, this looks like a case of

[jira] [Updated] (YARN-8232) RMContainer lost queue name when RM HA happens

2018-05-22 Thread Jason Lowe (JIRA)
[ https://issues.apache.org/jira/browse/YARN-8232?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Lowe updated YARN-8232: - Fix Version/s: 2.8.5 2.9.2 2.10.0 Thanks, [~ziqian hu]! We recently

[jira] [Commented] (YARN-8329) Docker client configuration can still be set incorrectly

2018-05-21 Thread Jason Lowe (JIRA)
[ https://issues.apache.org/jira/browse/YARN-8329?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16483140#comment-16483140 ] Jason Lowe commented on YARN-8329: -- Thanks for the patch! I'm wondering why the code is f

[jira] [Commented] (YARN-8206) Sending a kill does not immediately kill docker containers

2018-05-21 Thread Jason Lowe (JIRA)
[ https://issues.apache.org/jira/browse/YARN-8206?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16483087#comment-16483087 ] Jason Lowe commented on YARN-8206: -- Thanks for updating the patch! +1 lgtm. I'll commit

[jira] [Commented] (YARN-8259) Revisit liveliness checks for Docker containers

2018-05-21 Thread Jason Lowe (JIRA)
[ https://issues.apache.org/jira/browse/YARN-8259?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16483032#comment-16483032 ] Jason Lowe commented on YARN-8259: -- I do agree with Shane that there are already subsystem

[jira] [Commented] (YARN-8259) Revisit liveliness checks for Docker containers

2018-05-21 Thread Jason Lowe (JIRA)
[ https://issues.apache.org/jira/browse/YARN-8259?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16483029#comment-16483029 ] Jason Lowe commented on YARN-8259: -- Ah comment race with [~eyang], I'll defer until his co

[jira] [Commented] (YARN-8259) Revisit liveliness checks for Docker containers

2018-05-21 Thread Jason Lowe (JIRA)
[ https://issues.apache.org/jira/browse/YARN-8259?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16483028#comment-16483028 ] Jason Lowe commented on YARN-8259: -- Thanks for the patch! +1 lgtm. I'll commit this tomo

[jira] [Commented] (YARN-8328) NonAggregatingLogHandler needlessly waits upon shutdown if delayed deletion is scheduled

2018-05-21 Thread Jason Lowe (JIRA)
[ https://issues.apache.org/jira/browse/YARN-8328?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16482643#comment-16482643 ] Jason Lowe commented on YARN-8328: -- This is making a lot of unit tests take a lot longer t

[jira] [Commented] (YARN-8206) Sending a kill does not immediately kill docker containers

2018-05-21 Thread Jason Lowe (JIRA)
[ https://issues.apache.org/jira/browse/YARN-8206?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16482584#comment-16482584 ] Jason Lowe commented on YARN-8206: -- Thanks for updating the patch! Unfortunately the patc
