[jira] [Commented] (YARN-8757) [Submarine] Add Tensorboard component when --tensorboard is specified

2018-09-10 Thread Wangda Tan (JIRA)
[ https://issues.apache.org/jira/browse/YARN-8757?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16609943#comment-16609943 ] Wangda Tan commented on YARN-8757: -- Added ver.1 patch which spins up a Tensorboard container when

[jira] [Updated] (YARN-8757) [Submarine] Add Tensorboard component when --tensorboard is specified

2018-09-10 Thread Wangda Tan (JIRA)
[ https://issues.apache.org/jira/browse/YARN-8757?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wangda Tan updated YARN-8757: - Attachment: YARN-8757.001.patch > [Submarine] Add Tensorboard component when --tensorboard is specified >

[jira] [Commented] (YARN-8757) [Submarine] Add Tensorboard component when --tensorboard is specified

2018-09-09 Thread Wangda Tan (JIRA)
[ https://issues.apache.org/jira/browse/YARN-8757?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16608611#comment-16608611 ] Wangda Tan commented on YARN-8757: -- Working on the patch now, will update patch shortly. > [Submarine]

[jira] [Updated] (YARN-8757) [Submarine] Add Tensorboard component when --tensorboard is specified

2018-09-09 Thread Wangda Tan (JIRA)
[ https://issues.apache.org/jira/browse/YARN-8757?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wangda Tan updated YARN-8757: - Description: We need to have a Tensorboard component when --tensorboard is specified. And we need to set

[jira] [Created] (YARN-8757) [Submarine] Add Tensorboard component when --tensorboard is specified

2018-09-09 Thread Wangda Tan (JIRA)
Wangda Tan created YARN-8757: Summary: [Submarine] Add Tensorboard component when --tensorboard is specified Key: YARN-8757 URL: https://issues.apache.org/jira/browse/YARN-8757 Project: Hadoop YARN

[jira] [Updated] (YARN-8698) [Submarine] Failed to reset Hadoop home environment when submitting a submarine job

2018-09-09 Thread Wangda Tan (JIRA)
[ https://issues.apache.org/jira/browse/YARN-8698?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wangda Tan updated YARN-8698: - Summary: [Submarine] Failed to reset Hadoop home environment when submitting a submarine job (was:

[jira] [Updated] (YARN-8756) [Submarine] Properly handle relative path for staging area

2018-09-09 Thread Wangda Tan (JIRA)
[ https://issues.apache.org/jira/browse/YARN-8756?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wangda Tan updated YARN-8756: - Attachment: YARN-8756.001.patch > [Submarine] Properly handle relative path for staging area >

[jira] [Created] (YARN-8756) [Submarine] Properly handle relative path for staging area

2018-09-09 Thread Wangda Tan (JIRA)
Wangda Tan created YARN-8756: Summary: [Submarine] Properly handle relative path for staging area Key: YARN-8756 URL: https://issues.apache.org/jira/browse/YARN-8756 Project: Hadoop YARN Issue

[jira] [Commented] (YARN-8698) [Submarine] Failed to add hadoop dependencies in docker container when submitting a submarine job

2018-09-09 Thread Wangda Tan (JIRA)
[ https://issues.apache.org/jira/browse/YARN-8698?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16608545#comment-16608545 ] Wangda Tan commented on YARN-8698: --  +1, will commit the patch shortly. Thanks, > [Submarine] Failed to

[jira] [Commented] (YARN-8513) CapacityScheduler infinite loop when queue is near fully utilized

2018-09-08 Thread Wangda Tan (JIRA)
[ https://issues.apache.org/jira/browse/YARN-8513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16608213#comment-16608213 ] Wangda Tan commented on YARN-8513: -- [~hustnn], I agree that it is still a problem, but relatively minor

[jira] [Commented] (YARN-8513) CapacityScheduler infinite loop when queue is near fully utilized

2018-09-07 Thread Wangda Tan (JIRA)
[ https://issues.apache.org/jira/browse/YARN-8513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16607495#comment-16607495 ] Wangda Tan commented on YARN-8513: -- Spent a good amount of time checking the issue. I found scheduler

[jira] [Commented] (YARN-5592) Add support for dynamic resource updates with multiple resource types

2018-09-07 Thread Wangda Tan (JIRA)
[ https://issues.apache.org/jira/browse/YARN-5592?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16607393#comment-16607393 ] Wangda Tan commented on YARN-5592: -- [~sunilg], I think removing resource types is going to be hard. Unless we

[jira] [Commented] (YARN-8569) Create an interface to provide cluster information to application

2018-09-04 Thread Wangda Tan (JIRA)
[ https://issues.apache.org/jira/browse/YARN-8569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16603707#comment-16603707 ] Wangda Tan commented on YARN-8569: -- Thanks [~eyang],   And forgot to mention: if we're going to add new

[jira] [Commented] (YARN-8569) Create an interface to provide cluster information to application

2018-09-04 Thread Wangda Tan (JIRA)
[ https://issues.apache.org/jira/browse/YARN-8569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16603680#comment-16603680 ] Wangda Tan commented on YARN-8569: -- [~eyang],  How we can make it available prior to container launch?

[jira] [Commented] (YARN-8569) Create an interface to provide cluster information to application

2018-09-03 Thread Wangda Tan (JIRA)
[ https://issues.apache.org/jira/browse/YARN-8569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16602350#comment-16602350 ] Wangda Tan commented on YARN-8569: -- And in implementation, AM should have ability to write files to an

[jira] [Comment Edited] (YARN-8569) Create an interface to provide cluster information to application

2018-09-03 Thread Wangda Tan (JIRA)
[ https://issues.apache.org/jira/browse/YARN-8569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16602345#comment-16602345 ] Wangda Tan edited comment on YARN-8569 at 9/3/18 4:57 PM: -- [~eyang], As

[jira] [Commented] (YARN-8569) Create an interface to provide cluster information to application

2018-09-03 Thread Wangda Tan (JIRA)
[ https://issues.apache.org/jira/browse/YARN-8569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16602345#comment-16602345 ] Wangda Tan commented on YARN-8569: -- [~eyang], I still think it is a bad idea to support "sys info" by

[jira] [Commented] (YARN-8513) CapacityScheduler infinite loop when queue is near fully utilized

2018-09-03 Thread Wangda Tan (JIRA)
[ https://issues.apache.org/jira/browse/YARN-8513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16602326#comment-16602326 ] Wangda Tan commented on YARN-8513: -- And btw, I found a comment in LeafQueue: {code:java} private void

[jira] [Commented] (YARN-8513) CapacityScheduler infinite loop when queue is near fully utilized

2018-09-03 Thread Wangda Tan (JIRA)
[ https://issues.apache.org/jira/browse/YARN-8513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16602323#comment-16602323 ] Wangda Tan commented on YARN-8513: -- Interesting, it must be caused by CS allocation doesn't fully

[jira] [Commented] (YARN-8468) Limit container sizes per queue in FairScheduler

2018-08-29 Thread Wangda Tan (JIRA)
[ https://issues.apache.org/jira/browse/YARN-8468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16596926#comment-16596926 ] Wangda Tan commented on YARN-8468: -- [~bsteinbach], Thanks, I think it makes sense to normalize/validate

[jira] [Commented] (YARN-8569) Create an interface to provide cluster information to application

2018-08-29 Thread Wangda Tan (JIRA)
[ https://issues.apache.org/jira/browse/YARN-8569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16596014#comment-16596014 ] Wangda Tan commented on YARN-8569: -- [~eyang],  {quote}Unless malicious user already hacked into yarn user

[jira] [Commented] (YARN-8569) Create an interface to provide cluster information to application

2018-08-28 Thread Wangda Tan (JIRA)
[ https://issues.apache.org/jira/browse/YARN-8569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16595949#comment-16595949 ] Wangda Tan commented on YARN-8569: -- [~eyang], As we discussed offline, the use case is not clear to me.

[jira] [Commented] (YARN-8718) Merge related work for YARN-3409

2018-08-28 Thread Wangda Tan (JIRA)
[ https://issues.apache.org/jira/browse/YARN-8718?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16595945#comment-16595945 ] Wangda Tan commented on YARN-8718: -- [~sunilg], the attached patch doesn't look correct. > Merge related

[jira] [Commented] (YARN-8220) Running Tensorflow on YARN with GPU and Docker - Examples

2018-08-28 Thread Wangda Tan (JIRA)
[ https://issues.apache.org/jira/browse/YARN-8220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16595944#comment-16595944 ] Wangda Tan commented on YARN-8220: -- Thanks [~sunilg], I think we should close this JIRA. > Running

[jira] [Commented] (YARN-8468) Limit container sizes per queue in FairScheduler

2018-08-28 Thread Wangda Tan (JIRA)
[ https://issues.apache.org/jira/browse/YARN-8468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16595712#comment-16595712 ] Wangda Tan commented on YARN-8468: -- 1) Is it sufficient to make changes like YARN-1582, IIUC, it doesn't

[jira] [Commented] (YARN-7018) Interface for adding extra behavior to node heartbeats

2018-08-28 Thread Wangda Tan (JIRA)
[ https://issues.apache.org/jira/browse/YARN-7018?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16595575#comment-16595575 ] Wangda Tan commented on YARN-7018: -- [~jlowe], given the fields need to be updated should all inside

[jira] [Commented] (YARN-8722) Failed to get native service application status when security is enabled

2018-08-28 Thread Wangda Tan (JIRA)
[ https://issues.apache.org/jira/browse/YARN-8722?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16595340#comment-16595340 ] Wangda Tan commented on YARN-8722: -- Thanks [~eyang], [~yuan_zac], are you able to *submit* job without

[jira] [Commented] (YARN-8722) Failed to get native service application status when security is enabled

2018-08-28 Thread Wangda Tan (JIRA)
[ https://issues.apache.org/jira/browse/YARN-8722?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16595185#comment-16595185 ] Wangda Tan commented on YARN-8722: -- [~eyang], [~billie.rinaldi], have we seen this issue when trying to

[jira] [Updated] (YARN-8722) Failed to get native service application status when security is enabled

2018-08-28 Thread Wangda Tan (JIRA)
[ https://issues.apache.org/jira/browse/YARN-8722?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wangda Tan updated YARN-8722: - Environment: (was: The environment context is as follows: 1) Security enabled. kerberos 2) Klist

[jira] [Updated] (YARN-8722) Failed to get native service application status when security is enabled

2018-08-28 Thread Wangda Tan (JIRA)
[ https://issues.apache.org/jira/browse/YARN-8722?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wangda Tan updated YARN-8722: - Description: Can't get job status with the following command, after a submarine job is submitted.

[jira] [Created] (YARN-8716) [Submarine] Support passing Kerberos principle tokens when launch training jobs.

2018-08-26 Thread Wangda Tan (JIRA)
Wangda Tan created YARN-8716: Summary: [Submarine] Support passing Kerberos principle tokens when launch training jobs. Key: YARN-8716 URL: https://issues.apache.org/jira/browse/YARN-8716 Project: Hadoop

[jira] [Created] (YARN-8713) [Submarine] Support deploy model serving for existing models

2018-08-24 Thread Wangda Tan (JIRA)
Wangda Tan created YARN-8713: Summary: [Submarine] Support deploy model serving for existing models Key: YARN-8713 URL: https://issues.apache.org/jira/browse/YARN-8713 Project: Hadoop YARN

[jira] [Created] (YARN-8714) [Submarine] Support files/tarballs to be localized for a training job.

2018-08-24 Thread Wangda Tan (JIRA)
Wangda Tan created YARN-8714: Summary: [Submarine] Support files/tarballs to be localized for a training job. Key: YARN-8714 URL: https://issues.apache.org/jira/browse/YARN-8714 Project: Hadoop YARN

[jira] [Created] (YARN-8712) [Submarine] Support create models / versions for training result.

2018-08-24 Thread Wangda Tan (JIRA)
Wangda Tan created YARN-8712: Summary: [Submarine] Support create models / versions for training result. Key: YARN-8712 URL: https://issues.apache.org/jira/browse/YARN-8712 Project: Hadoop YARN

[jira] [Commented] (YARN-8513) CapacityScheduler infinite loop when queue is near fully utilized

2018-08-24 Thread Wangda Tan (JIRA)
[ https://issues.apache.org/jira/browse/YARN-8513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16591878#comment-16591878 ] Wangda Tan commented on YARN-8513: -- [~hustnn], what is the cause of "Failed to accept allocation

[jira] [Commented] (YARN-8638) Allow linux container runtimes to be pluggable

2018-08-22 Thread Wangda Tan (JIRA)
[ https://issues.apache.org/jira/browse/YARN-8638?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16589478#comment-16589478 ] Wangda Tan commented on YARN-8638: -- [~ccondit-target], Thanks for working on this ticket. It will be

[jira] [Assigned] (YARN-8698) [Submarine] Failed to add hadoop dependencies in docker container when submitting a submarine job

2018-08-22 Thread Wangda Tan (JIRA)
[ https://issues.apache.org/jira/browse/YARN-8698?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wangda Tan reassigned YARN-8698: Assignee: Zac Zhou > [Submarine] Failed to add hadoop dependencies in docker container when >

[jira] [Updated] (YARN-8698) [Submarine] Failed to add hadoop dependencies in docker container when submitting a submarine job

2018-08-22 Thread Wangda Tan (JIRA)
[ https://issues.apache.org/jira/browse/YARN-8698?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wangda Tan updated YARN-8698: - Issue Type: Sub-task (was: Bug) Parent: YARN-8135 > [Submarine] Failed to add hadoop

[jira] [Updated] (YARN-8698) [Submarine] Failed to add hadoop dependencies in docker container when submitting a submarine job

2018-08-22 Thread Wangda Tan (JIRA)
[ https://issues.apache.org/jira/browse/YARN-8698?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wangda Tan updated YARN-8698: - Summary: [Submarine] Failed to add hadoop dependencies in docker container when submitting a submarine

[jira] [Updated] (YARN-8675) Setting hostname of docker container breaks with "host" networking mode for Apps which do not run as a YARN service

2018-08-21 Thread Wangda Tan (JIRA)
[ https://issues.apache.org/jira/browse/YARN-8675?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wangda Tan updated YARN-8675: - Reporter: Yesha Vora (was: Suma Shivaprasad) > Setting hostname of docker container breaks with "host"

[jira] [Comment Edited] (YARN-8513) CapacityScheduler infinite loop when queue is near fully utilized

2018-08-20 Thread Wangda Tan (JIRA)
[ https://issues.apache.org/jira/browse/YARN-8513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16586854#comment-16586854 ] Wangda Tan edited comment on YARN-8513 at 8/21/18 3:37 AM: --- Interesting,

[jira] [Commented] (YARN-8513) CapacityScheduler infinite loop when queue is near fully utilized

2018-08-20 Thread Wangda Tan (JIRA)
[ https://issues.apache.org/jira/browse/YARN-8513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16586854#comment-16586854 ] Wangda Tan commented on YARN-8513: -- Interesting, [~cheersyang], I can only think about reservation

[jira] [Assigned] (YARN-8679) [ATSv2] If HBase cluster is down for long time, high chances that NM ContainerManager dispatcher get blocked

2018-08-17 Thread Wangda Tan (JIRA)
[ https://issues.apache.org/jira/browse/YARN-8679?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wangda Tan reassigned YARN-8679: Assignee: Wangda Tan (was: Rohith Sharma K S) > [ATSv2] If HBase cluster is down for long time,

[jira] [Commented] (YARN-8679) [ATSv2] If HBase cluster is down for long time, high chances that NM ContainerManager dispatcher get blocked

2018-08-17 Thread Wangda Tan (JIRA)
[ https://issues.apache.org/jira/browse/YARN-8679?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16584322#comment-16584322 ] Wangda Tan commented on YARN-8679: -- [~rohithsharma], thanks for the patch. I'm a bit worried about the

[jira] [Updated] (YARN-8679) [ATSv2] If HBase cluster is down for long time, high chances that NM ContainerManager dispatcher get blocked

2018-08-17 Thread Wangda Tan (JIRA)
[ https://issues.apache.org/jira/browse/YARN-8679?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wangda Tan updated YARN-8679: - Attachment: YARN-8679.02.patch > [ATSv2] If HBase cluster is down for long time, high chances that NM >

[jira] [Commented] (YARN-8677) Queue Management API - no errors thrown for wrong properties

2018-08-17 Thread Wangda Tan (JIRA)
[ https://issues.apache.org/jira/browse/YARN-8677?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16584242#comment-16584242 ] Wangda Tan commented on YARN-8677: -- [~akhilpb], could u move these issues to sub jira of YARN-5734 for

[jira] [Commented] (YARN-8657) User limit calculation should be read-lock-protected within LeafQueue

2018-08-17 Thread Wangda Tan (JIRA)
[ https://issues.apache.org/jira/browse/YARN-8657?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16584238#comment-16584238 ] Wangda Tan commented on YARN-8657: -- [~sunilg], I'm not quite sure if the patch changed locking scope of

[jira] [Commented] (YARN-8513) CapacityScheduler infinite loop when queue is near fully utilized

2018-08-16 Thread Wangda Tan (JIRA)
[ https://issues.apache.org/jira/browse/YARN-8513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16583026#comment-16583026 ] Wangda Tan commented on YARN-8513: -- [~cyfdecyf], Could u upload logs/jstacks for 3.1.0 deployment? We

[jira] [Updated] (YARN-8667) Cleanup symlinks when container restarted by NM to solve issue "find: File system loop detected;" for tar ball artifacts.

2018-08-16 Thread Wangda Tan (JIRA)
[ https://issues.apache.org/jira/browse/YARN-8667?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wangda Tan updated YARN-8667: - Summary: Cleanup symlinks when container restarted by NM to solve issue "find: File system loop

[jira] [Updated] (YARN-8667) Cleanup symlinks when container restarted by NM to solve issue "find: File system loop detected;" for tar ball artifacts.

2018-08-16 Thread Wangda Tan (JIRA)
[ https://issues.apache.org/jira/browse/YARN-8667?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wangda Tan updated YARN-8667: - Target Version/s: 3.1.1, 3.2.0 Priority: Critical (was: Major) > Cleanup symlinks when

[jira] [Comment Edited] (YARN-8668) Inconsistency between capacity and fair scheduler in the aspect of computing node available resource

2018-08-15 Thread Wangda Tan (JIRA)
[ https://issues.apache.org/jira/browse/YARN-8668?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16581574#comment-16581574 ] Wangda Tan edited comment on YARN-8668 at 8/15/18 8:34 PM: --- Thanks [~Cyl] for

[jira] [Commented] (YARN-8668) Inconsistency between capacity and fair scheduler in the aspect of computing node available resource

2018-08-15 Thread Wangda Tan (JIRA)
[ https://issues.apache.org/jira/browse/YARN-8668?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16581574#comment-16581574 ] Wangda Tan commented on YARN-8668: -- Thanks [~Cyl] for reporting the issue, this is by design in CS.

[jira] [Updated] (YARN-8657) User limit calculation should be read-lock-protected within LeafQueue

2018-08-13 Thread Wangda Tan (JIRA)
[ https://issues.apache.org/jira/browse/YARN-8657?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wangda Tan updated YARN-8657: - Attachment: YARN-8657.001.patch > User limit calculation should be read-lock-protected within LeafQueue >

[jira] [Commented] (YARN-8657) User limit calculation should be read-lock-protected within LeafQueue

2018-08-13 Thread Wangda Tan (JIRA)
[ https://issues.apache.org/jira/browse/YARN-8657?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16579053#comment-16579053 ] Wangda Tan commented on YARN-8657: -- [~sunil.gov...@gmail.com], [~cheersyang], could u help to review this

[jira] [Created] (YARN-8657) User limit calculation should be read-lock-protected within LeafQueue

2018-08-13 Thread Wangda Tan (JIRA)
Wangda Tan created YARN-8657: Summary: User limit calculation should be read-lock-protected within LeafQueue Key: YARN-8657 URL: https://issues.apache.org/jira/browse/YARN-8657 Project: Hadoop YARN

[jira] [Updated] (YARN-8647) Add a flag to disable move app between queues

2018-08-10 Thread Wangda Tan (JIRA)
[ https://issues.apache.org/jira/browse/YARN-8647?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wangda Tan updated YARN-8647: - Summary: Add a flag to disable move app between queues (was: Add a flag to disable move queue) > Add a

[jira] [Updated] (YARN-8561) [Submarine] Add initial implementation: training job submission and job history retrieve.

2018-08-09 Thread Wangda Tan (JIRA)
[ https://issues.apache.org/jira/browse/YARN-8561?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wangda Tan updated YARN-8561: - Attachment: YARN-8561.005.patch > [Submarine] Add initial implementation: training job submission and job

[jira] [Commented] (YARN-8561) [Submarine] Add initial implementation: training job submission and job history retrieve.

2018-08-09 Thread Wangda Tan (JIRA)
[ https://issues.apache.org/jira/browse/YARN-8561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16575464#comment-16575464 ] Wangda Tan commented on YARN-8561: -- Thanks [~sunilg]. For your additional comments: 1. I think we can

[jira] [Commented] (YARN-8588) Logging improvements for better debuggability

2018-08-08 Thread Wangda Tan (JIRA)
[ https://issues.apache.org/jira/browse/YARN-8588?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16574042#comment-16574042 ] Wangda Tan commented on YARN-8588: -- +1, LGTM. thanks [~suma.shivaprasad] > Logging improvements for

[jira] [Updated] (YARN-8588) Logging improvements for better debuggability

2018-08-08 Thread Wangda Tan (JIRA)
[ https://issues.apache.org/jira/browse/YARN-8588?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wangda Tan updated YARN-8588: - Target Version/s: 3.2.0, 3.1.2 > Logging improvements for better debuggability >

[jira] [Commented] (YARN-8407) Container launch exception in AM log should be printed in ERROR level

2018-08-07 Thread Wangda Tan (JIRA)
[ https://issues.apache.org/jira/browse/YARN-8407?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16572265#comment-16572265 ] Wangda Tan commented on YARN-8407: -- +1 to the patch. Thanks [~yeshavora] > Container launch exception in

[jira] [Commented] (YARN-8561) [Submarine] Add initial implementation: training job submission and job history retrieve.

2018-08-07 Thread Wangda Tan (JIRA)
[ https://issues.apache.org/jira/browse/YARN-8561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16572253#comment-16572253 ] Wangda Tan commented on YARN-8561: -- Attached ver.4 patch, fixed jenkins warnings. > [Submarine] Add

[jira] [Updated] (YARN-8561) [Submarine] Add initial implementation: training job submission and job history retrieve.

2018-08-07 Thread Wangda Tan (JIRA)
[ https://issues.apache.org/jira/browse/YARN-8561?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wangda Tan updated YARN-8561: - Attachment: YARN-8561.004.patch > [Submarine] Add initial implementation: training job submission and job

[jira] [Commented] (YARN-8629) Container cleanup fails while trying to delete Cgroups

2018-08-07 Thread Wangda Tan (JIRA)
[ https://issues.apache.org/jira/browse/YARN-8629?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16572242#comment-16572242 ] Wangda Tan commented on YARN-8629: -- Ah forgot to mention, patch got committed to trunk/branch-3.1 >

[jira] [Updated] (YARN-8407) Container launch exception in AM log should be printed in ERROR level

2018-08-07 Thread Wangda Tan (JIRA)
[ https://issues.apache.org/jira/browse/YARN-8407?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wangda Tan updated YARN-8407: - Target Version/s: 3.2.0, 3.1.2 > Container launch exception in AM log should be printed in ERROR level >

[jira] [Commented] (YARN-8629) Container cleanup fails while trying to delete Cgroups

2018-08-07 Thread Wangda Tan (JIRA)
[ https://issues.apache.org/jira/browse/YARN-8629?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16572206#comment-16572206 ] Wangda Tan commented on YARN-8629: -- +1, patch LGTM, thanks [~suma.shivaprasad]. > Container cleanup

[jira] [Commented] (YARN-8561) [Submarine] Add initial implementation: training job submission and job history retrieve.

2018-08-07 Thread Wangda Tan (JIRA)
[ https://issues.apache.org/jira/browse/YARN-8561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16572152#comment-16572152 ] Wangda Tan commented on YARN-8561: -- Attached ver.3 patch which included help messages and cleaned up

[jira] [Updated] (YARN-8561) [Submarine] Add initial implementation: training job submission and job history retrieve.

2018-08-07 Thread Wangda Tan (JIRA)
[ https://issues.apache.org/jira/browse/YARN-8561?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wangda Tan updated YARN-8561: - Attachment: YARN-8561.003.patch > [Submarine] Add initial implementation: training job submission and job

[jira] [Commented] (YARN-8561) [Submarine] Add initial implementation: training job submission and job history retrieve.

2018-08-07 Thread Wangda Tan (JIRA)
[ https://issues.apache.org/jira/browse/YARN-8561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16572087#comment-16572087 ] Wangda Tan commented on YARN-8561: -- Thanks [~sunilg], 1. Addressed. 2. I think we can rely on yarn app

[jira] [Updated] (YARN-8561) [Submarine] Add initial implementation: training job submission and job history retrieve.

2018-08-07 Thread Wangda Tan (JIRA)
[ https://issues.apache.org/jira/browse/YARN-8561?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wangda Tan updated YARN-8561: - Attachment: YARN-8561.002.patch > [Submarine] Add initial implementation: training job submission and job

[jira] [Updated] (YARN-8629) Container cleanup fails while trying to delete Cgroups

2018-08-07 Thread Wangda Tan (JIRA)
[ https://issues.apache.org/jira/browse/YARN-8629?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wangda Tan updated YARN-8629: - Target Version/s: 3.2.0, 3.1.2 Priority: Critical (was: Major) > Container cleanup fails

[jira] [Commented] (YARN-7089) Mark the log-aggregation-controller APIs as public

2018-08-06 Thread Wangda Tan (JIRA)
[ https://issues.apache.org/jira/browse/YARN-7089?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16570961#comment-16570961 ] Wangda Tan commented on YARN-7089: -- +1, LGTM. > Mark the log-aggregation-controller APIs as public >

[jira] [Commented] (YARN-8475) Should check the resource of assignment is greater than Resources.none() before submitResourceCommitRequest

2018-08-03 Thread Wangda Tan (JIRA)
[ https://issues.apache.org/jira/browse/YARN-8475?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16568712#comment-16568712 ] Wangda Tan commented on YARN-8475: -- [~zhouyunfan], could u add more details to the bug? In what scenario

[jira] [Commented] (YARN-8136) Add version attribute to site doc examples and quickstart

2018-08-03 Thread Wangda Tan (JIRA)
[ https://issues.apache.org/jira/browse/YARN-8136?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16568596#comment-16568596 ] Wangda Tan commented on YARN-8136: -- +1, LGTM, thanks [~eyang].  > Add version attribute to site doc

[jira] [Assigned] (YARN-8136) Add version attribute to site doc examples and quickstart

2018-08-03 Thread Wangda Tan (JIRA)
[ https://issues.apache.org/jira/browse/YARN-8136?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wangda Tan reassigned YARN-8136: Assignee: Eric Yang > Add version attribute to site doc examples and quickstart >

[jira] [Updated] (YARN-8608) [UI2] No information available per application appAttempt about 'Total Outstanding Resource Requests'

2018-08-02 Thread Wangda Tan (JIRA)
[ https://issues.apache.org/jira/browse/YARN-8608?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wangda Tan updated YARN-8608: - Reporter: Sumana Sathish (was: Akhil PB) > [UI2] No information available per application appAttempt

[jira] [Updated] (YARN-8615) [UI2] Resource Usage tab shows only memory related info. No info available for vcores/gpu.

2018-08-02 Thread Wangda Tan (JIRA)
[ https://issues.apache.org/jira/browse/YARN-8615?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wangda Tan updated YARN-8615: - Reporter: Sumana Sathish (was: Akhil PB) > [UI2] Resource Usage tab shows only memory related info. No

[jira] [Commented] (YARN-8200) Backport resource types/GPU features to branch-2

2018-08-01 Thread Wangda Tan (JIRA)
[ https://issues.apache.org/jira/browse/YARN-8200?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16566119#comment-16566119 ] Wangda Tan commented on YARN-8200: -- [~jhung], thanks for sharing the result. Overall the number looks

[jira] [Comment Edited] (YARN-8559) Expose mutable-conf scheduler's configuration in RM /scheduler-conf endpoint

2018-08-01 Thread Wangda Tan (JIRA)
[ https://issues.apache.org/jira/browse/YARN-8559?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16566045#comment-16566045 ] Wangda Tan edited comment on YARN-8559 at 8/1/18 9:57 PM: -- Thanks [~cheersyang],

[jira] [Commented] (YARN-8559) Expose mutable-conf scheduler's configuration in RM /scheduler-conf endpoint

2018-08-01 Thread Wangda Tan (JIRA)
[ https://issues.apache.org/jira/browse/YARN-8559?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16566045#comment-16566045 ] Wangda Tan commented on YARN-8559: -- Thanks [~cheersyang], latest patch LGTM.  [~jhung], given this is

[jira] [Commented] (YARN-8588) Logging improvements for better debuggability

2018-08-01 Thread Wangda Tan (JIRA)
[ https://issues.apache.org/jira/browse/YARN-8588?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16565926#comment-16565926 ] Wangda Tan commented on YARN-8588: -- [~suma.shivaprasad], could you help to take care of the findbugs

[jira] [Commented] (YARN-8606) Opportunistic scheduling doesnt work after failover

2018-07-31 Thread Wangda Tan (JIRA)
[ https://issues.apache.org/jira/browse/YARN-8606?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16564775#comment-16564775 ] Wangda Tan commented on YARN-8606: -- [~bibinchundatt], Gotcha, the fix makes sense to me. +1 to the patch. I

[jira] [Commented] (YARN-7494) Add muti node lookup support for better placement

2018-07-31 Thread Wangda Tan (JIRA)
[ https://issues.apache.org/jira/browse/YARN-7494?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16564589#comment-16564589 ] Wangda Tan commented on YARN-7494: -- [~sunilg], Thanks for updating the patch, some comments. 1) Not

[jira] [Commented] (YARN-8522) Application fails with InvalidResourceRequestException

2018-07-31 Thread Wangda Tan (JIRA)
[ https://issues.apache.org/jira/browse/YARN-8522?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16564437#comment-16564437 ] Wangda Tan commented on YARN-8522: -- LGTM +1, thanks [~Zian Chen], Will commit shortly, we may not need

[jira] [Updated] (YARN-8522) Application fails with InvalidResourceRequestException

2018-07-31 Thread Wangda Tan (JIRA)
[ https://issues.apache.org/jira/browse/YARN-8522?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wangda Tan updated YARN-8522: - Target Version/s: 3.2.0, 3.1.1 Priority: Critical (was: Major) > Application fails with

[jira] [Updated] (YARN-8600) RegistryDNS hang when remote lookup does not reply

2018-07-31 Thread Wangda Tan (JIRA)
[ https://issues.apache.org/jira/browse/YARN-8600?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wangda Tan updated YARN-8600: - Priority: Critical (was: Major) > RegistryDNS hang when remote lookup does not reply >

[jira] [Updated] (YARN-8579) New AM attempt could not retrieve previous attempt component data

2018-07-31 Thread Wangda Tan (JIRA)
[ https://issues.apache.org/jira/browse/YARN-8579?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wangda Tan updated YARN-8579: - Target Version/s: 3.2.0, 3.1.2 Fix Version/s: (was: 3.1.2) (was:

[jira] [Updated] (YARN-7512) Support service upgrade via YARN Service API and CLI

2018-07-31 Thread Wangda Tan (JIRA)
[ https://issues.apache.org/jira/browse/YARN-7512?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wangda Tan updated YARN-7512: - Target Version/s: 3.1.2 (was: 3.1.1) > Support service upgrade via YARN Service API and CLI >

[jira] [Updated] (YARN-8399) NodeManager is giving 403 GSS exception post upgrade to 3.1 in secure mode

2018-07-31 Thread Wangda Tan (JIRA)
[ https://issues.apache.org/jira/browse/YARN-8399?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wangda Tan updated YARN-8399: - Target Version/s: 2.10.0, 3.2.0, 3.0.3, 3.1.2 (was: 2.10.0, 3.2.0, 3.1.1, 3.0.3) > NodeManager is

[jira] [Updated] (YARN-8520) Document best practice for user management

2018-07-31 Thread Wangda Tan (JIRA)
[ https://issues.apache.org/jira/browse/YARN-8520?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wangda Tan updated YARN-8520: - Target Version/s: 3.2.0, 3.1.2 (was: 3.2.0, 3.1.1) > Document best practice for user management >

[jira] [Updated] (YARN-8052) Move overwriting of service definition during flex to service master

2018-07-31 Thread Wangda Tan (JIRA)
[ https://issues.apache.org/jira/browse/YARN-8052?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wangda Tan updated YARN-8052: - Target Version/s: 3.1.2 (was: 3.1.1) > Move overwriting of service definition during flex to service

[jira] [Updated] (YARN-8136) Add version attribute to site doc examples and quickstart

2018-07-31 Thread Wangda Tan (JIRA)
[ https://issues.apache.org/jira/browse/YARN-8136?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wangda Tan updated YARN-8136: - Target Version/s: 3.1.2 (was: 3.1.1) > Add version attribute to site doc examples and quickstart >

[jira] [Updated] (YARN-8453) Additional Unit tests to verify queue limit and max-limit with multiple resource types

2018-07-31 Thread Wangda Tan (JIRA)
[ https://issues.apache.org/jira/browse/YARN-8453?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wangda Tan updated YARN-8453: - Target Version/s: 3.0.4, 3.1.2 (was: 3.1.1, 3.0.4) > Additional Unit tests to verify queue limit and

[jira] [Updated] (YARN-8399) NodeManager is giving 403 GSS exception post upgrade to 3.1 in secure mode

2018-07-31 Thread Wangda Tan (JIRA)
[ https://issues.apache.org/jira/browse/YARN-8399?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wangda Tan updated YARN-8399: - Target Version/s: 2.10.0, 3.2.0, 3.0.3 (was: 2.10.0, 3.2.0, 3.0.3, 3.1.2) > NodeManager is giving 403

[jira] [Updated] (YARN-8161) ServiceState FLEX should be removed

2018-07-31 Thread Wangda Tan (JIRA)
[ https://issues.apache.org/jira/browse/YARN-8161?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wangda Tan updated YARN-8161: - Target Version/s: 3.2.0, 3.1.2 (was: 3.2.0, 3.1.1) > ServiceState FLEX should be removed >

[jira] [Updated] (YARN-8366) Expose debug log information when user intend to enable GPU without setting nvidia-smi path

2018-07-31 Thread Wangda Tan (JIRA)
[ https://issues.apache.org/jira/browse/YARN-8366?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wangda Tan updated YARN-8366: - Target Version/s: 3.2.0, 3.1.2 (was: 3.2.0, 3.1.1) > Expose debug log information when user intend to

[jira] [Updated] (YARN-8552) [DS] Container report fails for distributed containers

2018-07-31 Thread Wangda Tan (JIRA)
[ https://issues.apache.org/jira/browse/YARN-8552?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wangda Tan updated YARN-8552: - Target Version/s: 3.1.2 (was: 3.1.1) > [DS] Container report fails for distributed containers >

[jira] [Commented] (YARN-8301) Yarn Service Upgrade: Add documentation

2018-07-31 Thread Wangda Tan (JIRA)
[ https://issues.apache.org/jira/browse/YARN-8301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16564255#comment-16564255 ] Wangda Tan commented on YARN-8301: -- Committed to branch-3.1.1, thanks [~csingh]/[~eyang]. > Yarn Service

[jira] [Updated] (YARN-8508) On NodeManager container gets cleaned up before its pid file is created

2018-07-31 Thread Wangda Tan (JIRA)
[ https://issues.apache.org/jira/browse/YARN-8508?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wangda Tan updated YARN-8508: - Fix Version/s: (was: 3.1.2) 3.1.1 > On NodeManager container gets cleaned up
