[jira] [Updated] (YARN-8333) Load balance YARN services using RegistryDNS multiple A records

2018-05-21 Thread Eric Yang (JIRA)
[ https://issues.apache.org/jira/browse/YARN-8333?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eric Yang updated YARN-8333: Description: For scaling stateless containers, it would be great to support DNS round robin for fault

[jira] [Created] (YARN-8333) Load balance YARN services using RegistryDNS multiple A records

2018-05-21 Thread Eric Yang (JIRA)
Eric Yang created YARN-8333: --- Summary: Load balance YARN services using RegistryDNS multiple A records Key: YARN-8333 URL: https://issues.apache.org/jira/browse/YARN-8333 Project: Hadoop YARN

[jira] [Comment Edited] (YARN-8326) Yarn 3.0 seems runs slower than Yarn 2.6

2018-05-21 Thread Eric Yang (JIRA)
[ https://issues.apache.org/jira/browse/YARN-8326?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16482905#comment-16482905 ] Eric Yang edited comment on YARN-8326 at 5/21/18 6:57 PM: -- This appears to be

[jira] [Commented] (YARN-8326) Yarn 3.0 seems runs slower than Yarn 2.6

2018-05-21 Thread Eric Yang (JIRA)
[ https://issues.apache.org/jira/browse/YARN-8326?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16482905#comment-16482905 ] Eric Yang commented on YARN-8326: - This appears to be introduced by YARN-5662 by turning on container

[jira] [Commented] (YARN-8259) Revisit liveliness checks for Docker containers

2018-05-21 Thread Eric Yang (JIRA)
[ https://issues.apache.org/jira/browse/YARN-8259?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16482992#comment-16482992 ] Eric Yang commented on YARN-8259: - If I am not mistaken, DockerContainerRuntime is running as part of node

[jira] [Comment Edited] (YARN-8259) Revisit liveliness checks for Docker containers

2018-05-21 Thread Eric Yang (JIRA)
[ https://issues.apache.org/jira/browse/YARN-8259?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16482992#comment-16482992 ] Eric Yang edited comment on YARN-8259 at 5/21/18 8:15 PM: -- If I am not mistaken,

[jira] [Commented] (YARN-8326) Yarn 3.0 seems runs slower than Yarn 2.6

2018-05-21 Thread Eric Yang (JIRA)
[ https://issues.apache.org/jira/browse/YARN-8326?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16483116#comment-16483116 ] Eric Yang commented on YARN-8326: - [~hlhu...@us.ibm.com] Do the same log entries show up? > Yarn 3.0

[jira] [Commented] (YARN-7960) Add no-new-privileges flag to docker run

2018-05-21 Thread Eric Yang (JIRA)
[ https://issues.apache.org/jira/browse/YARN-7960?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16483138#comment-16483138 ] Eric Yang commented on YARN-7960: - +1 looks good to me. > Add no-new-privileges flag to docker run >

[jira] [Commented] (YARN-8259) Revisit liveliness checks for Docker containers

2018-05-21 Thread Eric Yang (JIRA)
[ https://issues.apache.org/jira/browse/YARN-8259?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16483057#comment-16483057 ] Eric Yang commented on YARN-8259: - A system administrator can reserve one CPU core for the node manager and all

[jira] [Commented] (YARN-8108) RM metrics rest API throws GSSException in kerberized environment

2018-05-22 Thread Eric Yang (JIRA)
[ https://issues.apache.org/jira/browse/YARN-8108?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16484199#comment-16484199 ] Eric Yang commented on YARN-8108: - [~yzhangal] This is a regression in Hadoop 3.x, hence it is marked as a

[jira] [Commented] (YARN-8259) Revisit liveliness checks for Docker containers

2018-05-22 Thread Eric Yang (JIRA)
[ https://issues.apache.org/jira/browse/YARN-8259?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16484271#comment-16484271 ] Eric Yang commented on YARN-8259: - [~shaneku...@gmail.com] The proposal for implementing both is okay, but

[jira] [Commented] (YARN-8342) Using docker image from a non-privileged registry, the launch_command is not honored

2018-05-23 Thread Eric Yang (JIRA)
[ https://issues.apache.org/jira/browse/YARN-8342?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16487952#comment-16487952 ] Eric Yang commented on YARN-8342: - [~vinodkv] Launch command was dropped in YARN-7516 due to concerns of

[jira] [Assigned] (YARN-8333) Load balance YARN services using RegistryDNS multiple A records

2018-05-23 Thread Eric Yang (JIRA)
[ https://issues.apache.org/jira/browse/YARN-8333?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eric Yang reassigned YARN-8333: --- Assignee: Eric Yang > Load balance YARN services using RegistryDNS multiple A records >

[jira] [Comment Edited] (YARN-8342) Using docker image from a non-privileged registry, the launch_command is not honored

2018-05-22 Thread Eric Yang (JIRA)
[ https://issues.apache.org/jira/browse/YARN-8342?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16486530#comment-16486530 ] Eric Yang edited comment on YARN-8342 at 5/23/18 12:58 AM: --- The current behavior

[jira] [Commented] (YARN-8342) Using docker image from a non-privileged registry, the launch_command is not honored

2018-05-22 Thread Eric Yang (JIRA)
[ https://issues.apache.org/jira/browse/YARN-8342?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16486530#comment-16486530 ] Eric Yang commented on YARN-8342: - The current behavior is documented in

[jira] [Commented] (YARN-8342) Using docker image from a non-privileged registry, the launch_command is not honored

2018-05-23 Thread Eric Yang (JIRA)
[ https://issues.apache.org/jira/browse/YARN-8342?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16488222#comment-16488222 ] Eric Yang commented on YARN-8342: - We have the following options: 1. Allow exemption to bind-mount

[jira] [Commented] (YARN-8333) Load balance YARN services using RegistryDNS multiple A records

2018-05-23 Thread Eric Yang (JIRA)
[ https://issues.apache.org/jira/browse/YARN-8333?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16488241#comment-16488241 ] Eric Yang commented on YARN-8333: - Patch 001 added multi-A record per component. > Load balance YARN

[jira] [Commented] (YARN-7530) hadoop-yarn-services-api should be part of hadoop-yarn-services

2018-05-23 Thread Eric Yang (JIRA)
[ https://issues.apache.org/jira/browse/YARN-7530?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16488320#comment-16488320 ] Eric Yang commented on YARN-7530: - +1 for branch-3.1 change. > hadoop-yarn-services-api should be part of

[jira] [Updated] (YARN-8333) Load balance YARN services using RegistryDNS multiple A records

2018-05-23 Thread Eric Yang (JIRA)
[ https://issues.apache.org/jira/browse/YARN-8333?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eric Yang updated YARN-8333: Attachment: YARN-8333.001.patch > Load balance YARN services using RegistryDNS multiple A records >

[jira] [Commented] (YARN-8290) SystemMetricsPublisher.appACLsUpdated should be invoked after application information is published to ATS to avoid "User is not set in the application report" Exception

2018-05-22 Thread Eric Yang (JIRA)
[ https://issues.apache.org/jira/browse/YARN-8290?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16484554#comment-16484554 ] Eric Yang commented on YARN-8290: - Thank you [~leftnoteasy] for the review and commit. >

[jira] [Commented] (YARN-8342) Using docker image from a non-privileged registry, the launch_command is not honored

2018-05-25 Thread Eric Yang (JIRA)
[ https://issues.apache.org/jira/browse/YARN-8342?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16490956#comment-16490956 ] Eric Yang commented on YARN-8342: - [~vinodkv] The original design was: - Images in trusted registry can

[jira] [Commented] (YARN-8255) Allow option to disable flex for a service component

2018-05-25 Thread Eric Yang (JIRA)
[ https://issues.apache.org/jira/browse/YARN-8255?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16490979#comment-16490979 ] Eric Yang commented on YARN-8255: - [~suma.shivaprasad] Restart_policy = ON_FAILURE covers Spark use case to

[jira] [Commented] (YARN-8365) Revisit the record type used by Registry DNS for upstream resolution

2018-05-25 Thread Eric Yang (JIRA)
[ https://issues.apache.org/jira/browse/YARN-8365?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16491273#comment-16491273 ] Eric Yang commented on YARN-8365: - [~shaneku...@gmail.com] We have several options: Option 1, default

[jira] [Commented] (YARN-8333) Load balance YARN services using RegistryDNS multiple A records

2018-05-24 Thread Eric Yang (JIRA)
[ https://issues.apache.org/jira/browse/YARN-8333?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16489365#comment-16489365 ] Eric Yang commented on YARN-8333: - Patch 002 fixed check style issues. > Load balance YARN services using

[jira] [Assigned] (YARN-8342) Using docker image from a non-privileged registry, the launch_command is not honored

2018-05-24 Thread Eric Yang (JIRA)
[ https://issues.apache.org/jira/browse/YARN-8342?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eric Yang reassigned YARN-8342: --- Assignee: Eric Yang > Using docker image from a non-privileged registry, the launch_command is not >

[jira] [Updated] (YARN-8333) Load balance YARN services using RegistryDNS multiple A records

2018-05-24 Thread Eric Yang (JIRA)
[ https://issues.apache.org/jira/browse/YARN-8333?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eric Yang updated YARN-8333: Attachment: YARN-8333.002.patch > Load balance YARN services using RegistryDNS multiple A records >

[jira] [Commented] (YARN-8357) Yarn Service: NPE when service is saved first and then started.

2018-05-24 Thread Eric Yang (JIRA)
[ https://issues.apache.org/jira/browse/YARN-8357?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16489752#comment-16489752 ] Eric Yang commented on YARN-8357: - +1 looks good. > Yarn Service: NPE when service is saved first and then

[jira] [Commented] (YARN-8316) Diagnostic message should improve when yarn service fails to launch due to ATS unavailability

2018-05-24 Thread Eric Yang (JIRA)
[ https://issues.apache.org/jira/browse/YARN-8316?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16489730#comment-16489730 ] Eric Yang commented on YARN-8316: - +1 looks good to me. > Diagnostic message should improve when yarn

[jira] [Updated] (YARN-8342) Using docker image from a non-privileged registry, the launch_command is not honored

2018-05-24 Thread Eric Yang (JIRA)
[ https://issues.apache.org/jira/browse/YARN-8342?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eric Yang updated YARN-8342: Attachment: YARN-8342.001.patch > Using docker image from a non-privileged registry, the launch_command is

[jira] [Commented] (YARN-8108) RM metrics rest API throws GSSException in kerberized environment

2018-05-18 Thread Eric Yang (JIRA)
[ https://issues.apache.org/jira/browse/YARN-8108?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16481207#comment-16481207 ] Eric Yang commented on YARN-8108: - cc [~rkanter] and [~tucu00] > RM metrics rest API throws GSSException

[jira] [Updated] (YARN-8293) In YARN Services UI, "User Name for service" should be completely removed in secure clusters

2018-05-17 Thread Eric Yang (JIRA)
[ https://issues.apache.org/jira/browse/YARN-8293?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eric Yang updated YARN-8293: Fix Version/s: 3.1.1 3.2.0 > In YARN Services UI, "User Name for service" should be

[jira] [Updated] (YARN-8315) HDP 3.0.0 perfromance is slower than HDP 2.6.4

2018-05-17 Thread Eric Yang (JIRA)
[ https://issues.apache.org/jira/browse/YARN-8315?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eric Yang updated YARN-8315: Environment: (was: I have a HDP 2.6.4 cluster and HDP 3.0.0 cluster, I set up to have the same settings

[jira] [Comment Edited] (YARN-8315) HDP 3.0.0 perfromance is slower than HDP 2.6.4

2018-05-17 Thread Eric Yang (JIRA)
[ https://issues.apache.org/jira/browse/YARN-8315?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16479673#comment-16479673 ] Eric Yang edited comment on YARN-8315 at 5/17/18 9:37 PM: -- [~hlhu...@us.ibm.com]

[jira] [Commented] (YARN-8080) YARN native service should support component restart policy

2018-05-15 Thread Eric Yang (JIRA)
[ https://issues.apache.org/jira/browse/YARN-8080?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16476621#comment-16476621 ] Eric Yang commented on YARN-8080: - [~suma.shivaprasad] Thank you for the patch, a few nitpicks: {code}

[jira] [Commented] (YARN-7530) hadoop-yarn-services-api should be part of hadoop-yarn-services

2018-05-18 Thread Eric Yang (JIRA)
[ https://issues.apache.org/jira/browse/YARN-7530?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16481059#comment-16481059 ] Eric Yang commented on YARN-7530: - The pre-commit build failed because it downloaded pre-built binary of

[jira] [Updated] (YARN-8290) Yarn application failed to recover with "Error Launching job : User is not set in the application report" error after RM restart

2018-05-18 Thread Eric Yang (JIRA)
[ https://issues.apache.org/jira/browse/YARN-8290?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eric Yang updated YARN-8290: Attachment: YARN-8290.004.patch > Yarn application failed to recover with "Error Launching job : User is not

[jira] [Commented] (YARN-8315) HDP 3.0.0 perfromance is slower than HDP 2.6.4

2018-05-17 Thread Eric Yang (JIRA)
[ https://issues.apache.org/jira/browse/YARN-8315?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16479673#comment-16479673 ] Eric Yang commented on YARN-8315: - [~hlhu...@us.ibm.com] Apache Hadoop community is not responsible for HDP

[jira] [Commented] (YARN-8141) YARN Native Service: Respect YARN_CONTAINER_RUNTIME_DOCKER_LOCAL_RESOURCE_MOUNTS specified in service spec

2018-05-16 Thread Eric Yang (JIRA)
[ https://issues.apache.org/jira/browse/YARN-8141?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16478311#comment-16478311 ] Eric Yang commented on YARN-8141: - [~csingh] Thank you for the patch, a few nits: FindAbsoluteMount method

[jira] [Commented] (YARN-8081) Yarn Service Upgrade: Add support to upgrade a component

2018-05-15 Thread Eric Yang (JIRA)
[ https://issues.apache.org/jira/browse/YARN-8081?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16476655#comment-16476655 ] Eric Yang commented on YARN-8081: - +1 looks good to me. > Yarn Service Upgrade: Add support to upgrade a

[jira] [Commented] (YARN-8346) Upgrading to 3.1 kills running containers with error "Opportunistic container queue is full"

2018-05-23 Thread Eric Yang (JIRA)
[ https://issues.apache.org/jira/browse/YARN-8346?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16487535#comment-16487535 ] Eric Yang commented on YARN-8346: - The queue length is based on

[jira] [Commented] (YARN-8346) Upgrading to 3.1 kills running containers with error "Opportunistic container queue is full"

2018-05-23 Thread Eric Yang (JIRA)
[ https://issues.apache.org/jira/browse/YARN-8346?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16487593#comment-16487593 ] Eric Yang commented on YARN-8346: - [~jlowe] Yes, you are right. Existing workload is supposed to have

[jira] [Commented] (YARN-8341) Yarn Service: Integration tests

2018-05-23 Thread Eric Yang (JIRA)
[ https://issues.apache.org/jira/browse/YARN-8341?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16487636#comment-16487636 ] Eric Yang commented on YARN-8341: - [~csingh] Thanks for the patch. I think it will be good to create a

[jira] [Commented] (YARN-8108) RM metrics rest API throws GSSException in kerberized environment

2018-05-23 Thread Eric Yang (JIRA)
[ https://issues.apache.org/jira/browse/YARN-8108?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16487825#comment-16487825 ] Eric Yang commented on YARN-8108: - [~yzhangal] My preference is to fix this in 3.0.3 release. If consensus

[jira] [Updated] (YARN-8290) Yarn application failed to recover with "Error Launching job : User is not set in the application report" error after RM restart

2018-05-17 Thread Eric Yang (JIRA)
[ https://issues.apache.org/jira/browse/YARN-8290?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eric Yang updated YARN-8290: Attachment: YARN-8290.003.patch > Yarn application failed to recover with "Error Launching job : User is not

[jira] [Updated] (YARN-8296) Update YarnServiceApi documentation and yarn service UI code to remove references to unique_component_support

2018-05-17 Thread Eric Yang (JIRA)
[ https://issues.apache.org/jira/browse/YARN-8296?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eric Yang updated YARN-8296: Fix Version/s: 3.2.0 > Update YarnServiceApi documentation and yarn service UI code to remove > references

[jira] [Commented] (YARN-8259) Revisit liveliness checks for privileged Docker containers

2018-05-17 Thread Eric Yang (JIRA)
[ https://issues.apache.org/jira/browse/YARN-8259?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16479872#comment-16479872 ] Eric Yang commented on YARN-8259: - I prefer to use docker ps -fname=containerID. This approach keeps

[jira] [Updated] (YARN-8414) Nodemanager crashes soon if ATSv2 HBase is either down or absent

2018-06-11 Thread Eric Yang (JIRA)
[ https://issues.apache.org/jira/browse/YARN-8414?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eric Yang updated YARN-8414: Description: Test cluster has 1000 apps running, and a user triggers capacity scheduler queue changes.

[jira] [Updated] (YARN-8414) Nodemanager crashes soon if ATSv2 HBase is either down or absent

2018-06-11 Thread Eric Yang (JIRA)
[ https://issues.apache.org/jira/browse/YARN-8414?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eric Yang updated YARN-8414: Description: Test cluster has 1000 apps running, and a user triggers capacity scheduler queue changes.

[jira] [Comment Edited] (YARN-8403) Nodemanager logs failed to download file with INFO level

2018-06-11 Thread Eric Yang (JIRA)
[ https://issues.apache.org/jira/browse/YARN-8403?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16508697#comment-16508697 ] Eric Yang edited comment on YARN-8403 at 6/11/18 10:49 PM: --- [~vinodkv],

[jira] [Commented] (YARN-8403) Nodemanager logs failed to download file with INFO level

2018-06-11 Thread Eric Yang (JIRA)
[ https://issues.apache.org/jira/browse/YARN-8403?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16508908#comment-16508908 ] Eric Yang commented on YARN-8403: - - Patch 003 added test case. > Nodemanager logs failed to download

[jira] [Updated] (YARN-8403) Nodemanager logs failed to download file with INFO level

2018-06-11 Thread Eric Yang (JIRA)
[ https://issues.apache.org/jira/browse/YARN-8403?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eric Yang updated YARN-8403: Attachment: (was: YARN-8403.003.patch) > Nodemanager logs failed to download file with INFO level >

[jira] [Commented] (YARN-8220) Running Tensorflow on YARN with GPU and Docker - Examples

2018-06-11 Thread Eric Yang (JIRA)
[ https://issues.apache.org/jira/browse/YARN-8220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16509112#comment-16509112 ] Eric Yang commented on YARN-8220: - [~leftnoteasy] If JAVA_HOME, CLASSPATH, HDFS_HOME are not changeable.

[jira] [Created] (YARN-8414) Nodemanager crashes soon if ATSv2 HBase is either down or absent

2018-06-11 Thread Eric Yang (JIRA)
Eric Yang created YARN-8414: --- Summary: Nodemanager crashes soon if ATSv2 HBase is either down or absent Key: YARN-8414 URL: https://issues.apache.org/jira/browse/YARN-8414 Project: Hadoop YARN

[jira] [Commented] (YARN-8414) Nodemanager crashes soon if ATSv2 HBase is either down or absent

2018-06-11 Thread Eric Yang (JIRA)
[ https://issues.apache.org/jira/browse/YARN-8414?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16508922#comment-16508922 ] Eric Yang commented on YARN-8414: - Timeline collector retries HBase write without any pause in between

[jira] [Updated] (YARN-8403) Nodemanager logs failed to download file with INFO level

2018-06-11 Thread Eric Yang (JIRA)
[ https://issues.apache.org/jira/browse/YARN-8403?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eric Yang updated YARN-8403: Attachment: YARN-8403.003.patch > Nodemanager logs failed to download file with INFO level >

[jira] [Updated] (YARN-8403) Nodemanager logs failed to download file with INFO level

2018-06-11 Thread Eric Yang (JIRA)
[ https://issues.apache.org/jira/browse/YARN-8403?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eric Yang updated YARN-8403: Attachment: YARN-8403.003.patch > Nodemanager logs failed to download file with INFO level >

[jira] [Commented] (YARN-8414) Nodemanager crashes soon if ATSv2 HBase is either down or absent

2018-06-11 Thread Eric Yang (JIRA)
[ https://issues.apache.org/jira/browse/YARN-8414?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16509039#comment-16509039 ] Eric Yang commented on YARN-8414: - There is general protection code in place to close idle connections:

[jira] [Commented] (YARN-8414) Nodemanager crashes soon if ATSv2 HBase is either down or absent

2018-06-12 Thread Eric Yang (JIRA)
[ https://issues.apache.org/jira/browse/YARN-8414?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16509816#comment-16509816 ] Eric Yang commented on YARN-8414: - Timeline Service v2 implements BufferedMutator API from HBase for

[jira] [Commented] (YARN-8258) YARN webappcontext for UI2 should inherit all filters from default context

2018-06-12 Thread Eric Yang (JIRA)
[ https://issues.apache.org/jira/browse/YARN-8258?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16509772#comment-16509772 ] Eric Yang commented on YARN-8258: - HADOOP-15518 has other unexpected problems. YARN-8108 is the patch

[jira] [Commented] (YARN-8108) RM metrics rest API throws GSSException in kerberized environment

2018-06-07 Thread Eric Yang (JIRA)
[ https://issues.apache.org/jira/browse/YARN-8108?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16504904#comment-16504904 ] Eric Yang commented on YARN-8108: - [~lmccay] If the entire webserver only serves secure contents, then it

[jira] [Updated] (YARN-8403) Nodemanager logs failed to download file with INFO level

2018-06-08 Thread Eric Yang (JIRA)
[ https://issues.apache.org/jira/browse/YARN-8403?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eric Yang updated YARN-8403: Attachment: YARN-8403.png > Nodemanager logs failed to download file with INFO level >

[jira] [Updated] (YARN-8403) Nodemanager logs failed to download file with INFO level

2018-06-08 Thread Eric Yang (JIRA)
[ https://issues.apache.org/jira/browse/YARN-8403?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eric Yang updated YARN-8403: Attachment: YARN-8403.002.patch > Nodemanager logs failed to download file with INFO level >

[jira] [Commented] (YARN-8403) Nodemanager logs failed to download file with INFO level

2018-06-08 Thread Eric Yang (JIRA)
[ https://issues.apache.org/jira/browse/YARN-8403?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16506418#comment-16506418 ] Eric Yang commented on YARN-8403: - Patch 002 has been updated to send diagnostic information to AM. The

[jira] [Commented] (YARN-8258) YARN webappcontext for UI2 should inherit all filters from default context

2018-06-07 Thread Eric Yang (JIRA)
[ https://issues.apache.org/jira/browse/YARN-8258?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16505034#comment-16505034 ] Eric Yang commented on YARN-8258: - This patch will be ok to commit, if HADOOP-15518 is committed. > YARN

[jira] [Commented] (YARN-8403) Nodemanager logs failed to download file with INFO level

2018-06-07 Thread Eric Yang (JIRA)
[ https://issues.apache.org/jira/browse/YARN-8403?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16505267#comment-16505267 ] Eric Yang commented on YARN-8403: - [~vinodkv] Yes. This will be added to the diagnostics in the next

[jira] [Commented] (YARN-8326) Yarn 3.0 seems runs slower than Yarn 2.6

2018-06-18 Thread Eric Yang (JIRA)
[ https://issues.apache.org/jira/browse/YARN-8326?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16516072#comment-16516072 ] Eric Yang commented on YARN-8326: - [~shaneku...@gmail.com] In line

[jira] [Commented] (YARN-8410) Registry DNS lookup fails to return for CNAMEs

2018-06-12 Thread Eric Yang (JIRA)
[ https://issues.apache.org/jira/browse/YARN-8410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16510389#comment-16510389 ] Eric Yang commented on YARN-8410: - [~shaneku...@gmail.com] {code} if (r.getType() != Type.CNAME) {

[jira] [Commented] (YARN-8411) Enable stopped system services to be started during RM start

2018-06-12 Thread Eric Yang (JIRA)
[ https://issues.apache.org/jira/browse/YARN-8411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16510497#comment-16510497 ] Eric Yang commented on YARN-8411: - +1 looks good to me. I will commit patch 002 tomorrow if no

[jira] [Commented] (YARN-8414) Nodemanager crashes soon if ATSv2 HBase is either down or absent

2018-06-12 Thread Eric Yang (JIRA)
[ https://issues.apache.org/jira/browse/YARN-8414?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16510434#comment-16510434 ] Eric Yang commented on YARN-8414: - [~te...@apache.org] Thanks for the suggestion. It looks like

[jira] [Commented] (YARN-8259) Revisit liveliness checks for Docker containers

2018-06-12 Thread Eric Yang (JIRA)
[ https://issues.apache.org/jira/browse/YARN-8259?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16510275#comment-16510275 ] Eric Yang commented on YARN-8259: - Four people have expressed a preference for option #1. Therefore, this

[jira] [Commented] (YARN-8414) Nodemanager crashes soon if ATSv2 HBase is either down or absent

2018-06-12 Thread Eric Yang (JIRA)
[ https://issues.apache.org/jira/browse/YARN-8414?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16510094#comment-16510094 ] Eric Yang commented on YARN-8414: - [~te...@apache.org] Thanks for the input, I am going to experiment with

[jira] [Commented] (YARN-8411) Enable stopped system services to be started during RM start

2018-06-12 Thread Eric Yang (JIRA)
[ https://issues.apache.org/jira/browse/YARN-8411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16510177#comment-16510177 ] Eric Yang commented on YARN-8411: - [~billie.rinaldi] Thank you for the patch, the approach looks good.

[jira] [Commented] (YARN-8258) YARN webappcontext for UI2 should inherit all filters from default context

2018-06-12 Thread Eric Yang (JIRA)
[ https://issues.apache.org/jira/browse/YARN-8258?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16510088#comment-16510088 ] Eric Yang commented on YARN-8258: - [~sunilg] I tested YARN-8108 with

[jira] [Commented] (YARN-8410) Registry DNS lookup fails to return for CNAMEs

2018-06-13 Thread Eric Yang (JIRA)
[ https://issues.apache.org/jira/browse/YARN-8410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16511494#comment-16511494 ] Eric Yang commented on YARN-8410: - {quote} I'll note that it does look like we've got follow on work here.

[jira] [Commented] (YARN-8410) Registry DNS lookup fails to return for CNAMEs

2018-06-13 Thread Eric Yang (JIRA)
[ https://issues.apache.org/jira/browse/YARN-8410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16511505#comment-16511505 ] Eric Yang commented on YARN-8410: - [~shaneku...@gmail.com] The reordering of cname records first is

[jira] [Commented] (YARN-8414) Nodemanager crashes soon if ATSv2 HBase is either down or absent

2018-06-13 Thread Eric Yang (JIRA)
[ https://issues.apache.org/jira/browse/YARN-8414?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16511737#comment-16511737 ] Eric Yang commented on YARN-8414: - When hbase.client.pause is configured, there does not appear to be any

[jira] [Commented] (YARN-8258) YARN webappcontext for UI2 should inherit all filters from default context

2018-06-13 Thread Eric Yang (JIRA)
[ https://issues.apache.org/jira/browse/YARN-8258?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16511833#comment-16511833 ] Eric Yang commented on YARN-8258: - [~sunilg] I am getting this error with patch 008 without using

[jira] [Commented] (YARN-8414) Nodemanager crashes soon if ATSv2 HBase is either down or absent

2018-06-14 Thread Eric Yang (JIRA)
[ https://issues.apache.org/jira/browse/YARN-8414?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16512863#comment-16512863 ] Eric Yang commented on YARN-8414: - [~rohithsharma] FYI, We have more applications running than physical

[jira] [Comment Edited] (YARN-8414) Nodemanager crashes soon if ATSv2 HBase is either down or absent

2018-06-14 Thread Eric Yang (JIRA)
[ https://issues.apache.org/jira/browse/YARN-8414?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16512837#comment-16512837 ] Eric Yang edited comment on YARN-8414 at 6/14/18 6:28 PM: -- [~rohithsharma] We

[jira] [Commented] (YARN-8258) YARN webappcontext for UI2 should inherit all filters from default context

2018-06-14 Thread Eric Yang (JIRA)
[ https://issues.apache.org/jira/browse/YARN-8258?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16512817#comment-16512817 ] Eric Yang commented on YARN-8258: - Patch 008 shows a blank screen with above javascript error. How to

[jira] [Commented] (YARN-8414) Nodemanager crashes soon if ATSv2 HBase is either down or absent

2018-06-14 Thread Eric Yang (JIRA)
[ https://issues.apache.org/jira/browse/YARN-8414?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16512837#comment-16512837 ] Eric Yang commented on YARN-8414: - [~rohithsharma] We have 9 node managers running 1000 applications, each

[jira] [Comment Edited] (YARN-8414) Nodemanager crashes soon if ATSv2 HBase is either down or absent

2018-06-14 Thread Eric Yang (JIRA)
[ https://issues.apache.org/jira/browse/YARN-8414?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16511867#comment-16511867 ] Eric Yang edited comment on YARN-8414 at 6/14/18 6:25 PM: -- The root cause for

[jira] [Commented] (YARN-8410) Registry DNS lookup fails to return for CNAMEs

2018-06-14 Thread Eric Yang (JIRA)
[ https://issues.apache.org/jira/browse/YARN-8410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16512641#comment-16512641 ] Eric Yang commented on YARN-8410: - [~shaneku...@gmail.com] I saw infinite CNAME record problem before. If

[jira] [Comment Edited] (YARN-8414) Nodemanager crashes soon if ATSv2 HBase is either down or absent

2018-06-14 Thread Eric Yang (JIRA)
[ https://issues.apache.org/jira/browse/YARN-8414?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16511867#comment-16511867 ] Eric Yang edited comment on YARN-8414 at 6/14/18 3:38 PM: -- The root cause for

[jira] [Comment Edited] (YARN-8410) Registry DNS lookup fails to return for CNAMEs

2018-06-14 Thread Eric Yang (JIRA)
[ https://issues.apache.org/jira/browse/YARN-8410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16512912#comment-16512912 ] Eric Yang edited comment on YARN-8410 at 6/14/18 7:59 PM: -- +1 for patch 004. I

[jira] [Created] (YARN-8428) YARN service has ZooKeeper connection leaks

2018-06-14 Thread Eric Yang (JIRA)
Eric Yang created YARN-8428: --- Summary: YARN service has ZooKeeper connection leaks Key: YARN-8428 URL: https://issues.apache.org/jira/browse/YARN-8428 Project: Hadoop YARN Issue Type: Bug

[jira] [Commented] (YARN-8410) Registry DNS lookup fails to return for CNAMEs

2018-06-14 Thread Eric Yang (JIRA)
[ https://issues.apache.org/jira/browse/YARN-8410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16512912#comment-16512912 ] Eric Yang commented on YARN-8410: - +1 for patch 004. I will commit this shortly. > Registry DNS lookup

[jira] [Commented] (YARN-8428) YARN service has ZooKeeper connection leaks

2018-06-15 Thread Eric Yang (JIRA)
[ https://issues.apache.org/jira/browse/YARN-8428?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16514236#comment-16514236 ] Eric Yang commented on YARN-8428: - YARN service AM requires a ZooKeeper connection to update YARN registry

[jira] [Assigned] (YARN-8428) YARN service has ZooKeeper connection leaks

2018-06-15 Thread Eric Yang (JIRA)
[ https://issues.apache.org/jira/browse/YARN-8428?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eric Yang reassigned YARN-8428: --- Assignee: Eric Yang > YARN service has ZooKeeper connection leaks >

[jira] [Resolved] (YARN-8428) YARN service has ZooKeeper connection leaks

2018-06-15 Thread Eric Yang (JIRA)
[ https://issues.apache.org/jira/browse/YARN-8428?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eric Yang resolved YARN-8428. - Resolution: Won't Fix YARN ServiceClient is using a single connection to connect to ZooKeeper for

[jira] [Commented] (YARN-8326) Yarn 3.0 seems runs slower than Yarn 2.6

2018-06-15 Thread Eric Yang (JIRA)
[ https://issues.apache.org/jira/browse/YARN-8326?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16514439#comment-16514439 ] Eric Yang commented on YARN-8326: - The root cause for the stopping delay coming from clean up containers:

[jira] [Comment Edited] (YARN-8326) Yarn 3.0 seems runs slower than Yarn 2.6

2018-06-15 Thread Eric Yang (JIRA)
[ https://issues.apache.org/jira/browse/YARN-8326?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16514439#comment-16514439 ] Eric Yang edited comment on YARN-8326 at 6/15/18 10:30 PM: --- The root cause for

[jira] [Commented] (YARN-8220) Running Tensorflow on YARN with GPU and Docker - Examples

2018-06-11 Thread Eric Yang (JIRA)
[ https://issues.apache.org/jira/browse/YARN-8220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16508408#comment-16508408 ] Eric Yang commented on YARN-8220: - [~leftnoteasy] Base on RunTensorflowJobUsingNativeServiceSpec.md the

[jira] [Commented] (YARN-8220) Running Tensorflow on YARN with GPU and Docker - Examples

2018-06-10 Thread Eric Yang (JIRA)
[ https://issues.apache.org/jira/browse/YARN-8220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16507542#comment-16507542 ] Eric Yang commented on YARN-8220: - [~leftnoteasy] Launch_command containers environment variables and

[jira] [Commented] (YARN-8259) Revisit liveliness checks for Docker containers

2018-06-11 Thread Eric Yang (JIRA)
[ https://issues.apache.org/jira/browse/YARN-8259?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16508516#comment-16508516 ] Eric Yang commented on YARN-8259: - I prefer #3 to keep abstraction in place, and improve portability. #1

[jira] [Commented] (YARN-8410) Registry DNS lookup fails to return for CNAMEs

2018-06-11 Thread Eric Yang (JIRA)
[ https://issues.apache.org/jira/browse/YARN-8410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16508549#comment-16508549 ] Eric Yang commented on YARN-8410: - [~shaneku...@gmail.com] Thank you for the patch. Root SOA lookup

[jira] [Commented] (YARN-8403) Nodemanager logs failed to download file with INFO level

2018-06-11 Thread Eric Yang (JIRA)
[ https://issues.apache.org/jira/browse/YARN-8403?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16508697#comment-16508697 ] Eric Yang commented on YARN-8403: - [~vinodkv] publicRsrc does not trigger notification to application

[jira] [Commented] (YARN-8414) Nodemanager crashes soon if ATSv2 HBase is either down or absent

2018-06-13 Thread Eric Yang (JIRA)
[ https://issues.apache.org/jira/browse/YARN-8414?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16511867#comment-16511867 ] Eric Yang commented on YARN-8414: - The root cause for node manager to crash is contributed by leaking

[jira] [Commented] (YARN-8414) Nodemanager crashes soon if ATSv2 HBase is either down or absent

2018-06-12 Thread Eric Yang (JIRA)
[ https://issues.apache.org/jira/browse/YARN-8414?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16509914#comment-16509914 ] Eric Yang commented on YARN-8414: - Timeline collector can adjust hbase.client.pause setting in

[jira] [Updated] (YARN-8376) Separate white list for docker.trusted.registries and docker.privileged-container.registries

2018-05-29 Thread Eric Yang (JIRA)
[ https://issues.apache.org/jira/browse/YARN-8376?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eric Yang updated YARN-8376: Issue Type: Sub-task (was: Improvement) Parent: YARN-3611 > Separate white list for
