[jira] [Comment Edited] (YARN-8927) Better handling of "docker.trusted.registries" in container-executor's "trusted_image_check" function
[ https://issues.apache.org/jira/browse/YARN-8927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16664670#comment-16664670 ] Zhankun Tang edited comment on YARN-8927 at 10/26/18 5:19 AM: -- [~eyang] , sorry for the confusion. I meant that we do check with the "library" prefix, but we do not use "library/centos:latest" to replace the user's input image name. I agree that we should enable the local and Docker public repositories by default. [~ebadger] Thanks for the detailed discussion here; it is really helpful. What YARN does here is add a white-list through which an administrator can control which "[repository/]image[:tag]" an end user can pull (YARN-3854) or run. To keep the end user's experience of running an image without a repository name consistent with Docker, I think we all agreed to leave "library" in {{docker.trusted.registries}} by default to enable local images. Since Docker will try to pull an image from Docker Hub if it is not found locally, should we avoid this pull? I think probably not. Docker Hub can be a trusted repository for YARN. And if it is not, another problem comes up when we only allow truly local images: how do we configure the Docker Hub repository for YARN-3854 to pull images? Do we use another reserved convention word? So maybe setting "library" in "docker.trusted.registries", allowing both local images and Docker Hub, is clean and simple? > Better handling of "docker.trusted.registries" in container-executor's > "trusted_image_check" function > - > > Key: YARN-8927 > URL: https://issues.apache.org/jira/browse/YARN-8927 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Zhankun Tang >Assignee: Zhankun Tang >Priority: Major > > There are some missing cases that we need to catch when handling > "docker.trusted.registries". > The container-executor.cfg configuration is as follows: > {code:java} > docker.trusted.registries=tangzhankun,ubuntu,centos{code} > It works if we run DistributedShell with "tangzhankun/tensorflow": > {code:java} > "yarn ... -shell_env YARN_CONTAINER_RUNTIME_TYPE=docker -shell_env > YARN_CONTAINER_RUNTIME_DOCKER_IMAGE=tangzhankun/tensorflow > {code} > But running a DistributedShell job with "centos", "centos[:tagName]", "ubuntu" > or "ubuntu[:tagName]" fails. > The error message is like: > {code:java} > "image: centos is not trusted" > {code} > We need to handle the above cases better. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
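The whitelist behavior under discussion can be sketched in a few lines. This is a simplified, hedged illustration only: the real check lives in the C container-executor's trusted_image_check, and the method names (`registryOf`, `isTrusted`) and the rule of treating a prefix-less image as belonging to "library" are assumptions made for the sketch.

```java
import java.util.Arrays;
import java.util.List;

public class TrustedImageCheck {
    // Returns the registry/repository prefix of an image name.
    // An image with no '/' (e.g. "centos:latest") is implicitly in the
    // "library" namespace, mirroring Docker Hub's convention (assumption
    // for this sketch).
    static String registryOf(String image) {
        int slash = image.indexOf('/');
        return slash < 0 ? "library" : image.substring(0, slash);
    }

    // True if the image's registry prefix appears in the configured
    // docker.trusted.registries list.
    static boolean isTrusted(String image, List<String> trustedRegistries) {
        return trustedRegistries.contains(registryOf(image));
    }

    public static void main(String[] args) {
        List<String> trusted = Arrays.asList("library", "tangzhankun");
        // "centos:latest" has no registry prefix, so it falls under "library".
        System.out.println(isTrusted("centos:latest", trusted));          // true
        System.out.println(isTrusted("tangzhankun/tensorflow", trusted)); // true
        System.out.println(isTrusted("evil.example.com/img", trusted));   // false
    }
}
```

With "library" in the trusted list, both a locally built "centos" image and the same name pulled from Docker Hub pass the check, which is the behavior the comment argues for.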
[jira] [Commented] (YARN-8914) Add xtermjs to YARN UI2
[ https://issues.apache.org/jira/browse/YARN-8914?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16664664#comment-16664664 ] Eric Yang commented on YARN-8914: - [~akhilpb] The add-on scripts are already minified. Some modules use a mix of regular JavaScript and ES Modules, and the xterm.js minification tooling does not compress inline ES Modules. I don't think we should recompress the already-minified files just to compress the inline ES Modules. We might step on a field of land mines with an ES Module compressor, because ES Module compressors are mostly of beta quality. It seems safer to use the stock distribution for ease of maintenance. > Add xtermjs to YARN UI2 > --- > > Key: YARN-8914 > URL: https://issues.apache.org/jira/browse/YARN-8914 > Project: Hadoop YARN > Issue Type: Sub-task > Components: yarn-ui-v2 >Reporter: Eric Yang >Assignee: Eric Yang >Priority: Major > Attachments: YARN-8914.001.patch, YARN-8914.002.patch > > > In the container listing from UI2, we can add a link to connect to a docker > container using xtermjs.
[jira] [Commented] (YARN-8914) Add xtermjs to YARN UI2
[ https://issues.apache.org/jira/browse/YARN-8914?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16664641#comment-16664641 ] Akhil PB commented on YARN-8914: [~eyang] Please use the minified JS in the addon/dist files along with the map files. This will reduce the patch size as well as the load time in the browser. We will still be able to debug, since the map files exist alongside the minified files. > Add xtermjs to YARN UI2 > --- > > Key: YARN-8914 > URL: https://issues.apache.org/jira/browse/YARN-8914 > Project: Hadoop YARN > Issue Type: Sub-task > Components: yarn-ui-v2 >Reporter: Eric Yang >Assignee: Eric Yang >Priority: Major > Attachments: YARN-8914.001.patch, YARN-8914.002.patch > > > In the container listing from UI2, we can add a link to connect to a docker > container using xtermjs.
[jira] [Updated] (YARN-8902) Add volume manager that manages CSI volume lifecycle
[ https://issues.apache.org/jira/browse/YARN-8902?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Weiwei Yang updated YARN-8902: -- Attachment: YARN-8902.003.patch > Add volume manager that manages CSI volume lifecycle > > > Key: YARN-8902 > URL: https://issues.apache.org/jira/browse/YARN-8902 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Weiwei Yang >Assignee: Weiwei Yang >Priority: Major > Attachments: YARN-8902.001.patch, YARN-8902.002.patch, > YARN-8902.003.patch > > > The CSI volume manager is a service running in the RM process that manages the > lifecycle of all CSI volumes. The details of a volume's lifecycle states can be > found in the [CSI > spec|https://github.com/container-storage-interface/spec/blob/master/spec.md].
[jira] [Commented] (YARN-8694) app flex with relative changes does not work
[ https://issues.apache.org/jira/browse/YARN-8694?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16664632#comment-16664632 ] kyungwan nam commented on YARN-8694: Attached a new patch, 002. It gets the service status from the API server prior to requesting the flex, so we can handle relative changes. > app flex with relative changes does not work > > > Key: YARN-8694 > URL: https://issues.apache.org/jira/browse/YARN-8694 > Project: Hadoop YARN > Issue Type: Bug > Components: yarn-native-services >Affects Versions: 3.1.1 >Reporter: kyungwan nam >Priority: Major > Attachments: YARN-8694.001.patch, YARN-8694.002.patch > > > I'd like to increase the container count by 2 as below. > {code:java} > yarn app -flex my-sleeper -component sleeper +2{code} > But it did not work; it seems to request 2, not +2. > > ApiServiceClient.actionFlex > {code:java} > @Override > public int actionFlex(String appName, Map<String, String> componentCounts) > throws IOException, YarnException { > int result = EXIT_SUCCESS; > try { > Service service = new Service(); > service.setName(appName); > service.setState(ServiceState.FLEX); > for (Map.Entry<String, String> entry : componentCounts.entrySet()) { > Component component = new Component(); > component.setName(entry.getKey()); > Long numberOfContainers = Long.parseLong(entry.getValue()); > component.setNumberOfContainers(numberOfContainers); > service.addComponent(component); > } > String buffer = jsonSerDeser.toJson(service); > ClientResponse response = getApiClient(getServicePath(appName)) > .put(ClientResponse.class, buffer);{code} > It looks like there is no code that handles "+" or "-" in > ApiServiceClient.actionFlex
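The missing "+"/"-" handling can be sketched as follows. This is a hypothetical helper, not the actual ApiServiceClient code: `resolveTarget` and its signature are assumptions, and the real patch obtains the current count by querying the API server for the service status first.

```java
public class FlexArith {
    // Computes the target container count from a flex argument.
    // "+2" and "-2" are treated as changes relative to the current count;
    // a bare number like "2" is an absolute target.
    static long resolveTarget(String flexArg, long currentCount) {
        char first = flexArg.charAt(0);
        if (first == '+' || first == '-') {
            // Long.parseLong understands a leading sign, so "+2" adds 2
            // and "-1" subtracts 1 from the current count.
            return currentCount + Long.parseLong(flexArg);
        }
        return Long.parseLong(flexArg);
    }

    public static void main(String[] args) {
        // Current count of the "sleeper" component is 3.
        System.out.println(resolveTarget("+2", 3)); // 5
        System.out.println(resolveTarget("-1", 3)); // 2
        System.out.println(resolveTarget("2", 3));  // 2
    }
}
```

This makes explicit why the reported bug occurs: without the sign check, "+2" parses as an absolute count of 2 rather than an increment.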
[jira] [Updated] (YARN-8694) app flex with relative changes does not work
[ https://issues.apache.org/jira/browse/YARN-8694?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] kyungwan nam updated YARN-8694: --- Attachment: YARN-8694.002.patch > app flex with relative changes does not work > > > Key: YARN-8694 > URL: https://issues.apache.org/jira/browse/YARN-8694 > Project: Hadoop YARN > Issue Type: Bug > Components: yarn-native-services >Affects Versions: 3.1.1 >Reporter: kyungwan nam >Priority: Major > Attachments: YARN-8694.001.patch, YARN-8694.002.patch > > > I'd like to increase the container count by 2 as below. > {code:java} > yarn app -flex my-sleeper -component sleeper +2{code} > But it did not work; it seems to request 2, not +2. > > ApiServiceClient.actionFlex > {code:java} > @Override > public int actionFlex(String appName, Map<String, String> componentCounts) > throws IOException, YarnException { > int result = EXIT_SUCCESS; > try { > Service service = new Service(); > service.setName(appName); > service.setState(ServiceState.FLEX); > for (Map.Entry<String, String> entry : componentCounts.entrySet()) { > Component component = new Component(); > component.setName(entry.getKey()); > Long numberOfContainers = Long.parseLong(entry.getValue()); > component.setNumberOfContainers(numberOfContainers); > service.addComponent(component); > } > String buffer = jsonSerDeser.toJson(service); > ClientResponse response = getApiClient(getServicePath(appName)) > .put(ClientResponse.class, buffer);{code} > It looks like there is no code that handles "+" or "-" in > ApiServiceClient.actionFlex
[jira] [Commented] (YARN-6985) The wrapper methods in Resources aren't useful
[ https://issues.apache.org/jira/browse/YARN-6985?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16664613#comment-16664613 ] Sunil Govindan commented on YARN-6985: -- I agree with this optimization for better readability. > The wrapper methods in Resources aren't useful > -- > > Key: YARN-6985 > URL: https://issues.apache.org/jira/browse/YARN-6985 > Project: Hadoop YARN > Issue Type: Improvement > Components: resourcemanager >Affects Versions: 3.0.0-alpha4 >Reporter: Daniel Templeton >Assignee: Suma Shivaprasad >Priority: Major > Attachments: YARN-6985.1.patch > > > The code would be shorter, easier to read, and a tiny smidgeon faster if we > just called the {{ResourceCalculator}} methods directly. I don't see where > the wrappers improve the code in any way. > For example, with wrappers:{code}Resource normalized = > Resources.normalize( > resourceCalculator, ask, minimumResource, > maximumResource, incrementResource); > {code} and without wrappers:{code}Resource normalized = > resourceCalculator.normalize(ask, minimumResource, > maximumResource, incrementResource);{code} > The difference isn't huge, but I find the latter much more readable. With > the former I always have to figure out which parameters are which, because > passing in the {{ResourceCalculator}} adds an unrelated additional > parameter at the head of the list. > There may be some cases where the wrapper methods are mixed in with calls to > legitimate {{Resources}} methods, making it more consistent to use the > wrappers. In those cases, that may be a reason to keep and use the wrapper > method.
[jira] [Commented] (YARN-6985) The wrapper methods in Resources aren't useful
[ https://issues.apache.org/jira/browse/YARN-6985?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16664612#comment-16664612 ] Suma Shivaprasad commented on YARN-6985: [~templedf] Makes sense. Replaced the wrapper calls to normalize and normalizeDown. > The wrapper methods in Resources aren't useful > -- > > Key: YARN-6985 > URL: https://issues.apache.org/jira/browse/YARN-6985 > Project: Hadoop YARN > Issue Type: Improvement > Components: resourcemanager >Affects Versions: 3.0.0-alpha4 >Reporter: Daniel Templeton >Assignee: Suma Shivaprasad >Priority: Major > Attachments: YARN-6985.1.patch > > > The code would be shorter, easier to read, and a tiny smidgeon faster if we > just called the {{ResourceCalculator}} methods directly. I don't see where > the wrappers improve the code in any way. > For example, with wrappers:{code}Resource normalized = > Resources.normalize( > resourceCalculator, ask, minimumResource, > maximumResource, incrementResource); > {code} and without wrappers:{code}Resource normalized = > resourceCalculator.normalize(ask, minimumResource, > maximumResource, incrementResource);{code} > The difference isn't huge, but I find the latter much more readable. With > the former I always have to figure out which parameters are which, because > passing in the {{ResourceCalculator}} adds an unrelated additional > parameter at the head of the list. > There may be some cases where the wrapper methods are mixed in with calls to > legitimate {{Resources}} methods, making it more consistent to use the > wrappers. In those cases, that may be a reason to keep and use the wrapper > method.
[jira] [Commented] (YARN-8917) Absolute (maximum) capacity of level3+ queues is wrongly calculated for absolute resource
[ https://issues.apache.org/jira/browse/YARN-8917?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16664611#comment-16664611 ] Sunil Govindan commented on YARN-8917: -- [~Tao Yang] Thank you for the patch. There are a few test case failures in TestRMAdminService. I am seeing them for the first time, but at first glance they do not look related either. Could you please double-check? Thank you. > Absolute (maximum) capacity of level3+ queues is wrongly calculated for > absolute resource > - > > Key: YARN-8917 > URL: https://issues.apache.org/jira/browse/YARN-8917 > Project: Hadoop YARN > Issue Type: Bug > Components: capacityscheduler >Affects Versions: 3.2.1 >Reporter: Tao Yang >Assignee: Tao Yang >Priority: Major > Attachments: YARN-8917.001.patch, YARN-8917.002.patch > > > Absolute capacity should be calculated by multiplying a queue's capacity by the > parent queue's absolute capacity, > but it is currently calculated by dividing the capacity by the parent queue's > absolute capacity. > The calculation of absolute-maximum-capacity has the same problem. > For example: > root.a capacity=0.4 maximum-capacity=0.8 > root.a.a1 capacity=0.5 maximum-capacity=0.6 > The absolute capacity of root.a.a1 should be 0.2 but is wrongly calculated as 1.25. > The absolute maximum capacity of root.a.a1 should be 0.48 but is wrongly > calculated as 0.75. > Moreover: > {{childQueue.getQueueCapacities().getCapacity()}} should be changed to > {{childQueue.getQueueCapacities().getCapacity(label)}} to avoid getting the wrong > capacity from the default partition when calculating for a non-default partition.
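The multiply-vs-divide bug in the issue description can be checked with a small worked example. This is plain standalone arithmetic, not the CapacityScheduler code itself, using the queue values from the description.

```java
public class AbsoluteCapacityExample {
    public static void main(String[] args) {
        // From the issue description:
        // root.a:    capacity=0.4, maximum-capacity=0.8 (absolute values)
        // root.a.a1: capacity=0.5, maximum-capacity=0.6 (relative to parent)
        double parentAbsCap = 0.4, parentAbsMaxCap = 0.8;
        double childCap = 0.5, childMaxCap = 0.6;

        // Correct: multiply the child's capacity by the parent's absolute capacity.
        double absCap = childCap * parentAbsCap;          // 0.2
        double absMaxCap = childMaxCap * parentAbsMaxCap; // ~0.48

        // Buggy: dividing instead yields a value above 1.0, which is
        // impossible for an absolute capacity fraction.
        double buggyAbsCap = childCap / parentAbsCap;     // 1.25

        System.out.println(absCap);
        System.out.println(absMaxCap);
        System.out.println(buggyAbsCap);
    }
}
```

The out-of-range 1.25 result is exactly the symptom reported for root.a.a1.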
[jira] [Updated] (YARN-6985) The wrapper methods in Resources aren't useful
[ https://issues.apache.org/jira/browse/YARN-6985?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Suma Shivaprasad updated YARN-6985: --- Attachment: YARN-6985.1.patch > The wrapper methods in Resources aren't useful > -- > > Key: YARN-6985 > URL: https://issues.apache.org/jira/browse/YARN-6985 > Project: Hadoop YARN > Issue Type: Improvement > Components: resourcemanager >Affects Versions: 3.0.0-alpha4 >Reporter: Daniel Templeton >Priority: Major > Attachments: YARN-6985.1.patch > > > The code would be shorter, easier to read, and a tiny smidgeon faster if we > just called the {{ResourceCalculator}} methods directly. I don't see where > the wrappers improve the code in any way. > For example, with wrappers:{code}Resource normalized = > Resources.normalize( > resourceCalculator, ask, minimumResource, > maximumResource, incrementResource); > {code} and without wrappers:{code}Resource normalized = > resourceCalculator.normalize(ask, minimumResource, > maximumResource, incrementResource);{code} > The difference isn't huge, but I find the latter much more readable. With > the former I always have to figure out which parameters are which, because > passing in the {{ResourceCalculator}} adds an unrelated additional > parameter at the head of the list. > There may be some cases where the wrapper methods are mixed in with calls to > legitimate {{Resources}} methods, making it more consistent to use the > wrappers. In those cases, that may be a reason to keep and use the wrapper > method.
[jira] [Assigned] (YARN-6985) The wrapper methods in Resources aren't useful
[ https://issues.apache.org/jira/browse/YARN-6985?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Suma Shivaprasad reassigned YARN-6985: -- Assignee: Suma Shivaprasad > The wrapper methods in Resources aren't useful > -- > > Key: YARN-6985 > URL: https://issues.apache.org/jira/browse/YARN-6985 > Project: Hadoop YARN > Issue Type: Improvement > Components: resourcemanager >Affects Versions: 3.0.0-alpha4 >Reporter: Daniel Templeton >Assignee: Suma Shivaprasad >Priority: Major > Attachments: YARN-6985.1.patch > > > The code would be shorter, easier to read, and a tiny smidgeon faster if we > just called the {{ResourceCalculator}} methods directly. I don't see where > the wrappers improve the code in any way. > For example, with wrappers:{code}Resource normalized = > Resources.normalize( > resourceCalculator, ask, minimumResource, > maximumResource, incrementResource); > {code} and without wrappers:{code}Resource normalized = > resourceCalculator.normalize(ask, minimumResource, > maximumResource, incrementResource);{code} > The difference isn't huge, but I find the latter much more readable. With > the former I always have to figure out which parameters are which, because > passing in the {{ResourceCalculator}} adds an unrelated additional > parameter at the head of the list. > There may be some cases where the wrapper methods are mixed in with calls to > legitimate {{Resources}} methods, making it more consistent to use the > wrappers. In those cases, that may be a reason to keep and use the wrapper > method.
[jira] [Commented] (YARN-4249) Many options in "yarn application" command is not documented
[ https://issues.apache.org/jira/browse/YARN-4249?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16664595#comment-16664595 ] Suma Shivaprasad commented on YARN-4249: These are documented in YarnCommands.md. Hence closing the issue. > Many options in "yarn application" command is not documented > > > Key: YARN-4249 > URL: https://issues.apache.org/jira/browse/YARN-4249 > Project: Hadoop YARN > Issue Type: Bug >Reporter: nijel >Priority: Major > > In the document, only a few options are specified: > {code} > Usage: `yarn application [options]` > | COMMAND\_OPTIONS | Description | > |: |: | > | -appStates \<States\> | Works with -list to filter applications based on an > input comma-separated list of application states. The valid application state > can be one of the following: ALL, NEW, NEW\_SAVING, SUBMITTED, ACCEPTED, > RUNNING, FINISHED, FAILED, KILLED | > | -appTypes \<Types\> | Works with -list to filter applications based on an > input comma-separated list of application types. | > | -list | Lists applications from the RM. Supports optional use of -appTypes > to filter applications based on application type, and -appStates to filter > applications based on application state. | > | -kill \<ApplicationId\> | Kills the application. | > | -status \<ApplicationId\> | Prints the status of the application. | > {code} > Some options are missing, like: > -appId Specify the Application Id to be operated on > -help Displays help for all commands. > -movetoqueue Moves the application to a different queue. > -queue Works with the movetoqueue command to specify > which queue to move an application to. > -updatePriority Updates the priority of an > application. The ApplicationId can be passed using the 'appId' option.
[jira] [Resolved] (YARN-4249) Many options in "yarn application" command is not documented
[ https://issues.apache.org/jira/browse/YARN-4249?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Suma Shivaprasad resolved YARN-4249. Resolution: Not A Problem > Many options in "yarn application" command is not documented > > > Key: YARN-4249 > URL: https://issues.apache.org/jira/browse/YARN-4249 > Project: Hadoop YARN > Issue Type: Bug >Reporter: nijel >Priority: Major > > In the document, only a few options are specified: > {code} > Usage: `yarn application [options]` > | COMMAND\_OPTIONS | Description | > |: |: | > | -appStates \<States\> | Works with -list to filter applications based on an > input comma-separated list of application states. The valid application state > can be one of the following: ALL, NEW, NEW\_SAVING, SUBMITTED, ACCEPTED, > RUNNING, FINISHED, FAILED, KILLED | > | -appTypes \<Types\> | Works with -list to filter applications based on an > input comma-separated list of application types. | > | -list | Lists applications from the RM. Supports optional use of -appTypes > to filter applications based on application type, and -appStates to filter > applications based on application state. | > | -kill \<ApplicationId\> | Kills the application. | > | -status \<ApplicationId\> | Prints the status of the application. | > {code} > Some options are missing, like: > -appId Specify the Application Id to be operated on > -help Displays help for all commands. > -movetoqueue Moves the application to a different queue. > -queue Works with the movetoqueue command to specify > which queue to move an application to. > -updatePriority Updates the priority of an > application. The ApplicationId can be passed using the 'appId' option.
[jira] [Updated] (YARN-7439) Minor improvements to Reservation System documentation/exceptions
[ https://issues.apache.org/jira/browse/YARN-7439?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Suma Shivaprasad updated YARN-7439: --- Attachment: YARN-7439.1.patch > Minor improvements to Reservation System documentation/exceptions > - > > Key: YARN-7439 > URL: https://issues.apache.org/jira/browse/YARN-7439 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Carlo Curino >Priority: Trivial > Attachments: YARN-7439.1.patch > > > This JIRA tracks a couple of minor issues with the docs and exceptions for the > reservation system.
[jira] [Assigned] (YARN-7439) Minor improvements to Reservation System documentation/exceptions
[ https://issues.apache.org/jira/browse/YARN-7439?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Suma Shivaprasad reassigned YARN-7439: -- Assignee: Suma Shivaprasad > Minor improvements to Reservation System documentation/exceptions > - > > Key: YARN-7439 > URL: https://issues.apache.org/jira/browse/YARN-7439 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Carlo Curino >Assignee: Suma Shivaprasad >Priority: Trivial > Attachments: YARN-7439.1.patch > > > This JIRA tracks a couple of minor issues with the docs and exceptions for the > reservation system.
[jira] [Updated] (YARN-8905) [Router] Add JvmMetricsInfo and pause monitor
[ https://issues.apache.org/jira/browse/YARN-8905?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bilwa S T updated YARN-8905: Attachment: YARN-8905-002.patch > [Router] Add JvmMetricsInfo and pause monitor > - > > Key: YARN-8905 > URL: https://issues.apache.org/jira/browse/YARN-8905 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Bibin A Chundatt >Assignee: Bilwa S T >Priority: Minor > Attachments: YARN-8905-001.patch, YARN-8905-002.patch > > > Similar to the resourcemanager and nodemanager services, we can add JvmMetricsInfo > to the router service too.
[jira] [Commented] (YARN-8945) Calculation of maximum applications should respect specified and global maximum applications for absolute resource
[ https://issues.apache.org/jira/browse/YARN-8945?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16664584#comment-16664584 ] Tao Yang commented on YARN-8945: Attached the v1 patch for review. > Calculation of maximum applications should respect specified and global > maximum applications for absolute resource > -- > > Key: YARN-8945 > URL: https://issues.apache.org/jira/browse/YARN-8945 > Project: Hadoop YARN > Issue Type: Bug > Components: capacityscheduler >Affects Versions: 3.2.0 >Reporter: Tao Yang >Assignee: Tao Yang >Priority: Major > Attachments: YARN-8945.001.patch > > > Currently, maximum applications is expected to be calculated as follows, in > priority order, when using percentage-based capacity: > (1) equals the specified maximum applications for the queue > (2) equals the global maximum applications > (3) calculated as queue-capacity * maximum-system-applications > But for absolute resource configuration, maximum applications is calculated > as (3) in ParentQueue#deriveCapacityFromAbsoluteConfigurations. This is a > strict limit for high max-capacity, low-capacity queues which have few > guaranteed resources but want to use a lot of shared resources. So I propose to > share the maximum-applications calculation of percentage-based capacity, so that > the absolute resource path can call the same calculation when necessary.
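The three-step priority described in the issue can be sketched as follows. The names here (`UNSET`, the parameters, `maxApplications`) are illustrative assumptions for the sketch, not the actual CapacityScheduler configuration keys or code.

```java
public class MaxAppsCalc {
    // Sentinel for "not configured" (assumption for this sketch).
    static final int UNSET = -1;

    // Resolves a queue's maximum applications in the priority order the
    // issue describes: (1) a queue-specific setting wins, then (2) the
    // global maximum, then (3) queue-capacity * maximum-system-applications.
    static int maxApplications(int queueSpecific, int globalMax,
                               float queueCapacity, int maxSystemApps) {
        if (queueSpecific != UNSET) {
            return queueSpecific;
        }
        if (globalMax != UNSET) {
            return globalMax;
        }
        return (int) (queueCapacity * maxSystemApps);
    }

    public static void main(String[] args) {
        // (1) queue-specific setting takes precedence.
        System.out.println(maxApplications(500, 2000, 0.1f, 10000));    // 500
        // (2) fall back to the global maximum.
        System.out.println(maxApplications(UNSET, 2000, 0.1f, 10000));  // 2000
        // (3) derive from capacity, which is what the absolute resource
        // path currently always does per the issue.
        System.out.println(maxApplications(UNSET, UNSET, 0.1f, 10000)); // 1000
    }
}
```

The reported problem is that the absolute resource path skips steps (1) and (2) and always lands on step (3).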
[jira] [Updated] (YARN-8945) Calculation of maximum applications should respect specified and global maximum applications for absolute resource
[ https://issues.apache.org/jira/browse/YARN-8945?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tao Yang updated YARN-8945: --- Attachment: YARN-8945.001.patch > Calculation of maximum applications should respect specified and global > maximum applications for absolute resource > -- > > Key: YARN-8945 > URL: https://issues.apache.org/jira/browse/YARN-8945 > Project: Hadoop YARN > Issue Type: Bug > Components: capacityscheduler >Affects Versions: 3.2.0 >Reporter: Tao Yang >Assignee: Tao Yang >Priority: Major > Attachments: YARN-8945.001.patch > > > Currently, maximum applications is expected to be calculated as follows, in > priority order, when using percentage-based capacity: > (1) equals the specified maximum applications for the queue > (2) equals the global maximum applications > (3) calculated as queue-capacity * maximum-system-applications > But for absolute resource configuration, maximum applications is calculated > as (3) in ParentQueue#deriveCapacityFromAbsoluteConfigurations. This is a > strict limit for high max-capacity, low-capacity queues which have few > guaranteed resources but want to use a lot of shared resources. So I propose to > share the maximum-applications calculation of percentage-based capacity, so that > the absolute resource path can call the same calculation when necessary.
[jira] [Updated] (YARN-7754) [Atsv2] Update document for running v1 and v2 TS
[ https://issues.apache.org/jira/browse/YARN-7754?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Suma Shivaprasad updated YARN-7754: --- Attachment: YARN-7754.1.patch > [Atsv2] Update document for running v1 and v2 TS > > > Key: YARN-7754 > URL: https://issues.apache.org/jira/browse/YARN-7754 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Rohith Sharma K S >Priority: Major > Attachments: YARN-7754.1.patch > > > Post YARN-6736, RM can publish events into both v1 and v2 TS. The new > configuration needs to be updated in the documentation. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Assigned] (YARN-7754) [Atsv2] Update document for running v1 and v2 TS
[ https://issues.apache.org/jira/browse/YARN-7754?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Suma Shivaprasad reassigned YARN-7754: -- Assignee: Suma Shivaprasad > [Atsv2] Update document for running v1 and v2 TS > > > Key: YARN-7754 > URL: https://issues.apache.org/jira/browse/YARN-7754 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Rohith Sharma K S >Assignee: Suma Shivaprasad >Priority: Major > Attachments: YARN-7754.1.patch > > > Post YARN-6736, RM can publish events into both v1 and v2 TS. The new > configuration needs to be updated in the documentation. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-8917) Absolute (maximum) capacity of level3+ queues is wrongly calculated for absolute resource
[ https://issues.apache.org/jira/browse/YARN-8917?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16664570#comment-16664570 ] Tao Yang commented on YARN-8917: Hi, [~sunilg] [~leftnoteasy], the v2 patch updates the UT to add assertion annotations, could you please help review it again? Thanks. > Absolute (maximum) capacity of level3+ queues is wrongly calculated for > absolute resource > - > > Key: YARN-8917 > URL: https://issues.apache.org/jira/browse/YARN-8917 > Project: Hadoop YARN > Issue Type: Bug > Components: capacityscheduler >Affects Versions: 3.2.1 >Reporter: Tao Yang >Assignee: Tao Yang >Priority: Major > Attachments: YARN-8917.001.patch, YARN-8917.002.patch > > > Absolute capacity should be calculated by multiplying the queue's capacity by the > parent queue's absolute capacity, > but currently it is calculated by dividing the queue's capacity by the > parent queue's absolute capacity. > The calculation for absolute-maximum-capacity has the same problem. > For example: > root.a capacity=0.4 maximum-capacity=0.8 > root.a.a1 capacity=0.5 maximum-capacity=0.6 > Absolute capacity of root.a.a1 should be 0.2 but is wrongly calculated as 1.25 > Absolute maximum capacity of root.a.a1 should be 0.48 but is wrongly > calculated as 0.75 > Moreover: > {{childQueue.getQueueCapacities().getCapacity()}} should be changed to > {{childQueue.getQueueCapacities().getCapacity(label)}} to avoid getting the wrong > capacity from the default partition when calculating for a non-default partition. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
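The arithmetic in the YARN-8917 report above can be checked with a minimal, self-contained sketch. The class and method names below are illustrative only, not YARN's actual CapacityScheduler code:

```java
// Illustrative sketch of the bug described in YARN-8917.
// Names are hypothetical and do not match YARN's real code.
public class AbsoluteCapacityDemo {
    // Correct: child's absolute capacity = child capacity * parent's absolute capacity.
    static double correctAbsCapacity(double childCapacity, double parentAbsCapacity) {
        return childCapacity * parentAbsCapacity;
    }

    // Buggy: dividing instead of multiplying, as the report describes.
    static double buggyAbsCapacity(double childCapacity, double parentAbsCapacity) {
        return childCapacity / parentAbsCapacity;
    }

    public static void main(String[] args) {
        // root.a: capacity=0.4, maximum-capacity=0.8
        // root.a.a1: capacity=0.5, maximum-capacity=0.6
        System.out.println(correctAbsCapacity(0.5, 0.4)); // should be 0.2 per the report
        System.out.println(buggyAbsCapacity(0.5, 0.4));   // the wrong 1.25 value
        System.out.println(correctAbsCapacity(0.6, 0.8)); // should be 0.48 per the report
        System.out.println(buggyAbsCapacity(0.6, 0.8));   // the wrong 0.75 value
    }
}
```

With the example configuration, multiplying yields 0.2 and 0.48 while dividing yields 1.25 and 0.75, matching the values quoted in the report.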
[jira] [Commented] (YARN-8776) Container Executor change to create stdin/stdout pipeline
[ https://issues.apache.org/jira/browse/YARN-8776?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16664565#comment-16664565 ] Hadoop QA commented on YARN-8776: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 15s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 4 new or modified test files. {color} | || || || || {color:brown} trunk Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 24m 40s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 16s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 30s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 45s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 13m 46s{color} | {color:green} branch has no errors when building and testing our client artifacts. 
{color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 11s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 29s{color} | {color:green} trunk passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 48s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 8s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 1m 8s{color} | {color:green} the patch passed {color} | | {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange} 0m 24s{color} | {color:orange} hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager: The patch generated 25 new + 95 unchanged - 1 fixed = 120 total (was 96) {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 41s{color} | {color:green} the patch passed {color} | | {color:red}-1{color} | {color:red} whitespace {color} | {color:red} 0m 0s{color} | {color:red} The patch has 1 line(s) that end in whitespace. Use git apply --whitespace=fix <>. Refer https://git-scm.com/docs/git-apply {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 15m 14s{color} | {color:green} patch has no errors when building and testing our client artifacts. 
{color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 13s{color} | {color:green} the patch passed {color} | | {color:red}-1{color} | {color:red} javadoc {color} | {color:red} 0m 27s{color} | {color:red} hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-nodemanager generated 2 new + 0 unchanged - 0 fixed = 2 total (was 0) {color} | || || || || {color:brown} Other Tests {color} || | {color:green}+1{color} | {color:green} unit {color} | {color:green} 19m 50s{color} | {color:green} hadoop-yarn-server-nodemanager in the patch passed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 38s{color} | {color:green} The patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black} 83m 14s{color} | {color:black} {color} | \\ \\ || Subsystem || Report/Notes || | Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:8f97d6f | | JIRA Issue | YARN-8776 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12945668/YARN-8776.001.patch | | Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle | | uname | Linux 50dd8c466042 3.13.0-139-generic #188-Ubuntu SMP Tue Jan 9 14:43:09 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/patchprocess/precommit/personality/provided.sh | | git revision | trunk / 38a65e3 | | maven | version: Apache Maven 3.3.9 | | Default Java | 1.8.0_181 | | findbugs | v3.1.0-RC1 | | checkstyle | https://builds.apache.org/job/PreCommit-YARN-Build/22340/artifact/out/diff-checkstyle-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-nodemanager.txt | | whitespace |
[jira] [Comment Edited] (YARN-8851) [Umbrella] A new pluggable device plugin framework to ease vendor plugin development
[ https://issues.apache.org/jira/browse/YARN-8851?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16664540#comment-16664540 ] Zhankun Tang edited comment on YARN-8851 at 10/26/18 2:22 AM: -- [~csingh] , Thanks for the review! {quote}1. {code:java} DeviceRegisterRequest register(); {code} This is misleading. {{register()}} would mean that the device plugin is registering itself. However, here we need some information from the device plugin. Maybe it can be changed to something like {code:java} DeviceResourceInfo getDeviceResourceInfo() {code} {quote} {color:#d04437}Zhankun->{color} Yeah. Weiwei also mentioned this problem. "getDeviceResourceInfo" is also very good. Now we have two names for it. :) "DeviceRegisterRequest getDeviceResourceInfo()" and "DeviceRegisterRequest getRegisterRequestInfo()". Maybe the latter is more accurate, since the "DeviceRegisterRequest" may contain more info besides the resource name & version we currently want? {quote}2. {code:java} DeviceRuntimeSpec onDevicesUse(Set<Device> allocatedDevices, String runtime); {code} If this is to get the {{DeviceRuntimeSpec}}, then should it be called {{getDeviceRuntimeSpec()}}? {quote} {color:#d04437}Zhankun->{color} That's a good idea. {quote}3. Since we have a callback for devices released, do we also need a callback for devices allocated? {{void onDevicesAllocated(Set<Device> allocatedDevices)}} {quote} {color:#d04437}Zhankun->{color} We have another interface, "DevicePluginScheduler", to do this. One may ask why there are two interfaces: the intention is that this scheduler interface is optional, while the other one is mandatory. {code:java} /** * Called when allocating devices. The framework will do all device bookkeeping * and failure recovery. So this hook should only do scheduling based on the available devices * passed in. This method could be invoked multiple times. * @param availableDevices Devices allowed to be chosen from. * @param count Number of devices to be allocated. * @return a set of {@link Device} */ Set<Device> allocateDevices(Set<Device> availableDevices, Integer count);{code} {quote}4. Just a suggestion about logging: use the slf4j logging format, since that's the framework we are using and it improves the readability of logging statements. E.g. instead of {{LOG.info("Adapter of " + pluginClassName + " created. Initializing..");}} we can use {code:java} LOG.info("Adapter of {} created. Initializing..", pluginClassName);{code} {quote} {color:#d04437}Zhankun ->{color} Yeah. I also noticed that we're using slf4j in this "ResourcePluginManager" instead of log4j. Will change the logging format to slf4j. was (Author: tangzhankun): [~csingh] , Thanks for the review! {quote}1. {code:java} DeviceRegisterRequest register(); {code} This is misleading. {{register()}} would mean that the device plugin is registering itself. However, here we need some information from the device plugin. Maybe it can be changed to something like {code:java} DeviceResourceInfo getDeviceResourceInfo() {code} {quote} {color:#d04437}Zhankun->{color} Yeah. Weiwei also mentioned this problem. "getDeviceResourceInfo" is also very good. Now we have two names for it. :) "DeviceRegisterRequest getDeviceResourceInfo()" and "DeviceRegisterRequest getRegisterRequestInfo()". Maybe the latter is more accurate, since the "DeviceRegisterRequest" may contain more info besides the resource name & version we currently want? {quote}2. {code:java} DeviceRuntimeSpec onDevicesUse(Set<Device> allocatedDevices, String runtime); {code} If this is to get the {{DeviceRuntimeSpec}}, then should it be called {{getDeviceRuntimeSpec()}}? {quote} {color:#d04437}Zhankun->{color} That's a good idea. {quote}3. Since we have a callback for devices released, do we also need a callback for devices allocated? {{void onDevicesAllocated(Set<Device> allocatedDevices)}} {quote} {color:#d04437}Zhankun->{color} We have another interface, "DevicePluginScheduler", to do this. One may ask why there are two interfaces: the intention is that this scheduler interface is optional, while the other one is mandatory. {code:java} /** * Called when allocating devices. The framework will do all device bookkeeping * and failure recovery. So this hook should only do scheduling based on the available devices * passed in. This method could be invoked multiple times. * @param availableDevices Devices allowed to be chosen from. * @param count Number of devices to be allocated. * @return a set of {@link Device} */ Set<Device> allocateDevices(Set<Device> availableDevices, Integer count);{code} {quote}4. Just a suggestion about logging: use the slf4j logging format, since that's the framework we are using and it improves the readability of logging statements. E.g. instead of {{LOG.info("Adapter of " + pluginClassName + " created. Initializing..");}} we can use {code:java} LOG.info("Adapter of {} created. Initializing..", pluginClassName);{code} {quote} {color:#d04437}Zhankun
[jira] [Updated] (YARN-8945) Calculation of maximum applications should respect specified and global maximum applications for absolute resource
[ https://issues.apache.org/jira/browse/YARN-8945?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tao Yang updated YARN-8945: --- Summary: Calculation of maximum applications should respect specified and global maximum applications for absolute resource (was: Calculation of Maximum applications should respect specified and global maximum applications for absolute resource) > Calculation of maximum applications should respect specified and global > maximum applications for absolute resource > -- > > Key: YARN-8945 > URL: https://issues.apache.org/jira/browse/YARN-8945 > Project: Hadoop YARN > Issue Type: Bug > Components: capacityscheduler >Affects Versions: 3.2.0 >Reporter: Tao Yang >Assignee: Tao Yang >Priority: Major > > Currently maximum applications is expected to be calculated as follows, > in order of priority, when using percentage-based capacity: > (1) the specified maximum applications for the queue > (2) the global maximum applications > (3) queue-capacity * maximum-system-applications > But for absolute resource configuration, maximum applications is calculated > as (3) in ParentQueue#deriveCapacityFromAbsoluteConfigurations. This is a > strict limit for high-max-capacity, low-capacity queues which have little > guaranteed resources but want to use lots of shared resources. So I propose to > reuse the maximum applications calculation of percentage-based capacity; > absolute resource configuration can call the same calculation if necessary. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Created] (YARN-8945) Calculation of Maximum applications should respect specified and global maximum applications for absolute resource
Tao Yang created YARN-8945: -- Summary: Calculation of Maximum applications should respect specified and global maximum applications for absolute resource Key: YARN-8945 URL: https://issues.apache.org/jira/browse/YARN-8945 Project: Hadoop YARN Issue Type: Bug Components: capacityscheduler Affects Versions: 3.2.0 Reporter: Tao Yang Assignee: Tao Yang Currently maximum applications is expected to be calculated as follows, in order of priority, when using percentage-based capacity: (1) the specified maximum applications for the queue (2) the global maximum applications (3) queue-capacity * maximum-system-applications But for absolute resource configuration, maximum applications is calculated as (3) in ParentQueue#deriveCapacityFromAbsoluteConfigurations. This is a strict limit for high-max-capacity, low-capacity queues which have little guaranteed resources but want to use lots of shared resources. So I propose to reuse the maximum applications calculation of percentage-based capacity; absolute resource configuration can call the same calculation if necessary. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
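The three-step priority described in YARN-8945 above can be sketched as a small helper. This is a hypothetical illustration of the proposed ordering, not YARN's actual code; `null` stands for "not specified":

```java
// Hypothetical sketch of the max-applications priority order described in
// YARN-8945; not YARN's actual CapacityScheduler code.
public class MaxApplicationsDemo {
    static int maxApplications(Integer queueSpecifiedMax, Integer globalMax,
                               double queueCapacity, int maxSystemApplications) {
        // (1) a per-queue specified maximum-applications wins if present
        if (queueSpecifiedMax != null) {
            return queueSpecifiedMax;
        }
        // (2) otherwise fall back to the global maximum-applications
        if (globalMax != null) {
            return globalMax;
        }
        // (3) last resort: queue-capacity * maximum-system-applications,
        // which is the only rule applied today for absolute resource config
        return (int) (queueCapacity * maxSystemApplications);
    }
}
```

The issue's complaint is that absolute resource configuration skips straight to step (3), which starves low-capacity queues of application slots even when an admin has configured a higher per-queue or global limit.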
[jira] [Comment Edited] (YARN-8851) [Umbrella] A new pluggable device plugin framework to ease vendor plugin development
[ https://issues.apache.org/jira/browse/YARN-8851?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16664540#comment-16664540 ] Zhankun Tang edited comment on YARN-8851 at 10/26/18 2:13 AM: -- [~csingh] , Thanks for the review! {quote}1. {code:java} DeviceRegisterRequest register(); {code} This is misleading. {{register()}} would mean that the device plugin is registering itself. However, here we need some information from the device plugin. Maybe it can be changed to something like {code:java} DeviceResourceInfo getDeviceResourceInfo() {code} {quote} {color:#d04437}Zhankun->{color} Yeah. Weiwei also mentioned this problem. "getDeviceResourceInfo" is also very good. Now we have two names for it. :) "DeviceRegisterRequest getDeviceResourceInfo()" and "DeviceRegisterRequest getRegisterRequestInfo()". Maybe the latter is more accurate, since the "DeviceRegisterRequest" may contain more info besides the resource name & version we currently want? {quote}2. {code:java} DeviceRuntimeSpec onDevicesUse(Set<Device> allocatedDevices, String runtime); {code} If this is to get the {{DeviceRuntimeSpec}}, then should it be called {{getDeviceRuntimeSpec()}}? {quote} {color:#d04437}Zhankun->{color} That's a good idea. {quote}3. Since we have a callback for devices released, do we also need a callback for devices allocated? {{void onDevicesAllocated(Set<Device> allocatedDevices)}} {quote} {color:#d04437}Zhankun->{color} We have another interface, "DevicePluginScheduler", to do this. One may ask why there are two interfaces: the intention is that this scheduler interface is optional, while the other one is mandatory. {code:java} /** * Called when allocating devices. The framework will do all device bookkeeping * and failure recovery. So this hook should only do scheduling based on the available devices * passed in. This method could be invoked multiple times. * @param availableDevices Devices allowed to be chosen from. * @param count Number of devices to be allocated. * @return a set of {@link Device} */ Set<Device> allocateDevices(Set<Device> availableDevices, Integer count);{code} {quote}4. Just a suggestion about logging: use the slf4j logging format, since that's the framework we are using and it improves the readability of logging statements. E.g. instead of {{LOG.info("Adapter of " + pluginClassName + " created. Initializing..");}} we can use {code:java} LOG.info("Adapter of {} created. Initializing..", pluginClassName);{code} {quote} {color:#d04437}Zhankun ->{color} Yeah. I also noticed that we're using slf4j in this "ResourcePluginManager" instead of log4j. Will change it. was (Author: tangzhankun): [~csingh] , Thanks for the review! {quote}1. {code:java} DeviceRegisterRequest register(); {code} This is misleading. {{register()}} would mean that the device plugin is registering itself. However, here we need some information from the device plugin. Maybe it can be changed to something like {code:java} DeviceResourceInfo getDeviceResourceInfo() {code} {quote} Zhankun-> Yeah. Weiwei also mentioned this problem. "getDeviceResourceInfo" is also very good. Now we have two names for it. :) "DeviceRegisterRequest getDeviceResourceInfo()" and "DeviceRegisterRequest getRegisterRequestInfo()". Maybe the latter is more accurate, since the "DeviceRegisterRequest" may contain more info besides the resource name & version we currently want? {quote}2. {code:java} DeviceRuntimeSpec onDevicesUse(Set<Device> allocatedDevices, String runtime); {code} If this is to get the {{DeviceRuntimeSpec}}, then should it be called {{getDeviceRuntimeSpec()}}? {quote} Zhankun-> That's a good idea. {quote}3. Since we have a callback for devices released, do we also need a callback for devices allocated? {{void onDevicesAllocated(Set<Device> allocatedDevices)}} {quote} Zhankun-> We have another interface, "DevicePluginScheduler", to do this. One may ask why there are two interfaces: the intention is that this scheduler interface is optional, while the other one is mandatory. {code:java} /** * Called when allocating devices. The framework will do all device bookkeeping * and failure recovery. So this hook should only do scheduling based on the available devices * passed in. This method could be invoked multiple times. * @param availableDevices Devices allowed to be chosen from. * @param count Number of devices to be allocated. * @return a set of {@link Device} */ Set<Device> allocateDevices(Set<Device> availableDevices, Integer count);{code} {quote}4. Just a suggestion about logging: use the slf4j logging format, since that's the framework we are using and it improves the readability of logging statements. E.g. instead of {{LOG.info("Adapter of " + pluginClassName + " created. Initializing..");}} we can use {code:java} LOG.info("Adapter of {} created. Initializing..", pluginClassName);{code} {quote} Zhankun -> Yeah. I also noticed that we're using slf4j in this "ResourcePluginManager" instead of log4j. Will
[jira] [Commented] (YARN-8851) [Umbrella] A new pluggable device plugin framework to ease vendor plugin development
[ https://issues.apache.org/jira/browse/YARN-8851?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16664540#comment-16664540 ] Zhankun Tang commented on YARN-8851: [~csingh] , Thanks for the review! {quote}1. {code:java} DeviceRegisterRequest register(); {code} This is misleading. {{register()}} would mean that the device plugin is registering itself. However, here we need some information from the device plugin. Maybe it can be changed to something like {code:java} DeviceResourceInfo getDeviceResourceInfo() {code} {quote} Zhankun-> Yeah. Weiwei also mentioned this problem. "getDeviceResourceInfo" is also very good. Now we have two names for it. :) "DeviceRegisterRequest getDeviceResourceInfo()" and "DeviceRegisterRequest getRegisterRequestInfo()". Maybe the latter is more accurate, since the "DeviceRegisterRequest" may contain more info besides the resource name & version we currently want? {quote}2. {code:java} DeviceRuntimeSpec onDevicesUse(Set<Device> allocatedDevices, String runtime); {code} If this is to get the {{DeviceRuntimeSpec}}, then should it be called {{getDeviceRuntimeSpec()}}? {quote} Zhankun-> That's a good idea. {quote}3. Since we have a callback for devices released, do we also need a callback for devices allocated? {{void onDevicesAllocated(Set<Device> allocatedDevices)}} {quote} Zhankun-> We have another interface, "DevicePluginScheduler", to do this. One may ask why there are two interfaces: the intention is that this scheduler interface is optional, while the other one is mandatory. {code:java} /** * Called when allocating devices. The framework will do all device bookkeeping * and failure recovery. So this hook should only do scheduling based on the available devices * passed in. This method could be invoked multiple times. * @param availableDevices Devices allowed to be chosen from. * @param count Number of devices to be allocated. * @return a set of {@link Device} */ Set<Device> allocateDevices(Set<Device> availableDevices, Integer count);{code} {quote}4. Just a suggestion about logging: use the slf4j logging format, since that's the framework we are using and it improves the readability of logging statements. E.g. instead of {{LOG.info("Adapter of " + pluginClassName + " created. Initializing..");}} we can use {code:java} LOG.info("Adapter of {} created. Initializing..", pluginClassName);{code} {quote} Zhankun -> Yeah. I also noticed that we're using slf4j in this "ResourcePluginManager" instead of log4j. Will change it. > [Umbrella] A new pluggable device plugin framework to ease vendor plugin > development > > > Key: YARN-8851 > URL: https://issues.apache.org/jira/browse/YARN-8851 > Project: Hadoop YARN > Issue Type: New Feature > Components: yarn >Reporter: Zhankun Tang >Assignee: Zhankun Tang >Priority: Major > Attachments: YARN-8851-WIP2-trunk.001.patch, > YARN-8851-WIP3-trunk.001.patch, YARN-8851-WIP4-trunk.001.patch, > YARN-8851-WIP5-trunk.001.patch, YARN-8851-WIP6-trunk.001.patch, > YARN-8851-WIP7-trunk.001.patch, [YARN-8851] > YARN_New_Device_Plugin_Framework_Design_Proposal-3.pdf, [YARN-8851] > YARN_New_Device_Plugin_Framework_Design_Proposal.pdf > > > At present, we support GPU/FPGA devices in YARN in a native, tightly coupled > way. But it's difficult for a vendor to implement such a device plugin > because the developer needs deep knowledge of YARN internals, and this > burdens the community with maintaining both YARN core and vendor-specific code. > Here we propose a new device plugin framework to ease vendor device plugin > development and provide a more flexible way to integrate with YARN NM. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
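The two-interface split discussed in the YARN-8851 comment above (a mandatory device plugin interface plus an optional scheduler hook) can be sketched as follows. This is a hypothetical reading of the proposal as described in the comment; the interface and type names are assumptions, not YARN's final API:

```java
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical sketch of the split discussed in the comment above. Names are
// assumptions drawn from the discussion, not YARN's actual plugin API.
interface Device { }
interface DeviceRegisterRequest { }
interface DeviceRuntimeSpec { }

// Mandatory: every vendor plugin must implement this.
interface DevicePlugin {
    DeviceRegisterRequest getRegisterRequestInfo();
    DeviceRuntimeSpec getDeviceRuntimeSpec(Set<Device> allocatedDevices, String runtime);
    void onDevicesReleased(Set<Device> releasedDevices);
}

// Optional: implement only when the plugin wants custom scheduling. Per the
// comment, the framework does all bookkeeping and failure recovery; this hook
// only picks `count` devices out of the available set.
interface DevicePluginScheduler {
    Set<Device> allocateDevices(Set<Device> availableDevices, Integer count);
}

// A trivial example scheduler: picks the first `count` available devices.
class FirstFitScheduler implements DevicePluginScheduler {
    public Set<Device> allocateDevices(Set<Device> availableDevices, Integer count) {
        Set<Device> picked = new LinkedHashSet<>();
        for (Device d : availableDevices) {
            if (picked.size() >= count) {
                break;
            }
            picked.add(d);
        }
        return picked;
    }
}
```

Keeping the scheduler separate means a plugin that is happy with the framework's default allocation never has to implement it, which matches the "optional vs. mandatory" rationale in the comment.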
[jira] [Commented] (YARN-8921) SnapshotBasedOverAllocationPolicy always caps the amount of memory availabe to 4 GBs
[ https://issues.apache.org/jira/browse/YARN-8921?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16664465#comment-16664465 ] Hadoop QA commented on YARN-8921: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 17s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 1 new or modified test files. {color} | || || || || {color:brown} YARN-1011 Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 26m 26s{color} | {color:green} YARN-1011 passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 36s{color} | {color:green} YARN-1011 passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 30s{color} | {color:green} YARN-1011 passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 48s{color} | {color:green} YARN-1011 passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 13m 22s{color} | {color:green} branch has no errors when building and testing our client artifacts. 
{color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 7s{color} | {color:green} YARN-1011 passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 30s{color} | {color:green} YARN-1011 passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 42s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 7s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 1m 7s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 25s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 42s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 13m 40s{color} | {color:green} patch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 28s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 27s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:red}-1{color} | {color:red} unit {color} | {color:red} 20m 26s{color} | {color:red} hadoop-yarn-server-nodemanager in the patch failed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 28s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} | | {color:black}{color} | {color:black} {color} | {color:black} 84m 3s{color} | {color:black} {color} | \\ \\ || Reason || Tests || | Failed junit tests | hadoop.yarn.server.nodemanager.containermanager.TestNMProxy | \\ \\ || Subsystem || Report/Notes || | Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:4b8c2b1 | | JIRA Issue | YARN-8921 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12945648/YARN-8921-YARN-1011.02.patch | | Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle | | uname | Linux 2cfaece8f7b9 3.13.0-143-generic #192-Ubuntu SMP Tue Feb 27 10:45:36 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/patchprocess/precommit/personality/provided.sh | | git revision | YARN-1011 / da7722b | | maven | version: Apache Maven 3.3.9 | | Default Java | 1.8.0_181 | | findbugs | v3.1.0-RC1 | | unit | https://builds.apache.org/job/PreCommit-YARN-Build/22339/artifact/out/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-nodemanager.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/22339/testReport/ | | Max. process+thread count | 333 (vs. ulimit of 1) | | modules | C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager U:
[jira] [Commented] (YARN-8914) Add xtermjs to YARN UI2
[ https://issues.apache.org/jira/browse/YARN-8914?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16664436#comment-16664436 ] Eric Yang commented on YARN-8914: - [~akhilpb] I removed overlay.js for now because it can be implemented later in YARN-8839. The browser throws a file-not-found error when js.map files don't exist; the xtermjs distribution has the minified js reference a js.map file for debugging purposes. It might be better to keep xtermjs in its original form. > Add xtermjs to YARN UI2 > --- > > Key: YARN-8914 > URL: https://issues.apache.org/jira/browse/YARN-8914 > Project: Hadoop YARN > Issue Type: Sub-task > Components: yarn-ui-v2 >Reporter: Eric Yang >Assignee: Eric Yang >Priority: Major > Attachments: YARN-8914.001.patch, YARN-8914.002.patch > > > In the container listing from UI2, we can add a link to connect to docker > container using xtermjs. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-8914) Add xtermjs to YARN UI2
[ https://issues.apache.org/jira/browse/YARN-8914?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eric Yang updated YARN-8914: Attachment: YARN-8914.002.patch > Add xtermjs to YARN UI2 > --- > > Key: YARN-8914 > URL: https://issues.apache.org/jira/browse/YARN-8914 > Project: Hadoop YARN > Issue Type: Sub-task > Components: yarn-ui-v2 >Reporter: Eric Yang >Assignee: Eric Yang >Priority: Major > Attachments: YARN-8914.001.patch, YARN-8914.002.patch > > > In the container listing from UI2, we can add a link to connect to docker > container using xtermjs. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-6167) RM option to delegate NM loss container action to AM
[ https://issues.apache.org/jira/browse/YARN-6167?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16664424#comment-16664424 ] Hadoop QA commented on YARN-6167: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 17s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:red}-1{color} | {color:red} test4tests {color} | {color:red} 0m 0s{color} | {color:red} The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color} | || || || || {color:brown} trunk Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 26m 15s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 44s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 41s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 48s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 13m 14s{color} | {color:green} branch has no errors when building and testing our client artifacts. 
{color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 19s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 29s{color} | {color:green} trunk passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 45s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 40s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 40s{color} | {color:green} the patch passed {color} | | {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange} 0m 34s{color} | {color:orange} hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: The patch generated 1 new + 72 unchanged - 0 fixed = 73 total (was 72) {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 43s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 13m 2s{color} | {color:green} patch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 31s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 30s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:red}-1{color} | {color:red} unit {color} | {color:red}106m 7s{color} | {color:red} hadoop-yarn-server-resourcemanager in the patch failed. 
{color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 31s{color} | {color:green} The patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black}167m 53s{color} | {color:black} {color} | \\ \\ || Reason || Tests || | Failed junit tests | hadoop.yarn.server.resourcemanager.scheduler.capacity.TestContainerAllocation | \\ \\ || Subsystem || Report/Notes || | Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:8f97d6f | | JIRA Issue | YARN-6167 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12945634/YARN-6167.02.patch | | Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle | | uname | Linux 413bbb7959ee 4.4.0-134-generic #160~14.04.1-Ubuntu SMP Fri Aug 17 11:07:07 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/patchprocess/precommit/personality/provided.sh | | git revision | trunk / 34b2521 | | maven | version: Apache Maven 3.3.9 | | Default Java | 1.8.0_181 | | findbugs | v3.1.0-RC1 | | checkstyle |
[jira] [Commented] (YARN-8672) TestContainerManager#testLocalingResourceWhileContainerRunning occasionally times out
[ https://issues.apache.org/jira/browse/YARN-8672?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16664339#comment-16664339 ] Chandni Singh commented on YARN-8672: - [~jlowe] OK. I will work on the suggested solution. There is an additional problem that I would like to address with this change: currently the localizer ID is set to the container ID. Since we need to use a token file private to the localizer instance, we need an identifier that is unique to each localizer. I will make the localizer ID unique. > TestContainerManager#testLocalingResourceWhileContainerRunning occasionally > times out > - > > Key: YARN-8672 > URL: https://issues.apache.org/jira/browse/YARN-8672 > Project: Hadoop YARN > Issue Type: Bug > Components: nodemanager >Affects Versions: 3.2.0 >Reporter: Jason Lowe >Assignee: Chandni Singh >Priority: Major > Attachments: YARN-8672.001.patch, YARN-8672.002.patch, > YARN-8672.003.patch, YARN-8672.004.patch > > > Precommit builds have been failing in > TestContainerManager#testLocalingResourceWhileContainerRunning. I have been > able to reproduce the problem without any patch applied if I run the test > enough times. It looks like something is removing container tokens from the > nmPrivate area just as a new localizer starts.
[jira] [Commented] (YARN-8927) Better handling of "docker.trusted.registries" in container-executor's "trusted_image_check" function
[ https://issues.apache.org/jira/browse/YARN-8927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16664324#comment-16664324 ] Eric Badger commented on YARN-8927: --- [~eyang], I see your concern now. However, that would still be a problem (albeit to a smaller extent) with using {{library}}. Admins that want to trust local images don't necessarily want to trust the {{library}} repo on dockerhub. Outside of removing all default registries, is there a way to allow trusted local images? We would basically need to make sure that {{docker run}} only ran on local images (which I don't believe is possible) and have a separate pull phase before running. Otherwise, if the image doesn't exist locally it will always go out to the default registries to try and pull it. I guess maybe we could do a check on the local images when we see that there is an image that wants to be run, needs to be trusted, has no registry prepended to the name, and {{docker.trusted.registries}} contains {{library}}. Then we would only run the container if the image in question was already there. But then you couldn't run an image from a default registry from the {{library}} repo unless you gave its full URI. Maybe that's ok. > Better handling of "docker.trusted.registries" in container-executor's > "trusted_image_check" function > - > > Key: YARN-8927 > URL: https://issues.apache.org/jira/browse/YARN-8927 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Zhankun Tang >Assignee: Zhankun Tang >Priority: Major > > There are some missing cases that we need to catch when handling > "docker.trusted.registries". > The container-executor.cfg configuration is as follows: > {code:java} > docker.trusted.registries=tangzhankun,ubuntu,centos{code} > It works if run DistrubutedShell with "tangzhankun/tensorflow" > {code:java} > "yarn ... 
-shell_env YARN_CONTAINER_RUNTIME_TYPE=docker -shell_env > YARN_CONTAINER_RUNTIME_DOCKER_IMAGE=tangzhankun/tensorflow > {code} > But running a DistributedShell job with "centos", "centos[:tagName]", "ubuntu" > or "ubuntu[:tagName]" fails. > The error message is like: > {code:java} > "image: centos is not trusted" > {code} > We need to handle the above cases better.
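The local-only policy sketched in the comment above — run an unqualified image only if it is already present locally, never triggering a pull — could look roughly like this. This is an illustration only: the real check lives in container-executor (written in C), and the class name, method names, and the `docker image inspect` probe here are assumptions, not YARN code.

```java
import java.io.IOException;
import java.util.Arrays;
import java.util.List;

public class LocalImageCheck {
    // True when the image name carries no registry/repository prefix,
    // e.g. "centos:7", but not "library/centos" or "myreg.example.com/centos".
    static boolean isUnqualified(String image) {
        return !image.contains("/");
    }

    // True when the image is already present in the local docker cache.
    // `docker image inspect` exits non-zero for images that are not local.
    static boolean existsLocally(String image) throws IOException, InterruptedException {
        List<String> cmd = Arrays.asList("docker", "image", "inspect", image);
        Process p = new ProcessBuilder(cmd).redirectErrorStream(true).start();
        return p.waitFor() == 0;
    }

    // Under this policy an unqualified image never triggers a pull:
    // it may run only if the local cache already has it.
    static boolean allowedToRun(String image, boolean trustLocal)
            throws IOException, InterruptedException {
        if (!isUnqualified(image)) {
            return false; // qualified names go through the registry whitelist instead
        }
        return trustLocal && existsLocally(image);
    }

    public static void main(String[] args) {
        System.out.println(isUnqualified("centos:7"));              // true
        System.out.println(isUnqualified("library/centos:latest")); // false
    }
}
```

The docker probe is kept separate so a caller can decide whether a local-cache miss means "reject" (strict local-only) or "fall back to the trusted-registry check".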
[jira] [Commented] (YARN-8776) Container Executor change to create stdin/stdout pipeline
[ https://issues.apache.org/jira/browse/YARN-8776?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16664317#comment-16664317 ] Eric Yang commented on YARN-8776: - Patch 001 is based on [~Zian Chen]'s work in this area. I made some changes to connect the dots, and the system is working as expected. > Container Executor change to create stdin/stdout pipeline > - > > Key: YARN-8776 > URL: https://issues.apache.org/jira/browse/YARN-8776 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Zian Chen >Assignee: Zian Chen >Priority: Major > Labels: Docker > Attachments: YARN-8776.001.patch > > > The pipeline is built to connect the stdin/stdout channel from WebSocket > servlet through container-executor to docker executor. So when the WebSocket > servlet is started, we need to invoke container-executor “dockerExec” method > (which will be implemented) to create a new docker executor and use “docker > exec -it $ContainerId” command which executes an interactive bash shell on > the container. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-8776) Container Executor change to create stdin/stdout pipeline
[ https://issues.apache.org/jira/browse/YARN-8776?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eric Yang updated YARN-8776: Attachment: YARN-8776.001.patch > Container Executor change to create stdin/stdout pipeline > - > > Key: YARN-8776 > URL: https://issues.apache.org/jira/browse/YARN-8776 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Zian Chen >Assignee: Zian Chen >Priority: Major > Labels: Docker > Attachments: YARN-8776.001.patch > > > The pipeline is built to connect the stdin/stdout channel from WebSocket > servlet through container-executor to docker executor. So when the WebSocket > servlet is started, we need to invoke container-executor “dockerExec” method > (which will be implemented) to create a new docker executor and use “docker > exec -it $ContainerId” command which executes an interactive bash shell on > the container. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-8927) Better handling of "docker.trusted.registries" in container-executor's "trusted_image_check" function
[ https://issues.apache.org/jira/browse/YARN-8927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16664312#comment-16664312 ] Eric Yang commented on YARN-8927: - [~ebadger] I am somewhat concerned with "local" or "localhost" being the name that toggles the local registry. "library" is a reserved word from docker's point of view: no third party can publish to library without Docker Inc's approval, unless the image already resides locally and was tagged by someone with docker rights. An unknown party might be able to create a "local" or "localhost" registry on docker hub to defeat the docker.trusted.registries mechanism if we don't choose the keyword carefully. > Better handling of "docker.trusted.registries" in container-executor's > "trusted_image_check" function > - > > Key: YARN-8927 > URL: https://issues.apache.org/jira/browse/YARN-8927 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Zhankun Tang >Assignee: Zhankun Tang >Priority: Major > > There are some missing cases that we need to catch when handling > "docker.trusted.registries". > The container-executor.cfg configuration is as follows: > {code:java} > docker.trusted.registries=tangzhankun,ubuntu,centos{code} > It works if running DistributedShell with "tangzhankun/tensorflow" > {code:java} > "yarn ... -shell_env YARN_CONTAINER_RUNTIME_TYPE=docker -shell_env > YARN_CONTAINER_RUNTIME_DOCKER_IMAGE=tangzhankun/tensorflow > {code} > But running a DistributedShell job with "centos", "centos[:tagName]", "ubuntu" > or "ubuntu[:tagName]" fails. > The error message is like: > {code:java} > "image: centos is not trusted" > {code} > We need to handle the above cases better.
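The registry-prefix comparison being debated can be sketched like this — a hedged Java illustration of the C logic in container-executor's "trusted_image_check". Treating a prefix-less image name as belonging to the reserved "library" registry is the convention under discussion, not shipped behavior.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

public class TrustedRegistryCheck {
    // Decide trust by the registry prefix of the image name. An image with no
    // prefix resolves on Docker Hub as library/<image>, so it is treated here
    // as belonging to the reserved "library" registry.
    static boolean isTrusted(String image, Set<String> trustedRegistries) {
        int slash = image.indexOf('/');
        String registry = (slash < 0) ? "library" : image.substring(0, slash);
        return trustedRegistries.contains(registry);
    }

    public static void main(String[] args) {
        Set<String> trusted = new HashSet<>(
                Arrays.asList("library", "tangzhankun", "ubuntu", "centos"));
        System.out.println(isTrusted("centos:latest", trusted));           // true (unqualified -> library)
        System.out.println(isTrusted("tangzhankun/tensorflow", trusted));  // true
        System.out.println(isTrusted("evil.example.com/centos", trusted)); // false
    }
}
```

Because "library" cannot be registered by a third party on Docker Hub, using it as the whitelist keyword avoids the squatting risk that "local" or "localhost" would carry.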
[jira] [Commented] (YARN-6523) Newly retrieved security Tokens are sent as part of each heartbeat to each node from RM which is not desirable in large cluster
[ https://issues.apache.org/jira/browse/YARN-6523?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16664310#comment-16664310 ] Hadoop QA commented on YARN-6523: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 19s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 3 new or modified test files. {color} | || || || || {color:brown} trunk Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 59s{color} | {color:blue} Maven dependency ordering for branch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 22m 45s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 52s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 5s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 2m 5s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 14m 51s{color} | {color:green} branch has no errors when building and testing our client artifacts. 
{color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 3m 44s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 38s{color} | {color:green} trunk passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 15s{color} | {color:blue} Maven dependency ordering for patch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 2m 23s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 3m 10s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} cc {color} | {color:green} 3m 10s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 3m 10s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 9s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 2m 16s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 14m 6s{color} | {color:green} patch has no errors when building and testing our client artifacts. 
{color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 3m 39s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 12s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:green}+1{color} | {color:green} unit {color} | {color:green} 2m 20s{color} | {color:green} hadoop-yarn-server-common in the patch passed. {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 18m 53s{color} | {color:green} hadoop-yarn-server-nodemanager in the patch passed. {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red}110m 40s{color} | {color:red} hadoop-yarn-server-resourcemanager in the patch failed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 33s{color} | {color:green} The patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black}209m 32s{color} | {color:black} {color} | \\ \\ || Reason || Tests || | Failed junit tests | hadoop.yarn.server.resourcemanager.scheduler.capacity.TestContainerAllocation | | | hadoop.yarn.server.resourcemanager.rmapp.TestRMAppTransitions | \\ \\ || Subsystem || Report/Notes || | Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:8f97d6f | | JIRA Issue | YARN-6523 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12945612/YARN-6523.005.patch | | Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle cc | | uname
[jira] [Commented] (YARN-6523) Newly retrieved security Tokens are sent as part of each heartbeat to each node from RM which is not desirable in large cluster
[ https://issues.apache.org/jira/browse/YARN-6523?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16664285#comment-16664285 ] Jason Lowe commented on YARN-6523: -- Thanks for updating the patch! All PBImpl set methods must call maybeInitBuilder before modifying the builder; otherwise the set method risks an NPE, or the get methods could still think the value is coming from a protocol buffer rather than the builder. Does the registration request and response really need a token sequence number field? I think the sequence number only needs to be associated with the heartbeat request to let the RM know where the NM is with respect to the credentials timeline, and in the node heartbeat response so the NM knows how to update its own concept of the credentials "timestamp." I'm not seeing how it helps for the NM to report this in the registration request, and it seems actively harmful in the registration response since the token sequence number could be updated on the NM side without actually receiving the updated tokens. Has the RM failover scenario been considered? Arbitrary thread sleeps are a pet peeve of mine and lead to flaky and/or unnecessarily slow unit tests. It would be good to remove the sleeps from the unit tests, making them either directly event-driven rather than polled (e.g. through use of CountDownLatch/CyclicBarrier/etc.) or using GenericTestUtils.waitFor() with a small poll interval to wait for the necessary condition if it has to be polled. I haven't personally run the unit test in this patch yet, but just looking at it I counted at least 90 seconds of sleeping, which makes for a single, long test. 
> Newly retrieved security Tokens are sent as part of each heartbeat to each > node from RM which is not desirable in large cluster > --- > > Key: YARN-6523 > URL: https://issues.apache.org/jira/browse/YARN-6523 > Project: Hadoop YARN > Issue Type: Improvement > Components: RM >Affects Versions: 2.8.0, 2.7.3 >Reporter: Naganarasimha G R >Assignee: Manikandan R >Priority: Major > Attachments: YARN-6523.001.patch, YARN-6523.002.patch, > YARN-6523.003.patch, YARN-6523.004.patch, YARN-6523.005.patch > > > Currently as part of heartbeat response RM sets all application's tokens > though all applications might not be active on the node. On top of it > NodeHeartbeatResponsePBImpl converts tokens for each app into > SystemCredentialsForAppsProto. Hence for each node and each heartbeat too > many SystemCredentialsForAppsProto objects were getting created. > We hit a OOM while testing for 2000 concurrent apps on 500 nodes cluster with > 8GB RAM configured for RM -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
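The suggestion above to replace fixed sleeps with condition polling looks roughly like the minimal stand-in below. The real helper is org.apache.hadoop.test.GenericTestUtils#waitFor; this sketch only mirrors its shape so the pattern is concrete.

```java
import java.util.concurrent.TimeoutException;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.function.Supplier;

public class WaitForDemo {
    // Poll `check` every pollMillis until it returns true or timeoutMillis elapses.
    static void waitFor(Supplier<Boolean> check, long pollMillis, long timeoutMillis)
            throws TimeoutException, InterruptedException {
        long deadline = System.currentTimeMillis() + timeoutMillis;
        while (!check.get()) {
            if (System.currentTimeMillis() > deadline) {
                throw new TimeoutException("condition not met within " + timeoutMillis + " ms");
            }
            Thread.sleep(pollMillis); // small poll interval instead of one big fixed sleep
        }
    }

    public static void main(String[] args) throws Exception {
        AtomicBoolean done = new AtomicBoolean();
        new Thread(() -> {
            try { Thread.sleep(50); } catch (InterruptedException ignored) { }
            done.set(true);
        }).start();
        // Returns as soon as the condition holds, rather than sleeping a fixed interval.
        waitFor(done::get, 10, 2000);
        System.out.println("condition met");
    }
}
```

The test finishes as soon as the condition holds, so a 90-second worst-case timeout no longer costs 90 seconds on every run.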
[jira] [Commented] (YARN-8569) Create an interface to provide cluster information to application
[ https://issues.apache.org/jira/browse/YARN-8569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16664254#comment-16664254 ] Eric Yang commented on YARN-8569: - The failed test cases are not related to this patch. > Create an interface to provide cluster information to application > - > > Key: YARN-8569 > URL: https://issues.apache.org/jira/browse/YARN-8569 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Eric Yang >Assignee: Eric Yang >Priority: Major > Labels: Docker > Attachments: YARN-8569 YARN sysfs interface to provide cluster > information to application.pdf, YARN-8569.001.patch, YARN-8569.002.patch, > YARN-8569.003.patch, YARN-8569.004.patch, YARN-8569.005.patch, > YARN-8569.006.patch, YARN-8569.007.patch, YARN-8569.008.patch, > YARN-8569.009.patch, YARN-8569.010.patch, YARN-8569.011.patch, > YARN-8569.012.patch, YARN-8569.013.patch, YARN-8569.014.patch, > YARN-8569.015.patch, YARN-8569.016.patch > > > Some programs require container hostnames to be known for the application to run. 
> For example, distributed tensorflow requires launch_command that looks like: > {code} > # On ps0.example.com: > $ python trainer.py \ > --ps_hosts=ps0.example.com:,ps1.example.com: \ > --worker_hosts=worker0.example.com:,worker1.example.com: \ > --job_name=ps --task_index=0 > # On ps1.example.com: > $ python trainer.py \ > --ps_hosts=ps0.example.com:,ps1.example.com: \ > --worker_hosts=worker0.example.com:,worker1.example.com: \ > --job_name=ps --task_index=1 > # On worker0.example.com: > $ python trainer.py \ > --ps_hosts=ps0.example.com:,ps1.example.com: \ > --worker_hosts=worker0.example.com:,worker1.example.com: \ > --job_name=worker --task_index=0 > # On worker1.example.com: > $ python trainer.py \ > --ps_hosts=ps0.example.com:,ps1.example.com: \ > --worker_hosts=worker0.example.com:,worker1.example.com: \ > --job_name=worker --task_index=1 > {code} > This is a bit cumbersome to orchestrate via Distributed Shell, or YARN > services launch_command. In addition, the dynamic parameters do not work > with YARN flex command. This is the classic pain point for application > developer attempt to automate system environment settings as parameter to end > user application. > It would be great if YARN Docker integration can provide a simple option to > expose hostnames of the yarn service via a mounted file. The file content > gets updated when flex command is performed. This allows application > developer to consume system environment settings via a standard interface. > It is like /proc/devices for Linux, but for Hadoop. This may involve > updating a file in distributed cache, and allow mounting of the file via > container-executor. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Comment Edited] (YARN-8851) [Umbrella] A new pluggable device plugin framework to ease vendor plugin development
[ https://issues.apache.org/jira/browse/YARN-8851?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16664228#comment-16664228 ] Chandni Singh edited comment on YARN-8851 at 10/25/18 8:02 PM: --- [~tangzhankun] Thanks for working on this. I have few initial comments about the Device Plugin API 1. {code:java} DeviceRegisterRequest register(); {code} This is misleading. {{register()}} would mean that the device plugin is registering itself. However, here we need some information from the device plugin. Maybe, it can be changed to something like {code:java} DeviceResourceInfo getDeviceResourceInfo() {code} 2. {code:java} DeviceRuntimeSpec onDevicesUse(Set allocatedDevices, String runtime); {code} If this is get the {{DeviceRuntimeSpec}}, then should it be called {{getDeviceRuntimeSpec()}} ? 3. Since we have callback for devices released, do we also need a callback for devices allocated? {{void onDevicesAllocated(Set allocatedDevices)}} 4. Just a suggestion about logging Use slf4j logging format since that's the framework we are using and it improves readability of logging stmts. eg. instead of {{LOG.info("Adapter of " + pluginClassName + " created. Initializing..");}} we can use {code} LOG.info("Adapter of {} created. Initializing..", pluginClassName); {code} was (Author: csingh): [~tangzhankun] Thanks for working on this. I have few initial comments about the Device Plugin API 1. {code:java} DeviceRegisterRequest register(); {code} This is misleading. {{register()}} would mean that the device plugin is registering itself. However, here we need some information from the device plugin. Maybe, it can be changed to something like {code:java} DeviceResourceInfo getDeviceResourceInfo() {code} 2. {code:java} DeviceRuntimeSpec onDevicesUse(Set allocatedDevices, String runtime); {code} If this is get the {{DeviceRuntimeSpec}}, then should it be called {{getDeviceRuntimeSpec()}} ? 3. 
Since we have callback for devices released, do we also need a callback for devices allocated? \{{ void onDevicesAllocated(Set allocatedDevices)}} 4. Just a suggestion about logging Use slf4j logging format since that's the framework we are using and it improves readability of logging stmts. eg. instead of {{LOG.info("Adapter of " + pluginClassName + " created. Initializing..");}} we can use : \{{LOG.info("Adapter of {} created. Initializing..", pluginClassName); }} > [Umbrella] A new pluggable device plugin framework to ease vendor plugin > development > > > Key: YARN-8851 > URL: https://issues.apache.org/jira/browse/YARN-8851 > Project: Hadoop YARN > Issue Type: New Feature > Components: yarn >Reporter: Zhankun Tang >Assignee: Zhankun Tang >Priority: Major > Attachments: YARN-8851-WIP2-trunk.001.patch, > YARN-8851-WIP3-trunk.001.patch, YARN-8851-WIP4-trunk.001.patch, > YARN-8851-WIP5-trunk.001.patch, YARN-8851-WIP6-trunk.001.patch, > YARN-8851-WIP7-trunk.001.patch, [YARN-8851] > YARN_New_Device_Plugin_Framework_Design_Proposal-3.pdf, [YARN-8851] > YARN_New_Device_Plugin_Framework_Design_Proposal.pdf > > > At present, we support GPU/FPGA device in YARN through a native, coupling > way. But it's difficult for a vendor to implement such a device plugin > because the developer needs much knowledge of YARN internals. And this brings > burden to the community to maintain both YARN core and vendor-specific code. > Here we propose a new device plugin framework to ease vendor device plugin > development and provide a more flexible way to integrate with YARN NM. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-8851) [Umbrella] A new pluggable device plugin framework to ease vendor plugin development
[ https://issues.apache.org/jira/browse/YARN-8851?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16664228#comment-16664228 ] Chandni Singh commented on YARN-8851: - [~tangzhankun] Thanks for working on this. I have a few initial comments about the Device Plugin API 1. {code:java} DeviceRegisterRequest register(); {code} This is misleading. {{register()}} would mean that the device plugin is registering itself. However, here we need some information from the device plugin. Maybe it can be changed to something like {code:java} DeviceResourceInfo getDeviceResourceInfo() {code} 2. {code:java} DeviceRuntimeSpec onDevicesUse(Set allocatedDevices, String runtime); {code} If this gets the {{DeviceRuntimeSpec}}, should it be called {{getDeviceRuntimeSpec()}}? 3. Since we have a callback for devices released, do we also need a callback for devices allocated? {{void onDevicesAllocated(Set allocatedDevices)}} 4. Just a suggestion about logging: use the slf4j logging format, since that's the framework we are using and it improves the readability of logging statements, e.g. instead of {{LOG.info("Adapter of " + pluginClassName + " created. Initializing..");}} we can use {{LOG.info("Adapter of {} created. Initializing..", pluginClassName);}} > [Umbrella] A new pluggable device plugin framework to ease vendor plugin > development > > > Key: YARN-8851 > URL: https://issues.apache.org/jira/browse/YARN-8851 > Project: Hadoop YARN > Issue Type: New Feature > Components: yarn >Reporter: Zhankun Tang >Assignee: Zhankun Tang >Priority: Major > Attachments: YARN-8851-WIP2-trunk.001.patch, > YARN-8851-WIP3-trunk.001.patch, YARN-8851-WIP4-trunk.001.patch, > YARN-8851-WIP5-trunk.001.patch, YARN-8851-WIP6-trunk.001.patch, > YARN-8851-WIP7-trunk.001.patch, [YARN-8851] > YARN_New_Device_Plugin_Framework_Design_Proposal-3.pdf, [YARN-8851] > YARN_New_Device_Plugin_Framework_Design_Proposal.pdf > > > At present, we support GPU/FPGA device in YARN through a native, coupling > way. 
But it's difficult for a vendor to implement such a device plugin > because the developer needs much knowledge of YARN internals. And this brings > burden to the community to maintain both YARN core and vendor-specific code. > Here we propose a new device plugin framework to ease vendor device plugin > development and provide a more flexible way to integrate with YARN NM.
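The logging suggestion above relies on slf4j's "{}" placeholders, which defer message construction to the logging framework. The stand-in below mimics that substitution in plain Java purely to show the mechanics; real code would use org.slf4j.Logger directly, and the plugin name is made up for the example.

```java
public class LogFormatDemo {
    // Mimics slf4j's "{}" placeholder substitution: each placeholder is
    // replaced, left to right, by the corresponding argument.
    static String format(String template, Object... args) {
        StringBuilder sb = new StringBuilder();
        int cursor = 0;
        for (Object arg : args) {
            int idx = template.indexOf("{}", cursor);
            if (idx < 0) break;             // more args than placeholders: ignore extras
            sb.append(template, cursor, idx).append(arg);
            cursor = idx + 2;               // skip past the "{}" just consumed
        }
        sb.append(template.substring(cursor));
        return sb.toString();
    }

    public static void main(String[] args) {
        // With slf4j this rendering only happens when the log level is enabled,
        // unlike eager "..." + pluginClassName + "..." concatenation.
        System.out.println(format("Adapter of {} created. Initializing..", "FakeDevicePlugin"));
    }
}
```

Besides readability, the parameterized form skips string concatenation entirely when the log level is disabled, which is the usual performance argument for it.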
[jira] [Commented] (YARN-8672) TestContainerManager#testLocalingResourceWhileContainerRunning occasionally times out
[ https://issues.apache.org/jira/browse/YARN-8672?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16664229#comment-16664229 ] Jason Lowe commented on YARN-8672: -- Thanks for updating the patch! I'm still skeptical this is going to work well in practice for some corner cases. For example, what if FileDeletionService has been configured with a delay? Deletes would be significantly delayed, and then the tokens file can be removed just as the new one is getting created. Couple of potential fixes: # Have localizers use token files private to that localizer instance. Then each localizer is responsible for reaping its personal tokens file without concerns of deleting it just as a new localizer spins up to use it. Seems we would always have a race trying to share the tokens file given we cannot control the time period between when we want to delete something and when it actually gets deleted. # Never delete the tokens file until the container completes. This could have implications if the tokens file needs to be different between different localizers (i.e.: credentials of the container were updated since the first localizer). My preference would be to separate the token files. > TestContainerManager#testLocalingResourceWhileContainerRunning occasionally > times out > - > > Key: YARN-8672 > URL: https://issues.apache.org/jira/browse/YARN-8672 > Project: Hadoop YARN > Issue Type: Bug > Components: nodemanager >Affects Versions: 3.2.0 >Reporter: Jason Lowe >Assignee: Chandni Singh >Priority: Major > Attachments: YARN-8672.001.patch, YARN-8672.002.patch, > YARN-8672.003.patch, YARN-8672.004.patch > > > Precommit builds have been failing in > TestContainerManager#testLocalingResourceWhileContainerRunning. I have been > able to reproduce the problem without any patch applied if I run the test > enough times. It looks like something is removing container tokens from the > nmPrivate area just as a new localizer starts. 
-- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-8921) SnapshotBasedOverAllocationPolicy always caps the amount of memory available to 4 GBs
[ https://issues.apache.org/jira/browse/YARN-8921?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16664227#comment-16664227 ] Robert Kanter commented on YARN-8921: - Oh, haha, I missed that - I was only looking for the {{Max}} change and didn't notice that the tests were suddenly gone! > SnapshotBasedOverAllocationPolicy always caps the amount of memory availabe > to 4 GBs > > > Key: YARN-8921 > URL: https://issues.apache.org/jira/browse/YARN-8921 > Project: Hadoop YARN > Issue Type: Sub-task >Affects Versions: YARN-1011 >Reporter: Haibo Chen >Assignee: Haibo Chen >Priority: Major > Attachments: YARN-8921-YARN-1011.00.patch, > YARN-8921-YARN-1011.01.patch, YARN-8921-YARN-1011.02.patch, YARN-8921.00.patch > > > The memory overallocate threshold is a float, so is > (overAllocationThresholds.getMemoryThreshold() * > containersMonitor.getPmemAllocatedForContainers()). Because Math.round(float) > return an int, this would cap effectively the amount of memory available for > overallocation to Integer.MAX_VALUE, [see the code at > here|https://github.com/apache/hadoop/blob/fa864b8744cfdfe613a917ba1bbd859a5b6f70b8/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/scheduler/SnapshotBasedOverAllocationPolicy.java#L45] -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
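The int-truncation behavior described in the issue can be demonstrated in isolation. This is a minimal sketch; the threshold value and units are illustrative, not taken from the patch:

```java
public class RoundCapDemo {
    public static void main(String[] args) {
        float threshold = 0.95f;                     // illustrative overallocation threshold
        long pmemBytes = 16L * 1024 * 1024 * 1024;   // 16 GiB of allocated physical memory

        // Math.round(float) returns an int and silently clamps at Integer.MAX_VALUE
        int capped = Math.round(threshold * pmemBytes);
        System.out.println(capped);                  // prints 2147483647

        // Math.round(double) returns a long, so the full value survives
        long fixed = Math.round((double) threshold * pmemBytes);
        System.out.println(fixed > Integer.MAX_VALUE); // prints "true"
    }
}
```

The fix is simply to do the rounding in double/long arithmetic so the result is not forced through an int.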
[jira] [Updated] (YARN-8921) SnapshotBasedOverAllocationPolicy always caps the amount of memory available to 4 GBs
[ https://issues.apache.org/jira/browse/YARN-8921?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Haibo Chen updated YARN-8921: - Attachment: YARN-8921-YARN-1011.02.patch > SnapshotBasedOverAllocationPolicy always caps the amount of memory availabe > to 4 GBs > > > Key: YARN-8921 > URL: https://issues.apache.org/jira/browse/YARN-8921 > Project: Hadoop YARN > Issue Type: Sub-task >Affects Versions: YARN-1011 >Reporter: Haibo Chen >Assignee: Haibo Chen >Priority: Major > Attachments: YARN-8921-YARN-1011.00.patch, > YARN-8921-YARN-1011.01.patch, YARN-8921-YARN-1011.02.patch, YARN-8921.00.patch > > > The memory overallocate threshold is a float, so is > (overAllocationThresholds.getMemoryThreshold() * > containersMonitor.getPmemAllocatedForContainers()). Because Math.round(float) > return an int, this would cap effectively the amount of memory available for > overallocation to Integer.MAX_VALUE, [see the code at > here|https://github.com/apache/hadoop/blob/fa864b8744cfdfe613a917ba1bbd859a5b6f70b8/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/scheduler/SnapshotBasedOverAllocationPolicy.java#L45] -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-8921) SnapshotBasedOverAllocationPolicy always caps the amount of memory available to 4 GBs
[ https://issues.apache.org/jira/browse/YARN-8921?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16664217#comment-16664217 ] Haibo Chen commented on YARN-8921: -- Oops, I forgot to include the unit test file into the new patch. Let's fix that. > SnapshotBasedOverAllocationPolicy always caps the amount of memory availabe > to 4 GBs > > > Key: YARN-8921 > URL: https://issues.apache.org/jira/browse/YARN-8921 > Project: Hadoop YARN > Issue Type: Sub-task >Affects Versions: YARN-1011 >Reporter: Haibo Chen >Assignee: Haibo Chen >Priority: Major > Attachments: YARN-8921-YARN-1011.00.patch, > YARN-8921-YARN-1011.01.patch, YARN-8921.00.patch > > > The memory overallocate threshold is a float, so is > (overAllocationThresholds.getMemoryThreshold() * > containersMonitor.getPmemAllocatedForContainers()). Because Math.round(float) > return an int, this would cap effectively the amount of memory available for > overallocation to Integer.MAX_VALUE, [see the code at > here|https://github.com/apache/hadoop/blob/fa864b8744cfdfe613a917ba1bbd859a5b6f70b8/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/scheduler/SnapshotBasedOverAllocationPolicy.java#L45] -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-8921) SnapshotBasedOverAllocationPolicy always caps the amount of memory available to 4 GBs
[ https://issues.apache.org/jira/browse/YARN-8921?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16664206#comment-16664206 ] Hadoop QA commented on YARN-8921: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 18s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:red}-1{color} | {color:red} test4tests {color} | {color:red} 0m 0s{color} | {color:red} The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color} | || || || || {color:brown} YARN-1011 Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 23m 7s{color} | {color:green} YARN-1011 passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 5s{color} | {color:green} YARN-1011 passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 27s{color} | {color:green} YARN-1011 passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 40s{color} | {color:green} YARN-1011 passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 12m 21s{color} | {color:green} branch has no errors when building and testing our client artifacts. 
{color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 1s{color} | {color:green} YARN-1011 passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 28s{color} | {color:green} YARN-1011 passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 39s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 58s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 58s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 22s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 36s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 12m 35s{color} | {color:green} patch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 3s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 23s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:red}-1{color} | {color:red} unit {color} | {color:red} 20m 12s{color} | {color:red} hadoop-yarn-server-nodemanager in the patch failed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 28s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} | | {color:black}{color} | {color:black} {color} | {color:black} 76m 45s{color} | {color:black} {color} | \\ \\ || Reason || Tests || | Failed junit tests | hadoop.yarn.server.nodemanager.containermanager.TestNMProxy | \\ \\ || Subsystem || Report/Notes || | Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:4b8c2b1 | | JIRA Issue | YARN-8921 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12945613/YARN-8921-YARN-1011.01.patch | | Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle | | uname | Linux 8d6990e60f0f 3.13.0-153-generic #203-Ubuntu SMP Thu Jun 14 08:52:28 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/patchprocess/precommit/personality/provided.sh | | git revision | YARN-1011 / f53ed8a | | maven | version: Apache Maven 3.3.9 | | Default Java | 1.8.0_181 | | findbugs | v3.1.0-RC1 | | unit | https://builds.apache.org/job/PreCommit-YARN-Build/22337/artifact/out/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-nodemanager.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/22337/testReport/ | | Max. process+thread count | 333 (vs. ulimit of
[jira] [Commented] (YARN-8569) Create an interface to provide cluster information to application
[ https://issues.apache.org/jira/browse/YARN-8569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16664143#comment-16664143 ] Hadoop QA commented on YARN-8569: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 30s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 4 new or modified test files. {color} | || || || || {color:brown} trunk Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 1m 57s{color} | {color:blue} Maven dependency ordering for branch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 24m 41s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 10m 48s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 33s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 2m 42s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 16m 5s{color} | {color:green} branch has no errors when building and testing our client artifacts. 
{color} | | {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue} 0m 0s{color} | {color:blue} Skipped patched modules with no Java source: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 3m 22s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 2m 2s{color} | {color:green} trunk passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 14s{color} | {color:blue} Maven dependency ordering for patch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 1m 57s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 8m 7s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} cc {color} | {color:green} 8m 7s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 8m 7s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 30s{color} | {color:green} hadoop-yarn-project/hadoop-yarn: The patch generated 0 new + 146 unchanged - 1 fixed = 146 total (was 147) {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 2m 39s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 12m 35s{color} | {color:green} patch has no errors when building and testing our client artifacts. 
{color} | | {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue} 0m 0s{color} | {color:blue} Skipped patched modules with no Java source: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 3m 40s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 48s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 47s{color} | {color:green} hadoop-yarn-api in the patch passed. {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 19m 11s{color} | {color:red} hadoop-yarn-server-nodemanager in the patch failed. {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 13m 22s{color} | {color:red} hadoop-yarn-services-core in the patch failed. {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 21s{color} | {color:green} hadoop-yarn-site in the patch passed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 41s{color} | {color:green} The patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black}128m
[jira] [Commented] (YARN-8930) CGroup-based strict container memory enforcement does not work with CGroupElasticMemoryController
[ https://issues.apache.org/jira/browse/YARN-8930?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16664139#comment-16664139 ] Hudson commented on YARN-8930: -- SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #15323 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/15323/]) YARN-8930. CGroup-based strict container memory enforcement does not (rkanter: rev f76e3c3db789dd6866fa0fef8e014cbfe8c8f80d) * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/linux/resources/MemoryResourceHandler.java * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/markdown/NodeManagerCGroupsMemory.md * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/linux/resources/TestCGroupsMemoryResourceHandlerImpl.java * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/monitor/ContainersMonitorImpl.java * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/linux/resources/CGroupsMemoryResourceHandlerImpl.java > CGroup-based strict container memory enforcement does not work with > CGroupElasticMemoryController > - > > Key: YARN-8930 > URL: https://issues.apache.org/jira/browse/YARN-8930 > Project: Hadoop YARN > Issue Type: Bug > Components: nodemanager >Affects Versions: 3.2.0 >Reporter: Haibo Chen >Assignee: Haibo Chen >Priority: Major > Fix For: 3.3.0 > > Attachments: YARN-8930.00.patch, YARN-8930.01.patch > > > When yarn.nodemanger.resource.memory.enforced is set to true with memory > cgroup turned on, (aka strict memory enforcement), containers monitor relies > on the under_oom status read from the container 
cgroup's memory.oom_control > file. > However, when the root yarn container cgroup is under oom (e.g. when the node > is overallocating itself), the under_oom status is set for all yarn > containers regardless of whether each individual container has run over its > memory limit. > What essentially happens is that whenever the root cgroup is under oom, all > yarn containers are killed.
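The under_oom flag discussed above is a plain-text field in the cgroup's memory.oom_control file (which on a real node lives under /sys/fs/cgroup/memory/...). A minimal reader might look like this; the sample content is simulated, and this is not the NodeManager's actual parsing code:

```java
public class OomControl {
    // Parse the "under_oom" value from memory.oom_control file content
    static int underOom(String oomControlContent) {
        for (String line : oomControlContent.split("\n")) {
            String[] parts = line.trim().split("\\s+");
            if (parts.length == 2 && parts[0].equals("under_oom")) {
                return Integer.parseInt(parts[1]);
            }
        }
        return -1; // field not found
    }

    public static void main(String[] args) {
        // Simulated contents of a container cgroup's memory.oom_control file
        String sample = "oom_kill_disable 0\nunder_oom 1\n";
        System.out.println(underOom(sample)); // prints 1
    }
}
```

The bug is precisely that this flag reads 1 for every container cgroup while the parent yarn cgroup is under oom, so it cannot distinguish which container actually exceeded its own limit.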
[jira] [Commented] (YARN-6167) RM option to delegate NM loss container action to AM
[ https://issues.apache.org/jira/browse/YARN-6167?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16664132#comment-16664132 ] Billie Rinaldi commented on YARN-6167: -- Patch 2 removes the RMContainer from the RM on release, so that pending releases don't need to be remembered. > RM option to delegate NM loss container action to AM > > > Key: YARN-6167 > URL: https://issues.apache.org/jira/browse/YARN-6167 > Project: Hadoop YARN > Issue Type: Sub-task > Components: scheduler >Reporter: Billie Rinaldi >Assignee: Billie Rinaldi >Priority: Major > Attachments: YARN-6167.01.patch, YARN-6167.02.patch > > > Currently, if the RM times out an NM, the scheduler will kill all containers > that were running on the NM. For some applications, in the event of a > temporary NM outage, it might be better to delegate to the AM the decision > whether to kill the containers and request new containers from the RM. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-6167) RM option to delegate NM loss container action to AM
[ https://issues.apache.org/jira/browse/YARN-6167?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Billie Rinaldi updated YARN-6167: - Attachment: YARN-6167.02.patch > RM option to delegate NM loss container action to AM > > > Key: YARN-6167 > URL: https://issues.apache.org/jira/browse/YARN-6167 > Project: Hadoop YARN > Issue Type: Sub-task > Components: scheduler >Reporter: Billie Rinaldi >Assignee: Billie Rinaldi >Priority: Major > Attachments: YARN-6167.01.patch, YARN-6167.02.patch > > > Currently, if the RM times out an NM, the scheduler will kill all containers > that were running on the NM. For some applications, in the event of a > temporary NM outage, it might be better to delegate to the AM the decision > whether to kill the containers and request new containers from the RM. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Assigned] (YARN-8470) Fair scheduler exception with SLS
[ https://issues.apache.org/jira/browse/YARN-8470?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Haibo Chen reassigned YARN-8470: Assignee: (was: Haibo Chen) > Fair scheduler exception with SLS > - > > Key: YARN-8470 > URL: https://issues.apache.org/jira/browse/YARN-8470 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Miklos Szegedi >Priority: Major > > I ran into the following exception with sls: > 2018-06-26 13:34:04,358 ERROR resourcemanager.ResourceManager: Received > RMFatalEvent of type CRITICAL_THREAD_CRASH, caused by a critical thread, > FSPreemptionThread, that exited unexpectedly: java.lang.NullPointerException > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSPreemptionThread.identifyContainersToPreemptOnNode(FSPreemptionThread.java:207) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSPreemptionThread.identifyContainersToPreemptForOneContainer(FSPreemptionThread.java:161) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSPreemptionThread.identifyContainersToPreempt(FSPreemptionThread.java:121) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSPreemptionThread.run(FSPreemptionThread.java:81) -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Assigned] (YARN-6356) Allow different values of yarn.log-aggregation.retain-seconds for succeeded and failed jobs
[ https://issues.apache.org/jira/browse/YARN-6356?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Haibo Chen reassigned YARN-6356: Assignee: (was: Haibo Chen) > Allow different values of yarn.log-aggregation.retain-seconds for succeeded > and failed jobs > --- > > Key: YARN-6356 > URL: https://issues.apache.org/jira/browse/YARN-6356 > Project: Hadoop YARN > Issue Type: Improvement > Components: log-aggregation >Reporter: Robert Kanter >Priority: Major > > It would be useful to have a value of {{yarn.log-aggregation.retain-seconds}} > for succeeded jobs and a different value for failed/killed jobs. For jobs > that succeeded, you typically don't care about the logs, so a shorter > retention time is fine (and saves space/blocks in HDFS). For jobs that > failed or were killed, the logs are much more important, and it's likely to > want to keep them around for longer so you have time to look at them. > For instance, you could set it to keep logs for succeeded jobs for 1 day and > logs for failed/killed jobs for 1 week. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-8927) Better handling of "docker.trusted.registries" in container-executor's "trusted_image_check" function
[ https://issues.apache.org/jira/browse/YARN-8927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16664108#comment-16664108 ] Eric Badger commented on YARN-8927: --- I think a better name would be {{local}} or {{localhost}}. That way it is very clear that this registry entry is referring to the node where the image is being run. > Better handling of "docker.trusted.registries" in container-executor's > "trusted_image_check" function > - > > Key: YARN-8927 > URL: https://issues.apache.org/jira/browse/YARN-8927 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Zhankun Tang >Assignee: Zhankun Tang >Priority: Major > > There are some missing cases that we need to catch when handling > "docker.trusted.registries". > The container-executor.cfg configuration is as follows: > {code:java} > docker.trusted.registries=tangzhankun,ubuntu,centos{code} > It works if we run DistributedShell with "tangzhankun/tensorflow" > {code:java} > "yarn ... -shell_env YARN_CONTAINER_RUNTIME_TYPE=docker -shell_env > YARN_CONTAINER_RUNTIME_DOCKER_IMAGE=tangzhankun/tensorflow > {code} > But running a DistributedShell job with "centos", "centos[:tagName]", "ubuntu" > or "ubuntu[:tagName]" fails: > The error message is like: > {code:java} > "image: centos is not trusted" > {code} > We need to better handle the above cases.
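The "library" convention being debated mirrors how Docker itself treats registry-less names like "centos:latest". A rough sketch of the check, written in Java for illustration (the real logic lives in container-executor's C code, and the method names here are hypothetical):

```java
import java.util.Arrays;
import java.util.List;

public class TrustedImageCheck {
    // Images with no registry component are treated as "library" images,
    // mirroring Docker's own handling of bare names like "centos:latest".
    static String registryOf(String image) {
        int slash = image.indexOf('/');
        return slash < 0 ? "library" : image.substring(0, slash);
    }

    static boolean isTrusted(String image, List<String> trustedRegistries) {
        return trustedRegistries.contains(registryOf(image));
    }

    public static void main(String[] args) {
        // docker.trusted.registries=tangzhankun,ubuntu,centos,library (illustrative)
        List<String> trusted = Arrays.asList("tangzhankun", "ubuntu", "centos", "library");
        System.out.println(isTrusted("tangzhankun/tensorflow", trusted)); // prints "true"
        System.out.println(isTrusted("centos:latest", trusted));          // prints "true"
        System.out.println(isTrusted("evil.example.com/centos", trusted)); // prints "false"
    }
}
```

With this mapping, an administrator who lists "library" as trusted enables both local images and bare Docker Hub names, which is the default behavior the thread converges on.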
[jira] [Commented] (YARN-8921) SnapshotBasedOverAllocationPolicy always caps the amount of memory available to 4 GBs
[ https://issues.apache.org/jira/browse/YARN-8921?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16664072#comment-16664072 ] Robert Kanter commented on YARN-8921: - +1 pending Jenkins > SnapshotBasedOverAllocationPolicy always caps the amount of memory availabe > to 4 GBs > > > Key: YARN-8921 > URL: https://issues.apache.org/jira/browse/YARN-8921 > Project: Hadoop YARN > Issue Type: Sub-task >Affects Versions: YARN-1011 >Reporter: Haibo Chen >Assignee: Haibo Chen >Priority: Major > Attachments: YARN-8921-YARN-1011.00.patch, > YARN-8921-YARN-1011.01.patch, YARN-8921.00.patch > > > The memory overallocate threshold is a float, so is > (overAllocationThresholds.getMemoryThreshold() * > containersMonitor.getPmemAllocatedForContainers()). Because Math.round(float) > return an int, this would cap effectively the amount of memory available for > overallocation to Integer.MAX_VALUE, [see the code at > here|https://github.com/apache/hadoop/blob/fa864b8744cfdfe613a917ba1bbd859a5b6f70b8/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/scheduler/SnapshotBasedOverAllocationPolicy.java#L45] -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-8930) CGroup-based strict container memory enforcement does not work with CGroupElasticMemoryController
[ https://issues.apache.org/jira/browse/YARN-8930?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16664068#comment-16664068 ] Robert Kanter commented on YARN-8930: - +1 LGTM > CGroup-based strict container memory enforcement does not work with > CGroupElasticMemoryController > - > > Key: YARN-8930 > URL: https://issues.apache.org/jira/browse/YARN-8930 > Project: Hadoop YARN > Issue Type: Bug > Components: nodemanager >Affects Versions: 3.2.0 >Reporter: Haibo Chen >Assignee: Haibo Chen >Priority: Major > Attachments: YARN-8930.00.patch, YARN-8930.01.patch > > > When yarn.nodemanager.resource.memory.enforced is set to true with memory > cgroup turned on (aka strict memory enforcement), the containers monitor relies > on the under_oom status read from the container cgroup's memory.oom_control > file. > However, when the root yarn container cgroup is under oom (e.g. when the node > is overallocating itself), the under_oom status is set for all yarn > containers regardless of whether each individual container has run over its > memory limit. > What essentially happens is that whenever the root cgroup is under oom, all > yarn containers are killed.
[jira] [Commented] (YARN-8898) Fix FederationInterceptor#allocate to set application priority in allocateResponse
[ https://issues.apache.org/jira/browse/YARN-8898?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16664064#comment-16664064 ] Botong Huang commented on YARN-8898: I meant I don't have the code. Feel free to take a crack at it. Please do it on top of YARN-8933 and use the last responses from all SCs. Since FederationInterceptor sits between AM and RM, I don't think it can get the ApplicationSubmissionContext easily. When AMRMProxy initializes the interceptor pipeline for the AM, it has the ContainerLaunchContext for the AM, but currently that is not passed into the interceptors either. I agree that FederationInterceptor needs more information; I think it is better to use/add fields in the AM-RM allocate protocol. Generally it should figure out all information by looking at the communication between the AM and the (home) RM, e.g. application priority, node label, etc. If application priority can change over time, then I think we should just follow the application priority in the last home RM response (reuse YARN-8933). Whenever it detects a priority change in the home SC, perhaps FederationInterceptor should change the priority of the UAM in secondaries as well. We may or may not need this last part for now; I am okay with both ways. But when we launch the UAM initially, we should definitely make sure to submit it with the same priority as the home SC at that moment. Regarding Router, the current design is that Router only tracks the home SC for an application. The expansion to (whichever subset of) secondary SCs is solely up to the FederationInterceptor according to the proxy policy; Router should not be aware of it. So when the client updates the priority for the app, Router should only update it in the home RM, and leave the rest to FederationInterceptor. 
> Fix FederationInterceptor#allocate to set application priority in > allocateResponse > -- > > Key: YARN-8898 > URL: https://issues.apache.org/jira/browse/YARN-8898 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Bibin A Chundatt >Assignee: Bilwa S T >Priority: Major > > In case of FederationInterceptor#mergeAllocateResponses skips > application_priority in response returned -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-2199) FairScheduler: Allow max-AM-share to be specified in the root queue
[ https://issues.apache.org/jira/browse/YARN-2199?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16664045#comment-16664045 ] Szilard Nemeth commented on YARN-2199: -- As discussed with [~rkanter], I'm taking this over and going to work on it later. > FairScheduler: Allow max-AM-share to be specified in the root queue > --- > > Key: YARN-2199 > URL: https://issues.apache.org/jira/browse/YARN-2199 > Project: Hadoop YARN > Issue Type: Improvement > Components: fairscheduler >Affects Versions: 2.5.0 >Reporter: Robert Kanter >Assignee: Robert Kanter >Priority: Major > Attachments: YARN-2199.patch, YARN-2199.patch > > > If users want to specify the max-AM-share, they have to do it for each leaf > queue individually. It would be convenient if they could also specify it in > the root queue so they'd only have to specify it once to apply to all queues. > It could still be overridden in a specific leaf queue though.
[jira] [Assigned] (YARN-2199) FairScheduler: Allow max-AM-share to be specified in the root queue
[ https://issues.apache.org/jira/browse/YARN-2199?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Szilard Nemeth reassigned YARN-2199: Assignee: Szilard Nemeth (was: Robert Kanter) > FairScheduler: Allow max-AM-share to be specified in the root queue > --- > > Key: YARN-2199 > URL: https://issues.apache.org/jira/browse/YARN-2199 > Project: Hadoop YARN > Issue Type: Improvement > Components: fairscheduler >Affects Versions: 2.5.0 >Reporter: Robert Kanter >Assignee: Szilard Nemeth >Priority: Major > Attachments: YARN-2199.patch, YARN-2199.patch > > > If users want to specify the max-AM-share, they have to do it for each leaf > queue individually. It would be convenient if they could also specify it in > the root queue so they'd only have to specify it once to apply to all queues. > It could still be overridden in a specific leaf queue though. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-7225) Add queue and partition info to RM audit log
[ https://issues.apache.org/jira/browse/YARN-7225?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16664038#comment-16664038 ] Jonathan Hung commented on YARN-7225: - Thanks [~eepayne], for the branch-2.8 patch I think we can change the containerCompleted one as in the trunk case. For the apply() one (which I guess is allocate() in the 2.8 case), perhaps we can use node.getPartition()? Thoughts? > Add queue and partition info to RM audit log > > > Key: YARN-7225 > URL: https://issues.apache.org/jira/browse/YARN-7225 > Project: Hadoop YARN > Issue Type: Improvement > Components: resourcemanager >Affects Versions: 2.9.1, 2.8.4, 3.0.2, 3.1.1 >Reporter: Jonathan Hung >Assignee: Eric Payne >Priority: Major > Attachments: YARN-7225.001.patch, YARN-7225.002.patch, > YARN-7225.003.patch, YARN-7225.004.patch, YARN-7225.005.patch, > YARN-7225.branch-2.8.001.patch > > > Right now RM audit log has fields such as user, ip, resource, etc. Having > queue and partition is useful for resource tracking. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-8927) Better handling of "docker.trusted.registries" in container-executor's "trusted_image_check" function
[ https://issues.apache.org/jira/browse/YARN-8927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16663984#comment-16663984 ] Eric Yang commented on YARN-8927: - I agree that "library" can be the default word to enable local images and public images to be trusted. [~tangzhankun] thoughts? > Better handling of "docker.trusted.registries" in container-executor's > "trusted_image_check" function > - > > Key: YARN-8927 > URL: https://issues.apache.org/jira/browse/YARN-8927 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Zhankun Tang >Assignee: Zhankun Tang >Priority: Major > > There are some missing cases that we need to catch when handling > "docker.trusted.registries". > The container-executor.cfg configuration is as follows: > {code:java} > docker.trusted.registries=tangzhankun,ubuntu,centos{code} > It works if we run DistributedShell with "tangzhankun/tensorflow" > {code:java} > "yarn ... -shell_env YARN_CONTAINER_RUNTIME_TYPE=docker -shell_env > YARN_CONTAINER_RUNTIME_DOCKER_IMAGE=tangzhankun/tensorflow > {code} > But running a DistributedShell job with "centos", "centos[:tagName]", "ubuntu" > and "ubuntu[:tagName]" fails. > The error message is like: > {code:java} > "image: centos is not trusted" > {code} > We need to handle the above cases better. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-8914) Add xtermjs to YARN UI2
[ https://issues.apache.org/jira/browse/YARN-8914?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16663963#comment-16663963 ] Eric Yang commented on YARN-8914: - [~akhilpb] Thank you for the review. I will remove the js.map files if they are not in use. Overlay.js will be used for the window resize function. It is currently in a commented-out section of the code, lines 81 to 86 of the terminal.template file. Its purpose is to show a modal dialog with the terminal size when the user drags and resizes the browser. That part of the logic is a placeholder until YARN-8839 defines the protocol that we will implement for browser resize and other accessory functions. > Add xtermjs to YARN UI2 > --- > > Key: YARN-8914 > URL: https://issues.apache.org/jira/browse/YARN-8914 > Project: Hadoop YARN > Issue Type: Sub-task > Components: yarn-ui-v2 >Reporter: Eric Yang >Assignee: Eric Yang >Priority: Major > Attachments: YARN-8914.001.patch > > > In the container listing from UI2, we can add a link to connect to docker > container using xtermjs. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-8921) SnapshotBasedOverAllocationPolicy always caps the amount of memory available to 4 GBs
[ https://issues.apache.org/jira/browse/YARN-8921?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16663961#comment-16663961 ] Haibo Chen commented on YARN-8921: -- Thanks [~rkanter] for the review. I have updated the patch to address your comment. > SnapshotBasedOverAllocationPolicy always caps the amount of memory available > to 4 GBs > > > Key: YARN-8921 > URL: https://issues.apache.org/jira/browse/YARN-8921 > Project: Hadoop YARN > Issue Type: Sub-task >Affects Versions: YARN-1011 >Reporter: Haibo Chen >Assignee: Haibo Chen >Priority: Major > Attachments: YARN-8921-YARN-1011.00.patch, > YARN-8921-YARN-1011.01.patch, YARN-8921.00.patch > > > The memory overallocation threshold is a float, so is > (overAllocationThresholds.getMemoryThreshold() * > containersMonitor.getPmemAllocatedForContainers()). Because Math.round(float) > returns an int, this effectively caps the amount of memory available for > overallocation to Integer.MAX_VALUE, [see the code > here|https://github.com/apache/hadoop/blob/fa864b8744cfdfe613a917ba1bbd859a5b6f70b8/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/scheduler/SnapshotBasedOverAllocationPolicy.java#L45] -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-8921) SnapshotBasedOverAllocationPolicy always caps the amount of memory available to 4 GBs
[ https://issues.apache.org/jira/browse/YARN-8921?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Haibo Chen updated YARN-8921: - Attachment: YARN-8921-YARN-1011.01.patch > SnapshotBasedOverAllocationPolicy always caps the amount of memory available > to 4 GBs > > > Key: YARN-8921 > URL: https://issues.apache.org/jira/browse/YARN-8921 > Project: Hadoop YARN > Issue Type: Sub-task >Affects Versions: YARN-1011 >Reporter: Haibo Chen >Assignee: Haibo Chen >Priority: Major > Attachments: YARN-8921-YARN-1011.00.patch, > YARN-8921-YARN-1011.01.patch, YARN-8921.00.patch > > > The memory overallocation threshold is a float, so is > (overAllocationThresholds.getMemoryThreshold() * > containersMonitor.getPmemAllocatedForContainers()). Because Math.round(float) > returns an int, this effectively caps the amount of memory available for > overallocation to Integer.MAX_VALUE, [see the code > here|https://github.com/apache/hadoop/blob/fa864b8744cfdfe613a917ba1bbd859a5b6f70b8/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/scheduler/SnapshotBasedOverAllocationPolicy.java#L45] -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
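The Math.round overflow described in YARN-8921 can be demonstrated directly. This is a minimal sketch; the threshold and allocation values are illustrative, not taken from the patch, and the point is only that Math.round(float) returns an int and saturates at Integer.MAX_VALUE:

```java
public class RoundOverflowDemo {
    public static void main(String[] args) {
        // Illustrative overallocation threshold and allocated memory.
        float threshold = 1.0f;
        long allocatedBytes = 8L * 1024 * 1024 * 1024; // 8 GB

        // Math.round(float) returns an int, so any product larger than
        // Integer.MAX_VALUE saturates instead of rounding to the real value.
        int buggy = Math.round(threshold * allocatedBytes);
        System.out.println(buggy == Integer.MAX_VALUE); // true

        // Rounding as a double returns a long and keeps the full range.
        long fixed = Math.round((double) threshold * allocatedBytes);
        System.out.println(fixed); // 8589934592
    }
}
```

The one-character fix the patch direction suggests is to widen the product to double before rounding, so Math.round resolves to the long-returning overload.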
[jira] [Commented] (YARN-6523) Newly retrieved security Tokens are sent as part of each heartbeat to each node from RM which is not desirable in large cluster
[ https://issues.apache.org/jira/browse/YARN-6523?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16663958#comment-16663958 ] Manikandan R commented on YARN-6523: Rebasing.. > Newly retrieved security Tokens are sent as part of each heartbeat to each > node from RM which is not desirable in large cluster > --- > > Key: YARN-6523 > URL: https://issues.apache.org/jira/browse/YARN-6523 > Project: Hadoop YARN > Issue Type: Improvement > Components: RM >Affects Versions: 2.8.0, 2.7.3 >Reporter: Naganarasimha G R >Assignee: Manikandan R >Priority: Major > Attachments: YARN-6523.001.patch, YARN-6523.002.patch, > YARN-6523.003.patch, YARN-6523.004.patch, YARN-6523.005.patch > > > Currently as part of the heartbeat response the RM sets all application's tokens > though all applications might not be active on the node. On top of it > NodeHeartbeatResponsePBImpl converts tokens for each app into > SystemCredentialsForAppsProto. Hence for each node and each heartbeat too > many SystemCredentialsForAppsProto objects were getting created. > We hit an OOM while testing 2000 concurrent apps on a 500-node cluster with > 8GB RAM configured for the RM -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-6523) Newly retrieved security Tokens are sent as part of each heartbeat to each node from RM which is not desirable in large cluster
[ https://issues.apache.org/jira/browse/YARN-6523?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Manikandan R updated YARN-6523: --- Attachment: YARN-6523.005.patch > Newly retrieved security Tokens are sent as part of each heartbeat to each > node from RM which is not desirable in large cluster > --- > > Key: YARN-6523 > URL: https://issues.apache.org/jira/browse/YARN-6523 > Project: Hadoop YARN > Issue Type: Improvement > Components: RM >Affects Versions: 2.8.0, 2.7.3 >Reporter: Naganarasimha G R >Assignee: Manikandan R >Priority: Major > Attachments: YARN-6523.001.patch, YARN-6523.002.patch, > YARN-6523.003.patch, YARN-6523.004.patch, YARN-6523.005.patch > > > Currently as part of the heartbeat response the RM sets all application's tokens > though all applications might not be active on the node. On top of it > NodeHeartbeatResponsePBImpl converts tokens for each app into > SystemCredentialsForAppsProto. Hence for each node and each heartbeat too > many SystemCredentialsForAppsProto objects were getting created. > We hit an OOM while testing 2000 concurrent apps on a 500-node cluster with > 8GB RAM configured for the RM -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-8927) Better handling of "docker.trusted.registries" in container-executor's "trusted_image_check" function
[ https://issues.apache.org/jira/browse/YARN-8927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16663948#comment-16663948 ] Eric Badger commented on YARN-8927: --- bq. Eric Badger This seems to imply that library keyword will toggle to allow public image and image without a registry name. Locally built images will not have registry name. Should we trust all local images without a registry name? I prefer this idea more than prepending library/* but just want to be sure that by common sense, local images can be trusted without getting into trouble. I'm not sure it has to be one or the other. If you specify just {{library}} in the trusted registries then it would mean that all local images are trusted. If you specify {{library/centos:latest}}, then only the {{centos:latest}} image that is local will be trusted and none of the other local images. The main takeaway I want to have here is that the user should not have to change the name of what they're specifying. If the image on the node is {{centos:latest}} then they should ask for {{centos:latest}}, not {{library/centos:latest}}. And there should be a configuration in {{docker.trusted.registries}} to allow for that image to be trusted, even if it is a local image that has no "registry" > Better handling of "docker.trusted.registries" in container-executor's > "trusted_image_check" function > - > > Key: YARN-8927 > URL: https://issues.apache.org/jira/browse/YARN-8927 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Zhankun Tang >Assignee: Zhankun Tang >Priority: Major > > There are some missing cases that we need to catch when handling > "docker.trusted.registries". > The container-executor.cfg configuration is as follows: > {code:java} > docker.trusted.registries=tangzhankun,ubuntu,centos{code} > It works if we run DistributedShell with "tangzhankun/tensorflow" > {code:java} > "yarn ... -shell_env YARN_CONTAINER_RUNTIME_TYPE=docker -shell_env > YARN_CONTAINER_RUNTIME_DOCKER_IMAGE=tangzhankun/tensorflow > {code} > But running a DistributedShell job with "centos", "centos[:tagName]", "ubuntu" > and "ubuntu[:tagName]" fails. > The error message is like: > {code:java} > "image: centos is not trusted" > {code} > We need to handle the above cases better. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-8927) Better handling of "docker.trusted.registries" in container-executor's "trusted_image_check" function
[ https://issues.apache.org/jira/browse/YARN-8927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16663945#comment-16663945 ] Eric Yang commented on YARN-8927: - {quote}we implicitly transform user given value "centos" to "library/centos:latest", "centos:6" to "library/centos:6".{quote} [~tangzhankun] The idea is good to improve usability. Can user get confused that they ask for centos, but they get library/centos when they run docker inspect command? {quote}If the image is deemed to not have a registry associated with it (e.g. centos:latest or centos:6), we could then mark it as trusted or not based on whether library is in the trusted registries list.{quote} [~ebadger] This seems to imply that library keyword will toggle to allow public image and image without a registry name. Locally built images will not have registry name. Should we trust all local images without a registry name? I prefer this idea more than prepending library/* but just want to be sure that by common sense, local images can be trusted without getting into trouble. > Better handling of "docker.trusted.registries" in container-executor's > "trusted_image_check" function > - > > Key: YARN-8927 > URL: https://issues.apache.org/jira/browse/YARN-8927 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Zhankun Tang >Assignee: Zhankun Tang >Priority: Major > > There are some missing cases that we need to catch when handling > "docker.trusted.registries". > The container-executor.cfg configuration is as follows: > {code:java} > docker.trusted.registries=tangzhankun,ubuntu,centos{code} > It works if we run DistributedShell with "tangzhankun/tensorflow" > {code:java} > "yarn ... -shell_env YARN_CONTAINER_RUNTIME_TYPE=docker -shell_env > YARN_CONTAINER_RUNTIME_DOCKER_IMAGE=tangzhankun/tensorflow > {code} > But running a DistributedShell job with "centos", "centos[:tagName]", "ubuntu" > and "ubuntu[:tagName]" fails. > The error message is like: > {code:java} > "image: centos is not trusted" > {code} > We need to handle the above cases better. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-8927) Better handling of "docker.trusted.registries" in container-executor's "trusted_image_check" function
[ https://issues.apache.org/jira/browse/YARN-8927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16663905#comment-16663905 ] Eric Badger commented on YARN-8927: --- It would be strongly preferable for the user to not have to specify {{library/}} for a local image. If the image is deemed to not have a registry associated with it (e.g. centos:latest or centos:6), we could then mark it as trusted or not based on whether {{library}} is in the trusted registries list. > Better handling of "docker.trusted.registries" in container-executor's > "trusted_image_check" function > - > > Key: YARN-8927 > URL: https://issues.apache.org/jira/browse/YARN-8927 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Zhankun Tang >Assignee: Zhankun Tang >Priority: Major > > There are some missing cases that we need to catch when handling > "docker.trusted.registries". > The container-executor.cfg configuration is as follows: > {code:java} > docker.trusted.registries=tangzhankun,ubuntu,centos{code} > It works if we run DistributedShell with "tangzhankun/tensorflow" > {code:java} > "yarn ... -shell_env YARN_CONTAINER_RUNTIME_TYPE=docker -shell_env > YARN_CONTAINER_RUNTIME_DOCKER_IMAGE=tangzhankun/tensorflow > {code} > But running a DistributedShell job with "centos", "centos[:tagName]", "ubuntu" > and "ubuntu[:tagName]" fails. > The error message is like: > {code:java} > "image: centos is not trusted" > {code} > We need to handle the above cases better. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
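The registry check discussed in this thread could be sketched as follows. This is a hypothetical Java sketch; the real check lives in container-executor's C code, the isTrusted name is illustrative, and per-image entries like library/centos:latest from the discussion above are deliberately not handled here:

```java
import java.util.Set;

public class TrustedImageCheck {
    /**
     * Returns true if the image's registry prefix appears in the
     * docker.trusted.registries list. An image with no registry prefix
     * (e.g. "centos:latest", whether local or pulled from Docker Hub) is
     * treated as belonging to the implicit "library" registry, so the
     * user never has to type "library/" themselves.
     */
    static boolean isTrusted(String image, Set<String> trustedRegistries) {
        int slash = image.indexOf('/');
        String registry = (slash == -1) ? "library" : image.substring(0, slash);
        return trustedRegistries.contains(registry);
    }

    public static void main(String[] args) {
        Set<String> trusted = Set.of("library", "tangzhankun");
        System.out.println(isTrusted("centos:latest", trusted));          // true
        System.out.println(isTrusted("tangzhankun/tensorflow", trusted)); // true
        System.out.println(isTrusted("ubuntu:16.04", Set.of("centos")));  // false
    }
}
```

With this shape, the user keeps asking for {{centos:latest}} as-is; adding or removing "library" from {{docker.trusted.registries}} toggles whether registry-less (local or Docker Hub) images are trusted.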
[jira] [Comment Edited] (YARN-8779) Fix few discrepancies between YARN Service swagger spec and code
[ https://issues.apache.org/jira/browse/YARN-8779?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16663733#comment-16663733 ] Daniel Voros edited comment on YARN-8779 at 10/25/18 1:30 PM: -- One more thing: 9. {{PUT /app/v1/services/\{service_name\}}} might return: * 200 (on start/stop), * 202 (on initiating/canceling upgrade or flexing) * 204 (in any other case - not sure what that might be) however swagger spec only lists 204. was (Author: dvoros): One more thing: 9. {{PUT /app/v1/services/{service_name}}} might return: * 200 (on start/stop), * 202 (on initiating/canceling upgrade or flexing) * 204 (in any other case - not sure what that might be) however swagger spec only lists 204. > Fix few discrepancies between YARN Service swagger spec and code > > > Key: YARN-8779 > URL: https://issues.apache.org/jira/browse/YARN-8779 > Project: Hadoop YARN > Issue Type: Bug > Components: yarn-native-services >Affects Versions: 3.1.0, 3.1.1 >Reporter: Gour Saha >Priority: Major > > Following issues were identified in YARN Service swagger definition during an > effort to integrate with a running service by generating Java and Go > client-side stubs from the spec - > > 1. > *restartPolicy* is wrong and should be *restart_policy* > > 2. > A DELETE request to a non-existing service (or a previously existing but > deleted service) throws an ApiException instead of something like > NotFoundException (the equivalent of 404). Note, DELETE of an existing > service behaves fine. > > 3. > The response code of DELETE request is 200. The spec says 204. Since the > response has a payload, the spec should be updated to 200 instead of 204. > > 4. > _DefaultApi.java_ client's _appV1ServicesServiceNameGetWithHttpInfo_ method > does not return a Service object. 
Swagger definition has the below bug in GET > response of */app/v1/services/\{service_name}* - > {code:java} > type: object > items: > $ref: '#/definitions/Service' > {code} > It should be - > {code:java} > $ref: '#/definitions/Service' > {code} > > 5. > Serialization issues were seen in all enum classes - ServiceState.java, > ContainerState.java, ComponentState.java, PlacementType.java and > PlacementScope.java. > Java client threw the below exception for ServiceState - > {code:java} > Caused by: com.fasterxml.jackson.databind.exc.MismatchedInputException: > Cannot construct instance of > `org.apache.cb.yarn.service.api.records.ServiceState` (although at least one > Creator exists): no String-argument constructor/factory method to deserialize > from String value ('ACCEPTED') > at [Source: > (org.glassfish.jersey.message.internal.ReaderInterceptorExecutor$UnCloseableInputStream); > line: 1, column: 121] (through reference chain: > org.apache.cb.yarn.service.api.records.Service["state”]) > {code} > For Golang we saw this for ContainerState - > {code:java} > ERRO[2018-08-12T23:32:31.851-07:00] During GET request: json: cannot > unmarshal string into Go struct field Container.state of type > yarnmodel.ContainerState > {code} > > 6. > *launch_time* actually returns an integer but swagger definition says date. > Hence, the following exception is seen on the client side - > {code:java} > Caused by: com.fasterxml.jackson.databind.exc.MismatchedInputException: > Unexpected token (VALUE_NUMBER_INT), expected START_ARRAY: Expected array or > string. > at [Source: > (org.glassfish.jersey.message.internal.ReaderInterceptorExecutor$UnCloseableInputStream); > line: 1, column: 477] (through reference chain: > org.apache.cb.yarn.service.api.records.Service["components"]->java.util.ArrayList[0]->org.apache.cb.yarn.service.api.records.Component["containers"]->java.util.ArrayList[0]->org.apache.cb.yarn.service.api.records.Container["launch_time”]) > {code} > > 8. 
> *user.name* query param with a valid value is required for all API calls to > an unsecure cluster. This is not defined in the spec. > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-8779) Fix few discrepancies between YARN Service swagger spec and code
[ https://issues.apache.org/jira/browse/YARN-8779?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16663733#comment-16663733 ] Daniel Voros commented on YARN-8779: One more thing: 9. {{PUT /app/v1/services/{service_name}}} might return: * 200 (on start/stop), * 202 (on initiating/canceling upgrade or flexing) * 204 (in any other case - not sure what that might be) however swagger spec only lists 204. > Fix few discrepancies between YARN Service swagger spec and code > > > Key: YARN-8779 > URL: https://issues.apache.org/jira/browse/YARN-8779 > Project: Hadoop YARN > Issue Type: Bug > Components: yarn-native-services >Affects Versions: 3.1.0, 3.1.1 >Reporter: Gour Saha >Priority: Major > > Following issues were identified in YARN Service swagger definition during an > effort to integrate with a running service by generating Java and Go > client-side stubs from the spec - > > 1. > *restartPolicy* is wrong and should be *restart_policy* > > 2. > A DELETE request to a non-existing service (or a previously existing but > deleted service) throws an ApiException instead of something like > NotFoundException (the equivalent of 404). Note, DELETE of an existing > service behaves fine. > > 3. > The response code of DELETE request is 200. The spec says 204. Since the > response has a payload, the spec should be updated to 200 instead of 204. > > 4. > _DefaultApi.java_ client's _appV1ServicesServiceNameGetWithHttpInfo_ method > does not return a Service object. Swagger definition has the below bug in GET > response of */app/v1/services/\{service_name}* - > {code:java} > type: object > items: > $ref: '#/definitions/Service' > {code} > It should be - > {code:java} > $ref: '#/definitions/Service' > {code} > > 5. > Serialization issues were seen in all enum classes - ServiceState.java, > ContainerState.java, ComponentState.java, PlacementType.java and > PlacementScope.java. 
> Java client threw the below exception for ServiceState - > {code:java} > Caused by: com.fasterxml.jackson.databind.exc.MismatchedInputException: > Cannot construct instance of > `org.apache.cb.yarn.service.api.records.ServiceState` (although at least one > Creator exists): no String-argument constructor/factory method to deserialize > from String value ('ACCEPTED') > at [Source: > (org.glassfish.jersey.message.internal.ReaderInterceptorExecutor$UnCloseableInputStream); > line: 1, column: 121] (through reference chain: > org.apache.cb.yarn.service.api.records.Service["state”]) > {code} > For Golang we saw this for ContainerState - > {code:java} > ERRO[2018-08-12T23:32:31.851-07:00] During GET request: json: cannot > unmarshal string into Go struct field Container.state of type > yarnmodel.ContainerState > {code} > > 6. > *launch_time* actually returns an integer but swagger definition says date. > Hence, the following exception is seen on the client side - > {code:java} > Caused by: com.fasterxml.jackson.databind.exc.MismatchedInputException: > Unexpected token (VALUE_NUMBER_INT), expected START_ARRAY: Expected array or > string. > at [Source: > (org.glassfish.jersey.message.internal.ReaderInterceptorExecutor$UnCloseableInputStream); > line: 1, column: 477] (through reference chain: > org.apache.cb.yarn.service.api.records.Service["components"]->java.util.ArrayList[0]->org.apache.cb.yarn.service.api.records.Component["containers"]->java.util.ArrayList[0]->org.apache.cb.yarn.service.api.records.Container["launch_time”]) > {code} > > 8. > *user.name* query param with a valid value is required for all API calls to > an unsecure cluster. This is not defined in the spec. > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-8906) [UI2] NM hostnames not displayed correctly in Node Heatmap Chart
[ https://issues.apache.org/jira/browse/YARN-8906?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16663703#comment-16663703 ] Hadoop QA commented on YARN-8906: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s{color} | {color:blue} Docker mode activated. {color} | | {color:red}-1{color} | {color:red} patch {color} | {color:red} 0m 10s{color} | {color:red} YARN-8906 does not apply to trunk. Rebase required? Wrong Branch? See https://wiki.apache.org/hadoop/HowToContribute for help. {color} | \\ \\ || Subsystem || Report/Notes || | JIRA Issue | YARN-8906 | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/22334/console | | Powered by | Apache Yetus 0.8.0 http://yetus.apache.org | This message was automatically generated. > [UI2] NM hostnames not displayed correctly in Node Heatmap Chart > > > Key: YARN-8906 > URL: https://issues.apache.org/jira/browse/YARN-8906 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Charan Hebri >Assignee: Akhil PB >Priority: Major > Attachments: Node_Heatmap_Chart.png, Node_Heatmap_Chart_Fixed.png, > YARN-8906.001.patch > > > Hostnames displayed on the Node Heatmap Chart look garbled and are not > clearly visible. Attached screenshot. > cc [~akhilpb] -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-8944) TestContainerAllocation.testUserLimitAllocationMultipleContainers failure after YARN-8896
[ https://issues.apache.org/jira/browse/YARN-8944?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16663617#comment-16663617 ] Wilfred Spiegelenburg commented on YARN-8944: - Failure stack trace from the junit test: {code} [ERROR] Tests run: 17, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 36.546 s <<< FAILURE! - in org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestContainerAllocation [ERROR] testUserLimitAllocationMultipleContainers(org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestContainerAllocation) Time elapsed: 1.09 s <<< FAILURE! java.lang.AssertionError: expected:<101> but was:<71> at org.junit.Assert.fail(Assert.java:88) at org.junit.Assert.failNotEquals(Assert.java:834) at org.junit.Assert.assertEquals(Assert.java:645) at org.junit.Assert.assertEquals(Assert.java:631) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestContainerAllocation.testUserLimitAllocationMultipleContainers(TestContainerAllocation.java:945) {code} > TestContainerAllocation.testUserLimitAllocationMultipleContainers failure > after YARN-8896 > - > > Key: YARN-8944 > URL: https://issues.apache.org/jira/browse/YARN-8944 > Project: Hadoop YARN > Issue Type: Test > Components: capacity scheduler >Affects Versions: 3.3.0 >Reporter: Wilfred Spiegelenburg >Assignee: Wilfred Spiegelenburg >Priority: Major > > YARN-8896 changes the behaviour of the CapacityScheduler by limiting the > number of containers that can be allocated in one heartbeat. It is an > undocumented change in behaviour. > The change breaks the junit test: > {{TestContainerAllocation.testUserLimitAllocationMultipleContainers}} > The maximum number of containers that get assigned via one heartbeat is > 100, but the test expects 199 to be assigned. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Created] (YARN-8944) TestContainerAllocation.testUserLimitAllocationMultipleContainers failure after YARN-8896
Wilfred Spiegelenburg created YARN-8944: --- Summary: TestContainerAllocation.testUserLimitAllocationMultipleContainers failure after YARN-8896 Key: YARN-8944 URL: https://issues.apache.org/jira/browse/YARN-8944 Project: Hadoop YARN Issue Type: Test Components: capacity scheduler Affects Versions: 3.3.0 Reporter: Wilfred Spiegelenburg Assignee: Wilfred Spiegelenburg YARN-8896 changes the behaviour of the CapacityScheduler by limiting the number of containers that can be allocated in one heartbeat. It is an undocumented change in behaviour. The change breaks the junit test: {{TestContainerAllocation.testUserLimitAllocationMultipleContainers}} The maximum number of containers that get assigned via one heartbeat is 100, but the test expects 199 to be assigned. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-8747) [UI2] YARN UI2 page loading failed due to js error under some time zone configuration
[ https://issues.apache.org/jira/browse/YARN-8747?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16663587#comment-16663587 ] Akhil PB commented on YARN-8747: Hi [~sunilg] [~collinma] I have tried the patch and tested the UI; dates seem to be working fine. > [UI2] YARN UI2 page loading failed due to js error under some time zone > configuration > - > > Key: YARN-8747 > URL: https://issues.apache.org/jira/browse/YARN-8747 > Project: Hadoop YARN > Issue Type: Bug > Components: webapp >Affects Versions: 3.1.1 >Reporter: collinma >Assignee: collinma >Priority: Blocker > Attachments: YARN-8747.001.patch, image-2018-09-05-18-54-03-991.png > > Original Estimate: 1h > Remaining Estimate: 1h > > We deployed hadoop 3.1.1 on centos 7.2 servers whose timezone is configured > as GMT+8; the web browser time zone is GMT+8 too. The YARN UI page failed to load > due to a js error: > > !image-2018-09-05-18-54-03-991.png! > The moment-timezone js component raised that error. This has been fixed in > moment-timezone > v0.5.1 ([see|https://github.com/moment/moment-timezone/issues/294]). We need > to update the moment-timezone version accordingly -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Comment Edited] (YARN-8906) [UI2] NM hostnames not displayed correctly in Node Heatmap Chart
[ https://issues.apache.org/jira/browse/YARN-8906?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16663567#comment-16663567 ] Akhil PB edited comment on YARN-8906 at 10/25/18 10:46 AM: --- Attaching v1 patch. See {{Node_Heatmap_Chart_Fixed.png}} for the UI fix. [~sunilg] Could you pls help to review the patch. was (Author: akhilpb): Attaching v1 patch. See {{Screen Shot 2018-10-25 at 4.08.32 PM.png}} for the UI fix. [~sunilg] Could you pls help to review the patch. > [UI2] NM hostnames not displayed correctly in Node Heatmap Chart > > > Key: YARN-8906 > URL: https://issues.apache.org/jira/browse/YARN-8906 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Charan Hebri >Assignee: Akhil PB >Priority: Major > Attachments: Node_Heatmap_Chart.png, Node_Heatmap_Chart_Fixed.png, > YARN-8906.001.patch > > > Hostnames displayed on the Node Heatmap Chart look garbled and are not > clearly visible. Attached screenshot. > cc [~akhilpb] -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-8906) [UI2] NM hostnames not displayed correctly in Node Heatmap Chart
[ https://issues.apache.org/jira/browse/YARN-8906?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Akhil PB updated YARN-8906: --- Attachment: (was: Screen Shot 2018-10-25 at 4.08.32 PM.png)
[jira] [Updated] (YARN-8906) [UI2] NM hostnames not displayed correctly in Node Heatmap Chart
[ https://issues.apache.org/jira/browse/YARN-8906?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Akhil PB updated YARN-8906: --- Attachment: Node_Heatmap_Chart_Fixed.png
[jira] [Comment Edited] (YARN-8906) [UI2] NM hostnames not displayed correctly in Node Heatmap Chart
[ https://issues.apache.org/jira/browse/YARN-8906?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16663567#comment-16663567 ] Akhil PB edited comment on YARN-8906 at 10/25/18 10:42 AM: --- Attaching v1 patch. See {{Screen Shot 2018-10-25 at 4.08.32 PM.png}} for the UI fix. [~sunilg] Could you pls help to review the patch. was (Author: akhilpb): Attaching v1 patch. See {{Screen Shot 2018-10-25 at 4.08.32 PM.png}} for the fixed UI. [~sunilg] Could you pls help to review the patch.
[jira] [Comment Edited] (YARN-8906) [UI2] NM hostnames not displayed correctly in Node Heatmap Chart
[ https://issues.apache.org/jira/browse/YARN-8906?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16663567#comment-16663567 ] Akhil PB edited comment on YARN-8906 at 10/25/18 10:41 AM: --- Attaching v1 patch. See {{Screen Shot 2018-10-25 at 4.08.32 PM.png}} for the fixed UI. [~sunilg] Could you pls help to review the patch. was (Author: akhilpb): Attaching v1 patch. See Screen Shot 2018-10-25 at 4.08.32 PM.png for the fixed UI. [~sunilg] Could you pls help to review the patch.
[jira] [Commented] (YARN-8906) [UI2] NM hostnames not displayed correctly in Node Heatmap Chart
[ https://issues.apache.org/jira/browse/YARN-8906?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16663567#comment-16663567 ] Akhil PB commented on YARN-8906: Attaching v1 patch. See Screen Shot 2018-10-25 at 4.08.32 PM.png for the fixed UI. [~sunilg] Could you pls help to review the patch.
[jira] [Updated] (YARN-8906) [UI2] NM hostnames not displayed correctly in Node Heatmap Chart
[ https://issues.apache.org/jira/browse/YARN-8906?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Akhil PB updated YARN-8906: --- Attachment: YARN-8906.001.patch
[jira] [Updated] (YARN-8906) [UI2] NM hostnames not displayed correctly in Node Heatmap Chart
[ https://issues.apache.org/jira/browse/YARN-8906?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Akhil PB updated YARN-8906: --- Attachment: Screen Shot 2018-10-25 at 4.08.32 PM.png
[jira] [Commented] (YARN-8851) [Umbrella] A new pluggable device plugin framework to ease vendor plugin development
[ https://issues.apache.org/jira/browse/YARN-8851?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16663495#comment-16663495 ] Zhankun Tang commented on YARN-8851: Sorry that I missed your comments. Thanks [~cheersyang] . :) > [Umbrella] A new pluggable device plugin framework to ease vendor plugin > development > > > Key: YARN-8851 > URL: https://issues.apache.org/jira/browse/YARN-8851 > Project: Hadoop YARN > Issue Type: New Feature > Components: yarn >Reporter: Zhankun Tang >Assignee: Zhankun Tang >Priority: Major > Attachments: YARN-8851-WIP2-trunk.001.patch, > YARN-8851-WIP3-trunk.001.patch, YARN-8851-WIP4-trunk.001.patch, > YARN-8851-WIP5-trunk.001.patch, YARN-8851-WIP6-trunk.001.patch, > YARN-8851-WIP7-trunk.001.patch, [YARN-8851] > YARN_New_Device_Plugin_Framework_Design_Proposal-3.pdf, [YARN-8851] > YARN_New_Device_Plugin_Framework_Design_Proposal.pdf > > > At present, we support GPU/FPGA devices in YARN in a native, tightly coupled > way. But it's difficult for a vendor to implement such a device plugin > because the developer needs deep knowledge of YARN internals, and this brings a > burden on the community to maintain both YARN core and vendor-specific code. > Here we propose a new device plugin framework to ease vendor device plugin > development and provide a more flexible way to integrate with the YARN NM.
[jira] [Commented] (YARN-8854) [Hadoop YARN Common] Update jquery datatable version references
[ https://issues.apache.org/jira/browse/YARN-8854?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16663325#comment-16663325 ] Akhil PB commented on YARN-8854: [~sunilg] Could you pls review the latest patch v4. > [Hadoop YARN Common] Update jquery datatable version references > --- > > Key: YARN-8854 > URL: https://issues.apache.org/jira/browse/YARN-8854 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Akhil PB >Assignee: Akhil PB >Priority: Critical > Attachments: YARN-8854.001.patch, YARN-8854.002.patch, > YARN-8854.003.patch, YARN-8854.004.patch
[jira] [Commented] (YARN-8943) Upgrade JUnit from 4 to 5 in hadoop-yarn-api
[ https://issues.apache.org/jira/browse/YARN-8943?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16663317#comment-16663317 ] Hadoop QA commented on YARN-8943:

(/) +1 overall

|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 0m 20s | Docker mode activated. |

Prechecks
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
| +1 | test4tests | 0m 0s | The patch appears to include 8 new or modified test files. |

trunk Compile Tests
| +1 | mvninstall | 19m 53s | trunk passed |
| +1 | compile | 0m 33s | trunk passed |
| +1 | checkstyle | 0m 20s | trunk passed |
| +1 | mvnsite | 0m 36s | trunk passed |
| +1 | shadedclient | 12m 23s | branch has no errors when building and testing our client artifacts. |
| +1 | findbugs | 1m 15s | trunk passed |
| +1 | javadoc | 0m 26s | trunk passed |

Patch Compile Tests
| +1 | mvninstall | 0m 41s | the patch passed |
| +1 | compile | 0m 28s | the patch passed |
| +1 | javac | 0m 28s | the patch passed |
| +1 | checkstyle | 0m 15s | hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api: The patch generated 0 new + 0 unchanged - 1 fixed = 0 total (was 1) |
| +1 | mvnsite | 0m 31s | the patch passed |
| +1 | whitespace | 0m 0s | The patch has no whitespace issues. |
| +1 | xml | 0m 1s | The patch has no ill-formed XML file. |
| +1 | shadedclient | 12m 43s | patch has no errors when building and testing our client artifacts. |
| +1 | findbugs | 1m 20s | the patch passed |
| +1 | javadoc | 0m 23s | the patch passed |

Other Tests
| +1 | unit | 0m 44s | hadoop-yarn-api in the patch passed. |
| +1 | asflicense | 0m 28s | The patch does not generate ASF License warnings. |
| | | 53m 35s | |

|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:8f97d6f |
| JIRA Issue | YARN-8943 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12945534/YARN-8943.01.patch |
| Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient xml findbugs checkstyle |
| uname | Linux 0a06a4624b64 3.13.0-144-generic #193-Ubuntu SMP Thu Mar 15 17:03:53 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 97bd49f |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_181 |
| findbugs | v3.1.0-RC1 |
| Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/22333/testReport/ |
| Max. process+thread count | 341 (vs. ulimit of 1) |
| modules | C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api U: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api |
| Console output | https://builds.apache.org/job/PreCommit-YARN-Build/22333/console |
[jira] [Comment Edited] (YARN-8914) Add xtermjs to YARN UI2
[ https://issues.apache.org/jira/browse/YARN-8914?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16663305#comment-16663305 ] Akhil PB edited comment on YARN-8914 at 10/25/18 6:23 AM: -- Hi [~eyang] Couple of comments. # Patch looks huge in size. Could we remove js.map files since addon files are already in non-minified? # Where is overlay.js used? Could we wrap the functions in this file in an IIFE? was (Author: akhilpb): [~eyang] Couple of comments. # Patch looks huge in size. Could we remove js.map files since addon files are already in non-minified? # Where is overlay.js used? Could we wrap the functions in this file in an IIFE? > Add xtermjs to YARN UI2 > --- > > Key: YARN-8914 > URL: https://issues.apache.org/jira/browse/YARN-8914 > Project: Hadoop YARN > Issue Type: Sub-task > Components: yarn-ui-v2 >Reporter: Eric Yang >Assignee: Eric Yang >Priority: Major > Attachments: YARN-8914.001.patch > > > In the container listing from UI2, we can add a link to connect to docker > container using xtermjs.
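The IIFE suggestion in the review comment can be sketched as below. This is a generic illustration, assuming overlay.js declares top-level functions; `showOverlay`/`hideOverlay` are hypothetical names, not the file's actual contents.

```javascript
// Hedged sketch of the suggested IIFE pattern: wrap a script's top-level
// functions so that only one explicit namespace object is exported and
// nothing else leaks into the global scope. Function names are hypothetical.
(function (global) {
  'use strict';

  function showOverlay(id) {
    return 'showing ' + id;
  }

  function hideOverlay(id) {
    return 'hiding ' + id;
  }

  // Expose a single, explicit surface instead of implicit globals.
  global.overlay = { showOverlay: showOverlay, hideOverlay: hideOverlay };
})(typeof window !== 'undefined' ? window : globalThis);

console.log(overlay.showOverlay('terminal')); // "showing terminal"
console.log(typeof showOverlay);              // "undefined" (not leaked)
```

The point of the pattern is that helper functions stay private to the IIFE body, which avoids clobbering or being clobbered by other scripts loaded on the same page.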
[jira] [Commented] (YARN-8936) Update ATSv2 hbase.two.version to 2.0.2
[ https://issues.apache.org/jira/browse/YARN-8936?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16663306#comment-16663306 ] Rohith Sharma K S commented on YARN-8936: - I think we may need to rebase the YARN-7055 branch, which has hbase.profile=2.0 set as the default. Let me check and confirm. > Update ATSv2 hbase.two.version to 2.0.2 > --- > > Key: YARN-8936 > URL: https://issues.apache.org/jira/browse/YARN-8936 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 3.2.0, 3.1.2, 3.3.0 >Reporter: Rohith Sharma K S >Assignee: Vrushali C >Priority: Major > Attachments: YARN-8936.0001.patch > > > Hadoop trunk uses hbase.two.version 2.0.0-beta-1. HBase has released the stable > HBase 2.0.2 version, which could be used in the Hadoop 3.3/3.2/3.1 branches. > cc: [~vrushalic]
[jira] [Commented] (YARN-8914) Add xtermjs to YARN UI2
[ https://issues.apache.org/jira/browse/YARN-8914?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16663305#comment-16663305 ] Akhil PB commented on YARN-8914: [~eyang] Couple of comments. # Patch looks huge in size. Could we remove js.map files since addon files are already in non-minified? # Where is overlay.js used? Could we wrap the functions in this file in an IIFE?
[jira] [Commented] (YARN-8856) TestTimelineReaderWebServicesHBaseStorage tests failing with NoClassDefFoundError
[ https://issues.apache.org/jira/browse/YARN-8856?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16663296#comment-16663296 ] Sushil Ks commented on YARN-8856: - Hi [~rohithsharma], Kindly review the patch. > TestTimelineReaderWebServicesHBaseStorage tests failing with > NoClassDefFoundError > - > > Key: YARN-8856 > URL: https://issues.apache.org/jira/browse/YARN-8856 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Jason Lowe >Assignee: Sushil Ks >Priority: Major > Attachments: YARN-8856.001.patch > > > TestTimelineReaderWebServicesHBaseStorage has been failing in nightly builds > with NoClassDefFoundError in the tests. Sample error and stacktrace to > follow.