[jira] [Commented] (YARN-8266) [UI2] Clicking on application from cluster view should redirect to application attempt page
[ https://issues.apache.org/jira/browse/YARN-8266?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16475368#comment-16475368 ] Hudson commented on YARN-8266: -- SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #14197 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/14197/]) YARN-8266. [UI2] Clicking on application from cluster view should (sunilg: rev 796b2b0ee36e8e9225fb76ae35edc58ad907b737) * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-ui/src/main/webapp/app/utils/href-address-utils.js > [UI2] Clicking on application from cluster view should redirect to > application attempt page > --- > > Key: YARN-8266 > URL: https://issues.apache.org/jira/browse/YARN-8266 > Project: Hadoop YARN > Issue Type: Bug > Components: yarn-ui-v2 >Affects Versions: 3.1.0 >Reporter: Yesha Vora >Assignee: Yesha Vora >Priority: Major > Fix For: 3.2.0, 3.1.1 > > Attachments: YARN-8266.001.patch > > > Steps: > 1) Start one application > 2) Go to the cluster overview page > 3) Click on an applicationId under Cluster Resource Usage By Application > This action redirects to the > [http://xxx:8088/ui2/#/yarn-app/application_1525740862939_0005] URL. This is > an invalid URL and does not show any details. > It should instead redirect to the attempt page. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-8289) Modify distributedshell to support Node Attributes
[ https://issues.apache.org/jira/browse/YARN-8289?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16475361#comment-16475361 ] Naganarasimha G R commented on YARN-8289: - [~sunil.gov...@gmail.com], Given that you are already working on the scheduler patch and have been testing it with the DS shell, would you like to take this Jira up? > Modify distributedshell to support Node Attributes > -- > > Key: YARN-8289 > URL: https://issues.apache.org/jira/browse/YARN-8289 > Project: Hadoop YARN > Issue Type: Sub-task > Components: distributed-shell >Affects Versions: YARN-3409 >Reporter: Naganarasimha G R >Assignee: Naganarasimha G R >Priority: Major > > Modifications required in Distributed shell to support NodeAttributes -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-8236) Invalid kerberos principal file name causes NPE in native service
[ https://issues.apache.org/jira/browse/YARN-8236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16475358#comment-16475358 ] Sunil G commented on YARN-8236: --- +1. Committing shortly. > Invalid kerberos principal file name causes NPE in native service > > > Key: YARN-8236 > URL: https://issues.apache.org/jira/browse/YARN-8236 > Project: Hadoop YARN > Issue Type: Bug > Components: yarn-native-services >Affects Versions: 3.1.0 >Reporter: Sunil G >Assignee: Gour Saha >Priority: Critical > Attachments: YARN-8236.01.patch, YARN-8236.02.patch > > > Stack trace >
> {code:java}
> 2018-04-29 16:22:54,266 WARN webapp.GenericExceptionHandler (GenericExceptionHandler.java:toResponse(98)) - INTERNAL_SERVER_ERROR
> java.lang.NullPointerException
> at org.apache.hadoop.yarn.service.client.ServiceClient.addKeytabResourceIfSecure(ServiceClient.java:994)
> at org.apache.hadoop.yarn.service.client.ServiceClient.submitApp(ServiceClient.java:685)
> at org.apache.hadoop.yarn.service.client.ServiceClient.actionCreate(ServiceClient.java:269){code}
> cc [~gsaha] [~csingh] -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
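[Editor's note] The trace above points at {{ServiceClient#addKeytabResourceIfSecure}} dereferencing the keytab location without validating it first. As a hedged illustration of the kind of guard that turns the NPE into an actionable error (a minimal sketch with a hypothetical helper name, not the committed YARN-8236 patch):

{code:java}
import java.net.URI;
import java.net.URISyntaxException;
import org.apache.hadoop.yarn.exceptions.YarnException;

// Hypothetical guard: validate the principal's keytab URI up front so a
// missing or malformed value fails loudly instead of as an NPE at submit time.
static URI validateKeytabURI(String keytab) throws YarnException {
  if (keytab == null || keytab.isEmpty()) {
    throw new YarnException("No keytab specified for the Kerberos principal");
  }
  try {
    URI keytabURI = new URI(keytab);
    if (keytabURI.getScheme() == null) {
      throw new YarnException("Keytab URI '" + keytab
          + "' has no scheme (expected e.g. hdfs:// or file://)");
    }
    return keytabURI;
  } catch (URISyntaxException e) {
    throw new YarnException("Malformed keytab URI: " + keytab, e);
  }
}
{code}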
[jira] [Commented] (YARN-8278) DistributedScheduling not working in HA
[ https://issues.apache.org/jira/browse/YARN-8278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16475349#comment-16475349 ] Bibin A Chundatt commented on YARN-8278: [~cheersyang] Thank you for the review. Attached a patch addressing the comments. > DistributedScheduling not working in HA > --- > > Key: YARN-8278 > URL: https://issues.apache.org/jira/browse/YARN-8278 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 3.0.0, 3.1.0 >Reporter: Bibin A Chundatt >Assignee: Bibin A Chundatt >Priority: Blocker > Attachments: YARN-8278.001.patch, YARN-8278.002.patch > > > Configured an HA cluster and submitted an application with distributed > scheduling enabled
> {code}
> Caused by: org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.yarn.exceptions.YarnRuntimeException): java.lang.IllegalArgumentException: ResourceManager does not support this protocol
> at org.apache.hadoop.yarn.server.nodemanager.amrmproxy.DefaultRequestInterceptor.init(DefaultRequestInterceptor.java:91)
> at org.apache.hadoop.yarn.server.nodemanager.amrmproxy.AbstractRequestInterceptor.init(AbstractRequestInterceptor.java:82)
> at org.apache.hadoop.yarn.server.nodemanager.scheduler.DistributedScheduler.init(DistributedScheduler.java:89)
> at org.apache.hadoop.yarn.server.nodemanager.amrmproxy.AMRMProxyService.initializePipeline(AMRMProxyService.java:450)
> at org.apache.hadoop.yarn.server.nodemanager.amrmproxy.AMRMProxyService.processApplicationStartRequest(AMRMProxyService.java:369)
> at org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl.startContainers(ContainerManagerImpl.java:942)
> at org.apache.hadoop.yarn.api.impl.pb.service.ContainerManagementProtocolPBServiceImpl.startContainers(ContainerManagementProtocolPBServiceImpl.java:101)
> at org.apache.hadoop.yarn.proto.ContainerManagementProtocol$ContainerManagementProtocolService$2.callBlockingMethod(ContainerManagementProtocol.java:223)
> at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:523)
> at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:991)
> at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:869)
> at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:815)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:422)
> at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1682)
> at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2675)
> {code}
> {{ServerRMProxy#checkAllowedProtocols}} should allow {{DistributedSchedulingAMProtocol}}:
> {code}
> public void checkAllowedProtocols(Class<?> protocol) {
>   Preconditions.checkArgument(
>       protocol.isAssignableFrom(ResourceTracker.class),
>       "ResourceManager does not support this protocol");
> }
> {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-8278) DistributedScheduling not working in HA
[ https://issues.apache.org/jira/browse/YARN-8278?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bibin A Chundatt updated YARN-8278: --- Attachment: YARN-8278.002.patch > DistributedScheduling not working in HA > --- > > Key: YARN-8278 > URL: https://issues.apache.org/jira/browse/YARN-8278 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 3.0.0, 3.1.0 >Reporter: Bibin A Chundatt >Assignee: Bibin A Chundatt >Priority: Blocker > Attachments: YARN-8278.001.patch, YARN-8278.002.patch > > > Configured an HA cluster and submitted an application with distributed > scheduling enabled
> {code}
> Caused by: org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.yarn.exceptions.YarnRuntimeException): java.lang.IllegalArgumentException: ResourceManager does not support this protocol
> at org.apache.hadoop.yarn.server.nodemanager.amrmproxy.DefaultRequestInterceptor.init(DefaultRequestInterceptor.java:91)
> at org.apache.hadoop.yarn.server.nodemanager.amrmproxy.AbstractRequestInterceptor.init(AbstractRequestInterceptor.java:82)
> at org.apache.hadoop.yarn.server.nodemanager.scheduler.DistributedScheduler.init(DistributedScheduler.java:89)
> at org.apache.hadoop.yarn.server.nodemanager.amrmproxy.AMRMProxyService.initializePipeline(AMRMProxyService.java:450)
> at org.apache.hadoop.yarn.server.nodemanager.amrmproxy.AMRMProxyService.processApplicationStartRequest(AMRMProxyService.java:369)
> at org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl.startContainers(ContainerManagerImpl.java:942)
> at org.apache.hadoop.yarn.api.impl.pb.service.ContainerManagementProtocolPBServiceImpl.startContainers(ContainerManagementProtocolPBServiceImpl.java:101)
> at org.apache.hadoop.yarn.proto.ContainerManagementProtocol$ContainerManagementProtocolService$2.callBlockingMethod(ContainerManagementProtocol.java:223)
> at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:523)
> at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:991)
> at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:869)
> at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:815)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:422)
> at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1682)
> at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2675)
> {code}
> {{ServerRMProxy#checkAllowedProtocols}} should allow {{DistributedSchedulingAMProtocol}}:
> {code}
> public void checkAllowedProtocols(Class<?> protocol) {
>   Preconditions.checkArgument(
>       protocol.isAssignableFrom(ResourceTracker.class),
>       "ResourceManager does not support this protocol");
> }
> {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-8166) [UI2] Service page header links are broken
[ https://issues.apache.org/jira/browse/YARN-8166?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sunil G updated YARN-8166: -- Summary: [UI2] Service page header links are broken (was: Service AppId page throws HTTP Error 401) > [UI2] Service page header links are broken > -- > > Key: YARN-8166 > URL: https://issues.apache.org/jira/browse/YARN-8166 > Project: Hadoop YARN > Issue Type: Bug > Components: yarn-ui-v2 >Reporter: Yesha Vora >Assignee: Yesha Vora >Priority: Major > Attachments: YARN-8166.001.patch > > > Steps: > 1) Launch a YARN service in an unsecured cluster > 2) Go to the component info page for sleeper-0 > 3) Click on the sleeper link > http://xxx:8088/ui2/#/yarn-component-instances/sleeper/components?service=yesha-sleeper&&appid=application_1518804855867_0002 > The above URL fails with HTTP Error 401
> {code}
> 401, Authorization required.
> Please check your security settings.
> {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-8130) Race condition when container events are published for KILLED applications
[ https://issues.apache.org/jira/browse/YARN-8130?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16475340#comment-16475340 ] Rohith Sharma K S commented on YARN-8130: - thanks to [~haibochen] and [~vrushalic] for the review. I back-ported to branch-3.1/branch-3.0/branch-2 as well. > Race condition when container events are published for KILLED applications > -- > > Key: YARN-8130 > URL: https://issues.apache.org/jira/browse/YARN-8130 > Project: Hadoop YARN > Issue Type: Sub-task > Components: ATSv2 >Reporter: Charan Hebri >Assignee: Rohith Sharma K S >Priority: Major > Fix For: 2.10.0, 3.2.0, 3.1.1, 3.0.3 > > Attachments: YARN-8130.01.patch, YARN-8130.02.patch, > YARN-8130.03.patch > > > There seems to be a race condition happening when an application is KILLED > and the corresponding container event information is being published. For > completed containers, a YARN_CONTAINER_FINISHED event is generated but for > some containers in a KILLED application this information is missing. Below is > a node manager log snippet, > {code:java} > 2018-04-09 08:44:54,474 INFO shuffle.ExternalShuffleBlockResolver > (ExternalShuffleBlockResolver.java:applicationRemoved(186)) - Application > application_1523259757659_0003 removed, cleanupLocalDirs = false > 2018-04-09 08:44:54,478 INFO application.ApplicationImpl > (ApplicationImpl.java:handle(632)) - Application > application_1523259757659_0003 transitioned from > APPLICATION_RESOURCES_CLEANINGUP to FINISHED > 2018-04-09 08:44:54,478 ERROR timelineservice.NMTimelinePublisher > (NMTimelinePublisher.java:putEntity(298)) - Seems like client has been > removed before the entity could be published for > TimelineEntity[type='YARN_CONTAINER', > id='container_1523259757659_0003_01_02'] > 2018-04-09 08:44:54,478 INFO logaggregation.AppLogAggregatorImpl > (AppLogAggregatorImpl.java:finishLogAggregation(520)) - Application just > finished : application_1523259757659_0003 > 2018-04-09 08:44:54,488 INFO logaggregation.AppLogAggregatorImpl > (AppLogAggregatorImpl.java:doContainerLogAggregation(576)) - Uploading logs > for container container_1523259757659_0003_01_01. Current good log dirs > are /grid/0/hadoop/yarn/log > 2018-04-09 08:44:54,492 INFO logaggregation.AppLogAggregatorImpl > (AppLogAggregatorImpl.java:doContainerLogAggregation(576)) - Uploading logs > for container container_1523259757659_0003_01_02. Current good log dirs > are /grid/0/hadoop/yarn/log > 2018-04-09 08:44:55,470 INFO collector.TimelineCollectorManager > (TimelineCollectorManager.java:remove(192)) - The collector service for > application_1523259757659_0003 was removed > 2018-04-09 08:44:55,472 INFO containermanager.ContainerManagerImpl > (ContainerManagerImpl.java:handle(1572)) - couldn't find application > application_1523259757659_0003 while processing FINISH_APPS event. The > ResourceManager allocated resources for this application to the NodeManager > but no active containers were found to process{code} > The container id specified in the log, > *container_1523259757659_0003_01_02* is the one that has the finished > event missing. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-8130) Race condition when container events are published for KILLED applications
[ https://issues.apache.org/jira/browse/YARN-8130?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rohith Sharma K S updated YARN-8130: Fix Version/s: 3.0.3 3.1.1 2.10.0 > Race condition when container events are published for KILLED applications > -- > > Key: YARN-8130 > URL: https://issues.apache.org/jira/browse/YARN-8130 > Project: Hadoop YARN > Issue Type: Sub-task > Components: ATSv2 >Reporter: Charan Hebri >Assignee: Rohith Sharma K S >Priority: Major > Fix For: 2.10.0, 3.2.0, 3.1.1, 3.0.3 > > Attachments: YARN-8130.01.patch, YARN-8130.02.patch, > YARN-8130.03.patch > > > There seems to be a race condition happening when an application is KILLED > and the corresponding container event information is being published. For > completed containers, a YARN_CONTAINER_FINISHED event is generated but for > some containers in a KILLED application this information is missing. Below is > a node manager log snippet, > {code:java} > 2018-04-09 08:44:54,474 INFO shuffle.ExternalShuffleBlockResolver > (ExternalShuffleBlockResolver.java:applicationRemoved(186)) - Application > application_1523259757659_0003 removed, cleanupLocalDirs = false > 2018-04-09 08:44:54,478 INFO application.ApplicationImpl > (ApplicationImpl.java:handle(632)) - Application > application_1523259757659_0003 transitioned from > APPLICATION_RESOURCES_CLEANINGUP to FINISHED > 2018-04-09 08:44:54,478 ERROR timelineservice.NMTimelinePublisher > (NMTimelinePublisher.java:putEntity(298)) - Seems like client has been > removed before the entity could be published for > TimelineEntity[type='YARN_CONTAINER', > id='container_1523259757659_0003_01_02'] > 2018-04-09 08:44:54,478 INFO logaggregation.AppLogAggregatorImpl > (AppLogAggregatorImpl.java:finishLogAggregation(520)) - Application just > finished : application_1523259757659_0003 > 2018-04-09 08:44:54,488 INFO logaggregation.AppLogAggregatorImpl > (AppLogAggregatorImpl.java:doContainerLogAggregation(576)) - Uploading logs > for container container_1523259757659_0003_01_01. Current good log dirs > are /grid/0/hadoop/yarn/log > 2018-04-09 08:44:54,492 INFO logaggregation.AppLogAggregatorImpl > (AppLogAggregatorImpl.java:doContainerLogAggregation(576)) - Uploading logs > for container container_1523259757659_0003_01_02. Current good log dirs > are /grid/0/hadoop/yarn/log > 2018-04-09 08:44:55,470 INFO collector.TimelineCollectorManager > (TimelineCollectorManager.java:remove(192)) - The collector service for > application_1523259757659_0003 was removed > 2018-04-09 08:44:55,472 INFO containermanager.ContainerManagerImpl > (ContainerManagerImpl.java:handle(1572)) - couldn't find application > application_1523259757659_0003 while processing FINISH_APPS event. The > ResourceManager allocated resources for this application to the NodeManager > but no active containers were found to process{code} > The container id specified in the log, > *container_1523259757659_0003_01_02* is the one that has the finished > event missing. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-8266) [UI2] Clicking on application from cluster view should redirect to application attempt page
[ https://issues.apache.org/jira/browse/YARN-8266?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sunil G updated YARN-8266: -- Summary: [UI2] Clicking on application from cluster view should redirect to application attempt page (was: Clicking on application from cluster view should redirect to application attempt page) > [UI2] Clicking on application from cluster view should redirect to > application attempt page > --- > > Key: YARN-8266 > URL: https://issues.apache.org/jira/browse/YARN-8266 > Project: Hadoop YARN > Issue Type: Bug > Components: yarn-ui-v2 >Affects Versions: 3.1.0 >Reporter: Yesha Vora >Assignee: Yesha Vora >Priority: Major > Attachments: YARN-8266.001.patch > > > Steps: > 1) Start one application > 2) Go to the cluster overview page > 3) Click on an applicationId under Cluster Resource Usage By Application > This action redirects to the > [http://xxx:8088/ui2/#/yarn-app/application_1525740862939_0005] URL. This is > an invalid URL and does not show any details. > It should instead redirect to the attempt page. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Created] (YARN-8292) Preemption of GPU resource does not happen if memory/vcores is not required to be preempted
Sumana Sathish created YARN-8292: Summary: Preemption of GPU resource does not happen if memory/vcores is not required to be preempted Key: YARN-8292 URL: https://issues.apache.org/jira/browse/YARN-8292 Project: Hadoop YARN Issue Type: Bug Components: yarn Reporter: Sumana Sathish Assignee: Tan, Wangda -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-8278) DistributedScheduling not working in HA
[ https://issues.apache.org/jira/browse/YARN-8278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16475303#comment-16475303 ] Weiwei Yang commented on YARN-8278: --- Hi [~bibinchundatt] Thanks for the patch; please see my comments below. 1. ServerRMProxy is used by NMs to talk to the RM, and since the LocalRM running on NMs needs to talk to the RM through {{DistributedSchedulingAMProtocol}}, it makes sense to allow this protocol in the proxy. However, can we follow the approach used by the {{ClientRMProxy}} class for this check, by adding a {{ServerRMProtocols}} interface modeled on the following {{ClientRMProtocols}}:
{code:java}
private interface ClientRMProtocols
    extends ApplicationClientProtocol, ApplicationMasterProtocol,
    ResourceManagerAdministrationProtocol {
  // Add nothing
}

Preconditions.checkArgument(protocol.isAssignableFrom(ClientRMProtocols.class),
    "RM does not support this client protocol");
{code}
2. {{TestServerRMProxy}}, line 31: the throws IOException can be removed; line 33: defaultRMAddress can be removed. 3. Can you please fix the checkstyle issue? Thanks > DistributedScheduling not working in HA > --- > > Key: YARN-8278 > URL: https://issues.apache.org/jira/browse/YARN-8278 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 3.0.0, 3.1.0 >Reporter: Bibin A Chundatt >Assignee: Bibin A Chundatt >Priority: Blocker > Attachments: YARN-8278.001.patch > > > Configured an HA cluster and submitted an application with distributed > scheduling enabled
> {code}
> Caused by: org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.yarn.exceptions.YarnRuntimeException): java.lang.IllegalArgumentException: ResourceManager does not support this protocol
> at org.apache.hadoop.yarn.server.nodemanager.amrmproxy.DefaultRequestInterceptor.init(DefaultRequestInterceptor.java:91)
> at org.apache.hadoop.yarn.server.nodemanager.amrmproxy.AbstractRequestInterceptor.init(AbstractRequestInterceptor.java:82)
> at org.apache.hadoop.yarn.server.nodemanager.scheduler.DistributedScheduler.init(DistributedScheduler.java:89)
> at org.apache.hadoop.yarn.server.nodemanager.amrmproxy.AMRMProxyService.initializePipeline(AMRMProxyService.java:450)
> at org.apache.hadoop.yarn.server.nodemanager.amrmproxy.AMRMProxyService.processApplicationStartRequest(AMRMProxyService.java:369)
> at org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl.startContainers(ContainerManagerImpl.java:942)
> at org.apache.hadoop.yarn.api.impl.pb.service.ContainerManagementProtocolPBServiceImpl.startContainers(ContainerManagementProtocolPBServiceImpl.java:101)
> at org.apache.hadoop.yarn.proto.ContainerManagementProtocol$ContainerManagementProtocolService$2.callBlockingMethod(ContainerManagementProtocol.java:223)
> at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:523)
> at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:991)
> at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:869)
> at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:815)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:422)
> at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1682)
> at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2675)
> {code}
> {{ServerRMProxy#checkAllowedProtocols}} should allow {{DistributedSchedulingAMProtocol}}:
> {code}
> public void checkAllowedProtocols(Class<?> protocol) {
>   Preconditions.checkArgument(
>       protocol.isAssignableFrom(ResourceTracker.class),
>       "ResourceManager does not support this protocol");
> }
> {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
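[Editor's note] Putting Weiwei's suggestion together with the description, the shape of the fix would be a marker interface inside {{ServerRMProxy}} that unions every protocol the RM serves to NMs. The following is a hedged sketch under that assumption; the interface and method names in the committed patch may differ:

{code:java}
import com.google.common.base.Preconditions;
import org.apache.hadoop.yarn.server.api.DistributedSchedulingAMProtocol;
import org.apache.hadoop.yarn.server.api.ResourceTracker;

// Marker interface mirroring ClientRMProxy's ClientRMProtocols: it extends
// every protocol ServerRMProxy should accept, so the assignability check
// below passes for ResourceTracker and DistributedSchedulingAMProtocol alike.
private interface ServerRMProtocols
    extends ResourceTracker, DistributedSchedulingAMProtocol {
  // Add nothing
}

@Override
protected void checkAllowedProtocols(Class<?> protocol) {
  Preconditions.checkArgument(
      protocol.isAssignableFrom(ServerRMProtocols.class),
      "ResourceManager does not support this protocol");
}
{code}

For any protocol that {{ServerRMProtocols}} extends, {{protocol.isAssignableFrom(ServerRMProtocols.class)}} is true, so the distributed-scheduling interceptor on the HA proxy path no longer trips the precondition.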
[jira] [Created] (YARN-8291) RMRegistryOperationService doesn't have a limit on AsyncPurge threads
Prabhu Joseph created YARN-8291: --- Summary: RMRegistryOperationService doesn't have a limit on AsyncPurge threads Key: YARN-8291 URL: https://issues.apache.org/jira/browse/YARN-8291 Project: Hadoop YARN Issue Type: Bug Components: yarn Affects Versions: 2.7.3 Reporter: Prabhu Joseph When a large number of containers finish, RMRegistryOperationService creates a correspondingly large number of threads to perform the AsyncPurge, which can slow down the ResourceManager process. There should be a limit on the number of threads.
{code}
"RegistryAdminService 554485" #824351 prio=5 os_prio=0 tid=0x7fe4b2bc9800 nid=0xf8ed in Object.wait() [0x7fe31a5e4000]
java.lang.Thread.State: WAITING (on object monitor)
at java.lang.Object.wait(Native Method)
at java.lang.Object.wait(Object.java:502)
at org.apache.zookeeper.ClientCnxn.submitRequest(ClientCnxn.java:1386)
- locked <0x0007902ec7d8> (a org.apache.zookeeper.ClientCnxn$Packet)
at org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:1040)
at org.apache.curator.framework.imps.ExistsBuilderImpl$2.call(ExistsBuilderImpl.java:172)
at org.apache.curator.framework.imps.ExistsBuilderImpl$2.call(ExistsBuilderImpl.java:161)
at org.apache.curator.RetryLoop.callWithRetry(RetryLoop.java:107)
at org.apache.curator.framework.imps.ExistsBuilderImpl.pathInForeground(ExistsBuilderImpl.java:158)
at org.apache.curator.framework.imps.ExistsBuilderImpl.forPath(ExistsBuilderImpl.java:148)
at org.apache.curator.framework.imps.ExistsBuilderImpl.forPath(ExistsBuilderImpl.java:36)
at org.apache.hadoop.registry.client.impl.zk.CuratorService.zkStat(CuratorService.java:455)
at org.apache.hadoop.registry.client.impl.zk.RegistryOperationsService.stat(RegistryOperationsService.java:137)
at org.apache.hadoop.registry.client.binding.RegistryUtils.statChildren(RegistryUtils.java:210)
at org.apache.hadoop.registry.server.services.RegistryAdminService.purge(RegistryAdminService.java:450)
at org.apache.hadoop.registry.server.services.RegistryAdminService.purge(RegistryAdminService.java:520)
at org.apache.hadoop.registry.server.services.RegistryAdminService$AsyncPurge.call(RegistryAdminService.java:570)
at org.apache.hadoop.registry.server.services.RegistryAdminService$AsyncPurge.call(RegistryAdminService.java:543)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
{code} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
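[Editor's note] The standard remedy is a bounded executor, so a burst of container-finished events queues purge work instead of spawning a thread per purge. A minimal sketch, assuming the purge tasks are the {{AsyncPurge}} callables visible in the dump above; the pool size, thread-name format, and wiring are illustrative, not an actual patch:

{code:java}
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import com.google.common.util.concurrent.ThreadFactoryBuilder;

// Hypothetical bounded pool for registry purges: at most maxPurgeThreads run
// concurrently; any additional AsyncPurge tasks wait in the executor's queue.
int maxPurgeThreads = 8; // illustrative; in practice read from configuration
ExecutorService purgeExecutor = Executors.newFixedThreadPool(
    maxPurgeThreads,
    new ThreadFactoryBuilder()
        .setNameFormat("RegistryAdminService-purge-%d")
        .setDaemon(true)
        .build());

// Purge submissions would then go through the shared pool instead of a new
// thread each time, e.g. (hypothetical constructor):
// purgeExecutor.submit(new AsyncPurge(path));
{code}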
[jira] [Commented] (YARN-4353) Provide short circuit user group mapping for NM/AM
[ https://issues.apache.org/jira/browse/YARN-4353?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16475269#comment-16475269 ] Wilfred Spiegelenburg commented on YARN-4353: - [~templedf] it has been really quiet on this Jira for a long time. I have recently started to run into a similar issue as described here. I dug a bit deeper and found that the group lookup is exercised mainly by the ACL checks. This is taken from an NM log:
{code}
2018-03-09 19:14:50,881 DEBUG org.apache.hadoop.yarn.webapp.View: Rendering class org.apache.hadoop.yarn.server.nodemanager.webapp.ContainerLogsPage$ContainersLogsBlock @5
2018-03-09 19:14:50,882 DEBUG org.apache.hadoop.yarn.server.security.ApplicationACLsManager: Verifying access-type VIEW_APP for wilfred (auth:SIMPLE) on application application_1520622831944_0001 owned by systest
2018-03-09 19:14:50,888 DEBUG org.mortbay.log: loaded class com.sun.jndi.ldap.LdapCtxFactory from null
...
2018-03-09 19:14:51,163 WARN org.apache.hadoop.security.LdapGroupsMapping: Exception trying to get groups for user wilfred: [LDAP: error code 1 - 04DC: LdapErr: DSID-0C09075A, comment: In order to perform this operation a successful bind must be completed on the connection., data 0, v1db1^@]
2018-03-09 19:14:51,164 WARN org.apache.hadoop.security.UserGroupInformation: No groups available for user wilfred
{code}
Group resolution is triggered when you set an ACL that lists groups as allowed. The lookup happens when the user requesting access is not the application owner, not an admin, and not directly allowed access as a user. Using the {{NullGroupMapping}} would break ACLs. The other proposed solution, passing the resolved groups to the AM, is also not scalable: with thousands of users in the LDAP server and hundreds of groups, it would add a large overhead to the NM and then to the AM. You would also get into trouble with long-running applications; the group data would become stale and thus cause a security issue. The AM also uses group resolution for the RPC protocol ACLs if you have those configured, so again a {{NullGroupMapping}} would break ACLs there too. I propose closing this as Won't Fix. If you want to use {{LdapGroupsMapping}}, you need to set the configuration up correctly. > Provide short circuit user group mapping for NM/AM > -- > > Key: YARN-4353 > URL: https://issues.apache.org/jira/browse/YARN-4353 > Project: Hadoop YARN > Issue Type: Improvement > Components: nodemanager >Affects Versions: 2.7.1 >Reporter: Daniel Templeton >Assignee: Daniel Templeton >Priority: Major > Attachments: YARN-4353.prelim.patch > > > When the NM launches an AM, the {{ContainerLocalizer}} gets the current user > from {{UserGroupInformation}}, which triggers user group mapping, even though > the user groups are never accessed. If secure LDAP is configured for group > mapping, then there are some additional complications created by the > unnecessary group resolution. Additionally, it adds unnecessary latency to > the container launch time. > To address the issue, before getting the current user, the > {{ContainerLocalizer}} should configure {{UserGroupInformation}} with a null > group mapping service that quickly and quietly returns an empty group list > for all users. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
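[Editor's note] For context, the null mapping the issue description proposes (and which the comment above argues would break ACL checks) amounts to a trivial {{GroupMappingServiceProvider}}. A sketch of that shape, for illustration only:

{code:java}
import java.util.Collections;
import java.util.List;
import org.apache.hadoop.security.GroupMappingServiceProvider;

// Sketch of the proposed short-circuit mapping: every user resolves to an
// empty group list, so no LDAP (or other) lookup is ever performed.
public class NullGroupsMapping implements GroupMappingServiceProvider {
  @Override
  public List<String> getGroups(String user) {
    return Collections.emptyList();
  }

  @Override
  public void cacheGroupsRefresh() {
    // Nothing cached, nothing to refresh.
  }

  @Override
  public void cacheGroupsAdd(List<String> groups) {
    // Nothing cached, nothing to add.
  }
}
{code}

As the comment notes, wiring this into the NM or AM would make every group-based ACL entry evaluate to "no match", which is why the reviewer recommends closing the issue instead.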
[jira] [Commented] (YARN-8289) Modify distributedshell to support Node Attributes
[ https://issues.apache.org/jira/browse/YARN-8289?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16475264#comment-16475264 ] Naganarasimha G R commented on YARN-8289: - Hi [~cheersyang], I have not completely thought it through, but I feel this is required for the demo and is similar to your YARN-7745. I would start once at least a WIP of YARN-7863 is available. > Modify distributedshell to support Node Attributes > -- > > Key: YARN-8289 > URL: https://issues.apache.org/jira/browse/YARN-8289 > Project: Hadoop YARN > Issue Type: Sub-task > Components: distributed-shell >Affects Versions: YARN-3409 >Reporter: Naganarasimha G R >Assignee: Naganarasimha G R >Priority: Major > > Modifications required in Distributed shell to support NodeAttributes -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-8080) YARN native service should support component restart policy
[ https://issues.apache.org/jira/browse/YARN-8080?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16475254#comment-16475254 ] genericqa commented on YARN-8080: -
| (/) *{color:green}+1 overall{color}* |
\\ \\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 30s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 5 new or modified test files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 32s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 23m 7s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 8m 3s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 4s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 10s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 10m 54s{color} | {color:green} branch has no errors when building and testing our client artifacts. {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue} 0m 0s{color} | {color:blue} Skipped patched modules with no Java source: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 9s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 49s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 10s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 1m 1s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 7m 2s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 7m 2s{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange} 1m 5s{color} | {color:orange} hadoop-yarn-project/hadoop-yarn: The patch generated 74 new + 121 unchanged - 3 fixed = 195 total (was 124) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 11s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 9m 27s{color} | {color:green} patch has no errors when building and testing our client artifacts. {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue} 0m 0s{color} | {color:blue} Skipped patched modules with no Java source: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 23s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 51s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 10m 59s{color} | {color:green} hadoop-yarn-services-core in the patch passed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 31s{color} | {color:green} hadoop-yarn-services-api in the patch passed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 16s{color} | {color:green} hadoop-yarn-site in the patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 29s{color} | {color:green} The patch does not generate ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black} 81m 4s{color} | {color:black} {color} |
\\ \\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:abb62dd |
| JIRA Issue | YARN-8080 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attac
[jira] [Commented] (YARN-8289) Modify distributedshell to support Node Attributes
[ https://issues.apache.org/jira/browse/YARN-8289?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16475244#comment-16475244 ] Weiwei Yang commented on YARN-8289: --- Hi [~Naganarasimha] Thanks for creating this JIRA. How do you plan to support this? Will this be specified as part of the {{-placement_spec}} argument? > Modify distributedshell to support Node Attributes > -- > > Key: YARN-8289 > URL: https://issues.apache.org/jira/browse/YARN-8289 > Project: Hadoop YARN > Issue Type: Sub-task > Components: distributed-shell >Affects Versions: YARN-3409 >Reporter: Naganarasimha G R >Assignee: Naganarasimha G R >Priority: Major > > Modifications required in Distributed shell to support NodeAttributes -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-8080) YARN native service should support component restart policy
[ https://issues.apache.org/jira/browse/YARN-8080?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16475243#comment-16475243 ] genericqa commented on YARN-8080: -
| (x) *{color:red}-1 overall{color}* |
\\ \\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 35s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 5 new or modified test files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 1m 6s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 26m 47s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 9m 35s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 24s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 36s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 13m 27s{color} | {color:green} branch has no errors when building and testing our client artifacts. {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue} 0m 0s{color} | {color:blue} Skipped patched modules with no Java source: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 23s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 10s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 12s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 1m 5s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 7m 49s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 7m 49s{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange} 1m 22s{color} | {color:orange} hadoop-yarn-project/hadoop-yarn: The patch generated 70 new + 121 unchanged - 2 fixed = 191 total (was 123) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 27s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 11m 11s{color} | {color:green} patch has no errors when building and testing our client artifacts. {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue} 0m 0s{color} | {color:blue} Skipped patched modules with no Java source: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 42s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 3s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 10m 50s{color} | {color:red} hadoop-yarn-services-core in the patch failed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 37s{color} | {color:green} hadoop-yarn-services-api in the patch passed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 20s{color} | {color:green} hadoop-yarn-site in the patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 36s{color} | {color:green} The patch does not generate ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black} 94m 15s{color} | {color:black} {color} |
\\ \\
|| Reason || Tests ||
| Failed junit tests | hadoop.yarn.service.component.TestComponent |
\\ \\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:abb62dd |
| JI
[jira] [Updated] (YARN-8288) Fix wrong number of table columns in Resource Model doc
[ https://issues.apache.org/jira/browse/YARN-8288?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Weiwei Yang updated YARN-8288: -- Fix Version/s: 3.2.0 > Fix wrong number of table columns in Resource Model doc > --- > > Key: YARN-8288 > URL: https://issues.apache.org/jira/browse/YARN-8288 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Weiwei Yang >Assignee: Weiwei Yang >Priority: Major > Fix For: 3.2.0, 3.1.1, 3.0.3 > > Attachments: YARN-8288.001.patch, after.jpg, before.jpg > > > In the Resource Model doc, the resource-types.xml and node-resource.xml description > tables have the wrong number of columns defined; see > [https://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-site/ResourceModel.html] -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-8288) Fix wrong number of table columns in Resource Model doc
[ https://issues.apache.org/jira/browse/YARN-8288?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16475239#comment-16475239 ] Weiwei Yang commented on YARN-8288: --- Thanks [~Naganarasimha]! > Fix wrong number of table columns in Resource Model doc > --- > > Key: YARN-8288 > URL: https://issues.apache.org/jira/browse/YARN-8288 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Weiwei Yang >Assignee: Weiwei Yang >Priority: Major > Fix For: 3.2.0, 3.1.1, 3.0.3 > > Attachments: YARN-8288.001.patch, after.jpg, before.jpg > > > In the Resource Model doc, the resource-types.xml and node-resource.xml description > tables have the wrong number of columns defined; see > [https://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-site/ResourceModel.html] -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-8080) YARN native service should support component restart policy
[ https://issues.apache.org/jira/browse/YARN-8080?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Suma Shivaprasad updated YARN-8080: --- Attachment: YARN-8080.014.patch > YARN native service should support component restart policy > --- > > Key: YARN-8080 > URL: https://issues.apache.org/jira/browse/YARN-8080 > Project: Hadoop YARN > Issue Type: Task >Reporter: Wangda Tan >Assignee: Suma Shivaprasad >Priority: Critical > Attachments: YARN-8080.001.patch, YARN-8080.002.patch, > YARN-8080.003.patch, YARN-8080.005.patch, YARN-8080.006.patch, > YARN-8080.007.patch, YARN-8080.009.patch, YARN-8080.010.patch, > YARN-8080.011.patch, YARN-8080.012.patch, YARN-8080.013.patch, > YARN-8080.014.patch > > > The existing native service assumes the service is long running and never > finishes. Containers will be restarted even if exit code == 0. > To support broader use cases, we need to allow the restart policy of a component > to be specified by users. Propose to have the following policies: > 1) Always: containers are always restarted by the framework regardless of container > exit status. This is the existing/default behavior. > 2) Never: Do not restart containers in any case after a container finishes. To > support job-like workloads (for example a Tensorflow training job): if a task > exits with code == 0, we should not restart the task. This can be used by > services which are not restart/recovery-able. > 3) On-failure: Similar to above, only restart tasks with exitcode != 0. > Behaviors after a component *instance* finalizes (Succeeded or Failed when > restart_policy != ALWAYS): > 1) For a single component, single instance: complete the service. > 2) For a single component, multiple instances: other running instances from the > same component won't be affected by the finalized component instance. The service > will be terminated once all instances finalize. > 3) For multiple components: the service will be terminated once all components > finalize. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Comment Edited] (YARN-8080) YARN native service should support component restart policy
[ https://issues.apache.org/jira/browse/YARN-8080?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16475188#comment-16475188 ] Suma Shivaprasad edited comment on YARN-8080 at 5/15/18 2:09 AM: - Attached a patch with fixes to: 1. Request containers according to the restart policy, not every time containers exit as was happening earlier. 2. checkAndUpdateServiceState was not getting called for terminating components, which caused the service state to always be STARTED instead of STABLE. 3. Fixed a failing UT. was (Author: suma.shivaprasad): Attached a patch with fixes to: 1. Request containers according to the restart policy, not every time containers exit as was happening earlier. 2. checkAndUpdateServiceState was not getting called for terminating components, which caused the service state to always be STARTED instead of STABLE. > YARN native service should support component restart policy > --- > > Key: YARN-8080 > URL: https://issues.apache.org/jira/browse/YARN-8080 > Project: Hadoop YARN > Issue Type: Task >Reporter: Wangda Tan >Assignee: Suma Shivaprasad >Priority: Critical > Attachments: YARN-8080.001.patch, YARN-8080.002.patch, > YARN-8080.003.patch, YARN-8080.005.patch, YARN-8080.006.patch, > YARN-8080.007.patch, YARN-8080.009.patch, YARN-8080.010.patch, > YARN-8080.011.patch, YARN-8080.012.patch, YARN-8080.013.patch, > YARN-8080.014.patch > > > The existing native service assumes the service is long running and never > finishes. Containers will be restarted even if exit code == 0. > To support broader use cases, we need to allow the restart policy of a component > to be specified by users. Propose to have the following policies: > 1) Always: containers are always restarted by the framework regardless of container > exit status. This is the existing/default behavior. > 2) Never: Do not restart containers in any case after a container finishes. To > support job-like workloads (for example a Tensorflow training job): if a task > exits with code == 0, we should not restart the task. This can be used by > services which are not restart/recovery-able. > 3) On-failure: Similar to above, only restart tasks with exitcode != 0. > Behaviors after a component *instance* finalizes (Succeeded or Failed when > restart_policy != ALWAYS): > 1) For a single component, single instance: complete the service. > 2) For a single component, multiple instances: other running instances from the > same component won't be affected by the finalized component instance. The service > will be terminated once all instances finalize. > 3) For multiple components: the service will be terminated once all components > finalize. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-8080) YARN native service should support component restart policy
[ https://issues.apache.org/jira/browse/YARN-8080?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16475188#comment-16475188 ] Suma Shivaprasad commented on YARN-8080: Attached a patch with fixes to: 1. Request containers according to the restart policy, not every time containers exit as was happening earlier. 2. checkAndUpdateServiceState was not getting called for terminating components, which caused the service state to always be STARTED instead of STABLE. > YARN native service should support component restart policy > --- > > Key: YARN-8080 > URL: https://issues.apache.org/jira/browse/YARN-8080 > Project: Hadoop YARN > Issue Type: Task >Reporter: Wangda Tan >Assignee: Suma Shivaprasad >Priority: Critical > Attachments: YARN-8080.001.patch, YARN-8080.002.patch, > YARN-8080.003.patch, YARN-8080.005.patch, YARN-8080.006.patch, > YARN-8080.007.patch, YARN-8080.009.patch, YARN-8080.010.patch, > YARN-8080.011.patch, YARN-8080.012.patch, YARN-8080.013.patch > > > The existing native service assumes the service is long running and never > finishes. Containers will be restarted even if exit code == 0. > To support broader use cases, we need to allow the restart policy of a component > to be specified by users. Propose to have the following policies: > 1) Always: containers are always restarted by the framework regardless of container > exit status. This is the existing/default behavior. > 2) Never: Do not restart containers in any case after a container finishes. To > support job-like workloads (for example a Tensorflow training job): if a task > exits with code == 0, we should not restart the task. This can be used by > services which are not restart/recovery-able. > 3) On-failure: Similar to above, only restart tasks with exitcode != 0. > Behaviors after a component *instance* finalizes (Succeeded or Failed when > restart_policy != ALWAYS): > 1) For a single component, single instance: complete the service. > 2) For a single component, multiple instances: other running instances from the > same component won't be affected by the finalized component instance. The service > will be terminated once all instances finalize. > 3) For multiple components: the service will be terminated once all components > finalize. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
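[Editor's note] The three policies in the description map to a small decision function; a minimal sketch of the semantics (enum and method names are illustrative, not the patch's actual classes):

{code:java}
// Hedged sketch of the restart semantics the YARN-8080 description defines.
enum RestartPolicy { ALWAYS, NEVER, ON_FAILURE }

static boolean shouldRestart(RestartPolicy policy, int exitCode) {
  switch (policy) {
    case ALWAYS:
      return true;            // existing/default long-running behavior
    case NEVER:
      return false;           // job-like: never restart, even on failure
    case ON_FAILURE:
      return exitCode != 0;   // restart only failed tasks
    default:
      return true;
  }
}
{code}

A component instance whose {{shouldRestart}} comes back false is what the description calls "finalized"; the service completes once every instance, and hence every component, has finalized.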
[jira] [Commented] (YARN-8123) Skip compiling old hamlet package when the Java version is 10 or higher
[ https://issues.apache.org/jira/browse/YARN-8123?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16475163#comment-16475163 ] Takanobu Asanuma commented on YARN-8123: Thanks for the patch, [~dineshchitlangia]! I have confirmed that, with the patch applied, the command below succeeds on both JDK 9 and JDK 10. +1 (non-binding). {noformat} mvn clean javadoc:javadoc --projects hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common {noformat} > Skip compiling old hamlet package when the Java version is 10 or higher > -- > > Key: YARN-8123 > URL: https://issues.apache.org/jira/browse/YARN-8123 > Project: Hadoop YARN > Issue Type: Improvement > Components: webapp > Environment: Java 10 or higher >Reporter: Akira Ajisaka >Assignee: Dinesh Chitlangia >Priority: Major > Labels: newbie > Attachments: YARN-8123.001.patch > > > HADOOP-11423 skipped compiling the old hamlet package when the Java version is 9; > however, it is not skipped with Java 10+. We need to fix it. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-8080) YARN native service should support component restart policy
[ https://issues.apache.org/jira/browse/YARN-8080?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Suma Shivaprasad updated YARN-8080: --- Attachment: YARN-8080.013.patch > YARN native service should support component restart policy > --- > > Key: YARN-8080 > URL: https://issues.apache.org/jira/browse/YARN-8080 > Project: Hadoop YARN > Issue Type: Task >Reporter: Wangda Tan >Assignee: Suma Shivaprasad >Priority: Critical > Attachments: YARN-8080.001.patch, YARN-8080.002.patch, > YARN-8080.003.patch, YARN-8080.005.patch, YARN-8080.006.patch, > YARN-8080.007.patch, YARN-8080.009.patch, YARN-8080.010.patch, > YARN-8080.011.patch, YARN-8080.012.patch, YARN-8080.013.patch > > > The existing native service assumes the service is long running and never > finishes. Containers will be restarted even if exit code == 0. > To support broader use cases, we need to allow the restart policy of a component > to be specified by users. Propose to have the following policies: > 1) Always: containers are always restarted by the framework regardless of container > exit status. This is the existing/default behavior. > 2) Never: Do not restart containers in any case after a container finishes. To > support job-like workloads (for example a Tensorflow training job): if a task > exits with code == 0, we should not restart the task. This can be used by > services which are not restart/recovery-able. > 3) On-failure: Similar to above, only restart tasks with exitcode != 0. > Behaviors after a component *instance* finalizes (Succeeded or Failed when > restart_policy != ALWAYS): > 1) For a single component, single instance: complete the service. > 2) For a single component, multiple instances: other running instances from the > same component won't be affected by the finalized component instance. The service > will be terminated once all instances finalize. > 3) For multiple components: the service will be terminated once all components > finalize. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-8081) Yarn Service Upgrade: Add support to upgrade a component
[ https://issues.apache.org/jira/browse/YARN-8081?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16475113#comment-16475113 ] genericqa commented on YARN-8081: -
| (x) *{color:red}-1 overall{color}* |
\\ \\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 25s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 4 new or modified test files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 12s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 23m 16s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 7m 42s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 15s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 46s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 12m 37s{color} | {color:green} branch has no errors when building and testing our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 2s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 14s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 11s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 1m 13s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 6m 33s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 6m 33s{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange} 1m 11s{color} | {color:orange} hadoop-yarn-project/hadoop-yarn: The patch generated 4 new + 57 unchanged - 0 fixed = 61 total (was 57) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 38s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 10m 9s{color} | {color:green} patch has no errors when building and testing our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 20s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 12s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 28m 7s{color} | {color:red} hadoop-yarn-client in the patch failed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 11m 3s{color} | {color:green} hadoop-yarn-services-core in the patch passed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 32s{color} | {color:green} hadoop-yarn-services-api in the patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 26s{color} | {color:green} The patch does not generate ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black}114m 16s{color} | {color:black} {color} |
\\ \\
|| Reason || Tests ||
| Failed junit tests | hadoop.yarn.client.cli.TestYarnCLI |
\\ \\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:abb62dd |
| JIRA Issue | YARN-8081 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12923353/YARN-8081.001.patch |
| Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle |
| uname | Linux f5238b61bb96 4.4.0-64-generic #85-Ubuntu SMP Mon Feb 20 11:50:30 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/pat
[jira] [Commented] (YARN-8250) Create another implementation of ContainerScheduler to support NM overallocation
[ https://issues.apache.org/jira/browse/YARN-8250?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16475100#comment-16475100 ] Haibo Chen commented on YARN-8250: -- [~leftnoteasy] Thanks for your comments. I agree that we should avoid the CS vs FS issue if possible. As I have mentioned, the rationale is to do things that are suitable for oversubscription but do not break or destabilize existing functionality. Can you elaborate on what you mean by an issue we need to fix? I was describing a behavior that is fine except for over-allocation. Today the container scheduler tries to launch opportunistic containers whenever there is a container scheduling request, or whenever a container finishes. This is not an issue today. But in the case of over-allocation, because the utilization metrics are stale, the following case is possible. A few containers finish, the container monitor checks the node utilization, which is low, and then the container scheduler gets the container-finish events and aggressively tries to start opportunistic containers. Later the NM realizes that those opportunistic containers need to be preempted. I am not sure what can be done here to unify the two, as each fundamentally has issues with the other's approach. Hence, the proposal to have two implementations. {quote}For 1) I'm not sure if we should give all the decisions to CGroups {quote} One key thing to note is that we want to ensure GUARANTEED containers are not slowed down by OPPORTUNISTIC containers, so cgroups are a requirement for over-allocation from day one to ensure isolation. Unless the Docker container executor has similar mechanisms, it is hard to make over-allocation work properly with Docker without downsides that render the feature unusable. I am open to suggestions to make things simpler and more maintainable, but as noted here, there are fundamental behavior changes. I'll try to take a look at whether there are more behaviors that we could extract into the base container scheduler. > Create another implementation of ContainerScheduler to support NM > overallocation > > > Key: YARN-8250 > URL: https://issues.apache.org/jira/browse/YARN-8250 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Haibo Chen >Assignee: Haibo Chen >Priority: Major > Attachments: YARN-8250-YARN-1011.00.patch, > YARN-8250-YARN-1011.01.patch, YARN-8250-YARN-1011.02.patch > > > YARN-6675 adds NM over-allocation support by modifying the existing > ContainerScheduler and providing a utilizationBased resource tracker. > However, the implementation adds a lot of complexity to ContainerScheduler, > and future tweaks of the over-allocation strategy based on how many containers > have been launched are even more complicated. > As such, this Jira proposes a new ContainerScheduler that always launches > guaranteed containers immediately and queues opportunistic containers. It > relies on a periodic check to launch opportunistic containers. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-8041) [Router] Federation: routing some missing REST invocations transparently to multiple RMs
[ https://issues.apache.org/jira/browse/YARN-8041?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Giovanni Matteo Fumarola updated YARN-8041: --- Issue Type: Sub-task (was: Improvement) Parent: YARN-7402 > [Router] Federation: routing some missing REST invocations transparently to > multiple RMs > > > Key: YARN-8041 > URL: https://issues.apache.org/jira/browse/YARN-8041 > Project: Hadoop YARN > Issue Type: Sub-task > Components: federation, router >Reporter: Yiran Wu >Assignee: Yiran Wu >Priority: Minor > Attachments: YARN-8041.001.patch, YARN-8041.002.patch, > YARN-8041.003.patch > > > This Jira tracks the implementation of some missing REST invocations in > FederationInterceptorREST: > * getAppStatistics > * getNodeToLabels > * getLabelsOnNode > * updateApplicationPriority > * getAppQueue > * updateAppQueue > * getAppTimeout > * getAppTimeouts > * updateApplicationTimeout > * getAppAttempts > * getAppAttempt > * getContainers > * getContainer -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
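For background, each of these application-scoped calls follows the same routing pattern: resolve the application's home sub-cluster, then forward the identical REST invocation to that RM. The sketch below illustrates the idea with map-based lookups; the class and helper names are illustrative, not the actual FederationInterceptorREST code.
{code:java}
// Hedged sketch (not the patch's actual code) of the general routing pattern
// for an application-scoped REST call in a federated cluster.
import java.util.Map;

class RestRoutingSketch {
  /** appId -> home sub-cluster id, e.g. populated from the federation state store. */
  private final Map<String, String> homeSubCluster;
  /** sub-cluster id -> RM web address. */
  private final Map<String, String> rmWebAddress;

  RestRoutingSketch(Map<String, String> home, Map<String, String> web) {
    this.homeSubCluster = home;
    this.rmWebAddress = web;
  }

  /** Build the URL the interceptor would forward to, e.g. for getAppAttempts. */
  String routeGetAppAttempts(String appId) {
    String subCluster = homeSubCluster.get(appId);  // 1. find the owning RM
    String address = rmWebAddress.get(subCluster);  // 2. resolve its address
    return address + "/ws/v1/cluster/apps/" + appId + "/appattempts"; // 3. forward here
  }
}
{code}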
[jira] [Updated] (YARN-8041) [Router] Federation: routing some missing REST invocations transparently to multiple RMs
[ https://issues.apache.org/jira/browse/YARN-8041?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Giovanni Matteo Fumarola updated YARN-8041: --- Affects Version/s: (was: 3.0.0) (was: 2.9.0) > [Router] Federation: routing some missing REST invocations transparently to > multiple RMs > > > Key: YARN-8041 > URL: https://issues.apache.org/jira/browse/YARN-8041 > Project: Hadoop YARN > Issue Type: Improvement > Components: federation, router >Reporter: Yiran Wu >Assignee: Yiran Wu >Priority: Major > Attachments: YARN-8041.001.patch, YARN-8041.002.patch, > YARN-8041.003.patch > > > This Jira tracks the implementation of some missing REST invocations in > FederationInterceptorREST: > * getAppStatistics > * getNodeToLabels > * getLabelsOnNode > * updateApplicationPriority > * getAppQueue > * updateAppQueue > * getAppTimeout > * getAppTimeouts > * updateApplicationTimeout > * getAppAttempts > * getAppAttempt > * getContainers > * getContainer -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-8041) [Router] Federation: routing some missing REST invocations transparently to multiple RMs
[ https://issues.apache.org/jira/browse/YARN-8041?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Giovanni Matteo Fumarola updated YARN-8041: --- Priority: Minor (was: Major) > [Router] Federation: routing some missing REST invocations transparently to > multiple RMs > > > Key: YARN-8041 > URL: https://issues.apache.org/jira/browse/YARN-8041 > Project: Hadoop YARN > Issue Type: Improvement > Components: federation, router >Reporter: Yiran Wu >Assignee: Yiran Wu >Priority: Minor > Attachments: YARN-8041.001.patch, YARN-8041.002.patch, > YARN-8041.003.patch > > > This Jira tracks the implementation of some missing REST invocations in > FederationInterceptorREST: > * getAppStatistics > * getNodeToLabels > * getLabelsOnNode > * updateApplicationPriority > * getAppQueue > * updateAppQueue > * getAppTimeout > * getAppTimeouts > * updateApplicationTimeout > * getAppAttempts > * getAppAttempt > * getContainers > * getContainer -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-8041) [Router] Federation: routing some missing REST invocations transparently to multiple RMs
[ https://issues.apache.org/jira/browse/YARN-8041?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Giovanni Matteo Fumarola updated YARN-8041: --- Labels: (was: patch) > [Router] Federation: routing some missing REST invocations transparently to > multiple RMs > > > Key: YARN-8041 > URL: https://issues.apache.org/jira/browse/YARN-8041 > Project: Hadoop YARN > Issue Type: Improvement > Components: federation, router >Reporter: Yiran Wu >Assignee: Yiran Wu >Priority: Major > Attachments: YARN-8041.001.patch, YARN-8041.002.patch, > YARN-8041.003.patch > > > This Jira tracks the implementation of some missing REST invocations in > FederationInterceptorREST: > * getAppStatistics > * getNodeToLabels > * getLabelsOnNode > * updateApplicationPriority > * getAppQueue > * updateAppQueue > * getAppTimeout > * getAppTimeouts > * updateApplicationTimeout > * getAppAttempts > * getAppAttempt > * getContainers > * getContainer -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-8041) [Router] Federation: routing some missing REST invocations transparently to multiple RMs
[ https://issues.apache.org/jira/browse/YARN-8041?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Giovanni Matteo Fumarola updated YARN-8041: --- Target Version/s: (was: 2.9.0, 3.0.0) > [Router] Federation: routing some missing REST invocations transparently to > multiple RMs > > > Key: YARN-8041 > URL: https://issues.apache.org/jira/browse/YARN-8041 > Project: Hadoop YARN > Issue Type: Improvement > Components: federation, router >Reporter: Yiran Wu >Assignee: Yiran Wu >Priority: Major > Attachments: YARN-8041.001.patch, YARN-8041.002.patch, > YARN-8041.003.patch > > > This Jira tracks the implementation of some missing REST invocations in > FederationInterceptorREST: > * getAppStatistics > * getNodeToLabels > * getLabelsOnNode > * updateApplicationPriority > * getAppQueue > * updateAppQueue > * getAppTimeout > * getAppTimeouts > * updateApplicationTimeout > * getAppAttempts > * getAppAttempt > * getContainers > * getContainer -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-8041) [Router] Federation: routing some missing REST invocations transparently to multiple RMs
[ https://issues.apache.org/jira/browse/YARN-8041?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Giovanni Matteo Fumarola updated YARN-8041: --- Component/s: router > [Router] Federation: routing some missing REST invocations transparently to > multiple RMs > > > Key: YARN-8041 > URL: https://issues.apache.org/jira/browse/YARN-8041 > Project: Hadoop YARN > Issue Type: Improvement > Components: federation, router >Reporter: Yiran Wu >Assignee: Yiran Wu >Priority: Major > Attachments: YARN-8041.001.patch, YARN-8041.002.patch, > YARN-8041.003.patch > > > This Jira tracks the implementation of some missing REST invocations in > FederationInterceptorREST: > * getAppStatistics > * getNodeToLabels > * getLabelsOnNode > * updateApplicationPriority > * getAppQueue > * updateAppQueue > * getAppTimeout > * getAppTimeouts > * updateApplicationTimeout > * getAppAttempts > * getAppAttempt > * getContainers > * getContainer -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Comment Edited] (YARN-8041) [Router] Federation: routing some missing REST invocations transparently to multiple RMs
[ https://issues.apache.org/jira/browse/YARN-8041?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16475089#comment-16475089 ] Giovanni Matteo Fumarola edited comment on YARN-8041 at 5/15/18 12:18 AM: -- Thanks [~yiran] for working on this. I have already added getAppState in YARN-8186. Please rebase the patch with the current trunk. I will review the patch after that. was (Author: giovanni.fumarola): Thanks [~yiran] for working on this. I have already added getAppState in YARN-8186. Please rebase the patch with the current trunk. > [Router] Federation: routing some missing REST invocations transparently to > multiple RMs > > > Key: YARN-8041 > URL: https://issues.apache.org/jira/browse/YARN-8041 > Project: Hadoop YARN > Issue Type: Improvement > Components: federation >Affects Versions: 2.9.0, 3.0.0 >Reporter: Yiran Wu >Priority: Major > Labels: patch > Attachments: YARN-8041.001.patch, YARN-8041.002.patch, > YARN-8041.003.patch > > > This Jira tracks the implementation of some missing REST invocations in > FederationInterceptorREST: > * getAppStatistics > * getNodeToLabels > * getLabelsOnNode > * updateApplicationPriority > * getAppQueue > * updateAppQueue > * getAppTimeout > * getAppTimeouts > * updateApplicationTimeout > * getAppAttempts > * getAppAttempt > * getContainers > * getContainer -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Assigned] (YARN-8041) [Router] Federation: routing some missing REST invocations transparently to multiple RMs
[ https://issues.apache.org/jira/browse/YARN-8041?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Giovanni Matteo Fumarola reassigned YARN-8041: -- Assignee: Yiran Wu > [Router] Federation: routing some missing REST invocations transparently to > multiple RMs > > > Key: YARN-8041 > URL: https://issues.apache.org/jira/browse/YARN-8041 > Project: Hadoop YARN > Issue Type: Improvement > Components: federation >Affects Versions: 2.9.0, 3.0.0 >Reporter: Yiran Wu >Assignee: Yiran Wu >Priority: Major > Labels: patch > Attachments: YARN-8041.001.patch, YARN-8041.002.patch, > YARN-8041.003.patch > > > This Jira tracks the implementation of some missing REST invocations in > FederationInterceptorREST: > * getAppStatistics > * getNodeToLabels > * getLabelsOnNode > * updateApplicationPriority > * getAppQueue > * updateAppQueue > * getAppTimeout > * getAppTimeouts > * updateApplicationTimeout > * getAppAttempts > * getAppAttempt > * getContainers > * getContainer -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-8041) [Router] Federation: routing some missing REST invocations transparently to multiple RMs
[ https://issues.apache.org/jira/browse/YARN-8041?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Giovanni Matteo Fumarola updated YARN-8041: --- Description: This Jira tracks the implementation of some missing REST invocations in FederationInterceptorREST: * getAppStatistics * getNodeToLabels * getLabelsOnNode * updateApplicationPriority * getAppQueue * updateAppQueue * getAppTimeout * getAppTimeouts * updateApplicationTimeout * getAppAttempts * getAppAttempt * getContainers * getContainer was: This Jira tracks the implementation of some missing REST invocation in FederationInterceptorREST: * getAppStatistics; *getAppState/getNodeToLabels/getLabelsOnNode/updateApplicationPriority/getAppQueue/updateAppQueue/getAppTimeout/getAppTimeouts/updateApplicationTimeout/getAppAttempts/getAppAttempt/getContainers/getContainer REST invocations transparently to multiple RMs > [Router] Federation: routing some missing REST invocations transparently to > multiple RMs > > > Key: YARN-8041 > URL: https://issues.apache.org/jira/browse/YARN-8041 > Project: Hadoop YARN > Issue Type: Improvement > Components: federation >Affects Versions: 2.9.0, 3.0.0 >Reporter: Yiran Wu >Priority: Major > Labels: patch > Attachments: YARN-8041.001.patch, YARN-8041.002.patch, > YARN-8041.003.patch > > > This Jira tracks the implementation of some missing REST invocations in > FederationInterceptorREST: > * getAppStatistics > * getNodeToLabels > * getLabelsOnNode > * updateApplicationPriority > * getAppQueue > * updateAppQueue > * getAppTimeout > * getAppTimeouts > * updateApplicationTimeout > * getAppAttempts > * getAppAttempt > * getContainers > * getContainer -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-8041) [Router] Federation: routing some missing REST invocations transparently to multiple RMs
[ https://issues.apache.org/jira/browse/YARN-8041?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Giovanni Matteo Fumarola updated YARN-8041: --- Description: This Jira tracks the implementation of some missing REST invocation in FederationInterceptorREST: * getAppStatistics; *getAppState/getNodeToLabels/getLabelsOnNode/updateApplicationPriority/getAppQueue/updateAppQueue/getAppTimeout/getAppTimeouts/updateApplicationTimeout/getAppAttempts/getAppAttempt/getContainers/getContainer REST invocations transparently to multiple RMs was: Implement routing getAppStatistics/getAppState/getNodeToLabels/getLabelsOnNode/updateApplicationPriority/getAppQueue/updateAppQueue/getAppTimeout/getAppTimeouts/updateApplicationTimeout/getAppAttempts/getAppAttempt/getContainers/getContainer REST invocations transparently to multiple RMs > [Router] Federation: routing some missing REST invocations transparently to > multiple RMs > > > Key: YARN-8041 > URL: https://issues.apache.org/jira/browse/YARN-8041 > Project: Hadoop YARN > Issue Type: Improvement > Components: federation >Affects Versions: 2.9.0, 3.0.0 >Reporter: Yiran Wu >Priority: Major > Labels: patch > Attachments: YARN-8041.001.patch, YARN-8041.002.patch, > YARN-8041.003.patch > > > This Jira tracks the implementation of some missing REST invocation in > FederationInterceptorREST: > * getAppStatistics; > *getAppState/getNodeToLabels/getLabelsOnNode/updateApplicationPriority/getAppQueue/updateAppQueue/getAppTimeout/getAppTimeouts/updateApplicationTimeout/getAppAttempts/getAppAttempt/getContainers/getContainer > REST invocations transparently to multiple RMs -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-8041) [Router] Federation: routing some missing REST invocations transparently to multiple RMs
[ https://issues.apache.org/jira/browse/YARN-8041?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Giovanni Matteo Fumarola updated YARN-8041: --- Summary: [Router] Federation: routing some missing REST invocations transparently to multiple RMs (was: Federation: Implement multiple interfaces(14 interfaces), routing REST invocations transparently to multiple RMs ) > [Router] Federation: routing some missing REST invocations transparently to > multiple RMs > > > Key: YARN-8041 > URL: https://issues.apache.org/jira/browse/YARN-8041 > Project: Hadoop YARN > Issue Type: Improvement > Components: federation >Affects Versions: 2.9.0, 3.0.0 >Reporter: Yiran Wu >Priority: Major > Labels: patch > Attachments: YARN-8041.001.patch, YARN-8041.002.patch, > YARN-8041.003.patch > > > Implement routing > getAppStatistics/getAppState/getNodeToLabels/getLabelsOnNode/updateApplicationPriority/getAppQueue/updateAppQueue/getAppTimeout/getAppTimeouts/updateApplicationTimeout/getAppAttempts/getAppAttempt/getContainers/getContainer > REST invocations transparently to multiple RMs -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-8041) Federation: Implement multiple interfaces(14 interfaces), routing REST invocations transparently to multiple RMs
[ https://issues.apache.org/jira/browse/YARN-8041?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16475089#comment-16475089 ] Giovanni Matteo Fumarola commented on YARN-8041: Thanks [~yiran] for working on this. I have already added getAppState in YARN-8186. Please rebase the patch with the current trunk. > Federation: Implement multiple interfaces(14 interfaces), routing REST > invocations transparently to multiple RMs > - > > Key: YARN-8041 > URL: https://issues.apache.org/jira/browse/YARN-8041 > Project: Hadoop YARN > Issue Type: Improvement > Components: federation >Affects Versions: 2.9.0, 3.0.0 >Reporter: Yiran Wu >Priority: Major > Labels: patch > Attachments: YARN-8041.001.patch, YARN-8041.002.patch, > YARN-8041.003.patch > > > Implement routing > getAppStatistics/getAppState/getNodeToLabels/getLabelsOnNode/updateApplicationPriority/getAppQueue/updateAppQueue/getAppTimeout/getAppTimeouts/updateApplicationTimeout/getAppAttempts/getAppAttempt/getContainers/getContainer > REST invocations transparently to multiple RMs -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Comment Edited] (YARN-8180) YARN Federation has not implemented blacklist sub-cluster for AM routing
[ https://issues.apache.org/jira/browse/YARN-8180?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16475081#comment-16475081 ] Giovanni Matteo Fumarola edited comment on YARN-8180 at 5/15/18 12:11 AM: -- Thanks [~shenyinjie] for opening the Jira. The blacklist context in {{FederationClientInterceptor#submitApplication}} is just to keep track of the retries and avoid resubmitting an AM where we already tried. [~abmodi] please update the documentation. We added the property to the documentation by mistake. was (Author: giovanni.fumarola): Thanks [~shenyinjie] for opening the Jira. The blacklist context in {{FederationClientInterceptor#submitApplication}} is just to keep track of the retries and avoid resubmitting an AM where we already tried. [~abmodi] please update the documentation. We may have added the property to the documentation by mistake. > YARN Federation has not implemented blacklist sub-cluster for AM routing > > > Key: YARN-8180 > URL: https://issues.apache.org/jira/browse/YARN-8180 > Project: Hadoop YARN > Issue Type: Improvement > Components: federation >Reporter: Shen Yinjie >Assignee: Abhishek Modi >Priority: Major > > Property "yarn.federation.blacklist-subclusters" is defined in the > yarn-federation doc, but it has not been defined and implemented in Java code. > In FederationClientInterceptor#submitApplication() > {code:java} > List<SubClusterId> blacklist = new ArrayList<>(); > for (int i = 0; i < numSubmitRetries; ++i) { > SubClusterId subClusterId = policyFacade.getHomeSubcluster( > request.getApplicationSubmissionContext(), blacklist); > {code} > > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-8180) YARN Federation has not implemented blacklist sub-cluster for AM routing
[ https://issues.apache.org/jira/browse/YARN-8180?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16475081#comment-16475081 ] Giovanni Matteo Fumarola commented on YARN-8180: Thanks [~shenyinjie] for opening the Jira. The blacklist context in {{FederationClientInterceptor#submitApplication}} is just to keep track of the retries and avoid resubmitting an AM where we already tried. [~abmodi] please update the documentation. We may have added the property to the documentation by mistake. > YARN Federation has not implemented blacklist sub-cluster for AM routing > > > Key: YARN-8180 > URL: https://issues.apache.org/jira/browse/YARN-8180 > Project: Hadoop YARN > Issue Type: Improvement > Components: federation >Reporter: Shen Yinjie >Assignee: Abhishek Modi >Priority: Major > > Property "yarn.federation.blacklist-subclusters" is defined in the > yarn-federation doc, but it has not been defined and implemented in Java code. > In FederationClientInterceptor#submitApplication() > {code:java} > List<SubClusterId> blacklist = new ArrayList<>(); > for (int i = 0; i < numSubmitRetries; ++i) { > SubClusterId subClusterId = policyFacade.getHomeSubcluster( > request.getApplicationSubmissionContext(), blacklist); > {code} > > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
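For context, the retry loop quoted in the description continues roughly as in the sketch below; submitToSubCluster() is a hypothetical stand-in, and the real FederationClientInterceptor code differs. Each failed submission blacklists the chosen sub-cluster so the policy cannot pick it again on the next retry.
{code:java}
// Hedged sketch extending the fragment quoted above (simplified, illustrative only).
List<SubClusterId> blacklist = new ArrayList<>();
for (int i = 0; i < numSubmitRetries; ++i) {
  SubClusterId subClusterId = policyFacade.getHomeSubcluster(
      request.getApplicationSubmissionContext(), blacklist);
  try {
    // hypothetical helper: forward the submission to the selected RM
    return submitToSubCluster(subClusterId, request);
  } catch (Exception e) {
    blacklist.add(subClusterId); // never retry a sub-cluster that already failed
  }
}
throw new YarnException("Submission failed on all retried sub-clusters");
{code}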
[jira] [Created] (YARN-8290) Yarn application failed to recover with "Error Launching job : User is not set in the application report" error after RM restart
Yesha Vora created YARN-8290: Summary: Yarn application failed to recover with "Error Launching job : User is not set in the application report" error after RM restart Key: YARN-8290 URL: https://issues.apache.org/jira/browse/YARN-8290 Project: Hadoop YARN Issue Type: Bug Reporter: Yesha Vora Scenario: 1) Start 5 streaming applications in the background 2) Kill the active RM and cause RM failover After RM failover, the application failed with the error below. {code}18/02/01 21:24:29 WARN client.RequestHedgingRMFailoverProxyProvider: Invocation returned exception on [rm2] : org.apache.hadoop.yarn.exceptions.ApplicationNotFoundException: Application with id 'application_1517520038847_0003' doesn't exist in RM. Please check that the job submission was successful. at org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.getApplicationReport(ClientRMService.java:338) at org.apache.hadoop.yarn.api.impl.pb.service.ApplicationClientProtocolPBServiceImpl.getApplicationReport(ApplicationClientProtocolPBServiceImpl.java:175) at org.apache.hadoop.yarn.proto.ApplicationClientProtocol$ApplicationClientProtocolService$2.callBlockingMethod(ApplicationClientProtocol.java:417) at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:640) at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:982) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2351) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2347) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:422) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1869) at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2347) , so propagating back to caller. 18/02/01 21:24:29 INFO impl.YarnClientImpl: Submitted application application_1517520038847_0003 18/02/01 21:24:30 INFO mapreduce.JobSubmitter: Cleaning up the staging area /user/hrt_qa/.staging/job_1517520038847_0003 18/02/01 21:24:30 ERROR streaming.StreamJob: Error Launching job : User is not set in the application report Streaming Command Failed!{code} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-8250) Create another implementation of ContainerScheduler to support NM overallocation
[ https://issues.apache.org/jira/browse/YARN-8250?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16475062#comment-16475062 ] Wangda Tan commented on YARN-8250: -- [~haibochen], I took a very brief look at the implemented code since I haven't found a chance to read through the implementation. My thoughts: - To me it is important to have a single implementation with different policies, or to just fix it correctly. Otherwise it will run into the CS vs. FS issue shortly after this. - For 2), it looks like an issue we need to fix: why would we want to keep the logic that aggressively launches O containers and lets them be killed by the framework shortly after launch? - For 1), I'm not sure if we should give all the decisions to CGroups. In some cases killing a container cannot be done immediately by the system IIRC (like Docker containers), so it's better to look at the existing status of running containers before launching a container. > Create another implementation of ContainerScheduler to support NM > overallocation > > > Key: YARN-8250 > URL: https://issues.apache.org/jira/browse/YARN-8250 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Haibo Chen >Assignee: Haibo Chen >Priority: Major > Attachments: YARN-8250-YARN-1011.00.patch, > YARN-8250-YARN-1011.01.patch, YARN-8250-YARN-1011.02.patch > > > YARN-6675 adds NM over-allocation support by modifying the existing > ContainerScheduler and providing a utilizationBased resource tracker. > However, the implementation adds a lot of complexity to ContainerScheduler, > and future tweaks of the over-allocation strategy based on how many containers > have been launched are even more complicated. > As such, this Jira proposes a new ContainerScheduler that always launches > guaranteed containers immediately and queues opportunistic containers. It > relies on a periodic check to launch opportunistic containers. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-7900) [AMRMProxy] AMRMClientRelayer for stateful FederationInterceptor
[ https://issues.apache.org/jira/browse/YARN-7900?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16475061#comment-16475061 ] Giovanni Matteo Fumarola commented on YARN-7900: Overall the patch looks good to me. Please add some more javadoc in the {{TestAMRMClientRelayer}} and we can commit it. > [AMRMProxy] AMRMClientRelayer for stateful FederationInterceptor > > > Key: YARN-7900 > URL: https://issues.apache.org/jira/browse/YARN-7900 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Botong Huang >Assignee: Botong Huang >Priority: Major > Attachments: YARN-7900.v1.patch, YARN-7900.v2.patch, > YARN-7900.v3.patch, YARN-7900.v4.patch, YARN-7900.v5.patch, > YARN-7900.v6.patch, YARN-7900.v7.patch, YARN-7900.v8.patch > > > Inside the stateful FederationInterceptor (YARN-7899), we need a component > similar to AMRMClient that remembers all pending (outstanding) requests we've > sent to the YarnRM, auto re-registers, and does a full pending resend when the YarnRM fails > over and throws ApplicationMasterNotRegisteredException back. This JIRA adds > this component as AMRMClientRelayer. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
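The resend-on-failover idea described above amounts to roughly the following sketch; rememberPending(), savedRegisterRequest, and buildFullPendingRequest() are hypothetical stand-ins, not the patch's actual members.
{code:java}
// Hedged sketch of the relayer's core idea, not the patch's actual code:
// remember everything still outstanding, and on RM failover re-register and
// resend the full pending state.
public AllocateResponse allocate(AllocateRequest request) throws Exception {
  rememberPending(request);  // hypothetical: track unsatisfied asks/releases
  try {
    return rm.allocate(request);
  } catch (ApplicationMasterNotRegisteredException e) {
    rm.registerApplicationMaster(savedRegisterRequest); // auto re-register
    return rm.allocate(buildFullPendingRequest());      // full pending resend
  }
}
{code}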
[jira] [Commented] (YARN-4599) Set OOM control for memory cgroups
[ https://issues.apache.org/jira/browse/YARN-4599?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16475059#comment-16475059 ] Miklos Szegedi commented on YARN-4599: -- {quote}bq. Is 'descriptors->event_control_fd = -1;' necessary? {quote} Yes, it is a defense against chained errors; it may make it easier to debug when you get a core dump. {quote}bq. 3) The comments for test_oom() do not quite make sense to me. My current understanding is that it adds the calling process to the given cgroup and simulates an OOM by repeatedly asking the OS for memory? {quote} You are mixing up the parent with the child. The parent gets the child pid and the child gets 0 after the fork(), since the child can just call getpid(). It forks a child process, gets its pid in the parent, and adds that to a cgroup. Once the child notices that it is in the cgroup, it starts eating memory, triggering an OOM. {quote}bq. 4) Can you please elaborate on how cgroup simulation is done in oom_listener_test_main.c? The child process that is added to the cgroup only does sleep(). {quote} /* Unit test for cgroup testing. There are two modes. If the unit test is run as root and we have cgroups, we try to create a cgroup and generate an OOM. If we are not running as root, we just sleep instead of eating memory and simulate the OOM by sending an event on a mock event fd, mock_oom_event_as_user. */ {quote}bq. 5) Doing param matching in CGroupsHandlerImpl.GetCGroupParam() does not seem like good practice to me. {quote} CGroupsHandlerImpl.GetCGroupParam() is a smart function that returns the file name given the parameter name. I do not see any good-practice issue here. The tasks file is always without the controller name. {quote}bq. 6) Let's wrap the new thread join in ContainersMonitorImpl with a try-catch clause as we do with the monitoring thread. {quote} May I ask why? I thought only exceptions that will actually be thrown need to be caught. CGroupElasticMemoryController has a much better cleanup process than the monitoring thread, and it does not need InterruptedException. In fact, any interrupted exception would mean that we have likely leaked the external process, so I would advise against using it. {quote}bq. 7) The configuration changes are incompatible ... How about we create separate configurations for pm elastic control and vm elastic control? {quote} I do not necessarily agree here. a) First of all, polling and cgroups memory control did not work together before the patch either. The NM exited with an exception, so there is no previously working functionality that stops working now. There is no compatibility break. cgroups taking precedence is indeed a new feature. b) I would like to have a clean long-term design for configuration, avoiding too many configuration entries and definitely avoiding confusion. If there is a yarn.nodemanager.pmem-check-enabled, it suggests general use; it would be unintuitive not to use it. We indeed cannot change its general meaning anymore. I think the clean design is having yarn.nodemanager.resource.memory.enabled to enable cgroups, yarn.nodemanager.resource.memory.enforced to enforce it per container, and yarn.nodemanager.elastic-memory-control.enabled to enforce it at the node level. The detailed settings like yarn.nodemanager.pmem-check-enabled and yarn.nodemanager.vmem-check-enabled can then intuitively apply to all of them. I understand the concern, but this solution would let us keep only these five configuration entries.
{quote}bq. 11) Does it make sense to have the stopListening logic in `if (!watchdog.get) {}` block instead? {quote} It is completely equivalent. It will be called a few milliseconds earlier or later, but there was a missing explanation there, so I added a comment. {quote}bq. 16) In TestDefaultOOMHandler.testBothContainersOOM(), I think we also need to verify container 2 is killed. Similarly, in testOneContainerOOM() and testNoContainerOOM(). {quote} Only one container should be killed. However, I refined the verify logic to be even more precise in verifying this. I addressed the rest. I will provide a patch soon. > Set OOM control for memory cgroups > -- > > Key: YARN-4599 > URL: https://issues.apache.org/jira/browse/YARN-4599 > Project: Hadoop YARN > Issue Type: Sub-task > Components: nodemanager >Affects Versions: 2.9.0 >Reporter: Karthik Kambatla >Assignee: Miklos Szegedi >Priority: Major > Labels: oct16-medium > Attachments: Elastic Memory Control in YARN.pdf, YARN-4599.000.patch, > YARN-4599.001.patch, YARN-4599.002.patch, YARN-4599.003.patch, > YARN-4599.004.patch, YARN-4599.005.patch, YARN-4599.006.patch, > YARN-4599.sandflee.patch, yarn-4599-not-so-useful.patch > > > YARN-1856 adds memory cgroups enforcing support.
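The five-entry configuration layering proposed above would be read roughly as in this sketch; the three memory-control keys are quoted from the comment itself, the pmem/vmem keys are the long-standing NM checks, and the default values shown are illustrative only.
{code:java}
// Hedged sketch of the configuration layering described above (defaults illustrative).
boolean cgroupsMemory   = conf.getBoolean("yarn.nodemanager.resource.memory.enabled", false);
boolean perContainer    = conf.getBoolean("yarn.nodemanager.resource.memory.enforced", true);
boolean elasticNodeWide = conf.getBoolean("yarn.nodemanager.elastic-memory-control.enabled", false);
// The pmem/vmem checks then apply to whichever enforcement mode is active.
boolean pmemCheck = conf.getBoolean("yarn.nodemanager.pmem-check-enabled", true);
boolean vmemCheck = conf.getBoolean("yarn.nodemanager.vmem-check-enabled", true);
{code}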
[jira] [Updated] (YARN-8248) Job hangs when a job requests a resource that its queue does not have
[ https://issues.apache.org/jira/browse/YARN-8248?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Haibo Chen updated YARN-8248: - Summary: Job hangs when a job requests a resource that its queue does not have (was: Job hangs when a queue is specified and the maxResources of the queue cannot satisfy the AM resource request) > Job hangs when a job requests a resource that its queue does not have > - > > Key: YARN-8248 > URL: https://issues.apache.org/jira/browse/YARN-8248 > Project: Hadoop YARN > Issue Type: Bug > Components: fairscheduler, yarn >Reporter: Szilard Nemeth >Assignee: Szilard Nemeth >Priority: Major > Attachments: YARN-8248-001.patch, YARN-8248-002.patch, > YARN-8248-003.patch, YARN-8248-004.patch, YARN-8248-005.patch, > YARN-8248-006.patch > > > Job hangs when mapreduce.job.queuename is specified and the queue has 0 of > any resource (vcores / memory / other) > In this scenario, the job should be immediately rejected upon submission > since the specified queue cannot serve the resource needs of the submitted > job. > > Command to run: > {code:java} > bin/yarn jar > "./share/hadoop/mapreduce/hadoop-mapreduce-examples-$MY_HADOOP_VERSION.jar" > pi -Dmapreduce.job.queuename=sample_queue 1 1000;{code} > fair-scheduler.xml queue config (excerpt): > > {code:java} > <queue name="sample_queue"> > <minResources>1 mb,0vcores</minResources> > <maxResources>9 mb,0vcores</maxResources> > <maxRunningApps>50</maxRunningApps> > <maxAMShare>-1.0f</maxAMShare> > <weight>2.0</weight> > <schedulingPolicy>fair</schedulingPolicy> > </queue> > {code} > Diagnostic message from the web UI: > {code:java} > [Wed May 02 06:35:57 -0700 2018] Application is added to the scheduler and is > not yet activated. (Resource request: exceeds current > queue or its parents maximum resource allowed).{code} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-8248) Job hangs when a queue is specified and the maxResources of the queue cannot satisfy the AM resource request
[ https://issues.apache.org/jira/browse/YARN-8248?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16475058#comment-16475058 ] Haibo Chen commented on YARN-8248: -- {quote}as {{RMAppManager.validateAndCreateResourceRequest()}} can return a null value for the AM requests, {quote} Good catch! It does indeed return null if the AM is unmanaged. But I am not sure how the debug message helps diagnose this issue. I'd prefer we remove the debug message. {quote} Does this explanation make it cleaner? {quote} Yes. That makes sense. Comments would be very helpful in this case. We could also maybe reverse the order of the two conditions. The current diagnostic message seems good to me now that I understand what the condition means. {quote} So in my understanding, it can happen that in {{addApplication()}} the app was not rejected, for example the AM does not request vCores and we have 0 vCores configured as max resources, but for a map container, 1 vCore is requested. {quote} Indeed, that can happen with custom resource types. In FairScheduler.allocate(), instead of rejecting an application if any request is rejected, we can just filter out the ones that should be rejected by removing them from the ask list (with a warning log) and proceed. Rejecting an application after it has started running (FairScheduler.allocate() is called remotely by the AM) seems counter-intuitive. I think we can signal the AM by throwing a SchedulerInvalidResoureRequestException, which is propagated to the AM. What do you think? {quote}About the uncovered unit test: Good point, and I was thinking about whether we can reject an application only if the AM request is greater than 0 and we have 0 configured as the max resource, or simply in any case where the requested resource is greater than the max resource, regardless of whether it is 0 or not. {quote} Never mind comment 4). That was based on my previous misunderstanding. If the AM request is larger than the non-zero max resource (steady fair share), we should not reject, because the queue may get an instantaneous fair share that is large enough. That's not related to this patch. Let me know if something does not make sense. > Job hangs when a queue is specified and the maxResources of the queue cannot > satisfy the AM resource request > > > Key: YARN-8248 > URL: https://issues.apache.org/jira/browse/YARN-8248 > Project: Hadoop YARN > Issue Type: Bug > Components: fairscheduler, yarn >Reporter: Szilard Nemeth >Assignee: Szilard Nemeth >Priority: Major > Attachments: YARN-8248-001.patch, YARN-8248-002.patch, > YARN-8248-003.patch, YARN-8248-004.patch, YARN-8248-005.patch, > YARN-8248-006.patch > > > Job hangs when mapreduce.job.queuename is specified and the queue has 0 of > any resource (vcores / memory / other) > In this scenario, the job should be immediately rejected upon submission > since the specified queue cannot serve the resource needs of the submitted > job. > > Command to run: > {code:java} > bin/yarn jar > "./share/hadoop/mapreduce/hadoop-mapreduce-examples-$MY_HADOOP_VERSION.jar" > pi -Dmapreduce.job.queuename=sample_queue 1 1000;{code} > fair-scheduler.xml queue config (excerpt): > > {code:java} > <queue name="sample_queue"> > <minResources>1 mb,0vcores</minResources> > <maxResources>9 mb,0vcores</maxResources> > <maxRunningApps>50</maxRunningApps> > <maxAMShare>-1.0f</maxAMShare> > <weight>2.0</weight> > <schedulingPolicy>fair</schedulingPolicy> > </queue> > {code} > Diagnostic message from the web UI: > {code:java} > [Wed May 02 06:35:57 -0700 2018] Application is added to the scheduler and is > not yet activated. 
(Resource request: exceeds current > queue or its parents maximum resource allowed).{code} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
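The filtering approach suggested in the comment above might look roughly like the sketch below inside FairScheduler.allocate(); the fitsInQueueMaxResources() helper and the messages are hypothetical, and it assumes SchedulerInvalidResoureRequestException accepts a message string.
{code:java}
// Hedged sketch of the suggested FairScheduler.allocate() behavior (illustrative only).
List<ResourceRequest> unsatisfiable = new ArrayList<>();
for (ResourceRequest ask : asks) {
  if (!fitsInQueueMaxResources(ask, queue)) {  // hypothetical check
    LOG.warn("Dropping request " + ask + " that exceeds the maximum resources"
        + " of queue " + queue.getName());
    unsatisfiable.add(ask);
  }
}
asks.removeAll(unsatisfiable);  // proceed with whatever the queue can serve
if (asks.isEmpty() && !unsatisfiable.isEmpty()) {
  // Nothing left that the queue can ever satisfy; tell the AM instead of hanging.
  throw new SchedulerInvalidResoureRequestException(
      "All resource requests exceed the maximum resources of queue "
          + queue.getName());
}
{code}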
[jira] [Commented] (YARN-8250) Create another implementation of ContainerScheduler to support NM overallocation
[ https://issues.apache.org/jira/browse/YARN-8250?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16475011#comment-16475011 ] Haibo Chen commented on YARN-8250: -- My understanding of SHED_QUEUED_CONTAINERS is to notify the container scheduler to get rid of some opportunistic containers being queued. The intent of the new SCHEDULER_CONTAINERS is to let the container scheduler try to launch opportunistic containers that are currently being queued. A follow-up would also reuse SCHEDULER_CONTAINERS to preempt running opportunistic containers. I am not sure how to best align the two. The main reasons why we'd like to introduce a new container scheduler are 1) Minimize the impact on GUARANTEED containers from over-allocating a node with OPPORTUNISTIC containers. The queuing time of GUARANTEED containers would increase with more running OPPORTUNISTIC containers, which is the case with over-allocating. The code as in YARN-6675 gets complicated. Alternatively, we could launch GUARANTEED containers immediately and rely on the cgroup mechanism for preemption. 2) Avoid aggressive OPPORTUNISTIC container launching. One thing to note is that in the case of over-allocation, we'd rely on the resource utilization metrics to decide how many resources we can use to launch OPPORTUNISTIC containers. The resource utilization metrics in the NM are unfortunately only updated every few seconds. This can be problematic in that the NM could end up launching too many OPPORTUNISTIC containers before the metric is updated. The current default container scheduler launches containers aggressively, which could cause containers to be launched and then killed shortly after. The new container scheduler only schedules OPPORTUNISTIC containers once per utilization-metric update. It is my understanding that removing GUARANTEED container queuing would destabilize cases like yours where nodes are running at high utilization, and scheduling OPPORTUNISTIC containers only every few seconds would delay launch time in distributed scheduling. Hence, we created a pluggable container scheduler so that we can choose to do things differently without causing issues for existing use cases. The new container scheduler should probably be named or documented so that it is only used when over-allocation is enabled. > Create another implementation of ContainerScheduler to support NM > overallocation > > > Key: YARN-8250 > URL: https://issues.apache.org/jira/browse/YARN-8250 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Haibo Chen >Assignee: Haibo Chen >Priority: Major > Attachments: YARN-8250-YARN-1011.00.patch, > YARN-8250-YARN-1011.01.patch, YARN-8250-YARN-1011.02.patch > > > YARN-6675 adds NM over-allocation support by modifying the existing > ContainerScheduler and providing a utilizationBased resource tracker. > However, the implementation adds a lot of complexity to ContainerScheduler, > and future tweaks of the over-allocation strategy based on how many containers > have been launched are even more complicated. > As such, this Jira proposes a new ContainerScheduler that always launches > guaranteed containers immediately and queues opportunistic containers. It > relies on a periodic check to launch opportunistic containers. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
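The scheduling behavior described above could be sketched as follows; every name here is illustrative rather than the patch's actual code. GUARANTEED containers start at once, while OPPORTUNISTIC containers wait for the next fresh utilization snapshot.
{code:java}
// Hedged sketch of the proposed scheduler's two entry points (illustrative only).
private final Queue<Container> opportunisticQueue = new ArrayDeque<>();

void scheduleContainer(Container container, ExecutionType type) {
  if (type == ExecutionType.GUARANTEED) {
    startContainer(container);          // launched immediately, never queued
  } else {
    opportunisticQueue.add(container);  // deferred to the periodic check
  }
}

// Invoked only when the container monitor publishes a fresh utilization
// snapshot, so opportunistic launches are never decided on a stale reading.
void onUtilizationUpdate(ResourceUtilization nodeUtilization) {
  while (!opportunisticQueue.isEmpty()
      && hasHeadroomFor(opportunisticQueue.peek(), nodeUtilization)) { // hypothetical check
    startContainer(opportunisticQueue.poll());
  }
}
{code}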
[jira] [Commented] (YARN-8123) Skip compiling old hamlet package when the Java version is 10 or upper
[ https://issues.apache.org/jira/browse/YARN-8123?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16475002#comment-16475002 ] genericqa commented on YARN-8123: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 34s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:red}-1{color} | {color:red} test4tests {color} | {color:red} 0m 0s{color} | {color:red} The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color} | || || || || {color:brown} trunk Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 23m 31s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 33s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 36s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 33m 32s{color} | {color:green} branch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 40s{color} | {color:green} trunk passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 34s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 29s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 29s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 34s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} xml {color} | {color:green} 0m 1s{color} | {color:green} The patch has no ill-formed XML file. {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 10m 27s{color} | {color:green} patch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 37s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:green}+1{color} | {color:green} unit {color} | {color:green} 3m 13s{color} | {color:green} hadoop-yarn-common in the patch passed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 23s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} | | {color:black}{color} | {color:black} {color} | {color:black} 51m 59s{color} | {color:black} {color} | \\ \\ || Subsystem || Report/Notes || | Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:abb62dd | | JIRA Issue | YARN-8123 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12923349/YARN-8123.001.patch | | Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit shadedclient xml | | uname | Linux a15d2008e37a 4.4.0-89-generic #112-Ubuntu SMP Mon Jul 31 19:38:41 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/patchprocess/precommit/personality/provided.sh | | git revision | trunk / 2d00a0c | | maven | version: Apache Maven 3.3.9 | | Default Java | 1.8.0_162 | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/20720/testReport/ | | Max. process+thread count | 405 (vs. ulimit of 1) | | modules | C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common U: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/20720/console | | Powered by | Apache Yetus 0.8.0-SNAPSHOT http://yetus.apache.org | This message was automatically generated. > Skip compiling old hamlet package when the Java version is 10 or upper > -- > > Key: YARN-8123 > URL: https://issues.apache.org/jira/browse/YARN-8123 > Project: Hadoop YARN > Issue Type: Improvement > Components: webapp > Environment: Java 10 or upper >Reporter: Akira Aj
[jira] [Commented] (YARN-6919) Add default volume mount list
[ https://issues.apache.org/jira/browse/YARN-6919?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16474996#comment-16474996 ] genericqa commented on YARN-6919: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 30s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 1 new or modified test files. {color} | || || || || {color:brown} trunk Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 12s{color} | {color:blue} Maven dependency ordering for branch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 25m 48s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 8m 38s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 25s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 24s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 13m 42s{color} | {color:green} branch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 19s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 8s{color} | {color:green} trunk passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 12s{color} | {color:blue} Maven dependency ordering for patch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 1m 13s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 7m 40s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 7m 40s{color} | {color:green} the patch passed {color} | | {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange} 1m 22s{color} | {color:orange} hadoop-yarn-project/hadoop-yarn: The patch generated 1 new + 237 unchanged - 0 fixed = 238 total (was 237) {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 21s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 11m 19s{color} | {color:green} patch has no errors when building and testing our client artifacts. 
{color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 35s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 59s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:red}-1{color} | {color:red} unit {color} | {color:red} 0m 45s{color} | {color:red} hadoop-yarn-api in the patch failed. {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 19m 27s{color} | {color:green} hadoop-yarn-server-nodemanager in the patch passed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 35s{color} | {color:green} The patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black}101m 47s{color} | {color:black} {color} | \\ \\ || Reason || Tests || | Failed junit tests | hadoop.yarn.conf.TestYarnConfigurationFields | \\ \\ || Subsystem || Report/Notes || | Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:abb62dd | | JIRA Issue | YARN-6919 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12923344/YARN-6919.001.patch | | Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle | | uname | Linux 528af0ced50c 3.13.0-139-generic #188-Ubuntu SMP Tue Jan 9 14:43:09 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/patchprocess/precommit/personality/provided.sh | | git revision | trunk / 2d00a0c | | maven | version: Apache Maven 3.3.9 | | Default Java | 1.8.
[jira] [Commented] (YARN-7340) Missing the time stamp in exception message in Class NoOverCommitPolicy
[ https://issues.apache.org/jira/browse/YARN-7340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16474841#comment-16474841 ] Dinesh Chitlangia commented on YARN-7340: - [~yufeigu] - Thank you for assigning this to me. Are we looking for logging only the start time of requested resources or do we want to log the start time and end time for requested resources? I think we should log both the start and end time to avoid any ambiguity. What are your thoughts on this? > Missing the time stamp in exception message in Class NoOverCommitPolicy > --- > > Key: YARN-7340 > URL: https://issues.apache.org/jira/browse/YARN-7340 > Project: Hadoop YARN > Issue Type: Bug > Components: reservation system >Affects Versions: 3.1.0 >Reporter: Yufei Gu >Assignee: Dinesh Chitlangia >Priority: Minor > Labels: newbie++ > > It could be easily figured out by reading code. > {code} > throw new ResourceOverCommitException( > "Resources at time " + " would be overcommitted by " > + "accepting reservation: " + reservation.getReservationId()); > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
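A fix along the lines discussed above, logging both ends of the requested interval, might look like the following sketch; it assumes the reservation object exposes its interval via getStartTime()/getEndTime().
{code:java}
// Hedged sketch of one possible fix: include the actual interval instead of
// the empty string in the exception message.
throw new ResourceOverCommitException(
    "Resources at time [" + reservation.getStartTime() + ", "
        + reservation.getEndTime() + "] would be overcommitted by "
        + "accepting reservation: " + reservation.getReservationId());
{code}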
[jira] [Updated] (YARN-8081) Yarn Service Upgrade: Add support to upgrade a component
[ https://issues.apache.org/jira/browse/YARN-8081?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chandni Singh updated YARN-8081: Attachment: YARN-8081.001.patch > Yarn Service Upgrade: Add support to upgrade a component > > > Key: YARN-8081 > URL: https://issues.apache.org/jira/browse/YARN-8081 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Chandni Singh >Assignee: Chandni Singh >Priority: Major > Attachments: YARN-8081.001.patch > > > Yarn service upgrade should provide an API to upgrade the component. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-8081) Yarn Service Upgrade: Add support to upgrade a component
[ https://issues.apache.org/jira/browse/YARN-8081?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chandni Singh updated YARN-8081: Description: Yarn service upgrade should provide an API to upg > Yarn Service Upgrade: Add support to upgrade a component > > > Key: YARN-8081 > URL: https://issues.apache.org/jira/browse/YARN-8081 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Chandni Singh >Assignee: Chandni Singh >Priority: Major > Attachments: YARN-8081.001.patch > > > Yarn service upgrade should provide an API to upg -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-8081) Yarn Service Upgrade: Add support to upgrade a component
[ https://issues.apache.org/jira/browse/YARN-8081?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chandni Singh updated YARN-8081: Description: Yarn service upgrade should provide an API to upgrade the component. (was: Yarn service upgrade should provide an API to upg) > Yarn Service Upgrade: Add support to upgrade a component > > > Key: YARN-8081 > URL: https://issues.apache.org/jira/browse/YARN-8081 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Chandni Singh >Assignee: Chandni Singh >Priority: Major > Attachments: YARN-8081.001.patch > > > Yarn service upgrade should provide an API to upgrade the component. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-8123) Skip compiling old hamlet package when the Java version is 10 or upper
[ https://issues.apache.org/jira/browse/YARN-8123?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16474835#comment-16474835 ] Dinesh Chitlangia commented on YARN-8123: - [~tasanuma0829] and [~ajisakaa] - Patch has been attached. Kindly help to review. > Skip compiling old hamlet package when the Java version is 10 or upper > -- > > Key: YARN-8123 > URL: https://issues.apache.org/jira/browse/YARN-8123 > Project: Hadoop YARN > Issue Type: Improvement > Components: webapp > Environment: Java 10 or upper >Reporter: Akira Ajisaka >Assignee: Dinesh Chitlangia >Priority: Major > Labels: newbie > Attachments: YARN-8123.001.patch > > > HADOOP-11423 skipped compiling old hamlet package when the Java version is 9, > however, it is not skipped with Java 10+. We need to fix it. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-8284) get_docker_command refactoring
[ https://issues.apache.org/jira/browse/YARN-8284?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16474826#comment-16474826 ] Eric Yang commented on YARN-8284: - +1 looks good to me. > get_docker_command refactoring > -- > > Key: YARN-8284 > URL: https://issues.apache.org/jira/browse/YARN-8284 > Project: Hadoop YARN > Issue Type: Improvement >Affects Versions: 3.2.0, 3.1.1 >Reporter: Jason Lowe >Assignee: Eric Badger >Priority: Minor > Attachments: YARN-8284.001.patch > > > YARN-8274 occurred because get_docker_command's helper functions each have to > remember to put the docker binary as the first argument. This is error prone > and causes code duplication for each of the helper functions. It would be > safer and simpler if get_docker_command initialized the docker binary > argument in one place and each of the helper functions only added the > arguments specific to their particular docker sub-command. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-8250) Create another implementation of ContainerScheduler to support NM overallocation
[ https://issues.apache.org/jira/browse/YARN-8250?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16474808#comment-16474808 ] Arun Suresh commented on YARN-8250: --- [~haibochen], I am not entirely convinced we really need to make the ContainerScheduler pluggable. Maybe if you could provide a code snippet of how the new ContainerScheduler used for over-allocation needs to be different, I could have better context. You have created a new {{SCHEDULE_CONTAINERS}} event. Wondering if {{SHED_QUEUED_CONTAINERS}} should be re-used here? > Create another implementation of ContainerScheduler to support NM > overallocation > > > Key: YARN-8250 > URL: https://issues.apache.org/jira/browse/YARN-8250 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Haibo Chen >Assignee: Haibo Chen >Priority: Major > Attachments: YARN-8250-YARN-1011.00.patch, > YARN-8250-YARN-1011.01.patch, YARN-8250-YARN-1011.02.patch > > > YARN-6675 adds NM over-allocation support by modifying the existing > ContainerScheduler and providing a utilizationBased resource tracker. > However, the implementation adds a lot of complexity to ContainerScheduler, > and future tweaks of the over-allocation strategy based on how many containers > have been launched are even more complicated. > As such, this Jira proposes a new ContainerScheduler that always launches > guaranteed containers immediately and queues opportunistic containers. It > relies on a periodic check to launch opportunistic containers. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
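To make the request above concrete, a minimal sketch, not taken from any of the attached patches, of the queuing behavior the issue description proposes: guaranteed containers start immediately, opportunistic containers wait for a periodic utilization check. All class and method names here are hypothetical.
{code:java}
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch of the proposed scheduler: GUARANTEED containers are
// launched immediately, OPPORTUNISTIC ones are queued, and a periodic check
// drains the queue while the node has spare capacity.
public class OverAllocationSchedulerSketch {
  private final Queue<Runnable> opportunistic = new ConcurrentLinkedQueue<>();
  private final ScheduledExecutorService timer =
      Executors.newSingleThreadScheduledExecutor();

  public OverAllocationSchedulerSketch(long checkIntervalMs) {
    timer.scheduleWithFixedDelay(this::launchOpportunisticIfRoom,
        checkIntervalMs, checkIntervalMs, TimeUnit.MILLISECONDS);
  }

  public void schedule(Runnable launchContainer, boolean guaranteed) {
    if (guaranteed) {
      launchContainer.run();              // launch immediately
    } else {
      opportunistic.add(launchContainer); // wait for the periodic check
    }
  }

  private void launchOpportunisticIfRoom() {
    while (nodeHasSpareCapacity() && !opportunistic.isEmpty()) {
      opportunistic.poll().run();
    }
  }

  private boolean nodeHasSpareCapacity() {
    // Placeholder for a utilization-based resource tracker decision.
    return false;
  }
}
{code}
The design question in the comment then reduces to whether the periodic drain warrants the new {{SCHEDULE_CONTAINERS}} event or can reuse the existing {{SHED_QUEUED_CONTAINERS}} path.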
[jira] [Updated] (YARN-6919) Add default volume mount list
[ https://issues.apache.org/jira/browse/YARN-6919?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eric Badger updated YARN-6919: -- Attachment: YARN-6919.001.patch > Add default volume mount list > - > > Key: YARN-6919 > URL: https://issues.apache.org/jira/browse/YARN-6919 > Project: Hadoop YARN > Issue Type: Sub-task > Components: yarn >Reporter: Eric Badger >Assignee: Eric Badger >Priority: Major > Labels: Docker > Attachments: YARN-6919.001.patch > > > Piggybacking on YARN-5534, we should create a default list that bind mounts > selected volumes into all docker containers. This list will be empty by > default -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-7933) [atsv2 read acls] Add TimelineWriter#writeDomain
[ https://issues.apache.org/jira/browse/YARN-7933?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16474761#comment-16474761 ] Haibo Chen commented on YARN-7933: -- I am okay with just removing the TODO comment, and having the discussion on appid authentication in a separate jira. > [atsv2 read acls] Add TimelineWriter#writeDomain > - > > Key: YARN-7933 > URL: https://issues.apache.org/jira/browse/YARN-7933 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Vrushali C >Assignee: Rohith Sharma K S >Priority: Major > Attachments: YARN-7933.01.patch, YARN-7933.02.patch, > YARN-7933.03.patch, YARN-7933.04.patch, YARN-7933.05.patch > > > > Add an API TimelineWriter#writeDomain for writing the domain info -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-7933) [atsv2 read acls] Add TimelineWriter#writeDomain
[ https://issues.apache.org/jira/browse/YARN-7933?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16474759#comment-16474759 ] Haibo Chen commented on YARN-7933: -- If it's not in our design, I am inclined to remove it at this point in time. {quote}Timeline Token verification is at filter layer at the time of http connection establishment i.e even before it reaches servlets. {quote} Does that mean if two collectors are allocated on the same node, then one AM can forge data of another? This is probably an independent issue that applies to the other TimelineCollectorWebservices endpoint, putEntities(), which we can address in another jira. > [atsv2 read acls] Add TimelineWriter#writeDomain > - > > Key: YARN-7933 > URL: https://issues.apache.org/jira/browse/YARN-7933 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Vrushali C >Assignee: Rohith Sharma K S >Priority: Major > Attachments: YARN-7933.01.patch, YARN-7933.02.patch, > YARN-7933.03.patch, YARN-7933.04.patch, YARN-7933.05.patch > > > > Add an API TimelineWriter#writeDomain for writing the domain info -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
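For readers following along, a rough sketch of the API shape the summary describes; the parameter list and import paths are assumptions on my part, inferred only from the JIRA summary, not the committed signature.
{code:java}
import java.io.IOException;
import org.apache.hadoop.yarn.api.records.timelineservice.TimelineDomain;
import org.apache.hadoop.yarn.api.records.timelineservice.TimelineWriteResponse;

// Hypothetical shape of the new writer method: persist the ACL domain so
// that readers can later enforce read access on the entities under it.
public interface TimelineDomainWriterSketch {
  TimelineWriteResponse writeDomain(String clusterId, TimelineDomain domain)
      throws IOException;
}
{code}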
[jira] [Updated] (YARN-6677) Preempt all opportunistic containers when root container cgroup goes over memory limit
[ https://issues.apache.org/jira/browse/YARN-6677?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Haibo Chen updated YARN-6677: - Summary: Preempt all opportunistic containers when root container cgroup goes over memory limit (was: Pause and preempt containers when root container cgroup goes over memory limit) > Preempt all opportunistic containers when root container cgroup goes over > memory limit > -- > > Key: YARN-6677 > URL: https://issues.apache.org/jira/browse/YARN-6677 > Project: Hadoop YARN > Issue Type: Sub-task > Components: nodemanager >Affects Versions: 3.0.0-alpha3 >Reporter: Haibo Chen >Assignee: Miklos Szegedi >Priority: Major > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-8250) Create another implementation of ContainerScheduler to support NM overallocation
[ https://issues.apache.org/jira/browse/YARN-8250?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16474715#comment-16474715 ] Haibo Chen commented on YARN-8250: -- [~asuresh] Did you get a chance to look at the patch? > Create another implementation of ContainerScheduler to support NM > overallocation > > > Key: YARN-8250 > URL: https://issues.apache.org/jira/browse/YARN-8250 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Haibo Chen >Assignee: Haibo Chen >Priority: Major > Attachments: YARN-8250-YARN-1011.00.patch, > YARN-8250-YARN-1011.01.patch, YARN-8250-YARN-1011.02.patch > > > YARN-6675 adds NM over-allocation support by modifying the existing > ContainerScheduler and providing a utilizationBased resource tracker. > However, the implementation adds a lot of complexity to ContainerScheduler, > and future tweaks of the over-allocation strategy based on how many containers > have been launched are even more complicated. > As such, this Jira proposes a new ContainerScheduler that always launches > guaranteed containers immediately and queues opportunistic containers. It > relies on a periodic check to launch opportunistic containers. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-8284) get_docker_command refactoring
[ https://issues.apache.org/jira/browse/YARN-8284?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16474712#comment-16474712 ] genericqa commented on YARN-8284: - | (/) *{color:green}+1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 35s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 1 new or modified test files. {color} | || || || || {color:brown} trunk Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 27m 14s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 58s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 41s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 39m 6s{color} | {color:green} branch has no errors when building and testing our client artifacts. {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 36s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 52s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} cc {color} | {color:green} 0m 52s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 52s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 33s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 11m 34s{color} | {color:green} patch has no errors when building and testing our client artifacts. {color} | || || || || {color:brown} Other Tests {color} || | {color:green}+1{color} | {color:green} unit {color} | {color:green} 19m 30s{color} | {color:green} hadoop-yarn-server-nodemanager in the patch passed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 26s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} | | {color:black}{color} | {color:black} {color} | {color:black} 73m 40s{color} | {color:black} {color} | \\ \\ || Subsystem || Report/Notes || | Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:abb62dd | | JIRA Issue | YARN-8284 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12923323/YARN-8284.001.patch | | Optional Tests | asflicense compile cc mvnsite javac unit | | uname | Linux 89f4b87ecdb5 3.13.0-139-generic #188-Ubuntu SMP Tue Jan 9 14:43:09 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/patchprocess/precommit/personality/provided.sh | | git revision | trunk / 2d00a0c | | maven | version: Apache Maven 3.3.9 | | Default Java | 1.8.0_162 | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/20718/testReport/ | | Max. process+thread count | 334 (vs. ulimit of 1) | | modules | C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager U: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/20718/console | | Powered by | Apache Yetus 0.8.0-SNAPSHOT http://yetus.apache.org | This message was automatically generated. > get_docker_command refactoring > -- > > Key: YARN-8284 > URL: https://issues.apache.org/jira/browse/YARN-8284 > Project: Hadoop YARN > Issue Type: Improvement >Affects Versions: 3.2.0, 3.1.1 >Reporter: Jason Lowe >Assignee: Eric Badger >Priority: Minor > Attachments: YARN-8284.001.patch > > > YARN-8274 occurred because get_docker_command's helper functions each have to > remember to put the docker binary as the first argument. This is error prone > and causes code duplication for each of the helper functions. It would be > safer and simpler if get_docker_command initialized the docker binary > argument in one place and each of the helper functi
[jira] [Commented] (YARN-8130) Race condition when container events are published for KILLED applications
[ https://issues.apache.org/jira/browse/YARN-8130?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16474608#comment-16474608 ] Hudson commented on YARN-8130: -- SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #14195 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/14195/]) YARN-8130 Race condition when container events are published for KILLED (haibochen: rev 2d00a0c71b5dde31e2cf8fcb96d9d541d41fb879) * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/timelineservice/NMTimelinePublisher.java * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/timelineservice/NMTimelineEvent.java * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/timelineservice/NMTimelineEventType.java * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/timelineservice/TestNMTimelinePublisher.java > Race condition when container events are published for KILLED applications > -- > > Key: YARN-8130 > URL: https://issues.apache.org/jira/browse/YARN-8130 > Project: Hadoop YARN > Issue Type: Sub-task > Components: ATSv2 >Reporter: Charan Hebri >Assignee: Rohith Sharma K S >Priority: Major > Fix For: 3.2.0 > > Attachments: YARN-8130.01.patch, YARN-8130.02.patch, > YARN-8130.03.patch > > > There seems to be a race condition happening when an application is KILLED > and the corresponding container event information is being published. For > completed containers, a YARN_CONTAINER_FINISHED event is generated but for > some containers in a KILLED application this information is missing. Below is > a node manager log snippet, > {code:java} > 2018-04-09 08:44:54,474 INFO shuffle.ExternalShuffleBlockResolver > (ExternalShuffleBlockResolver.java:applicationRemoved(186)) - Application > application_1523259757659_0003 removed, cleanupLocalDirs = false > 2018-04-09 08:44:54,478 INFO application.ApplicationImpl > (ApplicationImpl.java:handle(632)) - Application > application_1523259757659_0003 transitioned from > APPLICATION_RESOURCES_CLEANINGUP to FINISHED > 2018-04-09 08:44:54,478 ERROR timelineservice.NMTimelinePublisher > (NMTimelinePublisher.java:putEntity(298)) - Seems like client has been > removed before the entity could be published for > TimelineEntity[type='YARN_CONTAINER', > id='container_1523259757659_0003_01_02'] > 2018-04-09 08:44:54,478 INFO logaggregation.AppLogAggregatorImpl > (AppLogAggregatorImpl.java:finishLogAggregation(520)) - Application just > finished : application_1523259757659_0003 > 2018-04-09 08:44:54,488 INFO logaggregation.AppLogAggregatorImpl > (AppLogAggregatorImpl.java:doContainerLogAggregation(576)) - Uploading logs > for container container_1523259757659_0003_01_01. Current good log dirs > are /grid/0/hadoop/yarn/log > 2018-04-09 08:44:54,492 INFO logaggregation.AppLogAggregatorImpl > (AppLogAggregatorImpl.java:doContainerLogAggregation(576)) - Uploading logs > for container container_1523259757659_0003_01_02. 
Current good log dirs > are /grid/0/hadoop/yarn/log > 2018-04-09 08:44:55,470 INFO collector.TimelineCollectorManager > (TimelineCollectorManager.java:remove(192)) - The collector service for > application_1523259757659_0003 was removed > 2018-04-09 08:44:55,472 INFO containermanager.ContainerManagerImpl > (ContainerManagerImpl.java:handle(1572)) - couldn't find application > application_1523259757659_0003 while processing FINISH_APPS event. The > ResourceManager allocated resources for this application to the NodeManager > but no active containers were found to process{code} > The container id specified in the log, > *container_1523259757659_0003_01_02* is the one that has the finished > event missing. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-8284) get_docker_command refactoring
[ https://issues.apache.org/jira/browse/YARN-8284?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eric Badger updated YARN-8284: -- Attachment: YARN-8284.001.patch > get_docker_command refactoring > -- > > Key: YARN-8284 > URL: https://issues.apache.org/jira/browse/YARN-8284 > Project: Hadoop YARN > Issue Type: Improvement >Affects Versions: 3.2.0, 3.1.1 >Reporter: Jason Lowe >Assignee: Eric Badger >Priority: Minor > Attachments: YARN-8284.001.patch > > > YARN-8274 occurred because get_docker_command's helper functions each have to > remember to put the docker binary as the first argument. This is error prone > and causes code duplication for each of the helper functions. It would be > safer and simpler if get_docker_command initialized the docker binary > argument in one place and each of the helper functions only added the > arguments specific to their particular docker sub-command. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-8130) Race condition when container events are published for KILLED applications
[ https://issues.apache.org/jira/browse/YARN-8130?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16474559#comment-16474559 ] Vrushali C commented on YARN-8130: -- thanks [~haibochen] , please go ahead > Race condition when container events are published for KILLED applications > -- > > Key: YARN-8130 > URL: https://issues.apache.org/jira/browse/YARN-8130 > Project: Hadoop YARN > Issue Type: Sub-task > Components: ATSv2 >Reporter: Charan Hebri >Assignee: Rohith Sharma K S >Priority: Major > Attachments: YARN-8130.01.patch, YARN-8130.02.patch, > YARN-8130.03.patch > > > There seems to be a race condition happening when an application is KILLED > and the corresponding container event information is being published. For > completed containers, a YARN_CONTAINER_FINISHED event is generated but for > some containers in a KILLED application this information is missing. Below is > a node manager log snippet, > {code:java} > 2018-04-09 08:44:54,474 INFO shuffle.ExternalShuffleBlockResolver > (ExternalShuffleBlockResolver.java:applicationRemoved(186)) - Application > application_1523259757659_0003 removed, cleanupLocalDirs = false > 2018-04-09 08:44:54,478 INFO application.ApplicationImpl > (ApplicationImpl.java:handle(632)) - Application > application_1523259757659_0003 transitioned from > APPLICATION_RESOURCES_CLEANINGUP to FINISHED > 2018-04-09 08:44:54,478 ERROR timelineservice.NMTimelinePublisher > (NMTimelinePublisher.java:putEntity(298)) - Seems like client has been > removed before the entity could be published for > TimelineEntity[type='YARN_CONTAINER', > id='container_1523259757659_0003_01_02'] > 2018-04-09 08:44:54,478 INFO logaggregation.AppLogAggregatorImpl > (AppLogAggregatorImpl.java:finishLogAggregation(520)) - Application just > finished : application_1523259757659_0003 > 2018-04-09 08:44:54,488 INFO logaggregation.AppLogAggregatorImpl > (AppLogAggregatorImpl.java:doContainerLogAggregation(576)) - Uploading logs > for container container_1523259757659_0003_01_01. Current good log dirs > are /grid/0/hadoop/yarn/log > 2018-04-09 08:44:54,492 INFO logaggregation.AppLogAggregatorImpl > (AppLogAggregatorImpl.java:doContainerLogAggregation(576)) - Uploading logs > for container container_1523259757659_0003_01_02. Current good log dirs > are /grid/0/hadoop/yarn/log > 2018-04-09 08:44:55,470 INFO collector.TimelineCollectorManager > (TimelineCollectorManager.java:remove(192)) - The collector service for > application_1523259757659_0003 was removed > 2018-04-09 08:44:55,472 INFO containermanager.ContainerManagerImpl > (ContainerManagerImpl.java:handle(1572)) - couldn't find application > application_1523259757659_0003 while processing FINISH_APPS event. The > ResourceManager allocated resources for this application to the NodeManager > but no active containers were found to process{code} > The container id specified in the log, > *container_1523259757659_0003_01_02* is the one that has the finished > event missing. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-8130) Race condition when container events are published for KILLED applications
[ https://issues.apache.org/jira/browse/YARN-8130?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16474552#comment-16474552 ] Haibo Chen commented on YARN-8130: -- Checking this in later today if no objection > Race condition when container events are published for KILLED applications > -- > > Key: YARN-8130 > URL: https://issues.apache.org/jira/browse/YARN-8130 > Project: Hadoop YARN > Issue Type: Sub-task > Components: ATSv2 >Reporter: Charan Hebri >Assignee: Rohith Sharma K S >Priority: Major > Attachments: YARN-8130.01.patch, YARN-8130.02.patch, > YARN-8130.03.patch > > > There seems to be a race condition happening when an application is KILLED > and the corresponding container event information is being published. For > completed containers, a YARN_CONTAINER_FINISHED event is generated but for > some containers in a KILLED application this information is missing. Below is > a node manager log snippet, > {code:java} > 2018-04-09 08:44:54,474 INFO shuffle.ExternalShuffleBlockResolver > (ExternalShuffleBlockResolver.java:applicationRemoved(186)) - Application > application_1523259757659_0003 removed, cleanupLocalDirs = false > 2018-04-09 08:44:54,478 INFO application.ApplicationImpl > (ApplicationImpl.java:handle(632)) - Application > application_1523259757659_0003 transitioned from > APPLICATION_RESOURCES_CLEANINGUP to FINISHED > 2018-04-09 08:44:54,478 ERROR timelineservice.NMTimelinePublisher > (NMTimelinePublisher.java:putEntity(298)) - Seems like client has been > removed before the entity could be published for > TimelineEntity[type='YARN_CONTAINER', > id='container_1523259757659_0003_01_02'] > 2018-04-09 08:44:54,478 INFO logaggregation.AppLogAggregatorImpl > (AppLogAggregatorImpl.java:finishLogAggregation(520)) - Application just > finished : application_1523259757659_0003 > 2018-04-09 08:44:54,488 INFO logaggregation.AppLogAggregatorImpl > (AppLogAggregatorImpl.java:doContainerLogAggregation(576)) - Uploading logs > for container container_1523259757659_0003_01_01. Current good log dirs > are /grid/0/hadoop/yarn/log > 2018-04-09 08:44:54,492 INFO logaggregation.AppLogAggregatorImpl > (AppLogAggregatorImpl.java:doContainerLogAggregation(576)) - Uploading logs > for container container_1523259757659_0003_01_02. Current good log dirs > are /grid/0/hadoop/yarn/log > 2018-04-09 08:44:55,470 INFO collector.TimelineCollectorManager > (TimelineCollectorManager.java:remove(192)) - The collector service for > application_1523259757659_0003 was removed > 2018-04-09 08:44:55,472 INFO containermanager.ContainerManagerImpl > (ContainerManagerImpl.java:handle(1572)) - couldn't find application > application_1523259757659_0003 while processing FINISH_APPS event. The > ResourceManager allocated resources for this application to the NodeManager > but no active containers were found to process{code} > The container id specified in the log, > *container_1523259757659_0003_01_02* is the one that has the finished > event missing. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-4599) Set OOM control for memory cgroups
[ https://issues.apache.org/jira/browse/YARN-4599?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16474543#comment-16474543 ] Haibo Chen commented on YARN-4599: -- Thanks [~miklos.szeg...@cloudera.com] for the patch! The TestContainersMonitor.testContainerKillOnMemoryOverflow failure seems related. I have a few comments/questions: 1) Error handling when writing to {{cgroup.event_control}} fails seems to be missing in oom_listener.c; do we need to handle such a case? 2) Is 'descriptors->event_control_fd = -1;' necessary? 3) The comments for test_oom() do not quite make sense to me. My current understanding is that it adds the calling process to the given cgroup and simulates an OOM by repeatedly asking the OS for memory? 4) Can you please elaborate on how cgroup simulation is done in oom_listener_test_main.c? The child process that is added to the cgroup only does sleep(). 5) Doing param matching in CGroupsHandlerImpl.GetCGroupParam() does not seem like a good practice to me. Does it make sense to create a new method for the special case? 6) Let's wrap the new thread join in ContainersMonitorImpl with a try-catch clause as we do with the monitoring thread. 7) The configuration changes are incompatible in that before the patch, poll-based pmcheck and vmcheck take precedence over the cgroup-based memory control mechanism. It is now reversed after the patch: if cgroup-based memory control is enabled, then poll-based pmcheck and vmcheck are disabled automatically. IIUC, one of the reasons is that we need to reuse the pmcheck and vmcheck flags, which are dedicated to controlling the poll-based memory control. How about we create separate configurations for pm elastic control and vm elastic control? We can make sure they are mutually exclusive as indicated in CGroupElasticMemoryController. We want to keep the elastic memory control mechanism independent of the per-container memory control mechanism, so we can get rid of the shortcut in checkLimit() (warnings are probably more appropriate if we want to say the poll-based mechanism is not robust, which is an issue unrelated to what we are doing here). 8) In CGroupElasticMemoryController, we can create a createOomHandler() method that is called by the constructor and overridden by the unit tests, to avoid the test-only setOomHandler() method. 9) bq. // TODO could we miss an event before the process starts? This is no longer an issue based on your experiment, per our offline discussion? 10) We only need two threads in the thread pool, one for reading the error stream and the other for watching and logging OOM state, don't we? If so, we can change executor = Executors.newFixedThreadPool(5); to executor = Executors.newFixedThreadPool(2); 11) I'm not quite sure how the watchdog thread can tell the elastic memory controller to stop. I guess once the watchdog thread calls stopListening(), the process is destroyed, `(read = events.read(event))` would return false, and we'd realize in the memory controller thread that the OOM was not resolved in time and throw an exception to crash the NM? This process seems pretty obscure to me. Does it make sense to have the stopListening logic in the `if (!watchdog.get()) {}` block instead? 12) Can we replace thrown.expect() statements with @Test(expected), which is more declarative? Similarly in TestDefaultOOMHandler. 13) In TestCGroupElasticMemoryController.testNormalExit(), I am not quite sure what the purpose of the sleep task is. Can you please add some comments there?
14) Can we add some comments to the DefaultOOMHandler javadoc, especially about which containers are considered to be killed first? 15) If new YarnRuntimeException("Could not find any containers but CGroups " + "reserved for containers ran out of memory. " + "I am giving up") is thrown in DefaultOOMHandler, CGroupElasticMemoryController simply logs the exception. Do we want to crash the NM as well in this case? 16) In TestDefaultOOMHandler.testBothContainersOOM(), I think we also need to verify container 2 is killed. Similarly, in testOneContainerOOM() and testNoContainerOOM(). > Set OOM control for memory cgroups > -- > > Key: YARN-4599 > URL: https://issues.apache.org/jira/browse/YARN-4599 > Project: Hadoop YARN > Issue Type: Sub-task > Components: nodemanager >Affects Versions: 2.9.0 >Reporter: Karthik Kambatla >Assignee: Miklos Szegedi >Priority: Major > Labels: oct16-medium > Attachments: Elastic Memory Control in YARN.pdf, YARN-4599.000.patch, > YARN-4599.001.patch, YARN-4599.002.patch, YARN-4599.003.patch, > YARN-4599.004.patch, YARN-4599.005.patch, YARN-4599.006.patch, > YARN-4599.sandflee.patch, yarn-4599-not-so-useful.patch > > > YARN-1856 adds memory cgroups enforcing support. We should also expli
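As an illustration of comment 8 above, a minimal sketch of the factory-method seam that would let unit tests substitute the OOM handler without a test-only setter; apart from the createOomHandler()/setOomHandler() names mentioned in the review, everything in the sketch is hypothetical.
{code:java}
// Sketch of comment 8: the constructor obtains the OOM handler through an
// overridable factory method, so unit tests can subclass and inject a mock
// instead of relying on a setter that exists only for testing.
public class ElasticMemoryControllerSketch {
  private final Runnable oomHandler;

  public ElasticMemoryControllerSketch() {
    this.oomHandler = createOomHandler();
  }

  // Production default; a unit test overrides this to return a mock.
  protected Runnable createOomHandler() {
    return () -> System.out.println("resolving OOM in the containers cgroup");
  }

  public void onOomEvent() {
    oomHandler.run();
  }
}
{code}
A unit test would then subclass it and override createOomHandler() to return a mock, leaving no test-only public surface on the controller.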
[jira] [Commented] (YARN-8108) RM metrics rest API throws GSSException in kerberized environment
[ https://issues.apache.org/jira/browse/YARN-8108?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16474510#comment-16474510 ] Eric Yang commented on YARN-8108: - Kerberos SPNs supported by browsers are, by definition, of the form HTTP/<hostname>, where <hostname> is either a white-listed server name or the canonical DNS name of the server. Chrome, IE, and Firefox all share similar logic. Firefox and IE don't allow canonical DNS, to prevent MITM attacks. Safari and Chrome support canonical DNS, with options to disable it. From the server's point of view, a single server can host multiple virtual hosts with different web applications. It is technically possible to configure a web server to run with multiple SPNs. It is incorrect to assume that the same virtual host can serve two different SPNs for two different subsets of URLs. No browser supports having one subset of URLs served by one SPN while another subset of URLs is served by another SPN. In Hadoop 0.2x, Hadoop components were designed to serve a collection of servlets (log, static, cluster) per port. Therefore, AuthenticationFilter could cover the entire port by targeting the fixed set of servlets for filtering, which matches browser expectations without problem. AuthenticationFilter was later reused in Hadoop 1.x and 2.x as the Kerberos SPNEGO filter. The current problem only surfaces when multiple web contexts are configured to share the same port with the same server hostname, and each web context tries to initialize its own SPN. This is not by design; it just happened due to code reuse and lack of testing. For Hadoop 2.x+ to offer embedded services securely, the individual AuthenticationFilter can be turned into one [security handler|http://www.eclipse.org/jetty/documentation/9.3.x/architecture.html#_handlers] to match the Jetty design specification. This fell through the cracks in open source when no one was looking, because the first security mechanism for Hadoop was an XSS filter (committed as part of Chukwa) instead of a security handler. Unfortunately, Hadoop security mechanisms followed a bottom-up approach, implemented as filters, instead of following web application design and writing security handlers as Handlers, due to a lack of understanding that session persistence requires authentication and authorization mechanisms to be built differently from web filters. The one-line change is to loop through all Contexts and ensure all contexts are registered with the same AuthenticationFilter, applying one filter globally to all URLs. This is why this one-line patch can plug the security hole as a short-term bug fix. The long-term solution is writing a security handler that matches the handler design, to ensure no API breakage during Jetty version upgrades and to improve session persistence in Hadoop web applications, which is beyond the scope of this JIRA.
> RM metrics rest API throws GSSException in kerberized environment > - > > Key: YARN-8108 > URL: https://issues.apache.org/jira/browse/YARN-8108 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 3.0.0 >Reporter: Kshitij Badani >Assignee: Eric Yang >Priority: Blocker > Attachments: YARN-8108.001.patch > > > Test is trying to pull up metrics data from SHS after kiniting as 'test_user' > It is throwing GSSException as follows > {code:java} > b2b460b80713|RUNNING: curl --silent -k -X GET -D > /hwqe/hadoopqe/artifacts/tmp-94845 --negotiate -u : > http://rm_host:8088/proxy/application_1518674952153_0070/metrics/json2018-02-15 > 07:15:48,757|INFO|MainThread|machine.py:194 - > run()||GUID=fc5a3266-28f8-4eed-bae2-b2b460b80713|Exit Code: 0 > 2018-02-15 07:15:48,758|INFO|MainThread|spark.py:1757 - > getMetricsJsonData()|metrics: > > > > Error 403 GSSException: Failure unspecified at GSS-API level > (Mechanism level: Request is a replay (34)) > > HTTP ERROR 403 > Problem accessing /proxy/application_1518674952153_0070/metrics/json. > Reason: > GSSException: Failure unspecified at GSS-API level (Mechanism level: > Request is a replay (34)) > > > {code} > Rootcausing : proxyserver on RM can't be supported for Kerberos enabled > cluster because AuthenticationFilter is applied twice in Hadoop code (once in > httpServer2 for RM, and another instance from AmFilterInitializer for proxy > server). This will require code changes to hadoop-yarn-server-web-proxy > project -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
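To visualize the one-line idea described above, a minimal sketch assuming Hadoop's HttpServer2.defineFilter() helper and Jetty's ServletContextHandler; how the set of contexts sharing the port is obtained is left abstract, since that detail is not in the comment.
{code:java}
import java.util.Map;
import org.apache.hadoop.http.HttpServer2;
import org.apache.hadoop.security.authentication.server.AuthenticationFilter;
import org.eclipse.jetty.servlet.ServletContextHandler;

public class GlobalAuthFilterSketch {
  // Register the same AuthenticationFilter on every web context sharing
  // the port, so no subset of URLs escapes SPNEGO authentication. The
  // caller supplies the contexts; discovering them is out of scope here.
  static void secureAllContexts(Iterable<ServletContextHandler> contexts,
      Map<String, String> filterParams) {
    for (ServletContextHandler ctx : contexts) {
      HttpServer2.defineFilter(ctx, "authentication",
          AuthenticationFilter.class.getName(), filterParams,
          new String[] {"/*"});
    }
  }
}
{code}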
[jira] [Commented] (YARN-8289) Modify distributedshell to support Node Attributes
[ https://issues.apache.org/jira/browse/YARN-8289?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16474472#comment-16474472 ] Naganarasimha G R commented on YARN-8289: - Based on the Scheduler API, we need to modify Distributed shell to support NodeAttributes. > Modify distributedshell to support Node Attributes > -- > > Key: YARN-8289 > URL: https://issues.apache.org/jira/browse/YARN-8289 > Project: Hadoop YARN > Issue Type: Sub-task > Components: distributed-shell >Affects Versions: YARN-3409 >Reporter: Naganarasimha G R >Assignee: Naganarasimha G R >Priority: Major > > Modifications required in Distributed shell to support NodeAttributes -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Created] (YARN-8289) Modify distributedshell to support Node Attributes
Naganarasimha G R created YARN-8289: --- Summary: Modify distributedshell to support Node Attributes Key: YARN-8289 URL: https://issues.apache.org/jira/browse/YARN-8289 Project: Hadoop YARN Issue Type: Sub-task Components: distributed-shell Affects Versions: YARN-3409 Reporter: Naganarasimha G R Assignee: Naganarasimha G R Modifications required in Distributed shell to support NodeAttributes -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-7863) Modify placement constraints to support node attributes
[ https://issues.apache.org/jira/browse/YARN-7863?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16474464#comment-16474464 ] Naganarasimha G R commented on YARN-7863: - Thanks [~sunilg], I would be glad to support you in reviewing it! > Modify placement constraints to support node attributes > --- > > Key: YARN-7863 > URL: https://issues.apache.org/jira/browse/YARN-7863 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Sunil G >Assignee: Sunil G >Priority: Major > > This Jira will track to *Modify existing placement constraints to support > node attributes.* -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Assigned] (YARN-8284) get_docker_command refactoring
[ https://issues.apache.org/jira/browse/YARN-8284?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eric Badger reassigned YARN-8284: - Assignee: Eric Badger > get_docker_command refactoring > -- > > Key: YARN-8284 > URL: https://issues.apache.org/jira/browse/YARN-8284 > Project: Hadoop YARN > Issue Type: Improvement >Affects Versions: 3.2.0, 3.1.1 >Reporter: Jason Lowe >Assignee: Eric Badger >Priority: Minor > > YARN-8274 occurred because get_docker_command's helper functions each have to > remember to put the docker binary as the first argument. This is error prone > and causes code duplication for each of the helper functions. It would be > safer and simpler if get_docker_command initialized the docker binary > argument in one place and each of the helper functions only added the > arguments specific to their particular docker sub-command. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-8288) Fix wrong number of table columns in Resource Model doc
[ https://issues.apache.org/jira/browse/YARN-8288?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16474436#comment-16474436 ] Hudson commented on YARN-8288: -- SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #14191 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/14191/]) YARN-8288. Fix wrong number of table columns in Resource Model doc. (naganarasimha_gr: rev 8a2b5914f3a68148f40f99105acf5dafcc326e89) * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/markdown/ResourceModel.md > Fix wrong number of table columns in Resource Model doc > --- > > Key: YARN-8288 > URL: https://issues.apache.org/jira/browse/YARN-8288 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Weiwei Yang >Assignee: Weiwei Yang >Priority: Major > Fix For: 3.1.1, 3.0.3 > > Attachments: YARN-8288.001.patch, after.jpg, before.jpg > > > In resource model doc, resource-types.xml and node-resource.xml description > table has wrong number of columns defined, see > [https://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-site/ResourceModel.html] -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-8275) Create a JNI interface to interact with Windows
[ https://issues.apache.org/jira/browse/YARN-8275?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16474424#comment-16474424 ] Íñigo Goiri commented on YARN-8275: --- [~aw], thanks for the feedback, much appreciated. It looks like we can put everything you proposed together into an umbrella for fixing the way Hadoop interacts with Windows. From this thread, I see: * Move away from external processes (winutils.exe) for native code: ** Replace by native Java APIs (e.g., symlinks) ** Replace by something like JNI or so * Fix the build system to fully leverage cmake instead of msbuild I would create an umbrella for this bigger task and make this JIRA just a subtask focusing on the YARN side (e.g., task). > Create a JNI interface to interact with Windows > --- > > Key: YARN-8275 > URL: https://issues.apache.org/jira/browse/YARN-8275 > Project: Hadoop YARN > Issue Type: New Feature > Components: nodemanager >Reporter: Giovanni Matteo Fumarola >Assignee: Giovanni Matteo Fumarola >Priority: Major > Attachments: WinUtils-Functions.pdf, WinUtils.CSV > > > I did a quick investigation of the performance of WinUtils in YARN. On > average the NM calls it 4.76 times per second and 65.51 times per container. > > | |Requests|Requests/sec|Requests/min|Requests/container| > |*Sum [WinUtils]*|*135354*|*4.761*|*286.160*|*65.51*| > |[WinUtils] Execute -help|4148|0.145|8.769|2.007| > |[WinUtils] Execute -ls|2842|0.0999|6.008|1.37| > |[WinUtils] Execute -systeminfo|9153|0.321|19.35|4.43| > |[WinUtils] Execute -symlink|115096|4.048|243.33|57.37| > |[WinUtils] Execute -task isAlive|4115|0.144|8.699|2.05| > Interval: 7 hours, 53 minutes and 48 seconds > Each execution of WinUtils does around *140 IO ops*, of which 130 are DDL ops. > This means *666.58* IO ops/second due to WinUtils. > We should start considering removing WinUtils from Hadoop and creating a JNI > interface. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
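As a concrete illustration of the "native Java APIs" bullet, the -symlink call (by far the hottest WinUtils invocation in the table above, at 115,096 requests) already has a JDK replacement in java.nio.file; a minimal sketch with hypothetical paths:
{code:java}
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class SymlinkSketch {
  public static void main(String[] args) throws IOException {
    // Hypothetical paths. On Windows this requires the symlink privilege
    // (or Developer Mode), but it avoids forking winutils.exe entirely.
    Path link = Paths.get("C:\\tmp\\container\\app.jar");
    Path target = Paths.get("C:\\tmp\\filecache\\app.jar");
    Files.createSymbolicLink(link, target);
  }
}
{code}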
[jira] [Commented] (YARN-7494) Add multi node lookup support for better placement
[ https://issues.apache.org/jira/browse/YARN-7494?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16474418#comment-16474418 ] Sunil G commented on YARN-7494: --- Sorry for the delay here, [~cheersyang]. I am working on a patch to address your comments and the UT. Thank you. > Add multi node lookup support for better placement > - > > Key: YARN-7494 > URL: https://issues.apache.org/jira/browse/YARN-7494 > Project: Hadoop YARN > Issue Type: Sub-task > Components: capacity scheduler >Reporter: Sunil G >Assignee: Sunil G >Priority: Major > Attachments: YARN-7494.001.patch, YARN-7494.002.patch, > YARN-7494.003.patch, YARN-7494.004.patch, YARN-7494.005.patch, > YARN-7494.006.patch, YARN-7494.007.patch, YARN-7494.v0.patch, > YARN-7494.v1.patch, multi-node-designProposal.png > > > Instead of a single node, for effectiveness we can consider a multi node lookup > based on partition to start with. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-8271) [UI2] Improve labeling of certain tables
[ https://issues.apache.org/jira/browse/YARN-8271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16474410#comment-16474410 ] Hudson commented on YARN-8271: -- SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #14190 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/14190/]) YARN-8271. [UI2] Improve labeling of certain tables. Contributed by (sunilg: rev 89d0b87ad324db09f14e00031d20635083d576ed) * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-ui/src/main/webapp/app/templates/components/node-menu-panel.hbs * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-ui/src/main/webapp/app/templates/components/yarn-queue/capacity-queue-info.hbs * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-ui/src/main/webapp/app/templates/cluster-overview.hbs * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-ui/src/main/webapp/app/templates/components/yarn-queue/fair-queue-info.hbs * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-ui/src/main/webapp/app/templates/components/yarn-queue/fair-queue.hbs * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-ui/src/main/webapp/app/templates/yarn-tools.hbs * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-ui/src/main/webapp/app/controllers/yarn-tools.js * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-ui/src/main/webapp/app/helpers/node-menu.js * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-ui/src/main/webapp/app/templates/components/yarn-queue/capacity-queue.hbs > [UI2] Improve labeling of certain tables > > > Key: YARN-8271 > URL: https://issues.apache.org/jira/browse/YARN-8271 > Project: Hadoop YARN > Issue Type: Bug > Components: yarn-ui-v2 >Affects Versions: 3.1.0 >Reporter: Yesha Vora >Assignee: Yesha Vora >Priority: Major > Fix For: 3.2.0, 3.1.1 > > Attachments: YARN-8271.0001.patch > > > Update labeling for few items to avoid confusion > - Cluster Page (/cluster-overview): > -- "Finished apps" --> "Finished apps from all users" > -- "Running apps" --> "Running apps from all users" > - Queues overview page (/yarn-queues/root) && Per queue page > (/yarn-queue/root/apps) > -- "Running Apps" --> "Running apps from all users in queue " > - Nodes Page - side bar for all pages > -- "List of Applications" --> "List of Applications on this node" > -- "List of Containers" --> "List of Containers on this node" > - Yarn Tools > ** Yarn Tools --> YARN Tools > - Queue page > ** Running Apps: --> Running Apps From All Users -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-8266) Clicking on application from cluster view should redirect to application attempt page
[ https://issues.apache.org/jira/browse/YARN-8266?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16474380#comment-16474380 ] Sunil G commented on YARN-8266: --- Looks straightforward. Committing shortly > Clicking on application from cluster view should redirect to application > attempt page > - > > Key: YARN-8266 > URL: https://issues.apache.org/jira/browse/YARN-8266 > Project: Hadoop YARN > Issue Type: Bug > Components: yarn-ui-v2 >Affects Versions: 3.1.0 >Reporter: Yesha Vora >Assignee: Yesha Vora >Priority: Major > Attachments: YARN-8266.001.patch > > > Steps: > 1) Start one application > 2) Go to cluster overview page > 3) Click on applicationId from Cluster Resource Usage By Application > This action redirects to > [http://xxx:8088/ui2/#/yarn-app/application_1525740862939_0005] url. This is > invalid url. It does not show any details. > Instead It should redirect to attempt page. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-8288) Fix wrong number of table columns in Resource Model doc
[ https://issues.apache.org/jira/browse/YARN-8288?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16474364#comment-16474364 ] Naganarasimha G R commented on YARN-8288: - Thanks [~cheersyang], I agree with you. Committing this patch. > Fix wrong number of table columns in Resource Model doc > --- > > Key: YARN-8288 > URL: https://issues.apache.org/jira/browse/YARN-8288 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Weiwei Yang >Assignee: Weiwei Yang >Priority: Major > Attachments: YARN-8288.001.patch, after.jpg, before.jpg > > > In resource model doc, resource-types.xml and node-resource.xml description > table has wrong number of columns defined, see > [https://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-site/ResourceModel.html] -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-8271) [UI2] Improve labeling of certain tables
[ https://issues.apache.org/jira/browse/YARN-8271?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sunil G updated YARN-8271: -- Summary: [UI2] Improve labeling of certain tables (was: Change UI2 labeling of certain tables to avoid confusion) > [UI2] Improve labeling of certain tables > > > Key: YARN-8271 > URL: https://issues.apache.org/jira/browse/YARN-8271 > Project: Hadoop YARN > Issue Type: Bug > Components: yarn-ui-v2 >Affects Versions: 3.1.0 >Reporter: Yesha Vora >Assignee: Yesha Vora >Priority: Major > Attachments: YARN-8271.0001.patch > > > Update labeling for few items to avoid confusion > - Cluster Page (/cluster-overview): > -- "Finished apps" --> "Finished apps from all users" > -- "Running apps" --> "Running apps from all users" > - Queues overview page (/yarn-queues/root) && Per queue page > (/yarn-queue/root/apps) > -- "Running Apps" --> "Running apps from all users in queue " > - Nodes Page - side bar for all pages > -- "List of Applications" --> "List of Applications on this node" > -- "List of Containers" --> "List of Containers on this node" > - Yarn Tools > ** Yarn Tools --> YARN Tools > - Queue page > ** Running Apps: --> Running Apps From All Users -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-8288) Fix wrong number of table columns in Resource Model doc
[ https://issues.apache.org/jira/browse/YARN-8288?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16474175#comment-16474175 ] genericqa commented on YARN-8288: - | (/) *{color:green}+1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 47s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | || || || || {color:brown} trunk Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 42m 31s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 37s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 56m 10s{color} | {color:green} branch has no errors when building and testing our client artifacts. {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 15s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 11m 37s{color} | {color:green} patch has no errors when building and testing our client artifacts. {color} | || || || || {color:brown} Other Tests {color} || | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 25s{color} | {color:green} The patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black} 69m 43s{color} | {color:black} {color} | \\ \\ || Subsystem || Report/Notes || | Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:abb62dd | | JIRA Issue | YARN-8288 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12923278/YARN-8288.001.patch | | Optional Tests | asflicense mvnsite | | uname | Linux 557a0f702bc9 3.13.0-139-generic #188-Ubuntu SMP Tue Jan 9 14:43:09 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/patchprocess/precommit/personality/provided.sh | | git revision | trunk / f3f544b | | maven | version: Apache Maven 3.3.9 | | Max. process+thread count | 334 (vs. ulimit of 1) | | modules | C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site U: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/20717/console | | Powered by | Apache Yetus 0.8.0-SNAPSHOT http://yetus.apache.org | This message was automatically generated. 
> Fix wrong number of table columns in Resource Model doc > --- > > Key: YARN-8288 > URL: https://issues.apache.org/jira/browse/YARN-8288 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Weiwei Yang >Assignee: Weiwei Yang >Priority: Major > Attachments: YARN-8288.001.patch, after.jpg, before.jpg > > > In resource model doc, resource-types.xml and node-resource.xml description > table has wrong number of columns defined, see > [https://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-site/ResourceModel.html] -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-7863) Modify placement constraints to support node attributes
[ https://issues.apache.org/jira/browse/YARN-7863?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16474124#comment-16474124 ] Sunil G commented on YARN-7863: --- Given YARN-7892 is now committed, updating this patch shortly > Modify placement constraints to support node attributes > --- > > Key: YARN-7863 > URL: https://issues.apache.org/jira/browse/YARN-7863 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Sunil G >Assignee: Sunil G >Priority: Major > > This Jira will track to *Modify existing placement constraints to support > node attributes.* -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-8288) Fix wrong number of table columns in Resource Model doc
[ https://issues.apache.org/jira/browse/YARN-8288?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16474105#comment-16474105 ] Weiwei Yang commented on YARN-8288: --- Hi [~Naganarasimha], those are user-defined properties, so there are no default values for them. The latter section gives an example of how to configure them, which is pretty self-explanatory. What do you think? > Fix wrong number of table columns in Resource Model doc > --- > > Key: YARN-8288 > URL: https://issues.apache.org/jira/browse/YARN-8288 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Weiwei Yang >Assignee: Weiwei Yang >Priority: Major > Attachments: YARN-8288.001.patch, after.jpg, before.jpg > > > In resource model doc, resource-types.xml and node-resource.xml description > table has wrong number of columns defined, see > [https://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-site/ResourceModel.html] -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
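For readers skimming this thread, here is a minimal sketch of the two configuration files whose documentation tables are being fixed. The property keys follow the ResourceModel page linked above; the resource name "resource1", its unit, and the amounts are made up purely for illustration:

{code:xml}
<!-- resource-types.xml (ResourceManager side): declare custom resource types.
     "resource1" and its unit "G" are illustrative placeholders. -->
<configuration>
  <property>
    <name>yarn.resource-types</name>
    <value>resource1</value>
  </property>
  <property>
    <name>yarn.resource-types.resource1.units</name>
    <value>G</value>
  </property>
</configuration>
{code}

{code:xml}
<!-- node-resources.xml (NodeManager side): advertise how much of the resource
     this node offers; the amount "5G" is again illustrative. -->
<configuration>
  <property>
    <name>yarn.nodemanager.resource-type.resource1</name>
    <value>5G</value>
  </property>
</configuration>
{code}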
[jira] [Commented] (YARN-8288) Fix wrong number of table columns in Resource Model doc
[ https://issues.apache.org/jira/browse/YARN-8288?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16474102#comment-16474102 ] Naganarasimha G R commented on YARN-8288: - [~cheersyang], should we not be putting the default value under the "Value" column and pushing the description into the actual "Description" column? > Fix wrong number of table columns in Resource Model doc > --- > > Key: YARN-8288 > URL: https://issues.apache.org/jira/browse/YARN-8288 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Weiwei Yang >Assignee: Weiwei Yang >Priority: Major > Attachments: YARN-8288.001.patch, after.jpg, before.jpg > > > In resource model doc, resource-types.xml and node-resource.xml description > table has wrong number of columns defined, see > [https://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-site/ResourceModel.html] -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-7892) Revisit NodeAttribute class structure
[ https://issues.apache.org/jira/browse/YARN-7892?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16474099#comment-16474099 ] Naganarasimha G R commented on YARN-7892: - Thanks for the review [~bibinchundatt], [~sunilg] & [~leftnoteasy], and the commit by [~bibinchundatt]. > Revisit NodeAttribute class structure > - > > Key: YARN-7892 > URL: https://issues.apache.org/jira/browse/YARN-7892 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Reporter: Naganarasimha G R >Assignee: Naganarasimha G R >Priority: Major > Fix For: YARN-3409 > > Attachments: YARN-7892-YARN-3409.001.patch, > YARN-7892-YARN-3409.002.patch, YARN-7892-YARN-3409.003.WIP.patch, > YARN-7892-YARN-3409.003.patch, YARN-7892-YARN-3409.004.patch, > YARN-7892-YARN-3409.005.patch, YARN-7892-YARN-3409.006.patch, > YARN-7892-YARN-3409.007.patch, YARN-7892-YARN-3409.008.patch, > YARN-7892-YARN-3409.009.patch, YARN-7892-YARN-3409.010.patch > > > In the existing structure, we kept the type and value along with the > attribute, which confuses users of the APIs: it is not clear what needs to be > sent for type and value while fetching the mappings for node(s). > Likewise, equals does not make sense when we compare only the prefix and > name, whereas the values for them might be different. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
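To make the structural concern in YARN-7892 concrete, the following is a hypothetical sketch of the kind of split discussed there: attribute identity (prefix plus name) moves into its own key class that backs equals()/hashCode(), while type and value stay out of the identity. Class and field names here are illustrative, not the committed API:

{code:java}
import java.util.Objects;

// Illustrative sketch only: identity is the (prefix, name) pair.
final class NodeAttributeKeySketch {
  final String prefix; // namespace, e.g. for centralized vs. distributed attributes
  final String name;   // attribute name, e.g. "os"

  NodeAttributeKeySketch(String prefix, String name) {
    this.prefix = prefix;
    this.name = name;
  }

  @Override
  public boolean equals(Object o) {
    if (this == o) {
      return true;
    }
    if (!(o instanceof NodeAttributeKeySketch)) {
      return false;
    }
    NodeAttributeKeySketch k = (NodeAttributeKeySketch) o;
    return prefix.equals(k.prefix) && name.equals(k.name);
  }

  @Override
  public int hashCode() {
    return Objects.hash(prefix, name);
  }
}

// Type and value travel with the attribute but are not part of its identity,
// so two mappings of the same attribute with different values share one key.
final class NodeAttributeSketch {
  final NodeAttributeKeySketch key;
  final String type;  // e.g. "STRING"
  final String value; // e.g. "centos7"

  NodeAttributeSketch(NodeAttributeKeySketch key, String type, String value) {
    this.key = key;
    this.type = type;
    this.value = value;
  }
}
{code}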
[jira] [Commented] (YARN-8288) Fix wrong number of table columns in Resource Model doc
[ https://issues.apache.org/jira/browse/YARN-8288?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16474079#comment-16474079 ] Weiwei Yang commented on YARN-8288: --- Uploaded the patch to fix this, targeting 3.1.1. Please see the attached screenshots from before and after the patch is applied. > Fix wrong number of table columns in Resource Model doc > --- > > Key: YARN-8288 > URL: https://issues.apache.org/jira/browse/YARN-8288 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Weiwei Yang >Assignee: Weiwei Yang >Priority: Major > Attachments: YARN-8288.001.patch, after.jpg, before.jpg > > > In resource model doc, resource-types.xml and node-resource.xml description > table has wrong number of columns defined, see > [https://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-site/ResourceModel.html] -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-8288) Fix wrong number of table columns in Resource Model doc
[ https://issues.apache.org/jira/browse/YARN-8288?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Weiwei Yang updated YARN-8288: -- Attachment: YARN-8288.001.patch > Fix wrong number of table columns in Resource Model doc > --- > > Key: YARN-8288 > URL: https://issues.apache.org/jira/browse/YARN-8288 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Weiwei Yang >Assignee: Weiwei Yang >Priority: Major > Attachments: YARN-8288.001.patch, after.jpg, before.jpg > > > In resource model doc, resource-types.xml and node-resource.xml description > table has wrong number of columns defined, see > [https://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-site/ResourceModel.html] -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-8288) Fix wrong number of table columns in Resource Model doc
[ https://issues.apache.org/jira/browse/YARN-8288?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Weiwei Yang updated YARN-8288: -- Attachment: before.jpg > Fix wrong number of table columns in Resource Model doc > --- > > Key: YARN-8288 > URL: https://issues.apache.org/jira/browse/YARN-8288 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Weiwei Yang >Assignee: Weiwei Yang >Priority: Major > Attachments: YARN-8288.001.patch, after.jpg, before.jpg > > > In resource model doc, resource-types.xml and node-resource.xml description > table has wrong number of columns defined, see > [https://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-site/ResourceModel.html] -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-8288) Fix wrong number of table columns in Resource Model doc
[ https://issues.apache.org/jira/browse/YARN-8288?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Weiwei Yang updated YARN-8288: -- Attachment: after.jpg > Fix wrong number of table columns in Resource Model doc > --- > > Key: YARN-8288 > URL: https://issues.apache.org/jira/browse/YARN-8288 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Weiwei Yang >Assignee: Weiwei Yang >Priority: Major > Attachments: YARN-8288.001.patch, after.jpg, before.jpg > > > In resource model doc, resource-types.xml and node-resource.xml description > table has wrong number of columns defined, see > [https://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-site/ResourceModel.html] -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Created] (YARN-8288) Fix wrong number of table columns in Resource Model doc
Weiwei Yang created YARN-8288: - Summary: Fix wrong number of table columns in Resource Model doc Key: YARN-8288 URL: https://issues.apache.org/jira/browse/YARN-8288 Project: Hadoop YARN Issue Type: Bug Reporter: Weiwei Yang Assignee: Weiwei Yang In resource model doc, resource-types.xml and node-resource.xml description table has wrong number of columns defined, see [https://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-site/ResourceModel.html] -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-7933) [atsv2 read acls] Add TimelineWriter#writeDomain
[ https://issues.apache.org/jira/browse/YARN-7933?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16474013#comment-16474013 ] Rohith Sharma K S commented on YARN-7933: - bq. Isn't it the case that a TimelineClient must be able to authenticate with the TimelineCollector first before it can post data to that TimelineCollector? The intention I added here follows the ATS 1.5 approach: if the same client or a different client publishes the same domain id, then the collector needs to check the ACLs for the domain, i.e. the owner. In our design, I was not sure whether we should check the owner for the domain id, so I added a TODO. If this is not part of our design in the future, we can remove it at any point. bq. Where do we check the delegation token on inside PerNodeTimelineCollectorService? Timeline token verification happens at the filter layer at the time of HTTP connection establishment, i.e. even before the request reaches the servlets. See the classes NodeTimelineCollectorManager#startWebApp and TimelineAuthenticationFilter. > [atsv2 read acls] Add TimelineWriter#writeDomain > - > > Key: YARN-7933 > URL: https://issues.apache.org/jira/browse/YARN-7933 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Vrushali C >Assignee: Rohith Sharma K S >Priority: Major > Attachments: YARN-7933.01.patch, YARN-7933.02.patch, > YARN-7933.03.patch, YARN-7933.04.patch, YARN-7933.05.patch > > > > Add an API TimelineWriter#writeDomain for writing the domain info -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
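As a rough illustration of the owner check described in the first answer above, here is a minimal in-memory sketch. It is an assumption-laden stand-in, not the actual patch: the real TimelineWriter#writeDomain targets the backing storage and uses the real TimelineDomain record, and the class and method names below are made up:

{code:java}
import java.io.IOException;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch: persist a domain keyed by (cluster, domain id) and reject
// a write of the same domain id by a different owner, mirroring the ATS 1.5-style
// owner/ACL check mentioned in the comment.
class DomainWriterSketch {
  static final class Domain {
    final String id;
    final String owner;
    final String readers; // comma-separated reader ACLs
    final String writers; // comma-separated writer ACLs

    Domain(String id, String owner, String readers, String writers) {
      this.id = id;
      this.owner = owner;
      this.readers = readers;
      this.writers = writers;
    }
  }

  private final Map<String, Domain> store = new ConcurrentHashMap<>();

  void writeDomain(String clusterId, Domain domain) throws IOException {
    String key = clusterId + "/" + domain.id;
    Domain existing = store.get(key);
    if (existing != null && !existing.owner.equals(domain.owner)) {
      throw new IOException(
          "Domain " + domain.id + " is owned by " + existing.owner);
    }
    store.put(key, domain);
  }
}
{code}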
[jira] [Commented] (YARN-8191) Fair scheduler: queue deletion without RM restart
[ https://issues.apache.org/jira/browse/YARN-8191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16473961#comment-16473961 ] Gergo Repas commented on YARN-8191: --- [~wilfreds] Thanks for the explanation. I think returning *true* when removeEmptyIncompatibleQueues() does not remove the queue is intentional, as the javadoc of removeEmptyIncompatibleQueues() says: @return true if we can create queueToCreate or it already exists. But this is rather confusing; I'm thinking about a way to refactor this (possibly by splitting up removeEmptyIncompatibleQueues()). > Fair scheduler: queue deletion without RM restart > - > > Key: YARN-8191 > URL: https://issues.apache.org/jira/browse/YARN-8191 > Project: Hadoop YARN > Issue Type: Improvement > Components: fairscheduler >Affects Versions: 3.0.1 >Reporter: Gergo Repas >Assignee: Gergo Repas >Priority: Major > Attachments: Queue Deletion in Fair Scheduler.pdf, > YARN-8191.000.patch, YARN-8191.001.patch, YARN-8191.002.patch, > YARN-8191.003.patch, YARN-8191.004.patch, YARN-8191.005.patch, > YARN-8191.006.patch, YARN-8191.007.patch, YARN-8191.008.patch, > YARN-8191.009.patch, YARN-8191.010.patch > > > The Fair Scheduler never cleans up queues even if they are deleted in the > allocation file, or were dynamically created and are never going to be used > again. Queues always remain in memory, which leads to the two following issues. > # Steady fairshares aren’t calculated correctly due to remaining queues > # WebUI shows deleted queues, which is confusing for users (YARN-4022). > We want to support proper queue deletion without restarting the Resource > Manager: > # Static queues without any entries that are removed from fair-scheduler.xml > should be deleted from memory. > # Dynamic queues without any entries should be deleted. > # RM Web UI should only show the queues defined in the scheduler at that > point in time. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
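One way to read the refactoring idea in the comment above is to split the combined check-and-remove method so the boolean answers exactly one question, and the removal becomes an explicit, separately named side effect. The sketch below is hypothetical; the names and data structures are made up, and it is not the actual FairScheduler code:

{code:java}
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: separate "can we create this queue?" from
// "remove the empty incompatible queue standing in the way".
class QueueManagerSketch {
  enum QueueType { PARENT, LEAF }

  private final Map<String, QueueType> queueTypes = new HashMap<>();
  private final Map<String, Boolean> isEmpty = new HashMap<>();

  // Side-effect free: true if the queue does not exist, already exists with
  // the wanted type, or exists with the wrong type but is empty (and could
  // therefore be removed before creation).
  boolean canCreateQueue(String name, QueueType wanted) {
    QueueType existing = queueTypes.get(name);
    if (existing == null || existing == wanted) {
      return true;
    }
    return Boolean.TRUE.equals(isEmpty.get(name));
  }

  // Explicit mutation, invoked only after canCreateQueue() returned true.
  void removeEmptyIncompatibleQueue(String name, QueueType wanted) {
    QueueType existing = queueTypes.get(name);
    if (existing != null && existing != wanted
        && Boolean.TRUE.equals(isEmpty.get(name))) {
      queueTypes.remove(name);
      isEmpty.remove(name);
    }
  }
}
{code}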
[jira] [Commented] (YARN-8273) Log aggregation does not warn if HDFS quota in target directory is exceeded
[ https://issues.apache.org/jira/browse/YARN-8273?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16473925#comment-16473925 ] Szilard Nemeth commented on YARN-8273: -- Hi [~grepas]! Thanks for your patch. Here are some things I noticed: # In AppLogAggregatorImpl you added 2 {{LOG.warn(...)}} statements; I think they should be {{LOG.error}} instead. # In {{org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.AppLogAggregatorImpl#uploadLogsForContainers}}: you declared exc as a {{RuntimeException}}, but the current code does not leverage this, since exc is only ever assigned an instance of {{LogAggregationDFSException}}; you could use the type {{LogAggregationDFSException}} directly instead. > Log aggregation does not warn if HDFS quota in target directory is exceeded > --- > > Key: YARN-8273 > URL: https://issues.apache.org/jira/browse/YARN-8273 > Project: Hadoop YARN > Issue Type: Bug > Components: log-aggregation >Affects Versions: 3.1.0 >Reporter: Gergo Repas >Assignee: Gergo Repas >Priority: Major > Attachments: YARN-8273.000.patch > > > It appears that if an HDFS space quota is set on a target directory for log > aggregation and the quota is already exceeded when log aggregation is > attempted, zero-byte log files will be written to the HDFS directory, however > NodeManager logs do not reflect a failure to write the files successfully > (i.e. there are no ERROR or WARN messages to this effect). > An improvement may be worth investigating to alert users to this scenario, as > otherwise logs for a YARN application may be missing both on HDFS and locally > (after local log cleanup is done) and the user may not otherwise be informed. > Steps to reproduce: > * Set a small HDFS space quota on /tmp/logs/username/logs (e.g. 2MB) > * Write files to HDFS such that /tmp/logs/username/logs is almost 2MB full > * Run a Spark or MR job in the cluster > * Observe that zero byte files are written to HDFS after job completion > * Observe that YARN container logs are also not present on the NM hosts (or > are deleted after yarn.nodemanager.delete.debug-delay-sec) > * Observe that no ERROR or WARN messages appear to be logged in the NM role > log -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
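To show the two review suggestions above in code form, here is a hedged sketch, not the actual patch: the exception class, logger usage, and method body below are stand-ins for the code under review in AppLogAggregatorImpl:

{code:java}
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

class UploadSketch {
  private static final Logger LOG = LoggerFactory.getLogger(UploadSketch.class);

  // Stand-in for the real exception type; defined here only so the sketch compiles.
  static class LogAggregationDFSException extends RuntimeException {
    LogAggregationDFSException(Throwable cause) {
      super(cause);
    }
  }

  void uploadLogsForContainers() {
    // Suggestion 2: declare with the concrete type instead of RuntimeException,
    // since this is the only exception ever assigned to it.
    LogAggregationDFSException dfsError = null;
    try {
      // ... write the aggregated container logs to the remote file system ...
    } catch (LogAggregationDFSException e) {
      // Suggestion 1: a failed upload is an error, not a warning.
      LOG.error("Failed to write aggregated logs to the remote file system", e);
      dfsError = e;
    }
    if (dfsError != null) {
      throw dfsError;
    }
  }
}
{code}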
[jira] [Commented] (YARN-8268) Fair scheduler: reservable queue is configured both as parent and leaf queue
[ https://issues.apache.org/jira/browse/YARN-8268?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16473906#comment-16473906 ] Gergo Repas commented on YARN-8268: --- [~haibochen] - thanks for the review and committing the change! [~wilfreds] - thanks for the review! > Fair scheduler: reservable queue is configured both as parent and leaf queue > > > Key: YARN-8268 > URL: https://issues.apache.org/jira/browse/YARN-8268 > Project: Hadoop YARN > Issue Type: Bug > Components: fairscheduler >Affects Versions: 3.1.0 >Reporter: Gergo Repas >Assignee: Gergo Repas >Priority: Major > Fix For: 3.2.0 > > Attachments: YARN-8268.000.patch, YARN-8268.001.patch > > > The following allocation file > {code:java} > <?xml version="1.0"?> > <allocations> > <queue name="root"> > <aclSubmitApps>someuser</aclSubmitApps> > <aclAdministerApps>someuser</aclAdministerApps> > <queue name="dedicated"> > <reservation> > </reservation> > <aclSubmitApps>someuser</aclSubmitApps> > </queue> > </queue> > <defaultQueueSchedulingPolicy>drf</defaultQueueSchedulingPolicy> > </allocations> > {code} > is being parsed as: {{PARENT=[root, root.dedicated], LEAF=[root.default, > root.dedicated]}} (AllocationConfiguration.configuredQueues). > The root.dedicated should only appear as a PARENT queue. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org