[ https://issues.apache.org/jira/browse/YARN-7884?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17802688#comment-17802688 ]
Shilun Fan commented on YARN-7884:
----------------------------------

Bulk update: moved all 3.4.0 non-blocker issues, please move back if it is a blocker. Retarget 3.5.0.

> Race condition in registering YARN service in ZooKeeper
> -------------------------------------------------------
>
> Key: YARN-7884
> URL: https://issues.apache.org/jira/browse/YARN-7884
> Project: Hadoop YARN
> Issue Type: Bug
> Components: yarn-native-services
> Affects Versions: 3.1.0
> Reporter: Eric Yang
> Priority: Major
>
> In a Kerberos-enabled cluster, there seems to be a race condition when registering a YARN service.
> Creation of the yarn-service znode seems to happen after the AM has started and is reporting back to update component information. The YARN service should have access to create the znode, but for some reason NoAuth is reported.
> {code}
> 2018-02-02 22:53:30,442 [main] INFO service.ServiceScheduler - Set registry user accounts: sasl:hbase
> 2018-02-02 22:53:30,471 [main] INFO zk.RegistrySecurity - Registry default system acls:
> [1,s{'world,'anyone}
> , 31,s{'sasl,'yarn}
> , 31,s{'sasl,'jhs}
> , 31,s{'sasl,'hdfs-demo}
> , 31,s{'sasl,'rm}
> , 31,s{'sasl,'hive}
> ]
> 2018-02-02 22:53:30,472 [main] INFO zk.RegistrySecurity - Registry User ACLs
> [31,s{'sasl,'hbase}
> , 31,s{'sasl,'hbase}
> ]
> 2018-02-02 22:53:30,503 [main] INFO event.AsyncDispatcher - Registering class org.apache.hadoop.yarn.service.component.ComponentEventType for class org.apache.hadoop.yarn.service.ServiceScheduler$ComponentEventHandler
> 2018-02-02 22:53:30,504 [main] INFO event.AsyncDispatcher - Registering class org.apache.hadoop.yarn.service.component.instance.ComponentInstanceEventType for class org.apache.hadoop.yarn.service.ServiceScheduler$ComponentInstanceEventHandler
> 2018-02-02 22:53:30,528 [main] INFO impl.NMClientAsyncImpl - Upper bound of the thread pool size is 500
> 2018-02-02 22:53:30,531 [main] INFO service.ServiceMaster - Starting service as user hbase/eyang-5.openstacklo...@example.com (auth:KERBEROS)
> 2018-02-02 22:53:30,545 [main] INFO ipc.CallQueueManager - Using callQueue: class java.util.concurrent.LinkedBlockingQueue queueCapacity: 100 scheduler: class org.apache.hadoop.ipc.DefaultRpcScheduler
> 2018-02-02 22:53:30,554 [Socket Reader #1 for port 56859] INFO ipc.Server - Starting Socket Reader #1 for port 56859
> 2018-02-02 22:53:30,589 [main] INFO pb.RpcServerFactoryPBImpl - Adding protocol org.apache.hadoop.yarn.service.impl.pb.service.ClientAMProtocolPB to the server
> 2018-02-02 22:53:30,606 [IPC Server Responder] INFO ipc.Server - IPC Server Responder: starting
> 2018-02-02 22:53:30,607 [IPC Server listener on 56859] INFO ipc.Server - IPC Server listener on 56859: starting
> 2018-02-02 22:53:30,607 [main] INFO service.ClientAMService - Instantiated ClientAMService at eyang-5.openstacklocal/172.26.111.20:56859
> 2018-02-02 22:53:30,609 [main] INFO zk.CuratorService - Creating CuratorService with connection fixed ZK quorum "eyang-1.openstacklocal:2181"
> 2018-02-02 22:53:30,615 [main] INFO zk.RegistrySecurity - Enabling ZK sasl client: jaasClientEntry = Client, principal = hbase/eyang-5.openstacklo...@example.com, keytab = /etc/security/keytabs/hbase.service.keytab
> 2018-02-02 22:53:30,752 [main] INFO client.RMProxy - Connecting to ResourceManager at eyang-1.openstacklocal/172.26.111.17:8032
> 2018-02-02 22:53:30,909 [main] INFO service.ServiceScheduler - Registering appattempt_1517611904996_0001_000001, abc into registry
> 2018-02-02 22:53:30,911 [main] INFO service.ServiceScheduler - Received 0 containers from previous attempt.
> 2018-02-02 22:53:31,072 [main] INFO service.ServiceScheduler - Could not read component paths: `/users/hbase/services/yarn-service/abc/components': No such file or directory: KeeperErrorCode = NoNode for /registry/users/hbase/services/yarn-service/abc/components
> 2018-02-02 22:53:31,074 [main] INFO service.ServiceScheduler - Triggering initial evaluation of component sleeper
> 2018-02-02 22:53:31,075 [main] INFO component.Component - [INIT COMPONENT sleeper]: 2 instances.
> 2018-02-02 22:53:31,094 [main] INFO component.Component - [COMPONENT sleeper] Transitioned from INIT to FLEXING on FLEX event.
> 2018-02-02 22:53:31,215 [pool-5-thread-1] ERROR service.ServiceScheduler - Failed to register app abc in registry
> org.apache.hadoop.registry.client.exceptions.NoPathPermissionsException: `/registry/users/hbase/services/yarn-service/abc': Not authorized to access path; ACLs: [
> 0x01: 'world,'anyone
> 0x1f: 'sasl,'yarn
> 0x1f: 'sasl,'jhs
> 0x1f: 'sasl,'hdfs-demo
> 0x1f: 'sasl,'rm
> 0x1f: 'sasl,'hive
> 0x1f: 'sasl,'hbase
> 0x1f: 'sasl,'hbase
> ]: KeeperErrorCode = NoAuth for /registry/users/hbase/services/yarn-service/abc
>     at org.apache.hadoop.registry.client.impl.zk.CuratorService.operationFailure(CuratorService.java:412)
>     at org.apache.hadoop.registry.client.impl.zk.CuratorService.zkCreate(CuratorService.java:637)
>     at org.apache.hadoop.registry.client.impl.zk.CuratorService.zkSet(CuratorService.java:679)
>     at org.apache.hadoop.registry.client.impl.zk.RegistryOperationsService.bind(RegistryOperationsService.java:116)
>     at org.apache.hadoop.yarn.service.registry.YarnRegistryViewForProviders.putService(YarnRegistryViewForProviders.java:195)
>     at org.apache.hadoop.yarn.service.registry.YarnRegistryViewForProviders.registerSelf(YarnRegistryViewForProviders.java:210)
>     at org.apache.hadoop.yarn.service.ServiceScheduler$2.run(ServiceScheduler.java:462)
>     at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>     at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>     at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
>     at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>     at java.lang.Thread.run(Thread.java:748)
> Caused by: org.apache.zookeeper.KeeperException$NoAuthException: KeeperErrorCode = NoAuth for /registry/users/hbase/services/yarn-service/abc
>     at org.apache.zookeeper.KeeperException.create(KeeperException.java:113)
>     at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
>     at org.apache.zookeeper.ZooKeeper.create(ZooKeeper.java:783)
>     at org.apache.curator.framework.imps.CreateBuilderImpl$11.call(CreateBuilderImpl.java:740)
>     at org.apache.curator.framework.imps.CreateBuilderImpl$11.call(CreateBuilderImpl.java:723)
>     at org.apache.curator.RetryLoop.callWithRetry(RetryLoop.java:109)
>     at org.apache.curator.framework.imps.CreateBuilderImpl.pathInForeground(CreateBuilderImpl.java:720)
>     at org.apache.curator.framework.imps.CreateBuilderImpl.protectedPathInForeground(CreateBuilderImpl.java:484)
>     at org.apache.curator.framework.imps.CreateBuilderImpl.forPath(CreateBuilderImpl.java:474)
>     at org.apache.curator.framework.imps.CreateBuilderImpl$3.forPath(CreateBuilderImpl.java:260)
>     at org.apache.curator.framework.imps.CreateBuilderImpl$3.forPath(CreateBuilderImpl.java:214)
>     at org.apache.hadoop.registry.client.impl.zk.CuratorService.zkCreate(CuratorService.java:635)
>     ... 12 more
> 2018-02-02 22:53:33,135 [AMRM Callback Handler Thread] INFO service.ServiceScheduler - 2 containers allocated.
> {code}
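
A note on reading the ACL lists in the quoted log: the number in front of each entry (1 and 31, printed as 0x01 and 0x1f in the exception) is a ZooKeeper permission bitmask. The following is a minimal sketch of decoding it, assuming the standard ZooKeeper permission bits (READ=1, WRITE=2, CREATE=4, DELETE=8, ADMIN=16). The class and method names are hypothetical and are not part of Hadoop or ZooKeeper.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration only; not a Hadoop or ZooKeeper class.
public class AclPermDecoder {
    // Assumed bit values; they mirror org.apache.zookeeper.ZooDefs.Perms.
    static final int READ = 1;
    static final int WRITE = 2;
    static final int CREATE = 4;
    static final int DELETE = 8;
    static final int ADMIN = 16;

    /** Returns the names of the permission bits set in the given mask. */
    public static List<String> decode(int perms) {
        List<String> names = new ArrayList<>();
        if ((perms & READ) != 0) names.add("READ");
        if ((perms & WRITE) != 0) names.add("WRITE");
        if ((perms & CREATE) != 0) names.add("CREATE");
        if ((perms & DELETE) != 0) names.add("DELETE");
        if ((perms & ADMIN) != 0) names.add("ADMIN");
        return names;
    }

    public static void main(String[] args) {
        // The two masks that appear in the log: 1 (0x01) and 31 (0x1f).
        System.out.println("0x01 -> " + decode(0x01));
        System.out.println("0x1f -> " + decode(0x1f));
    }
}
```

Under that reading, every sasl entry in the logged ACLs, including sasl:hbase, carries all five permissions, while world:anyone is read-only. This only explains the notation; it does not identify the cause of the NoAuth error.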