Eric Yang created YARN-7884:
-------------------------------

             Summary: Race condition in registering YARN service in ZooKeeper
                 Key: YARN-7884
                 URL: https://issues.apache.org/jira/browse/YARN-7884
             Project: Hadoop YARN
          Issue Type: Bug
          Components: yarn-native-services
    Affects Versions: 3.1.0
            Reporter: Eric Yang

In a Kerberos-enabled cluster, there seems to be a race condition when registering a YARN service. Creation of the yarn-service znode seems to happen after the AM has started and is reporting back to update component information. The YARN service should have access to create the znode, but for some reason NoAuth is reported.

{code}
2018-02-02 22:53:30,442 [main] INFO service.ServiceScheduler - Set registry user accounts: sasl:hbase
2018-02-02 22:53:30,471 [main] INFO zk.RegistrySecurity - Registry default system acls: [1,s{'world,'anyone} , 31,s{'sasl,'yarn} , 31,s{'sasl,'jhs} , 31,s{'sasl,'hdfs-demo} , 31,s{'sasl,'rm} , 31,s{'sasl,'hive} ]
2018-02-02 22:53:30,472 [main] INFO zk.RegistrySecurity - Registry User ACLs [31,s{'sasl,'hbase} , 31,s{'sasl,'hbase} ]
2018-02-02 22:53:30,503 [main] INFO event.AsyncDispatcher - Registering class org.apache.hadoop.yarn.service.component.ComponentEventType for class org.apache.hadoop.yarn.service.ServiceScheduler$ComponentEventHandler
2018-02-02 22:53:30,504 [main] INFO event.AsyncDispatcher - Registering class org.apache.hadoop.yarn.service.component.instance.ComponentInstanceEventType for class org.apache.hadoop.yarn.service.ServiceScheduler$ComponentInstanceEventHandler
2018-02-02 22:53:30,528 [main] INFO impl.NMClientAsyncImpl - Upper bound of the thread pool size is 500
2018-02-02 22:53:30,531 [main] INFO service.ServiceMaster - Starting service as user hbase/eyang-5.openstacklo...@example.com (auth:KERBEROS)
2018-02-02 22:53:30,545 [main] INFO ipc.CallQueueManager - Using callQueue: class java.util.concurrent.LinkedBlockingQueue queueCapacity: 100 scheduler: class org.apache.hadoop.ipc.DefaultRpcScheduler
2018-02-02 22:53:30,554 [Socket Reader #1 for port 56859] INFO ipc.Server - Starting Socket Reader #1 for port 56859
2018-02-02 22:53:30,589 [main] INFO pb.RpcServerFactoryPBImpl - Adding protocol org.apache.hadoop.yarn.service.impl.pb.service.ClientAMProtocolPB to the server
2018-02-02 22:53:30,606 [IPC Server Responder] INFO ipc.Server - IPC Server Responder: starting
2018-02-02 22:53:30,607 [IPC Server listener on 56859] INFO ipc.Server - IPC Server listener on 56859: starting
2018-02-02 22:53:30,607 [main] INFO service.ClientAMService - Instantiated ClientAMService at eyang-5.openstacklocal/172.26.111.20:56859
2018-02-02 22:53:30,609 [main] INFO zk.CuratorService - Creating CuratorService with connection fixed ZK quorum "eyang-1.openstacklocal:2181"
2018-02-02 22:53:30,615 [main] INFO zk.RegistrySecurity - Enabling ZK sasl client: jaasClientEntry = Client, principal = hbase/eyang-5.openstacklo...@example.com, keytab = /etc/security/keytabs/hbase.service.keytab
2018-02-02 22:53:30,752 [main] INFO client.RMProxy - Connecting to ResourceManager at eyang-1.openstacklocal/172.26.111.17:8032
2018-02-02 22:53:30,909 [main] INFO service.ServiceScheduler - Registering appattempt_1517611904996_0001_000001, abc into registry
2018-02-02 22:53:30,911 [main] INFO service.ServiceScheduler - Received 0 containers from previous attempt.
2018-02-02 22:53:31,072 [main] INFO service.ServiceScheduler - Could not read component paths: `/users/hbase/services/yarn-service/abc/components': No such file or directory: KeeperErrorCode = NoNode for /registry/users/hbase/services/yarn-service/abc/components
2018-02-02 22:53:31,074 [main] INFO service.ServiceScheduler - Triggering initial evaluation of component sleeper
2018-02-02 22:53:31,075 [main] INFO component.Component - [INIT COMPONENT sleeper]: 2 instances.
2018-02-02 22:53:31,094 [main] INFO component.Component - [COMPONENT sleeper] Transitioned from INIT to FLEXING on FLEX event.
2018-02-02 22:53:31,215 [pool-5-thread-1] ERROR service.ServiceScheduler - Failed to register app abc in registry
org.apache.hadoop.registry.client.exceptions.NoPathPermissionsException: `/registry/users/hbase/services/yarn-service/abc': Not authorized to access path; ACLs: [
0x01: 'world,'anyone
0x1f: 'sasl,'yarn
0x1f: 'sasl,'jhs
0x1f: 'sasl,'hdfs-demo
0x1f: 'sasl,'rm
0x1f: 'sasl,'hive
0x1f: 'sasl,'hbase
0x1f: 'sasl,'hbase
 ]: KeeperErrorCode = NoAuth for /registry/users/hbase/services/yarn-service/abc
        at org.apache.hadoop.registry.client.impl.zk.CuratorService.operationFailure(CuratorService.java:412)
        at org.apache.hadoop.registry.client.impl.zk.CuratorService.zkCreate(CuratorService.java:637)
        at org.apache.hadoop.registry.client.impl.zk.CuratorService.zkSet(CuratorService.java:679)
        at org.apache.hadoop.registry.client.impl.zk.RegistryOperationsService.bind(RegistryOperationsService.java:116)
        at org.apache.hadoop.yarn.service.registry.YarnRegistryViewForProviders.putService(YarnRegistryViewForProviders.java:195)
        at org.apache.hadoop.yarn.service.registry.YarnRegistryViewForProviders.registerSelf(YarnRegistryViewForProviders.java:210)
        at org.apache.hadoop.yarn.service.ServiceScheduler$2.run(ServiceScheduler.java:462)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:748)
Caused by: org.apache.zookeeper.KeeperException$NoAuthException: KeeperErrorCode = NoAuth for /registry/users/hbase/services/yarn-service/abc
        at org.apache.zookeeper.KeeperException.create(KeeperException.java:113)
        at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
        at org.apache.zookeeper.ZooKeeper.create(ZooKeeper.java:783)
        at org.apache.curator.framework.imps.CreateBuilderImpl$11.call(CreateBuilderImpl.java:740)
        at org.apache.curator.framework.imps.CreateBuilderImpl$11.call(CreateBuilderImpl.java:723)
        at org.apache.curator.RetryLoop.callWithRetry(RetryLoop.java:109)
        at org.apache.curator.framework.imps.CreateBuilderImpl.pathInForeground(CreateBuilderImpl.java:720)
        at org.apache.curator.framework.imps.CreateBuilderImpl.protectedPathInForeground(CreateBuilderImpl.java:484)
        at org.apache.curator.framework.imps.CreateBuilderImpl.forPath(CreateBuilderImpl.java:474)
        at org.apache.curator.framework.imps.CreateBuilderImpl$3.forPath(CreateBuilderImpl.java:260)
        at org.apache.curator.framework.imps.CreateBuilderImpl$3.forPath(CreateBuilderImpl.java:214)
        at org.apache.hadoop.registry.client.impl.zk.CuratorService.zkCreate(CuratorService.java:635)
        ... 12 more
2018-02-02 22:53:33,135 [AMRM Callback Handler Thread] INFO service.ServiceScheduler - 2 containers allocated.
{code}
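
As an aid to reading the ACL entries in the log above, here is a minimal sketch that decodes the permission bitmasks shown there (`1`/`0x01` and `31`/`0x1f`). It is not part of Hadoop or ZooKeeper; the helper name `decode_perms` is hypothetical, and the bit values are assumed to be the standard ZooKeeper permission constants (READ=1, WRITE=2, CREATE=4, DELETE=8, ADMIN=16).

```python
# ZooKeeper permission bits, assumed to match org.apache.zookeeper.ZooDefs.Perms.
PERMS = {
    "READ": 1,
    "WRITE": 2,
    "CREATE": 4,
    "DELETE": 8,
    "ADMIN": 16,
}


def decode_perms(mask):
    """Hypothetical helper: return the names of the permission bits set in mask."""
    return [name for name, bit in PERMS.items() if mask & bit]


if __name__ == "__main__":
    # Masks taken from the log: 'world,'anyone has 0x01, the sasl entries have 0x1f.
    for mask in (0x01, 0x1F):
        print(hex(mask), decode_perms(mask))
```

Decoded this way, the `'sasl,'hbase` entries in the log carry all five permissions, including CREATE, which is why the NoAuth result is unexpected.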