[ https://issues.apache.org/jira/browse/YARN-9755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16911585#comment-16911585 ]
Eric Yang commented on YARN-9755: --------------------------------- [~Prabhu Joseph] Thank you for patch 003. Mapreduce job can run fine, but there seems to be another problem. When trying to submit yarn service application, the request hang with this exception: {code} 2019-08-20 17:06:04,996 INFO SecurityLogger.org.apache.hadoop.ipc.Server: Auth successful for rm/eyang-2.openstacklo...@example.com (auth:KERBEROS) 2019-08-20 17:06:05,003 INFO org.apache.hadoop.ipc.Server: Connection from 172.26.111.18:34295 for protocol org.apache.hadoop.yarn.api.ApplicationClientProtocolPB is unauthorized for user hbase (auth:PROXY) via rm/eyang-2.openstacklo...@example.com (auth:KERBEROS) 2019-08-20 17:06:05,004 INFO org.apache.hadoop.io.retry.RetryInvocationHandler: org.apache.hadoop.security.authorize.AuthorizationException: User: rm/eyang-2.openstacklo...@example.com is not allowed to impersonate hbase, while invoking ApplicationClientProtocolPBClientImpl.getApplications over rm2 after 1 failover attempts. Trying to failover after sleeping for 35311ms. 2019-08-20 17:06:10,568 INFO org.apache.hadoop.yarn.client.ConfiguredRMFailoverProxyProvider: Failing over to rm2 {code} After a few retries, Resource manager logs error with: {code} 2019-08-20 17:09:21,360 ERROR org.apache.hadoop.yarn.service.webapp.ApiServer: Failed to create service c: {} java.net.ConnectException: Call From eyang-2.openstacklocal/172.26.111.18 to eyang-1.openstacklocal:8032 failed on connection exception: java.net.ConnectException: Connection refused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused at sun.reflect.GeneratedConstructorAccessor70.newInstance(Unknown Source) at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) at java.lang.reflect.Constructor.newInstance(Constructor.java:423) at org.apache.hadoop.net.NetUtils.wrapWithMessage(NetUtils.java:837) at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:757) at org.apache.hadoop.ipc.Client.getRpcResponse(Client.java:1557) at org.apache.hadoop.ipc.Client.call(Client.java:1499) at org.apache.hadoop.ipc.Client.call(Client.java:1396) at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:233) at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:118) at com.sun.proxy.$Proxy96.getApplications(Unknown Source) at org.apache.hadoop.yarn.api.impl.pb.client.ApplicationClientProtocolPBClientImpl.getApplications(ApplicationClientProtocolPBClientImpl.java:316) at sun.reflect.GeneratedMethodAccessor19.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:498) at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:422) at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeMethod(RetryInvocationHandler.java:165) at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invoke(RetryInvocationHandler.java:157) at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeOnce(RetryInvocationHandler.java:95) at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:359) at com.sun.proxy.$Proxy97.getApplications(Unknown Source) at org.apache.hadoop.yarn.client.api.impl.YarnClientImpl.getApplications(YarnClientImpl.java:629) at org.apache.hadoop.yarn.service.client.ServiceClient.verifyNoLiveAppInRM(ServiceClient.java:952) at org.apache.hadoop.yarn.service.client.ServiceClient.actionCreate(ServiceClient.java:555) at org.apache.hadoop.yarn.service.webapp.ApiServer$2.run(ApiServer.java:142) at org.apache.hadoop.yarn.service.webapp.ApiServer$2.run(ApiServer.java:135) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:422) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1891) at org.apache.hadoop.yarn.service.webapp.ApiServer.createService(ApiServer.java:135) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:498) at com.sun.jersey.spi.container.JavaMethodInvokerFactory$1.invoke(JavaMethodInvokerFactory.java:60) at com.sun.jersey.server.impl.model.method.dispatch.AbstractResourceMethodDispatchProvider$ResponseOutInvoker._dispatch(AbstractResourceMethodDispatchProvider.java:205) at com.sun.jersey.server.impl.model.method.dispatch.ResourceJavaMethodDispatcher.dispatch(ResourceJavaMethodDispatcher.java:75) at com.sun.jersey.server.impl.uri.rules.HttpMethodRule.accept(HttpMethodRule.java:302) at com.sun.jersey.server.impl.uri.rules.RightHandPathRule.accept(RightHandPathRule.java:147) at com.sun.jersey.server.impl.uri.rules.ResourceClassRule.accept(ResourceClassRule.java:108) at com.sun.jersey.server.impl.uri.rules.RightHandPathRule.accept(RightHandPathRule.java:147) at com.sun.jersey.server.impl.uri.rules.RootResourceClassesRule.accept(RootResourceClassesRule.java:84) at com.sun.jersey.server.impl.application.WebApplicationImpl._handleRequest(WebApplicationImpl.java:1542) at com.sun.jersey.server.impl.application.WebApplicationImpl._handleRequest(WebApplicationImpl.java:1473) at com.sun.jersey.server.impl.application.WebApplicationImpl.handleRequest(WebApplicationImpl.java:1419) at com.sun.jersey.server.impl.application.WebApplicationImpl.handleRequest(WebApplicationImpl.java:1409) at com.sun.jersey.spi.container.servlet.WebComponent.service(WebComponent.java:409) at com.sun.jersey.spi.container.servlet.ServletContainer.service(ServletContainer.java:558) at com.sun.jersey.spi.container.servlet.ServletContainer.service(ServletContainer.java:733) at javax.servlet.http.HttpServlet.service(HttpServlet.java:790) at org.eclipse.jetty.servlet.ServletHolder.handle(ServletHolder.java:848) at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1780) at com.google.inject.servlet.FilterChainInvocation.doFilter(FilterChainInvocation.java:89) at com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:941) at com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:875) at org.apache.hadoop.yarn.server.resourcemanager.webapp.RMWebAppFilter.doFilter(RMWebAppFilter.java:180) at com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:829) at com.google.inject.servlet.FilterChainInvocation.doFilter(FilterChainInvocation.java:82) at com.google.inject.servlet.ManagedFilterPipeline.dispatch(ManagedFilterPipeline.java:119) at com.google.inject.servlet.GuiceFilter$1.call(GuiceFilter.java:133) at com.google.inject.servlet.GuiceFilter$1.call(GuiceFilter.java:130) at com.google.inject.servlet.GuiceFilter$Context.call(GuiceFilter.java:203) at com.google.inject.servlet.GuiceFilter.doFilter(GuiceFilter.java:130) at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1767) at org.apache.hadoop.security.http.XFrameOptionsFilter.doFilter(XFrameOptionsFilter.java:57) at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1767) at org.apache.hadoop.security.http.CrossOriginFilter.doFilter(CrossOriginFilter.java:98) at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1767) at org.apache.hadoop.security.authentication.server.AuthenticationFilter.doFilter(AuthenticationFilter.java:644) at org.apache.hadoop.security.authentication.server.ProxyUserAuthenticationFilter.doFilter(ProxyUserAuthenticationFilter.java:104) at org.apache.hadoop.security.authentication.server.AuthenticationFilter.doFilter(AuthenticationFilter.java:592) at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1767) at org.apache.hadoop.http.HttpServer2$QuotingInputFilter.doFilter(HttpServer2.java:1645) at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1767) at org.apache.hadoop.http.NoCacheFilter.doFilter(NoCacheFilter.java:45) at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1767) at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:583) at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143) at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:548) at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:226) at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1180) at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:513) at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:185) at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1112) at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141) at org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:119) at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:134) at org.eclipse.jetty.server.Server.handle(Server.java:539) at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:333) at org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:251) {code} This feature creates a circular dependency. "fs" object used by RM wants to use HDFS copy of YARN configuration, but the proxy ACL list is in common-site.xml. RM becomes unable to perform impersonation because it does not have ACL config to perform hdfs operation to fetch configuration. The workaround is to ensure fs object used by RM uses the bootstrapConf instead of HDFS copy of YARN configuration. However, the workaround may defeat the purpose of this feature unless we are very careful about the usage of this configuration is for applications that runs in YARN framework only. YARN framework itself uses the bootstrap copy. Is this a statement that we can confirm? > RM fails to start with FileSystemBasedConfigurationProvider > ----------------------------------------------------------- > > Key: YARN-9755 > URL: https://issues.apache.org/jira/browse/YARN-9755 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager > Affects Versions: 3.3.0 > Reporter: Prabhu Joseph > Assignee: Prabhu Joseph > Priority: Major > Attachments: YARN-9755-001.patch, YARN-9755-002.patch, > YARN-9755-003.patch > > > RM fails to start with below exception when > FileSystemBasedConfigurationProvider is used. > *Exception:* > {code} > 2019-08-16 12:05:33,802 ERROR > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Error starting > ResourceManager > org.apache.hadoop.service.ServiceStateException: java.io.IOException: > java.io.IOException: Filesystem closed > at > org.apache.hadoop.service.ServiceStateException.convert(ServiceStateException.java:105) > at > org.apache.hadoop.service.AbstractService.init(AbstractService.java:173) > at > org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:109) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceInit(ResourceManager.java:868) > at > org.apache.hadoop.service.AbstractService.init(AbstractService.java:164) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.createAndInitActiveServices(ResourceManager.java:1281) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.reinitialize(ResourceManager.java:1312) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1335) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1328) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:422) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1891) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.transitionToActive(ResourceManager.java:1328) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.serviceStart(ResourceManager.java:1379) > at > org.apache.hadoop.service.AbstractService.start(AbstractService.java:194) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.main(ResourceManager.java:1567) > Caused by: java.io.IOException: java.io.IOException: Filesystem closed > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.conf.FileBasedCSConfigurationProvider.loadConfiguration(FileBasedCSConfigurationProvider.java:64) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.initScheduler(CapacityScheduler.java:346) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.serviceInit(CapacityScheduler.java:445) > at > org.apache.hadoop.service.AbstractService.init(AbstractService.java:164) > ... 14 more > Caused by: java.io.IOException: Filesystem closed > at org.apache.hadoop.hdfs.DFSClient.checkOpen(DFSClient.java:475) > at org.apache.hadoop.hdfs.DFSClient.getFileInfo(DFSClient.java:1682) > at > org.apache.hadoop.hdfs.DistributedFileSystem$29.doCall(DistributedFileSystem.java:1586) > at > org.apache.hadoop.hdfs.DistributedFileSystem$29.doCall(DistributedFileSystem.java:1583) > at > org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81) > at > org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1598) > at org.apache.hadoop.fs.FileSystem.exists(FileSystem.java:1701) > at > org.apache.hadoop.yarn.FileSystemBasedConfigurationProvider.getConfigurationInputStream(FileSystemBasedConfigurationProvider.java:62) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.conf.FileBasedCSConfigurationProvider.loadConfiguration(FileBasedCSConfigurationProvider.java:56) > {code} > FileSystemBasedConfigurationProvider uses the cached FileSystem causing the > issue. > *Configs:* > {code} > <property><name>yarn.resourcemanager.configuration.provider-class</name><value>org.apache.hadoop.yarn.FileSystemBasedConfigurationProvider</value></property> > <property><name>yarn.resourcemanager.configuration.file-system-based-store</name><value>/yarn/conf</value></property> > [yarn@yarndocker-1 yarn]$ hadoop fs -ls /yarn/conf > -rw-r--r-- 3 yarn supergroup 4138 2019-08-16 13:09 > /yarn/conf/capacity-scheduler.xml > -rw-r--r-- 3 yarn supergroup 494 2019-08-16 11:41 > /yarn/conf/core-site.xml > -rw-r--r-- 3 yarn supergroup 11392 2019-08-16 11:52 > /yarn/conf/hadoop-policy.xml > -rw-r--r-- 3 yarn supergroup 11492 2019-08-16 11:41 > /yarn/conf/yarn-site.xml > {code} -- This message was sent by Atlassian Jira (v8.3.2#803003) --------------------------------------------------------------------- To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org