[ https://issues.apache.org/jira/browse/SPARK-23361?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Saisai Shao resolved SPARK-23361.
---------------------------------
       Resolution: Fixed
    Fix Version/s: 2.4.0

Issue resolved by pull request 20657
[https://github.com/apache/spark/pull/20657]

> Driver restart fails if it happens after 7 days from app submission
> -------------------------------------------------------------------
>
>                 Key: SPARK-23361
>                 URL: https://issues.apache.org/jira/browse/SPARK-23361
>             Project: Spark
>          Issue Type: Bug
>          Components: YARN
>    Affects Versions: 2.1.0
>            Reporter: Marcelo Vanzin
>            Assignee: Marcelo Vanzin
>            Priority: Major
>             Fix For: 2.4.0
>
>
> If you submit an app that is supposed to run for more than 7 days (so using
> --principal / --keytab in cluster mode), and a failure causes the driver to
> restart after 7 days (the default HDFS delegation token lifetime), the new
> driver will fail with an error like the following:
> {noformat}
> Exception in thread "main" org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.token.SecretManager$InvalidToken): token (lots of uninteresting token info) can't be found in cache
>     at org.apache.hadoop.ipc.Client.call(Client.java:1472)
>     at org.apache.hadoop.ipc.Client.call(Client.java:1409)
>     at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:230)
>     at com.sun.proxy.$Proxy16.getFileInfo(Unknown Source)
>     at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getFileInfo(ClientNamenodeProtocolTranslatorPB.java:771)
>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>     at java.lang.reflect.Method.invoke(Method.java:606)
>     at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:256)
>     at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:104)
>     at com.sun.proxy.$Proxy17.getFileInfo(Unknown Source)
>     at org.apache.hadoop.hdfs.DFSClient.getFileInfo(DFSClient.java:2123)
>     at org.apache.hadoop.hdfs.DistributedFileSystem$20.doCall(DistributedFileSystem.java:1253)
>     at org.apache.hadoop.hdfs.DistributedFileSystem$20.doCall(DistributedFileSystem.java:1249)
>     at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
>     at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1249)
>     at org.apache.spark.deploy.yarn.ApplicationMaster$$anonfun$6.apply(ApplicationMaster.scala:160)
> {noformat}
> Note: line numbers may not align with actual Apache code because that's our
> internal build.
> This happens because, as part of app submission, the launcher provides
> delegation tokens to be used by the AM (= the driver in this case), and by
> the time of the restart those tokens have expired.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
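For context, the failure described above arises from a keytab-based cluster-mode submission. A minimal sketch of such a submission follows; the principal, keytab path, main class, and jar name are placeholders for illustration, not taken from the issue:

```shell
#!/bin/sh
# Sketch of a long-running (> 7 days) YARN cluster-mode submission.
# All concrete names here are assumptions: principal, keytab path,
# main class, and application jar are placeholders.
# --principal/--keytab let Spark renew delegation tokens while the app
# runs, but before the SPARK-23361 fix a restarted driver could still
# pick up the original (now expired) tokens from submission time.
SUBMIT_CMD="spark-submit \
  --master yarn \
  --deploy-mode cluster \
  --principal alice@EXAMPLE.COM \
  --keytab /etc/security/keytabs/alice.keytab \
  --class com.example.LongRunningApp \
  long-running-app.jar"

# Print the command instead of running it: actually submitting requires
# a live YARN cluster and a Kerberized HDFS.
echo "$SUBMIT_CMD"
```

The fix in pull request 20657 targets exactly this scenario, so on 2.4.0+ a driver restarted after the token lifetime should obtain fresh tokens rather than fail as shown in the stack trace.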