Hi guys,

Has anyone tried Spark 3 on k8s reading data from HDFS encrypted with KMS in HA mode (with Kerberos)?

I have a wordcount job running with Spark 3 that reads data from HDFS (Hadoop 3.1), with everything secured by Kerberos. With Spark on k8s, everything works fine if the data folder is not encrypted. If the data is in an encrypted folder, Spark 3 on YARN works fine, but Spark 3 on k8s does not.

I submit the job with the spark-submit command and I provide the keytab and the principal to use.

I get a Kerberos error saying that there is no TGT to authenticate to the KMS servers (Ranger KMS; the full stack trace of the error is at the end of this mail), but in the log I can see that Spark gets two delegation tokens, one for each KMS server:

------------------------------------------------------------
20/08/13 10:50:50 INFO HadoopDelegationTokenManager: Attempting to login to KDC using principal: mytestu...@paf.com
20/08/13 10:50:50 INFO HadoopDelegationTokenManager: Successfully logged into KDC.
20/08/13 10:50:52 WARN DomainSocketFactory: The short-circuit local reads feature cannot be used because libhadoop cannot be loaded.
20/08/13 10:50:52 INFO HadoopFSDelegationTokenProvider: getting token for: DFS[DFSClient[clientName=DFSClient_NONMAPREDUCE_-237056190_16, ugi= mytestu...@paf.com (auth:KERBEROS)]] with renewer testuser
20/08/13 10:50:52 INFO DFSClient: Created token for testuser: HDFS_DELEGATION_TOKEN owner= mytestu...@paf.com, renewer=testuser, realUser=, issueDate=1597315852353, maxDate=1597920652353, sequenceNumber=55185062, masterKeyId=1964 on ha-hdfs:cluster2
20/08/13 10:50:52 INFO KMSClientProvider: New token created: (Kind: kms-dt, Service: kms://ht...@server2.paf.com:9393/kms, Ident: (kms-dt owner=testuser, renewer=testuser, realUser=, issueDate=1597315852642, maxDate=1597920652642, sequenceNumber=3929883, masterKeyId=623))
20/08/13 10:50:52 INFO HadoopFSDelegationTokenProvider: getting token for: DFS[DFSClient[clientName=DFSClient_NONMAPREDUCE_-237056190_16, ugi= testu...@paf.com (auth:KERBEROS)]] with renewer testu...@paf.com
20/08/13 10:50:52 INFO DFSClient: Created token for testuser: HDFS_DELEGATION_TOKEN owner=testu...@paf.com, renewer=testuser, realUser=, issueDate=1597315852744, maxDate=1597920652744, sequenceNumber=55185063, masterKeyId=1964 on ha-hdfs:cluster2
20/08/13 10:50:52 INFO KMSClientProvider: New token created: (Kind: kms-dt, Service: kms://ht...@server.paf.com:9393/kms, Ident: (kms-dt owner=testuser, renewer=testuser, realUser=, issueDate=1597315852839, maxDate=1597920652839, sequenceNumber=3929884, masterKeyId=624))
20/08/13 10:50:52 INFO HadoopFSDelegationTokenProvider: Renewal interval is 86400104 for token HDFS_DELEGATION_TOKEN
20/08/13 10:50:52 INFO HadoopFSDelegationTokenProvider: Renewal interval is 86400108 for token kms-dt
20/08/13 10:50:54 INFO HiveConf: Found configuration file null
20/08/13 10:50:54 INFO HadoopDelegationTokenManager: Scheduling renewal in 18.0 h.
20/08/13 10:50:54 INFO HadoopDelegationTokenManager: Updating delegation tokens.
20/08/13 10:50:54 INFO SparkHadoopUtil: Updating delegation tokens for current user.
20/08/13 10:50:55 INFO SparkHadoopUtil: Updating delegation tokens for current user.
------------------------------------------------------------

In core-site.xml, I have the following property for the two KMS servers:

----------
<name>hadoop.security.key.provider.path</name>
<value>kms://ht...@server.paf.com;server2.paf.com:9393/kms</value>
----------

Does anyone have an idea how to make it work? Or has anyone at least been able to make it work?
Does anyone know where the delegation tokens are saved during the execution of jobs on k8s, and how they are shared between the executors?

Thanks,
Michel

PS: The full stack trace of the error:

----------------------------
Caused by: org.apache.spark.SparkException: Job aborted due to stage failure: Task 22 in stage 0.0 failed 4 times, most recent failure: Lost task 22.3 in stage 0.0 (TID 23, 10.5.5.5, executor 1): java.io.IOException: org.apache.hadoop.security.authentication.client.AuthenticationException: Error while authenticating with endpoint: https://server.paf.com:9393/kms/v1/keyversion/dir_tmp_key%400/_eek?eek_op=decrypt
    at org.apache.hadoop.crypto.key.kms.KMSClientProvider.createConnection(KMSClientProvider.java:525)
    at org.apache.hadoop.crypto.key.kms.KMSClientProvider.decryptEncryptedKey(KMSClientProvider.java:826)
    at org.apache.hadoop.crypto.key.kms.LoadBalancingKMSClientProvider$5.call(LoadBalancingKMSClientProvider.java:351)
    at org.apache.hadoop.crypto.key.kms.LoadBalancingKMSClientProvider$5.call(LoadBalancingKMSClientProvider.java:347)
    at org.apache.hadoop.crypto.key.kms.LoadBalancingKMSClientProvider.doOp(LoadBalancingKMSClientProvider.java:172)
    at org.apache.hadoop.crypto.key.kms.LoadBalancingKMSClientProvider.decryptEncryptedKey(LoadBalancingKMSClientProvider.java:347)
    at org.apache.hadoop.crypto.key.KeyProviderCryptoExtension.decryptEncryptedKey(KeyProviderCryptoExtension.java:532)
    at org.apache.hadoop.hdfs.HdfsKMSUtil.decryptEncryptedDataEncryptionKey(HdfsKMSUtil.java:206)
    at org.apache.hadoop.hdfs.HdfsKMSUtil.createWrappedInputStream(HdfsKMSUtil.java:186)
    at org.apache.hadoop.hdfs.DFSClient.createWrappedInputStream(DFSClient.java:930)
    at org.apache.hadoop.hdfs.DistributedFileSystem$4.doCall(DistributedFileSystem.java:326)
    at org.apache.hadoop.hdfs.DistributedFileSystem$4.doCall(DistributedFileSystem.java:321)
    at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
    at org.apache.hadoop.hdfs.DistributedFileSystem.open(DistributedFileSystem.java:333)
    at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:899)
    at org.apache.hadoop.mapred.LineRecordReader.<init>(LineRecordReader.java:109)
    at org.apache.hadoop.mapred.TextInputFormat.getRecordReader(TextInputFormat.java:67)
    at org.apache.spark.rdd.HadoopRDD$$anon$1.liftedTree1$1(HadoopRDD.scala:282)
    at org.apache.spark.rdd.HadoopRDD$$anon$1.<init>(HadoopRDD.scala:281)
    at org.apache.spark.rdd.HadoopRDD.compute(HadoopRDD.scala:239)
    at org.apache.spark.rdd.HadoopRDD.compute(HadoopRDD.scala:96)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:349)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:313)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:349)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:313)
    at org.apache.spark.api.python.PythonRDD.compute(PythonRDD.scala:65)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:349)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:313)
    at org.apache.spark.api.python.PairwiseRDD.compute(PythonRDD.scala:103)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:349)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:313)
    at org.apache.spark.shuffle.ShuffleWriteProcessor.write(ShuffleWriteProcessor.scala:59)
    at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:99)
    at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:52)
    at org.apache.spark.scheduler.Task.run(Task.scala:127)
    at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:444)
    at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1377)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:447)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
Caused by: org.apache.hadoop.security.authentication.client.AuthenticationException: Error while authenticating with endpoint: https://server.paf.com:9393/kms/v1/keyversion/dir_tmp_key%400/_eek?eek_op=decrypt
    at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
    at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
    at org.apache.hadoop.security.authentication.client.KerberosAuthenticator.wrapExceptionWithMessage(KerberosAuthenticator.java:232)
    at org.apache.hadoop.security.authentication.client.KerberosAuthenticator.authenticate(KerberosAuthenticator.java:219)
    at org.apache.hadoop.security.token.delegation.web.DelegationTokenAuthenticator.authenticate(DelegationTokenAuthenticator.java:143)
    at org.apache.hadoop.security.authentication.client.AuthenticatedURL.openConnection(AuthenticatedURL.java:348)
    at org.apache.hadoop.security.token.delegation.web.DelegationTokenAuthenticatedURL.openConnection(DelegationTokenAuthenticatedURL.java:329)
    at org.apache.hadoop.crypto.key.kms.KMSClientProvider$1.run(KMSClientProvider.java:512)
    at org.apache.hadoop.crypto.key.kms.KMSClientProvider$1.run(KMSClientProvider.java:507)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1730)
    at org.apache.hadoop.crypto.key.kms.KMSClientProvider.createConnection(KMSClientProvider.java:506)
    ... 41 more
Caused by: org.apache.hadoop.security.authentication.client.AuthenticationException: GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)
    at org.apache.hadoop.security.authentication.client.KerberosAuthenticator.doSpnegoSequence(KerberosAuthenticator.java:360)
    at org.apache.hadoop.security.authentication.client.KerberosAuthenticator.authenticate(KerberosAuthenticator.java:204)
    ... 50 more
Caused by: GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)
    at sun.security.jgss.krb5.Krb5InitCredential.getInstance(Krb5InitCredential.java:162)
    at sun.security.jgss.krb5.Krb5MechFactory.getCredentialElement(Krb5MechFactory.java:122)
    at sun.security.jgss.krb5.Krb5MechFactory.getMechanismContext(Krb5MechFactory.java:189)
    at sun.security.jgss.GSSManagerImpl.getMechanismContext(GSSManagerImpl.java:224)
    at sun.security.jgss.GSSContextImpl.initSecContext(GSSContextImpl.java:212)
    at sun.security.jgss.GSSContextImpl.initSecContext(GSSContextImpl.java:179)
    at org.apache.hadoop.security.authentication.client.KerberosAuthenticator$1.run(KerberosAuthenticator.java:336)
    at org.apache.hadoop.security.authentication.client.KerberosAuthenticator$1.run(KerberosAuthenticator.java:310)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.authentication.client.KerberosAuthenticator.doSpnegoSequence(KerberosAuthenticator.java:310)
    ... 51 more
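
PPS: To illustrate why two kms-dt tokens show up in the log for a single hadoop.security.key.provider.path value, here is a minimal Python sketch of how a semicolon-separated HA provider path maps to one service URI per KMS host. It is only an illustration of the naming, not Hadoop code: the function name expand_kms_provider_path and the example.com hosts are hypothetical, and the "https" scheme in the example is an assumption, since the real value above is elided.

```python
from typing import List


def expand_kms_provider_path(path: str) -> List[str]:
    """Split an HA 'kms://scheme@host1;host2:port/path' value into one URI per host.

    Hypothetical helper for illustration; it only mirrors the per-server
    'Service:' names seen in the delegation token log lines.
    """
    prefix = "kms://"
    if not path.startswith(prefix):
        raise ValueError(f"not a kms:// URI: {path!r}")

    # authority = 'scheme@host1;host2:port', tail = 'kms'
    authority, _, tail = path[len(prefix):].partition("/")

    scheme, at, hosts_port = authority.rpartition("@")
    if not at:
        raise ValueError(f"missing 'scheme@' part in: {path!r}")

    # The port is written once, after the last host, and shared by all hosts.
    hosts, colon, port = hosts_port.rpartition(":")
    if not colon:
        hosts, port = hosts_port, ""
    suffix = f":{port}" if port else ""

    return [f"kms://{scheme}@{host}{suffix}/{tail}" for host in hosts.split(";")]


if __name__ == "__main__":
    # Assumed example value; the hosts are placeholders, not the real servers.
    for uri in expand_kms_provider_path(
        "kms://https@kms1.example.com;kms2.example.com:9393/kms"
    ):
        print(uri)
```

With two hosts in the path, the sketch yields two service URIs, which matches the one-token-per-KMS-server pattern in the driver log above.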