Accessing a kerberized HDFS using Spark on Openshift

Gal Shinder Wed, 13 Oct 2021 03:14:16 -0700

Hi,

I have a pod on OpenShift 4.6 running a Jupyter notebook with Spark 3.1.1 and 
Python 3.7 (based on Open Data Hub; I tweaked the Dockerfile because I wanted 
this specific Python version).


I'm trying to run Spark in client mode using the image from Google's Spark 
operator (gcr.io/spark-operator/spark-py:v3.1.1). Spark runs fine, but I'm 
unable to connect to a kerberized Cloudera HDFS. I've tried the examples 
outlined in the security documentation 
(https://github.com/apache/spark/blob/master/docs/security.md#secure-interaction-with-kubernetes) 
and numerous other combinations, but nothing seems to work.

I managed to authenticate with Kerberos by passing an additional Java option 
to the driver and executors (-Djava.security.krb5.conf) and by passing the 
Kerberos config to the executors through the ConfigMap that is auto-generated 
from the folder SPARK_CONF_DIR points to on the driver. I'll try to pass the 
Hadoop configuration files the same way and set the Hadoop home, just to test 
the connection.
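
For reference, a minimal sketch of that workaround in Python. The property 
names `spark.driver.extraJavaOptions` and `spark.executor.extraJavaOptions` 
are the standard Spark settings for extra JVM flags; the helper name and the 
krb5.conf paths are assumptions for illustration, not values from my setup.

```python
def krb5_java_options(driver_krb5_path, executor_krb5_path):
    """Return Spark properties that point each JVM at a krb5.conf file.

    Hypothetical helper: it only builds the property dict, it does not
    start Spark or touch the cluster.
    """
    flag = "-Djava.security.krb5.conf={}"
    return {
        "spark.driver.extraJavaOptions": flag.format(driver_krb5_path),
        "spark.executor.extraJavaOptions": flag.format(executor_krb5_path),
    }


# Assumed paths: adjust to wherever krb5.conf actually lives in each pod.
opts = krb5_java_options(
    driver_krb5_path="/opt/spark/conf/krb5.conf",
    executor_krb5_path="/opt/spark/conf/krb5.conf",
)

for key in sorted(opts):
    print(f"{key}={opts[key]}")
```

Each pair can then be handed to Spark like any other property, for example 
through `SparkSession.builder.config(key, value)` in the notebook.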
 
I don't want to use that solution in prod. 
`spark.kubernetes.kerberos.krb5.configMapName` and 
`spark.kubernetes.hadoop.configMapName` don't seem to do anything: the pod spec 
of the executors doesn't have those volumes. I'm using 
`spark.kubernetes.authenticate.oauthToken` to authenticate with k8s, as a user 
who has the cluster-admin role.

I also don't want to get a delegation token. I figured I can just use the 
keytab, even though the examples in the security documentation don't mention 
using a keytab with the configmaps.

The configuration I'm trying to use:
spark.kubernetes.authenticate.oauthToken with the oauth token of a cluster 
admin.

spark.kubernetes.hadoop.configMapName pointing to a configmap containing the 
core-site.xml and hdfs-site.xml I got from Cloudera Manager

spark.kubernetes.kerberos.krb5.configMapName pointing to a configmap containing 
a krb5.conf

spark.kerberos.keytab 

spark.kerberos.principal
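
Put together, the properties above look roughly like the sketch below. The 
five property names are the ones listed in this mail; every value (ConfigMap 
names, keytab path, principal, token placeholder) and both helper functions 
are hypothetical, and the sketch only assembles the settings. It does not 
show that the ConfigMap properties take effect in client mode, which is the 
open question here.

```python
def build_kerberos_conf(oauth_token, hadoop_configmap, krb5_configmap,
                        keytab, principal):
    """Collect the Spark properties from this mail into one dict."""
    return {
        "spark.kubernetes.authenticate.oauthToken": oauth_token,
        "spark.kubernetes.hadoop.configMapName": hadoop_configmap,
        "spark.kubernetes.kerberos.krb5.configMapName": krb5_configmap,
        "spark.kerberos.keytab": keytab,
        "spark.kerberos.principal": principal,
    }


def to_submit_args(conf):
    """Render the dict as a flat list of `--conf key=value` arguments."""
    args = []
    for key in sorted(conf):
        args += ["--conf", f"{key}={conf[key]}"]
    return args


# All values below are placeholders, not real names from the cluster.
conf = build_kerberos_conf(
    oauth_token="<oauth-token>",
    hadoop_configmap="hadoop-conf",
    krb5_configmap="krb5-conf",
    keytab="/path/to/user.keytab",
    principal="user@EXAMPLE.COM",
)

# Print only the property names so the token never ends up in notebook output.
for key in sorted(conf):
    print(key)
```

In a notebook the same dict could be applied by looping over it and calling 
`SparkSession.builder.config(key, value)` for each pair.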


Thanks, 
Gal
