RE: Accessing Kerberos Secured HDFS Resources from Spark on Mesos

Dave Ariens Fri, 26 Jun 2015 12:17:30 -0700

Hi Timothy,

Because I'm running Spark on Mesos alongside a secured Hadoop cluster, I need 
to ensure that my tasks running on the slaves perform a Kerberos login before 
accessing any HDFS resources.  To log in, they just need the name of the 
principal (username) and a keytab file, and then they need to invoke the 
following Java:


import org.apache.hadoop.security.UserGroupInformation
UserGroupInformation.loginUserFromKeytab(adminPrincipal, adminKeytab)

This is done in the driver in my Gist below, but I don't know how to run it 
within each executor on the slaves as tasks are run.
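
The closest I can picture is a guard that performs the login at most once per 
JVM, which a task would call before touching HDFS.  Below is a minimal, 
untested-against-Mesos sketch of that idea.  KerberosLoginOnce and LoginAction 
are hypothetical names I made up, and the real login call is replaced by a 
counter so the snippet runs without Hadoop on the classpath:

```java
/**
 * Sketch: run a login action at most once per JVM. Each Spark executor is
 * its own JVM, so this would mean one Kerberos login per executor.
 * KerberosLoginOnce and LoginAction are hypothetical names.
 */
final class KerberosLoginOnce {
    /** Stand-in for the real login call, which can throw. */
    interface LoginAction {
        void run() throws Exception;
    }

    private static boolean loggedIn = false;

    /** Runs the action on the first successful call only; later calls are no-ops. */
    static synchronized void ensure(LoginAction action) {
        if (loggedIn) {
            return;
        }
        try {
            action.run();
        } catch (Exception e) {
            // loggedIn stays false, so a later task can retry the login.
            throw new RuntimeException("Kerberos login failed", e);
        }
        loggedIn = true;
    }

    public static void main(String[] args) {
        int[] attempts = {0};
        // In the real job the lambda body would be:
        //   UserGroupInformation.loginUserFromKeytab(adminPrincipal, adminKeytab)
        LoginAction fakeLogin = () -> attempts[0]++;
        ensure(fakeLogin);
        ensure(fakeLogin);
        System.out.println("login attempts: " + attempts[0]); // prints "login attempts: 1"
    }
}
```

My assumption is that the ensure(...) call would sit at the top of the function 
passed to mapPartitions or foreachPartition, so that it executes on the 
executor rather than the driver, and that the keytab path is readable on every 
slave.  I have not verified that this is enough for the HDFS client on Mesos.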

Any help would be appreciated!


From: Timothy Chen [mailto:[email protected]]
Sent: Friday, June 26, 2015 12:50 PM
To: Dave Ariens
Cc: [email protected]
Subject: Re: Accessing Kerberos Secured HDFS Resources from Spark on Mesos

Hi Dave,

I don't understand Kerberos much, but if you know the exact steps that need to 
happen, I can see how we can make that happen with the Spark framework.

Tim

On Jun 26, 2015, at 8:49 AM, Dave Ariens 
<[email protected]<mailto:[email protected]>> wrote:

I understand that Kerberos support for accessing Hadoop resources in Spark only 
works when running Spark on YARN.  However, I'd really like to hack something 
together for Spark on Mesos running alongside a secured Hadoop cluster.  My 
simplified application (gist: 
https://gist.github.com/ariens/2c44c30e064b1790146a) receives a Kerberos 
principal and keytab when submitted.  The static main method then calls 
UserGroupInformation.loginUserFromKeytab(userPrincipal, userKeytab) and 
authenticates to Hadoop.  This works on YARN (curiously, without even having 
to kinit first), but not on Mesos.  Is there a way to have the slaves running 
the tasks perform the same Kerberos login before they attempt to access HDFS?



Putting aside the security of Spark/Mesos and how that keytab would get 
distributed, I'm just looking for a working POC.

Is there a way to leverage the Broadcast capability to send a function that 
performs this?

https://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.broadcast.Broadcast

Ideally, I'd love for this to not incur much overhead and just simply allow me 
to work around the absent Kerberos support...

Thanks,

Dave
