Hi Timothy,

Because I'm running Spark on Mesos alongside a secured Hadoop cluster, I need 
to ensure that my tasks running on the slaves perform a Kerberos login before 
accessing any HDFS resources. To log in, they only need the name of the 
principal (username) and a keytab file. Then they just need to invoke the 
following Hadoop API call:

import org.apache.hadoop.security.UserGroupInformation
UserGroupInformation.loginUserFromKeytab(adminPrincipal, adminKeytab)

This is done in the driver in my Gist below, but I don't know how to run it 
within each executor on the slaves as tasks are run.
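One possible approach, sketched here under stated assumptions, is to have every task call a small once-per-JVM guard before it touches HDFS. The keytab login then happens on each executor the first time it is needed and is a cheap no-op afterwards. ExecutorKerberosLogin and its LoginAction interface are hypothetical names for illustration. They are not Spark or Hadoop APIs. The sketch also assumes the keytab already sits at the same local path on every slave. It uses only the JDK so the guard logic can be tested on its own. In a real job, the action passed in would be (p, k) -> UserGroupInformation.loginUserFromKeytab(p, k).

```java
import java.io.IOException;

/**
 * Hypothetical helper (not part of Spark or Hadoop): performs a Kerberos
 * keytab login at most once per executor JVM, so every task can call it
 * cheaply before touching HDFS.
 */
final class ExecutorKerberosLogin {

    /**
     * The real login step. In a Spark job this would wrap
     * UserGroupInformation.loginUserFromKeytab(principal, keytabPath);
     * it is injected here so the sketch compiles with only the JDK.
     */
    @FunctionalInterface
    interface LoginAction {
        void login(String principal, String keytabPath) throws IOException;
    }

    // Shared by all tasks running in this JVM (one executor).
    private static volatile boolean loggedIn = false;

    private ExecutorKerberosLogin() {}

    /** Logs in on the first successful call in this JVM; later calls return immediately. */
    static void ensureLoggedIn(String principal, String keytabPath, LoginAction action)
            throws IOException {
        if (loggedIn) {
            return; // fast path: this JVM has already logged in
        }
        synchronized (ExecutorKerberosLogin.class) {
            if (!loggedIn) {                         // re-check: another task may have logged in first
                action.login(principal, keytabPath); // if this throws, loggedIn stays false
                loggedIn = true;                     // set only after the login succeeded
            }
        }
    }

    /** True once a login has succeeded in this JVM. */
    static boolean isLoggedIn() {
        return loggedIn;
    }
}
```

On the slaves, the call would go at the top of the function passed to foreachPartition or mapPartitions, before any FileSystem access in that function. This only covers HDFS access made from inside your own task code. It is not guaranteed to run before Spark's own input reads (for example, those behind sc.textFile), which may open HDFS files before your function is invoked.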

Any help would be appreciated!


From: Timothy Chen [mailto:t...@mesosphere.io]
Sent: Friday, June 26, 2015 12:50 PM
To: Dave Ariens
Cc: user@spark.apache.org
Subject: Re: Accessing Kerberos Secured HDFS Resources from Spark on Mesos

Hi Dave,

I don't understand Kerberos much, but if you know the exact steps that need to 
happen, I can see how we can make that happen with the Spark framework.

Tim

On Jun 26, 2015, at 8:49 AM, Dave Ariens <dari...@blackberry.com> wrote:

I understand that Kerberos support for accessing Hadoop resources in Spark only 
works when running Spark on YARN. However, I'd really like to hack something 
together for Spark on Mesos running alongside a secured Hadoop cluster. My 
simplified application (gist: 
https://gist.github.com/ariens/2c44c30e064b1790146a) receives a Kerberos 
principal and keytab when submitted. Its static main method then calls 
UserGroupInformation.loginUserFromKeytab(userPrincipal, userKeytab) and 
authenticates to Hadoop. This works on YARN (curiously, without even having to 
kinit first), but not on Mesos. Is there a way to have the slaves running the 
tasks perform the same Kerberos login before they attempt to access HDFS?

Putting aside the security of Spark/Mesos and how that keytab would get 
distributed, I'm just looking for a working POC.

Is there a way to leverage the Broadcast capability to send a function that 
performs this?

https://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.broadcast.Broadcast

Ideally, I'd like this to add little overhead and simply let me work around 
the missing Kerberos support.

Thanks,
Dave
