There are a few security-related issues that I am postponing dealing with. Once I get this working I'll look at the security side. Likely I'll be encouraging users to submit their jobs via Docker containers. Regardless, getting the user's keytab and principal name into the working environment of the executor isn't hard; the hard part is being able to call the login method before the HDFS resources are accessed.
See the gist below. That login completes successfully, but only on the driver. Once that HDFS resource is read with the Avro input format and key, and the tasks are created on the slaves, they read from that HDFS resource within their own running environment (JVM?), and any file system instantiations performed by Spark are not made by a UserGroupInformation instance associated with the principal.

From: Marcelo Vanzin
Sent: Friday, June 26, 2015 4:20 PM
To: Tim Chen
Cc: Olivier Girardot; Dave Ariens; user@spark.apache.org
Subject: Re: Accessing Kerberos Secured HDFS Resources from Spark on Mesos

On Fri, Jun 26, 2015 at 1:13 PM, Tim Chen <t...@mesosphere.io> wrote:

> So correct me if I'm wrong, but it sounds like all you need is a principal user name and a keytab file downloaded, right?

I'm not familiar with Mesos so I don't know what kinds of features it has, but at the very least it would need to start containers as the requesting users (as YARN does when running with Kerberos enabled), to avoid users being able to read each other's credentials.

-- Marcelo
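[Editorial note: the driver-versus-executor login issue discussed above can be sketched as follows. This is a minimal sketch, not the actual gist: it assumes the keytab has been shipped to every executor (e.g. via `--files`), and the principal, keytab paths, and HDFS URI are hypothetical placeholders. `UserGroupInformation.loginUserFromKeytab` is the standard Hadoop login call.]

```scala
// Sketch only: assumes spark-core and hadoop-common are on the classpath,
// and that user.keytab has been distributed to each executor's working
// directory (e.g. via --files). Principal and paths are hypothetical.
import org.apache.hadoop.security.UserGroupInformation
import org.apache.spark.{SparkConf, SparkContext}

object KerberosLoginSketch {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("kerberos-login-sketch"))

    // Driver-side login: this succeeds, but it only authenticates the
    // driver JVM -- executors know nothing about it.
    UserGroupInformation.loginUserFromKeytab(
      "user@EXAMPLE.COM", "/path/to/user.keytab")

    val data = sc.textFile("hdfs://namenode:8020/secure/input")

    // Executor-side: each task runs in its own JVM on a slave, so the
    // login has to be repeated there before the partition's iterator is
    // consumed (i.e. before the actual HDFS read happens).
    val lengths = data.mapPartitions { iter =>
      UserGroupInformation.loginUserFromKeytab(
        "user@EXAMPLE.COM", "user.keytab") // relative path in executor dir
      iter.map(_.length)
    }
    println(lengths.count())
    sc.stop()
  }
}
```

The `mapPartitions` placement matters: because the input iterator is lazy, the login call runs inside the task JVM before any HDFS blocks are read, which is exactly the ordering the message above identifies as the hard part.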