[ https://issues.apache.org/jira/browse/SPARK-14743?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Saisai Shao updated SPARK-14743:
--------------------------------
    Component/s: YARN

> Improve delegation token handling in secure clusters
> ----------------------------------------------------
>
>                 Key: SPARK-14743
>                 URL: https://issues.apache.org/jira/browse/SPARK-14743
>             Project: Spark
>          Issue Type: Improvement
>          Components: Spark Core, YARN
>    Affects Versions: 2.0.0
>            Reporter: Marcelo Vanzin
>
> In a way, I'd consider this a parent bug of SPARK-7252.
> Spark's current support for delegation tokens is a little all over the place:
> - for HDFS, there's support for re-creating tokens if a principal and keytab
> are provided
> - for HBase and Hive, Spark will fetch delegation tokens so that apps can
> work in cluster mode, but will not re-create them, so apps that need those
> tokens will stop working after 7 days
> - for anything else, Spark doesn't do anything. Lots of other services use
> delegation tokens, and supporting them as data sources in Spark becomes more
> complicated because of that. e.g., Kafka will (hopefully) soon support them.
> It would be nice if Spark had consistent support for handling delegation
> tokens regardless of who needs them. I'd list these as the requirements:
> - Spark should provide a generic interface for fetching delegation tokens.
> This would allow Spark's delegation token support to be extended using some
> plugin architecture (e.g. Java services), meaning Spark itself doesn't need
> to support every possible service out there.
> This would be used to fetch tokens when launching apps in cluster mode, and
> when a principal and a keytab are provided to Spark.
> - A way to manually update delegation tokens in Spark. For example, a new
> SparkContext API, or some configuration that tells Spark to monitor a file
> for changes and load tokens from said file.
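A rough sketch of what the "Java services" plugin idea above could look like, using the standard `java.util.ServiceLoader` mechanism. Nothing here is an existing Spark API: `DelegationTokenProvider` and the stub Hive provider are hypothetical names used purely for illustration.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.ServiceLoader;

public class TokenProviderSketch {

    // Hypothetical SPI. Real implementations (HDFS, Hive, HBase, Kafka, ...)
    // would register themselves under META-INF/services so ServiceLoader can
    // discover them, meaning Spark itself needs no built-in knowledge of
    // every possible service.
    public interface DelegationTokenProvider {
        String serviceName();
        // Fetches a token into the shared store; returns the token's expiry
        // time in millis so a renewal thread knows when to re-create it.
        long obtainToken(Map<String, byte[]> tokenStore);
    }

    // Stand-in for a real provider, for illustration only.
    public static final class FakeHiveProvider implements DelegationTokenProvider {
        public String serviceName() { return "hive"; }
        public long obtainToken(Map<String, byte[]> tokenStore) {
            tokenStore.put(serviceName(), new byte[]{0x01});
            return System.currentTimeMillis() + 7L * 24 * 60 * 60 * 1000; // ~7 days out
        }
    }

    public static Map<String, byte[]> fetchAll() {
        Map<String, byte[]> tokens = new HashMap<>();
        // Classpath discovery: finds nothing unless providers are registered.
        for (DelegationTokenProvider p : ServiceLoader.load(DelegationTokenProvider.class)) {
            p.obtainToken(tokens);
        }
        // For the sketch, invoke the stub directly as well.
        new FakeHiveProvider().obtainToken(tokens);
        return tokens;
    }
}
```

In this shape, the same `fetchAll()` path would run both at cluster-mode launch and periodically when a principal and keytab are available for re-creating tokens.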
> This would allow external applications to manage tokens outside of Spark and
> be able to update a running Spark application (think, for example, a job
> server like Oozie, or something like Hive-on-Spark, which manages Spark apps
> running remotely).
> - A way to notify running code that new delegation tokens have been loaded.
> This may not be strictly necessary; it might be possible for code to detect
> that, e.g., by peeking into the UserGroupInformation structure. But an event
> sent to the listener bus would allow applications to react when new tokens
> are available (e.g., the Hive backend could re-create connections to the
> metastore server using the new tokens).
> Also, cc'ing [~busbey] and [~steve_l] since you've talked about this on the
> mailing list recently.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
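The "monitor a file for changes" option above could be sketched as a simple poller that reloads serialized credentials whenever the file's modification time advances. The class name, the raw-bytes file format, and any config key naming the file are assumptions for illustration; the real mechanism would be defined by Spark.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.attribute.FileTime;

// Hypothetical sketch: an external application (e.g. Oozie) rewrites the
// token file, and a background thread inside Spark polls it for changes.
public class TokenFileMonitor {
    private final Path tokenFile;
    private FileTime lastSeen = FileTime.fromMillis(0);
    private byte[] currentTokens = new byte[0];

    public TokenFileMonitor(Path tokenFile) {
        this.tokenFile = tokenFile;
    }

    // Returns true if new tokens were loaded on this poll.
    public boolean pollOnce() throws IOException {
        FileTime mtime = Files.getLastModifiedTime(tokenFile);
        if (mtime.compareTo(lastSeen) > 0) {
            currentTokens = Files.readAllBytes(tokenFile);
            lastSeen = mtime;
            return true;
        }
        return false;
    }

    public byte[] tokens() {
        return currentTokens;
    }
}
```

Polling on modification time keeps the external contract trivial (atomically replace one file); `java.nio.file.WatchService` would be an event-driven alternative with the same semantics.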
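The listener-bus notification described above could look roughly like the following. The event and listener names (`TokensUpdated`, `TokenListener`) are hypothetical, not part of Spark's listener API; the point is only that token consumers subscribe once and get called back when new tokens land.

```java
import java.util.ArrayList;
import java.util.List;

// Minimal sketch of posting a "new tokens loaded" event to listeners, so
// application code (e.g. a Hive backend) can react by re-creating its
// connections with the fresh credentials.
public class TokenEventBus {

    // Hypothetical event carrying the time the new tokens were loaded.
    public static final class TokensUpdated {
        public final long updateTimeMs;
        public TokensUpdated(long updateTimeMs) { this.updateTimeMs = updateTimeMs; }
    }

    public interface TokenListener {
        void onTokensUpdated(TokensUpdated event);
    }

    private final List<TokenListener> listeners = new ArrayList<>();

    public void addListener(TokenListener l) {
        listeners.add(l);
    }

    public void post(TokensUpdated event) {
        for (TokenListener l : listeners) {
            l.onTokensUpdated(event);
        }
    }
}
```

As the issue notes, the alternative is for code to peek into `UserGroupInformation` itself, but an explicit event avoids every consumer re-implementing that detection.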