Re: Question about standalone Spark cluster reading from Kerberosed hadoop

Steve Loughran Fri, 23 Jun 2017 02:50:44 -0700

On 23 Jun 2017, at 10:22, Saisai Shao 
<sai.sai.s...@gmail.com<mailto:sai.sai.s...@gmail.com>> wrote:


Spark running with standalone cluster manager currently doesn't support 
accessing security Hadoop. Basically the problem is that standalone mode Spark 
doesn't have the facility to distribute delegation tokens.

Currently only Spark on YARN or local mode supports security Hadoop.

Thanks
Jerry


There's possibly an ugly workaround where you ssh in to every node and log in 
direct to your kdc using a keytab you pushed out...that would eliminate the 
need for anything related to hadoop tokens. After all, that's essentially what 
spark-on-yarn does when when you give it keytab.


see also:  
https://www.gitbook.com/book/steveloughran/kerberos_and_hadoop/details

On Fri, Jun 23, 2017 at 5:10 PM, Mu Kong 
<kong.mu....@gmail.com<mailto:kong.mu....@gmail.com>> wrote:
Hi, all!

I was trying to read from a Kerberosed hadoop cluster from a standalone spark 
cluster.
Right now, I encountered some authentication issues with Kerberos:



java.io.IOException: Failed on local exception: java.io.IOException: 
org.apache.hadoop.security.AccessControlException: Client cannot authenticate 
via:[TOKEN, KERBEROS]; Host Details : local host is: "XXXXXXXXXXXX"; 
destination host is: XXXXXXXXXXXXXXX;


I checked with klist, and principle/realm is correct.
I also used hdfs command line to poke HDFS from all the nodes, and it worked.
And if I submit job using local(client) mode, the job worked fine.

I tried to put everything from hadoop/conf to spark/conf and hive/conf to 
spark/conf.
Also tried edit spark/conf/spark-env.sh to add 
SPARK_SUBMIT_OPTS/SPARK_MASTER_OPTS/SPARK_SLAVE_OPTS/HADOOP_CONF_DIR/HIVE_CONF_DIR,
 and tried to export them in .bashrc as well.

However, I'm still experiencing the same exception.

Then I read some concerning posts about problems with kerberosed hadoop, some 
post like the following one:
http://blog.stratio.com/spark-kerberos-safe-story/
, which indicates that we can not access to kerberosed hdfs using standalone 
spark cluster.

I'm using spark 2.1.1, is it still the case that we can't access kerberosed 
hdfs with 2.1.1?

Thanks!


Best regards,
Mu

Re: Question about standalone Spark cluster reading from Kerberosed hadoop

Reply via email to