Finally found the root cause and raised a bug report at
https://issues.apache.org/jira/browse/SPARK-21819



Thanks very much.
Keith

From: Sun, Keith
Sent: August 22, 2017 8:48
To: user@spark.apache.org
Subject: A bug in spark or hadoop RPC with kerberos authentication?

Hello,

I ran into a very weird issue that is easy to reproduce but kept me stuck for more 
than a day. I suspect it may be a bug related to the class loader.
Can you help confirm the root cause?

I want to specify a customized Hadoop configuration set instead of the one on the 
classpath (we have a few Hadoop clusters, all secured with Kerberos, and I want to 
support a different configuration per cluster).
The code and error are below.


The workaround I found is to place a core-site.xml containing the two properties 
below on the classpath.
From reading the RPC code under org.apache.hadoop.ipc.RPC, I suspect the RPC code 
may not see the UGI class in the same classloader, so UGI is initialized with the 
default configuration from the classpath, which means simple authentication.

core-site.xml with the security setup on the classpath:
<configuration>
    <property>
        <name>hadoop.security.authentication</name>
        <value>kerberos</value>
    </property>
    <property>
        <name>hadoop.security.authorization</name>
        <value>true</value>
    </property>
</configuration>
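
To test the classloader theory above, a small probe like this can help (just a 
sketch I put together with public UGI methods; the "myconf" path matches the code 
further down):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.security.UserGroupInformation;

public class UgiProbe {
    public static void main(String[] args) {
        // Load only my own core-site.xml, ignoring classpath defaults.
        Configuration hc = new Configuration(false);
        hc.addResource("myconf/core-site.xml");
        UserGroupInformation.setConfiguration(hc);

        // Should print true (kerberos); false means UGI fell back to the
        // classpath defaults, i.e. simple authentication.
        System.out.println("security enabled: "
                + UserGroupInformation.isSecurityEnabled());

        // If the RPC layer loads UGI through a different classloader, the
        // static configuration set above is not the one it will see.
        System.out.println("UGI loaded by: "
                + UserGroupInformation.class.getClassLoader());
    }
}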

------------error------------------
2673 [main] DEBUG 
org.apache.hadoop.hdfs.protocol.datatransfer.sasl.DataTransferSaslUtil  - 
DataTransferProtocol using SaslPropertiesResolver, configured QOP 
dfs.data.transfer.protection = privacy, configured class 
dfs.data.transfer.saslproperties.resolver.class = class 
org.apache.hadoop.security.WhitelistBasedResolver
2696 [main] DEBUG org.apache.hadoop.service.AbstractService  - Service: 
org.apache.hadoop.yarn.client.api.impl.YarnClientImpl entered state INITED
2744 [main] DEBUG org.apache.hadoop.security.UserGroupInformation  - 
PrivilegedAction as:xxxxx@xxxxxxxCOM (auth:KERBEROS) 
from:org.apache.hadoop.yarn.client.RMProxy.getProxy(RMProxy.java:136)
2746 [main] DEBUG org.apache.hadoop.yarn.ipc.YarnRPC  - Creating YarnRPC for 
org.apache.hadoop.yarn.ipc.HadoopYarnProtoRPC
2746 [main] DEBUG org.apache.hadoop.yarn.ipc.HadoopYarnProtoRPC  - Creating a 
HadoopYarnProtoRpc proxy for protocol interface 
org.apache.hadoop.yarn.api.ApplicationClientProtocol
2801 [main] DEBUG org.apache.hadoop.ipc.Client  - getting client out of cache: 
org.apache.hadoop.ipc.Client@748fe51d
2981 [main] DEBUG org.apache.hadoop.service.AbstractService  - Service 
org.apache.hadoop.yarn.client.api.impl.YarnClientImpl is started
3004 [main] DEBUG org.apache.hadoop.ipc.Client  - The ping interval is 60000 ms.
3005 [main] DEBUG org.apache.hadoop.ipc.Client  - Connecting to 
yarn-rm-1/xxxxx:8032
3019 [IPC Client (2012095985) connection to yarn-rm-1/xxxxx:8032 from 
xxxxx@xxxxxx] DEBUG org.apache.hadoop.ipc.Client  - IPC Client (2012095985) 
connection to yarn-rm-1/xxxxx:8032 from xxxxx@xxxxxx: starting, having 
connections 1
3020 [IPC Parameter Sending Thread #0] DEBUG org.apache.hadoop.ipc.Client  - 
IPC Client (2012095985) connection to yarn-rm-1/xxxxx:8032 from xxxxx@xxxxxx 
sending #0
3025 [IPC Client (2012095985) connection to yarn-rm-1/xxxxx:8032 from 
xxxxx@xxxxxx] DEBUG org.apache.hadoop.ipc.Client  - IPC Client (2012095985) 
connection to yarn-rm-1/xxxxx:8032 from xxxxx@xxxxxx got value #-1
3026 [IPC Client (2012095985) connection to yarn-rm-1/xxxxx:8032 from 
xxxxx@xxxxxx] DEBUG org.apache.hadoop.ipc.Client  - closing ipc connection to 
yarn-rm-1/xxxxx:8032: SIMPLE authentication is not enabled.  Available:[TOKEN, 
KERBEROS]
org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.AccessControlException):
 SIMPLE authentication is not enabled.  Available:[TOKEN, KERBEROS]
        at 
org.apache.hadoop.ipc.Client$Connection.receiveRpcResponse(Client.java:1131)
        at org.apache.hadoop.ipc.Client$Connection.run(Client.java:979)
---------------code-------------------

      // Imports needed for this snippet:
      // import org.apache.hadoop.conf.Configuration;
      // import org.apache.hadoop.security.UserGroupInformation;
      // import org.apache.spark.SparkConf;
      // import org.apache.spark.sql.SparkSession;

      // Build a Hadoop Configuration purely from my own files, ignoring
      // anything on the classpath.
      Configuration hc = new Configuration(false);

      hc.addResource("myconf/yarn-site.xml");
      hc.addResource("myconf/core-site.xml");
      hc.addResource("myconf/hdfs-site.xml");
      hc.addResource("myconf/hive-site.xml");

      SparkConf sc = new SparkConf(true);
      // Copy the config into the Spark conf, as there is no xml on the
      // classpath except the *-default.xml files from the Hadoop jars.
      hc.forEach(entry -> {
          if (entry.getKey().startsWith("hive")) {
              sc.set(entry.getKey(), entry.getValue());
          } else {
              sc.set("spark.hadoop." + entry.getKey(), entry.getValue());
          }
      });

      // Principal and Keytab are the Kerberos principal name and the
      // keytab file path (defined elsewhere).
      UserGroupInformation.setConfiguration(hc);
      UserGroupInformation.loginUserFromKeytab(Principal, Keytab);

      System.out.println("####spark-conf######");
      System.out.println(sc.toDebugString());

      SparkSession sparkSession = SparkSession
            .builder()
            .master("yarn-client") // "yarn-client", "local"
            .config(sc)
            .appName(SparkEAZDebug.class.getName())
            .enableHiveSupport()
            .getOrCreate();
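
By the way, a quick sanity check right after loginUserFromKeytab (a sketch using 
only public UGI methods) shows whether the login itself picked up Kerberos, 
separate from what the RPC layer later sees:

      // Right after UserGroupInformation.loginUserFromKeytab(...):
      UserGroupInformation ugi = UserGroupInformation.getLoginUser();
      System.out.println("login user:  " + ugi.getUserName());
      System.out.println("auth method: " + ugi.getAuthenticationMethod()); // expect KERBEROS
      System.out.println("has kerberos credentials: " + ugi.hasKerberosCredentials());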

Thanks very much.
Keith
