What I found with CDH 5.4.1's Spark 1.3 is that the spark.executor.extraClassPath setting does not work; I had to use SPARK_CLASSPATH instead.
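The workaround amounts to exporting the HBase client configuration directory on the legacy classpath variable before submitting. A minimal sketch, assuming hbase-site.xml lives under /etc/hbase/conf (the path is a placeholder):

    export SPARK_CLASSPATH=/etc/hbase/conf
    spark-submit --class dhao.test.read.singleTable.TestHBaseRead --master yarn-cluster ...

Note that SPARK_CLASSPATH has been deprecated since Spark 1.0 in favor of spark.executor.extraClassPath, so this is only a fallback for releases where the latter misbehaves.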
On Thursday, May 21, 2015, Ted Yu <yuzhih...@gmail.com> wrote:

Are the worker nodes colocated with HBase region servers?

Were you running as the HBase superuser?

You may need to log in, using code similar to the following:

    if (isSecurityEnabled()) {
      SecurityUtil.login(conf, fileConfKey, principalConfKey, localhost);
    }

SecurityUtil is a Hadoop class.

Cheers
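Ted's snippet above is pseudocode; a fleshed-out Scala sketch of the same idea follows. The configuration key names are placeholders for whatever keys an application publishes its keytab path and principal under, not a known contract:

    import java.net.InetAddress
    import org.apache.hadoop.conf.Configuration
    import org.apache.hadoop.security.{SecurityUtil, UserGroupInformation}

    def loginIfSecure(conf: Configuration): Unit = {
      if (UserGroupInformation.isSecurityEnabled) {
        // SecurityUtil.login reads the keytab path and principal from the
        // named configuration keys and logs the current process in via UGI.
        SecurityUtil.login(conf,
          "myapp.keytab.file",          // placeholder key for the keytab path
          "myapp.kerberos.principal",   // placeholder key for the principal
          InetAddress.getLocalHost.getCanonicalHostName)
      }
    }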
On Thu, May 21, 2015 at 1:58 AM, donhoff_h <165612...@qq.com> wrote:

Hi,

Many thanks for the help. My Spark version is 1.3.0 too, and I run it on YARN. Following your advice I changed the configuration; now my program reads hbase-site.xml correctly and also authenticates with ZooKeeper successfully.

But I have met a new problem: my program still cannot pass HBase's authentication. Did you or anybody else ever meet this kind of situation? I used a keytab file to provide the principal, and since it passes ZooKeeper's authentication I am sure the keytab file is OK; it just cannot pass HBase's authentication. The exception is listed below. Could you or anybody else help me? Still many, many thanks!

****************************Exception***************************
15/05/21 16:03:18 INFO zookeeper.ZooKeeper: Initiating client connection, connectString=bgdt02.dev.hrb:2181,bgdt01.dev.hrb:2181,bgdt03.dev.hrb:2181 sessionTimeout=90000 watcher=hconnection-0x4e142a710x0, quorum=bgdt02.dev.hrb:2181,bgdt01.dev.hrb:2181,bgdt03.dev.hrb:2181, baseZNode=/hbase
15/05/21 16:03:18 INFO zookeeper.Login: successfully logged in.
15/05/21 16:03:18 INFO zookeeper.Login: TGT refresh thread started.
15/05/21 16:03:18 INFO client.ZooKeeperSaslClient: Client will use GSSAPI as SASL mechanism.
15/05/21 16:03:18 INFO zookeeper.ClientCnxn: Opening socket connection to server bgdt02.dev.hrb/130.1.9.98:2181. Will attempt to SASL-authenticate using Login Context section 'Client'
15/05/21 16:03:18 INFO zookeeper.ClientCnxn: Socket connection established to bgdt02.dev.hrb/130.1.9.98:2181, initiating session
15/05/21 16:03:18 INFO zookeeper.Login: TGT valid starting at: Thu May 21 16:03:18 CST 2015
15/05/21 16:03:18 INFO zookeeper.Login: TGT expires: Fri May 22 16:03:18 CST 2015
15/05/21 16:03:18 INFO zookeeper.Login: TGT refresh sleeping until: Fri May 22 11:43:32 CST 2015
15/05/21 16:03:18 INFO zookeeper.ClientCnxn: Session establishment complete on server bgdt02.dev.hrb/130.1.9.98:2181, sessionid = 0x24d46cb0ffd0020, negotiated timeout = 40000
15/05/21 16:03:18 WARN mapreduce.TableInputFormatBase: initializeTable called multiple times. Overwriting connection and table reference; TableInputFormatBase will not close these old references when done.
15/05/21 16:03:19 INFO util.RegionSizeCalculator: Calculating region sizes for table "ns_dev1:hd01".
15/05/21 16:03:19 WARN ipc.AbstractRpcClient: Exception encountered while connecting to the server : javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)]
15/05/21 16:03:19 ERROR ipc.AbstractRpcClient: SASL authentication failed. The most likely cause is missing or invalid credentials. Consider 'kinit'.
javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)]
    at com.sun.security.sasl.gsskerb.GssKrb5Client.evaluateChallenge(GssKrb5Client.java:212)
    at org.apache.hadoop.hbase.security.HBaseSaslRpcClient.saslConnect(HBaseSaslRpcClient.java:179)
    at org.apache.hadoop.hbase.ipc.RpcClientImpl$Connection.setupSaslConnection(RpcClientImpl.java:604)
    at org.apache.hadoop.hbase.ipc.RpcClientImpl$Connection.access$600(RpcClientImpl.java:153)
    at org.apache.hadoop.hbase.ipc.RpcClientImpl$Connection$2.run(RpcClientImpl.java:730)
    at org.apache.hadoop.hbase.ipc.RpcClientImpl$Connection$2.run(RpcClientImpl.java:727)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
    at org.apache.hadoop.hbase.ipc.RpcClientImpl$Connection.setupIOstreams(RpcClientImpl.java:727)
    at org.apache.hadoop.hbase.ipc.RpcClientImpl$Connection.writeRequest(RpcClientImpl.java:880)
    at org.apache.hadoop.hbase.ipc.RpcClientImpl$Connection.tracedWriteRequest(RpcClientImpl.java:849)
    at org.apache.hadoop.hbase.ipc.RpcClientImpl.call(RpcClientImpl.java:1173)
    at org.apache.hadoop.hbase.ipc.AbstractRpcClient.callBlockingMethod(AbstractRpcClient.java:216)
    at org.apache.hadoop.hbase.ipc.AbstractRpcClient$BlockingRpcChannelImplementation.callBlockingMethod(AbstractRpcClient.java:300)
    at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$BlockingStub.scan(ClientProtos.java:31751)
    at org.apache.hadoop.hbase.client.ScannerCallable.openScanner(ScannerCallable.java:332)
    at org.apache.hadoop.hbase.client.ScannerCallable.call(ScannerCallable.java:187)
    at org.apache.hadoop.hbase.client.ScannerCallable.call(ScannerCallable.java:62)
    at org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithRetries(RpcRetryingCaller.java:126)
    at org.apache.hadoop.hbase.client.ScannerCallableWithReplicas$RetryingRPC.call(ScannerCallableWithReplicas.java:294)
    at org.apache.hadoop.hbase.client.ScannerCallableWithReplicas$RetryingRPC.call(ScannerCallableWithReplicas.java:275)
    at java.util.concurrent.FutureTask.run(FutureTask.java:262)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)

***********************I also list my code below in case someone can give me some advice on it*************************
object TestHBaseRead {
  def main(args: Array[String]) {
    val conf = new SparkConf()
    val sc = new SparkContext(conf)
    val hbConf = HBaseConfiguration.create(sc.hadoopConfiguration)
    val tbName = if (args.length == 1) args(0) else "ns_dev1:hd01"
    hbConf.set(TableInputFormat.INPUT_TABLE, tbName)
    // I print the content of hbConf to check that it read the correct hbase-site.xml
    val it = hbConf.iterator()
    while (it.hasNext) {
      val e = it.next()
      println("Key=" + e.getKey + " Value=" + e.getValue)
    }

    val rdd = sc.newAPIHadoopRDD(hbConf, classOf[TableInputFormat], classOf[ImmutableBytesWritable], classOf[Result])
    rdd.foreach(x => {
      val key = x._1.toString
      val it = x._2.listCells().iterator()
      while (it.hasNext) {
        val c = it.next()
        val family = Bytes.toString(CellUtil.cloneFamily(c))
        val qualifier = Bytes.toString(CellUtil.cloneQualifier(c))
        val value = Bytes.toString(CellUtil.cloneValue(c))
        val tm = c.getTimestamp
        println("Key=" + key + " Family=" + family + " Qualifier=" + qualifier + " Value=" + value + " TimeStamp=" + tm)
      }
    })
    sc.stop()
  }
}

***************************I used the following command to run my program**********************
spark-submit --class dhao.test.read.singleTable.TestHBaseRead --master yarn-cluster \
  --driver-java-options "-Djava.security.auth.login.config=/home/spark/spark-hbase.jaas -Djava.security.krb5.conf=/etc/krb5.conf" \
  --conf spark.executor.extraJavaOptions="-Djava.security.auth.login.config=/home/spark/spark-hbase.jaas -Djava.security.krb5.conf=/etc/krb5.conf" \
  /home/spark/myApps/TestHBase.jar
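A plausible reading of that log: the JAAS file only configures the ZooKeeper SASL client (Login Context 'Client'), while HBase's RPC layer authenticates through Hadoop's UserGroupInformation, which inside a YARN container has neither a ticket cache nor a keytab login; that would explain why ZooKeeper succeeds while the scan fails. A hedged sketch of an explicit UGI login, assuming the keytab has been shipped to the node (the principal and paths are placeholders):

    import java.security.PrivilegedExceptionAction
    import org.apache.hadoop.security.UserGroupInformation

    // Log in from the keytab and run the HBase access under that identity.
    val ugi = UserGroupInformation.loginUserFromKeytabAndReturnUGI(
      "spark@EXAMPLE.REALM",     // placeholder principal
      "/path/to/spark.keytab")   // placeholder keytab path
    ugi.doAs(new PrivilegedExceptionAction[Unit] {
      override def run(): Unit = {
        // build hbConf and the newAPIHadoopRDD here, inside the
        // authenticated context
      }
    })

This only covers the JVM it runs in; in yarn-cluster mode the executors that actually open the scanners need credentials too, which is what the delegation-token approach sketched at the end of the thread addresses.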
------------------ Original Message ------------------
From: "Bill Q" <bill.q....@gmail.com>
Sent: Wednesday, May 20, 2015, 10:13 PM
To: "donhoff_h" <165612...@qq.com>
Cc: "yuzhihong" <yuzhih...@gmail.com>; "user" <user@spark.apache.org>
Subject: Re: How to use spark to access HBase with Security enabled

I have a similar problem: I can no longer pass the HBase configuration directory to Spark via spark.executor.extraClassPath=MY_HBASE_CONF_DIR in Spark 1.3. We used to run this in 1.2 without any problem.

On Tuesday, May 19, 2015, donhoff_h <165612...@qq.com> wrote:

Sorry, this reference does not help me. I have already set up the configuration in hbase-site.xml, but it seems some extra configuration must be set, or some extra APIs called, for my Spark program to pass authentication with HBase.

Does anybody know how to authenticate to a secured HBase in a Spark program that uses the "newAPIHadoopRDD" API to read from HBase?

Many thanks!

------------------ Original Message ------------------
From: "yuzhihong" <yuzhih...@gmail.com>
Sent: Tuesday, May 19, 2015, 9:54 PM
To: "donhoff_h" <165612...@qq.com>
Cc: "user" <user@spark.apache.org>
Subject: Re: How to use spark to access HBase with Security enabled

Please take a look at:
http://hbase.apache.org/book.html#_client_side_configuration_for_secure_operation

Cheers

On Tue, May 19, 2015 at 5:23 AM, donhoff_h <165612...@qq.com> wrote:

The principal is sp...@bgdt.dev.hrb. It is the user I use to run my Spark programs. I am sure I have run the kinit command to make it take effect, and I also used the HBase shell to verify that this user has the right to scan and put the tables in HBase.

I still have no idea how to solve this problem. Can anybody help me figure it out? Many thanks!

------------------ Original Message ------------------
From: "yuzhihong" <yuzhih...@gmail.com>
Sent: Tuesday, May 19, 2015, 7:55 PM
To: "donhoff_h" <165612...@qq.com>
Cc: "user" <user@spark.apache.org>
Subject: Re: How to use spark to access HBase with Security enabled

Which user did you run your program as?

Have you granted the proper permissions on the HBase side?
You should also check the master log to see if there is some clue.

Cheers

On May 19, 2015, at 2:41 AM, donhoff_h <165612...@qq.com> wrote:

Hi, experts.

I ran the "HBaseTest" program, an example from the Apache Spark source code, to learn how to use Spark to access HBase. But I met the following exception:

Exception in thread "main" org.apache.hadoop.hbase.client.RetriesExhaustedException: Failed after attempts=36, exceptions:
Tue May 19 16:59:11 CST 2015, null, java.net.SocketTimeoutException: callTimeout=60000, callDuration=68648: row 'spark_t01,,00000000000000' on table 'hbase:meta' at region=hbase:meta,,1.1588230740, hostname=bgdt01.dev.hrb,16020,1431412877700, seqNum=0

I also checked the RegionServer log on the host "bgdt01.dev.hrb" listed in the above exception, and found a few entries like the following one:

2015-05-19 16:59:11,143 DEBUG [RpcServer.reader=2,bindAddress=bgdt01.dev.hrb,port=16020] ipc.RpcServer: RpcServer.listener,port=16020: Caught exception while reading: Authentication is required

That entry does not point to my program explicitly, but the times are very close. Since my HBase version is 1.0.0 with security enabled, I suspect the exception was caused by Kerberos authentication, but I am not sure.

Does anybody know if my guess is right? And if it is, could anybody tell me how to set up Kerberos authentication in a Spark program? I don't know how to do it; I already checked the API docs but did not find anything useful. Many thanks!

By the way, my Spark version is 1.3.0. I also paste the code of "HBaseTest" below:

***************************Source Code******************************
object HBaseTest {
  def main(args: Array[String]) {
    val sparkConf = new SparkConf().setAppName("HBaseTest")
    val sc = new SparkContext(sparkConf)
    val conf = HBaseConfiguration.create()
    conf.set(TableInputFormat.INPUT_TABLE, args(0))

    // Initialize hBase table if necessary
    val admin = new HBaseAdmin(conf)
    if (!admin.isTableAvailable(args(0))) {
      val tableDesc = new HTableDescriptor(args(0))
      admin.createTable(tableDesc)
    }

    val hBaseRDD = sc.newAPIHadoopRDD(conf, classOf[TableInputFormat],
      classOf[org.apache.hadoop.hbase.io.ImmutableBytesWritable],
      classOf[org.apache.hadoop.hbase.client.Result])

    hBaseRDD.count()

    sc.stop()
  }
}

--
Many thanks.

Bill
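The common failure mode across this thread is that nothing hands the YARN containers any HBase credentials. A hedged sketch of the manual delegation-token workaround, run on the driver while it still holds a Kerberos login and assuming a SparkContext sc as in the examples above (TokenUtil's method signatures vary across HBase versions; obtainToken(Configuration) is the 0.98/1.0-era form):

    import org.apache.hadoop.hbase.HBaseConfiguration
    import org.apache.hadoop.hbase.security.token.TokenUtil
    import org.apache.hadoop.security.UserGroupInformation

    val hbConf = HBaseConfiguration.create(sc.hadoopConfiguration)
    // Ask HBase for a delegation token (requires a live Kerberos login)
    // and attach it to the current user's credentials so subsequent
    // HBase RPCs can authenticate without a TGT of their own.
    val token = TokenUtil.obtainToken(hbConf)
    UserGroupInformation.getCurrentUser.addToken(token)

Later Spark releases (1.4+) obtain an HBase delegation token automatically when submitting to YARN with HBase on the classpath, so upgrading may be the simplest fix.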