What I found with CDH 5.4.1's Spark 1.3 is that the spark.executor.extraClassPath setting does not work; I had to use SPARK_CLASSPATH instead.
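The workaround amounts to exporting the HBase client configuration directory on the legacy classpath variable before submitting. A minimal sketch, assuming hbase-site.xml lives under /etc/hbase/conf (the path is a placeholder):

    export SPARK_CLASSPATH=/etc/hbase/conf
    spark-submit --class dhao.test.read.singleTable.TestHBaseRead --master yarn-cluster ...

Note that SPARK_CLASSPATH has been deprecated since Spark 1.0 in favor of spark.executor.extraClassPath, so this is only a fallback for releases where the latter misbehaves.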
On Thursday, May 21, 2015, Ted Yu <yuzhih...@gmail.com> wrote:

Are the worker nodes colocated with HBase region servers?

Were you running as the HBase superuser?

You may need to log in, using code similar to the following:

    if (isSecurityEnabled()) {
      SecurityUtil.login(conf, fileConfKey, principalConfKey, localhost);
    }

SecurityUtil is a Hadoop class.

Cheers
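Ted's snippet above is pseudocode; a fleshed-out Scala sketch of the same idea follows. The configuration key names are placeholders for whatever keys an application publishes its keytab path and principal under, not a known contract:

    import java.net.InetAddress
    import org.apache.hadoop.conf.Configuration
    import org.apache.hadoop.security.{SecurityUtil, UserGroupInformation}

    def loginIfSecure(conf: Configuration): Unit = {
      if (UserGroupInformation.isSecurityEnabled) {
        // SecurityUtil.login reads the keytab path and principal from the
        // named configuration keys and logs the current process in via UGI.
        SecurityUtil.login(conf,
          "myapp.keytab.file",          // placeholder key for the keytab path
          "myapp.kerberos.principal",   // placeholder key for the principal
          InetAddress.getLocalHost.getCanonicalHostName)
      }
    }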
On Thu, May 21, 2015 at 1:58 AM, donhoff_h <165612...@qq.com> wrote:

Hi,

Many thanks for the help. My Spark version is 1.3.0 too, and I run it on YARN. Following your advice I changed the configuration; now my program reads hbase-site.xml correctly and also authenticates with ZooKeeper successfully.

But I have met a new problem: my program still cannot pass HBase's authentication. Did you or anybody else ever meet this kind of situation? I used a keytab file to provide the principal, and since it passes ZooKeeper's authentication I am sure the keytab file is OK; it just cannot pass HBase's authentication. The exception is listed below. Could you or anybody else help me? Still many, many thanks!

****************************Exception***************************
15/05/21 16:03:18 INFO zookeeper.ZooKeeper: Initiating client connection, connectString=bgdt02.dev.hrb:2181,bgdt01.dev.hrb:2181,bgdt03.dev.hrb:2181 sessionTimeout=90000 watcher=hconnection-0x4e142a710x0, quorum=bgdt02.dev.hrb:2181,bgdt01.dev.hrb:2181,bgdt03.dev.hrb:2181, baseZNode=/hbase
15/05/21 16:03:18 INFO zookeeper.Login: successfully logged in.
15/05/21 16:03:18 INFO zookeeper.Login: TGT refresh thread started.
15/05/21 16:03:18 INFO client.ZooKeeperSaslClient: Client will use GSSAPI as SASL mechanism.
15/05/21 16:03:18 INFO zookeeper.ClientCnxn: Opening socket connection to server bgdt02.dev.hrb/130.1.9.98:2181. Will attempt to SASL-authenticate using Login Context section 'Client'
15/05/21 16:03:18 INFO zookeeper.ClientCnxn: Socket connection established to bgdt02.dev.hrb/130.1.9.98:2181, initiating session
15/05/21 16:03:18 INFO zookeeper.Login: TGT valid starting at: Thu May 21 16:03:18 CST 2015
15/05/21 16:03:18 INFO zookeeper.Login: TGT expires: Fri May 22 16:03:18 CST 2015
15/05/21 16:03:18 INFO zookeeper.Login: TGT refresh sleeping until: Fri May 22 11:43:32 CST 2015
15/05/21 16:03:18 INFO zookeeper.ClientCnxn: Session establishment complete on server bgdt02.dev.hrb/130.1.9.98:2181, sessionid = 0x24d46cb0ffd0020, negotiated timeout = 40000
15/05/21 16:03:18 WARN mapreduce.TableInputFormatBase: initializeTable called multiple times. Overwriting connection and table reference; TableInputFormatBase will not close these old references when done.
15/05/21 16:03:19 INFO util.RegionSizeCalculator: Calculating region sizes for table "ns_dev1:hd01".
15/05/21 16:03:19 WARN ipc.AbstractRpcClient: Exception encountered while connecting to the server : javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)]
15/05/21 16:03:19 ERROR ipc.AbstractRpcClient: SASL authentication failed. The most likely cause is missing or invalid credentials. Consider 'kinit'.
javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)]
    at com.sun.security.sasl.gsskerb.GssKrb5Client.evaluateChallenge(GssKrb5Client.java:212)
    at org.apache.hadoop.hbase.security.HBaseSaslRpcClient.saslConnect(HBaseSaslRpcClient.java:179)
    at org.apache.hadoop.hbase.ipc.RpcClientImpl$Connection.setupSaslConnection(RpcClientImpl.java:604)
    at org.apache.hadoop.hbase.ipc.RpcClientImpl$Connection.access$600(RpcClientImpl.java:153)
    at org.apache.hadoop.hbase.ipc.RpcClientImpl$Connection$2.run(RpcClientImpl.java:730)
    at org.apache.hadoop.hbase.ipc.RpcClientImpl$Connection$2.run(RpcClientImpl.java:727)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
    at org.apache.hadoop.hbase.ipc.RpcClientImpl$Connection.setupIOstreams(RpcClientImpl.java:727)
    at org.apache.hadoop.hbase.ipc.RpcClientImpl$Connection.writeRequest(RpcClientImpl.java:880)
    at org.apache.hadoop.hbase.ipc.RpcClientImpl$Connection.tracedWriteRequest(RpcClientImpl.java:849)
    at org.apache.hadoop.hbase.ipc.RpcClientImpl.call(RpcClientImpl.java:1173)
    at org.apache.hadoop.hbase.ipc.AbstractRpcClient.callBlockingMethod(AbstractRpcClient.java:216)
    at org.apache.hadoop.hbase.ipc.AbstractRpcClient$BlockingRpcChannelImplementation.callBlockingMethod(AbstractRpcClient.java:300)
    at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$BlockingStub.scan(ClientProtos.java:31751)
    at org.apache.hadoop.hbase.client.ScannerCallable.openScanner(ScannerCallable.java:332)
    at org.apache.hadoop.hbase.client.ScannerCallable.call(ScannerCallable.java:187)
    at org.apache.hadoop.hbase.client.ScannerCallable.call(ScannerCallable.java:62)
    at org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithRetries(RpcRetryingCaller.java:126)
    at org.apache.hadoop.hbase.client.ScannerCallableWithReplicas$RetryingRPC.call(ScannerCallableWithReplicas.java:294)
    at org.apache.hadoop.hbase.client.ScannerCallableWithReplicas$RetryingRPC.call(ScannerCallableWithReplicas.java:275)
    at java.util.concurrent.FutureTask.run(FutureTask.java:262)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)

***********************I also list my code below in case someone can give me some advice on it*************************
object TestHBaseRead {
  def main(args: Array[String]) {
    val conf = new SparkConf()
    val sc = new SparkContext(conf)
    val hbConf = HBaseConfiguration.create(sc.hadoopConfiguration)
    val tbName = if (args.length == 1) args(0) else "ns_dev1:hd01"
    hbConf.set(TableInputFormat.INPUT_TABLE, tbName)
    // I print the content of hbConf to check that it read the correct hbase-site.xml
    val it = hbConf.iterator()
    while (it.hasNext) {
      val e = it.next()
      println("Key=" + e.getKey + " Value=" + e.getValue)
    }

    val rdd = sc.newAPIHadoopRDD(hbConf, classOf[TableInputFormat], classOf[ImmutableBytesWritable], classOf[Result])
    rdd.foreach(x => {
      val key = x._1.toString
      val it = x._2.listCells().iterator()
      while (it.hasNext) {
        val c = it.next()
        val family = Bytes.toString(CellUtil.cloneFamily(c))
        val qualifier = Bytes.toString(CellUtil.cloneQualifier(c))
        val value = Bytes.toString(CellUtil.cloneValue(c))
        val tm = c.getTimestamp
        println("Key=" + key + " Family=" + family + " Qualifier=" + qualifier + " Value=" + value + " TimeStamp=" + tm)
      }
    })
    sc.stop()
  }
}

***************************I used the following command to run my program**********************
spark-submit --class dhao.test.read.singleTable.TestHBaseRead --master yarn-cluster \
  --driver-java-options "-Djava.security.auth.login.config=/home/spark/spark-hbase.jaas -Djava.security.krb5.conf=/etc/krb5.conf" \
  --conf spark.executor.extraJavaOptions="-Djava.security.auth.login.config=/home/spark/spark-hbase.jaas -Djava.security.krb5.conf=/etc/krb5.conf" \
  /home/spark/myApps/TestHBase.jar
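A plausible reading of that log: the JAAS file only configures the ZooKeeper SASL client (Login Context 'Client'), while HBase's RPC layer authenticates through Hadoop's UserGroupInformation, which inside a YARN container has neither a ticket cache nor a keytab login; that would explain why ZooKeeper succeeds while the scan fails. A hedged sketch of an explicit UGI login, assuming the keytab has been shipped to the node (the principal and paths are placeholders):

    import java.security.PrivilegedExceptionAction
    import org.apache.hadoop.security.UserGroupInformation

    // Log in from the keytab and run the HBase access under that identity.
    val ugi = UserGroupInformation.loginUserFromKeytabAndReturnUGI(
      "spark@EXAMPLE.REALM",     // placeholder principal
      "/path/to/spark.keytab")   // placeholder keytab path
    ugi.doAs(new PrivilegedExceptionAction[Unit] {
      override def run(): Unit = {
        // build hbConf and the newAPIHadoopRDD here, inside the
        // authenticated context
      }
    })

This only covers the JVM it runs in; in yarn-cluster mode the executors that actually open the scanners need credentials too, which is what the delegation-token approach sketched at the end of the thread addresses.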
------------------ Original Message ------------------
From: "Bill Q" <bill.q....@gmail.com>
Sent: Wednesday, May 20, 2015, 10:13 PM
To: "donhoff_h" <165612...@qq.com>
Cc: "yuzhihong" <yuzhih...@gmail.com>; "user" <user@spark.apache.org>
Subject: Re: How to use spark to access HBase with Security enabled

I have a similar problem: I can no longer pass the HBase configuration directory to Spark via spark.executor.extraClassPath=MY_HBASE_CONF_DIR in Spark 1.3. We used to run this in 1.2 without any problem.

On Tuesday, May 19, 2015, donhoff_h <165612...@qq.com> wrote:

Sorry, this reference does not help me. I have already set up the configuration in hbase-site.xml, but it seems some extra configuration must be set, or some extra APIs called, for my Spark program to pass authentication with HBase.

Does anybody know how to authenticate to a secured HBase in a Spark program that uses the "newAPIHadoopRDD" API to read from HBase?

Many thanks!

------------------ Original Message ------------------
From: "yuzhihong" <yuzhih...@gmail.com>
Sent: Tuesday, May 19, 2015, 9:54 PM
To: "donhoff_h" <165612...@qq.com>
Cc: "user" <user@spark.apache.org>
Subject: Re: How to use spark to access HBase with Security enabled

Please take a look at:
http://hbase.apache.org/book.html#_client_side_configuration_for_secure_operation

Cheers

On Tue, May 19, 2015 at 5:23 AM, donhoff_h <165612...@qq.com> wrote:

The principal is sp...@bgdt.dev.hrb. It is the user I use to run my Spark programs. I am sure I have run the kinit command to make it take effect, and I also used the HBase shell to verify that this user has the right to scan and put the tables in HBase.

I still have no idea how to solve this problem. Can anybody help me figure it out? Many thanks!

------------------ Original Message ------------------
From: "yuzhihong" <yuzhih...@gmail.com>
Sent: Tuesday, May 19, 2015, 7:55 PM
To: "donhoff_h" <165612...@qq.com>
Cc: "user" <user@spark.apache.org>
Subject: Re: How to use spark to access HBase with Security enabled

Which user did you run your program as?

Have you granted the proper permissions on the HBase side?
You should also check the master log to see if there is some clue.

Cheers

On May 19, 2015, at 2:41 AM, donhoff_h <165612...@qq.com> wrote:

Hi, experts.

I ran the "HBaseTest" program, an example from the Apache Spark source code, to learn how to use Spark to access HBase. But I met the following exception:

Exception in thread "main" org.apache.hadoop.hbase.client.RetriesExhaustedException: Failed after attempts=36, exceptions:
Tue May 19 16:59:11 CST 2015, null, java.net.SocketTimeoutException: callTimeout=60000, callDuration=68648: row 'spark_t01,,00000000000000' on table 'hbase:meta' at region=hbase:meta,,1.1588230740, hostname=bgdt01.dev.hrb,16020,1431412877700, seqNum=0

I also checked the RegionServer log on the host "bgdt01.dev.hrb" listed in the above exception, and found a few entries like the following one:

2015-05-19 16:59:11,143 DEBUG [RpcServer.reader=2,bindAddress=bgdt01.dev.hrb,port=16020] ipc.RpcServer: RpcServer.listener,port=16020: Caught exception while reading: Authentication is required

That entry does not point to my program explicitly, but the times are very close. Since my HBase version is 1.0.0 with security enabled, I suspect the exception was caused by Kerberos authentication, but I am not sure.

Does anybody know if my guess is right? And if it is, could anybody tell me how to set up Kerberos authentication in a Spark program? I don't know how to do it; I already checked the API docs but did not find anything useful. Many thanks!

By the way, my Spark version is 1.3.0. I also paste the code of "HBaseTest" below:

***************************Source Code******************************
object HBaseTest {
  def main(args: Array[String]) {
    val sparkConf = new SparkConf().setAppName("HBaseTest")
    val sc = new SparkContext(sparkConf)
    val conf = HBaseConfiguration.create()
    conf.set(TableInputFormat.INPUT_TABLE, args(0))

    // Initialize hBase table if necessary
    val admin = new HBaseAdmin(conf)
    if (!admin.isTableAvailable(args(0))) {
      val tableDesc = new HTableDescriptor(args(0))
      admin.createTable(tableDesc)
    }

    val hBaseRDD = sc.newAPIHadoopRDD(conf, classOf[TableInputFormat],
      classOf[org.apache.hadoop.hbase.io.ImmutableBytesWritable],
      classOf[org.apache.hadoop.hbase.client.Result])

    hBaseRDD.count()

    sc.stop()
  }
}

--
Many thanks.

Bill
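The common failure mode across this thread is that nothing hands the YARN containers any HBase credentials. A hedged sketch of the manual delegation-token workaround, run on the driver while it still holds a Kerberos login and assuming a SparkContext sc as in the examples above (TokenUtil's method signatures vary across HBase versions; obtainToken(Configuration) is the 0.98/1.0-era form):

    import org.apache.hadoop.hbase.HBaseConfiguration
    import org.apache.hadoop.hbase.security.token.TokenUtil
    import org.apache.hadoop.security.UserGroupInformation

    val hbConf = HBaseConfiguration.create(sc.hadoopConfiguration)
    // Ask HBase for a delegation token (requires a live Kerberos login)
    // and attach it to the current user's credentials so subsequent
    // HBase RPCs can authenticate without a TGT of their own.
    val token = TokenUtil.obtainToken(hbConf)
    UserGroupInformation.getCurrentUser.addToken(token)

Later Spark releases (1.4+) obtain an HBase delegation token automatically when submitting to YARN with HBase on the classpath, so upgrading may be the simplest fix.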