And I'm using the Apache distribution of Spark, not Cloudera.

On Wed, Aug 10, 2016 at 12:06 PM, Aneela Saleem <[email protected]> wrote:
> Thanks Nkechi,
>
> I added this dependency as an external jar; when I compile the code, unfortunately I get the following error:
>
> error: object cloudera is not a member of package com
> [ERROR] import com.cloudera.spark.hbase.HBaseContext
>
> On Tue, Aug 9, 2016 at 7:51 PM, Nkechi Achara <[email protected]> wrote:
>
>> hi,
>>
>> Due to the fact we are not on HBase 2.0, we are using SparkOnHBase.
>>
>> Dependency:
>>
>> <dependency>
>>   <groupId>com.cloudera</groupId>
>>   <artifactId>spark-hbase</artifactId>
>>   <version>0.0.2-clabs</version>
>> </dependency>
>>
>> It is quite a small snippet of code, for a general scan using a start and stop time as the scan time range:
>>
>> val conf = new SparkConf().
>>   set("spark.shuffle.consolidateFiles", "true").
>>   set("spark.kryo.registrationRequired", "false").
>>   set("spark.serializer", "org.apache.spark.serializer.KryoSerializer").
>>   set("spark.kryoserializer.buffer", "30m").
>>   set("spark.shuffle.spill", "true").
>>   set("spark.shuffle.memoryFraction", "0.4")
>>
>> val sc = new SparkContext(conf)
>>
>> val scan = new Scan()
>> scan.addColumn(columnName, "column1")
>> scan.setTimeRange(scanRowStartTs, scanRowStopTs)
>> hc.hbaseRDD(inputTableName, scan, filter)
>>
>> To run, just use the following:
>>
>> spark-submit --class ClassName --master yarn-client --driver-memory 2000M --executor-memory 5G --keytab <location of keytab> --principal <location of principal>
>>
>> That should work in a general way. Obviously you can utilise other scan / put / get etc. methods.
>>
>> Thanks,
>>
>> Nkechi
>>
>> On 9 August 2016 at 15:20, Aneela Saleem <[email protected]> wrote:
>>
>>> Thanks Nkechi,
>>>
>>> Can you please direct me to some code snippet with the hbase on spark module? I've been trying that for the last few days but have not found a workaround.
>>>
>>> On Tue, Aug 9, 2016 at 6:13 PM, Nkechi Achara <[email protected]> wrote:
>>>
>>>> Hey,
>>>>
>>>> Have you tried the hbase on spark module, or the spark-hbase module, to connect? The principal and keytab options should work out of the box for kerberized access. I can attempt your code if you don't have the ability to use those modules.
>>>>
>>>> Thanks
>>>> K
>>>>
>>>> On 9 Aug 2016 2:25 p.m., "Aneela Saleem" <[email protected]> wrote:
>>>>
>>>>> Hi all,
>>>>>
>>>>> I'm trying to connect to HBase with security enabled using a Spark job. I have kinit'd from the command line. When I run the following job, i.e.,
>>>>>
>>>>> /usr/local/spark-2/bin/spark-submit --keytab /etc/hadoop/conf/spark.keytab --principal spark/hadoop-master@platalyticsrealm --class com.platalytics.example.spark.App --master yarn --driver-class-path /root/hbase-1.2.2/conf /home/vm6/project-1-jar-with-dependencies.jar
>>>>>
>>>>> I get the error:
>>>>>
>>>>> 2016-08-07 20:43:57,617 WARN [hconnection-0x24b5fa45-metaLookup-shared--pool2-t1] ipc.RpcClientImpl: Exception encountered while connecting to the server : javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)]
>>>>> 2016-08-07 20:43:57,619 ERROR [hconnection-0x24b5fa45-metaLookup-shared--pool2-t1] ipc.RpcClientImpl: SASL authentication failed. The most likely cause is missing or invalid credentials. Consider 'kinit'.
>>>>> javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)]
>>>>>   at com.sun.security.sasl.gsskerb.GssKrb5Client.evaluateChallenge(GssKrb5Client.java:212)
>>>>>   at org.apache.hadoop.hbase.security.HBaseSaslRpcClient.saslConnect(HBaseSaslRpcClient.java:179)
>>>>>   at org.apache.hadoop.hbase.ipc.RpcClientImpl$Connection.setupSaslConnection(RpcClientImpl.java:617)
>>>>>   at org.apache.hadoop.hbase.ipc.RpcClientImpl$Connection.access$700(RpcClientImpl.java:162)
>>>>>   at org.apache.hadoop.hbase.ipc.RpcClientImpl$Connection$2.run(RpcClientImpl.java:743)
>>>>>
>>>>> Following is my code:
>>>>>
>>>>> System.setProperty("java.security.krb5.conf", "/etc/krb5.conf");
>>>>> System.setProperty("java.security.auth.login.config", "/etc/hbase/conf/zk-jaas.conf");
>>>>>
>>>>> val hconf = HBaseConfiguration.create()
>>>>> val tableName = "emp"
>>>>> hconf.set("hbase.zookeeper.quorum", "hadoop-master")
>>>>> hconf.set(TableInputFormat.INPUT_TABLE, tableName)
>>>>> hconf.set("hbase.zookeeper.property.clientPort", "2181")
>>>>> hconf.set("hadoop.security.authentication", "kerberos")
>>>>> hconf.set("hbase.security.authentication", "kerberos")
>>>>> hconf.addResource(new Path("/etc/hbase/conf/core-site.xml"))
>>>>> hconf.addResource(new Path("/etc/hbase/conf/hbase-site.xml"))
>>>>> UserGroupInformation.setConfiguration(hconf)
>>>>> val keyTab = "/etc/hadoop/conf/spark.keytab"
>>>>> val ugi = UserGroupInformation.loginUserFromKeytabAndReturnUGI("spark/hadoop-master@platalyticsrealm", keyTab)
>>>>> UserGroupInformation.setLoginUser(ugi)
>>>>> ugi.doAs(new PrivilegedExceptionAction[Void]() {
>>>>>   override def run(): Void = {
>>>>>     val conf = new SparkConf
>>>>>     val sc = new SparkContext(conf)
>>>>>     sc.addFile(keyTab)
>>>>>     var hBaseRDD = sc.newAPIHadoopRDD(hconf, classOf[TableInputFormat],
>>>>>       classOf[org.apache.hadoop.hbase.io.ImmutableBytesWritable],
>>>>>       classOf[org.apache.hadoop.hbase.client.Result])
>>>>>     println("Number of Records found : " + hBaseRDD.count())
>>>>>     hBaseRDD.foreach(x => {
>>>>>       println(new String(x._2.getRow()))
>>>>>     })
>>>>>     sc.stop()
>>>>>     return null
>>>>>   }
>>>>> })
>>>>>
>>>>> Please have a look and help me find the issue.
>>>>>
>>>>> Thanks
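
For context, the `hc` in the snippet above is a SparkOnHBase HBaseContext, which is what the failing com.cloudera.spark.hbase.HBaseContext import would provide once the 0.0.2-clabs dependency resolves. Below is a minimal sketch of how it is typically constructed and used for a time-range scan; the column family/qualifier and time range are placeholders, the table name "emp" is taken from the thread, and the exact HBaseContext constructor and hbaseRDD signatures are assumptions based on the SparkOnHBase (clabs) examples, so verify them against that release.

import org.apache.hadoop.hbase.HBaseConfiguration
import org.apache.hadoop.hbase.client.Scan
import org.apache.hadoop.hbase.util.Bytes
import org.apache.spark.{SparkConf, SparkContext}
import com.cloudera.spark.hbase.HBaseContext

object TimeRangeScanSketch {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("TimeRangeScanSketch"))

    // Picks up hbase-site.xml from the classpath supplied via --driver-class-path.
    val hbaseConf = HBaseConfiguration.create()

    // HBaseContext ships the HBase configuration to the executors; with
    // --keytab/--principal on spark-submit, YARN handles the Kerberos login.
    // NOTE: constructor/method shape assumed from the SparkOnHBase (clabs) examples.
    val hc = new HBaseContext(sc, hbaseConf)

    // Placeholder column family/qualifier and time range (epoch millis).
    val scan = new Scan()
    scan.addColumn(Bytes.toBytes("cf"), Bytes.toBytes("column1"))
    scan.setTimeRange(0L, System.currentTimeMillis())
    scan.setCaching(100)

    // Two-argument hbaseRDD variant; "emp" is the table name used earlier in the thread.
    val rdd = hc.hbaseRDD("emp", scan)
    println("Rows scanned: " + rdd.count())

    sc.stop()
  }
}

This would be submitted the same way as the spark-submit example quoted above, with --keytab and --principal supplying the kerberized access.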
