Btw, there are some examples in the Spark GitHub repo that you may find helpful. Here's one <https://github.com/apache/spark/blob/master/examples/src/main/scala/org/apache/spark/examples/HBaseTest.scala> related to HBase.
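In case it helps, here is a minimal sketch of how the spark-submit line from the quoted message below could be assembled in plain Java rather than typed by hand. The class `HBaseSubmitArgs` and its methods are hypothetical names made up for illustration; they are not part of Spark or HBase. The jar paths are the ones from the quoted message and will probably differ on your install, and the HBase classpath is assumed to have been captured already (for example from the output of `hbase classpath`).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of Spark or HBase: assembles the
// spark-submit argument list described in the quoted message.
final class HBaseSubmitArgs {

    // spark-submit expects the --jars value as one comma-separated string.
    static String jarsArgument(List<String> jarPaths) {
        return String.join(",", jarPaths);
    }

    // hbaseClasspath is assumed to be the already-captured output of
    // `hbase classpath`; this method does not run any external command.
    static List<String> buildCommand(String hbaseClasspath,
                                     List<String> jarPaths,
                                     String mainClass,
                                     String master,
                                     String appJar) {
        List<String> cmd = new ArrayList<>();
        cmd.add("spark-submit");
        cmd.add("--driver-class-path");
        cmd.add(hbaseClasspath);
        cmd.add("--jars");
        cmd.add(jarsArgument(jarPaths));
        cmd.add("--class");
        cmd.add(mainClass);
        cmd.add("--master");
        cmd.add(master);
        cmd.add(appJar);
        return cmd;
    }

    public static void main(String[] args) {
        // Jar locations taken from the quoted message; adjust for your install.
        List<String> jars = Arrays.asList(
                "/usr/lib/hbase/hbase-server.jar",
                "/usr/lib/hbase/hbase-client.jar",
                "/usr/lib/hbase/hbase-common.jar",
                "/usr/lib/hbase/hbase-protocol.jar",
                "/usr/lib/hbase/lib/protobuf-java-2.5.0.jar",
                "/usr/lib/hbase/lib/htrace-core.jar");

        // Placeholder classpath; pass the real one as the first argument.
        String hbaseClasspath = args.length > 0 ? args[0] : "/etc/hbase/conf";

        List<String> cmd = buildCommand(
                hbaseClasspath, jars, "YourClassName", "local", "App.jar");
        System.out.println(String.join(" ", cmd));
    }
}
```

Keeping the arguments in a list rather than one long string means the result could be handed to `ProcessBuilder` directly, with no shell quoting to worry about.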

On Tue, Sep 16, 2014 at 1:22 PM, <abraham.ja...@thomsonreuters.com> wrote:

> Hi,
>
> I had a similar situation in which I needed to read data from HBase and
> work with the data inside of a Spark context. After much googling, I
> finally got mine to work. There are a bunch of steps that you need to do
> to get this working.
>
> The problem is that the Spark context does not know anything about HBase,
> so you have to provide all the information about the HBase classes to both
> the driver code and the executor code:
>
>     SparkConf sconf = new SparkConf().setAppName("App").setMaster("local");
>
>     // <===== You will need to add this to tell the executor about the
>     // classpath for HBase. Set it before the context is created, and
>     // substitute the actual output of `hbase classpath`: Java does not
>     // expand $(...) inside a string literal.
>     sconf.set("spark.executor.extraClassPath", "$(hbase classpath)");
>
>     JavaSparkContext sc = new JavaSparkContext(sconf);
>
>     Configuration conf = HBaseConfiguration.create();
>     conf.set(TableInputFormat.INPUT_TABLE, "Article");
>
>     JavaPairRDD<ImmutableBytesWritable, Result> hBaseRDD =
>         sc.newAPIHadoopRDD(conf, TableInputFormat.class,
>             org.apache.hadoop.hbase.io.ImmutableBytesWritable.class,
>             org.apache.hadoop.hbase.client.Result.class);
>
> Then, when you submit the Spark job:
>
>     spark-submit --driver-class-path $(hbase classpath) \
>       --jars /usr/lib/hbase/hbase-server.jar,/usr/lib/hbase/hbase-client.jar,/usr/lib/hbase/hbase-common.jar,/usr/lib/hbase/hbase-protocol.jar,/usr/lib/hbase/lib/protobuf-java-2.5.0.jar,/usr/lib/hbase/lib/htrace-core.jar \
>       --class YourClassName --master local App.jar
>
> Try this and see if it works for you.
>
> From: Y. Dong [mailto:tq00...@gmail.com]
> Sent: Tuesday, September 16, 2014 8:18 AM
> To: user@spark.apache.org
> Subject: HBase and non-existent TableInputFormat
>
> Hello,
>
> I'm currently using spark-core 1.1 and hbase 0.98.5 and I simply want to
> read from HBase. The Java code is attached. However, the problem is that
> TableInputFormat does not even exist in the hbase-client API. Is there any
> other way I can read from HBase? Thanks
>
>     SparkConf sconf = new SparkConf().setAppName("App").setMaster("local");
>     JavaSparkContext sc = new JavaSparkContext(sconf);
>
>     Configuration conf = HBaseConfiguration.create();
>     conf.set(TableInputFormat.INPUT_TABLE, "Article");
>
>     JavaPairRDD<ImmutableBytesWritable, Result> hBaseRDD =
>         sc.newAPIHadoopRDD(conf, TableInputFormat.class,
>             org.apache.hadoop.hbase.io.ImmutableBytesWritable.class,
>             org.apache.hadoop.hbase.client.Result.class);