Since hbase-spark is in its own module, you can pull the whole hbase-spark
subtree into hbase 1.0 root dir and add the following to root pom.xml:
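
In a standard Maven multi-module build, that would be a <module> entry inside the existing <modules> element of the root pom.xml. A minimal sketch, assuming the copied directory keeps the name hbase-spark (the other module entries are left out here, not meant literally):

```xml
<!-- root pom.xml of the hbase 1.0 checkout -->
<modules>
  <!-- existing module entries stay unchanged -->
  <!-- assumption: the subtree was copied in as ./hbase-spark -->
  <module>hbase-spark</module>
</modules>
```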

Then you would be able to build the module yourself.

The hbase-spark module uses APIs that are compatible with hbase 1.0.


On Sun, Mar 13, 2016 at 11:39 AM, Benjamin Kim wrote:

> Hi Ted,
> I see that you’re working on the hbase-spark module for hbase. I recently
> packaged the SparkOnHBase project and gave it a test run. It works like a
> charm on CDH 5.4 and 5.5. All I had to do was
> add /opt/cloudera/parcels/CDH/jars/htrace-core-3.1.0-incubating.jar to the
> classpath.txt file in /etc/spark/conf. Then, I ran spark-shell with "--jars
> /path/to/spark-hbase-0.0.2-clabs.jar" as an argument and used the
> easy-to-use HBaseContext for HBase operations. Now, I want to use the
> latest in DataFrames. Since the new functionality is only in the
> hbase-spark module, I want to know how to get it and package it for CDH
> 5.5, which still uses HBase 1.0.0. Can you tell me what version of hbase
> master is still backwards compatible?
> By the way, we are using Spark 1.6 if it matters.
> Thanks,
> Ben
> On Feb 10, 2016, at 2:34 AM, Ted Yu wrote:
> Have you tried adding hbase client jars to spark.executor.extraClassPath?
> Cheers
> On Wed, Feb 10, 2016 at 12:17 AM, Prabhu Joseph wrote:
>> + Spark-Dev
>> For a Spark job on YARN that accesses an HBase table, I added all the HBase
>> client jars to spark.yarn.dist.files. When the NodeManager launches the
>> container (i.e. the executor), it localizes them and brings all the
>> hbase-client jars into the executor CWD, but the executor tasks still fail
>> with ClassNotFoundException for the HBase client classes. When I checked
>> the container launch, the classpath does not have $PWD/*, and hence all
>> the HBase client jars are ignored.
>> Is spark.yarn.dist.files not meant for adding jars to the executor classpath?
>> Thanks,
>> Prabhu Joseph
>> On Tue, Feb 9, 2016 at 1:42 PM, Prabhu Joseph wrote:
>>> Hi All,
>>> When I do a count on an HBase table from the Spark shell, running in
>>> yarn-client mode, the job fails at count().
>>> MASTER=yarn-client ./spark-shell
>>> import org.apache.hadoop.hbase.{HBaseConfiguration, HTableDescriptor,
>>> TableName}
>>> import org.apache.hadoop.hbase.client.HBaseAdmin
>>> import org.apache.hadoop.hbase.mapreduce.TableInputFormat
>>> val conf = HBaseConfiguration.create()
>>> conf.set(TableInputFormat.INPUT_TABLE,"spark")
>>> val hBaseRDD = sc.newAPIHadoopRDD(conf,
>>> classOf[TableInputFormat],classOf[org.apache.hadoop.hbase.io.ImmutableBytesWritable],classOf[org.apache.hadoop.hbase.client.Result])
>>> hBaseRDD.count()
>>> The tasks throw the exception below; the actual exception is swallowed
>>> because of a JDK bug, JDK-7172206. After installing the HBase client on all
>>> NodeManager machines, the Spark job ran fine, so I confirmed that the issue
>>> is with the executor classpath.
>>> But I am looking for some other way of including the HBase jars in the Spark
>>> executor classpath instead of installing the HBase client on all NM machines.
>>> I tried adding all the HBase jars to spark.yarn.dist.files; the NM logs show
>>> that it localized all the HBase jars, but the job still fails. I tried
>>> spark.executor.extraClasspath, and the job still fails.
>>> Is there any way we can access HBase from the executor without installing
>>> hbase-client on all machines?
>>> 16/02/09 02:34:57 WARN TaskSetManager: Lost task 0.0 in stage 0.0 (TID
>>> 0, prabhuFS1): *java.lang.IllegalStateException: unread block data*
>>>         at
>>>         at
>>>         at
>>>         at
>>>         at
>>>         at
>>>         at
>>>         at org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:68)
>>>         at org.apache.spark.serializer.JavaSerializerInstance.deserialize(JavaSerializer.scala:94)
>>>         at org.apache.spark.executor.Executor$
>>>         at java.util.concurrent.ThreadPoolExecutor.runWorker(
>>>         at java.util.concurrent.ThreadPoolExecutor$
>>>         at
>>> Thanks,
>>> Prabhu Joseph
