On Wed, Mar 16, 2011 at 12:51 PM, Abhijit Sharma
<abhijit.sha...@gmail.com> wrote:
> Hi,
> I am trying to connect the Hive shell running on my laptop to a remote
> Hadoop/HBase cluster and test out the HBase/Hive integration. I managed to
> connect and create the table in HBase from the remote Hive shell. I am also
> passing the auxpath parameter to the shell (specifying the Hive/HBase
> integration jars). In addition, I have copied these jars to
> HDFS as well (I am using the user name hadoop, so the jars are stored in
> HDFS under /user/hadoop).
> However, when I fire a query on the HBase table - select * from h1 where
> key=12; - the map/reduce job launches but the map task fails with the
> following error:
> ----
>
> java.io.IOException: Cannot create an instance of InputSplit class =
> org.apache.hadoop.hive.hbase.HBaseSplit:org.apache.hadoop.hive.hbase.HBaseSplit
>       at
> org.apache.hadoop.hive.ql.io.HiveInputFormat$HiveInputSplit.readFields(HiveInputFormat.java:143)
>       at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:333)
>       at org.apache.hadoop.mapred.MapTask.run(MapTask.java:307)
>       at org.apache.hadoop.mapred.Child.main(Child.java:170)
>
> ----
> This basically indicates that the Mapper task is unable to locate the
> Hive/HBase storage handler that it requires when running. This happens even
> though this has been specified in the auxpath and uploaded to HDFS.
> Any ideas/pointers/debug options on what I might be doing wrong? Any help is
> much appreciated.
> P.S. The exploded jars do get copied under the taskTracker directory on
> the cluster node as well.
> Thanks

I have seen this error. It is an oddness between the Hadoop, Hive, and
map/reduce classpaths.

This is what I do:
mkdir hive_home/auxlib
Then copy all of the Hive and HBase jars there, and copy the HBase storage
handler jar to auxlib as well.
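On my setup that comes down to something roughly like this, where $HIVE_HOME
and $HBASE_HOME point at the respective installs and the exact jar names
depend on your versions:

cp $HIVE_HOME/lib/hive-hbase-handler-*.jar $HIVE_HOME/auxlib/
cp $HBASE_HOME/hbase-*.jar $HIVE_HOME/auxlib/
cp $HBASE_HOME/lib/zookeeper-*.jar $HIVE_HOME/auxlib/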

Everything in auxlib gets pushed out via the distributed cache with each job,
so you do not need to use ADD JAR xxxx; manually.
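(Without auxlib you would have to add the jars by hand in every Hive session,
something along these lines, with the paths pointing at your actual jars:

add jar /path/to/hive-hbase-handler.jar;
add jar /path/to/hbase.jar;
add jar /path/to/zookeeper.jar;
)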

But that is not enough! DOH! Planning the job and computing the splits happen
on the client side, before the map tasks are launched, so the storage handler
classes have to be visible there too.

For this I drop all the HBase libs into hadoop_home/lib, but only on the
machine that is launching the job.
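Roughly, with the same caveat about jar names:

cp $HBASE_HOME/hbase-*.jar $HADOOP_HOME/lib/
cp $HBASE_HOME/lib/zookeeper-*.jar $HADOOP_HOME/lib/
cp $HIVE_HOME/lib/hive-hbase-handler-*.jar $HADOOP_HOME/lib/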

You can fiddle around with HADOOP_CLASSPATH and achieve similar results.
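Something along these lines before starting the Hive CLI, where the version
numbers are just placeholders for whatever you are actually running:

export HADOOP_CLASSPATH=$HBASE_HOME/hbase-0.90.1.jar:$HBASE_HOME/lib/zookeeper-3.3.2.jar:$HBASE_HOME/conf:$HADOOP_CLASSPATH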

Good luck.
