Update:
When I remove *client_list Array<string>* from both tables, it works fine.
So the problem is: how can I join a Shark table and an HBase table that
contain Array, Struct, or Map columns?
Is there any workaround?
Thank you.
Hao
On 07/08/2013 10:09, Hao Ren wrote:
Hi,
I have integrated HBase with Hive. When joining a Shark table with an
HBase table, it throws an exception:
java.lang.ClassCastException: org.apache.hadoop.hive.serde2.lazybinary.LazyBinaryArray cannot be cast to [Ljava.lang.Object;
    at org.apache.hadoop.hive.serde2.objectinspector.StandardListObjectInspector.getList(StandardListObjectInspector.java:98)
    at org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe.serialize(LazySimpleSerDe.java:434)
    at org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe.serializeField(LazySimpleSerDe.java:381)
    at org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe.serialize(LazySimpleSerDe.java:365)
    at org.apache.hadoop.hive.ql.exec.FileSinkOperator.processOp(FileSinkOperator.java:568)
    at shark.execution.FileSinkOperator$$anonfun$processPartition$1.apply(FileSinkOperator.scala:73)
    at shark.execution.FileSinkOperator$$anonfun$processPartition$1.apply(FileSinkOperator.scala:72)
    at scala.collection.Iterator$class.foreach(Iterator.scala:772)
    at scala.collection.Iterator$$anon$19.foreach(Iterator.scala:399)
    at shark.execution.FileSinkOperator.processPartition(FileSinkOperator.scala:72)
    at shark.execution.FileSinkOperator$.writeFiles$1(FileSinkOperator.scala:133)
    at shark.execution.FileSinkOperator$$anonfun$executeProcessFileSinkPartition$1.apply(FileSinkOperator.scala:138)
    at shark.execution.FileSinkOperator$$anonfun$executeProcessFileSinkPartition$1.apply(FileSinkOperator.scala:138)
    at spark.scheduler.ResultTask.run(ResultTask.scala:77)
    at spark.executor.Executor$TaskRunner.run(Executor.scala:98)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:724)
Here is my HBase table:

CREATE TABLE hbase_dict (
  idvisite string,
  client_list Array<string>,
  nb_client int)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES (
  "hbase.columns.mapping" = ":key,clients:id_list,clients:nb")
TBLPROPERTIES (
  "hbase.table.name" = "cookie_clients_dict",
  "hbase.table.default.storage.type" = "binary");
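One thing I have also been considering, assuming my Hive version supports the per-column #s/#b storage-type suffixes in hbase.columns.mapping, is to override the storage type per column rather than table-wide, e.g.:

```sql
-- Hypothetical variant of the table above: force string (#s) storage
-- for the array column and binary (#b) for the int, instead of the
-- table-wide "hbase.table.default.storage.type" setting.
CREATE TABLE hbase_dict (
  idvisite string,
  client_list Array<string>,
  nb_client int)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES (
  "hbase.columns.mapping" = ":key,clients:id_list#s,clients:nb#b")
TBLPROPERTIES ("hbase.table.name" = "cookie_clients_dict");
```

I have not confirmed whether this changes the behavior of the join, though.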
It seems to be a SerDe problem. I have tried both the binary and the
string storage type; neither works.
The join query is as below:

SELECT * FROM hive_dict n JOIN hbase_dict o ON (o.idvisite = n.idvisite);

where hive_dict is a native Hive table.
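Since the join works once the array column is removed, one workaround I am considering is to keep client_list out of the joined output and select only scalar columns from the HBase-backed table (column names as in the tables above):

```sql
-- Avoid serializing the Array<string> column in the join result;
-- project only scalar columns from the HBase-backed table.
SELECT n.*, o.nb_client
FROM hive_dict n JOIN hbase_dict o ON (o.idvisite = n.idvisite);
```

This loses the array in the result, so it is only a partial workaround; the array values would have to be fetched in a separate query.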
I am new to Hive and HBase. I have googled a lot, but found nothing.
Any thoughts are highly appreciated.
Thank you in advance.
Hao.
--
Hao Ren
ClaraVista
www.claravista.fr