Hello all,

[My environment versions are: Hadoop 2.6.0, Hive 1.2.1, Tez 0.7.0]

Our team developed a plug-in for Hive; its function is similar to hive-hbase-handler. When I execute the HQL "select count(*) from h_im;" in the Hive CLI (h_im is an external table backed by an HBase table), it throws exceptions. (I am sorry that I cannot copy the full error information here; we work on an internal network, so some information is omitted.)

-------------------------------------------------------------------------
INFO [Dispatcher thread: Central] history.HistoryEventHandler: ....
.....
    at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor.run(TezProcessor.java:172)
    at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:137)
    at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:337)
    ......
    .......
Caused by: java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing writable org.apache.hadoop.hive.hbase....
    at org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.processRow(MapRecordSource.java:92)
    at org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.pushRecord(MapRecordSource.java:68)
    at org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.run(MapRecordProcessor.java:367)
    at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor.run(TezProcessor.java:149)
    .... 14 more
Caused by: java.lang.NullPointerException
    at com.fiberhome.nebula.datacenter.hbasehandler.NBHBaseSerde.deserialize(NBHBaseSerde.java:210)
    at org.apache.hadoop.hive.ql.exec.MapOperator$MapOpCtx.readRow(MapOperator.java:145)
    at org.apache.hadoop.hive.ql.exec.MapOperator$MapOpCtx.access$2(MapOperator.java:143)
    at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:512)
    .....
18 more
-------------------------------------------------------------------------

I know that this concerns a custom storage handler for Hive, but my questions are about how the two (Tez and Hive) work together.

Here, NBHBaseSerde is a custom SerDe:

NBHBaseSerde extends ColumnarSerDeBase implements Configurable {
    @Override
    initialize() { ...... }
    deserialize() { ...... }
    ....
}

To debug and solve the error above, I added some log statements to the related classes (local mode runs correctly; cluster mode is difficult to debug), but no log message is printed in the YARN container logs (port 8088).

(1) As shown above, the exception says the NullPointerException occurred at "NBHBaseSerde.deserialize(NBHBaseSerde.java:210)", and line 210 is:

-------------------------------------------------------------------------
line 210: this.pair.setValue(zoneid);
-------------------------------------------------------------------------

I guessed that "pair" might be null, so I added a log statement before line 210 (line 210 is not the first line of deserialize()):

-------------------------------------------------------------------------
LOG.info("deserialize begine ....."); // this log message is on the first line of deserialize()
LOG.info("....pair.toString....." + pair.toString()); // this log message is just before "this.pair.setValue(zoneid)"
-------------------------------------------------------------------------

After I replaced NBHBaseSerde.class in the JAR file, some strange things happened that I still do not understand:

① There is no log message in the Hive log or in the YARN container log (port 8088): no "deserialize begine .....", no "....pair.toString.....".
② The exception now says "Caused by: java.lang.NullPointerException at com.fiberhome.nebula.datacenter.hbasehandler.NBHBaseSerde.deserialize(NBHBaseSerde.java:211)", that is to say, the line LOG.info("....pair.toString....." + pair.toString()); is the failing line. I am confused: both log statements should have been executed, so where are the log messages?
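
One detail of observation ② can be reproduced outside Hive. This is only a minimal sketch and not the real NBHBaseSerde code: the class NullSafeLogDemo and the Map.Entry type used for "pair" are assumptions made for illustration. It shows that if "pair" is null, the expression pair.toString() throws the NullPointerException before LOG.info is ever called, so that message could not be printed, whereas plain string concatenation with a null reference yields the text "null". It does not explain why the first message ("deserialize begine .....") is missing.

```java
import java.util.AbstractMap;
import java.util.Map;

// Hypothetical stand-in for the SerDe; "pair" is modeled as a Map.Entry
// purely for illustration (the real field type is not shown in this post).
public class NullSafeLogDemo {

    // Mirrors the added log line: calling toString() on a null reference
    // throws NullPointerException while the message is still being built.
    public static String unsafeMessage(Map.Entry<String, String> pair) {
        return "....pair.toString....." + pair.toString();
    }

    // Null-safe variant: string concatenation converts null to "null".
    public static String safeMessage(Map.Entry<String, String> pair) {
        return "....pair....." + pair;
    }

    // Helper that reports whether the unsafe variant throws.
    public static boolean throwsNpe(Map.Entry<String, String> pair) {
        try {
            unsafeMessage(pair);
            return false;
        } catch (NullPointerException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(throwsNpe(null));
        System.out.println(safeMessage(null));
        System.out.println(safeMessage(new AbstractMap.SimpleEntry<>("zone", "42")));
    }
}
```

With the null-safe form, the log line itself cannot be the source of the exception, which makes it easier to tell whether "pair" is really null when deserialize() runs in the container.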

(2) The field "pair" is assigned a value in NBHBaseSerde.initialize(). There is a hint log message "Serde initializeation begine.." on the first line of NBHBaseSerde.initialize(), and I can find only one occurrence of "Serde initializeation begine.." in the Hive log. So I guess NBHBaseSerde.initialize() was executed just once during the entire execution of the HQL. That is to say, the log suggests that this piece of code (NBHBaseSerde.initialize()) was executed only once, in the Hive client, and was not called again after the job was submitted. Am I right?

Some other fields like "pair", which are assigned values in NBHBaseSerde.initialize(), lose their values after the DAG job is submitted to the cluster. So I added setter methods in NBHiveHBaseUtils.java to save these values, and tried to reset these fields in MapRecordProcessor.init(), like this:

-------------------------------------------------------------------------
legacyMRInput = getMRInput(inputs); // this is source code
......
NBHiveHBaseUtils.setPair(pair); // I added this
.......
.....
-------------------------------------------------------------------------

It failed. I found that when I set "hive.compute.splits.in.am=true", the logic differs from that of traditional MR; it seems MapRecordProcessor.init() was not executed (because the log messages in MapRecordProcessor.init() were not printed). But in the exception message I also found "org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor.run(TezProcessor.java:149)". In my Hive source code:

-------------------------------------------------------------------------
line 147: MRTaskReporter mrReporter = new MRTaskReporter(getContext());
line 148: rpoc.init(mrReporter, input, outputs);
line 149: rpoc.run();
-------------------------------------------------------------------------

Here rpoc is the MapRecordProcessor. This means MapRecordProcessor.init() was executed. But why can I not find any of the log messages printed in it?
I also added a LOG message before line 149, but it was not printed in the Hive log or in the container log either. Why? I cannot understand this.

(3) As the title says, I really cannot understand Tez's logic for processing HiveQL when serialization and deserialization are needed. I have also studied the Hive and Tez source code, and I know that Tez's split mechanism can connect to a custom storage handler through HiveInputFormat. I think maybe I should call NBHBaseSerde.initialize() somewhere so that this logic runs again, but I have not found an appropriate place.

I am eager for your guidance and would very much appreciate your help. Any reply will be appreciated.

Thank you & best regards,
LLBian