Hello all,
     
      [My environment: Hadoop 2.6.0, Hive 1.2.1, Tez 0.7.0]
      Our team developed a Hive plug-in whose function is similar to
hive-hbase-handler.
      I ran the HQL "select count(*) from h_im;" in the Hive CLI (h_im is an
external table backed by an HBase table), and it threw the exceptions below.
      (I am sorry that I cannot copy the full error output here: we work on an
internal network, so some information is omitted.)
      -------------------------------------------------------------------------------------
      INFO [Dispatcher thread: Central] history.HistoryEventHandler: .... .....
                  at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor.run(TezProcessor.java:172)
                  at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:137)
                  at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:337)
                  ......  .......
      Caused by: java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing writable org.apache.hadoop.hive.hbase....
                   at org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.processRow(MapRecordSource.java:92)
                   at org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.pushRecord(MapRecordSource.java:68)
                   at org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.run(MapRecordProcessor.java:367)
                   at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor.run(TezProcessor.java:149)
                   .... 14 more
      Caused by: java.lang.NullPointerException
                   at com.fiberhome.nebula.datacenter.hbasehandler.NBHBaseSerde.deserialize(NBHBaseSerde.java:210)
                   at org.apache.hadoop.hive.ql.exec.MapOperator$MapOpCtx.readRow(MapOperator.java:145)
                   at org.apache.hadoop.hive.ql.exec.MapOperator$MapOpCtx.access$2(MapOperator.java:143)
                   at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:512)
                   ..... 18 more
      -------------------------------------------------------------------------------------
      I know this concerns a custom Hive storage handler, but my questions are
about how the two (Tez and Hive) work together.
      NBHBaseSerde is a custom SerDe:
                   class NBHBaseSerde extends ColumnarSerDeBase
                           implements Configurable {
                            @Override
                            initialize() { ...... }

                            deserialize() { ...... }
                            ....
                   }
      To debug the error above, I added some log statements to the related
classes (local mode runs correctly, and cluster mode is difficult to debug),
but no log message shows up in the YARN container logs (web UI on port 8088):

(1) As shown above, the exception says the NullPointerException occurred in
"NBHBaseSerde.deserialize(NBHBaseSerde.java:210)", and line 210 is:
     
-------------------------------------------------------------------------------------------------------------
  line 210:  this.pair.setValue(zoneid);
-------------------------------------------------------------------------------------------------------------
     
     I guessed that "pair" might be null, so I added log statements, one of
them just before line 210 (line 210 is not the first line of deserialize()):
   
---------------------------------------------------------------------------------------------------------------
// first line of deserialize()
LOG.info("deserialize begine .....");
// just before "this.pair.setValue(zoneid)"
LOG.info("....pair.toString....." + pair.toString());
---------------------------------------------------------------------------------------------------------------
       After I replaced NBHBaseSerde.class in the JAR file, some strange things
happened that I still do not understand:
       ① No log message appears in the Hive log or in the YARN container log
(port 8088): neither "deserialize begine ....." nor "....pair.toString.....".
       ② The exception now says "Caused by : java.lang.NullPointerException at
com.fiberhome.nebula.datacenter.hbasehandler.NBHBaseSerde.deserialize(NBHBaseSerde.java:211)",
that is, the line "LOG.info("....pair.toString....." + pair.toString());" is
now the failing line.
        I am confused: the log statements should have been executed, so where
are the log messages?
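
A note on observation ②, offered as one possible reading rather than a diagnosis: if "pair" is null, the added log statement throws by itself, because pair.toString() is evaluated before LOG.info() is ever called. That would be consistent with the failing line moving from 210 to 211, although it does not explain why "deserialize begine ....." is missing. The sketch below reproduces the effect in plain Java. Pair and NullSafeLogDemo are hypothetical stand-ins, not classes from the handler.

```java
import java.util.Objects;

// Hypothetical stand-in for the type of the SerDe's "pair" field.
class Pair {
    private String value;

    void setValue(String v) { this.value = v; }

    @Override
    public String toString() { return "Pair(" + value + ")"; }
}

class NullSafeLogDemo {
    // Mirrors the added log line: toString() on a null reference throws
    // before the message string can be built.
    static String unsafeMessage(Pair pair) {
        return "....pair.toString....." + pair.toString();
    }

    // Objects.toString tolerates null and yields the text "null".
    static String safeMessage(Pair pair) {
        return "....pair.toString....." + Objects.toString(pair);
    }

    public static void main(String[] args) {
        Pair pair = null; // simulates a field that was never assigned
        System.out.println(safeMessage(pair));
        try {
            System.out.println(unsafeMessage(pair));
        } catch (NullPointerException e) {
            System.out.println("NPE raised by the log statement itself");
        }
    }
}
```

With the null-safe form, the log line can report a null "pair" instead of masking it with a second exception.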

(2) The field "pair" is assigned a value in NBHBaseSerde.initialize().
    The first line of NBHBaseSerde.initialize() logs the message "Serde
initializeation begine..", and I can find only one occurrence of that message
in the Hive log. So I guess NBHBaseSerde.initialize() was executed just once
during the entire execution of the HQL statement.
    In other words, the log suggests that NBHBaseSerde.initialize() ran only
once, in the Hive client, and was not called again after the job was
submitted. Am I right?
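
Answering this depends on knowing which JVM wrote a given message and which JAR the class came from. A minimal sketch of a marker that could be placed in initialize() and deserialize() follows, under two assumptions: that System.err output is captured per container (this is the usual YARN behaviour, but the exact location depends on the cluster setup), and that the JVM reports its name as "pid@host" (typical, not guaranteed). WhereAmI is a hypothetical helper; it only uses standard JDK calls.

```java
import java.lang.management.ManagementFactory;
import java.security.CodeSource;

// Hypothetical debugging helper, not part of Hive, Tez, or the handler.
class WhereAmI {
    // Typically "pid@hostname" on HotSpot JVMs; the format is not guaranteed.
    static String jvmName() {
        return ManagementFactory.getRuntimeMXBean().getName();
    }

    // Location (usually a JAR path) the class was loaded from, or "unknown".
    static String loadedFrom(Class<?> cls) {
        CodeSource src = cls.getProtectionDomain().getCodeSource();
        if (src == null || src.getLocation() == null) {
            return "unknown";
        }
        return src.getLocation().toString();
    }

    static String describe(String where, Class<?> cls) {
        return where + " jvm=" + jvmName()
                + " class=" + cls.getName()
                + " loadedFrom=" + loadedFrom(cls);
    }

    public static void main(String[] args) {
        // System.err does not depend on the logging configuration.
        System.err.println(describe("initialize()", WhereAmI.class));
    }
}
```

If the marker shows up with a different JVM name for initialize() than for deserialize(), the two calls ran in different processes. The "loadedFrom" value shows whether the task picked up the rebuilt JAR or an older copy.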

    Other fields like "pair" that are assigned in NBHBaseSerde.initialize()
also lose their values after the DAG job is submitted to the cluster. So I
added a setter in NBHiveHBaseUtils.java to save these values, and called it
from MapRecordProcessor.init() to reset them, like this:
-------------------------------------------------------------------------
       legacyMRInput = getMRInput(inputs); //this is source code
       ......
       NBHiveHBaseUtils.setPair(pair);//I added
        ....... .....
---------------------------------------------------------------------------
    This failed. I found that when I set "hive.compute.splits.in.am=true", the
logic differs from that of traditional MapReduce: it seems
MapRecordProcessor.init() was not executed (the log messages I put in
MapRecordProcessor.init() were not printed).
    However, the exception message also contains
"org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor.run(TezProcessor.java:149)".
In my Hive source code:
---------------------------------
line 147: MRTaskReporter mrReporter = new MRTaskReporter(getContext());
line 148: rproc.init(mrReporter, input, outputs);
line 149: rproc.run();
----------------------------------------
    Here rproc is the MapRecordProcessor, which means MapRecordProcessor.init()
was executed. But why can't I find any of the log messages I put in it?
    I also added a LOG message before line 149, and it was not printed in the
Hive log or the container log either. Why? I cannot understand this.
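
On the values that are lost after the DAG is submitted: a static setter only changes state in the JVM where it is called, so it cannot carry values from the client to a task running elsewhere. One alternative, sketched here under stated assumptions, is to write the values into properties that travel with the job and rebuild the fields from them in initialize(). In the sketch, java.util.Properties stands in for the job configuration, and ConfiguredSerde, ConfigRoundTrip, and the key "nb.hbase.zone.id" are all hypothetical. In Hive, HiveStorageHandler.configureInputJobProperties() looks like the hook intended for this, but whether it fits this handler is not something this sketch can confirm.

```java
import java.util.Properties;

// Hypothetical stand-in for the handler's SerDe; only state handling is shown.
class ConfiguredSerde {
    // Hypothetical property key, not one defined by Hive or HBase.
    static final String ZONE_KEY = "nb.hbase.zone.id";

    private String zoneId; // rebuilt on every initialize() call

    // Client side: record the value in properties that are shipped with the job.
    static void saveState(Properties jobProps, String zoneId) {
        jobProps.setProperty(ZONE_KEY, zoneId);
    }

    // Task side: rebuild the field from the properties, never from statics.
    void initialize(Properties jobProps) {
        String v = jobProps.getProperty(ZONE_KEY);
        if (v == null) {
            throw new IllegalStateException("missing property " + ZONE_KEY);
        }
        this.zoneId = v;
    }

    String zoneId() { return zoneId; }
}

class ConfigRoundTrip {
    // Copying the properties simulates shipping them to another JVM.
    static Properties ship(Properties p) {
        Properties copy = new Properties();
        copy.putAll(p);
        return copy;
    }

    public static void main(String[] args) {
        Properties clientProps = new Properties();
        ConfiguredSerde.saveState(clientProps, "zone-7");

        ConfiguredSerde taskSide = new ConfiguredSerde(); // fresh instance
        taskSide.initialize(ship(clientProps));
        System.out.println("task sees zoneId=" + taskSide.zoneId());
    }
}
```

The design point is that every field needed by deserialize() is derivable from what initialize() receives, so it does not matter in which process, or how often, initialize() runs.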

(3) As the title says, I really cannot understand Tez's logic for processing
HiveQL when serialization and deserialization are needed. I have also studied
the Hive and Tez source code, and I know that Tez's split mechanism can reach
a custom storage handler through HiveInputFormat. I think maybe I should call
NBHBaseSerde.initialize() again somewhere, but I have not found an appropriate
place.
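
One option that avoids finding a call site in Hive or Tez is a guard inside the SerDe itself, which creates the missing state on first use in whichever JVM deserialize() runs. This is a minimal sketch, assuming "pair" can be rebuilt without external input. LazySerde is a hypothetical stand-in, and Map.Entry is used only because the real type of "pair" is not shown in this message.

```java
import java.util.AbstractMap;
import java.util.Map;

// Hypothetical stand-in for NBHBaseSerde; only the guard is shown.
class LazySerde {
    private Map.Entry<String, String> pair; // null until first use in this JVM
    private int initCount = 0;              // counts how often state was built

    // Builds the state once per instance, wherever the instance lives.
    private void ensureInitialized() {
        if (pair == null) {
            pair = new AbstractMap.SimpleEntry<String, String>("zoneid", null);
            initCount++;
        }
    }

    String deserialize(String zoneid) {
        ensureInitialized();   // guard before touching the field
        pair.setValue(zoneid); // the statement that failed at line 210
        return pair.getKey() + "=" + pair.getValue();
    }

    int initCount() { return initCount; }
}

class LazySerdeDemo {
    public static void main(String[] args) {
        LazySerde serde = new LazySerde();
        System.out.println(serde.deserialize("z1"));
        System.out.println(serde.deserialize("z2"));
        System.out.println("state built " + serde.initCount() + " time(s)");
    }
}
```

The guard hides the symptom rather than explaining it, so it is best combined with a check of why the field was null in the task in the first place.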

   I am eager for your guidance, and any reply will be much appreciated.

Thank you & best regards,

---LLBian



     
      
