Hi,

I am writing a Hadoop application that uses HBase as both source and sink. It is a map-only job with no reducer, and I am using TableOutputFormat as the OutputFormatClass.

I have read reports that it is faster in practice to instantiate HTable directly and call HTable.batch() in the mapper than to use TableOutputFormat as the job's output format. So I looked into the source code of org.apache.hadoop.hbase.mapreduce.TableOutputFormat. It looks like TableRecordWriter does not support batch updates, since TableRecordWriter.write() calls HTable.put(new Put()).

Am I right about this? Or does TableOutputFormat batch updates automatically somehow? Or is there a specific way to do batch updates with TableOutputFormat?

Any explanation is greatly appreciated.

Ed
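
P.S. To make the comparison concrete, this is roughly the buffering pattern I understand the direct approach to follow. It is only a sketch in plain Java with no HBase dependency: BatchingWriter and its sink are hypothetical names I made up, the sink is where a real mapper would call HTable.batch(), and the batch size of 2 is arbitrary.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical helper illustrating the batching pattern; not an HBase class.
class BatchingWriter<T> {
    private final int batchSize;
    // In a real mapper this callback would wrap a call such as HTable.batch().
    private final Consumer<List<T>> sink;
    private final List<T> buffer = new ArrayList<>();

    BatchingWriter(int batchSize, Consumer<List<T>> sink) {
        if (batchSize < 1) {
            throw new IllegalArgumentException("batchSize must be >= 1");
        }
        this.batchSize = batchSize;
        this.sink = sink;
    }

    // Called once per record, the way map() would be.
    void write(T mutation) {
        buffer.add(mutation);
        if (buffer.size() >= batchSize) {
            flush();
        }
    }

    // Hands the buffered mutations to the sink in one call, then clears the buffer.
    void flush() {
        if (buffer.isEmpty()) {
            return;
        }
        sink.accept(new ArrayList<>(buffer));
        buffer.clear();
    }

    // Called once at the end, the way cleanup() would be, so no records are lost.
    void close() {
        flush();
    }
}

class BatchingDemo {
    public static void main(String[] args) {
        List<List<String>> sent = new ArrayList<>();
        BatchingWriter<String> writer = new BatchingWriter<>(2, sent::add);
        writer.write("row1");
        writer.write("row2");
        writer.write("row3");
        writer.close();
        // prints "2 batches: [[row1, row2], [row3]]"
        System.out.println(sent.size() + " batches: " + sent);
    }
}
```

What I am trying to find out is whether TableOutputFormat already does something equivalent internally, or whether each write() becomes its own round trip.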
