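TableOutputFormat does this as well. It calls
table.setAutoFlush(false);
on its HTable, so Puts collect in the client-side write buffer and are sent in batches instead of one per call.
Check out the HBase book for how the write buffer works with the HBase client: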
http://hbase.apache.org/book.html#client
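
To show why turning autoflush off matters, here is a rough sketch in plain Java. It is a simplified model, not HBase code: the class BufferedTableSketch, the 1024-byte limit, the 100-byte rows, and the roundTrips counter are all made up for illustration. The only idea borrowed from the client is that put() appends to a buffer, and the buffer is sent when autoflush is on, when it grows past a size limit, or when the table is flushed or closed.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified model of a client-side write buffer. NOT the HBase implementation;
// every name and number below is a hypothetical stand-in.
class BufferedTableSketch {
    private final boolean autoFlush;
    private final long bufferLimitBytes;              // assumed size limit
    private final List<byte[]> buffer = new ArrayList<>();
    private long bufferedBytes = 0;
    private int roundTrips = 0;                       // simulated batched sends
    private int rowsSent = 0;

    BufferedTableSketch(boolean autoFlush, long bufferLimitBytes) {
        this.autoFlush = autoFlush;
        this.bufferLimitBytes = bufferLimitBytes;
    }

    // Each put only appends to the buffer; it is sent when autoflush is on
    // or when the buffered size exceeds the limit.
    void put(byte[] row) {
        buffer.add(row);
        bufferedBytes += row.length;
        if (autoFlush || bufferedBytes > bufferLimitBytes) {
            flushCommits();
        }
    }

    // Sends everything buffered so far as one simulated round trip.
    void flushCommits() {
        if (buffer.isEmpty()) {
            return;
        }
        roundTrips++;
        rowsSent += buffer.size();
        buffer.clear();
        bufferedBytes = 0;
    }

    // Closing flushes whatever is still buffered.
    void close() {
        flushCommits();
    }

    int roundTrips() { return roundTrips; }
    int rowsSent() { return rowsSent; }

    // Writes `rows` rows of 100 bytes each through a 1024-byte buffer.
    static BufferedTableSketch writeRows(boolean autoFlush, int rows) {
        BufferedTableSketch table = new BufferedTableSketch(autoFlush, 1024);
        byte[] row = new byte[100];
        for (int i = 0; i < rows; i++) {
            table.put(row);
        }
        table.close();
        return table;
    }

    public static void main(String[] args) {
        System.out.println("autoFlush=true  round trips: " + writeRows(true, 25).roundTrips());
        System.out.println("autoFlush=false round trips: " + writeRows(false, 25).roundTrips());
    }
}
```

In this model, 25 single-row put() calls cost 25 round trips with autoflush on and 3 with it off (two full buffers plus the final flush on close), even though the caller never builds a batch itself.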
-----Original Message-----
From: edward choi [mailto:[email protected]]
Sent: Tuesday, June 21, 2011 10:23 PM
To: [email protected]; [email protected]
Subject: TableOutputFormat not efficient than direct HBase API calls?
Hi,
I am writing a Hadoop application that uses HBase as both source and sink.
There is no reducer job in my application.
I am using TableOutputFormat as the OutputFormatClass.
I read on the Internet that, in practice, it is faster to instantiate
HTable directly and use HTable.batch() in the Map than to use
TableOutputFormat as the Map's output format.
So I looked into the source code,
org.apache.hadoop.hbase.mapreduce.TableOutputFormat.
It looks like TableRecordWriter does not support batch updates, since
TableRecordWriter.write() calls HTable.put(new Put()).
Am I right on this matter? Or does TableOutputFormat automatically do batch
updates somehow?
Or is there a specific way to do batch updates with TableOutputFormat?
Any explanation is greatly appreciated.
Ed