Hello all,

I have a use case where I need to write 1 to 10 million records
periodically (at intervals of 1 to 10 minutes) into an HBase
table.

Once the insert completes, these records are immediately queried by
another program - multiple reads.

So, this is one massive write followed by many reads.

I see two approaches for inserting these records into the HBase table -

Use HTable or HTableMultiplexer to stream the data into the HBase table.

or

Write the data to HDFS as a sequence file (Avro in my case), run a
MapReduce job using HFileOutputFormat, and then load the output files into
the HBase cluster.
Something like,

  LoadIncrementalHFiles loader = new LoadIncrementalHFiles(conf);
  loader.doBulkLoad(new Path(outputDir), hTable);
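For reference, the streaming alternative I have in mind would look roughly like the sketch below. This is only an illustration using the pre-2.0 client API (to match the LoadIncrementalHFiles call above); the table name "my_table", column family "cf", qualifier "q", and the Record type are placeholders, not part of my actual schema:

  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.hbase.HBaseConfiguration;
  import org.apache.hadoop.hbase.client.HTable;
  import org.apache.hadoop.hbase.client.Put;
  import org.apache.hadoop.hbase.util.Bytes;

  Configuration conf = HBaseConfiguration.create();
  try (HTable table = new HTable(conf, "my_table")) {
      // Buffer puts client-side instead of one RPC per put
      table.setAutoFlush(false, true);
      table.setWriteBufferSize(8 * 1024 * 1024);
      for (Record r : records) {  // 'records' is the batch to insert
          Put put = new Put(Bytes.toBytes(r.getKey()));
          put.addColumn(Bytes.toBytes("cf"), Bytes.toBytes("q"),
                        Bytes.toBytes(r.getValue()));
          table.put(put);         // goes into the client write buffer
      }
      table.flushCommits();       // flush any remaining buffered puts
  }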


In my use case which approach would be better?

If I use the HTable interface, would the inserted data sit in the HBase
cache, before being flushed to files, and so be available for immediate
read queries?

If I use a MapReduce job to insert, would the data be loaded into the
HBase cache immediately? Or would only the output files be copied into the
respective HBase table-specific directories?

So, which approach is better for a massive write followed by immediate,
multiple read operations?

Thanks,
Gautam
