Hello all, I have a use case where I need to periodically write 1 to 10 million records into an HBase table, at intervals of 1 to 10 minutes.
Once the insert completes, these records are immediately queried by another program, with multiple reads. So the pattern is one massive write followed by many reads.

I see two approaches for inserting these records into the HBase table:

1. Use HTable or HTableMultiplexer to stream the data to the HBase table.

2. Write the data to HDFS as a sequence file (Avro in my case), run a MapReduce job using HFileOutputFormat, and then load the output files into the HBase cluster. Something like:

    LoadIncrementalHFiles loader = new LoadIncrementalHFiles(conf);
    loader.doBulkLoad(new Path(outputDir), hTable);

Which approach would be better for my use case?

If I use the HTable interface, would the inserted data be in the HBase cache before being flushed to files, so that it is available for the immediate read queries?

If I use the MapReduce job to insert, would the data be loaded into the HBase cache immediately, or would the output files only be copied into the respective HBase table-specific directories?

So, which approach is better for a massive write followed by immediate multiple reads?

Thanks,
Gautam
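For context, the first approach (streaming puts) might look roughly like the sketch below. This assumes the classic pre-1.0 HTable client API; the table name "records", column family "d", row-key scheme, and batch size are all placeholders, not part of any real schema.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.util.Bytes;

// Illustrative sketch of approach 1: batched puts through the (pre-1.0) HTable API.
// Requires a running HBase cluster and the hbase-client jar on the classpath.
public class StreamingInsert {
    public static void main(String[] args) throws IOException {
        Configuration conf = HBaseConfiguration.create();
        HTable table = new HTable(conf, "records"); // placeholder table name
        table.setAutoFlush(false); // buffer puts client-side instead of one RPC per put
        try {
            List<Put> batch = new ArrayList<Put>();
            for (long i = 0; i < 1000000L; i++) {
                Put put = new Put(Bytes.toBytes(i)); // row key: record id
                put.add(Bytes.toBytes("d"), Bytes.toBytes("v"),
                        Bytes.toBytes("value-" + i));
                batch.add(put);
                if (batch.size() == 10000) { // send in chunks to bound client memory
                    table.put(batch);
                    batch.clear();
                }
            }
            if (!batch.isEmpty()) {
                table.put(batch);
            }
            table.flushCommits(); // push any still-buffered writes to the region servers
        } finally {
            table.close();
        }
    }
}
```

Disabling auto-flush and sending puts in batches is the usual way to keep per-put RPC overhead down when streaming millions of rows through this interface.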