best approach for write and immediate read use case

2013-08-23 Thread Gautam Borah
Hello all, I have a use case where I need to write 1 million to 10 million records periodically (at intervals of 1 to 10 minutes) into an HBase table. Once the insert is completed, these records are queried immediately from another program - multiple reads. So, this is one massive

Re: best approach for write and immediate read use case

2013-08-23 Thread Ted Yu
Can you tell us the average size of your records and how much heap is given to the region servers? Thanks. On Aug 23, 2013, at 12:11 AM, Gautam Borah gautam.bo...@gmail.com wrote: Hello all, I have a use case where I need to write 1 million to 10 million records periodically (with

Re: best approach for write and immediate read use case

2013-08-23 Thread Gautam Borah
Hi, the average size of my records is 60 bytes: a 20-byte key and a 40-byte value. The table has one column family. I have set up a cluster for testing with 1 master and 3 region servers. Each has a heap size of 3 GB and a single CPU. I have pre-split the table into 30 regions. I do not have to keep data

Re: best approach for write and immediate read use case

2013-08-23 Thread Ted Yu
Assuming you are using 0.94, the default value for hbase.regionserver.global.memstore.lowerLimit is 0.35, meaning the memstore on each region server would be able to hold roughly 3000M * 0.35 / 60 = 17.5 million records. bq. If I use HTable interface, would the inserted data be in the HBase cache,
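
The estimate above can be reproduced with a few lines of arithmetic. This is only a rough sketch: the function name and its parameters are made up for illustration, the heap is taken as 3000 MB of 10^6 bytes as in the message, and the 60-byte record size ignores any per-cell storage overhead, so the real capacity would be lower. The per-server batch size additionally assumes the largest batch (10 million records) spreads evenly over the 3 region servers, which the thread does not confirm.

```python
def memstore_record_capacity(heap_mb, lower_limit_percent, record_bytes):
    """Rough number of records a region server's memstore can hold.

    Hypothetical helper: integer arithmetic only, no per-cell overhead.
    """
    heap_bytes = heap_mb * 10**6                       # 3000M -> 3,000,000,000 bytes
    memstore_bytes = heap_bytes * lower_limit_percent // 100
    return memstore_bytes // record_bytes


# Figures from the thread: 3 GB heap, lowerLimit 0.35, 60-byte records.
capacity = memstore_record_capacity(3000, 35, 60)

# Assumption: the largest batch is spread evenly over 3 region servers.
per_server_batch = 10_000_000 // 3

print(capacity)                      # 17500000
print(per_server_batch <= capacity)  # True
```

Under these simplifying assumptions, even the largest batch stays well below the per-server estimate.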

Re: best approach for write and immediate read use case

2013-08-23 Thread Gautam Borah
Thanks, Ted, for your response and for clarifying the behavior of the HTable interface. What would be the behavior when inserting data using a MapReduce job? Would the recently added records be in the memstore, or do I need to load them for read queries after the insert is done? Thanks, Gautam On

Re: best approach for write and immediate read use case

2013-08-23 Thread Anoop John
bq. What would be the behavior for inserting data using map reduce job? would the recently added records be in the memstore? or I need to load them for read queries after the insert is done? Using MR, you have 2 options for insertion. One will create the HFiles directly as output (Using