Dear All,

I am using HBase 0.94.1 with Hadoop 0.23.1. I have written a multi-threaded
Thrift client that loads data into HBase using BatchMutations. Each batch
holds 1000 rows, and the table in HBase is pre-split into 10 regions. The row
keys increase incrementally (0...999999), with an offset applied for each
thread (0...99999, 100000...199999, 200000...299999, ...), so in theory every
thread should write to a different region. The regions are wide, i.e. each
region is expected to hold about 100000 rows, for a total of 1000000 rows
across all the regions.
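To make the scheme concrete, here is a minimal sketch of the per-thread key
generation described above. The function name `thread_batches` and the plain
decimal string keys are my assumptions about the client; the actual Thrift
`mutateRows`/`BatchMutation` calls are omitted:

```python
# Sketch of the per-thread offset scheme: 10 threads, each writing
# 100000 consecutive row keys, submitted in batches of 1000 rows.

ROWS_TOTAL = 1000000
THREADS = 10
BATCH_SIZE = 1000
ROWS_PER_THREAD = ROWS_TOTAL // THREADS  # 100000 rows per thread

def thread_batches(thread_index):
    """Yield lists of row keys for one thread, BATCH_SIZE keys per batch."""
    start = thread_index * ROWS_PER_THREAD
    end = start + ROWS_PER_THREAD
    for batch_start in range(start, end, BATCH_SIZE):
        yield [str(row) for row in range(batch_start, batch_start + BATCH_SIZE)]
```

So thread 2, for example, produces 100 batches covering keys "200000" through
"299999", which I expect to land entirely in the third region of the pre-split
table.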

I am using the Thrift server/client and only one region server, as per the
default HBase setup.

So when I spawn 10 threads with the offsets applied accordingly, I was
expecting the regions to be filled in parallel, but that does not seem to be
the case. All the inserts pile into the same region, which makes the writes
inefficient because frequent compaction cycles block all the threads. If the
threads were writing to different regions, this problem would be much smaller.

I am not sure if I am missing out on anything, any ideas would be very helpful.

Thanks and Regards
Pankaj Misra
