I looked at src/main/java/org/apache/hadoop/hbase/client/HConnectionManager.java in 0.94.

In processBatchCallback(), starting at line 1538:

    // step 1: break up into regionserver-sized chunks and build the data structs
    Map<HRegionLocation, MultiAction<R>> actionsByServer =
      new HashMap<HRegionLocation, MultiAction<R>>();
    for (int i = 0; i < workingList.size(); i++) {

So we do group individual actions by server.

FYI

On Mon, Apr 15, 2013 at 6:30 AM, Ted Yu <yuzhih...@gmail.com> wrote:

> Doug made a good point.
>
> Take a look at the performance gain for parallel scan (bottom chart
> compared to top chart):
> https://issues.apache.org/jira/secure/attachment/12578083/FDencode.png
>
> See
> https://issues.apache.org/jira/browse/HBASE-8316?focusedCommentId=13628300&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13628300
> for explanation of the two methods.
>
> Cheers
>
> On Mon, Apr 15, 2013 at 6:21 AM, Doug Meil
> <doug.m...@explorysmedical.com> wrote:
>
>> Hi there, regarding this...
>>
>> > We are passing random 10000 row-keys as input, while HBase is taking
>> > around 17 secs to return 10000 records.
>>
>> …. Given that you are generating 10,000 random keys, your multi-get is
>> very likely hitting all 5 nodes of your cluster.
>>
>> Historically, multi-Get used to first sort the requests by RS and then
>> *serially* go to the RS to process the multi-Get. I'm not sure of the
>> current (0.94.x) behavior if it multi-threads or not.
>>
>> One thing you might want to consider is confirming that client behavior,
>> and if it's not multi-threading then perform a test that does the same RS
>> sorting via...
>>
>> http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/HTable.html#getRegionLocation%28byte[]%29
>>
>> …. and then spin up your own threads (one per target RS) and see what
>> happens.
>>
>> On 4/15/13 9:04 AM, "Ankit Jain" <ankitjainc...@gmail.com> wrote:
>>
>> >Hi Liang,
>> >
>> >Thanks Liang for reply..
>> >
>> >Ans1:
>> >I tried by using HFile block size of 32 KB and bloom filter is enabled.
>> >The random read performance is 10000 records in 23 secs.
>> >
>> >Ans2:
>> >We are retrieving all the 10000 rows in one call.
>> >
>> >Ans3:
>> >Disk detail:
>> >Model Number: ST2000DM001-1CH164
>> >Serial Number: Z1E276YF
>> >
>> >Please suggest some more optimization
>> >
>> >Thanks,
>> >Ankit Jain
>> >
>> >On Mon, Apr 15, 2013 at 5:11 PM, 谢良 <xieli...@xiaomi.com> wrote:
>> >
>> >> First, it's probably not helpful to set block size to 4KB, please refer
>> >> to the beginning of HFile.java:
>> >>
>> >>  Smaller blocks are good
>> >>  * for random access, but require more memory to hold the block index, and may
>> >>  * be slower to create (because we must flush the compressor stream at the
>> >>  * conclusion of each data block, which leads to an FS I/O flush). Further, due
>> >>  * to the internal caching in Compression codec, the smallest possible block
>> >>  * size would be around 20KB-30KB.
>> >>
>> >> Second, is it a single-thread test client or multi-threads? we couldn't
>> >> expect too much if the requests are one by one.
>> >>
>> >> Third, could you provide more info about your DN disk numbers and IO
>> >> utils ?
>> >>
>> >> Thanks,
>> >> Liang
>> >> ________________________________________
>> >> From: Ankit Jain [ankitjainc...@gmail.com]
>> >> Sent: April 15, 2013 18:53
>> >> To: user@hbase.apache.org
>> >> Subject: Re: HBase random read performance
>> >>
>> >> Hi Anoop,
>> >>
>> >> Thanks for reply..
>> >>
>> >> I tried by setting Hfile block size 4KB and also enabled the bloom
>> >> filter(ROW). The maximum read performance that I was able to achieve is
>> >> 10000 records in 14 secs (size of record is 1.6KB).
>> >>
>> >> Please suggest some tuning..
>> >>
>> >> Thanks,
>> >> Ankit Jain
>> >>
>> >> On Mon, Apr 15, 2013 at 4:12 PM, Rishabh Agrawal <
>> >> rishabh.agra...@impetus.co.in> wrote:
>> >>
>> >> > Interesting. Can you explain why this happens?
>> >> >
>> >> > -----Original Message-----
>> >> > From: Anoop Sam John [mailto:anoo...@huawei.com]
>> >> > Sent: Monday, April 15, 2013 3:47 PM
>> >> > To: user@hbase.apache.org
>> >> > Subject: RE: HBase random read performance
>> >> >
>> >> > Ankit
>> >> > I guess you might be having default HFile block size which is 64KB.
>> >> > For random gets a lower value will be better. Try with something like
>> >> > 8KB and check the latency?
>> >> >
>> >> > Ya of course blooms can help (if major compaction was not done at the
>> >> > time of testing)
>> >> >
>> >> > -Anoop-
>> >> > ________________________________________
>> >> > From: Ankit Jain [ankitjainc...@gmail.com]
>> >> > Sent: Saturday, April 13, 2013 11:01 AM
>> >> > To: user@hbase.apache.org
>> >> > Subject: HBase random read performance
>> >> >
>> >> > Hi All,
>> >> >
>> >> > We are using HBase 0.94.5 and Hadoop 1.0.4.
>> >> >
>> >> > We have HBase cluster of 5 nodes(5 regionservers and 1 master node).
>> >> > Each regionserver has 8 GB RAM.
>> >> >
>> >> > We have loaded 25 millions records in HBase table, regions are
>> >> > pre-split into 16 regions and all the regions are equally loaded.
>> >> >
>> >> > We are getting very low random read performance while performing
>> >> > multi get from HBase.
>> >> >
>> >> > We are passing random 10000 row-keys as input, while HBase is taking
>> >> > around 17 secs to return 10000 records.
>> >> >
>> >> > Please suggest some tuning to increase HBase read performance.
>> >> >
>> >> > Thanks,
>> >> > Ankit Jain
>> >> > iLabs
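
To make Doug's suggestion in the thread concrete (group the keys by region server, then run one thread per server), here is a minimal sketch in plain Java. It deliberately does not call the HBase client: `locateServer` and `fetchBatch` are hypothetical stand-ins for a lookup such as HTable.getRegionLocation(byte[]) and a batched HTable.get(List<Get>), and row keys and values are plain strings. Treat it as an illustration of the pattern under those assumptions, not as what the 0.94 client does internally.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

// Hypothetical helper class, not part of HBase.
class ParallelMultiGetSketch {

    // Step 1: bucket row keys by the server that hosts them.
    // locateServer is an assumed stand-in for a region location lookup.
    static Map<String, List<String>> groupByServer(
            List<String> rowKeys, Function<String, String> locateServer) {
        Map<String, List<String>> keysByServer = new HashMap<>();
        for (String key : rowKeys) {
            keysByServer
                .computeIfAbsent(locateServer.apply(key), s -> new ArrayList<>())
                .add(key);
        }
        return keysByServer;
    }

    // Step 2: send one batch per server, with all batches in flight at once.
    // fetchBatch is an assumed stand-in for a batched get against one server.
    static Map<String, String> parallelGet(
            List<String> rowKeys,
            Function<String, String> locateServer,
            Function<List<String>, Map<String, String>> fetchBatch) {
        Map<String, List<String>> keysByServer = groupByServer(rowKeys, locateServer);
        Map<String, String> results = new HashMap<>();
        if (keysByServer.isEmpty()) {
            return results; // nothing to fetch, and a pool of size 0 is not allowed
        }
        // One worker thread per target server.
        ExecutorService pool = Executors.newFixedThreadPool(keysByServer.size());
        try {
            List<Future<Map<String, String>>> pending = new ArrayList<>();
            for (List<String> batch : keysByServer.values()) {
                pending.add(pool.submit(() -> fetchBatch.apply(batch)));
            }
            // Merge the per-server results on the calling thread.
            for (Future<Map<String, String>> future : pending) {
                results.putAll(future.get());
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new RuntimeException(e);
        } catch (ExecutionException e) {
            throw new RuntimeException(e.getCause());
        } finally {
            pool.shutdown();
        }
        return results;
    }
}
```

For a test like the one discussed here, one would time parallelGet against a single multi-get over the same 10000 keys. With 5 region servers the pool would hold at most 5 threads.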