Re: 答复: HBase random read performance

Ted Yu Mon, 15 Apr 2013 07:13:39 -0700

I looked at
src/main/java/org/apache/hadoop/hbase/client/HConnectionManager.java in 0.94.


In processBatchCallback(), starting at line 1538:

        // step 1: break up into regionserver-sized chunks and build the
        // data structs
        Map<HRegionLocation, MultiAction<R>> actionsByServer =
          new HashMap<HRegionLocation, MultiAction<R>>();
        for (int i = 0; i < workingList.size(); i++) {

So we do group individual actions by server.

FYI
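
For illustration, here is a minimal, self-contained sketch of that grouping
step. It is not the HConnectionManager code: ServerGrouping, groupByServer and
the locate function are hypothetical names standing in for the region-location
lookup, and it assumes Java 8 or later.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Illustrative only: ServerGrouping and groupByServer are hypothetical names,
// not part of the HBase client API.
final class ServerGrouping {
    private ServerGrouping() {}

    /**
     * Groups row keys by the server that hosts them, preserving input order
     * within each group. `locate` stands in for a region-location lookup.
     */
    static <K, S> Map<S, List<K>> groupByServer(List<K> rowKeys, Function<K, S> locate) {
        Map<S, List<K>> byServer = new LinkedHashMap<>();
        for (K key : rowKeys) {
            // Create the per-server list on first use, then append the key.
            byServer.computeIfAbsent(locate.apply(key), s -> new ArrayList<>()).add(key);
        }
        return byServer;
    }
}
```

Each map entry then corresponds to one per-server batch, which is the shape
the "step 1" comment above describes.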

On Mon, Apr 15, 2013 at 6:30 AM, Ted Yu <yuzhih...@gmail.com> wrote:

> Doug made a good point.
>
> Take a look at the performance gain for parallel scan (bottom chart
> compared to top chart):
> https://issues.apache.org/jira/secure/attachment/12578083/FDencode.png
>
> See
> https://issues.apache.org/jira/browse/HBASE-8316?focusedCommentId=13628300&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13628300
> for an explanation of the two methods.
>
> Cheers
>
> On Mon, Apr 15, 2013 at 6:21 AM, Doug Meil
> <doug.m...@explorysmedical.com> wrote:
>
>>
>> Hi there, regarding this...
>>
>> > We are passing random 10000 row-keys as input, while HBase is taking
>> > around 17 secs to return 10000 records.
>>
>>
>> ….  Given that you are generating 10,000 random keys, your multi-get is
>> very likely hitting all 5 nodes of your cluster.
>>
>>
>> Historically, multi-Get used to first sort the requests by RS and then
>> *serially* go to each RS to process the multi-Get.  I'm not sure whether
>> the current (0.94.x) client multi-threads this or not.
>>
>> One thing you might want to consider is confirming that client behavior,
>> and if it's not multi-threading then perform a test that does the same RS
>> sorting via...
>>
>>
>> http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/HTable.html#getRegionLocation%28byte[]%29
>>
>> …. and then spin up your own threads (one per target RS) and see what
>> happens.
>>
>>
>>
>> On 4/15/13 9:04 AM, "Ankit Jain" <ankitjainc...@gmail.com> wrote:
>>
>> >Hi Liang,
>> >
>> >Thanks for the reply.
>> >
>> >Ans1:
>> >I tried an HFile block size of 32 KB with the bloom filter enabled.
>> >The random read performance is 10000 records in 23 secs.
>> >
>> >Ans2:
>> >We are retrieving all the 10000 rows in one call.
>> >
>> >Ans3:
>> >Disk details:
>> >Model Number:       ST2000DM001-1CH164
>> >Serial Number:      Z1E276YF
>> >
>> >Please suggest some more optimizations.
>> >
>> >Thanks,
>> >Ankit Jain
>> >
>> >On Mon, Apr 15, 2013 at 5:11 PM, 谢良 <xieli...@xiaomi.com> wrote:
>> >
>> >> First, it probably won't help to set the block size to 4KB; please refer
>> >> to the beginning of HFile.java:
>> >>
>> >>  Smaller blocks are good
>> >>  * for random access, but require more memory to hold the block index, and
>> >>  * may be slower to create (because we must flush the compressor stream at
>> >>  * the conclusion of each data block, which leads to an FS I/O flush).
>> >>  * Further, due to the internal caching in Compression codec, the smallest
>> >>  * possible block size would be around 20KB-30KB.
>> >>
>> >> Second, is the test client single-threaded or multi-threaded? We can't
>> >> expect too much if the requests are issued one by one.
>> >>
>> >> Third, could you provide more info about the number of disks per DataNode
>> >> and their I/O utilization?
>> >>
>> >> Thanks,
>> >> Liang
>> >> ________________________________________
>> >> From: Ankit Jain [ankitjainc...@gmail.com]
>> >> Sent: April 15, 2013 18:53
>> >> To: user@hbase.apache.org
>> >> Subject: Re: HBase random read performance
>> >>
>> >> Hi Anoop,
>> >>
>> >> Thanks for the reply.
>> >>
>> >> I tried setting the HFile block size to 4KB and also enabled the bloom
>> >> filter (ROW). The maximum read performance that I was able to achieve is
>> >> 10000 records in 14 secs (size of record is 1.6KB).
>> >>
>> >> Please suggest some tuning.
>> >>
>> >> Thanks,
>> >> Ankit Jain
>> >>
>> >>
>> >>
>> >> On Mon, Apr 15, 2013 at 4:12 PM, Rishabh Agrawal <
>> >> rishabh.agra...@impetus.co.in> wrote:
>> >>
>> >> > Interesting. Can you explain why this happens?
>> >> >
>> >> > -----Original Message-----
>> >> > From: Anoop Sam John [mailto:anoo...@huawei.com]
>> >> > Sent: Monday, April 15, 2013 3:47 PM
>> >> > To: user@hbase.apache.org
>> >> > Subject: RE: HBase random read performance
>> >> >
>> >> > Ankit,
>> >> > I guess you might be using the default HFile block size, which is 64KB.
>> >> > For random gets a lower value will be better. Try with something like
>> >> > 8KB and check the latency.
>> >> >
>> >> > And of course blooms can help (if major compaction was not done at the
>> >> > time of testing).
>> >> >
>> >> > -Anoop-
>> >> > ________________________________________
>> >> > From: Ankit Jain [ankitjainc...@gmail.com]
>> >> > Sent: Saturday, April 13, 2013 11:01 AM
>> >> > To: user@hbase.apache.org
>> >> > Subject: HBase random read performance
>> >> >
>> >> > Hi All,
>> >> >
>> >> > We are using HBase 0.94.5 and Hadoop 1.0.4.
>> >> >
>> >> > We have an HBase cluster of 5 nodes (5 regionservers and 1 master node).
>> >> > Each regionserver has 8 GB RAM.
>> >> >
>> >> > We have loaded 25 million records into an HBase table; the table is
>> >> > pre-split into 16 regions and all the regions are equally loaded.
>> >> >
>> >> > We are getting very low random read performance while performing
>> >> > multi-get from HBase.
>> >> >
>> >> > We are passing random 10000 row-keys as input, while HBase is taking
>> >> > around 17 secs to return 10000 records.
>> >> >
>> >> > Please suggest some tuning to increase HBase read performance.
>> >> >
>> >> > Thanks,
>> >> > Ankit Jain
>> >> > iLabs
>> >> >
>> >> >
>> >> >
>> >> > --
>> >> > Thanks,
>> >> > Ankit Jain
>> >> >
>> >> >
>> >>
>> >>
>> >>
>> >> --
>> >> Thanks,
>> >> Ankit Jain
>> >>
>> >
>> >
>> >
>> >--
>> >Thanks,
>> >Ankit Jain
>>
>>
>
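
Doug's suggestion in the quoted thread (group the keys by region server, then
use one thread per target RS) could be tried along these lines. This is only a
sketch under stated assumptions: PerServerFetch, fetchInParallel and fetchBatch
are hypothetical names, fetchBatch would wrap a multi-get against the table in
a real test, and it needs Java 8 or later.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.BiFunction;

// Illustrative only: these names are hypothetical, not HBase client API.
final class PerServerFetch {
    private PerServerFetch() {}

    /**
     * Runs one batch per server concurrently (one thread per server) and
     * returns the results concatenated in the map's iteration order.
     */
    static <S, K, R> List<R> fetchInParallel(
            Map<S, List<K>> keysByServer,
            BiFunction<S, List<K>, List<R>> fetchBatch) {
        List<R> results = new ArrayList<>();
        if (keysByServer.isEmpty()) {
            return results;
        }
        ExecutorService pool = Executors.newFixedThreadPool(keysByServer.size());
        try {
            // Submit every per-server batch before waiting on any of them.
            List<Future<List<R>>> pending = new ArrayList<>();
            for (Map.Entry<S, List<K>> e : keysByServer.entrySet()) {
                Callable<List<R>> task = () -> fetchBatch.apply(e.getKey(), e.getValue());
                pending.add(pool.submit(task));
            }
            for (Future<List<R>> f : pending) {
                results.addAll(f.get());
            }
            return results;
        } catch (InterruptedException ex) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while fetching", ex);
        } catch (ExecutionException ex) {
            throw new IllegalStateException("a per-server batch failed", ex.getCause());
        } finally {
            pool.shutdown();
        }
    }
}
```

Timing this against the single multi-get call with the same 10000 keys would
show whether the client-side fan-out is the bottleneck.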
