On Wed, Oct 9, 2013 at 10:59 AM, Vladimir Rodionov <[email protected]> wrote:
> I can't say for SCR. There is a possibility that the feature is broken, of course.
> But the fact that hbase.regionserver.checksum.verify does not affect performance means that the OS caches HDFS checksum files effectively.

See "OS cache + SCR" VS "HBase CRC over OS cache+SCR" in this document I shared some time ago:
https://docs.google.com/spreadsheet/pub?key=0Ao87IrzZJSaydENaem5USWg4TlRKcHl0dEtTS2NBOUE&output=html

In an all-in-memory test it shows a pretty big difference.

J-D

>
> Best regards,
> Vladimir Rodionov
> Principal Platform Engineer
> Carrier IQ, www.carrieriq.com
> e-mail: [email protected]
>
> ________________________________________
> From: Ramu M S [[email protected]]
> Sent: Wednesday, October 09, 2013 12:11 AM
> To: [email protected]; lars hofhansl
> Subject: Re: HBase Random Read latency > 100ms
>
> Hi All,
>
> Sorry. There was some mistake in the tests (clients were not reduced; I forgot to change the parameter before running the tests).
>
> With 8 clients,
>
> SCR Enabled : Average Latency is 25 ms, IO Wait % is around 8
> SCR Disabled: Average Latency is 10 ms, IO Wait % is around 2
>
> Still, SCR disabled gives better results, which confuses me. Can anyone clarify?
>
> Also, I tried setting the parameter Lars suggested (hbase.regionserver.checksum.verify as true), with SCR disabled.
> Average Latency is around 9.8 ms, a fraction lower.
>
> Thanks
> Ramu
>
> On Wed, Oct 9, 2013 at 3:32 PM, Ramu M S <[email protected]> wrote:
>
> > Hi All,
> >
> > I just ran only 8 parallel clients,
> >
> > With SCR Enabled : Average Latency is 80 ms, IO Wait % is around 8
> > With SCR Disabled: Average Latency is 40 ms, IO Wait % is around 2
> >
> > I always thought SCR enabled allows a client co-located with the DataNode to read HDFS file blocks directly. This gives a performance boost to distributed clients that are aware of locality.
> >
> > Is my understanding wrong, or does it not apply to my scenario?
> >
> > Meanwhile I will try setting the parameter suggested by Lars and post the results.
> >
> > Thanks,
> > Ramu
> >
> > On Wed, Oct 9, 2013 at 2:29 PM, lars hofhansl <[email protected]> wrote:
> >
> >> Good call.
> >> You could try to enable hbase.regionserver.checksum.verify, which will cause HBase to do its own checksums rather than relying on HDFS (and which saves 1 IO per block get).
> >>
> >> I do think you can expect the index blocks to be cached at all times.
> >>
> >> -- Lars
> >> ________________________________
> >> From: Vladimir Rodionov <[email protected]>
> >> To: "[email protected]" <[email protected]>
> >> Sent: Tuesday, October 8, 2013 8:44 PM
> >> Subject: RE: HBase Random Read latency > 100ms
> >>
> >> Upd.
> >>
> >> Each HBase Get = 2 HDFS read IOs (index block + data block) = 4 file IOs (data + .crc) in the worst case. I think if the Bloom Filter is enabled then it is going to be 6 file IOs in the worst case (large data set); therefore you will have not 5 IO requests in the queue but up to 20-30 IO requests in the queue.
> >> This definitely explains > 100ms avg latency.
> >>
> >> Best regards,
> >> Vladimir Rodionov
> >> Principal Platform Engineer
> >> Carrier IQ, www.carrieriq.com
> >> e-mail: [email protected]
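To make the worst-case accounting above concrete, here is a small back-of-the-envelope sketch in Java. The 10 ms per-seek figure, the 4-6 file IOs per Get and the 5 concurrent Gets per server are the numbers quoted in this thread; treat the whole thing as an illustration, not a measurement.

// Rough estimate of uncached random-Get latency on a single busy spindle.
// All inputs are illustrative figures from this thread, not measurements.
public class GetLatencyEstimate {
    public static void main(String[] args) {
        double seekMs = 10.0;             // assumed avg random-read latency of one SAS disk
        int fileIosPerGet = 4;            // index block + data block, each with its .crc
        int fileIosPerGetWithBloom = 6;   // worst case quoted above when bloom filters are on
        int concurrentGetsPerServer = 5;  // 40 client threads spread over 8 region servers

        double queuedIos = concurrentGetsPerServer * fileIosPerGet;
        double queuedIosBloom = concurrentGetsPerServer * fileIosPerGetWithBloom;

        // Each queued IO has to finish before the next one starts on the same disk.
        System.out.printf("~%.0f IOs queued per disk => ~%.0f ms per Get%n",
                queuedIos, queuedIos * seekMs);
        System.out.printf("with bloom filters: ~%.0f IOs => ~%.0f ms per Get%n",
                queuedIosBloom, queuedIosBloom * seekMs);
    }
}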
> >> ________________________________________
> >> From: Vladimir Rodionov
> >> Sent: Tuesday, October 08, 2013 7:24 PM
> >> To: [email protected]
> >> Subject: RE: HBase Random Read latency > 100ms
> >>
> >> Ramu,
> >>
> >> You have 8 server boxes and 10 clients. You have 40 requests in parallel - 5 per RS/DN?
> >>
> >> You have 5 random-read requests in an IO queue on your single RAID1. With an avg read latency of 10 ms, 5 requests in the queue will give us 30ms. Add some overhead from HDFS + HBase and you will probably have your issue explained.
> >>
> >> Your bottleneck is your disk system, I think. When you serve most requests from disks, as in your large data set scenario, make sure you have an adequate disk sub-system and that it is configured properly. The Block Cache and the OS page cache cannot help you in this case, as the working data set is larger than both caches.
> >>
> >> The good performance numbers in the small data set scenario are explained by the fact that the data fits into the OS page cache and Block Cache - you do not read data from disk even if you disable the block cache.
> >>
> >> Best regards,
> >> Vladimir Rodionov
> >> Principal Platform Engineer
> >> Carrier IQ, www.carrieriq.com
> >> e-mail: [email protected]
> >>
> >> ________________________________________
> >> From: Ramu M S [[email protected]]
> >> Sent: Tuesday, October 08, 2013 6:00 PM
> >> To: [email protected]
> >> Subject: Re: HBase Random Read latency > 100ms
> >>
> >> Hi All,
> >>
> >> After a few suggestions from the earlier mails I changed the following,
> >>
> >> 1. Heap Size to 16 GB
> >> 2. Block Size to 16 KB
> >> 3. HFile size to 8 GB (Table now has 256 regions, 32 per server)
> >> 4. Data Locality Index is 100 in all RS
> >>
> >> I have clients running on 10 machines, each with 4 threads, so 40 in total. This is the same in all tests.
> >>
> >> Result:
> >> 1. Average latency is still >100ms.
> >> 2. Heap occupancy is around 2-2.5 GB in all RS
> >>
> >> A few more tests carried out yesterday,
> >>
> >> TEST 1: Small data set (100 Million records, each with 724 bytes)
> >> ===========================================
> >> Configurations:
> >> 1. Heap Size to 1 GB
> >> 2. Block Size to 16 KB
> >> 3. HFile size to 1 GB (Table now has 128 regions, 16 per server)
> >> 4. Data Locality Index is 100 in all RS
> >>
> >> I disabled the Block Cache on the table, to make sure I read everything from disk, most of the time.
> >>
> >> Result:
> >> 1. Average Latency is 8 ms and throughput went up to 6K/sec per RS.
> >> 2. With Block Cache enabled again, I got an average latency around 2 ms and a throughput of 10K/sec per RS.
> >>    Heap occupancy around 650 MB.
> >> 3. With the Heap increased to 16 GB and Block Cache still enabled, I got an average latency around 1 ms and a throughput of 20K/sec per RS.
> >>    Heap occupancy around 2-2.5 GB in all RS.
> >>
> >> TEST 2: Large data set (1.8 Billion records, each with 724 bytes)
> >> ==================================================
> >> Configurations:
> >> 1. Heap Size to 1 GB
> >> 2. Block Size to 16 KB
> >> 3. HFile size to 1 GB (Table now has 2048 regions, 256 per server)
> >> 4. Data Locality Index is 100 in all RS
> >>
> >> Result:
> >> 1. Average Latency is > 500ms to start with and gradually decreases, but even after around 100 Million reads it is still >100 ms.
> >> 2. Block Cache = TRUE/FALSE does not make any difference here. Even the Heap Size (1 GB / 16 GB) does not make any difference.
> >> 3. Heap occupancy is around 2-2.5 GB under the 16 GB Heap and around 650 MB under the 1 GB Heap.
> >>
> >> GC time in all of the scenarios is around 2 ms/second, as shown in Cloudera Manager.
> >>
> >> Reading most of the items from disk in the smaller data set scenario gives better results and very low latencies.
> >>
> >> The number of regions per RS and the HFile size do make a huge difference in my cluster.
> >> Keeping 100 regions per RS as the max (most of the discussions suggest this), should I restrict the HFile size to 1 GB, and thus reduce the storage capacity (from 700 GB to 100 GB per RS)?
> >>
> >> Please advise.
> >>
> >> Thanks,
> >> Ramu
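For anyone trying to reproduce the runs above, here is a minimal sketch of the table-level knobs Ramu describes (16 KB blocks, block cache on/off, region max file size), written against the 0.94-era Java admin API. The table name "usertable" and family name "f" are made-up placeholders, and compression and bloom-filter settings are deliberately left out.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.client.HBaseAdmin;

public class CreateTestTable {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();

        HColumnDescriptor family = new HColumnDescriptor("f"); // placeholder family name
        family.setBlocksize(16 * 1024);       // 16 KB blocks, as in the tests above
        family.setBlockCacheEnabled(false);   // flip to true to repeat the cached runs

        HTableDescriptor table = new HTableDescriptor("usertable"); // placeholder table name
        table.addFamily(family);
        table.setMaxFileSize(1L * 1024 * 1024 * 1024); // ~1 GB regions => more regions per RS

        HBaseAdmin admin = new HBaseAdmin(conf);
        try {
            admin.createTable(table);
        } finally {
            admin.close();
        }
    }
}

Flipping setBlockCacheEnabled and setMaxFileSize is what separates the cached/uncached runs and the 1 GB vs 8 GB HFile variants discussed above.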
> >> On Wed, Oct 9, 2013 at 4:58 AM, Vladimir Rodionov <[email protected]> wrote:
> >>
> >> > What are your current heap and block cache sizes?
> >> >
> >> > Best regards,
> >> > Vladimir Rodionov
> >> > Principal Platform Engineer
> >> > Carrier IQ, www.carrieriq.com
> >> > e-mail: [email protected]
> >> >
> >> > ________________________________________
> >> > From: Ramu M S [[email protected]]
> >> > Sent: Monday, October 07, 2013 10:55 PM
> >> > To: [email protected]
> >> > Subject: Re: HBase Random Read latency > 100ms
> >> >
> >> > Hi All,
> >> >
> >> > Average Latency is still around 80ms.
> >> > I have done the following,
> >> >
> >> > 1. Enabled Snappy Compression
> >> > 2. Reduced the HFile size to 8 GB
> >> >
> >> > Should I attribute these results to a bad disk configuration, or is there anything else to investigate?
> >> >
> >> > - Ramu
> >> >
> >> > On Tue, Oct 8, 2013 at 10:56 AM, Ramu M S <[email protected]> wrote:
> >> >
> >> > > Vladimir,
> >> > >
> >> > > Thanks for the insights into the future caching features. Looks very interesting.
> >> > >
> >> > > - Ramu
> >> > >
> >> > > On Tue, Oct 8, 2013 at 10:45 AM, Vladimir Rodionov <[email protected]> wrote:
> >> > >
> >> > >> Ramu,
> >> > >>
> >> > >> If your working set of data fits into 192GB you may get an additional boost by utilizing the OS page cache, or wait until the 0.98 release, which introduces a new bucket cache implementation (a port of the Facebook L2 cache). You can try the vanilla bucket cache in 0.96 (not released yet, but due soon). Both caches store data off-heap, but the Facebook version can store encoded and compressed data and the vanilla bucket cache cannot. So there are some options for utilizing the available RAM efficiently (at least in upcoming HBase releases). If your data set does not fit in RAM, then your only hope is your 24 SAS drives, and much depends on your RAID settings, disk IO performance and HDFS configuration (I think the latest Hadoop is preferable here).
> >> > >>
> >> > >> The OS page cache is the most vulnerable and volatile: it cannot be controlled and can easily be polluted either by some other process or by HBase itself (a long scan).
> >> > >> With the Block Cache you have more control, but the first truly usable *official* implementation is going to be a part of the 0.98 release.
> >> > >>
> >> > >> As far as I understand, your use case would definitely be covered by something similar to the BigTable ScanCache (RowCache), but there is no such cache in HBase yet.
> >> > >> One major advantage of a RowCache vs the BlockCache (apart from being much more efficient in RAM usage) is resilience to Region compactions. Each minor Region compaction partially invalidates the Region's data in the BlockCache, and a major compaction invalidates this Region's data completely. This is not the case with a RowCache (were it to be implemented).
> >> > >>
> >> > >> Best regards,
> >> > >> Vladimir Rodionov
> >> > >> Principal Platform Engineer
> >> > >> Carrier IQ, www.carrieriq.com
> >> > >> e-mail: [email protected]
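As a rough sanity check on the "fits into RAM" point above, the sketch below uses only figures already quoted in this thread (724-byte records, 100 million vs 1.8 billion rows, 8 region servers) and ignores HFile and index overhead, so treat it as an illustration rather than a sizing tool.

// Rough per-region-server working-set size for the two data sets in this thread.
public class WorkingSetSize {
    public static void main(String[] args) {
        long recordBytes = 724L;
        int regionServers = 8;

        long smallRows = 100000000L;    // "small" data set: 100 million records
        long largeRows = 1800000000L;   // "large" data set: 1.8 billion records

        double gib = 1024.0 * 1024.0 * 1024.0;
        double smallPerRsGb = smallRows * recordBytes / (double) regionServers / gib;
        double largePerRsGb = largeRows * recordBytes / (double) regionServers / gib;

        // ~8-9 GB per RS fits comfortably in the OS page cache / block cache;
        // ~150 GB per RS does not, so most Gets in the large test hit disk.
        System.out.printf("small set: ~%.0f GB of raw KVs per region server%n", smallPerRsGb);
        System.out.printf("large set: ~%.0f GB of raw KVs per region server%n", largePerRsGb);
    }
}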
> >> > >> ________________________________________
> >> > >> From: Ramu M S [[email protected]]
> >> > >> Sent: Monday, October 07, 2013 5:25 PM
> >> > >> To: [email protected]
> >> > >> Subject: Re: HBase Random Read latency > 100ms
> >> > >>
> >> > >> Vladimir,
> >> > >>
> >> > >> Yes. I am fully aware of the HDD limitations and the wrong configuration wrt RAID.
> >> > >> Unfortunately, the hardware is leased from others for this work and I wasn't consulted on the h/w specification for the tests that I am doing now. Even the RAID cannot be turned off or set to RAID-0.
> >> > >>
> >> > >> The production system is specified according to the Hadoop needs (100 nodes with 16-core CPUs, 192 GB RAM and 24 x 600 GB SAS drives; RAID cannot be completely turned off, so we are creating 1 virtual disk containing only 1 physical disk, with the VD RAID level set to RAID-0). These systems are still not available. If you have any suggestions on the production setup, I will be glad to hear them.
> >> > >>
> >> > >> Also, as pointed out earlier, we are planning to use HBase also as an in-memory KV store to access the latest data. That's why the RAM was made so large in this configuration. But it looks like we would run into more problems than gains from this.
> >> > >>
> >> > >> Keeping that aside, I was trying to get the maximum out of the current cluster - or, as you said, is 500-1000 OPS the max I could get out of this setup?
> >> > >>
> >> > >> Regards,
> >> > >> Ramu
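On the closing question of whether 500-1000 OPS is the ceiling for this setup: a very rough bound can be sketched from the thread's own numbers (8 region servers, effectively one RAID1 volume per node, ~4 file IOs per uncached Get). The per-spindle IOPS figure below is an assumption about a typical SAS drive, not a measurement from this cluster.

// Very rough upper bound on uncached random Gets for the current test cluster.
public class RandomGetCeiling {
    public static void main(String[] args) {
        int servers = 8;
        int spindlesPerServer = 1;   // single RAID1 volume per node, as discussed above
        int iopsPerSpindle = 120;    // ASSUMPTION: typical random IOPS for one SAS drive
        int fileIosPerGet = 4;       // index + data block, each with a .crc read

        double clusterIops = servers * spindlesPerServer * (double) iopsPerSpindle;
        double getsPerSecond = clusterIops / fileIosPerGet;

        // Lands in the few-hundred-Gets/sec range, consistent with the 500-1000
        // figure quoted above once some cache hits are factored in.
        System.out.printf("~%.0f random IOPS cluster-wide => ~%.0f uncached Gets/sec%n",
                clusterIops, getsPerSecond);
    }
}

Anything much beyond that range has to come from the block cache or the OS page cache rather than from the disks.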
