On Wed, Oct 9, 2013 at 10:59 AM, Vladimir Rodionov <[email protected]> wrote:
> I can't say for SCR. There is a possibility that the feature is broken, of course.
> But the fact that hbase.regionserver.checksum.verify does not affect performance means that the OS caches HDFS checksum files effectively.

See "OS cache + SCR" VS "HBase CRC over OS cache+SCR" in this document I shared some time ago:
https://docs.google.com/spreadsheet/pub?key=0Ao87IrzZJSaydENaem5USWg4TlRKcHl0dEtTS2NBOUE&output=html

In an all-in-memory test it shows a pretty big difference.

J-D

>
> Best regards,
> Vladimir Rodionov
> Principal Platform Engineer
> Carrier IQ, www.carrieriq.com
> e-mail: [email protected]
>
> ________________________________________
> From: Ramu M S [[email protected]]
> Sent: Wednesday, October 09, 2013 12:11 AM
> To: [email protected]; lars hofhansl
> Subject: Re: HBase Random Read latency > 100ms
>
> Hi All,
>
> Sorry. There was some mistake in the tests (clients were not reduced; I forgot to change the parameter before running the tests).
>
> With 8 clients,
>
> SCR Enabled : Average Latency is 25 ms, IO Wait % is around 8
> SCR Disabled: Average Latency is 10 ms, IO Wait % is around 2
>
> Still, SCR disabled gives better results, which confuses me. Can anyone clarify?
>
> Also, I tried setting the parameter Lars suggested (hbase.regionserver.checksum.verify as true), with SCR disabled.
> Average Latency is around 9.8 ms, a fraction lower.
>
> Thanks
> Ramu
>
> On Wed, Oct 9, 2013 at 3:32 PM, Ramu M S <[email protected]> wrote:
>
> > Hi All,
> >
> > I just ran only 8 parallel clients,
> >
> > With SCR Enabled : Average Latency is 80 ms, IO Wait % is around 8
> > With SCR Disabled: Average Latency is 40 ms, IO Wait % is around 2
> >
> > I always thought SCR enabled allows a client co-located with the DataNode to read HDFS file blocks directly. This gives a performance boost to distributed clients that are aware of locality.
> >
> > Is my understanding wrong, or does it not apply to my scenario?
> >
> > Meanwhile I will try setting the parameter suggested by Lars and post the results.
> >
> > Thanks,
> > Ramu
> >
> > On Wed, Oct 9, 2013 at 2:29 PM, lars hofhansl <[email protected]> wrote:
> >
> >> Good call.
> >> You could try to enable hbase.regionserver.checksum.verify, which will cause HBase to do its own checksums rather than relying on HDFS (and which saves 1 IO per block get).
> >>
> >> I do think you can expect the index blocks to be cached at all times.
> >>
> >> -- Lars
> >> ________________________________
> >> From: Vladimir Rodionov <[email protected]>
> >> To: "[email protected]" <[email protected]>
> >> Sent: Tuesday, October 8, 2013 8:44 PM
> >> Subject: RE: HBase Random Read latency > 100ms
> >>
> >> Upd.
> >>
> >> Each HBase Get = 2 HDFS read IOs (index block + data block) = 4 file IOs (data + .crc) in the worst case. I think if the Bloom Filter is enabled then it is going to be 6 file IOs in the worst case (large data set); therefore you will have not 5 IO requests in the queue but up to 20-30 IO requests in the queue.
> >> This definitely explains > 100ms avg latency.
> >>
> >> Best regards,
> >> Vladimir Rodionov
> >> Principal Platform Engineer
> >> Carrier IQ, www.carrieriq.com
> >> e-mail: [email protected]
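To make the worst-case accounting above concrete, here is a small back-of-the-envelope sketch in Java. The 10 ms per-seek figure, the 4-6 file IOs per Get and the 5 concurrent Gets per server are the numbers quoted in this thread; treat the whole thing as an illustration, not a measurement.

// Rough estimate of uncached random-Get latency on a single busy spindle.
// All inputs are illustrative figures from this thread, not measurements.
public class GetLatencyEstimate {
    public static void main(String[] args) {
        double seekMs = 10.0;             // assumed avg random-read latency of one SAS disk
        int fileIosPerGet = 4;            // index block + data block, each with its .crc
        int fileIosPerGetWithBloom = 6;   // worst case quoted above when bloom filters are on
        int concurrentGetsPerServer = 5;  // 40 client threads spread over 8 region servers

        double queuedIos = concurrentGetsPerServer * fileIosPerGet;
        double queuedIosBloom = concurrentGetsPerServer * fileIosPerGetWithBloom;

        // Each queued IO has to finish before the next one starts on the same disk.
        System.out.printf("~%.0f IOs queued per disk => ~%.0f ms per Get%n",
                queuedIos, queuedIos * seekMs);
        System.out.printf("with bloom filters: ~%.0f IOs => ~%.0f ms per Get%n",
                queuedIosBloom, queuedIosBloom * seekMs);
    }
}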
> >> ________________________________________
> >> From: Vladimir Rodionov
> >> Sent: Tuesday, October 08, 2013 7:24 PM
> >> To: [email protected]
> >> Subject: RE: HBase Random Read latency > 100ms
> >>
> >> Ramu,
> >>
> >> You have 8 server boxes and 10 clients. You have 40 requests in parallel - 5 per RS/DN?
> >>
> >> You have 5 random-read requests in an IO queue on your single RAID1. With an avg read latency of 10 ms, 5 requests in the queue will give us 30ms. Add some overhead from HDFS + HBase and you will probably have your issue explained.
> >>
> >> Your bottleneck is your disk system, I think. When you serve most requests from disks, as in your large data set scenario, make sure you have an adequate disk sub-system and that it is configured properly. The Block Cache and the OS page cache cannot help you in this case, as the working data set is larger than both caches.
> >>
> >> The good performance numbers in the small data set scenario are explained by the fact that the data fits into the OS page cache and Block Cache - you do not read data from disk even if you disable the block cache.
> >>
> >> Best regards,
> >> Vladimir Rodionov
> >> Principal Platform Engineer
> >> Carrier IQ, www.carrieriq.com
> >> e-mail: [email protected]
> >>
> >> ________________________________________
> >> From: Ramu M S [[email protected]]
> >> Sent: Tuesday, October 08, 2013 6:00 PM
> >> To: [email protected]
> >> Subject: Re: HBase Random Read latency > 100ms
> >>
> >> Hi All,
> >>
> >> After a few suggestions from the earlier mails I changed the following,
> >>
> >> 1. Heap Size to 16 GB
> >> 2. Block Size to 16 KB
> >> 3. HFile size to 8 GB (Table now has 256 regions, 32 per server)
> >> 4. Data Locality Index is 100 in all RS
> >>
> >> I have clients running on 10 machines, each with 4 threads, so 40 in total. This is the same in all tests.
> >>
> >> Result:
> >> 1. Average latency is still >100ms.
> >> 2. Heap occupancy is around 2-2.5 GB in all RS
> >>
> >> A few more tests carried out yesterday,
> >>
> >> TEST 1: Small data set (100 Million records, each with 724 bytes)
> >> ===========================================
> >> Configurations:
> >> 1. Heap Size to 1 GB
> >> 2. Block Size to 16 KB
> >> 3. HFile size to 1 GB (Table now has 128 regions, 16 per server)
> >> 4. Data Locality Index is 100 in all RS
> >>
> >> I disabled the Block Cache on the table, to make sure I read everything from disk, most of the time.
> >>
> >> Result:
> >> 1. Average Latency is 8 ms and throughput went up to 6K/sec per RS.
> >> 2. With Block Cache enabled again, I got an average latency around 2 ms and a throughput of 10K/sec per RS.
> >>    Heap occupancy around 650 MB.
> >> 3. With the Heap increased to 16 GB and Block Cache still enabled, I got an average latency around 1 ms and a throughput of 20K/sec per RS.
> >>    Heap occupancy around 2-2.5 GB in all RS.
> >>
> >> TEST 2: Large data set (1.8 Billion records, each with 724 bytes)
> >> ==================================================
> >> Configurations:
> >> 1. Heap Size to 1 GB
> >> 2. Block Size to 16 KB
> >> 3. HFile size to 1 GB (Table now has 2048 regions, 256 per server)
> >> 4. Data Locality Index is 100 in all RS
> >>
> >> Result:
> >> 1. Average Latency is > 500ms to start with and gradually decreases, but even after around 100 Million reads it is still >100 ms.
> >> 2. Block Cache = TRUE/FALSE does not make any difference here. Even the Heap Size (1 GB / 16 GB) does not make any difference.
> >> 3. Heap occupancy is around 2-2.5 GB under the 16 GB Heap and around 650 MB under the 1 GB Heap.
> >>
> >> GC time in all of the scenarios is around 2 ms/second, as shown in Cloudera Manager.
> >>
> >> Reading most of the items from disk in the smaller data set scenario gives better results and very low latencies.
> >>
> >> The number of regions per RS and the HFile size do make a huge difference in my cluster.
> >> Keeping 100 regions per RS as the max (most of the discussions suggest this), should I restrict the HFile size to 1 GB, and thus reduce the storage capacity (from 700 GB to 100 GB per RS)?
> >>
> >> Please advise.
> >>
> >> Thanks,
> >> Ramu
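For anyone trying to reproduce the runs above, here is a minimal sketch of the table-level knobs Ramu describes (16 KB blocks, block cache on/off, region max file size), written against the 0.94-era Java admin API. The table name "usertable" and family name "f" are made-up placeholders, and compression and bloom-filter settings are deliberately left out.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.client.HBaseAdmin;

public class CreateTestTable {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();

        HColumnDescriptor family = new HColumnDescriptor("f"); // placeholder family name
        family.setBlocksize(16 * 1024);       // 16 KB blocks, as in the tests above
        family.setBlockCacheEnabled(false);   // flip to true to repeat the cached runs

        HTableDescriptor table = new HTableDescriptor("usertable"); // placeholder table name
        table.addFamily(family);
        table.setMaxFileSize(1L * 1024 * 1024 * 1024); // ~1 GB regions => more regions per RS

        HBaseAdmin admin = new HBaseAdmin(conf);
        try {
            admin.createTable(table);
        } finally {
            admin.close();
        }
    }
}

Flipping setBlockCacheEnabled and setMaxFileSize is what separates the cached/uncached runs and the 1 GB vs 8 GB HFile variants discussed above.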
> >> On Wed, Oct 9, 2013 at 4:58 AM, Vladimir Rodionov <[email protected]> wrote:
> >>
> >> > What are your current heap and block cache sizes?
> >> >
> >> > Best regards,
> >> > Vladimir Rodionov
> >> > Principal Platform Engineer
> >> > Carrier IQ, www.carrieriq.com
> >> > e-mail: [email protected]
> >> >
> >> > ________________________________________
> >> > From: Ramu M S [[email protected]]
> >> > Sent: Monday, October 07, 2013 10:55 PM
> >> > To: [email protected]
> >> > Subject: Re: HBase Random Read latency > 100ms
> >> >
> >> > Hi All,
> >> >
> >> > Average Latency is still around 80ms.
> >> > I have done the following,
> >> >
> >> > 1. Enabled Snappy Compression
> >> > 2. Reduced the HFile size to 8 GB
> >> >
> >> > Should I attribute these results to a bad disk configuration, or is there anything else to investigate?
> >> >
> >> > - Ramu
> >> >
> >> > On Tue, Oct 8, 2013 at 10:56 AM, Ramu M S <[email protected]> wrote:
> >> >
> >> > > Vladimir,
> >> > >
> >> > > Thanks for the insights into the future caching features. Looks very interesting.
> >> > >
> >> > > - Ramu
> >> > >
> >> > > On Tue, Oct 8, 2013 at 10:45 AM, Vladimir Rodionov <[email protected]> wrote:
> >> > >
> >> > >> Ramu,
> >> > >>
> >> > >> If your working set of data fits into 192GB you may get an additional boost by utilizing the OS page cache, or wait until the 0.98 release, which introduces a new bucket cache implementation (a port of the Facebook L2 cache). You can try the vanilla bucket cache in 0.96 (not released yet, but due soon). Both caches store data off-heap, but the Facebook version can store encoded and compressed data and the vanilla bucket cache cannot. So there are some options for utilizing the available RAM efficiently (at least in upcoming HBase releases). If your data set does not fit in RAM, then your only hope is your 24 SAS drives, and much depends on your RAID settings, disk IO performance and HDFS configuration (I think the latest Hadoop is preferable here).
> >> > >>
> >> > >> The OS page cache is the most vulnerable and volatile: it cannot be controlled and can easily be polluted either by some other process or by HBase itself (a long scan).
> >> > >> With the Block Cache you have more control, but the first truly usable *official* implementation is going to be a part of the 0.98 release.
> >> > >>
> >> > >> As far as I understand, your use case would definitely be covered by something similar to the BigTable ScanCache (RowCache), but there is no such cache in HBase yet.
> >> > >> One major advantage of a RowCache vs the BlockCache (apart from being much more efficient in RAM usage) is resilience to Region compactions. Each minor Region compaction partially invalidates the Region's data in the BlockCache, and a major compaction invalidates this Region's data completely. This is not the case with a RowCache (were it to be implemented).
> >> > >>
> >> > >> Best regards,
> >> > >> Vladimir Rodionov
> >> > >> Principal Platform Engineer
> >> > >> Carrier IQ, www.carrieriq.com
> >> > >> e-mail: [email protected]
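As a rough sanity check on the "fits into RAM" point above, the sketch below uses only figures already quoted in this thread (724-byte records, 100 million vs 1.8 billion rows, 8 region servers) and ignores HFile and index overhead, so treat it as an illustration rather than a sizing tool.

// Rough per-region-server working-set size for the two data sets in this thread.
public class WorkingSetSize {
    public static void main(String[] args) {
        long recordBytes = 724L;
        int regionServers = 8;

        long smallRows = 100000000L;    // "small" data set: 100 million records
        long largeRows = 1800000000L;   // "large" data set: 1.8 billion records

        double gib = 1024.0 * 1024.0 * 1024.0;
        double smallPerRsGb = smallRows * recordBytes / (double) regionServers / gib;
        double largePerRsGb = largeRows * recordBytes / (double) regionServers / gib;

        // ~8-9 GB per RS fits comfortably in the OS page cache / block cache;
        // ~150 GB per RS does not, so most Gets in the large test hit disk.
        System.out.printf("small set: ~%.0f GB of raw KVs per region server%n", smallPerRsGb);
        System.out.printf("large set: ~%.0f GB of raw KVs per region server%n", largePerRsGb);
    }
}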
> >> > >> ________________________________________
> >> > >> From: Ramu M S [[email protected]]
> >> > >> Sent: Monday, October 07, 2013 5:25 PM
> >> > >> To: [email protected]
> >> > >> Subject: Re: HBase Random Read latency > 100ms
> >> > >>
> >> > >> Vladimir,
> >> > >>
> >> > >> Yes. I am fully aware of the HDD limitations and the wrong configuration wrt RAID.
> >> > >> Unfortunately, the hardware is leased from others for this work and I wasn't consulted on the h/w specification for the tests that I am doing now. Even the RAID cannot be turned off or set to RAID-0.
> >> > >>
> >> > >> The production system is specified according to the Hadoop needs (100 nodes with 16-core CPUs, 192 GB RAM and 24 x 600 GB SAS drives; RAID cannot be completely turned off, so we are creating 1 virtual disk containing only 1 physical disk, with the VD RAID level set to RAID-0). These systems are still not available. If you have any suggestions on the production setup, I will be glad to hear them.
> >> > >>
> >> > >> Also, as pointed out earlier, we are planning to use HBase also as an in-memory KV store to access the latest data. That's why the RAM was made so large in this configuration. But it looks like we would run into more problems than gains from this.
> >> > >>
> >> > >> Keeping that aside, I was trying to get the maximum out of the current cluster - or, as you said, is 500-1000 OPS the max I could get out of this setup?
> >> > >>
> >> > >> Regards,
> >> > >> Ramu
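On the closing question of whether 500-1000 OPS is the ceiling for this setup: a very rough bound can be sketched from the thread's own numbers (8 region servers, effectively one RAID1 volume per node, ~4 file IOs per uncached Get). The per-spindle IOPS figure below is an assumption about a typical SAS drive, not a measurement from this cluster.

// Very rough upper bound on uncached random Gets for the current test cluster.
public class RandomGetCeiling {
    public static void main(String[] args) {
        int servers = 8;
        int spindlesPerServer = 1;   // single RAID1 volume per node, as discussed above
        int iopsPerSpindle = 120;    // ASSUMPTION: typical random IOPS for one SAS drive
        int fileIosPerGet = 4;       // index + data block, each with a .crc read

        double clusterIops = servers * spindlesPerServer * (double) iopsPerSpindle;
        double getsPerSecond = clusterIops / fileIosPerGet;

        // Lands in the few-hundred-Gets/sec range, consistent with the 500-1000
        // figure quoted above once some cache hits are factored in.
        System.out.printf("~%.0f random IOPS cluster-wide => ~%.0f uncached Gets/sec%n",
                clusterIops, getsPerSecond);
    }
}

Anything much beyond that range has to come from the block cache or the OS page cache rather than from the disks.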
