How many Parallel GC threads were you using? Regarding the block cache - just to check that I understood this right: if you are doing a massive read in HBase, is it better to turn off block caching through the Scan attribute?
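If that is the right takeaway, here is a minimal sketch of what I have in mind (assuming the 0.94/0.96-era Java client API; "mytable", "cf" and "q" are just placeholder names, not anything from this thread):

import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.util.Bytes;

public class FullScanNoCache {
  public static void main(String[] args) throws IOException {
    Configuration conf = HBaseConfiguration.create();
    HTable table = new HTable(conf, "mytable");   // placeholder table name
    try {
      Scan scan = new Scan();
      scan.setCacheBlocks(false);  // don't let the bulk scan evict hot blocks from the block cache
      scan.setCaching(1000);       // rows fetched per RPC; tune to the row size
      ResultScanner scanner = table.getScanner(scan);
      try {
        for (Result r : scanner) {
          byte[] v = r.getValue(Bytes.toBytes("cf"), Bytes.toBytes("q"));  // placeholder CF/qualifier
          // process v ...
        }
      } finally {
        scanner.close();
      }
    } finally {
      table.close();
    }
  }
}

The per-Scan switch seems safer for one-off bulk reads than disabling BLOCKCACHE on the column family, which would affect all reads.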
On Thursday, October 10, 2013, Otis Gospodnetic wrote:
> Hi Ramu,
>
> I think I saw mentions of this possibly being a GC issue.... though
> now it seems it may be a disk IO issue?
>
> 3 things:
> 1) http://blog.sematext.com/2013/06/24/g1-cms-java-garbage-collector/
> - our G1 experience, with HBase specifically
> 2) If you can share some of your performance graphs (GC, disk IO, JVM
> memory pools, HBase specific ones, etc.) people will likely be able to
> provide better help
> 3) You can do 2) with SPM (see sig), and actually you can send email
> to this ML with your graphs directly from SPM. :)
>
> Otis
> --
> Solr & ElasticSearch Support -- http://sematext.com/
> Performance Monitoring -- http://sematext.com/spm
>
>
> On Wed, Oct 9, 2013 at 3:11 AM, Ramu M S <[email protected]> wrote:
> > Hi All,
> >
> > Sorry. There was some mistake in the tests (Clients were not reduced,
> > forgot to change the parameter before running tests).
> >
> > With 8 Clients and,
> >
> > SCR Enabled : Average Latency is 25 ms, IO Wait % is around 8
> > SCR Disabled: Average Latency is 10 ms, IO Wait % is around 2
> >
> > Still, SCR disabled gives better results, which confuses me. Can anyone
> > clarify?
> >
> > Also, I tried setting the parameter (hbase.regionserver.checksum.verify as
> > true) Lars suggested with SCR disabled.
> > Average Latency is around 9.8 ms, a fraction lesser.
> >
> > Thanks
> > Ramu
> >
> > On Wed, Oct 9, 2013 at 3:32 PM, Ramu M S <[email protected]> wrote:
> >
> >> Hi All,
> >>
> >> I just ran only 8 parallel clients,
> >>
> >> With SCR Enabled : Average Latency is 80 ms, IO Wait % is around 8
> >> With SCR Disabled: Average Latency is 40 ms, IO Wait % is around 2
> >>
> >> I always thought SCR enabled, allows a client co-located with the DataNode
> >> to read HDFS file blocks directly. This gives a performance boost to
> >> distributed clients that are aware of locality.
> >>
> >> Is my understanding wrong OR it doesn't apply to my scenario?
> >>
> >> Meanwhile I will try setting the parameter suggested by Lars and post you
> >> the results.
> >>
> >> Thanks,
> >> Ramu
> >>
> >> On Wed, Oct 9, 2013 at 2:29 PM, lars hofhansl <[email protected]> wrote:
> >>
> >>> Good call.
> >>> Could try to enable hbase.regionserver.checksum.verify, which will cause
> >>> HBase to do its own checksums rather than relying on HDFS (and which saves
> >>> 1 IO per block get).
> >>>
> >>> I do think you can expect the index blocks to be cached at all times.
> >>>
> >>> -- Lars
> >>> ________________________________
> >>> From: Vladimir Rodionov <[email protected]>
> >>> To: "[email protected]" <[email protected]>
> >>> Sent: Tuesday, October 8, 2013 8:44 PM
> >>> Subject: RE: HBase Random Read latency > 100ms
> >>>
> >>> Upd.
> >>>
> >>> Each HBase Get = 2 HDFS read IO (index block + data block) = 4 File IO
> >>> (data + .crc) in a worst case. I think if Bloom Filter is enabled then
> >>> it is going to be 6 File IO in a worst case (large data set), therefore
> >>> you will have not 5 IO requests in queue but up to 20-30 IO requests in a
> >>> queue.
> >>> This definitely explains > 100ms avg latency.
> >>>
> >>> Best regards,
> >>> Vladimir Rodionov
> >>> Principal Platform Engineer
> >>> Carrier IQ, www.carrieriq.com
> >>> e-mail: [email protected]
> >>>
> >>> ________________________________________
> >>>
> >>> From: Vladimir Rodionov
> >>> Sent: Tuesday, October 08, 2013 7:24 PM
> >>> To: [email protected]
> >>> Subject: RE: HBase Random Read latency > 100ms
> >>>
> >>> Ramu,
> >>>
> >>> You have 8 server boxes and 10 clients. You have 40 requests in parallel -
> >>> 5 per RS/DN?
> >>>
> >>> You have 5 requests on random reads in an IO queue of your single RAID1.
> >>> With avg read latency of 10 ms, 5 requests in queue will give us 30ms. Add
> >>> some overhead
> >>> of HDFS + HBase and you will probably have your issue explained?
> >>>
> >>> Your bottleneck is your disk system, I think. When you serve most of
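For anyone landing on this thread later: the two settings discussed above would normally go into hbase-site.xml on the region servers (and need a restart). Only a sketch for illustration - the property names are the ones mentioned in the thread and in the Hadoop/HBase docs, the socket path is a placeholder that has to match the DataNode's hdfs-site.xml:

<!-- hbase-site.xml fragment on the region servers (illustrative values) -->
<property>
  <name>hbase.regionserver.checksum.verify</name>
  <value>true</value>  <!-- HBase verifies checksums itself, skipping the extra .crc read -->
</property>
<property>
  <name>dfs.client.read.shortcircuit</name>
  <value>true</value>  <!-- SCR: the region server reads local HDFS blocks directly from disk -->
</property>
<property>
  <name>dfs.domain.socket.path</name>
  <value>/var/lib/hadoop-hdfs/dn_socket</value>  <!-- placeholder; must match the DataNode config -->
</property>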
