Can you post a code snippet? A Pastebin link is fine. -Vlad
On Tue, Aug 11, 2015 at 4:03 PM, Serega Sheypak <serega.shey...@gmail.com> wrote:
> Probably I found something. Response time degrades as parallelism grows.
> What I did:
>
> 1. wrap the business logic controller into a Java main class. My controller does
> some logic and puts/gets to HBase, with checkAndPut (sometimes)
> 2. create an HConnection
> 3. pass the HConnection to the controller
> 4. wrap controller execution in Codahale metrics
> 5. execute the controller in several threads simultaneously. The same happens
> in the servlet environment
>
> I can't explain the result.
> 1. I used 10 threads and 100000 iterations in each.
>
> *RESULT: 99% <= 28.81 milliseconds, which sounds GOOD!*
> -- Meters ----------------------------------------------------------------------
> putMeter
>     count = 414914
>     mean rate = 885.58 events/second
>     1-minute rate = 911.56 events/second
>     5-minute rate = 778.16 events/second
>     15-minute rate = 549.72 events/second
>
> -- Timers ----------------------------------------------------------------------
> putTimer
>     count = 414914
>     mean rate = 884.66 calls/second
>     1-minute rate = 911.53 calls/second
>     5-minute rate = 765.60 calls/second
>     15-minute rate = 515.06 calls/second
>     min = 4.87 milliseconds
>     max = 211.77 milliseconds
>     mean = 10.81 milliseconds
>     stddev = 5.43 milliseconds
>     median = 10.34 milliseconds
>     75% <= 11.59 milliseconds
>     95% <= 14.41 milliseconds
>     98% <= 19.59 milliseconds
>     99% <= 28.81 milliseconds
>     99.9% <= 60.67 milliseconds
>
> I've increased the thread count to 100:
> *RESULT: 99% <= 112.09 milliseconds*
> -- Meters ----------------------------------------------------------------------
> putMeter
>     count = 1433056
>     mean rate = 2476.46 events/second
>     1-minute rate = 2471.18 events/second
>     5-minute rate = 2483.28 events/second
>     15-minute rate = 2512.52 events/second
>
> -- Timers ----------------------------------------------------------------------
> putTimer
>     count = 1433058
>     mean rate = 2474.61 calls/second
>     1-minute rate = 2468.45 calls/second
>     5-minute rate = 2446.45 calls/second
>     15-minute rate = 2383.23 calls/second
>     min = 10.03 milliseconds
>     max = 853.05 milliseconds
>     mean = 40.71 milliseconds
>     stddev = 39.04 milliseconds
>     median = 35.60 milliseconds
>     75% <= 47.69 milliseconds
>     95% <= 71.79 milliseconds
>     98% <= 85.83 milliseconds
>     99% <= 112.09 milliseconds
>     99.9% <= 853.05 milliseconds
>
> Is it possible to explain this? Could it be a problem in some
> pooling/threading inside HConnection?
>
> Please see what happened to compactions during the test:
> http://www.bigdatapath.com/wp-content/uploads/2015/08/compations.png
>
> get/put ops:
> http://www.bigdatapath.com/wp-content/uploads/2015/08/get_ops.png
>
> slow ops:
> http://www.bigdatapath.com/wp-content/uploads/2015/08/slow_ops.png
>
> 2015-08-11 23:43 GMT+02:00 Serega Sheypak <serega.shey...@gmail.com>:
>
> > >How about GC activity? ApplicationStopTime? Do you track that?
> > Yes, jvisualvm says it's OK, New Relic also doesn't show anything strange.
> > HBase also says it's OK.
> >
> > The profiler says most of the time the thread is waiting for a response from
> > the HBase side. My assumptions are:
> > 1. I have a weird bug in the HBase configuration
> > 2. I have undiscovered problems with networking (BUT the same tomcats
> > write data to Flume at a higher rate, no data loss at all)
> > 3. I have a weird problem with HConnection/HConnectionManager in a
> > multithreaded env, where the same servlet instance is shared across many threads
> > 4. some mystic process somewhere in the cluster....
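A minimal sketch of the harness described in steps 1-5 above, assuming the HBase 1.0 client API and Dropwizard/Codahale metrics on the classpath. The table name "table_01", column family "cf", the row-key scheme, and the plain put (rather than the controller's occasional checkAndPut) are placeholders, not the actual controller logic:

import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

import com.codahale.metrics.ConsoleReporter;
import com.codahale.metrics.MetricRegistry;
import com.codahale.metrics.Timer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.HConnection;
import org.apache.hadoop.hbase.client.HConnectionManager;
import org.apache.hadoop.hbase.client.HTableInterface;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.util.Bytes;

public class PutLoadTest {

    public static void main(String[] args) throws Exception {
        final int threads = Integer.parseInt(args[0]);     // e.g. 10, then 100
        final int iterations = Integer.parseInt(args[1]);  // e.g. 100000

        final MetricRegistry registry = new MetricRegistry();
        final Timer putTimer = registry.timer("putTimer");

        // One HConnection shared by all worker threads, as in the servlet setup.
        Configuration conf = HBaseConfiguration.create();
        final HConnection connection = HConnectionManager.createConnection(conf);

        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (int t = 0; t < threads; t++) {
            pool.submit(new Runnable() {
                @Override
                public void run() {
                    try (HTableInterface table =
                             connection.getTable(TableName.valueOf("table_01"))) {
                        for (int i = 0; i < iterations; i++) {
                            byte[] row = Bytes.toBytes(
                                Thread.currentThread().getId() + "-" + i);
                            Put put = new Put(row);
                            // ~1KB value, matching the write size reported in the thread.
                            put.addColumn(Bytes.toBytes("cf"), Bytes.toBytes("q"),
                                new byte[1024]);
                            Timer.Context ctx = putTimer.time(); // time only the HBase call
                            try {
                                table.put(put);
                            } finally {
                                ctx.stop();
                            }
                        }
                    } catch (Exception e) {
                        e.printStackTrace();
                    }
                }
            });
        }
        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.HOURS);

        // Dump putTimer percentiles in the same format as the reports above.
        ConsoleReporter.forRegistry(registry)
            .convertRatesTo(TimeUnit.SECONDS)
            .convertDurationsTo(TimeUnit.MILLISECONDS)
            .build()
            .report();

        connection.close();
    }
}

Running it with 10 and then 100 worker threads against the same shared HConnection should reproduce the two putTimer reports above.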
> >
> > >Is the issue reproducible, or did you get it the first time?
> > Always. Spikes disappear during the night, but RPM doesn't change much.
> >
> > I will run my controller code outside of Tomcat and see how it goes. I'm going
> > to isolate components...
> >
> >
> > 2015-08-11 23:36 GMT+02:00 Vladimir Rodionov <vladrodio...@gmail.com>:
> >
> >> How about GC activity? ApplicationStopTime? Do you track that?
> >>
> >> Is the issue reproducible, or did you get it the first time?
> >>
> >> Start with RS logs and try to find anything suspicious in a period of
> >> very high latency. 1.5 sec HBase write latency does not look right.
> >>
> >> -Vlad
> >>
> >> On Tue, Aug 11, 2015 at 2:08 PM, Serega Sheypak <serega.shey...@gmail.com>
> >> wrote:
> >>
> >> > Hi Vladimir!
> >> >
> >> > Here are graphs. Servlet (3 Tomcats on 3 different hosts) writes to HBase:
> >> > http://www.bigdatapath.com/wp-content/uploads/2015/08/01_apps1.png
> >> > See how the response time jumps. I can't explain it. Write load is really,
> >> > really low.
> >> >
> >> > All RS have an even load. I see request metrics in the HBase master web UI.
> >> > Tables are pre-split. I have 10 RS and tables pre-split into 50
> >> > regions.
> >> >
> >> > >1. How large is your single write?
> >> > 1-2KB
> >> >
> >> > >2. Do you see any RegionTooBusyException in the client log files
> >> > No HBase-related exceptions. Response
> >> >
> >> > >3. How large is your table (# of regions, # of column families)
> >> > 1 column family, table_01 150GB, table_02 130GB
> >> >
> >> > I have two "major tables", here are stats for them:
> >> > http://www.bigdatapath.com/wp-content/uploads/2015/08/table_02.png
> >> > http://www.bigdatapath.com/wp-content/uploads/2015/08/table_01.png
> >> >
> >> > >4. RS memory related config: Max heap
> >> > >5. memstore size (if not default - 0.4)
> >> > hbase.regionserver.global.memstore.upperLimit=0.4
> >> > hbase.regionserver.global.memstore.lowerLimit=0.38
> >> > RS heapsize=8GB
> >> >
> >> > >*Do you see any region splits?*
> >> > No, never happened, since the tables are pre-split.
> >> >
> >> > 2015-08-11 18:54 GMT+02:00 Vladimir Rodionov <vladrodio...@gmail.com>:
> >> >
> >> > > *Common questions:*
> >> > >
> >> > > 1. How large is your single write?
> >> > > 2. Do you see any RegionTooBusyException in the client log files
> >> > > 3. How large is your table (# of regions, # of column families)
> >> > > 4. RS memory related config: Max heap
> >> > > 5. memstore size (if not default - 0.4)
> >> > >
> >> > > Memstore flush
> >> > >
> >> > > hbase.hregion.memstore.flush.size = 256M
> >> > > hbase.hregion.memstore.block.multiplier = N (do not block writes); N * 256M
> >> > > MUST be greater than the overall memstore size (HBASE_HEAPSIZE *
> >> > > hbase.regionserver.global.memstore.size)
> >> > >
> >> > > WAL files.
> >> > >
> >> > > Set the HDFS block size to 256MB. hbase.regionserver.hlog.blocksize = 0.95 * HDFS
> >> > > block size (256MB * 0.95). Keep hbase.regionserver.hlog.blocksize *
> >> > > hbase.regionserver.maxlogs just a bit above
> >> > > hbase.regionserver.global.memstore.lowerLimit (0.35-0.45) * HBASE_HEAPSIZE
> >> > > to avoid premature memstore flushing.
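For concreteness, a hedged sketch of the sizing arithmetic behind these memstore/WAL recommendations, plugging in the 8GB RS heap and the 0.4/0.38 global memstore limits mentioned earlier in the thread. The computed values are illustrative only; the actual settings belong in hbase-site.xml on the region servers:

public class MemstoreWalSizing {

    public static void main(String[] args) {
        long heapBytes = 8L * 1024 * 1024 * 1024;  // HBASE_HEAPSIZE = 8GB (from the thread)
        double upperLimit = 0.40;                  // hbase.regionserver.global.memstore.upperLimit
        double lowerLimit = 0.38;                  // hbase.regionserver.global.memstore.lowerLimit
        long flushSize = 256L * 1024 * 1024;       // hbase.hregion.memstore.flush.size = 256M

        // block.multiplier: N * flush.size must exceed the global memstore share,
        // so a single hot region is never blocked before the global limit applies.
        long globalMemstore = (long) (heapBytes * upperLimit);     // ~3.2GB
        long blockMultiplier = globalMemstore / flushSize + 1;     // 13

        // WAL: hlog.blocksize ~ 0.95 * HDFS block size (256MB), and
        // blocksize * maxlogs should sit just above lowerLimit * heap,
        // otherwise WAL rolling forces memstore flushes prematurely.
        long hlogBlockSize = (long) (0.95 * 256L * 1024 * 1024);   // ~243MB
        long maxLogs = (long) Math.ceil(heapBytes * lowerLimit / (double) hlogBlockSize); // 13

        System.out.println("hbase.hregion.memstore.flush.size = " + flushSize);
        System.out.println("hbase.hregion.memstore.block.multiplier = " + blockMultiplier);
        System.out.println("hbase.regionserver.hlog.blocksize = " + hlogBlockSize);
        System.out.println("hbase.regionserver.maxlogs = " + maxLogs);
    }
}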
> >> > > *Do you see any region splits?*
> >> > >
> >> > > A region split blocks writes. Try to pre-split the table and avoid splitting
> >> > > after that. Disable splitting completely:
> >> > >
> >> > > hbase.regionserver.region.split.policy =
> >> > > org.apache.hadoop.hbase.regionserver.DisabledRegionSplitPolicy
> >> > >
> >> > > -Vlad
> >> > >
> >> > > On Tue, Aug 11, 2015 at 3:22 AM, Serega Sheypak <serega.shey...@gmail.com>
> >> > > wrote:
> >> > >
> >> > > > Hi, we are using version 1.0.0+cdh5.4.4+160.
> >> > > > We have a heavy write load, ~10K per second.
> >> > > > We have 10 nodes with 7 disks each. I read some perf notes; they state that
> >> > > > HBase can handle 1K writes per second per node without any problems.
> >> > > >
> >> > > > I see some spikes on "writers". Write operation timing "jumps" from 40-50ms
> >> > > > to 200-500ms. Probably I hit the memstore limit: the RegionServer starts to
> >> > > > flush the memstore and stops accepting updates.
> >> > > >
> >> > > > I have several questions:
> >> > > > 1. Can a 4-CPU (8 with hyperthreading) + 7-HDD node absorb 1K writes per
> >> > > > second?
> >> > > > 2. What is the right way to fight blocked writes?
> >> > > > 2.1. What I did:
> >> > > > hbase.hregion.memstore.flush.size to 256M to produce larger HFiles when
> >> > > > flushing the memstore
> >> > > > hbase.hregion.memstore.block.multiplier to 4, since I have only one
> >> > > > write-intensive table. Let it grow.
> >> > > > hbase.regionserver.optionallogflushinterval to 10s; I CAN lose some data,
> >> > > > NP here. The idea is to reduce I/O pressure on the disks.
> >> > > > ===
> >> > > > Not sure if I can correctly play with these parameters:
> >> > > > hbase.hstore.blockingStoreFiles=10
> >> > > > hbase.hstore.compactionThreshold=3
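Creating a pre-split table with splitting disabled, as Vlad suggests above, could look roughly like the sketch below. It assumes the HBase 1.0 client; the table name "table_01", family "cf", the 50-region count, and the HexStringSplit algorithm are guesses at the poster's schema, not confirmed details:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Admin;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.util.RegionSplitter;

public class CreatePresplitTable {

    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        try (Connection connection = ConnectionFactory.createConnection(conf);
             Admin admin = connection.getAdmin()) {

            HTableDescriptor htd = new HTableDescriptor(TableName.valueOf("table_01"));
            htd.addFamily(new HColumnDescriptor("cf"));

            // The table is created pre-split, so never let it split again on its own.
            htd.setRegionSplitPolicyClassName(
                "org.apache.hadoop.hbase.regionserver.DisabledRegionSplitPolicy");

            // 50 regions over 10 region servers, matching the figures in the thread.
            byte[][] splitKeys = new RegionSplitter.HexStringSplit().split(50);
            admin.createTable(htd, splitKeys);
        }
    }
}

The split policy can also be set cluster-wide via hbase.regionserver.region.split.policy in hbase-site.xml, as in Vlad's snippet.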