We currently use a patched 0.20.6 with HBASE-2473 in production.
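
For anyone finding this thread later, here is a minimal sketch of what
pre-split table creation via HBASE-2473 looks like through the client API.
This assumes a 0.90-style HBaseAdmin; the table name, column family and key
range below are placeholders, and exact signatures may differ slightly
between versions.

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.HColumnDescriptor;
    import org.apache.hadoop.hbase.HTableDescriptor;
    import org.apache.hadoop.hbase.client.HBaseAdmin;
    import org.apache.hadoop.hbase.util.Bytes;

    public class PreSplitTableExample {
      public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        HBaseAdmin admin = new HBaseAdmin(conf);

        // Placeholder table layout - substitute your own table name and family.
        HTableDescriptor desc = new HTableDescriptor("usertable");
        desc.addFamily(new HColumnDescriptor("family"));

        // HBASE-2473: create the table pre-split into 16 regions spread evenly
        // between a start key and an end key, instead of starting with one region.
        admin.createTable(desc, Bytes.toBytes("user0"), Bytes.toBytes("user9"), 16);
      }
    }

This avoids having to script splits after the fact: the table starts out
spread across the region servers.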

On Thu, Nov 18, 2010 at 9:45 PM, Suraj Varma <svarma...@gmail.com> wrote:

> Ah, that's a nice feature - thanks for pointing this out. Once I get on
> 0.90, I can probably use this instead of scripting splits.
>
> Thanks,
> --Suraj
>
> On Thu, Nov 18, 2010 at 11:13 AM, Ted Yu <yuzhih...@gmail.com> wrote:
>
> > FYI You can preconfigure the number of regions when you create your table:
> > https://issues.apache.org/jira/browse/HBASE-2473
> >
> > On Thu, Nov 18, 2010 at 11:10 AM, Suraj Varma <svarma...@gmail.com> wrote:
> >
> > > What I'm trying to find is how much improvement in throughput & reduction
> > > in latency I can hope to get by spreading out a table across multiple
> > > region servers.
> > >
> > > We have some tables that are wide but short ... each currently fits in a
> > > single region on a single region server. I'm trying to determine if it is
> > > worth splitting them out further and distributing them across multiple
> > > region servers (in an attempt to reduce any network / single-server
> > > bottlenecks).
> > >
> > > Since it is just an experiment on my dev cluster, I'm doing this manually
> > > (planning to script it if I see a substantial improvement).
> > > --Suraj
> > >
> > > On Tue, Nov 16, 2010 at 2:46 PM, Michael Segel <michael_se...@hotmail.com> wrote:
> > >
> > > >
> > > > Ok,
> > > > Silly question... why are you manually splitting your regions?
> > > >
> > > > I mean what do you hope to gain?
> > > >
> > > > -Mike
> > > >
> > > >
> > > > > Date: Tue, 16 Nov 2010 13:37:10 -0800
> > > > > Subject: Re: Confusing Region Split behavior in YCSB Tests
> > > > > From: svarma...@gmail.com
> > > > > To: user@hbase.apache.org
> > > > >
> > > > > Ok - so, I ran another series of tests on the region split behavior
> > > > > against our 0.20.6 cluster. This time, as suggested, I ran a split of
> > > > > the table, followed by a flush and finally a major_compact of the
> > > > > table, before each of the region-split runs. I split the original
> > > > > region into 2, 4, 8, 16 and 32 regions via the split command.
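> > > > >
> > > > > For reference, a rough sketch of how this split / flush / major_compact
> > > > > sequence can be driven from the HBaseAdmin client API instead of the
> > > > > shell or web UI. The table name below is a placeholder, and these admin
> > > > > calls are asynchronous, so a real script would need to wait for the
> > > > > split to complete before flushing and compacting.
> > > > >
> > > > >     import org.apache.hadoop.conf.Configuration;
> > > > >     import org.apache.hadoop.hbase.HBaseConfiguration;
> > > > >     import org.apache.hadoop.hbase.client.HBaseAdmin;
> > > > >
> > > > >     public class SplitFlushCompact {
> > > > >       public static void main(String[] args) throws Exception {
> > > > >         Configuration conf = HBaseConfiguration.create();
> > > > >         HBaseAdmin admin = new HBaseAdmin(conf);
> > > > >         String table = "usertable";  // placeholder table name
> > > > >
> > > > >         admin.split(table);          // ask the table's region(s) to split
> > > > >         admin.flush(table);          // flush memstores to store files
> > > > >         admin.majorCompact(table);   // rewrite store files post-split
> > > > >       }
> > > > >     }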
> > > > >
> > > > > Here are the results on pastebin - the clients are the same 25 JVMs,
> > > > > each with 7 worker threads, against the 5-RS cluster mentioned in my
> > > > > earlier email.
> > > > >
> > > > > http://pastebin.com/4D03hjEy
> > > > >
> > > > > The results were the same - the throughput drop and latency increase
> > > > > are substantial (graphing this in Excel etc. really brings it out) as
> > > > > the table goes from 1 region to 2 regions. After that, as the regions
> > > > > split further, the numbers hover around the same level. The 1-region
> > > > > timings stand well apart from the split-region timings.
> > > > >
> > > > > Thanks,
> > > > > --Suraj
> > > > >
> > > > > On Mon, Nov 15, 2010 at 11:06 AM, Suraj Varma <svarma...@gmail.com> wrote:
> > > > >
> > > > > > Hmm - no, I didn't do a major compact after the splits.
> > > > > >
> > > > > > The data size was fairly small - and I expected it to be loaded into
> > > > > > the block cache (size set to 0.6) ...
> > > > > >
> > > > > > Let me run a re-test by doing a major compact as well to see how it
> > > > > > behaves. Thanks for the idea.
> > > > > > --Suraj
> > > > > >
> > > > > >
> > > > > > On Fri, Nov 12, 2010 at 3:24 PM, Todd Lipcon <t...@cloudera.com> wrote:
> > > > > >
> > > > > >> One thought:
> > > > > >>
> > > > > >> When you do splits, do you also major compact after the splits before
> > > > > >> starting the read test? Maybe it's possible that when you just have one
> > > > > >> region it's all local data, but as you split it, it's non-local. With
> > > > > >> the small data sizes, it all fits in cache so it doesn't matter, but
> > > > > >> with the big data size, you're doing non-local reads and they don't fit
> > > > > >> in cache.
> > > > > >>
> > > > > >> -Todd
> > > > > >>
> > > > > >> On Fri, Nov 12, 2010 at 1:12 PM, Suraj Varma <svarma...@gmail.com>
> > > > > >> wrote:
> > > > > >>
> > > > > >> > 1) 100% random reads. Sorry - I should have mentioned that.
> > > > > >> > 2) Hmm ... I was using 0.1.2; I had to fix the hbase db client to
> > > > > >> > work with hbase-0.20.6 ..., so I had taken a git pull on Oct 18th. I
> > > > > >> > see that as of Oct 26th, the README file has been updated with the
> > > > > >> > 0.1.3 update (with the 0.20.6 fixes as well).
> > > > > >> > 3) Ok, I'll try out 0.90-RC from trunk since that's what we planned
> > > > > >> > to use next. I ran it against 0.20.6 since that's what our
> > > > > >> > environment has now.
> > > > > >> > 4) Yes - heap is default, 1000m. Good question though - I went back
> > > > > >> > and looked at our environment and I'm sorry I made a mistake in my
> > > > > >> > previous mail. The test environment boxes (running the RS/DN) have
> > > > > >> > only 4GB RAM ... I confused it with another environment. Sorry!
> > > > > >> > So - because the boxes have only 4GB RAM, we could only give it the
> > > > > >> > defaults.
> > > > > >> >
> > > > > >> > Thanks,
> > > > > >> > --Suraj
> > > > > >> >
> > > > > >> >
> > > > > >> > On Fri, Nov 12, 2010 at 11:52 AM, Jean-Daniel Cryans
> > > > > >> > <jdcry...@apache.org> wrote:
> > > > > >> >
> > > > > >> > > Weird results indeed, nothing comes to mind at the moment that
> > > > > >> > > could explain this behavior. I do have a few questions/comments:
> > > > > >> > >
> > > > > >> > >  - It's not clear to me from your email what kind of operation you
> > > > > >> > > were doing. Was it 100% random reads? Or a mix of reads and writes?
> > > > > >> > > Any scans?
> > > > > >> > >  - Which version of YCSB were you using? The latest one, 1.3, has a
> > > > > >> > > fix for client-side performance in HBase (YCSB was deserializing
> > > > > >> > > the Results wrong).
> > > > > >> > >  - Would you be able to give the latest version of 0.89 a try? Or
> > > > > >> > > even maybe trunk, for which we are in the process of branching and
> > > > > >> > > cutting an RC? 0.20.6 is now quite old and I'm not sure how useful
> > > > > >> > > it would be to try to optimize it.
> > > > > >> > >  - How much heap did you give to HBase? You said default
> > > > > >> > > configurations, so 1000m? Why not more?
> > > > > >> > >
> > > > > >> > > Thanks,
> > > > > >> > >
> > > > > >> > > J-D
> > > > > >> > >
> > > > > >> > > On Fri, Nov 12, 2010 at 10:52 AM, Suraj Varma
> > > > > >> > > <svarma...@gmail.com> wrote:
> > > > > >> > > > Hello All:
> > > > > >> > > > I'm using YCSB to execute tests on the effects of region splits,
> > > > > >> > > > and I'm seeing some confusing results.
> > > > > >> > > >
> > > > > >> > > > The aim of the experiment was to determine how much of an impact
> > > > > >> > > > on throughput, average latency, and 95th-percentile latency is
> > > > > >> > > > observed when, for the same load, the regions are split and
> > > > > >> > > > distributed across region servers. I expected to see the
> > > > > >> > > > throughput increase and the 95th-percentile latency decrease (as
> > > > > >> > > > a trend) as the regions are distributed.
> > > > > >> > > >
> > > > > >> > > > For the 1K row size, this worked as expected - the throughput
> > > > > >> > > > increased and latency decreased as the regions were spread out.
> > > > > >> > > >
> > > > > >> > > > However, for the 100K and 500K row sizes (closer to what we plan
> > > > > >> > > > to have in our HBase table), the results were the opposite: the
> > > > > >> > > > throughput and latency results were best for the 1-region case
> > > > > >> > > > (i.e. all data fits in one region) and degraded quite a bit when
> > > > > >> > > > the region split and got distributed to two region servers. As
> > > > > >> > > > the region is split further, there is improvement over the
> > > > > >> > > > first-split case in the 500K run, but not as much in the 100K
> > > > > >> > > > run - in either case, the results never approach the one-region
> > > > > >> > > > case.
> > > > > >> > > >
> > > > > >> > > > During the 1-region runs, the CPU on the region server hosting
> > > > > >> > > > the table is pegged at 88-90% and the load average hovers between
> > > > > >> > > > 3.2-4.2 (1m); as the test proceeds, the load average drops off to
> > > > > >> > > > closer to 3, till it ends. During the split-region runs, the CPU
> > > > > >> > > > on the region servers keeps dropping: 52% (4-region split), 30%
> > > > > >> > > > (8-region split), ... 8% (32-region split), while the load was
> > > > > >> > > > fairly low (around 0.2 - 0.5 on each box). These are rough
> > > > > >> > > > numbers, just to give an idea of how the RSs were behaving.
> > > > > >> > > >
> > > > > >> > > > *Test Environment:*
> > > > > >> > > > Apache HBase 0.20.6 (+ HBASE-2599 patch), Apache Hadoop 0.20.2,
> > > > > >> > > > 5 RS/DN, 4-core, 8GB boxes (using default HBase configurations,
> > > > > >> > > > 1000m JVM heaps for RS/DN)
> > > > > >> > > >
> > > > > >> > > > *YCSB Clients:*
> > > > > >> > > > 5 (4-core, 8GB) hosts, each with 5 YCSB client processes
> > > > > >> > > > (Xmx1024m) and each YCSB process with threadcount=7 (so, 25
> > > > > >> > > > client JVMs with 7 threads each hitting 5 RS)
> > > > > >> > > >
> > > > > >> > > > *YCSB configuration:*
> > > > > >> > > > recordcount=<varying so that the entire table fits in a single
> > > > > >> > > > region at the start of the test> ... e.g. 400 * 500K rows,
> > > > > >> > > > 2000 * 100K rows, 250000 * 1K rows
> > > > > >> > > > operationcount=250000
> > > > > >> > > > threadcount=7
> > > > > >> > > > requestdistribution=<tried first with zipfian and then with
> > > > > >> > > > uniform - same results>; the results below are 1K, 100K = zipfian
> > > > > >> > > > and 500K = uniform (though I had tried zipfian earlier with
> > > > > >> > > > pretty much similar trends)
> > > > > >> > > >
> > > > > >> > > > Row size => tried with 1K (10 cols of 100 bytes each), 100K (10
> > > > > >> > > > cols with 10K bytes each), 500K (10 cols with 50K bytes each)
> > > > > >> > > >
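> > > > > >> > > > For reference, a rough sketch of what the 100K-row workload file
> > > > > >> > > > would look like with these settings. The property names are the
> > > > > >> > > > standard YCSB CoreWorkload / HBase-binding ones; the table name,
> > > > > >> > > > column family and exact values are placeholders for this setup.
> > > > > >> > > >
> > > > > >> > > >     # YCSB core workload; 100K rows = 10 fields x 10K bytes
> > > > > >> > > >     workload=com.yahoo.ycsb.workloads.CoreWorkload
> > > > > >> > > >     table=usertable
> > > > > >> > > >     columnfamily=family
> > > > > >> > > >     recordcount=2000
> > > > > >> > > >     fieldcount=10
> > > > > >> > > >     fieldlength=10240
> > > > > >> > > >
> > > > > >> > > >     # 100% random reads
> > > > > >> > > >     operationcount=250000
> > > > > >> > > >     readproportion=1.0
> > > > > >> > > >     updateproportion=0
> > > > > >> > > >     scanproportion=0
> > > > > >> > > >     insertproportion=0
> > > > > >> > > >     requestdistribution=zipfian
> > > > > >> > > >     threadcount=7
> > > > > >> > > >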
> > > > > >> > > > *Test Method:*
> > > > > >> > > > 1) Start with an HTable fitting in 1 region;
> > > > > >> > > > 2) Run the YCSB test;
> > > > > >> > > > 3) Split the region using the web UI;
> > > > > >> > > > 4) Run the same YCSB test;
> > > > > >> > > > 5) Keep repeating for 1, 4, 8, 16, 32 regions. Some splits were
> > > > > >> > > > ignored if the regions didn't get distributed across RSs.
> > > > > >> > > > 6) Tried it with 1K, 100K and 500K row sizes and got similar
> > > > > >> > > > results.
> > > > > >> > > >
> > > > > >> > > > *Expected Result:*
> > > > > >> > > > Throughput increases as regions spread across region servers;
> > > > > >> > > > average latency decreases as well.
> > > > > >> > > >
> > > > > >> > > > *Actual Result:*
> > > > > >> > > > The 1K row size gives the expected results.
> > > > > >> > > > For the 100K and 500K row sizes, the single-region case gives the
> > > > > >> > > > highest throughput (ops/sec) and the lowest average and
> > > > > >> > > > 95th-percentile latency.
> > > > > >> > > >
> > > > > >> > > > Here's a snippet and I'm pastebin-ing a larger dataset to keep
> > > > > >> > > > the mail short.
> > > > > >> > > >
> > > > > >> > > > (e.g. 100K row size - 1 row, 10 fields each with 10K size)
> > > > > >> > > > Throughput (ops/sec); Row size = 100K
> > > > > >> > > > --------------------------------------
> > > > > >> > > > ycsbjvm, 1 REG, 8 REG,    32 REG
> > > > > >> > > > ycsbjvm1, 209.2164009,    129.5659024,    129.1324537
> > > > > >> > > > ycsbjvm2, 207.6238078,    130.4403405,    132.0112843
> > > > > >> > > > ycsbjvm3, 215.2983856,    129.5349652,    130.1714966
> > > > > >> > > > ycsbjvm4, 209.3266172,    128.5535964,    130.7984592
> > > > > >> > > > ycsbjvm5, 209.1666591,    128.8050626,    132.4272952
> > > > > >> > > >
> > > > > >> > > > (Throughput decreases from 1 region to 8 regions, and increases
> > > > > >> > > > slightly from 8 regions to 32 regions)
> > > > > >> > > >
> > > > > >> > > > 95-th Percentile Latency (ms); Row size = 100K
> > > > > >> > > > ycsbjvm, 1 REG, 8 REG,    32 REG
> > > > > >> > > > ycsbjvm1, 218,    221,    240
> > > > > >> > > > ycsbjvm2, 219,    221,    229
> > > > > >> > > > ycsbjvm3, 217,    221,    235
> > > > > >> > > > ycsbjvm4, 219,    221,    236
> > > > > >> > > > ycsbjvm5, 219,    222,    229
> > > > > >> > > > ycsbjvm6, 219,    221,    228
> > > > > >> > > >
> > > > > >> > > > (95th-percentile latency increases from 1 region to 8 regions to
> > > > > >> > > > 32 regions)
> > > > > >> > > >
> > > > > >> > > > *Pastebin of full results from all client YCSB JVMs:*
> > > > > >> > > > 1K Row Size Run Results: http://pastebin.com/X2HXn99T
> > > > > >> > > > 100K Row Size Run Results: http://pastebin.com/wjQzux3c
> > > > > >> > > > 500K Row Size Run Results: http://pastebin.com/6Tz1GpPP
> > > > > >> > > >
> > > > > >> > > > Would like your ideas / inputs on how I can explain this
> > > > > >> > > > behavior; let me know if you want additional details.
> > > > > >> > > > Thanks in advance.
> > > > > >> > > > --Suraj
> > > > > >> > > >
> > > > > >> > >
> > > > > >> >
> > > > > >>
> > > > > >>
> > > > > >>
> > > > > >> --
> > > > > >> Todd Lipcon
> > > > > >> Software Engineer, Cloudera
> > > > > >>
> > > > > >
> > > > > >
> > > >
> > > >
> > >
> >
>
