Hello all, I have 13 RegionServers, and the table was presplit into 13 regions (which is what motivated my comment that I had aligned my queries with the regionservers -- obviously not accurate). I have been testing partitioned scans using multiples of 13.
Here is my current region setup -- I converted the row identifiers to logical identifiers known to my code to make them easier to read. There are 26 regions split roughly evenly over a logical range of 0-996. The region boundaries are below. The splits aren't perfect, so I assume that for a 26-partition scan, each scan will read most of one region plus a small slice of another (possibly touching three regions in a few cases). I have given each server an index and marked its datacenter with an index as well -- so the first three regions are on server #9 in datacenter #1, and so on.

0 region range=[EMPTY]~id:38 server index=9 dc=1
1 region range=id:38~id:77 server index=9 dc=1
2 region range=id:77~id:115 server index=9 dc=1
3 region range=id:115~id:154 server index=10 dc=3
4 region range=id:154~id:192 server index=3 dc=2
5 region range=id:192~id:231 server index=10 dc=3
6 region range=id:231~id:269 server index=10 dc=3
7 region range=id:269~id:308 server index=10 dc=3
8 region range=id:308~id:346 server index=1 dc=1
9 region range=id:346~id:385 server index=4 dc=2
10 region range=id:385~id:423 server index=2 dc=2
11 region range=id:423~id:462 server index=5 dc=3
12 region range=id:462~id:500 server index=11 dc=1
13 region range=id:500~id:539 server index=10 dc=3
14 region range=id:539~id:577 server index=11 dc=1
15 region range=id:577~id:616 server index=11 dc=1
16 region range=id:616~id:654 server index=6 dc=2
17 region range=id:654~id:693 server index=12 dc=3
18 region range=id:693~id:730 server index=8 dc=1
19 region range=id:730~id:769 server index=4 dc=2
20 region range=id:769~id:806 server index=12 dc=3
21 region range=id:806~id:845 server index=11 dc=1
22 region range=id:845~id:883 server index=9 dc=1
23 region range=id:883~id:921 server index=12 dc=3
24 region range=id:921~id:958 server index=0 dc=3
25 region range=id:958~[EMPTY] server index=12 dc=3

Here are my scan ranges -- the end of each range is exclusive. Each scan range corresponds to a partition number, in sequence.

0 scan range=id:0~id:39
1 scan range=id:39~id:78
2 scan range=id:78~id:117
3 scan range=id:117~id:156
4 scan range=id:156~id:195
5 scan range=id:195~id:234
6 scan range=id:234~id:273
7 scan range=id:273~id:312
8 scan range=id:312~id:351
9 scan range=id:351~id:389
10 scan range=id:389~id:427
11 scan range=id:427~id:465
12 scan range=id:465~id:503
13 scan range=id:503~id:541
14 scan range=id:541~id:579
15 scan range=id:579~id:617
16 scan range=id:617~id:655
17 scan range=id:655~id:693
18 scan range=id:693~id:731
19 scan range=id:731~id:769
20 scan range=id:769~id:807
21 scan range=id:807~id:845
22 scan range=id:845~id:883
23 scan range=id:883~id:921
24 scan range=id:921~id:959
25 scan range=id:959~id:997
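
For context, each partition is just a plain range scan over its slice. Below is a simplified sketch of what the utility does per partition -- the table name and the literal "id:N" start/stop keys are placeholders (my real code encodes the actual row keys), but the setCaching(5000)/setCacheBlocks(false) settings are what I use:

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

public class PartitionScan {

  // Scan one partition's [startRow, stopRow) slice and report rows, elapsed
  // seconds and ops/sec, which is what the log lines below show.
  static void scanPartition(Connection conn, int partition, byte[] startRow, byte[] stopRow)
      throws IOException {
    Scan scan = new Scan(startRow, stopRow); // stop row is exclusive
    scan.setCaching(5000);                   // rows fetched per RPC
    scan.setCacheBlocks(false);              // do not populate the block cache
    long rows = 0;
    long startNs = System.nanoTime();
    try (Table table = conn.getTable(TableName.valueOf("mytable")); // placeholder table name
         ResultScanner scanner = table.getScanner(scan)) {
      for (Result r : scanner) {
        rows++;                              // no per-row work, just drain the scanner
      }
    }
    double sec = (System.nanoTime() - startNs) / 1e9;
    System.out.printf("partition_number:%d; rows:%d elapsed_sec:%.3f ops/sec:%.3f%n",
        partition, rows, sec, rows / sec);
  }

  public static void main(String[] args) throws IOException {
    Configuration conf = HBaseConfiguration.create();
    try (Connection conn = ConnectionFactory.createConnection(conf)) {
      // e.g. partition 0 of the 26-way split above; the real keys are not the
      // literal "id:N" strings shown in the tables
      scanPartition(conn, 0, Bytes.toBytes("id:0"), Bytes.toBytes("id:39"));
    }
  }
}

Each partition runs that same loop over its own start/stop pair.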

I ran the utility twice on the 26-range split.

First run:
total # partitions:26; partition_number:0; rows:1775368 elapsed_sec:433.525 ops/sec:4095.191742114065
total # partitions:26; partition_number:1; rows:1774447 elapsed_sec:427.884 ops/sec:4147.028166512419
total # partitions:26; partition_number:2; rows:1774755 elapsed_sec:674.357 ops/sec:2631.773674774637
total # partitions:26; partition_number:3; rows:1776706 elapsed_sec:376.375 ops/sec:4720.573895715709
total # partitions:26; partition_number:4; rows:1777409 elapsed_sec:47.484 ops/sec:37431.74543003959
total # partitions:26; partition_number:5; rows:1779412 elapsed_sec:39.443 ops/sec:45113.50556499252
total # partitions:26; partition_number:6; rows:1773312 elapsed_sec:407.076 ops/sec:4356.2184948265185
total # partitions:26; partition_number:7; rows:1773790 elapsed_sec:436.169 ops/sec:4066.749356327479
total # partitions:26; partition_number:8; rows:1773311 elapsed_sec:617.88 ops/sec:2869.99255518871
total # partitions:26; partition_number:9; rows:1727384 elapsed_sec:27.348 ops/sec:63163.0832236361
total # partitions:26; partition_number:10; rows:1734960 elapsed_sec:64.383 ops/sec:26947.48613764503
total # partitions:26; partition_number:11; rows:1729728 elapsed_sec:381.221 ops/sec:4537.336610522505
total # partitions:26; partition_number:12; rows:1732999 elapsed_sec:59.049 ops/sec:29348.49023692188
total # partitions:26; partition_number:13; rows:1731442 elapsed_sec:369.975 ops/sec:4679.889181701466
total # partitions:26; partition_number:14; rows:1732026 elapsed_sec:698.667 ops/sec:2479.0436645784043
total # partitions:26; partition_number:15; rows:1727737 elapsed_sec:398.964 ops/sec:4330.558646895459
total # partitions:26; partition_number:16; rows:1728357 elapsed_sec:695.636 ops/sec:2484.570953774675
total # partitions:26; partition_number:17; rows:1733250 elapsed_sec:392.337 ops/sec:4417.758202769558
total # partitions:26; partition_number:18; rows:1742667 elapsed_sec:410.238 ops/sec:4247.94143887207
total # partitions:26; partition_number:19; rows:1742777 elapsed_sec:374.509 ops/sec:4653.498313792192
total # partitions:26; partition_number:20; rows:1742520 elapsed_sec:44.447 ops/sec:39204.445744369696
total # partitions:26; partition_number:21; rows:1742248 elapsed_sec:404.692 ops/sec:4305.120931473812
total # partitions:26; partition_number:22; rows:1743288 elapsed_sec:420.886 ops/sec:4141.948175990648
total # partitions:26; partition_number:23; rows:1741106 elapsed_sec:374.782 ops/sec:4645.650004535971
total # partitions:26; partition_number:24; rows:1738634 elapsed_sec:34.059 ops/sec:51047.71132446637
total # partitions:26; partition_number:25; rows:1736522 elapsed_sec:38.873 ops/sec:44671.67442698017

Second run:
total # partitions:26; partition_number:0; rows:1775368 elapsed_sec:687.482 ops/sec:2582.4210670243006
total # partitions:26; partition_number:1; rows:1774447 elapsed_sec:687.132 ops/sec:2582.3961043875124
total # partitions:26; partition_number:2; rows:1774755 elapsed_sec:57.592 ops/sec:30815.998749826365
total # partitions:26; partition_number:3; rows:1776706 elapsed_sec:369.096 ops/sec:4813.6690725448125
total # partitions:26; partition_number:4; rows:1777409 elapsed_sec:49.486 ops/sec:35917.41098492503
total # partitions:26; partition_number:5; rows:1779412 elapsed_sec:39.008 ops/sec:45616.59146841673
total # partitions:26; partition_number:6; rows:1773312 elapsed_sec:405.897 ops/sec:4368.871905926873
total # partitions:26; partition_number:7; rows:1773790 elapsed_sec:368.597 ops/sec:4812.274652262498
total # partitions:26; partition_number:8; rows:1773311 elapsed_sec:401.001 ops/sec:4422.210917179758
total # partitions:26; partition_number:9; rows:1727384 elapsed_sec:381.702 ops/sec:4525.477990683832
total # partitions:26; partition_number:10; rows:1734960 elapsed_sec:659.132 ops/sec:2632.1890000788917
total # partitions:26; partition_number:11; rows:1729728 elapsed_sec:65.535 ops/sec:26393.95742732891
total # partitions:26; partition_number:12; rows:1732999 elapsed_sec:367.694 ops/sec:4713.15550430521
total # partitions:26; partition_number:13; rows:1731442 elapsed_sec:394.045 ops/sec:4394.020987450672
total # partitions:26; partition_number:14; rows:1732026 elapsed_sec:692.832 ops/sec:2499.9220590272967
total # partitions:26; partition_number:15; rows:1727737 elapsed_sec:637.833 ops/sec:2708.760757126082
total # partitions:26; partition_number:16; rows:1728357 elapsed_sec:35.41 ops/sec:48809.85597288902
total # partitions:26; partition_number:17; rows:1733250 elapsed_sec:348.711 ops/sec:4970.448308197906
total # partitions:26; partition_number:18; rows:1742667 elapsed_sec:747.852 ops/sec:2330.2297780844337
total # partitions:26; partition_number:19; rows:1742777 elapsed_sec:21.719 ops/sec:80242.04613472075
total # partitions:26; partition_number:20; rows:1742520 elapsed_sec:41.713 ops/sec:41774.02728166279
total # partitions:26; partition_number:21; rows:1742248 elapsed_sec:398.595 ops/sec:4370.97304281288
total # partitions:26; partition_number:22; rows:1743288 elapsed_sec:416.296 ops/sec:4187.61650364164
total # partitions:26; partition_number:23; rows:1741106 elapsed_sec:376.67 ops/sec:4622.364403854833
total # partitions:26; partition_number:24; rows:1738634 elapsed_sec:34.025 ops/sec:51098.72152828803
total # partitions:26; partition_number:25; rows:1736522 elapsed_sec:38.673 ops/sec:44902.69697204768

I can't make rhyme or reason of it. The scan ranges don't line up perfectly with the regions, but some scans that could in theory cross multiple datacenters come back fast, while among the scans confined to a single regionserver some are fast and some are slow. Does anyone with more experience have insight?
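
For what it's worth, the region-to-server mapping shown above can be pulled straight from the client API with something like the sketch below (the table name is a placeholder, and my real code also translates the binary row keys back to the logical id values shown here):

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HRegionLocation;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.RegionLocator;
import org.apache.hadoop.hbase.util.Bytes;

public class DumpRegionLocations {
  public static void main(String[] args) throws IOException {
    Configuration conf = HBaseConfiguration.create();
    try (Connection conn = ConnectionFactory.createConnection(conf);
         RegionLocator locator = conn.getRegionLocator(TableName.valueOf("mytable"))) { // placeholder
      int i = 0;
      for (HRegionLocation loc : locator.getAllRegionLocations()) {
        byte[] start = loc.getRegionInfo().getStartKey();
        byte[] end = loc.getRegionInfo().getEndKey();
        // Print region index, key range and hosting server, similar to the table above
        System.out.printf("%d region range=%s~%s server=%s%n",
            i++,
            start.length == 0 ? "[EMPTY]" : Bytes.toStringBinary(start),
            end.length == 0 ? "[EMPTY]" : Bytes.toStringBinary(end),
            loc.getServerName());
      }
    }
  }
}

An individual scan range can be mapped the same way via locator.getRegionLocation(startRow).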

On Fri, Mar 25, 2016 at 9:20 AM, Stack <st...@duboce.net> wrote:

> On Fri, Mar 25, 2016 at 3:50 AM, Ted Yu <yuzhih...@gmail.com> wrote:
>
> > James:
> > Another experiment you can do is to enable region replica - HBASE-10070.
> >
> > This would bring down the read variance greatly.
> >
>
> Suggest you NOT do this James.
>
> Lets figure your issue as-is rather than compound by adding yet more moving parts.
>
> St.Ack
>
> > On Mar 25, 2016, at 2:41 AM, Nicolas Liochon <nkey...@gmail.com> wrote:
> > >
> > > The read path is much more complex than the write one, so the response time has much more variance.
> > > The gap is so wide here that I would bet on Ted's or Stack's points, but here are a few other sources of variance:
> > > - hbase cache: as Anoop said, may be the data is already in the hbase cache (setCacheBlocks(false), means "don't add blocks to the cache", not "don't use the cache")
> > > - OS cache: and if the data is not in HBase cache may be it is in the operating system cache (for example if you run the test multiple times)
> > > - data locality: if you're lucky the data is local to the region server. If you're not, the reads need an extra network hoop.
> > > - number of store: more hfiles/stores per region => slower reads.
> > > - number of versions and so on: sub case of the previous one: if the rows have been updated multiple times and the compaction has not ran yet, you will read much more data.
> > > - (another subcase): the data has not been flushed yet and is available in the memstore => fast read.
> > >
> > > None of these points has any importance for the the write path. Basically the writes variance says nothing about the variance you will get on the reads.
> > >
> > > IIRC, locality and number of stores are visible in HBase UI. Doing a table flush and then running a major compaction generally helps to stabilize response time when you do a test. But it should not explain the x25 you're seeing, there is something else somewhere else. I don't get the regionserver boundaries you're mentioning: there is no boundary between regionservers. A regionserver can host A->D and M->S while another hosts D->M and S->Z for example.
> > >
> > >> On Fri, Mar 25, 2016 at 6:51 AM, Anoop John <anoop.hb...@gmail.com> wrote:
> > >>
> > >> I see you set cacheBlocks to be false on the Scan. By any chance on some other RS(s), the data you are looking for is already in cache? (Any previous scan or by cache on write) And there are no concurrent writes any way right? This much difference in time ! One possibility is blocks avail or not avail in cache..
> > >>
> > >> -Anoop-
> > >>
> > >>> On Fri, Mar 25, 2016 at 11:04 AM, Stack <st...@duboce.net> wrote:
> > >>> On Thu, Mar 24, 2016 at 4:45 PM, James Johansville <james.johansvi...@gmail.com> wrote:
> > >>>
> > >>>> Hello all,
> > >>>>
> > >>>> So, I wrote a Java application for HBase that does a partitioned full-table scan according to a set number of partitions. For example, if there are 20 partitions specified, then 20 separate full scans are launched that cover an equal slice of the row identifier range.
> > >>>>
> > >>>> The rows are uniformly distributed throughout the RegionServers.
> > >>>
> > >>> How many RegionServers? How many Regions? Are Regions evenly distributed across the servers? If you put all partitions on one machine and then run your client, do the timings even out?
> > >>>
> > >>> The disparity seems really wide.
> > >>>
> > >>> St.Ack
> > >>>
> > >>>> I confirmed this through the hbase shell. I have only one column family, and each row has the same number of column qualifiers.
> > >>>>
> > >>>> My problem is that the individual scan performance is wildly inconsistent even though they fetch approximately a similar number of rows. This inconsistency appears to be random with respect to hosts or regionservers or partitions or CPU cores. I am the only user of the fleet and not running any other concurrent HBase operation.
> > >>>>
> > >>>> I started measuring from the beginning of the scan and stopped measuring after the scan was completed. I am not doing any logic with the results, just scanning them.
> > >>>>
> > >>>> For ~230K rows fetched per scan, I am getting anywhere from 4 seconds to 100+ seconds. This seems a little too bouncy for me. Does anyone have any insight?
> > >>>> By comparison, a similar utility I wrote to upsert to regionservers was very consistent in ops/sec and I had no issues with it.
> > >>>>
> > >>>> Using 13 partitions on a machine that has 32 CPU cores and 16 GB heap, I see anywhere between 3K ops/sec to 82K ops/sec. Here's an example of log output I saved that used 130 partitions.
> > >>>>
> > >>>> total # partitions:130; partition id:47; rows:232730 elapsed_sec:6.401 ops/sec:36358.38150289017
> > >>>> total # partitions:130; partition id:100; rows:206890 elapsed_sec:6.636 ops/sec:31176.91380349608
> > >>>> total # partitions:130; partition id:63; rows:233437 elapsed_sec:7.586 ops/sec:30772.08014764039
> > >>>> total # partitions:130; partition id:9; rows:232585 elapsed_sec:32.985 ops/sec:7051.235410034865
> > >>>> total # partitions:130; partition id:19; rows:234192 elapsed_sec:38.733 ops/sec:6046.3170939508955
> > >>>> total # partitions:130; partition id:1; rows:232860 elapsed_sec:48.479 ops/sec:4803.316900101075
> > >>>> total # partitions:130; partition id:125; rows:205334 elapsed_sec:41.911 ops/sec:4899.286583474505
> > >>>> total # partitions:130; partition id:123; rows:206622 elapsed_sec:42.281 ops/sec:4886.875901705258
> > >>>> total # partitions:130; partition id:54; rows:232811 elapsed_sec:49.083 ops/sec:4743.210480206996
> > >>>>
> > >>>> I use setCacheBlocks(false), setCaching(5000). Does anyone have any insight into how I can make the read performance more consistent?
> > >>>>
> > >>>> Thanks!
> > >>
> > >