Hello all,

Thank you for the help. It was network latency between the datacenters that caused it; once I started forcing each scan to run in the same datacenter as the regionserver hosting its region, the performance issues disappeared. I am surprised the writes did not show nearly as much variance, even though they were crossing datacenters as well.
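In case it helps anyone who hits the same symptom: the fix boiled down to asking the client where each region lives and then handing each scan partition to a worker in the same datacenter as that host. A minimal sketch along those lines (not my actual utility; the table name is a placeholder), using the standard RegionLocator API:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HRegionLocation;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.RegionLocator;
import org.apache.hadoop.hbase.util.Bytes;

public class RegionLocalityCheck {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        // "my_table" is a placeholder -- substitute the real table name.
        try (Connection conn = ConnectionFactory.createConnection(conf);
             RegionLocator locator = conn.getRegionLocator(TableName.valueOf("my_table"))) {
            // One line per region: start key and the regionserver currently hosting it.
            for (HRegionLocation loc : locator.getAllRegionLocations()) {
                System.out.println("startKey="
                        + Bytes.toStringBinary(loc.getRegionInfo().getStartKey())
                        + " host=" + loc.getHostname());
            }
        }
    }
}

With that region-to-host map in hand, assigning partitions is just a matter of matching each scan range's start key to the hosting server's datacenter.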
On Sat, Mar 26, 2016 at 11:00 PM, <la...@apache.org> wrote:
> Hi James,
>
> The low numbers are _way_ slow. Way lower than what we typically see. I assume the region servers have spinning drives? How many? Can you report metrics like IOWAIT? Also, as Nicolas mentioned, can you report on the data locality index on each region server?
>
> What's the networking between the machines? I assume 1ge? And all NICs are good (none reports a lower link speed?)
>
> Is it always the same servers that are slow, or does it vary?
>
> Lastly, how big are the rows?
>
> Thanks.
>
> -- Lars
>
> From: James Johansville <james.johansvi...@gmail.com>
> To: user@hbase.apache.org
> Sent: Friday, March 25, 2016 12:23 PM
> Subject: Re: Inconsistent scan performance
>
> Hello all,
>
> I have 13 RegionServers and presplit the table into 13 regions (which motivated my comment that I aligned my queries with the regionservers, which obviously isn't accurate). I have been testing using a multiple of 13 for partitioned scans.
>
> Here is my current region setup -- I converted the row identifier to a logical identifier known to my code for easier reading. There are 26 regions split roughly evenly over a logical range of 0-996.
>
> Here are the region boundaries. The splits aren't perfect, so I assume that for a 26-partition scan, each scan is going to read most of one region and a tiny bit of another region (maybe even a third). I have given each server an index and marked its datacenter with an index as well -- so the first three regions are on server #9, and so on.
>
> 0 region range=[EMPTY]~id:38 server index=9 dc=1
> 1 region range=id:38~id:77 server index=9 dc=1
> 2 region range=id:77~id:115 server index=9 dc=1
> 3 region range=id:115~id:154 server index=10 dc=3
> 4 region range=id:154~id:192 server index=3 dc=2
> 5 region range=id:192~id:231 server index=10 dc=3
> 6 region range=id:231~id:269 server index=10 dc=3
> 7 region range=id:269~id:308 server index=10 dc=3
> 8 region range=id:308~id:346 server index=1 dc=1
> 9 region range=id:346~id:385 server index=4 dc=2
> 10 region range=id:385~id:423 server index=2 dc=2
> 11 region range=id:423~id:462 server index=5 dc=3
> 12 region range=id:462~id:500 server index=11 dc=1
> 13 region range=id:500~id:539 server index=10 dc=3
> 14 region range=id:539~id:577 server index=11 dc=1
> 15 region range=id:577~id:616 server index=11 dc=1
> 16 region range=id:616~id:654 server index=6 dc=2
> 17 region range=id:654~id:693 server index=12 dc=3
> 18 region range=id:693~id:730 server index=8 dc=1
> 19 region range=id:730~id:769 server index=4 dc=2
> 20 region range=id:769~id:806 server index=12 dc=3
> 21 region range=id:806~id:845 server index=11 dc=1
> 22 region range=id:845~id:883 server index=9 dc=1
> 23 region range=id:883~id:921 server index=12 dc=3
> 24 region range=id:921~id:958 server index=0 dc=3
> 25 region range=id:958~[EMPTY] server index=12 dc=3
>
> Here are my scan ranges -- the end isn't inclusive. Each scan range corresponds to a particular partition number in sequence.
> 0 scan range=id:0~id:39
> 1 scan range=id:39~id:78
> 2 scan range=id:78~id:117
> 3 scan range=id:117~id:156
> 4 scan range=id:156~id:195
> 5 scan range=id:195~id:234
> 6 scan range=id:234~id:273
> 7 scan range=id:273~id:312
> 8 scan range=id:312~id:351
> 9 scan range=id:351~id:389
> 10 scan range=id:389~id:427
> 11 scan range=id:427~id:465
> 12 scan range=id:465~id:503
> 13 scan range=id:503~id:541
> 14 scan range=id:541~id:579
> 15 scan range=id:579~id:617
> 16 scan range=id:617~id:655
> 17 scan range=id:655~id:693
> 18 scan range=id:693~id:731
> 19 scan range=id:731~id:769
> 20 scan range=id:769~id:807
> 21 scan range=id:807~id:845
> 22 scan range=id:845~id:883
> 23 scan range=id:883~id:921
> 24 scan range=id:921~id:959
> 25 scan range=id:959~id:997
>
> I ran the utility twice with 26 partitions.
>
> total # partitions:26; partition_number:0; rows:1775368 elapsed_sec:433.525 ops/sec:4095.191742114065
> total # partitions:26; partition_number:1; rows:1774447 elapsed_sec:427.884 ops/sec:4147.028166512419
> total # partitions:26; partition_number:2; rows:1774755 elapsed_sec:674.357 ops/sec:2631.773674774637
> total # partitions:26; partition_number:3; rows:1776706 elapsed_sec:376.375 ops/sec:4720.573895715709
> total # partitions:26; partition_number:4; rows:1777409 elapsed_sec:47.484 ops/sec:37431.74543003959
> total # partitions:26; partition_number:5; rows:1779412 elapsed_sec:39.443 ops/sec:45113.50556499252
> total # partitions:26; partition_number:6; rows:1773312 elapsed_sec:407.076 ops/sec:4356.2184948265185
> total # partitions:26; partition_number:7; rows:1773790 elapsed_sec:436.169 ops/sec:4066.749356327479
> total # partitions:26; partition_number:8; rows:1773311 elapsed_sec:617.88 ops/sec:2869.99255518871
> total # partitions:26; partition_number:9; rows:1727384 elapsed_sec:27.348 ops/sec:63163.0832236361
> total # partitions:26; partition_number:10; rows:1734960 elapsed_sec:64.383 ops/sec:26947.48613764503
> total # partitions:26; partition_number:11; rows:1729728 elapsed_sec:381.221 ops/sec:4537.336610522505
> total # partitions:26; partition_number:12; rows:1732999 elapsed_sec:59.049 ops/sec:29348.49023692188
> total # partitions:26; partition_number:13; rows:1731442 elapsed_sec:369.975 ops/sec:4679.889181701466
> total # partitions:26; partition_number:14; rows:1732026 elapsed_sec:698.667 ops/sec:2479.0436645784043
> total # partitions:26; partition_number:15; rows:1727737 elapsed_sec:398.964 ops/sec:4330.558646895459
> total # partitions:26; partition_number:16; rows:1728357 elapsed_sec:695.636 ops/sec:2484.570953774675
> total # partitions:26; partition_number:17; rows:1733250 elapsed_sec:392.337 ops/sec:4417.758202769558
> total # partitions:26; partition_number:18; rows:1742667 elapsed_sec:410.238 ops/sec:4247.94143887207
> total # partitions:26; partition_number:19; rows:1742777 elapsed_sec:374.509 ops/sec:4653.498313792192
> total # partitions:26; partition_number:20; rows:1742520 elapsed_sec:44.447 ops/sec:39204.445744369696
> total # partitions:26; partition_number:21; rows:1742248 elapsed_sec:404.692 ops/sec:4305.120931473812
> total # partitions:26; partition_number:22; rows:1743288 elapsed_sec:420.886 ops/sec:4141.948175990648
> total # partitions:26; partition_number:23; rows:1741106 elapsed_sec:374.782 ops/sec:4645.650004535971
> total # partitions:26; partition_number:24; rows:1738634 elapsed_sec:34.059 ops/sec:51047.71132446637
> total # partitions:26; partition_number:25; rows:1736522 elapsed_sec:38.873 ops/sec:44671.67442698017
>
> total # partitions:26; partition_number:0; rows:1775368 elapsed_sec:687.482 ops/sec:2582.4210670243006
> total # partitions:26; partition_number:1; rows:1774447 elapsed_sec:687.132 ops/sec:2582.3961043875124
> total # partitions:26; partition_number:2; rows:1774755 elapsed_sec:57.592 ops/sec:30815.998749826365
> total # partitions:26; partition_number:3; rows:1776706 elapsed_sec:369.096 ops/sec:4813.6690725448125
> total # partitions:26; partition_number:4; rows:1777409 elapsed_sec:49.486 ops/sec:35917.41098492503
> total # partitions:26; partition_number:5; rows:1779412 elapsed_sec:39.008 ops/sec:45616.59146841673
> total # partitions:26; partition_number:6; rows:1773312 elapsed_sec:405.897 ops/sec:4368.871905926873
> total # partitions:26; partition_number:7; rows:1773790 elapsed_sec:368.597 ops/sec:4812.274652262498
> total # partitions:26; partition_number:8; rows:1773311 elapsed_sec:401.001 ops/sec:4422.210917179758
> total # partitions:26; partition_number:9; rows:1727384 elapsed_sec:381.702 ops/sec:4525.477990683832
> total # partitions:26; partition_number:10; rows:1734960 elapsed_sec:659.132 ops/sec:2632.1890000788917
> total # partitions:26; partition_number:11; rows:1729728 elapsed_sec:65.535 ops/sec:26393.95742732891
> total # partitions:26; partition_number:12; rows:1732999 elapsed_sec:367.694 ops/sec:4713.15550430521
> total # partitions:26; partition_number:13; rows:1731442 elapsed_sec:394.045 ops/sec:4394.020987450672
> total # partitions:26; partition_number:14; rows:1732026 elapsed_sec:692.832 ops/sec:2499.9220590272967
> total # partitions:26; partition_number:15; rows:1727737 elapsed_sec:637.833 ops/sec:2708.760757126082
> total # partitions:26; partition_number:16; rows:1728357 elapsed_sec:35.41 ops/sec:48809.85597288902
> total # partitions:26; partition_number:17; rows:1733250 elapsed_sec:348.711 ops/sec:4970.448308197906
> total # partitions:26; partition_number:18; rows:1742667 elapsed_sec:747.852 ops/sec:2330.2297780844337
> total # partitions:26; partition_number:19; rows:1742777 elapsed_sec:21.719 ops/sec:80242.04613472075
> total # partitions:26; partition_number:20; rows:1742520 elapsed_sec:41.713 ops/sec:41774.02728166279
> total # partitions:26; partition_number:21; rows:1742248 elapsed_sec:398.595 ops/sec:4370.97304281288
> total # partitions:26; partition_number:22; rows:1743288 elapsed_sec:416.296 ops/sec:4187.61650364164
> total # partitions:26; partition_number:23; rows:1741106 elapsed_sec:376.67 ops/sec:4622.364403854833
> total # partitions:26; partition_number:24; rows:1738634 elapsed_sec:34.025 ops/sec:51098.72152828803
> total # partitions:26; partition_number:25; rows:1736522 elapsed_sec:38.673 ops/sec:44902.69697204768
>
> I can't find any rhyme or reason to it. There seems to be an imperfect partition-to-region correspondence, but some queries that could in theory cross multiple datacenters appear fast. Some scans that are restricted to only one regionserver are fast, some are slow. Does anyone with more experience have insight?
>
> On Fri, Mar 25, 2016 at 9:20 AM, Stack <st...@duboce.net> wrote:
>
> > On Fri, Mar 25, 2016 at 3:50 AM, Ted Yu <yuzhih...@gmail.com> wrote:
> >
> > > James:
> > > Another experiment you can do is to enable region replica - HBASE-10070.
> > >
> > > This would bring down the read variance greatly.
> >
> > Suggest you NOT do this, James.
> >
> > Let's figure your issue as-is rather than compound it by adding yet more moving parts.
> >
> > St.Ack
> >
> >
> > > On Mar 25, 2016, at 2:41 AM, Nicolas Liochon <nkey...@gmail.com> wrote:
> > >
> > > > The read path is much more complex than the write one, so the response time has much more variance.
> > > > The gap is so wide here that I would bet on Ted's or Stack's points, but here are a few other sources of variance:
> > > > - hbase cache: as Anoop said, maybe the data is already in the hbase cache (setCacheBlocks(false) means "don't add blocks to the cache", not "don't use the cache")
> > > > - OS cache: if the data is not in the HBase cache, maybe it is in the operating system cache (for example if you run the test multiple times)
> > > > - data locality: if you're lucky the data is local to the region server. If you're not, the reads need an extra network hop.
> > > > - number of stores: more hfiles/stores per region => slower reads.
> > > > - number of versions and so on: a subcase of the previous one: if the rows have been updated multiple times and the compaction has not run yet, you will read much more data.
> > > > - (another subcase): the data has not been flushed yet and is available in the memstore => fast read.
> > > >
> > > > None of these points has any importance for the write path. Basically, the write variance says nothing about the variance you will get on the reads.
> > > >
> > > > IIRC, locality and number of stores are visible in the HBase UI. Doing a table flush and then running a major compaction generally helps to stabilize response time when you do a test. But it should not explain the 25x you're seeing; there is something else somewhere else. I don't get the regionserver boundaries you're mentioning: there is no boundary between regionservers. A regionserver can host A->D and M->S while another hosts D->M and S->Z, for example.
> > > >
> > > >> On Fri, Mar 25, 2016 at 6:51 AM, Anoop John <anoop.hb...@gmail.com> wrote:
> > > >>
> > > >> I see you set cacheBlocks to be false on the Scan. By any chance, on some other RS(s), is the data you are looking for already in cache? (Any previous scan, or cache-on-write.) And there are no concurrent writes anyway, right? This much difference in time! One possibility is blocks being available or not available in cache.
> > > >>
> > > >> -Anoop-
> > > >>
> > > >>> On Fri, Mar 25, 2016 at 11:04 AM, Stack <st...@duboce.net> wrote:
> > > >>> On Thu, Mar 24, 2016 at 4:45 PM, James Johansville <james.johansvi...@gmail.com> wrote:
> > > >>>
> > > >>>> Hello all,
> > > >>>>
> > > >>>> So, I wrote a Java application for HBase that does a partitioned full-table scan according to a set number of partitions. For example, if there are 20 partitions specified, then 20 separate full scans are launched that cover an equal slice of the row identifier range.
> > > >>>>
> > > >>>> The rows are uniformly distributed throughout the RegionServers.
> > > >>>
> > > >>> How many RegionServers? How many Regions? Are Regions evenly distributed across the servers? If you put all partitions on one machine and then run your client, do the timings even out?
> > > >>>
> > > >>> The disparity seems really wide.
> > > >>>
> > > >>> St.Ack
> > > >>>
> > > >>>
> > > >>>> I confirmed this through the hbase shell. I have only one column family, and each row has the same number of column qualifiers.
> > > >>>>
> > > >>>> My problem is that the individual scan performance is wildly inconsistent even though the scans fetch approximately the same number of rows. This inconsistency appears to be random with respect to hosts, regionservers, partitions, or CPU cores. I am the only user of the fleet and am not running any other concurrent HBase operation.
> > > >>>>
> > > >>>> I started measuring from the beginning of the scan and stopped measuring after the scan was completed. I am not doing any logic with the results, just scanning them.
> > > >>>>
> > > >>>> For ~230K rows fetched per scan, I am getting anywhere from 4 seconds to 100+ seconds. This seems a little too bouncy for me. Does anyone have any insight? By comparison, a similar utility I wrote to upsert to regionservers was very consistent in ops/sec and I had no issues with it.
> > > >>>>
> > > >>>> Using 13 partitions on a machine that has 32 CPU cores and 16 GB heap, I see anywhere between 3K and 82K ops/sec. Here's an example of log output I saved that used 130 partitions.
> > > >>>>
> > > >>>> total # partitions:130; partition id:47; rows:232730 elapsed_sec:6.401 ops/sec:36358.38150289017
> > > >>>> total # partitions:130; partition id:100; rows:206890 elapsed_sec:6.636 ops/sec:31176.91380349608
> > > >>>> total # partitions:130; partition id:63; rows:233437 elapsed_sec:7.586 ops/sec:30772.08014764039
> > > >>>> total # partitions:130; partition id:9; rows:232585 elapsed_sec:32.985 ops/sec:7051.235410034865
> > > >>>> total # partitions:130; partition id:19; rows:234192 elapsed_sec:38.733 ops/sec:6046.3170939508955
> > > >>>> total # partitions:130; partition id:1; rows:232860 elapsed_sec:48.479 ops/sec:4803.316900101075
> > > >>>> total # partitions:130; partition id:125; rows:205334 elapsed_sec:41.911 ops/sec:4899.286583474505
> > > >>>> total # partitions:130; partition id:123; rows:206622 elapsed_sec:42.281 ops/sec:4886.875901705258
> > > >>>> total # partitions:130; partition id:54; rows:232811 elapsed_sec:49.083 ops/sec:4743.210480206996
> > > >>>>
> > > >>>> I use setCacheBlocks(false) and setCaching(5000). Does anyone have any insight into how I can make the read performance more consistent?
> > > >>>>
> > > >>>> Thanks!
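For reference, the scan configuration described in the original message above (cache blocks off, caching of 5000, one Scan per partition bounded by explicit start/stop rows) can be sketched roughly as follows. This is only an illustrative reconstruction, not the utility from the thread; the table name and row keys are placeholders.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

public class PartitionScanSketch {

    // Scan one partition [startRow, stopRow) and return the row count.
    static long scanPartition(Connection conn, byte[] startRow, byte[] stopRow) throws Exception {
        Scan scan = new Scan(startRow, stopRow);
        scan.setCacheBlocks(false); // full-table scan: keep its blocks out of the block cache
        scan.setCaching(5000);      // rows fetched per RPC to the regionserver
        long rows = 0;
        try (Table table = conn.getTable(TableName.valueOf("my_table")); // placeholder table name
             ResultScanner scanner = table.getScanner(scan)) {
            for (Result ignored : scanner) {
                rows++; // results are only counted, not otherwise processed
            }
        }
        return rows;
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        try (Connection conn = ConnectionFactory.createConnection(conf)) {
            long start = System.nanoTime();
            // Placeholder partition boundaries; real code would derive one start/stop pair per partition.
            long rows = scanPartition(conn, Bytes.toBytes("row-000"), Bytes.toBytes("row-039"));
            double sec = (System.nanoTime() - start) / 1e9;
            System.out.printf("rows:%d elapsed_sec:%.3f ops/sec:%.1f%n", rows, sec, rows / sec);
        }
    }
}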