Hello all, I have 13 RegionServers, and the table was presplit into 13 regions (which is what motivated my comment that I had aligned my queries with the regionservers -- obviously not accurate). I have been testing partitioned scans using multiples of 13.
Here is my current region setup -- I converted the row identifiers to logical identifiers known to my code to make them easier to read. There are 26 regions split roughly evenly over a logical range of 0-996. The region boundaries are below. The splits aren't perfect, so I assume that for a 26-partition scan, each scan will read most of one region plus a small slice of another (possibly touching three regions in a few cases). I have given each server an index and marked its datacenter with an index as well -- so the first three regions are on server #9 in datacenter #1, and so on.

0 region range=[EMPTY]~id:38 server index=9 dc=1
1 region range=id:38~id:77 server index=9 dc=1
2 region range=id:77~id:115 server index=9 dc=1
3 region range=id:115~id:154 server index=10 dc=3
4 region range=id:154~id:192 server index=3 dc=2
5 region range=id:192~id:231 server index=10 dc=3
6 region range=id:231~id:269 server index=10 dc=3
7 region range=id:269~id:308 server index=10 dc=3
8 region range=id:308~id:346 server index=1 dc=1
9 region range=id:346~id:385 server index=4 dc=2
10 region range=id:385~id:423 server index=2 dc=2
11 region range=id:423~id:462 server index=5 dc=3
12 region range=id:462~id:500 server index=11 dc=1
13 region range=id:500~id:539 server index=10 dc=3
14 region range=id:539~id:577 server index=11 dc=1
15 region range=id:577~id:616 server index=11 dc=1
16 region range=id:616~id:654 server index=6 dc=2
17 region range=id:654~id:693 server index=12 dc=3
18 region range=id:693~id:730 server index=8 dc=1
19 region range=id:730~id:769 server index=4 dc=2
20 region range=id:769~id:806 server index=12 dc=3
21 region range=id:806~id:845 server index=11 dc=1
22 region range=id:845~id:883 server index=9 dc=1
23 region range=id:883~id:921 server index=12 dc=3
24 region range=id:921~id:958 server index=0 dc=3
25 region range=id:958~[EMPTY] server index=12 dc=3

Here are my scan ranges -- the end of each range is exclusive. Each scan range corresponds to a partition number, in sequence.

0 scan range=id:0~id:39
1 scan range=id:39~id:78
2 scan range=id:78~id:117
3 scan range=id:117~id:156
4 scan range=id:156~id:195
5 scan range=id:195~id:234
6 scan range=id:234~id:273
7 scan range=id:273~id:312
8 scan range=id:312~id:351
9 scan range=id:351~id:389
10 scan range=id:389~id:427
11 scan range=id:427~id:465
12 scan range=id:465~id:503
13 scan range=id:503~id:541
14 scan range=id:541~id:579
15 scan range=id:579~id:617
16 scan range=id:617~id:655
17 scan range=id:655~id:693
18 scan range=id:693~id:731
19 scan range=id:731~id:769
20 scan range=id:769~id:807
21 scan range=id:807~id:845
22 scan range=id:845~id:883
23 scan range=id:883~id:921
24 scan range=id:921~id:959
25 scan range=id:959~id:997
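
For context, each partition is just a plain range scan over its slice. Below is a simplified sketch of what the utility does per partition -- the table name and the literal "id:N" start/stop keys are placeholders (my real code encodes the actual row keys), but the setCaching(5000)/setCacheBlocks(false) settings are what I use:

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

public class PartitionScan {

  // Scan one partition's [startRow, stopRow) slice and report rows, elapsed
  // seconds and ops/sec, which is what the log lines below show.
  static void scanPartition(Connection conn, int partition, byte[] startRow, byte[] stopRow)
      throws IOException {
    Scan scan = new Scan(startRow, stopRow); // stop row is exclusive
    scan.setCaching(5000);                   // rows fetched per RPC
    scan.setCacheBlocks(false);              // do not populate the block cache
    long rows = 0;
    long startNs = System.nanoTime();
    try (Table table = conn.getTable(TableName.valueOf("mytable")); // placeholder table name
         ResultScanner scanner = table.getScanner(scan)) {
      for (Result r : scanner) {
        rows++;                              // no per-row work, just drain the scanner
      }
    }
    double sec = (System.nanoTime() - startNs) / 1e9;
    System.out.printf("partition_number:%d; rows:%d elapsed_sec:%.3f ops/sec:%.3f%n",
        partition, rows, sec, rows / sec);
  }

  public static void main(String[] args) throws IOException {
    Configuration conf = HBaseConfiguration.create();
    try (Connection conn = ConnectionFactory.createConnection(conf)) {
      // e.g. partition 0 of the 26-way split above; the real keys are not the
      // literal "id:N" strings shown in the tables
      scanPartition(conn, 0, Bytes.toBytes("id:0"), Bytes.toBytes("id:39"));
    }
  }
}

Each partition runs that same loop over its own start/stop pair.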

I ran the utility twice on the 26-range split.

First run:
total # partitions:26; partition_number:0; rows:1775368 elapsed_sec:433.525 ops/sec:4095.191742114065
total # partitions:26; partition_number:1; rows:1774447 elapsed_sec:427.884 ops/sec:4147.028166512419
total # partitions:26; partition_number:2; rows:1774755 elapsed_sec:674.357 ops/sec:2631.773674774637
total # partitions:26; partition_number:3; rows:1776706 elapsed_sec:376.375 ops/sec:4720.573895715709
total # partitions:26; partition_number:4; rows:1777409 elapsed_sec:47.484 ops/sec:37431.74543003959
total # partitions:26; partition_number:5; rows:1779412 elapsed_sec:39.443 ops/sec:45113.50556499252
total # partitions:26; partition_number:6; rows:1773312 elapsed_sec:407.076 ops/sec:4356.2184948265185
total # partitions:26; partition_number:7; rows:1773790 elapsed_sec:436.169 ops/sec:4066.749356327479
total # partitions:26; partition_number:8; rows:1773311 elapsed_sec:617.88 ops/sec:2869.99255518871
total # partitions:26; partition_number:9; rows:1727384 elapsed_sec:27.348 ops/sec:63163.0832236361
total # partitions:26; partition_number:10; rows:1734960 elapsed_sec:64.383 ops/sec:26947.48613764503
total # partitions:26; partition_number:11; rows:1729728 elapsed_sec:381.221 ops/sec:4537.336610522505
total # partitions:26; partition_number:12; rows:1732999 elapsed_sec:59.049 ops/sec:29348.49023692188
total # partitions:26; partition_number:13; rows:1731442 elapsed_sec:369.975 ops/sec:4679.889181701466
total # partitions:26; partition_number:14; rows:1732026 elapsed_sec:698.667 ops/sec:2479.0436645784043
total # partitions:26; partition_number:15; rows:1727737 elapsed_sec:398.964 ops/sec:4330.558646895459
total # partitions:26; partition_number:16; rows:1728357 elapsed_sec:695.636 ops/sec:2484.570953774675
total # partitions:26; partition_number:17; rows:1733250 elapsed_sec:392.337 ops/sec:4417.758202769558
total # partitions:26; partition_number:18; rows:1742667 elapsed_sec:410.238 ops/sec:4247.94143887207
total # partitions:26; partition_number:19; rows:1742777 elapsed_sec:374.509 ops/sec:4653.498313792192
total # partitions:26; partition_number:20; rows:1742520 elapsed_sec:44.447 ops/sec:39204.445744369696
total # partitions:26; partition_number:21; rows:1742248 elapsed_sec:404.692 ops/sec:4305.120931473812
total # partitions:26; partition_number:22; rows:1743288 elapsed_sec:420.886 ops/sec:4141.948175990648
total # partitions:26; partition_number:23; rows:1741106 elapsed_sec:374.782 ops/sec:4645.650004535971
total # partitions:26; partition_number:24; rows:1738634 elapsed_sec:34.059 ops/sec:51047.71132446637
total # partitions:26; partition_number:25; rows:1736522 elapsed_sec:38.873 ops/sec:44671.67442698017

Second run:
total # partitions:26; partition_number:0; rows:1775368 elapsed_sec:687.482 ops/sec:2582.4210670243006
total # partitions:26; partition_number:1; rows:1774447 elapsed_sec:687.132 ops/sec:2582.3961043875124
total # partitions:26; partition_number:2; rows:1774755 elapsed_sec:57.592 ops/sec:30815.998749826365
total # partitions:26; partition_number:3; rows:1776706 elapsed_sec:369.096 ops/sec:4813.6690725448125
total # partitions:26; partition_number:4; rows:1777409 elapsed_sec:49.486 ops/sec:35917.41098492503
total # partitions:26; partition_number:5; rows:1779412 elapsed_sec:39.008 ops/sec:45616.59146841673
total # partitions:26; partition_number:6; rows:1773312 elapsed_sec:405.897 ops/sec:4368.871905926873
total # partitions:26; partition_number:7; rows:1773790 elapsed_sec:368.597 ops/sec:4812.274652262498
total # partitions:26; partition_number:8; rows:1773311 elapsed_sec:401.001 ops/sec:4422.210917179758
total # partitions:26; partition_number:9; rows:1727384 elapsed_sec:381.702 ops/sec:4525.477990683832
total # partitions:26; partition_number:10; rows:1734960 elapsed_sec:659.132 ops/sec:2632.1890000788917
total # partitions:26; partition_number:11; rows:1729728 elapsed_sec:65.535 ops/sec:26393.95742732891
total # partitions:26; partition_number:12; rows:1732999 elapsed_sec:367.694 ops/sec:4713.15550430521
total # partitions:26; partition_number:13; rows:1731442 elapsed_sec:394.045 ops/sec:4394.020987450672
total # partitions:26; partition_number:14; rows:1732026 elapsed_sec:692.832 ops/sec:2499.9220590272967
total # partitions:26; partition_number:15; rows:1727737 elapsed_sec:637.833 ops/sec:2708.760757126082
total # partitions:26; partition_number:16; rows:1728357 elapsed_sec:35.41 ops/sec:48809.85597288902
total # partitions:26; partition_number:17; rows:1733250 elapsed_sec:348.711 ops/sec:4970.448308197906
total # partitions:26; partition_number:18; rows:1742667 elapsed_sec:747.852 ops/sec:2330.2297780844337
total # partitions:26; partition_number:19; rows:1742777 elapsed_sec:21.719 ops/sec:80242.04613472075
total # partitions:26; partition_number:20; rows:1742520 elapsed_sec:41.713 ops/sec:41774.02728166279
total # partitions:26; partition_number:21; rows:1742248 elapsed_sec:398.595 ops/sec:4370.97304281288
total # partitions:26; partition_number:22; rows:1743288 elapsed_sec:416.296 ops/sec:4187.61650364164
total # partitions:26; partition_number:23; rows:1741106 elapsed_sec:376.67 ops/sec:4622.364403854833
total # partitions:26; partition_number:24; rows:1738634 elapsed_sec:34.025 ops/sec:51098.72152828803
total # partitions:26; partition_number:25; rows:1736522 elapsed_sec:38.673 ops/sec:44902.69697204768

I can't make rhyme or reason of it. The scan ranges don't line up perfectly with the regions, but some scans that could in theory cross multiple datacenters come back fast, while among the scans confined to a single regionserver some are fast and some are slow. Does anyone with more experience have insight?
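
For what it's worth, the region-to-server mapping shown above can be pulled straight from the client API with something like the sketch below (the table name is a placeholder, and my real code also translates the binary row keys back to the logical id values shown here):

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HRegionLocation;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.RegionLocator;
import org.apache.hadoop.hbase.util.Bytes;

public class DumpRegionLocations {
  public static void main(String[] args) throws IOException {
    Configuration conf = HBaseConfiguration.create();
    try (Connection conn = ConnectionFactory.createConnection(conf);
         RegionLocator locator = conn.getRegionLocator(TableName.valueOf("mytable"))) { // placeholder
      int i = 0;
      for (HRegionLocation loc : locator.getAllRegionLocations()) {
        byte[] start = loc.getRegionInfo().getStartKey();
        byte[] end = loc.getRegionInfo().getEndKey();
        // Print region index, key range and hosting server, similar to the table above
        System.out.printf("%d region range=%s~%s server=%s%n",
            i++,
            start.length == 0 ? "[EMPTY]" : Bytes.toStringBinary(start),
            end.length == 0 ? "[EMPTY]" : Bytes.toStringBinary(end),
            loc.getServerName());
      }
    }
  }
}

An individual scan range can be mapped the same way via locator.getRegionLocation(startRow).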

On Fri, Mar 25, 2016 at 9:20 AM, Stack <st...@duboce.net> wrote:

> On Fri, Mar 25, 2016 at 3:50 AM, Ted Yu <yuzhih...@gmail.com> wrote:
>
> > James:
> > Another experiment you can do is to enable region replica - HBASE-10070.
> >
> > This would bring down the read variance greatly.
> >
>
> Suggest you NOT do this James.
>
> Lets figure your issue as-is rather than compound by adding yet more moving parts.
>
> St.Ack
>
> > On Mar 25, 2016, at 2:41 AM, Nicolas Liochon <nkey...@gmail.com> wrote:
> > >
> > > The read path is much more complex than the write one, so the response time has much more variance.
> > > The gap is so wide here that I would bet on Ted's or Stack's points, but here are a few other sources of variance:
> > > - hbase cache: as Anoop said, may be the data is already in the hbase cache (setCacheBlocks(false), means "don't add blocks to the cache", not "don't use the cache")
> > > - OS cache: and if the data is not in HBase cache may be it is in the operating system cache (for example if you run the test multiple times)
> > > - data locality: if you're lucky the data is local to the region server. If you're not, the reads need an extra network hoop.
> > > - number of store: more hfiles/stores per region => slower reads.
> > > - number of versions and so on: sub case of the previous one: if the rows have been updated multiple times and the compaction has not ran yet, you will read much more data.
> > > - (another subcase): the data has not been flushed yet and is available in the memstore => fast read.
> > >
> > > None of these points has any importance for the the write path. Basically the writes variance says nothing about the variance you will get on the reads.
> > >
> > > IIRC, locality and number of stores are visible in HBase UI. Doing a table flush and then running a major compaction generally helps to stabilize response time when you do a test. But it should not explain the x25 you're seeing, there is something else somewhere else. I don't get the regionserver boundaries you're mentioning: there is no boundary between regionservers. A regionserver can host A->D and M->S while another hosts D->M and S->Z for example.
> > >
> > >> On Fri, Mar 25, 2016 at 6:51 AM, Anoop John <anoop.hb...@gmail.com> wrote:
> > >>
> > >> I see you set cacheBlocks to be false on the Scan. By any chance on some other RS(s), the data you are looking for is already in cache? (Any previous scan or by cache on write) And there are no concurrent writes any way right? This much difference in time ! One possibility is blocks avail or not avail in cache..
> > >>
> > >> -Anoop-
> > >>
> > >>> On Fri, Mar 25, 2016 at 11:04 AM, Stack <st...@duboce.net> wrote:
> > >>> On Thu, Mar 24, 2016 at 4:45 PM, James Johansville <james.johansvi...@gmail.com> wrote:
> > >>>
> > >>>> Hello all,
> > >>>>
> > >>>> So, I wrote a Java application for HBase that does a partitioned full-table scan according to a set number of partitions. For example, if there are 20 partitions specified, then 20 separate full scans are launched that cover an equal slice of the row identifier range.
> > >>>>
> > >>>> The rows are uniformly distributed throughout the RegionServers.
> > >>>
> > >>> How many RegionServers? How many Regions? Are Regions evenly distributed across the servers? If you put all partitions on one machine and then run your client, do the timings even out?
> > >>>
> > >>> The disparity seems really wide.
> > >>>
> > >>> St.Ack
> > >>>
> > >>>> I confirmed this through the hbase shell. I have only one column family, and each row has the same number of column qualifiers.
> > >>>>
> > >>>> My problem is that the individual scan performance is wildly inconsistent even though they fetch approximately a similar number of rows. This inconsistency appears to be random with respect to hosts or regionservers or partitions or CPU cores. I am the only user of the fleet and not running any other concurrent HBase operation.
> > >>>>
> > >>>> I started measuring from the beginning of the scan and stopped measuring after the scan was completed. I am not doing any logic with the results, just scanning them.
> > >>>>
> > >>>> For ~230K rows fetched per scan, I am getting anywhere from 4 seconds to 100+ seconds. This seems a little too bouncy for me. Does anyone have any insight?
> > >>>> By comparison, a similar utility I wrote to upsert to regionservers was very consistent in ops/sec and I had no issues with it.
> > >>>>
> > >>>> Using 13 partitions on a machine that has 32 CPU cores and 16 GB heap, I see anywhere between 3K ops/sec to 82K ops/sec. Here's an example of log output I saved that used 130 partitions.
> > >>>>
> > >>>> total # partitions:130; partition id:47; rows:232730 elapsed_sec:6.401 ops/sec:36358.38150289017
> > >>>> total # partitions:130; partition id:100; rows:206890 elapsed_sec:6.636 ops/sec:31176.91380349608
> > >>>> total # partitions:130; partition id:63; rows:233437 elapsed_sec:7.586 ops/sec:30772.08014764039
> > >>>> total # partitions:130; partition id:9; rows:232585 elapsed_sec:32.985 ops/sec:7051.235410034865
> > >>>> total # partitions:130; partition id:19; rows:234192 elapsed_sec:38.733 ops/sec:6046.3170939508955
> > >>>> total # partitions:130; partition id:1; rows:232860 elapsed_sec:48.479 ops/sec:4803.316900101075
> > >>>> total # partitions:130; partition id:125; rows:205334 elapsed_sec:41.911 ops/sec:4899.286583474505
> > >>>> total # partitions:130; partition id:123; rows:206622 elapsed_sec:42.281 ops/sec:4886.875901705258
> > >>>> total # partitions:130; partition id:54; rows:232811 elapsed_sec:49.083 ops/sec:4743.210480206996
> > >>>>
> > >>>> I use setCacheBlocks(false), setCaching(5000). Does anyone have any insight into how I can make the read performance more consistent?
> > >>>>
> > >>>> Thanks!
> > >>
> > >