There are quite a lot of ESTABLISHED and TIME_WAIT connections between the RS on port 50010, but I don't know a good way of monitoring how much data is going through each connection (if that's what you meant)?
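
Maybe something like this would give a rough per-connection view? This is just a sketch of what I was thinking of trying; it assumes iftop and the iproute2 ss tool are actually installed on the RS boxes:

  # Live per-connection bandwidth on the DataNode port (50010), numeric output only
  iftop -nNP -f "port 50010"

  # Snapshot of TCP internals (cwnd, rtt, and byte counters if the ss version reports them)
  # for every connection touching port 50010
  ss -tin '( sport = :50010 or dport = :50010 )'
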
On Sun, Dec 22, 2013 at 12:00 AM, Kristoffer Sjögren <[email protected]> wrote:

> Scans on RS 19 and 23, which have 5 regions instead of 4, stand out more
> than scans on RS 20, 21, 22. But scans on RS 7 and 18, which also have 5
> regions, are doing fine, not best, but still in the mid-range.
>
>
> On Sat, Dec 21, 2013 at 11:51 PM, Kristoffer Sjögren <[email protected]> wrote:
>
>> Yeah, I'm doing a count(*) query on the 96-region table. Do you mean to
>> check network traffic between RS?
>>
>> From debugging Phoenix code I can see that there are 96 scans sent, and
>> each response returned to the client contains only the sum of rows,
>> which are then aggregated and returned. So the traffic between the client
>> and each RS is very small.
>>
>>
>>
>>
>> On Sat, Dec 21, 2013 at 11:35 PM, lars hofhansl <[email protected]> wrote:
>>
>>> Thanks Kristoffer,
>>>
>>> yeah, that's the right metric. I would put my bet on the slower network.
>>> But you're also doing a select count(*) query in Phoenix, right? So
>>> nothing should really be sent across the network.
>>>
>>> When you do the queries, can you check whether there is any network
>>> traffic?
>>>
>>> -- Lars
>>>
>>>
>>>
>>> ________________________________
>>> From: Kristoffer Sjögren <[email protected]>
>>> To: [email protected]; lars hofhansl <[email protected]>
>>> Sent: Saturday, December 21, 2013 1:28 PM
>>> Subject: Re: Performance tuning
>>>
>>>
>>> @pradeep scanner caching should not be an issue since the data transferred
>>> to the client is tiny.
>>>
>>> @lars Yes, the data might be small for this particular case :-)
>>>
>>> I have checked everything I can think of on the RS (CPU, network, HBase
>>> console, uptime etc.) and nothing stands out, except for the pings
>>> (network pings).
>>> There are 5 regions on 7, 18, 19, and 23; the others have 4.
>>> hdfsBlocksLocalityIndex=100 on all RS (was that the correct metric?)
>>>
>>> -Kristoffer
>>>
>>>
>>>
>>>
>>> On Sat, Dec 21, 2013 at 9:44 PM, lars hofhansl <[email protected]> wrote:
>>>
>>> > Hi Kristoffer,
>>> > For this particular problem: are many regions on the same RegionServers?
>>> > Did you profile those RegionServers? Anything weird on that box?
>>> > Slower pings might well be an issue. How's the data locality? (You can
>>> > check on a RegionServer's overview page.)
>>> > If needed, you can issue a major compaction to reestablish local data on
>>> > all RegionServers.
>>> >
>>> >
>>> > 32 cores matched with only 4G of RAM is a bit weird, but with your tiny
>>> > dataset it doesn't matter anyway.
>>> >
>>> > 10m rows across 96 regions is just about 100k rows per region. You won't
>>> > see many of the nice properties of HBase.
>>> > Try with 100m (or better, 1bn) rows. Then we're talking. For anything
>>> > below this you wouldn't want to use HBase anyway.
>>> > (100k rows I could scan on my phone with a Perl script in less than 1s.)
>>> >
>>> >
>>> > By "ping" do you mean an actual network ping, or some operation on top
>>> > of HBase?
>>> >
>>> >
>>> > -- Lars
>>> >
>>> >
>>> >
>>> > ________________________________
>>> > From: Kristoffer Sjögren <[email protected]>
>>> > To: [email protected]
>>> > Sent: Saturday, December 21, 2013 11:17 AM
>>> > Subject: Performance tuning
>>> >
>>> >
>>> > Hi
>>> >
>>> > I have been performance tuning HBase 0.94.6 running Phoenix 2.2.0 the
>>> > last couple of days and need some help.
>>> >
>>> > Background.
>>> >
>>> > - 23 machine cluster, 32 cores, 4GB heap per RS.
>>> > - Table t_24 has 24 online regions (24 salt buckets).
>>> > - Table t_96 has 96 online regions (96 salt buckets).
>>> > - 10.5 million rows per table.
>>> > - Count query - select count(*) from ...
>>> > - Group by query - select A, B, C, sum(D) from ... where (A = 1 and T >= 0
>>> >   and T <= 2147482800) group by A, B, C;
>>> >
>>> > What I found ultimately is that region servers 19, 20, 21, 22 and 23 are
>>> > consistently 2-3x slower than the others. This hurts overall latency
>>> > pretty badly, since queries are executed in parallel on the RS and then
>>> > aggregated at the client (through Phoenix). In Hannibal, regions are
>>> > spread out evenly over region servers, according to salt buckets (a
>>> > Phoenix feature: pre-created regions and a rowkey prefix).
>>> >
>>> > As far as I can tell, there is no network or hardware configuration
>>> > divergence between the machines. No CPU, network or other notable
>>> > divergence in Ganglia. No RS metric differences in the HBase master
>>> > console.
>>> >
>>> > The only thing that may be of interest is that pings (within the
>>> > cluster) to the bad RS are about 2-3x slower, around 0.050ms vs 0.130ms.
>>> > Not sure if this is significant, but I get a bad feeling about it since
>>> > it matches exactly the RS that stood out in my performance tests.
>>> >
>>> > Any ideas of how I might find the source of this problem?
>>> >
>>> > Cheers,
>>> > -Kristoffer
>>> >
>>>
>>
>>
>
