There are quite a lot of ESTABLISHED and TIME_WAIT connections between the RS on port 50010, but I don't know a good way of monitoring how much data is going through each connection (if that's what you meant)?
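
Maybe something like this would give a rough per-connection view? This is just a sketch of what I was thinking of trying; it assumes iftop and the iproute2 ss tool are actually installed on the RS boxes:

  # Live per-connection bandwidth on the DataNode port (50010), numeric output only
  iftop -nNP -f "port 50010"

  # Snapshot of TCP internals (cwnd, rtt, and byte counters if the ss version reports them)
  # for every connection touching port 50010
  ss -tin '( sport = :50010 or dport = :50010 )'
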
On Sun, Dec 22, 2013 at 12:00 AM, Kristoffer Sjögren <[email protected]> wrote:

> Scans on RS 19 and 23, which have 5 regions instead of 4, stand out more
> than scans on RS 20, 21, 22. But scans on RS 7 and 18, which also have 5
> regions, are doing fine, not best, but still in the mid-range.
>
>
> On Sat, Dec 21, 2013 at 11:51 PM, Kristoffer Sjögren <[email protected]> wrote:
>
>> Yeah, I'm doing a count(*) query on the 96-region table. Do you mean to
>> check network traffic between RS?
>>
>> From debugging Phoenix code I can see that there are 96 scans sent, and
>> each response returned to the client contains only the sum of rows,
>> which are then aggregated and returned. So the traffic between the client
>> and each RS is very small.
>>
>>
>>
>>
>> On Sat, Dec 21, 2013 at 11:35 PM, lars hofhansl <[email protected]> wrote:
>>
>>> Thanks Kristoffer,
>>>
>>> yeah, that's the right metric. I would put my bet on the slower network.
>>> But you're also doing a select count(*) query in Phoenix, right? So
>>> nothing should really be sent across the network.
>>>
>>> When you do the queries, can you check whether there is any network
>>> traffic?
>>>
>>> -- Lars
>>>
>>>
>>>
>>> ________________________________
>>> From: Kristoffer Sjögren <[email protected]>
>>> To: [email protected]; lars hofhansl <[email protected]>
>>> Sent: Saturday, December 21, 2013 1:28 PM
>>> Subject: Re: Performance tuning
>>>
>>>
>>> @pradeep scanner caching should not be an issue since the data transferred
>>> to the client is tiny.
>>>
>>> @lars Yes, the data might be small for this particular case :-)
>>>
>>> I have checked everything I can think of on the RS (CPU, network, HBase
>>> console, uptime etc.) and nothing stands out, except for the pings
>>> (network pings).
>>> There are 5 regions on 7, 18, 19, and 23; the others have 4.
>>> hdfsBlocksLocalityIndex=100 on all RS (was that the correct metric?)
>>>
>>> -Kristoffer
>>>
>>>
>>>
>>>
>>> On Sat, Dec 21, 2013 at 9:44 PM, lars hofhansl <[email protected]> wrote:
>>>
>>> > Hi Kristoffer,
>>> > For this particular problem: are many regions on the same RegionServers?
>>> > Did you profile those RegionServers? Anything weird on that box?
>>> > Slower pings might well be an issue. How's the data locality? (You can
>>> > check on a RegionServer's overview page.)
>>> > If needed, you can issue a major compaction to reestablish local data on
>>> > all RegionServers.
>>> >
>>> >
>>> > 32 cores matched with only 4G of RAM is a bit weird, but with your tiny
>>> > dataset it doesn't matter anyway.
>>> >
>>> > 10m rows across 96 regions is just about 100k rows per region. You won't
>>> > see many of the nice properties of HBase.
>>> > Try with 100m (or better, 1bn) rows. Then we're talking. For anything
>>> > below this you wouldn't want to use HBase anyway.
>>> > (100k rows I could scan on my phone with a Perl script in less than 1s.)
>>> >
>>> >
>>> > By "ping" do you mean an actual network ping, or some operation on top
>>> > of HBase?
>>> >
>>> >
>>> > -- Lars
>>> >
>>> >
>>> >
>>> > ________________________________
>>> > From: Kristoffer Sjögren <[email protected]>
>>> > To: [email protected]
>>> > Sent: Saturday, December 21, 2013 11:17 AM
>>> > Subject: Performance tuning
>>> >
>>> >
>>> > Hi
>>> >
>>> > I have been performance tuning HBase 0.94.6 running Phoenix 2.2.0 the
>>> > last couple of days and need some help.
>>> >
>>> > Background.
>>> >
>>> > - 23 machine cluster, 32 cores, 4GB heap per RS.
>>> > - Table t_24 has 24 online regions (24 salt buckets).
>>> > - Table t_96 has 96 online regions (96 salt buckets).
>>> > - 10.5 million rows per table.
>>> > - Count query - select count(*) from ...
>>> > - Group by query - select A, B, C, sum(D) from ... where (A = 1 and T >= 0
>>> >   and T <= 2147482800) group by A, B, C;
>>> >
>>> > What I found ultimately is that region servers 19, 20, 21, 22 and 23 are
>>> > consistently 2-3x slower than the others. This hurts overall latency
>>> > pretty badly, since queries are executed in parallel on the RS and then
>>> > aggregated at the client (through Phoenix). In Hannibal, regions are
>>> > spread out evenly over region servers, according to salt buckets (a
>>> > Phoenix feature: pre-created regions and a rowkey prefix).
>>> >
>>> > As far as I can tell, there is no network or hardware configuration
>>> > divergence between the machines. No CPU, network or other notable
>>> > divergence in Ganglia. No RS metric differences in the HBase master
>>> > console.
>>> >
>>> > The only thing that may be of interest is that pings (within the
>>> > cluster) to the bad RS are about 2-3x slower, around 0.050ms vs 0.130ms.
>>> > Not sure if this is significant, but I get a bad feeling about it since
>>> > it matches exactly the RS that stood out in my performance tests.
>>> >
>>> > Any ideas of how I might find the source of this problem?
>>> >
>>> > Cheers,
>>> > -Kristoffer
>>> >
>>>
>>
>>
>
