Hi Kristoffer,
For this particular problem: are many regions on the same RegionServers? Did 
you profile those RegionServers? Anything weird on those boxes?
Slower pings might well be an issue. How's the data locality? (You can check it 
on a RegionServer's overview page.)
If needed, you can issue a major compaction to re-establish data locality on all 
RegionServers.
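To make "data locality" concrete: a region's store files live in HDFS blocks, and a RegionServer reads fastest when a replica of a block sits on its own host. Below is a minimal sketch of that idea under a simplified model. Locality is taken as the fraction of bytes that have a replica on the server's host. The `Block` class, `locality_index` function, hostnames and sizes are hypothetical illustrations, not HBase's own code. A major compaction rewrites the files from the RegionServer, and HDFS normally puts the first replica of newly written blocks on the writer's own DataNode, which is why it pushes this fraction back towards 1.0.

```python
from dataclasses import dataclass


@dataclass
class Block:
    """One HDFS block of a region's store files (hypothetical, simplified model)."""
    size_mb: int
    replica_hosts: tuple[str, ...]  # hostnames holding a replica of this block


def locality_index(blocks: list[Block], host: str) -> float:
    """Fraction of the bytes in `blocks` that have a replica on `host`."""
    total = sum(b.size_mb for b in blocks)
    if total == 0:
        return 0.0
    local = sum(b.size_mb for b in blocks if host in b.replica_hosts)
    return local / total


# Made-up example: three blocks served by the RegionServer on host "rs19".
blocks = [
    Block(128, ("rs19", "dn03", "dn07")),
    Block(128, ("dn02", "dn04", "dn05")),  # no local replica: remote read
    Block(64, ("rs19", "dn01", "dn02")),
]
print(locality_index(blocks, "rs19"))  # 0.6
```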


32 cores matched with only a 4GB heap per RegionServer is a bit weird, but with 
your tiny dataset it doesn't matter anyway.

10m rows across 96 regions is only about 100k rows per region. You won't see 
many of the nice properties of HBase.
Try with 100m rows (or, better, 1bn). Then we're talking. For anything below 
this you wouldn't want to use HBase anyway.
(100k rows I could scan on my phone with a Perl script in less than 1s.)
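The arithmetic behind this uses the figures from the original mail: 10.5m rows per table, 24 or 96 salt buckets, and 23 RegionServers. It assumes the balancer spreads regions as evenly as it can. The helper names are just for illustration.

```python
def rows_per_region(total_rows: int, regions: int) -> float:
    """Average number of rows each region holds."""
    return total_rows / regions


def regions_per_server(regions: int, servers: int) -> dict[int, int]:
    """Most even spread possible: {regions on a server: number of such servers}."""
    base, extra = divmod(regions, servers)
    spread = {base: servers - extra}
    if extra:
        spread[base + 1] = extra  # a few servers must carry one extra region
    return spread


print(rows_per_region(10_500_000, 96))  # 109375.0
print(rows_per_region(10_500_000, 24))  # 437500.0
print(regions_per_server(96, 23))       # {4: 19, 5: 4}
print(regions_per_server(24, 23))       # {1: 22, 2: 1}
```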


By "ping", do you mean an actual network ping, or some operation on top of HBase?


-- Lars



________________________________
 From: Kristoffer Sjögren <sto...@gmail.com>
To: user@hbase.apache.org 
Sent: Saturday, December 21, 2013 11:17 AM
Subject: Performance tuning
 

Hi

I have spent the last couple of days performance tuning HBase 0.94.6 running
Phoenix 2.2.0 and need some help.

Background:

- 23-machine cluster, 32 cores, 4GB heap per RS.
- Table t_24 has 24 online regions (24 salt buckets).
- Table t_96 has 96 online regions (96 salt buckets).
- 10.5 million rows per table.
- Count query - select count(*) from ...
- Group by query - select A, B, C, sum(D) from ... where (A = 1 and T >= 0
and T <= 2147482800) group by A, B, C;

What I ultimately found is that region servers 19, 20, 21, 22 and 23 are
consistently 2-3x slower than the others. This hurts overall latency pretty
badly, since queries are executed in parallel on the RSs and then aggregated
at the client (through Phoenix). In Hannibal, regions are spread evenly over
the region servers according to the salt buckets (a Phoenix feature that
pre-creates regions and adds a rowkey prefix).

As far as I can tell, there is no network or hardware configuration
divergence between the machines. No CPU, network or other notable divergence
in Ganglia. No RS metric differences in the HBase master console.

The only thing that may be of interest is that pings (within the cluster) to
the bad RSs are about 2-3x slower: around 0.130ms vs 0.050ms for the others.
Not sure if this is significant, but I get a bad feeling about it, since it
matches exactly the RSs that stood out in my performance tests.
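One way to check whether the slower pings also show up at the HBase level is to time TCP connects to each RegionServer's RPC port and compare good boxes with bad ones. Below is a minimal sketch using only the Python standard library. The function name is hypothetical, and connect time is only a rough proxy for the latency HBase RPCs actually see.

```python
import socket
import statistics
import time


def median_connect_ms(host: str, port: int, samples: int = 20,
                      timeout: float = 1.0) -> float:
    """Median time, in milliseconds, to open a TCP connection to host:port."""
    times = []
    for _ in range(samples):
        start = time.perf_counter()
        # Open and immediately close a connection; only the handshake is timed.
        with socket.create_connection((host, port), timeout=timeout):
            elapsed = time.perf_counter() - start
        times.append(elapsed * 1000.0)
    return statistics.median(times)
```

For example, you could call `median_connect_ms(host, 60020)` for a healthy RegionServer host and for one of servers 19-23. Port 60020 was the default RegionServer port in 0.94, so adjust it if `hbase.regionserver.port` is set differently. The hostnames are yours to fill in. The median is used instead of the mean so that a single stray slow connect does not skew the comparison.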

Any ideas of how I might find the source of this problem?

Cheers,
-Kristoffer
