Check the region server logs; if the servers are blocking on something, it should show up there. On
CDH3 the logs are in /var/log/hbase/. You may also want to turn on DEBUG-level logging, either in
log4j (e.g. log4j.logger.org.apache.hadoop.hbase=DEBUG in conf/log4j.properties) or through the web
interface. Finally, all of your requests are going to just one region
server...npin-172-16-12-204.np.local...so it may be stuck trying to split a region or something
similar. You could try pre-splitting the table, which may help; a rough sketch follows.
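Something along these lines should work (a minimal sketch against the 0.90.x Java client API; it
assumes YCSB's default table name "usertable", the "family" column family from your command line,
and YCSB's default "user<digits>" key format -- adjust names to your setup):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.client.HBaseAdmin;
import org.apache.hadoop.hbase.util.Bytes;

public class PreSplitUsertable {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    HBaseAdmin admin = new HBaseAdmin(conf);

    // Pre-create the YCSB table with explicit split points so the load is
    // spread across all five region servers from the first insert.
    HTableDescriptor desc = new HTableDescriptor("usertable");
    desc.addFamily(new HColumnDescriptor("family"));

    // YCSB keys look like "user<digits>", so splitting on the first digit
    // ("user1" .. "user9") yields 10 regions, two per server.
    byte[][] splits = new byte[9][];
    for (int i = 1; i <= 9; i++) {
      splits[i - 1] = Bytes.toBytes("user" + i);
    }
    admin.createTable(desc, splits);
  }
}

With the table created up front, the load phase never has to wait on region splits.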
~Jeff
On 7/29/2011 10:57 AM, Eric Hauser wrote:
Hi,
I've been running different experiments against a 5-node cluster with YCSB.
We have been testing a number of different configurations, so I have
been repeatedly wiping the cluster and setting it up again (we
configure everything via Chef). At one point, I was able to get the
following stats from our cluster, which I was pretty happy with:
YCSB Client 0.1
Command line: -load -db com.yahoo.ycsb.db.HBaseClient
-Pworkloads/workloada -p columnfamily=family -p recordcount=10000000
-s
[OVERALL], RunTime(ms), 1057645.0
[OVERALL], Throughput(ops/sec), 9454.96834949345
[INSERT], Operations, 10000000
[INSERT], AverageLatency(ms), 0.0915235
[INSERT], MinLatency(ms), 0
[INSERT], MaxLatency(ms), 6925
[INSERT], 95thPercentileLatency(ms), 0
[INSERT], 99thPercentileLatency(ms), 0
[INSERT], Return=0, 10000000
However, with our most recent server builds, I seem to very quickly
deadlock something in HBase. I've gone back through all of our old
revisions and reverted a number of different configuration settings,
but I can't figure out why the cluster is now so slow. Our terasort
M/R tests return the same values as before, so I do not believe that
anything is wrong external to HBase.
The behavior that I see when I kick off the tests is this:
[UPDATE], 0, 4765
[UPDATE], 1, 248
[UPDATE], 2, 0
[UPDATE], 3, 0
[UPDATE], 4, 0
Basically, it kicks off a large number of inserts and HBase grinds to
a halt. Some of the writes do get inserted (usually ~50), but then
everything stops. Here's what I see on the region servers:
npin-172-16-12-203.np.local:60030 1311956094792 requests=50, regions=1, usedHeap=151, maxHeap=16358
npin-172-16-12-204.np.local:60030 1311956094776 requests=5, regions=2, usedHeap=157, maxHeap=16358
npin-172-16-12-205.np.local:60030 1311956093804 requests=0, regions=0, usedHeap=134, maxHeap=16358
npin-172-16-12-206.np.local:60030 1311956093809 requests=0, regions=0, usedHeap=134, maxHeap=16358
npin-172-16-12-207.np.local:60030 1311956094799 requests=0, regions=0, usedHeap=134, maxHeap=16358
Total: servers: 5, requests=55, regions=3
I did thread dumps on both the masters and region servers during this
time and did not see anything interesting. I'm using 0.90.3-CDH3U1.
Anyone have a suggestion on where to look next?
--
Jeff Whiting
Qualtrics Senior Software Engineer
je...@qualtrics.com