A 1 GB heap is nowhere near enough if you're trying to test something real (or approximate it with YCSB). Try 4 or 8 GB, or anything up to 31 GB depending on the use case. At >= 32 GB you give up compressed OOPs and may run into GC issues.
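For example, a minimal hbase-env.sh sketch (8192 is only an illustrative value, not a recommendation; the 0.98-era script takes the size in MB):

  # keep the heap below ~31 GB so the JVM keeps using compressed OOPs
  export HBASE_HEAPSIZE=8192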
Also, I recently redid the HBase YCSB client in a modern way for >= 0.98. See https://github.com/apurtell/YCSB/tree/new_hbase_client . IMHO it performs in a more useful fashion than the previous client for what YCSB is intended to do, but it might need some tuning (I haven't tried it on a cluster of significant size). One difference you should see is that we won't back up for 30-60 seconds after a bunch of threads flush fat 12+ MB write buffers.

On Thu, Sep 18, 2014 at 2:31 PM, Josh Williams <jwilli...@endpoint.com> wrote:
> Ted,
>
> Stack trace, that's definitely a good idea. Here's one jstack snapshot from the region server while there's no apparent activity going on:
> https://gist.github.com/joshwilliams/4950c1d92382ea7f3160
>
> If it's helpful, this is the YCSB side of the equation right around the same time:
> https://gist.github.com/joshwilliams/6fa3623088af9d1446a3
>
>
> And Gary,
>
> As far as the memory configuration, that's a good question. Looks like HBASE_HEAPSIZE isn't set, which I now see has a default of 1GB. There isn't any swap configured, and 12G of the memory on the instance is going to file cache, so there's definitely room to spare.
>
> Maybe it'd help if I gave it more room by setting HBASE_HEAPSIZE. Couldn't hurt to try that now...
>
> What's strange is running on m3.xlarge, which also has 15G of RAM but fewer CPU cores, it runs fine.
>
> Thanks to you both for the insight!
>
> -- Josh
>
>
> On Thu, 2014-09-18 at 11:42 -0700, Gary Helmling wrote:
>> What do you have HBASE_HEAPSIZE set to in hbase-env.sh? Is it possible that you're overcommitting memory and the instance is swapping? Just a shot in the dark, but I see that the m3.2xlarge instance has 30G of memory vs. 15G for c3.2xlarge.
>>
>> On Wed, Sep 17, 2014 at 3:28 PM, Ted Yu <yuzhih...@gmail.com> wrote:
>> > bq. there's almost no activity on either side
>> >
>> > During this period, can you capture stack trace for the region server and pastebin the stack?
>> >
>> > Cheers
>> >
>> > On Wed, Sep 17, 2014 at 3:21 PM, Josh Williams <jwilli...@endpoint.com> wrote:
>> >
>> >> Hi, everyone. Here's a strange one, at least to me.
>> >>
>> >> I'm doing some performance profiling, and as a rudimentary test I've been using YCSB to drive HBase (originally 0.98.3, recently updated to 0.98.6). The problem happens on a few different instance sizes, but this is probably the closest comparison...
>> >>
>> >> On m3.2xlarge instances, it works as expected.
>> >> On c3.2xlarge instances, HBase barely responds at all during workloads that involve read activity, falling silent for ~62 second intervals, with the YCSB throughput output resembling:
>> >>
>> >> 0 sec: 0 operations;
>> >> 2 sec: 918 operations; 459 current ops/sec; [UPDATE AverageLatency(us)=1252778.39] [READ AverageLatency(us)=1034496.26]
>> >> 4 sec: 918 operations; 0 current ops/sec;
>> >> 6 sec: 918 operations; 0 current ops/sec;
>> >> <snip>
>> >> 62 sec: 918 operations; 0 current ops/sec;
>> >> 64 sec: 5302 operations; 2192 current ops/sec; [UPDATE AverageLatency(us)=7715321.77] [READ AverageLatency(us)=7117905.56]
>> >> 66 sec: 5302 operations; 0 current ops/sec;
>> >> 68 sec: 5302 operations; 0 current ops/sec;
>> >> (And so on...)
>> >>
>> >> While that happens there's almost no activity on either side; the CPUs and disks are idle, no iowait at all.
>> >>
>> >> There isn't much that jumps out at me when digging through the Hadoop and HBase logs, except that those 62-second intervals are often (but not always) associated with ClosedChannelExceptions in the regionserver logs. But I believe that's just HBase finding that a TCP connection it wants to reply on had already been closed.
>> >>
>> >> As far as I've seen, this happens every time on this or any of the larger c3 class of instances, surprisingly. The m3 instance class sizes all seem to work fine. These are built with a custom AMI that has HBase and everything installed, and run via a script, so the different instance type should be the only difference between them.
>> >>
>> >> Anyone seen anything like this? Any pointers as to what I could look at to help diagnose this odd problem? Could there be something I'm overlooking in the logs?
>> >>
>> >> Thanks!
>> >>
>> >> -- Josh
>> >>

--
Best regards,

   - Andy

Problems worthy of attack prove their worth by hitting back. - Piet Hein (via Tom White)
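As a rough sketch of the jstack capture discussed in the quoted thread (an illustration, not a command from the thread): grab a few region server stack dumps while YCSB is sitting at "0 current ops/sec", assuming the JDK's jps and jstack are on the PATH and a single HRegionServer is running on the host:

  RS_PID=$(jps | awk '/HRegionServer/ {print $1}')   # find the region server's PID
  for i in 1 2 3; do
    jstack "$RS_PID" > "rs-jstack-$i.txt"            # dump all thread stacks
    sleep 10                                         # space the snapshots across the stall window
  done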