Hello all,

I wrote a Java application for HBase that performs a partitioned full-table
scan with a configurable number of partitions. For example, if 20
partitions are specified, 20 separate scans are launched, each covering
an equal slice of the row key range.
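
To make the partitioning concrete, here is a minimal sketch of how the key
range could be divided. It assumes row keys map onto non-negative long
values, which may not match the real key format; ScanPartitioner, Range and
split are hypothetical names, not part of HBase.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: splits a numeric row-key range into near-equal slices. */
class ScanPartitioner {

    /** One half-open slice [start, stop) of the key space. */
    static final class Range {
        final long start; // inclusive
        final long stop;  // exclusive

        Range(long start, long stop) {
            this.start = start;
            this.stop = stop;
        }

        @Override
        public String toString() {
            return "[" + start + ", " + stop + ")";
        }
    }

    /**
     * Divides [minKey, maxKeyExclusive) into the requested number of
     * contiguous slices. Slice sizes differ by at most one key.
     * Assumption: keys are non-negative longs.
     */
    static List<Range> split(long minKey, long maxKeyExclusive, int partitions) {
        if (partitions <= 0 || minKey < 0 || maxKeyExclusive <= minKey) {
            throw new IllegalArgumentException("invalid range or partition count");
        }
        long total = maxKeyExclusive - minKey;
        long base = total / partitions;   // minimum keys per slice
        long extra = total % partitions;  // the first 'extra' slices get one more key
        List<Range> slices = new ArrayList<>(partitions);
        long start = minKey;
        for (int i = 0; i < partitions; i++) {
            long size = base + (i < extra ? 1 : 0);
            slices.add(new Range(start, start + size));
            start += size;
        }
        return slices;
    }

    public static void main(String[] args) {
        // Each slice would become the start/stop row of one scan.
        for (Range r : split(0L, 1000L, 4)) {
            System.out.println(r);
        }
    }
}
```

Equal slices of the key range only yield equal row counts when the keys are
evenly spread over that range.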

The rows are uniformly distributed across the RegionServers. I
confirmed this through the hbase shell. I have only one column family, and
each row has the same number of column qualifiers.

My problem is that the performance of the individual scans is wildly
inconsistent, even though each scan fetches roughly the same number of
rows. The inconsistency appears to be random with respect to host,
RegionServer, partition, and CPU core. I am the only user of the fleet and
am not running any other concurrent HBase operations.

I start timing when a scan begins and stop when it completes. I do no
processing on the results; I just iterate over them.
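
The measurement described above can be sketched roughly as follows. This is
an illustration, not the actual application code: ScanTimer, Stats and drain
are hypothetical names, and any Iterable stands in for the scanner (HBase's
ResultScanner is an Iterable of Result, so it could be passed the same way).

```java
import java.util.Arrays;

/** Hypothetical timing harness: iterates a scanner and reports rows per second. */
class ScanTimer {

    /** Row count and wall-clock time for one scan. */
    static final class Stats {
        final long rows;
        final double elapsedSec;

        Stats(long rows, double elapsedSec) {
            this.rows = rows;
            this.elapsedSec = elapsedSec;
        }

        /** Rows per second; 0 if no measurable time elapsed. */
        double opsPerSec() {
            return elapsedSec > 0 ? rows / elapsedSec : 0.0;
        }
    }

    /** Iterates every element without processing it and times the whole pass. */
    static <T> Stats drain(Iterable<T> scanner) {
        long startNanos = System.nanoTime();
        long rows = 0;
        for (T ignored : scanner) {
            rows++; // no logic on the result, just count it
        }
        long elapsedNanos = System.nanoTime() - startNanos;
        return new Stats(rows, elapsedNanos / 1e9);
    }

    public static void main(String[] args) {
        // A small in-memory list stands in for a real scanner here.
        Stats s = drain(Arrays.asList("row1", "row2", "row3"));
        System.out.println("rows:" + s.rows
                + " elapsed_sec:" + s.elapsedSec
                + " ops/sec:" + s.opsPerSec());
    }
}
```

System.nanoTime() is used instead of System.currentTimeMillis() because it
is monotonic, so wall-clock adjustments do not distort the elapsed time.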

For ~230K rows fetched per scan, I am getting anywhere from 4 seconds to
100+ seconds. This seems a little too bouncy for me. Does anyone have any
insight? By comparison, a similar utility I wrote to upsert to
regionservers was very consistent in ops/sec and I had no issues with it.

Using 13 partitions on a machine that has 32 CPU cores and 16 GB heap, I
see anywhere from 3K to 82K ops/sec (rows scanned per second). Here is
sample log output from a run that used 130 partitions:

total # partitions:130; partition id:47; rows:232730 elapsed_sec:6.401 ops/sec:36358.38150289017
total # partitions:130; partition id:100; rows:206890 elapsed_sec:6.636 ops/sec:31176.91380349608
total # partitions:130; partition id:63; rows:233437 elapsed_sec:7.586 ops/sec:30772.08014764039
total # partitions:130; partition id:9; rows:232585 elapsed_sec:32.985 ops/sec:7051.235410034865
total # partitions:130; partition id:19; rows:234192 elapsed_sec:38.733 ops/sec:6046.3170939508955
total # partitions:130; partition id:1; rows:232860 elapsed_sec:48.479 ops/sec:4803.316900101075
total # partitions:130; partition id:125; rows:205334 elapsed_sec:41.911 ops/sec:4899.286583474505
total # partitions:130; partition id:123; rows:206622 elapsed_sec:42.281 ops/sec:4886.875901705258
total # partitions:130; partition id:54; rows:232811 elapsed_sec:49.083 ops/sec:4743.210480206996

Each scan uses setCacheBlocks(false) and setCaching(5000). Does anyone have
any insight into how I can make the read performance more consistent?

Thanks!
