In theory the partitions should be aligned with *regionserver* boundaries.
Would crossing multiple regions on the same regionserver result in the big
performance differences being seen here?

I am using Hortonworks HBase 1.1.2
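
For reference, this is roughly how I could derive the partitions from the
actual region boundaries instead of even key slices, so that no scan crosses
a region. This is only a sketch: "my_table" and the connection setup are
placeholders, not the real code.

import java.util.ArrayList;
import java.util.List;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.RegionLocator;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.util.Pair;

public class RegionAlignedPartitions {
    // Build one Scan per region so that no partition crosses a region boundary.
    static List<Scan> regionAlignedScans(Connection conn) throws Exception {
        try (RegionLocator locator = conn.getRegionLocator(TableName.valueOf("my_table"))) {
            Pair<byte[][], byte[][]> keys = locator.getStartEndKeys();
            List<Scan> scans = new ArrayList<>();
            for (int i = 0; i < keys.getFirst().length; i++) {
                Scan scan = new Scan();
                scan.setStartRow(keys.getFirst()[i]);   // empty byte[] for the first region
                scan.setStopRow(keys.getSecond()[i]);   // empty byte[] for the last region
                scan.setCacheBlocks(false);
                scan.setCaching(5000);
                scans.add(scan);
            }
            return scans;
        }
    }
}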

On Thu, Mar 24, 2016 at 5:32 PM, Ted Yu <yuzhih...@gmail.com> wrote:

> I assume the partitions' boundaries don't align with region boundaries,
> right?
>
> Meaning some partitions would cross region boundaries.
>
> Which HBase release do you use?
>
> Thanks
>
> On Thu, Mar 24, 2016 at 4:45 PM, James Johansville <
> james.johansvi...@gmail.com> wrote:
>
> > Hello all,
> >
> > So, I wrote a Java application for HBase that does a partitioned
> > full-table scan according to a set number of partitions. For example, if
> > there are 20 partitions specified, then 20 separate full scans are
> > launched that each cover an equal slice of the row identifier range.
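> >
> > Roughly, each partition then becomes its own Scan over one slice of the
> > key range, something like the sketch below. It assumes 8-byte big-endian
> > long row keys, which is a simplification of my actual key encoding.
> >
> > import org.apache.hadoop.hbase.client.Scan;
> > import org.apache.hadoop.hbase.util.Bytes;
> >
> > public class PartitionScans {
> >     // Build the Scan for one of numPartitions equal slices of the row id range.
> >     static Scan scanForPartition(long minId, long maxId, int numPartitions, int partitionId) {
> >         long span = (maxId - minId + 1) / numPartitions;
> >         long sliceStart = minId + (long) partitionId * span;
> >         long sliceStop = (partitionId == numPartitions - 1) ? maxId + 1 : sliceStart + span;
> >         Scan scan = new Scan();
> >         scan.setStartRow(Bytes.toBytes(sliceStart));  // inclusive
> >         scan.setStopRow(Bytes.toBytes(sliceStop));    // exclusive
> >         scan.setCacheBlocks(false);
> >         scan.setCaching(5000);
> >         return scan;
> >     }
> > }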
> >
> > The rows are uniformly distributed throughout the RegionServers. I
> > confirmed this through the hbase shell. I have only one column family,
> > and each row has the same number of column qualifiers.
> >
> > My problem is that the individual scan performance is wildly inconsistent
> > even though the scans fetch approximately the same number of rows. The
> > inconsistency appears to be random with respect to hosts, regionservers,
> > partitions, and CPU cores. I am the only user of the fleet and am not
> > running any other concurrent HBase operation.
> >
> > I started measuring from the beginning of the scan and stopped measuring
> > after the scan was completed. I am not doing any logic with the results,
> > just scanning them.
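> >
> > Per partition, the measurement is essentially just draining the scanner,
> > roughly like this (simplified; the real code also logs the partition id):
> >
> > import java.io.IOException;
> > import org.apache.hadoop.hbase.client.Result;
> > import org.apache.hadoop.hbase.client.ResultScanner;
> > import org.apache.hadoop.hbase.client.Scan;
> > import org.apache.hadoop.hbase.client.Table;
> >
> > public class ScanTimer {
> >     // Drain one partition's scanner and report throughput; no per-row logic.
> >     static double timeScan(Table table, Scan scan) throws IOException {
> >         long rows = 0;
> >         long start = System.nanoTime();
> >         try (ResultScanner scanner = table.getScanner(scan)) {
> >             for (Result ignored : scanner) {
> >                 rows++;
> >             }
> >         }
> >         double elapsedSec = (System.nanoTime() - start) / 1e9;
> >         System.out.println("rows:" + rows + " elapsed_sec:" + elapsedSec
> >             + " ops/sec:" + (rows / elapsedSec));
> >         return rows / elapsedSec;
> >     }
> > }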
> >
> > For ~230K rows fetched per scan, I am getting anywhere from 4 seconds to
> > 100+ seconds. This seems a little too bouncy for me. Does anyone have any
> > insight? By comparison, a similar utility I wrote to upsert to
> > regionservers was very consistent in ops/sec and I had no issues with it.
> >
> > Using 13 partitions on a machine that has 32 CPU cores and a 16 GB heap, I
> > see anywhere from 3K ops/sec to 82K ops/sec. Here's an example of log
> > output I saved from a run that used 130 partitions.
> >
> > total # partitions:130; partition id:47; rows:232730 elapsed_sec:6.401 ops/sec:36358.38150289017
> > total # partitions:130; partition id:100; rows:206890 elapsed_sec:6.636 ops/sec:31176.91380349608
> > total # partitions:130; partition id:63; rows:233437 elapsed_sec:7.586 ops/sec:30772.08014764039
> > total # partitions:130; partition id:9; rows:232585 elapsed_sec:32.985 ops/sec:7051.235410034865
> > total # partitions:130; partition id:19; rows:234192 elapsed_sec:38.733 ops/sec:6046.3170939508955
> > total # partitions:130; partition id:1; rows:232860 elapsed_sec:48.479 ops/sec:4803.316900101075
> > total # partitions:130; partition id:125; rows:205334 elapsed_sec:41.911 ops/sec:4899.286583474505
> > total # partitions:130; partition id:123; rows:206622 elapsed_sec:42.281 ops/sec:4886.875901705258
> > total # partitions:130; partition id:54; rows:232811 elapsed_sec:49.083 ops/sec:4743.210480206996
> >
> > I use setCacheBlocks(false) and setCaching(5000). Does anyone have any
> > insight into how I can make the read performance more consistent?
> >
> > Thanks!
> >
>
