Re: Inconsistent scan performance

Ted Yu Thu, 24 Mar 2016 18:34:55 -0700

Crossing region boundaries that happen to be on different servers may be the cause.
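
One way to test this hypothesis is to count how many regions each partition's key range touches. The following is a minimal sketch in plain Java, not HBase client code: `RegionOverlap`, `countRegions`, and `key` are hypothetical names. It assumes the region start keys were fetched separately and are sorted, with the first region's start key empty, and that each partition has a non-empty start row and an exclusive, non-empty stop row.

```java
import java.nio.charset.StandardCharsets;

/** Hypothetical helper for illustration; not part of the HBase API. */
final class RegionOverlap {

    /** Converts a string to row-key bytes (assumes UTF-8 keys). */
    static byte[] key(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    /** Unsigned lexicographic byte comparison, the order used for row keys. */
    static int compareKeys(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int x = a[i] & 0xff;
            int y = b[i] & 0xff;
            if (x != y) {
                return x - y;
            }
        }
        return a.length - b.length;
    }

    /**
     * Counts the regions overlapped by the scan range [start, stop).
     * Region i is assumed to cover [regionStartKeys[i], regionStartKeys[i + 1]),
     * and the last region is assumed to extend to the end of the table.
     */
    static int countRegions(byte[][] regionStartKeys, byte[] start, byte[] stop) {
        int count = 0;
        for (int i = 0; i < regionStartKeys.length; i++) {
            boolean lastRegion = (i == regionStartKeys.length - 1);
            boolean startsBeforeRegionEnd =
                lastRegion || compareKeys(start, regionStartKeys[i + 1]) < 0;
            boolean stopsAfterRegionStart = compareKeys(stop, regionStartKeys[i]) > 0;
            if (startsBeforeRegionEnd && stopsAfterRegionStart) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        // Made-up region layout: three regions starting at "", "g" and "p".
        byte[][] regionStarts = { key(""), key("g"), key("p") };
        System.out.println(countRegions(regionStarts, key("a"), key("f")));
        System.out.println(countRegions(regionStarts, key("e"), key("k")));
    }
}
```

A partition that reports more than one region is a candidate for the slow cases. Splitting such a partition at the region start keys would keep each scan within a single region, which makes it easier to see whether the boundary crossing is what costs the time.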

On Thu, Mar 24, 2016 at 5:49 PM, James Johansville <
james.johansvi...@gmail.com> wrote:


> In theory they should be aligned with *regionserver* boundaries. Would
> crossing multiple regions on the same regionserver result in the big
> performance difference being seen here?
>
> I am using Hortonworks HBase 1.1.2
>
> On Thu, Mar 24, 2016 at 5:32 PM, Ted Yu <yuzhih...@gmail.com> wrote:
>
> > I assume the partitions' boundaries don't align with region boundaries,
> > right ?
> >
> > Meaning some partitions would cross region boundaries.
> >
> > Which hbase release do you use ?
> >
> > Thanks
> >
> > On Thu, Mar 24, 2016 at 4:45 PM, James Johansville <
> > james.johansvi...@gmail.com> wrote:
> >
> > > Hello all,
> > >
> > > So, I wrote a Java application for HBase that does a partitioned
> > > full-table scan according to a set number of partitions. For example,
> > > if there are 20 partitions specified, then 20 separate full scans are
> > > launched that cover an equal slice of the row identifier range.
> > >
> > > The rows are uniformly distributed throughout the RegionServers. I
> > > confirmed this through the hbase shell. I have only one column family,
> > > and each row has the same number of column qualifiers.
> > >
> > > My problem is that the individual scan performance is wildly
> > > inconsistent even though the scans fetch a similar number of rows.
> > > This inconsistency appears to be random with respect to hosts or
> > > regionservers or partitions or CPU cores. I am the only user of the
> > > fleet and not running any other concurrent HBase operation.
> > >
> > > I started measuring from the beginning of the scan and stopped
> > > measuring after the scan was completed. I am not doing any logic with
> > > the results, just scanning them.
> > >
> > > For ~230K rows fetched per scan, I am getting anywhere from 4 seconds
> > > to 100+ seconds. This seems a little too bouncy for me. Does anyone
> > > have any insight? By comparison, a similar utility I wrote to upsert to
> > > regionservers was very consistent in ops/sec and I had no issues with
> > > it.
> > >
> > > Using 13 partitions on a machine that has 32 CPU cores and 16 GB heap,
> > > I see anywhere between 3K ops/sec to 82K ops/sec. Here's an example of
> > > log output I saved that used 130 partitions.
> > >
> > > total # partitions:130; partition id:47; rows:232730 elapsed_sec:6.401 ops/sec:36358.38150289017
> > > total # partitions:130; partition id:100; rows:206890 elapsed_sec:6.636 ops/sec:31176.91380349608
> > > total # partitions:130; partition id:63; rows:233437 elapsed_sec:7.586 ops/sec:30772.08014764039
> > > total # partitions:130; partition id:9; rows:232585 elapsed_sec:32.985 ops/sec:7051.235410034865
> > > total # partitions:130; partition id:19; rows:234192 elapsed_sec:38.733 ops/sec:6046.3170939508955
> > > total # partitions:130; partition id:1; rows:232860 elapsed_sec:48.479 ops/sec:4803.316900101075
> > > total # partitions:130; partition id:125; rows:205334 elapsed_sec:41.911 ops/sec:4899.286583474505
> > > total # partitions:130; partition id:123; rows:206622 elapsed_sec:42.281 ops/sec:4886.875901705258
> > > total # partitions:130; partition id:54; rows:232811 elapsed_sec:49.083 ops/sec:4743.210480206996
> > >
> > > I use setCacheBlocks(false), setCaching(5000).  Does anyone have any
> > > insight into how I can make the read performance more consistent?
> > >
> > > Thanks!
> > >
> >
>
