Re: Scan problem

Saad Mufti Mon, 19 Mar 2018 07:31:36 -0700

Another option, if you have enough disk space or off-heap memory, is to
enable the bucket cache so that more of your data can be cached, and to set
PREFETCH_BLOCKS_ON_OPEN => true on the column families you always want
cached. That way HBase will prefetch your data into the bucket cache when
the store files are opened, and your scan won't have that initial slowdown.
Or, if you want to do it globally for all column families, set the
configuration flag "hbase.rs.prefetchblocksonopen" to "true". Keep in mind,
though, that if you do this you should have enough bucket cache space for
all your data; otherwise there will be a lot of useless eviction activity
at HBase startup and even later.
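
For illustration, the settings above might look roughly like this in
hbase-site.xml. This is a sketch, not a tuned configuration:
hbase.bucketcache.ioengine and hbase.bucketcache.size are the usual bucket
cache properties as far as I know, and the 4096 value is only a placeholder
that you would size for your own data.

```xml
<!-- hbase-site.xml fragment (illustrative values, not recommendations) -->
<property>
  <!-- Where the bucket cache lives: "offheap", or "file:/some/path" for disk -->
  <name>hbase.bucketcache.ioengine</name>
  <value>offheap</value>
</property>
<property>
  <!-- Bucket cache capacity in MB; placeholder value, size it for your data -->
  <name>hbase.bucketcache.size</name>
  <value>4096</value>
</property>
<property>
  <!-- Prefetch blocks for all column families when store files are opened -->
  <name>hbase.rs.prefetchblocksonopen</name>
  <value>true</value>
</property>
```

To prefetch only selected column families, leave the last property out and
set the attribute per family instead (in the HBase shell it is spelled
PREFETCH_BLOCKS_ON_OPEN, as far as I know). With the offheap engine, the
region server JVM also needs enough direct memory to back the cache.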


Also, where a region is located will be heavily influenced by which
region balancer you have chosen and how you have tuned it, in terms of how
often it runs and other parameters. The regions produced by a split will
stay, at least initially, on the same region server, but your balancer, if
and when it runs, can move them (and indeed any region) elsewhere to
satisfy its criteria.

Cheers.

----
Saad


On Mon, Mar 19, 2018 at 1:14 AM, ramkrishna vasudevan <
[email protected]> wrote:

> Hi
>
> First regarding the scans,
>
> Generally the data resides in store files, which are in HDFS. So the
> first scan that you are doing is probably reading from HDFS, which
> involves disk reads. Once the blocks are read, they are cached in the
> block cache of HBase. Your further reads are served from that cache, and
> hence you see the speedup in the later scans.
>
> >> And another question about region splits: I want to know which
> >> RegionServer will load the new regions after a split.
> >> Will it be the same one that hosted the old region?
> Yes. Generally the same region server hosts them.
>
> In master the code is here,
> https://github.com/apache/hbase/blob/master/hbase-server/src/main/java/org/apache/hadoop/hbase/master/assignment/SplitTableRegionProcedure.java
>
> You may need to understand the entire flow to know how the regions are
> opened after a split.
>
> Regards
> Ram
>
> On Sat, Mar 17, 2018 at 9:02 PM, Yang Zhang <[email protected]>
> wrote:
>
> > Hello everyone
> >
> > I am running many Scans using RegionScanner in a coprocessor, and
> > every time the first Scan takes about 10 times as long as the others.
> > I don't know why this happens.
> >
> > OneBucket Scan cost is : 8794 ms Num is : 710
> > OneBucket Scan cost is : 91 ms Num is : 776
> > OneBucket Scan cost is : 87 ms Num is : 808
> > OneBucket Scan cost is : 105 ms Num is : 748
> > OneBucket Scan cost is : 68 ms Num is : 200
> >
> >
> > And another question about region splits: I want to know which
> > RegionServer will load the new regions after a split.
> > Will it be the same one that hosted the old region? Does anyone know
> > where I can find the code to learn about that?
> >
> >
> > Thanks for your help
> >
>
