Re: Parallel Scanner

Anil Sun, 19 Feb 2017 23:34:40 -0800

Thanks Ram. I will look into EndPoints.

On 20 February 2017 at 12:29, ramkrishna vasudevan <
ramkrishna.s.vasude...@gmail.com> wrote:


> Yes, there is a way.
>
> Have you seen Endpoints? Endpoints are trigger-like points that your
> client can invoke in parallel on one or more regions, selected by the
> start and end key of the region. They execute in parallel, and then you
> may have to sort out the results as per your need.
>
> But these endpoints have to be running on your region servers, so this is
> not a client-only solution.
> https://blogs.apache.org/hbase/entry/coprocessor_introduction.
>
> Be careful when you use them. Since these endpoints run on the server,
> ensure that they are not heavy and do not consume a lot of memory, which
> can have adverse effects on the server.
>
>
> Regards
> Ram
>
> On Mon, Feb 20, 2017 at 12:18 PM, Anil <anilk...@gmail.com> wrote:
>
> > Thanks Ram.
> >
> > So, you mean that there is no harm in using HTable#getRegionsInRange in
> > the application code.
> >
> > HTable#getRegionsInRange returned a single entry for all my region start
> > and end keys. I need to explore more on this.
> >
> > "If you know the table region's start and end keys you could create
> > parallel scans in your application code." - Is there any way to scan a
> > region in the application code other than the one I put in the original
> > email?
> >
> > "One thing to watch out for is that if there is a split in the region,
> > this start and end row may change, so in that case it is better to get
> > the regions every time before you issue a scan"
> > - Agreed. I am dynamically determining the region start key and end key
> > before initiating scan operations for every initial load.
> >
> > Thanks.
> >
> >
> >
> >
> > On 20 February 2017 at 10:59, ramkrishna vasudevan <
> > ramkrishna.s.vasude...@gmail.com> wrote:
> >
> > > Hi Anil,
> > >
> > > HBase does not directly provide parallel scans. If you know the table
> > > regions' start and end keys, you could create parallel scans in your
> > > application code.
> > >
> > > In the above code snippet, the intent is right - you get the required
> > > regions and can issue parallel scans from your app.
> > >
> > > One thing to watch out for is that if there is a split in the region,
> > > this start and end row may change, so in that case it is better to get
> > > the regions every time before you issue a scan. Does that make sense to
> > > you?
> > >
> > > Regards
> > > Ram
> > >
> > > On Sat, Feb 18, 2017 at 1:44 PM, Anil <anilk...@gmail.com> wrote:
> > >
> > > > Hi ,
> > > >
> > > > I am building a use case where I have to load HBase data into an
> > > > in-memory database (IMDB). I am scanning each region and loading the
> > > > data into the IMDB.
> > > >
> > > > I am looking at the parallel scanner
> > > > ( https://issues.apache.org/jira/browse/HBASE-8504, HBASE-1935 ) to
> > > > reduce the load time. HTable#getRegionsInRange(byte[] startKey,
> > > > byte[] endKey, boolean reload) is deprecated, and HBASE-1935 is
> > > > still open.
> > > >
> > > > I see that the Connection from ConnectionFactory is
> > > > HConnectionImplementation by default and that it creates an HTable
> > > > instance.
> > > >
> > > > Do you see any issues in using HTable from a Table instance?
> > > >
> > > >     for each region {
> > > >         int i = 0;
> > > >         List<HRegionLocation> regions = hTable.getRegionsInRange(
> > > >                 scans.getStartRow(), scans.getStopRow(), true);
> > > >
> > > >         for (HRegionLocation region : regions) {
> > > >             // The first sub-range starts at the scan's own start row.
> > > >             startRow = i == 0 ? scans.getStartRow()
> > > >                     : region.getRegionInfo().getStartKey();
> > > >             i++;
> > > >             // The last sub-range ends at the scan's own stop row.
> > > >             endRow = i == regions.size() ? scans.getStopRow()
> > > >                     : region.getRegionInfo().getEndKey();
> > > >         }
> > > >     }
> > > >
> > > > Are there any alternatives to achieve a parallel scan? Thanks.
> > > >
> > > > Thanks
> > > >
> > >
> >
>
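
The client-side approach discussed in this thread (look up the region
boundaries, clamp them to the scan's start and stop rows, then scan each
sub-range on its own thread) can be sketched in plain Java as below. This is
only a sketch under stated assumptions, not HBase code: the class
ParallelRangeScan and its methods are hypothetical names, the region
boundaries are passed in as byte[] pairs (in a real client they would come
from the cluster, for example through a RegionLocator, and should be
re-fetched before each load because of splits), and the scanOneRange callback
stands in for whatever actually issues a Scan over [start, stop).

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.BiFunction;

// Hypothetical helper; none of these names come from the HBase API.
final class ParallelRangeScan {

    static byte[] bytes(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    // Unsigned lexicographic comparison, the ordering used for row keys.
    static int compare(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int d = (a[i] & 0xff) - (b[i] & 0xff);
            if (d != 0) {
                return d;
            }
        }
        return a.length - b.length;
    }

    // Clamps each region {startKey, endKey} to [scanStart, scanStop).
    // An empty array means "unbounded" on that side.
    static List<byte[][]> subRanges(List<byte[][]> regions, byte[] scanStart, byte[] scanStop) {
        List<byte[][]> out = new ArrayList<>();
        for (byte[][] r : regions) {
            byte[] rs = r[0];
            byte[] re = r[1];
            // start = max(regionStart, scanStart), empty start = minus infinity
            byte[] start = (rs.length == 0
                    || (scanStart.length != 0 && compare(scanStart, rs) > 0)) ? scanStart : rs;
            // end = min(regionEnd, scanStop), empty end = plus infinity
            byte[] end = (re.length == 0
                    || (scanStop.length != 0 && compare(scanStop, re) < 0)) ? scanStop : re;
            // Skip regions that do not overlap the scan range at all.
            if (start.length != 0 && end.length != 0 && compare(start, end) >= 0) {
                continue;
            }
            out.add(new byte[][] {start, end});
        }
        return out;
    }

    // Runs scanOneRange for every sub-range on a fixed pool; results keep range order.
    static <T> List<T> scanInParallel(List<byte[][]> ranges, int threads,
            BiFunction<byte[], byte[], T> scanOneRange) {
        ExecutorService pool = Executors.newFixedThreadPool(Math.max(1, threads));
        try {
            List<Future<T>> futures = new ArrayList<>();
            for (byte[][] r : ranges) {
                Callable<T> task = () -> scanOneRange.apply(r[0], r[1]);
                futures.add(pool.submit(task));
            }
            List<T> results = new ArrayList<>();
            for (Future<T> f : futures) {
                results.add(f.get());
            }
            return results;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while scanning", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("a range scan failed", e.getCause());
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        // Four made-up regions: ["", g), [g, n), [n, t), [t, "").
        String[] bounds = {"", "g", "n", "t", ""};
        List<byte[][]> regions = new ArrayList<>();
        for (int i = 0; i + 1 < bounds.length; i++) {
            regions.add(new byte[][] {bytes(bounds[i]), bytes(bounds[i + 1])});
        }
        List<byte[][]> ranges = subRanges(regions, bytes("c"), bytes("p"));
        List<String> seen = scanInParallel(ranges, 4, (start, stop) ->
                new String(start, StandardCharsets.UTF_8) + ".."
                        + new String(stop, StandardCharsets.UTF_8));
        System.out.println(seen); // prints [c..g, g..n, n..p]
    }
}
```

Results come back in sub-range order because the futures are collected in the
order they were submitted, so no extra sorting is needed on the client side.
The pool size is a tuning choice; one thread per region is not required.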
