Re: Scan a region in parallel

John Leach Sat, 22 Oct 2016 14:08:28 -0700

Anil,

You could also try Splice Machine (Open Source).


Regards,
John Leach

> On Oct 21, 2016, at 4:05 AM, Anil <[email protected]> wrote:
> 
> Thank you Ram. Now it's clear. I will take a look at it.
> 
> Thanks again.
> 
> On 21 October 2016 at 14:25, ramkrishna vasudevan <
> [email protected]> wrote:
> 
>> Phoenix does handle queries on columns intelligently, since it is a SQL
>> engine.
>> 
>> There the parallelism comes from guideposts: evenly spaced row keys
>> stored in a separate stats table. When you run a query, Phoenix
>> internally spawns parallel scans using those guideposts, which makes the
>> query faster.
>> 
>> Regards
>> Ram
>> 
>> On Fri, Oct 21, 2016 at 1:26 PM, Anil <[email protected]> wrote:
>> 
>>> Thank you Ram.
>>> 
>>> "So now  you are spawning those many scan threads equal to the number of
>>> regions " - YES
>>> 
>>> There are two ways of scanning a region in parallel:
>>> 
>>> 1. Scan a region from its start row to its stop row with a single scan
>>> operation, and let HBase take care of the parallelism internally on the
>>> server side.
>>> 2. Transform the start row and stop row of a region into a number of
>>> start and stop rows (by some criteria) and spawn a scan query for each
>>> start/stop row pair.
>>> 
>>> #1 is not supported (as you also said).
>>> 
>>> I am looking for #2. I checked the Phoenix documentation and code, and it
>>> seems to me that Phoenix is doing #2, but I could not understand the code
>>> completely.
>>> 
>>> The use case is very simple. HBase is not good (at least in terms of
>>> OLTP performance) at querying by arbitrary columns (other than the row
>>> key) or at sorting by any column of a row. Neither is Phoenix.
>>> 
>>> So I am planning to load the HBase/Phoenix table into an in-memory
>>> database for faster access.
>>> 
>>> Scanning a big region sequentially leads to a long load time, so I am
>>> looking for ways to minimize it.
>>> 
>>> Hope this helps.
>>> 
>>> Thanks.
>>> 
>>> 
>>> On 21 October 2016 at 09:30, ramkrishna vasudevan <
>>> [email protected]> wrote:
>>> 
>>>> Hi Anil
>>>> 
>>>> So now  you are spawning those many scan threads equal to the number of
>>>> regions.
>>>> "Is there any way to scan a region in parallel?"
>>>> You mean you want to scan in parallel within a region? That is, you want
>>>> to split a single query into N small scans and then read and aggregate
>>>> on the client side/server side?
>>>> 
>>>> Currently you cannot do that. Once you set a start and stop row, the scan
>>>> determines which region they belong to and retrieves the data
>>>> sequentially in that region (applying any filters you specify during the
>>>> course of the scan).
>>>> 
>>>> Have you tried Apache Phoenix? It's a SQL wrapper over HBase, and there
>>>> you can get parallel scans for a given SQL query if some guideposts have
>>>> been collected. Such things cannot be an integral part of HBase. But
>>>> since I am not aware of your use case, I am afraid we cannot suggest
>>>> more than this.
>>>> 
>>>> Regards
>>>> Ram
>>>> 
>>>> 
>>>> On Fri, Oct 21, 2016 at 8:40 AM, Anil <[email protected]> wrote:
>>>> 
>>>>> Any pointers ?
>>>>> 
>>>>> On 20 October 2016 at 18:15, Anil <[email protected]> wrote:
>>>>> 
>>>>>> Hi,
>>>>>> 
>>>>>> I am loading an HBase table into an in-memory DB to support filtering,
>>>>>> ordering and pagination.
>>>>>> 
>>>>>> I am scanning each region and inserting the data into the in-memory DB.
>>>>>> Each region scan runs in a single thread, so the regions are scanned in
>>>>>> parallel with one another.
>>>>>> 
>>>>>> Is there any way to scan a single region in parallel? Any pointers
>>>>>> would be helpful.
>>>>>> 
>>>>>> Thanks
>>>>>> 
>>>>> 
>>>> 
>>> 
>>
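
Option #2 from the thread above (turning one region's start/stop row into several start/stop pairs and spawning one scan per pair) could be sketched roughly as follows. This is only an illustration under stated assumptions, not Phoenix's guidepost mechanism and not an HBase API: the class `RegionRangeSplitter`, its methods, and the `scanRange` callback are hypothetical names, the split points are found by plain numeric interpolation between the two keys, both boundary keys are assumed to be non-empty, and a sorted in-memory map stands in for the region. It needs Java 9 or later for `Arrays.compareUnsigned`.

```java
import java.math.BigInteger;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.TreeMap;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.BiFunction;

// Hypothetical helper, not part of HBase or Phoenix.
class RegionRangeSplitter {

    // Splits [start, stop) into `parts` contiguous sub-ranges by treating the
    // keys as unsigned big-endian numbers (right-padded with zero bytes to equal
    // length) and interpolating. Assumes non-empty keys with start < stop.
    // Very narrow ranges may yield empty sub-ranges, which is harmless.
    static List<byte[][]> split(byte[] start, byte[] stop, int parts) {
        if (parts < 1) {
            throw new IllegalArgumentException("parts must be >= 1");
        }
        int len = Math.max(start.length, stop.length);
        BigInteger lo = new BigInteger(1, Arrays.copyOf(start, len));
        BigInteger hi = new BigInteger(1, Arrays.copyOf(stop, len));
        BigInteger span = hi.subtract(lo);
        List<byte[][]> ranges = new ArrayList<>();
        byte[] prev = start;
        for (int i = 1; i <= parts; i++) {
            BigInteger offset = span.multiply(BigInteger.valueOf(i))
                                    .divide(BigInteger.valueOf(parts));
            // The last sub-range ends exactly at the original stop row.
            byte[] next = (i == parts) ? stop : toKey(lo.add(offset), len);
            ranges.add(new byte[][] {prev, next});
            prev = next;
        }
        return ranges;
    }

    // Converts a non-negative number back into a fixed-length key.
    static byte[] toKey(BigInteger value, int len) {
        byte[] raw = value.toByteArray(); // may carry a leading sign byte
        byte[] out = new byte[len];
        int copy = Math.min(raw.length, len);
        System.arraycopy(raw, raw.length - copy, out, len - copy, copy);
        return out;
    }

    // Runs one scan per sub-range on a thread pool and concatenates the
    // results in key order. `scanRange` stands in for whatever actually reads
    // the rows between a start row (inclusive) and a stop row (exclusive).
    static <T> List<T> parallelScan(List<byte[][]> ranges,
                                    BiFunction<byte[], byte[], List<T>> scanRange,
                                    int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<List<T>>> futures = new ArrayList<>();
            for (byte[][] range : ranges) {
                futures.add(pool.submit(() -> scanRange.apply(range[0], range[1])));
            }
            List<T> rows = new ArrayList<>();
            for (Future<List<T>> future : futures) {
                rows.addAll(future.get());
            }
            return rows;
        } catch (InterruptedException | ExecutionException e) {
            throw new RuntimeException("sub-range scan failed", e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        // A sorted in-memory map stands in for a single region.
        Comparator<byte[]> byKey = Arrays::compareUnsigned;
        TreeMap<byte[], String> region = new TreeMap<>(byKey);
        for (int i = 0; i < 16; i++) {
            region.put(new byte[] {(byte) i}, "row" + i);
        }
        List<byte[][]> ranges = split(new byte[] {0x00}, new byte[] {0x10}, 4);
        List<String> rows = parallelScan(
            ranges, (s, e) -> new ArrayList<>(region.subMap(s, e).values()), 4);
        System.out.println(rows.size() + " rows loaded from " + ranges.size() + " sub-ranges");
    }
}
```

Against a real table, the `scanRange` callback would presumably open a client-side scan bounded by the given start and stop rows and collect its results. Interpolated split points balance the work only when row keys are spread fairly evenly across the key space; with skewed keys, split points derived from the data itself (which is the role guideposts play in Phoenix, as described above) would be the better criterion.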
