Re: Using Scans in parallel

Sam Seigal Wed, 05 Oct 2011 18:19:09 -0700

So the whole point of getting the region locations is to ensure that
there is one thread per region server?
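
For reference, here is a rough sketch of the grouping idea as I understand
it. It is plain Java with no HBase calls: Region, scanRegion and scanAll are
hypothetical stand-ins. In real code the regions would presumably come from
HTable.getRegionLocations, and scanRegion would open a scanner with a Scan
bounded by the region's start and end keys.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class ScanPerServer {
    // Hypothetical stand-in for a region: its key range and hosting server.
    static final class Region {
        final String startKey, endKey, server;
        Region(String startKey, String endKey, String server) {
            this.startKey = startKey;
            this.endKey = endKey;
            this.server = server;
        }
    }

    // Group regions by hosting server so that each server gets one worker.
    static Map<String, List<Region>> groupByServer(List<Region> regions) {
        Map<String, List<Region>> byServer = new TreeMap<>();
        for (Region r : regions) {
            byServer.computeIfAbsent(r.server, k -> new ArrayList<>()).add(r);
        }
        return byServer;
    }

    // Placeholder for the real scan (assumption): returns a label for the
    // key range instead of rows read from the table.
    static List<String> scanRegion(Region r) {
        return Collections.singletonList("[" + r.startKey + "," + r.endKey + ")");
    }

    // Run one task per server; each task scans that server's regions in turn.
    static Map<String, List<String>> scanAll(List<Region> regions) {
        Map<String, List<Region>> byServer = groupByServer(regions);
        ExecutorService pool =
            Executors.newFixedThreadPool(Math.max(1, byServer.size()));
        try {
            Map<String, Future<List<String>>> futures = new TreeMap<>();
            for (Map.Entry<String, List<Region>> e : byServer.entrySet()) {
                futures.put(e.getKey(), pool.submit(() -> {
                    List<String> out = new ArrayList<>();
                    for (Region r : e.getValue()) {
                        out.addAll(scanRegion(r));
                    }
                    return out;
                }));
            }
            Map<String, List<String>> results = new TreeMap<>();
            for (Map.Entry<String, Future<List<String>>> f : futures.entrySet()) {
                results.put(f.getKey(), f.getValue().get());
            }
            return results;
        } catch (InterruptedException | ExecutionException ex) {
            throw new IllegalStateException(ex);
        } finally {
            pool.shutdown();
        }
    }
}
```

With one thread per server, no region server is hit by more than one of
these scans at a time, while different servers are still read in parallel.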



On Wed, Oct 5, 2011 at 4:42 PM, lars hofhansl <lhofha...@yahoo.com> wrote:
> Hi Sam,
>
>
> There were some attempts to build this in. In the end I think the exact
> patterns differ based on what one is trying to achieve.
> Currently what you can do is get all the region locations
> (HTable.getRegionLocations). From the HRegionInfos you can
> get the regions' start and end keys.
> Now you can issue parallel scans for as many regions as you want (by creating a
> Scan object with its start and stop row set to the region's
> start and end key).
> You probably want to group the regions by region server and have one thread
> per region server, or something.
>
>
> -- Lars
> ________________________________
> From: Sam Seigal <selek...@yahoo.com>
> To: hbase-u...@hadoop.apache.org
> Sent: Wednesday, October 5, 2011 4:29 PM
> Subject: Using Scans in parallel
>
> Hi,
>
> Is there a known way to do Scans in parallel (in different
> threads even) and then sort/combine the output?
>
> For a row key like:
>
> prefix-event_type-event_id
> prefix-event_type-event_id
>
> I want to declare two scan objects (say, for event_type foo)
>
> Scan 1 =>  0-foo
> Scan 2 =>  1-foo
>
> execute the scans in parallel (maybe even in different threads) and
> then merge the results?
>
> Thank you,
>
> Sam
>
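
On the merge step from the original question: a minimal sketch, assuming
each scan returns its row keys already sorted and that the combined output
should be ordered by the key without its leading prefix. ScanMerge,
mergeSorted and BY_UNSALTED_KEY are hypothetical names, and row keys are
modeled as ASCII strings rather than the byte arrays HBase uses.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.Iterator;
import java.util.List;
import java.util.PriorityQueue;

class ScanMerge {
    // Orders keys by everything after the first '-', that is, without the
    // leading prefix ("0-foo-001" is compared as "foo-001").
    static final Comparator<String> BY_UNSALTED_KEY =
        Comparator.comparing((String k) -> k.substring(k.indexOf('-') + 1));

    // One cursor per scan: the current head row and the remaining rows.
    static final class Cursor {
        String head;
        final Iterator<String> rest;
        Cursor(String head, Iterator<String> rest) {
            this.head = head;
            this.rest = rest;
        }
    }

    // K-way merge: repeatedly take the smallest head among all scans.
    // Each input list must already be sorted according to 'order'.
    static List<String> mergeSorted(List<List<String>> perScan,
                                    Comparator<String> order) {
        PriorityQueue<Cursor> heap =
            new PriorityQueue<>((a, b) -> order.compare(a.head, b.head));
        for (List<String> rows : perScan) {
            Iterator<String> it = rows.iterator();
            if (it.hasNext()) {
                heap.add(new Cursor(it.next(), it));
            }
        }
        List<String> merged = new ArrayList<>();
        while (!heap.isEmpty()) {
            Cursor c = heap.poll();
            merged.add(c.head);
            if (c.rest.hasNext()) {
                c.head = c.rest.next();   // advance this scan and re-queue it
                heap.add(c);
            }
        }
        return merged;
    }
}
```

For example, merging [0-foo-001, 0-foo-004] with [1-foo-002, 1-foo-003]
using BY_UNSALTED_KEY gives 0-foo-001, 1-foo-002, 1-foo-003, 0-foo-004.
Merging by the full row key instead would just concatenate the two scans,
since every 0- key sorts before every 1- key.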
