So you'd have to do a little bit of homework up front.

Suppose you need to pull data for 30K rows out of 10 million.
If the row keys are in sort order, you could determine which regions they fall
in and then think about running a couple of scans in parallel.
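The per-region split could be sketched like this. `groupByRegion` is a hypothetical helper, not part of the HBase API; plain strings stand in for the byte[] row keys you would actually get from something like HTable.getStartKeys():

```java
import java.util.*;

public class RegionBucketer {
    // Bucket sorted row keys by the region whose start key covers them,
    // using a floor lookup on the sorted region start keys.
    static Map<String, List<String>> groupByRegion(
            SortedSet<String> regionStartKeys, List<String> sortedRows) {
        TreeMap<String, List<String>> buckets = new TreeMap<>();
        TreeSet<String> starts = new TreeSet<>(regionStartKeys);
        for (String row : sortedRows) {
            String start = starts.floor(row);       // region containing this row
            if (start == null) start = starts.first(); // guard; first region has "" start key
            buckets.computeIfAbsent(start, k -> new ArrayList<>()).add(row);
        }
        return buckets;
    }

    public static void main(String[] args) {
        SortedSet<String> regions = new TreeSet<>(Arrays.asList("", "g", "p"));
        List<String> rows = Arrays.asList("a", "b", "h", "q", "z");
        System.out.println(groupByRegion(regions, rows));
        // → {=[a, b], g=[h], p=[q, z]}
    }
}
```

Each bucket then gives you the start/stop range for one scan, and the buckets can be handed to a thread pool to run in parallel.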

But that may be more work than just doing the set of gets. 

It would be interesting to benchmark the performance.... 
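A rough way to do that comparison is just to time both code paths client-side. A minimal sketch, not a proper harness like JMH; the two runnables are placeholders for the real batched-get and parallel-scan calls:

```java
public class MicroBench {
    // Crude wall-clock timing of a task over several repetitions.
    static long timeMillis(Runnable task, int reps) {
        long start = System.nanoTime();
        for (int i = 0; i < reps; i++) {
            task.run();
        }
        return (System.nanoTime() - start) / 1_000_000;
    }

    public static void main(String[] args) {
        // Placeholders: swap in the real calls against the table.
        Runnable batchedGets = () -> { /* myHTable.get(gets) */ };
        Runnable parallelScans = () -> { /* scan each region range */ };
        System.out.println("gets:  " + timeMillis(batchedGets, 10) + " ms");
        System.out.println("scans: " + timeMillis(parallelScans, 10) + " ms");
    }
}
```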

I wonder if a coprocessor could help speed this up?
I mean, use the cp to perform all of the gets within each region, rather than
running a full region scan and then filtering against the row list for that region.

Again this would be for a very specific type of query.... 


On Feb 18, 2013, at 5:07 AM, ramkrishna vasudevan 
<ramkrishna.s.vasude...@gmail.com> wrote:

> If the scan is happening on the same region then going for Scan would be a
> better option.
> 
> Regards
> Ram
> 
> On Mon, Feb 18, 2013 at 4:26 PM, Nicolas Liochon <nkey...@gmail.com> wrote:
> 
> >> i) Yes, or, at least, often yes.
> >> ii) You're right. It's difficult to guess how much it would improve the
> >> performance (there is a lot of caching effect), but using a single scan
> >> could be an interesting optimisation imho.
>> 
>> Nicolas
>> 
>> 
>> On Mon, Feb 18, 2013 at 10:57 AM, Varun Sharma <va...@pinterest.com>
>> wrote:
>> 
>>> Hi,
>>> 
> >>> I am trying to do batched get(s) on a cluster. Here is the code:
>>> 
> >>> List<Get> gets = ...
> >>> // Prepare my gets with the rows I need
> >>> Result[] results = myHTable.get(gets);
>>> 
>>> I have two questions about the above scenario:
>>> i) Is this the most optimal way to do this ?
> >>> ii) I have a feeling that if there are multiple gets in this batch that
> >>> land on the same region, then each one of them will instantiate a separate
> >>> scan over the region even though a single scan would be sufficient. Am I
> >>> mistaken here?
>>> 
>>> Thanks
>>> Varun
>>> 
>> 
