Re: Optimizing Multi Gets in hbase

2013-02-19 Thread Nicolas Liochon
teresting. One could identify clusters of close row keys in the Gets and issue a Scan for each cluster. -- Lars From: Nicolas Liochon To: user Sent: Tuesd

Re: Optimizing Multi Gets in hbase

2013-02-19 Thread Nicolas Liochon
ntify clusters of close row keys in the Gets and issue a Scan for each cluster. -- Lars From: Nicolas Liochon To: user Sent: Tuesday, February 19, 2013 9:28 AM Subject: Re: Optimizing Multi Gets in hb

Re: Optimizing Multi Gets in hbase

2013-02-19 Thread lars hofhansl
a Scan for each cluster. -- Lars From: Nicolas Liochon To: user Sent: Tuesday, February 19, 2013 9:28 AM Subject: Re: Optimizing Multi Gets in hbase Imho,  the easiest thing to do would be to write a filter. You need to order the rows, then you can use

Re: Optimizing Multi Gets in hbase

2013-02-19 Thread Varun Sharma
resent the rows you are looking for in a filter, so that would probably shift this slightly more towards Gets (just imagine a Filter that has to encode 100k random row keys to be matched; since Filters are instantiated st

Re: Optimizing Multi Gets in hbase

2013-02-19 Thread Nicolas Liochon
table has 10bn rows, in that case it is almost certain that the Gets are faster than a scan. Now imagine the Gets only cover a small key range. With statistics we could tell whether it would be beneficial to turn this into a scan.

Re: Optimizing Multi Gets in hbase

2013-02-19 Thread Varun Sharma
As I said below, the crux of the matter is having some histograms of your data, so that such a decision could be made automatically. -- Lars From: lars hofhansl To: "

Re: Optimizing Multi Gets in hbase

2013-02-19 Thread Nicolas Liochon
tore there is another natural limit there). As I said below, the crux of the matter is having some histograms of your data, so that such a decision could be made automatically. -- Lars From: lars hofha

Re: Optimizing Multi Gets in hbase

2013-02-19 Thread lars hofhansl
the matter is having some histograms of your data, so that such a decision could be made automatically. -- Lars From: lars hofhansl To: "user@hbase.apache.org" Sent: Monday, February 18, 2013 5:48 PM Subject: Re: Optimizing Multi Gets in hbase

Re: Optimizing Multi Gets in hbase

2013-02-18 Thread Varun Sharma
with an appropriate filter (may have to implement your own filter, though). Maybe we could add a version of RowFilter that matches against multiple keys. -- Lars From: Varun Sharma To: user@hbase.apache.org Sent
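The multi-key filter idea above can be sketched without any HBase dependency. The class below is hypothetical (the name `MultiRowMatcher` and both of its methods are invented for illustration and are not part of HBase). It keeps the wanted row keys in sorted order so that a scan could test each row for membership and, on a miss, ask which wanted key to jump to next. It assumes ASCII string keys, for which Java's String ordering agrees with HBase's byte-wise row ordering; real row keys are byte arrays.

```java
import java.util.Collection;
import java.util.TreeSet;

// Hypothetical helper, not an HBase class: the matching logic a
// "RowFilter for multiple keys" would need, shown on plain strings.
final class MultiRowMatcher {
    // Sorted set of wanted row keys (assumption: ASCII keys, so String
    // ordering matches lexicographic byte ordering).
    private final TreeSet<String> wanted;

    MultiRowMatcher(Collection<String> rowKeys) {
        this.wanted = new TreeSet<>(rowKeys);
    }

    // True when the row currently under the scanner is one we asked for.
    boolean matches(String row) {
        return wanted.contains(row);
    }

    // Smallest wanted key that is >= row, or null when none remains.
    // A filter could use this as a hint to skip ahead instead of
    // reading every row in between.
    String nextWanted(String row) {
        return wanted.ceiling(row);
    }
}
```

As noted elsewhere in the thread, the key set has to travel with the filter, so this approach looks less attractive once the list grows to something like 100k random keys.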

Re: Optimizing Multi Gets in hbase

2013-02-18 Thread lars hofhansl
-- Lars From: Varun Sharma To: user@hbase.apache.org Sent: Monday, February 18, 2013 1:57 AM Subject: Optimizing Multi Gets in hbase Hi, I am trying to do batched get(s) on a cluster. Here is the code: List<Get> gets = ... // Prepare my gets with the rows I need myHTabl

Re: Optimizing Multi Gets in hbase

2013-02-18 Thread Michael Segel
So you'd have to do a little bit of homework up front. Suppose you have to pull some data from 30K rows out of 10 Mil? If they are in sort order, you could determine the regions and then think about doing a couple of scans in parallel. But that may be more work than just doing the set of get

Re: Optimizing Multi Gets in hbase

2013-02-18 Thread ramkrishna vasudevan
If the scan is happening on the same region then going for Scan would be a better option. Regards RAm On Mon, Feb 18, 2013 at 4:26 PM, Nicolas Liochon wrote: > i) Yes, or, at least, often yes. ii) You're right. It's difficult to guess how much it would improve performance (there is

Re: Optimizing Multi Gets in hbase

2013-02-18 Thread Nicolas Liochon
i) Yes, or, at least, often yes. ii) You're right. It's difficult to guess how much it would improve performance (there is a lot of caching effect), but using a single scan could be an interesting optimisation imho. Nicolas On Mon, Feb 18, 2013 at 10:57 AM, Varun Sharma wrote: > Hi, >

RE: Optimizing Multi Gets in hbase

2013-02-18 Thread Anoop Sam John
It will instantiate one scan op per Get -Anoop- From: Varun Sharma [va...@pinterest.com] Sent: Monday, February 18, 2013 3:27 PM To: user@hbase.apache.org Subject: Optimizing Multi Gets in hbase Hi, I am trying to do batched get(s) on a cluster. Here is

Re: Optimizing Multi Gets in hbase

2013-02-18 Thread Viral Bajaria
Hi Varun, Are your gets around sequential keys? If so, you might benefit from doing scans with a start and stop. If they are not sequential, I don't think there would be a better way, from the way you describe the problem. Besides that, some of the questions that come to mind: - How many GET(s) are
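To make the "scan with a start and stop" idea concrete, here is a minimal sketch of how requested keys might be grouped into ranges, each of which could then be served by one scan instead of many gets. It is an illustration only: the class `RowKeyClusters`, its `Range` type, and the `maxGap` threshold are hypothetical names, and it assumes non-negative numeric row keys, where "close" can be measured by subtraction. Real HBase row keys are byte arrays, and a sensible threshold would depend on how many rows lie between two keys, which is the kind of statistic discussed elsewhere in this thread.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not an HBase class.
final class RowKeyClusters {
    // A closed key range [start, stop] that a single scan could cover.
    static final class Range {
        final long start;
        final long stop;
        final int wanted; // number of requested keys inside the range

        Range(long start, long stop, int wanted) {
            this.start = start;
            this.stop = stop;
            this.wanted = wanted;
        }
    }

    // Sorts the requested keys and starts a new range whenever the gap
    // to the previous key exceeds maxGap (assumption: keys are
    // non-negative, so the subtraction cannot overflow).
    static List<Range> cluster(long[] keys, long maxGap) {
        List<Range> ranges = new ArrayList<>();
        if (keys.length == 0) {
            return ranges;
        }
        long[] sorted = keys.clone();
        Arrays.sort(sorted);
        long start = sorted[0];
        long prev = sorted[0];
        int count = 1;
        for (int i = 1; i < sorted.length; i++) {
            if (sorted[i] - prev > maxGap) {
                ranges.add(new Range(start, prev, count));
                start = sorted[i];
                count = 0;
            }
            prev = sorted[i];
            count++;
        }
        ranges.add(new Range(start, prev, count));
        return ranges;
    }
}
```

A range holding a single key is probably still best fetched with a plain get; only ranges with several wanted keys are candidates for a scan.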

Optimizing Multi Gets in hbase

2013-02-18 Thread Varun Sharma
Hi, I am trying to do batched get(s) on a cluster. Here is the code:

    List<Get> gets = ... // Prepare my gets with the rows I need
    myHTable.get(gets);

I have two questions about the above scenario: i) Is this the optimal way to do this? ii) I have a feeling that if there are multiple gets in this c