On Thu, Dec 27, 2012 at 7:33 PM, Anoop Sam John <anoo...@huawei.com> wrote:
> Yes, as you say, as the number of rows to be returned grows, the latency
> grows too. Seeks within an HFile block are a somewhat expensive op now
> (not much, but still). The new prefix trie encoding will be a huge bonus
> here. There the seeks will be flying.. [Ted also presented this at
> Hadoop China] Thanks to Matt... :) I am trying to measure the scan
> performance with this new encoding. Trying to backport a simple patch
> for the 0.94 version just for testing... Yes, as the number of results
> to be returned grows, any index will perform less well, as per my study :)
>
> Do you have a link to that presentation?
>
> >btw, quick question- in your presentation, the scale there is seconds or
> >milliseconds :)
>
> It is seconds. Don't consider the exact values. What matters is the %
> increase in latency :) Those were not high-end machines.
>
> -Anoop-
> ________________________________________
> From: Shengjie Min [kelvin....@gmail.com]
> Sent: Thursday, December 27, 2012 9:59 PM
> To: user@hbase.apache.org
> Subject: Re: HBase - Secondary Index
>
> >Didn't follow you completely here. There won't be any get() happening..
> >As we get the exact rowkey in a region from the index table, we can seek
> >to the exact position and return that row.
>
> Sorry, when I misused "get()" here, I meant seeking. Yes, if it's just a
> small number of rows returned, this works perfectly. As you said, you will
> get the exact rowkey positions per region, and simply seek to them. I was
> trying to work out the case where the number of result rows increases
> massively. Like in Anil's case, he wants to do a scan query against the
> secondary index (timestamp): "select all rows from timestamp1 to
> timestamp2" with no customerId provided. During that time period, he might
> have a big chunk of rows from different customerIds. The index table
> returns a lot of rowkey positions for different customerIds (I believe
> they are scattered across different regions), then you end up seeking to
> all the different positions in different regions and returning all the
> rows needed. According to your presentation, page 14 - Performance Test
> Results (Scan), without index, it's a linear increase as the number of
> result rows increases. On the other hand, with index, the time spent
> climbs way quicker than in the case without index.
>
> btw, quick question- in your presentation, the scale there is seconds or
> milliseconds :)
>
> - Shengjie
>
>
> On 27 December 2012 15:54, Anoop John <anoop.hb...@gmail.com> wrote:
>
> > >how the massive number of get() is going to
> > >perform against the main table
> >
> > Didn't follow you completely here. There won't be any get() happening..
> > As we get the exact rowkey in a region from the index table, we can seek
> > to the exact position and return that row.
> >
> > -Anoop-
> >
> > On Thu, Dec 27, 2012 at 6:37 PM, Shengjie Min <kelvin....@gmail.com>
> > wrote:
> >
> > > how the massive number of get() is going to
> > > perform against the main table
> > >
> >
>
>
> --
> All the best,
> Shengjie Min
>
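The seek pattern discussed in the quoted thread (an index lookup over a timestamp range yielding rowkeys scattered across regions, each needing its own seek) can be illustrated with a toy model. This is only a sketch under stated assumptions: it does not use any HBase API, the function names (`build_index`, `region_of`, `seeks_per_region`), the rowkey format and the split keys are all hypothetical, and it counts seeks rather than measuring latency.

```python
import bisect
from collections import defaultdict


def build_index(rows):
    """Sorted (timestamp, rowkey) pairs: a stand-in for the index table.

    `rows` maps rowkey -> integer timestamp (hypothetical data layout).
    """
    return sorted((ts, key) for key, ts in rows.items())


def region_of(rowkey, split_keys):
    """Number of the region holding `rowkey`, for regions split at `split_keys`."""
    return bisect.bisect_right(split_keys, rowkey)


def seeks_per_region(index, split_keys, t1, t2):
    """Count the point seeks each region serves for timestamps in [t1, t2]."""
    lo = bisect.bisect_left(index, (t1, ""))
    hi = bisect.bisect_left(index, (t2 + 1, ""))
    seeks = defaultdict(int)
    for _, rowkey in index[lo:hi]:
        # One seek per matching row, in whichever region owns the rowkey.
        seeks[region_of(rowkey, split_keys)] += 1
    return dict(seeks)


if __name__ == "__main__":
    # Hypothetical main-table rows: rowkey starts with the customerId.
    rows = {"cust1_a": 10, "cust4_b": 11, "cust8_c": 12,
            "cust2_d": 50, "cust5_e": 13}
    index = build_index(rows)
    # Three regions: keys below "cust3", "cust3" up to "cust6", and the rest.
    print(seeks_per_region(index, ["cust3", "cust6"], 10, 13))
```

In this model the number of seeks equals the number of matching rows, and because the rowkeys lead with customerId, a timestamp-range query touches every region. That matches the concern raised in the thread: the per-row seek cost is paid once for each result, so the total grows with the result size.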