Here are my thoughts on this matter:

1. If you set setCaching(numOfRows) on the Scan object, you can check
before each next() call that you haven't passed your page limit, so you
never reach the point where you retrieve pageSize results from every
region.
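A minimal sketch of that client-side check. Plain Java lists stand in for per-region results here (in real code the loop would run over a ResultScanner from table.getScanner(scan) after scan.setCaching(pageSize)); the class and row names are made up for illustration:

```java
import java.util.ArrayList;
import java.util.List;

public class ClientSidePageLimit {

    // Each inner list stands for what one region hands back. With
    // PageFilter, every region can return up to pageSize rows on its
    // own, so the client must enforce the limit itself: count rows in
    // the next() loop and stop as soon as the page is full.
    static List<String> firstPage(List<List<String>> perRegionResults, int pageSize) {
        List<String> page = new ArrayList<>();
        for (List<String> region : perRegionResults) {
            for (String row : region) {
                if (page.size() == pageSize) {
                    return page; // stop before over-fetching from later regions
                }
                page.add(row);
            }
        }
        return page;
    }

    public static void main(String[] args) {
        List<List<String>> regions = new ArrayList<>();
        List<String> region1 = new ArrayList<>();
        List<String> region2 = new ArrayList<>();
        for (int i = 0; i < 99; i++)  region1.add("r1-" + i); // 99 rows
        for (int i = 0; i < 100; i++) region2.add("r2-" + i); // 100 more rows
        regions.add(region1);
        regions.add(region2);
        // Without the check you would see 199 rows; with it, exactly 100.
        System.out.println(firstPage(regions, 100).size());
    }
}
```

The same counting loop works unchanged whether the rows come from one region or ten.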

2. I think it's OK for the UI to present a certain point in time in the
database and offer paging over that snapshot. You can achieve that by
taking the current timestamp (System.currentTimeMillis()) and forcing
the results returned to be no newer than that time using
scan.setTimeRange(0, currentTime). If you save currentTime and send it
back to the UI with the results, the UI can keep sending it to the
backend, ensuring you keep viewing that same point in time. If rows keep
being inserted, their timestamps will be greater and thus not displayed.
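A sketch of that snapshot idea. The Row class and the in-memory list are stand-ins for the real table; visibleAt() models what scan.setTimeRange(0, snapshotTime) does on the server (HBase's TimeRange upper bound is exclusive, so only cells written before snapshotTime are returned):

```java
import java.util.ArrayList;
import java.util.List;

public class SnapshotPaging {

    static final class Row {
        final String key;
        final long ts; // cell timestamp, normally set by the RS at write time
        Row(String key, long ts) { this.key = key; this.ts = ts; }
    }

    // Models scan.setTimeRange(0, snapshotTime): the upper bound is
    // exclusive, so rows inserted at or after snapshotTime are invisible
    // and later pages are never shifted by concurrent inserts.
    static List<String> visibleAt(List<Row> table, long snapshotTime) {
        List<String> keys = new ArrayList<>();
        for (Row r : table) {
            if (r.ts < snapshotTime) keys.add(r.key);
        }
        return keys;
    }

    public static void main(String[] args) {
        List<Row> table = new ArrayList<>();
        table.add(new Row("a", 100));
        table.add(new Row("b", 200));
        long snapshot = 250;          // System.currentTimeMillis() at first request
        table.add(new Row("c", 300)); // inserted while the user is paging
        System.out.println(visibleAt(table, snapshot)); // [a, b] - "c" is hidden
    }
}
```

The UI only has to echo the saved snapshot timestamp back with every page request.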


On Wed, Jan 30, 2013 at 2:42 PM, Toby Lazar <tla...@gmail.com> wrote:

> Sounds like if you had 1000 regions, each with 99 rows, and you asked
> for 100, you'd get back 99,000. My guess is that a Filter is
> serialized once, sent successively to each region, and not updated
> between regions. I don't think updating it between regions would be
> easy.
>
> Toby
>
> On 1/30/13, Jean-Marc Spaggiari <jean-m...@spaggiari.org> wrote:
> > Hi Anoop,
> >
> > So does that mean the scanner can send back at most LIMIT*2-1 lines?
> > Reading 100 rows from the 2nd region uses extra time and resources.
> > Why not ask it for only the number of missing lines?
> >
> > JM
> >
> > 2013/1/30, Anoop Sam John <anoo...@huawei.com>:
> >> @Anil
> >>
> >>>I could not understand why it goes to multiple regionservers in
> >> parallel. Why can't it guarantee results <= page size (my guess: due to
> >> multiple RS scans)? If you have used it then maybe you can explain the
> >> behaviour?
> >>
> >> A scan from the client side never goes to multiple RSs in parallel. A
> >> scan through the HTable API is sequential, one region after the
> >> other. For every region it will open a scanner on the RS and do
> >> next() calls. The filter is instantiated at the server side, per
> >> region...
> >>
> >> Say you need 100 rows in the page and you create a Scan at the client
> >> side with the filter, and suppose there are 2 regions. First the
> >> scanner is opened for region1 and the scan happens there; it ensures
> >> that at most 100 rows are retrieved from that region. But when the
> >> region boundary is crossed and the client automatically opens a
> >> scanner for region2, it passes the same filter with max 100 rows
> >> there too, so up to another 100 rows can come from region2. So
> >> overall, at the client side, we cannot guarantee that the scan will
> >> return only 100 rows in total from the table.
> >>
> >> I hope I am making this clear. I have not used PageFilter at all; I
> >> am just explaining from my knowledge of the scan flow and general
> >> filter usage.
> >>
> >> "This is because the filter is applied separately on different region
> >> servers. It does however optimize the scan of individual HRegions by
> >> making
> >> sure that the page size is never exceeded locally. "
> >>
> >> I guess it needs to say: "This is because the filter is applied
> >> separately on different regions".
> >>
> >> -Anoop-
> >>
> >> ________________________________________
> >> From: anil gupta [anilgupt...@gmail.com]
> >> Sent: Wednesday, January 30, 2013 1:33 PM
> >> To: user@hbase.apache.org
> >> Subject: Re: Pagination with HBase - getting previous page of data
> >>
> >> Hi Mohammad,
> >>
> >> You are most welcome to join the discussion. I have never used
> >> PageFilter, so I don't really have concrete input.
> >> I had a look at
> >> http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/filter/PageFilter.html
> >> I could not understand why it goes to multiple regionservers in
> >> parallel. Why can't it guarantee results <= page size (my guess: due to
> >> multiple RS scans)? If you have used it then maybe you can explain the
> >> behaviour?
> >>
> >> Thanks,
> >> Anil
> >>
> >> On Tue, Jan 29, 2013 at 7:32 PM, Mohammad Tariq <donta...@gmail.com>
> >> wrote:
> >>
> >>> I'm kinda hesitant to put my leg in between the pros ;) But does it
> >>> sound sane to use PageFilter for both rows and columns, with some
> >>> additional logic to handle the 'nth' page? It would help us with
> >>> both kinds of paging.
> >>>
> >>> On Wednesday, January 30, 2013, Jean-Marc Spaggiari <
> >>> jean-m...@spaggiari.org>
> >>> wrote:
> >>> > Hi Anil,
> >>> >
> >>> > I think it really depends on the way you want to use the pagination.
> >>> >
> >>> > Do you need to be able to jump to page X? Are you OK if you miss a
> >>> > line or two? Is your data growing quickly, or slowly? Is it OK if
> >>> > your page indexes are a day old? Do you need to paginate over 300
> >>> > columns, or just 1? Do you need to always have exactly the same
> >>> > number of entries on each page?
> >>> >
> >>> > For my use case I need to be able to jump to page X, and I don't
> >>> > page on any content. I have hundreds of millions of lines. Only the
> >>> > rowkey matters to me, and I'm fine if sometimes 50 entries are
> >>> > displayed and sometimes only 45. So I'm thinking about calculating
> >>> > which row is the first one of each page and storing that
> >>> > separately. Then I just need to run the MR daily.
> >>> >
> >>> > It's not a perfect solution, I agree, but it might do the job for
> >>> > me. I'm totally open to any other idea that might do the job too.
> >>> >
> >>> > JM
> >>> >
> >>> > 2013/1/29, anil gupta <anilgupt...@gmail.com>:
> >>> >> Yes, your suggested solution only works for rowkey-based
> >>> >> pagination. It will fail when you start filtering on the basis of
> >>> >> columns.
> >>> >>
> >>> >> Still, I would say it's comparatively easier to maintain this at
> >>> >> the application level rather than creating tables for pagination.
> >>> >>
> >>> >> What if you have 300 columns in your schema? Will you create 300
> >>> >> tables? And what about handling pagination when filtering is done
> >>> >> on multiple columns ("and" and "or" conditions)?
> >>> >>
> >>> >> On Tue, Jan 29, 2013 at 1:08 PM, Jean-Marc Spaggiari <
> >>> >> jean-m...@spaggiari.org> wrote:
> >>> >>
> >>> >>> No, no killer solution here ;)
> >>> >>>
> >>> >>> But I'm still thinking about that because I might have to implement
> >>> >>> some pagination options soon...
> >>> >>>
> >>> >>> As you are saying, it only works on the row-key; if you want to
> >>> >>> do the same thing on non-rowkey columns, you might have to create
> >>> >>> a secondary index table...
> >>> >>>
> >>> >>> JM
> >>> >>>
> >>> >>> 2013/1/27, anil gupta <anilgupt...@gmail.com>:
> >>> >>> > That's alright. I thought you had come up with a killer
> >>> >>> > solution, so I got curious to hear your ideas. ;)
> >>> >>> > It seems like your below-mentioned solution will not work when
> >>> >>> > filtering on non-row-key columns, since when deciding the page
> >>> >>> > numbers you are only considering the rowkey.
> >>> >>> >
> >>> >>> > Thanks,
> >>> >>> > Anil
> >>> >>> >
> >>> >>> > On Fri, Jan 25, 2013 at 6:58 PM, Jean-Marc Spaggiari <
> >>> >>> > jean-m...@spaggiari.org> wrote:
> >>> >>> >
> >>> >>> >> Hi Anil,
> >>> >>> >>
> >>> >>> >> I don't have a solution. I never thought about that ;) But I
> >>> >>> >> was thinking about something like this: you create a 2nd table
> >>> >>> >> where you place the row number (4 bytes), then the row key. To
> >>> >>> >> go directly to a specific page, you query by the number, find
> >>> >>> >> the key, and you know where to start your scan in the main
> >>> >>> >> table.
> >>> >>> >>
> >>> >>> >> The issue is probably numbering each line, since with a MR you
> >>> >>> >> don't know where you are relative to the beginning. But you can
> >>> >>> >> build something where you store the line number from the
> >>> >>> >> beginning of each region; then, when all regions are parsed,
> >>> >>> >> you can reconstruct the total numbering... That should work...
> >>> >>> >>
> >>> >>> >> JM
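A rough sketch of that two-table idea in plain Java. The two TreeMaps stand in for the index table and the main table, rebuildIndex() plays the role of the daily MR job, and in HBase page() would be a Get on the index table followed by a Scan with setStartRow on the main one (all names here are illustrative):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

public class PageIndex {

    // "Index table": page number -> first row key of that page,
    // rebuilt periodically, so page boundaries can be a little stale.
    final TreeMap<Integer, String> index = new TreeMap<>();

    // "Main table": row keys kept sorted, as HBase stores them.
    final TreeMap<String, String> mainTable = new TreeMap<>();

    void put(String rowKey, String value) {
        mainTable.put(rowKey, value);
    }

    // The daily MR job: walk the table once and record every
    // pageSize-th row key as the start of a new page.
    void rebuildIndex(int pageSize) {
        index.clear();
        int i = 0;
        int page = 0;
        for (String key : mainTable.keySet()) {
            if (i % pageSize == 0) {
                index.put(page++, key);
            }
            i++;
        }
    }

    // Jump straight to page X: one index lookup, then a bounded scan.
    List<String> page(int pageNo, int pageSize) {
        List<String> out = new ArrayList<>();
        String start = index.get(pageNo);
        if (start == null) return out; // past the last indexed page
        for (String key : mainTable.tailMap(start, true).keySet()) {
            if (out.size() == pageSize) break;
            out.add(key);
        }
        return out;
    }
}
```

Because the index only records start keys, rows inserted after the rebuild just make some pages slightly longer or shorter until the next rebuild, which matches the "50 entries sometimes, 45 sometimes" tolerance described above.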
> >>> >>> >>
> >>> >>> >> 2013/1/25, anil gupta <anilgupt...@gmail.com>:
> >>> >>> >> > Inline...
> >>> >>> >> >
> >>> >>> >> > On Fri, Jan 25, 2013 at 9:17 AM, Jean-Marc Spaggiari <
> >>> >>> >> > jean-m...@spaggiari.org> wrote:
> >>> >>> >> >
> >>> >>> >> >> Hi Anil,
> >>> >>> >> >>
> >>> >> >> The issue is that all the other subsequent page starts would
> >>> >> >> have to be moved too...
> >>> >>> >> >>
> >>> >> >> > Yes, this is a possibility, hence the developer has to take
> >>> >> >> > care of this case. It might also be possible that the pageSize
> >>> >> >> > is not a hard limit on the number of results (more like a hint
> >>> >> >> > or suggestion on size). I would say it varies by use case.
> >>> >>> >> >
> >>> >>> >> >>
> >>> >> >> >> So if you want to jump directly to page n, you might be
> >>> >> >> >> totally shifted because of all the data inserted in the
> >>> >> >> >> meantime...
> >>> >> >> >>
> >>> >> >> >> If you want a really complete pagination feature, you might
> >>> >> >> >> want to have a coprocessor or a MR job updating another table
> >>> >> >> >> referring to the pages....
> >>> >>> >> >>
> >>> >>> >> > Well, the solution depends on the use case. I will be doing
> >>> >>> >> > pagination
> >>> >
> >>>
> >>> --
> >>> Warm Regards,
> >>> Tariq
> >>> https://mtariq.jux.com/
> >>> cloudfront.blogspot.com
> >>>
> >>
> >>
> >>
> >> --
> >> Thanks & Regards,
> >> Anil Gupta
> >
>
> --
> Sent from my mobile device
>
