Re: How to get Last 1000 records from 1 millions records

ramkrishna vasudevan Thu, 25 Aug 2016 02:42:18 -0700

Hi Manjeet

For your first question regarding fetching last 1000 records

First, in your scan, set the start row to the bytes corresponding to
(A_9811111111_) and let the stop row be the byte representation of
(A_9811111111_) + 1. I mean add +1 to the last byte of what comes out of
(A_9811111111_). Since the stop row is exclusive, this ensures you scan
only the rows corresponding to (A_9811111111_).
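
To make the "+1 on the last byte" step concrete, here is a minimal sketch in
plain Java. It does not touch any HBase API; the class and method names
(PrefixRange, stopRowForPrefix) are made up for illustration. The only extra
case it handles is a prefix ending in 0xFF bytes, which cannot be incremented
and are dropped instead.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical helper, not part of HBase: derives an exclusive stop row
// from a row-key prefix.
final class PrefixRange {

    /** Returns the smallest key that sorts after every key starting with prefix. */
    static byte[] stopRowForPrefix(byte[] prefix) {
        // Skip trailing 0xFF bytes: they cannot be incremented without overflow.
        int i = prefix.length - 1;
        while (i >= 0 && prefix[i] == (byte) 0xFF) {
            i--;
        }
        if (i < 0) {
            // Prefix is empty or all 0xFF: there is no upper bound.
            // An empty stop row means "scan to the end of the table".
            return new byte[0];
        }
        // Keep everything up to the last incrementable byte and add 1 to it.
        byte[] stop = Arrays.copyOf(prefix, i + 1);
        stop[i]++;
        return stop;
    }

    public static void main(String[] args) {
        byte[] start = "A_9811111111_".getBytes(StandardCharsets.UTF_8);
        byte[] stop = stopRowForPrefix(start);
        // '_' is 0x5F, so the last byte becomes 0x60, which is '`'.
        System.out.println(new String(stop, StandardCharsets.UTF_8)); // prints A_9811111111`
    }
}
```

The two arrays (start and stop) are what you would then set as the start row
and stop row on the scan.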

Just thinking: the first thing I can see is that it may be easier to do
this with CPs (coprocessors) than with Filters, because filters deal with
one cell or one row at a time. Accumulating the results and maintaining the
last 10000 records there may be difficult. I have to look in detail to see
whether it is possible.

Do you know the number of columns you have? If there are multiple columns
then it is quite tricky. But if you have only one column per row, or you
want only the row keys, the following may work.

You can implement a user coprocessor and, in it, implement
preStoreScannerOpen(). For example, say you have only one family. In that
case, in your preStoreScannerOpen() you create your own StoreScanner, and
in StoreScanner.next() you can just skip all KeyValues and, during that
process, keep collecting your cells. Make sure you collect the cells
row-wise by adding them to a list. The list should hold only the latest
10000 cells at any time.

Every time, keep checking whether the row has reached the stopRow that is
set in the scan (so maybe it moves to A_9811111112_).
Once you see this condition, you may have to replace the list given by the
StoreScanner.next() call with the list that you have collected and send it
to the client.
I have not tried it yet, but it can give you an idea of how to do it with CPs.
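
The "keep only the latest N in a list" part can be sketched on its own, away
from the coprocessor. This is plain Java with no HBase classes; LastN is a
made-up name, and the strings stand in for whatever cells or row keys the
scanner would hand you in sorted order.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Hypothetical helper, not part of HBase: a sliding window over items that
// arrive in scan order, retaining only the most recent `capacity` of them.
final class LastN<T> {
    private final int capacity;
    private final Deque<T> window = new ArrayDeque<>();

    LastN(int capacity) {
        if (capacity <= 0) {
            throw new IllegalArgumentException("capacity must be positive");
        }
        this.capacity = capacity;
    }

    /** Adds one item, evicting the oldest one once the window is full. */
    void add(T item) {
        if (window.size() == capacity) {
            window.removeFirst();
        }
        window.addLast(item);
    }

    /** Returns the retained items, oldest first. */
    List<T> snapshot() {
        return new ArrayList<>(window);
    }

    public static void main(String[] args) {
        LastN<String> last = new LastN<>(3);
        for (int i = 101; i <= 108; i++) {
            last.add("A_9811111111_" + i);
        }
        // prints [A_9811111111_106, A_9811111111_107, A_9811111111_108]
        System.out.println(last.snapshot());
    }
}
```

In the coprocessor idea above, add() would be called for every row the
scanner passes over, and snapshot() would be taken once the stop row is
reached. Memory stays bounded by the window size, not by the number of rows
scanned.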

With filters I am not sure, as I said; I need to read the flow and see
whether there are any APIs that can mimic the above.

PS. Don't take this as a working algorithm. There may be reasons why it
does not work, but you can read about CPs to see whether something like the
above can work out.

Regards
Ram

On Thu, Aug 25, 2016 at 2:16 PM, Manjeet Singh <manjeet.chand...@gmail.com>
wrote:

> Hi All
>
> I have another question on the same case.
>
> Below is my sample HBase data. As we all know, HBase stores data sorted by
> row key.
> The row keys start with an IP. As we can see, 2.168.129.81_1 comes last; I
> expected it to come just after 1.168.129.81_2.
>
>
>
>  1.168.129.81_0    column=c2:D_com.stackoverflow/questions/4, timestamp=1472104396288, value=4
>  1.168.129.81_1    column=c2:D_com.stackoverflow/questions/1, timestamp=1472104396288, value=1
>  1.168.129.81_1    column=c2:D_com.stackoverflow/questions/2, timestamp=1472104396288, value=2
>  1.168.129.81_2    column=c2:D_com.stackoverflow/questions/0, timestamp=1472104396288, value=0
>  192.168.129.81_1  column=c2:D_com.stackoverflow/questions/2, timestamp=1472104386671, value=2
>  192.168.129.81_1  column=c2:D_com.stackoverflow/questions/4, timestamp=1472104386671, value=4
>  192.168.129.81_2  column=c2:D_com.stackoverflow/questions/1, timestamp=1472104386671, value=1
>  192.168.129.81_3  column=c2:D_com.stackoverflow/questions/0, timestamp=1472104386671, value=0
>  192.168.129.81_3  column=c2:D_com.stackoverflow/questions/3, timestamp=1472104386671, value=3
>  2.168.129.81_1    column=c2:D_com.stackoverflow/questions/0, timestamp=1472104404609, value=0
>  2.168.129.81_1    column=c2:D_com.stackoverflow/questions/1, timestamp=1472104404609, value=1
>  2.168.129.81_1    column=c2:D_com.stackoverflow/questions/2, timestamp=1472104404609, value=2
>  2.168.129.81_3    column=c2:D_com.stackoverflow/questions/4, timestamp=1472104404609, value=4
>
>
>
> On Thu, Aug 25, 2016 at 12:36 PM, Manjeet Singh <
> manjeet.chand...@gmail.com>
> wrote:
>
> > I am using a logical salt. Say I have a mobile number in my row key; I
> > use an algorithm to map this mobile number to an ASCII character.
> > So I always know what the salt will be, and it will
> > never change the order.
> > Example:
> > if my algorithm gives A for 9811111111,
> > it will always return A for 9811111111.
> > So if I have row keys like
> > A_9811111111_101
> > A_9811111111_102
> > A_9811111111_103
> > A_9811111111_104
> > A_9811111111_105
> > A_9811111111_106
> > A_9811111111_107
> > A_9811111111_108
> >
> > they will be sorted in the order shown above. Now, there are
> > millions of records, and I want to get the last 10000 records.
> > Is there any way to get them? My concern is to perform all calculation on
> > the server side, not the client side.
> >
> >
> > Thanks
> > Manjeet
> >
> >
> > On Thu, Aug 25, 2016 at 1:06 AM, Esteban Gutierrez <este...@cloudera.com>
> > wrote:
> >
> >> As long as new rows are added to the latest region that "might" work. But
> >> if the table is using hashed keys or rows are added randomly to the table
> >> then retrieving the last million will be trickier and you will have to scan
> >> based on timestamp (if not modified) and then filter one more time.
> >>
> >> esteban.
> >>
> >>
> >> --
> >> Cloudera, Inc.
> >>
> >>
> >> On Wed, Aug 24, 2016 at 12:31 PM, Ted Yu <yuzhih...@gmail.com> wrote:
> >>
> >> > The following API should help in your case:
> >> >
> >> >   public Scan setReversed(boolean reversed) {
> >> >
> >> > Cheers
> >> >
> >> > On Wed, Aug 24, 2016 at 12:05 PM, Manjeet Singh <
> >> > manjeet.chand...@gmail.com>
> >> > wrote:
> >> >
> >> > > Hi all
> >> > >
> >> > > HBase doesn't provide sorting on columns, but row keys are stored in
> >> > > sorted form, smaller values first and greater values last.
> >> > >
> >> > > example
> >> > > 1
> >> > > 2
> >> > > 3
> >> > > 4
> >> > > 5
> >> > > 6
> >> > > 7
> >> > > and so on
> >> > >
> >> > > Assume I have 1 million records but I want to look at the last 1000
> >> > > records only. Is there any way to do this? I don't want to perform any
> >> > > calculation on the client side, so maybe a filter can help?
> >> > >
> >> > > Thanks
> >> > > Manjeet
> >> > >
> >> > > --
> >> > > luv all
> >> > >
> >> >
> >>
> >
> >
> >
> > --
> > luv all
> >
>
>
>
> --
> luv all
>
