If it's helpful to state it in generic terms: specifying a range of HBase timestamps is *only* a post-filter (server-side) and *never* a primary search criteria.
In other words, searching by the HBase timestamp is a full-table scan (exhaustive search). While the timestamp can be nice for certain use-cases, I think it best to use it sparingly and only for its original purpose (preserving old versions of a cell), and not as a "query parameter". I've seen way too many use-cases where people think they're being tricky by using timestamps to implement a poor-man's search. This works for small amounts of data, but quickly falls to its knees with non-trivial data-sets. On Wed, Apr 12, 2017 at 3:05 PM, Ted Yu <yuzhih...@gmail.com> wrote: > If you change the rowkey, it would be tricky to satisfy what you stated in > your first email: > > bq. retrieve the most recent 10 rows > > Since the same uuid may have many rows which are no longer sorted in the > order amenable to your query. > > Cheers > > On Wed, Apr 12, 2017 at 10:43 AM, Josh <jof...@gmail.com> wrote: > >> Hi Ted, >> >> Thanks for the fast reply! >> Ok I see - just out of interest, if I changed my row key to be >> uuid#timestamp (instead of uuid#reverse_timestamp) - would the timestamp >> approach still be equally efficient? I just want to understand whether or >> not the timestamp approach is relying on the ordering of my row keys. >> >> Josh >> >> On Wed, Apr 12, 2017 at 6:39 PM, Ted Yu <yuzhih...@gmail.com> wrote: >> >> > Since STARTROW is specified (with uuid) in both of your examples, I think >> > their efficiency should be tantamount. >> > >> > Cheers >> > >> > On Wed, Apr 12, 2017 at 10:33 AM, Josh <jof...@gmail.com> wrote: >> > >> > > Hi, >> > > >> > > I am just getting started with HBase, and have a question about the >> > > efficiency of timestamp based scans. >> > > >> > > My table's row key has structure `uuid#reverse_timestamp` where >> > > reverse_timestamp is (java.lang.Long.MAX_VALUE - time in millis when >> the >> > > row was written). For a given uuid I want to be able to retrieve the >> most >> > > recent 10 rows in the table where timestamp is greater than x. It's >> > > possible that a given uuid may have many thousands of rows (with >> > different >> > > timestamps). >> > > >> > > I found there are two ways to run my query: >> > > 1. use HBase's built in timestamps and scan a time range: >> > > > scan 'mytable', {STARTROW => '647b2194-fbb8-46af-95ba-f498ddc8adcc', >> > > TIMERANGE => [x, current_time], LIMIT => 10} >> > > >> > > 2. use only my row keys to do the scan, with STARTROW and STOPROW: >> > > scan 'mytable', {STARTROW => '647b2194-fbb8-46af-95ba-f498ddc8adcc', >> > > STOPROW='647b2194-fbb8-46af-95ba-f498ddc8adcc#x', LIMIT => 10} >> > > >> > > Both of these seem to work - but is one more efficient that the other? >> > > >> > > Thanks for any advice, >> > > Josh >> > > >> > >>