A couple of raw implementation thoughts:

1. Change the schema

Move the timestamps inside the row. The rowkey is hash(objectid), and the column qualifier is Long.MAX_VALUE - changeDate.getTime(). You can even store it using Bytes.toBytes(ts) to save space: it will always be 8 bytes, instead of the longer formatted string.
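To make the qualifier layout concrete, here is a minimal sketch that uses only the JDK so it runs standalone. It makes a few assumptions: ByteBuffer stands in for Bytes.toBytes(long) (both write 8 big-endian bytes), a TreeSet ordered by unsigned byte comparison stands in for the way HBase sorts qualifiers inside a row, and the class and method names are made up for illustration.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;
import java.util.Comparator;
import java.util.TreeSet;

class ReversedTimestampQualifier {

    // Encode Long.MAX_VALUE - ts as 8 big-endian bytes
    // (the same byte layout Bytes.toBytes(long) produces).
    static byte[] encode(long timestampMillis) {
        return ByteBuffer.allocate(Long.BYTES)
                .putLong(Long.MAX_VALUE - timestampMillis)
                .array();
    }

    // Recover the original timestamp from an 8-byte qualifier.
    static long decode(byte[] qualifier) {
        return Long.MAX_VALUE - ByteBuffer.wrap(qualifier).getLong();
    }

    // Hypothetical stand-in for one row: qualifiers kept in unsigned
    // lexicographic byte order, so the newest timestamp sorts first.
    static TreeSet<byte[]> newRow() {
        Comparator<byte[]> unsignedOrder = Arrays::compareUnsigned;
        return new TreeSet<>(unsignedOrder);
    }

    // The change just before ts (older) is the next qualifier in sort order.
    static Long previousChange(TreeSet<byte[]> row, long ts) {
        byte[] q = row.higher(encode(ts));
        return q == null ? null : decode(q);
    }

    // The change just after ts (newer) is the preceding qualifier in sort order.
    static Long nextChange(TreeSet<byte[]> row, long ts) {
        byte[] q = row.lower(encode(ts));
        return q == null ? null : decode(q);
    }

    public static void main(String[] args) {
        TreeSet<byte[]> row = newRow();
        for (long ts : new long[] {1000L, 2000L, 3000L}) {
            row.add(encode(ts));
        }
        System.out.println(previousChange(row, 2000L)); // 1000
        System.out.println(nextChange(row, 2000L));     // 3000
    }
}
```

Because the encoded values are non-negative, unsigned byte order matches numeric order, so both neighbours of a timestamp sit right next to it within the row.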
Keeping the timestamps inside the row lets you "view" all the timestamps related to a single objectid in one place. The problem with placing the timestamp in the rowkey is that the rows are all over the place, spread across regions, so it's harder to get a valid "who is before whom" answer (indexing) without paying a penalty on insertion for keeping an index up to date. I have two ideas: one has an expensive read and the other has an expensive write.

Expensive read: When you write, you write two columns for that row: one named i_[Rounded-to-the-hour-timestamp] with a value of 1 (dummy value), indicating that you have timestamps within this hour, and the other is your original column, named ts_[timestamp]. You can implement a Filter which, upon arriving at the required row, first reads all the "hour" columns, so it can find out where to jump in the ts_[timestamp] columns. Once it reaches the hour column matching the timestamp you are looking for, it knows which hour came before it, and can jump there (using the hint method in the Filter interface). The read is expensive since, in the worst case, you need to read all the i_[Rounded-to-the-hour-timestamp] columns. Maybe you can relax it by only looking at the 24 hours before the original column's hour, which reduces the worst case to 24 reads. The write is cheap, the read is not.

Expensive write: You can keep a column named i, which maintains an encoded version of an index of the hours. When you read, you find the correct preceding hour with a log(n) search through it and then jump to the ts_[timestamp] column. The write will be expensive, since you need to read-modify-write this column on each timestamp you write. The read is sort of cheap.

2. I thought I had another option using RegionObserver and EndpointCoprocessor, but the biggest problem is that the predecessor timestamp may be on another region server.

The first idea is more implementable :)

On Mon, Apr 29, 2013 at 8:05 PM, <ri...@laposte.net> wrote:

>
> Thanx for the quick answer.
>
> > For the next key, I think you can simply use your current key as your
> > scanner first key. You will then find the one which is just after.
> > Then you will have to verify the MD5 hash to make sure it's still for
> > the same object.
> Right, this is basically easy.
>
> > First, if you know that you are storing data about every 10 seconds,
> > set the startRow with something like
> > getMD5AsHex(Bytes.toBytes(myObjectId)) + String.format("%19d\n",
> > (Long.MAX_VALUE - (changeDate.getTime() - 60000))) then read the few
> > lines you will have until you find your current line, and keep the
> > last one.
>
> Actually it is impossible to know the time range for which there will be a
> next entry.
>
> > Else, if you don't know, you will have to start with
> > scan.setStartRow(getMD5AsHex(Bytes.toBytes(myObjectId))); but you
> > might have to skip MANY lines before finding the right one. So I don't
> > really recommend that.
>
> Ouch, obviously not very efficient. I assume even with a filter?
>
> > Message of 29/04/13 18:18
> > From: "Jean-Marc Spaggiari"
> > To: user@hbase.apache.org
> > Cc:
> > Subject: Re: Read access pattern
> >
> > Hum.
> >
> > For the next key, I think you can simply use your current key as your
> > scanner first key. You will then find the one which is just after.
> > Then you will have to verify the MD5 hash to make sure it's still for
> > the same object.
> >
> > scan.setStartRow(getMD5AsHex(Bytes.toBytes(myObjectId)) +
> > String.format("%19d\n", (Long.MAX_VALUE - changeDate.getTime())));
> >
> > If you want to find the one just before, quickly, I see 2 options.
> >
> > First, if you know that you are storing data about every 10 seconds,
> > set the startRow with something like
> > getMD5AsHex(Bytes.toBytes(myObjectId)) + String.format("%19d\n",
> > (Long.MAX_VALUE - (changeDate.getTime() - 60000))) then read the few
> > lines you will have until you find your current line, and keep the
> > last one.
> >
> > Else, if you don't know, you will have to start with
> > scan.setStartRow(getMD5AsHex(Bytes.toBytes(myObjectId))); but you
> > might have to skip MANY lines before finding the right one. So I don't
> > really recommend that.
> >
> > JM
> >
> > 2013/4/29 Shahab Yunus:
> > > I think you cannot simply use the scanner to do a range scan here, as
> > > your keys are not monotonically increasing. You need to apply logic to
> > > decode/reverse the mechanism that you used to hash your keys at the
> > > time of writing. You might want to check out the Sematext library, which
> > > does distributed scans and seems to handle the scenarios that you want
> > > to implement.
> > >
> > > http://blog.sematext.com/2012/04/09/hbasewd-avoid-regionserver-hotspotting-despite-writing-records-with-sequential-keys/
> > >
> > >
> > > On Mon, Apr 29, 2013 at 11:03 AM, wrote:
> > >
> > >> Hi,
> > >>
> > >> I have a rowkey defined by:
> > >> getMD5AsHex(Bytes.toBytes(myObjectId)) + String.format("%19d\n",
> > >> (Long.MAX_VALUE - changeDate.getTime()));
> > >>
> > >> How could I get the previous and next row for a given rowkey?
> > >> For instance, I have the following ordered keys:
> > >>
> > >> 00003db1b6c1e7e7d2ece41ff2184f76*9223370673172227807
> > >> 00003db1b6c1e7e7d2ece41ff2184f76*9223370674468022807
> > >> >00003db1b6c1e7e7d2ece41ff2184f76*9223370674468862807
> > >> 00003db1b6c1e7e7d2ece41ff2184f76*9223370674984237807
> > >> 00003db1b6c1e7e7d2ece41ff2184f76*9223370674987271807
> > >>
> > >> If I choose the rowkey
> > >> 00003db1b6c1e7e7d2ece41ff2184f76*9223370674468862807, what would be
> > >> the correct scan to get the previous and next key?
> > >> Result would be:
> > >> 00003db1b6c1e7e7d2ece41ff2184f76*9223370674468022807
> > >> 00003db1b6c1e7e7d2ece41ff2184f76*9223370674984237807
> > >>
> > >> Thank you!
> > >> R.