Yes, I saw it. I followed Ted's advice to use scan.setTimeRange(sometimestamp, Long.MAX_VALUE).

On Wed, Jul 3, 2013 at 11:23 PM, Asaf Mesika <[email protected]> wrote:

> Seems right. You can make it more efficient by creating your result array
> in advance and then filling it.
> Regarding time filtering: have you seen that in Scan you can set a start
> time and an end time?
>
> On Wednesday, July 3, 2013, Flavio Pompermaier wrote:
>
> > All my enums produce positive integers so I don't have +/-ve Integer
> > problems.
> > Obviously, if I use fixed-length rowKeys I could take away the separator..
> >
> > Sorry, but I'm very much a newbie in this field.. I'm trying to understand
> > how to compose my key with the bytes..
> > Is the following correct?
> >
> > final byte[] firstToken = Bytes.toBytes(source);
> > final byte[] secondToken = Bytes.toBytes(type);
> > final byte[] thirdToken = Bytes.toBytes(qualifier);
> > final byte[] fourthToken = Bytes.toBytes(md5ofSomeString);
> > byte[] rowKey = Bytes.add(firstToken,secondToken,thirdToken);
> > rowKey = Bytes.add(rowKey,fourthToken);
> >
> > Best,
> > Flavio
> >
> >
> > On Wed, Jul 3, 2013 at 11:58 AM, Anoop John <[email protected]> wrote:
> >
> > > When you make the RK and convert the int parts into byte[] (use
> > > org.apache.hadoop.hbase.util.Bytes#toBytes(int)) it will give 4 bytes
> > > for every int.. Be careful about the ordering... When you convert a +ve
> > > and a -ve integer into byte[] and do a lexicographical compare (as done
> > > in HBase) you will see the -ve number being greater than the +ve.. If
> > > you don't have to deal with -ve numbers, no issues :)
> > >
> > > Well, when all the parts of the RK are of fixed width, will you need any
> > > separator??
> > >
> > > -Anoop-
> > >
> > > On Wed, Jul 3, 2013 at 2:44 PM, Flavio Pompermaier <[email protected]>
> > > wrote:
> > >
> > > > Yeah, I was thinking of using a normalization step in order to allow
> > > > the use of FuzzyRowFilter, but what is not clear to me is whether
> > > > integers must also be normalized or not.
> > > > I will explain myself better. Suppose that I follow your advice and I
> > > > produce keys like:
> > > > - 1|1|somehash|sometimestamp
> > > > - 55|555|somehash|sometimestamp
> > > >
> > > > Would they match the same pattern or do I have to normalize them to
> > > > the following?
> > > > - 001|001|somehash|sometimestamp
> > > > - 055|555|somehash|sometimestamp
> > > >
> > > > Moreover, I noticed that you used dots ('.') to separate things
> > > > instead of pipes ('|').. is there a reason for that (maybe performance
> > > > or whatever) or is it just your favourite separator?
> > > >
> > > > Best,
> > > > Flavio
> > > >
> > > >
> > > > On Wed, Jul 3, 2013 at 10:12 AM, Mike Axiak <[email protected]> wrote:
> > > >
> > > > > I'm not sure if you're eliding this fact or not, but you'd be much
> > > > > better off if you used a fixed-width format for your keys. So in
> > > > > your example, you'd have:
> > > > >
> > > > > PATTERN: source(4-byte-int).type(4-byte-int or smaller).fixed
> > > > > 128-bit hash.8-byte timestamp
> > > > >
> > > > > Example: \x00\x00\x00\x01\x00\x00\x02\x03....
> > > > >
> > > > > The advantage of this is not only that it's significantly less data
> > > > > (remember your key is stored on each KeyValue), but also you can now
> > > > > use FuzzyRowFilter and other techniques to quickly perform scans.
> > > > > The disadvantage is that you have to normalize the source -> integer
> > > > > but I find I can either store that in an enum or cache it for a long
> > > > > time so it's not a big issue.
> > > > >
> > > > > -Mike
> > > > >
> > > > > On Wed, Jul 3, 2013 at 4:05 AM, Flavio Pompermaier <[email protected]>
> > > > > wrote:
> > > > > > Thank you very much for the great support!
> > > > > > This is how I thought to design my key:
> > > > > >
> > > > > > PATTERN: source|type|qualifier|hash(name)|timestamp
> > > > > > EXAMPLE:
> > > > > > google|appliance|oven|be9173589a7471a7179e928adc1a86f7|1372837702753
> > > > > >
> > > > > > Do you think my key could be good for my scope (my search will be
> > > > > > essentially by source or source|type)?
> > > > > > Another point is that initially I will not have so many sources,
> > > > > > so I will probably have only google|* but in the next phases there
> > > > > > could be more sources..
> > > > > >
> > > > > > Best,
> > > > > > Flavio
> > > > > >
> > > > > > On Tue, Jul 2, 2013 at 7:53 PM, Ted Yu <[email protected]> wrote:
> > > > > >
> > > > > >> For #1, yes - the client receives less data after filtering.
> > > > > >>
> > > > > >> For #2, please take a look at TestMultiVersions
> > > > > >> (./src/test/java/org/apache/hadoop/hbase/TestMultiVersions.java
> > > > > >> in 0.94) for time range:
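
For anyone following the thread, here is a minimal sketch of the fixed-width layout Mike describes (4-byte source, 4-byte type, 128-bit hash, 8-byte timestamp). It fills one preallocated buffer, as Asaf suggests, and it also shows the ordering caveat Anoop mentions. It deliberately uses only the JDK (java.nio.ByteBuffer) rather than the HBase Bytes class, so the class name RowKeySketch and its helpers rowKey, md5 and compareUnsigned are hypothetical names made up for illustration, and the comparison is only meant to mimic an unsigned byte-by-byte comparison, not to reproduce HBase's own comparator code.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical illustration class; none of these names come from HBase.
class RowKeySketch {

    // Builds source(4) + type(4) + md5(16) + timestamp(8) = 32 bytes, no separators.
    static byte[] rowKey(int source, int type, byte[] md5, long timestamp) {
        if (md5.length != 16) {
            throw new IllegalArgumentException("expected a 16-byte MD5 digest");
        }
        // One allocation up front; ByteBuffer writes big-endian by default.
        return ByteBuffer.allocate(4 + 4 + 16 + 8)
                .putInt(source)
                .putInt(type)
                .put(md5)
                .putLong(timestamp)
                .array();
    }

    // Raw 16-byte MD5 of a string (the hex form in the thread would be 32 chars).
    static byte[] md5(String s) {
        try {
            return MessageDigest.getInstance("MD5")
                    .digest(s.getBytes(StandardCharsets.UTF_8));
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }

    // Unsigned, byte-by-byte lexicographic comparison of two keys.
    static int compareUnsigned(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int d = (a[i] & 0xff) - (b[i] & 0xff);
            if (d != 0) {
                return d;
            }
        }
        return a.length - b.length;
    }

    public static void main(String[] args) {
        byte[] hash = md5("oven");
        long ts = 1372837702753L;

        byte[] k1 = rowKey(1, 1, hash, ts);
        byte[] k55 = rowKey(55, 555, hash, ts);
        byte[] kNeg = rowKey(-1, 1, hash, ts);

        System.out.println(k1.length);                      // 32
        System.out.println(compareUnsigned(k1, k55) < 0);   // true: 1 sorts before 55
        System.out.println(compareUnsigned(kNeg, k55) > 0); // true: -1 sorts after 55
    }
}
```

Because every field has a fixed width, 1 and 55 occupy the same four bytes and need no zero-padding or separator, which is the point of the 001/055 question above. The last line shows why negative ints are a problem: -1 encodes as \xFF\xFF\xFF\xFF and therefore sorts after every non-negative value.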
