Thank you very much for the great support! This is how I thought to design my key:
PATTERN: source|type|qualifier|hash(name)|timestamp
EXAMPLE: google|appliance|oven|be9173589a7471a7179e928adc1a86f7|1372837702753

Do you think this key suits my use case (my searches will essentially be by source or by source|type)?
Another point is that initially I will not have many sources, so I will probably have only google|*, but in later phases there could be more sources.

Best,
Flavio

On Tue, Jul 2, 2013 at 7:53 PM, Ted Yu <yuzhih...@gmail.com> wrote:

> For #1, yes - the client receives less data after filtering.
>
> For #2, please take a look at TestMultiVersions
> (./src/test/java/org/apache/hadoop/hbase/TestMultiVersions.java in 0.94)
> for time range:
>
> scan = new Scan();
> scan.setTimeRange(1000L, Long.MAX_VALUE);
>
> For row key selection, you need a filter. Take a look at
> FuzzyRowFilter.java
>
> Cheers
>
> On Tue, Jul 2, 2013 at 10:35 AM, Flavio Pompermaier <pomperma...@okkam.it>wrote:
>
> > Thanks for the reply! I thus have two more questions:
> >
> > 1) Is it true that filtering on timestamps doesn't affect performance?
> > 2) Could you send me a little snippet of how you would do such a filter
> > (by row key + timestamps)? For example, get all rows whose key starts with
> > 'someid-' and whose timestamp is greater than some timestamp?
> >
> > Best,
> > Flavio
> >
> > On Tue, Jul 2, 2013 at 6:25 PM, Ted Yu <yuzhih...@gmail.com> wrote:
> >
> > > bq. Using timestamp in row-keys is discouraged
> > >
> > > The above is true.
> > > Prefixing row key with timestamp would create hot region.
> > >
> > > bq. should I filter by a simpler row-key plus a filter on timestamp?
> > >
> > > You can do the above.
> > >
> > > On Tue, Jul 2, 2013 at 9:13 AM, Flavio Pompermaier <pomperma...@okkam.it>wrote:
> > >
> > > > Hi to everybody,
> > > >
> > > > in my use case I have to perform batch analysis skipping old data.
> > > > For example, I want to process all rows created after a certain timestamp,
> > > > passed as parameter.
> > > >
> > > > What is the most effective way to do this?
> > > > Should I design my row-key to embed timestamp?
> > > > Or is just filtering by the timestamp of the row fast as well? Or what else?
> > > >
> > > > Initially I was thinking to compose my key as:
> > > > timestamp|source|title|type
> > > >
> > > > but:
> > > >
> > > > 1) Using timestamp in row-keys is discouraged
> > > > 2) If this design is ok, using this approach I still have problems
> > > > filtering by timestamp because I cannot find a way to filter numerically
> > > > (instead of alphanumerically/by string). Example:
> > > > 1372776400441|something has a timestamp lower
> > > > than 1372778470913|somethingelse but I cannot filter all rows whose key is
> > > > "numerically" greater than 1372776400441. Is it possible to overcome this
> > > > issue?
> > > > 3) If this design is not ok, should I filter by a simpler row-key plus a
> > > > filter on timestamp? Or what else?
> > > >
> > > > Best,
> > > > Flavio
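
For reference, here is a minimal sketch of the key layout discussed in this thread, written in plain Java with no HBase dependency so it can be run on its own. It is only an illustration under stated assumptions: the class and method names (RowKeySketch, rowKey, stopRowFor, compareBytes, longToBytes) are hypothetical, MD5 is a guess for hash(name) (the thread does not say which hash is used), and the 13-digit zero padding of the timestamp is an added choice. The sketch shows how a source| or source|type| prefix maps to a start row and an exclusive stop row, and why string comparison of timestamps only matches numeric order when the width is fixed.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Arrays;
import java.util.Locale;

final class RowKeySketch {

    /** Hex MD5 of the name. MD5 is an assumption; the thread only says hash(name). */
    static String md5Hex(String name) {
        try {
            byte[] digest = MessageDigest.getInstance("MD5")
                    .digest(name.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder(digest.length * 2);
            for (byte b : digest) {
                hex.append(String.format(Locale.ROOT, "%02x", b & 0xff));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 not available", e);
        }
    }

    /** Builds source|type|qualifier|hash(name)|timestamp, timestamp zero-padded to 13 digits. */
    static String rowKey(String source, String type, String qualifier,
                         String name, long timestampMillis) {
        return String.join("|", source, type, qualifier, md5Hex(name),
                String.format(Locale.ROOT, "%013d", timestampMillis));
    }

    /** Smallest byte string greater than every key that starts with the prefix. */
    static byte[] stopRowFor(byte[] prefix) {
        for (int i = prefix.length - 1; i >= 0; i--) {
            if (prefix[i] != (byte) 0xFF) {
                byte[] stop = Arrays.copyOf(prefix, i + 1);
                stop[i]++; // bump the last byte that can still be incremented
                return stop;
            }
        }
        return new byte[0]; // prefix is all 0xFF bytes: no upper bound exists
    }

    /** Unsigned lexicographic byte comparison, the order HBase uses for row keys. */
    static int compareBytes(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int diff = (a[i] & 0xff) - (b[i] & 0xff);
            if (diff != 0) {
                return diff;
            }
        }
        return a.length - b.length;
    }

    /** Big-endian 8-byte encoding; for non-negative values byte order equals numeric order. */
    static byte[] longToBytes(long value) {
        return ByteBuffer.allocate(Long.BYTES).putLong(value).array();
    }

    public static void main(String[] args) {
        String key = rowKey("google", "appliance", "oven", "some oven name", 1372837702753L);
        byte[] start = "google|".getBytes(StandardCharsets.UTF_8);
        byte[] stop = stopRowFor(start);
        System.out.println("row key   : " + key);
        System.out.println("start row : " + new String(start, StandardCharsets.UTF_8));
        System.out.println("stop row  : " + new String(stop, StandardCharsets.UTF_8));
        // Unpadded decimal strings sort differently from the numbers they represent.
        System.out.println("\"999\" sorts after \"1000\" as text: " + ("999".compareTo("1000") > 0));
    }
}
```

With HBase on the classpath, the start and stop byte arrays would be the row bounds of the scan (in 0.94, the Scan(byte[] startRow, byte[] stopRow) constructor), combined with the scan.setTimeRange(...) call from Ted's reply. Note that setTimeRange filters on the cell timestamp written by HBase, not on the timestamp at the end of the key. Because that trailing timestamp comes after the hash, it only orders versions of the same name and cannot serve as a range bound across different names.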