Hi Andre,

Have a look at HBaseWD from Sematext: https://github.com/sematext/HBaseWD
The strategy there is to prefix monotonic row keys with a bin number. This spreads the writes across N bins but still allows efficient scans, assuming N is not large (each time-range query requires N scans, one per bin).

-Simon

On May 25, 2012 11:13 AM, "Andre Reiter" <a.rei...@web.de> wrote:

> i'm starting a new project, which is pretty simple
> it will be something like google analytics, but of course a bit smaller
> what is required: web servers handle requests with a kind of generic
> key/value list
> those requests will come at a pretty high rate, let's say 1000 req per
> second
> so far i guess there will be no problem to handle that and to store it
> in hbase, right?
>
> on the other hand, of course, the data must be processed and monitored
> that is required to be time based, i.e. i want to get statistics about a
> time period, let's say from day A to day B
> that should work, BUT!
> if i want to have a fast scan, i need to have the timestamp in the row
> key, right? otherwise i will need to make a full scan, which can take a
> lot of time if there is much data
> but if i have the timestamp in the key, i will end up having hot regions,
> like described here:
> http://ikaisays.com/2011/01/25/app-engine-datastore-tip-monotonically-increasing-values-are-bad/
> so what would be a better way to have fast scans without hot regions?
>
> cheers
> andre
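
For illustration, here is a minimal sketch of the bin-prefix idea from the reply above. It is not HBaseWD's actual API. The names `NUM_BINS`, `bin_for`, `salted_key` and `scan_ranges` are hypothetical, as is the key layout (a 1-byte bin prefix, an 8-byte big-endian timestamp, then an event id), and it assumes at most 256 bins and non-negative millisecond timestamps.

```python
import hashlib

NUM_BINS = 8  # hypothetical N; every time-range query costs N scans


def bin_for(original_key: bytes, num_bins: int = NUM_BINS) -> int:
    """Pick a stable bin (0..num_bins-1) from a hash of the unsalted key."""
    return hashlib.md5(original_key).digest()[0] % num_bins


def salted_key(timestamp_ms: int, event_id: bytes,
               num_bins: int = NUM_BINS) -> bytes:
    """Build the row key: bin prefix + big-endian timestamp + event id."""
    ts = timestamp_ms.to_bytes(8, "big")  # big-endian sorts in time order
    return bytes([bin_for(ts + event_id, num_bins)]) + ts + event_id


def scan_ranges(start_ms: int, stop_ms: int, num_bins: int = NUM_BINS):
    """Return one (start_row, stop_row) pair per bin for [start_ms, stop_ms)."""
    lo = start_ms.to_bytes(8, "big")
    hi = stop_ms.to_bytes(8, "big")
    return [(bytes([b]) + lo, bytes([b]) + hi) for b in range(num_bins)]
```

Writes with consecutive timestamps land in different bins, so no single region takes all the load. To read a time period, run one scan per pair returned by `scan_ranges` and merge the results by the timestamp bytes that follow the prefix.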