Re: Hbase + mapreduce -- operational design question

2011-09-10 Thread Eugene Kirpichov
I believe HBase has some kind of TTL (time-based expiry) for records, and it can clean them up on its own.

On Sat, Sep 10, 2011 at 1:54 AM, Dhodapkar, Chinmay chinm...@qualcomm.com wrote:
> Hello, I have a setup where a bunch of clients store 'events' in an HBase table. Also,

Re: Hbase + mapreduce -- operational design question

2011-09-10 Thread Sonal Goyal
Chinmay, how are you configuring your job? Have you tried using setScan to select only the keys you want to run MR over? See http://ofps.oreilly.com/titles/9781449396107/mapreduce.html. As a shameless plug: for your reports, see if you want to leverage Crux: https://github.com/sonalgoyal/crux
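
The key-range idea in this reply can be illustrated with a minimal sketch. It is not code from the thread and does not use the HBase client: an in-memory TreeMap stands in for the table, and it assumes a hypothetical row-key design in which every key starts with a zero-padded event timestamp. The class name, the rowKey and scanRange helpers, and the checkpoint variable are all invented for illustration. In a real job, the start and stop rows would go on the Scan handed to the MapReduce job, and the checkpoint would have to be stored somewhere durable between runs.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.NavigableMap;
import java.util.TreeMap;

// Hypothetical sketch: a TreeMap stands in for an HBase table whose row keys
// begin with a zero-padded event timestamp (an assumed key design).
class IncrementalScanSketch {

    // Builds a row key that sorts lexicographically in timestamp order.
    static String rowKey(long timestampMillis, String eventId) {
        return String.format(Locale.ROOT, "%013d-%s", timestampMillis, eventId);
    }

    // Returns events with startMillis <= timestamp < stopMillis, mirroring a
    // scan bounded by an inclusive start row and an exclusive stop row.
    static List<String> scanRange(NavigableMap<String, String> table,
                                  long startMillis, long stopMillis) {
        String startRow = String.format(Locale.ROOT, "%013d", startMillis);
        String stopRow = String.format(Locale.ROOT, "%013d", stopMillis);
        return new ArrayList<>(table.subMap(startRow, true, stopRow, false).values());
    }

    public static void main(String[] args) {
        NavigableMap<String, String> table = new TreeMap<>();
        table.put(rowKey(1000L, "a"), "login");
        table.put(rowKey(1500L, "b"), "click");
        table.put(rowKey(2500L, "c"), "purchase");

        // First run: process everything before the run's cutoff, then
        // remember the cutoff as the checkpoint.
        long checkpoint = 0L;
        long cutoff = 2000L;
        System.out.println("run 1: " + scanRange(table, checkpoint, cutoff));
        checkpoint = cutoff;

        // A new event arrives between runs.
        table.put(rowKey(2600L, "d"), "logout");

        // Second run: start at the checkpoint, so events handled by the
        // first run are never read again.
        cutoff = 3000L;
        System.out.println("run 2: " + scanRange(table, checkpoint, cutoff));
    }
}
```

The half-open range (start inclusive, stop exclusive) is what keeps consecutive runs from overlapping or leaving gaps: each run's stop bound becomes the next run's start bound. This only works if clients write events with timestamps at or after the latest cutoff; late-arriving events with older timestamps would be skipped under this scheme.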

Hbase + mapreduce -- operational design question

2011-09-09 Thread Dhodapkar, Chinmay
Hello, I have a setup where a bunch of clients store 'events' in an HBase table. Also, periodically (once a day), I run a MapReduce job that goes over the table and computes some reports. Now my issue is that the next time, I don't want the MapReduce job to process the 'events' that it has already