Re: Design question

2012-04-26 Thread Mohit Anchlia
Any suggestions or pointers would be helpful. Are there any best practices? On Mon, Apr 23, 2012 at 3:27 PM, Mohit Anchlia mohitanch...@gmail.com wrote: I just wanted to check how people design their storage directories for data that is sent to the system continuously. For example: for a given

Design question

2012-04-23 Thread Mohit Anchlia
I just wanted to check how people design their storage directories for data that is sent to the system continuously. For example: for a given functionality we get a data feed continuously written to a SequenceFile, which is then converted to a more structured format using MapReduce and stored in tab

Re: Hbase + mapreduce -- operational design question

2011-09-10 Thread Eugene Kirpichov
I believe HBase has some kind of TTL (time-based expiry) for records and it can clean them up on its own. On Sat, Sep 10, 2011 at 1:54 AM, Dhodapkar, Chinmay chinm...@qualcomm.com wrote: Hello, I have a setup where a bunch of clients store 'events' in an HBase table. Also,

Re: Hbase + mapreduce -- operational design question

2011-09-10 Thread Sonal Goyal
Chinmay, how are you configuring your job? Have you tried using setScan and selecting only the keys you want to run MR over? See http://ofps.oreilly.com/titles/9781449396107/mapreduce.html As a shameless plug: for your reports, see if you want to leverage Crux: https://github.com/sonalgoyal/crux

Hbase + mapreduce -- operational design question

2011-09-09 Thread Dhodapkar, Chinmay
Hello, I have a setup where a bunch of clients store 'events' in an HBase table. Also, periodically (once a day), I run a mapreduce job that goes over the table and computes some reports. Now my issue is that the next time I don't want the mapreduce job to process the 'events' that it has already

Job design question

2009-08-21 Thread udiw
://www.nabble.com/Job-design-question-tp25076132p25076132.html Sent from the Hadoop core-user mailing list archive at Nabble.com.