Any suggestions or pointers would be helpful. Are there any best practices?
On Mon, Apr 23, 2012 at 3:27 PM, Mohit Anchlia mohitanch...@gmail.comwrote:
I just wanted to check how people design their storage directories for
data that is sent to the system continuously. For example: for a given
functionality we get a data feed continuously written to a SequenceFile, which is
then converted to a more structured format using MapReduce and stored in tab
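One common convention (a sketch, not something stated in the thread) is to partition continuously arriving data by ingestion date, e.g. `/data/<feed>/yyyy/MM/dd/`, so a daily MapReduce job can pick up exactly one day's input by path. A minimal illustration in plain Java; the base path and feed name are hypothetical:

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;

public class PartitionedPath {
    // Build a date-partitioned directory path such as /data/events/2012/04/23.
    // "/data" and "events" are hypothetical names for illustration only.
    static String dailyDir(String base, String feed, LocalDate date) {
        return String.format("%s/%s/%s", base, feed,
                date.format(DateTimeFormatter.ofPattern("yyyy/MM/dd")));
    }

    public static void main(String[] args) {
        System.out.println(dailyDir("/data", "events", LocalDate.of(2012, 4, 23)));
        // prints /data/events/2012/04/23
    }
}
```

With this layout, a job that runs once a day only needs to be pointed at yesterday's directory, and old partitions can be archived or deleted wholesale.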
I believe HBase has a TTL (time-to-live) mechanism for
records, and it can clean them up on its own.
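For reference, TTL in HBase is set per column family, in seconds; expired cells are dropped during compactions. A sketch using the HBase shell (the table and family names here are hypothetical):

```
alter 'events', NAME => 'e', TTL => 604800   # expire cells older than 7 days
```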
On Sat, Sep 10, 2011 at 1:54 AM, Dhodapkar, Chinmay
chinm...@qualcomm.com wrote:
Chinmay, how are you configuring your job? Have you checked using setScan
to select the keys you care to run MR over? See
http://ofps.oreilly.com/titles/9781449396107/mapreduce.html
As a shameless plug - For your reports, see if you want to leverage Crux:
https://github.com/sonalgoyal/crux
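The setScan approach amounts to restricting the job's input by timestamp: keep a watermark of when the last run finished and configure the job's Scan with Scan.setTimeRange(lastRun, now) before handing it to TableMapReduceUtil. A minimal sketch of the watermark logic in plain Java, with the HBase rows simulated by an in-memory map (all names here are hypothetical, not from the thread):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableMap;
import java.util.TreeMap;

public class IncrementalFilter {
    // Select only events newer than the last processed watermark -- the same
    // effect Scan.setTimeRange(lastRun, now) has against a real HBase table.
    static List<String> newEvents(NavigableMap<Long, String> eventsByTs, long watermark) {
        // tailMap(watermark, false): strictly after the last processed timestamp,
        // so the boundary event is not processed twice.
        return new ArrayList<>(eventsByTs.tailMap(watermark, false).values());
    }

    public static void main(String[] args) {
        NavigableMap<Long, String> events = new TreeMap<>();
        events.put(100L, "a");
        events.put(200L, "b");
        events.put(300L, "c");
        System.out.println(newEvents(events, 200L)); // prints [c]
    }
}
```

After each successful run, the job would persist the new watermark (e.g. in ZooKeeper, a small HBase table, or a file) so the next run resumes where this one left off.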
Hello,
I have a setup where a bunch of clients store 'events' in an HBase table.
Also, periodically (once a day) I run a MapReduce job that goes over the table
and computes some reports.
Now my issue is that the next time, I don't want the MapReduce job to process
the 'events' that it has already processed.
http://www.nabble.com/Job-design-question-tp25076132p25076132.html
Sent from the Hadoop core-user mailing list archive at Nabble.com.