Building it from scratch with a homegrown recipe.
Follow and lend us suggestions, advice, kudos, etc. If you're mean... Go away.
You suck.
www.7ops.com
@7Ops
highpoi...@7ops.com
Cheers.
Hi,
Some of the MR jobs I run don't need sorting of map-output in each
partition. Is there some way I can disable it?
Any help?
Thanks
jS
I believe HBase has some kind of TTL (timeout-based expiry) for
records and it can clean them up on its own.
On Sat, Sep 10, 2011 at 1:54 AM, Dhodapkar, Chinmay
chinm...@qualcomm.com wrote:
Hello,
I have a setup where a bunch of clients store 'events' in an HBase table.
Also,
Chinmay, how are you configuring your job? Have you tried using setScan
to select only the keys you want to run MR over? See
http://ofps.oreilly.com/titles/9781449396107/mapreduce.html
As a shameless plug: for your reports, you may want to take a look at Crux:
https://github.com/sonalgoyal/crux
Run a map-only job with #reduces set to 0.
Arun
On Sep 10, 2011, at 2:06 AM, john smith wrote:
Hi,
Some of the MR jobs I run don't need sorting of map-output in each
partition. Is there some way I can disable it?
Any help?
Thanks
jS
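For reference, the map-only suggestion above could look roughly like the
driver below. This is a minimal sketch and not code from the thread: the
class name MapOnlyDriver and the input/output paths taken from args are
hypothetical, and it assumes the newer org.apache.hadoop.mapreduce API (on
0.20-era releases, new Job(conf, name) would stand in for Job.getInstance).
With zero reduces there is no shuffle, so no sort takes place and each map
task writes its output straight to the output directory.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

// Hypothetical driver name; not from the thread.
public class MapOnlyDriver {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf, "map-only sketch");
        job.setJarByClass(MapOnlyDriver.class);

        // The base Mapper class passes each record through unchanged;
        // a real job would plug in its own Mapper subclass here.
        job.setMapperClass(Mapper.class);

        // Zero reduce tasks: no shuffle, no sort, map output goes
        // directly to the output path.
        job.setNumReduceTasks(0);

        // Assumed arguments: args[0] = input dir, args[1] = output dir.
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));

        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```

Note that this removes the reduce phase entirely, so it only fits jobs that
need no cross-map aggregation.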
Is there a way to collate the possibly large number of map output files,
though?
On Sat, Sep 10, 2011 at 2:48 PM, Arun C Murthy a...@hortonworks.com wrote:
Run a map-only job with #reduces set to 0.
Arun
On Sep 10, 2011, at 2:06 AM, john smith wrote:
Hi,
Some of the MR jobs I run
On Sat, Sep 10, 2011 at 12:33 PM, Meng Mao meng...@gmail.com wrote:
Is there a way to collate the possibly large number of map output files,
though?
You can get fewer mappers by raising mapred.min.split.size, which defines
the smallest input split that will be given to a mapper.
There isn't
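To see why raising mapred.min.split.size cuts the mapper count, here is a
rough model of how FileInputFormat picks a split size, as far as I
understand it: max(minSize, min(maxSize, blockSize)). The class and method
names, the 64 MB block size, and the 1 GB file are assumptions made up for
illustration. The sketch models a single large file only and ignores the
small slack Hadoop allows on the last split.

```java
// Hypothetical helper, JDK only; it models the split-size rule and is
// not Hadoop's own code.
final class SplitSizeSketch {

    // Split size rule: never below minSize, otherwise capped by
    // maxSize and the block size.
    static long computeSplitSize(long minSize, long maxSize, long blockSize) {
        return Math.max(minSize, Math.min(maxSize, blockSize));
    }

    // Approximate number of splits (and so mappers) for one file,
    // rounding up for a partial last split.
    static long estimateSplits(long fileSize, long splitSize) {
        return (fileSize + splitSize - 1) / splitSize;
    }

    public static void main(String[] args) {
        final long MB = 1024L * 1024L;
        long blockSize = 64 * MB;    // assumed block size
        long fileSize = 1024 * MB;   // assumed 1 GB input file

        // Default-like setting: min split size of 1 byte.
        long defaultSplit = computeSplitSize(1L, Long.MAX_VALUE, blockSize);
        // Raised setting: mapred.min.split.size = 256 MB.
        long raisedSplit = computeSplitSize(256 * MB, Long.MAX_VALUE, blockSize);

        System.out.println(estimateSplits(fileSize, defaultSplit)); // prints 16
        System.out.println(estimateSplits(fileSize, raisedSplit));  // prints 4
    }
}
```

In a map-only job each mapper writes its own output file, so fewer mappers
also means fewer output files to collate.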
Hey,
I have reduce phases too. But for each reduce, I don't need sorted input
(map-output for that corresponding reduce task).
Setting #reduces to 0 completely removes the reduce phase.
Am I missing something?
Thanks,
On Sun, Sep 11, 2011 at 12:18 AM, Arun C Murthy a...@hortonworks.com wrote:
Run
The point of a 'reduce phase' is to aggregate keys from different maps (i.e.
all inputs).
I'm not sure what you are trying to do, but a use case would help.
In any case, the only way to achieve what you are trying to do is to run two
jobs, with the first being a map-only job (i.e. #reduces = 0).
Arun
On Sep