Re: Number of Reducers Set to One

2011-05-12 Thread Geoffry Roberts
Bobby, Thanks for such a thoughtful response. I have a data set that represents all the people that pass through Las Vegas over a course of time, say five years, which comes to about 175 - 200 million people. Each record is a person, and it contains fields for where they came from, left to; tim

Re: Number of Reducers Set to One

2011-05-12 Thread Robert Evans
Geoffry, That really depends on how much data you are processing, and the algorithm you need to use to process the data. I did something similar a while ago with a medium amount of data and we saw significant speed up by first assigning each record a new key based off of the expected range of
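A rough sketch of the range-keying idea mentioned above. The preview is cut off, so what the range is actually computed over is not known; this sketch assumes, purely for illustration, a numeric field (such as a timestamp) with an expected minimum and maximum. The names `RangeKeys` and `bucketFor` are hypothetical, and this is plain Java, not Hadoop API code.

```java
/** Hypothetical helper: maps a value to one of N contiguous, ordered range buckets. */
final class RangeKeys {
    private RangeKeys() {}

    /**
     * Returns a bucket index in [0, buckets) for the given value, assuming
     * values are expected to fall in [min, max]. Values outside that range
     * are clamped to the nearest end. Throws ArithmeticException if the
     * intermediate arithmetic would overflow a long.
     */
    static int bucketFor(long value, long min, long max, int buckets) {
        if (buckets <= 0 || max < min) {
            throw new IllegalArgumentException("need buckets > 0 and max >= min");
        }
        // Clamp unexpected values into the expected range.
        long clamped = Math.max(min, Math.min(max, value));
        // Number of distinct values in [min, max].
        long span = Math.addExact(Math.subtractExact(max, min), 1L);
        long offset = clamped - min;
        // Integer scaling keeps buckets contiguous: lower values get lower indices.
        return (int) (Math.multiplyExact(offset, (long) buckets) / span);
    }
}
```

With such a key, each reducer would receive one contiguous slice of the range, and the per-bucket outputs can be read back in bucket order. Any value that has to carry over from one slice to the next would still need separate handling.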

Number of Reducers Set to One

2011-05-12 Thread Geoffry Roberts
All, I am mostly seeking confirmation as to my thinking on this matter. I have an MR job that I believe will force me into using a single reducer. The nature of the process is one where calculations performed on a given record rely on certain accumulated values whose calculation depends on rollin

Re: How to create a SequenceFile more faster?

2011-05-12 Thread Steve Lewis
Even for a single machine (and there may be reasons to use a single machine if the original data is not splittable), our experience suggests it should take about an hour to process 32 GB, which leads me to wonder whether writing the sequence file is your limiting step. Consider very

AW: How to merge several SequenceFile into one?

2011-05-12 Thread Christoph Schmitz
Oops, sorry, I answered in the wrong thread. I intended to reply to the "How to create a SequenceFile faster" issue. Regards, Christoph -Original Message- From: 丛林 [mailto:congli...@gmail.com] Sent: Thursday, May 12, 2011 14:30 To: mapreduce-user@hadoop.apache.org Subject: R

Re: How to merge several SequenceFile into one?

2011-05-12 Thread 丛林
Hi Christoph, If there is no reducer, how can these sequence files be merged? Thanks for your advice. Best Wishes, -Lin On May 12, 2011 at 7:44 PM, Christoph Schmitz wrote: > Hi Lin, > > you could run a map-only job, i.e. read your data and output it from the > mapper without any reducer at all (set map

AW: How to merge several SequenceFile into one?

2011-05-12 Thread Christoph Schmitz
Hi Lin, you could run a map-only job, i.e. read your data and output it from the mapper without any reducer at all (set mapred.reduce.tasks=0 or, equivalently, use job.setNumReduceTasks(0)). That way, you parallelize over your inputs through a number of mappers and do not have any sort/shuffle
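A minimal sketch of the map-only setup described above, assuming the Hadoop `org.apache.hadoop.mapreduce` API is on the classpath. The class name `MapOnlyCopy`, the `Text`/`BytesWritable` key and value types, and the use of command-line arguments for the paths are assumptions made for illustration, not details from the thread.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.BytesWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.SequenceFileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.SequenceFileOutputFormat;

// Hypothetical driver: copies sequence files through mappers only.
public class MapOnlyCopy {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Job.getInstance is the newer factory; older releases use new Job(conf, name).
        Job job = Job.getInstance(conf, "map-only copy");
        job.setJarByClass(MapOnlyCopy.class);

        job.setInputFormatClass(SequenceFileInputFormat.class);
        job.setOutputFormatClass(SequenceFileOutputFormat.class);

        // The base Mapper passes each (key, value) pair through unchanged.
        job.setMapperClass(Mapper.class);

        // Zero reducers: no sort/shuffle, mappers write their output directly.
        job.setNumReduceTasks(0);

        // Assumed types; they must match what the input sequence files contain.
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(BytesWritable.class);

        FileInputFormat.addInputPath(job, new Path(args[0]));   // input directory
        FileOutputFormat.setOutputPath(job, new Path(args[1])); // must not exist yet

        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```

Note that a map-only job writes one output file per map task, so on its own it does not combine the inputs into a single file.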

Re: How to merge several SequenceFile into one?

2011-05-12 Thread 丛林
Dear Jason, If the order of the keys in the sequence file is not important to me, in other words, if the sort is not necessary, how can I skip the distributed sort to save resources? Thanks for your suggestion. Best Wishes, -Lin 2011/5/12 jason : > M/R job with a single redu

Re: How to create a SequenceFile more faster?

2011-05-12 Thread 丛林
Dear Harsh, Could you please explain how to create a sequence file with MapReduce? Suppose that all 32 GB of small files are stored on one PC. Thanks for your suggestion. BTW: I notice that you have replied to most of the sequence file topics on this mailing list :-) Best Wishes, -Lin 2011/5/12 Hars