Piyush,
On Sat, Apr 7, 2012 at 3:02 AM, Piyush Kansal wrote:
> So, do I need to set the number of reducers depending on data size every
> time I want to run MapReduce, or does the Hadoop framework automatically
> invoke the required number of reducers at run time?
Plain Apache Hadoop does
Hi,
I am using the Partitioner and Grouper classes in my program. Let's say the
data I want to process using MapReduce varies in size: it can be just
10MB or can go up to 10GB.
So, do I need to set the number of reducers depending on data size every
time I want to run MapReduce, or does the Hadoop framework automatically
invoke the required number of reducers at run time?
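Hadoop does not resize the reduce phase from the input size on its own, so one common workaround is to compute the count yourself before submitting the job. Below is a minimal pure-Java sketch of such a heuristic; the 1 GB-per-reducer target and the helper name are assumptions for illustration, not Hadoop defaults.

```java
// Hypothetical helper: derive a reducer count from the input size instead of
// hard-coding it per run. The 1 GB-per-reducer target is an assumed tuning
// knob, not something Hadoop provides.
public class ReducerCount {
    static final long BYTES_PER_REDUCER = 1L << 30; // assumed target: ~1 GB per reducer

    // Returns at least 1 and grows linearly with input size (ceiling division).
    static int forInputSize(long inputBytes) {
        return (int) Math.max(1, (inputBytes + BYTES_PER_REDUCER - 1) / BYTES_PER_REDUCER);
    }

    public static void main(String[] args) {
        System.out.println(forInputSize(10L << 20)); // 10 MB -> 1
        System.out.println(forInputSize(10L << 30)); // 10 GB -> 10
    }
}
```

The result would then be passed to the real Hadoop API, e.g. `job.setNumReduceTasks(ReducerCount.forInputSize(totalInputBytes))`.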
read the output of the first job to run through the values of A.
One issue is that, assuming the same hashing partitioner is used and there
are the same number of reducers, a specific reducer, say reducer 12, will
receive the same keys in both jobs, and thus part-r-00012 from the first
job is the only file reducer 12 will need to read.
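The stability described above follows from the partition arithmetic alone. This sketch mirrors the computation in Hadoop's default HashPartitioner (partition = masked hashCode modulo reducer count) in plain Java, plus the part-r-NNNNN naming, to show why the same key lands in the same output file across jobs with equal reducer counts:

```java
// Sketch of the arithmetic behind Hadoop's default HashPartitioner: the
// partition depends only on the key's hashCode and the reducer count, so two
// jobs with the same partitioner and the same number of reducers send a given
// key to the same reducer index, and hence the same part-r-NNNNN file.
public class StablePartition {
    static int partitionFor(String key, int numReduceTasks) {
        // Mask off the sign bit so the modulo result is never negative.
        return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }

    // Reducer n writes its output to part-r-<n, zero-padded to 5 digits>.
    static String outputFileFor(String key, int numReduceTasks) {
        return String.format("part-r-%05d", partitionFor(key, numReduceTasks));
    }

    public static void main(String[] args) {
        String job1 = outputFileFor("some-key", 20);
        String job2 = outputFileFor("some-key", 20);
        System.out.println(job1.equals(job2)); // same file in both jobs
    }
}
```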
Can I guarantee (without restricting the number of
You could merge the side-effect files before running the second map job if you
want to, or you could just leave them as separate files and read each one
in the mapper. If there are lots of files, the namenode may get hit too
much and slow down the entire cluster. This is mitigated by
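The merge option can be sketched on a local filesystem as plain file concatenation; fewer, larger files mean fewer objects for the namenode to track. On a real cluster this role would be played by an HDFS copy/merge step rather than java.nio, so treat this as an illustration of the idea only:

```java
import java.io.IOException;
import java.nio.file.*;
import java.util.List;
import java.util.stream.Collectors;

// Local-filesystem sketch of "merge the side-effect files before job two".
public class MergeSideFiles {
    static Path mergeInto(Path dir, Path merged) throws IOException {
        try (var files = Files.list(dir)) {
            // Sort so part-r-00000, part-r-00001, ... concatenate in order.
            List<Path> parts = files.filter(Files::isRegularFile)
                                    .sorted()
                                    .collect(Collectors.toList());
            Files.deleteIfExists(merged);
            for (Path p : parts) {
                Files.write(merged, Files.readAllBytes(p),
                            StandardOpenOption.CREATE, StandardOpenOption.APPEND);
            }
        }
        return merged;
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("side-files");
        Files.writeString(dir.resolve("part-r-00000"), "a\n");
        Files.writeString(dir.resolve("part-r-00001"), "b\n");
        Path merged = mergeInto(dir, dir.resolve("merged"));
        System.out.println(Files.readString(merged)); // concatenated contents
    }
}
```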
Bobby,
Thanks for such a thoughtful response.
I have a data set that represents all the people that pass through Las Vegas
over a period of time, say five years, which comes to about 175-200
million people. Each record is a person, and it contains fields for where
they came from, left to; tim
Geoffry,
That really depends on how much data you are processing, and the algorithm you
need to use to process it. I did something similar a while ago with a
medium amount of data, and we saw a significant speed-up by first assigning
each record a new key based on the expected range of
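The re-keying trick described above can be sketched as mapping each record's value into a bucket over its expected range, so that reducers receive contiguous, similarly sized slices. The range bounds and bucket count below are illustrative assumptions, not values from this thread:

```java
// Sketch of re-keying records by expected value range so work spreads evenly
// across reducers. Bounds and bucket count are made up for illustration.
public class RangeKey {
    static int bucketFor(long value, long min, long max, int buckets) {
        if (value <= min) return 0;
        if (value >= max) return buckets - 1; // clamp outliers into the edge buckets
        // Proportional position of value within [min, max), scaled to buckets.
        return (int) ((value - min) * buckets / (max - min));
    }

    public static void main(String[] args) {
        // e.g. timestamps over a known window, split into 10 buckets
        System.out.println(bucketFor(0, 0, 100, 10));  // 0
        System.out.println(bucketFor(55, 0, 100, 10)); // 5
        System.out.println(bucketFor(99, 0, 100, 10)); // 9
    }
}
```

The bucket number would become (or prefix) the map output key, giving a roughly balanced partition even when the raw keys are skewed.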
All,
I am mostly seeking confirmation as to my thinking on this matter.
I have an MR job that I believe will force me into using a single reducer.
The nature of the process is one where calculations performed on a given
record rely on certain accumulated values whose calculation depends on
rollin
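The kind of rolling accumulation described above is what resists parallel reduction: each output depends on every earlier record, so with a naive formulation all records must flow through one reducer in order. A minimal sketch of that dependency:

```java
import java.util.Arrays;

// Why a rolling (cumulative) computation forces a single ordered pass:
// out[i] depends on values[0..i], so the records cannot be split across
// independent reducers without extra machinery.
public class RunningTotal {
    static long[] cumulative(long[] values) {
        long[] out = new long[values.length];
        long acc = 0;
        for (int i = 0; i < values.length; i++) {
            acc += values[i]; // accumulated value carried across all records
            out[i] = acc;
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(cumulative(new long[]{3, 1, 4}))); // [3, 4, 8]
    }
}
```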
Hello All,
I have a question regarding configuring the number-of-reducers property in
case of a non-FIFO scheduler (either the Capacity or Fair scheduler).
As per the guidelines on the Hadoop wiki page, we should set number of
reducers = 0.75 * maximum_reduce_slots_available_in_cluster (minimum
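The slot-based sizing rule quoted above is simple arithmetic; as a sketch (the floor-at-1 behavior is an assumption, since the quoted guideline is cut off before its minimum):

```java
// Sketch of the wiki guideline quoted above:
// reducers = 0.75 * available reduce slots, floored, but never below 1.
public class SlotSizing {
    static int reducersFor(int maxReduceSlots) {
        return Math.max(1, (int) (0.75 * maxReduceSlots));
    }

    public static void main(String[] args) {
        System.out.println(reducersFor(40)); // 30
        System.out.println(reducersFor(1));  // 1
    }
}
```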
Great. Just thought I'd be missing something :)
On Mon, Jun 7, 2010 at 01:55, Aaron Kimball wrote:
Yes. LJR sets your number of reduce tasks to 1 if the number is >= 1.
Subnote: I've posted a patch to fix this at MAPREDUCE-434, but it's not
committed.
- Aaron
On Mon, Jun 7, 2010 at 1:42 AM, Torsten Curdt wrote:
I see only one.
Could it be that using the LocalJobRunner interferes here?
On Mon, Jun 7, 2010 at 01:31, Eric Sammer wrote:
Torsten:
To clarify, how many reducers do you actually see? (i.e. Do you see 4
reducers or 1?) It should work as you expect.
On Sun, Jun 6, 2010 at 1:33 PM, Torsten Curdt wrote:
When I set
job.setPartitionerClass(MyPartitioner.class);
job.setNumReduceTasks(4);
I would expect to see my MyPartitioner get called with
getPartition(key, value, 4)
but still I see it only get called with 1.
I've also tried setting
conf.set("mapred.map.tasks.speculative.exe