Hi everyone,
I was running a MapReduce job in Java and ran into this scenario:
Case 1:
Number of distinct output keys from mapper = 3
Expected # of reducers = 3
Configured # of reducers to run = 2
Expected outcome:
# of reducers spawned = 2
# of keys processed under first reducer = 1
# of keys processed under second reducer = 2
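For reference, which reducer a key lands on is decided by the partitioner, and by default that is Hadoop's HashPartitioner. Below is a minimal, self-contained sketch that mirrors its arithmetic outside of Hadoop (the real class is org.apache.hadoop.mapreduce.lib.partition.HashPartitioner; the key values here are made up for illustration):

```java
public class HashPartitionDemo {
    // Same arithmetic as Hadoop's default HashPartitioner:
    // (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks
    static int partition(String key, int numReduceTasks) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }

    public static void main(String[] args) {
        String[] keys = {"a", "b", "c"};  // hypothetical mapper output keys
        for (String k : keys) {
            System.out.println(k + " -> reducer " + partition(k, 2));
        }
    }
}
```

With these particular keys, "b" hashes to reducer 0 while "a" and "c" hash to reducer 1, giving the 1/2 split described above; with other key values the hash could just as easily send all three keys to one reducer, leaving the other idle.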
Saurabh,
I do not see you mention defining a custom Partitioner that could
guarantee such a perfect key distribution. The default partitioner is the
HashPartitioner, which can only provide a hash-based, effectively
randomized distribution (it depends on the key data). Hence, your test
here with just 3 keys is not really a
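In a real job such a custom partitioner would be a class extending org.apache.hadoop.mapreduce.Partitioner, registered via job.setPartitionerClass(...). The plain-Java sketch below (the key names and the mapping are hypothetical, purely for illustration) shows the idea of pinning each known key to its own reducer:

```java
import java.util.Map;

public class FixedKeyPartitionerSketch {
    // Hypothetical fixed mapping: each of the 3 known keys pinned to a reducer.
    static final Map<String, Integer> ASSIGNMENT =
            Map.of("key1", 0, "key2", 1, "key3", 2);

    static int partition(String key, int numReduceTasks) {
        Integer fixed = ASSIGNMENT.get(key);
        // Fall back to hash partitioning for keys not in the mapping.
        return fixed != null
                ? fixed % numReduceTasks
                : (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }

    public static void main(String[] args) {
        for (String k : ASSIGNMENT.keySet()) {
            System.out.println(k + " -> reducer " + partition(k, 3));
        }
    }
}
```

This only guarantees an even spread when the key set is known in advance, which is exactly the situation Harsh is pointing out is unusual in practice.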
Saurabh
*From:* Harsh J [mailto:ha...@cloudera.com]
*Sent:* Thursday, August 02, 2012 4:05 PM
*To:* mapreduce-user@hadoop.apache.org
*Subject:* Re: All reducers are not being utilized
Hi Saurabh/Steve,
From my understanding, the schedulers in Hadoop consider only data
locality (for map tasks) and the availability of slots when scheduling
tasks on the various nodes. Say you have 3 TaskTracker nodes with 2
reducer slots each (assume all slots are free). If we execute a
MapReduce job with 3
If I have 2 nodes, and 150 input files in a single 'input' directory to
search using the 'grep' example, isn't it reasonable that both nodes would
be involved?
Thanks
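On the 150-files question: with FileInputFormat, each input file produces at least one split (roughly one per HDFS block for larger files), and each split becomes a map task, so 150 small files yield on the order of 150 map tasks, far more than two nodes' worth of slots. A rough, self-contained sketch of that split arithmetic (the 128 MB block size and ~1 MB file sizes are assumed, and this approximates rather than reproduces Hadoop's actual split logic):

```java
public class SplitCountSketch {
    // Approximation of FileInputFormat behavior: one split per block,
    // so a file smaller than one block yields a single map task.
    static long splitsForFile(long fileSize, long blockSize) {
        return Math.max(1, (fileSize + blockSize - 1) / blockSize);
    }

    public static void main(String[] args) {
        long blockSize = 128L * 1024 * 1024;  // 128 MB HDFS block (assumed)
        long total = 0;
        for (int i = 0; i < 150; i++) {       // 150 input files
            total += splitsForFile(1_000_000L, blockSize);  // ~1 MB each (assumed)
        }
        System.out.println("map tasks = " + total);
    }
}
```

With that many map tasks and only a handful of slots per node, the scheduler has every reason to use both nodes, which matches Steve's expectation.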
On Thu, Aug 2, 2012 at 3:31 PM, Bejoy Ks bejoy.had...@gmail.com wrote: