Re: Reducer Run on Which Machine?

2011-08-04 Thread Arun C Murthy
Nope, currently we don't do any smart scheduling for reduces since they need to fetch map outputs from many nodes anyway.

Arun

On Aug 4, 2011, at 10:24 PM, Suhendry Effendy wrote:
> I understand that we can decide which task run by which reducer in Hadoop by
> using custom partitioner, but is

Reducer Run on Which Machine?

2011-08-04 Thread Suhendry Effendy
I understand that we can decide which task is run by which reducer in Hadoop by using a custom partitioner, but is there any way to decide which reducer runs on which machine?

Suhendry Effendy
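
For context on the partitioner half of this question, here is a minimal, Hadoop-free sketch in Java of what a partitioner decides: it maps a key to a reducer index (a partition number), not to a machine. The arithmetic follows the usual hash-based scheme of masking the sign bit and taking the remainder; the names `PartitionSketch` and `partitionFor` are hypothetical and not part of any Hadoop API.

```java
// Hypothetical, self-contained illustration; no Hadoop classes are used.
class PartitionSketch {

    /**
     * Maps a key to a reducer index in the range [0, numReduceTasks).
     * Masking with Integer.MAX_VALUE clears the sign bit, so a negative
     * hashCode() can never produce a negative index.
     */
    static int partitionFor(Object key, int numReduceTasks) {
        if (numReduceTasks <= 0) {
            throw new IllegalArgumentException("numReduceTasks must be positive");
        }
        return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }

    public static void main(String[] args) {
        String[] keys = {"apple", "banana", "cherry"};
        for (String key : keys) {
            // The result names a reduce task, not a host.
            System.out.println(key + " -> reducer " + partitionFor(key, 4));
        }
    }
}
```

The return value only identifies a reduce task. Which node that task runs on is left to the scheduler, which is the point made in the reply to this message.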

Re: How does Hadoop reuse the objects?

2011-08-04 Thread Joey Echeverria
Wow, I didn't expect that. That's nastier than usual. I would think that cloning by serializing/deserializing would be unnecessarily slow. I would file a JIRA with Avro asking for a clone() or copy constructor in generated code.

-Joey

On Thu, Aug 4, 2011 at 5:07 PM, Vyacheslav Zholudev wrote:
>

Re: How does Hadoop reuse the objects?

2011-08-04 Thread Vyacheslav Zholudev
Just sharing today's discovery: Hadoop also reuses objects in internal lists (in my example, the BAR objects). That is, if the first FOO object has two BAR objects in its list, then the second FOO object will contain the same (equal by reference) first two BAR objects in its list. So in case of Avr
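
The reuse pitfall discussed in this thread can be reproduced without Hadoop. The following is only a simulation under stated assumptions: `Record`, `reusingIterator`, `collectByReference` and `collectByCopy` are hypothetical names, and the iterator merely imitates a framework that hands back one mutable instance and overwrites its fields on every step. It shows why keeping references yields repeated values and why copying each value avoids that.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

// Hypothetical, self-contained simulation; no Hadoop or Avro classes are used.
class ReuseSketch {

    /** Mutable value object, standing in for a reused record. */
    static final class Record {
        int value;
        Record(int value) { this.value = value; }
        Record copy() { return new Record(value); }
    }

    /** Returns the SAME Record instance on every call, with its field overwritten. */
    static Iterator<Record> reusingIterator(final int[] data) {
        return new Iterator<Record>() {
            private final Record shared = new Record(0);
            private int next = 0;

            @Override public boolean hasNext() { return next < data.length; }

            @Override public Record next() {
                if (!hasNext()) throw new NoSuchElementException();
                shared.value = data[next++];
                return shared;
            }

            @Override public void remove() { throw new UnsupportedOperationException(); }
        };
    }

    /** Buggy pattern: stores references, so every element aliases one object. */
    static List<Integer> collectByReference(int[] data) {
        List<Record> kept = new ArrayList<Record>();
        for (Iterator<Record> it = reusingIterator(data); it.hasNext(); ) {
            kept.add(it.next());
        }
        return values(kept);
    }

    /** Safe pattern: copies each record before storing it. */
    static List<Integer> collectByCopy(int[] data) {
        List<Record> kept = new ArrayList<Record>();
        for (Iterator<Record> it = reusingIterator(data); it.hasNext(); ) {
            kept.add(it.next().copy());
        }
        return values(kept);
    }

    private static List<Integer> values(List<Record> records) {
        List<Integer> out = new ArrayList<Integer>();
        for (Record r : records) out.add(r.value);
        return out;
    }

    public static void main(String[] args) {
        int[] data = {1, 2, 3};
        System.out.println("by reference: " + collectByReference(data)); // [3, 3, 3]
        System.out.println("by copy:      " + collectByCopy(data));      // [1, 2, 3]
    }
}
```

For records that contain lists of other records, as in the FOO/BAR case described here, a shallow copy would presumably not be enough, since the inner objects are reused as well; the copy would need to be deep.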

Re: How does Hadoop reuse the objects?

2011-08-04 Thread Milind.Bhandarkar
HADOOP-2399 has caused a lot of problems for users so far, and the saga still continues :-( I remember spending 18 straight hours in 2008 with a user debugging this issue.

- milind

---
Milind Bhandarkar
Greenplum Labs, EMC
(Disclaimer: Opinions expressed in this email are those of the author, an

Re: MultipleOutputs support

2011-08-04 Thread Harsh J
Vanja,

On Thu, Aug 4, 2011 at 8:45 PM, Vanja Komadinovic wrote:
> Thanks Harsh,
>
> I solved my problem with FAQ point you give me.

Glad to know things are resolved!

> Regarding MultipleOutputs, I was thinking that MultipleOutputs are not
> working with new API on 0.20, but later found that in

Re: MultipleOutputs support

2011-08-04 Thread Vanja Komadinovic
Thanks Harsh,

I solved my problem with the FAQ point you gave me. Regarding MultipleOutputs, I thought that MultipleOutputs did not work with the new API on 0.20, but later found that this is solved in the CDH distribution. Until all our production clusters are switched to CDH3 I must use manu

Re: Multiple avro outputs from a reducer

2011-08-04 Thread Vyacheslav Zholudev
Hi all,

I tried to follow the suggestions, looked at the code to see how Avro works in mappers and reducers, and created a simple class for Avro multiple outputs. If you are interested in looking at or reviewing it, you can follow the link: http://pastebin.com/HMPfgttg Any suggestions and c