hanging context.write() with large arrays

2012-05-05 Thread Zuhair Khayyat
Hi, I am building a MapReduce application that constructs the adjacency list of a graph from an input edge list. I noticed that my Reduce phase always hangs (and eventually times out) when it calls context.write(Key_x, Value_x) and Value_x is a very large ArrayWritable (around 4M elements).

Re: hanging context.write() with large arrays

2012-05-05 Thread Zizon Qiu
For the timeout problem, you can use a background thread that invokes context.progress() periodically, which acts as a keep-alive for the forked Child (mapper/combiner/reducer)... it is tricky, but it works.
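The heartbeat thread Zizon describes is plain Java even though it lives inside a Hadoop task, so it can be sketched without the Hadoop API. A minimal, hedged sketch: in a real reducer the Runnable passed in would simply be `context::progress`; the class and method names here are illustrative, not from the original thread.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Background "keep-alive" heartbeat. In a real reducer the Runnable
// would call context.progress() so the framework does not kill the
// task while a long context.write() is still in flight.
public class KeepAlive {
    public static ScheduledExecutorService startHeartbeat(Runnable progress, long periodMs) {
        ScheduledExecutorService exec = Executors.newSingleThreadScheduledExecutor(r -> {
            Thread t = new Thread(r, "progress-heartbeat");
            t.setDaemon(true); // do not keep the JVM alive after the task finishes
            return t;
        });
        exec.scheduleAtFixedRate(progress, 0, periodMs, TimeUnit.MILLISECONDS);
        return exec;
    }

    public static void main(String[] args) throws Exception {
        AtomicInteger beats = new AtomicInteger();
        // Stand-in for context.progress(); here we just count invocations.
        ScheduledExecutorService exec = startHeartbeat(beats::incrementAndGet, 50);
        Thread.sleep(300); // simulate a long-running write
        exec.shutdownNow();
        if (beats.get() == 0) throw new AssertionError("heartbeat never fired");
        System.out.println("heartbeats fired: " + beats.get());
    }
}
```

Remember to shut the executor down (or rely on the daemon flag) in cleanup(), otherwise the thread reports progress for a task that has already finished.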

Re: hanging context.write() with large arrays

2012-05-05 Thread Zuhair Khayyat
Thanks for the fast response. I think it is a good idea; however, the application becomes too slow with large output arrays. I would be more interested in a solution that speeds up context.write() itself.

RE: Ant Colony Optimization for Travelling Salesman Problem in Hadoop

2012-05-05 Thread sharat attupurath
I looked at both the files. In AbstractNShotInputFormat it is mentioned that this input format does not read from files. My input is in a text file, and I want the whole file as a single record. So is it enough if I just copy the contents of the file and return it as a string from getValueFromIndex?

Re: Ant Colony Optimization for Travelling Salesman Problem in Hadoop

2012-05-05 Thread Steve Lewis
Yes - if you know how, you can put it in the distributed cache; or, if it is small, put it in the config as a String; or have all the InputFormats read it from somewhere.

RE: Ant Colony Optimization for Travelling Salesman Problem in Hadoop

2012-05-05 Thread sharat attupurath
Since the input files are very small, the default input formats in Hadoop all generate just a single InputSplit, so only a single map task is executed and we won't have much parallelism. I was thinking of writing an InputFormat that would read the whole file as an InputSplit and replicate this split.
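The usual whole-file pattern (not spelled out in the truncated messages) is to subclass FileInputFormat, override isSplitable() to return false, and have the RecordReader emit the entire file as one key/value record. Stripped of the Hadoop API, the reader's core reduces to the stdlib sketch below; the class and method names are illustrative, and a real WholeFileInputFormat would do the same read inside RecordReader.nextKeyValue(), keyed by the file name taken from the FileSplit.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.AbstractMap.SimpleEntry;
import java.util.Map;

// Sketch of a "whole file = one record" reader, outside Hadoop.
// No line splitting: the file's full contents become a single value.
public class WholeFileReader {
    static Map.Entry<String, String> readAsSingleRecord(Path file) throws IOException {
        byte[] bytes = Files.readAllBytes(file); // one shot, no per-line records
        return new SimpleEntry<>(file.getFileName().toString(),
                                 new String(bytes, StandardCharsets.UTF_8));
    }
}
```

Replicating that one split N times (so N mappers each see the whole graph) is what Steve's AbstractNShotInputFormat appears to do with its static split-count variable.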

Re: hanging context.write() with large arrays

2012-05-05 Thread Zuhair Khayyat
Hi all, I think I have solved the problem. The default OutputFormat used by Hadoop calls toString() on the value to create its output, and appending large data into a single String is very expensive, which explains why Hadoop takes forever to write the output of a large array.
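The cost Zuhair describes is the classic quadratic-concatenation trap: building a String with repeated `+` copies the whole buffer on every append, so emitting a 4M-element array is O(n^2), while StringBuilder appends in amortized O(n). A minimal sketch with a hypothetical value class (the names below are illustrative, not Zuhair's actual code):

```java
// Illustrates the toString() cost behind the hang: both methods produce
// identical output, but slowToString() is quadratic in the array length
// while fastToString() is linear.
public class ArrayToString {
    // Naive version: each '+' copies everything written so far.
    static String slowToString(int[] values) {
        String s = "";
        for (int v : values) s = s + v + " ";
        return s.trim();
    }

    // What a custom toString() (or a custom OutputFormat) should do instead.
    static String fastToString(int[] values) {
        StringBuilder sb = new StringBuilder(values.length * 8); // pre-size the buffer
        for (int v : values) sb.append(v).append(' ');
        return sb.toString().trim();
    }
}
```

For truly large values, a custom OutputFormat that streams elements directly to the output without materializing one giant String avoids the buffer entirely.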

Re: Getting filename in case of MultipleInputs

2012-05-05 Thread Jim Donofrio
There is already a JIRA for this: MAPREDUCE-1743. On 05/03/2012 09:06 AM, Harsh J wrote: > Subbu, the only way I can think of is to use an overridden InputFormat/RecordReader pair that sets the "map.input.file" config value during its initialization, using the received FileSplit object.

RE: Ant Colony Optimization for Travelling Salesman Problem in Hadoop

2012-05-05 Thread Steve Lewis
Look at what I sent you. It generates a number of splits; there is a static variable to set how many.

RE: Ant Colony Optimization for Travelling Salesman Problem in Hadoop

2012-05-05 Thread sharat attupurath
Sorry, I missed it the first time. Thank you.

Re: Getting filename in case of MultipleInputs

2012-05-05 Thread Kasi Subrahmanyam
Yeah Jim, I have gone through the comments in that JIRA ticket and was able to solve my problem.