Hi,
I am building a MapReduce application that constructs the adjacency list of
a graph from an input edge list. I noticed that my Reduce phase always
hangs (and eventually times out) when it calls
context.write(Key_x, Value_x) and Value_x is a very large ArrayWritable
(around 4M elements).
For the timeout problem, you can use a background thread that invokes
context.progress() periodically, which acts as a "keep-alive" for the
forked Child (mapper/combiner/reducer)...
It is tricky, but it works.
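A minimal sketch of that keep-alive idea, assuming the new
org.apache.hadoop.mapreduce API; the buildAdjacencyList helper and the
60-second interval are illustrative, not from the original post:

import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import org.apache.hadoop.io.ArrayWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class AdjacencyListReducer
    extends Reducer<Text, Text, Text, ArrayWritable> {

  @Override
  protected void reduce(final Text key, Iterable<Text> values,
      final Context context) throws IOException, InterruptedException {
    // Ping the framework from a side thread so the task is not killed
    // for inactivity while the long context.write() call runs.
    Thread keepAlive = new Thread(new Runnable() {
      public void run() {
        try {
          while (!Thread.currentThread().isInterrupted()) {
            context.progress();    // resets the task's activity timer
            Thread.sleep(60000L);  // stay well under mapred.task.timeout
          }
        } catch (InterruptedException e) {
          // expected once the write finishes
        }
      }
    });
    keepAlive.setDaemon(true);
    keepAlive.start();
    try {
      context.write(key, buildAdjacencyList(values));
    } finally {
      keepAlive.interrupt();
    }
  }

  // Illustrative helper: collect the neighbours into one ArrayWritable.
  private ArrayWritable buildAdjacencyList(Iterable<Text> values) {
    List<Text> list = new ArrayList<Text>();
    for (Text v : values) {
      list.add(new Text(v));  // copy: Hadoop reuses the Text instance
    }
    return new ArrayWritable(Text.class, list.toArray(new Text[list.size()]));
  }
}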
On Sat, May 5, 2012 at 10:05 PM, Zuhair Khayyat wrote:
> Hi,
>
> I am building a MapReduce application that constructs the adjacency list
> of a graph from an input edge list. ...
Thanks for the fast response. I think it is a good idea; however, the
application becomes too slow with large output arrays. I would be more
interested in a solution that speeds up the "context.write()" call
itself.
On Sat, May 5, 2012 at 5:36 PM, Zizon Qiu wrote:
> For the timeout problem, you can use a background thread that invokes
> context.progress() periodically ...
I looked at both the files. In AbstractNShotInputFormat it is mentioned that
this input format does not read from files. My input is in a text file, and I
want the whole file as a single record. So is it enough if I just copy the
contents of the file and return it as a String from getValueFromIndex?
Yes. If you know how, you can put it in the distributed cache; or, if it is
small, put it in the config as a String; or have all InputFormats read it
from somewhere.
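A minimal sketch of the config-as-String option, assuming the file really is
small; the key "tsp.input.contents" and the WholeFileDriver name are
illustrative:

import java.io.IOException;
import java.nio.charset.Charset;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class WholeFileDriver {
  // Driver side: read the whole (small) input file and stash it in the
  // job configuration so every mapper can see it without an InputSplit.
  public static void stashInput(Configuration conf, Path input)
      throws IOException {
    FileSystem fs = input.getFileSystem(conf);
    byte[] buf = new byte[(int) fs.getFileStatus(input).getLen()];
    FSDataInputStream in = fs.open(input);
    try {
      in.readFully(buf);
    } finally {
      in.close();
    }
    conf.set("tsp.input.contents", new String(buf, Charset.forName("UTF-8")));
  }
}

Each mapper can then recover it once in setup():

  @Override
  protected void setup(Context context) {
    String wholeFile = context.getConfiguration().get("tsp.input.contents");
    // parse wholeFile here
  }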
On Sat, May 5, 2012 at 8:08 AM, sharat attupurath wrote:
> I looked at both the files. In AbstractNShotInputFormat it is mentioned
> that this input format does not read from files. ...
Since the input files are very small, the default input formats in Hadoop all
generate just a single InputSplit, so only a single map task is executed, and
we won't have much parallelism.
I was thinking of writing an InputFormat that would read the whole file as a
single InputSplit and replicate this split across several map tasks; a sketch
of the idea follows.
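Something along these lines, perhaps: an InputFormat whose getSplits() hands
out N copies of a whole-file split, so N identical map tasks run in parallel.
The class name and NUM_COPIES are illustrative (this is not the
AbstractNShotInputFormat that was posted):

import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.InputSplit;
import org.apache.hadoop.mapreduce.JobContext;
import org.apache.hadoop.mapreduce.lib.input.FileSplit;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;

public class ReplicatedWholeFileInputFormat extends TextInputFormat {

  // How many identical map tasks to run; illustrative knob.
  public static int NUM_COPIES = 10;

  @Override
  protected boolean isSplitable(JobContext context, Path file) {
    return false;  // always hand out the whole file
  }

  @Override
  public List<InputSplit> getSplits(JobContext job) throws IOException {
    List<InputSplit> splits = new ArrayList<InputSplit>();
    for (FileStatus status : listStatus(job)) {
      // One split covering the entire file, added NUM_COPIES times.
      for (int i = 0; i < NUM_COPIES; i++) {
        splits.add(new FileSplit(status.getPath(), 0, status.getLen(),
            new String[0]));
      }
    }
    return splits;
  }
}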
Hi all,
I think I solved the problem already.
The default OutputFormat used by Hadoop calls the function "toString()" to
create the output line. The operation of appending large data into a single
"String" is very expensive, which explains why Hadoop takes forever to
write the output of a large array.
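The post does not show the actual fix, but one plausible reading is to
replace repeated String concatenation with a single pre-sized StringBuilder
pass in toString(). An illustrative sketch, not the poster's code:

import org.apache.hadoop.io.ArrayWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.io.Writable;

public class AdjacencyListWritable extends ArrayWritable {

  public AdjacencyListWritable() {
    super(Text.class);  // no-arg constructor required for deserialization
  }

  // TextOutputFormat calls toString() on the value; build the line in
  // one StringBuilder pass instead of millions of String copies.
  @Override
  public String toString() {
    Writable[] neighbours = get();
    StringBuilder sb = new StringBuilder(neighbours.length * 8);
    for (int i = 0; i < neighbours.length; i++) {
      if (i > 0) {
        sb.append(' ');
      }
      sb.append(neighbours[i].toString());
    }
    return sb.toString();
  }
}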
There is already a JIRA for this:
MAPREDUCE-1743
On 05/03/2012 09:06 AM, Harsh J wrote:
Subbu,
The only way I can think of is to use an overridden
InputFormat/RecordReader pair that sets the "map.input.file" config
value during its initialization, using the received FileSplit object.
This should work.
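A minimal sketch of that suggestion, built on the stock LineRecordReader;
the class name is illustrative:

import java.io.IOException;
import org.apache.hadoop.mapreduce.InputSplit;
import org.apache.hadoop.mapreduce.TaskAttemptContext;
import org.apache.hadoop.mapreduce.lib.input.FileSplit;
import org.apache.hadoop.mapreduce.lib.input.LineRecordReader;

public class MapInputFileRecordReader extends LineRecordReader {

  @Override
  public void initialize(InputSplit split, TaskAttemptContext context)
      throws IOException {
    FileSplit fileSplit = (FileSplit) split;
    // Republish the current input path so the mapper can read it
    // back from the configuration.
    context.getConfiguration().set("map.input.file",
        fileSplit.getPath().toString());
    super.initialize(split, context);
  }
}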
Look at what I sent you. It generates a number of splits; there is a static
variable to set how many.
On May 5, 2012 8:38 AM, "sharat attupurath" wrote:
> Since the input files are very small, the default input formats in Hadoop
> all generate just a single InputSplit, so only a single map task is
> executed ...
Sorry, I missed it the first time. Thank you.
Date: Sat, 5 May 2012 13:16:47 -0700
Subject: RE: Ant Colony Optimization for Travelling Salesman Problem in Hadoop
From: lordjoe2...@gmail.com
To: mapreduce-user@hadoop.apache.org
> Look at what I sent you. It generates a number of splits; there is a
> static variable ...
Yeah Jim,
I have gone through the comments in that JIRA ticket and was able to solve
my problem.
On Sat, May 5, 2012 at 11:25 PM, Jim Donofrio wrote:
> There is already a JIRA for this:
>
> MAPREDUCE-1743
>
>
> On 05/03/2012 09:06 AM, Harsh J wrote:
>
>> Subbu,
>>
>> The only way I can think of is to use an overridden
>> InputFormat/RecordReader pair ...