Thank you Saurabh, but the following settings didn't change the # of spilled records:

conf.set("mapred.job.shuffle.merge.percent", ".9"); // instead of the default .66
conf.set("mapred.inmem.merge.threshold", "1000");   // the default is 1000

Is it because my machine has only 4 GB of memory?
I'm using the pseudo-distributed mode.
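(For reference: the two properties above only tune the reduce-side merge. Map-side spills are governed by the sort buffer instead, roughly as sketched below; the property names are the 0.20-era ones and the values are only illustrative, not recommendations.)

// map-side: records are buffered in io.sort.mb and spilled to disk once the
// buffer passes the spill threshold, regardless of the shuffle settings above
conf.setInt("io.sort.mb", 200);             // sort buffer size in MB (default 100)
conf.set("io.sort.spill.percent", "0.90");  // fill level that starts a background spill (default 0.80)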
Make sure the instances' ports aren't conflicting and all directories
(NN, JT, etc.) are unique. That should do it.
--
Take care,
Konstantin (Cos) Boudnik
On Mon, Feb 21, 2011 at 20:09, Gang Luo wrote:
> Hello folks,
> I am trying to run multiple hadoop instances on the same cluster. I find it
Hello folks,
I am trying to run multiple hadoop instances on the same cluster. I find it
hard to share. First I tried two instances, each of them running with the same
master and slaves. Only one of them could work. I then tried to divide the
cluster such that hadoop 1 uses machines 0-9 and hadoop 2 uses mac
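Along the lines of Cos's suggestion, here is a rough sketch of the per-instance overrides a second instance would need. The property names are the standard 0.20-era ones, but the ports and paths below are only examples, and normally they would live in that instance's conf/*-site.xml rather than in driver code.

// instance #2: its own NN/JT endpoints and its own on-disk directories
conf.set("fs.default.name", "hdfs://master:9100");            // instance #1 might be on 9000
conf.set("mapred.job.tracker", "master:9101");                // instance #1 might be on 9001
conf.set("hadoop.tmp.dir", "/tmp/hadoop-instance2");          // separate tmp dir
conf.set("dfs.name.dir", "/data/instance2/dfs/name");         // NN metadata
conf.set("dfs.data.dir", "/data/instance2/dfs/data");         // DN block storage
conf.set("mapred.local.dir", "/data/instance2/mapred/local"); // TT scratch space
// the web/IPC ports (dfs.http.address, dfs.datanode.address, mapred.job.tracker.http.address,
// etc.) need the same treatment so the daemons don't collide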
Hi Maha,
Spilled records have to do with the transient data written to disk during the
map and reduce operations. Note that it's not just the map operations that
generate spilled records. When the in-memory buffer (controlled by
mapred.job.shuffle.merge.percent) runs out or reaches the threshold number o
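For the reduce side this refers to, the relevant shuffle/merge properties are the following (0.20-era names; the values shown are just the defaults, not suggested settings):

conf.set("mapred.job.shuffle.input.buffer.percent", "0.70"); // share of reducer heap holding fetched map outputs
conf.set("mapred.job.shuffle.merge.percent", "0.66");        // fill level that triggers an in-memory merge/spill
conf.setInt("mapred.inmem.merge.threshold", 1000);           // or merge once this many map-output segments accumulate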
Hello everyone,
Do spilled records mean that the sort-buffer size is not enough to sort all the
input records, so some records are written to local disk? If so, I tried
increasing io.sort.mb from the default 100 to 200, and there was still the same
# of spilled records. Why
Hi,
I think the third error pattern is not caused by the xceiver key:

org.apache.hadoop.mapreduce.task.reduce.Shuffle$ShuffleError: error in shuffle
in fetcher#5
        at org.apache.hadoop.mapreduce.task.reduce.Shuffle.run(Shuffle.java:124)
        at org.apache.hadoop.mapred.ReduceTask.run(R
I wonder what companies like Amazon, Cloudera, RackSpace, Facebook, Yahoo
etc. look at for the purpose of benchmarking. I guess GridMix v3 might be of
more interest to Yahoo.
I would appreciate it if someone could comment more on this.
Thanks,
-Shrinivas
On Fri, Feb 18, 2011 at 4:50 PM, Konstantin Bo
Thank you for the explanation. Avro is a good serialization tool. I haven't
looked at the code yet, but I will probably dig into it very soon.
On Mon, Feb 21, 2011 at 10:20 AM, Harsh J wrote:
> Hello,
>
> On Mon, Feb 21, 2011 at 9:33 PM, Weishung Chung
> wrote:
> > What is the main use of o
How can I then produce one output file per mapper, not per map task?
Thank you,
Maha
On Feb 20, 2011, at 10:22 PM, Ted Dunning wrote:
> This is the most important thing that you have said. The map function
> is called once per unit of input but the mapper object persists for
> many input units of inpu
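To illustrate Ted's point, here is an untested sketch (new org.apache.hadoop.mapreduce API) of producing exactly one file per mapper by flushing in cleanup(). The output path naming is just an example, and buffering everything in memory is only for brevity.

import java.io.IOException;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

// One Mapper object handles every record of its split, so state gathered in
// map() can be written out once in cleanup(), giving one file per mapper.
public class PerMapperOutputMapper
    extends Mapper<LongWritable, Text, NullWritable, NullWritable> {

  private final StringBuilder buffered = new StringBuilder();

  @Override
  protected void map(LongWritable key, Text value, Context context) {
    buffered.append(value).append('\n');  // called once per input record
  }

  @Override
  protected void cleanup(Context context) throws IOException {
    // called once per mapper, after the last map() call
    Path out = new Path("per-mapper-out/" + context.getTaskAttemptID().getTaskID());
    FSDataOutputStream stream = FileSystem.get(context.getConfiguration()).create(out);
    stream.writeBytes(buffered.toString());
    stream.close();
  }
}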
Hi,
Is there a way in which we can measure the execution time of straggler and
non-straggler tasks separately in Hadoop MapReduce?
-bikash
Hello,
On Mon, Feb 21, 2011 at 9:33 PM, Weishung Chung wrote:
> What is the main use of org.apache.hadoop.io.ObjectWritable ? Thank you :)
To use any primitive or plain Java object as a Writable without requiring it
to implement that interface. It will write out a class name for
every type of objec
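A quick, untested sketch of round-tripping an arbitrary value through ObjectWritable (the int[] payload is just an example):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.io.DataInputBuffer;
import org.apache.hadoop.io.DataOutputBuffer;
import org.apache.hadoop.io.ObjectWritable;

public class ObjectWritableDemo {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();

    // serialize: the class name is written first, then the instance data
    ObjectWritable written = new ObjectWritable(new int[] {1, 2, 3});
    written.setConf(conf);
    DataOutputBuffer outBuf = new DataOutputBuffer();
    written.write(outBuf);

    // deserialize: the class name read back tells ObjectWritable how to decode the payload
    DataInputBuffer inBuf = new DataInputBuffer();
    inBuf.reset(outBuf.getData(), outBuf.getLength());
    ObjectWritable read = new ObjectWritable();
    read.setConf(conf);
    read.readFields(inBuf);
    int[] roundTripped = (int[]) read.get();
    System.out.println(roundTripped.length);  // 3
  }
}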
What is the main use of org.apache.hadoop.io.ObjectWritable ? Thank you :)
Thanks for your answers Ted and Jim :)
Maha
On Feb 21, 2011, at 6:41 AM, Jim Falgout wrote:
> Your scenario matches the capability of NLineInputFormat exactly, so that
> looks to be the best solution. If you wrote your own input format, it would
> have to basically do what NLineInputFormat i
Hi,
Can anyone throw some more light on resource-based scheduling in Hadoop?
Specifically, are resources like CPU and memory partitioned across slots?
From the blog by Arun on the capacity scheduler,
http://developer.yahoo.com/blogs/hadoop/posts/2011/02/capacity-scheduler/
I understand that memory is the
Your scenario matches the capability of NLineInputFormat exactly, so that
looks to be the best solution. If you wrote your own input format, it would
have to basically do what NLineInputFormat is already doing for you.
-Original Message-
From: maha [mailto:m...@umail.ucsb.edu]
Sent: S
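Spelling out the setup Jim describes, a minimal sketch using the old org.apache.hadoop.mapred API (N = 1 line per map task is only an example value):

import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.lib.NLineInputFormat;

// each map task receives exactly N consecutive lines instead of a whole block-sized split
JobConf job = new JobConf();
job.setInputFormat(NLineInputFormat.class);
job.setInt("mapred.line.input.format.linespermap", 1);  // N = 1: one line per map task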