Re: Spilled Records

2011-02-21 Thread maha
Thank you Saurabh, but the following settings didn't change the # of spilled records: conf.set("mapred.job.shuffle.merge.percent", ".9"); // instead of .66 conf.set("mapred.inmem.merge.threshold", "1000"); // instead of 1000 Is it because of my memory being 4GB? I'm using the pseudo-distributed mode

Re: multiple hadoop instances on same cluster

2011-02-21 Thread Konstantin Boudnik
Make sure the instances' ports aren't conflicting and all directories (NN, JT, etc.) are unique. That should do it. --   Take care, Konstantin (Cos) Boudnik On Mon, Feb 21, 2011 at 20:09, Gang Luo wrote: > Hello folks, > I am trying to run multiple hadoop instances on the same cluster. I find it
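A minimal sketch of what "unique ports and directories" could look like for a second instance, written here as Java Configuration overrides (in a real setup these would live in the second instance's conf/*-site.xml files; all port numbers and paths below are arbitrary examples, not values from the thread):

  import org.apache.hadoop.conf.Configuration;

  public class SecondInstanceConf {
    public static void main(String[] args) {
      Configuration conf = new Configuration();
      // Move the NameNode and JobTracker RPC ports off the defaults
      conf.set("fs.default.name", "hdfs://master:9010");
      conf.set("mapred.job.tracker", "master:9011");
      // Move the web UIs off 50070/50030
      conf.set("dfs.http.address", "0.0.0.0:50170");
      conf.set("mapred.job.tracker.http.address", "0.0.0.0:50130");
      // Give the second instance its own on-disk state (NN, DN, TT)
      conf.set("dfs.name.dir", "/data/hadoop2/dfs/name");
      conf.set("dfs.data.dir", "/data/hadoop2/dfs/data");
      conf.set("mapred.local.dir", "/data/hadoop2/mapred/local");
      System.out.println("second instance NN: " + conf.get("fs.default.name"));
    }
  }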

multiple hadoop instances on same cluster

2011-02-21 Thread Gang Luo
Hello folks, I am trying to run multiple hadoop instances on the same cluster. I find it hard to share. First I tried two instances, each of them running with the same master and slaves; only one of them could work. Then I tried to divide the cluster such that hadoop 1 uses machines 0-9 and hadoop 2 uses mac

RE: Spilled Records

2011-02-21 Thread Saurabh Dutta
Hi Maha, Spilled records have to do with the transient data written during the map and reduce operations. Note that it's not just the map operations that generate spilled records. When the in-memory buffer (controlled by mapred.job.shuffle.merge.percent) runs out or reaches the threshold number of
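For reference, the spill-related knobs mentioned in this thread can be set on a job like this; a rough sketch only, with illustrative values, assuming the 0.20-era property names (map-side spilling is governed by io.sort.*, while the two properties Maha tried act on the reduce-side shuffle merge):

  import org.apache.hadoop.mapred.JobConf;

  public class SpillTuning {
    public static void main(String[] args) {
      JobConf conf = new JobConf();
      // Map side: a bigger sort buffer and a later spill trigger mean fewer spills
      conf.setInt("io.sort.mb", 200);               // default 100
      conf.set("io.sort.spill.percent", "0.90");    // default 0.80
      conf.setInt("io.sort.factor", 100);           // streams merged at once, default 10
      // Reduce side: in-memory shuffle merge thresholds
      conf.set("mapred.job.shuffle.merge.percent", "0.90"); // default 0.66
      conf.setInt("mapred.inmem.merge.threshold", 0);       // 0 = use the percent trigger only
    }
  }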

Spilled Records

2011-02-21 Thread maha
Hello everyone, Do spilled records mean that the sort-buffer size is not enough to sort all the input records, hence some records are written to local disk? If so, I tried raising io.sort.mb from the default 100 to 200 and there was still the same # of spilled records. Why

Re: how many output files can be supported by MultipleOutputs?

2011-02-21 Thread Jun Young Kim
hi, I think the third error pattern is not caused by the xceiver key.
org.apache.hadoop.mapreduce.task.reduce.Shuffle$ShuffleError: error in shuffle in fetcher#5
        at org.apache.hadoop.mapreduce.task.reduce.Shuffle.run(Shuffle.java:124)
        at org.apache.hadoop.mapred.ReduceTask.run(R
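For context, a typical MultipleOutputs reducer in the new (mapreduce) API looks roughly like the sketch below; the class name and per-key path scheme are hypothetical. Each distinct base path becomes its own output file, which is why very high fan-out can exhaust open files or datanode xceivers:

  import java.io.IOException;
  import org.apache.hadoop.io.IntWritable;
  import org.apache.hadoop.io.Text;
  import org.apache.hadoop.mapreduce.Reducer;
  import org.apache.hadoop.mapreduce.lib.output.MultipleOutputs;

  public class FanOutReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
    private MultipleOutputs<Text, IntWritable> mos;

    @Override
    protected void setup(Context context) {
      mos = new MultipleOutputs<Text, IntWritable>(context);
    }

    @Override
    protected void reduce(Text key, Iterable<IntWritable> values, Context context)
        throws IOException, InterruptedException {
      for (IntWritable v : values) {
        // Route each record to a file named after the key's first character
        mos.write(key, v, key.toString().substring(0, 1) + "/part");
      }
    }

    @Override
    protected void cleanup(Context context) throws IOException, InterruptedException {
      mos.close();  // flush every open output file
    }
  }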

Re: benchmark choices

2011-02-21 Thread Shrinivas Joshi
I wonder what companies like Amazon, Cloudera, RackSpace, Facebook, Yahoo, etc. look at for benchmarking. I guess GridMix v3 might be of more interest to Yahoo. I would appreciate it if someone could comment more on this. Thanks, -Shrinivas On Fri, Feb 18, 2011 at 4:50 PM, Konstantin Bo

Re: ObjectWritable

2011-02-21 Thread Weishung Chung
Thank you for the explanation. Avro is a good serialization tool. I haven't looked at the code yet, but I will probably dig into it very soon. On Mon, Feb 21, 2011 at 10:20 AM, Harsh J wrote: > Hello, > > On Mon, Feb 21, 2011 at 9:33 PM, Weishung Chung > wrote: > > What is the main use of o

Re: Quick question

2011-02-21 Thread maha
How then can I produce an output file per mapper, not per map task? Thank you, Maha On Feb 20, 2011, at 10:22 PM, Ted Dunning wrote: > This is the most important thing that you have said. The map function > is called once per unit of input, but the mapper object persists for > many units of input
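One way to get a single file per mapper object, building on Ted's point that the object outlives individual map() calls: buffer in map() and write once in cleanup(). A hedged sketch (the output directory is arbitrary, and buffering in memory assumes the per-task input fits there):

  import java.io.IOException;
  import org.apache.hadoop.fs.FSDataOutputStream;
  import org.apache.hadoop.fs.FileSystem;
  import org.apache.hadoop.fs.Path;
  import org.apache.hadoop.io.LongWritable;
  import org.apache.hadoop.io.NullWritable;
  import org.apache.hadoop.io.Text;
  import org.apache.hadoop.mapreduce.Mapper;

  public class OneFilePerMapper
      extends Mapper<LongWritable, Text, NullWritable, NullWritable> {
    private final StringBuilder buffer = new StringBuilder();

    @Override
    protected void map(LongWritable key, Text value, Context context) {
      buffer.append(value).append('\n');  // accumulate across all map() calls
    }

    @Override
    protected void cleanup(Context context) throws IOException {
      // Runs once per mapper object, i.e. once per task attempt
      Path out = new Path("/tmp/per-mapper/" + context.getTaskAttemptID());
      FileSystem fs = FileSystem.get(context.getConfiguration());
      FSDataOutputStream stream = fs.create(out);
      stream.writeBytes(buffer.toString());
      stream.close();
    }
  }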

measure the time taken by stragglers

2011-02-21 Thread bikash sharma
Hi, Is there a way in which we can measure the execution time of straggler and non-straggler tasks separately in Hadoop MapReduce? -bikash
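One possible starting point, assuming the 0.20-era client API: pull per-task start/finish times from the JobTracker via JobClient and sort by duration, treating the slowest tail as the stragglers. A sketch, not a complete tool:

  import org.apache.hadoop.mapred.JobClient;
  import org.apache.hadoop.mapred.JobConf;
  import org.apache.hadoop.mapred.JobID;
  import org.apache.hadoop.mapred.TaskReport;

  public class TaskTimes {
    public static void main(String[] args) throws Exception {
      JobClient client = new JobClient(new JobConf());
      JobID id = JobID.forName(args[0]);  // e.g. "job_201102210001_0001"
      for (TaskReport r : client.getMapTaskReports(id)) {
        long millis = r.getFinishTime() - r.getStartTime();
        System.out.println(r.getTaskID() + "\t" + millis + " ms");
      }
    }
  }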

Re: ObjectWritable

2011-02-21 Thread Harsh J
Hello, On Mon, Feb 21, 2011 at 9:33 PM, Weishung Chung wrote: > What is the main use of org.apache.hadoop.io.ObjectWritable ? Thank you :) To use any primitive Java object as a Writable without requiring it to implement that interface. It will write out a class name for every type of object
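A small round-trip illustrating the class-name overhead Harsh describes (a self-contained sketch; the string value is arbitrary):

  import java.io.ByteArrayInputStream;
  import java.io.ByteArrayOutputStream;
  import java.io.DataInputStream;
  import java.io.DataOutputStream;
  import java.io.IOException;
  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.io.ObjectWritable;

  public class ObjectWritableDemo {
    public static void main(String[] args) throws IOException {
      Configuration conf = new Configuration();
      ByteArrayOutputStream bytes = new ByteArrayOutputStream();
      // Writes the declared class name ("java.lang.String") before the value
      ObjectWritable.writeObject(new DataOutputStream(bytes), "hello",
          String.class, conf);

      ObjectWritable ow = new ObjectWritable();
      ow.setConf(conf);  // readFields needs a conf to resolve the class
      ow.readFields(new DataInputStream(
          new ByteArrayInputStream(bytes.toByteArray())));
      System.out.println(ow.get());  // prints "hello"
    }
  }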

ObjectWritable

2011-02-21 Thread Weishung Chung
What is the main use of org.apache.hadoop.io.ObjectWritable ? Thank you :)

Re: Quick question

2011-02-21 Thread maha
Thanks for your answers, Ted and Jim :) Maha On Feb 21, 2011, at 6:41 AM, Jim Falgout wrote: > Your scenario matches the capability of NLineInputFormat exactly, so that > looks to be the best solution. If you wrote your own input format, it would > have to basically do what NLineInputFormat is already doing for you.

task scheduling based on slots in Hadoop

2011-02-21 Thread bikash sharma
Hi, Can anyone throw some more light on resource-based scheduling in Hadoop? Specifically, are resources like CPU and memory partitioned across slots? From the blog by Arun on the capacity scheduler, http://developer.yahoo.com/blogs/hadoop/posts/2011/02/capacity-scheduler/ I understand that memory is the
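For what it's worth, the memory side of this in the 0.20-era capacity scheduler works through per-job requests measured against a cluster-wide slot size; a hedged sketch with illustrative numbers (the cluster-side slot sizes, e.g. mapred.cluster.map.memory.mb, are set by the admin in mapred-site.xml):

  import org.apache.hadoop.mapred.JobConf;

  public class MemoryRequest {
    public static void main(String[] args) {
      JobConf conf = new JobConf();
      // With a cluster slot size of 1024 MB, this job's map tasks would each
      // occupy two map slots, and its reduce tasks two reduce slots
      conf.setInt("mapred.job.map.memory.mb", 2048);
      conf.setInt("mapred.job.reduce.memory.mb", 2048);
    }
  }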

RE: Quick question

2011-02-21 Thread Jim Falgout
Your scenario matches the capability of NLineInputFormat exactly, so that looks to be the best solution. If you wrote your own input format, it would have to basically do what NLineInputFormat is already doing for you. -Original Message- From: maha [mailto:m...@umail.ucsb.edu] Sent: S
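For completeness, wiring in NLineInputFormat with the old (mapred) API takes only a couple of lines; a sketch, with the lines-per-map value (10) purely illustrative:

  import org.apache.hadoop.mapred.JobConf;
  import org.apache.hadoop.mapred.lib.NLineInputFormat;

  public class NLineJobSetup {
    public static void main(String[] args) {
      JobConf conf = new JobConf();
      conf.setInputFormat(NLineInputFormat.class);
      // Each map task receives exactly N input lines
      conf.setInt("mapred.line.input.format.linespermap", 10);
    }
  }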