NNbench and MRBench

2011-05-06 Thread stanley.shi
Hi guys, I have a cluster of 16 machines running Hadoop. Now I want to run some benchmarks on this cluster with "nnbench" and "mrbench". I'm new to Hadoop and have no one to ask, so I don't know what results I should expect. For mrbench I currently get an average time of 22 seconds…
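
For anyone else starting out: both benchmarks ship in the Hadoop test jar. A minimal invocation sketch for a 0.20-era install (the jar path and option values here are illustrative, not prescriptive):

    # mrbench: run 50 small MapReduce jobs and report the average runtime
    hadoop jar $HADOOP_HOME/hadoop-*-test.jar mrbench -numRuns 50

    # nnbench: stress the NameNode with many small-file create/write operations
    hadoop jar $HADOOP_HOME/hadoop-*-test.jar nnbench \
      -operation create_write \
      -maps 12 -reduces 6 \
      -numberOfFiles 1000 -bytesToWrite 0 \
      -replicationFactorPerFile 3 \
      -baseDir /benchmarks/NNBench

There is no canonical "right" number for either tool; they are mainly useful for comparing the same cluster before and after a change, so a 22-second mrbench average is best read against your own baseline rather than an absolute target.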

passing classpath through to datanodes?

2011-05-06 Thread Tom Melendez
Hi folks, I'm having trouble getting a custom classpath through to the datanodes in my cluster. I'm using libhdfs and pipes, and the hdfsConnect call in libhdfs requires that the classpath be set. My code executes fine on a standalone machine, but when I take it to the cluster, I can see that the…
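
For reference, hdfsConnect starts an embedded JVM that takes its classpath from the CLASSPATH environment variable, so the variable must be set in the environment of every process that calls libhdfs. A hedged sketch of assembling it on a 0.20-era layout (paths and the client binary name are illustrative):

    # Assemble the Hadoop jars, lib/ dependencies, and conf dir into CLASSPATH
    export HADOOP_HOME=/usr/local/hadoop
    CLASSPATH=$HADOOP_HOME/conf
    for f in $HADOOP_HOME/hadoop-*.jar $HADOOP_HOME/lib/*.jar; do
      CLASSPATH=$CLASSPATH:$f
    done
    export CLASSPATH
    ./my_pipes_client   # hypothetical binary linked against libhdfs

When the code runs as a task on the cluster, the same variable has to reach the task's child process; some releases let you pass environment settings to task JVMs via mapred.child.env (format VAR1=val1,VAR2=val2), but verify against your version before relying on it.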

Re: How do I create per-reducer temporary files?

2011-05-06 Thread Harsh J
Bryan, On Fri, May 6, 2011 at 10:50 PM, Bryan Keller wrote: > I wanted to be able to use the same local directory that the reducer is using so, if there are multiple reducers running, I can take advantage of all of the drives I have configured in mapred.local.dir. If I was unclear before…

How masters and slaves work in a Hadoop cluster

2011-05-06 Thread hadoopfan
Is the master the central controller, and are the slaves the worker nodes? What happens if the master crashes? Thanks
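
Short version: in this release the "master" daemons are the NameNode (HDFS) and the JobTracker (MapReduce), and each slave runs a DataNode and a TaskTracker. Somewhat confusingly, the conf/masters file lists the host where start-dfs.sh launches the SecondaryNameNode, while conf/slaves lists the worker nodes. A sketch of the two files (hostnames are illustrative):

    # conf/masters: where start-dfs.sh starts the SecondaryNameNode
    snn.example.com

    # conf/slaves: where the start scripts launch DataNode + TaskTracker
    slave01.example.com
    slave02.example.com

The NameNode and JobTracker themselves start on whichever machine runs start-dfs.sh / start-mapred.sh. If the NameNode host crashes, HDFS is unavailable until it is restored (for example from the SecondaryNameNode's checkpoint); there is no automatic failover in this release.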

Re: How do I create per-reducer temporary files?

2011-05-06 Thread Bryan Keller
Thanks for the info, Matt. My use case is this: I have a fairly large amount of data being passed to my reducer. I need to load all of the data into a matrix and run a linear program. The most efficient way to do this from the LP side is to write all of the data to a file and pass the file to the…

Re: Cluster hard drive ratios

2011-05-06 Thread Matthew Foley
Ah, so you're suggesting there should be some hysteresis in the system, delaying the response to large-scale events for a while? In particular, are you suggesting that for anticipated events, like "I'm taking this rack offline for 30 minutes, but it will be back with data intact"…

Getting Job Name from CLI

2011-05-06 Thread Quinn Gil
Is there a command that will allow you to retrieve a setting's value for a specific job? I'm looking specifically for the job name (mapred.job.name), but a more general-purpose 'get setting' command would be very nice. I went through the 'hadoop job' options and could get the job id, and…
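
As far as I know there is no built-in "get one setting" command in this release; one workaround is to scrape the job's configuration page from the JobTracker web UI. This sketch assumes the stock web UI on port 50030 and its jobconf.jsp page; hostname and job id are illustrative:

    # Dump the job's configuration as rendered by the JobTracker UI,
    # then crudely pick out mapred.job.name from the HTML table
    curl -s "http://jobtracker.example.com:50030/jobconf.jsp?jobid=job_201105060000_0001" \
      | grep -A1 'mapred.job.name'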

Re: can a `hadoop jar streaming.jar` command return when a job is packaged and submitted?

2011-05-06 Thread Bharath Mundlapudi
One option is changing the code in streaming.jar to not wait for job completion, but then you are on your own to check the job status, failures, etc. of these asynchronous jobs. -Bharath
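
Whichever way the jobs end up submitted asynchronously, something still has to watch them. A crude monitor using only the stock CLI, as a hedged sketch (the grep assumes no unrelated jobs of yours are on the cluster):

    # Block until the cluster reports no running jobs left
    while hadoop job -list | grep -q 'job_'; do
      sleep 30
    done
    echo "all submitted jobs have left the running state"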

Re: How do I configure a Partitioner in the new API?

2011-05-06 Thread W.P. McNeill
Here is a configurable custom partitioner template, along with a discussion of when the configurable interface methods are called: http://cornercases.wordpress.com/2011/05/06/an-example-configurable-partitioner/. On Thu, May 5, 2011 at 9:03 AM, W.P. McNeill wrote: > The other thing you want to…

Re: can a `hadoop jar streaming.jar` command return when a job is packaged and submitted?

2011-05-06 Thread Dieter Plaetinck
That will cause 200 regenerate-files processes running on the same files at the same time. Not good. Dieter On Fri, 6 May 2011 07:49:45 -0700 (PDT) Bharath Mundlapudi wrote: > how about this? for i in $(seq 1 200); do exec_stream_job.sh $dir $i & … exec_stream_job.sh…
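
One way to avoid that while still overlapping the jobs is to keep regenerate-files in the loop's foreground and background only the submission. A hedged sketch (the output path and the cap of 10 are illustrative):

    for i in $(seq 1 200); do
      regenerate-files $dir $i    # serial: only one instance touches the files
      hadoop jar /usr/local/hadoop/contrib/streaming/hadoop-0.20.2-streaming.jar \
        -D mapred.job.name="$i" \
        -file $dir \
        -mapper "..." -reducer "..." \
        -input $i-input -output $i-output &
      # cap the number of in-flight streaming clients at 10
      while [ "$(jobs -rp | wc -l)" -ge 10 ]; do sleep 10; done
    done
    wait   # let the last backgrounded clients finish

Caveat: the backgrounded client uploads $dir via -file after the loop has moved on, so if regenerate-files rewrites $dir in place, the next iteration can race with that upload; snapshotting $dir per job (not shown) avoids this.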

Re: can a `hadoop jar streaming.jar` command return when a job is packaged and submitted?

2011-05-06 Thread Bharath Mundlapudi
how about this?

    for i in $(seq 1 200); do
      exec_stream_job.sh $dir $i &
    done

    # exec_stream_job.sh:
    regenerate-files $dir $i
    hadoop jar /usr/local/hadoop/contrib/streaming/hadoop-0.20.2-streaming.jar \
      -D mapred.job.name="$i" \
      -file $dir \
      -mapper "..." -reducer "..." …

can a `hadoop jar streaming.jar` command return when a job is packaged and submitted?

2011-05-06 Thread Dieter Plaetinck
Hi, I have a script something like this (simplified):

    for i in $(seq 1 200); do
      regenerate-files $dir $i
      hadoop jar /usr/local/hadoop/contrib/streaming/hadoop-0.20.2-streaming.jar \
        -D mapred.job.name="$i" \
        -file $dir \
        -mapper "..." -reducer "..." \
        -input $i-input -output…

Re: Cluster hard drive ratios

2011-05-06 Thread Steve Loughran
On 05/05/11 19:14, Matthew Foley wrote: "a node (or rack) is going down, don't replicate" == DataNode Decommissioning. This feature is available. The current usage is to add the hosts to be decommissioned to the exclusion file named in dfs.hosts.exclude, then use DFSAdmin to invoke "-refreshNo