Hi guys,
I have a cluster of 16 machines running Hadoop, and I want to run some benchmarks
on this cluster with "nnbench" and "mrbench".
I'm new to Hadoop and have no one to ask, so I don't know what results I
should expect.
For mrbench, I am getting an average time of 22 se
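For anyone else landing on this thread: both benchmarks live in the test jar that ships with the release, and are usually invoked along the lines of the sketch below. The jar path and the flag values are assumptions for a stock 0.20.2 install, not numbers to compare against.

  # minimal sketch, assuming the stock hadoop-0.20.2-test.jar; the flag
  # values are only illustrative, tune them to the size of the cluster
  hadoop jar $HADOOP_HOME/hadoop-0.20.2-test.jar nnbench \
    -operation create_write -maps 12 -reduces 6 \
    -numberOfFiles 1000 -replicationFactorPerFile 3 \
    -baseDir /benchmarks/NNBench

  hadoop jar $HADOOP_HOME/hadoop-0.20.2-test.jar mrbench -numRuns 50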
Hi Folks,
I'm having trouble getting a custom classpath through to the datanodes
in my cluster.
I'm using libhdfs and pipes, and the hdfsConnect call in libhdfs
requires that the classpath is set. My code executes fine on a
standalone machine, but when I take it to the cluster, I can see that the
c
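In case it helps, libhdfs picks the filesystem classes up from the CLASSPATH environment variable, so the Hadoop jars and the conf directory have to be on it on every node that runs the code. A sketch of building it for a 0.20-style layout (the paths are assumptions, adjust HADOOP_HOME for your install):

  # build a CLASSPATH that libhdfs can use; paths are assumptions for a
  # default /usr/local/hadoop install, adjust to your layout
  export HADOOP_HOME=/usr/local/hadoop
  CLASSPATH=$HADOOP_HOME/conf
  for f in $HADOOP_HOME/hadoop-*-core.jar $HADOOP_HOME/lib/*.jar; do
    CLASSPATH=$CLASSPATH:$f
  done
  export CLASSPATH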
Bryan,
On Fri, May 6, 2011 at 10:50 PM, Bryan Keller wrote:
> I wanted to be able to use the same local directory that the reducer is using
> so, if there are multiple reducers running, I can take advantage of all of
> the drives I have configured in mapred.local.dir.
If I was unclear before,
Is the master the central controller and the slaves the individual cluster
nodes? What should I do if the master crashes?
Thanks
Thanks for the info, Matt.
My use case is this. I have a fairly large amount of data being passed to my
reducer. I need to load all of the data into a matrix and run a linear program.
The most efficient way to do this from the LP side is to write all of the data
to a file, and pass the file to t
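If the reduce side ends up being a streaming script rather than Java, the spool-to-a-local-file step could be as small as the sketch below; "run_lp" is a made-up placeholder for whatever solver gets called, not a real tool.

  #!/bin/bash
  # hypothetical streaming reducer: spool everything on stdin into a file
  # in the task's working directory, then hand that file to the solver
  tmpfile=./reducer-input.$$
  cat > "$tmpfile"
  run_lp "$tmpfile"     # the solver's stdout becomes the reduce output
  rm -f "$tmpfile"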
Ah, so you're suggesting there should be some hysteresis in the system,
delaying the response to large-scale events for a while?
In particular, are you suggesting that for anticipated events, like
"I'm taking this rack offline for 30 minutes,
but it will be back with data intact, AN
Is there a command that will allow you to retrieve a setting's value for
a specific job?
I'm looking specifically for the JobName (mapred.job.name), but a more
general purpose 'get setting' command would be very nice.
I went through the 'hadoop job' options and could get the job id, and
lo
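One workaround, assuming a 0.20-style JobTracker whose web UI (port 50030) serves the job configuration through jobconf.jsp, is to scrape the value from there; the host, port, and job id below are placeholders:

  # sketch: pull a job's configuration page from the JobTracker web UI and
  # grep out one property; host, port, and job id are placeholders
  jobid=job_201105060000_0001
  curl -s "http://jobtracker:50030/jobconf.jsp?jobid=$jobid" \
    | grep -A1 'mapred.job.name'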
One option is changing the code in streaming.jar so that it does not wait for
job completion, but then you are on your own to check the status, failures,
etc. of these asynchronous jobs.
-Bharath
From: Dieter Plaetinck
To: common-user@hadoop.apache.org
Cc: bharathw...@ya
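For what it's worth, the status-checking part can be scripted against the regular job client; the grep for the job id below is an assumption about what the streaming client prints ("Running job: job_..."):

  # sketch: once the streaming client returns without waiting, fish the job
  # id out of its log and poll the job client yourself; the grep pattern is
  # an assumption about the client's log output
  jobid=$(grep -o 'job_[0-9]*_[0-9]*' stream-$i.log | head -1)
  hadoop job -list              # all jobs currently running
  hadoop job -status "$jobid"   # completion and success of this one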
Here is a configurable custom partitioner template along with a discussion
of when the configurable interface methods are called:
http://cornercases.wordpress.com/2011/05/06/an-example-configurable-partitioner/
.
On Thu, May 5, 2011 at 9:03 AM, W.P. McNeill wrote:
> The other thing you want to d
that will cause 200 regenerate-files processes to run on the same
files at the same time. Not good.
Dieter
On Fri, 6 May 2011 07:49:45 -0700 (PDT)
Bharath Mundlapudi wrote:
> how about this?
>
> for i in $(seq 1 200); do
>   exec_stream_job.sh $dir $i &
> done
>
> exec_stream_job.sh:
>
> --
how about this?

for i in $(seq 1 200); do
  exec_stream_job.sh $dir $i &
done

exec_stream_job.sh:

dir=$1
i=$2
regenerate-files $dir $i
hadoop jar /usr/local/hadoop/contrib/streaming/hadoop-0.20.2-streaming.jar \
  -D mapred.job.name="$i" \
  -file $dir \
  -mapper "..." -
Hi,
I have a script something like this (simplified):
for i in $(seq 1 200); do
  regenerate-files $dir $i
  hadoop jar /usr/local/hadoop/contrib/streaming/hadoop-0.20.2-streaming.jar \
    -D mapred.job.name="$i" \
    -file $dir \
    -mapper "..." -reducer "..." -input $i-input -o
On 05/05/11 19:14, Matthew Foley wrote:
"a node (or rack) is going down, don't replicate" == DataNode Decommissioning.
This feature is available. The current usage is to add the hosts to be decommissioned to the
exclusion file named in dfs.hosts.exclude, then use DFSAdmin to invoke "-refreshNodes".
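The sequence on a 0.20-style config usually looks roughly like the sketch below; the hostname and file location are placeholders, only dfs.hosts.exclude and the dfsadmin commands come from the feature itself:

  # sketch: decommission one datanode; file path and hostname are placeholders
  echo "datanode07.example.com" >> /usr/local/hadoop/conf/excludes
  # dfs.hosts.exclude in hdfs-site.xml must already point at that file
  hadoop dfsadmin -refreshNodes
  hadoop dfsadmin -report   # the node should show up as decommissioning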