25, 2011 8:09 PM
To: hdfs-user@hadoop.apache.org
Cc: Jeffrey Buell
Subject: Re: balancing and replication in HDFS
When you run terasort, pass -Dmapred.reduce.tasks=4 and see how that goes for
you. See this old thread for info:
http://mail-archives.apache.org/mod_mbox/hadoop-common-user/200906
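
For anyone following along, here is a rough sketch of how that command line could be assembled. The jar path and the input/output directories are placeholders I made up (the thread never gives them in full), and terasort_command is a hypothetical helper, not part of Hadoop. The only piece taken from the message above is the -Dmapred.reduce.tasks=4 option, which goes right after the program name and before the input and output arguments.

```python
import shlex


def terasort_command(examples_jar, input_dir, output_dir, reducers=4):
    """Build the argv for a terasort run with an explicit reducer count."""
    return [
        "hadoop", "jar", examples_jar, "terasort",
        # Generic -D options come after the program name, before its arguments.
        "-Dmapred.reduce.tasks=%d" % reducers,
        input_dir,
        output_dir,
    ]


# Placeholder paths, not taken from the thread; substitute your own.
cmd = terasort_command("/path/to/hadoop-examples.jar", "/tmp/tera-in", "/tmp/tera-out")

# Print the command for inspection instead of running it.
print(" ".join(shlex.quote(arg) for arg in cmd))
```

Printing the command first makes it easy to check the option placement before handing the list to something like subprocess.run.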
> [...] chunks (1.7,...,3.2 GB) seem to be repeatable, but they land on different
> nodes each time teragen is run.
>
> Jeff
>
> > -Original Message-
> > From: Todd Lipcon [mailto:t...@cloudera.com]
> > Sent: Friday, February 25, 2011 3:13 PM
> > To: hdfs-user@hado
Hi Jeff,
The output of terasort has replication level 1 by default. This is so
it goes faster with the default settings and makes for more impressive
benchmark results :)
The reason you see it all on one machine is probably that you're
running with one reducer. Try configuring your terasort to use
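
To make the one-reducer point concrete, here is a toy model of where the output would land. It is only a sketch under stated assumptions: the first replica of each reducer's output goes to the node the reducer runs on (the usual HDFS behavior when the writer is itself a datanode), reducers are spread round-robin over the nodes, and any extra replicas simply go to the next nodes in the list. Real HDFS placement and real task scheduling are more involved, and the node names and the 8 GB total are invented for illustration.

```python
def simulate_output_placement(nodes, reducers, total_gb, replication=1):
    """Toy model of terasort output placement; returns GB stored per node."""
    usage = {node: 0.0 for node in nodes}
    per_reducer_gb = total_gb / reducers
    for r in range(reducers):
        # Assumption: reducer r runs on node r modulo the cluster size.
        local = r % len(nodes)
        # First replica lands on the local node; extra replicas (if any)
        # go to the following nodes, which is a simplification.
        for k in range(min(replication, len(nodes))):
            usage[nodes[(local + k) % len(nodes)]] += per_reducer_gb
    return usage


# Hypothetical 4-node cluster and output size.
NODES = ["node1", "node2", "node3", "node4"]

print(simulate_output_placement(NODES, reducers=1, total_gb=8.0))
print(simulate_output_placement(NODES, reducers=4, total_gb=8.0))
```

In this model a single reducer with replication 1 leaves the entire output on one node, while four reducers spread it evenly across the four nodes.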
I'm a newbie to Hadoop and HDFS. I'm seeing odd behavior in HDFS that I hope
somebody can clear up for me. I'm running Hadoop version 0.20.1+169.127 from
the Cloudera distro on 4 identical nodes, each with 4 CPUs and 100 GB of disk
space. Replication is set to 2.
I run:
hadoop jar /usr/lib/hado