RE: balancing and replication in HDFS

2011-02-27 Thread Jeffrey Buell
25, 2011 8:09 PM
To: hdfs-user@hadoop.apache.org
Cc: Jeffrey Buell
Subject: Re: balancing and replication in HDFS
When you run terasort, pass -Dmapred.reduce.tasks=4 and see how that goes for you. See this old thread for info: http://mail-archives.apache.org/mod_mbox/hadoop-common-user/200906
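For anyone scripting the suggestion above, here is a minimal sketch of how the command line might be assembled. The jar name and the input/output directories are placeholders, not values from this thread; the only piece taken from the message is the -Dmapred.reduce.tasks=4 option. The sketch assumes the -D option is placed after the program name and before the positional arguments, which is where Hadoop's generic option parsing expects it.

```python
import shlex


def terasort_command(jar, input_dir, output_dir, reducers):
    """Build the argv for a terasort run with an explicit reducer count.

    jar, input_dir and output_dir are caller-supplied placeholders.
    """
    return [
        "hadoop", "jar", jar, "terasort",
        # Generic -D options go before the positional arguments.
        f"-Dmapred.reduce.tasks={reducers}",
        input_dir, output_dir,
    ]


# Hypothetical jar name and paths, for illustration only.
cmd = terasort_command("hadoop-examples.jar", "/terasort-input", "/terasort-output", 4)
print(shlex.join(cmd))
```

The list form can be handed to subprocess.run directly, which avoids shell quoting problems if a path contains spaces.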

Re: balancing and replication in HDFS

2011-02-25 Thread Todd Lipcon
hunks (1.7,...,3.2 GB) seem to be repeatable, but they land on different
> nodes each time teragen is run.
>
> Jeff
>
> > -Original Message-
> > From: Todd Lipcon [mailto:t...@cloudera.com]
> > Sent: Friday, February 25, 2011 3:13 PM
> > To: hdfs-user@hado

RE: balancing and replication in HDFS

2011-02-25 Thread Jeffrey Buell
n is run.
Jeff
> -Original Message-
> From: Todd Lipcon [mailto:t...@cloudera.com]
> Sent: Friday, February 25, 2011 3:13 PM
> To: hdfs-user@hadoop.apache.org
> Subject: Re: balancing and replication in HDFS
>
> Hi Jeff,
> The output of terasort has replication level 1 by de

Re: balancing and replication in HDFS

2011-02-25 Thread Todd Lipcon
Hi Jeff, The output of terasort has replication level 1 by default. This is so it goes faster with the default settings and makes for more impressive benchmark results :) The reason you see it all on one machine is probably that you're running with one reducer. Try configuring your terasort to use
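To illustrate the explanation above, here is a toy model (not HDFS code) of why a single reducer with replication level 1 leaves all output on one machine. It relies on the fact that HDFS writes the first replica of a block to the datanode local to the writer. Everything else is an assumption made for illustration: the node names, the 100 GB output size, the round-robin assignment of reducers to nodes, and the simplified ring placement of any extra replicas.

```python
def place_output(num_reducers, nodes, output_gb, replication=1):
    """Return GB stored per node under a simplified placement model.

    Assumptions (illustrative only): reducers are assigned to nodes
    round-robin and each writes an equal share of the output.
    """
    usage = {node: 0.0 for node in nodes}
    share = output_gb / num_reducers
    for r in range(num_reducers):
        # First replica lands on the node where the reducer runs.
        usage[nodes[r % len(nodes)]] += share
        # Extra replicas: toy rule, next nodes around the ring.
        for k in range(1, replication):
            usage[nodes[(r + k) % len(nodes)]] += share
    return usage


NODES = ["node1", "node2", "node3", "node4"]  # hypothetical names

one_reducer = place_output(1, NODES, 100.0)
four_reducers = place_output(4, NODES, 100.0)
print(one_reducer)
print(four_reducers)
```

In this model one reducer puts the whole 100 GB on a single node, while four reducers spread it evenly across the four nodes. A real cluster will not be this tidy, since the scheduler decides where reducers run and partitions are not perfectly equal.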

balancing and replication in HDFS

2011-02-25 Thread Jeffrey Buell
I'm a newbie to Hadoop and HDFS. I'm seeing odd behavior in HDFS that I hope somebody can clear up for me. I'm running Hadoop version 0.20.1+169.127 from the Cloudera distro on 4 identical nodes, each with 4 CPUs and 100 GB of disk space. Replication is set to 2. I run: hadoop jar /usr/lib/hado