Re: Starting up a larger cluster

2008-02-12 Thread Owen O'Malley
On Feb 12, 2008, at 7:08 AM, Marco Nicosia wrote:

> DFS should place one replica per rack:
> http://issues.apache.org/jira/browse/HADOOP-2559

No, that would hurt the aggregate write throughput. Read the comment on 2559:
http://issues.apache.org/jira/browse/HADOOP-2559?focusedCommentId=1256712
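A rough way to see the throughput argument, offered as an illustration and not taken from the JIRA comment: assume three replicas written through a pipeline (writer to first replica, first to second, second to third), and assume inter-rack links are the scarce resource. The rack names and the helper `cross_rack_hops` are hypothetical. The sketch only counts how many pipeline transfers cross a rack boundary under each placement.

```python
def cross_rack_hops(writer_rack, replica_racks):
    """Count pipeline transfers that cross a rack boundary.

    writer_rack   -- rack of the client writing the block (hypothetical label)
    replica_racks -- racks of the replicas, in pipeline order
    """
    hops = 0
    current = writer_rack
    for rack in replica_racks:
        if rack != current:
            hops += 1  # this transfer leaves the current rack
        current = rack
    return hops


# One replica per rack: every replica after the first is on a new rack.
one_per_rack = cross_rack_hops("A", ["A", "B", "C"])

# Replicas confined to two racks: only one transfer leaves the writer's rack.
two_racks = cross_rack_hops("A", ["A", "B", "B"])

print(one_per_rack, two_racks)
```

Under these assumptions the one-per-rack layout sends each block across rack boundaries twice, the two-rack layout once, which is one plausible reading of why spreading replicas over more racks would reduce aggregate write throughput.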

Re: Starting up a larger cluster

2008-02-12 Thread Marco Nicosia
DFS should place one replica per rack:
http://issues.apache.org/jira/browse/HADOOP-2559

On 2/9/08 22:53, "Owen O'Malley" <[EMAIL PROTECTED]> wrote:
>
> On Feb 8, 2008, at 9:32 AM, Jeff Eastman wrote:
>
>> I noticed that phenomenon right off the bat. Is that a designed "feature"
>> or just an

Re: Starting up a larger cluster

2008-02-09 Thread Owen O'Malley
On Feb 8, 2008, at 9:32 AM, Jeff Eastman wrote:

> I noticed that phenomenon right off the bat. Is that a designed "feature"
> or just an unhappy consequence of how blocks are allocated?

It was driven by a desire to maximize HDFS write throughput, which has unfortunate effects in the case of a

Re: Starting up a larger cluster

2008-02-08 Thread Allen Wittenauer
On 2/8/08 9:32 AM, "Jeff Eastman" <[EMAIL PROTECTED]> wrote:
> I noticed that phenomenon right off the bat. Is that a designed "feature"
> or just an unhappy consequence of how blocks are allocated?

My understanding is that this is by design--when you are running a MR job, you want the output,

RE: Starting up a larger cluster

2008-02-08 Thread Jeff Eastman
--
From: Allen Wittenauer [mailto:[EMAIL PROTECTED]
Sent: Friday, February 08, 2008 9:15 AM
To: core-user@hadoop.apache.org
Subject: Re: Starting up a larger cluster

On 2/7/08 11:01 PM, "Tim Wintle" <[EMAIL PROTECTED]> wrote:
> it's useful to be able to connect from no

Re: Starting up a larger cluster

2008-02-08 Thread Allen Wittenauer
On 2/7/08 11:01 PM, "Tim Wintle" <[EMAIL PROTECTED]> wrote:
> it's useful to be able to connect from nodes that aren't in the slaves file
> so that you can put in input data direct from another machine that's not
> part of the cluster,

I'd actually recommend this as a best practice. We've

RE: Starting up a larger cluster

2008-02-07 Thread Tim Wintle
You can set which nodes are allowed to connect in hadoop-site.xml - it's useful to be able to connect from nodes that aren't in the slaves file so that you can put in input data direct from another machine that's not part of the cluster, or add extra machines on the fly (just make sure they're rout

RE: Starting up a larger cluster

2008-02-07 Thread Jeff Eastman
Oops, should be TaskTracker.

-----Original Message-----
From: Jeff Eastman [mailto:[EMAIL PROTECTED]
Sent: Thursday, February 07, 2008 12:24 PM
To: core-user@hadoop.apache.org
Subject: RE: Starting up a larger cluster

Hi Ben,

I've been down this same path recently and I think I understand

RE: Starting up a larger cluster

2008-02-07 Thread Jeff Eastman
Hi Ben,

I've been down this same path recently and I think I understand your issues:

1) Yes, you need the hadoop folder to be in the same location on each node. Only the master node actually uses the slaves file, to start up DataNode and JobTracker daemons on those nodes.
2) If you did not specif