What will we encounter if we add a lot of nodes into the current cluster?

2009-08-12 Thread yang song
Dear all, I'm sorry to disturb you. Our cluster has 200 nodes now. To increase its capacity, we hope to add 60 nodes to the current cluster. However, none of us knows what will happen if we add so many nodes at the same time. Could you give me some tips and notes? During the

Re: What will we encounter if we add a lot of nodes into the current cluster?

2009-08-12 Thread Ted Dunning
If you add these nodes, data will be put on them as you add data to the cluster. Soon after adding the nodes you should rebalance the storage to avoid age-related surprises in how files are arranged in your cluster. Other than that, your addition should cause little in the way of surprises. On

Re: What will we encounter if we add a lot of nodes into the current cluster?

2009-08-12 Thread Aaron Kimball
Also, if you haven't yet configured rack awareness, now's a good time to start :) - Aaron On Tue, Aug 11, 2009 at 11:27 PM, Ted Dunning ted.dunn...@gmail.com wrote: If you add these nodes, data will be put on them as you add data to the cluster. Soon after adding the nodes you should

Re: What will we encounter if we add a lot of nodes into the current cluster?

2009-08-12 Thread yang song
Thank you for teaching me that. I'm trying to use the balancer tool (bin/hadoop balancer -t xxx). However, the data transfer is so slow that it will take a very long time. Is there a good way to speed it up? What's more, I have a question. The situation is that we rarely use the existing data in the

Re: What will we encounter if we add a lot of nodes into the current cluster?

2009-08-12 Thread Harish Mallipeddi
On Thu, Aug 13, 2009 at 8:06 AM, yang song hadoop.ini...@gmail.com wrote: Thank you for teaching me that. I'm trying to use the balancer tool (bin/hadoop balancer -t xxx). However, the data transfer is so slow that it will take a very long time. Is there a good way to speed it up? What's

Re: What will we encounter if we add a lot of nodes into the current cluster?

2009-08-12 Thread Ted Dunning
There is a parameter (dfs.balance.bandwidthPerSec) that limits the rebalancing bandwidth. The default is rather low. See http://developer.yahoo.com/hadoop/tutorial/module2.html#rebalancing On Wed, Aug 12, 2009 at 7:36 PM, yang song hadoop.ini...@gmail.com wrote: I'm trying to use the balance
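
To see why a low dfs.balance.bandwidthPerSec makes rebalancing feel slow, here is a rough back-of-the-envelope sketch. It assumes the limit is in bytes per second and applies to each datanode separately, which is how the property is documented; the default in releases of that era is, as far as I recall, 1048576 (1 MB/s), so check your own hdfs-default.xml. The data volume and the function name are hypothetical, not figures from this thread, and only the node count of 60 comes from the original question.

```python
# Rough lower bound on rebalancing time under a per-datanode bandwidth cap.
# The 20 TiB figure is hypothetical, not taken from the cluster in this thread.

def estimate_rebalance_hours(bytes_to_move, nodes_receiving, bandwidth_per_sec):
    """Return a lower bound in hours, assuming every receiving datanode
    is saturated at the cap for the whole run (real runs are slower)."""
    if nodes_receiving <= 0 or bandwidth_per_sec <= 0:
        raise ValueError("nodes_receiving and bandwidth_per_sec must be positive")
    aggregate_bytes_per_sec = nodes_receiving * bandwidth_per_sec
    return bytes_to_move / aggregate_bytes_per_sec / 3600.0

TIB = 1024 ** 4
MIB = 1024 ** 2

# Hypothetical scenario: 60 new nodes that should end up holding 20 TiB.
slow = estimate_rebalance_hours(20 * TIB, 60, 1 * MIB)   # cap of 1 MiB/s per node
fast = estimate_rebalance_hours(20 * TIB, 60, 10 * MIB)  # cap of 10 MiB/s per node

print("1 MiB/s cap:  %.1f hours" % slow)
print("10 MiB/s cap: %.1f hours" % fast)
```

Under these assumptions the estimate scales inversely with the cap, so raising the limit tenfold cuts the bound tenfold. The trade-off is that balancing traffic then competes more with regular jobs for network and disk.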