Re: Running a single cluster in multiple datacenters

2013-07-16 Thread Azuryy Yu
Hi Bertrand, I guess you configured two racks in total: one IDC is one rack, and the other IDC is another rack. So if you don't want re-replication to kick in while one IDC is down, you have to change the replica placement policy: if the minimum number of blocks is present on one rack, then don't do anything. (here

Running a single cluster in multiple datacenters

2013-07-15 Thread Niels Basjes
Hi, Last week we had a discussion at work regarding setting up our new Hadoop cluster(s). One of the things that has changed is that the importance of the Hadoop stack is growing so we want to be more available. One of the points we talked about was setting up the cluster in such a way that the

Re: Running a single cluster in multiple datacenters

2013-07-15 Thread jb
Hi Niels, it depends on the number of replicas and the Hadoop rack configuration (level). It's possible to have replicas in both datacenters. What rack configuration do you plan? You can implement your own and define it using the topology.node.switch.mapping.impl property.
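
To illustrate the rack mapping idea, here is a minimal sketch of the script-based alternative to writing a Java class. Hadoop's default mapping implementation can call an external script, configured with topology.script.file.name on Hadoop 1.x (net.topology.script.file.name on 2.x), passing hostnames or IPs as arguments and reading one rack path per argument from stdout. The subnets, the /dcN/rackN naming and the fallback path below are assumptions made up for this example, not values from the thread.

```python
#!/usr/bin/env python3
"""Hypothetical rack-topology script: maps node IPs to /datacenter/rack paths."""
import ipaddress
import sys

# Assumed layout: one subnet per datacenter (illustrative values only).
SUBNET_TO_RACK = {
    "10.1.0.0/16": "/dc1/rack1",
    "10.2.0.0/16": "/dc2/rack1",
}

# Fallback with the same depth as the paths above, so every node
# sits at the same level of the topology tree.
DEFAULT_RACK = "/default-dc/default-rack"


def resolve(host):
    """Return the rack path for one IP address, or the default if unknown."""
    try:
        addr = ipaddress.ip_address(host)
    except ValueError:
        # Not a literal IP (for example a hostname): fall back to the default.
        return DEFAULT_RACK
    for subnet, rack in SUBNET_TO_RACK.items():
        if addr in ipaddress.ip_network(subnet):
            return rack
    return DEFAULT_RACK


if __name__ == "__main__":
    # Print one rack path per argument, in the order the arguments were given.
    for arg in sys.argv[1:]:
        print(resolve(arg))
```

Whether a two-level path such as /dc1/rack1 actually changes where replicas land depends on the block placement policy in use, so treat this only as a starting point for the rack configuration question above.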

Re: Running a single cluster in multiple datacenters

2013-07-15 Thread Bertrand Dechoux
According to your own analysis, you wouldn't be more available, even though that was your aim. Did you consider having two separate clusters? One per datacenter, with an automatic copy of the data? I understand that load balancing of work and data would not be easy, but it seems to me a simple strategy