Ishan, I have to admit that I am a bit surprised about the need to have data centers in 10 different locations. Well, I guess I shouldn't be, as every company is global now (anyone from Mars yet?).
In your case, since there is only one column family, the headache is not as bad. Let's call your clusters C1, C2, ... C10. The safest way for your most critical data is still to set up M-M replication 1-to-(N-1): every cluster adds all the other clusters as its peers. For example, C1 will have C2, C3, ... C10 as its peers; C2 will have C1, C3, ... C10; and so on. That will be a lot of data over the network. Although it is the best/fastest way to get all the clusters synced up, I don't like the idea at all (too expensive, for one).

Now, let's improve it a bit: C1 sets up M-M replication to only 2 of the remaining 9, with the distribution carefully planned so that all the clusters get an equal load. A system administrator has to do this manually. Now, think about the headaches:

1) What if your company (that is, your manager, who has no idea how difficult it is) decides to have one more column family replicated? How about two more? The replicated load multiplies with every column family you add.

2) How about when your company opens an office in an 11th location? The whole topology has to be re-planned and more peers added by hand; in a full mesh the number of peer relationships grows quadratically with the number of clusters.

3) Let's say you are the best administrator and keep nice records of everything (unfortunately, HBase alone doesn't have a good way to maintain a record of who replicates to whom). What happens when that admin leaves the company? Or, if this is a global company with 10 admins in different locations, how do they communicate the replication setup? :-)

Well, 3) is not too bad. I just like to point it out, as it can be quite true for a company large enough to have 10 locations.

Demai

On Fri, Nov 8, 2013 at 2:42 PM, Ishan Chhabra <[email protected]> wrote:

> Ted:
> Yes. It is the same table that is being written to from all locations. A
> single row could be updated from multiple locations, but our schema is
> designed in a manner that writes will be independent and not clobber each
> other.
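[Editor's note: the "each cluster peers with only 2 of the others, with balanced load" plan Demai describes can be sketched as a ring assignment. This is a minimal illustration that only computes the peer plan; the cluster names are hypothetical and no HBase API is involved.]

```python
# Sketch of the "each cluster peers with 2 of the others" idea,
# balancing load by arranging the clusters in a ring.
# Cluster names (C1..C10) are hypothetical; this only computes the
# peer plan, it does not talk to HBase.

def ring_peers(clusters):
    """Each cluster replicates to its two ring neighbors (master-master)."""
    n = len(clusters)
    return {c: [clusters[(i - 1) % n], clusters[(i + 1) % n]]
            for i, c in enumerate(clusters)}

clusters = [f"C{i}" for i in range(1, 11)]  # C1..C10
plan = ring_peers(clusters)
print(plan["C1"])  # C1 peers with C10 and C2

# Every cluster ships to exactly 2 peers and receives from exactly 2,
# so the replication load is evenly distributed.
inbound = {c: 0 for c in clusters}
for peers in plan.values():
    for p in peers:
        inbound[p] += 1
assert all(v == 2 for v in inbound.values())
```

Adding an 11th cluster means recomputing the ring and reconfiguring peers by hand, which is exactly the operational headache described above.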
>
>
> On Fri, Nov 8, 2013 at 2:33 PM, Ted Yu <[email protected]> wrote:
>
> > Ishan:
> > In your use case, the same table is written to in 10 clusters at roughly
> > the same time?
> >
> > Please clarify.
> >
> >
> > On Fri, Nov 8, 2013 at 2:29 PM, Ishan Chhabra <[email protected]> wrote:
> >
> > > @Demai,
> > > We actually have 10 clusters in different locations.
> > > The replication scope is not an issue for me since I have only one
> > > column family and we want it replicated to each location.
> > > Can you elaborate more on why a replication setup of more than 3-4
> > > clusters would be a headache in your opinion?
> > >
> > >
> > > On Fri, Nov 8, 2013 at 2:16 PM, Ishan Chhabra <[email protected]> wrote:
> > >
> > > > @Demai,
> > > > Writes from B should also go to A and C. So, if I were to continue on
> > > > your suggestion, I would set up A-B master-master and B-C
> > > > master-master, which is what I was proposing in the 2nd approach
> > > > (MST based).
> > > >
> > > > @Vladimir
> > > > That is classified. :P
> > > >
> > > >
> > > > On Fri, Nov 8, 2013 at 1:20 PM, Vladimir Rodionov <[email protected]> wrote:
> > > >
> > > >> *I want to setup NxN replication i.e. N clusters each replicating to
> > > >> each other. N is expected to be around 10.*
> > > >>
> > > >> Preparing for thermonuclear war?
> > > >>
> > > >>
> > > >> On Fri, Nov 8, 2013 at 1:14 PM, Ishan Chhabra <[email protected]> wrote:
> > > >>
> > > >> > I want to set up NxN replication, i.e. N clusters each replicating
> > > >> > to each other. N is expected to be around 10.
> > > >> >
> > > >> > On doing some research, I realize it is possible after the
> > > >> > HBASE-7709 fix, but it would lead to much more data flowing in the
> > > >> > system, e.g.:
> > > >> >
> > > >> > Let's say we have 3 clusters: A, B and C.
> > > >> > A new write to A will go to B and then C, and also go to C
> > > >> > directly via the direct path. This leads to unnecessary network
> > > >> > usage and writes to the WAL of B, which should be avoided. Now
> > > >> > imagine this with 10 clusters; it won't scale.
> > > >> >
> > > >> > One option is to create a minimum spanning tree joining all the
> > > >> > clusters and make nodes replicate to their immediate peers in a
> > > >> > master-master fashion. This is much better than an NxN mesh, but
> > > >> > still has extra network and WAL usage. It also suffers from a
> > > >> > failure scenario where a single cluster going down will pause
> > > >> > replication to the clusters downstream.
> > > >> >
> > > >> > What I really want is for the ReplicationSource to only forward
> > > >> > WALEdits whose cluster-id is the same as the local cluster-id.
> > > >> > This seems like a straightforward patch to put in.
> > > >> >
> > > >> > Any thoughts on the suggested approach or alternatives?
> > > >> >
> > > >> > --
> > > >> > *Ishan Chhabra *| Rocket Scientist | RocketFuel Inc.
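[Editor's note: Ishan's proposal amounts to a filter on each WAL edit's origin cluster id, so that a cluster never re-forwards edits it merely received via replication. Here is a toy model of that rule; the names (`WALEdit`, `should_forward`) are illustrative and are not HBase's actual API.]

```python
# Toy model of the proposed rule: a cluster only forwards WAL edits
# that originated locally, so edits received via replication are never
# re-shipped to other peers. Names are illustrative, not HBase's API.

from dataclasses import dataclass

@dataclass
class WALEdit:
    cluster_id: str   # id of the cluster where the write originated
    row: bytes

def should_forward(edit: WALEdit, local_cluster_id: str) -> bool:
    """Forward only edits that originated on this cluster."""
    return edit.cluster_id == local_cluster_id

# Cluster B sees one of its own writes plus an edit replicated from A.
# With the filter, B ships only its own write; A's edit reached B's
# peers directly from A, so re-forwarding it would duplicate traffic.
edits = [WALEdit("B", b"row1"), WALEdit("A", b"row2")]
to_ship = [e for e in edits if should_forward(e, "B")]
print([e.row for e in to_ship])  # only b"row1" is forwarded
```

Under this rule, each edit traverses exactly one hop per peer in a full mesh, which avoids both the duplicate deliveries of the NxN setup and the downstream stalls of the spanning-tree setup.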
