Hi Hans,

Thank you for your response. I understand. It's not possible to have a third rack / server room at the moment, as the requirement is to have redundancy between the two. I already tried to get one :-/
Is it possible to have a ZooKeeper ensemble (3 nodes) in one server room and the same in the other, with some sort of master-master replication between the two? Would that make sense, if it is even possible? Since in that case both sides would have the same config, split brain theoretically should not happen.

I haven't done this ZooKeeper 3rd-node hack before :) I guess I need to play around with it for a while to get it properly documented, functional and tested :)
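So that I play around with the right thing: below is my understanding of your 3 + 2 suggestion with the cold-standby clone. The hostnames, ids and paths are placeholders I made up, and nothing is tested yet; please correct me if this is not what you meant.

    # zoo.cfg -- identical on all five running nodes
    tickTime=2000
    initLimit=10
    syncLimit=5
    dataDir=/var/lib/zookeeper
    clientPort=2181
    server.1=zk1.room-a:2888:3888
    server.2=zk2.room-a:2888:3888
    server.3=zk3.room-a:2888:3888
    server.4=zk4.room-b:2888:3888
    server.5=zk5.room-b:2888:3888

    # Cold-standby clone in room B: same zoo.cfg, but its myid reuses the
    # id of one room-A node, and the process stays STOPPED in normal
    # operation:
    echo 3 > /var/lib/zookeeper/myid
    # Only after room A is confirmed dead: repoint the zk3.room-a DNS name
    # at this machine (or edit zoo.cfg on the two survivors), then start it
    # so that 3 of the 5 ids are alive again and form a quorum. It must
    # never run at the same time as the real server 3.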
Thanks again!

Le

On Mon, Mar 6, 2017 at 8:22 PM, Hans Jespersen <h...@confluent.io> wrote:
>
> Is there any way you can find a third rack / server room / power supply
> nearby, just for the 1 extra zookeeper node? You don't have to put any
> kafka brokers there, just a single zookeeper. It's less likely to have a
> 3-way split brain because of a network partition. It's so much cleaner
> with 3 availability zones because everything fails over automatically.
> This is how most people run when deployed in Amazon.
>
> Barring that, I would say the next best thing is 3 zookeepers in one
> zone and 2 zookeepers in the other zone, so it will auto-failover if the
> 2-zk zone fails. If the 3-zk zone fails, you can set up a well tested
> set of manual steps to carefully configure a 3rd zookeeper clone (which
> matches the id of one of the failed nodes) and still get your system
> back up and running. If this is not something you have done before, I
> suggest getting a few days of expert consulting to have someone help you
> set it up, test it, and document the proper failover and recovery
> procedures.
>
> -hans
>
> On Mar 6, 2017, at 10:44 AM, Le Cyberian <lecyber...@gmail.com> wrote:
>
>> Thanks Hans and Alexander for taking the time out and for your
>> responses.
>>
>> I now understand the risks and the possible outcomes of the desired
>> setup.
>>
>> What, in your opinion, would be a better way to get failover
>> (active-active) between both of these server rooms and avoid switching
>> to the clone / 3rd zookeeper?
>>
>> I mean, even with 5 nodes (3 in one server room and 2 in the other),
>> there would still be a problem with the zookeeper majority leader
>> election if the server room that has 3 nodes goes down.
>>
>> Is there some way to achieve this?
>>
>> Thanks again!
>>
>> Lee
>>
>> On Mon, Mar 6, 2017 at 4:16 PM, Alexander Binzberger
>> <alexander.binzber...@wingcon.com> wrote:
>>
>>> I agree that this is one cluster, but having one additional ZK node
>>> per site does not help (as far as I understand ZK).
>>>
>>> 3 out of 6 is not a majority either, so I think you mean 3/5 with a
>>> cloned 3rd node. That would mean manually switching over to the clone
>>> to regain majority, which can cause issues again:
>>> 1. You are actually building a master/slave ZK setup with manual
>>>    switch-over.
>>> 2. While switching the clone from one room to the other you would
>>>    have downtime.
>>> 3. If you switch on both ZK clones at the same time (by mistake), you
>>>    are screwed.
>>> 4. If you "switch" clones instead of moving the node with all its
>>>    data on disk, you create a split brain from which you first have
>>>    to recover.
>>>
>>> So if you lose the connection between the rooms / the rooms get
>>> separated / you lose one room, you:
>>> * (might) need manual intervention,
>>> * lose automatic fail-over between the rooms,
>>> * might face a complete outage if your "master" room with the active
>>>   3rd node is hit.
>>> Actually, this is the same scenario as with 2/3 nodes spread over two
>>> locations.
>>>
>>> What you need for real fault tolerance is a third cross-connected
>>> location, with your 3 or 5 ZK nodes distributed over the three.
>>> Or live with a possible outage in such a scenario.
>>>
>>> Additional hints:
>>> * You can run any number of Kafka brokers against a ZK cluster. In
>>>   your case this could be 4 Kafka brokers on 3 ZK nodes.
>>> * You should set topic replication to 2 (this can be done at any
>>>   time) plus some other producer/broker settings to ensure your
>>>   messages will not get lost in switch-over cases.
>>> * The ZK service does not react nicely to a full disk.
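(Replying inline here.) Alexander, to make sure I understand the producer/broker settings you hint at above: is the following roughly what you mean? The values are only my guesses, combined with the rack ids Hans mentions further down, and not from your mail.

    # server.properties on each broker (rack awareness needs Kafka 0.10+)
    broker.rack=room-a                     # room-b on the brokers in the other room
    default.replication.factor=2
    min.insync.replicas=2                  # with RF 2 this stops producers when one
                                           # replica is down: durability over availability
    unclean.leader.election.enable=false   # never elect an out-of-sync leader

    # producer config
    acks=all                               # wait for all in-sync replicas to ack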
>>> On 06.03.2017 at 15:10, Hans Jespersen wrote:
>>>
>>>> In that case it's really one cluster. Make sure to set different
>>>> rack ids for each server room, so kafka will ensure that the
>>>> replicas always span both floors and you don't lose availability of
>>>> data if a server room goes down.
>>>> You will have to configure one additional zookeeper node in each
>>>> site, which you will only ever start up if a site goes down, because
>>>> otherwise 2 of 4 zookeeper nodes is not a quorum. Again, you would
>>>> be better off with 3 nodes, because then you would only have to do
>>>> this in the site that has the single active node.
>>>>
>>>> -hans
>>>>
>>>> On Mar 6, 2017, at 5:57 AM, Le Cyberian <lecyber...@gmail.com> wrote:
>>>>
>>>>> Hi Hans,
>>>>>
>>>>> Thank you for your reply.
>>>>>
>>>>> It's basically two different server rooms on different floors, and
>>>>> they are connected by fiber, so it's almost like a local connection
>>>>> between them: no network latency / lag.
>>>>>
>>>>> If I use Mirror Maker / Replicator, then I will not be able to use
>>>>> both at the same time for writes / producers, because the consumers
>>>>> / producers will request from all of them.
>>>>>
>>>>> BR,
>>>>>
>>>>> Lee
>>>>>
>>>>> On Mon, Mar 6, 2017 at 2:50 PM, Hans Jespersen <h...@confluent.io>
>>>>> wrote:
>>>>>
>>>>>> What do you mean when you say you have "2 sites not datacenters"?
>>>>>> You should be very careful configuring a stretch cluster across
>>>>>> multiple sites.
>>>>>> What is the RTT between the two sites? Why do you think that
>>>>>> Mirror Maker (or Confluent Replicator) would not work between the
>>>>>> sites and yet a stretch cluster will? That seems wrong.
>>>>>>
>>>>>> -hans
>>>>>>
>>>>>> /**
>>>>>>  * Hans Jespersen, Principal Systems Engineer, Confluent Inc.
>>>>>>  * h...@confluent.io (650)924-2670
>>>>>>  */
>>>>>>
>>>>>> On Mon, Mar 6, 2017 at 5:37 AM, Le Cyberian <lecyber...@gmail.com>
>>>>>> wrote:
>>>>>>
>>>>>>> Hi Guys,
>>>>>>>
>>>>>>> Thank you very much for your replies.
>>>>>>>
>>>>>>> The scenario I have to implement is that I have 2 sites, not
>>>>>>> datacenters, so mirror maker would not work here.
>>>>>>>
>>>>>>> There will be 4 nodes in total: 2 in Site A and 2 in Site B. The
>>>>>>> idea is to have an active-active setup along with fault
>>>>>>> tolerance, so that if one of the sites goes down, operations
>>>>>>> continue normally.
>>>>>>>
>>>>>>> In this case, if I go ahead with a 4-node cluster of both
>>>>>>> zookeeper and kafka, it will tolerate the failure of 1 node only.
>>>>>>>
>>>>>>> What do you suggest in this case? Because to divide between 2
>>>>>>> sites it needs to be an even number, if that makes sense. Also,
>>>>>>> if possible, some help regarding partitions per topic and the
>>>>>>> replication factor.
>>>>>>>
>>>>>>> I already have Kafka running with quite a few topics that have
>>>>>>> replication factor 1 and the default single partition. Is there a
>>>>>>> way to repartition / increase the partitions of existing topics
>>>>>>> when I migrate to the above setup? I think we can increase the
>>>>>>> replication factor with the Kafka rebalance tool.
>>>>>>>
>>>>>>> Thanks a lot for your help and your time looking into this.
>>>>>>>
>>>>>>> BR,
>>>>>>>
>>>>>>> Le
>>>>>>>
>>>>>>> On Mon, Mar 6, 2017 at 12:20 PM, Hans Jespersen
>>>>>>> <h...@confluent.io> wrote:
>>>>>>>
>>>>>>>> Jens,
>>>>>>>>
>>>>>>>> I think you are correct that a 4 node zookeeper ensemble can be
>>>>>>>> made to work, but it will be slightly less resilient than a 3
>>>>>>>> node ensemble, because it can only tolerate 1 failure (the same
>>>>>>>> as a 3 node ensemble) and the likelihood of node failures is
>>>>>>>> higher because there is 1 more node that could fail.
>>>>>>>> So it SHOULD be an odd number of zookeeper nodes (not MUST).
>>>>>>>>
>>>>>>>> -hans
>>>>>>>>
>>>>>>>> On Mar 6, 2017, at 12:20 AM, Jens Rantil <jens.ran...@tink.se>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> Hi Hans,
>>>>>>>>>
>>>>>>>>> On Mon, Mar 6, 2017 at 12:10 AM, Hans Jespersen
>>>>>>>>> <h...@confluent.io> wrote:
>>>>>>>>>
>>>>>>>>>> A 4 node zookeeper ensemble will not even work. It MUST be an
>>>>>>>>>> odd number of zookeeper nodes to start.
>>>>>>>>>
>>>>>>>>> Are you sure about that? If ZooKeeper doesn't run with four
>>>>>>>>> nodes, that means a running ensemble of three can't be
>>>>>>>>> live-migrated to other nodes (because that's done by increasing
>>>>>>>>> the ensemble and then reducing it, in the case of 3-node
>>>>>>>>> ensembles). IIRC, you can run four ZooKeeper nodes, but that
>>>>>>>>> means the quorum will be three nodes, so there's no added
>>>>>>>>> benefit in terms of availability, since you can only lose one
>>>>>>>>> node, just like with a three node cluster.
>>>>>>>>>
>>>>>>>>> Cheers,
>>>>>>>>> Jens
>>>>>>>>>
>>>>>>>>> --
>>>>>>>>> Jens Rantil
>>>>>>>>> Backend engineer
>>>>>>>>> Tink AB
>>>>>>>>>
>>>>>>>>> Email: jens.ran...@tink.se
>>>>>>>>> Phone: +46 708 84 18 32
>>>>>>>>> Web: www.tink.se
>>>>>>>>>
>>>>>>>>> Facebook <https://www.facebook.com/#!/tink.se> Linkedin
>>>>>>>>> <http://www.linkedin.com/company/2735919?trk=vsrp_companies_res_photo&trkInfo=VSRPsearchId%3A1057023381369207406670%2CVSRPtargetId%3A2735919%2CVSRPcmpt%3Aprimary>
>>>>>>>>> Twitter <https://twitter.com/tink>
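PS: for my earlier question in this thread about repartitioning existing topics, this is what I plan to test once the new cluster is up. The topic name and broker ids are made-up examples; corrections welcome.

    # add partitions to an existing topic (note: keyed messages will map
    # to different partitions afterwards)
    kafka-topics.sh --zookeeper zk1:2181 --alter \
      --topic my-topic --partitions 8

    # raise the replication factor from 1 to 2 with a reassignment
    cat > increase-rf.json <<'EOF'
    {"version":1,"partitions":[
      {"topic":"my-topic","partition":0,"replicas":[1,2]}
    ]}
    EOF
    kafka-reassign-partitions.sh --zookeeper zk1:2181 \
      --reassignment-json-file increase-rf.json --execute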