As Justin pointed out, look at the Network Health Check, or use better network infrastructure to avoid split brain.
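For reference, the network check Justin links below ([1]) is driven by a handful of broker.xml elements. A minimal sketch, assuming the element names from the 1.5.2 network-isolation docs; the ping target 10.0.0.1 is a placeholder — use an address (e.g. your gateway) that is only reachable while this node's network is actually healthy:

```xml
<core xmlns="urn:activemq:core">
   <!-- run the check every 10s, with a 1s timeout per ping -->
   <network-check-period>10000</network-check-period>
   <network-check-timeout>1000</network-check-timeout>
   <!-- placeholder address: a host that is unreachable only when this node is isolated -->
   <network-check-list>10.0.0.1</network-check-list>
</core>
```

While the ping fails, a live broker shuts itself down and a backup refuses to activate, which prevents the isolated node from going live and splitting the brain.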
On Mon, Jan 30, 2017 at 11:48 AM, Justin Bertram <jbert...@apache.com> wrote:

>> It does what I think it does, now my slave and my master are active. This however is acceptable, no problems yet.
>
> Actually, this is a problem. This is the classic split-brain scenario. Since both your master and slave are active with the same messages you will lose data integrity. Once the network connection between the live and (now active) backup is restored there is nothing which can be done to re-integrate the data since there is no way of knowing which broker has the right data. This is the risk you run with a single live and backup. To mitigate the risk of split-brain you have a couple of options:
>
> 1) Invest in redundant network infrastructure (e.g. multiple NICs on each machine, redundant network switches, etc.). Obviously you'll need to perform a cost/risk analysis here to determine how much your data is actually worth.
> 2) Configure a larger cluster of live/backup pairs so that if a connection between nodes is lost a quorum vote can (hopefully) prevent the illegitimate activation of a backup.
> 3) Similar to #2 you can use the recently added "network check" functionality [1].
>
> Justin
>
> [1] http://activemq.apache.org/artemis/docs/1.5.2/network-isolation.html
>
> ----- Original Message -----
> From: "Gerrit Tamboer" <gerrit.tamb...@crv4all.com>
> To: users@activemq.apache.org
> Sent: Monday, January 30, 2017 10:03:42 AM
> Subject: Re: Problems setting up replicated ha-policy.
>
> Hi Clebert,
>
> Thanks for pointing me in the right direction, I was able to set up replication with active/passive failover.
>
> I am able to stop the master or kill the master and the slave is responding to it. If I start up the master again the slave replicates back to master and the master becomes active. So far so good.
>
> So what I simulated now is a network outage.
> I did this by simply making sure that the master cannot connect to the slave and vice versa (VirtualBox, setting the network adapter to disabled).
>
> It does what I think it does, now my slave and my master are active. This however is acceptable, no problems yet. But when I enable the network adapter again, making sure the master and slave can connect, it does not do a failback. The slave stays active, as well as the master, and they don’t seem to communicate. Is this some sort of split-brain situation?
>
> Regards,
> Gerrit
>
> On 27/01/17 21:25, "Clebert Suconic" <clebert.suco...@gmail.com> wrote:
>
> The only issue I found is how you are defining this:
>
> <connector name="localhost">tcp://localhost:61616</connector>
>
> On the cluster connection you are passing localhost as the node. That is sent to the backup, and the backup will try to connect to localhost, which is itself, so it won't actually connect to the other node.
>
> You should pass in an IP that will be valid on the second node.
>
> Hope this helps...
>
> Look at the examples/features/ha/replicated-failback-static example
>
> On Fri, Jan 27, 2017 at 9:28 AM, Clebert Suconic <clebert.suco...@gmail.com> wrote:
>> I won't be able to get to a computer today. Only on Monday.
>>
>> Meanwhile can you compare your config with the replicated examples from the release? That's what I would do anyways.
>>
>> Try with a single live/backup. Make sure the IDs match on the backup so it can pull the data.
>>
>> Let me know how it goes. I may find a time to open a computer this afternoon.
>>
>> On Fri, Jan 27, 2017 at 5:32 AM Gerrit Tamboer <gerrit.tamb...@crv4all.com> wrote:
>>>
>>> Hi Clebert,
>>>
>>> Thanks for pointing this out.
>>>
>>> I just tested 1.5.2 but unfortunately the results are exactly the same. No failover situation although the slave sees the master going down. The slave does not even notice a master being gone after a kill -9.
>>>
>>> This leads me to believe I have a misconfiguration, because if this is designed to work like this, it’s not really HA.
>>>
>>> I have added the broker.xml’s of all nodes to this mail again, hopefully somebody has a similar setup and can verify the configuration.
>>>
>>> Thanks a bunch!
>>>
>>> Regards,
>>> Gerrit Tamboer
>>>
>>> On 27/01/17 04:33, "Clebert Suconic" <clebert.suco...@gmail.com> wrote:
>>>
>>> Until recently (1.5.0) you would only have the TTL to decide when to activate the backup.
>>>
>>> Recently, connection failures will also play into the decision to activate it.
>>>
>>> So on 1.3.0 you will be bound to the TTL of the cluster connection.
>>>
>>> On 1.5.2 it should work with kill, but you would still be bound to the TTL in case of a cable cut or a switch going off — but that's the deal with TCP/IP.
>>>
>>> On Thu, Jan 26, 2017 at 7:24 AM Gerrit Tamboer <gerrit.tamb...@crv4all.com> wrote:
>>>
>>> > Forgot to send the attachments!
>>> >
>>> > *From: *Gerrit Tamboer <gerrit.tamb...@crv4all.com>
>>> > *Date: *Thursday 26 January 2017 at 13:23
>>> > *To: *"users@activemq.apache.org" <users@activemq.apache.org>
>>> > *Subject: *Problems setting up replicated ha-policy.
>>> >
>>> > Hi community,
>>> >
>>> > We are attempting to set up a 3-node Artemis (1.3.0) cluster with an active-passive failover situation.
>>> > We see that the master node is actively accepting connections:
>>> >
>>> > 09:52:30,167 INFO [org.apache.activemq.artemis.core.server] AMQ221000: live Message Broker is starting with configuration Broker Configuration (clustered=true,journalDirectory=./data/journal,bindingsDirectory=./data/bindings,largeMessagesDirectory=./data/large-messages,pagingDirectory=/opt/jamq_paging_data/data)
>>> >
>>> > 09:52:33,176 INFO [org.apache.activemq.artemis.core.server] AMQ221020: Started Acceptor at 0.0.0.0:61616 for protocols [CORE,MQTT,AMQP,HORNETQ,STOMP,OPENWIRE]
>>> >
>>> > The slaves are able to connect to the master and are reporting that they are in standby mode:
>>> >
>>> > 08:16:57,426 INFO [org.apache.activemq.artemis.core.server] AMQ221000: backup Message Broker is starting with configuration Broker Configuration (clustered=true,journalDirectory=./data/journal,bindingsDirectory=./data/bindings,largeMessagesDirectory=./data/large-messages,pagingDirectory=/opt/jamq_paging_data/data)
>>> >
>>> > 08:18:38,529 INFO [org.apache.activemq.artemis.core.server] AMQ221109: Apache ActiveMQ Artemis Backup Server version 1.3.0 [null] started, waiting live to fail before it gets active
>>> >
>>> > However, when I kill the master node now, it reports that the master is gone, but does not become active itself:
>>> >
>>> > 08:20:14,987 WARN [org.apache.activemq.artemis.core.client] AMQ212037: Connection failure has been detected: AMQ119015: The connection was disconnected because of server shutdown [code=DISCONNECTED]
>>> >
>>> > When I do a kill -9 on the PID of the master java process, it does not even report that the master has gone away.
>>> >
>>> > I also tested this in Artemis 1.5.1, with the same results.
>>> > Also, removing one of the slaves (to have a simple master-slave setup) does not work.
>>> >
>>> > My expectation is that if the master dies, one of the slaves becomes active.
>>> >
>>> > Attached you will find the broker.xml of all 3 nodes.
>>> >
>>> > Thanks in advance for the help!
>>> >
>>> > Kind regards,
>>> >
>>> > Gerrit Tamboer
>>> >
>>> > This message is subject to the following E-mail Disclaimer. (http://www.crv4all.com/disclaimer-email/) CRV Holding B.V. seats according to the articles of association in Arnhem, Dutch trade number 09125050.
>>>
>>> --
>>> Clebert Suconic
>>
>> --
>> Clebert Suconic
>
> --
> Clebert Suconic

--
Clebert Suconic
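For anyone landing on this thread later: the two configuration points raised above — advertising a connector address the other node can actually reach (not localhost), and enabling failback on the replicated pair — can be sketched in broker.xml roughly as follows. This is a minimal sketch, not the poster's actual config; 192.168.56.10 and the connector name are placeholders for the live node's real address:

```xml
<!-- live node: advertise an address the backup can reach, never localhost -->
<connectors>
   <connector name="netty-connector">tcp://192.168.56.10:61616</connector>
</connectors>

<ha-policy>
   <replication>
      <master>
         <!-- on restart, check for a live server so the old master can fail back -->
         <check-for-live-server>true</check-for-live-server>
      </master>
   </replication>
</ha-policy>
```

On the backup node, the counterpart would be a <slave> element with <allow-failback>true</allow-failback>, so the backup steps down again once the original live returns; compare with the shipped examples/features/ha/replicated-failback-static example mentioned above.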