It's a simple HA subsystem with a simple ask: in a replicated-state system, it should start from the last committed state…

Step 1: Master (M1) and Standby (S1) alive
Step 2: Producer sends 10 messages -> M1 receives them and replicates them to S1
Step 3: Kill Master (M1) -> S1 becomes the new Master
Step 4: Producer sends 10 messages -> S1 receives the messages, which are not replicated because M1 is down
Step 5: Kill Standby (S1)
Step 6: Start Master (M1)
Step 7: Start Standby (S1) (it syncs with Master (M1), discarding its internal state)

This is wrong. M1 should sync with S1, since S1 represents the current state of the queue.

How can we protect the Step 4 messages from being lost? We are using a transacted session and calling commit to make sure the messages are persisted.

---
Udayan Sahu

From: Clebert Suconic [mailto:clebert.suco...@gmail.com]
Sent: Tuesday, July 17, 2018 2:50 PM
To: users@activemq.apache.org
Cc: Udayan Sahu <udayan.s...@oracle.com>
Subject: Re: Potential message loss seen with HA topology in Artemis 2.6.2 on failback

HA is about preserving the journals between failures.

When you read and send messages you may still have a failure during the reading. I would need to understand what you do in case of a failure with your consumer and producer.

Retries on send and duplicate detection are key for your case.

You could also play with XA and a transaction manager.

On Tue, Jul 17, 2018 at 5:01 PM Neha Sareen <neha.sar...@oracle.com> wrote:

Hi,

We are setting up a cluster of 6 brokers using Artemis 2.6.2.

The cluster has 3 groups.
- Each group consists of one master and one slave broker.
- The HA uses replication.
- Each master broker configuration has the flag 'check-for-live-server' set to true.
- Each slave broker configuration has the flag 'allow-failback' set to true.
- We use static connectors for allowing cluster topology discovery.
- Each broker's static connector list includes the connectors to the other 5 servers in the cluster.
- Each broker declares its acceptor.
- Each broker exports its own connector information via the 'connector-ref' configuration element.
- The acceptor and the connector URLs for each broker are identical with respect to the host and port information.

We have a standalone test application that creates producers and consumers to write and receive messages, respectively, using a transacted JMS session.

We are trying to execute an automatic failover test case followed by failback, as follows:

Test Case 1
Step 1: Master and Standby alive
Step 2: Producer sends messages, say 9 messages
Step 3: Kill Master
Step 4: Producer sends messages, say another 9 messages
Step 5: Kill Standby
Step 6: Start Master
Step 7: Start Standby

What we see is that the Standby syncs with the Master, discarding its internal state, and we are able to consume only 9 messages, leading to a loss of 9 messages.

Test Case 2
Step 1: Master and Standby alive
Step 2: Producer sends messages
Step 3: Kill Master
Step 4: Producer sends messages
Step 5: Kill Standby
Step 6: Start Standby (it waits for the Master)
Step 7: Start Master (Question: does it wait for the slave?)
Step 8: Consume messages

Can someone provide any insights here regarding the potential message loss? Also, are there alternative topologies we could use to get around this issue?

Thanks,
Neha

--
Clebert Suconic
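
For readers following the thread, the "retries on send and duplicate detection" idea mentioned in the reply can be sketched roughly as below. This is a minimal in-memory model, not the Artemis API: the `Broker` class, `sendWithRetry`, and the injected failures are hypothetical names made up for illustration. With a real Artemis client, the duplicate ID would, as far as I know, be carried in the `_AMQ_DUPL_ID` message property. The point is only that a resend after an unknown outcome is safe when it reuses the same ID. It does not by itself address a restarted master that comes up with an older journal.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.UUID;

// Illustrative in-memory model only; none of these types belong to Artemis.
class DuplicateDetectionSketch {

    /** Hypothetical broker stand-in that remembers the duplicate IDs it has accepted. */
    static class Broker {
        private final Set<String> seenIds = new HashSet<>();
        private final List<String> queue = new ArrayList<>();
        private int failuresToInject;

        Broker(int failuresToInject) {
            this.failuresToInject = failuresToInject;
        }

        /** Stores the message unless its ID was seen before; may fail after storing. */
        void send(String duplicateId, String body) {
            if (seenIds.add(duplicateId)) {
                queue.add(body); // first time this ID is seen: keep the message
            }
            if (failuresToInject > 0) {
                failuresToInject--;
                // The message is stored, but the producer never gets the ack.
                throw new IllegalStateException("connection lost before ack");
            }
        }

        List<String> messages() {
            return queue;
        }
    }

    /** Resends with the SAME duplicate ID until acked or attempts run out. */
    static boolean sendWithRetry(Broker broker, String duplicateId, String body, int maxAttempts) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                broker.send(duplicateId, body);
                return true; // acked
            } catch (IllegalStateException e) {
                // Outcome unknown: resending is safe because the ID is unchanged.
            }
        }
        return false;
    }

    public static void main(String[] args) {
        Broker broker = new Broker(2); // the first two sends lose their ack
        String id = UUID.randomUUID().toString();
        boolean acked = sendWithRetry(broker, id, "order-1", 5);
        System.out.println("acked=" + acked + ", stored=" + broker.messages().size());
    }
}
```

In this model the producer sends three times but the broker keeps a single copy, which is the behaviour duplicate detection is meant to give a producer that retries after a failover.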