At the moment you have to start first whichever server was alive last. I know there's a task to compare the age of the journals before synchronizing, but it's not done yet.
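In the meantime the protection for the step 4 messages is on the client side: as I said below, retries on send and duplicate detection. Roughly something like this, just an untested sketch (the class name, ids and payloads are made up; _AMQ_DUPL_ID is the property Artemis uses for duplicate detection):

import javax.jms.*;

// Untested sketch: transacted producer with retry + duplicate detection.
// "_AMQ_DUPL_ID" is the property Artemis inspects for duplicate detection;
// everything else here (class name, ids, payloads) is made up.
public class RetryingProducer {

    public static void sendBatch(Connection connection, Queue queue) throws JMSException {
        Session session = connection.createSession(true, Session.SESSION_TRANSACTED);
        MessageProducer producer = session.createProducer(queue);
        producer.setDeliveryMode(DeliveryMode.PERSISTENT);

        for (int i = 0; i < 10; i++) {
            String duplId = "example-batch-1-msg-" + i; // stable, unique per logical message
            boolean committed = false;
            while (!committed) {
                try {
                    TextMessage message = session.createTextMessage("payload " + i);
                    message.setStringProperty("_AMQ_DUPL_ID", duplId);
                    producer.send(message);
                    session.commit(); // the message is durable only once this returns
                    committed = true;
                } catch (JMSException e) {
                    // failover (or broker restart) in progress: retry the whole tx.
                    // If the first attempt did reach the broker, the resend is
                    // dropped there as a duplicate because of _AMQ_DUPL_ID.
                    try { session.rollback(); } catch (JMSException ignored) { }
                }
            }
        }
        session.close();
    }
}

As long as each logical message keeps a stable _AMQ_DUPL_ID across retries, a resend after a failover is either delivered once or silently dropped by the broker as a duplicate.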
On Tue, Jul 17, 2018 at 6:48 PM, Udayan Sahu <udayan.s...@oracle.com> wrote:
> It's a simple HA subsystem, with a simple ask: in a replicated state
> system, it should start from the last committed state…
>
> Step 1: Master (M1) & standby (S1) alive
> Step 2: Producer sends 10 messages -> M1 receives them and replicates them to S1
> Step 3: Kill master (M1) -> S1 becomes the new master
> Step 4: Producer sends 10 messages -> S1 receives them; they are not replicated, as M1 is down
> Step 5: Kill standby (S1)
> Step 6: Start master (M1)
> Step 7: Start standby (S1) (it syncs with the master (M1), discarding its internal state)
>
> This is wrong. M1 should sync with S1, since S1 represents the current
> state of the queue.
>
> How can we protect the Step 4 messages from being lost? We are using a
> transacted session and calling commit to make sure messages are persisted.
>
> --- Udayan Sahu
>
> From: Clebert Suconic [mailto:clebert.suco...@gmail.com]
> Sent: Tuesday, July 17, 2018 2:50 PM
> To: users@activemq.apache.org
> Cc: Udayan Sahu <udayan.s...@oracle.com>
> Subject: Re: Potential message loss seen with HA topology in Artemis 2.6.2 on failback
>
> HA is about preserving the journals between failures.
>
> When you read and send messages you may still have a failure during the
> read. I would need to understand what you do in case of a failure with
> your consumer and producer.
>
> Retries on send and duplicate detection are key for your case.
>
> You could also play with XA and a transaction manager.
>
> On Tue, Jul 17, 2018 at 5:01 PM Neha Sareen <neha.sar...@oracle.com> wrote:
> > Hi,
> >
> > We are setting up a cluster of 6 brokers using Artemis 2.6.2. The
> > cluster has 3 groups:
> > - Each group is one master/slave broker pair.
> > - HA uses replication.
> > - Each master broker's configuration has 'check-for-live-server' set to true.
> > - Each slave broker's configuration has 'allow-failback' set to true.
> > - We use static connectors for cluster topology discovery.
> > - Each broker's static connector list includes the connectors to the
> >   other 5 servers in the cluster.
> > - Each broker declares its acceptor.
> > - Each broker exports its own connector information via the
> >   'connector-ref' configuration element.
> > - The acceptor and connector URLs for each broker are identical with
> >   respect to host and port information.
> >
> > We have a standalone test application that creates producers and
> > consumers to write and receive messages using a transacted JMS session.
> >
> > We are trying to execute an automatic failover test case followed by
> > failback, as follows:
> >
> > Test Case 1
> > Step 1: Master & standby alive
> > Step 2: Producer sends messages, say 9 messages
> > Step 3: Kill master
> > Step 4: Producer sends another 9 messages
> > Step 5: Kill standby
> > Step 6: Start master
> > Step 7: Start standby
> > What we see is that the standby syncs with the master, discarding its
> > internal state, and we are able to consume only 9 messages, leading to
> > a loss of 9 messages.
> >
> > Test Case 2
> > Step 1: Master & standby alive
> > Step 2: Producer sends messages
> > Step 3: Kill master
> > Step 4: Producer sends messages
> > Step 5: Kill standby
> > Step 6: Start standby (it waits for the master)
> > Step 7: Start master (question: does it wait for the slave?)
> > Step 8: Consume messages
> >
> > Can someone provide any insights here regarding the potential message
> > loss?
> > Also, are there alternative topologies we could use here to get around
> > this issue?
> >
> > Thanks
> > Neha
>
> --
> Clebert Suconic
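On the consumer side, the question is what you do when the commit itself fails during a failover. A sketch of what I mean, again untested (process() stands in for your application logic):

import javax.jms.*;

// Untested sketch: transacted consumer. The only interesting part is the
// commit failure handling; process() stands in for your application logic.
public class TransactedConsumer {

    public static void drain(Connection connection, Queue queue) throws JMSException {
        Session session = connection.createSession(true, Session.SESSION_TRANSACTED);
        MessageConsumer consumer = session.createConsumer(queue);
        connection.start();

        Message message;
        while ((message = consumer.receive(5000)) != null) {
            try {
                process(message);
                session.commit(); // the ack becomes durable only here
            } catch (TransactionRolledBackException e) {
                // the commit raced with a failover: nothing was acked, the
                // message will be redelivered, so process() must tolerate
                // seeing the same message again
            } catch (JMSException e) {
                try { session.rollback(); } catch (JMSException ignored) { }
            }
        }
        session.close();
    }

    private static void process(Message message) throws JMSException {
        System.out.println("received: " + message.getJMSMessageID());
    }
}

With a transacted session a message is only acknowledged when the commit returns, so a commit that fails over means a redelivery rather than a loss; the consumer just has to be prepared to see the same message twice.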
--
Clebert Suconic
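PS: for reference, the failback behaviour you are describing is driven by these bits of broker.xml on each pair. A rough sketch (the group name is just a placeholder):

<!-- master broker -->
<ha-policy>
   <replication>
      <master>
         <group-name>group-1</group-name>
         <check-for-live-server>true</check-for-live-server>
      </master>
   </replication>
</ha-policy>

<!-- slave broker -->
<ha-policy>
   <replication>
      <slave>
         <group-name>group-1</group-name>
         <allow-failback>true</allow-failback>
      </slave>
   </replication>
</ha-policy>

None of these flags changes the point at the top: with replication, whichever journal the pair syncs from wins, so today the server holding the newest journal has to come back first.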