RE: Potential message loss seen with HA topology in Artemis 2.6.2 on failback

Udayan Sahu Tue, 17 Jul 2018 16:02:07 -0700

It's a simple HA subsystem with a simple ask: in a replicated-state system, it 
should start from the last committed state.

Step1: Master (M1) & Standby (S1) alive

Step2: Producer sends 10 messages -> M1 receives them and replicates them to S1

Step3: Kill Master (M1) -> S1 becomes the new Master

Step4: Producer sends 10 messages -> S1 receives them, but they are not 
replicated because M1 is down

Step5: Kill Standby (S1)

Step6: Start Master (M1)

Step7: Start Standby (S1) (it syncs with Master (M1), discarding its internal 
state)

This is wrong. M1 should sync with S1 since S1 represents the current state of 
the queue.
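
The sequence above can be traced with a toy model. This is only a sketch in plain Java, not Artemis code: the class and method names are hypothetical, a journal is reduced to a list of message ids, and the "backup discards its state and copies the live node" rule is simply the behaviour reported in Step 7.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model of the failover/failback sequence; names are hypothetical, not Artemis APIs. */
final class ReplicaPairModel {
    /** A broker reduced to a journal of message ids and an up/down flag. */
    static final class Node {
        final List<Integer> journal = new ArrayList<>();
        boolean alive = true;
    }

    /** Appends messages to the live node; replicates to the peer only while it is up. */
    static void send(Node live, Node peer, int firstId, int count) {
        for (int id = firstId; id < firstId + count; id++) {
            live.journal.add(id);
            if (peer.alive) {
                peer.journal.add(id);
            }
        }
    }

    /** Assumed rule from Step 7: the joining backup drops its journal and copies the live one. */
    static void joinAsBackup(Node backup, Node live) {
        backup.journal.clear();
        backup.journal.addAll(live.journal);
        backup.alive = true;
    }

    /** Runs Steps 1-7 and returns how many messages the restarted pair still holds. */
    static int runScenario() {
        Node m1 = new Node();
        Node s1 = new Node();      // Step1: both alive
        send(m1, s1, 0, 10);       // Step2: 10 messages, replicated to S1
        m1.alive = false;          // Step3: M1 killed, S1 becomes live
        send(s1, m1, 10, 10);      // Step4: 10 more, stored on S1 only
        s1.alive = false;          // Step5: S1 killed
        m1.alive = true;           // Step6: M1 restarts with its old journal
        joinAsBackup(s1, m1);      // Step7: S1 syncs from M1
        return m1.journal.size();
    }

    public static void main(String[] args) {
        System.out.println("messages after restart: " + runScenario());
    }
}
```

In this model the pair ends up with the 10 messages from Step 2 only; the 10 messages from Step 4 existed solely in S1's journal and are dropped when S1 rejoins as backup.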

How can we protect the Step 4 messages from being lost? We are using a 
transacted session and calling commit to make sure the messages are persisted.

--- Udayan Sahu

From: Clebert Suconic [mailto:clebert.suco...@gmail.com] 
Sent: Tuesday, July 17, 2018 2:50 PM
To: users@activemq.apache.org
Cc: Udayan Sahu <udayan.s...@oracle.com>
Subject: Re: Potential message loss seen with HA topology in Artemis 2.6.2 on 
failback

HA is about preserving the journals between failures.

When you read and send messages you may still have a failure during the 
reading. I would need to understand what you do in case of a failure with your 
consumer and producer.

Retries on send and duplicate detection are key for your case.  
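
As a rough illustration of how retries and duplicate detection fit together, here is a minimal sketch in plain Java. It does not use the Artemis client: `FakeBroker` and `sendWithRetry` are hypothetical stand-ins, and the failure is simulated as an acknowledgement lost after the message has been stored. In Artemis itself the duplicate id is, as far as I know, carried in the `_AMQ_DUPL_ID` message property; check the duplicate detection documentation for your version.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.UUID;

/** Sketch of retry-on-send combined with duplicate detection; not the Artemis API. */
final class DuplicateDetectionSketch {
    /** Hypothetical stand-in for a broker that remembers the duplicate ids it has seen. */
    static final class FakeBroker {
        final Set<String> seenIds = new HashSet<>();
        final List<String> queue = new ArrayList<>();
        int failuresLeft; // number of sends whose acknowledgement is "lost"

        FakeBroker(int failuresLeft) {
            this.failuresLeft = failuresLeft;
        }

        /** Stores the message unless its id was already seen, then may lose the ack. */
        void send(String duplicateId, String body) {
            if (seenIds.add(duplicateId)) {
                queue.add(body);
            }
            if (failuresLeft > 0) {
                failuresLeft--;
                throw new IllegalStateException("connection lost before ack");
            }
        }
    }

    /** Retries with the SAME duplicate id until acknowledged or attempts run out. */
    static boolean sendWithRetry(FakeBroker broker, String body, int maxAttempts) {
        String duplicateId = UUID.randomUUID().toString();
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                broker.send(duplicateId, body);
                return true;
            } catch (IllegalStateException e) {
                // Outcome unknown: the message may or may not have been stored.
                // Reusing the id lets the broker drop the copy if it was.
            }
        }
        return false;
    }

    public static void main(String[] args) {
        FakeBroker broker = new FakeBroker(2);
        boolean acked = sendWithRetry(broker, "order-1", 5);
        System.out.println("acked=" + acked + ", stored=" + broker.queue.size());
    }
}
```

The point of the design is that the producer can retry blindly after an ambiguous failure, because the id, not the producer, decides whether the message is stored a second time.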

You could also play with XA and a transaction manager.  

On Tue, Jul 17, 2018 at 5:01 PM Neha Sareen <neha.sar...@oracle.com> wrote:

Hi,

We are setting up a cluster of 6 brokers using Artemis 2.6.2.

The cluster has 3 groups.

- Each group has one master, and one slave broker pair.

- The HA uses replication.

- Each master broker configuration has the flag 'check-for-live-server' set to 
true.

- Each slave broker configuration has the flag 'allow-failback' set to true.

- We use static connectors for allowing cluster topology discovery.

- Each broker's static connector list includes the connectors to the other 5 
servers in the cluster.

- Each broker declares its acceptor.

- Each broker exports its own connector information via the  'connector-ref' 
configuration element.

- The acceptor and the connector URLs for each broker are identical with 
respect to the host and port information

We have a standalone test application that creates producers and consumers to 
write messages and receive messages respectively using a transacted JMS session.

We are trying to execute an automatic failover test case followed by failback 
as follows:

Test Case - 1

Step1: Master & Standby Alive

Step2: Producer sends messages, say 9 messages

Step3: Kill Master

Step4: Producer sends messages, say another 9 messages

Step5: Kill Standby

Step6: Start Master 

Step7: Start Standby. 

What we see is that the Standby syncs with the Master, discarding its internal 
state, and we are able to consume only 9 messages, leading to a loss of 9 
messages.

Test Case - 2

Step1: Master & Standby Alive

Step2: Producer sends messages

Step3: Kill Master

Step4: Producer sends messages

Step5: Kill Standby

Step6: Start Standby ( it waits for Master )

Step7: Start Master (Question: does it wait for the slave?)

Step8: Consume Message

Can someone provide any insights here regarding the potential message loss?

Also, is there an alternative topology we could use here to get around this 
issue?

Thanks

Neha

-- 

Clebert Suconic
