You could have another passive backup in the group: when M1 is killed and
S1 takes over, the extra backup would then become the new backup.

But if the node is alone and you kill it, you need to start it first.

On Wed, Jul 18, 2018 at 9:27 AM, Clebert Suconic
<clebert.suco...@gmail.com> wrote:
> At the moment the server that was alive most recently has to be started first.
>
> I know there's a task to compare the age of the journals before
> synchronizing them, but it's not done yet.
>
> On Tue, Jul 17, 2018 at 6:48 PM, Udayan Sahu <udayan.s...@oracle.com> wrote:
>> It's a simple HA subsystem with a simple ask: in a replicated state system, it
>> should start from the last committed state.
>>
>>
>>
>> Step 1: Master (M1) & Standby (S1) alive
>>
>> Step 2: Producer sends 10 messages -> M1 receives them and replicates them to S1
>>
>> Step 3: Kill Master (M1) -> S1 becomes the new Master
>>
>> Step 4: Producer sends 10 messages -> S1 receives them, and they are not replicated
>> because M1 is down
>>
>> Step 5: Kill Standby (S1)
>>
>> Step 6: Start Master (M1)
>>
>> Step 7: Start Standby (S1) (it syncs with Master (M1), discarding its internal
>> state)
>>
>> This is wrong. M1 should sync with S1, since S1 represents the current state
>> of the queue.
>>
>>
>>
>> How can we protect the Step 4 messages from being lost? We are using a transacted
>> session and calling commit to make sure messages are persisted.
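>>
>> For illustration, a minimal sketch of such a transacted send-and-commit using
>> the Artemis JMS client (the URL and queue name are placeholders, not our real
>> values):
>>
>> import javax.jms.Connection;
>> import javax.jms.ConnectionFactory;
>> import javax.jms.MessageProducer;
>> import javax.jms.Queue;
>> import javax.jms.Session;
>> import org.apache.activemq.artemis.jms.client.ActiveMQConnectionFactory;
>>
>> public class TransactedSend {
>>     public static void main(String[] args) throws Exception {
>>         // Placeholder URL; the real client connects to the cluster brokers.
>>         ConnectionFactory cf = new ActiveMQConnectionFactory("tcp://localhost:61616");
>>         try (Connection connection = cf.createConnection()) {
>>             connection.start();
>>             // true = transacted session; sends are not delivered until commit.
>>             Session session = connection.createSession(true, Session.SESSION_TRANSACTED);
>>             Queue queue = session.createQueue("exampleQueue"); // placeholder name
>>             MessageProducer producer = session.createProducer(queue);
>>             for (int i = 0; i < 10; i++) {
>>                 producer.send(session.createTextMessage("message-" + i));
>>             }
>>             session.commit(); // the sends are durable only once this returns
>>         }
>>     }
>> }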
>>
>>
>>
>> --- Udayan Sahu
>>
>>
>>
>>
>>
>> From: Clebert Suconic [mailto:clebert.suco...@gmail.com]
>> Sent: Tuesday, July 17, 2018 2:50 PM
>> To: users@activemq.apache.org
>> Cc: Udayan Sahu <udayan.s...@oracle.com>
>> Subject: Re: Potential message loss seen with HA topology in Artemis 2.6.2
>> on failback
>>
>>
>>
>> HA is about preserving the journals between failures.
>>
>>
>>
>> When you read and send messages you may still have a failure during the
>> reading. I would need to understand what you do in case of a failure with
>> your consumer and producer.
>>
>>
>>
>> Retries on send and duplicate detection are key for your case.
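>>
>> A rough sketch of that pattern, assuming a transacted session and Artemis'
>> duplicate-detection header _AMQ_DUPL_ID; the retry policy and the businessKey
>> value are placeholders:
>>
>> import javax.jms.JMSException;
>> import javax.jms.MessageProducer;
>> import javax.jms.Session;
>> import javax.jms.TextMessage;
>>
>> public class RetryingSender {
>>     private static final int MAX_ATTEMPTS = 3; // placeholder policy
>>
>>     // Resend on failure; a stable duplicate ID lets the broker drop any copy
>>     // that did get through before the failure.
>>     public static void sendWithRetry(Session session, MessageProducer producer,
>>                                      String body, String businessKey) throws JMSException {
>>         for (int attempt = 1; ; attempt++) {
>>             try {
>>                 TextMessage message = session.createTextMessage(body);
>>                 message.setStringProperty("_AMQ_DUPL_ID", businessKey);
>>                 producer.send(message);
>>                 session.commit();
>>                 return;
>>             } catch (JMSException e) {
>>                 try { session.rollback(); } catch (JMSException ignored) { }
>>                 // After a failover the session itself may be dead; real code
>>                 // would recreate the connection/session before retrying.
>>                 if (attempt >= MAX_ATTEMPTS) throw e;
>>             }
>>         }
>>     }
>> }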
>>
>>
>>
>> You could also play with XA and a transaction manager.
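>>
>> Normally a transaction manager drives the XA protocol for you; purely to show
>> the shape of it, a hand-rolled sketch of one XA send (the factory URL, queue
>> name, and toy Xid are placeholders a real TM would supply):
>>
>> import javax.jms.MessageProducer;
>> import javax.jms.XAConnection;
>> import javax.jms.XASession;
>> import javax.transaction.xa.XAResource;
>> import javax.transaction.xa.Xid;
>> import org.apache.activemq.artemis.jms.client.ActiveMQXAConnectionFactory;
>>
>> public class XaSendSketch {
>>     // Toy Xid for illustration; a transaction manager normally creates these.
>>     static final class SimpleXid implements Xid {
>>         public int getFormatId() { return 4711; }
>>         public byte[] getGlobalTransactionId() { return "global-tx-1".getBytes(); }
>>         public byte[] getBranchQualifier() { return "branch-1".getBytes(); }
>>     }
>>
>>     public static void main(String[] args) throws Exception {
>>         ActiveMQXAConnectionFactory xacf =
>>                 new ActiveMQXAConnectionFactory("tcp://localhost:61616"); // placeholder
>>         try (XAConnection connection = xacf.createXAConnection()) {
>>             XASession session = connection.createXASession();
>>             XAResource xaRes = session.getXAResource();
>>             MessageProducer producer =
>>                     session.createProducer(session.createQueue("exampleQueue"));
>>
>>             Xid xid = new SimpleXid();
>>             xaRes.start(xid, XAResource.TMNOFLAGS);   // enlist the session
>>             producer.send(session.createTextMessage("hello"));
>>             xaRes.end(xid, XAResource.TMSUCCESS);
>>             if (xaRes.prepare(xid) == XAResource.XA_OK) { // phase one
>>                 xaRes.commit(xid, false);                 // phase two
>>             }
>>         }
>>     }
>> }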
>>
>>
>>
>> On Tue, Jul 17, 2018 at 5:01 PM Neha Sareen <neha.sar...@oracle.com> wrote:
>>
>> Hi,
>>
>>
>>
>> We are setting up a cluster of 6 brokers using Artemis 2.6.2.
>>
>>
>>
>> The cluster has 3 groups.
>>
>> - Each group has one master and one slave broker pair.
>>
>> - The HA uses replication.
>>
>> - Each master broker configuration has the flag 'check-for-live-server' set
>> to true.
>>
>> - Each slave broker configuration has the flag 'allow-failback' set to true.
>>
>> - We use static connectors to allow cluster topology discovery.
>>
>> - Each broker's static connector list includes the connectors to the other 5
>> servers in the cluster.
>>
>> - Each broker declares its acceptor.
>>
>> - Each broker exports its own connector information via the 'connector-ref'
>> configuration element.
>>
>> - The acceptor and the connector URLs for each broker are identical with
>> respect to the host and port information (a minimal broker.xml sketch of
>> these elements follows).
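>>
>> For illustration, a minimal broker.xml sketch of the elements just listed, for
>> one master/slave pair (all names, hosts, and ports are placeholders):
>>
>> <connectors>
>>    <connector name="this-broker">tcp://host1:61616</connector>
>>    <connector name="broker2">tcp://host2:61616</connector>
>>    <!-- connectors for the other four brokers -->
>> </connectors>
>>
>> <acceptors>
>>    <!-- same host/port as this broker's connector -->
>>    <acceptor name="netty-acceptor">tcp://host1:61616</acceptor>
>> </acceptors>
>>
>> <cluster-connections>
>>    <cluster-connection name="my-cluster">
>>       <connector-ref>this-broker</connector-ref>
>>       <static-connectors>
>>          <connector-ref>broker2</connector-ref>
>>          <!-- the remaining brokers -->
>>       </static-connectors>
>>    </cluster-connection>
>> </cluster-connections>
>>
>> <!-- on the master of the pair -->
>> <ha-policy>
>>    <replication>
>>       <master>
>>          <check-for-live-server>true</check-for-live-server>
>>       </master>
>>    </replication>
>> </ha-policy>
>>
>> <!-- on the slave of the pair -->
>> <ha-policy>
>>    <replication>
>>       <slave>
>>          <allow-failback>true</allow-failback>
>>       </slave>
>>    </replication>
>> </ha-policy>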
>>
>>
>>
>> We have a standalone test application that creates producers and
>> consumers to send and receive messages, respectively, using
>> transacted JMS sessions.
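>>
>> For illustration, a minimal sketch of the consuming side of such a test (the
>> URL and queue name are placeholders):
>>
>> import javax.jms.Connection;
>> import javax.jms.ConnectionFactory;
>> import javax.jms.Message;
>> import javax.jms.MessageConsumer;
>> import javax.jms.Queue;
>> import javax.jms.Session;
>> import org.apache.activemq.artemis.jms.client.ActiveMQConnectionFactory;
>>
>> public class TransactedConsume {
>>     public static void main(String[] args) throws Exception {
>>         ConnectionFactory cf = new ActiveMQConnectionFactory("tcp://localhost:61616");
>>         try (Connection connection = cf.createConnection()) {
>>             connection.start();
>>             Session session = connection.createSession(true, Session.SESSION_TRANSACTED);
>>             Queue queue = session.createQueue("exampleQueue"); // placeholder
>>             MessageConsumer consumer = session.createConsumer(queue);
>>             Message message;
>>             while ((message = consumer.receive(1000)) != null) {
>>                 System.out.println("received: " + message);
>>             }
>>             session.commit(); // acknowledgements take effect only on commit
>>         }
>>     }
>> }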
>>
>>
>>
>> We are trying to execute an automatic failover test case followed by a
>> failback, as follows:
>>
>> Test Case 1
>>
>> Step 1: Master & Standby alive
>>
>> Step 2: Producer sends, say, 9 messages
>>
>> Step 3: Kill Master
>>
>> Step 4: Producer sends, say, another 9 messages
>>
>> Step 5: Kill Standby
>>
>> Step 6: Start Master
>>
>> Step 7: Start Standby
>>
>> What we see is that the standby syncs with the master, discarding its own
>> internal state, and we are able to consume only 9 messages, a loss of 9
>> messages.
>>
>>
>>
>>
>>
>> Test Case 2
>>
>> Step 1: Master & Standby alive
>>
>> Step 2: Producer sends messages
>>
>> Step 3: Kill Master
>>
>> Step 4: Producer sends messages
>>
>> Step 5: Kill Standby
>>
>> Step 6: Start Standby (it waits for the Master)
>>
>> Step 7: Start Master (question: does it wait for the slave?)
>>
>> Step 8: Consume messages
>>
>>
>>
>> Can someone provide any insights here regarding the potential message loss?
>>
>> Also, are there alternative topologies we could use here to get around this
>> issue?
>>
>>
>>
>> Thanks
>>
>> Neha
>>
>>
>>
>> --
>>
>> Clebert Suconic
>
>
>
> --
> Clebert Suconic



-- 
Clebert Suconic
