Hi Lewis,

The `quorum-vote-wait` parameter only affects nodes that are acting as
backups. It defines how long a backup node waits for quorum vote
responses, not how long it waits before sending a quorum vote request. So
this parameter cannot be used to give Backup-1 time to participate in the
quorum vote.
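
For illustration only, here is a minimal sketch of where the setting would
take effect, namely in a backup's replication policy. The group name
mirrors the `server3` value from your master configuration, and the value
of 12 seconds is just an example; please check the element placement
against the schema for your broker version.

```xml
<ha-policy>
   <replication>
      <slave>
         <!-- Assumed to pair with the master that uses group-name server3 -->
         <group-name>server3</group-name>
         <!-- Seconds this backup waits for quorum vote responses (example value) -->
         <quorum-vote-wait>12</quorum-vote-wait>
      </slave>
   </replication>
</ha-policy>
```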

In any case, I would not keep Active-3 live without a quorum, because that
risks split-brain. ARTEMIS-2716 [1] should address your use case.

[1] https://issues.apache.org/jira/browse/ARTEMIS-2716

Regards,
Domenico

On Mon, 12 Jul 2021 at 06:10, Lewis Gardner <lewisgard...@gmail.com> wrote:

> I have a 3-active/backup pair HA setup with each pair on a separate network
> segment.
>
> Seg 1: Active-1 and Backup-3 (backup for Active-3)
> Seg 2: Active-2 and Backup-1 (backup for Active-1)
> Seg 3: Active-3 and Backup-2 (backup for Active-2)
>
> I am using the "vote-on-replication-failure = true" option to automatically
> shut down active nodes which have been network isolated.
>
> If I disconnect network segment 1, Backup-1 on segment 2 properly announces
> itself as Live. Active-3 however attempts to get quorum votes from both
> Active-1 and Active-2, does not receive a reply from Active-1 (as that one
> is on the same failed network segment as Backup-3) and shuts itself down
> after 5 seconds with "Timeout waiting for quorum vote responses".
>
> I have tried increasing the timeout to allow Backup-1 to finish becoming
> Live and participate in Active-3's quorum request, but Active-3 always
> prints "Waiting 5 seconds for quorum vote results", regardless of the
> value I specify in the "quorum-vote-wait" option.
>
> The Active-3 configuration is shown below:
>
> <connectors>
>         <connector name="netty-active-1">tcp://192.168.2.20:61616?sslEnabled=true</connector>
>         <connector name="netty-active-2">tcp://192.168.2.21:61616?sslEnabled=true</connector>
>         <connector name="netty-active-3">tcp://192.168.2.22:61616?sslEnabled=true</connector>
>         <connector name="netty-backup-1">tcp://192.168.2.20:61716?sslEnabled=true</connector>
>         <connector name="netty-backup-2">tcp://192.168.2.21:61716?sslEnabled=true</connector>
>         <connector name="netty-backup-3">tcp://192.168.2.22:61716?sslEnabled=true</connector>
> </connectors>
>
> <cluster-connections>
>         <cluster-connection name="my-cluster">
>                 <connector-ref>netty-active-3</connector-ref>
>                 <check-period>1000</check-period>
>                 <connection-ttl>5000</connection-ttl>
>                 <call-timeout>5000</call-timeout>
>                 <retry-interval>500</retry-interval>
>                 <retry-interval-multiplier>1.0</retry-interval-multiplier>
>                 <max-retry-interval>5000</max-retry-interval>
>                 <initial-connect-attempts>-1</initial-connect-attempts>
>                 <reconnect-attempts>-1</reconnect-attempts>
>                 <use-duplicate-detection>true</use-duplicate-detection>
>                 <message-load-balancing>ON_DEMAND</message-load-balancing>
>                 <max-hops>1</max-hops>
>                 <notification-interval>1000</notification-interval>
>                 <notification-attempts>2</notification-attempts>
>                 <static-connectors>
>                         <connector-ref>netty-active-2</connector-ref>
>                         <connector-ref>netty-active-3</connector-ref>
>                         <connector-ref>netty-backup-1</connector-ref>
>                         <connector-ref>netty-backup-2</connector-ref>
>                         <connector-ref>netty-backup-3</connector-ref>
>                 </static-connectors>
>         </cluster-connection>
> </cluster-connections>
>
> <ha-policy>
>         <replication>
>                 <master>
>                         <vote-on-replication-failure>true</vote-on-replication-failure>
>                         <quorum-vote-wait>12</quorum-vote-wait>
>                         <check-for-live-server>true</check-for-live-server>
>                         <group-name>server3</group-name>
>                 </master>
>         </replication>
> </ha-policy>
>
> How can I make Active-3 wait for Backup-1 to become live before shutting
> down?
>
> regards,
> Lewis
>
