I have an HA setup of three active/backup pairs spread across three network
segments, with each backup on a different segment from its active:

Seg 1: Active-1 and Backup-3 (backup for Active-3)
Seg 2: Active-2 and Backup-1 (backup for Active-1)
Seg 3: Active-3 and Backup-2 (backup for Active-2)
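The topology above can be modelled as a small table to check which pairs a segment failure affects. This is a hypothetical sketch: the node and segment names come from this post, but the rule it applies is a simplification and not Artemis's own logic. It assumes a pair fails over when only its active is unreachable, and that an active loses replication when only its backup is unreachable.

```python
# Hypothetical model of the topology described above. Node and segment
# names come from the post; the failover rule is a simplification.
PAIRS = {  # pair number -> (active node, backup node)
    1: ("Active-1", "Backup-1"),
    2: ("Active-2", "Backup-2"),
    3: ("Active-3", "Backup-3"),
}
SEGMENT_OF = {
    "Active-1": 1, "Backup-3": 1,
    "Active-2": 2, "Backup-1": 2,
    "Active-3": 3, "Backup-2": 3,
}


def effects_of_segment_failure(failed_segment):
    """Return (pairs whose backup should go live, pairs whose active loses its backup)."""
    failover, lost_backup = [], []
    for pair, (active, backup) in sorted(PAIRS.items()):
        active_down = SEGMENT_OF[active] == failed_segment
        backup_down = SEGMENT_OF[backup] == failed_segment
        if active_down and not backup_down:
            failover.append(pair)
        elif backup_down and not active_down:
            lost_backup.append(pair)
    return failover, lost_backup


failover, lost_backup = effects_of_segment_failure(1)
print("fail over:", failover)            # [1]: Backup-1 should go live
print("replication lost:", lost_backup)  # [3]: Active-3 has to vote
```

Disconnecting segment 1 therefore triggers two things at once: a failover for pair 1 and a quorum vote by Active-3.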

I am using the "vote-on-replication-failure = true" option to automatically
shut down active nodes that have been network-isolated.

If I disconnect network segment 1, Backup-1 on segment 2 correctly announces
itself as live. Active-3, however, loses replication to Backup-3 (which is on
segment 1) and requests quorum votes from Active-1 and Active-2. It receives
no reply from Active-1, which is on the same failed segment as Backup-3, and
shuts itself down after 5 seconds with "Timeout waiting for quorum vote
responses".

I have tried increasing the timeout so that Backup-1 has time to finish
becoming live and take part in Active-3's quorum vote, but Active-3 always
logs "Waiting 5 seconds for quorum vote results", regardless of the value I
set in the "quorum-vote-wait" option.
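The race described above can be sketched as a simplified, hypothetical model. It assumes that the vote succeeds only if a strict majority of the polled peer slots answer within the wait window, and that `quorum-vote-wait` is in seconds. The majority rule is consistent with the behaviour in this post (1 reply out of 2 fails), but it is not taken from the Artemis source. The 8-second failover time is an illustrative guess, not a measurement.

```python
# Hypothetical model of the quorum-vote race; NOT the Artemis algorithm.
# Assumption: a vote passes if more than half of the polled peer slots
# reply within the wait window.
from typing import Dict, Optional


def vote_succeeds(reply_times: Dict[str, Optional[float]], wait_s: float) -> bool:
    """reply_times maps each polled peer to the second its reply arrives (None = never)."""
    answered = sum(1 for t in reply_times.values() if t is not None and t <= wait_s)
    return answered > len(reply_times) / 2


# Illustrative timings after segment 1 is disconnected at t = 0:
# Active-2 answers almost at once; Active-1's slot can only answer once
# Backup-1 has finished becoming live (8 s is a made-up figure).
replies = {"Active-1 slot (Backup-1 once live)": 8.0, "Active-2": 0.1}

print(vote_succeeds(replies, wait_s=5))   # False: only 1 of 2 replied in time
print(vote_succeeds(replies, wait_s=12))  # True: both replied within 12 s
```

Under these assumptions, the configured wait of 12 would be long enough, while the hard 5-second wait seen in the log is not.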

The Active-3 configuration is shown below:

<connectors>
        <connector name="netty-active-1">tcp://192.168.2.20:61616?sslEnabled=true</connector>
        <connector name="netty-active-2">tcp://192.168.2.21:61616?sslEnabled=true</connector>
        <connector name="netty-active-3">tcp://192.168.2.22:61616?sslEnabled=true</connector>
        <connector name="netty-backup-1">tcp://192.168.2.20:61716?sslEnabled=true</connector>
        <connector name="netty-backup-2">tcp://192.168.2.21:61716?sslEnabled=true</connector>
        <connector name="netty-backup-3">tcp://192.168.2.22:61716?sslEnabled=true</connector>
</connectors>

<cluster-connections>
        <cluster-connection name="my-cluster">
                <connector-ref>netty-active-3</connector-ref>
                <check-period>1000</check-period>
                <connection-ttl>5000</connection-ttl>
                <call-timeout>5000</call-timeout>
                <retry-interval>500</retry-interval>
                <retry-interval-multiplier>1.0</retry-interval-multiplier>
                <max-retry-interval>5000</max-retry-interval>
                <initial-connect-attempts>-1</initial-connect-attempts>
                <reconnect-attempts>-1</reconnect-attempts>
                <use-duplicate-detection>true</use-duplicate-detection>
                <message-load-balancing>ON_DEMAND</message-load-balancing>
                <max-hops>1</max-hops>
                <notification-interval>1000</notification-interval>
                <notification-attempts>2</notification-attempts>
                <static-connectors>
                        <connector-ref>netty-active-2</connector-ref>
                        <connector-ref>netty-active-3</connector-ref>
                        <connector-ref>netty-backup-1</connector-ref>
                        <connector-ref>netty-backup-2</connector-ref>
                        <connector-ref>netty-backup-3</connector-ref>
                </static-connectors>
        </cluster-connection>
</cluster-connections>

<ha-policy>
        <replication>
                <master>
                        <vote-on-replication-failure>true</vote-on-replication-failure>
                        <quorum-vote-wait>12</quorum-vote-wait>
                        <check-for-live-server>true</check-for-live-server>
                        <group-name>server3</group-name>
                </master>
        </replication>
</ha-policy>

How can I make Active-3 wait for Backup-1 to become live before shutting
down?

regards,
Lewis
