Is the amount of time different when the server goes down due to a graceful shutdown vs. a hard kill (kill -9 or equivalent)? Could what you're seeing be related to the amount of time it takes to detect that a TCP connection has been severed without a clean shutdown?
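If the slow cases do turn out to be hard kills, one place to look is the liveness checking on the cluster connection. Below is a rough sketch, assuming the `check-period` and `connection-ttl` elements of `cluster-connection` in broker.xml are what drive dead-connection detection in your setup. The connection name, connector name, and values are made up for illustration and are not recommendations, and whether these settings govern the replication link in your version is something to verify.

```xml
<!-- Illustrative fragment only: names and values are hypothetical. -->
<cluster-connections>
   <cluster-connection name="my-cluster">
      <connector-ref>netty-connector</connector-ref>
      <!-- How often (ms) to check that the other side is still sending data. -->
      <check-period>1000</check-period>
      <!-- How long (ms) a connection may stay silent before it is treated as dead. -->
      <connection-ttl>5000</connection-ttl>
   </cluster-connection>
</cluster-connections>
```

Lower values should shorten detection after a hard kill, at the cost of more false positives during brief network hiccups or long GC pauses.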
Tim

On Nov 23, 2016 9:58 PM, "JasonHs" <hsiaja...@gmail.com> wrote:

> Hi all,
>
> I'm running a 2 node Artemis cluster in replication mode. Everything is
> running as it should, but we noticed a variance in the time it takes for
> the 'backup' server to become 'live' when the primary server goes down. The
> failover time can be as short as 10~20 seconds or up to a few minutes.
>
> I tried changing quite a few cluster-connection settings according to the
> documentation, but had no luck so far.
>
> Also, is it recommended to run a 3 node cluster to avoid a split brain
> scenario where both nodes in a 2-node cluster think they should be 'live'
> in a temporary network outage scenario?
>
> Thanks in advance,
> Jason
>
> --
> View this message in context: http://activemq.2283324.n4.nabble.com/Slow-failover-from-primary-to-backup-server-tp4719474.html
> Sent from the ActiveMQ - User mailing list archive at Nabble.com.