Hi everyone, I'm revisiting a thread from 2015 (https://www.mail-archive.com/[email protected]/msg00554.html) about achieving sub-second failover detection in HA clusters, and I'm curious about the current state of affairs nearly a decade later.
My Environment:
- Corosync 3.1.6
- Pacemaker 2.1.2
- Architecture: 2-node cluster + QDevice (also testing 3-node setups)
- Network: Dedicated physical NIC for cluster traffic (low-latency requirements)

Specific Questions:
1. With modern Corosync/Pacemaker versions, is sub-second fault detection and failover initiation realistically achievable in production environments?
2. Are there published measurements, or can you share your own experience, showing the fastest stable failover times you've achieved? What is considered a reliable minimum failover time?
3. Have there been significant enhancements in Corosync and Pacemaker since 2015 that specifically target detection speed and failover latency?
4. If sub-second detection is possible, what are the key configuration parameters and the potential trade-offs (false positives, network sensitivity, resource overhead)?

Thanks in advance!
Holger Haidinger
_______________________________________________
Manage your subscription: https://lists.clusterlabs.org/mailman/listinfo/users
ClusterLabs home: https://www.clusterlabs.org/
