On 4/30/21 4:04 PM, Matthew Schumacher wrote:
On 4/21/21 11:04 AM, Matthew Schumacher wrote:
On 4/21/21 10:21 AM, Andrei Borzenkov wrote:
If I set the stickiness to 100 then it's a race condition: many times we
get the storage layer migrated without VirtualDomain noticing. But if
the stickiness is not set, then moving a resource causes the cluster to
re-balance, and the VM fails every time because validation is one of the
first things we do when we migrate the VM, and it happens at the same
time as an IP-ZFS-iSCSI move, so the config file goes away for 5 seconds.
I'm not sure how to fix this. The nodes don't have local storage that
Your nodes must have operating system and pacemaker stack loaded from
somewhere before they can import zfs pool.
Yup, and they do. There are plenty of ways to do this: internal SD
card, USB boot, PXE boot, etc. I prefer this because I don't need
to maintain a boot drive, the nodes boot from the exact same image,
and I have gobs of memory so the running system can run in a
ramdisk. This also makes it possible to boot my nodes with failed
disks/controllers, which makes troubleshooting easier. I basically
made a live CD distro that has everything I need.
I suppose the next step is to see if NFS has some sort of retry mode so
That is what "hard" mount option is for.
Thanks, I'll take a look.
For others searching the list, I did figure this out. The problem was
the order I was loading the resources in.
The following order doesn't work because the failover IP starts before
ZFS, which starts the NFS share. For a split second the IP is up and
accepting NFS requests, but the NFS server isn't running yet, so the IP
stack sends a RST. The NFS client reports that to the OS as a hard
failure, the VirtualDomain resource then sees an invalid config, and
things break.
  * Resource Group: IP-ZFS-iSCSI:
    * fence-datastore (stonith:fence_scsi): Started node1
    * failover-ip (ocf::heartbeat:IPaddr): Started node1
    * zfs-datastore (ocf::heartbeat:ZFS): Started node1
    * ZFSiSCSI (ocf::heartbeat:ZFSiSCSI): Started node1
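To illustrate the failure mode described above (the address is up but
nothing is listening on the port yet), here is a minimal Python sketch.
It is not part of the original setup: it uses loopback as a stand-in for
the failover IP, and `probe` and `unused_port` are hypothetical helper
names. A connection to an address that is up but has no listener is
refused immediately, which is the TCP-level effect of the RST.

```python
import socket


def probe(host: str, port: int, timeout: float = 1.0) -> str:
    """Classify one TCP connection attempt as 'open', 'refused' or 'timeout'."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except ConnectionRefusedError:
        # The address is up but nothing listens on the port: the peer's
        # TCP stack answered with a RST, so the failure is immediate.
        return "refused"
    except (socket.timeout, TimeoutError):
        # No answer at all (address down or packets dropped): the caller
        # is free to retry later.
        return "timeout"


def unused_port() -> int:
    """Ask the OS for a free loopback port, then release it again."""
    s = socket.socket()
    s.bind(("127.0.0.1", 0))
    port = s.getsockname()[1]
    s.close()
    return port


if __name__ == "__main__":
    # Loopback is always "up", so a port with no listener is refused.
    print(probe("127.0.0.1", unused_port()))
```

A "timeout" result corresponds to the case where the IP is not
configured yet, so the request simply goes unanswered.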
If I change it to this, then NFS requests simply go unanswered and the
client retries until the server is up and responds.
  * Resource Group: IP-ZFS-iSCSI:
    * fence-datastore (stonith:fence_scsi): Started node1
    * zfs-datastore (ocf::heartbeat:ZFS): Started node1
    * ZFSiSCSI (ocf::heartbeat:ZFSiSCSI): Started node1
    * failover-ip (ocf::heartbeat:IPaddr): Started node1
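A group starts its members in the order listed, so the fix amounts to
the IP coming after everything that serves requests on it. Below is a
small Python sketch of that check, with the two orderings copied from
the listings above. `ip_starts_last` is a hypothetical helper for
illustration only, not a Pacemaker or pcs API.

```python
from typing import Sequence


def ip_starts_last(group: Sequence[str], ip_resource: str,
                   services: Sequence[str]) -> bool:
    """Return True if ip_resource starts after every listed service.

    `group` is the member list of a resource group in start order.
    """
    ip_pos = group.index(ip_resource)
    return all(group.index(svc) < ip_pos for svc in services)


# Start orders taken from the two status listings in this thread.
broken = ["fence-datastore", "failover-ip", "zfs-datastore", "ZFSiSCSI"]
fixed = ["fence-datastore", "zfs-datastore", "ZFSiSCSI", "failover-ip"]

# Resources that must be running before the IP accepts traffic.
services = ["zfs-datastore", "ZFSiSCSI"]

if __name__ == "__main__":
    print("broken order ok?", ip_starts_last(broken, "failover-ip", services))
    print("fixed order ok? ", ip_starts_last(fixed, "failover-ip", services))
```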
Originally I didn't do it this way because my iSCSI and NFS stack bind
to the failover IP and I was worried stuff wouldn't start until the IP
was configured, but that doesn't seem to be a problem.
Therefore the idea of using a firewall rule to suppress the negative
response while the IP can already be up.
And thanks for coming back once you got it to work ;-)
Klaus
Matt
_______________________________________________
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users
ClusterLabs home: https://www.clusterlabs.org/