Hi,

On 01/08/2022 16:18, john tillman wrote:
"john tillman" <jo...@panix.com> wrote on 29.07.2022 at 22:51 in message
<beb30bf64d4c615aff6034000038118c.squir...@mail.panix.com>:
On Thursday 28 July 2022 at 22:17:01, john tillman wrote:

I have a two-node cluster setup with a qdevice. 'pcs quorum status' from a
cluster node shows the qdevice casting a vote.  On the qdevice node,
'corosync-qnetd-tool -s' says I have 2 connected clients and 1 cluster.
The vote count looks correct when I shut down either one of the cluster
nodes or the qdevice.  So the voting seems to be working at this point.

Indeed - shutting down 1 of 3 nodes leaves quorum intact, therefore
everything still awake knows what's going on.

From this state, if I reboot both my cluster nodes at the same time

Ugh!

but leave the qdevice node running, the cluster will not see the qdevice
when the nodes come back up: 'pcs quorum status' shows 3 votes expected
but only 2 votes cast (from the cluster nodes).

I would think this is to be expected, since if you reboot 2 out of 3
nodes, you completely lose quorum, so the single node left has no idea
what to trust when the other nodes return.

No, no.  I do have quorum after the reboots.  It is courtesy of the 2
cluster nodes casting their quorum votes.  However, the qdevice is not
casting a vote so I am down to 2 out of 3 nodes.

And the qdevice is not part of the cluster.  It will never have any
resources running on it.  Its job is just to vote.
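
The vote arithmetic behind the two readings above can be sketched as
follows. This is only an illustration: it assumes the usual majority rule
(quorum = floor(expected_votes / 2) + 1) with one vote per cluster node and
one for the qdevice, and the function names are made up for the example;
they are not part of corosync or pcs.

```python
def quorum_threshold(expected_votes: int) -> int:
    """Smallest vote count that is a strict majority of expected_votes."""
    return expected_votes // 2 + 1


def is_quorate(votes_cast: int, expected_votes: int) -> bool:
    """True when the votes actually cast reach the majority threshold."""
    return votes_cast >= quorum_threshold(expected_votes)


if __name__ == "__main__":
    expected = 3  # assumption: 2 cluster nodes + 1 qdevice, one vote each
    scenarios = [
        ("both nodes up, qdevice not voting", 2),
        ("one node up, qdevice not voting", 1),
        ("one node up, qdevice voting", 2),
    ]
    for label, cast in scenarios:
        print(f"{label}: {cast}/{expected} votes, "
              f"quorate={is_quorate(cast, expected)}")
```

Under these assumptions the cluster stays quorate with 2 of 3 votes, but
with the qdevice vote missing it cannot survive the loss of either node.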

-John


I thought maybe the problem was that the network wasn't ready when
corosync.service started, so I forced an "ExecStartPre=/usr/bin/sleep 10"
into it, but that didn't change anything.

This type of fix is broken anyway: what you want is not a fixed delay but
to wait for an event (network up).
Basically the OS distribution should have configured it correctly already.

In SLES15 there is:
Requires=network-online.target
After=network-online.target


Thank you for the response.

Yes, I saw that those values were correctly set in the service
configuration file for corosync.  The delay was just a test. I just wanted
to make sure that it wasn't a race condition of bringing up the bond and
trying to connect to the quorum node.

I was grep'ing the corosync log for VOTEQ entries and noticed when it
works I see consecutively:
... [VOTEQ ] Sending quorum callback, quorate = 0
... [VOTEQ ] Received qdevice op 1 req from node 1 [QDevice]
When it does not work I never see the 'Received qdevice...' line in the log.
Is there something else I can look for to find this problem?  Some other
test you can think of?  Maybe some configuration of the votequorum
service?
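
The log check described above could be scripted roughly like this. It is a
minimal sketch that matches only the two VOTEQ message texts quoted in this
mail; the function name and the shape of the returned summary are invented
for the example, and the log lines must be supplied by the caller because
the log file location differs between distributions.

```python
import re
from typing import Dict, Iterable, List, Union

# Patterns built from the two log lines quoted in this thread.
CALLBACK_RE = re.compile(r"\[VOTEQ\s*\] Sending quorum callback, quorate = (\d)")
REGISTER_RE = re.compile(r"\[VOTEQ\s*\] Received qdevice op 1 req from node (\d+)")


def summarize_voteq(lines: Iterable[str]) -> Dict[str, Union[int, List[int]]]:
    """Count quorum callbacks and collect node ids that registered a qdevice."""
    callbacks = 0
    registered_nodes: List[int] = []
    for line in lines:
        if CALLBACK_RE.search(line):
            callbacks += 1
        match = REGISTER_RE.search(line)
        if match:
            registered_nodes.append(int(match.group(1)))
    return {"callbacks": callbacks, "registered_nodes": registered_nodes}


if __name__ == "__main__":
    sample = [
        "... [VOTEQ ] Sending quorum callback, quorate = 0",
        "... [VOTEQ ] Received qdevice op 1 req from node 1 [QDevice]",
    ]
    summary = summarize_voteq(sample)
    if summary["registered_nodes"]:
        print("qdevice registration seen from nodes:", summary["registered_nodes"])
    else:
        print("no qdevice registration line found")
```

An empty "registered_nodes" list after a reboot would correspond to the
failing case, where the 'Received qdevice...' line never appears.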

A good start may be to get the cluster into the state where the qdevice is not working and then paste:
- the corosync/qdevice entries from /var/log/messages
- the output of `corosync-qdevice-tool -sv` (from the nodes) and `corosync-qnetd-tool -lv` (from the machine where qnetd is running)

The line "Received qdevice op 1 req from node 1 [QDevice]" means the qdevice is registered (= corosync-qdevice was started). If that line is really missing, it can mean corosync-qdevice is not running. The log, or running `corosync-qdevice -f -d`, should give some insight into why it is not running.

Honza





I could still use some advice with debugging this oddity.  Or have I used
up my quota of questions this year :-)

-John


Starting from a situation such as this, your only hope is to rebuild the
cluster from scratch, IMHO.


Antony.

--
Police have found a cartoonist dead in his house.  They say that details
are currently sketchy.

                                    Please reply to the list;
                                    please *don't* CC me.
_______________________________________________
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/



