On 5/29/21 12:21 AM, Strahil Nikolov wrote:
I agree -> fencing is mandatory.
Agreed that with a proper fencing setup the cluster
wouldn't have run into that state.
But still it might be interesting to find out what has
happened. Not seeing anything in the log snippet either.
Assuming you are running something systemd-based.
Did you check the journal for pacemaker to see what
systemd is thinking?
With the standard unit file, systemd should observe
pacemakerd and restart it if it goes away ungracefully.
You should be able to test this behavior by sending a
SIGKILL to pacemakerd.
pacemakerd in turn watches out for signals from the
sub-daemons it has spawned (I'm currently working
on more in-depth observation here).
So just disappearing shouldn't happen that easily.
Did you find any core-dumps?
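
The SIGKILL test suggested above could be scripted roughly as follows. This is only a sketch: it assumes the unit is named pacemaker, that the MainPID systemd reports for it is pacemakerd, and that 10 seconds is long enough for a restart. The function names are made up for illustration. Only try it on a test node.

```python
import os
import signal
import subprocess
import time


def main_pid(unit="pacemaker"):
    """Return the MainPID systemd reports for the unit (0 if it is not running)."""
    out = subprocess.run(
        ["systemctl", "show", unit, "--property=MainPID", "--value"],
        capture_output=True, text=True, check=True,
    ).stdout.strip()
    return int(out or 0)


def was_restarted(old_pid, new_pid):
    """True if the unit is running again under a different PID."""
    return new_pid != 0 and new_pid != old_pid


def kill_and_check(unit="pacemaker", wait_seconds=10,
                   get_pid=main_pid, kill=os.kill, sleep=time.sleep):
    """SIGKILL the unit's main process and report whether systemd restarted it.

    unit name and wait_seconds are assumptions, adjust them for your setup.
    """
    old_pid = get_pid(unit)
    if old_pid == 0:
        raise RuntimeError(f"{unit} is not running")
    kill(old_pid, signal.SIGKILL)   # ungraceful exit, as in the suggested test
    sleep(wait_seconds)             # give systemd time to restart the unit
    return was_restarted(old_pid, get_pid(unit))
```

Run as root, print(kill_and_check()) should report True if systemd brought pacemakerd back. The output of journalctl -u pacemaker should then show how systemd reacted.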

Regards,
Klaus

You can enable the debug logs by editing corosync.conf or /etc/sysconfig/pacemaker.

In case a simple reload doesn't work, you can put the cluster in global maintenance mode, then stop and start the stack.
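
A rough sketch of that sequence, assuming the cluster is managed with pcs (with crmsh the commands differ) and that it is run as root on a cluster node. The function names are hypothetical.

```python
import subprocess


def maintenance_restart_steps():
    """Commands to restart the stack while resources stay unmanaged."""
    return [
        # Stop Pacemaker from managing (starting/stopping) resources.
        ["pcs", "property", "set", "maintenance-mode=true"],
        # Stop and start the cluster stack on all nodes.
        ["pcs", "cluster", "stop", "--all"],
        ["pcs", "cluster", "start", "--all"],
    ]


def run_steps(steps, run=subprocess.run):
    """Run each command in order, aborting on the first failure."""
    for cmd in steps:
        run(cmd, check=True)
```

Calling run_steps(maintenance_restart_steps()) performs the restart. Once all nodes have rejoined, maintenance mode still has to be cleared again with pcs property set maintenance-mode=false.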


Best Regards,
Strahil Nikolov

    On Fri, May 28, 2021 at 22:13, Digimer
    <li...@alteeve.ca> wrote:
    On 2021-05-28 3:08 p.m., Eric Robinson wrote:
    >
    >> -----Original Message-----
    >> From: Digimer <li...@alteeve.ca>
    >> Sent: Friday, May 28, 2021 12:43 PM
    >> To: Cluster Labs - All topics related to open-source clustering welcomed
    >> <users@clusterlabs.org>; Eric Robinson <eric.robin...@psmnv.com>; Strahil
    >> Nikolov <hunter86...@yahoo.com>
    >> Subject: Re: [ClusterLabs] Cluster Stopped, No Messages?
    >>
    >> Shared storage is not what triggers the need for fencing. Coordinating actions
    >> is what triggers the need. Specifically: if you can run resources on both/all
    >> nodes at the same time, you don't need HA. If you can't, you need fencing.
    >>
    >> Digimer
    >
    > Thanks. That said, there is no fencing, so any thoughts on why the node behaved the way it did?

    Without fencing, when a communication or membership issue arises, it's
    hard to predict what will happen.

    I don't see anything in the short log snippet to indicate what happened.
    What's on the other node during the event? When did the node disappear
    and when was it rejoined, to help find relevant log entries?

    Going forward, if you want predictable and reliable operation, implement
    fencing ASAP. Fencing is required.


-- Digimer
    Papers and Projects: https://alteeve.com/w/
    "I am, somehow, less interested in the weight and convolutions of
    Einstein’s brain than in the near certainty that people of equal talent
    have lived and died in cotton fields and sweatshops." - Stephen Jay Gould


_______________________________________________
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/
