On Thu, Jul 4, 2024 at 5:03 AM Artur Novik <freish...@gmail.com> wrote:
> Hi everybody,
>
> I faced a strange behavior, and since there was a lot of activity around
> crm_node structs in 2.1.7, I want to believe that it's a regression rather
> than new default behavior.
>
> "crm_node -i" occasionally, but very often, returns "*exit code 68*: Node
> is not known to cluster".
>
> The quick test below (taken from two different clusters, with pacemaker
> 2.1.7 and 2.1.8):
>
> ```
> [root@node1 ~]# crm_node -i
> Node is not known to cluster
> [root@node1 ~]# crm_node -i
> 1
> [root@node1 ~]# crm_node -i
> 1
> [root@node1 ~]# crm_node -i
> Node is not known to cluster
> [root@node1 ~]# for i in 1 2 3 4 5 6 7; do ssh node$i crm_node -i; done
> 1
> 2
> Node is not known to cluster
> Node is not known to cluster
> 5
> Node is not known to cluster
> 7
> [root@node1 ~]# for i in 1 2 3 4 5 6 7; do sleep 1; ssh node$i crm_node -i; done
> Node is not known to cluster
> Node is not known to cluster
> Node is not known to cluster
> Node is not known to cluster
> Node is not known to cluster
> 6
> 7
>
> [root@es-brick2 ~]# crm_node -i
> 2
> [root@es-brick2 ~]# crm_node -i
> 2
> [root@es-brick2 ~]# crm_node -i
> Node is not known to cluster
> [root@es-brick2 ~]# crm_node -i
> 2
> [root@es-brick2 ~]# rpm -qa | grep pacemaker | sort
> pacemaker-2.1.8.rc2-1.el8_10.x86_64
> pacemaker-cli-2.1.8.rc2-1.el8_10.x86_64
> pacemaker-cluster-libs-2.1.8.rc2-1.el8_10.x86_64
> pacemaker-libs-2.1.8.rc2-1.el8_10.x86_64
> pacemaker-remote-2.1.8.rc2-1.el8_10.x86_64
> pacemaker-schemas-2.1.8.rc2-1.el8_10.noarch
> ```
>
> I checked the following versions (all packages except the last one taken
> from Rocky Linux and rebuilt against corosync 3.1.8 from Rocky 8.10; the
> distro itself is Rocky Linux 8.10 too):
>
> Pacemaker version      Status
> 2.1.5 (8.8)            OK
> 2.1.6 (8.9)            OK
> 2.1.7 (8.10)           Broken
> 2.1.8-RC2 (upstream)   Broken
>
> I don't attach logs for now, since I believe it can be reproduced on
> absolutely any installation.

Hi, thanks for the report.
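Not a fix, but as a stopgap while this is being tracked down, a small retry wrapper can mask the intermittent exit code 68. This is just a generic POSIX sh sketch; the `retry` helper, attempt count, and delay are my own placeholders, not anything shipped with pacemaker:

```shell
#!/bin/sh
# retry ATTEMPTS CMD [ARGS...]
# Re-run CMD until it succeeds or ATTEMPTS runs out, sleeping 1s between
# tries; returns the last exit code on failure.
retry() {
    attempts=$1; shift
    i=1
    while :; do
        "$@" && return 0
        rc=$?
        [ "$i" -ge "$attempts" ] && return "$rc"
        i=$((i + 1))
        sleep 1
    done
}

# Hypothetical usage on a cluster node:
#   retry 5 crm_node -i
```

Of course that only papers over the symptom; the underlying lookup failure still needs to be understood.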
I can try to reproduce on 2.1.8 later, but so far I'm unable to reproduce on the current upstream main branch. I don't believe there are any major differences in the relevant code between main and 2.1.8-rc2.

I wonder if it's an issue where the controller is busy with a synchronous request when you run `crm_node -i` (which would be a bug). Can you share logs and your config?

> Thanks,
> A
> _______________________________________________
> Manage your subscription:
> https://lists.clusterlabs.org/mailman/listinfo/users
>
> ClusterLabs home: https://www.clusterlabs.org/

--
Regards,

Reid Wahl (He/Him)
Senior Software Engineer, Red Hat
RHEL High Availability - Pacemaker