Hi!

Generally I use "crm_mon -1Arfj" to see the cluster status, and I suspect it may 
be location constraints or stickiness preventing resource balancing. Without 
the config it’s hard to guess, however.

Kind regards,
Ulrich Windl

From: Users <users-boun...@clusterlabs.org> On Behalf Of chenzu...@gmail.com
Sent: Friday, June 6, 2025 10:20 AM
To: users <users@clusterlabs.org>
Subject: [EXT] [ClusterLabs] Resource is Unbalanced After Powering Off One Node



Hi all,
I am writing to report an issue with uneven resource migration in our Lustre 
cluster. Below are the details:

1. Background:
We have 3 physical nodes, each hosting 2 virtual machines: lustre-mds-nodexx 
(containing 2 MDTs) and lustre-oss-nodexx (containing 8 OSTs, plus the MGS on 
one of them).
We are using Lustre version 2.15.5 along with Pacemaker (2.1.0) for cluster 
management.

2. Problem:
After powering off lustre-oss-node144 with the command "virsh destroy 
lustre-oss-node144", its resources were not redistributed evenly: all of them 
moved to lustre-oss-node31.

3. Resource Status Before and After Powering Off lustre-oss-node144:
Before:
[root@lustre-oss-node31 ~]# pcs status
Cluster name: oss_cluster
Cluster Summary:
  * Stack: corosync (Pacemaker is running)
  * Current DC: lustre-oss-node144 (version 2.1.7-5.el8_10-0f7f88312) - partition with quorum
  * Last updated: Fri Jun  6 14:10:54 2025 on lustre-oss-node31
  * Last change:  Fri Jun  6 14:06:46 2025 by root via root on lustre-oss-node31
  * 3 nodes configured
  * 28 resource instances configured

Node List:
  * Online: [ lustre-oss-node31 lustre-oss-node135 lustre-oss-node144 ]

Full List of Resources:
  * vmfence_lustre-oss-node31   (stonith:fence_xvm):     Started lustre-oss-node144
  * vmfence_lustre-oss-node144  (stonith:fence_xvm):     Started lustre-oss-node135
  * vmfence_lustre-oss-node135  (stonith:fence_xvm):     Started lustre-oss-node31
  * mgt (ocf::heartbeat:Filesystem):     Started lustre-oss-node31
  * ost-0       (ocf::heartbeat:Filesystem):     Started lustre-oss-node135
  * ost-3       (ocf::heartbeat:Filesystem):     Started lustre-oss-node135
  * ost-6       (ocf::heartbeat:Filesystem):     Started lustre-oss-node135
  * ost-9       (ocf::heartbeat:Filesystem):     Started lustre-oss-node135
  * ost-12      (ocf::heartbeat:Filesystem):     Started lustre-oss-node135
  * ost-15      (ocf::heartbeat:Filesystem):     Started lustre-oss-node135
  * ost-18      (ocf::heartbeat:Filesystem):     Started lustre-oss-node135
  * ost-21      (ocf::heartbeat:Filesystem):     Started lustre-oss-node135
  * ost-1       (ocf::heartbeat:Filesystem):     Started lustre-oss-node144
  * ost-4       (ocf::heartbeat:Filesystem):     Started lustre-oss-node144
  * ost-7       (ocf::heartbeat:Filesystem):     Started lustre-oss-node144
  * ost-10      (ocf::heartbeat:Filesystem):     Started lustre-oss-node144
  * ost-13      (ocf::heartbeat:Filesystem):     Started lustre-oss-node144
  * ost-16      (ocf::heartbeat:Filesystem):     Started lustre-oss-node144
  * ost-19      (ocf::heartbeat:Filesystem):     Started lustre-oss-node144
  * ost-22      (ocf::heartbeat:Filesystem):     Started lustre-oss-node144
  * ost-2       (ocf::heartbeat:Filesystem):     Started lustre-oss-node31
  * ost-5       (ocf::heartbeat:Filesystem):     Started lustre-oss-node31
  * ost-8       (ocf::heartbeat:Filesystem):     Started lustre-oss-node31
  * ost-11      (ocf::heartbeat:Filesystem):     Started lustre-oss-node31
  * ost-14      (ocf::heartbeat:Filesystem):     Started lustre-oss-node31
  * ost-17      (ocf::heartbeat:Filesystem):     Started lustre-oss-node31
  * ost-20      (ocf::heartbeat:Filesystem):     Started lustre-oss-node31
  * ost-23      (ocf::heartbeat:Filesystem):     Started lustre-oss-node31

After:
[root@lustre-oss-node31 ~]# date;pcs status
Fri Jun  6 14:12:50 CST 2025
Cluster name: oss_cluster
Cluster Summary:
  * Stack: corosync (Pacemaker is running)
  * Current DC: lustre-oss-node135 (version 2.1.7-5.el8_10-0f7f88312) - partition with quorum
  * Last updated: Fri Jun  6 14:12:50 2025 on lustre-oss-node31
  * Last change:  Fri Jun  6 14:06:46 2025 by root via root on lustre-oss-node31
  * 3 nodes configured
  * 28 resource instances configured

Node List:
  * Online: [ lustre-oss-node31 lustre-oss-node135 ]
  * OFFLINE: [ lustre-oss-node144 ]

Full List of Resources:
  * vmfence_lustre-oss-node31   (stonith:fence_xvm):     Started lustre-oss-node135
  * vmfence_lustre-oss-node144  (stonith:fence_xvm):     Started lustre-oss-node135
  * vmfence_lustre-oss-node135  (stonith:fence_xvm):     Started lustre-oss-node31
  * mgt (ocf::heartbeat:Filesystem):     Started lustre-oss-node31
  * ost-0       (ocf::heartbeat:Filesystem):     Started lustre-oss-node135
  * ost-3       (ocf::heartbeat:Filesystem):     Started lustre-oss-node135
  * ost-6       (ocf::heartbeat:Filesystem):     Started lustre-oss-node135
  * ost-9       (ocf::heartbeat:Filesystem):     Started lustre-oss-node135
  * ost-12      (ocf::heartbeat:Filesystem):     Started lustre-oss-node135
  * ost-15      (ocf::heartbeat:Filesystem):     Started lustre-oss-node135
  * ost-18      (ocf::heartbeat:Filesystem):     Started lustre-oss-node135
  * ost-21      (ocf::heartbeat:Filesystem):     Started lustre-oss-node135
  * ost-1       (ocf::heartbeat:Filesystem):     Started lustre-oss-node31
  * ost-4       (ocf::heartbeat:Filesystem):     Started lustre-oss-node31
  * ost-7       (ocf::heartbeat:Filesystem):     Started lustre-oss-node31
  * ost-10      (ocf::heartbeat:Filesystem):     Started lustre-oss-node31
  * ost-13      (ocf::heartbeat:Filesystem):     Started lustre-oss-node31
  * ost-16      (ocf::heartbeat:Filesystem):     Started lustre-oss-node31
  * ost-19      (ocf::heartbeat:Filesystem):     Started lustre-oss-node31
  * ost-22      (ocf::heartbeat:Filesystem):     Started lustre-oss-node31
  * ost-2       (ocf::heartbeat:Filesystem):     Started lustre-oss-node31
  * ost-5       (ocf::heartbeat:Filesystem):     Started lustre-oss-node31
  * ost-8       (ocf::heartbeat:Filesystem):     Started lustre-oss-node31
  * ost-11      (ocf::heartbeat:Filesystem):     Started lustre-oss-node31
  * ost-14      (ocf::heartbeat:Filesystem):     Started lustre-oss-node31
  * ost-17      (ocf::heartbeat:Filesystem):     Started lustre-oss-node31
  * ost-20      (ocf::heartbeat:Filesystem):     Started lustre-oss-node31
  * ost-23      (ocf::heartbeat:Filesystem):     Started lustre-oss-node31

Daemon Status:
  corosync: active/enabled
  pacemaker: active/enabled
  pcsd: active/enabled
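
The imbalance is easier to see as a per-node count: in the "After" listing, 
lustre-oss-node31 runs 16 OSTs while lustre-oss-node135 still runs 8. The 
following is only a minimal sketch for tallying such a listing, assuming 
"pcs status" text in the single-line format shown above. The function name 
count_started_per_node and the abbreviated sample are illustrative, not part 
of pcs or Pacemaker, and entries wrapped across two lines by the mail client 
would have to be rejoined first.

```python
import re
from collections import Counter

# Matches single-line resource entries such as
#   "  * ost-1  (ocf::heartbeat:Filesystem):  Started lustre-oss-node31"
RESOURCE_LINE = re.compile(
    r"^\s*\*\s+(\S+)\s+\(([^)]+)\):\s+Started\s+(\S+)\s*$"
)


def count_started_per_node(status_text):
    """Return a Counter mapping node name -> number of resources started there."""
    counts = Counter()
    for line in status_text.splitlines():
        match = RESOURCE_LINE.match(line)
        if match:
            counts[match.group(3)] += 1
    return counts


# Abbreviated, hypothetical sample in the same format as the listing above.
sample = """\
  * mgt (ocf::heartbeat:Filesystem):     Started lustre-oss-node31
  * ost-0       (ocf::heartbeat:Filesystem):     Started lustre-oss-node135
  * ost-1       (ocf::heartbeat:Filesystem):     Started lustre-oss-node31
  * ost-2       (ocf::heartbeat:Filesystem):     Started lustre-oss-node31
"""

for node, count in sorted(count_started_per_node(sample).items()):
    print(node, count)
```

Feeding the full "Before" and "After" listings through the same function 
gives a quick side-by-side view of how the load shifted.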

4. Logs:
Jun 06 14:11:12 lustre-oss-node135 pacemaker-schedulerd[1069268] (log_list_item)        notice: Actions: Move       ost-1     (  lustre-oss-node144 -> lustre-oss-node31 )
Jun 06 14:11:12 lustre-oss-node135 pacemaker-schedulerd[1069268] (log_list_item)        notice: Actions: Move       ost-4     (  lustre-oss-node144 -> lustre-oss-node31 )
Jun 06 14:11:12 lustre-oss-node135 pacemaker-schedulerd[1069268] (log_list_item)        notice: Actions: Move       ost-7     (  lustre-oss-node144 -> lustre-oss-node31 )
Jun 06 14:11:12 lustre-oss-node135 pacemaker-schedulerd[1069268] (log_list_item)        notice: Actions: Move       ost-10    (  lustre-oss-node144 -> lustre-oss-node31 )
Jun 06 14:11:12 lustre-oss-node135 pacemaker-schedulerd[1069268] (log_list_item)        notice: Actions: Move       ost-13    (  lustre-oss-node144 -> lustre-oss-node31 )
Jun 06 14:11:12 lustre-oss-node135 pacemaker-schedulerd[1069268] (log_list_item)        notice: Actions: Move       ost-16    (  lustre-oss-node144 -> lustre-oss-node31 )
Jun 06 14:11:12 lustre-oss-node135 pacemaker-schedulerd[1069268] (log_list_item)        notice: Actions: Move       ost-19    (  lustre-oss-node144 -> lustre-oss-node31 )
Jun 06 14:11:12 lustre-oss-node135 pacemaker-schedulerd[1069268] (log_list_item)        notice: Actions: Move       ost-22    (  lustre-oss-node144 -> lustre-oss-node31 )
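
To confirm from the scheduler log alone where the resources were sent, the 
"Actions: Move" entries can be tallied by destination node. This is a rough 
sketch that assumes the log text looks like the excerpt above. The helper 
name tally_move_targets is hypothetical, and whitespace is collapsed first so 
that entries wrapped over several lines still match.

```python
import re
from collections import Counter

# Matches the scheduler text "Actions: Move <resource> ( <from> -> <to> )"
# once all runs of whitespace have been collapsed to single spaces.
MOVE = re.compile(
    r"Actions:\s+Move\s+(\S+)\s+\(\s*(\S+)\s+->\s+(\S+)\s*\)"
)


def tally_move_targets(log_text):
    """Return a Counter mapping destination node -> number of Move actions."""
    # Joining on single spaces undoes any line wrapping inside an entry.
    flat = " ".join(log_text.split())
    return Counter(dest for _resource, _source, dest in MOVE.findall(flat))


# Two entries copied from the excerpt above, still wrapped over three lines.
sample = """\
Jun 06 14:11:12 lustre-oss-node135 pacemaker-schedulerd[1069268]
(log_list_item)        notice: Actions: Move       ost-1
 (  lustre-oss-node144 -> lustre-oss-node31 )
Jun 06 14:11:12 lustre-oss-node135 pacemaker-schedulerd[1069268]
(log_list_item)        notice: Actions: Move       ost-4
 (  lustre-oss-node144 -> lustre-oss-node31 )
"""

for node, moves in tally_move_targets(sample).items():
    print(node, moves)
```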

5. Attachments:
The attached files include the configuration (config.txt) and the logs 
(node135.log) from the time of the uneven migration.

Thank you for your attention and support.
Best regards


________________________________
chenzu...@gmail.com