Re: [ClusterLabs] Antw: [EXT] delaying start of a resource

Gabriele Bulfon Wed, 16 Dec 2020 12:06:13 -0800

Looking at the two logs, it looks like corosync decided that xst1 was offline 
while xst2 was still online.
I just issued an "ifconfig ha0 down" on xst1, so I would expect that neither 
node can see the other, yet I see these same lines in both the xst1 and xst2 
logs:
 
Dec 16 15:08:56 [667]    pengine:  warning: pe_fence_node:      Cluster node xstha1 will be fenced: peer is no longer part of the cluster
Dec 16 15:08:56 [667]    pengine:  warning: determine_online_status:    Node xstha1 is unclean
Dec 16 15:08:56 [667]    pengine:     info: determine_online_status_fencing:    Node xstha2 is active
Dec 16 15:08:56 [667]    pengine:     info: determine_online_status:    Node xstha2 is online
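
Since the same scheduler messages are reported in both logs, it may help to extract the node decisions mechanically and compare them per log file. The following is only a minimal sketch: the `summarize` helper and the two regular expressions are assumptions written against the message format shown above, not part of Pacemaker.

```python
import re

# Sample lines copied from the log excerpt above (wrapped lines rejoined).
SAMPLE = """\
Dec 16 15:08:56 [667]    pengine:  warning: pe_fence_node:      Cluster node xstha1 will be fenced: peer is no longer part of the cluster
Dec 16 15:08:56 [667]    pengine:  warning: determine_online_status:    Node xstha1 is unclean
Dec 16 15:08:56 [667]    pengine:     info: determine_online_status_fencing:    Node xstha2 is active
Dec 16 15:08:56 [667]    pengine:     info: determine_online_status:    Node xstha2 is online
"""

# Patterns are assumptions based only on the lines quoted in this thread.
FENCE_RE = re.compile(r"pe_fence_node:\s+Cluster node (\S+) will be fenced")
STATUS_RE = re.compile(r"determine_online_status:\s+Node (\S+) is (\w+)")


def summarize(log_text):
    """Return (nodes_to_fence, status_by_node) found in pengine log text."""
    to_fence = set()
    status = {}
    for line in log_text.splitlines():
        m = FENCE_RE.search(line)
        if m:
            to_fence.add(m.group(1))
        m = STATUS_RE.search(line)
        if m:
            status[m.group(1)] = m.group(2)
    return to_fence, status


if __name__ == "__main__":
    fence, status = summarize(SAMPLE)
    print("to fence:", sorted(fence))
    print("status:", status)
```

Running the same function over the text of each attached log would show whether the two nodes really reached the same conclusion about which peer is to be fenced.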
 
Why xst2 and not xst1?
I would expect no action at all in this case until stonith is done.
Instead, it goes on with:
 
Dec 16 15:08:56 [667]    pengine:  warning: custom_action:      Action xstha1_san0_IP_stop_0 on xstha1 is unrunnable (offline)
Dec 16 15:08:56 [667]    pengine:  warning: custom_action:      Action zpool_data_stop_0 on xstha1 is unrunnable (offline)
Dec 16 15:08:56 [667]    pengine:  warning: custom_action:      Action xstha2-stonith_stop_0 on xstha1 is unrunnable (offline)
Dec 16 15:08:56 [667]    pengine:  warning: custom_action:      Action xstha2-stonith_stop_0 on xstha1 is unrunnable (offline)
 
It is trying to stop everything on xst1 (but that is not runnable).
Then:
 
Dec 16 15:08:56 [667]    pengine:   notice: LogAction:   * Move       xstha1_san0_IP     ( xstha1 -> xstha2 )
Dec 16 15:08:56 [667]    pengine:     info: LogActions: Leave   xstha2_san0_IP  (Started xstha2)
Dec 16 15:08:56 [667]    pengine:   notice: LogAction:   * Move       zpool_data         ( xstha1 -> xstha2 )
Dec 16 15:08:56 [667]    pengine:     info: LogActions: Leave   xstha1-stonith  (Started xstha2)
Dec 16 15:08:56 [667]    pengine:   notice: LogAction:   * Stop       xstha2-stonith     (           xstha1 )   due to node availability
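
To see at a glance what the scheduler planned in this transition, the LogAction lines can be reduced to (action, resource, detail) tuples. This is a rough sketch only: `planned_actions` and its regular expression are assumptions that match the five lines quoted above and may not cover other message variants.

```python
import re

# Sample lines copied from the log excerpt above (wrapped lines rejoined).
SAMPLE = """\
Dec 16 15:08:56 [667]    pengine:   notice: LogAction:   * Move       xstha1_san0_IP     ( xstha1 -> xstha2 )
Dec 16 15:08:56 [667]    pengine:     info: LogActions: Leave   xstha2_san0_IP  (Started xstha2)
Dec 16 15:08:56 [667]    pengine:   notice: LogAction:   * Move       zpool_data         ( xstha1 -> xstha2 )
Dec 16 15:08:56 [667]    pengine:     info: LogActions: Leave   xstha1-stonith  (Started xstha2)
Dec 16 15:08:56 [667]    pengine:   notice: LogAction:   * Stop       xstha2-stonith     (           xstha1 )   due to node availability
"""

# Matches both "LogAction:   * Move ..." and "LogActions: Leave ..." lines.
ACTION_RE = re.compile(
    r"LogActions?:\s+(?:\*\s+)?(\w+)\s+(\S+)\s+\(\s*(.*?)\s*\)"
)


def planned_actions(log_text):
    """Return a list of (action, resource, detail) tuples from the log text."""
    actions = []
    for line in log_text.splitlines():
        m = ACTION_RE.search(line)
        if m:
            actions.append(m.groups())
    return actions


if __name__ == "__main__":
    for action, resource, detail in planned_actions(SAMPLE):
        print(f"{action:6} {resource:16} {detail}")
```

For the excerpt above this lists two Move actions towards xstha2, two Leave actions, and one Stop on xstha1, which matches the reading that xstha2 is treated as the surviving node.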
 
It is as if xst2 has been elected to be the running node, not knowing that 
xst1 will kill xst2 within a few seconds.
 
What is wrong here?
 
Thanks!
Gabriele
 
 
Sonicle S.r.l. : http://www.sonicle.com
Music: http://www.gabrielebulfon.com
eXoplanets : https://gabrielebulfon.bandcamp.com/album/exoplanets


 


From: Gabriele Bulfon <gbul...@sonicle.com>
To: Cluster Labs - All topics related to open-source clustering welcomed 
<users@clusterlabs.org>
Date: 16 December 2020 15:56:28 CET
Subject: Re: [ClusterLabs] Antw: [EXT] delaying start of a resource



 
Thanks, here are the logs; they contain information about how it tried to 
start resources on the nodes.
Keep in mind that node1 was already running the resources, and I simulated a 
problem by bringing down the ha interface.
 
Gabriele
 
 
 




----------------------------------------------------------------------------------

From: Ulrich Windl <ulrich.wi...@rz.uni-regensburg.de>
To: users@clusterlabs.org 
Date: 16 December 2020 15:45:36 CET
Subject: [ClusterLabs] Antw: [EXT] delaying start of a resource


>>> Gabriele Bulfon <gbul...@sonicle.com> wrote on 16.12.2020 at 15:32 in
message <1523391015.734.1608129155836@www>:
> Hi, I now have a two-node cluster using stonith with different 
> pcmk_delay_base values, so that node 1 has priority to stonith node 2 in 
> case of problems.
> 
> There is still one problem, though: node 2 delays its stonith action for 
> 10 seconds and node 1 for just 1, but node 2 does not delay the start of 
> resources. So while node 2 has not yet been powered off by node 1 (and is 
> waiting out its delay to power off node 1), it actually starts resources, 
> causing a window of a few seconds in which both the NFS IP and the ZFS pool 
> (!!!!!) are mounted by both nodes!
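
The timing described in the quoted question can be written down as a small model. This is only a simplified sketch of the race as the question describes it (each node waits its own delay and then fences the peer), not how Pacemaker schedules fencing internally; `fencing_race` and `poweroff_latency` are hypothetical names introduced for illustration.

```python
def fencing_race(delays, poweroff_latency=0.0):
    """Simplified two-node model of a delayed fencing race.

    delays: dict mapping node name -> seconds that node waits before
            fencing its peer (the role played by pcmk_delay_base in the
            question above).
    poweroff_latency: assumed seconds the fence device needs to actually
            power the target off (hypothetical parameter).

    Returns (winner, loser, seconds_until_loser_is_down).
    """
    if len(delays) != 2:
        raise ValueError("this sketch models a two-node cluster only")
    # The node with the shorter delay fires first and wins the race.
    (winner, d_win), (loser, d_lose) = sorted(
        delays.items(), key=lambda kv: kv[1]
    )
    if d_win == d_lose:
        raise ValueError("equal delays: both nodes may fence each other")
    return winner, loser, d_win + poweroff_latency


if __name__ == "__main__":
    winner, loser, t_down = fencing_race({"node1": 1, "node2": 10},
                                         poweroff_latency=3.0)
    print(f"{winner} fences {loser}; {loser} stays up for about {t_down} s")
```

Under these assumptions the losing node remains alive for the winner's delay plus the power-off latency, which is the window in which the question reports resources being started on both nodes.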

AFAIK Pacemaker will not start resources on a node that is scheduled for 
stonith. Moreover, Pacemaker will try to stop resources on a node scheduled 
for stonith so that it can start them elsewhere.

> How can I delay node 2 resource start until the delayed stonith action is 
> done? Or how can I just delay the resource start so I can make it larger than 
> its pcmk_delay_base?

We probably need to see logs and configs to understand.

> 
> Also, it was suggested that I set "stonith-enabled=true", but I don't know 
> where to set this flag (cib-bootstrap-options is not happy with it...).

I think it's on by default, so you must have set it to false.
In crm shell it is "configure# property stonith-enabled=...".

Regards,
Ulrich


_______________________________________________
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/



<<stonith1.txt>>
<<stonith2.txt>>

