Re: [Users] Controller HA mechanisms

Saha Sayandeb-G19428 Wed, 28 Nov 2007 09:23:50 -0800

Hans,

Comments below ...


> How does OpenSAF handle the following scenario:
> 
> - Controller 1 (C1) power on
> - C1 RDE starts and decides to be active since it is alone in 
> the cluster
> - C1 PSR or AMF dies due to some reason
> - Controller 2 (C2) power on
> - C2 RDE starts and gets the role standby from RDE on C1
> - C2 waits forever to get synced from C1
> 
> Some issues:
> C1 RDE claims to be active although it is not
> C1 does not reboot
> C2 does not reboot when it loses contact with the active 
> controller and is not in sync.
> C2 cannot become active if we reboot C1
> 
> Comments?

[SS] I reproduced this condition quite easily: I killed the ncs_scap
process on the one and only active controller and then ran the
get_ha_state command. As you say, the RDE on that controller still
thinks it is active, which prevents the second controller from obtaining
the active state. So this is a hole: the RDE has no clue that the
AvD+AvM has crashed. I guess we could add a role heart-beat from the
AvD+AvM to the RDE, so that the RDE always stays in sync with what's
going on and can relinquish the active state, letting the other
controller become active under such circumstances.

That said, this whole scenario, where the only controller crashes and
the second one then tries to come up, is probably not so common. Or do
you think it will be, because of the way OpenSAF waits 3 minutes before
rebooting payload blades when the AvD goes down?
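
To make the role heart-beat idea a bit more concrete, here is a rough
sketch in Python. It is not OpenSAF code: the class name, the role
strings, and the 3-second timeout are all made up for illustration, and
the real RDE would of course do this in C over its own IPC.

```python
class RoleMonitor:
    """Hypothetical stand-in for the RDE's view of the local AvD+AvM."""

    def __init__(self, timeout_s, now):
        self.timeout_s = timeout_s   # assumed value, not an OpenSAF setting
        self.role = "ACTIVE"         # alone in the cluster, so it claimed active
        self.last_beat = now

    def heartbeat(self, now):
        """Record a role heart-beat received from AvD+AvM."""
        self.last_beat = now

    def poll(self, now):
        """Periodic check: give up the active role if heart-beats stopped."""
        if self.role == "ACTIVE" and now - self.last_beat > self.timeout_s:
            self.role = "RELINQUISHED"   # lets the peer controller take over
        return self.role


rde = RoleMonitor(timeout_s=3.0, now=0.0)
rde.heartbeat(now=1.0)
print(rde.poll(now=2.0))   # ACTIVE: last heart-beat was 1 s ago
# AvD+AvM crashes here, so no more heart-beats arrive
print(rde.poll(now=5.0))   # RELINQUISHED: 4 s of silence exceeds the timeout
```

With something like this in place, get_ha_state on C1 would stop
reporting active once the heart-beats dry up, and C2's RDE would be free
to take the active role.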

Sayan

> Regards,
> Hans
> _______________________________________________
> Users mailing list
> [email protected]
> http://list.opensaf.org/maillist/listinfo/users
> 