Hans,

Comments below ...

> How does OpenSAF handle the following scenario:
>
> - Controller 1 (C1) power on
> - C1 RDE starts and decides to be active since it is alone in
>   the cluster
> - C1 PSR or AMF dies due to some reason
> - Controller 2 (C2) power on
> - C2 RDE starts and gets the role standby from RDE on C1
> - C2 waits forever to get synced from C1
>
> Some issues:
> - C1 RDE claims to be active although it is not
> - C1 does not reboot
> - C2 does not reboot when it loses contact with the active
>   controller and is not in sync
> - C2 cannot become active if we reboot C1
>
> Comments?

[SS] I simulated this condition quite easily by killing the ncs_scap
process on the one and only active controller and then running the
get_ha_state command. As you say, the RDE on this controller keeps
thinking that it is active, which prevents the second controller from
obtaining the active state. So this is a hole: the RDE has no clue that
the AvD+AvM has crashed.

I guess we could add a role heart-beat from the AvD+AvM to the RDE, so
that the RDE is always in sync with what is going on and can relinquish
the active state, letting the other controller become active under such
a circumstance.

But this whole scenario, where the only controller crashes and the
second one then tries to come up, is probably not so common. Or do you
think it will be, because of the way OpenSAF waits 3 minutes before
rebooting payload blades when AvD goes down?

Sayan

> Regards,
> Hans
> _______________________________________________
> Users mailing list
> [email protected]
> http://list.opensaf.org/maillist/listinfo/users
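
To make the role heart-beat idea concrete, here is a minimal sketch in Python. It is not OpenSAF code and does not use any OpenSAF API. It only illustrates the proposed behaviour: the RDE remembers when it last heard from AvD+AvM and gives up the active role once the heart-beats stop. The class name RoleHeartbeatMonitor, the UNDEFINED role, the 5 second timeout and the FakeClock helper are all hypothetical names chosen for illustration.

```python
import time

ACTIVE = "ACTIVE"
STANDBY = "STANDBY"
UNDEFINED = "UNDEFINED"  # hypothetical role: active state given up


class RoleHeartbeatMonitor:
    """Hypothetical RDE-side watchdog for heart-beats from AvD+AvM."""

    def __init__(self, role, timeout_s, clock=time.monotonic):
        self.role = role
        self.timeout_s = timeout_s
        self._clock = clock
        self._last_beat = clock()

    def heartbeat(self):
        """Called each time AvD+AvM reports that it is alive."""
        self._last_beat = self._clock()

    def check(self):
        """Relinquish the active role if heart-beats stopped; return the role."""
        silent_for = self._clock() - self._last_beat
        if self.role == ACTIVE and silent_for > self.timeout_s:
            self.role = UNDEFINED
        return self.role


class FakeClock:
    """Manually advanced clock so the example runs deterministically."""

    def __init__(self):
        self.now = 0.0

    def __call__(self):
        return self.now

    def advance(self, seconds):
        self.now += seconds


if __name__ == "__main__":
    clock = FakeClock()
    rde = RoleHeartbeatMonitor(ACTIVE, timeout_s=5.0, clock=clock)

    clock.advance(3.0)
    rde.heartbeat()        # AvD+AvM is still alive
    clock.advance(3.0)
    print(rde.check())     # still within the timeout

    clock.advance(10.0)    # AvD+AvM has died, no more heart-beats
    print(rde.check())     # active role has been relinquished
```

With something like this in place, the RDE on C2 would no longer be told that C1 is active once C1's AvD+AvM has gone silent, so C2 could take the active role instead of waiting forever to be synced.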
