Hi Hans,
In a system, the controller nodes that run RDE/SCAP power on by themselves.

Case: both blades in the box, power to the box is applied:
-------------------------------------------------------------

Sub case 1:
----------
RDE on node 1 becomes active. SCAP on node 1 starts but fails to complete
init successfully. We expect the platform vendor porting OpenSAF to either
configure the NID config to reboot the node on failure or to have their own
platform mechanisms do it.

Sub case 2:
----------
RDE on node 1 becomes active. SCAP on node 1 starts, completes init
successfully, and crashes immediately afterwards. Since both node 1 and
node 2 were in the box when the power was applied, we expect that, given
small variations in boot times, node 2 will already be at SCAP
initialization before node 1 has finished initializing. Since the other
RDE/SCAP is there, this situation is also solved.

Case: one blade in the box, power to the box is applied:
----------------------------------------------------------------

Sub case 1:
----------
RDE on node 1 becomes active. SCAP on node 1 starts but fails to complete
init successfully. We expect the platform vendor porting OpenSAF to either
configure the NID config to reboot the node on failure or to have their own
platform mechanisms do it.

Sub case 2:
----------
RDE on node 1 becomes active. SCAP on node 1 starts, completes init
successfully, and crashes immediately afterwards. This is a double-fault
case; a manual repair, i.e. restarting this single node, is required. If
the platform is normally run like this, the platform vendor can have their
fault manager track SCAP and, on its death, take the necessary
recovery/repair actions.
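
To summarise the four sub cases above in one place, here is a rough sketch of the decision as a small Python function. This is only an illustration of the reasoning in this mail, not OpenSAF code: the names ScapFailure, Recovery and recovery_for are made up for the example, and the actual reboot would be done by the NID config or the vendor's platform mechanisms.

```python
from enum import Enum


class ScapFailure(Enum):
    """How SCAP failed on node 1 (hypothetical names, not OpenSAF API)."""
    INIT_FAILED = "init_failed"                # sub case 1
    CRASHED_AFTER_INIT = "crashed_after_init"  # sub case 2


class Recovery(Enum):
    """Expected recovery path for each sub case."""
    REBOOT_NODE = "reboot_node"            # NID config or platform mechanism
    PEER_TAKES_OVER = "peer_takes_over"    # the other RDE/SCAP is there
    MANUAL_OR_FAULT_MANAGER = "manual_or_fault_manager"  # double fault


def recovery_for(controllers_in_box: int, failure: ScapFailure) -> Recovery:
    """Map (number of controller blades, SCAP failure) to a recovery path."""
    if controllers_in_box not in (1, 2):
        raise ValueError("expected 1 or 2 controller blades")
    if failure is ScapFailure.INIT_FAILED:
        # Same expectation in both cases: reboot the node on init failure.
        return Recovery.REBOOT_NODE
    if controllers_in_box == 2:
        # Node 2 is already initialising, so it covers the crash of node 1.
        return Recovery.PEER_TAKES_OVER
    # Single blade that crashes after init: manual restart, or a vendor
    # fault manager that tracks SCAP and repairs on its death.
    return Recovery.MANUAL_OR_FAULT_MANAGER
```

For example, recovery_for(1, ScapFailure.CRASHED_AFTER_INIT) gives Recovery.MANUAL_OR_FAULT_MANAGER, which is the only sub case where nothing recovers the node automatically unless the vendor adds a fault manager.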
Regards
Sugadeesh

> -----Original Message-----
> From: [EMAIL PROTECTED]
> [mailto:[EMAIL PROTECTED] On Behalf Of Hans Feldt
> Sent: Thursday, November 29, 2007 2:43 AM
> To: Saha Sayandeb-G19428
> Cc: [email protected]
> Subject: Re: [Users] Controller HA mechanisms
>
> > -----Original Message-----
> > From: Saha Sayandeb-G19428 [mailto:[EMAIL PROTECTED]
> > Sent: den 28 november 2007 18:23
> > To: Hans Feldt
> > Cc: [email protected]
> > Subject: RE: [Users] Controller HA mechanisms
> >
> > Hans,
> >
> > Comments below ...
> >
> > > How does OpenSAF handle the following scenario:
> > >
> > > - Controller 1 (C1) power on
> > > - C1 RDE starts and decides to be active since it is alone in the
> > >   cluster
> > > - C1 PSR or AMF dies due to some reason
> > > - Controller 2 (C2) power on
> > > - C2 RDE starts and gets the role standby from RDE on C1
> > > - C2 waits forever to get synced from C1
> > >
> > > Some issues:
> > > C1 RDE claims to be active although it is not
> > > C1 does not reboot
> > > C2 does not reboot when its looses contact with the active
> > > controller and not in sync.
> > > C2 cannot become active if we reboot C1
> > >
> > > Comments?
> >
> > [SS] I simulated this condition quite easily by simply killing the
> > ncs_scap process in the one and only active controller and then
> > running the get_ha_state command and as you say the RDE in this
> > controller still keeps thinking that it is active which prevents the
> > second controller to obtain the active state. So this is a hole as
> > the RDE has no clue that the Avd+AvM has crashed. I guess we could
> > add a role heart-beat from the Avd+AvM to the RDE to ensure that the
> > RDE is always in-synch with what's going on and can relinquish the
> > active state so that the other controller can become active under
> > such a circumstance.
> >
> > But this whole scenario of having only one controller which crashes
> > and then the second one that tries to come up is probably not so
> > common or do you think it will be because of the way OpenSAF waits 3
> > minutes before rebooting payload blades when AvD goes down?
>
> No I just stumbled on this since we're doing a lot power on/off of
> controllers and fail-overs at the moment.
>
> As a solution, what if nid stays alive and supervise its children?
> If rde or scap dies, nid reboots the system.
>
> Cheers,
> Hans
>
> > Sayan
> >
> > > Regards,
> > > Hans
> > > _______________________________________________
> > > Users mailing list
> > > [email protected]
> > > http://list.opensaf.org/maillist/listinfo/users
