- **Milestone**: 4.5.2 --> 4.6.2


---

**[tickets:#1566] Cluster reset happened during switchover due to AMF director heart beat timeout.**

**Status:** unassigned
**Milestone:** 4.6.2
**Created:** Sat Oct 24, 2015 06:25 AM UTC by Ritu Raj
**Last Updated:** Sat Oct 24, 2015 06:29 AM UTC
**Owner:** nobody


Changeset: 6901
70 nodes configured with PBE 
Application: Nway configured on all the nodes

Issues Observed:
> Cluster reset happened during switchover due to AMF director heart beat 
> timeout.

Steps Performed:
* AMF (Nway) application brought up on the nodes.
* Some operations were performed on the Nway application hosted on PL-65 to PL-68.
* Stopped opensaf on the nodes PL-65 to PL-68.
* Two switchovers were performed on the cluster. The first switchover succeeded
without any issue. During the second switchover, the old standby controller
(SC-2) rebooted while it was being promoted to the ACTIVE state.

Oct 22 15:45:10 SLES-64BIT-SLOT2 osafamfnd[2580]: NO Assigning 
'safSi=SC-2N,safApp=OpenSAF' ACTIVE to 'safSu=SC-2,safSg=2N,safApp=OpenSAF'
Oct 22 15:45:10 SLES-64BIT-SLOT2 osafimmd[2505]: WA IMMD not re-electing coord 
for switch-over (si-swap) coord at (2020f)
........

Oct 22 15:45:10 SLES-64BIT-SLOT2 osafimmnd[2516]: NO Implementer (applier) 
connected: 130 (@OpenSafImmReplicatorA) <10675, 2020f>
Oct 22 15:45:10 SLES-64BIT-SLOT2 osafamfnd[2580]: NO Assigned 
'safSi=SC-2N,safApp=OpenSAF' ACTIVE to 'safSu=SC-2,safSg=2N,safApp=OpenSAF'
Oct 22 15:45:10 SLES-64BIT-SLOT2 osafamfnd[2580]: NO 
'safComp=CLM,safSu=SC-2,safSg=2N,safApp=OpenSAF' faulted due to 'avaDown' : 
Recovery is 'nodeFailfast'
Oct 22 15:45:10 SLES-64BIT-SLOT2 osafamfnd[2580]: ER 
safComp=CLM,safSu=SC-2,safSg=2N,safApp=OpenSAF Faulted due to:avaDown Recovery 
is:nodeFailfast
Oct 22 15:45:10 SLES-64BIT-SLOT2 osafamfnd[2580]: Rebooting OpenSAF NodeId = 
131599 EE Name = , Reason: Component faulted: recovery is node failfast, 
OwnNodeId = 131599, SupervisionTime = 60
Oct 22 15:45:10 SLES-64BIT-SLOT2 opensaf_reboot: Rebooting local node; 
timeout=60

* After SC-2 went down for reboot, SC-1 tried to become active, during which the
AMF director heart beat timed out and the cluster reset happened.

Oct 22 15:54:53 SLES-64BIT-SLOT1 osafamfd[2557]: NO 
'safRankedSu=safSu=dummy_NWay_1Norm_4\,safSg=SG_dummy_n\,safApp=N_6,safSi=dummy_NWay_1Norm_6,safApp=N_6'
Oct 22 15:54:53 SLES-64BIT-SLOT1 osafamfnd[2567]: ER AMF director heart beat 
timeout, generating core for amfd
Oct 22 15:54:54 SLES-64BIT-SLOT1 osafamfnd[2567]: Rebooting OpenSAF NodeId = 
131343 EE Name = , Reason: AMF director heart beat timeout, OwnNodeId = 131343, 
SupervisionTime = 60
Oct 22 15:54:54 SLES-64BIT-SLOT1 opensaf_reboot: Rebooting local node; 
timeout=60
Oct 22 15:54:55 SLES-64BIT-SLOT1 osafimmnd[2503]: WA MDS Send Failed
Oct 22 15:54:55 SLES-64BIT-SLOT1 osafimmnd[2503]: WA Error code 2 returned for 
message type 16 - ignoring
Oct 22 15:54:55 SLES-64BIT-SLOT1 osafimmnd[2503]: NO Implementer locally 
disconnected. Marking it as doomed 136 <4871, 2010f> (@safAmfService2010f)

* Traces are not available
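
The syslog excerpts above refer to nodes in two notations: osafamfnd prints the node ID in decimal (`NodeId = 131599`), while the IMM messages print it in hex (`2020f`, `2010f`). The following is a minimal sketch, not part of OpenSAF, of how the reboot messages could be parsed to correlate the two. The regular expression and the `parse_reboot` helper are illustrative assumptions, and the sketch assumes the mail-wrapped log lines have been joined back into single lines first.

```python
import re

# Assumed pattern for the osafamfnd reboot message, modelled only on the two
# excerpts in this ticket (with the wrapped lines joined into one).
REBOOT_RE = re.compile(
    r"Rebooting OpenSAF NodeId = (?P<node_id>\d+) EE Name = .*?, "
    r"Reason: (?P<reason>.*), OwnNodeId = \d+"
)


def parse_reboot(line):
    """Return node ID (decimal and hex) and reboot reason, or None if no match."""
    match = REBOOT_RE.search(line)
    if match is None:
        return None
    node_id = int(match.group("node_id"))
    return {
        "node_id": node_id,
        # Hex form, as printed in the IMMD/IMMND messages
        "node_id_hex": format(node_id, "x"),
        "reason": match.group("reason"),
    }


# The two reboot messages from this ticket, unwrapped.
SC2_LINE = (
    "Oct 22 15:45:10 SLES-64BIT-SLOT2 osafamfnd[2580]: Rebooting OpenSAF "
    "NodeId = 131599 EE Name = , Reason: Component faulted: recovery is node "
    "failfast, OwnNodeId = 131599, SupervisionTime = 60"
)
SC1_LINE = (
    "Oct 22 15:54:54 SLES-64BIT-SLOT1 osafamfnd[2567]: Rebooting OpenSAF "
    "NodeId = 131343 EE Name = , Reason: AMF director heart beat timeout, "
    "OwnNodeId = 131343, SupervisionTime = 60"
)

for log_line in (SC2_LINE, SC1_LINE):
    info = parse_reboot(log_line)
    print(info["node_id_hex"], "-", info["reason"])
# prints:
# 2020f - Component faulted: recovery is node failfast
# 2010f - AMF director heart beat timeout
```

With this mapping, the `2020f` in the IMMD "coord at (2020f)" message corresponds to the node that rebooted at 15:45:10, and the `2010f` in the "Marking it as doomed" message corresponds to the node that rebooted at 15:54:54.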

