- **Milestone**: 4.5.2 --> 4.6.2
---
**[tickets:#1566] Cluster reset happened during switchover due to AMF director heart beat timeout.**
**Status:** unassigned
**Milestone:** 4.6.2
**Created:** Sat Oct 24, 2015 06:25 AM UTC by Ritu Raj
**Last Updated:** Sat Oct 24, 2015 06:29 AM UTC
**Owner:** nobody
Changeset: 6901
70 nodes configured with PBE
Application: Nway configured on all the nodes
Issues Observed:
> Cluster reset happened during switchover due to AMF director heart beat
> timeout.
Steps Performed:
* The AMF (Nway) application was brought up on the nodes.
* Some operations were performed on the Nway application hosted on PL-65 to PL-68.
* OpenSAF was stopped on nodes PL-65 to PL-68.
* Two switchovers were performed on the cluster. The first switchover succeeded
without any issue. During the second switchover, the old standby controller
(SC-2) rebooted while it was being promoted to the ACTIVE state.
Oct 22 15:45:10 SLES-64BIT-SLOT2 osafamfnd[2580]: NO Assigning 'safSi=SC-2N,safApp=OpenSAF' ACTIVE to 'safSu=SC-2,safSg=2N,safApp=OpenSAF'
Oct 22 15:45:10 SLES-64BIT-SLOT2 osafimmd[2505]: WA IMMD not re-electing coord for switch-over (si-swap) coord at (2020f)
........
Oct 22 15:45:10 SLES-64BIT-SLOT2 osafimmnd[2516]: NO Implementer (applier) connected: 130 (@OpenSafImmReplicatorA) <10675, 2020f>
Oct 22 15:45:10 SLES-64BIT-SLOT2 osafamfnd[2580]: NO Assigned 'safSi=SC-2N,safApp=OpenSAF' ACTIVE to 'safSu=SC-2,safSg=2N,safApp=OpenSAF'
Oct 22 15:45:10 SLES-64BIT-SLOT2 osafamfnd[2580]: NO 'safComp=CLM,safSu=SC-2,safSg=2N,safApp=OpenSAF' faulted due to 'avaDown' : Recovery is 'nodeFailfast'
Oct 22 15:45:10 SLES-64BIT-SLOT2 osafamfnd[2580]: ER safComp=CLM,safSu=SC-2,safSg=2N,safApp=OpenSAF Faulted due to:avaDown Recovery is:nodeFailfast
Oct 22 15:45:10 SLES-64BIT-SLOT2 osafamfnd[2580]: Rebooting OpenSAF NodeId = 131599 EE Name = , Reason: Component faulted: recovery is node failfast, OwnNodeId = 131599, SupervisionTime = 60
Oct 22 15:45:10 SLES-64BIT-SLOT2 opensaf_reboot: Rebooting local node; timeout=60
* After SC-2 went down for reboot, SC-1 tried to become active, during which an
AMF director heartbeat timeout occurred and the cluster was reset.
Oct 22 15:54:53 SLES-64BIT-SLOT1 osafamfd[2557]: NO 'safRankedSu=safSu=dummy_NWay_1Norm_4\,safSg=SG_dummy_n\,safApp=N_6,safSi=dummy_NWay_1Norm_6,safApp=N_6'
Oct 22 15:54:53 SLES-64BIT-SLOT1 osafamfnd[2567]: ER AMF director heart beat timeout, generating core for amfd
Oct 22 15:54:54 SLES-64BIT-SLOT1 osafamfnd[2567]: Rebooting OpenSAF NodeId = 131343 EE Name = , Reason: AMF director heart beat timeout, OwnNodeId = 131343, SupervisionTime = 60
Oct 22 15:54:54 SLES-64BIT-SLOT1 opensaf_reboot: Rebooting local node; timeout=60
Oct 22 15:54:55 SLES-64BIT-SLOT1 osafimmnd[2503]: WA MDS Send Failed
Oct 22 15:54:55 SLES-64BIT-SLOT1 osafimmnd[2503]: WA Error code 2 returned for message type 16 - ignoring
Oct 22 15:54:55 SLES-64BIT-SLOT1 osafimmnd[2503]: NO Implementer locally disconnected. Marking it as doomed 136 <4871, 2010f> (@safAmfService2010f)
* Traces are not available.
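When correlating the two reboots with the IMM messages, note that the decimal NodeId in the amfnd reboot lines is the same value as the hexadecimal node ID printed elsewhere: 131599 = 0x2020f (the node in the SC-2 si-swap and applier messages) and 131343 = 0x2010f (the node of @safAmfService2010f on SC-1). The following is a minimal Python sketch, assuming the syslog entries are one per line and follow exactly the format quoted in this ticket; the regex and the `reboot_events` helper are hypothetical illustrations, not part of OpenSAF.

```python
import re

# Reboot lines copied from this ticket's syslog excerpts (one entry per line).
LOG = """\
Oct 22 15:45:10 SLES-64BIT-SLOT2 osafamfnd[2580]: Rebooting OpenSAF NodeId = 131599 EE Name = , Reason: Component faulted: recovery is node failfast, OwnNodeId = 131599, SupervisionTime = 60
Oct 22 15:54:54 SLES-64BIT-SLOT1 osafamfnd[2567]: Rebooting OpenSAF NodeId = 131343 EE Name = , Reason: AMF director heart beat timeout, OwnNodeId = 131343, SupervisionTime = 60
"""

# Hypothetical pattern: captures host name, decimal node ID and reason text.
REBOOT_RE = re.compile(
    r"^\w{3} +\d+ [\d:]+ (?P<host>\S+) osafamfnd\[\d+\]: Rebooting OpenSAF "
    r"NodeId = (?P<node>\d+) EE Name = .*?, Reason: (?P<reason>.*), OwnNodeId"
)


def reboot_events(text):
    """Return (host, node_id_hex, reason) for every amfnd reboot line."""
    events = []
    for line in text.splitlines():
        m = REBOOT_RE.match(line)
        if m:
            # format(n, "x") yields lowercase hex without a 0x prefix,
            # the same form the IMM messages use (e.g. 2020f).
            events.append((m["host"], format(int(m["node"]), "x"), m["reason"]))
    return events


for host, node_hex, reason in reboot_events(LOG):
    print(f"{host} node {node_hex}: {reason}")
```

Run against the two lines above, this reports SLES-64BIT-SLOT2 as node 2020f (component fault, node failfast) and SLES-64BIT-SLOT1 as node 2010f (AMF director heart beat timeout).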
---
Sent from sourceforge.net because opensaf-tickets@lists.sourceforge.net is
subscribed to https://sourceforge.net/p/opensaf/tickets/