Hi Nags,
Do you agree with the point I added to this ticket?
The likely cause is that an RT update is attempted by AMFD using
the oi-handle after it has released the implementer and before it has
restored that implementer. An saImmOiRtObjectUpdate with an oi-handle
that has no implementer-name, or that has an applier-name, will result in an error.
The AMFD should maintain a state variable describing the detailed
state of its oi-handle: not simply initialized or not, but one of
primary-implementer-is-set, applier-is-set, no-implementer-is-set, or
handle-not-initialized.
That is, yes, you probably do need some new state associated with the oi-handle
to keep track of whether the handle has an implementer set on it yet.
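A minimal sketch of such a state variable, assuming C++ as used in AMFD. The enum, its enumerators and the helper function are hypothetical names chosen for illustration; they are not taken from the AMFD sources.

```cpp
// Hypothetical sketch: one enumerator per oi-handle state named in the ticket.
enum class OiHandleState {
  kHandleNotInitialized,   // no usable oi-handle at all
  kNoImplementerSet,       // handle initialized, no implementer name attached
  kPrimaryImplementerSet,  // handle carries the primary implementer name
  kApplierSet,             // handle carries an applier name
};

// An RT object update is only attempted when the handle carries a
// primary implementer name; every other state would be rejected by IMM.
constexpr bool RtUpdateAllowed(OiHandleState state) {
  return state == OiHandleState::kPrimaryImplementerSet;
}
```

With something like this, the code path that issues RT updates can test the state first and defer the update instead of calling into IMM with a handle in the wrong state.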
I assume that the AMF thread for handling BAD_HANDLE only re-initializes the
oi-handle and does not set the implementer. So the only extra thing that the
bad-handle thread needs to do is to set your new "implementer set or not"
handle state to false.
It should do this even if it is the bad-handle thread that then continues with
the task of setting the implementer.
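A sketch of how the bad-handle thread and the main thread could share that flag without holding a mutex across IMM calls. This assumes C++11 atomics are acceptable in AMFD; the struct and function names are hypothetical, and the actual IMM calls are deliberately left out.

```cpp
#include <atomic>

// Hypothetical bookkeeping kept next to the oi-handle.
struct OiHandleInfo {
  std::atomic<bool> initialized{false};
  std::atomic<bool> implementer_set{false};
};

// Called by the bad-handle thread right after it has re-initialized the handle.
// A fresh handle never has an implementer, so the flag is always cleared here,
// even if the same thread goes on to set the implementer afterwards.
void OnHandleReinitialized(OiHandleInfo& info) {
  info.implementer_set.store(false);
  info.initialized.store(true);
}

// Called by whichever thread has successfully set the implementer.
void OnImplementerSet(OiHandleInfo& info) {
  info.implementer_set.store(true);
}

// Checked by the main thread before any oi operation that needs an implementer.
bool MayDoOiOperation(const OiHandleInfo& info) {
  return info.initialized.load() && info.implementer_set.load();
}
```

The main thread only reads two flags, so it does not wait on the re-initialization thread; it simply skips or defers the oi operation while MayDoOiOperation returns false.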
I am not sure what you mean by "Imm should(rather must) not give Bad_handle/TO
in regular cases".
Currently the IMM returns BAD_HANDLE for:
1 - the interface-violation case specified by SAF for an invalid handle (e.g.
the handle was closed or never initialized).
2 - the interface-violation case specified by SAF where the handle is valid
but not in the correct state (the case of this ticket: the handle is
initialized but no implementer has been set for the oi-handle when an
oi operation is done).
3 - the "handle closed by server side" case needed for OpenSAF, i.e. IMMND
restarted.
I am not sure which of these you call "regular" and do not want to get
bad-handle for :-)
I added in this ticket a reference to ticket #1064 (enhancement) indicating
that for the state error case (2) we should instead return one of the
unambiguous state error codes: BAD_OPERATION.
But this ticket is not really about "handling" case 2.
It is about fixing AMFD so that case (2) never happens.
After all, both case 1 and case 2 are application bug cases, i.e. cases where
it makes no sense to write code for "handling" the cases.
The interface-violation cases should be eliminated so that the AMFD can assume
that ALL cases of BAD_HANDLE are of type (3) and not a bug in AMFD that it
tries to compensate for.
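To illustrate the point, here is a rough sketch of a guard around oi operations. It assumes the local handle state is tracked as suggested above; the types, the function and the two-value return code are hypothetical stand-ins, not the real IMM API or AMFD code.

```cpp
#include <functional>

// Hypothetical stand-in for the relevant SaAisErrorT values.
enum class Rc { kOk, kBadHandle };

// What the caller should do after the guarded call.
enum class Outcome { kDone, kDeferred, kReinitNeeded };

struct OiHandle {
  bool initialized = false;
  bool implementer_set = false;
};

// Runs `op` only when the handle is in a state where the call is legal.
// Cases 1 and 2 therefore never reach IMM, so a BAD_HANDLE coming back
// can be treated as case 3 (handle closed by the server side).
Outcome GuardedOiCall(OiHandle& handle, const std::function<Rc()>& op) {
  if (!handle.initialized || !handle.implementer_set) {
    return Outcome::kDeferred;  // retry later, once the implementer is restored
  }
  if (op() == Rc::kBadHandle) {
    handle.initialized = false;
    handle.implementer_set = false;
    return Outcome::kReinitNeeded;  // hand over to the re-initialization logic
  }
  return Outcome::kDone;
}
```

The design choice is that the state check replaces any attempt to "handle" cases 1 and 2 after the fact; the only recovery path left is the one for an IMMND restart.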
Does this make more sense?
/AndersBj
________________________________
From: Nagendra Kumar [mailto:nagendr...@users.sf.net]
Sent: den 18 september 2014 12:55
To: [opensaf:tickets]
Subject: [opensaf:tickets] #707 Quiesced controller failed to become Active
when the standby controller rebooted in middle of switchover
Hi Anders,
This ticket needs synchronization between the Amfd thread and the thread that is
spawned to call the imm apis when handling bad_handle.
I am not sure whether to use a mutex, as it will make the Amfd thread wait anyway.
Since most of the flows hit imm interactions, it is bound to delay Amfd HA.
So, what is the advantage of reinitializing imm in a separate thread for Amf?
Rather, Imm should(rather must) not give Bad_handle/TO in regular cases.
-Nagu
________________________________
[tickets:#707]<http://sourceforge.net/p/opensaf/tickets/707> Quiesced
controller failed to become Active when the standby controller rebooted in
middle of switchover
Status: unassigned
Milestone: 4.3.3
Created: Fri Jan 03, 2014 03:34 PM UTC by Sirisha Alla
Last Updated: Thu Sep 11, 2014 01:29 PM UTC
Owner: Nagendra Kumar
The issue is observed on changeset 4733 + #220 patches corresponding to cs 4741
and cs 4742. The test setup is four SLES 64-bit VM nodes. The setup has single PBE
enabled and is loaded with 25k objects.
The following steps reproduce the issue:
1) Trigger a middleware switchover. Make sure that the IMMND coordinator is on the
standby controller before triggering the switchover.
2) Reboot the standby controller when the active has just moved to quiesced.
The test was tried multiple times and different errors were seen each time:
1) AMFD received BAD_HANDLE from IMM. Here SLOT2(SC-2) is the active controller
at the beginning of the test
Jan 3 14:42:13 SLES-64BIT-SLOT2 osafimmpbed: NO Successfully opened
pre-existing sqlite pbe file /home/sirisha/immsv/immpbe/imm.db
Jan 3 14:42:14 SLES-64BIT-SLOT2 osafamfd[2463]: ER Failed to stop cluster
tracking 5
Jan 3 14:42:14 SLES-64BIT-SLOT2 osafamfd[2463]: ER ClmTrack stop failed
Jan 3 14:42:14 SLES-64BIT-SLOT2 osafrded[2375]: NO rde_rde_set_role: role set
to 3
Jan 3 14:42:14 SLES-64BIT-SLOT2 osafamfd[2463]: NO Node 'SC-1' left the cluster
Jan 3 14:42:14 SLES-64BIT-SLOT2 osafamfnd[2473]: NO Assigning
'safSi=SC-2N,safApp=OpenSAF' ACTIVE to 'safSu=SC-2,safSg=2N,safApp=OpenSAF'
Jan 3 14:42:14 SLES-64BIT-SLOT2 osafntfimcnd[8884]: NO exiting on signal 15
Jan 3 14:42:14 SLES-64BIT-SLOT2 osafimmnd[2404]: NO Implementer disconnected 74
<445, 2020f> (@OpenSafImmReplicatorB)
Jan 3 14:42:14 SLES-64BIT-SLOT2 osafntfd[2430]: NO handle_state_ntfimcn:
osafntfimcnd process terminated. State change
Jan 3 14:42:14 SLES-64BIT-SLOT2 osafimmnd[2404]: NO Implementer connected: 77
(safMsgGrpService) <320, 2020f>
Jan 3 14:42:14 SLES-64BIT-SLOT2 osafimmnd[2404]: NO Implementer connected: 78
(safCheckPointService) <304, 2020f>
Jan 3 14:42:14 SLES-64BIT-SLOT2 osafimmnd[2404]: NO Implementer connected: 79
(safEvtService) <305, 2020f>
Jan 3 14:42:14 SLES-64BIT-SLOT2 osafimmnd[2404]: NO Implementer connected: 80
(safLckService) <303, 2020f>
Jan 3 14:42:14 SLES-64BIT-SLOT2 osafsmfd[2492]: NO Backup create cmd =
/usr/lib64/opensaf/smf-backup-create
Jan 3 14:42:14 SLES-64BIT-SLOT2 osafsmfd[2492]: NO Bundle check cmd =
/usr/lib64/opensaf/smf-bundle-check
Jan 3 14:42:14 SLES-64BIT-SLOT2 osafamfd[2463]: NO FAILOVER Quiesced --> Active
Jan 3 14:42:14 SLES-64BIT-SLOT2 osafimmnd[2404]: NO Implementer connected: 81
(MsgQueueService131343) <451, 2020f>
Jan 3 14:42:14 SLES-64BIT-SLOT2 osafsmfd[2492]: NO Node check cmd =
/usr/lib64/opensaf/smf-node-check
Jan 3 14:42:14 SLES-64BIT-SLOT2 osafamfd[2463]: ER ncs_mbcsv_svc
NCS_MBCSV_OP_CHG_ROLE 1 failed
Jan 3 14:42:14 SLES-64BIT-SLOT2 osafimmnd[2404]: NO Implementer locally
disconnected. Marking it as doomed 81 <451, 2020f> (MsgQueueService131343)
Jan 3 14:42:14 SLES-64BIT-SLOT2 osafsmfd[2492]: NO SMF repository check cmd =
/usr/lib64/opensaf/smf-repository-check
Jan 3 14:42:14 SLES-64BIT-SLOT2 osafimmnd[2404]: NO Implementer disconnected 81
<451, 2020f> (MsgQueueService131343)
Jan 3 14:42:14 SLES-64BIT-SLOT2 osafsmfd[2492]: NO Cluster reboot cmd =
/usr/lib64/opensaf/smf-cluster-reboot
Jan 3 14:42:14 SLES-64BIT-SLOT2 osafimmnd[2404]: NO Implementer (applier)
connected: 82 (@OpenSafImmReplicatorA) <453, 2020f>
Jan 3 14:42:14 SLES-64BIT-SLOT2 osafsmfd[2492]: NO Admin Op Timeout =
600000000000
Jan 3 14:42:14 SLES-64BIT-SLOT2 osafimmnd[2404]: NO Implementer disconnected 59
<11, 2020f> (safAmfService)
Jan 3 14:42:14 SLES-64BIT-SLOT2 osafsmfd[2492]: NO Cli Timeout = 600000000000
Jan 3 14:42:14 SLES-64BIT-SLOT2 osafamfd[2463]: NO Re-initializing with IMM
Jan 3 14:42:14 SLES-64BIT-SLOT2 osafntfimcnd[8918]: NO Started
Jan 3 14:42:14 SLES-64BIT-SLOT2 osafsmfd[2492]: NO Reboot Timeout = 600000000000
Jan 3 14:42:14 SLES-64BIT-SLOT2 osafimmnd[2404]: NO Implementer connected: 83
(safAmfService) <11, 2020f>
Jan 3 14:42:14 SLES-64BIT-SLOT2 osafsmfd[2492]: NO SMF will use the STEP
standard set of actions.
Jan 3 14:42:14 SLES-64BIT-SLOT2 osafamfd[2463]: ER Impl Set Failed for
SaAmfCompBaseType, returned 9
Jan 3 14:42:14 SLES-64BIT-SLOT2 osafsmfd[2492]: NO DN for si_swap operation =
safSi=SC-2N,safApp=OpenSAF
Jan 3 14:42:14 SLES-64BIT-SLOT2 osafamfd[2463]: ER exiting since
avd_imm_impl_set failed
Jan 3 14:42:14 SLES-64BIT-SLOT2 osafsmfd[2492]: NO SI si_swap operation max
retry = 200
Jan 3 14:42:14 SLES-64BIT-SLOT2 osafsmfd[2492]: NO Max num of campaign restarts
= 10
Jan 3 14:42:14 SLES-64BIT-SLOT2 osafsmfd[2492]: NO IMM persist command =
immdump /etc/opensaf/imm.xml
Jan 3 14:42:14 SLES-64BIT-SLOT2 osafsmfd[2492]: NO Node reboot cmd = reboot
Jan 3 14:42:14 SLES-64BIT-SLOT2 osafsmfd[2492]: NO Turn PBE off during upgrade
= 1
Jan 3 14:42:14 SLES-64BIT-SLOT2 osafsmfd[2492]: NO Verify Enable = 0
Jan 3 14:42:14 SLES-64BIT-SLOT2 osafsmfd[2492]: NO Verify Timeout = 100000000000
Jan 3 14:42:14 SLES-64BIT-SLOT2 osafimmnd[2404]: NO Implementer connected: 84
(safSmfService) <299, 2020f>
Jan 3 14:42:14 SLES-64BIT-SLOT2 osafamfnd[2473]: NO Assigned
'safSi=SC-2N,safApp=OpenSAF' ACTIVE to 'safSu=SC-2,safSg=2N,safApp=OpenSAF'
Jan 3 14:42:14 SLES-64BIT-SLOT2 osafamfnd[2473]: ER AMF director unexpectedly
crashed
Jan 3 14:42:14 SLES-64BIT-SLOT2 osafimmnd[2404]: NO Implementer locally
disconnected. Marking it as doomed 83 <11, 2020f> (safAmfService)
Jan 3 14:42:14 SLES-64BIT-SLOT2 osafamfnd[2473]: Rebooting OpenSAF NodeId =
131599 EE Name = , Reason: local AVD down(Adest) or both AVD down(Vdest)
received, OwnNodeId = 131599, SupervisionTime = 60
2) AMFD received ERR_LIBRARY from IMM. Here SLOT2(SC-2) is the active
controller at the beginning of the test
Jan 3 15:28:28 SLES-64BIT-SLOT2 osafrded[2359]: NO rde_rde_set_role: role set
to 3
Jan 3 15:28:28 SLES-64BIT-SLOT2 osafamfd[2454]: NO Node 'SC-1' left the cluster
Jan 3 15:28:28 SLES-64BIT-SLOT2 osafamfnd[2468]: NO Assigning
'safSi=SC-2N,safApp=OpenSAF' ACTIVE to 'safSu=SC-2,safSg=2N,safApp=OpenSAF'
Jan 3 15:28:28 SLES-64BIT-SLOT2 osafntfimcnd[2991]: NO exiting on signal 15
Jan 3 15:28:28 SLES-64BIT-SLOT2 osafimmnd[2388]: NO Implementer connected: 30
(safMsgGrpService) <315, 2020f>
Jan 3 15:28:28 SLES-64BIT-SLOT2 osafimmnd[2388]: NO Implementer connected: 31
(safCheckPointService) <332, 2020f>
Jan 3 15:28:28 SLES-64BIT-SLOT2 osafimmnd[2388]: NO Implementer disconnected 26
<453, 2020f> (@OpenSafImmReplicatorA)
Jan 3 15:28:28 SLES-64BIT-SLOT2 osafntfd[2418]: NO handle_state_ntfimcn:
osafntfimcnd process terminated. State change
Jan 3 15:28:28 SLES-64BIT-SLOT2 osafimmnd[2388]: NO Implementer connected: 32
(safLckService) <316, 2020f>
Jan 3 15:28:28 SLES-64BIT-SLOT2 osafimmnd[2388]: NO Implementer connected: 33
(safEvtService) <331, 2020f>
Jan 3 15:28:28 SLES-64BIT-SLOT2 osafsmfd[2485]: NO Backup create cmd =
/usr/lib64/opensaf/smf-backup-create
Jan 3 15:28:28 SLES-64BIT-SLOT2 osafsmfd[2485]: NO Bundle check cmd =
/usr/lib64/opensaf/smf-bundle-check
Jan 3 15:28:28 SLES-64BIT-SLOT2 osafsmfd[2485]: NO Node check cmd =
/usr/lib64/opensaf/smf-node-check
Jan 3 15:28:28 SLES-64BIT-SLOT2 osafsmfd[2485]: NO SMF repository check cmd =
/usr/lib64/opensaf/smf-repository-check
Jan 3 15:28:28 SLES-64BIT-SLOT2 osafsmfd[2485]: NO Cluster reboot cmd =
/usr/lib64/opensaf/smf-cluster-reboot
Jan 3 15:28:28 SLES-64BIT-SLOT2 osafsmfd[2485]: NO Admin Op Timeout =
600000000000
Jan 3 15:28:28 SLES-64BIT-SLOT2 osafsmfd[2485]: NO Cli Timeout = 600000000000
Jan 3 15:28:28 SLES-64BIT-SLOT2 osafsmfd[2485]: NO Reboot Timeout = 600000000000
Jan 3 15:28:28 SLES-64BIT-SLOT2 osafsmfd[2485]: NO SMF will use the STEP
standard set of actions.
Jan 3 15:28:28 SLES-64BIT-SLOT2 osafsmfd[2485]: NO DN for si_swap operation =
safSi=SC-2N,safApp=OpenSAF
Jan 3 15:28:28 SLES-64BIT-SLOT2 osafsmfd[2485]: NO SI si_swap operation max
retry = 200
Jan 3 15:28:28 SLES-64BIT-SLOT2 osafsmfd[2485]: NO Max num of campaign restarts
= 10
Jan 3 15:28:28 SLES-64BIT-SLOT2 osafsmfd[2485]: NO IMM persist command =
immdump /etc/opensaf/imm.xml
Jan 3 15:28:28 SLES-64BIT-SLOT2 osafsmfd[2485]: NO Node reboot cmd = reboot
Jan 3 15:28:28 SLES-64BIT-SLOT2 osafsmfd[2485]: NO Turn PBE off during upgrade
= 1
Jan 3 15:28:28 SLES-64BIT-SLOT2 osafsmfd[2485]: NO Verify Enable = 0
Jan 3 15:28:28 SLES-64BIT-SLOT2 osafsmfd[2485]: NO Verify Timeout = 100000000000
Jan 3 15:28:28 SLES-64BIT-SLOT2 osafamfd[2454]: NO FAILOVER Quiesced --> Active
Jan 3 15:28:28 SLES-64BIT-SLOT2 osafamfd[2454]: ER ncs_mbcsv_svc
NCS_MBCSV_OP_CHG_ROLE 1 failed
Jan 3 15:28:28 SLES-64BIT-SLOT2 osafimmnd[2388]: NO Implementer connected: 34
(MsgQueueService131343) <456, 2020f>
Jan 3 15:28:28 SLES-64BIT-SLOT2 osafimmnd[2388]: NO Implementer locally
disconnected. Marking it as doomed 34 <456, 2020f> (MsgQueueService131343)
Jan 3 15:28:28 SLES-64BIT-SLOT2 osafimmnd[2388]: NO Implementer disconnected 4
<22, 2020f> (safAmfService)
Jan 3 15:28:28 SLES-64BIT-SLOT2 osafamfd[2454]: NO Re-initializing with IMM
Jan 3 15:28:28 SLES-64BIT-SLOT2 osafamfd[2454]: ER saImmOiImplementerSet failed
2
Jan 3 15:28:28 SLES-64BIT-SLOT2 osafamfd[2454]: ER exiting since
avd_imm_impl_set failed
Jan 3 15:28:28 SLES-64BIT-SLOT2 osafamfnd[2468]: ER AMF director unexpectedly
crashed
Jan 3 15:28:28 SLES-64BIT-SLOT2 osafamfnd[2468]: Rebooting OpenSAF NodeId =
131599 EE Name = , Reason: local AVD down(Adest) or both AVD down(Vdest)
received, OwnNodeId = 131599, SupervisionTime = 60
Jan 3 15:28:28 SLES-64BIT-SLOT2 opensaf_reboot: Rebooting local node; timeout=60
Jan 3 15:28:28 SLES-64BIT-SLOT2 osafimmnd[2388]: NO Implementer disconnected 34
<456, 2020f> (MsgQueueService131343)
Jan 3 15:28:28 SLES-64BIT-SLOT2 osafimmnd[2388]: WA IMMND - Client Node Get
Failed for cli_hdl 94489412111
Jan 3 15:28:29 SLES-64BIT-SLOT2 osafntfimcnd[3021]: ER ntfimcn_imm_init
Becoming an applier failed SA_AIS_ERR_TIMEOUT (5)
Jan 3 15:28:29 SLES-64BIT-SLOT2 osafntfimcnd[3021]: ER ntfimcn_imm_init() Fail
Jan 3 15:28:29 SLES-64BIT-SLOT2 osafimmnd[2388]: WA MDS Send Failed
Jan 3 15:28:29 SLES-64BIT-SLOT2 osafimmnd[2388]: WA Error code 2 returned for
message type 6 - ignoring
Jan 3 15:28:29 SLES-64BIT-SLOT2 osafimmnd[2388]: WA ERR_BAD_HANDLE: Client
1967095153167 not found in server
Jan 3 15:28:30 SLES-64BIT-SLOT2 osafntfimcnd[3042]: ER ntfimcn_imm_init
saImmOiInitialize_2 failed SA_AIS_ERR_TIMEOUT (5)
Jan 3 15:28:30 SLES-64BIT-SLOT2 osafntfimcnd[3042]: ER ntfimcn_imm_init() Fail
Jan 3 15:28:31 SLES-64BIT-SLOT2 kernel: [ 198.527931] md: stopping all md
devices.
Jan 3 15:28:31 SLES-64BIT-SLOT2 osafimmnd[2388]: WA MDS Send Failed
Jan 3 15:28:31 SLES-64BIT-SLOT2 osafimmnd[2388]: WA Error code 2 returned for
message type 40 - ignoring
Jan 3 15:28:31 SLES-64BIT-SLOT2 osafntfimcnd[3045]: ER ntfimcn_imm_init
saImmOiInitialize_2 failed SA_AIS_ERR_TIMEOUT (5)
Jan 3 15:28:31 SLES-64BIT-SLOT2 osafntfimcnd[3045]: ER ntfimcn_imm_init() Fail
3) AMFD received ERR_TIMEOUT from IMM. Here SLOT1(SC-1) is the active
controller at the beginning of the test
Jan 3 15:25:06 SLES-64BIT-SLOT1 osafimmd[3806]: NO Coord re-elected, resides at
2010f
Jan 3 15:25:06 SLES-64BIT-SLOT1 osafimmnd[3816]: NO This IMMND re-elected coord
redundantly, failover ?
Jan 3 15:25:06 SLES-64BIT-SLOT1 osafimmnd[3816]: NO Implementer disconnected 25
<4, 2010f> (@safLogService)
Jan 3 15:25:06 SLES-64BIT-SLOT1 osafimmnd[3816]: NO Implementer connected: 28
(safClmService) <15, 2010f>
Jan 3 15:25:06 SLES-64BIT-SLOT1 osafimmnd[3816]: NO Implementer connected: 29
(safLogService) <4, 2010f>
Jan 3 15:25:06 SLES-64BIT-SLOT1 opensaf_reboot: Rebooting remote node in the
absence of PLM is outside the scope of OpenSAF
Jan 3 15:25:06 SLES-64BIT-SLOT1 osafrded[3787]: NO rde_rde_set_role: role set
to 1
Jan 3 15:25:06 SLES-64BIT-SLOT1 osafclmd[3860]: NO ACTIVE request
Jan 3 15:25:13 SLES-64BIT-SLOT1 osafamfd[3882]: ER FAILOVER Active --> Quiesced
FAILED, ImplementerClear failed 5
Jan 3 15:25:13 SLES-64BIT-SLOT1 osafamfd[3882]: role.cc:583:
avd_mds_qsd_role_evh: Assertion '0' failed.
Jan 3 15:25:13 SLES-64BIT-SLOT1 osafamfnd[3892]: ER AMF director unexpectedly
crashed
Jan 3 15:25:13 SLES-64BIT-SLOT1 osafamfnd[3892]: Rebooting OpenSAF NodeId =
131343 EE Name = , Reason: local AVD down(Adest) or both AVD down(Vdest)
received, OwnNodeId = 131343, SupervisionTime = 60
Jan 3 15:25:13 SLES-64BIT-SLOT1 opensaf_reboot: Rebooting local node; timeout=60
Jan 3 15:25:14 SLES-64BIT-SLOT1 osafimmnd[3816]: NO Implementer locally
disconnected. Marking it as doomed 4 <21, 2010f> (safAmfService)
Jan 3 15:25:14 SLES-64BIT-SLOT1 osafimmnd[3816]: NO Implementer disconnected 4
<21, 2010f> (safAmfService)
Jan 3 15:25:17 SLES-64BIT-SLOT1 kernel: [ 1471.089956] md: stopping all md
devices.
Jan 3 15:25:17 SLES-64BIT-SLOT1 kernel: [ 1472.120172] sd 0:0:0:0: [sda]
Synchronizing SCSI cache
Jan 3 15:25:17 SLES-64BIT-SLOT1 kernel: [ 1472.219424] ohci_hcd 0000:00:06.0:
PCI INT A disabled
Jan 3 15:25:17 SLES-64BIT-SLOT1 osafclmd[3860]: ER clms_mds_msg_send FAILED: 2
Jan 3 15:25:17 SLES-64BIT-SLOT1 osafclmd[3860]: ER clms_clma_api_msg_dispatcher
FAILED: type 0
No traces were enabled when issue (1) was observed. Issue (3) could be the
same issue as #405.
---
** [tickets:#707] Quiesced controller failed to become Active when the standby
controller rebooted in middle of switchover**
**Status:** unassigned
**Milestone:** 4.3.3
**Created:** Fri Jan 03, 2014 03:34 PM UTC by Sirisha Alla
**Last Updated:** Thu Sep 18, 2014 10:54 AM UTC
**Owner:** Nagendra Kumar