- **summary**: Imm: Immfind failed with TRY_AGAIN after immnd is killed on 
payload PL-3 and on active controller --> Imm: Imm service is down FOREVER on 
the nodes after IMMND restart , due to system issues
- **Comment**:

Changed the ticket heading to describe the problem more clearly.



---

** [tickets:#1780] Imm: Imm service is down FOREVER on the nodes after IMMND 
restart , due to system issues**

**Status:** invalid
**Milestone:** 5.0.RC2
**Created:** Mon Apr 25, 2016 11:46 AM UTC by Madhurika Koppula
**Last Updated:** Tue Apr 26, 2016 06:54 AM UTC
**Owner:** Neelakanta Reddy
**Attachments:**

- [imm.tgz](https://sourceforge.net/p/opensaf/tickets/1780/attachment/imm.tgz) 
(13.2 kB; application/octet-stream)


Setup:

- Changeset: 7436
- OS: Oracle Linux Server release 6.4 (x86_64)
- Version: OpenSAF 5.0
- 4 nodes configured with a single PBE

immfind fails with SA_AIS_ERR_TRY_AGAIN after immnd is killed on PL-3 and on
the active controller.
IMM admin operations keep failing indefinitely on PL-3 and SC-1 (active), even
though immnd restarted properly on both nodes.
(saImmOmInitialize itself fails.)

Steps to reproduce:

1) Kill immnd on the active controller and on PL-3.
2) Perform any IMM admin operation.

Here is the immfind output on PL-3 (OEL_M-SLOT-3):
[root@OEL_M-SLOT-3 log]# immfind
error - saImmOmInitialize FAILED: SA_AIS_ERR_TRY_AGAIN (6)
[root@OEL_M-SLOT-3 log]#
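
SA_AIS_ERR_TRY_AGAIN is normally a transient result, so the open question in this report is whether it ever clears. The sketch below is one way to check that: it polls immfind until it succeeds, fails with a different error, or a deadline passes. It assumes that immfind exits non-zero on failure and prints the error text shown above; the 120 second timeout, the 2 second interval, and the function names are illustrative choices, not part of OpenSAF.

```python
import subprocess
import time

TRY_AGAIN = "SA_AIS_ERR_TRY_AGAIN"


def run_immfind():
    """Run immfind once and return (exit_code, stdout + stderr)."""
    proc = subprocess.run(["immfind"], capture_output=True, text=True)
    return proc.returncode, proc.stdout + proc.stderr


def wait_for_imm(run=run_immfind, timeout_s=120.0, interval_s=2.0,
                 sleep=time.sleep, clock=time.monotonic):
    """Poll until IMM answers, fails differently, or the deadline passes.

    Returns "ok", "error" (a failure other than TRY_AGAIN) or "timeout".
    The timeout and interval defaults are arbitrary (assumption).
    """
    deadline = clock() + timeout_s
    while True:
        code, output = run()
        if code == 0:
            return "ok"
        if TRY_AGAIN not in output:
            return "error"
        if clock() >= deadline:
            return "timeout"
        sleep(interval_s)
```

Running `wait_for_imm()` on PL-3 and on SC-1 after the immnd restart would return "timeout" if the service stays down as described here, and "ok" if TRY_AGAIN was only transient.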

First kill of IMMND on the active controller, at the timestamp below:

Apr 25 11:48:52 OEL_M-SLOT-1 osafntfimcnd[9124]: NO saImmOiDispatch() Fail 
SA_AIS_ERR_BAD_HANDLE (9)
Apr 25 11:48:52 OEL_M-SLOT-1 osafimmd[1716]: WA IMMND coordinator at 2010f 
apparently crashed => electing new coord
Apr 25 11:48:52 OEL_M-SLOT-1 osafimmd[1716]: NO New coord elected, resides at 
2020f
Apr 25 11:48:53 OEL_M-SLOT-1 osafamfnd[1796]: NO Restarting a component of 
'safSu=SC-1,safSg=NoRed,safApp=OpenSAF' (comp restart count: 1)
Apr 25 11:48:53 OEL_M-SLOT-1 osafamfnd[1796]: NO 
'safComp=IMMND,safSu=SC-1,safSg=NoRed,safApp=OpenSAF' faulted due to 'avaDown' 
: Recovery is 'componentRestart'
Apr 25 11:48:53 OEL_M-SLOT-1 osafimmnd[10126]: Started
Apr 25 11:48:53 OEL_M-SLOT-1 osafimmnd[10126]: NO Persistent Back-End 
capability configured, Pbe file:imm.db (suffix may get added)
Apr 25 11:48:53 OEL_M-SLOT-1 osafimmnd[10126]: NO IMMD service is UP ... 
ScAbsenseAllowed?:0 introduced?:0
Apr 25 11:48:53 OEL_M-SLOT-1 osafimmd[1716]: NO New IMMND process is on ACTIVE 
Controller at 2010f

Second kill of IMMND on the active controller, at the timestamp below:

Apr 25 14:44:52 OEL_M-SLOT-1 osafamfnd[1796]: NO 
'safComp=IMMND,safSu=SC-1,safSg=NoRed,safApp=OpenSAF' faulted due to 'avaDown' 
: Recovery is 'componentRestart'
Apr 25 14:44:53 OEL_M-SLOT-1 osafimmnd[15848]: Started
Apr 25 14:44:53 OEL_M-SLOT-1 osafimmnd[15848]: NO Persistent Back-End 
capability configured, Pbe file:imm.db (suffix may get added)
Apr 25 14:44:53 OEL_M-SLOT-1 osafimmnd[15848]: NO IMMD service is UP ... 
ScAbsenseAllowed?:0 introduced?:0
Apr 25 14:44:53 OEL_M-SLOT-1 osafimmd[1716]: NO New IMMND process is on ACTIVE 
Controller at 2010f
Apr 25 14:44:53 OEL_M-SLOT-1 osafimmd[1716]: NO Extended intro from node 2010f
Apr 25 14:44:53 OEL_M-SLOT-1 osafimmnd[15848]: NO SERVER STATE: 
IMM_SERVER_ANONYMOUS --> IMM_SERVER_CLUSTER_WAITING
Apr 25 14:44:53 OEL_M-SLOT-1 osafimmnd[15848]: NO SETTING COORD TO 0 CLOUD PROTO
Apr 25 14:44:53 OEL_M-SLOT-1 osafimmnd[15848]: NO SERVER STATE: 
IMM_SERVER_CLUSTER_WAITING --> IMM_SERVER_LOADING_PENDING
Apr 25 14:44:53 OEL_M-SLOT-1 osafimmnd[15848]: NO SERVER STATE: 
IMM_SERVER_LOADING_PENDING --> IMM_SERVER_SYNC_PENDING
Apr 25 14:44:53 OEL_M-SLOT-1 osafimmd[1716]: WA IMMND on controller (not 
currently coord) requests sync
Apr 25 14:44:53 OEL_M-SLOT-1 osafimmnd[15848]: NO NODE STATE-> IMM_NODE_ISOLATED


IMMND killed on PL-3 at the timestamp below:

Apr 25 12:11:26 OEL_M-SLOT-3 osafamfnd[2415]: NO 
'safSu=PL-3,safSg=NoRed,safApp=OpenSAF' component restart probation timer 
started (timeout: 60000000000 ns)
Apr 25 12:11:26 OEL_M-SLOT-3 osafamfnd[2415]: NO Restarting a component of 
'safSu=PL-3,safSg=NoRed,safApp=OpenSAF' (comp restart count: 1)
Apr 25 12:11:26 OEL_M-SLOT-3 osafamfnd[2415]: NO 
'safComp=IMMND,safSu=PL-3,safSg=NoRed,safApp=OpenSAF' faulted due to 'avaDown' 
: Recovery is 'componentRestart'

Attaching log snippets from immnd on the active controller and on PL-3, along
with /var/log/messages.

This issue might be related to ticket #1735, because the node state of immnd
on PL-3 is also observed as IMM_NODE_ISOLATED. However, immfind never
succeeded on SC-1 (active), even though immnd restarted successfully on SC-1
at the timestamp below:

Apr 25 11:48:53 OEL_M-SLOT-1 osafimmnd[10126]: NO SERVER STATE: 
IMM_SERVER_LOADING_PENDING --> IMM_SERVER_SYNC_PENDING
Apr 25 11:48:53 OEL_M-SLOT-1 osafimmnd[10126]: NO NODE STATE-> IMM_NODE_ISOLATED
Apr 25 11:48:53 OEL_M-SLOT-1 osafimmd[1716]: NO Node 2010f request sync 
sync-pid:10126 epoch:0
Apr 25 11:48:54 OEL_M-SLOT-1 osafimmnd[10126]: NO NODE STATE-> 
IMM_NODE_W_AVAILABLE
Apr 25 11:48:54 OEL_M-SLOT-1 osafimmd[1716]: NO Successfully announced sync. 
New ruling epoch:536
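
The excerpts above differ mainly in the last state each restarted immnd process reports: pid 10126 reaches IMM_NODE_W_AVAILABLE, while the last line shown for pid 15848 is IMM_NODE_ISOLATED. The following sketch extracts that last state per pid from syslog text, which may help when comparing the attached logs. It assumes the message formats seen in this ticket ("SERVER STATE: A --> B" and "NODE STATE-> X"); the helper names are hypothetical, and the unwrap step exists only because the lines here are wrapped by the mail system.

```python
import re

# A syslog line starts with a timestamp such as "Apr 25 14:44:53 ".
TIMESTAMP = re.compile(r"^[A-Z][a-z]{2} [ \d]\d \d{2}:\d{2}:\d{2} ")
PID = re.compile(r"osafimmnd\[(\d+)\]")
STATE = re.compile(r"(?:SERVER STATE:.*-->\s*|NODE STATE->\s*)(IMM_\w+)")


def unwrap(text):
    """Join mail-wrapped continuation lines onto their syslog line."""
    lines = []
    open_line = False
    for raw in text.splitlines():
        stripped = raw.strip()
        if TIMESTAMP.match(raw):
            lines.append(stripped)
            open_line = True
        elif stripped and open_line:
            lines[-1] += " " + stripped
        else:
            # A blank line or unrelated prose ends the current log entry.
            open_line = False
    return lines


def last_states(text):
    """Map each osafimmnd pid to the last IMM state it logged."""
    states = {}
    for line in unwrap(text):
        pid = PID.search(line)
        state = STATE.search(line)
        if pid and state:
            states[pid.group(1)] = state.group(1)
    return states
```

Calling `last_states(open("/var/log/messages").read())` on each node would show which immnd instances never left IMM_NODE_ISOLATED; only lines from osafimmnd are considered, so osafimmd messages are ignored.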




