Hans,

Adding to Phani's comment: I don't think the patch has anything to do with the behavior you are seeing. It looks like process starvation. Could you grep for "HB loss" in all the OpenSAF log files and see whether you get any matches?
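
If the logs are spread over several directories, a plain `grep -rn "HB loss" <log-dir>` covers it. For a scripted check, here is a rough Python equivalent that does the same recursive, plain-substring search. It is only a sketch: the log directory is passed on the command line because where OpenSAF writes its logs depends on the installation, and the script name `find_hb_loss.py` is hypothetical.

```python
import os
import sys
from typing import Iterator, List, Tuple


def find_pattern(root: str, pattern: str = "HB loss") -> Iterator[Tuple[str, int, str]]:
    """Yield (path, line_number, line) for each line under `root` containing `pattern`.

    Roughly what `grep -rn "HB loss" <root>` does: recursive, plain substring match.
    """
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames.sort()  # visit subdirectories in a stable order
        for name in sorted(filenames):
            path = os.path.join(dirpath, name)
            try:
                # errors="replace" keeps binary or oddly encoded files from aborting the scan
                with open(path, "r", encoding="utf-8", errors="replace") as f:
                    for lineno, line in enumerate(f, start=1):
                        if pattern in line:
                            yield path, lineno, line.rstrip("\n")
            except OSError:
                # Skip files that cannot be read (permissions, rotated away mid-scan, ...)
                continue


def main(argv: List[str]) -> None:
    # The log directory is deliberately not hard-coded: the OpenSAF log
    # location is installation-specific and is an assumption left to the caller.
    if len(argv) != 1:
        print("usage: find_hb_loss.py <opensaf-log-dir>")
        return
    hits = 0
    for path, lineno, line in find_pattern(argv[0]):
        print(f"{path}:{lineno}: {line}")
        hits += 1
    print(f"{hits} line(s) containing 'HB loss' under {argv[0]}")


if __name__ == "__main__":
    main(sys.argv[1:])
```

Run it as `python3 find_hb_loss.py <log-dir>`; each match is printed as `path:line: text`, the same shape `grep -n` uses, so the timestamps can be lined up against the syslog excerpt.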

Thanks,
Sayan

> -----Original Message-----
> From: [EMAIL PROTECTED]
> [mailto:[EMAIL PROTECTED] On Behalf Of Marella P-G19460
> Sent: Wednesday, January 16, 2008 7:20 AM
> To: Hans Feldt; [email protected]
> Subject: Re: [Users] Concurrent AMF healthcheck timeouts using 1.0-4 without patch
>
> Hans,
>
> For problem isolation, another thing worth trying may be to increase (or
> decrease) the component healthcheck timeouts. For example, the healthcheck
> timeout configuration (which is distinct from "rcvHbInt") looks like the
> following for MQD:
>
> <csiPrototypeName>safCsi=Csi_MQD</csiPrototypeName>
> ...
> <healthCheck key="E5F6">
>   <period>2000</period>
>   <maxDuration>1000</maxDuration>
> ...
>
> Phani
>
> PS: Some testing and experimentation is under way on our side as well.
>
> -----Original Message-----
> From: [EMAIL PROTECTED]
> [mailto:[EMAIL PROTECTED] On Behalf Of Hans Feldt
> Sent: Wednesday, January 16, 2008 2:57 PM
> To: [email protected]
> Subject: [Users] Concurrent AMF healthcheck timeouts using 1.0-4 without patch
>
> We have seen a problem that looks pretty bad. AMF reports health check
> timeouts for a couple of components simultaneously.
> Since there is probably nothing wrong with the components, the possible
> reasons for this could be:
> - the missing patch (soon to be integrated)
> - AMF stops sending health checks
> - MDS/TIPC hang-up
>
> Syslog excerpt:
>
> Jan 10 11:12:52 SC_2_1 ncs_scap: NCS_AvSv: Card going for reboot -safComp=CompT_MQD,safSu=SuT_NCS_CNTLR,safNode=SC_2_1 faulted due to 6 -rcvr=9
> Jan 10 11:12:52 SC_2_1 ncs_scap: NCS_AvSv: Card going for reboot -safComp=CompT_GLD,safSu=SuT_NCS_CNTLR,safNode=SC_2_1 faulted due to 6 -rcvr=9
> Jan 10 11:12:57 SC_2_1 ncs_scap: NCS_AvSv: Card going for reboot - Some one has reset this card
> Jan 10 11:12:58 SC_2_1 shutdown[12802]: shutting down for system reboot
> Jan 10 11:13:04 SC_2_1 ncs_scap: NCS_AvSv: Card going for reboot -safComp=CompT_MAS,safSu=SuT_NCS_CNTLR,safNode=SC_2_1 faulted due to 6 -rcvr=9
> Jan 10 11:13:06 SC_2_1 ncs_scap: NCS_AvSv: Card going for reboot -safComp=CompT_EDS,safSu=SuT_EDS,safNode=SC_2_1 faulted due to 6 -rcvr=9
> Jan 10 11:13:39 SC_2_1 ncs_scap: NCS_AvSv: Card going for reboot -safComp=CompT_DTS,safSu=SuT_NCS_CNTLR,safNode=SC_2_1 faulted due to 6 -rcvr=9
> Jan 10 10:13:53 SC_2_1 init: Switching to runlevel: 6
> Jan 10 11:13:54 SC_2_1 shutdown: THE SYSTEM IS SHUTTING DOWN
>
> No core dumps, nothing more interesting than this.
>
> The problem has been seen once, maybe twice. Our application was running on
> the payloads using checkpoints and events, as mentioned before. The processor
> load was probably 50-60% on all processors (controllers and payloads). In
> order to be able to run with 60% load, we doubled rcvHbInt to 6 s in BOM.xml.
>
> I will try to generate debug info in the /etc/opt/opensaf/reboot script,
> which is called by OpenSAF (changing it from a symlink to a script). This
> would be helpful if the problem is seen again.
>
> What is your opinion?
>
> Regards,
> Hans
> _______________________________________________
> Users mailing list
> [email protected]
> http://list.opensaf.org/maillist/listinfo/users
>
> ______________________________________________________________________
> This email has been scanned by the MessageLabs Email Security System.
> For more information please visit http://www.messagelabs.com/email
> ______________________________________________________________________
