Hans,

Adding to Phani's comment: I don't think the patch has anything to do
with the behavior you are seeing. It looks like process starvation.
Could you grep for "HB loss" in all the OpenSAF log files and see
whether you get any matches?
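
If plain grep is awkward across several log directories, a small script
along these lines would do the same search. This is only a sketch: the
log directory is passed on the command line because its location depends
on your installation, and find_matches is a made-up helper name, not
part of OpenSAF.

```python
import os
import sys


def find_matches(log_dir, needle="HB loss"):
    """Return (path, line_number, line) for every line containing needle."""
    hits = []
    for root, _dirs, files in os.walk(log_dir):
        for name in sorted(files):
            path = os.path.join(root, name)
            try:
                # errors="replace" keeps the scan going on odd bytes in a log
                with open(path, "r", errors="replace") as fh:
                    for lineno, line in enumerate(fh, start=1):
                        if needle in line:
                            hits.append((path, lineno, line.rstrip("\n")))
            except OSError:
                # Unreadable file (permissions, removed by rotation): skip it
                continue
    return hits


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python find_hb_loss.py <log directory>
    for path, lineno, line in find_matches(sys.argv[1]):
        print(f"{path}:{lineno}: {line}")
```

An empty result on every node would argue against heartbeat loss; any
hits are worth comparing against the time of the healthcheck timeouts.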

Thanks
Sayan

> -----Original Message-----
> From: [EMAIL PROTECTED] 
> [mailto:[EMAIL PROTECTED] On Behalf Of Marella P-G19460
> Sent: Wednesday, January 16, 2008 7:20 AM
> To: Hans Feldt; [email protected]
> Subject: Re: [Users] Concurrent AMF healthcheck timeouts 
> using 1.0-4 without patch
> 
> 
> Hans, 
> 
> For problem isolation, another thing worth trying may be to 
> increase (or decrease) the component healthcheck timeouts. For example, 
> the healthcheck timeout configuration (which is distinct from 
> "rcvHbInt") looks like the following for MQD:
> 
>     <csiPrototypeName>safCsi=Csi_MQD</csiPrototypeName> 
>     ... 
>     <healthCheck key="E5F6"> 
>     <period>2000</period> 
>     <maxDuration>1000</maxDuration>
>     ... 
> 
> Phani 
> 
> PS: Some testing/experimentation is under way on our side as well. 
> 
> -----Original Message-----
> From: [EMAIL PROTECTED]
> [mailto:[EMAIL PROTECTED] On Behalf Of Hans Feldt
> Sent: Wednesday, January 16, 2008 2:57 PM
> To: [email protected]
> Subject: [Users] Concurrent AMF healthcheck timeouts using 
> 1.0-4 without patch
> 
> We have seen a problem that looks pretty bad. AMF reports 
> healthcheck timeouts for a couple of components 
> simultaneously. Since there is probably nothing wrong with 
> the components, the possible reasons for this could be:
> - the missing patch (soon to be integrated)
> - AMF stops sending health checks
> - MDS/TIPC hang-up
> 
> Syslog excerpt:
> 
> Jan 10 11:12:52 SC_2_1 ncs_scap: NCS_AvSv: Card going for reboot
> -safComp=CompT_MQD,safSu=SuT_NCS_CNTLR,safNode=SC_2_1 faulted due to 6
> -rcvr=9
> Jan 10 11:12:52 SC_2_1 ncs_scap: NCS_AvSv: Card going for reboot
> -safComp=CompT_GLD,safSu=SuT_NCS_CNTLR,safNode=SC_2_1 faulted due to 6
> -rcvr=9
> Jan 10 11:12:57 SC_2_1 ncs_scap: NCS_AvSv: Card going for reboot - Some one has reset this card
> Jan 10 11:12:58 SC_2_1 shutdown[12802]: shutting down for system reboot
> Jan 10 11:13:04 SC_2_1 ncs_scap: NCS_AvSv: Card going for reboot
> -safComp=CompT_MAS,safSu=SuT_NCS_CNTLR,safNode=SC_2_1 faulted due to 6
> -rcvr=9
> Jan 10 11:13:06 SC_2_1 ncs_scap: NCS_AvSv: Card going for reboot
> -safComp=CompT_EDS,safSu=SuT_EDS,safNode=SC_2_1 faulted due to 6
> -rcvr=9
> Jan 10 11:13:39 SC_2_1 ncs_scap: NCS_AvSv: Card going for reboot
> -safComp=CompT_DTS,safSu=SuT_NCS_CNTLR,safNode=SC_2_1 faulted due to 6
> -rcvr=9
> Jan 10 10:13:53 SC_2_1 init: Switching to runlevel: 6
> Jan 10 11:13:54 SC_2_1 shutdown: THE SYSTEM IS SHUTTING DOWN
> 
> No core dumps, nothing more interesting than this.
> 
> The problem has been seen once, maybe twice. Our application 
> was running on the payloads using checkpoints and events as 
> mentioned before. The processor load was probably 50-60% on 
> all processors (controllers and payloads). In order to be 
> able to run with 60% load, we doubled the rcvHbInt to 6s in BOM.xml.
> 
> I will try to generate debug info in the 
> /etc/opt/opensaf/reboot script (change from symlink to 
> script) that is called by OpenSAF. This would be helpful if 
> the problem is seen again.
> 
> What is your opinion?
> 
> Regards,
> Hans
> _______________________________________________
> Users mailing list
> [email protected]
> http://list.opensaf.org/maillist/listinfo/users
> 
> ______________________________________________________________________
> This email has been scanned by the MessageLabs Email Security System.
> For more information please visit 
> http://www.messagelabs.com/email 
> ______________________________________________________________________