Thanks, see below.

/Hans

> -----Original Message-----
> From: Kumar Nagendra-G20235 [mailto:[EMAIL PROTECTED] 

> The log string "faulted due to 6 -rcvr=9" translates to:
> -  6 being AVND_ERR_SRC_CBK_HC_TIMEOUT (the AMF health check
> callback timed out), and 9 being AVSV_ERR_RCVR_SU_FAILOVER. It
> means that because the component did not respond to the health
> check in time, the SU of that component failed over. This
> will only happen when the system is heavily loaded and MAS
> and other components do not get enough time to respond within 2 seconds
> (the health check timeout of the MAS and MQD components), especially in a PC
> environment. You need to fine-tune these components in the
> BOM file according to your configuration. It may happen during failover.

But what if the component is "hanging" in an I/O operation? An operation
that will not complete before the health check timeout expires, because the
replicated partition is not available.

I could try increasing the health check timeout for MQD substantially and
see what happens.

> There shouldn't be any connection between DRBD and OpenSAF
> failover timings.

But the replicated partition is unavailable for some time! The OpenSAF
configuration (pssv_store) and logs are stored there. I assume a lot of
writing to the replicated partition takes place during failover. I think
there is a connection between DRBD and OpenSAF failover.

> Actually, openSAF has control over DRBD for 
> failover using PDRBD.

Not currently in our setup.

> Send all the logs using the script attached.

I have no interesting logs to send. The DTS logs are empty from the time
of failover (since the replicated partition is unavailable?).

> I would like to know your approach to testing failover.

I just did 'pkill cpd' on the active controller.

_______________________________________________
Users mailing list
[email protected]
http://list.opensaf.org/maillist/listinfo/users
