Thanks, see below. /Hans
> -----Original Message-----
> From: Kumar Nagendra-G20235 [mailto:[EMAIL PROTECTED]
>
> The log string "faulted due to 6 -rcvr=9" translates to:
> - 6 being AVND_ERR_SRC_CBK_HC_TIMEOUT (AMF health check
>   callback timed out), 9 being AVSV_ERR_RCVR_SU_FAILOVER.
> It says that since the component was not able to respond to the
> health check, the SU of that component failed. This
> will only happen when the system is heavily loaded and MAS
> and other components are not getting time to respond within 2 sec
> (the health check value of the MAS and MQD components), especially in a PC
> environment. You need to fine-tune these components in the
> BOM file as per your configuration. It may happen during failover.

But what if the component is "hanging" in an I/O operation? That is, an
operation that will not succeed before the health check timeout, since the
replicated partition is not available. I could try to increase the health
check timeout for MQD a lot and see what happens.

> There shouldn't be any connection between DRBD and OpenSAF
> failover timings.

But the replicated partition is unavailable for some time! The OpenSAF
configuration (pssv_store) and logs are stored there. I assume a lot of
writing to the replicated partition takes place during failover, so I
think there is a connection between DRBD and OpenSAF failover.

> Actually, OpenSAF has control over DRBD for
> failover using PDRBD.

Not currently in our setup.

> Send all the logs using the script attached.

I have no interesting logs to send. The DTS logs are empty from the time
of failover (since the replicated partition is unavailable?).

> I would like to know what your approach to testing failover is.

I just did 'pkill cpd' on the active controller.
