We are using version 5.3.0 with a shared file system master/slave configuration 
and persistent messaging with client acknowledgements. An NFSv4 mount 
point is used for both the lock file and the persistent storage. KahaDB is 
being used as the persistence adapter.

We have encountered issues where the broker does not failover gracefully 
whenever there is a problem with the NFS server. The most reliable test case I 
have come up with is starting and stopping the NFS server. When the NFS server 
is restarted, one of the slaves acquires the lock and becomes the master, but the 
original master stays active and listening for connections. Clients can 
successfully connect to it and subscribe to queues (but no messages get 
dispatched) and enqueues hang until there is a timeout on the socket. 
Connections that go to the new master work. Hence the questions:

        Why was the lock released? Shouldn't it have been retained?

        Why isn't the original master dispatching messages, and why is it 
        blocking sends?
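
For reference, here is a minimal sketch of the lock-file pattern I am asking about, assuming the master is simply whichever process holds an exclusive lock on a file in the shared directory. The class name, method name and temp-file path are hypothetical stand-ins, not the broker's actual code. Note that in Java, FileLock.isValid() only reflects the local JVM's view (it stays true until the lock is released or the channel is closed), so a check like this would not by itself notice that the NFS server had dropped the lock.

```java
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.channels.FileLock;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class LockFileSketch {

    /**
     * Tries to take an exclusive lock on the channel's file.
     * Returns null if another process already holds the lock.
     * (A second attempt from the same JVM throws OverlappingFileLockException.)
     */
    static FileLock tryBecomeMaster(FileChannel channel) throws IOException {
        return channel.tryLock();
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical stand-in for the lock file on the NFS mount.
        Path lockFile = Files.createTempFile("broker", ".lock");
        try (FileChannel channel = FileChannel.open(lockFile, StandardOpenOption.WRITE)) {
            FileLock lock = tryBecomeMaster(channel);
            if (lock != null) {
                // isValid() is local state only; it does not re-check with the file server.
                System.out.println("acquired lock, valid=" + lock.isValid());
                lock.release();
            } else {
                System.out.println("lock held elsewhere, staying slave");
            }
        } finally {
            Files.deleteIfExists(lockFile);
        }
    }
}
```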

I have seen other issues but have not been able to reproduce them reliably:

        * NFS timeout due to a DNS issue
        * Possible Linux kernel bug. The problem arises when 
          /var/log/messages shows: 
kernel: decode_op_hdr: reply buffer overflowed in line 2121.<6>      blocks= 
585871964 block_size= 512

Any help would be appreciated.

Thanks

Josh
