Public bug reported:

When an IO error is encountered the server state is reset to MEMCACHED_SERVER_STATE_NEW even if it is currently MEMCACHED_SERVER_STATE_IN_TIMEOUT. The call to memcached_mark_server_for_timeout will then incorrectly push the next connection retry time further back and further increment the server failure counter. This throws out the connection back-off handling as it appears there has been another failure when in fact we're just dealing with an in-progress failure so to speak.
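To make the sequence above concrete, here is a minimal toy model of it. It is a sketch under stated assumptions, not libmemcached source: ToyServer, toy_mark_for_timeout, toy_io_error_reported and toy_io_error_guarded are hypothetical names, and the assumption that marking for timeout is skipped when the server is already in timeout is inferred from the description rather than copied from the code. The guarded variant is only one possible shape for a workaround and is not necessarily what the patch linked in this report does.

```cpp
#include <cassert>
#include <ctime>

// Toy stand-ins for MEMCACHED_SERVER_STATE_NEW / MEMCACHED_SERVER_STATE_IN_TIMEOUT.
enum class ServerState { New, InTimeout };

// Hypothetical, simplified server record (not the real libmemcached struct).
struct ToyServer {
    ServerState state = ServerState::New;
    unsigned failure_count = 0;   // failures recorded against this server
    std::time_t next_retry = 0;   // earliest time a reconnect may be attempted
};

// Assumed behaviour of memcached_mark_server_for_timeout: a server that is
// already in timeout is left alone; otherwise it is counted as a new failure
// and its retry time is scheduled.
void toy_mark_for_timeout(ToyServer& s, std::time_t now, std::time_t retry_timeout) {
    assert(retry_timeout > 0);
    if (s.state != ServerState::InTimeout) {
        s.state = ServerState::InTimeout;
        s.failure_count++;
        s.next_retry = now + retry_timeout;
    }
}

// Reported behaviour: the IO error path resets the state unconditionally,
// so the check in toy_mark_for_timeout never sees InTimeout.
void toy_io_error_reported(ToyServer& s, std::time_t now, std::time_t retry_timeout) {
    s.state = ServerState::New;
    toy_mark_for_timeout(s, now, retry_timeout);
}

// Hypothetical guard: keep the InTimeout state so an in-progress failure
// is not counted again and the retry time is not pushed back.
void toy_io_error_guarded(ToyServer& s, std::time_t now, std::time_t retry_timeout) {
    if (s.state != ServerState::InTimeout) {
        s.state = ServerState::New;
    }
    toy_mark_for_timeout(s, now, retry_timeout);
}
```

In this model, two IO errors one second apart against the same offline server leave the unguarded path with two recorded failures and a later retry time, while the guarded path records one failure and keeps the original retry time.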
This may only manifest itself as a problem when using consistent distribution due to the point at which the continuum is recalculated - I haven't tested with any of the other distribution options. It's probably also more obviously a problem when making use of the dead server retry behaviour included in 1.0.3+.

In a nutshell it should be possible to observe that retries do not occur at the expected intervals and failure counts are not accurate after a server in the pool is taken offline.

I patched io.cc and quit.cc to work around this as part of the following commit to my branch:

http://bazaar.launchpad.net/~trevor/libmemcached/dead-retry/revision/978

This may well be fixing the symptom rather than the cause, but I have had the change running in production for quite some time now with no apparent side-effects. I do understand that those changes cause at least some of the tests to fail though which certainly warrants further investigation.

I've been meaning to find the time to put together a proper example test case and results for this but that has been proving impossible of late. I still wanted to get the issue logged though - please let me know if I've not been clear enough here or can provide any more useful information.

** Affects: libmemcached (Ubuntu)
     Importance: Undecided
         Status: New

** Description changed:

  When an IO error is encountered the server state is reset to
  MEMCACHED_SERVER_STATE_NEW even if it is currently
  MEMCACHED_SERVER_STATE_IN_TIMEOUT. The call to
  memcached_mark_server_for_timeout will then incorrectly push the next
- connection retry time further back and increment the server failure
- counter to be further incremented. This throws out the connection back-
- off handling as it appears there has been another failure when in fact
- we're just dealing with an in-progress failure so to speak.
+ connection retry time further back and further increment the server
+ failure counter. This throws out the connection back-off handling as it
+ appears there has been another failure when in fact we're just dealing
+ with an in-progress failure so to speak.

-- 
You received this bug notification because you are a member of Ubuntu Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/931696

Title:
  Server timeout state reset on IO error