Now that we've established that the TTL passed into the server-create call is for reaping idle connections and not individual operation timeouts, I want to ask about timing out individual operations.
If memcached freezes, then it appears my calls to 'get' will block until memcached wakes up. Is there any way to set a timeout for that call? I can repro this in my unit tests by sending a SIGSTOP to memcached before doing a 'get'.

-Josh

On Thu, Sep 27, 2012 at 4:37 PM, Joshua Marantz <jmara...@google.com> wrote:

> This helps a lot. I think 600 seconds seems like a fine idle-reap timeout.
>
> I need to investigate why some lookups take a second or more. Maybe
> there's a mutex contention on my end somewhere.
>
> Thanks!
> -Josh
>
> On Thu, Sep 27, 2012 at 2:08 PM, Jeff Trawick <traw...@gmail.com> wrote:
>
>> On Thu, Sep 27, 2012 at 1:55 PM, Joshua Marantz <jmara...@google.com> wrote:
>> > That one call-site is HTTP_24/src/modules/cache/mod_socache_memcache.c,
>> > right? That was where I stole my args from.
>>
>> no, subversion
>>
>> > As the TCP/IP layer is a lower level abstraction than the apr_memcache
>> > interface, I'm still not clear on exactly what that means. Does a value of
>> > 600 mean that a single multiget must complete in 600 microseconds otherwise
>> > it fails with APR_TIMEUP?
>>
>> ttl only affects connections which are not currently used; it does not
>> control I/O timeouts
>>
>> > That might explain the behavior I saw.
>> >
>> > I've now jacked that up by x1e6 to 600 seconds and I don't see timeouts,
>> > but I'm hoping someone can bridge the gap between the socket-level
>> > explanation and the apr_memcache API call.
>> >
>> > I was assuming that apr_memcache created the TCP/IP connection when I called
>> > apr_memcache_server_create, and there even 600 seconds seems too short. Is
>> > the functionality more like it will create connections on-demand and leave
>> > them running for N microseconds, re-using the connection for multiple
>> > requests until TTL microseconds have elapsed since creation?
>>
>> create on demand
>> reuse existing idle connections when possible
>> when performing maintenance on the idle connections, clean up any
>> which were idle for N microseconds
>>
>> If a connection is always reused before it is idle for N microseconds,
>> it will live as long as memcached allows.
>>
>> > If that's the case then I guess that every 10 minutes one of my cache
>> > lookups may have high latency to re-establish the connection, is that right?
>> > I've been histogramming this under load and seeing some long tail requests
>> > with very high latency. My median latency is only 143us which is great.
>> > My 90%, 95% and 99% are all around 5ms, which is fine as well. But I've got
>> > a fairly significant number of long-tail lookups that take hundreds of ms or
>> > even seconds to finish, and one crazy theory is that this is all reconnect
>> > cost.
>> >
>> > It would be nice if the TTL were interpreted as a maximum idle time before
>> > the connection is reaped, rather than stuttering response-time on a very
>> > active channel.
>>
>> It is. The ttl is interpreted by the reslist layer, which won't touch
>> objects until they're returned to the list.
>>
>> >
>> > This testing is all using a single memcached running on localhost.
>> >
>> > -Josh
>> >
>> >
>> > On Thu, Sep 27, 2012 at 11:24 AM, Jeff Trawick <traw...@gmail.com> wrote:
>> >>
>> >> On Thu, Sep 27, 2012 at 11:15 AM, Joshua Marantz <jmara...@google.com> wrote:
>> >> > On Thu, Sep 27, 2012 at 10:58 AM, Ben Noordhuis <i...@bnoordhuis.nl> wrote:
>> >> >>
>> >> >> If dlsym() is called with the special handle NULL, it is interpreted as a
>> >> >> reference to the executable or shared object from which the call is being
>> >> >> made. Thus a shared object can reference its own symbols.
>> >> >>
>> >> >> And that's how it works on Linux, Solaris, NetBSD and probably OpenBSD as
>> >> >> well.
>> >> >
>> >> >
>> >> > Cool, thanks.
>> >> >>
>> >> >> > Do you have a feel for the exact meaning of that TTL parameter to
>> >> >> > apr_memcache_server_create?
>> >> >>
>> >> >> You mean what units it uses? Microseconds (at least, in 2.4).
>> >> >
>> >> >
>> >> > Actually what I meant was what that value is used for in the library. The
>> >> > phrase "time to live of client connection" confuses me. Does it really mean
>> >> > "the maximum number of seconds apr_memcache is willing to wait for a single
>> >> > operation"? Or does it mean *both*, implying that a fresh TCP/IP connection
>> >> > is made for every new operation, but will stay alive for only a certain
>> >> > number of seconds.
>> >>
>> >> TCP/IP connections, once created, will be retained for the specified
>> >> (ttl) number of seconds. They'll be created when needed.
>> >>
>> >> The socket connect timeout is hard-coded to 1 second, and there's no
>> >> timeout for I/O.
>> >>
>> >> >
>> >> >
>> >> > It is a little disturbing from a module-developer perspective to have the
>> >> > meaning of that parameter change by a factor of 1M between versions. Would
>> >> > it be better to revert the recent change and instead change the doc to match
>> >> > the current behavior?
>> >>
>> >> The doc was already changed to match the behavior, but I missed that.
>> >> The caller I know of used the wrong unit, and I'll submit a patch to
>> >> fix that in the caller, as well as revert my screw-up from yesterday.
>> >>
>> >> >
>> >> > -Josh
>> >> >
>> >>
>> >>
>> >>
>> >> --
>> >> Born in Roswell... married an alien...
>> >> http://emptyhammock.com/
>> >
>> >
>>
>>
>>
>> --
>> Born in Roswell... married an alien...
>> http://emptyhammock.com/
>>
>