On Mon, Mar 11, 2013 at 3:50 PM, Joshua Marantz <jmara...@google.com> wrote:
> ping! > > Please don't hesitate to push back and tell me if I can supply the patch or > update in some easier-to-digest form. In particular, while I have > rigorously stress-tested this change using mod_pagespeed's unit test, > system-test, and load-test framework, I don't really understand what the > testing flow is for APR. I'd be happy to add unit-tests for that if > someone points me to a change-list or patch-file that does it properly. > > -Josh > I'll try hard to work on this in the next couple of days. It would be great to have fixes in APR-Util 1.5.x, which we hope to work on later this week. > > On Thu, Nov 1, 2012 at 8:04 AM, Joshua Marantz <jmara...@google.com> > wrote: > > > I have completed a solution to this problem, which can be a drop-in > update > > for the existing apr_memcache.c. It is now checked in for my module as > > > http://code.google.com/p/modpagespeed/source/browse/trunk/src/third_party/aprutil/apr_memcache2.c > > . > > > > It differs from the solution in > > https://issues.apache.org/bugzilla/show_bug.cgi?id=51065 in that: > > > > - It doesn't require an API change; it but it enforces the 50ms > > timeout that already exists for apr_multgetp for all operations. > > - It works under my load test (which I found is not true of the patch > > in 51065). > > > > For my own purposes, I will be shipping my module with apr_memcache2 so I > > get the behavior I want regardless of what version of Apache is > installed. > > But I'd like to propose my patch for apr_memcache.c. The patch is > > attached, and I've also submitted it as an alternative patch to bug > 51065. > > > > If you agree with the strategy I used to solve this problem, then please > > let me know if I can help with any changes required to get this into the > > main distribution, > > > > > > On Mon, Oct 22, 2012 at 5:21 PM, Joshua Marantz <jmara...@google.com > >wrote: > > > >> I've had some preliminary success with my own variant of apr_memcache.c > >> (creatively called apr_memcache2.c). Rather than setting the socket > >> timeout, I've been mimicing the timeout strategy I saw in > >> apr_memcache_multgetp, by adding a new helper method: > >> > >> static apr_status_t wait_for_server_or_timeout(apr_pool_t* temp_pool, > >> apr_memcache2_conn_t* > >> conn) { > >> apr_pollset_t* pollset; > >> apr_status_t rv = apr_pollset_create(&pollset, 1, temp_pool, 0); > >> if (rv == APR_SUCCESS) { > >> apr_pollfd_t pollfd; > >> pollfd.desc_type = APR_POLL_SOCKET; > >> pollfd.reqevents = APR_POLLIN; > >> pollfd.p = temp_pool; > >> pollfd.desc.s = conn->sock; > >> pollfd.client_data = NULL; > >> apr_pollset_add(pollset, &pollfd); > >> apr_int32_t queries_recvd; > >> const apr_pollfd_t* activefds; > >> rv = apr_pollset_poll(pollset, MULT_GET_TIMEOUT, &queries_recvd, > >> &activefds); > >> if (rv == APR_SUCCESS) { > >> assert(queries_recvd == 1); > >> assert(activefds->desc.s == conn->sock); > >> assert(activefds->client_data == NULL); > >> } > >> } > >> return rv; > >> } > >> > >> And calling that before many of the existing calls to get_server_line > as: > >> > >> rv = wait_for_server_or_timeout_no_pool(conn); > >> if (rv != APR_SUCCESS) { > >> ms_release_conn(ms, conn); > >> return rv; > >> } > >> > >> This is just an experiment; I think I can streamline this by > >> pre-populating the pollfd structure as part of the apr_memcache_conn_t > >> (actually now apr_memcache2_conn_t). > >> > >> I have two questions about this: > >> > >> 1. I noticed the version of apr_memcache.c that ships with Apache 2.4 is > >> somewhat different from the one that ships with Apache 2.2. In > particular > >> the 2.4 version cannot be compiled against the headers that come with a > 2.2 > >> distribution. Is there any downside to taking my hacked 2.2 > apr_memcache.c > >> and running it in Apache 2.4? Or should I maintain two hacks? > >> > >> 2. This seems wasteful in terms of system calls. I am making an extra > >> call to poll, rather than relying on the socket timeout. The socket > >> timeout didn't work as well as this though. Does anyone have any > theories > >> as to why, or what could be done to the patch in > >> https://issues.apache.org/bugzilla/show_bug.cgi?id=51065 to work? > >> > >> -Josh > >> > >> > >> On Fri, Oct 19, 2012 at 9:25 AM, Joshua Marantz <jmara...@google.com > >wrote: > >> > >>> Following up: I tried doing what I suggested above: patching that > change > >>> into my own copy of apr_memcache.c It was first of all a bad idea to > pull > >>> in only part of apr_memcache.c because that file changed slightly > between > >>> 2.2 and 2.4 and our module works in both. > >>> > >>> I was successful making my own version of apr_memcache (renaming > >>> entry-points apr_memcache2*) that I could hack. But if I changed the > >>> socket timeout from -1 to 10 seconds, then the system behaved very > poorly > >>> under load test (though it worked fine in our unit-tests and > system-tests). > >>> In other words, I think the proposed patch that Jeff pointed to above > is > >>> not really working (as he predicted). This test was done without > >>> SIGSTOPing the memcached; it would timeout under our load anyway and > >>> thereafter behave poorly. > >>> > >>> I'm going to follow up on that bugzilla entry, but for now I'm going to > >>> pursue my own complicated mechanism of timing out the calls from my > side. > >>> > >>> -Josh > >>> > >>> > >>> On Thu, Oct 18, 2012 at 10:46 AM, Joshua Marantz <jmara...@google.com > >wrote: > >>> > >>>> Thanks Jeff, that is very helpful. We are considering a course of > >>>> action and before doing any work toward this, I'd like to understand > the > >>>> pitfalls from people that understand Apache better than us. > >>>> > >>>> Here's our reality: we believe we need to incorporate memcached for > >>>> mod_pagespeed <http://modpagespeed.com> to scale effectively for very > >>>> large sites & hosting providers. We are fairly close (we think) to > >>>> releasing this functionality as beta. However, in such large sites, > stuff > >>>> goes wrong: machines crash, power failure, fiber cut, etc. When it > does we > >>>> want to fall back to serving partially unoptimized sites rather than > >>>> hanging the servers. > >>>> > >>>> I understand the realities of backward-compatible APIs. My > expectation > >>>> is that this would take years to make it into an APR distribution we > could > >>>> depend on. We want to deploy this functionality in weeks. The > workarounds > >>>> we have tried backgrounding the apr_memcache calls in a thread and > timing > >>>> out in mainline are complex and even once they work 100% will be very > >>>> unsatisfactory (resource leaks; Apache refusing to exit cleanly on > >>>> 'apachectl stop') if this happens more than (say) once a month. > >>>> > >>>> Our plan is to copy the patched implementation of > >>>> apr_memcache_server_connect and the static methods it calls into a > new .c > >>>> file we will link into our module, naming the new entry-point > something > >>>> else (apr_memcache_server_connect_with_timeout seems good). From a > >>>> CS/SE perspective this is offensive and we admit it, but from a > product > >>>> quality perspective we believe this beats freezes and > complicated/imperfect > >>>> workarounds with threads. > >>>> > >>>> So I have two questions for the Apache community: > >>>> > >>>> 1. What are the practical problems with this approach? Note that > >>>> in any case a new APR rev would require editing/ifdefing our code > anyway, > >>>> so I think immunity from APR updates such as this patch being > applied is > >>>> not a distinguishing drawback. > >>>> 2. Is there an example of the correct solution to the technical > >>>> problem Jeff highlighted: "it is apparently missing a call to > >>>> adjust the socket timeout and to discard the connection if the > >>>> timeout is reached". That sounds like a pattern that might be > >>>> found elsewhere in the Apache HTTPD code base. > >>>> > >>>> Thanks in advance for your help! > >>>> -Josh > >>>> > >>>> > >>>> On Wed, Oct 17, 2012 at 8:16 PM, Jeff Trawick <traw...@gmail.com > >wrote: > >>>> > >>>>> On Wed, Oct 17, 2012 at 3:36 PM, Joshua Marantz <jmara...@google.com > > > >>>>> wrote: > >>>>> > Is there a mechanism to time out individual operations? > >>>>> > >>>>> No, the socket connect timeout is hard-coded at 1 second and the > >>>>> socket I/O timeout is disabled. > >>>>> > >>>>> Bugzilla bug > https://issues.apache.org/bugzilla/show_bug.cgi?id=51065 > >>>>> has a patch, though it is apparently missing a call to adjust the > >>>>> socket timeout and to discard the connection if the timeout is > >>>>> reached. More importantly, the API can only be changed in future APR > >>>>> 2.0; alternate, backwards-compatible API(s) could be added in future > >>>>> APR-Util 1.6. > >>>>> > >>>>> > > >>>>> > If memcached freezes, then it appears my calls to 'get' will block > >>>>> until > >>>>> > memcached wakes up. Is there any way to set a timeout for that > call? > >>>>> > > >>>>> > I can repro this in my unit tests by sending a SIGSTOP to memcached > >>>>> before > >>>>> > doing a 'get'? > >>>>> > > >>>>> > Here are my observations: > >>>>> > > >>>>> > apr_memcache_multgetp seems to time out in bounded time if I > SIGSTOP > >>>>> the > >>>>> > memcached process. Yes! > >>>>> > > >>>>> > apr_memcache_getp seems to hang indefinitely if I SIGSTOP the > >>>>> memcached > >>>>> > process. > >>>>> > > >>>>> > apr_memcache_set seems to hang indefinitely if I SIGSTOP the > >>>>> memcached > >>>>> > process. > >>>>> > > >>>>> > apr_memcache_delete seems to hang indefinitely if I SIGSTOP the > >>>>> memcached > >>>>> > process. > >>>>> > > >>>>> > apr_memcache_stats seems to hang indefinitely if I SIGSTOP the > >>>>> memcached > >>>>> > process. > >>>>> > > >>>>> > That last one really sucks as I am using that to print the status > of > >>>>> all my > >>>>> > cache shards to the log file if I detected a problem :( > >>>>> > > >>>>> > > >>>>> > Why does apr_memcache_multgetp do what I want and not the others? > >>>>> Can I > >>>>> > induce the others to have reasonable timeout behavior? > >>>>> > > >>>>> > When I SIGSTOP memcached this makes it hard to even restart Apache, > >>>>> at > >>>>> > least with graceful-stop. > >>>>> > > >>>>> > > >>>>> > On a related note, the apr_memcache > >>>>> > documentation< > >>>>> > http://apr.apache.org/docs/apr-util/1.4/group___a_p_r___util___m_c.html > >>>>> >is > >>>>> > very thin. I'd be happy to augment it with my observations on its > >>>>> > usage > >>>>> > and the meaning of some of the arguments if that was desired. How > >>>>> would I > >>>>> > go about that? > >>>>> > >>>>> Check out APR trunk from Subversion, adjust the doxygen docs in the > >>>>> include files, build (make dox) and inspect the results, submit a > >>>>> patch to dev@apr.apache.org. > >>>>> > >>>>> > > >>>>> > -Josh > >>>>> > >>>>> > >>>>> > >>>>> -- > >>>>> Born in Roswell... married an alien... > >>>>> http://emptyhammock.com/ > >>>>> > >>>> > >>>> > >>> > >> > > > -- Born in Roswell... married an alien... http://emptyhammock.com/