On Mon, Mar 11, 2013 at 3:50 PM, Joshua Marantz <jmara...@google.com> wrote:

> ping!
>
> Please don't hesitate to push back and tell me if I can supply the patch or
> update in some easier-to-digest form.  In particular, while I have
> rigorously stress-tested this change using mod_pagespeed's unit test,
> system-test, and load-test framework, I don't really understand what the
> testing flow is for APR.  I'd be happy to add unit-tests for that if
> someone points me to a change-list or patch-file that does it properly.
>
> -Josh
>

I'll try hard to work on this in the next couple of days.  It would be
great to have fixes in APR-Util 1.5.x, which we hope to work on later this
week.


>
> On Thu, Nov 1, 2012 at 8:04 AM, Joshua Marantz <jmara...@google.com>
> wrote:
>
> > I have completed a solution to this problem, which can be a drop-in
> update
> > for the existing apr_memcache.c.  It is now checked in for my module as
> >
> http://code.google.com/p/modpagespeed/source/browse/trunk/src/third_party/aprutil/apr_memcache2.c
> > .
> >
> > It differs from the solution in
> > https://issues.apache.org/bugzilla/show_bug.cgi?id=51065 in that:
> >
> >    - It doesn't require an API change; it but it enforces the 50ms
> >    timeout that already exists for apr_multgetp for all operations.
> >    - It works under my load test (which I found is not true of the patch
> >    in 51065).
> >
> > For my own purposes, I will be shipping my module with apr_memcache2 so I
> > get the behavior I want regardless of what version of Apache is
> installed.
> >  But I'd like to propose my patch for apr_memcache.c.  The patch is
> > attached, and I've also submitted it as an alternative patch to bug
> 51065.
> >
> > If you agree with the strategy I used to solve this problem, then please
> > let me know if I can help with any changes required to get this into the
> > main distribution,
> >
> >
> > On Mon, Oct 22, 2012 at 5:21 PM, Joshua Marantz <jmara...@google.com
> >wrote:
> >
> >> I've had some preliminary success with my own variant of apr_memcache.c
> >> (creatively called apr_memcache2.c).  Rather than setting the socket
> >> timeout, I've been mimicing the timeout strategy I saw in
> >> apr_memcache_multgetp, by adding a new helper method:
> >>
> >> static apr_status_t wait_for_server_or_timeout(apr_pool_t* temp_pool,
> >>                                                apr_memcache2_conn_t*
> >> conn) {
> >>     apr_pollset_t* pollset;
> >>     apr_status_t rv = apr_pollset_create(&pollset, 1, temp_pool, 0);
> >>     if (rv == APR_SUCCESS) {
> >>         apr_pollfd_t pollfd;
> >>         pollfd.desc_type = APR_POLL_SOCKET;
> >>         pollfd.reqevents = APR_POLLIN;
> >>         pollfd.p = temp_pool;
> >>         pollfd.desc.s = conn->sock;
> >>         pollfd.client_data = NULL;
> >>         apr_pollset_add(pollset, &pollfd);
> >>         apr_int32_t queries_recvd;
> >>         const apr_pollfd_t* activefds;
> >>         rv = apr_pollset_poll(pollset, MULT_GET_TIMEOUT, &queries_recvd,
> >>                               &activefds);
> >>         if (rv == APR_SUCCESS) {
> >>             assert(queries_recvd == 1);
> >>             assert(activefds->desc.s == conn->sock);
> >>             assert(activefds->client_data == NULL);
> >>         }
> >>     }
> >>     return rv;
> >> }
> >>
> >> And calling that before many of the existing calls to get_server_line
> as:
> >>
> >>     rv = wait_for_server_or_timeout_no_pool(conn);
> >>     if (rv != APR_SUCCESS) {
> >>         ms_release_conn(ms, conn);
> >>         return rv;
> >>     }
> >>
> >> This is just an experiment; I think I can streamline this by
> >> pre-populating the pollfd structure as part of the apr_memcache_conn_t
> >> (actually now apr_memcache2_conn_t).
> >>
> >> I have two questions about this:
> >>
> >> 1. I noticed the version of apr_memcache.c that ships with Apache 2.4 is
> >> somewhat different from the one that ships with Apache 2.2.  In
> particular
> >> the 2.4 version cannot be compiled against the headers that come with a
> 2.2
> >> distribution.  Is there any downside to taking my hacked 2.2
> apr_memcache.c
> >> and running it in Apache 2.4?  Or should I maintain two hacks?
> >>
> >> 2. This seems wasteful in terms of system calls.  I am making an extra
> >> call to poll, rather than relying on the socket timeout.  The socket
> >> timeout didn't work as well as this though.  Does anyone have any
> theories
> >> as to why, or what could be done to the patch in
> >> https://issues.apache.org/bugzilla/show_bug.cgi?id=51065 to work?
> >>
> >> -Josh
> >>
> >>
> >> On Fri, Oct 19, 2012 at 9:25 AM, Joshua Marantz <jmara...@google.com
> >wrote:
> >>
> >>> Following up: I tried doing what I suggested above: patching that
> change
> >>> into my own copy of apr_memcache.c  It was first of all a bad idea to
> pull
> >>> in only part of apr_memcache.c because that file changed slightly
> between
> >>> 2.2 and 2.4 and our module works in both.
> >>>
> >>> I was successful making my own version of apr_memcache (renaming
> >>> entry-points apr_memcache2*) that I could hack.  But if I changed the
> >>> socket timeout from -1 to 10 seconds, then the system behaved very
> poorly
> >>> under load test (though it worked fine in our unit-tests and
> system-tests).
> >>>  In other words, I think the proposed patch that Jeff pointed to above
> is
> >>> not really working (as he predicted).  This test was done without
> >>> SIGSTOPing the memcached; it would timeout under our load anyway and
> >>> thereafter behave poorly.
> >>>
> >>> I'm going to follow up on that bugzilla entry, but for now I'm going to
> >>> pursue my own complicated mechanism of timing out the calls from my
> side.
> >>>
> >>> -Josh
> >>>
> >>>
> >>> On Thu, Oct 18, 2012 at 10:46 AM, Joshua Marantz <jmara...@google.com
> >wrote:
> >>>
> >>>> Thanks Jeff, that is very helpful.  We are considering a course of
> >>>> action and before doing any work toward this, I'd like to understand
> the
> >>>> pitfalls from people that understand Apache better than us.
> >>>>
> >>>> Here's our reality: we believe we need to incorporate memcached for
> >>>> mod_pagespeed <http://modpagespeed.com> to scale effectively for very
> >>>> large sites & hosting providers.  We are fairly close (we think) to
> >>>> releasing this functionality as beta.  However, in such large sites,
> stuff
> >>>> goes wrong: machines crash, power failure, fiber cut, etc.  When it
> does we
> >>>> want to fall back to serving partially unoptimized sites rather than
> >>>> hanging the servers.
> >>>>
> >>>> I understand the realities of backward-compatible APIs.  My
> expectation
> >>>> is that this would take years to make it into an APR distribution we
> could
> >>>> depend on.  We want to deploy this functionality in weeks.  The
> workarounds
> >>>> we have tried backgrounding the apr_memcache calls in a thread and
> timing
> >>>> out in mainline are complex and even once they work 100% will be very
> >>>> unsatisfactory (resource leaks; Apache refusing to exit cleanly on
> >>>> 'apachectl stop') if this happens more than (say) once a month.
> >>>>
> >>>> Our plan is to copy the patched implementation of
> >>>> apr_memcache_server_connect and the static methods it calls into a
> new .c
> >>>> file we will link into our module, naming the new entry-point
> something
> >>>> else (apr_memcache_server_connect_with_timeout seems good).  From a
> >>>> CS/SE perspective this is offensive and we admit it, but from a
> product
> >>>> quality perspective we believe this beats freezes and
> complicated/imperfect
> >>>> workarounds with threads.
> >>>>
> >>>> So I have two questions for the Apache community:
> >>>>
> >>>>    1. What are the practical problems with this approach?  Note that
> >>>>    in any case a new APR rev would require editing/ifdefing our code
> anyway,
> >>>>    so I think immunity from APR updates such as this patch being
> applied is
> >>>>    not a distinguishing drawback.
> >>>>    2. Is there an example of the correct solution to the technical
> >>>>    problem Jeff highlighted: "it is apparently missing a call to
> >>>>    adjust the socket timeout and to discard the connection if the
> >>>>    timeout is reached".  That sounds like a pattern that might be
> >>>>    found elsewhere in the Apache HTTPD code base.
> >>>>
> >>>> Thanks in advance for your help!
> >>>> -Josh
> >>>>
> >>>>
> >>>> On Wed, Oct 17, 2012 at 8:16 PM, Jeff Trawick <traw...@gmail.com
> >wrote:
> >>>>
> >>>>> On Wed, Oct 17, 2012 at 3:36 PM, Joshua Marantz <jmara...@google.com
> >
> >>>>> wrote:
> >>>>> > Is there a mechanism to time out individual operations?
> >>>>>
> >>>>> No, the socket connect timeout is hard-coded at 1 second and the
> >>>>> socket I/O timeout is disabled.
> >>>>>
> >>>>> Bugzilla bug
> https://issues.apache.org/bugzilla/show_bug.cgi?id=51065
> >>>>> has a patch, though it is apparently missing a call to adjust the
> >>>>> socket timeout and to discard the connection if the timeout is
> >>>>> reached.  More importantly, the API can only be changed in future APR
> >>>>> 2.0; alternate, backwards-compatible API(s) could be added in future
> >>>>> APR-Util 1.6.
> >>>>>
> >>>>> >
> >>>>> > If memcached freezes, then it appears my calls to 'get' will block
> >>>>> until
> >>>>> > memcached wakes up.  Is there any way to set a timeout for that
> call?
> >>>>> >
> >>>>> > I can repro this in my unit tests by sending a SIGSTOP to memcached
> >>>>> before
> >>>>> > doing a 'get'?
> >>>>> >
> >>>>> > Here are my observations:
> >>>>> >
> >>>>> > apr_memcache_multgetp seems to time out in bounded time if I
> SIGSTOP
> >>>>> the
> >>>>> > memcached process. Yes!
> >>>>> >
> >>>>> > apr_memcache_getp seems to hang indefinitely if I SIGSTOP the
> >>>>> memcached
> >>>>> > process.
> >>>>> >
> >>>>> > apr_memcache_set seems to hang indefinitely if I SIGSTOP the
> >>>>> memcached
> >>>>> > process.
> >>>>> >
> >>>>> > apr_memcache_delete seems to hang indefinitely if I SIGSTOP the
> >>>>> memcached
> >>>>> > process.
> >>>>> >
> >>>>> > apr_memcache_stats seems to hang indefinitely if I SIGSTOP the
> >>>>> memcached
> >>>>> > process.
> >>>>> >
> >>>>> > That last one really sucks as I am using that to print the status
> of
> >>>>> all my
> >>>>> > cache shards to the log file if I detected a problem :(
> >>>>> >
> >>>>> >
> >>>>> > Why does apr_memcache_multgetp do what I want and not the others?
> >>>>>  Can I
> >>>>> > induce the others to have reasonable timeout behavior?
> >>>>> >
> >>>>> > When I SIGSTOP memcached this makes it hard to even restart Apache,
> >>>>> at
> >>>>> > least with graceful-stop.
> >>>>> >
> >>>>> >
> >>>>> > On a related note, the apr_memcache
> >>>>> > documentation<
> >>>>>
> http://apr.apache.org/docs/apr-util/1.4/group___a_p_r___util___m_c.html
> >>>>> >is
> >>>>> > very thin.  I'd be happy to augment it with my observations on its
> >>>>> > usage
> >>>>> > and the meaning of some of the arguments if that was desired.  How
> >>>>> would I
> >>>>> > go about that?
> >>>>>
> >>>>> Check out APR trunk from Subversion, adjust the doxygen docs in the
> >>>>> include files, build (make dox) and inspect the results, submit a
> >>>>> patch to dev@apr.apache.org.
> >>>>>
> >>>>> >
> >>>>> > -Josh
> >>>>>
> >>>>>
> >>>>>
> >>>>> --
> >>>>> Born in Roswell... married an alien...
> >>>>> http://emptyhammock.com/
> >>>>>
> >>>>
> >>>>
> >>>
> >>
> >
>



-- 
Born in Roswell... married an alien...
http://emptyhammock.com/

Reply via email to