On Tue, Mar 12, 2013 at 1:56 PM, Jeff Trawick <traw...@gmail.com> wrote:
> On Mon, Mar 11, 2013 at 3:50 PM, Joshua Marantz <jmara...@google.com> wrote:
>>
>> ping!
>>
>> Please don't hesitate to push back and tell me if I can supply the patch
>> or update in some easier-to-digest form.  In particular, while I have
>> rigorously stress-tested this change using mod_pagespeed's unit test,
>> system-test, and load-test framework, I don't really understand what the
>> testing flow is for APR.  I'd be happy to add unit-tests for that if
>> someone points me to a change-list or patch-file that does it properly.
>>
>> -Josh
>
>
> I'll try hard to work on this in the next couple of days.  It would be
> great to have fixes in APR-Util 1.5.x, which we hope to work on later
> this week.
Attached is a first pass at getting your patch to apr trunk.  Can you
confirm that I didn't break your tests?  (It passes the apr tests, but that
may not mean much.)

>
>>
>> On Thu, Nov 1, 2012 at 8:04 AM, Joshua Marantz <jmara...@google.com> wrote:
>>
>> > I have completed a solution to this problem, which can be a drop-in
>> > update for the existing apr_memcache.c.  It is now checked in for my
>> > module as
>> > http://code.google.com/p/modpagespeed/source/browse/trunk/src/third_party/aprutil/apr_memcache2.c
>> > .
>> >
>> > It differs from the solution in
>> > https://issues.apache.org/bugzilla/show_bug.cgi?id=51065 in that:
>> >
>> >    - It doesn't require an API change, but it enforces the 50ms
>> >      timeout that already exists for apr_multgetp for all operations.
>> >    - It works under my load test (which I found is not true of the
>> >      patch in 51065).
>> >
>> > For my own purposes, I will be shipping my module with apr_memcache2
>> > so I get the behavior I want regardless of what version of Apache is
>> > installed.  But I'd like to propose my patch for apr_memcache.c.  The
>> > patch is attached, and I've also submitted it as an alternative patch
>> > to bug 51065.
>> >
>> > If you agree with the strategy I used to solve this problem, then
>> > please let me know if I can help with any changes required to get this
>> > into the main distribution.
>> >
>> >
>> > On Mon, Oct 22, 2012 at 5:21 PM, Joshua Marantz <jmara...@google.com> wrote:
>> >
>> >> I've had some preliminary success with my own variant of
>> >> apr_memcache.c (creatively called apr_memcache2.c).  Rather than
>> >> setting the socket timeout, I've been mimicking the timeout strategy
>> >> I saw in apr_memcache_multgetp, by adding a new helper method:
>> >>
>> >> static apr_status_t wait_for_server_or_timeout(apr_pool_t* temp_pool,
>> >>                                                apr_memcache2_conn_t* conn) {
>> >>   apr_pollset_t* pollset;
>> >>   apr_status_t rv = apr_pollset_create(&pollset, 1, temp_pool, 0);
>> >>   if (rv == APR_SUCCESS) {
>> >>     apr_pollfd_t pollfd;
>> >>     pollfd.desc_type = APR_POLL_SOCKET;
>> >>     pollfd.reqevents = APR_POLLIN;
>> >>     pollfd.p = temp_pool;
>> >>     pollfd.desc.s = conn->sock;
>> >>     pollfd.client_data = NULL;
>> >>     apr_pollset_add(pollset, &pollfd);
>> >>     apr_int32_t queries_recvd;
>> >>     const apr_pollfd_t* activefds;
>> >>     rv = apr_pollset_poll(pollset, MULT_GET_TIMEOUT, &queries_recvd,
>> >>                           &activefds);
>> >>     if (rv == APR_SUCCESS) {
>> >>       assert(queries_recvd == 1);
>> >>       assert(activefds->desc.s == conn->sock);
>> >>       assert(activefds->client_data == NULL);
>> >>     }
>> >>   }
>> >>   return rv;
>> >> }
>> >>
>> >> And calling that before many of the existing calls to get_server_line as:
>> >>
>> >>     rv = wait_for_server_or_timeout_no_pool(conn);
>> >>     if (rv != APR_SUCCESS) {
>> >>       ms_release_conn(ms, conn);
>> >>       return rv;
>> >>     }
>> >>
>> >> This is just an experiment; I think I can streamline this by
>> >> pre-populating the pollfd structure as part of the apr_memcache_conn_t
>> >> (actually now apr_memcache2_conn_t).
>> >>
>> >> I have two questions about this:
>> >>
>> >> 1. I noticed the version of apr_memcache.c that ships with Apache 2.4
>> >> is somewhat different from the one that ships with Apache 2.2.  In
>> >> particular, the 2.4 version cannot be compiled against the headers
>> >> that come with a 2.2 distribution.  Is there any downside to taking my
>> >> hacked 2.2 apr_memcache.c and running it in Apache 2.4?  Or should I
>> >> maintain two hacks?
>> >>
>> >> 2. This seems wasteful in terms of system calls.  I am making an extra
>> >> call to poll, rather than relying on the socket timeout.  The socket
>> >> timeout didn't work as well as this, though.  Does anyone have any
>> >> theories as to why, or what could be done to make the patch in
>> >> https://issues.apache.org/bugzilla/show_bug.cgi?id=51065 work?
>> >>
>> >> -Josh
>> >>
>> >>
>> >> On Fri, Oct 19, 2012 at 9:25 AM, Joshua Marantz <jmara...@google.com> wrote:
>> >>
>> >>> Following up: I tried doing what I suggested above: patching that
>> >>> change into my own copy of apr_memcache.c.  It was, first of all, a
>> >>> bad idea to pull in only part of apr_memcache.c, because that file
>> >>> changed slightly between 2.2 and 2.4 and our module works in both.
>> >>>
>> >>> I was successful making my own version of apr_memcache (renaming
>> >>> entry-points apr_memcache2*) that I could hack.  But if I changed the
>> >>> socket timeout from -1 to 10 seconds, then the system behaved very
>> >>> poorly under load test (though it worked fine in our unit-tests and
>> >>> system-tests).  In other words, I think the proposed patch that Jeff
>> >>> pointed to above is not really working (as he predicted).  This test
>> >>> was done without SIGSTOPping the memcached; it would time out under
>> >>> our load anyway and thereafter behave poorly.
>> >>>
>> >>> I'm going to follow up on that bugzilla entry, but for now I'm going
>> >>> to pursue my own complicated mechanism of timing out the calls from
>> >>> my side.
>> >>>
>> >>> -Josh
>> >>>
>> >>>
>> >>> On Thu, Oct 18, 2012 at 10:46 AM, Joshua Marantz <jmara...@google.com> wrote:
>> >>>
>> >>>> Thanks Jeff, that is very helpful.  We are considering a course of
>> >>>> action, and before doing any work toward this, I'd like to understand
>> >>>> the pitfalls from people that understand Apache better than us.
>> >>>>
>> >>>> Here's our reality: we believe we need to incorporate memcached for
>> >>>> mod_pagespeed <http://modpagespeed.com> to scale effectively for very
>> >>>> large sites & hosting providers.  We are fairly close (we think) to
>> >>>> releasing this functionality as beta.  However, in such large sites,
>> >>>> stuff goes wrong: machines crash, power failure, fiber cut, etc.
>> >>>> When it does, we want to fall back to serving partially unoptimized
>> >>>> sites rather than hanging the servers.
>> >>>>
>> >>>> I understand the realities of backward-compatible APIs.  My
>> >>>> expectation is that this would take years to make it into an APR
>> >>>> distribution we could depend on.  We want to deploy this
>> >>>> functionality in weeks.  The workarounds we have tried (backgrounding
>> >>>> the apr_memcache calls in a thread and timing out in mainline) are
>> >>>> complex and, even once they work 100%, will be very unsatisfactory
>> >>>> (resource leaks; Apache refusing to exit cleanly on 'apachectl stop')
>> >>>> if this happens more than (say) once a month.
>> >>>>
>> >>>> Our plan is to copy the patched implementation of
>> >>>> apr_memcache_server_connect and the static methods it calls into a
>> >>>> new .c file we will link into our module, naming the new entry-point
>> >>>> something else (apr_memcache_server_connect_with_timeout seems good).
>> >>>> From a CS/SE perspective this is offensive, and we admit it, but
>> >>>> from a product-quality perspective we believe this beats freezes and
>> >>>> complicated/imperfect workarounds with threads.
>> >>>>
>> >>>> So I have two questions for the Apache community:
>> >>>>
>> >>>>    1. What are the practical problems with this approach?  Note that
>> >>>>    in any case a new APR rev would require editing/ifdefing our code
>> >>>>    anyway, so I think immunity from APR updates such as this patch
>> >>>>    being applied is not a distinguishing drawback.
>> >>>>    2. Is there an example of the correct solution to the technical
>> >>>>    problem Jeff highlighted: "it is apparently missing a call to
>> >>>>    adjust the socket timeout and to discard the connection if the
>> >>>>    timeout is reached"?  That sounds like a pattern that might be
>> >>>>    found elsewhere in the Apache HTTPD code base.
>> >>>>
>> >>>> Thanks in advance for your help!
>> >>>> -Josh
>> >>>>
>> >>>>
>> >>>> On Wed, Oct 17, 2012 at 8:16 PM, Jeff Trawick <traw...@gmail.com> wrote:
>> >>>>
>> >>>>> On Wed, Oct 17, 2012 at 3:36 PM, Joshua Marantz <jmara...@google.com>
>> >>>>> wrote:
>> >>>>> > Is there a mechanism to time out individual operations?
>> >>>>>
>> >>>>> No, the socket connect timeout is hard-coded at 1 second and the
>> >>>>> socket I/O timeout is disabled.
>> >>>>>
>> >>>>> Bugzilla bug https://issues.apache.org/bugzilla/show_bug.cgi?id=51065
>> >>>>> has a patch, though it is apparently missing a call to adjust the
>> >>>>> socket timeout and to discard the connection if the timeout is
>> >>>>> reached.  More importantly, the API can only be changed in future
>> >>>>> APR 2.0; alternate, backwards-compatible API(s) could be added in a
>> >>>>> future APR-Util 1.6.
>> >>>>>
>> >>>>> >
>> >>>>> > If memcached freezes, then it appears my calls to 'get' will block
>> >>>>> > until memcached wakes up.  Is there any way to set a timeout for
>> >>>>> > that call?
>> >>>>> >
>> >>>>> > I can repro this in my unit tests by sending a SIGSTOP to memcached
>> >>>>> > before doing a 'get'.
>> >>>>> >
>> >>>>> > Here are my observations:
>> >>>>> >
>> >>>>> > apr_memcache_multgetp seems to time out in bounded time if I
>> >>>>> > SIGSTOP the memcached process.  Yes!
>> >>>>> >
>> >>>>> > apr_memcache_getp seems to hang indefinitely if I SIGSTOP the
>> >>>>> > memcached process.
>> >>>>> >
>> >>>>> > apr_memcache_set seems to hang indefinitely if I SIGSTOP the
>> >>>>> > memcached process.
>> >>>>> >
>> >>>>> > apr_memcache_delete seems to hang indefinitely if I SIGSTOP the
>> >>>>> > memcached process.
>> >>>>> >
>> >>>>> > apr_memcache_stats seems to hang indefinitely if I SIGSTOP the
>> >>>>> > memcached process.
>> >>>>> >
>> >>>>> > That last one really sucks, as I am using that to print the status
>> >>>>> > of all my cache shards to the log file if I detect a problem :(
>> >>>>> >
>> >>>>> >
>> >>>>> > Why does apr_memcache_multgetp do what I want and not the others?
>> >>>>> > Can I induce the others to have reasonable timeout behavior?
>> >>>>> >
>> >>>>> > When I SIGSTOP memcached, this makes it hard to even restart
>> >>>>> > Apache, at least with graceful-stop.
>> >>>>> >
>> >>>>> >
>> >>>>> > On a related note, the apr_memcache documentation
>> >>>>> > <http://apr.apache.org/docs/apr-util/1.4/group___a_p_r___util___m_c.html>
>> >>>>> > is very thin.  I'd be happy to augment it with my observations on
>> >>>>> > its usage and the meaning of some of the arguments if that was
>> >>>>> > desired.  How would I go about that?
>> >>>>>
>> >>>>> Check out APR trunk from Subversion, adjust the doxygen docs in the
>> >>>>> include files, build (make dox) and inspect the results, submit a
>> >>>>> patch to dev@apr.apache.org.
>> >>>>>
>> >>>>> >
>> >>>>> > -Josh
>> >>>>>
>> >>>>>
>> >>>>> --
>> >>>>> Born in Roswell... married an alien...
>> >>>>> http://emptyhammock.com/
>> >>>>>
>> >>>>
>> >>>>
>> >>>
>> >>
>> >
>
>
>
> --
> Born in Roswell... married an alien...
> http://emptyhammock.com/



--
Born in Roswell... married an alien...
http://emptyhammock.com/
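For readers who do not want to wade through the attached diff below, the core
idea from the thread is to poll the connection's socket for readability, with
a bounded timeout, before each reply is read, so that a stalled memcached
produces a timely error instead of an indefinite hang.  The following is a
minimal, compilable sketch of that pattern; the helper name and the 50ms
constant are illustrative assumptions only, and the attached patch instead
keeps a one-entry pollset on each apr_memcache_conn_t so it does not have to
be rebuilt on every call.

/* Illustrative sketch only; not part of the attached patch. */
#include "apr_pools.h"
#include "apr_network_io.h"
#include "apr_poll.h"

#define EXAMPLE_READ_TIMEOUT_USEC 50000  /* mirrors MULT_GET_TIMEOUT */

static apr_status_t example_wait_until_readable(apr_socket_t *sock,
                                                apr_pool_t *pool)
{
    apr_pollset_t *pollset;
    apr_pollfd_t pollfd;
    apr_int32_t num_ready;
    const apr_pollfd_t *active;
    apr_status_t rv;

    /* One-entry pollset watching just this connection's socket. */
    rv = apr_pollset_create(&pollset, 1, pool, 0);
    if (rv != APR_SUCCESS) {
        return rv;
    }

    pollfd.p = pool;
    pollfd.desc_type = APR_POLL_SOCKET;
    pollfd.reqevents = APR_POLLIN;
    pollfd.desc.s = sock;
    pollfd.client_data = NULL;
    rv = apr_pollset_add(pollset, &pollfd);
    if (rv != APR_SUCCESS) {
        return rv;
    }

    /* apr_pollset_poll() returns APR_TIMEUP if nothing becomes readable
     * within the timeout; the caller can then drop the connection. */
    rv = apr_pollset_poll(pollset, EXAMPLE_READ_TIMEOUT_USEC,
                          &num_ready, &active);
    return rv;
}

When the poll fails or times out, the caller marks the server dead and
releases the connection, which is what the patch below does in
poll_server_releasing_connection_on_failure().
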
Index: memcache/apr_memcache.c
===================================================================
--- memcache/apr_memcache.c	(revision 1456414)
+++ memcache/apr_memcache.c	(working copy)
@@ -30,8 +30,18 @@
     apr_bucket_brigade *bb;
     apr_bucket_brigade *tb;
     apr_memcache_server_t *ms;
+    apr_pollset_t *pollset;   /* Set-up to poll only for this one connection */
 };
 
+/**
+ * Indicates whether the caller already holds the server lock when a
+ * helper function is invoked.
+ */
+typedef enum {
+    LOCK_NOT_HELD,
+    LOCK_HELD
+} caller_lock_status_t;
+
 /* Strings for Client Commands */
 
 #define MC_EOL "\r\n"
@@ -109,15 +119,20 @@
 #define MULT_GET_TIMEOUT 50000
 
-static apr_status_t make_server_dead(apr_memcache_t *mc, apr_memcache_server_t *ms)
+static apr_status_t mark_server_dead(apr_memcache_server_t *ms,
+                                     caller_lock_status_t caller_lock_status)
 {
 #if APR_HAS_THREADS
-    apr_thread_mutex_lock(ms->lock);
+    if (caller_lock_status == LOCK_NOT_HELD) {
+        apr_thread_mutex_lock(ms->lock);
+    }
 #endif
     ms->status = APR_MC_SERVER_DEAD;
     ms->btime = apr_time_now();
 #if APR_HAS_THREADS
-    apr_thread_mutex_unlock(ms->lock);
+    if (caller_lock_status == LOCK_NOT_HELD) {
+        apr_thread_mutex_unlock(ms->lock);
+    }
 #endif
     return APR_SUCCESS;
 }
 
@@ -143,7 +158,7 @@
     return rv;
 }
 
-static apr_status_t mc_version_ping(apr_memcache_server_t *ms);
+static apr_status_t mc_version_ping_lock_held(apr_memcache_server_t *ms);
 
 APR_DECLARE(apr_memcache_server_t *) apr_memcache_find_server_hash(apr_memcache_t *mc,
                                                                    const apr_uint32_t hash)
@@ -181,10 +196,10 @@
 #if APR_HAS_THREADS
         apr_thread_mutex_lock(ms->lock);
 #endif
-        /* Try the dead server, every 5 seconds */
+        /* Try the dead server, every 5 seconds, keeping the lock */
         if (curtime - ms->btime > apr_time_from_sec(5)) {
             ms->btime = curtime;
-            if (mc_version_ping(ms) == APR_SUCCESS) {
+            if (mc_version_ping_lock_held(ms) == APR_SUCCESS) {
                 make_server_live(mc, ms);
 #if APR_HAS_THREADS
                 apr_thread_mutex_unlock(ms->lock);
@@ -282,9 +297,48 @@
 APR_DECLARE(apr_status_t) apr_memcache_disable_server(apr_memcache_t *mc,
                                                        apr_memcache_server_t *ms)
 {
-    return make_server_dead(mc, ms);
+    return mark_server_dead(ms, LOCK_NOT_HELD);
 }
 
+/*
+ * Cleans up connections and/or bad servers as required.
+ *
+ * This function should only be called after a fatal error communicating
+ * with a server.
+ */
+static void disable_server_and_connection(apr_memcache_server_t *ms,
+                                          caller_lock_status_t lock_status,
+                                          apr_memcache_conn_t *conn) {
+    if (conn != NULL) {
+        ms_bad_conn(ms, conn);
+    }
+    mark_server_dead(ms, lock_status);
+}
+
+/*
+ * Polls a socket to see whether the next I/O operation call will block.
+ * The status returned is based on apr_pollset_poll:
+ * http://apr.apache.org/docs/apr/trunk/group__apr__poll.html#ga6b31d7b3a7b2d356370403dd2b79ecf3
+ *
+ * In practice this appears to return APR_TIMEUP on timeout (hardcoded to 50ms).
+ *
+ * Returns APR_SUCCESS if it's OK to read.  Otherwise it releases the
+ * connection and returns the status.
+ */
+static apr_status_t poll_server_releasing_connection_on_failure(apr_memcache_server_t *ms,
+                                                                caller_lock_status_t lock_status,
+                                                                apr_memcache_conn_t *conn) {
+    apr_int32_t queries_recvd;
+    const apr_pollfd_t *activefds;
+
+    apr_status_t rv = apr_pollset_poll(conn->pollset, MULT_GET_TIMEOUT,
+                                       &queries_recvd, &activefds);
+    if (rv != APR_SUCCESS) {
+        disable_server_and_connection(ms, lock_status, conn);
+    }
+    return rv;
+}
+
 static apr_status_t conn_connect(apr_memcache_conn_t *conn)
 {
     apr_status_t rv = APR_SUCCESS;
@@ -339,6 +393,7 @@
     conn->p = np;
     conn->tp = tp;
 
+
     rv = apr_socket_create(&conn->sock, APR_INET, SOCK_STREAM, 0, np);
 
     if (rv != APR_SUCCESS) {
@@ -346,6 +401,26 @@
         return rv;
     }
 
+    rv = apr_pollset_create(&conn->pollset, 1, np, 0);
+    if (rv == APR_SUCCESS) {
+        /*
+         * Populate an 'is readable' pollset for this socket to allow early
+         * timeout detection.
+         */
+        apr_pollfd_t pollfd;
+
+        pollfd.desc_type = APR_POLL_SOCKET;
+        pollfd.reqevents = APR_POLLIN;
+        pollfd.p = np;
+        pollfd.desc.s = conn->sock;
+        pollfd.client_data = NULL;
+        rv = apr_pollset_add(conn->pollset, &pollfd);
+    }
+    if (rv != APR_SUCCESS) { /* apr_pollset_create or apr_pollset_add */
+        apr_pool_destroy(np);
+        return rv;
+    }
+
     conn->buffer = apr_palloc(conn->p, BUFFER_SIZE);
     conn->blen = 0;
     conn->ms = ms;
@@ -588,6 +663,31 @@
     return apr_brigade_cleanup(conn->tb);
 }
 
+static apr_status_t sendv_and_get_server_line_with_timeout(apr_memcache_server_t *ms,
+                                                           apr_memcache_conn_t *conn,
+                                                           struct iovec* vec,
+                                                           int vec_size,
+                                                           caller_lock_status_t lock_status) {
+    apr_size_t written;
+    apr_status_t rv = apr_socket_sendv(conn->sock, vec, vec_size, &written);
+    if (rv != APR_SUCCESS) {
+        disable_server_and_connection(ms, lock_status, conn);
+        return rv;
+    }
+
+    rv = poll_server_releasing_connection_on_failure(ms, lock_status, conn);
+    if (rv != APR_SUCCESS) {
+        return rv;
+    }
+
+    rv = get_server_line(conn);
+    if (rv != APR_SUCCESS) {
+        disable_server_and_connection(ms, LOCK_NOT_HELD, conn);
+    }
+
+    return rv;
+}
+
 static apr_status_t storage_cmd_write(apr_memcache_t *mc,
                                       char *cmd,
                                       const apr_size_t cmd_size,
@@ -601,7 +701,6 @@
     apr_memcache_server_t *ms;
     apr_memcache_conn_t *conn;
     apr_status_t rv;
-    apr_size_t written;
     struct iovec vec[5];
     apr_size_t klen;
 
@@ -641,22 +740,13 @@
     vec[4].iov_base = MC_EOL;
     vec[4].iov_len = MC_EOL_LEN;
 
-    rv = apr_socket_sendv(conn->sock, vec, 5, &written);
+    rv = sendv_and_get_server_line_with_timeout(ms, conn, vec, 5,
+                                                LOCK_NOT_HELD);
 
     if (rv != APR_SUCCESS) {
-        ms_bad_conn(ms, conn);
-        apr_memcache_disable_server(mc, ms);
         return rv;
     }
 
-    rv = get_server_line(conn);
-
-    if (rv != APR_SUCCESS) {
-        ms_bad_conn(ms, conn);
-        apr_memcache_disable_server(mc, ms);
-        return rv;
-    }
-
     if (strcmp(conn->buffer, MS_STORED MC_EOL) == 0) {
         rv = APR_SUCCESS;
     }
@@ -730,7 +820,6 @@
     apr_memcache_server_t *ms;
     apr_memcache_conn_t *conn;
     apr_uint32_t hash;
-    apr_size_t written;
     apr_size_t klen = strlen(key);
     struct iovec vec[3];
 
@@ -756,21 +845,12 @@
     vec[2].iov_base = MC_EOL;
     vec[2].iov_len = MC_EOL_LEN;
 
-    rv = apr_socket_sendv(conn->sock, vec, 3, &written);
-
+    rv = sendv_and_get_server_line_with_timeout(ms, conn, vec, 3,
+                                                LOCK_NOT_HELD);
     if (rv != APR_SUCCESS) {
-        ms_bad_conn(ms, conn);
-        apr_memcache_disable_server(mc, ms);
         return rv;
     }
 
-    rv = get_server_line(conn);
-    if (rv != APR_SUCCESS) {
-        ms_bad_conn(ms, conn);
-        apr_memcache_disable_server(mc, ms);
-        return rv;
-    }
-
     if (strncmp(MS_VALUE, conn->buffer, MS_VALUE_LEN) == 0) {
         char *flags;
         char *length;
@@ -862,7 +942,6 @@
     apr_memcache_server_t *ms;
     apr_memcache_conn_t *conn;
     apr_uint32_t hash;
-    apr_size_t written;
     struct iovec vec[3];
     apr_size_t klen = strlen(key);
 
@@ -890,21 +969,12 @@
     vec[2].iov_base = conn->buffer;
     vec[2].iov_len = klen;
 
-    rv = apr_socket_sendv(conn->sock, vec, 3, &written);
-
+    rv = sendv_and_get_server_line_with_timeout(ms, conn, vec, 3,
+                                                LOCK_NOT_HELD);
     if (rv != APR_SUCCESS) {
-        ms_bad_conn(ms, conn);
-        apr_memcache_disable_server(mc, ms);
        return rv;
     }
 
-    rv = get_server_line(conn);
-    if (rv != APR_SUCCESS) {
-        ms_bad_conn(ms, conn);
-        apr_memcache_disable_server(mc, ms);
-        return rv;
-    }
-
     if (strncmp(MS_DELETED, conn->buffer, MS_DELETED_LEN) == 0) {
         rv = APR_SUCCESS;
     }
@@ -931,7 +1001,6 @@
     apr_memcache_server_t *ms;
     apr_memcache_conn_t *conn;
     apr_uint32_t hash;
-    apr_size_t written;
     struct iovec vec[3];
     apr_size_t klen = strlen(key);
 
@@ -959,21 +1028,12 @@
     vec[2].iov_base = conn->buffer;
     vec[2].iov_len = klen;
 
-    rv = apr_socket_sendv(conn->sock, vec, 3, &written);
-
+    rv = sendv_and_get_server_line_with_timeout(ms, conn, vec, 3,
+                                                LOCK_NOT_HELD);
    if (rv != APR_SUCCESS) {
-        ms_bad_conn(ms, conn);
-        apr_memcache_disable_server(mc, ms);
        return rv;
     }
 
-    rv = get_server_line(conn);
-    if (rv != APR_SUCCESS) {
-        ms_bad_conn(ms, conn);
-        apr_memcache_disable_server(mc, ms);
-        return rv;
-    }
-
     if (strncmp(MS_ERROR, conn->buffer, MS_ERROR_LEN) == 0) {
         rv = APR_EGENERAL;
     }
@@ -1030,7 +1090,6 @@
 {
     apr_status_t rv;
     apr_memcache_conn_t *conn;
-    apr_size_t written;
     struct iovec vec[2];
 
     rv = ms_find_conn(ms, &conn);
@@ -1046,19 +1105,12 @@
     vec[1].iov_base = MC_EOL;
     vec[1].iov_len = MC_EOL_LEN;
 
-    rv = apr_socket_sendv(conn->sock, vec, 2, &written);
-
+    rv = sendv_and_get_server_line_with_timeout(ms, conn, vec, 2,
+                                                LOCK_NOT_HELD);
     if (rv != APR_SUCCESS) {
-        ms_bad_conn(ms, conn);
        return rv;
     }
 
-    rv = get_server_line(conn);
-    if (rv != APR_SUCCESS) {
-        ms_bad_conn(ms, conn);
-        return rv;
-    }
-
     if (strncmp(MS_VERSION, conn->buffer, MS_VERSION_LEN) == 0) {
         *baton = apr_pstrmemdup(p, conn->buffer+MS_VERSION_LEN+1,
                                 conn->blen - MS_VERSION_LEN - 2);
@@ -1073,10 +1125,9 @@
     return rv;
 }
 
-apr_status_t mc_version_ping(apr_memcache_server_t *ms)
+static apr_status_t mc_version_ping_lock_held(apr_memcache_server_t *ms)
 {
     apr_status_t rv;
-    apr_size_t written;
     struct iovec vec[2];
     apr_memcache_conn_t *conn;
 
@@ -1093,14 +1144,11 @@
     vec[1].iov_base = MC_EOL;
     vec[1].iov_len = MC_EOL_LEN;
 
-    rv = apr_socket_sendv(conn->sock, vec, 2, &written);
-
+    rv = sendv_and_get_server_line_with_timeout(ms, conn, vec, 2, LOCK_HELD);
     if (rv != APR_SUCCESS) {
-        ms_bad_conn(ms, conn);
         return rv;
     }
 
-    rv = get_server_line(conn);
     ms_release_conn(ms, conn);
     return rv;
 }
@@ -1664,6 +1712,11 @@
         return rv;
     }
 
+    rv = poll_server_releasing_connection_on_failure(ms, LOCK_NOT_HELD, conn);
+    if (rv != APR_SUCCESS) {
+        return rv;
+    }
+
     ret = apr_pcalloc(p, sizeof(apr_memcache_stats_t));
 
     do {
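
As a rough illustration of the intended effect (not part of the patch), a
client built against the patched apr_memcache should see blocking calls fail
within roughly the 50ms MULT_GET_TIMEOUT window when memcached is SIGSTOPped,
rather than hanging.  The host, port, pool-manager parameters, and keys below
are placeholder assumptions for a local test:

#include <stdio.h>
#include "apr_general.h"
#include "apr_errno.h"
#include "apr_pools.h"
#include "apr_memcache.h"

int main(void)
{
    apr_pool_t *pool;
    apr_memcache_t *mc;
    apr_memcache_server_t *server;
    char *value;
    apr_size_t len;
    apr_status_t rv;
    char errbuf[120];

    apr_initialize();
    apr_pool_create(&pool, NULL);

    /* One local memcached; host/port are placeholders. */
    apr_memcache_create(pool, 1, 0, &mc);
    apr_memcache_server_create(pool, "localhost", 11211, 0, 1, 1, 60, &server);
    apr_memcache_add_server(mc, server);

    rv = apr_memcache_set(mc, "example-key", "example-value", 13, 0);
    printf("set: %s\n", apr_strerror(rv, errbuf, sizeof(errbuf)));

    /* With memcached SIGSTOPped, this should fail within ~50ms rather
     * than blocking indefinitely. */
    rv = apr_memcache_getp(mc, pool, "example-key", &value, &len, NULL);
    printf("get: %s\n", apr_strerror(rv, errbuf, sizeof(errbuf)));

    apr_pool_destroy(pool);
    apr_terminate();
    return 0;
}

With memcached running normally, both calls should report success; after
'kill -STOP' on the memcached process, the get should return an error status
quickly and the server is marked dead, to be retried on the usual 5-second
cycle.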