On 07/16/2014 06:46 PM, Amos Jeffries wrote:
> On 17/07/2014 11:10 a.m., Alex Rousskov wrote:
>> On 07/16/2014 02:38 PM, Rainer Weikusat wrote:
>>> Alex Rousskov <rouss...@measurement-factory.com> writes:
>>>> On 07/16/2014 11:11 AM, Rainer Weikusat wrote:
>>>>> This is broken because it re-enables readable notifications even if
>>>>> no connection was accepted.
>>
>>>> In other words, it re-enables a check for new connections when
>>>>
>>>> * we know that we currently cannot accept any new connections,
>>>> * Squid is under stress already, and
>>>> * our code does not handle multiple deferences of the same
>>>>   TcpAcceptor object well.
>>
>>> There's an important 4th point: The check is re-enabled when it is
>>> known that it will immediately return true again because the
>>> connection which was not accepted is still pending (hence, the socket
>>> is still readable), more details below.
>>
>> Agreed!
>>
>>> I changed the TcpAcceptor/AcceptLimiter code to act as if only a
>>> single file descriptor was available for client connections and
>>> created two connections via netcat. What it should do in this case is
>>> to ignore the second connection until the first one was closed.
>>> Instead, it went into an endless 'defer this connection' loop, as
>>> mentioned above, because the pending 2nd connection meant the socket
>>> remained readable.
>>
>> OK, I think I now better understand what you meant by "one connection
>> at the same time". And even if I do not, we seem to agree on the fix
>> to solve the problem, which is even more important :-).
>>
>>> updated patch:
>>>
>>> --- mad-squid/src/comm/TcpAcceptor.cc  8 Jan 2013 17:36:44 -0000  1.1.1.2
>>> +++ mad-squid/src/comm/TcpAcceptor.cc  16 Jul 2014 20:36:35 -0000
>>> @@ -204,7 +204,6 @@
>>>          } else {
>>>              afd->acceptNext();
>>>          }
>>> -        SetSelect(fd, COMM_SELECT_READ, Comm::TcpAcceptor::doAccept, afd, 0);
>>>
>>>      } catch (const std::exception &e) {
>>>          fatalf("FATAL: error while accepting new client connection: %s\n", e.what());
>>> @@ -262,6 +261,8 @@
>>>          debugs(5, 5, HERE << "Listener: " << conn <<
>>>                " accepted new connection " << newConnDetails <<
>>>                " handler Subscription: " << theCallSub);
>>> +
>>> +        if (!isLimited) SetSelect(conn->fd, COMM_SELECT_READ, Comm::TcpAcceptor::doAccept, this, 0);
>>>          notify(flag, newConnDetails);
>>>      }
>>
>> I am happy to commit the above (after splitting the new if-statement
>> into two lines), but since this is not an emergency, I would prefer to
>> wait a little for any objections or second opinions.
>
> Looks like a reasonable temporary solution to me for the stable
> releases. I have no problems with it going in if we continue to work on
> the queue fixes for trunk.
Hi Amos,

Since I cannot promise to satisfy your "work on the queue fixes for
trunk" precondition, committing this fix to trunk (or removing the
precondition) becomes your call.

> Regarding that queue I think it can be cleaned up once and for all by
> making doAccept() a proper Call and using the new comm/Read.h API.

Yes, but with some adjustments: Calling acceptOne() or a similar method
asynchronously is indeed the right way to do this. However, the method
that AcceptLimiter will call should not use the comm/Read API unless
AcceptLimiter is willing to provide the same arguments that a regular
read callback gets. I do not think this complication (and the slight lie
that the socket was just selected for reading) is worth it. We should
just call a parameterless TcpAcceptor method; acceptOne() may be
directly usable for this purpose. doAccept(), on the other hand, adds
okToAccept() checks, which are harmful in this
we-have-already-decided-to-resume-listening context, and duplicates job
protections already offered by the async calls API. If doAccept()
remains as it is, we should copy its fatal()-related code into
TcpAcceptor::callException() though.

> The queue becomes a std::queue<AsyncCall::Pointer>

Yes. If one worries about efficiency in this context, this can be
optimized further by reusing (and adjusting) AsyncCallQueue instead. I
probably would not worry about efficiency here, but let me know if I
should detail that optimized solution.

> and the nasty removeDead() drops in favour of a TcpAcceptor internal
> Call::cancel().

Remembering the call for the sole purpose of canceling it is harmful
when such cancellations are rare: the async calls API already provides
protection against calling dead/gone jobs, and we should just use that
here.

Another fix, if somebody is going to work on queue optimization, is the
removal of isLimited: AFAICT, after all these changes, a "limited"
acceptor cannot have a read scheduled, and an acceptor with a read
scheduled cannot be limited.

HTH,

Alex.
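
P.S. To make the queue shape above more concrete, here is a rough,
untested sketch. It assumes AcceptLimiter keeps its current defer()/kick()
entry points and that ScheduleCallHere() is usable in this context; treat
the names and details as illustrative, not as a trunk-ready patch:

#include <queue>

// Sketch only: the deferred queue stores parameterless "resume accepting"
// async calls instead of raw TcpAcceptor pointers.
class AcceptLimiter
{
public:
    // Queue an async call that, when dialed, runs TcpAcceptor::acceptOne()
    // (or a similar parameterless method) on the deferred acceptor.
    void defer(const AsyncCall::Pointer &resumeAccepting) {
        deferred_.push(resumeAccepting);
    }

    // Called when a descriptor is freed: resume the oldest waiter.
    // No removeDead() scan and no cancel() bookkeeping: the async calls
    // API will not dial a dead/gone job, so stale entries are no-ops.
    void kick() {
        if (deferred_.empty())
            return;
        const AsyncCall::Pointer call = deferred_.front();
        deferred_.pop();
        ScheduleCallHere(call);
    }

private:
    std::queue<AsyncCall::Pointer> deferred_; // replaces the acceptor deque
};

Nothing in this sketch needs isLimited either: an acceptor is "limited"
exactly while its resume call sits in the queue.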