Thanks, Alex. My bad. Let me clarify the issue. I have Squid 3.2.1 (transparent) + c-icap (e.g. clamav). Currently I'm testing performance of this setup with httperf tool.
When the number of client http requests is so big that c-icap becomes overloaded, some connections from Squid to icap are staying in SYN_SENT tcp state because c-icap listen backlog is full. Corresponding Xaction objects are not destructed after client request timeout (I use 5 secs for httperf requests) causing a lot of noteCommRead and noteCommWrote exceptions at the same time (in 2-4 mins after the test) - when r/w operations on the socket time out. As a consequence, Squid leaks FD's (for a while) and meaninglessly switches icap status in minutes after the test. I found out that all Xaction objects not being destructed are initiated by HttpStateData ('avi' adaptation check). Looks like Squid is missing HttpStateData cleanup when client request timeouts. On Tue, 2012-08-28 at 09:26 -0600, Alex Rousskov wrote: > On 08/28/2012 08:56 AM, Alexander Komyagin wrote: > > > It seems that I've found the problem related to the case when listen > > queue of ICAP server is full and it can accept new connections no more. > > Hi Alexander, > > It may help if you summarize the problem before (or after) giving > the details. It is clear that the ICAP server is overloaded, but I > assume you are referring to some Squid-specific problem instead, and it > is not clear which of the Squid events you describe below are a problem > you are trying to address. > > > > This way Squid has a bunch (like 800 or so) of connections in SYN_SENT > > state, and after their timeout we encounter exceptions in noteCommRead > > (errno 110 and 104 for some) and noteCommWrote (errno 32) in the Xaction > > object. And then Squids temporarily turns off the icap av. > > I assume the above is _not_ the problem you are trying to describe but > the correct behavior (i.e., the overloaded ICAP server is marked as > "bad" and is avoided for a while, as expected). Please correct me if I > am wrong. > > > > I noticed two VERY strange things: > > > > 1) in noteCommRead connection->isOpen() returns true, and Xaction thinks > > that the connection is established, but prior to the exception that > > connection was marked as SYN_SENT in `netstat -lant` output. > > Connection::isOpen() means that the connection has a valid/opened socket > descriptor and nothing more. If you prefer, it means that the connection > is "not closed". It is more of a socket-level check rather than a > TCP-level check. In Xaction new connections are created with ConnOpener job. ConnOpener sets connection FD iff he _really_ thinks the connection is now established (comm_connect_addr returned COMM_OK). But according to `netstat` that connection was always in SYN_SENT state. Maybe I just missed the point where it became ESTABLISHED. I will try to check it later. > > > > 2) client (httperf) has request timeout set to 5 sec, so all Xaction > > objects should be destroyed by the time I got those exceptions (through > > noteInitiatorAborted() ), right? > > Yes (if your ICAP transaction timeout is much longer than 5 seconds), > provided the HTTP client closes the connection in a way that Squid can > notice immediately. > > > > Then why I still do get noteComm* callbacks and it looks like > > noteInitiatorAborted() was never called for corresponding "nasty" > > Xaction objects? > > > > It worth mentioning that most Xaction objects are "good" and I can see > > noteInititiatorAborted() being successfully called for them. But those > > "nasty" objects are really breaking the stuff. > > I do not know what you mean by "nasty objects" or "breaking stuff" but > it is possible that there are Squid bugs in detecting client termination > conditions, propagating client termination notifications to ICAP, and/or > processing received notifications, of course. > > To triage, trace a single HTTP client transaction and a single > "nasty"(?) ICAP transaction using debug_options ALL,9 to see why > noteInitiatorAborted() is not scheduled or is not acted upon. Does the > HTTP code detect that the client has closed the connection? If yes, does > it sent an abort notification to ICAP? If yes, does the ICAP code > correctly reacts to that notification? > > Post that trace here if you need help analyzing it (but, as usual, there > is no guarantee somebody will help, especially if the trace requires a > lot of work to isolate the interesting bits). > > > Thank you, > > Alex. > > -- Best wishes, Alexander Komyagin