On Sun, Feb 29, 2004 at 11:43:31PM -0800, Matt Roper wrote:
> On Sun, Feb 29, 2004 at 08:56:42PM -0500, Mike Simons wrote:
> > pps: if you move the RCVBUF setting to change the "accept_sock"
> > instead of the "file", then the problem goes away regardless
> > of what size.
>
> Hmm, I didn't think it was possible to change the RCVBUF and SNDBUF
> settings after you had already accepted the connection. I just checked
> the tcp(7) manpage, and it contains the following:
>
>     "On individual connections, the socket buffer size must be set prior
>     to the listen() or connect() calls in order to have it take
>     effect."
[...]
> So I think you want to move the RCVBUF setting up to where you set the
> REUSEADDR option.

Yes, thanks for pointing that out; older man pages do not have that
phrase. I agree that it should be done once before accept instead of
after every single accept.

Unfortunately the new man page is wrong. Changing the buffer size with
SO_RCVBUF or SO_SNDBUF does have "an effect" after a socket is accepted.

> Unfortunately I'm not sure why this would cause the
> delays you're experiencing; that sounds almost like something to do with
> the Nagle Algorithm (although that's a sending issue, not a receiving
> issue).

Err nope, Nagle should not be causing this sort of behavior. Nagle
basically says: if you can't send a full TCP frame and you have any
outstanding sends that have not yet been ACK'd, then wait until the ACK
arrives. Neither of those conditions holds in this case: there are many
full frames' worth of data available to send, and there is no
outstanding data to ACK.

===
RFC1122             TRANSPORT LAYER -- TCP             October 1989
Internet Engineering Task Force                           [Page 98]

    DISCUSSION:
        The Nagle algorithm is generally as follows:

            If there is unacknowledged data (i.e., SND.NXT >
            SND.UNA), then the sending TCP buffers all user
            data (regardless of the PSH bit), until the
            outstanding data has been acknowledged or until
            the TCP can send a full-sized segment (Eff.snd.MSS
            bytes; see Section 4.2.2.6).
===

Right around the paste above in RFC 1122 they describe a "sender's Silly
Window Syndrome avoidance algorithm"... this looks like it could be the
reason.

  The Max(SND.WND) was 16k... right at the beginning of the connection.
  The D is big, 64k or more... (as seen in netstat) lots to send.
  The U is small, 1.5k (in the trace provided), but depends on what
    RCVBUF was reduced to.
  I don't know what Fs is in Linux, but it appears to be 1/3
    (based on observation).
  I don't know what Timeout is, but it appears to be 0.2 seconds.

===
4.2.3.4  When to Send Data

    A TCP MUST include a SWS avoidance algorithm in the sender.
[...]
    IMPLEMENTATION:
        The sender's SWS avoidance algorithm is more difficult
        than the receivers's, because the sender does not know
        (directly) the receiver's total buffer space RCV.BUFF.
        An approach which has been found to work well is for
        the sender to calculate Max(SND.WND), the maximum send
        window it has seen so far on the connection, and to use
        this value as an estimate of RCV.BUFF. Unfortunately,
        this can only be an estimate; the receiver may at any
        time reduce the size of RCV.BUFF. To avoid a resulting
        deadlock, it is necessary to have a timeout to force
        transmission of data, overriding the SWS avoidance
        algorithm. In practice, this timeout should seldom
        occur.
[...]
        Send data [only if]:
[...]
        (3)  or if at least a fraction Fs of the maximum window
             can be sent, i.e., if:

                 [SND.NXT = SND.UNA and]
                     min(D,U) >= Fs * Max(SND.WND);

        (4)  or if data is PUSHed and the override timeout
             occurs.

        Here Fs is a fraction whose recommended value is 1/2.
        The override timeout should be in the range 0.1 - 1.0
        seconds. It may be convenient to combine this timer
        with the timer used to probe zero windows (Section
        4.2.2.17).

        Finally, note that the SWS avoidance algorithm just
        specified is to be used instead of the sender-side
        algorithm contained in [TCP:5].
===

It appears to be a combination of three things that leads to the
problem: the Really Big loopback MTU (16k), the application lowering
RCVBUF after accept instead of before, and this sender-side SWS
avoidance.
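
To make the arithmetic behind rule (3) concrete, here is a rough sketch
in Python using the figures from this thread. It is only an
illustration, not kernel code: the function name sws_rule3_allows_send
and its parameters are made up, "1.5k" is assumed to mean 1536 bytes,
and rules (1), (2) and (4) are not modeled. D and U follow RFC 1122:
D is the data queued but not yet sent, U is the usable window.

```python
from fractions import Fraction


def sws_rule3_allows_send(queued, usable_window, max_window_seen, fs,
                          nothing_outstanding=True):
    """Rule (3) of RFC 1122 section 4.2.3.4 (hypothetical helper).

    queued              -- D, bytes queued but not yet sent
    usable_window       -- U, bytes the offered window still allows
    max_window_seen     -- Max(SND.WND) seen so far on the connection
    fs                  -- the fraction Fs (RFC recommends 1/2)
    nothing_outstanding -- True when SND.NXT == SND.UNA
    """
    sendable = min(queued, usable_window)
    return nothing_outstanding and sendable >= fs * max_window_seen


# Figures taken from the thread; 1536 is an assumed value for "1.5k".
MAX_SND_WND = 16 * 1024   # window seen at the start of the connection
D = 64 * 1024             # "64k or more" queued, per netstat
U = 1536                  # usable window after RCVBUF was lowered

for fs in (Fraction(1, 2), Fraction(1, 3)):
    threshold = fs * MAX_SND_WND
    allowed = sws_rule3_allows_send(D, U, MAX_SND_WND, fs)
    print(f"Fs={fs}: need >= {float(threshold):.0f} bytes, "
          f"can send {min(D, U)}, rule (3) allows send: {allowed}")
```

With either the recommended Fs of 1/2 or the 1/3 guessed above, rule (3)
does not allow a send while U stays at 1.5k. That would be consistent
with each send waiting for the override timeout, though that is
inferred from the trace and not confirmed against the kernel source.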