On Fri, Mar 12, 2010 at 9:34 AM, Justin Johnson <jus...@honesthacker.com> wrote:
> Hi,
>
> I'm trying to understand why the following error occurs.
>
> svn: REPORT request failed on '/svn/reponame/!svn/vcc/default'
> svn: REPORT of '/svn/reponame/!svn/vcc/default': Could not read response
> body: An existing connection was forcibly closed by the remote host.
> (http://HOSTNAME)
> command exit code: 1
>
> I've seen this error in a couple of scenarios:
> 1) when performing a checkout on a Windows box with the working copy stored
> on a drive mapped to a NAS share
> 2) when performing a checkout on a Windows box and the server is an F5
> content switch that just redirects traffic to the Subversion server
>
> The first scenario is of less concern to me, but I mention it anyway since
> I think it is the same problem.
>
> For the second scenario, I worked with someone on our networking team to
> understand the problem. What he discovered and how he "resolved" it with
> our F5 content switch can be found below. The server is running Solaris 10,
> Subversion 1.6.6, Apache 2.2.11, and repositories are served via HTTP. The
> client is running Windows XP SP3 and Subversion 1.6.7 (error occurs with
> TortoiseSVN as well), but the error also occurs on Windows Server 2003. I
> haven't tested any other Windows client OSes and haven't seen the error on
> UNIX, but suspect the underlying problem may exist there and the OS handles
> it more gracefully. Here is the explanation by my networking contact.
>
> ****
> The problem that is presenting is that the client's receive buffer is
> filling up and staying full for a long period of time. When this occurs, he
> advertises a tcp window size of 0 in packets he sends to the destination
> F5. This also happens when he goes directly against a server. The server
> seems to tolerate it while the F5 does not.
>
> Last year, I took traces of the traffic against the server by the client
> directly, and through the F5, and saw that the server was seeing different
> MTU and options from the F5.
> I modified the standard TCP profile on the F5
> to have it proxy the TCP options the client offered so the server would get
> them. I also set it to proxy the MTU setting the client offered. This
> seemed to have fixed the problem at that time. But your current testing
> failed.
>
> Upon closer inspection, I determined that the F5 was resetting the
> connections, not the server as I had previously thought. This time, I
> turned off those two options from last year and increased the Maximum
> Segment Retransmissions from the default of 8 to 16. This controls the
> number of times the F5 resends a packet after it gets no response. This
> also controls the zero window probes he sends to see if the client can
> receive data yet. TCP uses a back-off algorithm and increases the time
> between retries. With 8 attempts, the total retry time is just over a
> minute. I suspect retries of 16 will cause it to retry for 5 or 10 minutes.
>
> I would really like to get this in front of SVN developers, because
> something is getting hosed on the client that causes him to stop pulling off
> the receive buffer. If the zero window lasted 10 seconds or so, it would
> not be a problem. But for him to in effect go offline for over a minute is,
> I believe, a bug. We can just assume that the reason the error does not
> occur when you hit the server directly is that the Sun box handles the zero
> window issue differently, or it might just retry more than 8 times by
> default. Might be a question for the UNIX team as to the retry count. If
> we get some time, we could do some packet captures and find out for certain.
>
> Yesterday and today, I did a few other things that *did not* help. I
> increased the TCP receive buffers on the client side sessions, then on the
> server side sessions, then both. I then turned off all of the tcp options
> in the F5 default TCP profile.
> ****
>
> So, in summary, my problem is currently "resolved" by increasing the
> Maximum Segment Retransmissions from the default of 8 to 16 on the F5.
> However, as I mentioned above I've seen this problem when connecting
> directly to the Subversion server and storing the working copy on a network
> drive.
>
> Does anyone have any ideas? Is this something that can be fixed in the
> Subversion code itself?
>
> Thanks.
> Justin
>

No responses? This seems like something more for the dev list, but I wanted
to follow protocol and wait for a response from the users list first.

Thanks.
Justin
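
P.S. As a rough sanity check on the retry arithmetic in the quoted
explanation, here is a minimal sketch of a capped exponential back-off.
The function name and both timer values are assumptions for illustration:
the thread does not give the F5's actual initial timeout or maximum timeout.
I picked 0.25 s and 60 s only because they happen to reproduce the quoted
figures (about 64 seconds for 8 retries, about 9 minutes for 16).

```python
def total_retry_time(retries, initial_timeout, max_timeout):
    """Sum the waits between retransmissions under capped exponential back-off.

    Each wait is double the previous one, but never longer than max_timeout.
    All values are in seconds.
    """
    total = 0.0
    timeout = initial_timeout
    for _ in range(retries):
        total += min(timeout, max_timeout)
        timeout *= 2  # back off: double the wait before the next attempt
    return total


if __name__ == "__main__":
    # Assumed timer values, NOT taken from the F5 configuration.
    for retries in (8, 16):
        seconds = total_retry_time(retries, initial_timeout=0.25, max_timeout=60.0)
        print(f"{retries} retries: about {seconds / 60:.1f} minutes")
```

If the real timers differ, the totals change accordingly, so a packet
capture is still the only way to know how long the F5 actually keeps
probing a zero window before it resets the connection.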