Public bug reported: Summary:
When nginx has a server that is configured in the 'deferred' listen mode and it takes the client more than approx. 1 second to send data, nginx will hang and eventually time the connection out. Tested Versions: lsb_release -rd: Description: Ubuntu 14.04.5 LTS Release: 14.04 apt-cache policy nginx nginx: Installed: 1.4.6-1ubuntu3.5 Candidate: 1.4.6-1ubuntu3.5 Version table: *** 1.4.6-1ubuntu3.5 0 500 http://mirrors.digitalocean.com/ubuntu/ trusty-updates/main amd64 Packages 500 http://security.ubuntu.com/ubuntu/ trusty-security/main amd64 Packages 100 /var/lib/dpkg/status 1.4.6-1ubuntu3 0 500 http://mirrors.digitalocean.com/ubuntu/ trusty/main amd64 Packages Reproduction Guidelines: 1. Install nginx on a vanilla system: apt-get update && apt-get upgrade -y && apt-get install -y nginx 2. Configure nginx with a basic SSL setup. A self-signed certificate is fine. The only divergence from the regular nginx SSL setup should be that the 'listen' line must contain the "deferred" directive. 3. Confirm nginx functions correctly using curl (`curl -k https://localhost`). This should print the HTML of the default welcome page. 4. Open a file called test.py. Copy the following script into it: import socket import ssl import sys import time TARGET = ('localhost', 443) def connect_slowly(): print "connecting...", sys.stdout.flush() s = socket.create_connection(TARGET) print "connected" print "sleeping...", sys.stdout.flush() time.sleep(2) print "slept" print "beginning ssl handshake...", sys.stdout.flush() s = ssl.wrap_socket(s) print "complete" s.sendall( b'GET / HTTP/1.1\r\n' b'Host: localhost\r\n' b'\r\n' ) print s.recv(8192) connect_slowly() 5. Execute `python test.py` Expected behaviour: Expect the script to run fairly promptly, and to print: connecting... connected sleeping... slept beginning ssl handshake...complete followed by some portion (possibly all) of the default landing page. Actual behaviour: The script produces the following output: connecting... connected sleeping... slept beginning ssl handshake... and then hangs for approximately a minute, after which it prints the following traceback: Traceback (most recent call last): File "test.py", line 33, in <module> connect_slowly() File "test.py", line 23, in connect_slowly s = ssl.wrap_socket(s) File "/usr/lib/python2.7/ssl.py", line 487, in wrap_socket ciphers=ciphers) File "/usr/lib/python2.7/ssl.py", line 243, in __init__ self.do_handshake() File "/usr/lib/python2.7/ssl.py", line 405, in do_handshake self._sslobj.do_handshake() socket.error: [Errno 104] Connection reset by peer Explanation: Some misbehaviour appears to be occurring in nginx around the TCP_DEFER_ACCEPT timeout and TLS handshakes. Specifically, nginx seems to "lose" the connection once the deferred timeout fires. Observing a packet capture demonstrates the problem more clearly (though note that for obvious reasons you can't do this on localhost!) 13:59:06.546468 IP 51.6.64.223.65442 > 67.205.133.224.443: Flags [S], seq 4198376207, win 65535, options [mss 1452,nop,wscale 5,nop,nop,TS val 146528037 ecr 0,sackOK,eol], length 0 13:59:06.546543 IP 67.205.133.224.443 > 51.6.64.223.65442: Flags [S.], seq 3994111434, ack 4198376208, win 28960, options [mss 1460,sackOK,TS val 729780 ecr 146528037,nop,wscale 6], length 0 13:59:06.835143 IP 51.6.64.223.65442 > 67.205.133.224.443: Flags [.], ack 3994111435, win 4140, options [nop,nop,TS val 146528326 ecr 729780], length 0 13:59:07.542989 IP 67.205.133.224.443 > 51.6.64.223.65442: Flags [S.], seq 3994111434, ack 4198376208, win 28960, options [mss 1460,sackOK,TS val 730030 ecr 146528326,nop,wscale 6], length 0 13:59:07.657420 IP 51.6.64.223.65442 > 67.205.133.224.443: Flags [.], ack 3994111435, win 4140, options [nop,nop,TS val 146529110 ecr 729780], length 0 13:59:08.943065 IP 51.6.64.223.65442 > 67.205.133.224.443: Flags [P.], seq 4198376208:4198376725, ack 3994111435, win 4140, options [nop,nop,TS val 146530274 ecr 729780], length 517 13:59:08.943087 IP 67.205.133.224.443 > 51.6.64.223.65442: Flags [.], ack 4198376725, win 470, options [nop,nop,TS val 730380 ecr 146530274], length 0 14:00:07.693962 IP 67.205.133.224.443 > 51.6.64.223.65442: Flags [R.], seq 3994111435, ack 4198376725, win 470, options [nop,nop,TS val 745067 ecr 146530274], length 0 In this capture, you can see that the TCP 3-way handshake is completed almost immediately: within 300ms. The client then pauses for 2 seconds. 1 second after the original SYN packet was received, the SYN+ACK from the server is retransmitted: this is the characteristic sign of the timeout from TCP_DEFER_ACCEPT firing. The client retransmits the ACK from packet 3. Two seconds after the original client ACK was sent in packet 3, the client sends the SSL Client Hello in a PSH+ACK segment that's 517 bytes long. This is ACK'd by the server (showing that it received it), but no Server Hello is ever sent back: nginx has *lost* the TLS handshake. 61 seconds after the connection was initiated, 60 seconds after the DEFER_ACCEPT timer fired, nginx tears the connection down as idle. This causes our Python traceback. It's not yet clear to me where in nginx this problem is occurring, and I'm not enough of an expert to dive in. I also haven't checked whether this is patched in a future release: I'll investigate in Xenial shortly. Insight welcome! ** Affects: nginx (Ubuntu) Importance: Undecided Status: New -- You received this bug notification because you are a member of Ubuntu Bugs, which is subscribed to Ubuntu. https://bugs.launchpad.net/bugs/1619687 Title: nginx incorrectly handles slow connection in deferred mode To manage notifications about this bug go to: https://bugs.launchpad.net/ubuntu/+source/nginx/+bug/1619687/+subscriptions -- ubuntu-bugs mailing list ubuntu-bugs@lists.ubuntu.com https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs