Will Pierce created THRIFT-1737:
-----------------------------------

             Summary: UDP socket support for python
                 Key: THRIFT-1737
                 URL: https://issues.apache.org/jira/browse/THRIFT-1737
             Project: Thrift
          Issue Type: New Feature
          Components: Python - Library
            Reporter: Will Pierce


This patch adds support for UDP socket servers and clients in python.  This 
avoids the overhead and network latency of TCP handshaking, _especially_ for 
"oneway" service methods.

One useful feature of a UDP service is that the clients don't need to rebuild 
their connection to the server when a UDP packet is lost, so the "blast radius" 
of the timeout exception is limited to a single service call, not the entire 
"connection".  Also, framing is not necessary because UDP packets have length 
encoded in their header.
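
To illustrate why framing is unnecessary, here is a minimal sketch using only the standard socket module (it is not part of the patch). The helper name send_and_receive is hypothetical. Each recvfrom() call returns exactly one datagram, so message boundaries are preserved without a length prefix.

```python
import socket

def send_and_receive(messages):
    """Send each message as one UDP datagram over loopback; return what arrives."""
    receiver = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    receiver.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
    receiver.settimeout(2.0)
    sender = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    received = []
    try:
        for msg in messages:
            sender.sendto(msg, receiver.getsockname())
        for _ in messages:
            # One recvfrom() yields one whole datagram, never a partial
            # message or two messages glued together.
            data, _addr = receiver.recvfrom(65535)
            received.append(data)
    finally:
        sender.close()
        receiver.close()
    return received
```

With a TCP stream the same two writes could arrive merged or split, which is why stream transports need framing.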

This transport is not suitable for large messages because UDP is inherently 
limited to 64 KB packet lengths, and often much smaller (500 - 1500 bytes) 
depending on intermediate links and whether UDP fragments are reassembled.  

Avoid large query/response payloads with this transport.


h2. Implementation
UDP support is implemented by subclassing TSocket and TServerSocket into 
TUDPSocket and TServerUDPSocket, and adding a TDatagramTransport.  The server's 
accept() method actually receives an entire inbound request packet.  An inbound 
packet is wrapped as a stream with StringIO, and the response "connection" 
records the sender's host+port so responses are delivered from the server's 
socket back to the client.

The TDatagramTransport converts the EOFError raised after reaching the end of 
the packet into a TTransport exception, to accommodate TServers.
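
The approach described above can be sketched roughly as follows. This is not the code from the patch: DatagramRequest, PacketEOF, and accept_one are hypothetical names, io.BytesIO stands in for StringIO, and PacketEOF stands in for the Thrift transport exception. It only shows the shape of the idea: one packet becomes a readable stream, the sender's address is remembered for the reply, and reading past the end of the packet raises an exception the server loop can catch.

```python
import io
import socket

class PacketEOF(Exception):
    """Hypothetical stand-in for the transport exception a TServer expects."""

class DatagramRequest:
    """One inbound packet exposed as a stream, plus the address to reply to."""

    def __init__(self, sock, payload, peer):
        self._sock = sock
        self._in = io.BytesIO(payload)  # wrap the packet as a stream
        self._peer = peer               # sender's (host, port)

    def read_all(self, size):
        data = self._in.read(size)
        if len(data) < size:
            # End of packet reached: signal it as a transport-level error.
            raise PacketEOF("packet ended after %d of %d bytes" % (len(data), size))
        return data

    def reply(self, payload):
        # Responses go out from the server's own socket to the recorded sender.
        self._sock.sendto(payload, self._peer)

def accept_one(sock):
    """'Accept' here means receiving one entire request packet."""
    payload, peer = sock.recvfrom(65535)
    return DatagramRequest(sock, payload, peer)
```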

h2. Testing:
The unit tests now have a TestUDP.py script which runs a UDP server and client, 
and exercises several of the ThriftTest service calls, and verifies that 
responses match expectations.  It ensures that "oneway" method calls are truly 
non-blocking, 1 packet "send and forget".  It also forces a timeout in the 
middle of a sequence of blocking RPC calls, which confirms that a timeout only 
breaks a single RPC, not the entire client.

I haven't yet used this with server types other than TThreadedServer, or in a 
large environment.  There may be edge cases that haven't surfaced.

Tested with IPv4 and IPv6 on localhost and python2.7 (dev box is fedora17).

h2. Minor bugfix:
The python RunClientServer.py test script had a 1-line bug where it ran some 
other test scripts twice by mistake (probably a cut and paste error).

h2. General warnings for posterity:
* UDP packets are *easily spoofed*!
** don't use this on public-internet facing interfaces
** spoofed client IP attacks may turn your server into an attack vector
* UDP is not reliable
** clients will have to handle socket.timeout exceptions for every RPC call
** UDP may be _more_ unreliable during network congestion
* No retries.
** this library doesn't do any retries
** there's only one timeout setting per client, which applies to every method call
** but the timeout may be changed with the existing .setTimeout(msec) call
* Compression
** I haven't tested using TZlibTransport wrapping this to compress the packets, but it ought to work (unless there are bugs)
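
Since the library does no retries, callers that want them have to add them. The following is a minimal sketch of an application-level wrapper; call_with_retries is a hypothetical helper, not something this patch provides. Retrying is only safe for calls that are idempotent, because a timeout does not tell you whether the request or the response was lost.

```python
import socket

def call_with_retries(rpc, attempts=3):
    """Call rpc() until it returns, retrying only on socket.timeout.

    rpc is any zero-argument callable, e.g. lambda: client.someMethod(arg),
    where client and someMethod are placeholders for your generated client.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error = None
    for _ in range(attempts):
        try:
            return rpc()
        except socket.timeout as exc:
            # Only this one call failed; the client itself stays usable.
            last_error = exc
    raise last_error
```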


h2. Tuning to avoid Timeouts:

Linux hosts tend to have small default values for the kernel's memory buffers 
used to queue up UDP packets.  When that buffer fills up with packets that the 
server process hasn't yet processed, the kernel drops newly arriving packets, 
even though they have been fully decoded and pulled off the NIC.

This would show up as lots of "socket.timeout" exceptions raised in client 
code, and no sign of an inbound method call at the server.

If you run "netstat -s" and see increasing "packet receive errors" in the *Udp* 
section of output, that is strong evidence that you need to increase your 
hosts' receive buffers.
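
The same counters can be watched from a script. Below is a small sketch that parses the Udp lines of Linux's /proc/net/snmp; udp_counters is a hypothetical helper. The assumption that netstat's "packet receive errors" corresponds to the InErrors column, and that RcvbufErrors counts the buffer-full drops specifically, is worth verifying on your own system.

```python
def udp_counters(snmp_text):
    """Parse the 'Udp:' header and value lines of /proc/net/snmp into a dict."""
    rows = [line.split() for line in snmp_text.splitlines()
            if line.startswith("Udp:")]
    if len(rows) < 2:
        return {}
    names, values = rows[0][1:], rows[1][1:]  # drop the leading 'Udp:' token
    return dict(zip(names, (int(v) for v in values)))

def read_udp_counters(path="/proc/net/snmp"):
    """Linux only: read the live counters from the kernel."""
    with open(path) as f:
        return udp_counters(f.read())
```

Calling read_udp_counters() twice, a few seconds apart, and comparing InErrors shows whether drops are still increasing.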

As root, you can raise the default socket receive (and send) buffer size to 4 MB with:
{noformat}
  sysctl -w net.core.rmem_default=4194304
  sysctl -w net.core.wmem_default=4194304
{noformat}
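
A process can also ask for a larger buffer on a single socket instead of changing the system-wide default. The sketch below uses the standard SO_RCVBUF socket option; request_receive_buffer is a hypothetical helper and is not part of this patch. The kernel may grant a different size than requested (Linux caps the request at net.core.rmem_max and reports a doubled value), so the function returns what was actually applied.

```python
import socket

def request_receive_buffer(sock, size_bytes):
    """Ask the kernel for a receive buffer of size_bytes; return the size in effect."""
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, size_bytes)
    # Read the value back: the kernel is free to adjust or cap the request.
    return sock.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF)

if __name__ == "__main__":
    s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    print("receive buffer now:", request_receive_buffer(s, 4 * 1024 * 1024))
    s.close()
```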



