I've seen something similar (I think) with Zyre, where DEALER sockets connecting with the same identity do weird things. Try setting ZMQ_ROUTER_HANDOVER on the ROUTER socket and see if that helps (you'll need libzmq master).
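
In PyZMQ that would look roughly like the sketch below. It assumes your pyzmq is built against a libzmq that exposes ZMQ_ROUTER_HANDOVER (as zmq.ROUTER_HANDOVER). The helper name make_router and the default endpoint are made up for illustration; the port is just the one from your netstat output.

```python
import zmq


def make_router(ctx, endpoint="tcp://*:12501"):
    """Create a ROUTER socket that lets a reconnecting peer take over
    an identity still held by an older connection.

    Hypothetical helper: the name and the default endpoint are
    illustrative, not taken from the original service.
    """
    router = ctx.socket(zmq.ROUTER)
    # Set the option before bind() so it is in place for every
    # incoming connection. Needs a libzmq that has ZMQ_ROUTER_HANDOVER.
    router.setsockopt(zmq.ROUTER_HANDOVER, 1)
    router.bind(endpoint)
    return router
```

The DEALER side stays as it is: it keeps setting the same zmq.IDENTITY before connect(). The idea is that a returning client with a known identity replaces the stale connection instead of being refused.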

On Thu, Jun 12, 2014 at 4:15 AM, Sash Nagarkar <s...@dronedeploy.com> wrote:
> Hello ZMQ devs,
>
> We're using PyZMQ 14.3.0 and libzmq 4.0.4 with a ROUTER-DEALER pattern
> for a service we're providing. Sorry if this is too verbose, and I
> hope this is the right place to ask the question.
>
> TL;DR: ROUTER socket doesn't receive messages from a DEALER even
> though netstat shows several megabytes in the TCP receive queue
> (nothing in the send queue). Other connected DEALERs work fine.
>
> The ROUTER socket is running on a server with ample CPU & memory
> headroom, with several DEALER clients that connect, exchange messages,
> and can abruptly disconnect repeatedly. We're exclusively using
> multipart messages with the first part always being the ZMQ socket
> identity, which persists across DEALER connect/disconnects. In other
> words, each DEALER client uses the same socket identity across many
> connects and disconnects.
>
> Most of the time, things hum along smoothly (several thousand messages
> exchanged, several dozen connect/disconnects). However, every once in
> a rare while, we see that one of the DEALER clients connects and sends
> messages to the ROUTER that end up never making it to the ROUTER
> process. The ROUTER process continues to receive messages from other
> DEALER clients.
>
> Further debugging on the ROUTER server shows one (or more) TCP
> connections from the client DEALER that are in the CLOSE_WAIT state
> with several megabytes of data sitting in the receive queue to the
> ROUTER. We also see one connection from the client DEALER in the
> ESTABLISHED state with a receive queue that is growing.
>
> It's clear that the DEALER client died abruptly once, but then
> returned with the same identity and resumed sending messages to the
> ROUTER. However, none of the subsequent messages are delivered to the
> ROUTER process. Any ideas on why this would be the case?
>
> I would have provided a test case, but we aren't able to consistently
> reproduce the issue. I've copied the output from netstat (with
> obfuscated IPs) below, in case it helps.
>
>
> Questions:
> - What would cause the receive queue to fill up like this on a ROUTER
> while it continues to receive messages from other clients? It's clear
> that the messages are all making it to the ROUTER machine.
> - Is it safe for DEALER sockets to abruptly disconnect and then reuse
> their socket identity?
> - How can we mitigate this situation? The closest thing I see is
> ZMQ_LINGER, but that applies only to the outgoing queue and not the
> incoming one.
> - Is there anything I could investigate myself to figure out whether
> this is an issue in PyZMQ vs. libzmq? Where should I start?
>
>
> Other potentially relevant info:
> - The ROUTER uses PyZMQ's zmq.Poller() to receive messages from the
> problem socket and some others. All other nodes in the system
> continue to send and receive messages just fine.
> - The ROUTER's send queues are pretty much empty.
> - We see the same behavior with libzmq 4.0.4 and libzmq 2.2.x, on Ubuntu
> 14.04.
>
>
> $ netstat -a
> Active Internet connections (servers and established)
> Proto  Recv-Q Send-Q Local Address     Foreign Address   State
> tcp         0      0 *:12501           *:*               LISTEN
> tcp   1816956      0 server-ip.:12501  clientA-ip:42571  CLOSE_WAIT
> tcp   1551036      0 server-ip.:12501  clientA-ip:42858  CLOSE_WAIT
> tcp         0      0 server-ip.:12501  clientB-ip:34000  ESTABLISHED
> tcp   5265541      0 server-ip.:12501  clientA-ip:43469  ESTABLISHED
>
>
> Please let me know if further information would help. Thank you for
> helping build ZMQ, it's been a huge pleasure to work with so far.
>
> Cheers,
> Sash
> _______________________________________________
> zeromq-dev mailing list
> zeromq-dev@lists.zeromq.org
> http://lists.zeromq.org/mailman/listinfo/zeromq-dev