Thanks Pieter! I'll try that and see if we encounter it again. Love the work you guys are doing with ZMQ.
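For reference, here is a minimal PyZMQ sketch of the suggestion quoted below. It assumes pyzmq is built against a libzmq that supports ZMQ_ROUTER_HANDOVER (libzmq master at the time of this thread). The helper name make_router and the endpoint are placeholders for illustration, not taken from the actual service:

```python
import zmq


def make_router(ctx, endpoint):
    """Create a ROUTER socket with identity handover enabled (sketch)."""
    router = ctx.socket(zmq.ROUTER)
    # With ROUTER_HANDOVER enabled, a new connection presenting an identity
    # that is already in use takes that identity over from the old connection.
    # Set it before bind() so it is in place before any peer connects.
    router.setsockopt(zmq.ROUTER_HANDOVER, 1)
    router.bind(endpoint)
    return router


# Hypothetical usage (port 12501 is the one shown in the netstat output):
#   ctx = zmq.Context()
#   router = make_router(ctx, "tcp://*:12501")
```

Whether this resolves the stuck receive queue is untested here; it only shows where the option would be set.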
On Thu, Jun 12, 2014 at 7:53 AM, Pieter Hintjens <p...@imatix.com> wrote:
> I've seen something similar (I think) with Zyre, where dealer sockets
> connecting with the same identity do weird things. Try setting
> ZMQ_ROUTER_HANDOVER on the router socket, see if that helps (you'll
> need libzmq master).
>
> On Thu, Jun 12, 2014 at 4:15 AM, Sash Nagarkar <s...@dronedeploy.com> wrote:
>> Hello ZMQ devs,
>>
>> We're using PyZMQ 14.3.0 and libzmq 4.0.4 with a ROUTER-DEALER pattern
>> for a service we're providing. Sorry if this is too verbose, and I
>> hope this is the right place to ask the question.
>>
>> TL;DR: ROUTER socket doesn't receive messages from a DEALER even
>> though netstat shows several megabytes in the TCP receive queue
>> (nothing in the send queue). Other connected DEALERs work fine.
>>
>> The ROUTER socket is running on a server with ample CPU & memory
>> headroom, with several DEALER clients that connect, exchange messages,
>> and can abruptly disconnect repeatedly. We're exclusively using
>> multipart messages with the first part always being the ZMQ socket
>> identity, which persists across DEALER connect/disconnects. In other
>> words, each DEALER client uses the same socket identity across many
>> connects and disconnects.
>>
>> Most of the time, things hum along smoothly (several thousand messages
>> exchanged, several dozen connect/disconnects). However, every once in
>> a rare while, we see that one of the DEALER clients connects and sends
>> messages to the ROUTER that end up never making it to the ROUTER
>> process. The ROUTER process continues to receive messages from other
>> DEALER clients.
>>
>> Further debugging on the ROUTER server shows one (or more) TCP
>> connections from the client DEALER that are in the CLOSE_WAIT state
>> with several megabytes of data sitting in the receive queue to the
>> ROUTER. We also see one connection from the client DEALER in the
>> ESTABLISHED state with a receive queue that is growing.
>>
>> It's clear that the DEALER client died abruptly once, but then
>> returned with the same identity and resumed sending messages to the
>> ROUTER. However, none of the subsequent messages are delivered to the
>> ROUTER process. Any ideas on why this would be the case?
>>
>> I would have provided a test case, but we aren't able to consistently
>> reproduce the issue. I've copied the output from netstat (with
>> obfuscated IPs) below, in case it helps.
>>
>>
>> Questions:
>> - What would cause the receive queue to fill up like this on a ROUTER
>> while it continues to receive messages from other clients? It's clear
>> that the messages are all making it to the ROUTER machine.
>> - Is it safe for DEALER sockets to abruptly disconnect and then reuse
>> their socket identity?
>> - How can we mitigate this situation? The closest thing I see is
>> ZMQ_LINGER, but that applies only to the outgoing queue and not the
>> incoming one.
>> - Is there anything I could investigate myself to figure out whether
>> this is an issue in PyZMQ vs. libzmq? Where should I start?
>>
>>
>> Other potentially relevant info:
>> - The ROUTER uses PyZMQ's zmq.Poller() to receive messages from the
>> problem socket and some others. All other nodes in the system
>> continue to send and receive messages just fine.
>> - The ROUTER's send queues are pretty much empty.
>> - We see the same behavior with libzmq 4.0.4 and libzmq 2.2.x, on Ubuntu
>> 14.04.
>>
>>
>> $ netstat -a
>> Active Internet connections (servers and established)
>> Proto  Recv-Q Send-Q Local Address     Foreign Address   State
>> tcp         0      0 *:12501           *:*               LISTEN
>> tcp   1816956      0 server-ip.:12501  clientA-ip:42571  CLOSE_WAIT
>> tcp   1551036      0 server-ip.:12501  clientA-ip:42858  CLOSE_WAIT
>> tcp         0      0 server-ip.:12501  clientB-ip:34000  ESTABLISHED
>> tcp   5265541      0 server-ip.:12501  clientA-ip:43469  ESTABLISHED
>>
>>
>> Please let me know if further information would help. Thank you for
>> helping build ZMQ, it's been a huge pleasure to work with so far.
>>
>> Cheers,
>> Sash
>> _______________________________________________
>> zeromq-dev mailing list
>> zeromq-dev@lists.zeromq.org
>> http://lists.zeromq.org/mailman/listinfo/zeromq-dev
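The netstat check described in the original post could be scripted to spot the symptom early. The following is a rough sketch using only the Python standard library; the names Conn, parse_netstat and backlogged, and the 1 MB threshold, are hypothetical choices for illustration. It assumes the six-column TCP row layout shown in the quoted output:

```python
from typing import List, NamedTuple


class Conn(NamedTuple):
    recv_q: int
    send_q: int
    local: str
    foreign: str
    state: str


def parse_netstat(output: str) -> List[Conn]:
    """Parse the TCP rows of `netstat -a`-style output into Conn records."""
    conns = []
    for line in output.splitlines():
        fields = line.split()
        # A TCP row has six fields: proto, recv-q, send-q, local, foreign, state.
        # Banner and header lines do not match and are skipped.
        if len(fields) != 6 or not fields[0].startswith("tcp"):
            continue
        if not (fields[1].isdigit() and fields[2].isdigit()):
            continue
        conns.append(Conn(int(fields[1]), int(fields[2]), *fields[3:6]))
    return conns


def backlogged(conns: List[Conn], min_recv_q: int = 1_000_000) -> List[Conn]:
    """Return connections whose receive queue holds at least min_recv_q bytes."""
    return [c for c in conns if c.recv_q >= min_recv_q]


# Sample rows copied from the netstat output in the original post.
SAMPLE = """\
Active Internet connections (servers and established)
Proto Recv-Q Send-Q Local Address Foreign Address State
tcp 0 0 *:12501 *:* LISTEN
tcp 1816956 0 server-ip.:12501 clientA-ip:42571 CLOSE_WAIT
tcp 1551036 0 server-ip.:12501 clientA-ip:42858 CLOSE_WAIT
tcp 0 0 server-ip.:12501 clientB-ip:34000 ESTABLISHED
tcp 5265541 0 server-ip.:12501 clientA-ip:43469 ESTABLISHED
"""

for conn in backlogged(parse_netstat(SAMPLE)):
    print(conn.foreign, conn.state, conn.recv_q)
```

On the sample rows this flags the three clientA connections and leaves the LISTEN socket and clientB alone. A real monitor would feed it live netstat output and would need a threshold tuned to the service's normal traffic.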