Hi *!
I found the "bug" that I >>>thought<<< to be caused by EINTR: The EINTR-thingy was gone, when I fixed some illegal refference usages, that the gcc compiler didn't complain about (but "clang" of LLVM did ;o) ... the lost messages problem was still there. BUT: when I switched from ZMQ2 to ZMQ4, I used the same logic for socket and context creation. I allways used one context for each service type for being able to tear down a service or several services of the same type by killing sockets or contexts when the service should happen to be stuck (just as an option). In ZMQ4 I found the context creation to be hidden behind socket creation and used old functions to still stick to different contexts for different service types and for clients. So during initialization, I used a context for the client side, that gets destroyed when everything is started properly. This seems to cause the loss of message for the first few hundered messages ... sounds queer? Indeed! With different contexts (and the "one for startup phase" being destroyed) I lost about 1 of 10 messages. When continuing to run, the losses got fewer and fewer and where gone, when about 600-1000 messages where send. During the search of this bug, I placed sleep(1) after each send() for giving ZMQ the chance to get it on the wire before some (unknown) bug destroyed the data before being sent, but with no success: the messages were still lost. What is really queer and brought me on the right way to solve the bug is, that when placing the sleep(1) BEFORE the send() calls, the messages were NOT lost anymore, all could be received. As soon as I switched the logic to allways use the SAME CONTEXT and never destroy it, the message loosing was solved. Question: is it "forbidden" to use different contexts now in ZMQ Version 4? I'm thinking about writing a short demo when there's time and hope, that this quirk shows up. But I wanted to inform you about this behaviour of loosing messages when contexts are destroyed during run (yes, I disconnected and destroyed the involved sockets before destroying the context, so it's possible, that disconnection is the cause instead of context destruction). Used socket types were: REQ/(X)REP and SUB/PUB Am 2015-01-09 14:47, schrieb sven.koebn...@t-online.de: > I now use some code doublicate of your cmzq code, that does frame send()ing with REUSE and retries in case of EINTR: > > I copied zmsg_recv() and wrapped the frame receiving in a loop checking EINTR. 
On 2015-01-09 14:47, sven.koebn...@t-online.de wrote:

> I now use a duplicate of your CZMQ code that does the frame send()ing with REUSE and retries in case of EINTR.
>
> I copied zmsg_recv() and wrapped the frame receiving in a loop that checks for EINTR:
>
> zmsg_t *zmsg_recv (void *source)
> {
>     assert (source);
>     zmsg_t *self = zmsg_new ();
>     if (!self)
>         return NULL;
>     void *handle = zsock_resolve (source);
>     while (true) {
>         zframe_t *frame = ZMQ_TEMP_FAILURE_RETRY_F (zframe_recv (handle));
>         if (!frame) {
>             zmsg_destroy (&self);
>             logFatal ("data loss while receiving frame");
>             break;              //  Interrupted or terminated
>         }
>         if (zmsg_append (self, &frame)) {
>             zmsg_destroy (&self);
>             logFatal ("data loss while appending frame");
>             break;
>         }
>         if (!zsock_rcvmore (handle))
>             break;              //  Last message frame
>     }
>     return self;
> }
>
> with ZMQ_TEMP_FAILURE_RETRY_F being a short macro that handles EINTR, EAGAIN and throw()s with errno == EINTR or EAGAIN.
>
> Likewise with zmsg_send(), which had to be modified more deeply because I cannot look into zmsg_t, so I used zframe_send() and zmsg_next():
>
> int zmsg_send (zmsg_t **self_p, void *dest)
> {
>     assert (self_p);
>     assert (dest);
>     zmsg_t *self = *self_p;
>     int rc = 0;
>     void *handle = zsock_resolve (dest);
>     if (self) {
>         assert (zmsg_is (self));
>         zframe_t *frame = zmsg_first (self);
>         while (frame) {
>             zframe_t *next_frame = zmsg_next (self);
>             rc = ZMQ_TEMP_FAILURE_RETRY (zframe_send (&frame, handle,
>                 next_frame ? ZFRAME_MORE + ZFRAME_REUSE : ZFRAME_REUSE));
>             if (rc != 0) {
>                 logFatal ("data loss while sending frame");
>                 break;
>             }
>             frame = next_frame;
>         }
>         zmsg_destroy (self_p);
>     }
>     return rc;
> }
>
> THAT MADE THE BEHAVIOUR MUCH BETTER ;O))) but it is not fully functional yet.
>
> I now lose messages only in the Dispatcher service, until all five Dispatchers (in the testing environment) are "dead", waiting for answers that will never come (yes, I have more robust code that keeps Dispatchers working in such cases, but that code is currently disabled to make testing easier).
>
> FOR THE MOMENT I ASSUME THE REST IS A BUG INSIDE MY DISPATCHER, and I am checking which message configurations cause the vanishing ... it definitely does not happen for all types, so THIS must be a bug inside the application.
>
> I'll let you know as soon as I have a hint (or proof) that there is a bug on the ZMQ side ;o)
>
> In any case, I'll dig deeper into when and why I get EINTRs ... maybe some (boost) smart pointer is misused by me and deletes messages (at the end of a block) that are still waiting for delivery by ZMQ in another thread.
>
> On 2015-01-09 13:37, Pieter Hintjens wrote:
>
>> On Fri, Jan 9, 2015 at 1:25 PM, <sven.koebn...@t-online.de> wrote:
>>
>>> I get that error only during debugging inside Eclipse C++ (gdb).
>>
>> Makes sense. The debugger is sending interrupt signals. It's going to
>> make a mess of any logic that uses them. I don't think you can make
>> the code robust against this, nor would it be a good idea to make the
>> code more complex just so it will work under a debugger.
>>
>>> If you know what else causes EINTRs besides Ctrl-C and the like, just tell me ... I just don't know. Using ZMQ2, I NEVER had EINTR, even when single-stepping the application.
>>
>> Hmm. A lot changed from ZMQ v2 to v4. Also, CZMQ is doing default
>> signal handling that you might want to modify (it has hooks so you can
>> switch it off).
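The ZMQ_TEMP_FAILURE_RETRY / ZMQ_TEMP_FAILURE_RETRY_F macros mentioned in my quoted mail above are not shown there, so here is a rough sketch of what the int-returning variant could look like (illustration only, not the exact macro from my code; it relies on the GCC/clang statement-expression extension and leaves out the throw()ing part):

#include <errno.h>

/* Re-evaluate an int-returning call while it fails with EINTR or EAGAIN,
   in the spirit of glibc's TEMP_FAILURE_RETRY. */
#define ZMQ_TEMP_FAILURE_RETRY(expr)                                  \
    ({                                                                \
        int _rc;                                                      \
        do {                                                          \
            _rc = (expr);                                             \
        } while (_rc == -1 && (errno == EINTR || errno == EAGAIN));   \
        _rc;                                                          \
    })

The _F variant used around zframe_recv() does the same for pointer-returning calls, retrying while the result is NULL and errno is EINTR or EAGAIN. And regarding Pieter's hint about CZMQ's default signal handling: the hook he means is, if I am not mistaken, zsys_handler_set (), which can be given NULL to switch that handling off.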