I'll try again, but the system is already rather complex, and stripping it down would take a lot of effort (it may be impossible). So I'd have to build the test case from scratch.
The current scenario is as follows:

normal devices:
- a modified queue device, implemented as an LRU queue (not exactly as described in your docs, but the same algorithm)
- polls on
   - a connected REP-socket (service port)
   - a connected REP-socket (worker end of queue device)
   - a connected SUB-socket (subscription to the dispatcher device's outer PUB-end)
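
For reference, the LRU routing idea behind such a queue device can be sketched roughly as follows. This is a stand-alone illustration in plain Python with no ZeroMQ involved, and it is not the code of the device described here; `LruRouter`, `worker_ready` and `route` are names made up for this sketch.

```python
from collections import deque


class LruRouter:
    """Hands each request to the worker that has been idle the longest."""

    def __init__(self):
        self.idle = deque()  # worker ids, longest-idle first

    def worker_ready(self, worker_id):
        # A worker reports in at startup and again after each reply.
        self.idle.append(worker_id)

    def route(self, request):
        # No idle worker available: nothing to route to.
        if not self.idle:
            return None
        return self.idle.popleft(), request


router = LruRouter()
router.worker_ready("w1")
router.worker_ready("w2")
first = router.route("job-a")    # goes to w1, the longest-idle worker
router.worker_ready("w1")        # w1 is done and queues up behind w2
second = router.route("job-b")   # goes to w2
third = router.route("job-c")    # goes to w1
fourth = router.route("job-d")   # no idle worker left
```
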

normal workers:
- running behind their queue device
- polling on REQ-socket
- polling on SUB-socket (connected to the inner PUB-end of their device queue)

dispatcher device:
- same LRU-like queue device, but
- the PUB/SUB direction goes the other way: "dispatcher" workers publish messages that the "normal" workers subscribe to
- so it has an inner SUB-socket to the workers, shuffling messages to the outer PUB-end

dispatcher workers:
- also running behind the queue, but
- polling on REQ-socket and their own device's PUB-end

The problem occurs in the dispatcher device:
- one dispatcher gets a message that should be published
- the message gets recoded and is definitely sent, without errors, to the queue's inner SUB-end
- the dispatcher queue, blocked in poll(..., -1), >>>should<<< receive it and shuffle it to its outer PUB-socket, but poll does not return ;o(
- when I send another message, poll returns "2" !!!
- when I use a timeout, poll returns 0, and the next loop iteration polls again and immediately returns "1" for the waiting message.

I suspect that this only occurs when using poll() on different socket types (2xREQ and 1xSUB), and it also seems to be a timing problem: according to the logs, the "delayed" message may arrive at exactly the moment the queue calls poll(). Sometimes it works fine, but not often.
A good hint for finding the bug could be that poll(..., timeout) also returns "0", but the next call to it immediately returns "1".
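
The timeout workaround amounts to a loop like the following sketch. It uses Python's standard `selectors` module on a plain local socket pair as a stand-in for the ZeroMQ sockets, so it only illustrates the control flow (finite timeout, poll again on an empty result), not the ZeroMQ API; `poll_with_retry`, `timeout_s` and `max_rounds` are names made up for this illustration.

```python
import selectors
import socket


def poll_with_retry(selector, timeout_s=0.1, max_rounds=50):
    """Poll with a finite timeout and poll again whenever nothing is ready.

    Returns the list of ready (key, events) pairs, or [] after max_rounds
    empty polls.
    """
    for _ in range(max_rounds):
        ready = selector.select(timeout=timeout_s)
        if ready:
            return ready
    return []


# Stand-in for the device's inner socket: a connected local socket pair.
sender, receiver = socket.socketpair()
sel = selectors.DefaultSelector()
sel.register(receiver, selectors.EVENT_READ)

# Nothing has been sent yet, so both short polls come back empty.
empty = poll_with_retry(sel, timeout_s=0.01, max_rounds=2)

# Once a message is waiting, the next poll reports the socket as readable.
sender.sendall(b"publish-me")
ready = poll_with_retry(sel)
payload = ready[0][0].fileobj.recv(64)

sel.unregister(receiver)
sel.close()
sender.close()
receiver.close()
```

With an infinite timeout there is no second call to fall back on, which is why the hang only shows up with poll(..., -1).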

Nonetheless, I'll try to cobble together a test case over Christmas, hoping that I get a chance to demonstrate the buggy behaviour easily.

^5
Sven
---------------------------------------------------------
E = mc² ± 2dBA    ----- everything is relative
---------------------------------------------------------


-----Original Message-----
Date: Fri, 17 Dec 2010 12:07:21 +0100
Subject: Re: [zeromq-dev] poll does not return on a SUB socket
From: Martin Sustrik <[email protected]>
To: ZeroMQ development list <[email protected]>

Hi Sven,

> the main issue is, that poll(... ,-1) does not at all return despite
> the fact that there is a message available!
> The usage of timeout value is a current workaround, because the second
> call to poll (the first returned zero) successfully returns a number
> greater one.

Can you provide a minimal test case?

Martin
_______________________________________________
zeromq-dev mailing list
[email protected]
http://lists.zeromq.org/mailman/listinfo/zeromq-dev
