Hi Fabien,

> So, maybe my specific need is not the best example to generalize
> from, but I'm still thinking that a dispatch-and-collect pattern can
> be quite useful in a grid-oriented network.
Definitely. I am actually pretty enthusiastic about it, as it seems to
be the third fully scalable pattern (aside from pub/sub and req/rep)
I've ever seen.

> I understand that. My usage of XREP sockets is more a practical
> issue. Semantically, it is nearer to the REQ socket. I just cannot
> remove the strict send/recv policy of REQ to allow it to recv
> multiple replies, and that's why I use an XREP socket to implement
> it, but I would clearly prefer a dedicated socket with a clear
> endpoint semantic for both.

Ack.

>> The obvious problem with any one-request/many-replies model is that
>> the requester has no idea whether it has got all the answers yet.
>> Specifically, think of large distributed topologies where at least
>> a part of the topology is likely to be offline at any given moment.
>>
>> The only solution seems to be to set a deadline for the replies.
>> The user code could then look something like this:
>>
>>     s = socket (SURVEYOR);
>>     zmq_setsockopt (s, ZMQ_DEADLINE, 10);
>>     // create a request...
>>     zmq_send (s, request, 0);
>>     while (true) {
>>         zmq_msg_t reply;
>>         rc = zmq_recv (s, &reply, 0);
>>         if (rc < 0 && errno == EDEADLINE)
>>             break;
>>         // process reply here...
>>     }

> Personally, and it is really only a matter of taste, I don't like
> the idea of the socket handling the deadline itself. If I wanted a
> lock-step approach to the problem, I would say that the PATCH socket
> requires all connected sockets to send a reply before processing a
> new request. Since it doesn't know how many sub-connections are
> below each socket, the protocol would require sending back a signal
> saying so.
>
> So, the pseudo-code for the PATCH socket would be something like:
>
>     on_send (request):
>         for each socket in out_:
>             send (socket, request);
>             push (wait_queue, socket);
>         end.
>         while not empty? (wait_queue):
>             socket := poll (wait_queue, POLLIN);
>             reply := recv (socket);
>             if getsockopt (socket, RCVEND):
>                 pop (wait_queue, socket);
>             end.
>             flags := 0;
>             if getsockopt (socket, RCVMORE):
>                 flags := SNDMORE;
>             end.
>             if empty? (wait_queue):
>                 flags := flags | SNDEND;
>             end.
>             send (_in, reply, flags);
>         end.
>     end.

This won't work. The peer can be a device rather than an endpoint. In
such a case you should expect to get an arbitrary number of responses
from a single connection.

> 2- It locks the PATCH socket until all the replies come back. Maybe
> it's better this way, given that a single request can generate 1000
> replies, but it can also completely lock down the full tree if one
> socket downstream fails to answer. In this case, setting a maximum
> timeout (in the poll call above) is the only viable solution.

Bingo! That's the main problem. A pattern that allows a single failed
or misbehaving node to block the whole topology cannot really be
called scalable. That's why the pattern really needs the
timeout/deadline to be an inherent part of it.

Martin
_______________________________________________
zeromq-dev mailing list
zeromq-dev@lists.zeromq.org
http://lists.zeromq.org/mailman/listinfo/zeromq-dev