Brian Candler wrote:
> On Sun, Apr 18, 2010 at 02:28:04AM -0500, Joe Holloway wrote:
>> I guess this is an age old problem of fault tolerant service
>> discovery.   The service registry can't know whether or not a service
>> is reachable until a client has tried to connect.
> 
> I am a newbie (so take what I say with a pinch of salt :-), but AFAIK one of
> the features of a 0MQ socket is that if the far end is down, it will keep
> trying to reconnect until it is available.  This takes place behind the
> scenes.
> 
> So maybe the solution is for the socket to have a list of endpoints to try,
> instead of a single one?  It could just walk around the list until one
> connects.  If you randomize this list first, then you get load-balancing
> too.
> 
> The API docs don't make it clear what happens if you call zmq_connect
> multiple times on the same socket.

Yes, you're pretty much right.

You can connect your client to multiple instances of a service:

zmq_connect (c, "tcp://svr001.example.com:5555");
zmq_connect (c, "tcp://svr002.example.com:5555");
zmq_connect (c, "tcp://svr003.example.com:5555");

What happens is that the requests are load balanced among the servers.

When one of them fails, up to ZMQ_HWM requests are queued for it. Once 
the limit (HWM) is reached, no more requests will be sent to that 
server. Once it gets back online, the queued requests are sent to it and 
load-balancing automatically starts dispatching new requests to it again.
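
To make the behaviour described above concrete, here is a toy model in plain C. It is only a sketch of the dispatch rule as described in this mail, not libzmq code: peer_t, balancer_t and the balancer_* functions are hypothetical names invented for illustration, and the hwm field merely stands in for ZMQ_HWM.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_PEERS 8

/* Toy model of one connected server (hypothetical, not a libzmq type). */
typedef struct {
    int online;  /* 1 if the server is currently reachable */
    int queued;  /* requests waiting for the server to come back */
    int sent;    /* requests actually delivered to the server */
} peer_t;

typedef struct {
    peer_t peers[MAX_PEERS];
    size_t n_peers;
    size_t next;  /* round-robin cursor */
    int hwm;      /* per-peer queue limit, standing in for ZMQ_HWM */
} balancer_t;

void balancer_init(balancer_t *b, size_t n_peers, int hwm)
{
    assert(n_peers > 0 && n_peers <= MAX_PEERS);
    b->n_peers = n_peers;
    b->next = 0;
    b->hwm = hwm;
    for (size_t i = 0; i < n_peers; i++) {
        b->peers[i].online = 1;
        b->peers[i].queued = 0;
        b->peers[i].sent = 0;
    }
}

/* Dispatch one request round-robin. Returns the index of the chosen
 * peer, or -1 if every peer is offline with a full queue. */
int balancer_dispatch(balancer_t *b)
{
    for (size_t tried = 0; tried < b->n_peers; tried++) {
        size_t i = b->next;
        b->next = (b->next + 1) % b->n_peers;
        peer_t *p = &b->peers[i];
        if (p->online) {
            p->sent++;
            return (int) i;
        }
        if (p->queued < b->hwm) {
            p->queued++;  /* offline, but still below the limit */
            return (int) i;
        }
        /* offline and at the limit: skip this peer */
    }
    return -1;
}

/* Mark a peer online or offline. Coming back online flushes its queue. */
void balancer_set_online(balancer_t *b, size_t i, int online)
{
    assert(i < b->n_peers);
    peer_t *p = &b->peers[i];
    p->online = online;
    if (online) {
        p->sent += p->queued;
        p->queued = 0;
    }
}
```

With three peers, a limit of 2 and the second peer offline, the first two requests that land on the dead peer are queued and later ones are spread over the remaining two. Bringing the peer back with balancer_set_online flushes its queue and puts it back into the rotation.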

What's missing is some kind of timeout. Requests queued for a specific 
service instance should be discarded or sent to a dead letter queue when 
the instance has not been available for some time.
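
One possible shape for such a timeout, sketched in plain C. This is an assumption about how it could work, not an existing 0MQ feature: request_t, request_queue_t, queue_push and queue_expire are hypothetical names, and timestamps are passed in as plain milliseconds so the logic stays independent of any clock API.

```c
#include <assert.h>
#include <stddef.h>

#define QUEUE_CAP 16

/* A queued request and the time it was enqueued (hypothetical types). */
typedef struct {
    int id;
    long enqueued_ms;
} request_t;

typedef struct {
    request_t items[QUEUE_CAP];
    size_t count;
} request_queue_t;

/* Append a request. Returns 0 on success, -1 if the queue is full. */
int queue_push(request_queue_t *q, int id, long now_ms)
{
    if (q->count == QUEUE_CAP)
        return -1;
    q->items[q->count].id = id;
    q->items[q->count].enqueued_ms = now_ms;
    q->count++;
    return 0;
}

/* Remove every request that has waited at least timeout_ms from
 * `pending` and append it to `dead_letters`. Expired requests that do
 * not fit into `dead_letters` are simply dropped. Returns the number
 * of expired requests. */
size_t queue_expire(request_queue_t *pending, request_queue_t *dead_letters,
                    long now_ms, long timeout_ms)
{
    assert(timeout_ms >= 0);
    size_t kept = 0;
    size_t expired = 0;
    for (size_t i = 0; i < pending->count; i++) {
        request_t r = pending->items[i];
        if (now_ms - r.enqueued_ms >= timeout_ms) {
            if (dead_letters->count < QUEUE_CAP)
                dead_letters->items[dead_letters->count++] = r;
            expired++;
        } else {
            pending->items[kept++] = r;  /* compact the survivors */
        }
    }
    pending->count = kept;
    return expired;
}
```

The idea is that queue_expire would be called periodically for the queue of each unavailable instance. Whether expired requests go to a dead letter queue or are re-dispatched to another instance is a policy decision left open here.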

Martin
_______________________________________________
zeromq-dev mailing list
zeromq-dev@lists.zeromq.org
http://lists.zeromq.org/mailman/listinfo/zeromq-dev
