Re: [zeromq-dev] on scalability of PUB/SUB and PUSH/PULL

2014-06-12 Thread Goswin von Brederlow
On Wed, Jun 11, 2014 at 02:05:13AM -0700, Jun Li wrote:
> Hi,
> 
> I am using the PUB/SUB socket pattern to distribute commands from the
> coordinator to many worker processes, and PUSH/PULL for each worker
> process to push its processing results back to the coordinator.
> The coordinator binds both the PUB socket and the PULL socket, and its
> context is currently set to 1 I/O thread. In my test environment there
> is a single coordinator process and up to 200 worker processes.
> 
> I have just started scalability testing. With 15 worker processes, the
> end-to-end communication latency is about 15 ms, measured from the
> coordinator distributing the commands (via PUB) to it aggregating the
> results back (via PULL) from the worker processes. When I increased the
> number of worker processes to 50, I observed an end-to-end latency of
> about 80 ms. This implies that latency grows as the number of worker
> processes grows, which raises a scalability issue.

You can hardly say anything with just two points. Is that a linear
increase? Exponential? Logarithmic? Does it jump between 49 and 50?
Does it stay at 80ms up to 10 workers?
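To illustrate why two measurements are not enough, the sketch below fits four growth models (linear, power law, exponential, logarithmic) exactly through the two data points reported in the thread, (15 workers, 15 ms) and (50 workers, 80 ms). The model choice is an assumption made for illustration; the point is only that every model matches both measurements and yet they disagree enormously when extrapolated to 200 workers.

```python
import math

# The two (workers, latency_ms) measurements reported in the thread.
N1, T1 = 15, 15.0
N2, T2 = 50, 80.0

def fit_models(n1, t1, n2, t2):
    """Fit four growth models exactly through two points; each maps n -> latency_ms."""
    lin_b = (t2 - t1) / (n2 - n1)                 # ms per extra worker
    pow_k = math.log(t2 / t1) / math.log(n2 / n1) # exponent in t ~ n**k
    exp_b = math.log(t2 / t1) / (n2 - n1)         # rate in t ~ e**(b*n)
    log_b = (t2 - t1) / math.log(n2 / n1)         # coefficient in t ~ b*ln(n)
    return {
        "linear": lambda n: t1 + lin_b * (n - n1),
        "power": lambda n: t1 * (n / n1) ** pow_k,
        "exponential": lambda n: t1 * math.exp(exp_b * (n - n1)),
        "logarithmic": lambda n: t1 + log_b * math.log(n / n1),
    }

models = fit_models(N1, T1, N2, T2)
for name, f in models.items():
    # Every model reproduces both measurements, yet they diverge at 200 workers.
    print(f"{name:12s} f(15)={f(15):6.1f}  f(50)={f(50):6.1f}  f(200)={f(200):10.1f} ms")
```

Measuring at several more worker counts (for example 25, 75, 100, 150, 200) is what actually distinguishes these curves.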

> The messages exchanged between the coordinator and the worker
> processes are small, less than 100 bytes.
> 
> While I am planning to measure the latency spent on each hop, I would like
> to seek suggestions:
> 
> * For a large number of worker processes handled by a single coordinator
> with low latency, should the coordinator's context use more than 1 I/O
> thread?
> 
> * Should I use another socket pattern, such as ROUTER/DEALER, instead of
> PUB/SUB and PUSH/PULL, in order to address the scalability issue?
> 
> Regards,
> 
> Jun

Personally, I think that if you depend on latency, then you always have
a problem. It will be your bottleneck and will seriously harm
scalability. You need to pipeline your work: send out more jobs ahead
of time, while the workers are still busy with the last job. That way
the latency gets completely absorbed and becomes irrelevant.
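The effect of pipelining can be sketched with a toy timing model for a single worker. Everything here is an assumption for illustration: a fixed network round trip split evenly between the two legs, a fixed processing time per job, and a coordinator that keeps at most `window` jobs in flight (`window=1` is the lock-step case the original setup resembles). The parameter values in the usage lines are hypothetical.

```python
def total_time_ms(jobs, latency_ms, work_ms, window):
    """Time until the last result reaches the coordinator, in a toy pipeline model.

    latency_ms: network round trip per job (split evenly between the two legs)
    work_ms:    processing time per job on the worker
    window:     max jobs the coordinator keeps in flight (1 = lock-step)
    """
    result_at = []       # when each job's result arrives back at the coordinator
    worker_free = 0.0    # when the worker finishes its previous job
    for i in range(jobs):
        # Job i may only be sent once job (i - window) has come back.
        sent = 0.0 if i < window else result_at[i - window]
        arrives = sent + latency_ms / 2
        start = max(arrives, worker_free)   # the worker handles jobs one at a time
        worker_free = start + work_ms
        result_at.append(worker_free + latency_ms / 2)
    return result_at[-1]

# Hypothetical numbers: 10 ms round trip, 5 ms of work per job, 100 jobs.
for w in (1, 2, 3, 100):
    print(f"window={w:3d}: {total_time_ms(100, 10, 5, w):7.1f} ms")
```

In lock-step every job pays the full round trip, so the total is `jobs * (latency + work)`. Once enough jobs are in flight to cover one round trip (here a window of 3, about `(latency + work) / work`), the worker never sits idle and the latency is paid only once for the whole batch, which is the absorption described above.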

Regards,
Goswin
___
zeromq-dev mailing list
zeromq-dev@lists.zeromq.org
http://lists.zeromq.org/mailman/listinfo/zeromq-dev


[zeromq-dev] on scalability of PUB/SUB and PUSH/PULL

2014-06-11 Thread Jun Li
Hi,

I am using the PUB/SUB socket pattern to distribute commands from the
coordinator to many worker processes, and PUSH/PULL for each worker
process to push its processing results back to the coordinator.
The coordinator binds both the PUB socket and the PULL socket, and its
context is currently set to 1 I/O thread. In my test environment there
is a single coordinator process and up to 200 worker processes.
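Whether one I/O thread is enough can be estimated with a back-of-the-envelope calculation. The ZeroMQ guide's commonly quoted rule of thumb is roughly one I/O thread per gigabyte per second of data in or out; in pyzmq the count is the `io_threads` argument of `zmq.Context`, and in the C API it is set with `zmq_ctx_set(ctx, ZMQ_IO_THREADS, n)`. The sketch below assumes that rule of thumb, decimal gigabytes, and a hypothetical round rate, since the thread does not state how many rounds per second the coordinator runs.

```python
import math

GB = 1_000_000_000  # assumption: the rule of thumb is read as decimal GB/s

def io_threads_for(workers, msg_bytes, rounds_per_sec):
    """Rough I/O-thread count from the 'about one I/O thread per GB/s' rule of thumb.

    Each round sends one command copy to every subscriber (PUB over TCP sends a
    copy per connection) and receives one result from every worker (PULL).
    """
    bytes_per_sec = workers * msg_bytes * 2 * rounds_per_sec
    return max(1, math.ceil(bytes_per_sec / GB))

# 200 workers, 100-byte messages, a hypothetical 1,000 rounds per second:
# 200 * 100 * 2 * 1000 = 40 MB/s, far below 1 GB/s.
print(io_threads_for(200, 100, 1000))
```

With messages under 100 bytes, the byte rate alone is unlikely to saturate a single I/O thread at this scale, so extra I/O threads may not be where the latency comes from; per-hop measurements would confirm or refute that.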

I have just started scalability testing. With 15 worker processes, the
end-to-end communication latency is about 15 ms, measured from the
coordinator distributing the commands (via PUB) to it aggregating the
results back (via PULL) from the worker processes. When I increased the
number of worker processes to 50, I observed an end-to-end latency of
about 80 ms. This implies that latency grows as the number of worker
processes grows, which raises a scalability issue.
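The broadcast-and-gather round described above can be modelled in-process, without sockets, to see what "end-to-end latency" covers. This is a simulation under stated assumptions, not ZeroMQ code: each worker's inbox is a `queue.Queue` standing in for a SUB socket (PUB copies every command to every subscriber), and all workers share one results queue standing in for many PUSH sockets connected to a single bound PULL socket. The command payload and the "processing" step are hypothetical.

```python
import queue
import threading
import time

def run_round(n_workers, command=b"do-work"):
    """One fan-out/fan-in round modelled with in-process queues (no ZeroMQ)."""
    inboxes = [queue.Queue() for _ in range(n_workers)]  # stand-ins for SUB sockets
    results = queue.Queue()                              # stand-in for the bound PULL

    def worker(wid, inbox):
        cmd = inbox.get()                  # wait for the broadcast command
        results.put((wid, cmd.upper()))    # "process" it and push the result back

    threads = [threading.Thread(target=worker, args=(i, q))
               for i, q in enumerate(inboxes)]
    for t in threads:
        t.start()

    start = time.perf_counter()
    for q in inboxes:                      # fan-out: one copy per subscriber
        q.put(command)
    gathered = [results.get() for _ in range(n_workers)]  # fan-in: wait for all
    elapsed_ms = (time.perf_counter() - start) * 1000
    for t in threads:
        t.join()
    return gathered, elapsed_ms

for n in (15, 50, 200):
    _, ms = run_round(n)
    print(f"{n:3d} workers: round took {ms:.2f} ms")
```

The design point this makes visible: a round only completes when the slowest worker's result arrives, so the coordinator's end-to-end latency is the maximum over all workers plus the serial cost of sending one copy per subscriber, and both terms tend to grow with the number of workers.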

The messages exchanged between the coordinator and the worker
processes are small, less than 100 bytes.
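One way to measure the latency of each hop is to carry timestamps inside the messages themselves, which still leaves them well under 100 bytes. The layout below (a job id plus four double-precision timestamps, 36 bytes) and the helper names are hypothetical. It assumes all processes read comparable wall clocks (the same host, or hosts kept in sync with NTP), because `time.time()` values from different machines are only as comparable as their clocks.

```python
import struct
import time

# Hypothetical layout: job id, then t0 (coordinator PUB send), t1 (worker receive),
# t2 (worker PUSH send), t3 (coordinator PULL receive), in seconds since the epoch.
STAMP_FMT = "!I4d"  # network byte order, no padding: 4 + 4 * 8 = 36 bytes

def stamp(job_id, t0=0.0, t1=0.0, t2=0.0, t3=0.0):
    """Pack a job id and four timestamps into a small binary payload."""
    return struct.pack(STAMP_FMT, job_id, t0, t1, t2, t3)

def add_stamp(msg, slot):
    """Return a copy of msg with time.time() written into timestamp slot 0..3."""
    fields = list(struct.unpack(STAMP_FMT, msg))
    fields[1 + slot] = time.time()
    return struct.pack(STAMP_FMT, *fields)

def hop_latencies_ms(msg):
    """Split the round trip recorded in msg into per-hop durations (ms)."""
    _, t0, t1, t2, t3 = struct.unpack(STAMP_FMT, msg)
    return {
        "pub_to_worker": (t1 - t0) * 1000,
        "worker_processing": (t2 - t1) * 1000,
        "push_to_coordinator": (t3 - t2) * 1000,
        "end_to_end": (t3 - t0) * 1000,
    }

# Each party calls add_stamp with its slot before forwarding the message.
print(hop_latencies_ms(stamp(7, 1.000, 1.002, 1.005, 1.010)))
```

Aggregating these per-hop numbers across workers shows whether the growth comes from the PUB fan-out, from the workers, or from the PULL fan-in.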

While I am planning to measure the latency spent on each hop, I would like
to seek suggestions:

* For a large number of worker processes handled by a single coordinator
with low latency, should the coordinator's context use more than 1 I/O
thread?

* Should I use another socket pattern, such as ROUTER/DEALER, instead of
PUB/SUB and PUSH/PULL, in order to address the scalability issue?
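For comparison with PUB, which copies every command to every subscriber, a ROUTER socket lets the coordinator address an individual worker by its identity. The sketch below models, without any sockets, the dispatch bookkeeping a ROUTER-based coordinator might keep, in the spirit of the load-balancing pattern described in the ZeroMQ guide: workers announce that they are ready, and each job goes to the next ready worker. The class, its method names, and the worker identities are hypothetical; real code would read the identity frame from the ROUTER socket and send the returned `(identity, job)` pairs back through it.

```python
from collections import deque

class LoadBalancer:
    """Dispatch bookkeeping for a hypothetical ROUTER-based coordinator (no sockets)."""

    def __init__(self):
        self.ready = deque()    # worker identities waiting for work, oldest first
        self.pending = deque()  # jobs waiting for a free worker
        self.assigned = {}      # worker identity -> job it is currently working on

    def worker_ready(self, identity):
        """A worker reported in (first contact, or after returning a result)."""
        self.assigned.pop(identity, None)
        self.ready.append(identity)
        return self._dispatch()

    def submit(self, job):
        """Queue a job; it is sent as soon as some worker is ready."""
        self.pending.append(job)
        return self._dispatch()

    def _dispatch(self):
        """Pair pending jobs with ready workers; return the (identity, job) sends."""
        sends = []
        while self.ready and self.pending:
            identity, job = self.ready.popleft(), self.pending.popleft()
            self.assigned[identity] = job
            sends.append((identity, job))
        return sends

lb = LoadBalancer()
lb.worker_ready(b"w1")
lb.worker_ready(b"w2")
print(lb.submit("job-a"), lb.submit("job-b"), lb.submit("job-c"))
print(lb.worker_ready(b"w1"))  # w1 finished job-a, so it picks up job-c
```

Whether this helps depends on the workload: if every worker must receive every command, PUB remains the natural fit, while targeted dispatch mainly pays off when jobs can go to any free worker.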

Regards,

Jun