Hi all,
I'm continuing to see some less-than-desirable latency behavior in PUB/SUB 
networks despite HWM=1. I first posted about this back in May. See 
http://www.mail-archive.com/[email protected]/msg08943.html. I've 
picked this issue up again and have done quite a bit more testing, now with 
libzmq-3.0. The behavior is still present, and I have some slightly stronger 
results.
= Methods =
I construct a PUB/SUB network with 1 message source, N processing nodes, and 1 
sink. The source multicasts to the N processing nodes, which run in parallel. 
The nodes are "dummies": they just sleep() for some amount of time and then 
pass the message on to the sink. The sink prints some metrics for each message: 
arrival time, transmission latency (total time in ZMQ, excluding the sleep() 
time at the intermediate node), message id, and route.
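For readers who want to see the moving parts, here is a minimal sketch of a single source-to-node hop with the high-water mark set to 1, assuming pyzmq. The `inproc://source` endpoint and socket names are mine (my real tests use TCP and IPC), and note that libzmq-3.x splits the old ZMQ_HWM option into ZMQ_SNDHWM and ZMQ_RCVHWM, so both sides have to be set explicitly:

```python
import time
import zmq

ctx = zmq.Context()

# Source side: PUB socket with send-side high-water mark of 1.
pub = ctx.socket(zmq.PUB)
pub.setsockopt(zmq.SNDHWM, 1)
pub.bind("inproc://source")

# Node side: SUB socket with receive-side high-water mark of 1,
# subscribed to everything.
sub = ctx.socket(zmq.SUB)
sub.setsockopt(zmq.RCVHWM, 1)
sub.setsockopt(zmq.SUBSCRIBE, b"")
sub.connect("inproc://source")

time.sleep(0.1)  # give the slow-joining subscriber time to connect

pub.send(b"msg-0")
print(sub.recv())  # b'msg-0'
```

With a queue depth of 1 on both sides, one would naively expect messages to be dropped as soon as a node falls behind; my results below suggest that is not what happens.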
Throughout, I try to simulate the video processing system that I'm working on. 
The source sends at 30 msgs/sec, the nodes have variable processing time, and 
the sink works as fast as it can. The "variable processing time" is achieved by 
sleep()ing for a time X, where X is normally distributed with mean u and 
variance q. Each processing node gets its own u and q. (This will be relevant 
later.)
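The per-node jitter model can be sketched in a few lines of stdlib Python (this is my own sketch, not the pastebin code). One detail worth flagging: since q is a *variance*, the standard deviation passed to the sampler is sqrt(q), and negative samples are clamped to zero since you can't sleep for negative time:

```python
import math
import random
import time

def process(u, q):
    """Simulate one processing node: sleep for a normally distributed
    time with mean u and variance q (std dev sqrt(q)), clamped at 0."""
    x = random.gauss(u, math.sqrt(q))
    time.sleep(max(x, 0.0))

# e.g. a node averaging 50 ms of work with variance 1e-4 s^2 (std dev 10 ms):
process(0.050, 1e-4)
```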
You can find my simulation code here: http://pastebin.com/dSBtD1u7. It's about 
100 lines and I would really appreciate it if somebody would sanity-check it 
for me. There is also a utility for graphing the data: 
http://pastebin.com/EBwdxu6M. A simple 30-second run looks something like 
"python lat_test.py 30 > foo && python plot_lat.py foo"
= Results =
The following hold for both TCP and IPC transports:

1) In my tests, ZMQ never dropped any messages despite HWM=1. This was the 
case even when the processing nodes were made to run much slower than 
30 msgs/sec.

2) Total transmission latency (time in ZMQ) is much greater in asynchronous 
networks than in synchronous ones.
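For reference, the "transmission latency" metric can be computed by having the source stamp each message with its send time and each node stamp how long it slept, so the sink can subtract accumulated sleep from total elapsed time. This is my own sketch of the bookkeeping, not the pastebin code; field names are mine:

```python
import time

def stamp_source(msg_id):
    # Source: record the wall-clock send time with the message.
    return {"id": msg_id, "t_sent": time.time(), "slept": 0.0, "route": []}

def stamp_node(msg, name, sleep_s):
    # Node: simulate work, and record how long we deliberately slept.
    time.sleep(sleep_s)
    msg["slept"] += sleep_s
    msg["route"].append(name)
    return msg

def transmission_latency(msg):
    # Sink: total elapsed time minus the deliberate sleep time, i.e. an
    # estimate of the time the message spent inside ZMQ itself.
    return (time.time() - msg["t_sent"]) - msg["slept"]

m = stamp_node(stamp_source(0), "node-A", 0.05)
print(transmission_latency(m))  # near 0: almost all elapsed time was sleep
```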
Both of these results are startling, but the second deserves some explanation.  
Remember I said that I was simulating processing time by sampling random sleep 
times from a normal distribution? Well, nodes with greater *variance* ("q" 
parameter) in their sleep times were much more likely to suffer from high 
transmission latency. (Again, the "transmission latency" does not include the 
sleep time itself; just the time on the wire.) This is a crazy result! One 
expects transmission latency to vary with things like message size and number 
of connections, but not the processing jitter.
In the case of my real-time video processing system, this relationship between 
processing jitter and transmission latency wouldn't be a deal-breaker if it 
weren't for the *magnitude* of the resulting latency. We're consistently seeing 
upwards of 5 seconds of transmission latency on network paths that go via 
certain "jittery" system components, and this is easy to reproduce in my 
simulation.
= Thanks! =
I hope this little testing effort that I've undertaken proves useful to the 
project. I encourage everybody who is interested in QoS under ZMQ to run my 
code and check the results for themselves. I am also eager to hear from people 
that are using ZMQ under hard real-time constraints. Feedback and technical 
direction are appreciated. If this is a bug, I'm keen to squash it! And I am 
certainly hoping to continue with ZMQ on my real-time video project.
Cheers!
~Brian
_______________________________________________
zeromq-dev mailing list
[email protected]
http://lists.zeromq.org/mailman/listinfo/zeromq-dev
