Hi all,

I'm continuing to see some less-than-desirable latency behavior in PUB/SUB networks despite HWM=1. I first posted about this back in May; see http://www.mail-archive.com/[email protected]/msg08943.html. I've picked the issue up again and have done quite a bit more testing, now with libzmq-3.0. The behavior is still present, and I have some slightly stronger results.

= Methods =

I construct a PUB/SUB network with one message source, N processing nodes, and one sink. The source multicasts to the N processing nodes, which run in parallel. The nodes are "dummies": they just sleep() for some amount of time and then pass the message on to the sink. The sink prints out some message metrics: arrival time, transmission latency (total time in ZMQ; this does not include the sleep() time at the intermediate node), message id, and route.

Throughout, I try to simulate the video processing system that I'm working on: the source sends at 30 msgs/sec, the nodes have variable processing time, and the sink works as fast as it can. The "variable processing time" is achieved by sleep()ing for a time X, where X is normally distributed with mean u and variance q. Each processing node gets its own u and q. (This will be relevant later.)

You can find my simulation code here: http://pastebin.com/dSBtD1u7. It's about 100 lines, and I would really appreciate it if somebody would sanity-check it for me. There is also a utility for graphing the data: http://pastebin.com/EBwdxu6M. A simple 30-second run looks something like:

    python lat_test.py 30 > foo && python plot_lat.py foo

= Results =

The following hold for both the TCP and IPC transports:

1) In my tests, ZMQ never dropped any messages despite HWM=1. This was the case even when the processing nodes were made to run much slower than 30 msgs/sec.

2) Total transmission latency (time in ZMQ) is much greater with asynchronous networks than with synchronous ones.
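For readers who want the shape of the setup without opening the pastebin, here is a minimal sketch of one "dummy" processing node. This is not the author's lat_test.py; it assumes pyzmq against libzmq-3.0 (where the old ZMQ_HWM option is split into ZMQ_SNDHWM and ZMQ_RCVHWM), and the endpoints and parameter names (u, q) are illustrative placeholders.

```python
import random
import time
import zmq  # assumes pyzmq is installed

def sleep_time(u, q):
    """Sample a simulated processing delay: normally distributed with
    mean u and variance q, clamped at zero so we never sleep a
    negative duration."""
    return max(0.0, random.gauss(u, q ** 0.5))

def processing_node(u, q,
                    src_endpoint="tcp://127.0.0.1:5556",    # hypothetical
                    sink_endpoint="tcp://127.0.0.1:5557"):  # hypothetical
    ctx = zmq.Context()

    sub = ctx.socket(zmq.SUB)
    sub.setsockopt(zmq.SUBSCRIBE, b"")  # subscribe to all messages
    sub.setsockopt(zmq.RCVHWM, 1)       # libzmq-3.x: HWM is per-direction,
    sub.connect(src_endpoint)           # set before connect()

    push = ctx.socket(zmq.PUSH)
    push.setsockopt(zmq.SNDHWM, 1)
    push.connect(sink_endpoint)

    while True:
        msg = sub.recv()
        time.sleep(sleep_time(u, q))    # simulate variable processing time
        push.send(msg)                  # forward to the sink
```

Note that with random.gauss the sampled delay can be negative when q is large relative to u, which is why the clamp in sleep_time is there; whether the author's code clamps the same way is something the sanity check should cover.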
Both of these results are startling, but the second deserves some explanation. Remember that I said I was simulating processing time by sampling random sleep times from a normal distribution? Well, nodes with greater *variance* (the "q" parameter) in their sleep times were much more likely to suffer from high transmission latency. (Again, "transmission latency" does not include the sleep time itself, just the time on the wire.)

This is a crazy result! One expects transmission latency to vary with things like message size and number of connections, but not with processing jitter. In the case of my real-time video processing system, this relationship between processing jitter and transmission latency wouldn't be a deal-breaker if it weren't for the *magnitude* of the resulting latency. We consistently see upwards of 5 seconds of transmission latency on network paths that go through certain "jittery" system components, and this is easy to reproduce in my simulation.

= Thanks! =

I hope this little testing effort proves useful to the project. I encourage everybody who is interested in QoS under ZMQ to run my code and check the results for themselves. I am also eager to hear from people who are using ZMQ under hard real-time constraints. Feedback and technical direction are appreciated. If this is a bug, I'm keen to squash it! And I am certainly hoping to continue with ZMQ on my real-time video project.

Cheers!
~Brian
_______________________________________________ zeromq-dev mailing list [email protected] http://lists.zeromq.org/mailman/listinfo/zeromq-dev
