Hi Martin,

> > So we have 11000 bytes transferred in both cases.
> >
> > What's the flow in the above reasoning?
>
> Err, you caught me being a doink. :) I'm mixing up some of the historic
> systems I've used and not sticking to the raw TCP stream portion of this.
> It's tough to keep them separate after a number of years getting used to
> the higher-level wrappers. I'm headed home now so will follow up in a bit;
> there is still a benefit to be had from the virtual connections, just at
> a different point.
So, I went ahead and spent a bit of time on an experiment, making sure I was not mixing up old items with 0MQ. As mentioned, one of my tests was pushing about 200 MB/s, which is unacceptable. I switched off the no_delay option on the sockets (i.e. re-enabled Nagle) and that cut the bandwidth down to about 150 MB/s; not a bad savings, but still not acceptable.

So I went a bit further and looked at the system with respect to sharing multiple connections. I can't do this in the real cluster, as I don't know whether the services will end up on the same machine in real code. What I did for the test, though, was to transfer all the messages over inproc to a single thread, which sent them down a single push pipe to the unit test fixture. Given this use case and how the data is generated, the bandwidth dropped to a level comparable to my old implementation: 75-80 MB/s.

So, latency being part of this, I'll just post my results, as things get interesting:

Initial conversion: 180 MB/s rms, 22 MB/s deviation. 42 ms latency rms, 17 ms deviation.
Nagle enabled:      151 MB/s rms, 35 MB/s deviation. 41 ms latency rms, 14 ms deviation.
Single connection:   72 MB/s rms,  5 MB/s deviation. 31 ms latency rms,  7 ms deviation.

I ran the tests for 10 minutes each, 3 times in a row, since I didn't quite believe the numbers the first time, but they were within 5% each run, so I'm fairly certain this is an accurate representation.

Let me explain the type of data I'm sending, and then I'll leave it there, as I want to go back and check some of the measurements and make sure I'm not somehow getting completely invalid numbers out of the experiment. The data being sent is usually one 32-bit flag and two 32-bit IDs, so 12 bytes. The data in question is stored in a quadtree with 2 divisions. Each node on the outer surface of the quadtree replicates data to another quadtree node, currently simulated by the test fixture. So, in this case each face of the cube has 16 connections, for a total of 96 connections.
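For anyone wanting to reproduce the Nagle part of the experiment: "switching off the no_delay option" corresponds to clearing TCP_NODELAY on the underlying socket. A minimal sketch using Python's standard socket module (plain TCP, not the 0MQ API; the helper name is mine):

```python
import socket

def set_nagle(sock: socket.socket, enabled: bool) -> None:
    """Enable or disable Nagle's algorithm on a TCP socket.

    TCP_NODELAY=1 disables Nagle (each small write goes out immediately,
    lower latency but more tiny packets); TCP_NODELAY=0 enables it, so
    small writes may be coalesced into fewer, fuller packets.
    """
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY,
                    0 if enabled else 1)

# Example: a TCP socket with Nagle enabled (coalescing on).
s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
set_nagle(s, enabled=True)
assert s.getsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY) == 0
s.close()
```

The latency/throughput trade-off in the numbers above falls out of exactly this switch: coalescing trims per-packet header waste but only helps when multiple messages share a connection in a short window.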
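To make the per-message overhead and fan-out concrete, here's the back-of-the-envelope arithmetic for the figures described above (the 40-byte header is a ballpark for minimal IPv4 + TCP headers with no options, my assumption):

```python
# Ballpark packet accounting for the 12-byte messages described above.
PAYLOAD = 4 + 4 + 4            # one 32-bit flag + two 32-bit IDs = 12 bytes
HEADERS = 20 + 20              # minimal IPv4 header + TCP header, no options

# Overhead when each message rides in its own packet (no coalescing):
wastage = HEADERS / PAYLOAD    # ~3.33, i.e. roughly 330% header overhead

# Connection fan-out: a quadtree forced to 2 divisions has 4**2 = 16
# outer nodes per face, and all six cube faces replicate outward.
connections_per_face = 4 ** 2
total_connections = 6 * connections_per_face

print(round(wastage, 2), connections_per_face, total_connections)
```

With ~96 mostly idle connections each carrying tiny messages, Nagle rarely finds two messages on the same connection to merge, which is why funneling everything through one shared connection recovers most of the header waste.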
Note, the "real" system doesn't keep an entire hierarchy on a single machine; the division level is dynamic, and heavily utilized portions move to other machines where they can be subdivided further, or merged as pressure is removed. The test simply keeps the entire thing on one machine and forces it to 2 divisions. Also note, I'm running the system at 10x the normal rate; the same item should actually only push about 7 MB/s (using the shared-connection case) in real-world usage. That's even with all 800,000 data points in the system; fairly low bandwidth for the amount of data.

So, looking at the above, I would "guess" that I'm probably sending a lot of single-entry messages, so packet wastage was roughly 40 bytes of TCP/IP header per 12-byte payload, i.e. about 330% overhead. With Nagle, I benefited a little in that when multiple messages went out they merged into a single packet, but as expected, most of the data was on different connections so it rarely shared packets. The latency numbers were a bit odd, though; that's one of the things I want to check. The single-connection case was of course the big win, as there is not a "lot" of data per connection, just a lot of data in aggregate.

I won't analyze this any further till I get home and do some more testing, but it should give you an idea of why I like the connection-sharing ability of the other libraries.

Kb

_______________________________________________
zeromq-dev mailing list
zeromq-dev@lists.zeromq.org
http://lists.zeromq.org/mailman/listinfo/zeromq-dev