To explain point two: you can't easily impose an ordering on messages across multiple connections, even to the same peer. It's a fairly fundamental limit. The only reason a single ZMQ connection can provide in-order delivery is that it leans on TCP to correct the duplicated segments, out-of-order delivery, random bit-flips, lost segments, and other chaos that goes on at the IP level. If all your application needs is to get many messages to the other end as fast as possible, then by all means open multiple connections; likewise if you have urgent messages that must be processed asynchronously and dodge head-of-line blocking behind the high-volume channel. But if you need your messages processed in some global order, you're better off using a single connection rather than reinventing half of TCP.
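(If out-of-order arrival across connections is tolerable on the wire but not for processing, one option is to tag each message with a sequence number at the sender and re-sequence at the receiver. Here's a minimal, transport-agnostic sketch in Python; plain data structures stand in for the sockets, so everything here is illustrative, not part of any ZMQ API:)

```python
import heapq

class Resequencer:
    """Buffers out-of-order (seq, payload) pairs and releases them in order."""
    def __init__(self):
        self.next_seq = 0   # next sequence number we expect to deliver
        self.pending = []   # min-heap of (seq, payload) not yet deliverable

    def push(self, seq, payload):
        """Accept a message from any connection; return the messages that
        are now deliverable in global order (possibly an empty list)."""
        heapq.heappush(self.pending, (seq, payload))
        ready = []
        while self.pending and self.pending[0][0] == self.next_seq:
            _, p = heapq.heappop(self.pending)
            ready.append(p)
            self.next_seq += 1
        return ready

# Messages arrive interleaved from two connections, out of global order:
rx = Resequencer()
arrivals = [(1, "b"), (0, "a"), (3, "d"), (2, "c")]
delivered = []
for seq, payload in arrivals:
    delivered.extend(rx.push(seq, payload))
print(delivered)  # ['a', 'b', 'c', 'd']
```

Note the buffer is unbounded: a slow connection can make it grow without limit, which is exactly where flow control re-enters the picture.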
You also can't use credit-based flow control here: PUSH and PULL are unidirectional, so the PULL socket has no way to send credit back to the PUSH. Replace PUSH with DEALER and PULL with ROUTER and you can. But even then, credit-based flow control is about bounding the number of messages/bytes/other units of work currently in flight in the system; if your goal is solely to saturate a link, it's actually the opposite of what you want.

Now, my experience is at the extreme other end of the spectrum from your use case, but if I were inclined to optimize for Maximum Fast, two lines of investigation occur to me at this point.

First, what is happening at the network level once throughput hits that plateau? I'd take a pcap during the test and open it in Wireshark afterwards; look at the times payload-carrying packets go out and when the corresponding ACK packets come back. If they bunch up in any way, some tuning of TCP options at the socket-option or possibly kernel-sysctl level may be called for.

Second, where is that CPU time actually being spent? Intuitively I expect this will take more effort to bear fruit, which is why I'd save it for after poking at the network, but I'd make a flamegraph[1] (substitute your favorite profiler here) and look for hotspots I might be able to optimize. Past the threshold where CPU utilization decreases, things get harder, since that suggests time waiting on locks or hardware starts to dominate. There are ways to interrogate waits into and past the kernel, but I've never had to do it, so I can't tell you how painful it might be.

Good luck, friend.

[1]: https://github.com/brendangregg/FlameGraph

On Thu, Oct 24, 2019, 9:38 PM Brett Viren via zeromq-dev <zeromq-dev@lists.zeromq.org> wrote:

> Hi again,
>
> Doron Somech <somdo...@gmail.com> writes:
>
> > You need to create multiple connections to enjoy the multiple io threads.
> >
> > So in the remote/local_thr connect to the same endpoint 100 times and
> > create 10 io threads.
> >
> > You don't need to create multiple sockets, just call connect multiple
> > times with same address.
>
> I keep working on evaluating ZeroMQ against this 100 Gbps network when I
> get a spare moment. You can see some initial results in the attached
> PNG. As-is, things look pretty decent but there are two effects I see
> which I don't fully understand and I think are important to achieving
> something closer to saturation.
>
> 1) As you can see in the PNG, as the message size increases beyond 10 kB
> the 10 I/O threads become less and less active. This activity seems
> correlated with throughput. But, why the die-off as we go to higher
> message size and why the resurgence at ~1 MB? Might there be some
> additional tricks to lift the throughput? Are such large messages
> simply not reasonable?
>
> 2) I've instrumented my tester to include a sequence count in each
> message and it uncovers that this multi-thread/multi-connect trick may
> lead to messages arriving to the receiver out-of-order. Given PUSH is
> round-robin and PULL is fair queued, I naively didn't expect this. But
> seeing it, I have two guesses. 1) I don't actually know what
> "fair-queued" really means :) and 2) if a mute state is getting hit then
> maybe all bets are off. I do wonder if adding "credit based" transfers
> might solve this ordering. Eg, if N credits (or fewer) are used given N
> connects, might the round-robin/fair-queue ordering stay in lock step?
>
> Any ideas are welcome. Thanks!
>
> -Brett.
>
> _______________________________________________
> zeromq-dev mailing list
> zeromq-dev@lists.zeromq.org
> https://lists.zeromq.org/mailman/listinfo/zeromq-dev
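P.S. To make the credit-based idea concrete: credits bound work in flight rather than fix ordering. A minimal sketch of the accounting, in transport-agnostic Python with a plain deque standing in for the DEALER-to-ROUTER channel (every name here is illustrative, not a ZMQ API):

```python
from collections import deque

CREDITS = 4  # receiver grants this many messages of headroom

def run_credit_flow(messages):
    """Simulate sender/receiver with credit-based flow control.
    Returns (delivered, max_in_flight) so the bound can be checked."""
    to_receiver = deque()   # stands in for the DEALER->ROUTER data channel
    credits = CREDITS       # sender's current credit balance
    pending = deque(messages)
    delivered = []
    max_in_flight = 0
    while pending or to_receiver:
        # Sender: transmit only while it holds credit.
        while pending and credits > 0:
            to_receiver.append(pending.popleft())
            credits -= 1
        max_in_flight = max(max_in_flight, len(to_receiver))
        # Receiver: consume one message, then grant one credit back
        # (in real ZMQ this would be a message on the reverse channel).
        if to_receiver:
            delivered.append(to_receiver.popleft())
            credits += 1
    return delivered, max_in_flight

msgs = list(range(10))
delivered, peak = run_credit_flow(msgs)
print(delivered == msgs, peak)  # True 4
```

The in-order delivery here comes from the single queue, not from the credits; with N real connections, credits cap the backlog on each path but fan-in at the PULL/ROUTER side can still interleave.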