nitpick, but isn't zmq_init() the one that's deprecated, and zmq_ctx_new() its replacement?
2013/1/9 A. Mark <gougol...@gmail.com>

> Good guess, but I'm using this one:
>
>     ctx = zmq_init(threads);
>
> from http://api.zeromq.org/3-2:zmq-init
>
> with the number-of-threads parameter passed as a command-line argument
> to the test programs. I assume it should have the same effect as
> zmq_ctx_set(), since zmq_ctx_new is deprecated. I've tried a different
> number of threads on each end, but performance doesn't improve with
> more threads. BTW, to be precise, client_zmq takes these command-line
> arguments:
>
>     usage: client_zmq <connect-to> <message-size> <message-count> <zmq-threads> <workers>
>
> So I can set the internal zmq threads as well as how many worker
> threads to spawn in the client. In server_zmq I can only set
> zmq-threads, of course:
>
>     usage: server_zmq <connect-to> <message-size> <message-count> <zmq-threads>
>
> And yes, I'm using the same context in both programs.
>
> On Tue, Jan 8, 2013 at 7:06 PM, Apostolis Xekoukoulotakis
> <xekou...@gmail.com> wrote:
>
>> Just guessing here. Are you using the same context in all threads? If
>> so, maybe you need to increase the number of threads that 0MQ uses
>> inside it: http://api.zeromq.org/3-2:zmq-ctx-set
>>
>> 2013/1/9 A. Mark <gougol...@gmail.com>
>>
>>> OK, so I went back, fixed a couple of issues, and reattached the two
>>> modified test programs. I added RCV/SND buffer shaping, and they now
>>> use zmq_msg_init_data() (zero-copy) for better performance. I'm
>>> getting about 2.5 GB/s average at best, which is a lot better than
>>> with remote_thr/local_thr, but still 25% less than the 3.4 GB/s I'm
>>> expecting at minimum.
>>>
>>> When I start 4 simultaneous processes (not threads) for each client
>>> and server on separate ports, the total does add up to ~3.3 GB/s, as
>>> it should. The trouble is that this requires binding 4 ports, and the
>>> whole point of using accept() is traditionally to have multiple
>>> connections on the same port.
>>> Is there a way to achieve the desired throughput with 0MQ without
>>> using a separate port for each socket? I think multiple connections
>>> (from separate threads) on the same 0MQ socket should naturally do
>>> it, but according to the results that doesn't happen.
>>>
>>> On Mon, Jan 7, 2013 at 7:16 PM, A. Mark <gougol...@gmail.com> wrote:
>>>
>>>> Hello,
>>>>
>>>> I'm very interested in porting my current transfer engine to 0MQ.
>>>> The current engine is written in pure BSD sockets and has certain
>>>> limitations that would be easily overcome by 0MQ's intelligent and
>>>> versatile design. However, my main concern is performance on very
>>>> long messages, in excess of 1 MB. The current backbone MT design is
>>>> the following:
>>>>
>>>>     control node (client) <---> server A                           server B
>>>>                                    |--- worker node 1 <---> worker node 1 ---|
>>>>                                    |--- worker node 2 <---> worker node 2 ---|
>>>>                                    |--- worker node N <---> worker node N ---|
>>>>
>>>> So the control client controls whatever task needs to be performed
>>>> by submitting requests to a server; the actual work is done by the
>>>> worker nodes, each in a separate thread on the server. The worker
>>>> nodes are synchronized across the two servers, but they work
>>>> independently since they are working on the same task. Each worker
>>>> node has its own FD, but all connect to the same TCP address and
>>>> port. The main task of each node is to perform some transformation
>>>> on a large data buffer from a buffer pool, then push the finished
>>>> result to the other server. My current benchmark gives me
>>>> 3.5 GBytes/s using TCP over the local loopback when simply pushing
>>>> the buffers without doing any work.
>>>>
>>>> I ran the 0MQ benchmarks local_thr and remote_thr, and the
>>>> performance is only 1.5 GB/s at best with large buffers (messages),
>>>> and lower with small ones.
>>>> I'm also concerned looking at the published benchmarks for the 10GE
>>>> test. My current engine can sustain a steady 1.1 GBytes/s with large
>>>> buffers over 10GE.
>>>>
>>>> I've also tried a modified version of the two benchmarks to emulate
>>>> the situation above, but the performance is about the same. The
>>>> modified MT code is attached.
>>>>
>>>> Is there something else I need to do to get the best performance out
>>>> of 0MQ using MT for this work-flow engine?
>>>
>>> _______________________________________________
>>> zeromq-dev mailing list
>>> zeromq-dev@lists.zeromq.org
>>> http://lists.zeromq.org/mailman/listinfo/zeromq-dev
>>
>> --
>> Sincerely yours,
>>
>> Apostolis Xekoukoulotakis