Re: [OMPI devel] OFED question

2011-01-27 Thread Paul H. Hargrove
RFE: Could OMPI implement a short-hand for Pasha's following magical incantation? On 1/27/2011 5:34 PM, Shamis, Pavel wrote: --mca btl_openib_receive_queues X,128,256,192,128:X,2048,256,128,32:X,12288,256,128,32:X,65536,256,128,32 -- Paul H. Hargrove phhargr...@lb
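One way to approximate such a short-hand today is a per-user MCA parameter file, so the incantation is typed once; a minimal sketch, assuming the standard $HOME/.openmpi/mca-params.conf location:

    # $HOME/.openmpi/mca-params.conf -- read automatically by mpirun,
    # so the long value no longer has to appear on every command line
    btl_openib_receive_queues = X,128,256,192,128:X,2048,256,128,32:X,12288,256,128,32:X,65536,256,128,32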

Re: [OMPI devel] OFED question

2011-01-27 Thread Shamis, Pavel
Good point, Paul. I love XRC :-) You may try to switch the default configuration to XRC. --mca btl_openib_receive_queues X,128,256,192,128:X,2048,256,128,32:X,12288,256,128,32:X,65536,256,128,32 If XRC is not supported on your platform, OMPI should report a nice message. BTW, on multi core syste
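A sketch of the full command line this implies (the BTL selection, rank count, and application name are placeholders, not from the thread):

    mpirun -np 16 --mca btl openib,self \
        --mca btl_openib_receive_queues \
        X,128,256,192,128:X,2048,256,128,32:X,12288,256,128,32:X,65536,256,128,32 \
        ./my_app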

Re: [OMPI devel] OFED question

2011-01-27 Thread Shamis, Pavel
Brian, I would calculate the maximum number of QPs for an all-to-all connection: 4*num_nodes*num_cores^2. Then compare it to the number reported by: ibv_devinfo -v | grep max_qp If your theoretical maximum is close to the ibv_devinfo number, then I would suspect the QP limitation. The driver manages some inter
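A minimal shell sketch of that check (the node and core counts are example values, not from the thread):

    # theoretical worst case for all-to-all: 4 * num_nodes * num_cores^2
    num_nodes=64; num_cores=16
    echo "theoretical max QPs: $((4 * num_nodes * num_cores * num_cores))"
    # compare against what the HCA actually supports
    ibv_devinfo -v | grep max_qp: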

Re: [OMPI devel] OFED question

2011-01-27 Thread Paul H. Hargrove
Brian, As Pasha said: The maximum number of supported QPs you may see in ibv_devinfo. However, you'll probably need "-v":

    {hargrove@cvrsvc05 ~}$ ibv_devinfo | grep max_qp:
    {hargrove@cvrsvc05 ~}$ ibv_devinfo -v | grep max_qp:
            max_qp: 261056

If you really are run

Re: [OMPI devel] OFED question

2011-01-27 Thread Barrett, Brian W
Pasha - Is there a way to tell which of the two happened, or to check the number of QPs available per node? The app likely does talk to a large number of peers from each process, and the nodes are fairly "fat": quad socket, quad core, running 16 MPI ranks per node. Bri
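Plugging Brian's numbers into Pasha's formula gives a quick sanity check (the node count is an example value, not from the thread):

    # 16 ranks per node: 4 * num_nodes * 16^2 = 1024 * num_nodes
    echo $((4 * 250 * 16 * 16))   # 256000 at 250 nodes -- already close to
                                  # the max_qp of 261056 shown above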

Re: [OMPI devel] OFED question

2011-01-27 Thread Shamis, Pavel
Unfortunately, verbose error reports are not so friendly... anyway, I can think of 2 issues: 1. You are trying to open too many QPs. By default, IB devices support a fairly large number of QPs, and it is quite hard to push them into this corner. But if your job is really huge, it may be the case. Or

Re: [OMPI devel] OFED question

2011-01-27 Thread Paul H. Hargrove
Brian, The ability to control the number of available QPs will vary by vendor. Unless things have changed in recent years, Mellanox's firmware tools allow one to modify the limit, but at the inconvenience of reburning the firmware. I know of no other way and know nothing about other vendo

[OMPI devel] OFED question

2011-01-27 Thread Barrett, Brian W
All - On one of our clusters, we're seeing the following on one of our applications, I believe using Open MPI 1.4.3:

    [xxx:27545] *** An error occurred in MPI_Scatterv
    [xxx:27545] *** on communicator MPI COMMUNICATOR 5 DUP FROM 4
    [xxx:27545] *** MPI_ERR_OTHER: known error not in list
    [xxx:27545]

Re: [OMPI devel] OMPI-MIGRATE error

2011-01-27 Thread Joshua Hursey
I believe that this is now fixed on the trunk. All the details are in the commit message: https://svn.open-mpi.org/trac/ompi/changeset/24317 In my testing yesterday, I did not test the scenario where the node with mpirun also contains processes (the test cluster I was using does not by default

Re: [OMPI devel] OMPI-MIGRATE error

2011-01-27 Thread Hugo Meyer
Hi Josh. Thanks for your reply. I'll tell you what I'm getting now from the executions in the next lines. When I run without doing a checkpoint, I get this output, and the process doesn't finish:

    [hmeyer@clus9 whoami]$ /home/hmeyer/desarrollo/ompi-code/binarios/bin/mpirun -np 2 -am ft-enable-cr-reco
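For context, a minimal sketch of the checkpoint side of this workflow, assuming the standard ompi-checkpoint tool and the stock ft-enable-cr AMCA profile (the application name is a placeholder):

    # launch with checkpoint/restart support enabled
    mpirun -np 2 -am ft-enable-cr ./my_app &
    # then request a checkpoint of the whole job via mpirun's PID
    ompi-checkpoint <mpirun_pid>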