RFE: Could OMPI implement a short-hand for Pasha's following magical
incantation?
On 1/27/2011 5:34 PM, Shamis, Pavel wrote:
--mca btl_openib_receive_queues
X,128,256,192,128:X,2048,256,128,32:X,12288,256,128,32:X,65536,256,128,32
--
Paul H. Hargrove phhargr...@lb
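Until such a shorthand exists, one workaround is to park the value in a per-user MCA parameter file so it does not have to be retyped on every command line. A minimal sketch, assuming the stock per-user location that Open MPI reads at startup:

# $HOME/.openmpi/mca-params.conf
btl_openib_receive_queues = X,128,256,192,128:X,2048,256,128,32:X,12288,256,128,32:X,65536,256,128,32

Every mpirun launched by that user then picks up the XRC queues without the long --mca argument.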
Good point Paul.
I love XRC :-)
You may try to switch the default configuration to XRC.
--mca btl_openib_receive_queues
X,128,256,192,128:X,2048,256,128,32:X,12288,256,128,32:X,65536,256,128,32
If XRC is not supported on your platform, OMPI should report a nice error message.
BTW, on multi core syste
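For anyone wondering what a given build already uses as its default, ompi_info can print the parameter; a quick check (the grep is only there to trim the output):

ompi_info --param btl openib | grep receive_queues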
Brian,
I would calculate the maximum number of QPs for an all-to-all connection:
4*num_nodes*num_cores^2
And then compare it to the number reported by: ibv_devinfo -v | grep max_qp
If your theoretical maximum is close to the ibv_devinfo number, then I would suspect
the QP limitation. The driver manages some inter
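To put rough numbers on that formula (these counts are purely hypothetical, not Brian's actual job): with 16 cores per node, a 200-node all-to-all job would need about 4 * 200 * 16^2 = 204,800 QPs, already within striking distance of the max_qp of 261,056 shown below, while a 300-node job at the same width (307,200) would exceed it.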
Brian,
As Pasha said:
The maximum number of supported QPs can be seen with ibv_devinfo.
However you'll probably need "-v":
{hargrove@cvrsvc05 ~}$ ibv_devinfo | grep max_qp:
{hargrove@cvrsvc05 ~}$ ibv_devinfo -v | grep max_qp:
max_qp: 261056
If you really are run
Pasha -
Is there a way to tell which of the two happened or to check the number of QPs
available per node? The app likely does talk to a large number of peers from
each process, and the nodes are fairly "fat" - quad-socket, quad-core - and
they are running 16 MPI ranks per node.
Bri
Unfortunately verbose error reports are not so friendly... anyway, I can think
of 2 issues:
1. You are trying to open too many QPs. By default IB devices support a fairly
large number of QPs and it is quite hard to push them into this corner. But if your
job is really huge it may be the case. Or
Brian,
The ability to control the number of available QPs will vary by
vendor. Unless things have changed in recent years, Mellanox's firmware
tools allow one to modify the limit, but at the inconvenience of
reburning the firmware. I know of no other way and know nothing about
other vendo
All -
On one of our clusters, we're seeing the following on one of our applications,
I believe using Open MPI 1.4.3:
[xxx:27545] *** An error occurred in MPI_Scatterv
[xxx:27545] *** on communicator MPI COMMUNICATOR 5 DUP FROM 4
[xxx:27545] *** MPI_ERR_OTHER: known error not in list
[xxx:27545]
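One way to squeeze a bit more context out of the openib BTL before it dies with that terse MPI-level error is to crank up the BTL verbosity; whether the extra logging names the failing verbs call is an assumption on my part, but the parameter itself is standard:

mpirun --mca btl_base_verbose 100 <usual mpirun arguments>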
I believe that this is now fixed on the trunk. All the details are in the
commit message:
https://svn.open-mpi.org/trac/ompi/changeset/24317
In my testing yesterday, I did not test the scenario where the node with mpirun
also contains processes (the test cluster I was using does not by default
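For anyone wanting to exercise that scenario by hand, a minimal sketch (the hostnames and test program are placeholders): from node0 of a two-node allocation, run

mpirun -np 4 --host node0,node0,node1,node1 ./ring

so that two of the ranks land on the same node as mpirun itself.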
Hi Josh.
Thanks for your reply. I'll tell you what I'm getting now from the
executions in the lines below.
When I run without doing a checkpoint I get this output, and the process
doesn't finish:
*[hmeyer@clus9 whoami]$
/home/hmeyer/desarrollo/ompi-code/binarios/bin/mpirun -np 2 -am
ft-enable-cr-reco
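For reference, the usual way to actually trigger a checkpoint against a running ft-enable-cr job is from a second terminal, pointing ompi-checkpoint at mpirun's PID and later restarting from the snapshot reference it prints (the PID and snapshot handle below are placeholders):

ompi-checkpoint <PID of mpirun>
ompi-restart <snapshot reference printed by ompi-checkpoint>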