Michael,
Could you try running this again with the "--mca mpi_leave_pinned 0" parameter?
I suspect that this might be due to a message-size problem - MPI
tries to do RDMA with a message bigger than what the HCA supports.
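For example (the process count and test binary name below are placeholders, not from your setup):

  mpirun --mca mpi_leave_pinned 0 -np 64 ./alltoall_test $((2**27))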
-- YK
On 11-Apr-11 7:44 PM, Michael Di Domenico wrote:
Here's a chunk of code that reproduces the error every time on my cluster.
If you call it with $((2**24)) as a parameter it should run fine; change it
to $((2**27)) and it will stall.
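(The attachment isn't reproduced here; below is a minimal sketch of this kind of Alltoall test, assuming the command-line argument is the total number of ints each rank contributes. Names and defaults are illustrative, not the original code.)

/* alltoall_test.c - hypothetical minimal reproducer sketch, not the
 * original attachment: each rank sends "count" ints to every peer
 * via MPI_Alltoall.
 * Build: mpicc alltoall_test.c -o alltoall_test
 * Run:   mpirun -np <N> ./alltoall_test $((2**27))                */
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
    int rank, nprocs;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

    /* Assumption: argv[1] is the total per-rank array length
     * (e.g. 2^24 or 2^27), split evenly across all peers.        */
    long total = (argc > 1) ? atol(argv[1]) : (1L << 24);
    int  count = (int)(total / nprocs);   /* elements sent to each peer */

    int *sendbuf = malloc((size_t)count * nprocs * sizeof(int));
    int *recvbuf = malloc((size_t)count * nprocs * sizeof(int));
    if (!sendbuf || !recvbuf) {
        fprintf(stderr, "rank %d: allocation failed\n", rank);
        MPI_Abort(MPI_COMM_WORLD, 1);
    }
    for (long i = 0; i < (long)count * nprocs; i++)
        sendbuf[i] = rank;

    MPI_Alltoall(sendbuf, count, MPI_INT,
                 recvbuf, count, MPI_INT, MPI_COMM_WORLD);

    if (rank == 0)
        printf("Alltoall of %ld ints per rank completed\n", total);

    free(sendbuf);
    free(recvbuf);
    MPI_Finalize();
    return 0;
}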
On Tue, Apr 5, 2011 at 11:24 AM, Terry Dontje wrote:
It was asked during the community concall whether the below may be
related to ticket #2722 https://svn.open-mpi.org/trac/ompi/ticket/2722?
--td
On 04/04/2011 10:17 PM, David Zhang wrote:
There are no messages being spit out, but I'm not sure I have all the
correct debug options turned on. I turned on -debug-devel, -debug-daemons, and
mca_verbose, but it appears that the process just hangs.
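Roughly, the launch line looked like the following (process count and binary name are placeholders, and btl_base_verbose is just one example of an MCA verbosity knob):

  mpirun -debug-devel -debug-daemons --mca btl_base_verbose 30 -np 64 ./alltoall_test $((2**27))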
If it's memory exhaustion, it's not from core memory - these nodes
have 48GB of memory; it could be a
Any error messages? Maybe the nodes ran out of memory? I know MPI
implements some kind of buffering under the hood, so even though you're
sending arrays over 2^26 in size, it may require more than that for MPI to
actually send them.
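As a rough example (assuming 4-byte ints and that 2^27 is the per-rank element count): the user-level send and receive buffers alone come to 2 x 2^27 x 4 bytes = 1 GB per rank, before counting whatever staging or bounce buffers the interconnect layer allocates underneath.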
On Mon, Apr 4, 2011 at 2:16 PM, Michael Di Domenico
wrote:
Has anyone seen an issue where OpenMPI/Infiniband hangs when sending
messages over 2^26 in size?
For a reason I have not determined just yet, machines on my cluster
(OpenMPI v1.5 and QLogic stack/QDR IB adapters) are failing to send
arrays over 2^26 in size via the AllToAll collective (user code).