Re: [OMPI users] alltoall messages > 2^26

2011-05-29 Thread Yevgeny Kliteynik
Michael, could you try to run this again with the "--mca mpi_leave_pinned 0" parameter? I suspect that this might be due to a message size problem - MPI tries to do RDMA with a message bigger than what the HCA supports. -- YK On 11-Apr-11 7:44 PM, Michael Di Domenico wrote: > Here's a chunk of code that
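The suggested run would look something like "mpirun --mca mpi_leave_pinned 0 ..." with the usual host/np arguments. As a side note on the RDMA suspicion (not part of the thread, just a hedged sketch): the maximum message size the HCA's port advertises can be read with libibverbs, assuming the first device and port 1:

/* Hedged illustration -- not from the thread.  Print the advertised per-port
 * maximum message size via libibverbs; device index 0 and port 1 are assumptions. */
#include <infiniband/verbs.h>
#include <stdio.h>

int main(void)
{
    int num;
    struct ibv_device **devs = ibv_get_device_list(&num);
    if (!devs || num == 0) {
        fprintf(stderr, "no IB devices found\n");
        return 1;
    }

    struct ibv_context *ctx = ibv_open_device(devs[0]);
    if (!ctx) {
        fprintf(stderr, "cannot open %s\n", ibv_get_device_name(devs[0]));
        return 1;
    }

    struct ibv_port_attr attr;
    if (ibv_query_port(ctx, 1, &attr) == 0)
        printf("%s port 1 max_msg_sz = %u bytes\n",
               ibv_get_device_name(devs[0]), attr.max_msg_sz);

    ibv_close_device(ctx);
    ibv_free_device_list(devs);
    return 0;
}

(Compile with -libverbs; if max_msg_sz comes out smaller than the Alltoall message being attempted, that would be consistent with the suspicion above.)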

Re: [OMPI users] alltoall messages > 2^26

2011-04-11 Thread Michael Di Domenico
Here's a chunk of code that reproduces the error every time on my cluster. If you call it with $((2**24)) as a parameter it should run fine; change it to $((2**27)) and it will stall. On Tue, Apr 5, 2011 at 11:24 AM, Terry Dontje wrote: > It was asked during the community concall whether the below
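The actual code chunk is cut off in this archive preview. Purely as an illustration of the kind of test described (the datatype, the meaning of the command-line argument, and all names here are assumptions, not the original code), a minimal MPI_Alltoall reproducer might look like:

/* Hypothetical reconstruction -- the original reproducer is not shown in this
 * archive.  Takes the total per-rank buffer size in bytes from the command
 * line, e.g. ./a2a $((2**24)) or ./a2a $((2**27)). */
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
    int rank, nprocs;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

    long n = (argc > 1) ? atol(argv[1]) : (1L << 24);  /* bytes per rank */
    int per_peer = (int)(n / nprocs);                  /* bytes sent to each peer */

    char *sendbuf = malloc((size_t)per_peer * nprocs);
    char *recvbuf = malloc((size_t)per_peer * nprocs);
    if (!sendbuf || !recvbuf) {
        fprintf(stderr, "rank %d: allocation failed\n", rank);
        MPI_Abort(MPI_COMM_WORLD, 1);
    }

    MPI_Alltoall(sendbuf, per_peer, MPI_CHAR,
                 recvbuf, per_peer, MPI_CHAR, MPI_COMM_WORLD);

    if (rank == 0)
        printf("Alltoall of %ld bytes per rank completed\n", n);

    free(sendbuf);
    free(recvbuf);
    MPI_Finalize();
    return 0;
}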

Re: [OMPI users] alltoall messages > 2^26

2011-04-05 Thread Terry Dontje
It was asked during the community concall whether the below may be related to ticket #2722 (https://svn.open-mpi.org/trac/ompi/ticket/2722)? --td On 04/04/2011 10:17 PM, David Zhang wrote: Any error messages? Maybe the nodes ran out of memory? I know MPI implements some kind of buffering under

Re: [OMPI users] alltoall messages > 2^26

2011-04-05 Thread Michael Di Domenico
There are no messages being spit out, but I'm not sure I have all the correct debug options turned on. I turned on -debug-devel, -debug-daemons, and mca_verbose, but it appears that the process just hangs. If it's memory exhaustion, it's not from core memory - these nodes have 48GB of memory; it could be a

Re: [OMPI users] alltoall messages > 2^26

2011-04-04 Thread David Zhang
Any error messages? Maybe the nodes ran out of memory? I know MPI implements some kind of buffering under the hood, so even though you're sending arrays over 2^26 in size, it may require more than that for MPI to actually send it. On Mon, Apr 4, 2011 at 2:16 PM, Michael Di Domenico wrote: > Has
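To put rough numbers on that (an assumption, since the thread never says whether 2^26/2^27 counts bytes or elements): if they are element counts of 8-byte doubles, the user-visible send and receive buffers alone come to 2 * 2^27 * 8 bytes = 2 GiB per rank, and any staging or pinned buffers the library allocates per peer come on top of that.

/* Back-of-the-envelope check, under the stated assumption. */
#include <stdio.h>

int main(void)
{
    long n = 1L << 27;                                   /* elements per rank */
    double gib = 2.0 * n * sizeof(double) / (1024.0 * 1024.0 * 1024.0);
    printf("user buffers per rank: %.2f GiB\n", gib);    /* prints 2.00 */
    return 0;
}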

[OMPI users] alltoall messages > 2^26

2011-04-04 Thread Michael Di Domenico
Has anyone seen an issue where OpenMPI/InfiniBand hangs when sending messages over 2^26 in size? For a reason I have not determined just yet, machines on my cluster (OpenMPI v1.5 and QLogic stack/QDR IB adapters) are failing to send arrays over 2^26 in size via the Alltoall collective. (user code)