[OMPI users] MPI_Alltoall problem: error creating qp

2009-08-21 Thread Shao-Ching Huang
Hi, we are getting the following kind of error messages when trying to run MPI_Alltoall on 170 nodes with slots=8 on each node (i.e. 170*8=1360 MPI processes in total): $ mpiexec -n 1360 -hostfile ./mach.8 ./a.out ...

Re: [OMPI users] RETRY EXCEEDED ERROR status number 12

2009-08-21 Thread Pavel Shamis (Pasha)
You may try the ibdiagnet tool: http://linux.die.net/man/1/ibdiagnet The tool is part of OFED (http://www.openfabrics.org/). Pasha. Prentice Bisbal wrote: Several jobs on my cluster just died with the error below. Are there any IB/Open MPI diagnostics I should use to diagnose, should I

[OMPI users] RETRY EXCEEDED ERROR status number 12

2009-08-21 Thread Prentice Bisbal
Several jobs on my cluster just died with the error below. Are there any IB/Open MPI diagnostics I should use to diagnose the problem, should I just reboot the nodes, or should I have the user who submitted these jobs just increase the retry count/timeout parameters?

Re: [OMPI users] Blocking communication in a thread better than asynchronous progress?

2009-08-21 Thread tomek
Is doing blocking communication in a separate thread better than asynchronous progress? (At least as a workaround until the proper implementation gets improved) At the moment, yes. OMPI's asynchronous progress is "loosely tested" (at best). OMPI's threading support is somewhat

Re: [OMPI users] MPI loop problem

2009-08-21 Thread Julia He
Thank you very much for your help. Julia --- On Wed, 8/19/09, Eugene Loh wrote:

[OMPI users] Solution for an old compilation bug

2009-08-21 Thread Robert Schöne
Hello, (I don't know whether this should have been sent to the dev-list, but the last time this error occurred, it was posted to the users-list, so I'm doing the same.) Over the last few days I had problems compiling OpenMPI on a Debian and a SuSE Linux. The bug had already been reported in 2007.