Re: [OMPI devel] Intermittent MPI issues with torque/maui

2014-08-26 Thread Ralph Castain
On Aug 26, 2014, at 6:09 PM, Andrej Prsa wrote:
> Hi Ralph,
>
>> I don't know what version of OMPI you're working with, so I can't
>> precisely pinpoint the line in question. However, it looks likely to
>> be an error caused by not finding the PBS nodefile.
>
> This is openmpi 1.6.5.
>
>> We

Re: [OMPI devel] Intermittent MPI issues with torque/maui

2014-08-26 Thread Andrej Prsa
Hi Ralph,

> I don't know what version of OMPI you're working with, so I can't
> precisely pinpoint the line in question. However, it looks likely to
> be an error caused by not finding the PBS nodefile.

This is openmpi 1.6.5.

> We look in the environment for PBS_NODEFILE to find the directory
>

Re: [OMPI devel] Intermittent MPI issues with torque/maui

2014-08-26 Thread Ralph Castain
I don't know what version of OMPI you're working with, so I can't precisely pinpoint the line in question. However, it looks likely to be an error caused by not finding the PBS nodefile. We look in the environment for PBS_NODEFILE to find the directory where the file should be found, and then l
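Ralph's explanation suggests a quick check that can be run inside the Torque job script, just before mpirun: confirm that PBS_NODEFILE is set and that the file it names exists, is readable, and is non-empty. The following is a minimal Python sketch under that assumption. It is not Open MPI's own lookup code (that lives in OMPI's C sources and may differ between releases), and the function name `read_pbs_nodefile` is hypothetical.

```python
import os
import sys


def read_pbs_nodefile(env=None):
    """Return the list of entries in the file named by PBS_NODEFILE.

    Raises RuntimeError with a diagnostic message if the variable is
    unset, the file cannot be read, or the file is empty.
    """
    env = os.environ if env is None else env
    path = env.get("PBS_NODEFILE")
    if not path:
        raise RuntimeError("PBS_NODEFILE is not set in the job environment")
    try:
        with open(path) as fh:
            # Torque normally writes one line per allocated slot, so a
            # host name may repeat once for each processor assigned on it.
            hosts = [line.strip() for line in fh if line.strip()]
    except OSError as exc:
        raise RuntimeError(f"cannot read PBS_NODEFILE {path!r}: {exc}") from exc
    if not hosts:
        raise RuntimeError(f"PBS_NODEFILE {path!r} is empty")
    return hosts


if __name__ == "__main__":
    try:
        hosts = read_pbs_nodefile()
    except RuntimeError as err:
        print(f"nodefile check failed: {err}", file=sys.stderr)
        sys.exit(1)
    print(f"{len(hosts)} slots on {len(set(hosts))} hosts")
```

Logging this output at the top of every job could show whether the failed runs coincide with a missing or unreadable nodefile. If the check passes but mpirun still fails, that would point away from the nodefile hypothesis.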

[OMPI devel] Intermittent MPI issues with torque/maui

2014-08-26 Thread Andrej Prsa
Hi all, I asked this question on the torque mailing list, and I found several similar issues on the web, but no definitive solutions. When we run our MPI programs via torque/maui, at random times, in ~50-70% of all cases, the job will fail with the following error message: [node1:51074] [[36074,0