Re: [OMPI users] SGE segfaulting with OpenMPI 1.8.6

2015-07-23 Thread Ralph Castain
I believe qrsh will execute a simple env command across the allocated nodes - have you looked into that? The bottom line is that you simply are not getting the right orted on the remote nodes - you are getting the old one, which doesn’t recognize the new command line option that mpirun is givin
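
One way to check which orted a node would pick up is to resolve the name against that node's PATH, for example by running a small script under qrsh. The following is a minimal sketch, not part of the original message: the helper name first_on_path is hypothetical, and it assumes the remote daemon is located through PATH alone.

```python
import os
import shutil
import socket


def first_on_path(program, path_value):
    """Return the first executable called `program` on `path_value`, or None.

    `path_value` is a PATH-style string; directories are searched in order,
    so an old installation listed first shadows a newer one listed later.
    """
    return shutil.which(program, path=path_value)


if __name__ == "__main__":
    # Print the host name and the orted this environment would resolve.
    print(socket.gethostname(),
          first_on_path("orted", os.environ.get("PATH", os.defpath)))
```

If the path printed on a compute node points at the old installation rather than the 1.8.6 one, the environment is not being propagated as intended.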

Re: [OMPI users] File coherence issues with OpenMPI/torque/NFS (?)

2015-07-23 Thread Gilles Gouaillardet
Dave, On 7/24/2015 1:53 AM, Dave Love wrote: ompio in 1.8 only has pvfs2 (== orangefs) and ufs support -- which might be a good reason to use pvfs2. You'll need an expert to say if you can use ufs correctly over an nfs filesystem. (I assume you are actually picking up the romio nfs support.)

Re: [OMPI users] SGE segfaulting with OpenMPI 1.8.6

2015-07-23 Thread Dave Love
writes: > Since I don't have the hand on the cluster administration, and the > "default" installation of openMPI is an old one, Perhaps there's a good reason for that, as there is here. > I compiled and installed myself Open-MPI 1.8.6 and prepended paths > (general and library) to ensure the us

Re: [OMPI users] File coherence issues with OpenMPI/torque/NFS (?)

2015-07-23 Thread Dave Love
"Schlottke-Lakemper, Michael" writes: > Hi Dave, > >> That's probably not a good idea. Have you read about NFS in the romio >> README? It's old, but as far as I know, it's still relevant for NFS3. >> Maybe Rob Latham will see this and be able to comment on NFS4. > No, are you referring to the f

Re: [OMPI users] SGE segfaulting with OpenMPI 1.8.6

2015-07-23 Thread m.delorme
Hi, Thanks for the quick answer. I am actually using the module environment, and made my own module for openmpi-1.8.6 prepending the paths. I was so desperate to get the env right that I doubled everything: my script is running with the -V flag, I am loading the modules, and printing the en

Re: [OMPI users] SGE segfaulting with OpenMPI 1.8.6

2015-07-23 Thread John Hearns
You say that you can run the code OK 'by hand' with an mpirun. Are you assuming somehow that the Gridengine jobs will inherit your environment variables, paths etc? If I remember correctly, you should submit with the -V option to pass over environment settings. Even better, make sure that the jo

[OMPI users] SGE segfaulting with OpenMPI 1.8.6

2015-07-23 Thread m.delorme
Hello, I have been working on this problem for the last week, browsing the help and the mailing list with no success. While trying to run MPI programs using SGE, I end up with segfaults every time. A bit of information on the system: I am working on a 14-node cluster. Every node is an In

Re: [OMPI users] File coherence issues with OpenMPI/torque/NFS (?)

2015-07-23 Thread Schlottke-Lakemper, Michael
Hi Gilles, Thanks, and yes, you are right: ompi_info --all | grep "MCA io" | grep priority MCA io: parameter "io_romio_priority" (current value: "20", data source: default, level: 9 dev/all, type: int) MCA io: parameter "io_romio_delete_priority" (current value
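
The ompi_info output quoted above can be reduced to a per-component priority table. The following is a minimal sketch, assuming the line format shown in this message and assuming that the io component with the highest priority is the one selected by default; the function names are hypothetical.

```python
import re

# Matches e.g.: parameter "io_romio_priority" (current value: "20", ...
# Names such as io_romio_delete_priority are deliberately not matched.
_PRIORITY_RE = re.compile(
    r'parameter "io_([a-z0-9]+)_priority" \(current value: "(-?\d+)"'
)

# Sample line taken from the message above.
SAMPLE = ('MCA io: parameter "io_romio_priority" (current value: "20", '
          'data source: default, level: 9 dev/all, type: int)')


def parse_io_priorities(ompi_info_text):
    """Map each io component name to its integer priority."""
    return {name: int(value)
            for name, value in _PRIORITY_RE.findall(ompi_info_text)}


def preferred_io_component(priorities):
    """Return the component with the highest priority, or None if empty."""
    if not priorities:
        return None
    return max(priorities, key=priorities.get)


if __name__ == "__main__":
    print(parse_io_priorities(SAMPLE))
```

In practice the text would come from the output of ompi_info --all rather than from the SAMPLE string.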

Re: [OMPI users] File coherence issues with OpenMPI/torque/NFS (?)

2015-07-23 Thread Gilles Gouaillardet
Michael, ROMIO is the default in the 1.8 series. You can run ompi_info --all | grep io | grep priority; ROMIO priority should be 20 and ompio priority should be 10. Cheers, Gilles On Thursday, July 23, 2015, Schlottke-Lakemper, Michael < m.schlottke-lakem...@aia.rwth-aachen.de> wrote: > Hi Gille

Re: [OMPI users] File coherence issues with OpenMPI/torque/NFS (?)

2015-07-23 Thread Schlottke-Lakemper, Michael
Hi Dave, > That's probably not a good idea. Have you read about NFS in the romio > README? It's old, but as far as I know, it's still relevant for NFS3. > Maybe Rob Latham will see this and be able to comment on NFS4. No, are you referring to the file openmpi-1.8.7/ompi/mca/io/romio/romio/README

[OMPI users] DMTCP checkpointing with openib

2015-07-23 Thread Dave Love
Does anyone have experience of checkpointing (and restarting!) OMPI 1.8 over openib using DMTCP, or just know whether it can work? A while ago I saw a note saying it wouldn't work because of some OMPI mechanism that couldn't be configured (unreliable connexions?) but now that Sourceforge has

Re: [OMPI users] File coherence issues with OpenMPI/torque/NFS (?)

2015-07-23 Thread Dave Love
"Schlottke-Lakemper, Michael" writes: > Hi folks, > > We are currently encountering a weird file coherence issue when > running parallel jobs with OpenMPI (1.8.7) and writing files in > parallel to an NFS-mounted file system using Parallel netCDF 1.6.1 > (which internally uses MPI-I/O). Sometimes

Re: [OMPI users] File coherence issues with OpenMPI/torque/NFS (?)

2015-07-23 Thread Schlottke-Lakemper, Michael
Hi Gilles, > are you running 1.8.7 or master ? 1.8.7. We recently upgraded our cluster installation from OpenSUSE 11.3/OpenMPI 1.6.5 to OpenSUSE 12.3/OpenMPI 1.8.7. Before the upgrade, we did not encounter this problem. > if not default, which io module are you running ? > (default is ROMIO wit

Re: [OMPI users] File coherence issues with OpenMPI/torque/NFS (?)

2015-07-23 Thread Gilles Gouaillardet
Michael, are you running 1.8.7 or master ? if not default, which io module are you running ? (default is ROMIO with 1.8 but ompio with master) by any chance, could you post a simple program that evidences this issue ? Cheers, Gilles On Thursday, July 23, 2015, Schlottke-Lakemper, Michael < m.s

[OMPI users] File coherence issues with OpenMPI/torque/NFS (?)

2015-07-23 Thread Schlottke-Lakemper, Michael
Hi folks, We are currently encountering a weird file coherence issue when running parallel jobs with OpenMPI (1.8.7) and writing files in parallel to an NFS-mounted file system using Parallel netCDF 1.6.1 (which internally uses MPI-I/O). Sometimes (~30-40% of our samples) we get a file whose co

[OMPI users] Invalid read of size 4 (Valgrind error) with OpenMPI 1.8.7

2015-07-23 Thread Schlottke-Lakemper, Michael
Hi folks, recently we’ve been getting a Valgrind error in PMPI_Init for our suite of regression tests: ==5922== Invalid read of size 4 ==5922== at 0x61CC5C0: opal_os_dirpath_create (in /aia/opt/openmpi-1.8.7/lib64/libopen-pal.so.6.2.2) ==5922== by 0x5F207E5: orte_session_dir (in /aia/opt