Hi Ralph,

Thanks for taking the time to look into my problem. As you can see, it happens when I don't have both executables available on both nodes. When I do (Test 4), it works. I don't know whether my particular libdir causes the problem, but I'll try on Monday with a more classical setup.
I'll keep you informed.

Geoffroy

> Hi Geoffroy
>
> Hmmm....well, I redid my tests to mirror yours, and still cannot
> replicate this problem. I tried it with both slurm and ssh
> environments - no difference in the results.
>
> % make hello
>
> % cp hello hello2
>
> % ls
> hello hello2
>
> % mpirun -n 1 -host odin038 ./hello : -n 1 -host odin039 ./hello2
> Hello World, I am 0 of 2
> Hello World, I am 1 of 2
>
> I have tried a variety of combinations, including giving a fake
> executable as one of the apps, and have not been able to replicate
> your observed behavior. In all cases, it works correctly.
>
> It looks like you are using rsh/ssh as your launch environment. All I
> can advise at this stage is to again check to ensure that
> the .login/.cshrc (or whatever) on your remote nodes isn't setting
> your path to point at another OMPI installation. The fact that you can
> run at all would seem to indicate that things are okay, but I honestly
> have no ideas at this stage as to why you are seeing this behavior.
>
> Sorry I can't be of more help...
> Ralph
>
> On Jan 23, 2009, at 12:57 AM, Geoffroy Pignot wrote:
>
> > Hello
> >
> > I redid a few tests with my hello world; here are my results.
> >
> > First of all, my config:
> > configure --prefix=/tmp/openmpi-1.3 --libdir=/tmp/openmpi-1.3/lib64 --enable-heterogeneous
> > You will find attached my ompi_info -param all all.
> > compil02 and compil03 are identical Rh43 64-bit nodes.
> >
> > Test 1 :
> > compil02% ls /tmp
> > a.out openmpi-1.3
> >
> > compil03% ls /tmp
> > a.out openmpi-1.3
> >
> > /tmp/openmpi-1.3/bin/mpirun -d -n 1 -host compil03 /tmp/a.out : -n 1 -host compil02 /tmp/a.out
> > WORKS
> >
> > Test 2 :
> > compil02% mv a.out a.out_64 ; ls /tmp
> > a.out_64 openmpi-1.3
> >
> > compil03% ls /tmp
> > a.out openmpi-1.3
> >
> > compil03% /tmp/openmpi-1.3/bin/mpirun -d -n 1 -host compil03 /tmp/a.out : -n 1 -host compil02 /tmp/a.out_64
> > [compil03:03774] procdir: /tmp/openmpi-sessions-gpignot@compil03_0/20717/0/0
> > [compil03:03774] jobdir: /tmp/openmpi-sessions-gpignot@compil03_0/20717/0
> > [compil03:03774] top: openmpi-sessions-gpignot@compil03_0
> > [compil03:03774] tmp: /tmp
> > [compil03:03774] mpirun: reset PATH: /tmp/openmpi-1.3/bin:/u/gpignot/jobmgr/bin:.:/cgg/lv5000/jobmgr/bin:/cgg/lv5000/jobmgr/exec/Linux2.6-x86_64/PIV:/cgg/jobmgr/bin:/cgg/jobmgr/exec/Linux2.6-x86_64/PIV:/cgg/lv5000/bin:/cgg/lv5000/exec/Linux2.6-x86_64/PIV:/cgg/util:/bin:/usr/bin:/usr/sbin:/etc:/usr/etc:/usr/local/bin:/usr/bin/X11:/nfs/softs/TOOLS/bin:/nfs/netapp1/DEVTOOLS/bin:/nfs/netapp1/DEVTOOLS/free/Linux2.6-x86_64/bin:/cgg/localdev:/cgg/Applis/bin
> > [compil03:03774] mpirun: reset LD_LIBRARY_PATH: /tmp/openmpi-1.3/lib64:/tmp/openmpi-1.3/lib64
> > [compil02:10684] procdir: /tmp/openmpi-sessions-gpignot@compil02_0/20717/0/1
> > [compil02:10684] jobdir: /tmp/openmpi-sessions-gpignot@compil02_0/20717/0
> > [compil02:10684] top: openmpi-sessions-gpignot@compil02_0
> > [compil02:10684] tmp: /tmp
> > [compil03:03774] [[20717,0],0] node[0].name compil03 daemon 0 arch ffc91200
> > [compil03:03774] [[20717,0],0] node[1].name compil02 daemon 1 arch ffc91200
> > [compil02:10684] [[20717,0],1] node[0].name compil03 daemon 0 arch ffc91200
> > [compil02:10684] [[20717,0],1] node[1].name compil02 daemon 1 arch ffc91200
> > [compil03:03774] Info: Setting up debugger process table for applications
> > MPIR_being_debugged = 0
> > MPIR_debug_state = 1
> > MPIR_partial_attach_ok = 1
> > MPIR_i_am_starter = 0
> > MPIR_proctable_size = 2
> > MPIR_proctable:
> > (i, host, exe, pid) = (0, compil03, /tmp/a.out, 0)
> > (i, host, exe, pid) = (1, compil02, /tmp/a.out_64, 0)
> >
> > HANGS : both exe have pid 0
> >
> > Test 3 :
> >
> > compil02% cp a.out_64 a.out ; ls /tmp
> > a.out_64 a.out openmpi-1.3
> >
> > compil03% ls /tmp
> > a.out openmpi-1.3
> >
> > [compil03:03777] procdir: /tmp/openmpi-sessions-gpignot@compil03_0/20626/0/0
> > [compil03:03777] jobdir: /tmp/openmpi-sessions-gpignot@compil03_0/20626/0
> > [compil03:03777] top: openmpi-sessions-gpignot@compil03_0
> > [compil03:03777] tmp: /tmp
> > [compil03:03777] mpirun: reset PATH: /tmp/openmpi-1.3/bin:/u/gpignot/jobmgr/bin:.:/cgg/lv5000/jobmgr/bin:/cgg/lv5000/jobmgr/exec/Linux2.6-x86_64/PIV:/cgg/jobmgr/bin:/cgg/jobmgr/exec/Linux2.6-x86_64/PIV:/cgg/lv5000/bin:/cgg/lv5000/exec/Linux2.6-x86_64/PIV:/cgg/util:/bin:/usr/bin:/usr/sbin:/etc:/usr/etc:/usr/local/bin:/usr/bin/X11:/nfs/softs/TOOLS/bin:/nfs/netapp1/DEVTOOLS/bin:/nfs/netapp1/DEVTOOLS/free/Linux2.6-x86_64/bin:/cgg/localdev:/cgg/Applis/bin
> > [compil03:03777] mpirun: reset LD_LIBRARY_PATH: /tmp/openmpi-1.3/lib64:/tmp/openmpi-1.3/lib64
> > [compil02:10786] procdir: /tmp/openmpi-sessions-gpignot@compil02_0/20626/0/1
> > [compil02:10786] jobdir: /tmp/openmpi-sessions-gpignot@compil02_0/20626/0
> > [compil02:10786] top: openmpi-sessions-gpignot@compil02_0
> > [compil02:10786] tmp: /tmp
> > [compil03:03777] [[20626,0],0] node[0].name compil03 daemon 0 arch ffc91200
> > [compil03:03777] [[20626,0],0] node[1].name compil02 daemon 1 arch ffc91200
> > [compil02:10786] [[20626,0],1] node[0].name compil03 daemon 0 arch ffc91200
> > [compil02:10786] [[20626,0],1] node[1].name compil02 daemon 1 arch ffc91200
> > [compil03:03777] Info: Setting up debugger process table for applications
> > MPIR_being_debugged = 0
> > MPIR_debug_state = 1
> > MPIR_partial_attach_ok = 1
> > MPIR_i_am_starter = 0
> > MPIR_proctable_size = 2
> > MPIR_proctable:
> > (i, host, exe, pid) = (0, compil03, /tmp/a.out, 0)
> > (i, host, exe, pid) = (1, compil02, /tmp/a.out_64, 10787)
> > [compil02:10787] procdir: /tmp/openmpi-sessions-gpignot@compil02_0/20626/1/1
> > [compil02:10787] jobdir: /tmp/openmpi-sessions-gpignot@compil02_0/20626/1
> > [compil02:10787] top: openmpi-sessions-gpignot@compil02_0
> > [compil02:10787] tmp: /tmp
> > [compil02:10787] [[20626,1],1] node[0].name compil03 daemon 0 arch ffc91200
> > [compil02:10787] [[20626,1],1] node[1].name compil02 daemon 1 arch ffc91200
> >
> > HANGS : goes a little bit further, but one pid is still 0
> >
> > Test 4 :
> >
> > compil02% ls /tmp
> > a.out_64 a.out openmpi-1.3
> >
> > compil03% cp a.out a.out_64 ; ls /tmp
> > a.out_64 a.out openmpi-1.3
> >
> > compil03% /tmp/openmpi-1.3/bin/mpirun -d -n 1 -host compil03 /tmp/a.out : -n 1 -host compil02 /tmp/a.out_64
> > [compil03:03789] procdir: /tmp/openmpi-sessions-gpignot@compil03_0/20638/0/0
> > [compil03:03789] jobdir: /tmp/openmpi-sessions-gpignot@compil03_0/20638/0
> > [compil03:03789] top: openmpi-sessions-gpignot@compil03_0
> > [compil03:03789] tmp: /tmp
> > [compil03:03789] mpirun: reset PATH: /tmp/openmpi-1.3/bin:/u/gpignot/jobmgr/bin:.:/cgg/lv5000/jobmgr/bin:/cgg/lv5000/jobmgr/exec/Linux2.6-x86_64/PIV:/cgg/jobmgr/bin:/cgg/jobmgr/exec/Linux2.6-x86_64/PIV:/cgg/lv5000/bin:/cgg/lv5000/exec/Linux2.6-x86_64/PIV:/cgg/util:/bin:/usr/bin:/usr/sbin:/etc:/usr/etc:/usr/local/bin:/usr/bin/X11:/nfs/softs/TOOLS/bin:/nfs/netapp1/DEVTOOLS/bin:/nfs/netapp1/DEVTOOLS/free/Linux2.6-x86_64/bin:/cgg/localdev:/cgg/Applis/bin
> > [compil03:03789] mpirun: reset LD_LIBRARY_PATH: /tmp/openmpi-1.3/lib64:/tmp/openmpi-1.3/lib64
> > [compil02:10937] procdir: /tmp/openmpi-sessions-gpignot@compil02_0/20638/0/1
> > [compil02:10937] jobdir: /tmp/openmpi-sessions-gpignot@compil02_0/20638/0
> > [compil02:10937] top: openmpi-sessions-gpignot@compil02_0
> > [compil02:10937] tmp: /tmp
> > [compil03:03789] [[20638,0],0] node[0].name compil03 daemon 0 arch ffc91200
> > [compil03:03789] [[20638,0],0] node[1].name compil02 daemon 1 arch ffc91200
> > [compil02:10937] [[20638,0],1] node[0].name compil03 daemon 0 arch ffc91200
> > [compil02:10937] [[20638,0],1] node[1].name compil02 daemon 1 arch ffc91200
> > [compil03:03789] Info: Setting up debugger process table for applications
> > MPIR_being_debugged = 0
> > MPIR_debug_state = 1
> > MPIR_partial_attach_ok = 1
> > MPIR_i_am_starter = 0
> > MPIR_proctable_size = 2
> > MPIR_proctable:
> > (i, host, exe, pid) = (0, compil03, /tmp/a.out, 3792)
> > (i, host, exe, pid) = (1, compil02, /tmp/a.out_64, 10938)
> > [compil03:03792] procdir: /tmp/openmpi-sessions-gpignot@compil03_0/20638/1/0
> > [compil03:03792] jobdir: /tmp/openmpi-sessions-gpignot@compil03_0/20638/1
> > [compil03:03792] top: openmpi-sessions-gpignot@compil03_0
> > [compil03:03792] tmp: /tmp
> > [compil03:03792] [[20638,1],0] node[0].name compil03 daemon 0 arch ffc91200
> > [compil03:03792] [[20638,1],0] node[1].name compil02 daemon 1 arch ffc91200
> > [compil02:10938] procdir: /tmp/openmpi-sessions-gpignot@compil02_0/20638/1/1
> > [compil02:10938] jobdir: /tmp/openmpi-sessions-gpignot@compil02_0/20638/1
> > [compil02:10938] top: openmpi-sessions-gpignot@compil02_0
> > [compil02:10938] tmp: /tmp
> > [compil02:10938] [[20638,1],1] node[0].name compil03 daemon 0 arch ffc91200
> > [compil02:10938] [[20638,1],1] node[1].name compil02 daemon 1 arch ffc91200
> > Hello world from process 0 of 2
> > Hello world from process 1 of 2
> > [compil03:03792] sess_dir_finalize: proc session dir not empty - leaving
> > [compil02:10938] sess_dir_finalize: proc session dir not empty - leaving
> > [compil03:03789] sess_dir_finalize: proc session dir not empty - leaving
> > [compil02:10937] sess_dir_finalize: proc session dir not empty - leaving
> > [compil03:03789] sess_dir_finalize: job session dir not empty - leaving
> > [compil02:10937] sess_dir_finalize: job session dir not empty - leaving
> > [compil03:03789] sess_dir_finalize: proc session dir not empty - leaving
> > orterun: exiting with status 0
> >
> > WORKS PERFECTLY
> >
> >
> > I don't understand exactly what is going on, but I am not sure that
> > this behaviour is considered normal.
> >
> > Thanks in advance for your comments
> >
> > Geoffroy
> >
> > <geoffroy_ompi_info>_______________________________________________
> > users mailing list
> > us...@open-mpi.org
> > http://www.open-mpi.org/mailman/listinfo.cgi/users
>
> End of users Digest, Vol 1127, Issue 8
> **************************************
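
For anyone comparing runs like the four tests above, here is a minimal Python sketch that pulls the MPIR_proctable entries out of saved "mpirun -d" output and lists the ones whose pid is still 0. It is not part of Open MPI. The function names parse_proctable and unlaunched are made up for illustration, and reading "pid 0" as "this process was never started" is only an assumption based on the hangs reported in Test 2 and Test 3.

```python
import re

# Matches lines such as "(i, host, exe, pid) = (1, compil02, /tmp/a.out_64, 0)".
# Any leading "> > " quote markers are simply skipped by the search.
PROCTABLE_RE = re.compile(
    r"\(i, host, exe, pid\) = \((\d+),\s*([^,\s]+),\s*([^,\s]+),\s*(\d+)\)"
)


def parse_proctable(debug_output):
    """Return one dict per MPIR_proctable entry found in the debug output."""
    entries = []
    for match in PROCTABLE_RE.finditer(debug_output):
        rank, host, exe, pid = match.groups()
        entries.append(
            {"rank": int(rank), "host": host, "exe": exe, "pid": int(pid)}
        )
    return entries


def unlaunched(entries):
    """Entries whose pid is 0 (assumption: the process was never started)."""
    return [entry for entry in entries if entry["pid"] == 0]


if __name__ == "__main__":
    # The two proctable lines reported for Test 3 in this thread.
    test3 = """
    > > (i, host, exe, pid) = (0, compil03, /tmp/a.out, 0)
    > > (i, host, exe, pid) = (1, compil02, /tmp/a.out_64, 10787)
    """
    for entry in unlaunched(parse_proctable(test3)):
        print(f"rank {entry['rank']} on {entry['host']} ({entry['exe']}) has pid 0")
```

Run on the Test 3 lines, it reports rank 0 on compil03; on the Test 4 lines it reports nothing, which matches the run that completed.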