Siegmar -- I'm a bit confused by your final table:
> local machine                | -host
>                              | sunpc1 | linpc1 | rs1
> -----------------------------+--------+--------+-------
> sunpc1 (Solaris 10, x86_64)  | ok     | hangs  | hangs
> linpc1 (Solaris 10, x86_64)  | hangs  | ok     | ok
> rs1 (Solaris 10, sparc)      | hangs  | ok     | ok

Is linpc1 a Linux machine or a Solaris machine?

Ralph and I talked about this on the phone, and it seems like sunpc1 is just wrong somehow -- it just doesn't jibe with the error message you sent.

Can you verify that all 3 versions were built exactly the same way (e.g., debug or not debug)?
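One quick way to verify -- just a sketch, assuming each install's ompi_info reports its build settings as usual -- is to ask every build that you mix in a single run, on each machine, whether it was compiled with debug support (the prefixes here are the ones from your listing below):

  /usr/local/openmpi-1.6.5_64_gcc/bin/ompi_info | grep -i debug
  /usr/local/openmpi-1.6.5_64_cc/bin/ompi_info  | grep -i debug

All installs used in one mpiexec invocation should report the same debug setting ("yes", given your --enable-debug configure lines); if one of them reports "no", that would match the optimized-vs-debug mix that Ralph describes below.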
On May 29, 2013, at 10:31 AM, Siegmar Gross <siegmar.gr...@informatik.hs-fulda.de> wrote:

> Hello Ralph,
>
>> Could you please clarify - are you mixing 32 and 64 bit versions
>> in your runs that have a problem?
>
> No, I have four different versions on each machine.
>
> tyr fd1026 1250 ls -ld /usr/local/openmpi-1.6.5_*
> drwxr-xr-x 7 root root 512 May 23 14:00 /usr/local/openmpi-1.6.5_32_cc
> drwxr-xr-x 7 root root 512 May 23 13:55 /usr/local/openmpi-1.6.5_32_gcc
> drwxr-xr-x 7 root root 512 May 23 10:12 /usr/local/openmpi-1.6.5_64_cc
> drwxr-xr-x 7 root root 512 May 23 12:21 /usr/local/openmpi-1.6.5_64_gcc
>
> "/usr/local" is a link to machine-specific files on an NFS server.
>
> lrwxrwxrwx 1 root root 25 Jan 10 07:47 local -> /export2/prog/SunOS_sparc
> lrwxrwxrwx 1 root root 26 Oct  5  2012 local -> /export2/prog/SunOS_x86_64
> ...
>
> I can choose a package in my file "$HOME/.cshrc".
>
> tyr fd1026 1251 more .cshrc
> ...
> #set MPI = openmpi-1.6.5_32_cc
> #set MPI = openmpi-1.6.5_32_gcc
> #set MPI = openmpi-1.6.5_64_cc
> #set MPI = openmpi-1.6.5_64_gcc
> ...
> source /opt/global/cshrc
> ...
>
> "/opt/global/cshrc" determines the processor architecture and operating
> system and calls package-specific initialization files.
>
> tyr fd1026 1258 more /opt/global/mpi.csh
> ...
> case openmpi-1.6.5_32_cc:
> case openmpi-1.6.5_32_gcc:
> case openmpi-1.6.5_64_cc:
> case openmpi-1.6.5_64_gcc:
> ...
> if (($MPI == openmpi-1.7_32_cc) || ($MPI == openmpi-1.9_32_cc) || \
>     ($MPI == ompi-java_32_cc) || ($MPI == ompi-java_32_gcc) || \
>     ($MPI == openmpi-1.7_32_gcc) || ($MPI == openmpi-1.9_32_gcc)) then
>   if ($JDK != jdk1.7.0_07-32) then
>     echo " "
>     echo "In '${MPI}', 'mpijavac' only works with"
>     echo "'jdk1.7.0_07-32'. Please select the corresponding"
>     echo "package in the file '${HOME}/.cshrc', then log out"
>     echo "and log back in again, if you want to use"
>     echo "'mpiJava'."
>     echo " "
>   endif
> endif
> ...
> setenv OPENMPI_HOME ${DIRPREFIX_PROG}/$MPI
> ...
> set path = ( $path ${OPENMPI_HOME}/bin )
> ...
>
> mpi.csh sets all necessary environment variables for the selected
> package. I must log out and log in again if I select a different
> package in "$HOME/.cshrc", so I never mix environments for different
> packages, because my home directory and "/opt/global" are the same
> on all machines (they are provided via an NFS server).
>
>> If that isn't the case, then the error message is telling you that
>> the system thinks you are mixing optimized and debug versions -
>> i.e., one node is using an optimized version of OMPI and another
>> is using a debug version. This also isn't allowed.
>
> I build my packages with copy-paste from a file. All configure
> commands use "--enable-debug" (three different architectures with
> two different compilers each).
>
> tyr openmpi-1.6.5 1263 grep -- enable-debug README-OpenMPI-1.6.5
> --enable-debug \
> --enable-debug \
> --enable-debug \
> --enable-debug \
> --enable-debug \
> --enable-debug \
> tyr openmpi-1.6.5 1264
>
>> If you check and find those two conditions are okay, then I suspect
>> you are hitting the Solaris "bit rot" problem that we've talked
>> about before - and are unlikely to be able to fix any time soon.
>
> sunpc1 hello_1 113 mpiexec -mca btl ^udapl -np 4 -host sunpc1 hello_1_mpi
> Process 2 of 4 running on sunpc1
> ...
>
> sunpc1 hello_1 114 mpiexec -mca btl ^udapl -np 4 -host linpc1 hello_1_mpi
> [sunpc1:05035] [[4165,0],0] ORTE_ERROR_LOG: Buffer type (described vs
> non-described) mismatch - operation not allowed in file
> ../../../../../openmpi-1.6.5a1r28554/orte/mca/grpcomm/bad/grpcomm_bad_module.c
> at line 841
> ^Cmpiexec: killing job...
>
> I get the following table when I use each machine as the local machine
> and run my command against each of the hosts.
>
> local machine                | -host
>                              |
>                              | sunpc1 | linpc1 | rs1
> -----------------------------+--------+--------+-------
> sunpc1 (Solaris 10, x86_64)  | ok     | hangs  | hangs
> linpc1 (Solaris 10, x86_64)  | hangs  | ok     | ok
> rs1 (Solaris 10, sparc)      | hangs  | ok     | ok
>
> It seems that I have a problem with Solaris x86_64 and gcc-4.8.0 if
> I use a 64-bit version of Open MPI. I have no problems with Sun C
> and a 64-bit version of Open MPI, or with any 32-bit version of
> Open MPI. Do you have any idea what I can do to track down the
> problem and find a solution?
>
> Kind regards
>
> Siegmar
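A side note on the "what can I do to track the problem" question above: when the 64-bit gcc case hangs, stack traces of the stuck processes would help narrow down where things block. On Solaris something like the following should work -- a sketch only; run it on the machine where mpiexec hangs and on the remote host, assuming the mpiexec, orted, and hello_1_mpi processes are still alive at that point:

  pstack `pgrep mpiexec`
  pstack `pgrep orted`
  pstack `pgrep hello_1_mpi`

If the gcc 64-bit traces consistently sit in the grpcomm code named in the error message while the Sun C build's traces do not, that would at least localize the problem.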
>> On May 24, 2013, at 12:02 AM, Siegmar Gross
>> <siegmar.gr...@informatik.hs-fulda.de> wrote:
>>
>>> Hi
>>>
>>> I installed openmpi-1.6.5a1r28554 on "openSuSE Linux 12.1", "Solaris 10
>>> x86_64", and "Solaris 10 sparc" with gcc-4.8.0 and "Sun C 5.12" in 32-
>>> and 64-bit versions. Unfortunately I have a problem with the 64-bit
>>> version if I build Open MPI with gcc. The program hangs and I have
>>> to terminate it with <Ctrl-c>.
>>>
>>> sunpc1 hello_1 144 mpiexec -mca btl ^udapl -np 4 \
>>>   -host sunpc1,linpc1,rs0 hello_1_mpi
>>> [sunpc1:15576] [[16182,0],0] ORTE_ERROR_LOG: Buffer type (described vs
>>> non-described) mismatch - operation not allowed in file
>>> ../../../../../openmpi-1.6.5a1r28554/orte/mca/grpcomm/bad/grpcomm_bad_module.c
>>> at line 841
>>> ^Cmpiexec: killing job...
>>>
>>> sunpc1 hello_1 145 which mpiexec
>>> /usr/local/openmpi-1.6.5_64_gcc/bin/mpiexec
>>> sunpc1 hello_1 146
>>>
>>> I have no problems with the 64-bit version if I compile Open MPI
>>> with Sun C. Both 32-bit versions (compiled with "cc" or "gcc") work
>>> as expected as well.
>>>
>>> sunpc1 hello_1 106 mpiexec -mca btl ^udapl -np 4 \
>>>   -host sunpc1,linpc1,rs0 hello_1_mpi
>>> Process 2 of 4 running on rs0.informatik.hs-fulda.de
>>> Process 0 of 4 running on sunpc1
>>> Process 3 of 4 running on sunpc1
>>> Process 1 of 4 running on linpc1
>>> Now 3 slave tasks are sending greetings.
>>> Greetings from task 3:
>>>   message type:       3
>>>   msg length:         116 characters
>>>   message:
>>>     hostname:         sunpc1
>>>     operating system: SunOS
>>>     release:          5.10
>>>     processor:        i86pc
>>> ...
>>>
>>> sunpc1 hello_1 107 which mpiexec
>>> /usr/local/openmpi-1.6.5_64_cc/bin/mpiexec
>>>
>>> sunpc1 hello_1 106 mpiexec -mca btl ^udapl -np 4 \
>>>   -host sunpc1,linpc1,rs0 hello_1_mpi
>>> Process 2 of 4 running on rs0.informatik.hs-fulda.de
>>> Process 3 of 4 running on sunpc1
>>> Process 0 of 4 running on sunpc1
>>> Process 1 of 4 running on linpc1
>>> ...
>>>
>>> sunpc1 hello_1 107 which mpiexec
>>> /usr/local/openmpi-1.6.5_32_gcc/bin/mpiexec
>>>
>>> I would be grateful if somebody could fix the problem for the
>>> 64-bit version with gcc. Thank you very much in advance for any
>>> help.
>>>
>>> Kind regards
>>>
>>> Siegmar

-- 
Jeff Squyres
jsquy...@cisco.com