Dear all,

I'm trying to run RAxML 7.0.4 on my 64-bit Rocks 5.1 cluster (i.e. CentOS 5.2). I compiled Open MPI 1.3.3 with the Intel compilers v11.1.056 using:

    ./configure CC=icc CXX=icpc F77=ifort FC=ifort --with-sge --prefix=/usr/prog/mpi/openmpi/1.3.3/x86_64-no-mem-man --with-memory-manager=none

When I run RAxML in a qlogin session using

    /usr/prog/mpi/openmpi/1.3.3/x86_64-no-mem-man/bin/mpirun -np 8 /usr/prog/bioinformatics/RAxML/7.0.4/x86_64/RAxML-7.0.4/raxmlHPC-MPI -f a -x 12345 -p12345 -# 10 -m GTRGAMMA -s /users/holwani1/jay/ornodko-1582 -n mpitest39

I get the following output:

    This is the RAxML MPI Worker Process Number: 1
    This is the RAxML MPI Worker Process Number: 3
    This is the RAxML MPI Master process
    This is the RAxML MPI Worker Process Number: 7
    This is the RAxML MPI Worker Process Number: 4
    This is the RAxML MPI Worker Process Number: 5
    This is the RAxML MPI Worker Process Number: 2
    This is the RAxML MPI Worker Process Number: 6
    IMPORTANT WARNING: Alignment column 1695 contains only undetermined values which will be treated as missing data
    IMPORTANT WARNING: Sequences A4_H10 and A3ii_E11 are exactly identical
    IMPORTANT WARNING: Sequences A2_A08 and A9_C10 are exactly identical
    IMPORTANT WARNING: Sequences A3ii_B03 and A3ii_C06 are exactly identical
    IMPORTANT WARNING: Sequences A9_D08 and A9_F10 are exactly identical
    IMPORTANT WARNING: Sequences A3ii_F07 and A9_C08 are exactly identical
    IMPORTANT WARNING: Sequences A6_F05 and A6_F11 are exactly identical
    IMPORTANT WARNING Found 6 sequences that are exactly identical to other sequences in the alignment. Normally they should be excluded from the analysis.
    IMPORTANT WARNING Found 1 column that contains only undetermined values which will be treated as missing data. Normally these columns should be excluded from the analysis.
    An alignment file with undetermined columns and sequence duplicates removed has already been printed to file /users/holwani1/jay/ornodko-1582.reduced
    You are using RAxML version 7.0.4 released by Alexandros Stamatakis in April 2008
    Alignment has 1280 distinct alignment patterns
    Proportion of gaps and completely undetermined characters in this alignment: 0.124198
    RAxML rapid bootstrapping and subsequent ML search
    Executing 10 rapid bootstrap inferences and thereafter a thorough ML search
    All free model parameters will be estimated by RAxML
    GAMMA model of rate heteorgeneity, ML estimate of alpha-parameter
    GAMMA Model parameters will be estimated up to an accuracy of 0.1000000000 Log Likelihood units
    Partition: 0
    Name: No Name Provided
    DataType: DNA
    Substitution Matrix: GTR
    Empirical Base Frequencies:
    pi(A): 0.261129 pi(C): 0.228570 pi(G): 0.315946 pi(T): 0.194354
    Switching from GAMMA to CAT for rapid Bootstrap, final ML search will be conducted under the GAMMA model you specified
    Bootstrap[10]: Time 44.442728 bootstrap likelihood -inf, best rearrangement setting 5
    Bootstrap[0]: Time 44.814948 bootstrap likelihood -inf, best rearrangement setting 5
    Bootstrap[6]: Time 46.470371 bootstrap likelihood -inf, best rearrangement setting 6
    [compute-0-11:08698] *** Process received signal ***
    [compute-0-11:08698] Signal: Segmentation fault (11)
    [compute-0-11:08698] Signal code: Address not mapped (1)
    [compute-0-11:08698] Failing at address: 0x408
    [compute-0-11:08698] [ 0] /lib64/libpthread.so.0 [0x3fb580de80]
    [compute-0-11:08698] [ 1] /usr/prog/bioinformatics/RAxML/7.0.4/x86_64/RAxML-7.0.4/raxmlHPC-MPI(hookup+0) [0x413ca0]
    [compute-0-11:08698] [ 2] /usr/prog/bioinformatics/RAxML/7.0.4/x86_64/RAxML-7.0.4/raxmlHPC-MPI(restoreTL+0xd9) [0x442c09]
    [compute-0-11:08698] [ 3] /usr/prog/bioinformatics/RAxML/7.0.4/x86_64/RAxML-7.0.4/raxmlHPC-MPI [0x42c968]
    [compute-0-11:08698] [ 4] /usr/prog/bioinformatics/RAxML/7.0.4/x86_64/RAxML-7.0.4/raxmlHPC-MPI(doAllInOne+0x91a) [0x42b21a]
    [compute-0-11:08698] [ 5] /usr/prog/bioinformatics/RAxML/7.0.4/x86_64/RAxML-7.0.4/raxmlHPC-MPI(main+0xc25) [0x4063f5]
    [compute-0-11:08698] [ 6] /lib64/libc.so.6(__libc_start_main+0xf4) [0x3fb501d8b4]
    [compute-0-11:08698] [ 7] /usr/prog/bioinformatics/RAxML/7.0.4/x86_64/RAxML-7.0.4/raxmlHPC-MPI [0x405719]
    [compute-0-11:08698] *** End of error message ***
    Bootstrap[1]: Time 8.400332 bootstrap likelihood -inf, best rearrangement setting 5
    --------------------------------------------------------------------------
    mpirun noticed that process rank 1 with PID 8698 on node compute-0-11.local exited on signal 11 (Segmentation fault).
    --------------------------------------------------------------------------

My $PATH is

    /usr/prog/mpi/openmpi/1.3.3/x86_64-no-mem-man/bin/:/usr/prog/mpi/openmpi/1.3.3/x86_64/bin/:/usr/prog/intel/ifort/11.1.056/bin/intel64:/usr/prog/intel/icc/11.1.056//bin/intel64:/usr/prog/intel/ifort/11.1.056/bin/intel64:/usr/prog/intel/icc/11.1.056//bin/intel64:/opt/gridengine/bin/lx26-amd64:/usr/kerberos/sbin:/usr/kerberos/bin:/opt/gridengine/bin/lx26-amd64:/usr/java/latest/bin:/usr/local/sbin:/usr/local/bin:/sbin:/bin:/usr/sbin:/usr/bin:/opt/ganglia/bin:/opt/ganglia/sbin:/opt/rocks/bin:/opt/rocks/sbin:/root/bin

My $LD_LIBRARY_PATH is

    /usr/prog/mpi/openmpi/1.3.3/x86_64-no-mem-man/lib/:/usr/prog/mpi/openmpi/1.3.3/x86_64/lib/:/usr/prog/intel/ifort/11.1.056/lib/intel64:/usr/prog/intel/ifort/11.1.056/mkl/lib/em64t:/usr/prog/intel/icc/11.1.056//lib/intel64:/usr/prog/intel/icc/11.1.056//ipp/em64t/sharedlib:/usr/prog/intel/icc/11.1.056//mkl/lib/em64t:/usr/prog/intel/icc/11.1.056//tbb/intel64/cc4.1.0_libc2.4_kernel2.6.16.21/lib:/usr/prog/intel/ifort/11.1.056/lib/intel64:/usr/prog/intel/ifort/11.1.056/mkl/lib/em64t:/usr/prog/intel/icc/11.1.056//lib/intel64:/usr/prog/intel/icc/11.1.056//ipp/em64t/sharedlib:/usr/prog/intel/icc/11.1.056//mkl/lib/em64t:/usr/prog/intel/icc/11.1.056//tbb/intel64/cc4.1.0_libc2.4_kernel2.6.16.21/lib:/opt/gridengine/lib/lx26-amd64:/opt/gridengine/lib/lx26-amd64

Although I'm only running this on one node, it may be helpful to know that the nodes have Infiniband with Voltaire OFED v1.4. The MPIs from Rocks' HPC roll are not installed. I've tried running the above on multiple nodes but still see the same error. I've attached the config.log and ompi_info output to this email.

I believe the input is OK, as I can run the serial gcc-compiled RAxML on the data with no problems.

I tried compiling Open MPI with --with-memory-manager=none, as a quick Google search (http://osdir.com/ml/clustering.open-mpi.user/2008-07/msg00201.html) suggested that it could help, but it made no difference. Google also suggested that the crash could be caused by the compile environment differing from the runtime environment. To test this, I compiled and ran RAxML immediately after compiling Open MPI, in the same session, again with no joy.

Does anyone know how I can fix this?

Thanks,
Nick
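
In case it helps anyone reproduce the compile-versus-runtime check mentioned above: both $PATH and $LD_LIBRARY_PATH list two Open MPI trees (x86_64-no-mem-man first, then x86_64), so it seems worth confirming which one is actually picked up. The following is only a minimal sketch; the helper name list_in_path is hypothetical and not part of Open MPI or RAxML, and it assumes a POSIX shell and directory names without glob characters.

```shell
#!/bin/sh
# list_in_path NAME [PATHLIST]   (hypothetical helper, for illustration only)
# Print every directory in a colon-separated list (default: $PATH) that
# contains an executable regular file called NAME, in search order.
# The first directory printed is the one the shell would use.
# The body runs in a subshell, so the IFS change does not leak out.
list_in_path() (
    name=$1
    pathlist=${2:-$PATH}
    IFS=:
    for dir in $pathlist; do
        # Skip empty entries such as those produced by "::"
        [ -n "$dir" ] || continue
        if [ -f "$dir/$name" ] && [ -x "$dir/$name" ]; then
            printf '%s\n' "$dir"
        fi
    done
)
```

Calling `list_in_path mpirun` and `list_in_path mpicc` in the qlogin session should show whether more than one Open MPI installation provides each command, and running `ldd` on the raxmlHPC-MPI binary shows which MPI shared libraries the runtime linker resolves. If the directories reported differ from the tree used at build time, that would point to an environment mismatch; if they agree, that explanation becomes less likely.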
config.tar.gz
Description: GNU Zip compressed data
ompi-info.tar.gz
Description: GNU Zip compressed data