Hi, Tom,

I tried it. The problem is still there.


2008/8/24 Thomas Sadowski

> That might be your problem... Try linking against libmpi_f90.a
> -Tom
> Date: Sat, 23 Aug 2008 10:13:43 +0800
> Subject: Re: [SIESTA-L] Strange mpi problem
> To: SIESTA-L@listserv.uam.es
> Hi, Dear Tom,
> I tried "mpirun -np 4 siesta < input.fdf > output.out &". What I get is the
> following:
> libibverbs: Fatal: couldn't read uverbs ABI version.
> --------------------------------------------------------------------------
> [0,1,0]: OpenIB on host node1 was unable to find any HCAs.
> Another transport will be used instead, although this may result in
> lower performance.
> --------------------------------------------------------------------------
> libibverbs: Fatal: couldn't read uverbs ABI version.
> --------------------------------------------------------------------------
> [0,1,1]: OpenIB on host node1 was unable to find any HCAs.
> Another transport will be used instead, although this may result in
> lower performance.
> --------------------------------------------------------------------------
> [node1:11756] *** An error occurred in MPI_Comm_group
> [node1:11757] *** An error occurred in MPI_Comm_group
> [node1:11757] *** on communicator MPI_COMM_WORLD
> [node1:11757] *** MPI_ERR_COMM: invalid communicator
> [node1:11757] *** MPI_ERRORS_ARE_FATAL (goodbye)
> [node1:11756] *** on communicator MPI_COMM_WORLD
> [node1:11756] *** MPI_ERR_COMM: invalid communicator
> [node1:11756] *** MPI_ERRORS_ARE_FATAL (goodbye)
> From the FAQ of openmpi official site, it seems that we should use the
> option "--mca btl tcp,self"  for TCP network?
> My machines are also x86-64 architecture with RedHat Enterprise Linux 4.
> The compiler is pgi-7.1.4.  My configure options are:
> ./configure --prefix=/usr/local/math_library/pgi-7.1.4/openmpi-1.2.6 CC=cc
> CXX=c++ F77=pgf90 F90=pgf90 FC=pgf90 CFLAGS="-O2" FFLAGS="-O2"
> Do you think there is anything wrong here, please?
> By the way, when I compile BLACS and scalapack,  in the "lib" directory of
> openmpi-1.2.6, I can not see the file "libmpich.a", but I can see it in both
> mpich2 and mvapich2. So I set MPILIB in Bmake.inc of BLACS and SMPLIB in
> SLmake.inc of scalapack to "openmpi-1.2.6/lib/libmpi.la", since only this
> file looks similar to libmpich.a. Is it wrong here, please?
> I will appreciate it very much if you can send your log file for
> configuring openmpi, the Bmake.inc and SLmake.inc.
> Sincerely,
> Lakee
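
On the libmpi.la question above: a .la file is a libtool description (plain text), not an archive the linker can consume. MPILIB and SMPLIB would therefore normally point at an actual library such as libmpi.a or libmpi.so, if present, or linking would be left to the Open MPI wrapper compilers. A minimal sketch for telling the two kinds of file apart follows. The name is_libtool_file is a hypothetical helper, and the path in the usage comment is assembled from the --prefix quoted in the message; it is assumed, not verified.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the file looks like a libtool description
# (.la) rather than a linkable library. libtool writes a header comment of
# the form "# libfoo.la - a libtool library file" on the first line.
is_libtool_file() {
    head -n 1 "$1" 2>/dev/null | grep -q "libtool library file"
}

# Usage (path assumed from the configure --prefix quoted above):
#   lib=/usr/local/math_library/pgi-7.1.4/openmpi-1.2.6/lib/libmpi.la
#   is_libtool_file "$lib" && echo "$lib is a libtool text file, not a library"
#
# The Open MPI wrapper compilers can also print the link flags they would use:
#   mpif90 --showme:link
```

If the check succeeds for the file named in MPILIB or SMPLIB, that setting most likely needs to be changed before BLACS and ScaLAPACK are rebuilt.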
2008/8/23 Thomas Sadowski
> Lakee,
> Glad to hear everything worked well with MVAPICH. With regard to OpenMPI, I
> am confused. You are running on a machine with eight CPUs, correct? Why
> supply a host file then? Try it without the -hostfile <filename>. Typically,
> I launch my runs as
> mpirun -np 4 siesta < input.fdf > output.out &
> If this still doesn't work, it may have to do with how you configured
> OpenMPI. The machines I run SIESTA on are x86_64 architecture with SuSE
> 10.x. OpenMPI was compiled with the Intel Fortran and C compilers. I can
> send the log file if necessary.
> -Tom
> Date: Fri, 22 Aug 2008 22:47:25 +0800
> Subject: Re: [SIESTA-L] Strange mpi problem
> To: SIESTA-L@listserv.uam.es
> Hello, Dear Tom,
> Thank you so much for your experience and suggestion. I also have tried
> MVAPICH and OpenMPI today.
> For me, MVAPICH also works very well! :-)
> However, OpenMPI does not work for more than 1 processor. Every time after
> I run the command:
> mpirun --mca btl tcp,self -np 4 --hostfile hostfile siesta <input.fdf >output
> I always got the following error message
> [node1:10222] *** An error occurred in MPI_Comm_group
> [node1:10222] *** on communicator MPI_COMM_WORLD
> [node1:10222] *** MPI_ERR_COMM: invalid communicator
> [node1:10222] *** MPI_ERRORS_ARE_FATAL (goodbye)
> [node1:10223] *** An error occurred in MPI_Comm_group
> [node1:10223] *** on communicator MPI_COMM_WORLD
> [node1:10223] *** MPI_ERR_COMM: invalid communicator
> [node1:10223] *** MPI_ERRORS_ARE_FATAL (goodbye)
> [node1:10224] *** An error occurred in MPI_Comm_group
> [node1:10224] *** on communicator MPI_COMM_WORLD
> [node1:10224] *** MPI_ERR_COMM: invalid communicator
> [node1:10224] *** MPI_ERRORS_ARE_FATAL (goodbye)
> [node1:10225] *** An error occurred in MPI_Comm_group
> [node1:10225] *** on communicator MPI_COMM_WORLD
> [node1:10225] *** MPI_ERR_COMM: invalid communicator
> [node1:10225] *** MPI_ERRORS_ARE_FATAL (goodbye)
> [node1:10219] [0,0,0] ORTE_ERROR_LOG: Timeout in file
> base/pls_base_orted_cmds.c at line 275
> [node1:10219] [0,0,0] ORTE_ERROR_LOG: Timeout in file pls_rsh_module.c at
> line 1166
> [node1:10219] [0,0,0] ORTE_ERROR_LOG: Timeout in file errmgr_hnp.c at line
> 90
> [node1:10219] [0,0,0] ORTE_ERROR_LOG: Timeout in file
> base/pls_base_orted_cmds.c at line 188
> [node1:10219] [0,0,0] ORTE_ERROR_LOG: Timeout in file pls_rsh_module.c at
> line 1198
> --------------------------------------------------------------------------
> mpirun was unable to cleanly terminate the daemons for this job. Returned
> value Timeout instead of ORTE_SUCCESS.
> --------------------------------------------------------------------------
> [1]+  Exit 1                  mpirun --mca btl tcp,self -np 4 --hostfile
> hostfile siesta <input.fdf >output
> But when I check the output file, I can get the following lines at the end:
> ...
> * Maximum dynamic memory allocated =     1 MB
> siesta:                 ==============================
>                             Begin CG move =      0
>                         ==============================
> outcell: Unit cell vectors (Ang):
>        12.000000    0.000000    0.000000
>         0.000000   12.000000    0.000000
>         0.000000    0.000000   41.507400
> outcell: Cell vector modules (Ang)   :   12.000000   12.000000   41.507400
> outcell: Cell angles (23,13,12) (deg):     90.0000     90.0000     90.0000
> outcell: Cell volume (Ang**3)        :   5977.0656
> InitMesh: MESH =   108 x   108 x   360 =     4199040
> InitMesh: Mesh cutoff (required, used) =   200.000   207.901 Ry
> * Maximum dynamic memory allocated =   163 MB
> So, the calculation started normally, but it stopped suddenly. I ran it
> on a single machine with 8 cores. Do you have any idea about it, please?
> Sincerely,
> Lakee
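
One possible cause of "MPI_ERR_COMM: invalid communicator" on MPI_COMM_WORLD, not confirmed anywhere in this thread, is that the executable or one of its libraries (BLACS, ScaLAPACK) was built against the headers of one MPI implementation and then launched with the mpirun of another. A rough first check is whether the launcher and the compiler wrapper on PATH come from the same installation. The sketch below assumes both are installed under a common prefix/bin directory; install_prefix is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: print the installation prefix of a command found on
# PATH, taken here to be the directory above the one containing the command.
install_prefix() {
    cmd_path=$(command -v "$1") || return 1
    dirname "$(dirname "$cmd_path")"
}

# Usage (assumes mpirun and mpif90 are both on PATH):
#   if [ "$(install_prefix mpirun)" != "$(install_prefix mpif90)" ]; then
#       echo "mpirun and mpif90 come from different installations"
#   fi
```

Matching prefixes do not prove the build is consistent, since BLACS and ScaLAPACK may have been compiled earlier against MPICH2 or MVAPICH2, but a mismatch is a strong hint.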
2008/8/20 Thomas Sadowski
> Lakee,
> I cannot speak about the serial sleep, but I have encountered similar
> problems trying to run parallel SIESTA using MPICH2 as my MPI interface. I
> would recommend considering one of the other MPI programs and see if this
> doesn't solve the problem. Myself I use both OpenMPI and MVAPICH. Depending
> on how aggressively the libraries are compiled, the latter tends to run a
> little faster than the former, but it is not really that significant of a
> difference. I would be interested to hear what the other users have to say
> concerning this issue.
> Tom Sadowski
> University of Connecticut
> Date: Wed, 20 Aug 2008 22:06:20 +0800
> Subject: [SIESTA-L] Strange mpi problem
> To: SIESTA-L@listserv.uam.es
> Hello, Dear all,
> Recently I have been trying to run a calculation with the parallel version of
> SIESTA on a PC cluster. It was compiled with MPICH2 and the PGI compiler. What
> surprises me is that sometimes it runs normally, and sometimes the task
> enters a sleeping state right after I submit the job with "mpiexec -n 8
> siesta <input> output". In the output of "top", I can see "S" as the status
> on the line for my job. Once that happens, it never makes any progress. This
> happens very frequently, even for the same input file. I do not know why.
> Could you tell me how to avoid entering a sleeping state right after
> submitting the job, please?
> Thank you very much!!
> Sincerely,
> Lakee