Lakee,
Glad to hear everything worked well with MVAPICH. Regarding OpenMPI, I am confused: you are running on a machine with eight CPUs, correct? Why supply a host file, then? Try it without the --hostfile <filename> option. Typically I launch my runs with

    mpirun -np 4 siesta < input.fdf > output.out &

If this still doesn't work, it may have to do with how you configured OpenMPI. The machines I run SIESTA on are x86_64 architecture with SuSE 10.x; OpenMPI was compiled with the Intel Fortran and C compilers. I can send the log file if necessary.

-Tom

Date: Fri, 22 Aug 2008 22:47:25 +0800
From: [EMAIL PROTECTED]
Subject: Re: [SIESTA-L] Strange mpi problem
To: SIESTA-L@listserv.uam.es

Hello, dear Tom,

Thank you so much for your experience and suggestions. I also tried MVAPICH and OpenMPI today. For me, MVAPICH also works very well! :-) However, OpenMPI does not work for more than one processor. Every time I run the command

    mpirun --mca btl tcp,self -np 4 --hostfile hostfile siesta <input.fdf >output

I get the following error message:

    [node1:10222] *** An error occurred in MPI_Comm_group
    [node1:10222] *** on communicator MPI_COMM_WORLD
    [node1:10222] *** MPI_ERR_COMM: invalid communicator
    [node1:10222] *** MPI_ERRORS_ARE_FATAL (goodbye)
    [node1:10223] *** An error occurred in MPI_Comm_group
    [node1:10223] *** on communicator MPI_COMM_WORLD
    [node1:10223] *** MPI_ERR_COMM: invalid communicator
    [node1:10223] *** MPI_ERRORS_ARE_FATAL (goodbye)
    [node1:10224] *** An error occurred in MPI_Comm_group
    [node1:10224] *** on communicator MPI_COMM_WORLD
    [node1:10224] *** MPI_ERR_COMM: invalid communicator
    [node1:10224] *** MPI_ERRORS_ARE_FATAL (goodbye)
    [node1:10225] *** An error occurred in MPI_Comm_group
    [node1:10225] *** on communicator MPI_COMM_WORLD
    [node1:10225] *** MPI_ERR_COMM: invalid communicator
    [node1:10225] *** MPI_ERRORS_ARE_FATAL (goodbye)
    [node1:10219] [0,0,0] ORTE_ERROR_LOG: Timeout in file base/pls_base_orted_cmds.c at line 275
    [node1:10219] [0,0,0] ORTE_ERROR_LOG: Timeout in file pls_rsh_module.c at line 1166
    [node1:10219] [0,0,0] ORTE_ERROR_LOG: Timeout in file errmgr_hnp.c at line 90
    [node1:10219] [0,0,0] ORTE_ERROR_LOG: Timeout in file base/pls_base_orted_cmds.c at line 188
    [node1:10219] [0,0,0] ORTE_ERROR_LOG: Timeout in file pls_rsh_module.c at line 1198
    --------------------------------------------------------------------------
    mpirun was unable to cleanly terminate the daemons for this job.
    Returned value Timeout instead of ORTE_SUCCESS.
    --------------------------------------------------------------------------
    [1]+  Exit 1    mpirun --mca btl tcp,self -np 4 --hostfile hostfile siesta <input.fdf >output

But when I check the output file, I see the following lines at the end:

    ...
    * Maximum dynamic memory allocated =     1 MB

    siesta: ============================== Begin CG move = 0 ==============================

    outcell: Unit cell vectors (Ang):
       12.000000    0.000000    0.000000
        0.000000   12.000000    0.000000
        0.000000    0.000000   41.507400

    outcell: Cell vector modules (Ang)   :   12.000000   12.000000   41.507400
    outcell: Cell angles (23,13,12) (deg):     90.0000     90.0000     90.0000
    outcell: Cell volume (Ang**3)        : 5977.0656

    InitMesh: MESH = 108 x 108 x 360 = 4199040
    InitMesh: Mesh cutoff (required, used) = 200.000 207.901 Ry

    * Maximum dynamic memory allocated =   163 MB

So the calculation started normally, but then it stopped suddenly. I did this on a single machine with 8 cores. Do you have any idea about it, please?

Sincerely,
Lakee

2008/8/20 Thomas Sadowski <[EMAIL PROTECTED]>:

Lakee,

I cannot speak to the sleep problem itself, but I have encountered similar problems trying to run parallel SIESTA using MPICH2 as my MPI interface. I would recommend trying one of the other MPI implementations and seeing if that solves the problem. Myself, I use both OpenMPI and MVAPICH. Depending on how aggressively the libraries are compiled, the latter tends to run a little faster than the former, but the difference is not really that significant.
I would be interested to hear what other users have to say concerning this issue.

Tom Sadowski
University of Connecticut

Date: Wed, 20 Aug 2008 22:06:20 +0800
From: [EMAIL PROTECTED]
Subject: [SIESTA-L] Strange mpi problem
To: SIESTA-L@listserv.uam.es

Hello, dear all,

These days I have been trying to run a calculation with the parallel version of SIESTA on a PC cluster. It was compiled with MPICH2 and the PGI compiler. What surprises me is that sometimes it runs normally, and sometimes the task enters a sleeping state right after I submit the job with "mpiexec -n 8 siesta <input >output". In the output of the command "top", I see "S" as the status on the line for my job, and from that point it never continues. This happens very frequently, even with the same input file, and I do not know why. Could you tell me how to avoid the job entering a sleeping state right after submission, please? Thank you very much!

Sincerely,
Lakee
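[For completeness, since the failing command above uses --hostfile: if a hostfile is ever actually needed (i.e. on a multi-node run), OpenMPI's format for one 8-slot node is a single line; the hostname here is illustrative. On a single machine it can simply be omitted, as suggested earlier in the thread.]

```
node1 slots=8
```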
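[Editorial note on the error above: an "MPI_ERR_COMM: invalid communicator" failure in the very first MPI calls is the classic signature of a binary that was compiled against one MPI implementation's headers (e.g. MPICH2 or MVAPICH, as used elsewhere in this thread) but launched with a different implementation's mpirun; the predefined handles such as MPI_COMM_WORLD differ between implementations. A quick way to check, sketched below; the siesta path is an assumption about your build tree, and the exact library names will vary.]

```shell
# Hypothetical location of the SIESTA binary; adjust to your build tree.
bin=${1:-./siesta}

# Which mpirun is first on PATH? If it does not belong to the same MPI
# installation the binary was linked against, the predefined handles
# (MPI_COMM_WORLD, etc.) will not match, which shows up as MPI_ERR_COMM.
echo "launcher: $(command -v mpirun || echo 'none on PATH')"

# Which MPI shared libraries does the binary actually pull in at run time?
# Expect libmpi from OpenMPI, or libmpich/libmvapich from MPICH2/MVAPICH.
if [ -x "$bin" ]; then
    ldd "$bin" | grep -i mpi
else
    echo "binary not found: $bin"
fi
```

If the launcher and the linked libraries come from different installations, rebuilding SIESTA with the matching wrapper compilers (mpif90/mpicc from the same MPI you launch with) is the usual fix.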