Dear users and developers,

     I recently ran a test on the Nankai Stars HPC cluster. The error message
"MPI_COMM_RANK : Null communicator ... Aborting program !" appeared when I ran
an scf calculation on 2 CPUs (2 nodes).

     Looking for a fix, I found some hints through Google, such as "please
make sure that you used the same version of MPI for compiling and running, and
included the corresponding header file mpi.h in your code."
(http://www.ncsa.edu/UserInfo/Resources/Hardware/XeonCluster/FAQ/XeonJobs.html)
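
One way to act on this hint is to check that the compiler wrapper used at build
time and the launcher used at run time come from the same MPI installation. The
sketch below is only a heuristic under stated assumptions: it assumes both tools
sit in a bin/ directory under a common install prefix, and the helper names
tool_prefix and same_prefix are made up for illustration. A site wrapper such as
mpirun.lsf may live elsewhere, so a mismatch is a reason to look closer, not
proof of a problem.

```shell
#!/bin/bash
# Print the install prefix of a tool, i.e. the directory above its bin/ directory.
# Example: /opt/mpich_gm/bin/mpif90 -> /opt/mpich_gm
tool_prefix() {
    local path="$1"
    dirname "$(dirname "$path")"
}

# Return 0 if two tools live under the same prefix, non-zero otherwise.
same_prefix() {
    [ "$(tool_prefix "$1")" = "$(tool_prefix "$2")" ]
}

# Resolve both tools from PATH; either may be missing on a given machine.
compiler="$(command -v mpif90 || true)"
launcher="$(command -v mpirun || true)"

if [ -n "$compiler" ] && [ -n "$launcher" ]; then
    if same_prefix "$compiler" "$launcher"; then
        echo "mpif90 and mpirun share prefix $(tool_prefix "$compiler")"
    else
        echo "WARNING: mpif90 ($compiler) and mpirun ($launcher) have different prefixes"
    fi
else
    echo "mpif90 or mpirun not found in PATH"
fi
```

Running this once in the shell used for ./configure and once inside the job
script shows whether the two environments resolve the same MPI.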

     According to the pwscf mailing list, the cause may be that "dynamic port
number used in mpi intercommunication is not working. This is most probably an
installation issue regarding LSF."
(http://www.democritos.it/pipermail/pw_forum/2007-June/006689.html)

     According to the pwscf manual, another possible cause is that "Your machine
might be configured so as to disallow interactive execution".

     My question is:
     To solve the "MPI_COMM_RANK ..." problem, do I need to modify the pwscf
code, the mpich_gm code, or the LSF system?

Calculation Details are as follows:
---------------------------------------------------------------------------------
HPC background:
Nankai Stars (http://202.113.29.200/introduce.htm)
800 Xeon 3.06 GHz CPU (400 nodes)
800 GB Memory    
53T High-Speed Storage    
Myrinet
Parallel jobs are run and debugged through the Platform LSF system.
Mpich_gm driver:1.2.6..13a
Espresso-3.2.3
---------------------------------------------------------------------------------

---------------------------------------------------------------------------------
Installation:
./configure CC=mpicc F77=mpif77 F90=mpif90
make all
---------------------------------------------------------------------------------

---------------------------------------------------------------------------------
Submit script :
#!/bin/bash
#BSUB -q normal
#BSUB -J test.icymoon
#BSUB -c 3:00
#BSUB -a "mpich_gm"
#BSUB -o %J.log
#BSUB -n 2 

cd /nfs/s04r2p1/wangxq_tj
echo "test icymoon"

mpirun.lsf /nfs/s04r2p1/wangxq_tj/espresso-3.2.3/bin/pw.x \
    < /nfs/s04r2p1/wangxq_tj/cu.scf.in > cu.scf.out

echo "test icymoon end"
---------------------------------------------------------------------------------

---------------------------------------------------------------------------------
Output file (%J.log):

...
The output (if any) follows:

test icymoon
0 - MPI_COMM_RANK : Null communicator
[0]  Aborting program !
[0] Aborting program!
test icymoon end
---------------------------------------------------------------------------------

---------------------------------------------------------------------------------
<cu.scf.in>
&control

    calculation='scf'
    restart_mode='from_scratch',
    pseudo_dir = '/nfs/s04r2p1/wangxq_tj/espresso-3.2.3/pseudo/',
    outdir='/nfs/s04r2p1/wangxq_tj/',
    prefix='cu'
 /

 &system

    ibrav = 2, celldm(1) =6.73, nat= 1, ntyp= 1,
    ecutwfc = 25.0, ecutrho = 300.0
    occupations='smearing', smearing='methfessel-paxton', degauss=0.02
    noncolin = .true.
    starting_magnetization(1) = 0.5
    angle1(1) = 90.0
    angle2(1) =  0.0
 /

 &electrons

    conv_thr = 1.0e-8
    mixing_beta = 0.7 
 /

ATOMIC_SPECIES
 Cu 63.55 Cu.pz-d-rrkjus.UPF
ATOMIC_POSITIONS
 Cu 0.0 0.0 0.0
K_POINTS (automatic)
 8 8 8 0 0 0
--------------------------------------------------------------------------------

---------------------------------------------------------------------------------
cu.scf.out

1 - MPI_COMM_RANK : Null communicator
[1]  Aborting program !
[1] Aborting program!

TID  HOST_NAME    COMMAND_LINE            STATUS            TERMINATION_TIME

==== ========== ================  =======================  ===================

0001 node333                      Exit (255)               04/08/2008 19:36:59

0002 node284                      Exit (255)               04/08/2008 19:36:59

---------------------------------------------------------------------------------
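
To summarise which MPI ranks reported the error across the two output files
above, a small filter can be used. This is a minimal sketch, assuming the error
lines always have the form "<rank> - MPI_COMM_RANK : Null communicator" as in
the output shown; the function name failed_ranks is hypothetical.

```shell
#!/bin/bash
# Print the MPI ranks that reported "Null communicator", one per line,
# sorted numerically with duplicates removed.
# Reads the files given as arguments, or standard input if none are given.
failed_ranks() {
    sed -n 's/^\([0-9][0-9]*\) - MPI_COMM_RANK : Null communicator.*$/\1/p' "$@" \
        | sort -n -u
}

# Example on lines copied from the output above:
printf '%s\n' \
    'test icymoon' \
    '0 - MPI_COMM_RANK : Null communicator' \
    '[0]  Aborting program !' \
    '1 - MPI_COMM_RANK : Null communicator' \
    | failed_ranks
```

On the cluster it would be called with the LSF log and cu.scf.out as arguments.
In the output above both rank 0 and rank 1 report the error, so neither process
got past its MPI_COMM_RANK call.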

Any help will be deeply appreciated!

Best regards,

=====================================

X.Q. Wang 

wangxinquan at tju.edu.cn

School of Chemical Engineering and Technology

Tianjin University

92 Weijin Road, Tianjin, P. R. China

tel:86-22-27890268, fax: 86-22-27892301

=====================================

