One addendum. Torque-MOAB probably sets up some default files for you in many cases, under the assumption that all you are doing is running a single MPI task using all the nodes you asked for. You might be able to get away with something like changing to
setenv WIEN_MPIRUN "mpirun _EXEC_"

and a .machines file such as

lapw0: cn002:8 cn004:8 cn016:8 cn018:8
1: cn002:8 cn004:8 cn016:8 cn018:8
granularity:1
extrafine:1

so in effect you are running one MPI job on all the nodes with the MOAB defaults. (You might need -np _NP_ in the WIEN_MPIRUN; you have to experiment and read your mpirun instructions, e.g. "mpirun --help" or "man mpirun".) However, this is not very efficient.

On Sun, Nov 14, 2010 at 9:53 AM, Laurence Marks <L-marks at northwestern.edu> wrote:
> I don't think that this has much to do with Wien2k; it is an issue
> with how you are setting up your MPI. From the looks of it you are
> using MPICH2, whereas most of the scripts in Wien2k are set up to use
> MPICH1, which is rather simpler. For MPICH2 you have to set up the mpd
> daemon and configuration files, which is very different from the
> simpler hostfile structure of MPICH1.
>
> (I personally have never got Wien2k running smoothly with MPICH2, but
> have not tried too hard. If anyone has a detailed description, this
> would be a useful post.)
>
> You can find some information about the other steps you need for
> MPICH2 on the web, e.g.
>
> http://developer.amd.com/documentation/articles/pages/HPCHighPerformanceLinpack.aspx
>
> and Google searches on "WARNING: Unable to read mpd.hosts or list of
> hosts isn't provided"
>
> On Sun, Nov 14, 2010 at 3:19 AM, Stefan Becuwe <stefan.becuwe at ua.ac.be> wrote:
>>
>> Hello,
>>
>> Our problem is more or less related to Wei Xie's postings of two weeks ago.
>> We can't get Wien2k 10.1 running using the MPI setup. Serial versions and
>> parallel versions based on ssh do work. Since his solution does not seem to
>> work for us, I'll describe our problem/setup.
>>
>> FYI: the Intel MPI setup does work for lots of other programs on our
>> cluster, so I guess it must be an Intel MPI-Wien2k(-Torque-MOAB) specific
>> problem.
>>
>> Software environment:
>>
>> icc/ifort: 11.1.073
>> impi:      4.0.0.028
>> imkl:      10.2.6.038
>> FFTW:      2.1.5
>> Torque/MOAB
>>
>>
>> $ cat parallel_options
>> setenv USE_REMOTE 1
>> setenv MPI_REMOTE 1
>> setenv WIEN_GRANULARITY 1
>> setenv WIEN_MPIRUN "mpirun -r ssh -np _NP_ _EXEC_"
>>
>>
>> Call:
>>
>> clean_lapw -s
>> run_lapw -p -ec 0.00001 -i 1000
>>
>>
>> $ cat .machines
>> lapw0: cn002:8 cn004:8 cn016:8 cn018:8
>> 1: cn002:8
>> 1: cn004:8
>> 1: cn016:8
>> 1: cn018:8
>> granularity:1
>> extrafine:1
>>
>>
>> Also, the appropriate .machine1, .machine2, etc. are generated.
>>
>>
>> $ cat TiC.dayfile
>> [...]
>>>
>>>  lapw0 -p    (09:59:34) starting parallel lapw0 at Sun Nov 14 09:59:34 CET 2010
>>
>> -------- .machine0 : 32 processors
>> 0.428u 0.255s 0:05.12 13.0%   0+0k 0+0io 0pf+0w
>>>
>>>  lapw1 -p    (09:59:39) starting parallel lapw1 at Sun Nov 14 09:59:39 CET 2010
>>
>> ->  starting parallel LAPW1 jobs at Sun Nov 14 09:59:39 CET 2010
>> running LAPW1 in parallel mode (using .machines)
>> 4 number_of_parallel_jobs
>>      cn002 cn002 cn002 cn002 cn002 cn002 cn002 cn002(1) WARNING: Unable to
>> read mpd.hosts or list of hosts isn't provided. MPI job will be run on the
>> current machine only.
>> rank 5 in job 1  cn002_55855   caused collective abort of all ranks
>>  exit status of rank 5: killed by signal 9
>> rank 4 in job 1  cn002_55855   caused collective abort of all ranks
>>  exit status of rank 4: killed by signal 9
>> rank 3 in job 1  cn002_55855   caused collective abort of all ranks
>>  exit status of rank 3: killed by signal 9
>> [...]
>>
>>
>> Specifying -hostfile in the WIEN_MPIRUN variable results in the following
>> error:
>>
>> invalid "local" arg: -hostfile
>>
>>
>> Thanks in advance for helping us run Wien2k in an MPI setup ;-)
>>
>> Regards
>>
>> Stefan Becuwe
>>

--
Laurence Marks
Department of Materials Science and Engineering
MSE Rm 2036 Cook Hall
2220 N Campus Drive
Northwestern University
Evanston, IL 60208, USA
Tel: (847) 491-3996 Fax: (847) 491-7820
email: L-marks at northwestern dot edu
Web: www.numis.northwestern.edu
Chair, Commission on Electron Crystallography of IUCR
www.numis.northwestern.edu/
Electron crystallography is the branch of science that uses electron
scattering and imaging to study the structure of matter.
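P.S. For what it is worth, the mpd-related steps Laurence mentions usually look something like the following inside the Torque job script, before run_lapw is called. This is only a sketch, not a tested recipe: it assumes the MPICH2-style mpd process manager shipped with Intel MPI (mpdboot/mpdtrace/mpdallexit) and that Torque exports the allocated node list in $PBS_NODEFILE; check "man mpdboot" and your local mpirun documentation for the exact options.

# build mpd.hosts from the nodes Torque allocated (one entry per node)
sort -u $PBS_NODEFILE > mpd.hosts

# start one mpd daemon per node, connecting over ssh
mpdboot -n `sort -u $PBS_NODEFILE | wc -l` -f mpd.hosts -r ssh

# verify that every node joined the ring
mpdtrace

# ... clean_lapw -s ; run_lapw -p -ec 0.00001 -i 1000 ...

# shut the ring down when the job is done
mpdallexit

Once the ring is up, the mpd-based mpiexec/mpirun takes its host list from the running daemons, which is what the "Unable to read mpd.hosts or list of hosts isn't provided" warning above is complaining about.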