Hello,

Our problem is more or less related to Wei Xie's postings of two weeks ago: we cannot get Wien2k 10.1 running with the MPI setup, while the serial version and the ssh-based parallel version work fine. Since his solution does not seem to work for us, I'll describe our problem and setup below.
FYI: the Intel MPI setup does work for lots of other programs on our cluster, so I guess it must be an Intel MPI - Wien2k (- Torque/MOAB) specific problem.

Software environment:

  icc/ifort: 11.1.073
  impi:      4.0.0.028
  imkl:      10.2.6.038
  FFTW:      2.1.5
  Torque/MOAB

$ cat parallel_options
setenv USE_REMOTE 1
setenv MPI_REMOTE 1
setenv WIEN_GRANULARITY 1
setenv WIEN_MPIRUN "mpirun -r ssh -np _NP_ _EXEC_"

Call:

clean_lapw -s
run_lapw -p -ec 0.00001 -i 1000

$ cat .machines
lapw0: cn002:8 cn004:8 cn016:8 cn018:8
1: cn002:8
1: cn004:8
1: cn016:8
1: cn018:8
granularity:1
extrafine:1

The appropriate .machine1, .machine2, etc. are also generated.

$ cat TiC.dayfile
[...]
>   lapw0 -p    (09:59:34) starting parallel lapw0 at Sun Nov 14 09:59:34 CET 2010
-------- .machine0 : 32 processors
0.428u 0.255s 0:05.12 13.0%     0+0k 0+0io 0pf+0w
>   lapw1 -p    (09:59:39) starting parallel lapw1 at Sun Nov 14 09:59:39 CET 2010
->  starting parallel LAPW1 jobs at Sun Nov 14 09:59:39 CET 2010
    running LAPW1 in parallel mode (using .machines)
    4 number_of_parallel_jobs
    cn002 cn002 cn002 cn002 cn002 cn002 cn002 cn002(1) WARNING: Unable to read mpd.hosts or list of hosts isn't provided. MPI job will be run on the current machine only.
rank 5 in job 1  cn002_55855   caused collective abort of all ranks
  exit status of rank 5: killed by signal 9
rank 4 in job 1  cn002_55855   caused collective abort of all ranks
  exit status of rank 4: killed by signal 9
rank 3 in job 1  cn002_55855   caused collective abort of all ranks
  exit status of rank 3: killed by signal 9
[...]

Specifying -hostfile in the WIEN_MPIRUN variable results in the following error:

invalid "local" arg: -hostfile

Thanks in advance for helping us run Wien2k in an MPI setup ;-)

Regards

Stefan Becuwe
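P.S. For completeness, the -hostfile attempt mentioned above was a WIEN_MPIRUN variant roughly along these lines (a sketch, assuming the _HOSTS_ placeholder that the Wien2k parallel scripts substitute with the per-job .machineN file):

setenv WIEN_MPIRUN "mpirun -r ssh -hostfile _HOSTS_ -np _NP_ _EXEC_"

It is this variant that produces the 'invalid "local" arg: -hostfile' message.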