Hi,

It looks like Intel's mpirun doesn't have a '-machinefile' option; instead it has a '-hostfile' option (from here: http://downloadmirror.intel.com/18462/eng/nes_release_notes.txt).
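Since the accepted flag differs between MPI implementations, one way to decide is to scan the local `mpirun -h` output for the flag name. This is only an illustrative sketch: the `pick_hostflag` helper and the sample help strings are invented for the example, not part of any MPI distribution.

```shell
# Sketch: choose a host-list flag based on the text of `mpirun -h`.
# Assumes the flag name appears verbatim in the help output; the
# function name and sample strings are hypothetical.
pick_hostflag() {
  case "$1" in
    *-machinefile*) echo "-machinefile" ;;  # e.g. MPICH-style mpirun
    *-hostfile*)    echo "-hostfile"    ;;  # e.g. Intel MPI per the release notes
    *)              echo "unknown"      ;;
  esac
}

# In practice one would call: pick_hostflag "$(mpirun -h 2>&1)"
pick_hostflag "usage: mpirun [-hostfile <file>] ..."   # prints "-hostfile"
```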
Try 'mpirun -h' for information about the available options and apply the appropriate one.

Best regards,
Maxim Rakitin
email: rms85 at physics.susu.ac.ru
web: http://www.susu.ac.ru

01.11.2010 4:56, Wei Xie wrote:
> Dear all WIEN2k community members:
>
> We encountered some problems when running in parallel (k-point, MPI or
> both): the calculations crashed at LAPW2. Note we had no problem
> running in serial. We have tried to diagnose the problem, recompiled
> the code with different options and tested with different cases and
> parameters based on similar problems reported on the mailing list, but
> the problem persists. So we write here hoping someone can offer us a
> suggestion. We have attached the related files below for your
> reference. Your replies are appreciated in advance!
>
> This is a TiC example running in both k-point and MPI parallel on two
> nodes r1i0n0 and r1i0n1 (8 cores/node):
>
> *1. stdout (abridged)*
> MPI: invalid option -machinefile
> real 0m0.004s
> user 0m0.000s
> sys 0m0.000s
> ...
> MPI: invalid option -machinefile
> real 0m0.003s
> user 0m0.000s
> sys 0m0.004s
> TiC.scf1up_1: No such file or directory.
>
> LAPW2 - Error. Check file lapw2.error
> cp: cannot stat `.in.tmp': No such file or directory
> rm: cannot remove `.in.tmp': No such file or directory
> rm: cannot remove `.in.tmp1': No such file or directory
>
> *2. TiC.dayfile (abridged)*
> ...
> start (Sun Oct 31 16:25:06 MDT 2010) with lapw0 (40/99 to go)
>
> cycle 1 (Sun Oct 31 16:25:06 MDT 2010) (40/99 to go)
>
> >   lapw0 -p (16:25:06) starting parallel lapw0 at Sun Oct 31 16:25:07 MDT 2010
> -------- .machine0 : 16 processors
> invalid "local" arg: -machinefile
> 0.436u 0.412s 0:04.63 18.1% 0+0k 2600+0io 1pf+0w
> >   lapw1 -up -p (16:25:12) starting parallel lapw1 at Sun Oct 31 16:25:12 MDT 2010
> -> starting parallel LAPW1 jobs at Sun Oct 31 16:25:12 MDT 2010
> running LAPW1 in parallel mode (using .machines)
> 2 number_of_parallel_jobs
> r1i0n0 r1i0n0 r1i0n0 r1i0n0 r1i0n0 r1i0n0 r1i0n0 r1i0n0(1)
> r1i0n1 r1i0n1 r1i0n1 r1i0n1 r1i0n1 r1i0n1 r1i0n1 r1i0n1(1)
> r1i0n0 r1i0n0 r1i0n0 r1i0n0 r1i0n0 r1i0n0 r1i0n0 r1i0n0(1)
> Summary of lapw1para:
> r1i0n0 k=0 user=0 wallclock=0
> r1i0n1 k=0 user=0 wallclock=0
> ...
> 0.116u 0.316s 0:10.48 4.0% 0+0k 0+0io 0pf+0w
> >   lapw2 -up -p (16:25:34) running LAPW2 in parallel mode
> ** LAPW2 crashed!
> 0.032u 0.104s 0:01.13 11.5% 0+0k 82304+0io 8pf+0w
> error: command /home/xiew/WIEN2k_10/lapw2para -up uplapw2.def failed
>
> *3. uplapw2.error*
> Error in LAPW2
> 'LAPW2' - can't open unit: 18
> 'LAPW2' - filename: TiC.vspup
> 'LAPW2' - status: old form: formatted
> ** testerror: Error in Parallel LAPW2
>
> *4. .machines*
> #
> 1:r1i0n0:8
> 1:r1i0n1:8
> lapw0:r1i0n0:8 r1i0n1:8
> granularity:1
> extrafine:1
>
> *5. compilers, MPI and options*
> Intel Compilers and MKL 11.1.046
> Intel MPI 3.2.0.011
>
> current:FOPT:-FR -mp1 -w -prec_div -pc80 -pad -ip -DINTEL_VML -traceback
> current:FPOPT:-FR -mp1 -w -prec_div -pc80 -pad -ip -DINTEL_VML -traceback
> current:LDFLAGS:$(FOPT) -L/usr/local/intel/Compiler/11.1/046/mkl/lib/em64t -pthread
> current:DPARALLEL:'-DParallel'
> current:R_LIBS:-lmkl_lapack -lmkl_intel_lp64 -lmkl_intel_thread -lmkl_core -openmp -lpthread -lguide
> current:RP_LIBS:-L/usr/local/intel/Compiler/11.1/046/mkl/lib/em64t -lmkl_scalapack_lp64 /usr/local/intel/Compiler/11.1/046/mkl/lib/em64t/libmkl_solver_lp64.a -Wl,--start-group -lmkl_intel_lp64 -lmkl_intel_thread -lmkl_core -lmkl_blacs_intelmpi_lp64 -Wl,--end-group -openmp -lpthread -L/home/xiew/fftw-2.1.5/lib -lfftw_mpi -lfftw $(R_LIBS)
> current:MPIRUN:mpirun -np _NP_ -machinefile _HOSTS_ _EXEC_
>
> Best regards,
> Wei Xie
> Computational Materials Group
> University of Wisconsin-Madison
>
> _______________________________________________
> Wien mailing list
> Wien at zeus.theochem.tuwien.ac.at
> http://zeus.theochem.tuwien.ac.at/mailman/listinfo/wien
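Following the reply's suggestion, the quoted MPIRUN template would need its '-machinefile' flag replaced by '-hostfile'. A minimal sketch of that substitution, assuming the template lives in a WIEN2k config file such as $WIENROOT/parallel_options (the path, and whether '-hostfile' is the right flag for your mpirun, are assumptions to check against your own `mpirun -h` output):

```shell
# Sketch: swap '-machinefile' for '-hostfile' in the WIEN2k MPIRUN
# template. In a real install this line sits in a config file (path
# assumed: $WIENROOT/parallel_options); here we transform it inline.
line='current:MPIRUN:mpirun -np _NP_ -machinefile _HOSTS_ _EXEC_'
fixed=$(printf '%s\n' "$line" | sed 's/-machinefile/-hostfile/')
printf '%s\n' "$fixed"
# prints: current:MPIRUN:mpirun -np _NP_ -hostfile _HOSTS_ _EXEC_
```

The same `sed` expression could be applied in place to the config file (with a backup) once the correct flag name is confirmed.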