Hi Maxim,

Thanks for your reply! 
We tried MPIRUN='mpirun -np _NP_ -hostfile _HOSTS_ _EXEC_', but the problem 
persists; the only difference is that stdout now reads "MPI: invalid option 
-hostfile".
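For anyone reproducing this: the command that actually runs can be checked by expanding the MPIRUN template by hand. This is a minimal sketch assuming WIEN2k's usual placeholder substitution of _NP_, _HOSTS_ and _EXEC_; the hostfile name and executable below are made up for illustration:

```shell
# Minimal sketch of how WIEN2k's parallel scripts expand the MPIRUN template.
# NP, HOSTS and EXEC values are hypothetical, for illustration only.
MPIRUN='mpirun -np _NP_ -hostfile _HOSTS_ _EXEC_'
NP=16
HOSTS=.machine1     # hypothetical machine file derived from .machines
EXEC=lapw1_mpi      # hypothetical MPI executable
echo "$MPIRUN" | sed -e "s/_NP_/$NP/" -e "s|_HOSTS_|$HOSTS|" -e "s/_EXEC_/$EXEC/"
# -> mpirun -np 16 -hostfile .machine1 lapw1_mpi
```

If the expanded command, run by hand, still prints "MPI: invalid option", the mpirun found first in the PATH is probably not the one whose documentation was consulted; 'which mpirun' will show which launcher is being picked up.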

Thanks,
Wei


On Oct 31, 2010, at 10:40 PM, Maxim Rakitin wrote:

> Hi,
> 
> It looks like Intel's mpirun doesn't have a '-machinefile' option; instead 
> it has a '-hostfile' option (from here: 
> http://downloadmirror.intel.com/18462/eng/nes_release_notes.txt).
> 
> Try 'mpirun -h' for a list of available options and apply the appropriate one.
> Best regards,
>    Maxim Rakitin
>    email: rms85 at physics.susu.ac.ru
>    web: http://www.susu.ac.ru
> 
> 01.11.2010 4:56, Wei Xie wrote:
>> 
>> Dear all WIEN2k community members:
>> 
>> We encountered a problem when running in parallel (k-point, MPI or 
>> both): the calculations crash at LAPW2. Serial runs complete without 
>> problems. We have tried to diagnose the issue, recompiling the code with 
>> different options and testing different cases and parameters based on 
>> similar problems reported on the mailing list, but the problem persists. 
>> So we write here hoping someone can offer a suggestion. We have attached 
>> the relevant files below for your reference. Your replies are appreciated 
>> in advance!
>> 
>> This is a TiC example running in both k-point and MPI parallel mode on two 
>> nodes, r1i0n0 and r1i0n1 (8 cores/node):
>> 
>> 1. stdout (abridged) 
>> MPI: invalid option -machinefile
>> real 0m0.004s
>> user 0m0.000s
>> sys 0m0.000s
>> ...
>> MPI: invalid option -machinefile
>> real 0m0.003s
>> user 0m0.000s
>> sys 0m0.004s
>> TiC.scf1up_1: No such file or directory.
>> 
>> LAPW2 - Error. Check file lapw2.error
>> cp: cannot stat `.in.tmp': No such file or directory
>> rm: cannot remove `.in.tmp': No such file or directory
>> rm: cannot remove `.in.tmp1': No such file or directory
>> 
>> 2. TiC.dayfile (abridged) 
>> ...
>>     start  (Sun Oct 31 16:25:06 MDT 2010) with lapw0 (40/99 to go)
>>     cycle 1  (Sun Oct 31 16:25:06 MDT 2010)  (40/99 to go)
>> 
>> >   lapw0 -p (16:25:06) starting parallel lapw0 at Sun Oct 31 16:25:07 MDT 
>> > 2010
>> -------- .machine0 : 16 processors
>> invalid "local" arg: -machinefile
>> 
>> 0.436u 0.412s 0:04.63 18.1% 0+0k 2600+0io 1pf+0w
>> >   lapw1  -up -p    (16:25:12) starting parallel lapw1 at Sun Oct 31 
>> > 16:25:12 MDT 2010
>> ->  starting parallel LAPW1 jobs at Sun Oct 31 16:25:12 MDT 2010
>> running LAPW1 in parallel mode (using .machines)
>> 2 number_of_parallel_jobs
>>      r1i0n0 r1i0n0 r1i0n0 r1i0n0 r1i0n0 r1i0n0 r1i0n0 r1i0n0(1)      r1i0n1 
>> r1i0n1 r1i0n1 r1i0n1 r1i0n1 r1i0n1 r1i0n1 r1i0n1(1)      r1i0n0 r1i0n0 
>> r1i0n0 r1i0n0 r1i0n0 r1i0n0 r1i0n0 r1i0n0(1)    Summary of lapw1para:
>>    r1i0n0  k=0  user=0  wallclock=0
>>    r1i0n1  k=0  user=0  wallclock=0
>> ...
>> 0.116u 0.316s 0:10.48 4.0% 0+0k 0+0io 0pf+0w
>> >   lapw2 -up -p   (16:25:34) running LAPW2 in parallel mode
>> **  LAPW2 crashed!
>> 0.032u 0.104s 0:01.13 11.5% 0+0k 82304+0io 8pf+0w
>> error: command   /home/xiew/WIEN2k_10/lapw2para -up uplapw2.def   failed
>> 
>> 3. uplapw2.error 
>> Error in LAPW2
>>  'LAPW2' - can't open unit: 18
>>  'LAPW2' -        filename: TiC.vspup
>>  'LAPW2' -          status: old          form: formatted
>> **  testerror: Error in Parallel LAPW2
>> 
>> 4. .machines
>> #
>> 1:r1i0n0:8
>> 1:r1i0n1:8
>> lapw0:r1i0n0:8 r1i0n1:8 
>> granularity:1
>> extrafine:1
>> 
>> 5. compilers, MPI and options
>> Intel Compilers  and MKL 11.1.046
>> Intel MPI 3.2.0.011
>> 
>> current:FOPT:-FR -mp1 -w -prec_div -pc80 -pad -ip -DINTEL_VML -traceback
>> current:FPOPT:-FR -mp1 -w -prec_div -pc80 -pad -ip -DINTEL_VML -traceback
>> current:LDFLAGS:$(FOPT) -L/usr/local/intel/Compiler/11.1/046/mkl/lib/em64t 
>> -pthread
>> current:DPARALLEL:'-DParallel'
>> current:R_LIBS:-lmkl_lapack -lmkl_intel_lp64 -lmkl_intel_thread -lmkl_core 
>> -openmp -lpthread -lguide
>> current:RP_LIBS:-L/usr/local/intel/Compiler/11.1/046/mkl/lib/em64t 
>> -lmkl_scalapack_lp64 
>> /usr/local/intel/Compiler/11.1/046/mkl/lib/em64t/libmkl_solver_lp64.a 
>> -Wl,--start-group -lmkl_intel_lp64 -lmkl_intel_thread -lmkl_core 
>> -lmkl_blacs_intelmpi_lp64 -Wl,--end-group -openmp -lpthread 
>> -L/home/xiew/fftw-2.1.5/lib -lfftw_mpi -lfftw $(R_LIBS)
>> current:MPIRUN:mpirun -np _NP_ -machinefile _HOSTS_ _EXEC_
>> 
>> Best regards,
>> Wei Xie
>> Computational Materials Group
>> University of Wisconsin-Madison
>> 
>> 
>> _______________________________________________
>> Wien mailing list
>> Wien at zeus.theochem.tuwien.ac.at
>> http://zeus.theochem.tuwien.ac.at/mailman/listinfo/wien
