Hi, thank you for the responses.

Yes, sorry, the dayfile was from a different test run. The run using "./wien2k_tasks_v4.sh 2 4" shows it as:

> lapw0 -p (12:51:21) starting parallel lapw0 at Thu Mar 23 12:51:21 CD$
-------- .machine0 : 2 processors
** lapw0 crashed!

The .machines file was generated using:

# create hostfile_tacc from a batch
mpiexec.hydra hostname|cut -d \. -f 1 | sort -n > hostlist_wien2k
# head of machines_kpoint
#
rm .machines
echo '#' > .machines
echo 'granularity:1' >> .machines
# list the hosts in rows for k-point parallelism
awk -v div=$1 '{_=int(NR/(div+1.0e-10))} {a[_]=((a[_])?a[_]FS:x)$1;l=(_>l)?_:l}END{for(i=0;i<=0;++i)print "lapw0:"a[i]":1"}' hostlist_wien2k >>.machines
awk -v div=$2 '{_=int(NR/(div+1.0e-10))} {a[_]=((a[_])?a[_]FS:x)$1;l=(_>l)?_:l}END{for(i=0;i<=l;++i)print "1:"a[i]":1"}' hostlist_wien2k >>.machines
#
# tail of machines_kpoint: allocate remaining k points one by one over all tasks
#
echo 'extrafine:1' >>.machines
# machines_kpoint is end
# cleanup
rm hostlist_wien2k

I believe both fftw and WIEN2k were compiled with the same Intel compilers, but I've attached my WIEN2k options in the second email.

I’ve tried using different “CORES_PER_NODE” settings (16, 64) to match either the number of cores per node I request or the total number of cores per node, but the error is still the same. Running x lapw0 followed by x lapw1 -p in my job script leads to:

 LAPW0 END
forrtl: No such file or directory
forrtl: severe (28): CLOSE error, unit 200, file "Unknown"
Image              PC                Routine            Line        Source
lapw1_mpi          00000000004DCBAB  Unknown            Unknown     Unknown
lapw1_mpi          00000000004CED9F  Unknown            Unknown     Unknown
lapw1_mpi          000000000045DEE3  inilpw_            264         inilpw.f
lapw1_mpi          0000000000462050  MAIN__             48          lapw1_tmp_.F
lapw1_mpi          0000000000408362  Unknown            Unknown     Unknown
libc-2.28.so       0000147E06BC9CF3  __libc_start_main  Unknown     Unknown
lapw1_mpi          000000000040826E  Unknown            Unknown     Unknown
srun: error: c306-005: task 0: Exited with exit code 28
forrtl: No such file or directory
forrtl: severe (28): CLOSE error, unit 200, file "Unknown"

Any additional help/information would be greatly appreciated.

Regards,
Brian Lee | Graduate Student
The University of Texas at Austin | Texas Materials Institute
(he/him/his)

On Thu, Mar 23, 2023 at 3:51 PM Peter Blaha <peter.bl...@tuwien.ac.at> wrote:

> My guess would be that you link with a fftw which is compiled with
> gfortran, while wien2k is compiled with ifort (or the opposite, or different
> compiler versions...).
>
> Or it was compiled with proper compilers, but the mpi was mixed (openmpi
> vs intelmpi, ...).
>
> You can also try to run only
>
> x lapw0 (serial, so that you get proper vsp and vns files for lapw1)
>
> x lapw1 -p in mpi-mode. lapw1 does not link fftw (but scalapack and
> hopefully elpa).
>
> Otherwise your report cannot be fully correct:
>
> You claim that you requested 2 cores for lapw0 and part of your email
> supports this.
>
> However, I do not understand why the dayfile claims to have 4 cores in
> .machine0 ???
>
> About the way wien2k launches mpi jobs: You can "see" how it does it in
> the error logs:
>
> srun -K -N1 -n2 -r0 /home1/08844/leebrian/wien2k/lapw0_mpi lapw0.def >> .time00
>
> Your sysadmins can check this command and you can put this line in your
> submit script and test it.
>
> PS: In any case, you request 4 nodes and in total 64 cores.
>
> But with this .machines file you use only 2 cores in lapw0 and 16 in
> lapw1/2. This wastes your cpu-hours.
>
> Check the part of your script (wien2k_tasks... ????) that generates the
> .machines file.
>
> PS: What is your CORES_PER_NODE setting ?
>
> PPS: The message from L.Marks that you need a ":number" in the .machines
> file is not true. It is perfectly ok and the same to use node:1 or
> only node.
>
> On 23.03.2023 at 19:14, Brian Lee wrote:
>
> Hello WIEN2k users/developers,
>
> I am a graduate student at UT Austin in the MS&E program and would like to
> test WIEN2k_23.2 using various parallelization schemes. When I try to run
> “run_lapw -p” with the default MPI run command suggested during siteconfig
> along with a .machines file/job script that requests 2 processors per lapw0
> and/or 2 processors per kpt, I receive the following error:
>
> --
> -----------------------------------------------------------------------
> Peter Blaha, Inst. f. Materials Chemistry, TU Vienna, A-1060 Vienna
> Phone: +43-158801165300
> Email: peter.bl...@tuwien.ac.at
> WWW: http://www.imc.tuwien.ac.at WIEN2k: http://www.wien2k.at
> -------------------------------------------------------------------------
>
> _______________________________________________
> Wien mailing list
> Wien@zeus.theochem.tuwien.ac.at
> http://zeus.theochem.tuwien.ac.at/mailman/listinfo/wien
> SEARCH the MAILING-LIST at:
> http://www.mail-archive.com/wien@zeus.theochem.tuwien.ac.at/index.html
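
The point in the quoted reply about requested versus used cores can be checked mechanically before submitting. The sketch below is an assumption-laden illustration: `count_machines_cores` is a hypothetical helper, the sample file and node names are made up, and it only understands `lapw0:` lines and numeric-weight k-point lines whose entries are either `host` or `host:N` (counted as N cores, default 1). Other .machines keywords are ignored.

```shell
#!/bin/sh
# Hypothetical helper (not a WIEN2k tool): sums the cores named on the
# lapw0 line and on the k-point lines of a .machines-style file.
# Simplifying assumption: every entry is "host" or "host:N".
count_machines_cores() {
    awk '
        function cores(list,    n, i, parts, tok, total) {
            n = split(list, parts, " ")
            total = 0
            for (i = 1; i <= n; i++) {
                split(parts[i], tok, ":")
                total += (tok[2] != "") ? tok[2] : 1   # "host" counts as 1
            }
            return total
        }
        /^lapw0:/  { sub(/^lapw0:/, "");  lapw0 += cores($0); next }
        /^[0-9]+:/ { sub(/^[0-9]+:/, ""); kpt   += cores($0) }
        END { printf "lapw0=%d kpoint=%d\n", lapw0, kpt }
    ' "$1"
}

# Made-up example file, written to a temporary location.
sample=$(mktemp)
cat > "$sample" <<'EOF'
#
granularity:1
lapw0:nodeA nodeA:1
1:nodeA nodeA nodeA nodeA:1
1:nodeB nodeB nodeB nodeB:1
extrafine:1
EOF
count_machines_cores "$sample"
rm -f "$sample"
```

Under SLURM, the two totals can then be compared with the number of tasks in the allocation (for example the `SLURM_NTASKS` environment variable) to see whether part of the allocation would sit idle.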