Thanks for the suggestion regarding the .processes file; this will probably come in handy at a later stage. Regarding the qtl program, my end goal is to calculate an ELNES spectrum for the structures I am investigating. To this end, is there any difference between running 'x lapw2 -p -qtl' and 'x qtl -p -telnes' (assuming case.innes is present)? Specifically, will the workflow:

script 1:
    run_lapw -p
    x -qtl -telnes
    x telnes3

perform the same task as this:

script 1:
    run_lapw

followed by script 2:
    lapw1 -p -d >&/dev/null
    lapw2 -p -qtl
    x telnes3

My question relates both to the parallelization schemes (assuming I use the same setup of nodes and generate the .machines file in the same way) and to the behaviour of the programs (will lapw2 -p -qtl calculate suitable input files for telnes3, assuming case.innes is present)? I realize that telnes3 can probably be run locally, but I included the command for completeness.

Best regards
Christian

________________________________
From: Wien <wien-boun...@zeus.theochem.tuwien.ac.at> on behalf of Peter Blaha <pbl...@theochem.tuwien.ac.at>
Sent: 12 October 2020 11:58:22
To: wien@zeus.theochem.tuwien.ac.at
Subject: Re: [Wien] .machines for several nodes

Yes, this is ok when you have nodes with 16 cores !!! (Only the lapw0 line could use :16 instead of 8 if you have 96 atoms, but most likely this is fairly negligible).

Yes, the QTL calculation in lapw2 is also affected by the parallelization, but it reads from a .processes file, which is created by lapw1. If you run x lapw2 -p -qtl in an extra job, you should add the following line to create a "correct" .processes file:

x lapw1 -p -d >&/dev/null   # Create .processes (necessary for standalone-lapw2)

On 10/12/20 11:45 AM, Christian Søndergaard Pedersen wrote:
> This went a long way towards clearing up my confusion, thanks again. I
> will try starting an MPI-parallel calculation for 4 nodes with 16 cores
> each using the following .machines-file:
>
> 1:g008:16
> 1:g021:16
> 1:g025:16
> 1:g028:16
> lapw0: g008:8 g021:8 g025:8 g028:8
> dstart: g008:8 g021:8 g025:8 g028:8
>
> ... and see how it performs.
> If the matrix sizes are small, I understand that I could also have each
> node work on 2 (or more) k-points at the same time, by specifying:
>
> 1:g008:8
> 1:g008:8
> 1:g021:8
> 1:g021:8
> 1:g025:8
> 1:g025:8
> 1:g028:8
> 1:g028:8
>
> so that for instance g008 will work on 2 k-points using 8 cores for each
> k-point, am I right? And a (hopefully) final question: since qtl
> according to the manual runs in k-point parallel, is it also affected by
> the parallelization scheme specified for lapw1 and lapw2 (unless I
> deliberately change it)?
>
> ------------------------------------------------------------------------
> *From:* Wien <wien-boun...@zeus.theochem.tuwien.ac.at> on behalf of Ruh,
> Thomas <thomas....@tuwien.ac.at>
> *Sent:* 12 October 2020 10:59:09
> *To:* A Mailing list for WIEN2k users
> *Subject:* Re: [Wien] .machines for several nodes
>
> I am afraid there is still some confusion.
>
> First about /lapw1/:
>
> Sorry for my unclear statement - I meant that you need one line per
> k-parallel job in the sense that #lines k-points are run simultaneously,
> i.e. if you specify this part of the machines file like this:
>
> 1:g008:16
> 1:g021:16
> 1:g025:16
> 1:g028:16
>
> your k-point list will be split into 4 parts of 56 k-points each [1],
> which will be processed step-by-step. Node g008 will work on its first
> k-point, while node g021 will do the same for its first k-point, and so on.
>
> You need the ":16" after the name of the node. Otherwise, on every node
> only *one* core would be used. Whether it is useful to use 16 MPI-parallel
> jobs per k-point (meaning that the matrices will be distributed over 16
> cores, with each core getting only 1/16 of the matrix elements) depends on
> your matrix sizes (which in turn depend on your rkmax). You should check
> that by grepping :rkm in your case.scf file. If the matrix size there is
> small, using OMP_NUM_THREADS 16 might be much faster (since MPI adds
> overhead to your calculation).
>
> Regarding /lapw0/dstart/:
>
> The way you set the calculation up could lead to (possibly severe)
> overloading of your nodes: WIEN2k will start 24 jobs on each node (so
> 1.5 times the number of cores) at the same time, each doing the
> calculation for 1 atom.
>
> As one possible alternative, you specify only 8 cores per node (i.e. for
> example "lapw0: g008:8" and so on), giving 8 jobs per node, which would
> lead to step-by-step calculations for 3 atoms per core.
>
> Which option is faster is hard to tell and depends a lot on your hardware.
>
> So what you could do - in principle - is to test multiple configurations
> (you can modify your .machines file on the fly during an SCF run) in the
> first cycles, compare the times (in case.dayfile), and use the faster
> one for the rest of the run.
>
> Regards,
> Thomas
>
> [1] Sidenote: This splitting is controlled by the first number - in this
> case 4 equal sublists will be set up - you could also specify different
> "weights", for instance, if your nodes are of different speeds, the
> machines file could then read for example:
>
> 3:g008:16
> 2:g021:16
> 2:g025:16
> 1:g028:16
>
> In this case, the first node would "get" 3/8 of the k-points (84), nodes
> g021 and g025 would get 2/8 each (56), and the last one (because it is
> very slow) would get only 28 k-points.
>
> ------------------------------------------------------------------------
> *From:* Wien <wien-boun...@zeus.theochem.tuwien.ac.at> on behalf of
> Christian Søndergaard Pedersen <chr...@dtu.dk>
> *Sent:* Monday, 12 October 2020 10:24
> *To:* A Mailing list for WIEN2k users
> *Subject:* Re: [Wien] .machines for several nodes
>
> Thanks a lot for your answer. After re-reading the relevant pages in the
> User Guide, I am still left with some questions. Specifically, I am
> working with a system containing 96 atoms (as described in the
> case.struct-file) and 224 inequivalent k points; i.e.
> 500 k points distributed as a 7x8x8 grid (448 total) reduced to 224
> k points. Running on 4 nodes each with 16 cores, I want each of the
> 4 nodes to calculate 56 k points (224/4 = 56). Meanwhile, each node
> should handle 24 atoms (96/4 = 24).
>
> Part of my confusion stems from your suggestion that I repeat the line
> "1:g008:4 [...]" a number of times equal to the number of k points I
> want to run in parallel, and that each repetition should refer to a
> different node. The reason is that the line in question already contains
> the names of all four nodes that were assigned to the job. However,
> combining your advice with the example on page 86, the lines should read:
>
> 1:g008
> 1:g021
> 1:g025
> 1:g028 # k points distributed over 4 jobs, running on 1 node each
> extrafine:1
>
> As for the parallelization over atoms for dstart and lapw0, I
> understand that the numbers assigned to each individual node should sum
> up to the number of atoms in the system, like this:
>
> dstart:g008:24 g021:24 g025:24 g028:24
> lapw0:g008:24 g021:24 g025:24 g028:24
>
> so the final .machines-file would be a combination of the above pieces.
> Have I understood this correctly, or am I missing the mark? Also, is
> there any difference between distributing the k points across four jobs
> (1 for each node), and across 224 jobs (by repeating each of the 1:gxxx
> lines 56 times)?
>
> Best regards
> Christian
>
> ------------------------------------------------------------------------
> *From:* Wien <wien-boun...@zeus.theochem.tuwien.ac.at> on behalf of Ruh,
> Thomas <thomas....@tuwien.ac.at>
> *Sent:* 12 October 2020 09:29:37
> *To:* A Mailing list for WIEN2k users
> *Subject:* Re: [Wien] .machines for several nodes
>
> Hi,
>
> your .machines is wrong.
>
> The nodes for /lapw1/ are prefaced not with "lapw1:" but only with "1:".
> /lapw2/ needs no line, as it takes the same nodes as lapw1 before.
>
> So an example for your use case would be:
>
> #
> dstart:g008:4 g021:4 g025:4 g028:4
> lapw0:g008:4 g021:4 g025:4 g028:4
> 1:g008:4 g021:4 g025:4 g028:4
> granularity:1
> extrafine:1
>
> The line starting with "1:" has to be repeated (with different nodes, of
> course) x times, if you want to run x k-points in parallel (you can find
> more details about this in the usersguide, pages 84-91).
>
> Regards,
> Thomas
>
> PS: As a sidenote: Both /dstart/ and /lapw0/ parallelize over atoms, so
> 16 nodes might not be the best choice for your example.
>
> ------------------------------------------------------------------------
> *From:* Wien <wien-boun...@zeus.theochem.tuwien.ac.at> on behalf of
> Christian Søndergaard Pedersen <chr...@dtu.dk>
> *Sent:* Monday, 12 October 2020 09:06
> *To:* wien@zeus.theochem.tuwien.ac.at
> *Subject:* [Wien] .machines for several nodes
>
> Hello everybody
>
> I am new to WIEN2k, and am struggling with parallelizing calculations
> on our HPC cluster beyond what can be achieved using OMP. In particular,
> I want to execute run_lapw and/or runsp_lapw running on four identical
> nodes (16 cores each), parallelizing over k points (unless there's a
> more efficient scheme). To achieve this, I try to mimic the example from
> the User Guide (without the extra Alpha node), but my .machines-file
> does not work the way I intended. This is what I have:
>
> #
> dstart:g008:4 g021:4 g025:4 g028:4
> lapw0:g008:4 g021:4 g025:4 g028:4
> lapw1:g008:4 g021:4 g025:4 g028:4
> lapw2:g008:4 g021:4 g025:4 g028:4
> granularity:1
> extrafine:1
>
> The node names gxxx are read from SLURM_JOB_NODELIST in the submit
> script, and a couple of regular expressions generate the above lines.
> Afterwards, my job script does the following:
>
> srun hostname -s > slurm.hosts
> run_lapw -p
>
> which results in a job that idles for the entire walltime and finishes
> with a CPU efficiency of 0.00%.
> I would appreciate any help in figuring out where I've gone wrong.
>
> Best regards
> Christian
>
> _______________________________________________
> Wien mailing list
> Wien@zeus.theochem.tuwien.ac.at
> http://zeus.theochem.tuwien.ac.at/mailman/listinfo/wien
> SEARCH the MAILING-LIST at:
> http://www.mail-archive.com/wien@zeus.theochem.tuwien.ac.at/index.html
>

--
P.Blaha
--------------------------------------------------------------------------
Peter BLAHA, Inst.f. Materials Chemistry, TU Vienna, A-1060 Vienna
Phone: +43-1-58801-165300             FAX: +43-1-58801-165982
Email: bl...@theochem.tuwien.ac.at    WIEN2k: http://www.wien2k.at
WWW:   http://www.imc.tuwien.ac.at/TC_Blaha
--------------------------------------------------------------------------
_______________________________________________
Wien mailing list
Wien@zeus.theochem.tuwien.ac.at
http://zeus.theochem.tuwien.ac.at/mailman/listinfo/wien
SEARCH the MAILING-LIST at:
http://www.mail-archive.com/wien@zeus.theochem.tuwien.ac.at/index.html
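
Editor's sketch: the .machines layout that the thread converges on (one "1:" line per k-parallel job, plus single lapw0 and dstart lines listing all nodes) could be generated in a submit script roughly as follows. The function name make_machines and its argument order are hypothetical and not part of WIEN2k; the node names and core counts are the ones quoted in the thread. Treat this as a minimal illustration under those assumptions, not as the recommended way to build the file.

```shell
#!/usr/bin/env bash
# make_machines is a hypothetical helper (not shipped with WIEN2k).
# Usage: make_machines CORES_PER_KPOINT_JOB CORES_FOR_ATOM_PARALLEL node1 node2 ...
make_machines() {
    local kcores=$1 acores=$2
    shift 2
    local node atoms=""
    # One "1:" line per k-parallel job; ":N" asks for N MPI processes on that node.
    for node in "$@"; do
        printf '1:%s:%s\n' "$node" "$kcores"
    done
    # lapw0 and dstart parallelize over atoms; all nodes go on a single line each.
    for node in "$@"; do
        atoms+=" ${node}:${acores}"
    done
    printf 'lapw0:%s\n' "$atoms"
    printf 'dstart:%s\n' "$atoms"
    # Taken over unchanged from the example .machines file in the thread.
    printf 'granularity:1\nextrafine:1\n'
}

# Example with the four nodes from the thread: 16 cores per k-point job,
# 8 cores per node for lapw0/dstart.
make_machines 16 8 g008 g021 g025 g028
```

Inside a Slurm job, the node list would typically come from `scontrol show hostnames "$SLURM_JOB_NODELIST"`, with the function output redirected to `.machines` before calling `run_lapw -p`.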
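
Editor's sketch: the arithmetic behind the weighted split in footnote [1] (weights 3:2:2:1 over 224 k-points giving 84, 56, 56 and 28) can be checked with a few lines of shell. The helper kpoint_shares is hypothetical and only reproduces the proportional share total * weight / sum of weights; how WIEN2k itself distributes k-points when the division is not exact is not covered here.

```shell
#!/usr/bin/env bash
# kpoint_shares is a hypothetical helper for checking the proportions only.
# Usage: kpoint_shares TOTAL_KPOINTS weight1 weight2 ...
kpoint_shares() {
    local total=$1
    shift
    local w sum=0
    # Sum of the weights (the leading numbers on the "N:node:cores" lines).
    for w in "$@"; do
        sum=$((sum + w))
    done
    # Proportional share for each weight, using integer division.
    for w in "$@"; do
        echo $((total * w / sum))
    done
}

# Weights from the footnote example, 224 k-points in total.
kpoint_shares 224 3 2 2 1
# prints 84, 56, 56 and 28, one per line
```

With equal weights (1 1 1 1) the same call gives 56 k-points per node, matching the 224/4 = 56 figure used earlier in the thread.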