Thank you, Erik - it seems that when I keep the num-threads variable in the machine.ini file, it overrides any argument given to the simfactory executable. Removing it appears to fix the problem.
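For reference, the relevant part of the machine entry now looks roughly like this (a sketch only - the section name and the surrounding settings are placeholders for whatever the actual okeanos.ini on this system contains):

```ini
[okeanos]
# ... other machine settings (hostname, queue, ppn, max-num-threads, ...) ...

# Commented out so that --num-threads on the command line takes effect;
# when set here, this value overrode the command-line argument:
# num-threads = 4
```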
If I don't provide --configuration=okeanos and --machine=okeanos, though, it defaults to whichever local (login) node I used for ./simfactory/bin/sim setup. So I suppose I need to specify them in each invocation? After commenting out num-threads in okeanos.ini, here's what works and what doesn't:

topolski@okeanos-login1:~/Cactus> ./simfactory/bin/sim submit GW150914_MPI --parfile GW150914_MPI.rpar --define N 28 --walltime=1:00:00 --procs 192 --num-threads=4
Warning: Unknown machine name nid00069
Error: Unknown local machine nid00069. Please use 'sim setup' to create a local machine entry from the generic template.
Aborting Simfactory.

topolski@okeanos-login1:~/Cactus> ./simfactory/bin/sim submit GW150914_MPI --parfile GW150914_MPI.rpar --define N 28 --walltime=1:00:00 --procs 192 --num-threads=4 --machine=okeanos
Warning: Current Working directory does not match Cactus sourcetree, changing to /home/topolski/Cactus
Warning: simulation "GW150914_MPI" does not exist or is not readable
Parameter file: /lustre/tetyda/home/topolski/Cactus/GW150914_MPI.rpar
Error: Executable /home/topolski/Cactus/exe/cactus_sim for configuration sim does not exist or is not readable
Aborting Simfactory.
topolski@okeanos-login1:~/Cactus> ./simfactory/bin/sim create-submit GW150914_MPI --parfile GW150914_MPI.rpar --define N 28 --walltime=1:00:00 --procs 192 --num-threads=4 --machine=okeanos --configuration=okeanos
Warning: Current Working directory does not match Cactus sourcetree, changing to /home/topolski/Cactus
Parameter file: /lustre/tetyda/home/topolski/Cactus/GW150914_MPI.rpar
Skeleton Created
Job directory: "/home/topolski/simulations/GW150914_MPI"
Executable: "/home/topolski/Cactus/exe/cactus_okeanos"
Option list: "/home/topolski/simulations/GW150914_MPI/SIMFACTORY/cfg/OptionList"
Submit script: "/home/topolski/simulations/GW150914_MPI/SIMFACTORY/run/SubmitScript"
Run script: "/home/topolski/simulations/GW150914_MPI/SIMFACTORY/run/RunScript"
Parameter file: "/home/topolski/simulations/GW150914_MPI/SIMFACTORY/par/GW150914_MPI.rpar"
Assigned restart id: 0
Executing submit command: sbatch /home/topolski/simulations/GW150914_MPI/output-0000/SIMFACTORY/SubmitScript
Submit finished, job id is 739787

And this last attempt yields the desired result. So I suppose this is the correct way?

Wed, 14 Jul 2021 at 01:25 Erik Schnetter <[email protected]> wrote:

> Konrad
>
> Changing the number of MPI processes and OpenMP threads with
> Simfactory when restarting works the same way as setting them in the
> first place. For example, your first run might be submitted with
>
> ./simfactory/bin/sim submit poisson --parfile=poisson.par --procs=120 --num-threads=4
>
> You can restart this simulation with
>
> ./simfactory/bin/sim submit poisson
>
> which will re-use the original settings. You can also restart with
>
> ./simfactory/bin/sim submit poisson --procs=160 --num-threads=8
>
> to change these settings.
>
> If this does not work, then your machine might be configured wrong.
> For example, you say that you specify the number of MPI processes by
> setting "--num-threads", which sounds suspicious.
>
> The default-generated machine configuration only works for
> workstations or laptops. If you run this script on an HPC system, it
> will generate a nonsense configuration, and might even hide a "real"
> configuration if one is present.
>
> -erik
>
> On Tue, Jul 13, 2021 at 5:23 PM Konrad Topolski <[email protected]> wrote:
> >
> > Hi,
> >
> > I am currently trying to find what the optimal number of MPI processes is for my purposes.
> > I have managed to change the number of MPI processes when restarting a simulation from a checkpoint - but using the bare executable, not simfactory.
> >
> > Now, I would like to learn how to do it in simfactory.
> >
> > I have learned that to successfully steer the number of threads per MPI process (which, combined with the total number of threads requested, yields the total number of MPI processes), I change the num-threads variable in the machine.ini file.
> > This is probably (certainly?) suboptimal, so if there's a proper way, I'd like to learn it.
> >
> > I submit/recover simulations via
> > ./simfactory/bin/sim submit <sim_name> --parfile <parfile_name> --recover --procs NUM_PROCS --machine=okeanos --configuration=okeanos
> >
> > If I don't use the --machine option specifying my cluster, it will default to some config with max nodes = 1 (generic?). Which is why I steer MPI processes via num-threads.
> >
> > Trying to recover a simulation via simfactory with a new machine file (with num-threads changed) yields an error message:
> >
> > INFO (Carpet): MPI is enabled
> > INFO (Carpet): Carpet is running on 4 processes
> > WARNING level 0 from host nid00392 process 0
> >   in thorn Carpet, file /lustre/tetyda/home/topolski/Cactus/arrangements/Carpet/Carpet/src/SetupGH.cc:148:
> >   -> The environment variable CACTUS_NUM_PROCS is set to 96, but there are 4 MPI processes. This may indicate a severe problem with the MPI startup mechanism.
> >
> > What can I do to recover a simulation via simfactory and use a different number of MPI processes?
> >
> > While I'm at it, can I also change parameters such as the number of refinement levels, or make new guesses for AHFinderDirect in case the previously-used parameters did not provide high enough resolution for a successful find?
> >
> > Best regards
> > Konrad Topolski
> >
> > _______________________________________________
> > Users mailing list
> > [email protected]
> > http://lists.einsteintoolkit.org/mailman/listinfo/users
>
> --
> Erik Schnetter <[email protected]>
> http://www.perimeterinstitute.ca/personal/eschnetter/
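As a sanity check on the arithmetic in the thread above: --procs gives the total number of threads requested, --num-threads the OpenMP threads per MPI rank, so the MPI process count is their quotient. A minimal sketch (plain Python, nothing Simfactory-specific; the function name is made up for illustration):

```python
def mpi_ranks(procs: int, num_threads: int) -> int:
    """MPI process count implied by a total thread count (--procs)
    and the OpenMP threads per rank (--num-threads)."""
    if procs % num_threads != 0:
        raise ValueError("--procs must be a multiple of --num-threads")
    return procs // num_threads

# The submission above: --procs 192 --num-threads=4
print(mpi_ranks(192, 4))   # -> 48 MPI ranks

# Erik's example: --procs=120 --num-threads=4
print(mpi_ranks(120, 4))   # -> 30 MPI ranks
```

This also explains the Carpet warning quoted earlier: a stale num-threads in the machine file changes this quotient, so CACTUS_NUM_PROCS no longer matches the number of MPI processes that actually start.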
_______________________________________________
Users mailing list
[email protected]
http://lists.einsteintoolkit.org/mailman/listinfo/users
