Your script is definitely not ok. You should most likely take the SLURM batch file from our website (FAQ page) as a basis.
-----------------------
Please check the created .machines file.

You do not have a line:

lapw0:host:xx     (where xx is the number of cores, ...)

Therefore the message:  lapw0 not found  (instead of lapw0_mpi not found).

So the script tried to run lapw0 in serial (not mpi-parallel).
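For comparison, a minimal .machines sketch for 4 cores on one node (the host name node01 is only a placeholder; on SLURM take the real names from your allocation):

# .machines sketch (hypothetical host node01)
# mpi-parallel lapw0 on 4 cores:
lapw0:node01:4
# two k-point groups, lapw1/lapw2 on 2 cores each:
1:node01:2
1:node01:2
granularity:1
extrafine:1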
------------------------------
Still, the fact that the serial lapw0 was not found means that the environment in your batch job is not ok either.

Maybe you need lines like:

source ~/.bashrc

and checks like:

echo $WIENROOT
which lapw0
which lapw0_mpi
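
Putting this together, a minimal sketch of a SLURM job script in the spirit of the FAQ example (the #SBATCH values are assumptions; adapt them to your cluster):

#!/bin/bash -l
#SBATCH --job-name=wien2k
#SBATCH --nodes=1
#SBATCH --ntasks=4            # reserves the 4 MPI slots used below

# set up the WIEN2k environment in the batch shell
source ~/.bashrc

# sanity checks: each should print a valid path, not "not found"
echo $WIENROOT
which lapw0
which lapw0_mpi

cd $SLURM_SUBMIT_DIR

# write the .machines file for 1 node / 4 cores
echo '#'                    >  .machines
echo "lapw0:$(hostname):4"  >> .machines
echo "1:$(hostname):2"      >> .machines
echo "1:$(hostname):2"      >> .machines
echo 'granularity:1'        >> .machines
echo 'extrafine:1'          >> .machines

run_lapw -p

Requesting --ntasks=4 in the allocation also gives Open MPI the slots that mpirun asks for, which is exactly what the "not enough slots" message below complains about.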



On 28.03.2021 at 22:37, leila mollabashi wrote:
Dear Wien2k users,

I have a problem with MPI parallelization, although I have compiled the code without errors. WIEN2k v19.2 has been compiled with ifort, cc and Open MPI, using the MKL and FFTW libraries. On the SLURM cluster I can run in k-point parallel mode, but I could not run in MPI parallel mode, even on 1 node. I used this command to run:

sbatch submit_script.sl

where the submit_script.sl file is, for example, as follows:

#!/bin/bash -l
hostname
rm -fr .machines
# for 4 cpus and kpoints (in input file)
nproc=4
# write .machines file
echo '#' .machines
# example for an MPI parallel lapw0
echo 'lapw0:'`hostname`'
#:'$nproc >> .machines
# k-point and mpi parallel lapw1/2
echo '1:'`hostname`':2' >> .machines
echo '1:'`hostname`':2' >> .machines
echo 'granularity:1' >> .machines
echo 'extrafine:1' >> .machines
run_lapw -p

Then this error appears:

error: command /home/users/mollabashi/v19.2/lapw0para lapw0.def failed

The slurm-17032361.out file is as follows:

# .machines
bash: lapw0: command not found
real    0m0.001s
user    0m0.000s
sys     0m0.001s
grep: *scf1*: No such file or directory
grep: lapw2*.error: No such file or directory
   stop error

Then, when I run it manually, this error appears:

There are not enough slots available in the system to satisfy the 4
slots that were requested by the application:

  /home/users/mollabashi/v19.2/lapw0_mpi

Either request fewer slots for your application, or make more slots
available for use.

A "slot" is the Open MPI term for an allocatable unit where we can
launch a process. The number of slots available are defined by the
environment in which Open MPI processes are run:

  1. Hostfile, via "slots=N" clauses (N defaults to number of
     processor cores if not provided)
  2. The --host command line parameter, via a ":N" suffix on the
     hostname (N defaults to 1 if not provided)
  3. Resource manager (e.g., SLURM, PBS/Torque, LSF, etc.)
  4. If none of a hostfile, the --host command line parameter, or an
     RM is present, Open MPI defaults to the number of processor cores

In all the above cases, if you want Open MPI to default to the number
of hardware threads instead of the number of processor cores, use the
--use-hwthread-cpus option.

Alternatively, you can use the --oversubscribe option to ignore the
number of available slots when deciding the number of processes to
launch.
--------------------------------------------------------------------------
[1]    Exit 1                        mpirun -np 4 -machinefile .machine0 /home/users/mollabashi/v19.2/lapw0_mpi lapw0.def >> .time00
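
As a footnote to the slot rules quoted above: for a manual test outside a SLURM allocation, the missing slots can be supplied via a hostfile ("slots=N", point 1), or mpirun can be allowed to oversubscribe. A minimal hostfile sketch with a hypothetical host name:

# hostfile: grant 4 slots on one node (node01 is a placeholder)
node01 slots=4

used e.g. as mpirun -np 4 -machinefile hostfile $WIENROOT/lapw0_mpi lapw0.def; alternatively, mpirun --oversubscribe -np 4 ... ignores the slot limit altogether.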

--------------------------------------------------------------------------

The same "not enough slots" message is then printed twice more for /home/users/mollabashi/v19.2/lapw1_mpi (which requested 2 slots), each time followed by:

[1]  + Done                          ( cd $PWD; $t $ttt; rm -f .lock_$lockfile[$p] ) >> .time1_$loop

ce.scf1_1: No such file or directory.
grep: *scf1*: No such file or directory
LAPW2 - Error. Check file lapw2.error
cp: cannot stat ‘.in.tmp’: No such file or directory
grep: *scf1*: No such file or directory
   stop error

Would you please kindly guide me?

Sincerely yours,

Leila Mollabashi




--
--------------------------------------------------------------------------
Peter BLAHA, Inst.f. Materials Chemistry, TU Vienna, A-1060 Vienna
Phone: +43-1-58801-165300             FAX: +43-1-58801-165982
Email: bl...@theochem.tuwien.ac.at    WIEN2k: http://www.wien2k.at
WWW:   http://www.imc.tuwien.ac.at
-------------------------------------------------------------------------