It cannot initialize an MPI job because the interface software is missing.

You need to ask the computing center / system administrators how one executes an MPI job on this computer.

It could be that "mpirun" is not supported on this machine. You may try a WIEN2k installation with system "LS" in siteconfig. This configures the parallel environment/commands using "slurm" commands like srun -K -N_nodes_ -n_NP_ ..., replacing mpirun. We used it once on our HPC machine, since it was recommended by the computing center people. However, it turned out that the standard mpirun installation was more stable, because the "slurm controller" died too often, leading to many random crashes. Anyway, if your system has what is called "tight integration of MPI", it might be necessary.
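
For illustration, with system "LS" siteconfig ends up writing srun-based commands into $WIENROOT/parallel_options in place of mpirun. A minimal sketch of what that might look like (the exact lines are an assumption and vary by installation; check your own file):

# hypothetical $WIENROOT/parallel_options after siteconfig with system "LS"
# (csh syntax; _nodes_, _NP_ and _EXEC_ are WIEN2k placeholders)
setenv WIEN_MPIRUN "srun -K -N_nodes_ -n_NP_ _EXEC_"
setenv USE_REMOTE 0
setenv MPI_REMOTE 0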

On 13.04.2021 at 21:47, leila mollabashi wrote:
Dear Prof. Peter Blaha and WIEN2k users,

Then, when I run x lapw1 -p:

starting parallel lapw1 at Tue Apr 13 21:04:15 CEST 2021

->  starting parallel LAPW1 jobs at Tue Apr 13 21:04:15 CEST 2021

running LAPW1 in parallel mode (using .machines)

2 number_of_parallel_jobs

[1] 14530

[e0467:14538] mca_base_component_repository_open: unable to open mca_btl_uct: libucp.so.0: cannot open shared object file: No such file or directory (ignored)

WARNING: There was an error initializing an OpenFabrics device.

   Local host:   e0467

   Local device: mlx4_0

MPI_ABORT was invoked on rank 1 in communicator MPI_COMM_WORLD

with errorcode 0.

NOTE: invoking MPI_ABORT causes Open MPI to kill all MPI processes.

You may or may not see output from other processes, depending on

exactly when Open MPI kills them.

--------------------------------------------------------------------------

[e0467:14567] 1 more process has sent help message help-mpi-btl-openib.txt / error in device init

[e0467:14567] Set MCA parameter "orte_base_help_aggregate" to 0 to see all help / error messages

[warn] Epoll MOD(1) on fd 27 failed.  Old events were 6; read change was 0 (none); write change was 2 (del): Bad file descriptor
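
One common workaround for such openib/uct failures is to tell Open MPI to skip those transports entirely. The MCA options below are standard Open MPI syntax, but whether they cure this particular crash is an assumption:

# sketch: exclude the failing openib and uct btl components so Open MPI
# falls back to TCP / shared memory (possibly slower across nodes)
export OMPI_MCA_btl="^openib,uct"
# equivalent on the command line, mirroring the mpirun call from the log:
mpirun --mca btl ^openib,uct -np 4 -machinefile .machine0 $WIENROOT/lapw0_mpi lapw0.def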

> Somewhere there should be some documentation on how one runs an MPI job on your system.

I only found this:

Before submitting a task, it should be encapsulated in an appropriate script understandable to the queue system, e.g.:

/home/users/user/submit_script.sl

Sample SLURM script:

#!/bin/bash -l
#SBATCH -N 1
#SBATCH --mem 5000
#SBATCH --time=20:00:00
/path/to/binary/binary_file.in > /path/to/output_file.out

To submit a task to a specific queue, use the #SBATCH -p parameter, e.g.:

#!/bin/bash -l
#SBATCH -N 1
#SBATCH --mem 5000
#SBATCH --time=20:00:00
#SBATCH -p standard
/path/to/binary/binary_file.in > /path/to/output_file.out

The task is then submitted using the *sbatch* command:

sbatch /home/users/user/submit_script.sl
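
For a WIEN2k run, one would wrap the parallel SCF cycle in such a script and build the .machines file from the nodes SLURM actually assigns, since the nodenames are only known at run time. A minimal, untested sketch (partition name, core count, and the plain k-point-parallel .machines layout are assumptions):

#!/bin/bash -l
#SBATCH -N 1
#SBATCH -n 8
#SBATCH --time=20:00:00
#SBATCH -p standard

cd $SLURM_SUBMIT_DIR    # the case directory the job was submitted from

# build .machines dynamically: one "1:host" line per allocated task
# gives one k-point-parallel job per core (single-node case, so all
# $SLURM_NTASKS tasks land on the same host)
rm -f .machines
echo "granularity:1" >> .machines
for host in $(scontrol show hostnames $SLURM_JOB_NODELIST); do
    for i in $(seq 1 $SLURM_NTASKS); do
        echo "1:$host" >> .machines
    done
done
echo "extrafine:1" >> .machines

run_lapw -p    # standard WIEN2k parallel SCF driver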

*Submitting interactive tasks*


Interactive tasks can be divided into two groups:

· interactive task (working in text mode)

· interactive task

*Interactive task (working in text mode)*


Submitting an interactive task is very simple; in the simplest case it comes down to issuing the command below:

srun --pty /bin/bash
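
In practice one usually also requests resources explicitly; the values below (node count, tasks, partition, walltime) are just an illustration:

# example: an interactive shell with 8 tasks on one node in the
# "standard" partition for two hours (all values are assumptions)
srun -N 1 -n 8 -p standard --time=02:00:00 --pty /bin/bash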

Sincerely yours,

Leila Mollabashi


On Wed, Apr 14, 2021 at 12:03 AM leila mollabashi <le.mollaba...@gmail.com> wrote:

    Dear Prof. Peter Blaha and WIEN2k users,

    Thank you for your assistances.

    > At least now the error: "lapw0 not found" is gone. Do you understand why??

    Yes, I think it is because now the path is clearly known.

    > How many slots do you get by this srun command?

    Usually I get a node with 28 CPUs.

    > Is this the node with the name e0591???

    Yes, it is.

    > Of course the .machines file must be consistent (dynamically adapted)
    > with the actual nodename.

    Yes, to do this I use my script.
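
    For reference, a .machines file adapted to the allocated node might look
    like this (hypothetical content for two k-point-parallel jobs on e0467,
    matching the "2 number_of_parallel_jobs" line in the log above):

    granularity:1
    1:e0467
    1:e0467
    extrafine:1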

    When I use “srun --pty -n 8 /bin/bash”, it goes to the node with 8 free
    cores, and when I run x lapw0 -p this happens:

    starting parallel lapw0 at Tue Apr 13 20:50:49 CEST 2021

    -------- .machine0 : 4 processors

    [1] 12852

    [e0467:12859] mca_base_component_repository_open: unable to open
    mca_btl_uct: libucp.so.0: cannot open shared object file: No such
    file or directory (ignored)

    [e0467][[56319,1],1][btl_openib_component.c:1699:init_one_device]
    error obtaining device attributes for mlx4_0 errno says Protocol not
    supported

    [e0467:12859] mca_base_component_repository_open: unable to open
    mca_pml_ucx: libucp.so.0: cannot open shared object file: No such
    file or directory (ignored)

    LAPW0 END

    [1]    Done                          mpirun -np 4 -machinefile
    .machine0 /home/users/mollabashi/v19.2/lapw0_mpi lapw0.def >> .time00

    Sincerely yours,

    Leila Mollabashi




--
--------------------------------------------------------------------------
Peter BLAHA, Inst.f. Materials Chemistry, TU Vienna, A-1060 Vienna
Phone: +43-1-58801-165300             FAX: +43-1-58801-165982
Email: bl...@theochem.tuwien.ac.at    WIEN2k: http://www.wien2k.at
WWW:   http://www.imc.tuwien.ac.at
-------------------------------------------------------------------------
_______________________________________________
Wien mailing list
Wien@zeus.theochem.tuwien.ac.at
http://zeus.theochem.tuwien.ac.at/mailman/listinfo/wien
SEARCH the MAILING-LIST at:  
http://www.mail-archive.com/wien@zeus.theochem.tuwien.ac.at/index.html
