You state "However, a .machines file with several machines will run using all
required CPUs on the machine where launched (ignoring hosts)."

That suggests that the command used to launch the MPI task is not configured
correctly. Without knowing what that command is on your system (mpirun,
srun, or something else), it is impossible to say more than this.
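
For orientation, here is a rough sketch of where that command usually lives,
assuming a standard WIEN2k installation in which siteconfig_lapw has written
$WIENROOT/parallel_options. The mpirun options shown are an assumption for a
generic mpirun-based launcher and must be adapted to your MPI library or
queuing system; _NP_, _HOSTS_ and _EXEC_ are the placeholders that the WIEN2k
parallel scripts fill in at run time.

```csh
# $WIENROOT/parallel_options (csh syntax).
# All values are illustrative assumptions, not a recommendation for your site.
setenv USE_REMOTE 1        # 1: start k-point parallel jobs on the listed hosts via ssh
                           # 0: start them locally in the background (shared memory)
setenv MPI_REMOTE 0        # 0: start the MPI launcher on the local (master) machine
setenv WIEN_GRANULARITY 1
# Assumed launcher line for a generic mpirun; adapt it to your MPI installation.
setenv WIEN_MPIRUN "mpirun -np _NP_ -machinefile _HOSTS_ _EXEC_"
```

If the launcher line does not pass the host list (the _HOSTS_ file) on to the
MPI runtime, all processes may end up on the machine where the job was
started, which would match what you describe. With a batch system such as
Slurm the line would use srun instead, with options specific to your site.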


On Mon, Sep 10, 2018, 13:57 Luc Fruchter <luc.fruch...@u-psud.fr> wrote:

> Dear users,
>
> I have failed to configure the parallel options to run cases on several
> machines, each of them with several CPUs, driven over ssh.
>
> * Configuring the parallel options with: shared memory, MPI = 0, ssh
> protocol, allows parallel jobs to run using several CPUs on the same
> machine. However, a .machines file with several machines will run using
> all required CPUs on the machine where launched (ignoring hosts).
>
> * Configuring with: no shared memory, MPI = 0, ssh protocol, runs no
> parallel jobs at all, on either the same or different machines (the
> output for the error in this case is below).
>
> All machines communicate without problem with ssh and no password, and
> have identical file paths.
>
> Thanks for helping
>
> ------------------------------------------------------------------
>
>  >   lapw0  -p  (20:33:36) starting parallel lapw0 at Mon Sep 10 20:33:36 CEST 2018
> -------- .machine0 : processors
> running lapw0 in single mode
> 6.793u 0.073s 0:06.86 100.0%    0+0k 0+5152io 0pf+0w
>  >   lapw1  -p          (20:33:43) starting parallel lapw1 at Mon Sep 10 20:33:43 CEST 2018
> ->  starting parallel LAPW1 jobs at Mon Sep 10 20:33:43 CEST 2018
> running LAPW1 in parallel mode (using .machines)
> 1 number_of_parallel_jobs
>       localhost(48)    Summary of lapw1para:
>     localhost    k=48    user=0  wallclock=0
> 0.112u 0.158s 0:02.28 11.4%     0+0k 0+224io 0pf+0w
>  >   lapw2 -p           (20:33:45) running LAPW2 in parallel mode
> **  LAPW2 crashed!
> 0.085u 0.062s 0:00.13 107.6%    0+0k 0+872io 0pf+0w
> error: command   /root/Documents/WIEN2KROOT/lapw2para lapw2.def   failed
>
>  >   stop error
_______________________________________________
Wien mailing list
Wien@zeus.theochem.tuwien.ac.at
http://zeus.theochem.tuwien.ac.at/mailman/listinfo/wien
SEARCH the MAILING-LIST at:  
http://www.mail-archive.com/wien@zeus.theochem.tuwien.ac.at/index.html
