Hi Ralph and Brian,

Thanks for the advice. I have checked the permissions on /tmp:

drwxrwxrwt   19 root  root  4096 Jan 18 11:38 tmp

which is the standard world-writable mode with the sticky bit, so there should be
no problem creating files there. Option (a) therefore still does not work for me.

I tried option (b), setting --tmpdir on the command line and running as a normal
user. It works for -np 1, but gives the same error for -np 2.

I also tested option (c) by setting "OMPI_MCA_tmpdir_base =
/home2/mpi_tut/tmp" in "~/.openmpi/mca-params.conf", but the error still
occurred.
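
If I understand the MCA conventions correctly, the environment-variable form
carries the OMPI_MCA_ prefix while the conf-file form drops it, i.e. (please
correct me if this is wrong):

    export OMPI_MCA_tmpdir_base=/home2/mpi_tut/tmp    # in the shell or login files
    tmpdir_base = /home2/mpi_tut/tmp                  # line in ~/.openmpi/mca-params.conf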

I have included the debug output of what I ran (with the IP addresses masked). I
noticed that the alternate tmp directory is used at the beginning of the process,
but it changes back to "/tmp" after orted is executed. Could the error I am
seeing be related to my SSH settings?
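
In case it is relevant, a quick check I can run to see what environment a
non-interactive SSH shell actually receives (just a sketch; it assumes the
parameter is exported from my login files) would be:

    [eddie@oceanus:~]$ ssh localhost env | grep OMPI_MCA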

Many thanks,

Eddie.


[eddie@oceanus:~/home2/mpi_tut]$ mpirun -d --tmpdir /home2/mpi_tut/tmp -np 2
tut01
[oceanus:129119] [0,0,0] setting up session dir with
[oceanus:129119]        tmpdir /home2/mpi_tut/tmp
[oceanus:129119]        universe default-universe
[oceanus:129119]        user eddie
[oceanus:129119]        host oceanus
[oceanus:129119]        jobid 0
[oceanus:129119]        procid 0
[oceanus:129119] procdir:
/home2/mpi_tut/tmp/openmpi-sessions-eddie@oceanus_0/default-universe/0/0
[oceanus:129119] jobdir:
/home2/mpi_tut/tmp/openmpi-sessions-eddie@oceanus_0/default-universe/0
[oceanus:129119] unidir:
/home2/mpi_tut/tmp/openmpi-sessions-eddie@oceanus_0/default-universe
[oceanus:129119] top: openmpi-sessions-eddie@oceanus_0
[oceanus:129119] tmp: /home2/mpi_tut/tmp
[oceanus:129119] [0,0,0] contact_file
/home2/mpi_tut/tmp/openmpi-sessions-eddie@oceanus_0/default-universe/universe-setup.txt
[oceanus:129119] [0,0,0] wrote setup file
[oceanus:129119] pls:rsh: local csh: 0, local bash: 1
[oceanus:129119] pls:rsh: assuming same remote shell as local shell
[oceanus:129119] pls:rsh: remote csh: 0, remote bash: 1
[oceanus:129119] pls:rsh: final template argv:
[oceanus:129119] pls:rsh:     /usr/bin/ssh <template> orted --debug
--bootproxy 1 --name <template> --num_procs 2 --vpid_start 0 --nodename
<template> --universe eddie@oceanus:default-universe --nsreplica
"0.0.0;tcp://xxx.xxx.xxx.xxx:52428"
--gprreplica "0.0.0;tcp://xxx.xxx.xxx.xxx:52428" --mpi-call-yield 0
[oceanus:129119] pls:rsh: launching on node localhost
[oceanus:129119] pls:rsh: oversubscribed -- setting mpi_yield_when_idle to 1
(1 2)
[oceanus:129119] pls:rsh: localhost is a LOCAL node
[oceanus:129119] pls:rsh: changing to directory /home/eddie
[oceanus:129119] pls:rsh: executing: orted --debug --bootproxy 1 --name
0.0.1 --num_procs 2 --vpid_start 0 --nodename localhost --universe
eddie@oceanus:default-universe --nsreplica "0.0.0;tcp://xxx.xxx.xxx.xxx:52428"
--gprreplica "0.0.0;tcp://xxx.xxx.xxx.xxx:52428" --mpi-call-yield 1
[oceanus:129120] [0,0,1] setting up session dir with
[oceanus:129120]        universe default-universe
[oceanus:129120]        user eddie
[oceanus:129120]        host localhost
[oceanus:129120]        jobid 0
[oceanus:129120]        procid 1
[oceanus:129120] procdir:
/tmp/openmpi-sessions-eddie@localhost_0/default-universe/0/1
[oceanus:129120] jobdir:
/tmp/openmpi-sessions-eddie@localhost_0/default-universe/0
[oceanus:129120] unidir:
/tmp/openmpi-sessions-eddie@localhost_0/default-universe
[oceanus:129120] top: openmpi-sessions-eddie@localhost_0
[oceanus:129120] tmp: /tmp
[oceanus:129121] [0,1,0] setting up session dir with
[oceanus:129121]        universe default-universe
[oceanus:129121]        user eddie
[oceanus:129121]        host localhost
[oceanus:129121]        jobid 1
[oceanus:129121]        procid 0
[oceanus:129121] procdir:
/tmp/openmpi-sessions-eddie@localhost_0/default-universe/1/0
[oceanus:129121] jobdir:
/tmp/openmpi-sessions-eddie@localhost_0/default-universe/1
[oceanus:129121] unidir:
/tmp/openmpi-sessions-eddie@localhost_0/default-universe
[oceanus:129121] top: openmpi-sessions-eddie@localhost_0
[oceanus:129121] tmp: /tmp
[oceanus:129122] [0,1,1] setting up session dir with
[oceanus:129122]        universe default-universe
[oceanus:129122]        user eddie
[oceanus:129122]        host localhost
[oceanus:129122]        jobid 1
[oceanus:129122]        procid 1
[oceanus:129122] procdir:
/tmp/openmpi-sessions-eddie@localhost_0/default-universe/1/1
[oceanus:129122] jobdir:
/tmp/openmpi-sessions-eddie@localhost_0/default-universe/1
[oceanus:129122] unidir:
/tmp/openmpi-sessions-eddie@localhost_0/default-universe
[oceanus:129122] top: openmpi-sessions-eddie@localhost_0
[oceanus:129122] tmp: /tmp
[oceanus:129119] spawn: in job_state_callback(jobid = 1, state = 0x4)
[oceanus:129119] Info: Setting up debugger process table for applications
 MPIR_being_debugged = 0
 MPIR_debug_gate = 0
 MPIR_debug_state = 1
 MPIR_acquired_pre_main = 0
 MPIR_i_am_starter = 0
 MPIR_proctable_size = 2
 MPIR_proctable:
   (i, host, exe, pid) = (0, localhost, tut01, 129121)
   (i, host, exe, pid) = (1, localhost, tut01, 129122)
[oceanus:129121] mca_common_sm_mmap_init: ftruncate failed with errno=13
[oceanus:129121] mca_mpool_sm_init: unable to create shared memory mapping (
/tmp/openmpi-sessions-eddie@localhost_0/default-universe/1/shared_mem_pool.localhost
)
--------------------------------------------------------------------------
It looks like MPI_INIT failed for some reason; your parallel process is
likely to abort.  There are many reasons that a parallel process can
fail during MPI_INIT; some of which are due to configuration or environment
problems.  This failure appears to be an internal failure; here's some
additional information (which may only be relevant to an Open MPI
developer):

 PML add procs failed
 --> Returned "Out of resource" (-2) instead of "Success" (0)
--------------------------------------------------------------------------
*** An error occurred in MPI_Init
*** before MPI was initialized
*** MPI_ERRORS_ARE_FATAL (goodbye)
[oceanus:129120] sess_dir_finalize: found proc session dir empty - deleting
[oceanus:129120] sess_dir_finalize: job session dir not empty - leaving
[oceanus:129120] sess_dir_finalize: found proc session dir empty - deleting
[oceanus:129120] sess_dir_finalize: found job session dir empty - deleting
[oceanus:129120] sess_dir_finalize: univ session dir not empty - leaving
[oceanus:129120] orted: job_state_callback(jobid = 1, state =
ORTE_PROC_STATE_TERMINATED)
[oceanus:129120] sess_dir_finalize: job session dir not empty - leaving
[oceanus:129120] sess_dir_finalize: found proc session dir empty - deleting
[oceanus:129120] sess_dir_finalize: found job session dir empty - deleting
[oceanus:129120] sess_dir_finalize: found univ session dir empty - deleting
[oceanus:129120] sess_dir_finalize: found top session dir empty - deleting
[eddie@oceanus:~/home2/mpi_tut]$


On 1/18/07, Ralph H Castain <r...@lanl.gov> wrote:

Hi Eddie

Open MPI needs to create a temporary directory tree (what we call our
"session directory") where it stores things like the shared memory file.
From this output, it appears that your /tmp directory is "locked" to root
access only.

You have three options for resolving this problem (illustrative commands for
each are sketched after the list):

(a) you could make /tmp accessible to general users;

(b) you could use the --tmpdir xxx command line option to point Open MPI at
another directory that is accessible to the user (for example, you could use
a "tmp" directory under the user's home directory); or

(c) you could set an MCA parameter OMPI_MCA_tmpdir_base to identify a
directory we can use instead of /tmp.
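
To illustrate (these are sketches only; the paths are examples and option (a)
requires root access on the node):

    (a)  chmod 1777 /tmp                                  # as root: restore the world-writable, sticky-bit mode
    (b)  mpirun --tmpdir /home2/mpi_tut/tmp -np 2 tut01
    (c)  export OMPI_MCA_tmpdir_base=/home2/mpi_tut/tmp   # then run mpirun as usual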

If you select option (b) or (c), the only requirement is that this
location must be accessible on every node being used. Let me be clear on
this: the tmp directory *must not* be NFS-mounted and therefore shared
across all nodes. Rather, each node must be able to access a location of
the given name, and that location should be strictly local to each node.
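
For example (the hostnames and the path here are placeholders only), you could
pre-create a strictly local directory of the same name on every node with
something like:

    for node in node01 node02; do ssh $node mkdir -p /local/ompi_tmp; done

and then point Open MPI at it via --tmpdir or OMPI_MCA_tmpdir_base.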

Hope that helps
Ralph



On 1/17/07 12:25 AM, "eddie168" <eddie168+ompi_u...@gmail.com> wrote:

 Dear all,

I have recently installed Open MPI 1.1.2 on an OpenSSI cluster running
Fedora Core 3. I tested a simple "hello world" MPI program (attached) and it
runs fine as root. However, if I run the same program as a normal user, it
gives the following error:

[eddie@oceanus:~/home2/mpi_tut]$ mpirun -np 2 tut01
[oceanus:125089] mca_common_sm_mmap_init: ftruncate failed with errno=13
[oceanus:125089] mca_mpool_sm_init: unable to create shared memory mapping
( /tmp/openmpi-sessions-eddie@localhost_0/default-universe/1/shared_mem_pool.localhost)
--------------------------------------------------------------------------

It looks like MPI_INIT failed for some reason; your parallel process is
likely to abort.  There are many reasons that a parallel process can
fail during MPI_INIT; some of which are due to configuration or environment
problems.  This failure appears to be an internal failure; here's some
additional information (which may only be relevant to an Open MPI
developer):
  PML add procs failed
  --> Returned "Out of resource" (-2) instead of "Success" (0)
--------------------------------------------------------------------------

*** An error occurred in MPI_Init
*** before MPI was initialized
*** MPI_ERRORS_ARE_FATAL (goodbye)
[eddie@oceanus:~/home2/mpi_tut]$

Do I need to give the user certain permissions in order to oversubscribe
processes?

Thanks in advance,

Eddie.





_______________________________________________
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users

