Hi Ralph and Jeff,
Thanks a lot for the advice. Our cluster setup is pretty "limited": it is an
OpenSSI cluster with several P3 and P4 machines connected through Ethernet.
So I guess we won't be able to take full advantage of the speed of the
current Open MPI implementation.
So far I have narrowed the problem down (well, kind of solved it) to shared
memory permissions for a normal user. I'm pretty sure the machine has enough
memory, since starting two processes works fine for root.
I list the test cases below:
*** As root
$ mpirun --mca btl sm --np 1 tut01
oceanus:Hello world from 0
$ mpirun --mca btl sm --np 2 tut01
oceanus:Hello world from 1
oceanus:Hello world from 0
*** As normal user
$ mpirun --mca btl sm --np 1 tut01
oceanus:Hello world from 0
$ mpirun --mca btl sm --np 2 tut01
[oceanus:126207] mca_common_sm_mmap_init: ftruncate failed with errno=13
[oceanus:126207] mca_mpool_sm_init: unable to create shared memory mapping (/tmp/openmpi-sessions-eddie@localhost_0/default-universe-126204/1/shared_mem_pool.localhost).....
$ free -m
                     total       used       free     shared    buffers     cached
Mem:                   499        491          7          0        179         37
-/+ buffers/cache:                274        224
Swap:                 1027          0       1027
$ echo "Hello World" > /tmp/eddie.txt | ll /tmp/eddie.txt
-rw-rw-r-- 1 eddie eddie 12 Jan 22 11:58 /tmp/eddie.txt
I'm not so sure why a normal user can't create the shared memory for two
processes; it is very strange.
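A small standalone test along these lines (just my own sketch, not Open MPI
code; the path and size are arbitrary) should do roughly what
mca_common_sm_mmap_init does: open a file under /tmp, ftruncate it to a few
MB, and mmap it. If ftruncate also fails here with errno=13 (EACCES,
Permission denied), the problem is in the OpenSSI/filesystem layer rather
than in Open MPI:

/* sm_test.c - rough standalone test of the open/ftruncate/mmap sequence */
#include <stdio.h>
#include <string.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/mman.h>

int main(void)
{
    const char *path = "/tmp/sm_test_file";   /* arbitrary test path */
    size_t len = 4 * 1024 * 1024;             /* 4 MB, in the ballpark of the sm pool */

    int fd = open(path, O_CREAT | O_RDWR, 0600);
    if (fd < 0) { perror("open"); return 1; }

    /* this is the call that Open MPI reports as failing with errno=13 */
    if (ftruncate(fd, (off_t)len) < 0) { perror("ftruncate"); return 1; }

    void *addr = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (addr == MAP_FAILED) { perror("mmap"); return 1; }

    memset(addr, 0, len);                     /* touch the whole mapping */
    printf("mapping of %lu bytes succeeded\n", (unsigned long)len);

    munmap(addr, len);
    close(fd);
    unlink(path);
    return 0;
}

Compiled with "gcc -o sm_test sm_test.c" and run as the normal user, this
should show quickly whether a plain ftruncate/mmap on /tmp is allowed.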
Cheers,
Eddie.
On 1/19/07, Ralph Castain <r...@lanl.gov> wrote:
What that parameter does is turn "off" all of the transports except
tcp – so the problem you're seeing goes away because we no longer
try to create the shared memory file. This will somewhat hurt your
performance, but it will work.
Alternatively, you could use "--mca btl ^sm", which would allow you
to use whatever high speed interconnects are on your system while
still turning "off" the shared memory file.
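For example (a sketch only; substitute your own program and process count):

$ mpirun --mca btl ^sm -np 2 tut01

The "^" prefix tells Open MPI to exclude just the listed components rather
than restrict itself to them.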
I'm not sure why your tmp directory is getting its permissions wrong. It
sounds like something in your environment is doing something unexpected. You
might write and execute a small script that creates a file and lists its
permissions; it would be interesting to see whether the owner or access
permissions differ from what you would normally expect.
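Something as simple as this, run as the normal user, would do (just a
sketch; pick any file name you like):

$ id
$ umask
$ ls -ld /tmp
$ touch /tmp/perm_test && ls -l /tmp/perm_test && rm /tmp/perm_test

If the owner, group, or mode of the new file differs from what you expect,
that points at the environment rather than at Open MPI.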
Ralph
On 1/18/07 8:30 PM, "eddie168" <eddie168+ompi_u...@gmail.com> wrote:
Just to answer my own question: after I explicitly specify the "--mca btl
tcp" parameter, the program works. So I will need to issue a command like
this:
$ mpirun --mca btl tcp -np 2 tut01
oceanus:Hello world from 0
oceanus:Hello world from 1
Regards,
Eddie.
On 1/18/07, eddie168 <eddie168+ompi_u...@gmail.com> wrote:
Hi Ralph and Brian,
Thanks for the advice. I have checked the permissions on /tmp:
drwxrwxrwt 19 root root 4096 Jan 18 11:38 tmp
so I think there shouldn't be any problem creating files there, yet option
(a) still does not work for me.
I tried option (b), setting --tmpdir on the command line and running as a
normal user. It works for -np 1, but it gives the same error for -np 2.
I also tested option (c) by setting "OMPI_MCA_tmpdir_base = /home2/mpi_tut/tmp"
in "~/.openmpi/mca-params.conf", but the error still occurred.
I have included the debug output of what I ran (with the IP masked). I
noticed that the requested tmp directory is set at the beginning of the
process, but it changes back to "/tmp" after orted is executed. Could the
error I got be related to my SSH settings?
Many thanks,
Eddie.
[eddie@oceanus:~/home2/mpi_tut]$ mpirun -d --tmpdir /home2/mpi_tut/tmp -np 2 tut01
[oceanus:129119] [0,0,0] setting up session dir with
[oceanus:129119] tmpdir /home2/mpi_tut/tmp
[oceanus:129119] universe default-universe
[oceanus:129119] user eddie
[oceanus:129119] host oceanus
[oceanus:129119] jobid 0
[oceanus:129119] procid 0
[oceanus:129119] procdir: /home2/mpi_tut/tmp/openmpi-sessions-eddie@oceanus_0/default-universe/0/0
[oceanus:129119] jobdir: /home2/mpi_tut/tmp/openmpi-sessions-eddie@oceanus_0/default-universe/0
[oceanus:129119] unidir: /home2/mpi_tut/tmp/openmpi-sessions-eddie@oceanus_0/default-universe
[oceanus:129119] top: openmpi-sessions-eddie@oceanus_0
[oceanus:129119] tmp: /home2/mpi_tut/tmp
[oceanus:129119] [0,0,0] contact_file /home2/mpi_tut/tmp/openmpi-sessions-eddie@oceanus_0/default-universe/universe-setup.txt
[oceanus:129119] [0,0,0] wrote setup file
[oceanus:129119] pls:rsh: local csh: 0, local bash: 1
[oceanus:129119] pls:rsh: assuming same remote shell as local shell
[oceanus:129119] pls:rsh: remote csh: 0, remote bash: 1
[oceanus:129119] pls:rsh: final template argv:
[oceanus:129119] pls:rsh: /usr/bin/ssh <template> orted --debug --bootproxy 1 --name <template> --num_procs 2 --vpid_start 0 --nodename <template> --universe eddie@oceanus:default-universe --nsreplica "0.0.0;tcp://xxx.xxx.xxx.xxx:52428" --gprreplica "0.0.0;tcp://xxx.xxx.xxx.xxx:52428" --mpi-call-yield 0
[oceanus:129119] pls:rsh: launching on node localhost
[oceanus:129119] pls:rsh: oversubscribed -- setting
mpi_yield_when_idle to 1 (1 2)
[oceanus:129119] pls:rsh: localhost is a LOCAL node
[oceanus:129119] pls:rsh: changing to directory /home/eddie
[oceanus:129119] pls:rsh: executing: orted --debug --bootproxy 1 --name 0.0.1 --num_procs 2 --vpid_start 0 --nodename localhost --universe eddie@oceanus:default-universe --nsreplica "0.0.0;tcp://xxx.xxx.xxx.xxx:52428" --gprreplica "0.0.0;tcp://xxx.xxx.xxx.xxx:52428" --mpi-call-yield 1
[oceanus:129120] [0,0,1] setting up session dir with
[oceanus:129120] universe default-universe
[oceanus:129120] user eddie
[oceanus:129120] host localhost
[oceanus:129120] jobid 0
[oceanus:129120] procid 1
[oceanus:129120] procdir: /tmp/openmpi-sessions-eddie@localhost_0/default-universe/0/1
[oceanus:129120] jobdir: /tmp/openmpi-sessions-eddie@localhost_0/default-universe/0
[oceanus:129120] unidir: /tmp/openmpi-sessions-eddie@localhost_0/default-universe
[oceanus:129120] top: openmpi-sessions-eddie@localhost_0
[oceanus:129120] tmp: /tmp
[oceanus:129121] [0,1,0] setting up session dir with
[oceanus:129121] universe default-universe
[oceanus:129121] user eddie
[oceanus:129121] host localhost
[oceanus:129121] jobid 1
[oceanus:129121] procid 0
[oceanus:129121] procdir: /tmp/openmpi-sessions-eddie@localhost_0/default-universe/1/0
[oceanus:129121] jobdir: /tmp/openmpi-sessions-eddie@localhost_0/default-universe/1
[oceanus:129121] unidir: /tmp/openmpi-sessions-eddie@localhost_0/default-universe
[oceanus:129121] top: openmpi-sessions-eddie@localhost_0
[oceanus:129121] tmp: /tmp
[oceanus:129122] [0,1,1] setting up session dir with
[oceanus:129122] universe default-universe
[oceanus:129122] user eddie
[oceanus:129122] host localhost
[oceanus:129122] jobid 1
[oceanus:129122] procid 1
[oceanus:129122] procdir: /tmp/openmpi-sessions-eddie@localhost_0/default-universe/1/1
[oceanus:129122] jobdir: /tmp/openmpi-sessions-eddie@localhost_0/default-universe/1
[oceanus:129122] unidir: /tmp/openmpi-sessions-eddie@localhost_0/default-universe
[oceanus:129122] top: openmpi-sessions-eddie@localhost_0
[oceanus:129122] tmp: /tmp
[oceanus:129119] spawn: in job_state_callback(jobid = 1, state = 0x4)
[oceanus:129119] Info: Setting up debugger process table for
applications
MPIR_being_debugged = 0
MPIR_debug_gate = 0
MPIR_debug_state = 1
MPIR_acquired_pre_main = 0
MPIR_i_am_starter = 0
MPIR_proctable_size = 2
MPIR_proctable:
(i, host, exe, pid) = (0, localhost, tut01, 129121)
(i, host, exe, pid) = (1, localhost, tut01, 129122)
[oceanus:129121] mca_common_sm_mmap_init: ftruncate failed with errno=13
[oceanus:129121] mca_mpool_sm_init: unable to create shared memory mapping (/tmp/openmpi-sessions-eddie@localhost_0/default-universe/1/shared_mem_pool.localhost)
--------------------------------------------------------------------------
It looks like MPI_INIT failed for some reason; your parallel process is
likely to abort. There are many reasons that a parallel process can
fail during MPI_INIT; some of which are due to configuration or environment
problems. This failure appears to be an internal failure; here's some
additional information (which may only be relevant to an Open MPI
developer):
PML add procs failed
--> Returned "Out of resource" (-2) instead of "Success" (0)
--------------------------------------------------------------------------
*** An error occurred in MPI_Init
*** before MPI was initialized
*** MPI_ERRORS_ARE_FATAL (goodbye)
[oceanus:129120] sess_dir_finalize: found proc session dir empty - deleting
[oceanus:129120] sess_dir_finalize: job session dir not empty - leaving
[oceanus:129120] sess_dir_finalize: found proc session dir empty - deleting
[oceanus:129120] sess_dir_finalize: found job session dir empty - deleting
[oceanus:129120] sess_dir_finalize: univ session dir not empty - leaving
[oceanus:129120] orted: job_state_callback(jobid = 1, state = ORTE_PROC_STATE_TERMINATED)
[oceanus:129120] sess_dir_finalize: job session dir not empty - leaving
[oceanus:129120] sess_dir_finalize: found proc session dir empty - deleting
[oceanus:129120] sess_dir_finalize: found job session dir empty - deleting
[oceanus:129120] sess_dir_finalize: found univ session dir empty - deleting
[oceanus:129120] sess_dir_finalize: found top session dir empty - deleting
[eddie@oceanus:~/home2/mpi_tut]$
On 1/18/07, Ralph H Castain <r...@lanl.gov> wrote:
Hi Eddie
Open MPI needs to create a temporary directory tree (what we call our
"session directory") where it stores things like the shared memory file.
From this output, it appears that your /tmp directory is "locked" to root
access only.
You have three options for resolving this problem:
(a) you could make /tmp accessible to general users;
(b) you could use the --tmpdir xxx command-line option to point Open MPI at
another directory that is accessible to the user (for example, a "tmp"
directory under the user's home directory); or
(c) you could set an MCA parameter OMPI_MCA_tmpdir_base to identify
a directory we can use instead of /tmp.
If you select option (b) or (c), the only requirement is that the location
must be accessible on every node being used. Let me be clear on this: the
tmp directory must not be NFS-mounted and therefore shared across all nodes.
However, each node must be able to access a location of the given name; that
location should be strictly local to each node.
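For reference, and assuming I have the parameter name right for the 1.1
series, the same tmpdir_base parameter can be given in any of the usual
three ways (the path below is just a placeholder):

$ mpirun --mca tmpdir_base /path/to/local/tmp -np 2 tut01    (command line)
$ export OMPI_MCA_tmpdir_base=/path/to/local/tmp             (environment variable, bash)
tmpdir_base = /path/to/local/tmp                             (line in ~/.openmpi/mca-params.conf)

Note that in the mca-params.conf file the parameter name is given without
the OMPI_MCA_ prefix; the prefix only belongs on the environment variable.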
Hope that helps
Ralph
On 1/17/07 12:25 AM, "eddie168" <eddie168+ompi_u...@gmail.com> wrote:
Dear all,
I have recently installed Open MPI 1.1.2 on an OpenSSI cluster running
Fedora Core 3. I tested a simple hello-world MPI program (attached) and it
runs fine as root. However, if I run the same program as a normal user, it
gives the following error:
[eddie@oceanus:~/home2/mpi_tut]$ mpirun -np 2 tut01
[oceanus:125089] mca_common_sm_mmap_init: ftruncate failed with errno=13
[oceanus:125089] mca_mpool_sm_init: unable to create shared memory mapping (/tmp/openmpi-sessions-eddie@localhost_0/default-universe/1/shared_mem_pool.localhost)
--------------------------------------------------------------------------
It looks like MPI_INIT failed for some reason; your parallel process is
likely to abort. There are many reasons that a parallel process can
fail during MPI_INIT; some of which are due to configuration or environment
problems. This failure appears to be an internal failure; here's some
additional information (which may only be relevant to an Open MPI
developer):
PML add procs failed
--> Returned "Out of resource" (-2) instead of "Success" (0)
--------------------------------------------------------------------------
*** An error occurred in MPI_Init
*** before MPI was initialized
*** MPI_ERRORS_ARE_FATAL (goodbye)
[eddie@oceanus:~/home2/mpi_tut]$
Do I need to give the user certain permissions in order to oversubscribe
processes?
Thanks in advance,
Eddie.
_______________________________________________
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users