Could someone send me the HACKING file so I can do a bit more debugging
on the snapshots? Our web proxy has WebDAV methods turned off (the
requests fail), so I can't get to the latest in the svn repository.

> Second thing. From one of your previous emails, I see that MX 
> is configured with 4 instances per node. You're running with 
> exactly 4 processes on the first 2 nodes. Weird things might 
> happen ...

Just curious about this comment. Are you referring to oversubscribing?
We run 4 processes on each node because each node has 2 dual-core CPUs.
Am I misunderstanding the processor counts?

> PS: Is there any way you can attach to the processes with gdb? 
> I would like to see the backtrace as shown by gdb in order 
> to be able to figure out what's wrong there.

I'll send more detailed debugging output when I can get it. For now,
I'm not clear on which executable is being searched for below.

$ mpirun -dbg=gdb --prefix /usr/local/openmpi-1.2b3r13030 -x
LD_LIBRARY_PATH=${LD_LIBRARY_PATH} --hostfile ./h1-3 -np 5 --mca pml cm
--mca mtl mx ./cpi

[juggernaut:14949] connect_uni: connection not allowed
[juggernaut:14949] connect_uni: connection not allowed
[juggernaut:14949] connect_uni: connection not allowed
[juggernaut:14949] connect_uni: connection not allowed
[juggernaut:14949] connect_uni: connection not allowed
[juggernaut:14949] connect_uni: connection not allowed
[juggernaut:14949] connect_uni: connection not allowed
[juggernaut:14949] connect_uni: connection not allowed
[juggernaut:14949] connect_uni: connection not allowed
[juggernaut:14949] [0,0,0] setting up session dir with
[juggernaut:14949]  universe default-universe-14949
[juggernaut:14949]  user ggrobe
[juggernaut:14949]  host juggernaut
[juggernaut:14949]  jobid 0
[juggernaut:14949]  procid 0
[juggernaut:14949] procdir:
/tmp/openmpi-sessions-ggrobe@juggernaut_0/default-universe-14949/0/0
[juggernaut:14949] jobdir:
/tmp/openmpi-sessions-ggrobe@juggernaut_0/default-universe-14949/0
[juggernaut:14949] unidir:
/tmp/openmpi-sessions-ggrobe@juggernaut_0/default-universe-14949
[juggernaut:14949] top: openmpi-sessions-ggrobe@juggernaut_0
[juggernaut:14949] tmp: /tmp
[juggernaut:14949] [0,0,0] contact_file
/tmp/openmpi-sessions-ggrobe@juggernaut_0/default-universe-14949/universe-setup.txt
[juggernaut:14949] [0,0,0] wrote setup file
[juggernaut:14949] pls:rsh: local csh: 0, local sh: 1
[juggernaut:14949] pls:rsh: assuming same remote shell as local shell
[juggernaut:14949] pls:rsh: remote csh: 0, remote sh: 1
[juggernaut:14949] pls:rsh: final template argv:
[juggernaut:14949] pls:rsh:     /usr/bin/ssh <template> orted --debug
--bootproxy 1 --name <template> --num_procs 2 --vpid_start 0 --nodename
<template> --universe ggrobe@juggernaut:default-universe-14949
--nsreplica "0.0.0;tcp://192.168.2.10:43121" --gprreplica
"0.0.0;tcp://192.168.2.10:43121"
[juggernaut:14949] pls:rsh: launching on node juggernaut
[juggernaut:14949] pls:rsh: juggernaut is a LOCAL node
[juggernaut:14949] pls:rsh: changing to directory /home/ggrobe
[juggernaut:14949] pls:rsh: executing: orted --debug --bootproxy 1
--name 0.0.1 --num_procs 2 --vpid_start 0 --nodename juggernaut
--universe ggrobe@juggernaut:default-universe-14949 --nsreplica
"0.0.0;tcp://192.168.2.10:43121" --gprreplica
"0.0.0;tcp://192.168.2.10:43121"
[juggernaut:14950] [0,0,1] setting up session dir with
[juggernaut:14950]  universe default-universe-14949
[juggernaut:14950]  user ggrobe
[juggernaut:14950]  host juggernaut
[juggernaut:14950]  jobid 0
[juggernaut:14950]  procid 1
[juggernaut:14950] procdir:
/tmp/openmpi-sessions-ggrobe@juggernaut_0/default-universe-14949/0/1
[juggernaut:14950] jobdir:
/tmp/openmpi-sessions-ggrobe@juggernaut_0/default-universe-14949/0
[juggernaut:14950] unidir:
/tmp/openmpi-sessions-ggrobe@juggernaut_0/default-universe-14949
[juggernaut:14950] top: openmpi-sessions-ggrobe@juggernaut_0
[juggernaut:14950] tmp: /tmp
--------------------------------------------------------------------------
Failed to find the following executable:

Host:       juggernaut
Executable: -b

Cannot continue.
--------------------------------------------------------------------------
[juggernaut:14950] [0,0,1] ORTE_ERROR_LOG: Fatal in file
odls_default_module.c at line 1193
[juggernaut:14949] spawn: in job_state_callback(jobid = 1, state = 0x80)
[juggernaut:14950] [0,0,1] ORTE_ERROR_LOG: Fatal in file orted.c at line
575
[juggernaut:14950] sess_dir_finalize: job session dir not empty -
leaving
[juggernaut:14950] sess_dir_finalize: proc session dir not empty -
leaving
[juggernaut:14949] sess_dir_finalize: proc session dir not empty -
leaving
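
Regarding the backtrace request quoted above: attaching gdb to ranks that
are already running could look roughly like the sketch below. It assumes
gdb and pgrep are installed on each node and that the processes are named
cpi; dump_backtraces is a hypothetical helper name, not something that
ships with Open MPI.

```shell
#!/bin/sh
# Sketch: write a gdb backtrace for every running process with a given name.
# Assumes gdb and pgrep are available; dump_backtraces is a made-up helper.
dump_backtraces() {
    name=$1            # process name to look for, e.g. cpi
    outdir=${2:-.}     # directory for the backtrace files (default: cwd)
    for pid in $(pgrep -u "$(id -un)" -x "$name"); do
        # -batch makes gdb exit after running the -ex command, which
        # detaches from the process and leaves it running.
        gdb -p "$pid" -batch -ex "thread apply all bt" \
            > "$outdir/bt.$(uname -n).$pid.txt" 2>&1
    done
}

# Usage (run on each node while the job is still up):
#   dump_backtraces cpi /tmp
```

Each matching process gets its own file, named after the node and the
PID, so the traces from several nodes can be collected without clashing.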



