Could someone send me the HACKING file so I can do a bit more debugging on the snapshots? Our web proxy has WebDAV methods turned off (the request methods fail), so I can't get to the latest of the svn repos.

> Second thing. From one of your previous emails, I see that MX
> is configured with 4 instance by node. Your running with
> exactly 4 processes on the first 2 nodes. Weirds things might
> happens ...

Just curious about this comment. Are you referring to oversubscribing? We run 4 processes on each node because we have 2 dual-core CPUs on each node. Am I not understanding processor counts correctly?

> PS: Is there any way you can attach to the processes with gdb?
> I would like to see the backtrace as showed by gdb in order
> to be able to figure out what's wrong there.

When I can get more detailed debug output, I'll send it. Though I'm not clear on what executable is being searched for below.

$ mpirun -dbg=gdb --prefix /usr/local/openmpi-1.2b3r13030 -x LD_LIBRARY_PATH=${LD_LIBRARY_PATH} --hostfile ./h1-3 -np 5 --mca pml cm --mca mtl mx ./cpi
[juggernaut:14949] connect_uni: connection not allowed
[juggernaut:14949] connect_uni: connection not allowed
[juggernaut:14949] connect_uni: connection not allowed
[juggernaut:14949] connect_uni: connection not allowed
[juggernaut:14949] connect_uni: connection not allowed
[juggernaut:14949] connect_uni: connection not allowed
[juggernaut:14949] connect_uni: connection not allowed
[juggernaut:14949] connect_uni: connection not allowed
[juggernaut:14949] connect_uni: connection not allowed
[juggernaut:14949] [0,0,0] setting up session dir with
[juggernaut:14949] universe default-universe-14949
[juggernaut:14949] user ggrobe
[juggernaut:14949] host juggernaut
[juggernaut:14949] jobid 0
[juggernaut:14949] procid 0
[juggernaut:14949] procdir: /tmp/openmpi-sessions-ggrobe@juggernaut_0/default-universe-14949/0/0
[juggernaut:14949] jobdir: /tmp/openmpi-sessions-ggrobe@juggernaut_0/default-universe-14949/0
[juggernaut:14949] unidir: /tmp/openmpi-sessions-ggrobe@juggernaut_0/default-universe-14949
[juggernaut:14949] top: openmpi-sessions-ggrobe@juggernaut_0
[juggernaut:14949] tmp: /tmp
[juggernaut:14949] [0,0,0] contact_file /tmp/openmpi-sessions-ggrobe@juggernaut_0/default-universe-14949/universe-setup.txt
[juggernaut:14949] [0,0,0] wrote setup file
[juggernaut:14949] pls:rsh: local csh: 0, local sh: 1
[juggernaut:14949] pls:rsh: assuming same remote shell as local shell
[juggernaut:14949] pls:rsh: remote csh: 0, remote sh: 1
[juggernaut:14949] pls:rsh: final template argv:
[juggernaut:14949] pls:rsh: /usr/bin/ssh <template> orted --debug --bootproxy 1 --name <template> --num_procs 2 --vpid_start 0 --nodename <template> --universe ggrobe@juggernaut:default-universe-14949 --nsreplica "0.0.0;tcp://192.168.2.10:43121" --gprreplica "0.0.0;tcp://192.168.2.10:43121"
[juggernaut:14949] pls:rsh: launching on node juggernaut
[juggernaut:14949] pls:rsh: juggernaut is a LOCAL node
[juggernaut:14949] pls:rsh: changing to directory /home/ggrobe
[juggernaut:14949] pls:rsh: executing: orted --debug --bootproxy 1 --name 0.0.1 --num_procs 2 --vpid_start 0 --nodename juggernaut --universe ggrobe@juggernaut:default-universe-14949 --nsreplica "0.0.0;tcp://192.168.2.10:43121" --gprreplica "0.0.0;tcp://192.168.2.10:43121"
[juggernaut:14950] [0,0,1] setting up session dir with
[juggernaut:14950] universe default-universe-14949
[juggernaut:14950] user ggrobe
[juggernaut:14950] host juggernaut
[juggernaut:14950] jobid 0
[juggernaut:14950] procid 1
[juggernaut:14950] procdir: /tmp/openmpi-sessions-ggrobe@juggernaut_0/default-universe-14949/0/1
[juggernaut:14950] jobdir: /tmp/openmpi-sessions-ggrobe@juggernaut_0/default-universe-14949/0
[juggernaut:14950] unidir: /tmp/openmpi-sessions-ggrobe@juggernaut_0/default-universe-14949
[juggernaut:14950] top: openmpi-sessions-ggrobe@juggernaut_0
[juggernaut:14950] tmp: /tmp
--------------------------------------------------------------------------
Failed to find the following executable:
Host: juggernaut
Executable: -b
Cannot continue.
--------------------------------------------------------------------------
[juggernaut:14950] [0,0,1] ORTE_ERROR_LOG: Fatal in file odls_default_module.c at line 1193
[juggernaut:14949] spawn: in job_state_callback(jobid = 1, state = 0x80)
[juggernaut:14950] [0,0,1] ORTE_ERROR_LOG: Fatal in file orted.c at line 575
[juggernaut:14950] sess_dir_finalize: job session dir not empty - leaving
[juggernaut:14950] sess_dir_finalize: proc session dir not empty - leaving
[juggernaut:14949] sess_dir_finalize: proc session dir not empty - leaving