Yes, the -npernode segv is a known issue. We have it fixed in the 1.4.x nightly tarballs; can you give it a whirl and see if that fixes your problem?
http://www.open-mpi.org/nightly/v1.4/

On Aug 23, 2010, at 8:20 PM, Michael E. Thomadakis wrote:

> Hello OMPI:
>
> We have installed OMPI V1.4.2 on a Nehalem cluster running CentOS 5.4. OMPI
> was built using Intel compilers 11.1.072. I am attaching the configuration
> log and the output from ompi_info -a.
>
> The problem we are encountering is that whenever we use the option
> '-npernode N' on the mpirun command line, we get a segmentation fault, as
> below:
>
> miket@login002[pts/7]PS $ mpirun -npernode 1 --display-devel-map --tag-output -np 6 -cpus-per-proc 2 -H 'login001,login002,login003' hostname
>
>  Map generated by mapping policy: 0402
>  Npernode: 1  Oversubscribe allowed: TRUE  CPU Lists: FALSE
>  Num new daemons: 2  New daemon starting vpid 1
>  Num nodes: 3
>
>  Data for node: Name: login001  Launch id: -1  Arch: 0  State: 2
>    Num boards: 1  Num sockets/board: 2  Num cores/socket: 4
>    Daemon: [[44812,0],1]  Daemon launched: False
>    Num slots: 1  Slots in use: 2
>    Num slots allocated: 1  Max slots: 0
>    Username on node: NULL
>    Num procs: 1  Next node_rank: 1
>    Data for proc: [[44812,1],0]
>      Pid: 0  Local rank: 0  Node rank: 0
>      State: 0  App_context: 0  Slot list: NULL
>
>  Data for node: Name: login002  Launch id: -1  Arch: ffc91200  State: 2
>    Num boards: 1  Num sockets/board: 2  Num cores/socket: 4
>    Daemon: [[44812,0],0]  Daemon launched: True
>    Num slots: 1  Slots in use: 2
>    Num slots allocated: 1  Max slots: 0
>    Username on node: NULL
>    Num procs: 1  Next node_rank: 1
>    Data for proc: [[44812,1],0]
>      Pid: 0  Local rank: 0  Node rank: 0
>      State: 0  App_context: 0  Slot list: NULL
>
>  Data for node: Name: login003  Launch id: -1  Arch: 0  State: 2
>    Num boards: 1  Num sockets/board: 2  Num cores/socket: 4
>    Daemon: [[44812,0],2]  Daemon launched: False
>    Num slots: 1  Slots in use: 2
>    Num slots allocated: 1  Max slots: 0
>    Username on node: NULL
>    Num procs: 1  Next node_rank: 1
>    Data for proc: [[44812,1],0]
>      Pid: 0  Local rank: 0  Node rank: 0
>      State: 0  App_context: 0  Slot list: NULL
>
> [login002:02079] *** Process received signal ***
> [login002:02079] Signal: Segmentation fault (11)
> [login002:02079] Signal code: Address not mapped (1)
> [login002:02079] Failing at address: 0x50
> [login002:02079] [ 0] /lib64/libpthread.so.0 [0x3569a0e7c0]
> [login002:02079] [ 1] /g/software/openmpi-1.4.2/intel/lib/libopen-rte.so.0(orte_util_encode_pidmap+0xa7) [0x2afa70d25de7]
> [login002:02079] [ 2] /g/software/openmpi-1.4.2/intel/lib/libopen-rte.so.0(orte_odls_base_default_get_add_procs_data+0x3b8) [0x2afa70d36088]
> [login002:02079] [ 3] /g/software/openmpi-1.4.2/intel/lib/libopen-rte.so.0(orte_plm_base_launch_apps+0xd7) [0x2afa70d37fc7]
> [login002:02079] [ 4] /g/software/openmpi-1.4.2/intel/lib/openmpi/mca_plm_rsh.so [0x2afa721085a1]
> [login002:02079] [ 5] mpirun [0x404c27]
> [login002:02079] [ 6] mpirun [0x403e38]
> [login002:02079] [ 7] /lib64/libc.so.6(__libc_start_main+0xf4) [0x3568e1d994]
> [login002:02079] [ 8] mpirun [0x403d69]
> [login002:02079] *** End of error message ***
> Segmentation fault
>
> We tried version 1.4.1 and this problem did not emerge.
>
> This option is necessary when our users launch hybrid MPI-OpenMP code: they
> request M nodes and n ppn in a PBS/Torque setup, and -npernode lets them
> launch only the right number of MPI tasks per node. Unfortunately, as soon
> as we use the '-npernode N' option, mpirun crashes.
>
> Is this a known issue? I found a related problem (from around May 2010)
> where people were using the same option, but in a SLURM environment.
>
> regards
>
> Michael
>
> <config.log.gz><ompi_info-a.out.gz>
> _______________________________________________
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users

--
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/
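
For reference, testing the suggested 1.4.x nightly is a standard Open MPI tarball build. A minimal sketch, assuming the Intel compilers mentioned in the report; the tarball name is a placeholder (check the nightly directory listing for the actual file):

    # Fetch a 1.4.x nightly snapshot (exact filename will differ -- see the
    # listing at http://www.open-mpi.org/nightly/v1.4/ first).
    wget http://www.open-mpi.org/nightly/v1.4/openmpi-1.4.3a1rNNNNN.tar.gz
    tar xzf openmpi-1.4.3a1rNNNNN.tar.gz
    cd openmpi-1.4.3a1rNNNNN

    # Build into a scratch prefix so the production 1.4.2 install is untouched.
    ./configure --prefix=$HOME/ompi-nightly CC=icc CXX=icpc F77=ifort FC=ifort
    make -j8 all install

    # Put the nightly build first in the environment and re-run the failing command.
    export PATH=$HOME/ompi-nightly/bin:$PATH
    export LD_LIBRARY_PATH=$HOME/ompi-nightly/lib:$LD_LIBRARY_PATH
    mpirun -npernode 1 --display-devel-map --tag-output -np 6 -cpus-per-proc 2 -H 'login001,login002,login003' hostname

Building into a separate prefix lets the -npernode case be retested without disturbing the existing installation.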
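
For context on why -npernode matters here, a sketch of the kind of hybrid MPI+OpenMP job the reporter describes under PBS/Torque; the resource request, script, and application name are illustrative assumptions, not taken from the original post:

    #!/bin/bash
    # Hypothetical Torque request: M=4 nodes, n=8 cores per node.
    #PBS -l nodes=4:ppn=8
    #PBS -N hybrid-test

    cd $PBS_O_WORKDIR

    # One MPI task per node; OpenMP threads fill the remaining cores on each node.
    export OMP_NUM_THREADS=8
    mpirun -npernode 1 ./hybrid_app

With -npernode 1, each allocated node runs a single MPI task while OpenMP uses the rest of the cores, which is the launch pattern that triggers the crash in 1.4.2.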