Yes, the -npernode segv is a known issue.

We have it fixed in the 1.4.x nightly tarballs; can you give it a whirl and see 
if that fixes your problem?

    http://www.open-mpi.org/nightly/v1.4/
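
Something like this should do the trick (rough sketch -- the exact tarball name 
changes from night to night, and the install prefix is just an example):

    # grab and unpack the latest 1.4.x nightly snapshot (substitute the real filename)
    wget http://www.open-mpi.org/nightly/v1.4/openmpi-1.4.3aXrXXXXX.tar.gz
    tar xzf openmpi-1.4.3aXrXXXXX.tar.gz
    cd openmpi-1.4.3aXrXXXXX

    # build/install into a scratch prefix so your 1.4.2 install is untouched
    ./configure --prefix=$HOME/ompi-nightly CC=icc CXX=icpc F77=ifort FC=ifort
    make -j 4 all install

    # put the nightly build first in your environment and re-run the failing command
    export PATH=$HOME/ompi-nightly/bin:$PATH
    export LD_LIBRARY_PATH=$HOME/ompi-nightly/lib:$LD_LIBRARY_PATH
    mpirun -npernode 1 --display-devel-map --tag-output -np 6 -cpus-per-proc 2 \
        -H 'login001,login002,login003' hostname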



On Aug 23, 2010, at 8:20 PM, Michael E. Thomadakis wrote:

> Hello OMPI:
> 
> We have installed OMPI V1.4.2 on a Nehalem cluster running CentOS5.4. OMPI 
> was built using Intel compilers 11.1.072. I am attaching the configuration 
> log and output from ompi_info -a.
> 
> The problem we are encountering is that whenever we use the option '-npernode N' 
> on the mpirun command line, we get a segmentation fault, as shown below:
> 
> 
> miket@login002[pts/7]PS $ mpirun -npernode 1  --display-devel-map  
> --tag-output -np 6 -cpus-per-proc 2 -H 'login001,login002,login003' hostname
> 
>  Map generated by mapping policy: 0402
>         Npernode: 1     Oversubscribe allowed: TRUE     CPU Lists: FALSE
>         Num new daemons: 2      New daemon starting vpid 1
>         Num nodes: 3
> 
>  Data for node: Name: login001          Launch id: -1   Arch: 0 State: 2
>         Num boards: 1   Num sockets/board: 2    Num cores/socket: 4
>         Daemon: [[44812,0],1]   Daemon launched: False
>         Num slots: 1    Slots in use: 2
>         Num slots allocated: 1  Max slots: 0
>         Username on node: NULL
>         Num procs: 1    Next node_rank: 1
>         Data for proc: [[44812,1],0]
>                 Pid: 0  Local rank: 0   Node rank: 0
>                 State: 0        App_context: 0  Slot list: NULL
> 
>  Data for node: Name: login002          Launch id: -1   Arch: ffc91200  State: 2
>         Num boards: 1   Num sockets/board: 2    Num cores/socket: 4
>         Daemon: [[44812,0],0]   Daemon launched: True
>         Num slots: 1    Slots in use: 2
>         Num slots allocated: 1  Max slots: 0
>         Username on node: NULL
>         Num procs: 1    Next node_rank: 1
>         Data for proc: [[44812,1],0]
>                 Pid: 0  Local rank: 0   Node rank: 0
>                 State: 0        App_context: 0  Slot list: NULL
> 
>  Data for node: Name: login003          Launch id: -1   Arch: 0 State: 2
>         Num boards: 1   Num sockets/board: 2    Num cores/socket: 4
>         Daemon: [[44812,0],2]   Daemon launched: False
>         Num slots: 1    Slots in use: 2
>         Num slots allocated: 1  Max slots: 0
>         Username on node: NULL
>         Num procs: 1    Next node_rank: 1
>         Data for proc: [[44812,1],0]
>                 Pid: 0  Local rank: 0   Node rank: 0
>                 State: 0        App_context: 0  Slot list: NULL
> [login002:02079] *** Process received signal ***
> [login002:02079] Signal: Segmentation fault (11)
> [login002:02079] Signal code: Address not mapped (1)
> [login002:02079] Failing at address: 0x50
> [login002:02079] [ 0] /lib64/libpthread.so.0 [0x3569a0e7c0]
> [login002:02079] [ 1] /g/software/openmpi-1.4.2/intel/lib/libopen-rte.so.0(orte_util_encode_pidmap+0xa7) [0x2afa70d25de7]
> [login002:02079] [ 2] /g/software/openmpi-1.4.2/intel/lib/libopen-rte.so.0(orte_odls_base_default_get_add_procs_data+0x3b8) [0x2afa70d36088]
> [login002:02079] [ 3] /g/software/openmpi-1.4.2/intel/lib/libopen-rte.so.0(orte_plm_base_launch_apps+0xd7) [0x2afa70d37fc7]
> [login002:02079] [ 4] /g/software/openmpi-1.4.2/intel/lib/openmpi/mca_plm_rsh.so [0x2afa721085a1]
> [login002:02079] [ 5] mpirun [0x404c27]
> [login002:02079] [ 6] mpirun [0x403e38]
> [login002:02079] [ 7] /lib64/libc.so.6(__libc_start_main+0xf4) [0x3568e1d994]
> [login002:02079] [ 8] mpirun [0x403d69]
> [login002:02079] *** End of error message ***
> Segmentation fault
> 
> We tried version 1.4.1 and this problem did not emerge. 
> 
> This option is necessary when our users launch hybrid MPI-OpenMP code: in our 
> PBS/Torque setup they request M nodes and n ppn, and '-npernode' ensures they 
> get the right number of MPI tasks per node. Unfortunately, as soon as we use 
> the '-npernode N' option, mpirun crashes. 
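> 
> For reference, this is roughly the kind of job script our users run (a minimal 
> sketch; the resource request, thread count, and executable name are just 
> illustrative):
> 
>     #!/bin/bash
>     #PBS -l nodes=3:ppn=8
>     #PBS -l walltime=00:10:00
> 
>     cd $PBS_O_WORKDIR
> 
>     # one MPI task per node; OpenMP threads use the remaining cores
>     export OMP_NUM_THREADS=8
>     mpirun -npernode 1 -np 3 ./hybrid_app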
> 
> Is this a known issue? I found a related problem (from around May 2010) where 
> people were using the same option, but in a SLURM environment. 
> 
> regards
> 
> Michael
> 
> <config.log.gz><ompi_info-a.out.gz>


-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/

