Found the problem. I had accidentally linked against BLACS built with mpich, not openmpi.
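
For the archives: ldd on the binary (or nm on a static libblacs.a) shows which MPI a BLACS build actually pulled in. A minimal test program along the lines of the sketch below should also expose the mismatch (Cblacs_pinfo and Cblacs_exit are the standard BLACS C-interface routines, the same ones the blacs_pinfo.c source further down provides when built with INTFACE == C_CALL; the file name and the -lblacs library name here are just placeholders, they vary by install). With BLACS built against the wrong MPI it should hit the same crash inside blacs_pinfo; with a consistent build it prints a sane rank/size:

    /* blacs_check.c -- minimal linkage sanity check (sketch) */
    #include <stdio.h>
    #include <mpi.h>

    /* Older BLACS installs ship no C header, so declare the interface here. */
    void Cblacs_pinfo(int *mypnum, int *nprocs);
    void Cblacs_exit(int continue_mpi);

    int main(int argc, char **argv)
    {
        int me, np;

        /* A C main must call MPI_Init itself; BLACS only does it for
           Fortran mains (see the blacs_pinfo.c source further down). */
        MPI_Init(&argc, &argv);

        /* If BLACS was built against a different MPI, this is where the
           segfault appears, as in the backtrace below. */
        Cblacs_pinfo(&me, &np);
        printf("process %d of %d\n", me, np);

        Cblacs_exit(1);  /* 1 = keep MPI alive; we finalise it ourselves */
        MPI_Finalize();
        return 0;
    }

Compiling it with the same wrapper and libraries as the real code, e.g. mpicc blacs_check.c -o blacs_check -lblacs, and running it under the same mpirun makes for a quick sanity check of the linkage.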
Cheers,
Conn

________________________________
From: Conn ORourke <conn.orou...@yahoo.co.uk>
To: "us...@open-mpi.org" <us...@open-mpi.org>; "terry.don...@oracle.com" <terry.don...@oracle.com>
Sent: Saturday, 14 January 2012, 17:42
Subject: Re: [OMPI users] Openmpi SGE and BLACS

Dear Terry,

Thanks for the reply, and sorry for the delay in getting back to you. Here is the relevant part of the gdb output:

Program terminated with signal 11, Segmentation fault.
#0  0x00002b63ba7f9291 in PMPI_Comm_size () at ./pcomm_size.c:46
46          if ( ompi_comm_invalid (comm)) {
(gdb) where
#0  0x00002b63ba7f9291 in PMPI_Comm_size () at ./pcomm_size.c:46
#1  0x000000000062cb6c in blacs_pinfo_ () at ./blacs_pinfo_.c:29
Backtrace stopped: previous frame inner to this frame (corrupt stack?)

Do you think the problem is being caused by SGE feeding the wrong number of processors to BLACS in some way? As I mentioned previously, I am requesting a different number of processors than I am running on, as I run several jobs on the requested processors.

Thanks for your time & help.
Conn

________________________________
From: TERRY DONTJE <terry.don...@oracle.com>
To: us...@open-mpi.org
Sent: Friday, 13 January 2012, 13:21
Subject: Re: [OMPI users] Openmpi SGE and BLACS

Do you have a stack of where exactly things are seg faulting in blacs_pinfo?

--td

On 1/13/2012 8:12 AM, Conn ORourke wrote:
> Dear Openmpi Users,
>
> I am reserving several processors with SGE upon which I want to run a
> number of openmpi jobs, all of which individually (and combined) use
> fewer than the reserved number of processors. The code I am using uses
> BLACS, and when blacs_pinfo is called I get a seg fault. If the code
> doesn't call blacs_pinfo it runs fine when submitted in this manner.
> blacs_pinfo simply returns the calling process's rank and the number of
> available processes, so I suspect this is an issue with SGE and openmpi
> and the requested node number being different from that given to mpirun.
>
> Can anyone explain why this would happen with openmpi jobs using BLACS
> on the SGE? And suggest maybe a way around it?
>
> Many thanks
> Conn
>
> example submission script:

#!/bin/bash -f -l
#$ -V
#$ -N test
#$ -S /bin/bash
#$ -cwd
#$ -l vf=1800M
#$ -pe ib-ompi 12
#$ -q infiniband.q

BIN=~/bin/program

for i in XPOL YPOL ZPOL; do
  mkdir -p ${TMPDIR}/4ZP/$i
  cp ./4ZP/$i/* ${TMPDIR}/4ZP/$i
done

cd ${TMPDIR}/4ZP/XPOL; mpirun -np 4 -machinefile ${TMPDIR}/machines $BIN > output &
cd ${TMPDIR}/4ZP/YPOL; mpirun -np 4 -machinefile ${TMPDIR}/machines $BIN > output &
cd ${TMPDIR}/4ZP/ZPOL; mpirun -np 4 -machinefile ${TMPDIR}/machines $BIN > output
wait   # let the two backgrounded runs finish before copying results back

for i in XPOL YPOL ZPOL; do
  cp ${TMPDIR}/4ZP/$i/* ${HOME}/4ZP/$i
done

blacs_pinfo.c:

#include "Bdef.h"

#if (INTFACE == C_CALL)
void Cblacs_pinfo(int *mypnum, int *nprocs)
#else
F_VOID_FUNC blacs_pinfo_(int *mypnum, int *nprocs)
#endif
{
   int ierr;
   extern int BI_Iam, BI_Np;
/*
 * If this is our first call, will need to setup some stuff
 */
   if (BI_F77_MPI_COMM_WORLD == NULL)
   {
/*
 * The BLACS always call f77's mpi_init.  If the user is using C, he should
 * explicitly call MPI_Init . . .
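 * (MPI_Initialized below borrows *nprocs as scratch space for its
 *  "is MPI initialized?" flag; Fortran-main builds then call
 *  bi_f77_init_() to perform the MPI_Init, while builds with a C main
 *  abort via BI_BlacsErr instead.)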
 */
      MPI_Initialized(nprocs);
#ifdef MainInF77
      if (!(*nprocs)) bi_f77_init_();
#else
      if (!(*nprocs))
         BI_BlacsErr(-1, -1, __FILE__,
                     "Users with C main programs must explicitly call MPI_Init");
#endif
      BI_F77_MPI_COMM_WORLD = (int *) malloc(sizeof(int));
#ifdef UseF77Mpi
      BI_F77_MPI_CONSTANTS = (int *) malloc(23*sizeof(int));
      ierr = 1;
      bi_f77_get_constants_(BI_F77_MPI_COMM_WORLD, &ierr, BI_F77_MPI_CONSTANTS);
#else
      ierr = 0;
      bi_f77_get_constants_(BI_F77_MPI_COMM_WORLD, &ierr, nprocs);
#endif
      BI_MPI_Comm_size(BI_MPI_COMM_WORLD, &BI_Np, ierr);
      BI_MPI_Comm_rank(BI_MPI_COMM_WORLD, &BI_Iam, ierr);
   }
   *mypnum = BI_Iam;
   *nprocs = BI_Np;
}

--
Terry D. Dontje | Principal Software Engineer
Developer Tools Engineering | +1.781.442.2631
Oracle - Performance Technologies
95 Network Drive, Burlington, MA 01803
Email terry.don...@oracle.com