Chris's output is coming solely from the HNP, which is correct given the way things were executed. My comment was from another email where he did what I asked, which was to include the flags:

--report-bindings --leave-session-attached

so we could see the output from each orted. In that email, it was clear that while mpirun was bound to multiple cores, the orteds are being bound to a -single- core.

Hence the problem.

HTH
Ralph

On Wed, Nov 17, 2010 at 6:57 AM, Terry Dontje <terry.don...@oracle.com> wrote:

> On 11/17/2010 07:41 AM, Chris Jewell wrote:
>
> On 17 Nov 2010, at 11:56, Terry Dontje wrote:
>
> You are absolutely correct, Terry, and the 1.4 release series does include
> the proper code. The point here, though, is that SGE binds the orted to a
> single core, even though other cores are also allocated. So the orted detects
> an external binding of one core, and binds all its children to that same core.
>
> I do not think you are right here. Chris sent the following which looks
> like OGE (fka SGE) actually did bind the hnp to multiple cores. However that
> message I believe is not coming from the processes themselves and actually is
> only shown by the hnp. I wonder if Chris adds a "-bind-to-core" option
> we'll see more output from the a.out's before they exec unterm?
>
> As requested using
>
> $ qsub -pe mpi 8 -binding linear:2 myScript.com'
>
> and
>
> 'mpirun -mca ras_gridengine_verbose 100 --report-bindings -by-core -bind-to-core ./unterm'
>
> [exec5:06671] System has detected external process binding to cores 0028
> [exec5:06671] ras:gridengine: JOB_ID: 59434
> [exec5:06671] ras:gridengine: PE_HOSTFILE: /usr/sge/default/spool/exec5/active_jobs/59434.1/pe_hostfile
> [exec5:06671] ras:gridengine: exec5.cluster.stats.local: PE_HOSTFILE shows slots=2
> [exec5:06671] ras:gridengine: exec1.cluster.stats.local: PE_HOSTFILE shows slots=2
> [exec5:06671] ras:gridengine: exec4.cluster.stats.local: PE_HOSTFILE shows slots=1
> [exec5:06671] ras:gridengine: exec3.cluster.stats.local: PE_HOSTFILE shows slots=1
> [exec5:06671] ras:gridengine: exec2.cluster.stats.local: PE_HOSTFILE shows slots=1
> [exec5:06671] ras:gridengine: exec7.cluster.stats.local: PE_HOSTFILE shows slots=1
>
> No more info. I note that the external binding is slightly different to what
> I had before, but our cluster is busier today :-)
>
> I would have expected more output.
>
> --td
>
> Chris
>
> --
> Dr Chris Jewell
> Department of Statistics
> University of Warwick
> Coventry
> CV4 7AL
> UK
> Tel: +44 (0)24 7615 0778
>
> _______________________________________________
> users mailing list
> users@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users
>
> --
> Terry D. Dontje | Principal Software Engineer
> Developer Tools Engineering | +1.781.442.2631
> Oracle - Performance Technologies
> 95 Network Drive, Burlington, MA 01803
> Email terry.don...@oracle.com
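
A side note on reading the "external process binding to cores 0028" line in the quoted log. The thread does not spell out the format of that value, so the following is only a sketch under the assumption that it is a hexadecimal bitmask in which bit i stands for core i. The helper name cores_from_mask is made up for illustration and is not part of Open MPI or Grid Engine.

```python
def cores_from_mask(mask_text):
    """Return the core indices whose bits are set in a hex mask string.

    Assumption (not confirmed in the thread): the value printed after
    "binding to cores" is a hexadecimal bitmask, bit i meaning core i.
    """
    mask = int(mask_text, 16)
    # Walk every bit position up to the highest set bit and keep the set ones.
    return [bit for bit in range(mask.bit_length()) if (mask >> bit) & 1]


print(cores_from_mask("0028"))  # prints [3, 5]
```

Under that assumption, 0028 names two cores, which would match the two cores requested with "-binding linear:2". A single-core binding such as the one described for the orteds would show up as a mask with exactly one bit set.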