On 11/17/2010 10:00 AM, Ralph Castain wrote:
--leave-session-attached is always required if you want to see output from the daemons. Otherwise, the launcher closes the ssh session (or qrsh session, in this case) as part of its normal operating procedure, thus terminating the stdout/err channel.


I believe you, but isn't it weird that without the -binding option to qsub we saw --report-bindings output from the orteds?

Do you have the date of the email that has the info you talked about below? I really am not trying to be an a-hole about this, but there has been so much data and email flying around that it would be nice to actually see the output you mention.

--td

On Wed, Nov 17, 2010 at 7:51 AM, Terry Dontje <terry.don...@oracle.com> wrote:

    On 11/17/2010 09:32 AM, Ralph Castain wrote:
    Chris' output is coming solely from the HNP, which is correct
    given the way things were executed. My comment was from another
    email where he did what I asked, which was to include the flags:

    --report-bindings --leave-session-attached

    so we could see the output from each orted. In that email, it was
    clear that while mpirun was bound to multiple cores, the orteds
    are being bound to a -single- core.

    Hence the problem.

    Hmm, I see Ralph's comment on 11/15 but I don't see any output
    that shows what Ralph says above.  The only report-bindings
    output I see is when he runs without OGE binding.  Can someone
    give me the date and time of Chris' email with the
    --report-bindings and --leave-session-attached flags?  Or a rerun
    of the below with the --leave-session-attached option would also
    help.

    I find it confusing that --leave-session-attached is not required
    when the OGE binding argument is not given.

    --td

    HTH
    Ralph


    On Wed, Nov 17, 2010 at 6:57 AM, Terry Dontje
    <terry.don...@oracle.com> wrote:

        On 11/17/2010 07:41 AM, Chris Jewell wrote:
        On 17 Nov 2010, at 11:56, Terry Dontje wrote:
        You are absolutely correct, Terry, and the 1.4 release series does
        include the proper code. The point here, though, is that SGE binds
        the orted to a single core, even though other cores are also
        allocated. So the orted detects an external binding of one core,
        and binds all its children to that same core.
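
The general mechanism behind that description, a child process starting out with its parent's CPU affinity, can be sketched on Linux as follows. This is only an illustration using Python's standard `os.sched_setaffinity`; it is not Open MPI's orted code, and the helper name `child_affinity_when_parent_bound_to` is made up for this example.

```python
import os
import subprocess
import sys


def child_affinity_when_parent_bound_to(core):
    """Bind this process to one core, spawn a child, return the child's affinity.

    Hypothetical helper for illustration only (Linux-specific calls).
    """
    original = os.sched_getaffinity(0)
    try:
        # Restrict the parent to a single core, as an external binder might.
        os.sched_setaffinity(0, {core})
        # The child simply reports the affinity mask it was started with.
        result = subprocess.run(
            [sys.executable, "-c",
             "import os; print(sorted(os.sched_getaffinity(0)))"],
            capture_output=True, text=True, check=True,
        )
    finally:
        # Put the parent's original affinity back.
        os.sched_setaffinity(0, original)
    return result.stdout.strip()


if __name__ == "__main__":
    first_core = min(os.sched_getaffinity(0))
    print(child_affinity_when_parent_bound_to(first_core))
```

The child reports only the single core its parent was restricted to, which is the kind of outcome described above for processes launched by an orted that is itself bound to one core.
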
        I do not think you are right here.  Chris sent the following, which
        looks like OGE (fka SGE) actually did bind the HNP to multiple cores.
        However, I believe that message is not coming from the processes
        themselves and is only shown by the HNP.  I wonder whether, if Chris
        adds a "-bind-to-core" option, we'll see more output from the a.out's
        before they exec unterm?
        As requested using

        $ qsub -pe mpi 8 -binding linear:2 myScript.com

        and

        'mpirun -mca ras_gridengine_verbose 100 --report-bindings -by-core -bind-to-core ./unterm'

        [exec5:06671] System has detected external process binding to cores 0028
        [exec5:06671] ras:gridengine: JOB_ID: 59434
        [exec5:06671] ras:gridengine: PE_HOSTFILE: /usr/sge/default/spool/exec5/active_jobs/59434.1/pe_hostfile
        [exec5:06671] ras:gridengine: exec5.cluster.stats.local: PE_HOSTFILE shows slots=2
        [exec5:06671] ras:gridengine: exec1.cluster.stats.local: PE_HOSTFILE shows slots=2
        [exec5:06671] ras:gridengine: exec4.cluster.stats.local: PE_HOSTFILE shows slots=1
        [exec5:06671] ras:gridengine: exec3.cluster.stats.local: PE_HOSTFILE shows slots=1
        [exec5:06671] ras:gridengine: exec2.cluster.stats.local: PE_HOSTFILE shows slots=1
        [exec5:06671] ras:gridengine: exec7.cluster.stats.local: PE_HOSTFILE shows slots=1

        No more info.  I note that the external binding is slightly different
        to what I had before, but our cluster is busier today :-)
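
For anyone reading the log above, here is a rough Python sketch that totals the slots reported per host and decodes the binding value. It assumes that "0028" is a hexadecimal bitmask in which bit i stands for core i; that reading is an assumption, not something the log itself states. The helper names `cores_from_mask` and `slots_per_host` are made up for this example.

```python
import re

# Log lines copied from the output above (wrapped lines rejoined).
LOG = """\
[exec5:06671] System has detected external process binding to cores 0028
[exec5:06671] ras:gridengine: JOB_ID: 59434
[exec5:06671] ras:gridengine: exec5.cluster.stats.local: PE_HOSTFILE shows slots=2
[exec5:06671] ras:gridengine: exec1.cluster.stats.local: PE_HOSTFILE shows slots=2
[exec5:06671] ras:gridengine: exec4.cluster.stats.local: PE_HOSTFILE shows slots=1
[exec5:06671] ras:gridengine: exec3.cluster.stats.local: PE_HOSTFILE shows slots=1
[exec5:06671] ras:gridengine: exec2.cluster.stats.local: PE_HOSTFILE shows slots=1
[exec5:06671] ras:gridengine: exec7.cluster.stats.local: PE_HOSTFILE shows slots=1
"""


def cores_from_mask(mask_hex):
    """Return the bit positions set in a hex mask (ASSUMED: bit i = core i)."""
    value = int(mask_hex, 16)
    return [bit for bit in range(value.bit_length()) if (value >> bit) & 1]


def slots_per_host(log):
    """Map each host to the slot count reported for it in the log."""
    pattern = re.compile(r"ras:gridengine: (\S+): PE_HOSTFILE shows slots=(\d+)")
    return {host: int(count) for host, count in pattern.findall(log)}


mask = re.search(r"external process binding to cores (\w+)", LOG).group(1)
print(cores_from_mask(mask))          # [3, 5]
print(sum(slots_per_host(LOG).values()))  # 8
```

Under that assumption the mask has two bits set, which would be consistent with `-binding linear:2`, and the slots add up to 8 across six hosts, matching `-pe mpi 8`.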

        I would have expected more output.

        --td

        Chris


        --
        Dr Chris Jewell
        Department of Statistics
        University of Warwick
        Coventry
        CV4 7AL
        UK
        Tel: +44 (0)24 7615 0778






        _______________________________________________
        users mailing list
        us...@open-mpi.org
        http://www.open-mpi.org/mailman/listinfo.cgi/users


-- Oracle
        Terry D. Dontje | Principal Software Engineer
        Developer Tools Engineering | +1.781.442.2631
        Oracle *- Performance Technologies*
        95 Network Drive, Burlington, MA 01803
        Email terry.don...@oracle.com














