Hi Anthony, Ralph, Gilles, all

As far as I know, for core/processor assignment to user jobs to work,
Torque needs to be configured with cpuset support
(configure --enable-cpuset ...).
That is separate from what OpenMPI does in terms of process binding.
Without cpuset support, the user processes in a job
are free to use any cores/processors on the nodes assigned to that job.
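
For reference, a minimal sketch of building Torque from source with
cpuset support enabled (the install prefix here is an assumption, not a
recommendation; adjust for your site):

##################################################
# hedged sketch: build Torque with cpuset support enabled
# (--prefix is an assumption; use your own site's layout)
./configure --prefix=/usr/local --enable-cpuset
make && make install
##################################################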

Some additional work is also needed to set up Linux cpuset support,
so that Torque can use it at runtime (create a /dev/cpuset directory
and mount the cpuset file system on it).
I do this in the pbs_mom daemon startup script,
but it can be done in other ways:

##################################################
# create the /dev/cpuset mount point if it does not exist
if [ ! -e /dev/cpuset ]; then
    mkdir /dev/cpuset
fi

# mount the cpuset file system there if it is not already mounted
if [ "`mount -t cpuset`x" = "x" ]; then
    mount -t cpuset none /dev/cpuset
fi
##################################################

I don't know if the EPEL Torque package is configured
with cpuset support, but I would guess it is not.
Look at /dev/cpuset on your compute nodes
to see if Torque created anything there.
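
A quick way to check on a compute node (the expectation that pbs_mom
creates a subdirectory under the mount point is mine, not a guarantee):

##################################################
# is the cpuset file system mounted, and did pbs_mom create anything?
mount -t cpuset
ls -l /dev/cpuset
##################################################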

I also don't know whether OpenMPI can somehow bypass the cores/processors
that Torque assigns to a job (if any), nor whether, when Torque is
configured without cpuset support, OpenMPI can still bind the MPI
processes to cores/processors/sockets/etc. on its own.
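
One way to see what OpenMPI itself actually did on a node is to ask it
to report its bindings (a sketch; these options exist in recent OpenMPI
releases, check mpirun --help on your version):

##################################################
# print where each MPI rank was bound on its node
mpirun --report-bindings --bind-to core hostname
##################################################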

I hope this helps,
Gus Correa

On 10/06/2017 02:22 AM, Anthony Thyssen wrote:
Sorry r...@open-mpi.org.  As Gilles Gouaillardet pointed out to me, the problem wasn't with OpenMPI, but with the specific EPEL implementation (see Redhat Bugzilla 1321154).

Today the server was able to be taken down for maintenance, and I wanted to try a few things.

I installed torque-4.2.10-11.el7 from the EPEL Testing repository.

However, I found that all the nodes were 'down', even though everything appeared to be running, with no errors in the error logs.

After a lot of trial, error and research, I eventually (on a whim) decided to remove the "num_node_boards=1" entry from the "torque/server_priv/nodes" file and restart the server & scheduler.  Suddenly the nodes were "free" and my initial test job ran.
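
For anyone following along, the change amounts to something like this (the node name, np count and "dualcore" property are just illustrative of my setup):

##################################################
# /var/lib/torque/server_priv/nodes  (illustrative entry)
# before:
node21.emperor np=2 dualcore num_node_boards=1
# after (drop the NUMA board attribute):
node21.emperor np=2 dualcore
##################################################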

Perhaps the EPEL-Testing Torque 4.2.10-11 was not built with NUMA support?

All later tests (with OpenMPI, the RHEL SRPM 1.10.6-2 re-compiled "--with-tm") now respond to the Torque node allocation correctly and no longer simply run all the jobs on the first node.
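
For reference, the rebuild boils down to configuring OpenMPI with Torque (tm) support; a sketch, where the path given to --with-tm is an assumption (point it at your Torque install prefix):

##################################################
# build OpenMPI with Torque/PBS (tm) launcher support
./configure --with-tm=/usr
make && make install
##################################################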

That is, $PBS_NODEFILE, "pbsdsh hostname" and "mpirun hostname" are all in agreement.
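
The check itself is just a trivial batch job along these lines (a sketch; the resource request is an assumption matching my five-node test):

##################################################
#!/bin/sh
#PBS -l nodes=5
# all three listings below should report the same set of nodes
echo "== PBS_NODEFILE ==";  cat $PBS_NODEFILE
echo "== pbsdsh ==";        pbsdsh hostname
echo "== mpirun ==";        mpirun hostname
##################################################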

Thank you all for your help, and for putting up with me.

  Anthony Thyssen ( System Programmer )    <a.thys...@griffith.edu.au>
  --------------------------------------------------------------------------
   "Around here we've got a name for people what talks to dragons."
   "Traitor?"  Wiz asked apprehensively.
   "No.  Lunch."                     -- Rick Cook, "Wizadry Consulted"
  --------------------------------------------------------------------------


On Wed, Oct 4, 2017 at 11:43 AM, r...@open-mpi.org wrote:

    Can you try a newer version of OMPI, say the 3.0.0 release? Just
    curious to know if we perhaps “fixed” something relevant.


    On Oct 3, 2017, at 5:33 PM, Anthony Thyssen
    <a.thys...@griffith.edu.au> wrote:

    FYI...

    The problem is discussed further in

    Redhat Bugzilla: Bug 1321154 - numa enabled torque don't work
    https://bugzilla.redhat.com/show_bug.cgi?id=1321154

    I'd seen this previously, as it required me to add
    "num_node_boards=1" to each node in
    /var/lib/torque/server_priv/nodes to get torque to at least work.
    Specifically, by munging "$PBS_NODES" (which comes out correct)
    into a host list containing the correct "slot=" counts.
    But of course, now that I have compiled OpenMPI using "--with-tm",
    that should not have been needed, and in fact it is now ignored
    by OpenMPI in a Torque-PBS environment.
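
    For the record, that munging was nothing fancy; a sketch of the
    idea (the hostfile and program names are placeholders, and this
    is no longer needed once OpenMPI is built "--with-tm"):

    ##################################################
    # collapse duplicate $PBS_NODEFILE entries into "slots=" counts
    sort $PBS_NODEFILE | uniq -c | \
        awk '{print $2" slots="$1}' > myhostfile
    mpirun --hostfile myhostfile ./my_mpi_program
    ##################################################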

    However, it seems that ever since "NUMA" support was added to the
    Torque RPMs, it has also caused the current problems, which are
    still continuing.  The last action on that bug is a new EPEL
    "testing" version (August 2017), which I will try shortly.

    Thank you for your help, though I am still open to suggestions for
    a replacement.

  Anthony Thyssen ( System Programmer ) <a.thys...@griffith.edu.au>
     --------------------------------------------------------------------------
       Encryption... is a powerful defensive weapon for free people.
       It offers a technical guarantee of privacy, regardless of who is
       running the government... It's hard to think of a more powerful,
       less dangerous tool for liberty.            --  Esther Dyson
     --------------------------------------------------------------------------



    On Wed, Oct 4, 2017 at 9:02 AM, Anthony Thyssen
    <a.thys...@griffith.edu.au> wrote:

        Thank you Gilles.  At least I now have something to follow
        through with.

        As an FYI, the torque is the pre-built version from the Redhat
        Extras (EPEL) archive: torque-4.2.10-10.el7.x86_64

        Normally pre-built packages have no problems, but not in this case.




        On Tue, Oct 3, 2017 at 3:39 PM, Gilles Gouaillardet
        <gil...@rist.or.jp> wrote:

            Anthony,


            we had a similar issue reported some time ago (i.e. Open
            MPI ignores the torque allocation), and after quite some
            troubleshooting, we ended up with the same behavior
            (i.e. pbsdsh is not working as expected).

            see
            https://www.mail-archive.com/users@lists.open-mpi.org/msg29952.html
            for the last email.


            From an Open MPI point of view, I would consider the root
            cause to be with your torque install.

            this case was reported at
            http://www.clusterresources.com/pipermail/torqueusers/2016-September/018858.html

            and no conclusion was reached.


            Cheers,


            Gilles


            On 10/3/2017 2:02 PM, Anthony Thyssen wrote:

                The stdout and stderr are saved to separate channels.

                It is interesting that the output from pbsdsh is
                node21.emperor 5 times, even though $PBS_NODES lists
                the 5 individual nodes.

                Attached are the two compressed files, as well as the
                pbs_hello batch script used.

                Anthony Thyssen ( System Programmer )
                <a.thys...@griffith.edu.au>
                --------------------------------------------------------------------------
                  There are two types of encryption:
                    One that will prevent your sister from reading your diary, and
                    One that will prevent your government.  -- Bruce Schneier
                --------------------------------------------------------------------------




                On Tue, Oct 3, 2017 at 2:39 PM, Gilles Gouaillardet
                <gil...@rist.or.jp> wrote:

                    Anthony,


                    in your script, can you

                    set -x
                    env
                    pbsdsh hostname
                    mpirun --display-map --display-allocation \
                        --mca ess_base_verbose 10 \
                        --mca plm_base_verbose 10 \
                        --mca ras_base_verbose 10 \
                        hostname

                    and then compress and send the output ?


                    Cheers,


                    Gilles


                    On 10/3/2017 1:19 PM, Anthony Thyssen wrote:

                        I noticed that too.  Though the submitting
                        host for torque is a different host (the main
                        head node, "shrek"), "node21" is the host on
                        which torque runs the batch script (and the
                        mpirun command), it being the first node in
                        the "dualcore" resource group.

                        Adding option...

                        It fixed the hostname in the allocation map,
                        though it had no effect on the outcome.  The
                        allocation is still simply ignored.

                        =======8<--------CUT HERE----------
                        PBS Job Number       9000
                        PBS batch run on     node21.emperor
                        Time it was started  2017-10-03_14:11:20
                        Current Directory    /net/shrek.emperor/home/shrek/anthony
                        Submitted work dir   /home/shrek/anthony/mpi-pbs
                        Number of Nodes      5
                        Nodefile List        /var/lib/torque/aux//9000.shrek.emperor
                        node21.emperor
                        node25.emperor
                        node24.emperor
                        node23.emperor
                        node22.emperor
                        ---------------------------------------

                        ======================  ALLOCATED NODES  ======================
                        node21.emperor: slots=1 max_slots=0 slots_inuse=0 state=UP
                        node25.emperor: slots=1 max_slots=0 slots_inuse=0 state=UP
                        node24.emperor: slots=1 max_slots=0 slots_inuse=0 state=UP
                        node23.emperor: slots=1 max_slots=0 slots_inuse=0 state=UP
                        node22.emperor: slots=1 max_slots=0 slots_inuse=0 state=UP
                        =================================================================
                        node21.emperor
                        node21.emperor
                        node21.emperor
                        node21.emperor
                        node21.emperor
                        =======8<--------CUT HERE----------


                          Anthony Thyssen ( System Programmer )
                          <a.thys...@griffith.edu.au>
                         --------------------------------------------------------------------------
                           The equivalent of an armoured car should
                           always be used to protect any secret kept
                           in a cardboard box.
                           -- Anthony Thyssen, On the use of Encryption
                         --------------------------------------------------------------------------



