Can you try a newer version of OMPI, say the 3.0.0 release? Just curious to know if we perhaps “fixed” something relevant.
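
In case it helps when retesting, here is a minimal sketch of a check that could go in the batch script. It is my own illustration, not something taken from the scripts in this thread: it counts the distinct hosts in the Torque nodefile and compares that with the distinct hosts that pbsdsh and mpirun actually reach. It assumes Torque sets the usual $PBS_NODEFILE variable inside the job, and the helper name distinct_hosts is made up for the example.

```shell
#!/bin/sh
# Sketch: compare the hosts Torque allocated with the hosts each launcher reaches.
# distinct_hosts is a hypothetical helper name, not part of Torque or Open MPI.

# Count distinct non-empty host names read from stdin (first field of each line).
distinct_hosts() {
    awk 'NF && !seen[$1]++ { n++ } END { print n + 0 }'
}

# Only run the comparison inside a Torque job, where PBS_NODEFILE is assumed to be set.
if [ -n "${PBS_NODEFILE:-}" ] && [ -r "$PBS_NODEFILE" ]; then
    mpirun --version
    echo "nodefile: $(distinct_hosts < "$PBS_NODEFILE") distinct host(s)"
    echo "pbsdsh:   $(pbsdsh hostname | distinct_hosts) distinct host(s)"
    echo "mpirun:   $(mpirun hostname | distinct_hosts) distinct host(s)"
fi
```

If the three counts agree after the upgrade, the allocation is being honoured; with the output quoted below I would expect the nodefile count to be 5 and the two launcher counts to be 1.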
> On Oct 3, 2017, at 5:33 PM, Anthony Thyssen <a.thys...@griffith.edu.au> wrote:
>
> FYI...
>
> The problem is discussed further in
>
> Redhat Bugzilla: Bug 1321154 - numa enabled torque don't work
> https://bugzilla.redhat.com/show_bug.cgi?id=1321154
>
> I'd seen this previously, as it required me to add "num_node_boards=1" to each
> node in /var/lib/torque/server_priv/nodes to get torque to at least work.
> Specifically by munging the "$PBS_NODES" (which comes out correct) into a host
> list containing the correct "slot=" counts. But of course now that I have
> compiled OpenMPI using "--with-tm", that should not have been needed, as it is
> in fact now ignored by OpenMPI in a Torque-PBS environment.
>
> However, it seems that ever since "NUMA" support was put into the Torque RPMs,
> it has also caused the current problems, and is still continuing. The last
> action is a new EPEL "test" version (August 2017), which I will try shortly.
>
> Thank you for your help, though I am still open to suggestions for a
> replacement.
>
> Anthony Thyssen ( System Programmer ) <a.thys...@griffith.edu.au>
> --------------------------------------------------------------------------
> Encryption... is a powerful defensive weapon for free people.
> It offers a technical guarantee of privacy, regardless of who is
> running the government... It's hard to think of a more powerful,
> less dangerous tool for liberty. -- Esther Dyson
> --------------------------------------------------------------------------
>
> On Wed, Oct 4, 2017 at 9:02 AM, Anthony Thyssen <a.thys...@griffith.edu.au> wrote:
> Thank you Gilles. At least I now have something to follow through with.
>
> As a FYI, the torque is the pre-built version from the Redhat Extras (EPEL)
> archive.
> torque-4.2.10-10.el7.x86_64
>
> Normally pre-built packages have no problems, but in this case.
>
> On Tue, Oct 3, 2017 at 3:39 PM, Gilles Gouaillardet <gil...@rist.or.jp> wrote:
> Anthony,
>
> we had a similar issue reported some time ago (e.g. Open MPI ignores torque
> allocation),
> and after quite some troubleshooting, we ended up with the same behavior
> (e.g. pbsdsh is not working as expected).
>
> see https://www.mail-archive.com/users@lists.open-mpi.org/msg29952.html for
> the last email.
>
> from an Open MPI point of view, i would consider the root cause is with your
> torque install.
>
> this case was reported at
> http://www.clusterresources.com/pipermail/torqueusers/2016-September/018858.html
> and no conclusion was reached.
>
> Cheers,
>
> Gilles
>
> On 10/3/2017 2:02 PM, Anthony Thyssen wrote:
> The stdin and stdout are saved to separate channels.
>
> It is interesting that the output from pbsdsh is node21.emperor 5 times, even
> though $PBS_NODES is the 5 individual nodes.
>
> Attached are the two compressed files, as well as the pbs_hello batch used.
>
> Anthony Thyssen ( System Programmer ) <a.thys...@griffith.edu.au>
> --------------------------------------------------------------------------
> There are two types of encryption:
> One that will prevent your sister from reading your diary, and
> One that will prevent your government.
>     -- Bruce Schneier
> --------------------------------------------------------------------------
>
> On Tue, Oct 3, 2017 at 2:39 PM, Gilles Gouaillardet <gil...@rist.or.jp> wrote:
>
> Anthony,
>
> in your script, can you
>
> set -x
> env
> pbsdsh hostname
> mpirun --display-map --display-allocation --mca ess_base_verbose 10 --mca plm_base_verbose 10 --mca ras_base_verbose 10 hostname
>
> and then compress and send the output ?
>
> Cheers,
>
> Gilles
>
> On 10/3/2017 1:19 PM, Anthony Thyssen wrote:
>
> I noticed that too. Though the submitting host for torque is
> a different host (main head node, "shrek"), "node21" is the
> host that torque runs the batch script (and the mpirun
> command) it being the first node in the "dualcore" resource group.
>
> Adding option...
>
> It fixed the hostname in the allocation map, though had no
> effect on the outcome. The allocation is still simply ignored.
>
> =======8<--------CUT HERE----------
> PBS Job Number       9000
> PBS batch run on     node21.emperor
> Time it was started  2017-10-03_14:11:20
> Current Directory    /net/shrek.emperor/home/shrek/anthony
> Submitted work dir   /home/shrek/anthony/mpi-pbs
> Number of Nodes      5
> Nodefile List        /var/lib/torque/aux//9000.shrek.emperor
> node21.emperor
> node25.emperor
> node24.emperor
> node23.emperor
> node22.emperor
> ---------------------------------------
>
> ======================   ALLOCATED NODES   ======================
> node21.emperor: slots=1 max_slots=0 slots_inuse=0 state=UP
> node25.emperor: slots=1 max_slots=0 slots_inuse=0 state=UP
> node24.emperor: slots=1 max_slots=0 slots_inuse=0 state=UP
> node23.emperor: slots=1 max_slots=0 slots_inuse=0 state=UP
> node22.emperor: slots=1 max_slots=0 slots_inuse=0 state=UP
> =================================================================
> node21.emperor
> node21.emperor
> node21.emperor
> node21.emperor
> node21.emperor
> =======8<--------CUT HERE----------
>
> Anthony Thyssen ( System Programmer ) <a.thys...@griffith.edu.au>
> --------------------------------------------------------------------------
> The equivalent of an armoured car should always be used to
> protect any secret kept in a cardboard box.
>     -- Anthony Thyssen, On the use of Encryption
> --------------------------------------------------------------------------
_______________________________________________
users mailing list
users@lists.open-mpi.org
https://lists.open-mpi.org/mailman/listinfo/users