On Oct 3, 2017, at 5:33 PM, Anthony Thyssen <a.thys...@griffith.edu.au> wrote:
FYI...
The problem is discussed further in
Redhat Bugzilla: Bug 1321154 - numa enabled torque don't work
https://bugzilla.redhat.com/show_bug.cgi?id=1321154
I'd seen this previously, as it required me to add "num_node_boards=1" to
each node in /var/lib/torque/server_priv/nodes to get Torque to work at
all. I then got jobs running by munging the $PBS_NODEFILE (which comes out
correct) into a host list containing the correct "slots=" counts. But of
course, now that I have compiled Open MPI using "--with-tm", that should no
longer be needed, and in fact such a host list is now ignored by Open MPI
in a Torque-PBS environment.
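For reference, the munging was just a matter of collapsing the per-slot
node list into "slots=" counts. A minimal sketch, assuming the usual
$PBS_NODEFILE layout of one line per allocated slot (the hostfile name and
program below are only placeholders):

    # Collapse the per-slot node list into an Open MPI hostfile
    # with "slots=" counts, then hand it to mpirun explicitly.
    sort "$PBS_NODEFILE" | uniq -c |
        awk '{ print $2 " slots=" $1 }' > "hosts.$PBS_JOBID"
    mpirun --hostfile "hosts.$PBS_JOBID" ./my_mpi_prog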
However, it seems that ever since "NUMA" support was added to the Torque
RPMs, it has been causing the current problems, which are still continuing.
The last action on that bug is a new EPEL "test" version (August 2017),
which I will try shortly.
Thank you for your help, though I am still open to suggestions for a
replacement.
Anthony Thyssen ( System Programmer ) <a.thys...@griffith.edu.au>
--------------------------------------------------------------------------
Encryption... is a powerful defensive weapon for free people.
It offers a technical guarantee of privacy, regardless of who is
running the government... It's hard to think of a more powerful,
less dangerous tool for liberty. -- Esther Dyson
--------------------------------------------------------------------------
On Wed, Oct 4, 2017 at 9:02 AM, Anthony Thyssen <a.thys...@griffith.edu.au> wrote:
Thank you, Gilles. At least I now have something to follow through with.
As an FYI, the Torque here is the pre-built version from the Red Hat
Extras (EPEL) archive:

    torque-4.2.10-10.el7.x86_64

Normally pre-built packages present no problems, but not in this case.
On Tue, Oct 3, 2017 at 3:39 PM, Gilles Gouaillardet <gil...@rist.or.jp> wrote:
Anthony,
we had a similar issue reported some time ago (i.e. Open MPI ignores the
Torque allocation), and after quite some troubleshooting we ended up with
the same behaviour (i.e. pbsdsh is not working as expected). See
https://www.mail-archive.com/users@lists.open-mpi.org/msg29952.html
for the last email.
From an Open MPI point of view, I would consider the root cause to be in
your Torque install. This case was reported at
http://www.clusterresources.com/pipermail/torqueusers/2016-September/018858.html
and no conclusion was reached.
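(For anyone following along: a quick sanity check that separates a
Torque/TM fault from an Open MPI one is to compare the allocation with
what pbsdsh actually launches, something like:

    # run inside a multi-node Torque job
    sort "$PBS_NODEFILE" | uniq -c     # what Torque says it allocated
    pbsdsh uname -n | sort | uniq -c   # where TM really starts tasks
    # matching counts per host = healthy TM; every task landing on
    # the first node = the TM layer itself is broken, as seen here.

Since Open MPI's tm launcher uses the same TM API as pbsdsh, a broken
pbsdsh implicates Torque rather than Open MPI.)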
Cheers,
Gilles
On 10/3/2017 2:02 PM, Anthony Thyssen wrote:
The stdout and stderr are saved to separate channels. It is interesting
that the output from pbsdsh is node21.emperor 5 times, even though
$PBS_NODEFILE lists the 5 individual nodes.

Attached are the two compressed files, as well as the pbs_hello batch
script used.
Anthony Thyssen ( System Programmer ) <a.thys...@griffith.edu.au>
--------------------------------------------------------------------------
There are two types of encryption:
One that will prevent your sister from reading your diary, and
One that will prevent your government.
-- Bruce Schneier
--------------------------------------------------------------------------
On Tue, Oct 3, 2017 at 2:39 PM, Gilles Gouaillardet <gil...@rist.or.jp> wrote:
Anthony,
in your script, can you

    set -x
    env
    pbsdsh hostname
    mpirun --display-map --display-allocation \
        --mca ess_base_verbose 10 \
        --mca plm_base_verbose 10 \
        --mca ras_base_verbose 10 hostname

and then compress and send the output?
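(For completeness, in a full batch script those lines would sit roughly as
in the sketch below; the job name and node request are placeholders for
this cluster:

    #!/bin/sh
    #PBS -N pbs_debug
    #PBS -l nodes=5
    set -x             # trace every command to stderr
    env                # record the full job environment
    pbsdsh hostname    # exercise the Torque TM launcher directly
    mpirun --display-map --display-allocation \
        --mca ess_base_verbose 10 \
        --mca plm_base_verbose 10 \
        --mca ras_base_verbose 10 hostname

The verbose MCA settings make Open MPI log how it detects the environment,
reads the allocation, and launches the daemons.)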
Cheers,
Gilles
On 10/3/2017 1:19 PM, Anthony Thyssen wrote:
I noticed that too. Though the submitting host for Torque is a different
host (the main head node, "shrek"), "node21" is the host where Torque runs
the batch script (and the mpirun command), it being the first node in the
"dualcore" resource group.

Adding the option... fixed the hostname in the allocation map, though it
had no effect on the outcome. The allocation is still simply ignored.
=======8<--------CUT HERE----------
PBS Job Number       9000
PBS batch run on     node21.emperor
Time it was started  2017-10-03_14:11:20
Current Directory    /net/shrek.emperor/home/shrek/anthony
Submitted work dir   /home/shrek/anthony/mpi-pbs
Number of Nodes      5
Nodefile List        /var/lib/torque/aux//9000.shrek.emperor
node21.emperor
node25.emperor
node24.emperor
node23.emperor
node22.emperor
---------------------------------------

======================   ALLOCATED NODES   ======================
        node21.emperor: slots=1 max_slots=0 slots_inuse=0 state=UP
        node25.emperor: slots=1 max_slots=0 slots_inuse=0 state=UP
        node24.emperor: slots=1 max_slots=0 slots_inuse=0 state=UP
        node23.emperor: slots=1 max_slots=0 slots_inuse=0 state=UP
        node22.emperor: slots=1 max_slots=0 slots_inuse=0 state=UP
=================================================================
node21.emperor
node21.emperor
node21.emperor
node21.emperor
node21.emperor
=======8<--------CUT HERE----------
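(One possible cross-check, untested in this thread: force Open MPI's
ssh-based launcher instead of the tm one, feeding it an explicit hostfile
munged from $PBS_NODEFILE as described elsewhere in this thread. If the
ranks then spread across the nodes, the tm launch path is confirmed as the
broken piece:

    # hosts.$PBS_JOBID: the "slots=" hostfile built from $PBS_NODEFILE
    mpirun --mca plm rsh --hostfile "hosts.$PBS_JOBID" hostname

This requires passwordless ssh between the compute nodes, which the tm
launcher does not.)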
Anthony Thyssen ( System Programmer ) <a.thys...@griffith.edu.au>
--------------------------------------------------------------------------
The equivalent of an armoured car should always be used to
protect any secret kept in a cardboard box.
-- Anthony Thyssen, On the use of Encryption
--------------------------------------------------------------------------
_______________________________________________
users mailing list
users@lists.open-mpi.org
https://lists.open-mpi.org/mailman/listinfo/users