Mahmood, you might want to have a look at OpenHPC (which comes with a recent Open MPI)
Cheers,
Gilles

On Thu, Aug 3, 2017 at 9:48 PM, Mahmood Naderan <mahmood...@gmail.com> wrote:
> Well, it seems that the default Rocks-openmpi dominates the systems. So, at
> the moment, I stick with that, which is 1.6.5 and uses -machinefile.
> I will later debug to see why 2.0.1 doesn't work.
>
> Thanks.
>
> Regards,
> Mahmood
>
> On Tue, Aug 1, 2017 at 12:30 AM, Gus Correa <g...@ldeo.columbia.edu> wrote:
>>
>> Maybe something is wrong with the Torque installation?
>> Or perhaps with the Open MPI + Torque integration?
>>
>> 1) Make sure your Open MPI was configured and compiled with the
>> Torque "tm" library of your Torque installation.
>> In other words:
>>
>> configure --with-tm=/path/to/your/Torque/tm_library ...
>>
>> 2) Check if your $TORQUE/server_priv/nodes file has all the nodes
>> in your cluster. If not, edit the file and add the missing nodes.
>> Then restart the Torque server (service pbs_server restart).
>>
>> 3) Run "pbsnodes" to see if all nodes are listed.
>>
>> 4) Run "hostname" with mpirun in a short Torque script:
>>
>> #PBS -l nodes=4:ppn=1
>> ...
>> mpirun hostname
>>
>> The output should show all four nodes.
>>
>> Good luck!
>> Gus Correa
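A quick way to check item 1 above (a sketch only, assuming the 2.0.1 install prefix that appears later in this thread) is to ask ompi_info which MCA components were built in:

  /share/apps/computer/openmpi-2.0.1/bin/ompi_info | grep tm

A build configured --with-tm should list the Torque components (lines along the lines of "MCA ras: tm ..." and "MCA plm: tm ..."); if nothing tm-related shows up, that build has no Torque support and mpirun cannot see the job's node allocation on its own.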
>> On 07/31/2017 02:41 PM, Mahmood Naderan wrote:
>>>
>>> Well, it is confusing! As you can see, I added four nodes to the host
>>> file (the same nodes that PBS uses). Launching with --map-by ppr:1:node
>>> works well; however, the same program submitted through the PBS script
>>> ends up with all ranks on a single node:
>>>
>>> mahmood@cluster:mpitest$ /share/apps/computer/openmpi-2.0.1/bin/mpirun -hostfile hosts --map-by ppr:1:node a.out
>>> ****************************************************************************
>>> * hwloc 1.11.2 has encountered what looks like an error from the operating system.
>>> *
>>> * Package (P#1 cpuset 0xffff0000) intersects with NUMANode (P#1 cpuset 0xff00ffff) without inclusion!
>>> * Error occurred in topology.c line 1048
>>> *
>>> * The following FAQ entry in the hwloc documentation may help:
>>> *   What should I do when hwloc reports "operating system" warnings?
>>> * Otherwise please report this error message to the hwloc user's mailing list,
>>> * along with the output+tarball generated by the hwloc-gather-topology script.
>>> ****************************************************************************
>>> Hello world from processor cluster.hpc.org, rank 0 out of 4 processors
>>> Hello world from processor compute-0-0.local, rank 1 out of 4 processors
>>> Hello world from processor compute-0-1.local, rank 2 out of 4 processors
>>> Hello world from processor compute-0-2.local, rank 3 out of 4 processors
>>>
>>> mahmood@cluster:mpitest$ cat mmt.sh
>>> #!/bin/bash
>>> #PBS -V
>>> #PBS -q default
>>> #PBS -j oe
>>> #PBS -l nodes=4:ppn=1
>>> #PBS -N job1
>>> #PBS -o .
>>> cd $PBS_O_WORKDIR
>>> /share/apps/computer/openmpi-2.0.1/bin/mpirun a.out
>>>
>>> mahmood@cluster:mpitest$ qsub mmt.sh
>>> 6428.cluster.hpc.org
>>>
>>> mahmood@cluster:mpitest$ cat job1.o6428
>>> Hello world from processor compute-0-1.local, rank 0 out of 32 processors
>>> Hello world from processor compute-0-1.local, rank 2 out of 32 processors
>>> Hello world from processor compute-0-1.local, rank 3 out of 32 processors
>>> Hello world from processor compute-0-1.local, rank 4 out of 32 processors
>>> Hello world from processor compute-0-1.local, rank 5 out of 32 processors
>>> Hello world from processor compute-0-1.local, rank 6 out of 32 processors
>>> Hello world from processor compute-0-1.local, rank 8 out of 32 processors
>>> Hello world from processor compute-0-1.local, rank 9 out of 32 processors
>>> Hello world from processor compute-0-1.local, rank 12 out of 32 processors
>>> Hello world from processor compute-0-1.local, rank 15 out of 32 processors
>>> Hello world from processor compute-0-1.local, rank 16 out of 32 processors
>>> Hello world from processor compute-0-1.local, rank 18 out of 32 processors
>>> Hello world from processor compute-0-1.local, rank 19 out of 32 processors
>>> Hello world from processor compute-0-1.local, rank 20 out of 32 processors
>>> Hello world from processor compute-0-1.local, rank 21 out of 32 processors
>>> Hello world from processor compute-0-1.local, rank 22 out of 32 processors
>>> Hello world from processor compute-0-1.local, rank 24 out of 32 processors
>>> Hello world from processor compute-0-1.local, rank 26 out of 32 processors
>>> Hello world from processor compute-0-1.local, rank 27 out of 32 processors
>>> Hello world from processor compute-0-1.local, rank 28 out of 32 processors
>>> Hello world from processor compute-0-1.local, rank 29 out of 32 processors
>>> Hello world from processor compute-0-1.local, rank 30 out of 32 processors
>>> Hello world from processor compute-0-1.local, rank 31 out of 32 processors
>>> Hello world from processor compute-0-1.local, rank 7 out of 32 processors
>>> Hello world from processor compute-0-1.local, rank 10 out of 32 processors
>>> Hello world from processor compute-0-1.local, rank 14 out of 32 processors
>>> Hello world from processor compute-0-1.local, rank 1 out of 32 processors
>>> Hello world from processor compute-0-1.local, rank 11 out of 32 processors
>>> Hello world from processor compute-0-1.local, rank 13 out of 32 processors
>>> Hello world from processor compute-0-1.local, rank 17 out of 32 processors
>>> Hello world from processor compute-0-1.local, rank 23 out of 32 processors
>>> Hello world from processor compute-0-1.local, rank 25 out of 32 processors
>>>
>>> Any idea?
>>>
>>> Regards,
>>> Mahmood
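If ompi_info shows no tm components in the 2.0.1 build, one stop-gap that is sometimes used with Torque (a sketch only, not verified on this cluster; it reuses the paths from mmt.sh above) is to hand mpirun the node list that Torque writes to $PBS_NODEFILE:

  #!/bin/bash
  #PBS -V
  #PBS -q default
  #PBS -j oe
  #PBS -l nodes=4:ppn=1
  #PBS -N job1
  #PBS -o .
  cd $PBS_O_WORKDIR
  # Torque lists one line per allocated slot in $PBS_NODEFILE;
  # pass that list and a matching rank count to mpirun explicitly.
  NP=$(wc -l < $PBS_NODEFILE)
  /share/apps/computer/openmpi-2.0.1/bin/mpirun -np $NP -machinefile $PBS_NODEFILE ./a.out

With nodes=4:ppn=1 this would launch four ranks, one per allocated node. It only papers over the missing integration, though; the clean fix is still to rebuild Open MPI with --with-tm (or use a build that already has it, such as the one shipped with OpenHPC), so that mpirun honors the Torque allocation without any extra flags.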