On 28.02.2013 at 06:55, Ralph Castain wrote:

> I don't off-hand see a problem, though I do note that your "working" version
> incorrectly reports the universe size as 2!

Yes, it was 2 in the case where it worked, i.e. when giving only the two hostnames without any dedicated slot count. What should it be in this case: "unknown", "infinity"?

-- Reuti

> I'll have to take a look at this and get back to you on it.
>
> On Feb 27, 2013, at 3:15 PM, Reuti <re...@staff.uni-marburg.de> wrote:
>
>> Hi,
>>
>> I have an issue using the option -cpus-per-proc 2. As I have Bulldozer
>> machines and I want only one process per FP core, I thought using
>> -cpus-per-proc 2 would be the way to go. Initially I had this issue inside
>> GridEngine, but then I tried it outside any queuing system and faced exactly
>> the same behavior.
>>
>> @) Each machine has 4 CPUs, each with 16 integer cores, hence 64
>> integer cores per machine in total. The Open MPI version used is 1.6.4.
>>
>>
>> a) mpiexec -cpus-per-proc 2 -report-bindings -hostfile machines -np 64 ./mpihello
>>
>> and a hostfile containing only the two lines listing the machines:
>>
>> node006
>> node007
>>
>> This works as I would like it (see working.txt) when initiated on node006.
>>
>>
>> b) mpiexec -cpus-per-proc 2 -report-bindings -hostfile machines -np 64 ./mpihello
>>
>> But changing the hostfile so that it has a slot count, which might mimic
>> the behavior of a machinefile parsed out of any queuing system:
>>
>> node006 slots=64
>> node007 slots=64
>>
>> This fails with:
>>
>> --------------------------------------------------------------------------
>> An invalid physical processor ID was returned when attempting to bind
>> an MPI process to a unique processor on node:
>>
>> Node: node006
>>
>> This usually means that you requested binding to more processors than
>> exist (e.g., trying to bind N MPI processes to M processors, where N > M),
>> or that the node has an unexpectedly different topology.
>>
>> Double check that you have enough unique processors for all the
>> MPI processes that you are launching on this host, and that all nodes
>> have identical topologies.
>>
>> You job will now abort.
>> --------------------------------------------------------------------------
>>
>> (see failed.txt)
>>
>>
>> b1) mpiexec -cpus-per-proc 2 -report-bindings -hostfile machines -np 32 ./mpihello
>>
>> This works, and the universe size found is 128 as expected (see only32.txt).
>>
>>
>> c) Maybe the machinefile is not parsed correctly, so I checked:
>>
>> c1) mpiexec -hostfile machines -np 64 ./mpihello => works
>>
>> c2) mpiexec -hostfile machines -np 128 ./mpihello => works
>>
>> c3) mpiexec -hostfile machines -np 129 ./mpihello => fails as expected
>>
>> So it got the slot counts right.
>>
>> What am I missing?
>>
>> -- Reuti
>>
>> <failed.txt><only32.txt><working.txt>
>> _______________________________________________
>> users mailing list
>> us...@open-mpi.org
>> http://www.open-mpi.org/mailman/listinfo.cgi/users
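
One possible reading of cases a), b) and b1), not confirmed anywhere in this thread: if processes are placed by slot, so that node006 is filled up to its slot count before node007 gets anything, then with slots=64 and -np 64 all 64 processes would land on node006 and ask for 128 cores on a 64-core machine. The Python sketch below only does that arithmetic. The helper names place_by_slot and fits are hypothetical, the one-slot default for hosts listed without a count is inferred from the reported universe size of 2, and the wrap-around once all slots are used is a simplification that does not model the rejection seen in c3).

```python
def place_by_slot(hosts, nprocs):
    """Assign processes host by host, filling each host's slots in order.

    hosts is a list of (name, slots) pairs. Assumption: once every slot is
    used, placement wraps around to the first host again.
    """
    counts = {name: 0 for name, _ in hosts}
    remaining = nprocs
    while remaining > 0:
        for name, slots in hosts:
            take = min(slots, remaining)
            counts[name] += take
            remaining -= take
            if remaining == 0:
                break
    return counts


def fits(hosts, nprocs, cpus_per_proc, cores_per_node):
    """True if no host needs more cores than it has for its processes."""
    counts = place_by_slot(hosts, nprocs)
    return all(n * cpus_per_proc <= cores_per_node for n in counts.values())


CORES = 64  # integer cores per machine, as stated in the post

no_slots = [("node006", 1), ("node007", 1)]      # case a), assumed 1 slot each
with_slots = [("node006", 64), ("node007", 64)]  # cases b) and b1)

print("a) ", place_by_slot(no_slots, 64), fits(no_slots, 64, 2, CORES))
print("b) ", place_by_slot(with_slots, 64), fits(with_slots, 64, 2, CORES))
print("b1)", place_by_slot(with_slots, 32), fits(with_slots, 32, 2, CORES))
```

Under these assumptions a) and b1) fit and b) does not, which matches what was observed; whether this is what mpiexec really does here would have to be checked against the -report-bindings output in the attachments.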