Okay, so what’s happening is that we are auto-detecting only 4 cores on that box, and since you didn’t provide any further info, we set #slots = #cores. If you want to run more processes than that, you can either tell us how many slots to use (e.g., -host mybox:32) or add --oversubscribe to the command line.
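For example, the two workarounds look like this ("mybox" and ./a.out are stand-ins for your actual hostname and program):

```shell
# Option 1: declare the slot count explicitly for the host;
# the ":32" after the hostname overrides the auto-detected core count
mpirun -host mybox:32 -n 8 ./a.out

# Option 2: keep the auto-detected slots, but allow launching
# more processes than there are slots
mpirun --oversubscribe -n 8 ./a.out
```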
> On Apr 25, 2017, at 1:31 PM, Eric Chamberland <eric.chamberl...@giref.ulaval.ca> wrote:
> 
> Ok, here it is:
> 
> ===================
> first, with -n 8:
> ===================
> 
> mpirun -mca ras_base_verbose 10 --display-allocation -n 8 echo "Hello"
> 
> [zorg:22429] [[INVALID],INVALID] plm:rsh_lookup on agent ssh : rsh path NULL
> [zorg:22429] plm:base:set_hnp_name: initial bias 22429 nodename hash 810220270
> [zorg:22429] plm:base:set_hnp_name: final jobfam 40249
> [zorg:22429] [[40249,0],0] plm:rsh_setup on agent ssh : rsh path NULL
> [zorg:22429] [[40249,0],0] plm:base:receive start comm
> [zorg:22429] mca: base: components_register: registering framework ras components
> [zorg:22429] mca: base: components_register: found loaded component loadleveler
> [zorg:22429] mca: base: components_register: component loadleveler register function successful
> [zorg:22429] mca: base: components_register: found loaded component slurm
> [zorg:22429] mca: base: components_register: component slurm register function successful
> [zorg:22429] mca: base: components_register: found loaded component simulator
> [zorg:22429] mca: base: components_register: component simulator register function successful
> [zorg:22429] mca: base: components_open: opening ras components
> [zorg:22429] mca: base: components_open: found loaded component loadleveler
> [zorg:22429] mca: base: components_open: component loadleveler open function successful
> [zorg:22429] mca: base: components_open: found loaded component slurm
> [zorg:22429] mca: base: components_open: component slurm open function successful
> [zorg:22429] mca: base: components_open: found loaded component simulator
> [zorg:22429] mca:base:select: Auto-selecting ras components
> [zorg:22429] mca:base:select:( ras) Querying component [loadleveler]
> [zorg:22429] [[40249,0],0] ras:loadleveler: NOT available for selection
> [zorg:22429] mca:base:select:( ras) Querying component [slurm]
> [zorg:22429] mca:base:select:( ras) Querying component [simulator]
> [zorg:22429] mca:base:select:( ras) No component selected!
> [zorg:22429] [[40249,0],0] plm:base:setup_job
> [zorg:22429] [[40249,0],0] ras:base:allocate
> [zorg:22429] [[40249,0],0] ras:base:allocate nothing found in module - proceeding to hostfile
> [zorg:22429] [[40249,0],0] ras:base:allocate parsing default hostfile /opt/openmpi-3.x_debug/etc/openmpi-default-hostfile
> [zorg:22429] [[40249,0],0] hostfile: checking hostfile /opt/openmpi-3.x_debug/etc/openmpi-default-hostfile for nodes
> [zorg:22429] [[40249,0],0] ras:base:allocate nothing found in hostfiles - checking for rankfile
> [zorg:22429] [[40249,0],0] ras:base:allocate nothing found in rankfile - inserting current node
> [zorg:22429] [[40249,0],0] ras:base:node_insert inserting 1 nodes
> [zorg:22429] [[40249,0],0] ras:base:node_insert updating HNP [zorg] info to 1 slots
> 
> ====================== ALLOCATED NODES ======================
> zorg: flags=0x01 slots=1 max_slots=0 slots_inuse=0 state=UP
> =================================================================
> [zorg:22429] [[40249,0],0] plm:base:setup_vm
> [zorg:22429] [[40249,0],0] plm:base:setup_vm creating map
> [zorg:22429] [[40249,0],0] setup:vm: working unmanaged allocation
> [zorg:22429] [[40249,0],0] using default hostfile /opt/openmpi-3.x_debug/etc/openmpi-default-hostfile
> [zorg:22429] [[40249,0],0] hostfile: checking hostfile /opt/openmpi-3.x_debug/etc/openmpi-default-hostfile for nodes
> [zorg:22429] [[40249,0],0] plm:base:setup_vm only HNP in allocation
> [zorg:22429] [[40249,0],0] plm:base:setting slots for node zorg by cores
> 
> ====================== ALLOCATED NODES ======================
> zorg: flags=0x11 slots=4 max_slots=0 slots_inuse=0 state=UP
> =================================================================
> [zorg:22429] [[40249,0],0] complete_setup on job [40249,1]
> [zorg:22429] [[40249,0],0] plm:base:launch_apps for job [40249,1]
> [zorg:22429] [[40249,0],0] hostfile: checking hostfile /opt/openmpi-3.x_debug/etc/openmpi-default-hostfile for nodes
> --------------------------------------------------------------------------
> There are not enough slots available in the system to satisfy the 8 slots
> that were requested by the application:
>   echo
> 
> Either request fewer slots for your application, or make more slots available
> for use.
> --------------------------------------------------------------------------
> [zorg:22429] [[40249,0],0] plm:base:orted_cmd sending orted_exit commands
> [zorg:22429] [[40249,0],0] plm:base:receive stop comm
> 
> ===================
> second with -n 4:
> ===================
> (16:31:23) [zorg]:~> mpirun -mca ras_base_verbose 10 --display-allocation -n 4 echo "Hello"
> 
> [zorg:22463] [[INVALID],INVALID] plm:rsh_lookup on agent ssh : rsh path NULL
> [zorg:22463] plm:base:set_hnp_name: initial bias 22463 nodename hash 810220270
> [zorg:22463] plm:base:set_hnp_name: final jobfam 40219
> [zorg:22463] [[40219,0],0] plm:rsh_setup on agent ssh : rsh path NULL
> [zorg:22463] [[40219,0],0] plm:base:receive start comm
> [zorg:22463] mca: base: components_register: registering framework ras components
> [zorg:22463] mca: base: components_register: found loaded component loadleveler
> [zorg:22463] mca: base: components_register: component loadleveler register function successful
> [zorg:22463] mca: base: components_register: found loaded component slurm
> [zorg:22463] mca: base: components_register: component slurm register function successful
> [zorg:22463] mca: base: components_register: found loaded component simulator
> [zorg:22463] mca: base: components_register: component simulator register function successful
> [zorg:22463] mca: base: components_open: opening ras components
> [zorg:22463] mca: base: components_open: found loaded component loadleveler
> [zorg:22463] mca: base: components_open: component loadleveler open function successful
> [zorg:22463] mca: base: components_open: found loaded component slurm
> [zorg:22463] mca: base: components_open: component slurm open function successful
> [zorg:22463] mca: base: components_open: found loaded component simulator
> [zorg:22463] mca:base:select: Auto-selecting ras components
> [zorg:22463] mca:base:select:( ras) Querying component [loadleveler]
> [zorg:22463] [[40219,0],0] ras:loadleveler: NOT available for selection
> [zorg:22463] mca:base:select:( ras) Querying component [slurm]
> [zorg:22463] mca:base:select:( ras) Querying component [simulator]
> [zorg:22463] mca:base:select:( ras) No component selected!
> [zorg:22463] [[40219,0],0] plm:base:setup_job
> [zorg:22463] [[40219,0],0] ras:base:allocate
> [zorg:22463] [[40219,0],0] ras:base:allocate nothing found in module - proceeding to hostfile
> [zorg:22463] [[40219,0],0] ras:base:allocate parsing default hostfile /opt/openmpi-3.x_debug/etc/openmpi-default-hostfile
> [zorg:22463] [[40219,0],0] hostfile: checking hostfile /opt/openmpi-3.x_debug/etc/openmpi-default-hostfile for nodes
> [zorg:22463] [[40219,0],0] ras:base:allocate nothing found in hostfiles - checking for rankfile
> [zorg:22463] [[40219,0],0] ras:base:allocate nothing found in rankfile - inserting current node
> [zorg:22463] [[40219,0],0] ras:base:node_insert inserting 1 nodes
> [zorg:22463] [[40219,0],0] ras:base:node_insert updating HNP [zorg] info to 1 slots
> 
> ====================== ALLOCATED NODES ======================
> zorg: flags=0x01 slots=1 max_slots=0 slots_inuse=0 state=UP
> =================================================================
> [zorg:22463] [[40219,0],0] plm:base:setup_vm
> [zorg:22463] [[40219,0],0] plm:base:setup_vm creating map
> [zorg:22463] [[40219,0],0] setup:vm: working unmanaged allocation
> [zorg:22463] [[40219,0],0] using default hostfile /opt/openmpi-3.x_debug/etc/openmpi-default-hostfile
> [zorg:22463] [[40219,0],0] hostfile: checking hostfile /opt/openmpi-3.x_debug/etc/openmpi-default-hostfile for nodes
> [zorg:22463] [[40219,0],0] plm:base:setup_vm only HNP in allocation
> [zorg:22463] [[40219,0],0] plm:base:setting slots for node zorg by cores
> 
> ====================== ALLOCATED NODES ======================
> zorg: flags=0x11 slots=4 max_slots=0 slots_inuse=0 state=UP
> =================================================================
> [zorg:22463] [[40219,0],0] complete_setup on job [40219,1]
> [zorg:22463] [[40219,0],0] plm:base:launch_apps for job [40219,1]
> [zorg:22463] [[40219,0],0] hostfile: checking hostfile /opt/openmpi-3.x_debug/etc/openmpi-default-hostfile for nodes
> [zorg:22463] [[40219,0],0] plm:base:launch wiring up iof for job [40219,1]
> [zorg:22463] [[40219,0],0] plm:base:launch job [40219,1] is not a dynamic spawn
> Hello
> Hello
> Hello
> Hello
> [zorg:22463] [[40219,0],0] plm:base:orted_cmd sending orted_exit commands
> [zorg:22463] [[40219,0],0] plm:base:receive stop comm
> 
> 
> Thanks!
> 
> Eric
> 
> On 25/04/17 04:00 PM, r...@open-mpi.org wrote:
>> -mca ras_base_verbose 10 --display-allocation
> _______________________________________________
> users mailing list
> users@lists.open-mpi.org
> https://rfd.newmexicoconsortium.org/mailman/listinfo/users