Sounds rather bizarre. Do you have lstopo on your machine? It would be useful
to see its output so we can understand what it thinks the topology looks
like, as that underpins the binding code.

The -nooversubscribe option is a red herring here - it has nothing to do with 
the problem, nor will it help.
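If the goal is one process per core on a standalone box, the usual fix for
the "not enough slots" message is to tell OMPI how many slots each node
actually has - e.g., put a line like "localhost slots=8" in a hostfile and
run mpirun -np 8 -nooversubscribe --hostfile myhosts ./your_program (the
hostfile name here is just an example).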

FWIW: if you aren't adding --bind-to-core, then OMPI isn't launching your 
process on any specific core at all - we are simply launching it on the node. 
It sounds to me like your code is incorrectly identifying "sharing" when a 
process isn't bound to a specific core.
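If it helps to see this from inside your job, here's a quick Linux-only
diagnostic (hypothetical code, assuming glibc's sched_getaffinity): each
rank prints its host and how many logical CPUs it is allowed to run on.
Unbound ranks should report the node's full CPU count; only with
--bind-to-core would that drop to 1.

  /* rank_affinity.c - where does each rank think it can run?
   * Build: mpicc rank_affinity.c -o rank_affinity
   * Run:   mpirun -np 8 ./rank_affinity
   */
  #define _GNU_SOURCE
  #include <stdio.h>
  #include <sched.h>
  #include <mpi.h>

  int main(int argc, char **argv) {
      MPI_Init(&argc, &argv);

      int rank;
      MPI_Comm_rank(MPI_COMM_WORLD, &rank);

      char host[MPI_MAX_PROCESSOR_NAME];
      int len;
      MPI_Get_processor_name(host, &len);

      /* Ask the kernel which logical CPUs this process may use. */
      cpu_set_t set;
      CPU_ZERO(&set);
      sched_getaffinity(0, sizeof(set), &set);

      printf("rank %d on %s: allowed on %d logical CPUs\n",
             rank, host, CPU_COUNT(&set));

      MPI_Finalize();
      return 0;
  }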

On Apr 25, 2012, at 10:39 AM, Kyle Boe wrote:

> >I just re-read the thread. I think there's a little confusion between the 
> >terms "processor" and "MPI process" here. You said "As a pre-processing 
> >step, each processor must figure out which other processors it must 
> >communicate with by virtue of sharing neighboring grid points." Did you mean 
> >"MPI process" instead of "processor"? 
> 
> The code is designed to be run using only one MPI process per 
> core/slot/whatever word you want to use. I believe what is happening here is 
that OMPI is launching all MPI processes on a single slot. This is why my code
> is freaking out and telling me that a slot is asking for information it 
> already owns. So, in order to answer your second point:
> 
> >Secondly, if you're just running on a single machine with no scheduler and 
> >no hostfile, you should be able to: mpirun -np <whatever_you_want> 
> >your_program_name When you get the "There are not enough slots available in 
> >the system..." message, that usually means that *something* is telling Open 
> >MPI a maximum number of processes that can be run, and your -np value is 
> >greater than that. This is *usually* a scheduler, but can also be a hostfile 
> >and/or an environment variable or file-based MCA parameter. 
> 
> I wanted to force MPI to only assign a single process per each slot, so I 
> used the -nooversubscribe option. This is when I get the error about there 
> not being enough slots in the system to fulfill my request. I can use mpirun 
> with np set to whatever I want and it will launch successfully, but then my 
> code kills itself because the processes are being oversubscribed to a single 
> slot, which doesn't do me or my code any good at all.
> 
> So the problem is that even though I have 8-, 24-, and 48-core machines, OMPI 
> thinks each one of them only has a single core, and will launch all MPI 
> processes on that one core.
> 
> Kyle