I'm not sure I fully grok this thread, but I'll try to provide an answer.

When you start a singleton, it spawns off a daemon that is the equivalent of 
"mpirun". This daemon is created for the express purpose of allowing the 
singleton to use MPI dynamics like comm_spawn - without it, the singleton would 
be unable to execute those functions.
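
Just to make "MPI dynamics" concrete, here is a minimal sketch of a singleton
parent that spawns a couple of workers (the "./worker" binary and the count of
2 are placeholders of mine, not something from this thread):

    /* Minimal singleton parent - compile with mpicc and start it directly,
       without mpirun. The worker binary name and count are placeholders. */
    #include <stdio.h>
    #include <mpi.h>

    int main(int argc, char **argv)
    {
        MPI_Comm intercomm;
        MPI_Init(&argc, &argv);

        /* This call only works because the daemon described above was
           spawned behind the scenes and acts as our "mpirun". */
        MPI_Comm_spawn("./worker", MPI_ARGV_NULL, 2, MPI_INFO_NULL,
                       0, MPI_COMM_SELF, &intercomm, MPI_ERRCODES_IGNORE);

        printf("parent: spawned 2 workers\n");
        MPI_Comm_disconnect(&intercomm);
        MPI_Finalize();
        return 0;
    }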

The first thing the daemon does is read the local allocation, using the same 
methods as used by mpirun. So whatever allocation is present that mpirun would 
have read, the daemon will get. This includes hostfiles and SGE allocations.
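
For example, since a singleton has no mpirun command line on which to pass a
hostfile, one way to hand it one is the same default-hostfile mechanism mpirun
uses - something along these lines (the file name here is made up, and you
should verify the MCA parameter name against your Open MPI version):

    $ cat myhosts
    node01 slots=2
    node02 slots=2
    $ export OMPI_MCA_orte_default_hostfile=$PWD/myhosts
    $ ./Mpitest   # started as a singleton; the daemon reads myhosts as its allocation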

The exception to this is when the singleton gets started in an altered 
environment - e.g., if SGE changes the environment variables when launching 
the singleton process. We see this in some resource managers - you can get an 
allocation of N nodes, but when you launch a job, the envar in that job only 
indicates the number of nodes actually running processes in that job. In such a 
situation, the daemon will see the altered value as its "allocation", 
potentially causing confusion.

For this reason, I generally recommend that you run dynamic applications using 
mpirun when operating in RM-managed environments to avoid confusion. Or at 
least use "printenv" to check that the envars are going to be right before 
trying to start from a singleton.
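
In an SGE job, for example, a quick check along these lines (the exact
variables depend on your RM, so treat them as illustrations) will show what
the daemon is going to see:

    # inside the SGE job script, before starting the singleton
    printenv | egrep 'NSLOTS|NHOSTS|PE_HOSTFILE'
    cat "$PE_HOSTFILE"    # compare against the nodes/slots you were granted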

HTH
Ralph

On Jan 31, 2012, at 12:19 PM, Reuti wrote:

> On 31.01.2012 at 20:12, Jeff Squyres wrote:
> 
>> I only noticed after the fact that Tom is also here at Cisco (it's a big 
>> company, after all :-) ).
>> 
>> I've contacted him using our proprietary super-secret Cisco handshake (i.e., 
>> the internal phone network); I'll see if I can figure out the issues 
>> off-list.
> 
> But I would be interested in a statement about a hostlist for singleton 
> startups. Or whether it's honoring the tight integration nodes more by 
> accident than by design. And as said: I see a wrong allocation, as the 
> initial ./Mpitest doesn't count as a process. I get a 3+1 allocation instead 
> of the 2+2 granted by SGE. If started with "mpiexec -np 1 ./Mpitest" all 
> is fine.
> 
> -- Reuti
> 
> 
>> On Jan 31, 2012, at 1:08 PM, Dave Love wrote:
>> 
>>> Reuti <re...@staff.uni-marburg.de> writes:
>>> 
>>>> Maybe it's a side effect of a tight integration that it would start on
>>>> the correct nodes (but I face an incorrect allocation of slots and an
>>>> error message at the end if started without mpiexec), as in this case
>>>> it has no command line option for the hostfile. How does one get the
>>>> requested nodes if started from the command line?
>>> 
>>> Yes, I wouldn't expect it to work without mpirun/mpiexec and, of course,
>>> I basically agree with Reuti about the rest.
>>> 
>>> If there is an actual SGE problem or need for an enhancement, though,
>>> please file it per https://arc.liv.ac.uk/trac/SGE#mail
>>> 
>> 
>> 
>> -- 
>> Jeff Squyres
>> jsquy...@cisco.com
>> For corporate legal information go to:
>> http://www.cisco.com/web/about/doing_business/legal/cri/