Hi,

On 15.07.2011 at 21:14, Terry Dontje wrote:

> On 7/15/2011 1:46 PM, Paul Kapinos wrote:
>> Hi Open MPI folks (and Oracle/Sun experts),
>>
>> we have a problem with Sun's MPI (Cluster Tools 8.2.x) on a part of our
>> cluster. In the part of the cluster where LDAP is activated, mpiexec
>> does not try to spawn tasks on remote nodes at all, but exits with an error
>> message like the one below. If we run 'strace -f' on mpiexec, no exec of
>> "ssh" can be found at all. Surprisingly, mpiexec tries to look into
>> /etc/passwd (where the user is not listed, because we use LDAP!).
>> 
> Note that this is an area that should be no different from stock Open MPI.
> I would suspect that the message might be coming from ssh.  I wouldn't
> expect mpiexec to be looking into /etc/passwd at all; why would it need
> to?

The output you listed is titled "[unknown-user]". Maybe the reference to the
password file is a misleading simplification. The check is also made on the
master node of the parallel job, by an ordinary `getpwuid` call. Is
/etc/nsswitch.conf fine on the `mpiexec` machine?
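
To see what such a lookup returns on the `mpiexec` machine, a small Python sketch like the one below might help. It assumes the check really is a plain `getpwuid` call (I have not verified this in the source), and the helper names `lookup_user` and `passwd_sources` are made up for illustration. Python's `pwd.getpwuid` goes through the same name service switch as the C function, so an LDAP-only user should be found only if `ldap` is listed as a source on the `passwd:` line.

```python
import os
import pwd


def lookup_user(uid, lookup=pwd.getpwuid):
    """Return the login name for uid, or None if the lookup finds no entry.

    `lookup` defaults to pwd.getpwuid; it is a parameter only so the
    function can be exercised without touching the real user database.
    """
    try:
        return lookup(uid).pw_name
    except KeyError:
        # pwd.getpwuid raises KeyError when no entry exists for the uid.
        return None


def passwd_sources(nsswitch_text):
    """Return the entries on the 'passwd:' line of nsswitch.conf text."""
    for line in nsswitch_text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments
        if line.startswith("passwd:"):
            return line[len("passwd:"):].split()
    return []


if __name__ == "__main__":
    uid = os.getuid()
    name = lookup_user(uid)
    print(f"uid {uid} -> {name if name else 'unknown to getpwuid'}")
    try:
        with open("/etc/nsswitch.conf") as fh:
            print("passwd sources:", passwd_sources(fh.read()))
    except OSError as exc:
        print("could not read /etc/nsswitch.conf:", exc)
```

Running it as the affected user on both the `mpiexec` machine and a compute node should show whether the two disagree.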

Is the user known on this node too? Can they log in because they have no
passphrase, because they have an agent running, or did you set up host-based
authentication?


>  It should just be using ssh.  Can you manually ssh to the same node?
>> On the old part of the cluster, where NIS is used as the authentication
>> method, Sun MPI runs fine.
>>
>> So, is Sun's MPI compatible with the LDAP authentication method at all?
>>
> Insofar as whatever launcher you use is compatible with LDAP.
>> Best wishes, 
>> 
>> Paul 
>> 
>> 
>> P.S. In both parts of the cluster, I (login marked as xxxxx here) can log in
>> to any node by ssh without needing to type the password.

From the head node of the cluster to a node, or also between nodes?

-- Reuti


>> 
>> 
>> 
>> -------------------------------------------------------------------------- 
>> The user (xxxxx) is unknown to the system (i.e. there is no corresponding 
>> entry in the password file). Please contact your system administrator 
>> for a fix. 
>> -------------------------------------------------------------------------- 
>> [cluster-beta.rz.RWTH-Aachen.DE:31535] [[57885,0],0] ORTE_ERROR_LOG: Fatal 
>> in file plm_rsh_module.c at line 1058 
>> -------------------------------------------------------------------------- 
>> 
>> 
>> 
>> _______________________________________________
>> users mailing list
>> 
>> us...@open-mpi.org
>> http://www.open-mpi.org/mailman/listinfo.cgi/users
> 
> -- 
> Terry D. Dontje | Principal Software Engineer
> Developer Tools Engineering | +1.781.442.2631
> Oracle - Performance Technologies
> 95 Network Drive, Burlington, MA 01803
> Email terry.don...@oracle.com
> 
> 
> 

