On Mon, Feb 2, 2009 at 12:15 PM, Reuti <re...@staff.uni-marburg.de> wrote:
> On 02.02.2009 at 05:44, Sangamesh B wrote:
>
>> On Sun, Feb 1, 2009 at 10:37 PM, Reuti <re...@staff.uni-marburg.de> wrote:
>>>
>>> On 01.02.2009 at 16:00, Sangamesh B wrote:
>>>
>>>> On Sat, Jan 31, 2009 at 6:27 PM, Reuti <re...@staff.uni-marburg.de> wrote:
>>>>>
>>>>> On 31.01.2009 at 08:49, Sangamesh B wrote:
>>>>>
>>>>>> On Fri, Jan 30, 2009 at 10:20 PM, Reuti <re...@staff.uni-marburg.de> wrote:
>>>>>>>
>>>>>>> On 30.01.2009 at 15:02, Sangamesh B wrote:
>>>>>>>
>>>>>>>> Dear Open MPI,
>>>>>>>>
>>>>>>>> Do you have a solution for the following problem with Open MPI (1.3)
>>>>>>>> when run through Grid Engine?
>>>>>>>>
>>>>>>>> I changed the global execd params with H_MEMORYLOCKED=infinity and
>>>>>>>> restarted sgeexecd on all nodes.
>>>>>>>>
>>>>>>>> But the problem still persists:
>>>>>>>>
>>>>>>>> $ cat err.77.CPMD-OMPI
>>>>>>>> ssh_exchange_identification: Connection closed by remote host
>>>>>>>
>>>>>>> I think this might already be the reason why it's not working. Does an
>>>>>>> mpihello program run fine through SGE?
>>>>>>>
>>>>>> No.
>>>>>>
>>>>>> Any Open MPI parallel job through SGE runs only if it is running on a
>>>>>> single node (i.e. 8 processes on 8 cores of a single node). If the
>>>>>> number of processes is more than 8, then SGE will schedule it onto 2
>>>>>> nodes and the job will fail with the above error.
>>>>>>
>>>>>> Now I did a loose integration of Open MPI 1.3 with SGE. The job runs,
>>>>>> but all 16 processes run on a single node.
>>>>>
>>>>> What are the entries in `qconf -sconf` for:
>>>>>
>>>>> rsh_command
>>>>> rsh_daemon
>>>>>
>>>> $ qconf -sconf
>>>> global:
>>>> execd_spool_dir              /opt/gridengine/default/spool
>>>> ...
>>>> .....
>>>> qrsh_command                 /usr/bin/ssh
>>>> rsh_command                  /usr/bin/ssh
>>>> rlogin_command               /usr/bin/ssh
>>>> rsh_daemon                   /usr/sbin/sshd
>>>> qrsh_daemon                  /usr/sbin/sshd
>>>> reprioritize                 0
>>>
>>> Must you use ssh? Often in a private cluster the rsh-based setup is ok,
>>> or with SGE 6.2 the built-in mechanism of SGE. Otherwise please follow
>>> this:
>>>
>>> http://gridengine.sunsource.net/howto/qrsh_qlogin_ssh.html
>>>
>>>> I think it's better to check once with Open MPI 1.2.8.
>>>>
>>>>> What is your mpirun command in the jobscript - are you getting the
>>>>> mpirun from Open MPI there? According to the output below, it's not a
>>>>> loose integration, but you already prepare a machinefile, which is
>>>>> superfluous for Open MPI.
>>>>>
>>>> No, I've not prepared the machinefile for Open MPI.
>>>> For the tight integration job:
>>>>
>>>> /opt/mpi/openmpi/1.3/intel/bin/mpirun -np $NSLOTS $CPMDBIN/cpmd311-ompi-mkl.x wf1.in $PP_LIBRARY > wf1.out_OMPI$NSLOTS.$JOB_ID
>>>>
>>>> For the loose integration job:
>>>>
>>>> /opt/mpi/openmpi/1.3/intel/bin/mpirun -np $NSLOTS -hostfile $TMPDIR/machines $CPMDBIN/cpmd311-ompi-mkl.x wf1.in $PP_LIBRARY > wf1.out_OMPI_$JOB_ID.$NSLOTS
>>>
>>> a) You compiled Open MPI with "--with-sge"?
>>>
>> Yes. But ompi_info shows only one component for SGE:
>>
>> $ /opt/mpi/openmpi/1.3/intel/bin/ompi_info | grep gridengine
>> MCA ras: gridengine (MCA v2.0, API v2.0, Component v1.3)
>>
>>> b) When the $SGE_ROOT variable is set, Open MPI will use a tight
>>> integration automatically.
>>>
>> In the SGE job submit script, I set SGE_ROOT= <nothing>
>
> This will set the variable to an empty string. You need to use:
>
> unset SGE_ROOT
>
Right. I used 'unset SGE_ROOT' in the job submission script. It's working now.
Hello world jobs are working now. (single & multiple nodes)
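For reference, the loose-integration submit script now looks roughly like the sketch below. The PE name ('orte'), the slot count and the hello binary are placeholders for this example, and the commented $PE_HOSTFILE conversion is only needed if the Open MPI "host slots=N" machinefile format is preferred over the MPICH-style $TMPDIR/machines file:

#!/bin/bash
#$ -N Hello-OMPI
#$ -pe orte 16
#$ -cwd

# Force a loose integration: with SGE_ROOT unset, Open MPI 1.3 does not
# detect the SGE environment and launches its daemons over plain ssh/rsh
# using the hostfile given below instead of qrsh.
unset SGE_ROOT

# Optional: build an Open MPI style machinefile ("ibc17 slots=8") from the
# PE hostfile instead of using the MPICH(1)-style $TMPDIR/machines.
# awk '{print $1" slots="$2}' $PE_HOSTFILE > $TMPDIR/ompi_machines

/opt/mpi/openmpi/1.3/intel/bin/mpirun -np $NSLOTS -hostfile $TMPDIR/machines ./hello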
Thank you for the help. What could be the problem with tight integration?

Regards,
Sangamesh

> Despite the mentioned error message on the list, I can run Open MPI 1.3
> with tight integration into SGE.
>
> -- Reuti
>
>> And ran a loose integration job. It failed with the following error:
>>
>> $ cat err.87.Hello-OMPI
>> [node-0-18.local:08252] [[INVALID],INVALID] ORTE_ERROR_LOG: Not found
>> in file ess_hnp_module.c at line 126
>> --------------------------------------------------------------------------
>> It looks like orte_init failed for some reason; your parallel process is
>> likely to abort. There are many reasons that a parallel process can
>> fail during orte_init; some of which are due to configuration or
>> environment problems. This failure appears to be an internal failure;
>> here's some additional information (which may only be relevant to an
>> Open MPI developer):
>>
>> orte_plm_base_select failed
>> --> Returned value Not found (-13) instead of ORTE_SUCCESS
>> --------------------------------------------------------------------------
>> [node-0-18.local:08252] [[INVALID],INVALID] ORTE_ERROR_LOG: Not found
>> in file runtime/orte_init.c at line 132
>> --------------------------------------------------------------------------
>> It looks like orte_init failed for some reason; your parallel process is
>> likely to abort. There are many reasons that a parallel process can
>> fail during orte_init; some of which are due to configuration or
>> environment problems. This failure appears to be an internal failure;
>> here's some additional information (which may only be relevant to an
>> Open MPI developer):
>>
>> orte_ess_set_name failed
>> --> Returned value Not found (-13) instead of ORTE_SUCCESS
>> --------------------------------------------------------------------------
>> [node-0-18.local:08252] [[INVALID],INVALID] ORTE_ERROR_LOG: Not found
>> in file orterun.c at line 454
>>
>> $ cat out.87.Hello-OMPI
>> /opt/gridengine/default/spool/node-0-18/active_jobs/87.1/pe_hostfile
>> ibc18
>> ibc18
>> ibc18
>> ibc18
>> ibc18
>> ibc18
>> ibc18
>> ibc18
>> ibc17
>> ibc17
>> ibc17
>> ibc17
>> ibc17
>> ibc17
>> ibc17
>> ibc17
>>
>>> c) The machinefile you presented looks like one for MPICH(1); the syntax
>>> for Open MPI in the machinefile is different:
>>>
>>> ibc17 slots=8
>>> ibc12 slots=8
>>>
>> I tested a helloworld program with Open MPI with a machinefile in
>> MPICH(1) style. It works.
>>
>> So in a loose integration job, Open MPI may not be able to find the
>> $TMPDIR/machines file, or it might be running in a tight integration
>> style.
>>>
>>> So you would have to adjust the format of the generated file and reset
>>> SGE_ROOT inside your jobscript, to force Open MPI to do a loose
>>> integration only.
>>>
>>> -- Reuti
>>>
>>>> I think I should check with Open MPI 1.2.8. That may work.
>>>>
>>>> Thanks,
>>>> Sangamesh
>>>>>>
>>>>>> $ cat out.83.Hello-OMPI
>>>>>> /opt/gridengine/default/spool/node-0-17/active_jobs/83.1/pe_hostfile
>>>>>> ibc17
>>>>>> ibc17
>>>>>> ibc17
>>>>>> ibc17
>>>>>> ibc17
>>>>>> ibc17
>>>>>> ibc17
>>>>>> ibc17
>>>>>> ibc12
>>>>>> ibc12
>>>>>> ibc12
>>>>>> ibc12
>>>>>> ibc12
>>>>>> ibc12
>>>>>> ibc12
>>>>>> ibc12
>>>>>> Greetings: 1 of 16 from the node node-0-17.local
>>>>>> Greetings: 10 of 16 from the node node-0-17.local
>>>>>> Greetings: 15 of 16 from the node node-0-17.local
>>>>>> Greetings: 9 of 16 from the node node-0-17.local
>>>>>> Greetings: 14 of 16 from the node node-0-17.local
>>>>>> Greetings: 8 of 16 from the node node-0-17.local
>>>>>> Greetings: 11 of 16 from the node node-0-17.local
>>>>>> Greetings: 12 of 16 from the node node-0-17.local
>>>>>> Greetings: 6 of 16 from the node node-0-17.local
>>>>>> Greetings: 0 of 16 from the node node-0-17.local
>>>>>> Greetings: 5 of 16 from the node node-0-17.local
>>>>>> Greetings: 3 of 16 from the node node-0-17.local
>>>>>> Greetings: 13 of 16 from the node node-0-17.local
>>>>>> Greetings: 4 of 16 from the node node-0-17.local
>>>>>> Greetings: 7 of 16 from the node node-0-17.local
>>>>>> Greetings: 2 of 16 from the node node-0-17.local
>>>>>>
>>>>>> But qhost -u <user name> shows that it is scheduled/running on two
>>>>>> nodes.
>>>>>>
>>>>>> Is anybody successfully running Open MPI 1.3 tightly integrated with
>>>>>> SGE?
>>>>>
>>>>> For a tight integration there's a FAQ:
>>>>>
>>>>> http://www.open-mpi.org/faq/?category=running#run-n1ge-or-sge
>>>>>
>>>>> -- Reuti
>>>>>
>>>>>> Thanks,
>>>>>> Sangamesh
>>>>>>
>>>>>>> -- Reuti
>>>>>>>
>>>>>>>> --------------------------------------------------------------------------
>>>>>>>> A daemon (pid 31947) died unexpectedly with status 129 while attempting
>>>>>>>> to launch so we are aborting.
>>>>>>>>
>>>>>>>> There may be more information reported by the environment (see above).
>>>>>>>>
>>>>>>>> This may be because the daemon was unable to find all the needed shared
>>>>>>>> libraries on the remote node. You may set your LD_LIBRARY_PATH to have
>>>>>>>> the location of the shared libraries on the remote nodes and this will
>>>>>>>> automatically be forwarded to the remote nodes.
>>>>>>>> --------------------------------------------------------------------------
>>>>>>>>
>>>>>>>> --------------------------------------------------------------------------
>>>>>>>> mpirun noticed that the job aborted, but has no info as to the process
>>>>>>>> that caused that situation.
>>>>>>>> --------------------------------------------------------------------------
>>>>>>>> ssh_exchange_identification: Connection closed by remote host
>>>>>>>>
>>>>>>>> --------------------------------------------------------------------------
>>>>>>>> mpirun was unable to cleanly terminate the daemons on the nodes shown
>>>>>>>> below. Additional manual cleanup may be required - please refer to
>>>>>>>> the "orte-clean" tool for assistance.
>>>>>>>>
>>>>>>>> --------------------------------------------------------------------------
>>>>>>>> node-0-19.local - daemon did not report back when launched
>>>>>>>> node-0-20.local - daemon did not report back when launched
>>>>>>>> node-0-21.local - daemon did not report back when launched
>>>>>>>> node-0-22.local - daemon did not report back when launched
>>>>>>>>
>>>>>>>> The hostnames for the InfiniBand interfaces are ibc0, ibc1, ibc2 .. ibc23.
>>>>>>>> Maybe Open MPI is not able to identify the hosts as it is using the
>>>>>>>> node-0-.. names. Is this causing Open MPI to fail?
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> Sangamesh
>>>>>>>>
>>>>>>>> On Mon, Jan 26, 2009 at 5:09 PM, mihlon <vacl...@fel.cvut.cz> wrote:
>>>>>>>>>
>>>>>>>>> Hi,
>>>>>>>>>
>>>>>>>>>> Hello SGE users,
>>>>>>>>>>
>>>>>>>>>> The cluster is installed with Rocks-4.3, SGE 6.0 & Open MPI 1.3.
>>>>>>>>>> Open MPI is configured with "--with-sge".
>>>>>>>>>> ompi_info shows only one component:
>>>>>>>>>>
>>>>>>>>>> # /opt/mpi/openmpi/1.3/intel/bin/ompi_info | grep gridengine
>>>>>>>>>> MCA ras: gridengine (MCA v2.0, API v2.0, Component v1.3)
>>>>>>>>>>
>>>>>>>>>> Is this acceptable?
>>>>>>>>>
>>>>>>>>> Maybe yes.
>>>>>>>>>
>>>>>>>>> See: http://www.open-mpi.org/faq/?category=building#build-rte-sge
>>>>>>>>>
>>>>>>>>> shell$ ompi_info | grep gridengine
>>>>>>>>> MCA ras: gridengine (MCA v1.0, API v1.3, Component v1.3)
>>>>>>>>> MCA pls: gridengine (MCA v1.0, API v1.3, Component v1.3)
>>>>>>>>>
>>>>>>>>> (Specific frameworks and version numbers may vary, depending on your
>>>>>>>>> version of Open MPI.)
>>>>>>>>>
>>>>>>>>>> The Open MPI parallel jobs run successfully through the command line,
>>>>>>>>>> but fail when run through SGE (with -pe orte <slots>).
>>>>>>>>>>
>>>>>>>>>> The error is:
>>>>>>>>>>
>>>>>>>>>> $ cat err.26.Helloworld-PRL
>>>>>>>>>> ssh_exchange_identification: Connection closed by remote host
>>>>>>>>>> --------------------------------------------------------------------------
>>>>>>>>>> A daemon (pid 8462) died unexpectedly with status 129 while attempting
>>>>>>>>>> to launch so we are aborting.
>>>>>>>>>>
>>>>>>>>>> There may be more information reported by the environment (see above).
>>>>>>>>>>
>>>>>>>>>> This may be because the daemon was unable to find all the needed shared
>>>>>>>>>> libraries on the remote node. You may set your LD_LIBRARY_PATH to have
>>>>>>>>>> the location of the shared libraries on the remote nodes and this will
>>>>>>>>>> automatically be forwarded to the remote nodes.
>>>>>>>>>> --------------------------------------------------------------------------
>>>>>>>>>>
>>>>>>>>>> --------------------------------------------------------------------------
>>>>>>>>>> mpirun noticed that the job aborted, but has no info as to the process
>>>>>>>>>> that caused that situation.
>>>>>>>>>> --------------------------------------------------------------------------
>>>>>>>>>> mpirun: clean termination accomplished
>>>>>>>>>>
>>>>>>>>>> But the same job runs well if it runs on a single node, though with an
>>>>>>>>>> error:
>>>>>>>>>>
>>>>>>>>>> $ cat err.23.Helloworld-PRL
>>>>>>>>>> libibverbs: Warning: RLIMIT_MEMLOCK is 32768 bytes.
>>>>>>>>>> This will severely limit memory registrations.
>>>>>>>>>> libibverbs: Warning: RLIMIT_MEMLOCK is 32768 bytes.
>>>>>>>>>> This will severely limit memory registrations.
>>>>>>>>>> libibverbs: Warning: RLIMIT_MEMLOCK is 32768 bytes.
>>>>>>>>>> This will severely limit memory registrations.
>>>>>>>>>> libibverbs: Warning: RLIMIT_MEMLOCK is 32768 bytes.
>>>>>>>>>> This will severely limit memory registrations.
>>>>>>>>>> libibverbs: Warning: RLIMIT_MEMLOCK is 32768 bytes.
>>>>>>>>>> This will severely limit memory registrations.
>>>>>>>>>> libibverbs: Warning: RLIMIT_MEMLOCK is 32768 bytes.
>>>>>>>>>> This will severely limit memory registrations.
>>>>>>>>>> --------------------------------------------------------------------------
>>>>>>>>>> WARNING: There was an error initializing an OpenFabrics device.
>>>>>>>>>>
>>>>>>>>>> Local host:   node-0-4.local
>>>>>>>>>> Local device: mthca0
>>>>>>>>>> --------------------------------------------------------------------------
>>>>>>>>>> libibverbs: Warning: RLIMIT_MEMLOCK is 32768 bytes.
>>>>>>>>>> This will severely limit memory registrations.
>>>>>>>>>> libibverbs: Warning: RLIMIT_MEMLOCK is 32768 bytes.
>>>>>>>>>> This will severely limit memory registrations.
>>>>>>>>>> libibverbs: Warning: RLIMIT_MEMLOCK is 32768 bytes.
>>>>>>>>>> This will severely limit memory registrations.
>>>>>>>>>> [node-0-4.local:07869] 7 more processes have sent help message
>>>>>>>>>> help-mpi-btl-openib.txt / error in device init
>>>>>>>>>> [node-0-4.local:07869] Set MCA parameter "orte_base_help_aggregate" to
>>>>>>>>>> 0 to see all help / error messages
>>>>>>>>>>
>>>>>>>>>> The following link explains the same problem:
>>>>>>>>>>
>>>>>>>>>> http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=72398
>>>>>>>>>>
>>>>>>>>>> With this reference, I put 'ulimit -l unlimited' into
>>>>>>>>>> /etc/init.d/sgeexecd on all nodes and restarted the services.
>>>>>>>>>
>>>>>>>>> Do not set 'ulimit -l unlimited' in /etc/init.d/sgeexecd,
>>>>>>>>> but set it in SGE instead:
>>>>>>>>>
>>>>>>>>> Run qconf -mconf and set execd_params:
>>>>>>>>>
>>>>>>>>> frontend$> qconf -sconf
>>>>>>>>> ...
>>>>>>>>> execd_params                 H_MEMORYLOCKED=infinity
>>>>>>>>> ...
>>>>>>>>>
>>>>>>>>> Then restart all your sgeexecd hosts.
>>>>>>>>>
>>>>>>>>> Milan
>>>>>>>>>
>>>>>>>>>> But the problem still persists.
>>>>>>>>>>
>>>>>>>>>> What could be the way out for this?
>>>>>>>>>>
>>>>>>>>>> Thanks,
>>>>>>>>>> Sangamesh
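P.S. Before retrying the tight integration I plan to double-check the points below. The commands are only a sketch of what I intend to run ('orte' is the PE used above, ibc17 is an example host), and the values shown are what I expect to see rather than actual output:

# Only the ras component is expected for Open MPI 1.3; as far as I understand,
# the separate "pls: gridengine" launcher of the 1.2 series was merged into
# the rsh/qrsh launcher in 1.3.
$ /opt/mpi/openmpi/1.3/intel/bin/ompi_info | grep gridengine
MCA ras: gridengine (MCA v2.0, API v2.0, Component v1.3)

# The PE must allow mpirun to start remote daemons via 'qrsh -inherit':
$ qconf -sp orte | grep control_slaves
control_slaves    TRUE

# Confirm the memlock setting actually reaches the execution hosts
# (H_MEMORYLOCKED raises the hard limit, hence ulimit -H -l):
$ qconf -sconf | grep execd_params
execd_params      H_MEMORYLOCKED=infinity
$ qrsh -l hostname=ibc17 bash -c 'ulimit -H -l'
unlimited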