Hi,

Using this script:

#!/bin/bash
#$ -S /bin/bash
#$ -pe orte 64
#$ -cwd
#$ -o ./file.out
#$ -e ./file.err

export LD_LIBRARY_PATH=/home/SWcbbc/openmpi-1.6.5/lib:$LD_LIBRARY_PATH
export OMP_NUM_THREADS=1

CPMD_PATH=/home/tanzi/myroot/X86_66intel-mpi/
PP_PATH=/home/tanzi
/home/SWcbbc/openmpi-1.6.5/bin/mpirun ${CPMD_PATH}cpmd.x input ${PP_PATH}/PP/ > out



The program runs for about 2 minutes and then I get this error:

WARNING: A process refused to die!

Host: compute-2-2.local
PID:  24897

This process may still be running and/or consuming resources.

--------------------------------------------------------------------------
[compute-2-2.local:24889] 25 more processes have sent help message
help-odls-default.txt / odls-default:could-not-kill
[compute-2-2.local:24889] Set MCA parameter "orte_base_help_aggregate"
to 0 to see all help / error messages
[compute-2-2.local:24889] 27 more processes have sent help message
help-odls-default.txt / odls-default:could-not-kill
--------------------------------------------------------------------------
mpirun has exited due to process rank 0 with PID 24896 on
node compute-2-2.local exiting improperly. There are two reasons this
could occur:

1. this process did not call "init" before exiting, but others in
the job did. This can cause a job to hang indefinitely while it waits
for all processes to call "init". By rule, if one process calls "init",
then ALL processes must call "init" prior to termination.

2. this process called "init", but exited without calling "finalize".
By rule, all processes that call "init" MUST call "finalize" prior to
exiting or it will be considered an "abnormal termination"

This may have caused other processes in the application to be
terminated by signals sent by mpirun (as reported here).
--------------------------------------------------------------------------
[compute-2-2.local:24889] 1 more process has sent help message
help-odls-default.txt / odls-default:could-not-kill
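
The log above shows rank 0 exiting improperly about two minutes into the run. One thing worth ruling out is a mismatch between the slots Grid Engine actually granted and the 64 the job requests. A minimal sketch, assuming the standard Grid Engine job variables PE_HOSTFILE and NSLOTS are set inside the job; the helper names count_slots and list_hosts are made up for illustration. It could be placed before the mpirun line:

```shell
#!/bin/bash
# Sketch only: summarise what Grid Engine granted before mpirun starts.
# PE_HOSTFILE and NSLOTS are set by Grid Engine inside a parallel job;
# count_slots and list_hosts are hypothetical helper names.

# Sum the slot counts (second column) of a PE_HOSTFILE-style file.
count_slots() {
    awk '{ total += $2 } END { print total + 0 }' "$1"
}

# Print one "host slots" pair per line for the job log.
list_hosts() {
    awk '{ print $1, $2 }' "$1"
}

# Only report when running inside a parallel job; output goes to stderr
# so it lands in the file given with "#$ -e".
if [ -n "${PE_HOSTFILE:-}" ] && [ -r "$PE_HOSTFILE" ]; then
    echo "NSLOTS=${NSLOTS:-unset} granted=$(count_slots "$PE_HOSTFILE")" >&2
    list_hosts "$PE_HOSTFILE" >&2
fi
```

If the granted total differs from NSLOTS, or the host list is not what you expect, that points at the PE or queue setup rather than at cpmd.x itself.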


Thanks and Regards Donato




On 03/09/2014 13:19, Reuti wrote:
> On 03.09.2014 at 13:11, Donato Pera wrote:
>
>> I get
>>
>> ompi_info | grep grid
>>                 MCA ras: gridengine (MCA v2.0, API v2.0, Component v1.6.5)
> Good.
>
>
>> and using this script
>>
>> #!/bin/bash
>> #$ -S /bin/bash
>> #$ -pe orte 64
>> #$ -cwd
>> #$ -o ./file.out
>> #$ -e ./file.err
>>
>> export LD_LIBRARY_PATH=/home/SWcbbc/openmpi-1.6.5/lib:$LD_LIBRARY_PATH
>> export OMP_NUM_THREADS=1
>>
>> CPMD_PATH=/home/tanzi/myroot/X86_66intel-mpi/
>> PP_PATH=/home/tanzi
>> /home/SWcbbc/openmpi-1.6.5/bin/mpirun -mca btl openib -np 64
>> -machinefile $TMPDIR/machines  ${CPMD_PATH}cpmd.x  input ${PP_PATH}/PP/
> The PE "orte" has no "start_proc_args" defined that could generate the
> machinefile. Please try to start the application with:
>
> /home/SWcbbc/openmpi-1.6.5/bin/mpirun -mca btl openib ${CPMD_PATH}cpmd.x  
> input ${PP_PATH}/PP/
>
> -- Reuti
>
>
>> > out
>>
>> I get this error
>>
>> Open RTE was unable to open the hostfile:
>>    /tmp/21213.1.debug.q/machines
>> Check to make sure the path and filename are correct.
>> --------------------------------------------------------------------------
>> [compute-2-6.local:22452] [[5218,0],0] ORTE_ERROR_LOG: Not found in file
>> base/rmaps_base_support_fns.c at line 207
>> [compute-2-6.local:22452] [[5218,0],0] ORTE_ERROR_LOG: Not found in file
>> rmaps_rr.c at line 82
>> [compute-2-6.local:22452] [[5218,0],0] ORTE_ERROR_LOG: Not found in file
>> base/rmaps_base_map_job.c at line 88
>> [compute-2-6.local:22452] [[5218,0],0] ORTE_ERROR_LOG: Not found in file
>> base/plm_base_launch_support.c at line 105
>> [compute-2-6.local:22452] [[5218,0],0] ORTE_ERROR_LOG: Not found in file
>> plm_rsh_module.c at line 1173
>>
>>
>>
>>
>>
>> Using this script instead
>>
>>
>> #!/bin/bash
>> #$ -S /bin/bash
>> #$ -pe orte 64
>> #$ -cwd
>> #$ -o ./file.out
>> #$ -e ./file.err
>>
>> export LD_LIBRARY_PATH=/home/SWcbbc/openmpi-1.6.5/lib:$LD_LIBRARY_PATH
>> export OMP_NUM_THREADS=1
>>
>> CPMD_PATH=/home/tanzi/myroot/X86_66intel-mpi/
>> PP_PATH=/home/tanzi
>> /home/SWcbbc/openmpi-1.6.5/bin/mpirun -mca btl openib -np 64
>> $TMPDIR/machines  ${CPMD_PATH}cpmd.x  input ${PP_PATH}/PP/ > out
>>
>>
>> I get
>> Executable: /tmp/21214.1.debug.q/machines
>> Node: compute-2-0.local
>>
>> while attempting to start process rank 0.
>> --------------------------------------------------------------------------
>>
>> Can you help me?
>>
>>
>> Thanks and Regards Donato
>>
>>
>>
>>
>> On 03/09/2014 12:28, Reuti wrote:
>>> ompi_info | grep grid
>> _______________________________________________
>> users mailing list
>> us...@open-mpi.org
>> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
>> Link to this post: 
>> http://www.open-mpi.org/community/lists/users/2014/09/25240.php
> Link to this post: 
> http://www.open-mpi.org/community/lists/users/2014/09/25242.php
>
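
Reuti's point in the quoted reply is that $TMPDIR/machines only exists when the PE's start_proc_args creates it, so passing -machinefile unconditionally fails in a PE such as "orte". A minimal sketch of a guard, assuming a hypothetical helper named machinefile_args. When it prints nothing, an Open MPI build with gridengine support (as ompi_info reports here) takes the hosts from the Grid Engine allocation:

```shell
#!/bin/bash
# Sketch only: machinefile_args is a hypothetical helper, not part of
# Open MPI or Grid Engine.
# Print "-machinefile <path>" only when the file is readable; print
# nothing otherwise, so mpirun is started without the option.
machinefile_args() {
    if [ -r "$1" ]; then
        printf '%s %s\n' -machinefile "$1"
    fi
}
```

It could be used as mpirun $(machinefile_args "$TMPDIR/machines") ${CPMD_PATH}cpmd.x input ${PP_PATH}/PP/ > out, left unquoted on purpose so the two words split into separate arguments (this assumes the path contains no spaces).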
