Hi,

The text was in file.err; in file.out I get only the name
of the node where the program ran.

Thanks Donato.


On 04/09/2014 15:14, Reuti wrote:
> Hi,
>
> Am 04.09.2014 um 14:43 schrieb Donato Pera:
>
>> using this script:
>>
>> #!/bin/bash
>> #$ -S /bin/bash
>> #$ -pe orte 64
>> #$ -cwd
>> #$ -o ./file.out
>> #$ -e ./file.err
>>
>> export LD_LIBRARY_PATH=/home/SWcbbc/openmpi-1.6.5/lib:$LD_LIBRARY_PATH
>> export OMP_NUM_THREADS=1
>>
>> CPMD_PATH=/home/tanzi/myroot/X86_66intel-mpi/
>> PP_PATH=/home/tanzi
>> /home/SWcbbc/openmpi-1.6.5/bin/mpirun ${CPMD_PATH}cpmd.x  input ${PP_PATH}/PP/ > out
> Is the text below in out, file.out or file.err? Is there any hint in the other files?
>
> -- Reuti
>
>
>> The program ran for about 2 minutes and then I got this error:
>>
>> WARNING: A process refused to die!
>>
>> Host: compute-2-2.local
>> PID:  24897
>>
>> This process may still be running and/or consuming resources.
>>
>> --------------------------------------------------------------------------
>> [compute-2-2.local:24889] 25 more processes have sent help message help-odls-default.txt / odls-default:could-not-kill
>> [compute-2-2.local:24889] Set MCA parameter "orte_base_help_aggregate" to 0 to see all help / error messages
>> [compute-2-2.local:24889] 27 more processes have sent help message help-odls-default.txt / odls-default:could-not-kill
>> --------------------------------------------------------------------------
>> mpirun has exited due to process rank 0 with PID 24896 on
>> node compute-2-2.local exiting improperly. There are two reasons this
>> could occur:
>>
>> 1. this process did not call "init" before exiting, but others in
>> the job did. This can cause a job to hang indefinitely while it waits
>> for all processes to call "init". By rule, if one process calls "init",
>> then ALL processes must call "init" prior to termination.
>>
>> 2. this process called "init", but exited without calling "finalize".
>> By rule, all processes that call "init" MUST call "finalize" prior to
>> exiting or it will be considered an "abnormal termination"
>>
>> This may have caused other processes in the application to be
>> terminated by signals sent by mpirun (as reported here).
>> --------------------------------------------------------------------------
>> [compute-2-2.local:24889] 1 more process has sent help message help-odls-default.txt / odls-default:could-not-kill
>>
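The log above suggests setting the MCA parameter orte_base_help_aggregate to 0 so that every help/error message is printed instead of the aggregated "N more processes have sent help message" summary. A minimal sketch, assuming the same job script and paths as above: Open MPI also reads MCA parameters from environment variables named OMPI_MCA_<parameter>, so the variable can be exported before the mpirun line; the command-line form uses -mca, as the other scripts in this thread do for btl.

```shell
# Show every Open MPI help/error message instead of aggregating them.
# Open MPI picks up MCA parameters from OMPI_MCA_<name> environment variables.
export OMPI_MCA_orte_base_help_aggregate=0

# Equivalent command-line form (paths as in the job script above):
#   /home/SWcbbc/openmpi-1.6.5/bin/mpirun -mca orte_base_help_aggregate 0 \
#       ${CPMD_PATH}cpmd.x input ${PP_PATH}/PP/ > out
```

This only makes the output more verbose; it does not change how the job runs.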
>>
>> Thanks and Regards Donato
>>
>>
>>
>>
>> On 03/09/2014 13:19, Reuti wrote:
>>> Am 03.09.2014 um 13:11 schrieb Donato Pera:
>>>
>>>> I get
>>>>
>>>> ompi_info | grep grid
>>>>                MCA ras: gridengine (MCA v2.0, API v2.0, Component v1.6.5)
>>> Good.
>>>
>>>
>>>> and using this script
>>>>
>>>> #!/bin/bash
>>>> #$ -S /bin/bash
>>>> #$ -pe orte 64
>>>> #$ -cwd
>>>> #$ -o ./file.out
>>>> #$ -e ./file.err
>>>>
>>>> export LD_LIBRARY_PATH=/home/SWcbbc/openmpi-1.6.5/lib:$LD_LIBRARY_PATH
>>>> export OMP_NUM_THREADS=1
>>>>
>>>> CPMD_PATH=/home/tanzi/myroot/X86_66intel-mpi/
>>>> PP_PATH=/home/tanzi
>>>> /home/SWcbbc/openmpi-1.6.5/bin/mpirun -mca btl openib -np 64 -machinefile $TMPDIR/machines  ${CPMD_PATH}cpmd.x  input ${PP_PATH}/PP/ > out
>>> The PE "orte" has no "start_proc_args" defined that could generate the
>>> machinefile. Please try to start the application with:
>>>
>>> /home/SWcbbc/openmpi-1.6.5/bin/mpirun -mca btl openib ${CPMD_PATH}cpmd.x  input ${PP_PATH}/PP/
>>>
>>> -- Reuti
>>>
>>>
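As Reuti notes, the "orte" PE has no start_proc_args, so nothing ever creates $TMPDIR/machines; since ompi_info lists the gridengine component, mpirun should be able to read the SGE allocation on its own without -np or -machinefile. If a machinefile is still wanted for some other reason, a minimal sketch, assuming SGE's usual $PE_HOSTFILE layout of one "host slots queue processor-range" line per host, is to convert it into Open MPI's "host slots=N" hostfile syntax. The function name pe_hostfile_to_machines is hypothetical and not part of SGE or Open MPI.

```shell
#!/bin/sh
# Hypothetical helper (not part of SGE or Open MPI): turn the SGE allocation
# file into an Open MPI hostfile. Each $PE_HOSTFILE line is assumed to read
#   <host> <slots> <queue> <processor range>
# and is rewritten as "<host> slots=<slots>".
pe_hostfile_to_machines() {
    # $1: SGE PE hostfile to read, $2: Open MPI hostfile to write
    awk 'NF >= 2 { print $1 " slots=" $2 }' "$1" > "$2"
}

# Inside an SGE parallel job, write the machinefile the failing script expected.
if [ -n "${PE_HOSTFILE:-}" ] && [ -r "$PE_HOSTFILE" ]; then
    pe_hostfile_to_machines "$PE_HOSTFILE" "${TMPDIR:-/tmp}/machines"
fi
```

With tight integration working, the plain mpirun command Reuti suggests remains the simpler option.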
>>>> I get this error
>>>>
>>>> Open RTE was unable to open the hostfile:
>>>>   /tmp/21213.1.debug.q/machines
>>>> Check to make sure the path and filename are correct.
>>>> --------------------------------------------------------------------------
>>>> [compute-2-6.local:22452] [[5218,0],0] ORTE_ERROR_LOG: Not found in file base/rmaps_base_support_fns.c at line 207
>>>> [compute-2-6.local:22452] [[5218,0],0] ORTE_ERROR_LOG: Not found in file rmaps_rr.c at line 82
>>>> [compute-2-6.local:22452] [[5218,0],0] ORTE_ERROR_LOG: Not found in file base/rmaps_base_map_job.c at line 88
>>>> [compute-2-6.local:22452] [[5218,0],0] ORTE_ERROR_LOG: Not found in file base/plm_base_launch_support.c at line 105
>>>> [compute-2-6.local:22452] [[5218,0],0] ORTE_ERROR_LOG: Not found in file plm_rsh_module.c at line 1173
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> Instead, using this script:
>>>>
>>>>
>>>> #!/bin/bash
>>>> #$ -S /bin/bash
>>>> #$ -pe orte 64
>>>> #$ -cwd
>>>> #$ -o ./file.out
>>>> #$ -e ./file.err
>>>>
>>>> export LD_LIBRARY_PATH=/home/SWcbbc/openmpi-1.6.5/lib:$LD_LIBRARY_PATH
>>>> export OMP_NUM_THREADS=1
>>>>
>>>> CPMD_PATH=/home/tanzi/myroot/X86_66intel-mpi/
>>>> PP_PATH=/home/tanzi
>>>> /home/SWcbbc/openmpi-1.6.5/bin/mpirun -mca btl openib -np 64 $TMPDIR/machines  ${CPMD_PATH}cpmd.x  input ${PP_PATH}/PP/ > out
>>>>
>>>>
>>>> I get
>>>> Executable: /tmp/21214.1.debug.q/machines
>>>> Node: compute-2-0.local
>>>>
>>>> while attempting to start process rank 0.
>>>> --------------------------------------------------------------------------
>>>>
>>>> Can you help me?
>>>>
>>>>
>>>> Thanks and Regards Donato
>>>>
>>>>
>>>>
>>>>
>>>> On 03/09/2014 12:28, Reuti wrote:
>>>>> ompi_info | grep grid
>>>> _______________________________________________
>>>> users mailing list
>>>> us...@open-mpi.org
>>>> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
>>>> Link to this post: 
>>>> http://www.open-mpi.org/community/lists/users/2014/09/25240.php
>>> Link to this post: 
>>> http://www.open-mpi.org/community/lists/users/2014/09/25242.php
>>>
>> Link to this post: 
>> http://www.open-mpi.org/community/lists/users/2014/09/25265.php
> Link to this post: 
> http://www.open-mpi.org/community/lists/users/2014/09/25266.php
>
