Hi,

On 04.09.2014 at 14:43, Donato Pera wrote:

> Using this script:
> 
> #!/bin/bash
> #$ -S /bin/bash
> #$ -pe orte 64
> #$ -cwd
> #$ -o ./file.out
> #$ -e ./file.err
> 
> export LD_LIBRARY_PATH=/home/SWcbbc/openmpi-1.6.5/lib:$LD_LIBRARY_PATH
> export OMP_NUM_THREADS=1
> 
> CPMD_PATH=/home/tanzi/myroot/X86_66intel-mpi/
> PP_PATH=/home/tanzi
> /home/SWcbbc/openmpi-1.6.5/bin/mpirun ${CPMD_PATH}cpmd.x input ${PP_PATH}/PP/ > out

Does the text below appear in out, file.out, or file.err? Is there any hint in the other files?

-- Reuti
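
To narrow this down, a small sketch like the following lists which of the three files contain a given message. The function name files_with_message is hypothetical; the file names are the ones from the script above, and it assumes it is run from the job's working directory.

```shell
#!/bin/bash
# Hypothetical helper: print the name of each existing file that contains
# the given fixed string. Files that do not exist are skipped silently.
files_with_message() {
    local pattern=$1
    shift
    local f
    for f in "$@"; do
        # -q: quiet, -F: treat the pattern as a fixed string, not a regex
        if [ -f "$f" ] && grep -qF -- "$pattern" "$f"; then
            printf '%s\n' "$f"
        fi
    done
    return 0
}

# Example: which of the job's output files mention the warning?
files_with_message "refused to die" out file.out file.err
```

Running the same check with a string that only the application prints would show whether cpmd.x itself got far enough to write anything to out.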


> 
> The program runs for about 2 minutes and then I get this error:
> 
> WARNING: A process refused to die!
> 
> Host: compute-2-2.local
> PID:  24897
> 
> This process may still be running and/or consuming resources.
> 
> --------------------------------------------------------------------------
> [compute-2-2.local:24889] 25 more processes have sent help message help-odls-default.txt / odls-default:could-not-kill
> [compute-2-2.local:24889] Set MCA parameter "orte_base_help_aggregate" to 0 to see all help / error messages
> [compute-2-2.local:24889] 27 more processes have sent help message help-odls-default.txt / odls-default:could-not-kill
> --------------------------------------------------------------------------
> mpirun has exited due to process rank 0 with PID 24896 on
> node compute-2-2.local exiting improperly. There are two reasons this
> could occur:
> 
> 1. this process did not call "init" before exiting, but others in
> the job did. This can cause a job to hang indefinitely while it waits
> for all processes to call "init". By rule, if one process calls "init",
> then ALL processes must call "init" prior to termination.
> 
> 2. this process called "init", but exited without calling "finalize".
> By rule, all processes that call "init" MUST call "finalize" prior to
> exiting or it will be considered an "abnormal termination"
> 
> This may have caused other processes in the application to be
> terminated by signals sent by mpirun (as reported here).
> --------------------------------------------------------------------------
> [compute-2-2.local:24889] 1 more process has sent help message help-odls-default.txt / odls-default:could-not-kill
> 
> 
> Thanks and Regards Donato
> 
> 
> 
> 
> On 03/09/2014 13:19, Reuti wrote:
>> On 03.09.2014 at 13:11, Donato Pera wrote:
>> 
>>> I get
>>> 
>>> ompi_info | grep grid
>>>                MCA ras: gridengine (MCA v2.0, API v2.0, Component v1.6.5)
>> Good.
>> 
>> 
>>> and using this script
>>> 
>>> #!/bin/bash
>>> #$ -S /bin/bash
>>> #$ -pe orte 64
>>> #$ -cwd
>>> #$ -o ./file.out
>>> #$ -e ./file.err
>>> 
>>> export LD_LIBRARY_PATH=/home/SWcbbc/openmpi-1.6.5/lib:$LD_LIBRARY_PATH
>>> export OMP_NUM_THREADS=1
>>> 
>>> CPMD_PATH=/home/tanzi/myroot/X86_66intel-mpi/
>>> PP_PATH=/home/tanzi
>>> /home/SWcbbc/openmpi-1.6.5/bin/mpirun -mca btl openib -np 64 -machinefile $TMPDIR/machines ${CPMD_PATH}cpmd.x input ${PP_PATH}/PP/
>> The PE "orte" has no "start_proc_args" defined that could generate the
>> machinefile. Please try starting the application with:
>> 
>> /home/SWcbbc/openmpi-1.6.5/bin/mpirun -mca btl openib ${CPMD_PATH}cpmd.x input ${PP_PATH}/PP/
>> 
>> -- Reuti
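
Because ompi_info reports the gridengine component, mpirun can take the host list and slot count from the job's allocation, so neither -np nor -machinefile is needed. To check what SGE actually granted, a few lines like the following could be added to the job script before the mpirun call. This is only a sketch: the function name summarize_pe_hostfile is hypothetical, and it assumes the usual $PE_HOSTFILE layout of one line per host, with the host name in the first column and the slot count in the second.

```shell
#!/bin/bash
# Hypothetical helper: print "host slots" for every line of an SGE PE
# hostfile, followed by the total number of slots.
# Assumed line layout: <host> <slots> <queue> <processor range>
summarize_pe_hostfile() {
    local hostfile=$1
    if [ ! -r "$hostfile" ]; then
        echo "cannot read hostfile: $hostfile" >&2
        return 1
    fi
    awk '{ print $1, $2; total += $2 } END { print "total", total + 0 }' "$hostfile"
}

# Inside a job, SGE sets PE_HOSTFILE and NSLOTS; outside a job this is skipped.
if [ -n "${PE_HOSTFILE:-}" ]; then
    summarize_pe_hostfile "$PE_HOSTFILE"
    echo "NSLOTS=${NSLOTS:-unset}"
fi
```

For a request of "-pe orte 64" the total should come out as 64 and match NSLOTS; if it does not, the problem is in the allocation and not in the mpirun command line.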
>> 
>> 
>>> > out
>>> 
>>> I get this error:
>>> 
>>> Open RTE was unable to open the hostfile:
>>>   /tmp/21213.1.debug.q/machines
>>> Check to make sure the path and filename are correct.
>>> --------------------------------------------------------------------------
>>> [compute-2-6.local:22452] [[5218,0],0] ORTE_ERROR_LOG: Not found in file base/rmaps_base_support_fns.c at line 207
>>> [compute-2-6.local:22452] [[5218,0],0] ORTE_ERROR_LOG: Not found in file rmaps_rr.c at line 82
>>> [compute-2-6.local:22452] [[5218,0],0] ORTE_ERROR_LOG: Not found in file base/rmaps_base_map_job.c at line 88
>>> [compute-2-6.local:22452] [[5218,0],0] ORTE_ERROR_LOG: Not found in file base/plm_base_launch_support.c at line 105
>>> [compute-2-6.local:22452] [[5218,0],0] ORTE_ERROR_LOG: Not found in file plm_rsh_module.c at line 1173
>>> 
>>> 
>>> 
>>> 
>>> 
>>> Using this script instead:
>>> 
>>> 
>>> #!/bin/bash
>>> #$ -S /bin/bash
>>> #$ -pe orte 64
>>> #$ -cwd
>>> #$ -o ./file.out
>>> #$ -e ./file.err
>>> 
>>> export LD_LIBRARY_PATH=/home/SWcbbc/openmpi-1.6.5/lib:$LD_LIBRARY_PATH
>>> export OMP_NUM_THREADS=1
>>> 
>>> CPMD_PATH=/home/tanzi/myroot/X86_66intel-mpi/
>>> PP_PATH=/home/tanzi
>>> /home/SWcbbc/openmpi-1.6.5/bin/mpirun -mca btl openib -np 64 $TMPDIR/machines ${CPMD_PATH}cpmd.x input ${PP_PATH}/PP/ > out
>>> 
>>> 
>>> I get
>>> Executable: /tmp/21214.1.debug.q/machines
>>> Node: compute-2-0.local
>>> 
>>> while attempting to start process rank 0.
>>> --------------------------------------------------------------------------
>>> 
>>> Can you help me?
>>> 
>>> 
>>> Thanks and Regards Donato
>>> 
>>> 
>>> 
>>> 
>>> On 03/09/2014 12:28, Reuti wrote:
>>>> ompi_info | grep grid
>>> _______________________________________________
>>> users mailing list
>>> us...@open-mpi.org
>>> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
>>> Link to this post: 
>>> http://www.open-mpi.org/community/lists/users/2014/09/25240.php
