Hi, the text was in file.err; in file.out I get only the name of the node where the program ran.
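
A quick way to confirm which of the three files holds a given message is to search them for a fixed string. This is only a sketch: the function name find_message is hypothetical (it is not part of Open MPI or Grid Engine), and it assumes it is run from the job's working directory, where out, file.out and file.err were written.

```shell
#!/bin/bash
# Hypothetical helper (not from the thread): print the name of every
# given file that contains the fixed string passed as first argument.
find_message() {
    local pattern=$1
    shift
    local f
    for f in "$@"; do
        # -q: no output, only exit status; -F: fixed string, not a regex
        if [ -f "$f" ] && grep -qF -- "$pattern" "$f"; then
            echo "$f"
        fi
    done
}

# Example call from the job's working directory:
# find_message "A process refused to die" out file.out file.err
```

Files that do not exist are skipped silently, so the same call works even if the job never created one of them.
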
Thanks,
Donato

On 04/09/2014 15:14, Reuti wrote:
> Hi,
>
> On 04.09.2014 at 14:43, Donato Pera wrote:
>
>> using this script:
>>
>> #!/bin/bash
>> #$ -S /bin/bash
>> #$ -pe orte 64
>> #$ -cwd
>> #$ -o ./file.out
>> #$ -e ./file.err
>>
>> export LD_LIBRARY_PATH=/home/SWcbbc/openmpi-1.6.5/lib:$LD_LIBRARY_PATH
>> export OMP_NUM_THREADS=1
>>
>> CPMD_PATH=/home/tanzi/myroot/X86_66intel-mpi/
>> PP_PATH=/home/tanzi
>> /home/SWcbbc/openmpi-1.6.5/bin/mpirun ${CPMD_PATH}cpmd.x input
>> ${PP_PATH}/PP/ > out
> Is this text below in out, file.out or file.err - any hint in the other files?
>
> -- Reuti
>
>
>> The program ran for about 2 minutes and afterwards I get this error
>>
>> WARNING: A process refused to die!
>>
>> Host: compute-2-2.local
>> PID: 24897
>>
>> This process may still be running and/or consuming resources.
>>
>> --------------------------------------------------------------------------
>> [compute-2-2.local:24889] 25 more processes have sent help message
>> help-odls-default.txt / odls-default:could-not-kill
>> [compute-2-2.local:24889] Set MCA parameter "orte_base_help_aggregate"
>> to 0 to see all help / error messages
>> [compute-2-2.local:24889] 27 more processes have sent help message
>> help-odls-default.txt / odls-default:could-not-kill
>> --------------------------------------------------------------------------
>> mpirun has exited due to process rank 0 with PID 24896 on
>> node compute-2-2.local exiting improperly. There are two reasons this
>> could occur:
>>
>> 1. this process did not call "init" before exiting, but others in
>> the job did. This can cause a job to hang indefinitely while it waits
>> for all processes to call "init". By rule, if one process calls "init",
>> then ALL processes must call "init" prior to termination.
>>
>> 2. this process called "init", but exited without calling "finalize".
>> By rule, all processes that call "init" MUST call "finalize" prior to
>> exiting or it will be considered an "abnormal termination"
>>
>> This may have caused other processes in the application to be
>> terminated by signals sent by mpirun (as reported here).
>> --------------------------------------------------------------------------
>> [compute-2-2.local:24889] 1 more process has sent help message
>> help-odls-default.txt / odls-default:could-not-kill
>>
>>
>> Thanks and Regards, Donato
>>
>>
>> On 03/09/2014 13:19, Reuti wrote:
>>> On 03.09.2014 at 13:11, Donato Pera wrote:
>>>
>>>> I get
>>>>
>>>> ompi_info | grep grid
>>>> MCA ras: gridengine (MCA v2.0, API v2.0, Component v1.6.5)
>>> Good.
>>>
>>>
>>>> and using this script
>>>>
>>>> #!/bin/bash
>>>> #$ -S /bin/bash
>>>> #$ -pe orte 64
>>>> #$ -cwd
>>>> #$ -o ./file.out
>>>> #$ -e ./file.err
>>>>
>>>> export LD_LIBRARY_PATH=/home/SWcbbc/openmpi-1.6.5/lib:$LD_LIBRARY_PATH
>>>> export OMP_NUM_THREADS=1
>>>>
>>>> CPMD_PATH=/home/tanzi/myroot/X86_66intel-mpi/
>>>> PP_PATH=/home/tanzi
>>>> /home/SWcbbc/openmpi-1.6.5/bin/mpirun -mca btl openib -np 64
>>>> -machinefile $TMPDIR/machines ${CPMD_PATH}cpmd.x input ${PP_PATH}/PP/
>>>> > out
>>> In the PE "orte" there is no "start_proc_args" defined which could
>>> generate the machinefile. Please try to start the application with:
>>>
>>> /home/SWcbbc/openmpi-1.6.5/bin/mpirun -mca btl openib ${CPMD_PATH}cpmd.x
>>> input ${PP_PATH}/PP/
>>>
>>> -- Reuti
>>>
>>>
>>>> I get this error
>>>>
>>>> Open RTE was unable to open the hostfile:
>>>> /tmp/21213.1.debug.q/machines
>>>> Check to make sure the path and filename are correct.
>>>> --------------------------------------------------------------------------
>>>> [compute-2-6.local:22452] [[5218,0],0] ORTE_ERROR_LOG: Not found in file
>>>> base/rmaps_base_support_fns.c at line 207
>>>> [compute-2-6.local:22452] [[5218,0],0] ORTE_ERROR_LOG: Not found in file
>>>> rmaps_rr.c at line 82
>>>> [compute-2-6.local:22452] [[5218,0],0] ORTE_ERROR_LOG: Not found in file
>>>> base/rmaps_base_map_job.c at line 88
>>>> [compute-2-6.local:22452] [[5218,0],0] ORTE_ERROR_LOG: Not found in file
>>>> base/plm_base_launch_support.c at line 105
>>>> [compute-2-6.local:22452] [[5218,0],0] ORTE_ERROR_LOG: Not found in file
>>>> plm_rsh_module.c at line 1173
>>>>
>>>>
>>>> Instead, using this script
>>>>
>>>> #!/bin/bash
>>>> #$ -S /bin/bash
>>>> #$ -pe orte 64
>>>> #$ -cwd
>>>> #$ -o ./file.out
>>>> #$ -e ./file.err
>>>>
>>>> export LD_LIBRARY_PATH=/home/SWcbbc/openmpi-1.6.5/lib:$LD_LIBRARY_PATH
>>>> export OMP_NUM_THREADS=1
>>>>
>>>> CPMD_PATH=/home/tanzi/myroot/X86_66intel-mpi/
>>>> PP_PATH=/home/tanzi
>>>> /home/SWcbbc/openmpi-1.6.5/bin/mpirun -mca btl openib -np 64
>>>> $TMPDIR/machines ${CPMD_PATH}cpmd.x input ${PP_PATH}/PP/ > out
>>>>
>>>>
>>>> I get
>>>> Executable: /tmp/21214.1.debug.q/machines
>>>> Node: compute-2-0.local
>>>>
>>>> while attempting to start process rank 0.
>>>> --------------------------------------------------------------------------
>>>>
>>>> Can you help me?
>>>>
>>>>
>>>> Thanks and Regards, Donato
>>>>
>>>>
>>>> On 03/09/2014 12:28, Reuti wrote:
>>>>> ompi_info | grep grid
>>>> _______________________________________________
>>>> users mailing list
>>>> us...@open-mpi.org
>>>> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
>>>> Link to this post:
>>>> http://www.open-mpi.org/community/lists/users/2014/09/25240.php