Hi,

On 04.09.2014 at 14:43, Donato Pera wrote:
> using this script:
>
> #!/bin/bash
> #$ -S /bin/bash
> #$ -pe orte 64
> #$ -cwd
> #$ -o ./file.out
> #$ -e ./file.err
>
> export LD_LIBRARY_PATH=/home/SWcbbc/openmpi-1.6.5/lib:$LD_LIBRARY_PATH
> export OMP_NUM_THREADS=1
>
> CPMD_PATH=/home/tanzi/myroot/X86_66intel-mpi/
> PP_PATH=/home/tanzi
> /home/SWcbbc/openmpi-1.6.5/bin/mpirun ${CPMD_PATH}cpmd.x input ${PP_PATH}/PP/ > out

Is the text below in out, file.out or file.err? Is there any hint in the other files?

-- Reuti

>
> The program runs for about 2 minutes and then I get this error:
>
> WARNING: A process refused to die!
>
> Host: compute-2-2.local
> PID: 24897
>
> This process may still be running and/or consuming resources.
>
> --------------------------------------------------------------------------
> [compute-2-2.local:24889] 25 more processes have sent help message help-odls-default.txt / odls-default:could-not-kill
> [compute-2-2.local:24889] Set MCA parameter "orte_base_help_aggregate" to 0 to see all help / error messages
> [compute-2-2.local:24889] 27 more processes have sent help message help-odls-default.txt / odls-default:could-not-kill
> --------------------------------------------------------------------------
> mpirun has exited due to process rank 0 with PID 24896 on
> node compute-2-2.local exiting improperly. There are two reasons this
> could occur:
>
> 1. this process did not call "init" before exiting, but others in
> the job did. This can cause a job to hang indefinitely while it waits
> for all processes to call "init". By rule, if one process calls "init",
> then ALL processes must call "init" prior to termination.
>
> 2. this process called "init", but exited without calling "finalize".
> By rule, all processes that call "init" MUST call "finalize" prior to
> exiting or it will be considered an "abnormal termination"
>
> This may have caused other processes in the application to be
> terminated by signals sent by mpirun (as reported here).
> --------------------------------------------------------------------------
> [compute-2-2.local:24889] 1 more process has sent help message help-odls-default.txt / odls-default:could-not-kill
>
>
> Thanks and Regards Donato
>
>
> On 03/09/2014 13:19, Reuti wrote:
>> On 03.09.2014 at 13:11, Donato Pera wrote:
>>
>>> I get
>>>
>>> ompi_info | grep grid
>>> MCA ras: gridengine (MCA v2.0, API v2.0, Component v1.6.5)
>> Good.
>>
>>
>>> and using this script
>>>
>>> #!/bin/bash
>>> #$ -S /bin/bash
>>> #$ -pe orte 64
>>> #$ -cwd
>>> #$ -o ./file.out
>>> #$ -e ./file.err
>>>
>>> export LD_LIBRARY_PATH=/home/SWcbbc/openmpi-1.6.5/lib:$LD_LIBRARY_PATH
>>> export OMP_NUM_THREADS=1
>>>
>>> CPMD_PATH=/home/tanzi/myroot/X86_66intel-mpi/
>>> PP_PATH=/home/tanzi
>>> /home/SWcbbc/openmpi-1.6.5/bin/mpirun -mca btl openib -np 64 -machinefile $TMPDIR/machines ${CPMD_PATH}cpmd.x input ${PP_PATH}/PP/ > out
>> In the PE "orte" there is no "start_proc_args" defined that could generate the
>> machinefile. Please try to start the application with:
>>
>> /home/SWcbbc/openmpi-1.6.5/bin/mpirun -mca btl openib ${CPMD_PATH}cpmd.x input ${PP_PATH}/PP/
>>
>> -- Reuti
>>
>>
>>>
>>> I get this error:
>>>
>>> Open RTE was unable to open the hostfile:
>>> /tmp/21213.1.debug.q/machines
>>> Check to make sure the path and filename are correct.
>>> --------------------------------------------------------------------------
>>> [compute-2-6.local:22452] [[5218,0],0] ORTE_ERROR_LOG: Not found in file base/rmaps_base_support_fns.c at line 207
>>> [compute-2-6.local:22452] [[5218,0],0] ORTE_ERROR_LOG: Not found in file rmaps_rr.c at line 82
>>> [compute-2-6.local:22452] [[5218,0],0] ORTE_ERROR_LOG: Not found in file base/rmaps_base_map_job.c at line 88
>>> [compute-2-6.local:22452] [[5218,0],0] ORTE_ERROR_LOG: Not found in file base/plm_base_launch_support.c at line 105
>>> [compute-2-6.local:22452] [[5218,0],0] ORTE_ERROR_LOG: Not found in file plm_rsh_module.c at line 1173
>>>
>>>
>>> Instead, using this script:
>>>
>>> #!/bin/bash
>>> #$ -S /bin/bash
>>> #$ -pe orte 64
>>> #$ -cwd
>>> #$ -o ./file.out
>>> #$ -e ./file.err
>>>
>>> export LD_LIBRARY_PATH=/home/SWcbbc/openmpi-1.6.5/lib:$LD_LIBRARY_PATH
>>> export OMP_NUM_THREADS=1
>>>
>>> CPMD_PATH=/home/tanzi/myroot/X86_66intel-mpi/
>>> PP_PATH=/home/tanzi
>>> /home/SWcbbc/openmpi-1.6.5/bin/mpirun -mca btl openib -np 64 $TMPDIR/machines ${CPMD_PATH}cpmd.x input ${PP_PATH}/PP/ > out
>>>
>>>
>>> I get:
>>> Executable: /tmp/21214.1.debug.q/machines
>>> Node: compute-2-0.local
>>>
>>> while attempting to start process rank 0.
>>> --------------------------------------------------------------------------
>>>
>>> Can you help me?
>>>
>>>
>>> Thanks and Regards Donato
>>>
>>>
>>> On 03/09/2014 12:28, Reuti wrote:
>>>> ompi_info | grep grid
>>> _______________________________________________
>>> users mailing list
>>> us...@open-mpi.org
>>> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
>>> Link to this post:
>>> http://www.open-mpi.org/community/lists/users/2014/09/25240.php
>> _______________________________________________
>> Link to this post:
>> http://www.open-mpi.org/community/lists/users/2014/09/25242.php
>>
>
> _______________________________________________
> Link to this post:
> http://www.open-mpi.org/community/lists/users/2014/09/25265.php
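A minimal diagnostic sketch, not taken from the thread. It assumes that Open MPI's gridengine component (the "MCA ras: gridengine" entry reported by ompi_info) reads the slot allocation from the file that Grid Engine names in $PE_HOSTFILE. That is what would let mpirun start without -np or -machinefile inside the "orte" PE. The function names show_allocation and find_mpi_errors are hypothetical. The grep patterns are just strings copied from the error output quoted above.

```shell
#!/bin/bash
# Hypothetical helpers for a Grid Engine job script; they are not part
# of Open MPI or SGE.

# Summarise the allocation in $PE_HOSTFILE. Each line of that file is
# "host slots queue processor-range"; column 2 is summed and printed
# next to the $NSLOTS value Grid Engine exports to the job.
show_allocation() {
    local hostfile="${PE_HOSTFILE:-}"
    if [ -z "$hostfile" ] || [ ! -r "$hostfile" ]; then
        echo "PE_HOSTFILE is unset or unreadable (not an SGE parallel job?)" >&2
        return 1
    fi
    local counts hosts slots
    # Count lines (hosts) and sum the slot column in one pass.
    counts=$(awk '{ n++; s += $2 } END { printf "%d %d", n, s }' "$hostfile")
    hosts=${counts% *}
    slots=${counts#* }
    echo "hosts: $hosts, slots in PE_HOSTFILE: $slots, NSLOTS: ${NSLOTS:-unset}"
}

# Print "file:line:text" for every line of the given files that matches
# one of the error messages quoted in this thread. Missing or unreadable
# files are skipped silently.
find_mpi_errors() {
    local f
    for f in "$@"; do
        [ -r "$f" ] || continue
        grep -nH -E 'ORTE_ERROR_LOG|refused to die|exiting improperly|unable to open the hostfile' "$f"
    done
}
```

In the job script, show_allocation would go on the line just before mpirun, so file.out records whether all 64 slots were granted. After the job ends, running find_mpi_errors out file.out file.err shows which of the three files the messages went to.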