Hi,

On 23.11.2011 at 11:28, mahbube rustaee wrote:

> Thank you very much Mr. Reuti.
>
> I can run MPI jobs with the new Open MPI compiled with the option --with-sge. Some
> questions:
> 1) The job finished and the output file was created, but the job's state remained
> running for a little time. Why?

This is built in to allow a cleanup of the $TMPDIR on the nodes and, in some cases, the removal of the processes. Although the delay was improved (i.e. lowered) in the commercial version of GE, it's still there in the open source version AFAIK.

> 2) When requesting many slots (e.g. 800 or more), the job ran to completion, and
> at the end of the output file I got the error:
>
> mpi-integ-sge-intel.comp:9728 terminated with signal 11 at PC=2b42bb65b60e SP=7fffd361dcd0. Backtrace:

Signal 11 is a segmentation fault, which is in some way a programming error (either in your application or in a library) and happens inside the application while it calls some libraries. If I understand you correctly, it never happens when you run it outside of SGE. I don't see any direct relation to SGE here; maybe the library isn't checking the available memory and/or tries to access an already freed area. I suggest raising this issue on the Open MPI mailing list.

-- Reuti

> mpi-integ-sge-intel.comp:9729 terminated with signal 11 at PC=2afc5c30c60e SP=7fff87c03e50.
> Backtrace:
> /usr/lib64/libpsm_infinipath.so.1[0x2b42bb65b60e]
> /usr/lib64/libpsm_infinipath.so.1[0x2b42bb66c4de]
> /usr/lib64/libpsm_infinipath.so.1(ips_ptl_shared_poll+0x230)[0x2b42bb66b4d0]
> /usr/lib64/libpsm_infinipath.so.1(psmi_poll_internal+0x50)[0x2b42bb66b080]
> /usr/lib64/libpsm_infinipath.so.1(ips_proto_fini+0x1c0)[0x2b42bb65f190]
> /usr/lib64/libpsm_infinipath.so.1[0x2afc5c30c60e]
> /usr/lib64/libpsm_infinipath.so.1[0x2afc5c31d4de]
> /usr/lib64/libpsm_infinipath.so.1(ips_ptl_shared_poll+0x230)[0x2afc5c31c4d0]
> /usr/lib64/libpsm_infinipath.so.1(psmi_poll_internal+0x50)[0x2afc5c31c080]
> /usr/lib64/libpsm_infinipath.so.1(ips_proto_fini+0x1c0)[0x2afc5c310190]
> /usr/lib64/libpsm_infinipath.so.1[0x2b42bb65b480]
> /usr/lib64/libpsm_infinipath.so.1(psm_ep_close+0x1a9)[0x2b42bb651de9]
> /home/mrustaee/PF/openmpi-1.4.2/intel/lib/openmpi/mca_mtl_psm.so[0x2b42bb42d734]
>
> 3) I defined a PE with the old Open MPI (without SGE integration):
>
> pe_name            mpifillamd
> slots              9999
> user_lists         NONE
> xuser_lists        NONE
> start_proc_args    /opt/gridengine/mpi/startmpi.sh $pe_hostfile
> stop_proc_args     /opt/gridengine/mpi/stopmpi.sh
> allocation_rule    $fill_up
> control_slaves     FALSE
> job_is_first_task  FALSE
> urgency_slots      min
> accounting_summary FALSE
>
> In this case, submitted jobs could not run when requesting a high number of slots.
> Does that mean "users cannot run jobs requesting a high number of slots with a
> loosely integrated Open MPI"?
>
> Thx
>
> _______________________________________________
> users mailing list
> users@gridengine.org
> https://gridengine.org/mailman/listinfo/users
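
As a side note on the "terminated with signal 11" message: a minimal Python sketch for turning a signal number into its symbolic name, which can help when reading such output. It uses only the standard `signal` module; the helper name `describe_signal` is made up for illustration and is not part of SGE or Open MPI.

```python
import signal


def describe_signal(number: int) -> str:
    """Return the symbolic name of a signal number (hypothetical helper)."""
    # signal.Signals is the standard-library enum of signals known on this
    # platform; looking up an unknown number raises ValueError.
    return signal.Signals(number).name


if __name__ == "__main__":
    # 11 is the number reported in the job output quoted in this thread.
    print(describe_signal(11))
```

On the shell, `kill -l 11` gives the same kind of lookup.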