Hi,

On 23.11.2011 at 11:28, mahbube rustaee wrote:

> Thank you very much Mr. Reuti.
> 
> I can run MPI jobs with the new Open MPI compiled with the option --with-sge. Some 
> questions:
> 1) The job finished and the output file was created, but the job's state remained 
> running for a short time. Why?

This is built in to allow a cleanup of the $TMPDIR on the nodes and, in some 
cases, the removal of the processes. Although the delay was reduced in the 
commercial version of GE, it's still there in the open source version AFAIK.


> 2) When requesting many slots (e.g. 800 or more), the job ran to completion and 
> at the end of the output file I got the error:
> 
> mpi-integ-sge-intel.comp:9728 terminated with signal 11 at PC=2b42bb65b60e 
> SP=7fffd361dcd0.  Backtrace:

Signal 11 is a segmentation fault, which is in some way a programming error 
(either in your application or in a library) and happens inside the application 
while it calls some library. If I understand you correctly, it never happens 
when you run it outside of SGE. I don't see any direct relation to SGE here; 
maybe the library isn't checking the available memory and/or tries to access an 
already freed area.
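
As a quick check that signal 11 really maps to a segmentation fault on your 
nodes, something like the following could be used. It is only a minimal sketch, 
assuming a Linux node with a Python 3 interpreter; signal numbers are 
platform-specific, and the variable name is just for illustration:

```python
import signal

# Map the numeric signal from the error message to its symbolic name.
# Assumption: this runs on the same (Linux) platform as the failing job,
# since signal numbering can differ between operating systems.
signal_name = signal.Signals(11).name
print(signal_name)
```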

I suggest to raise this issue on the Open MPI mailing list.

-- Reuti


> mpi-integ-sge-intel.comp:9729 terminated with signal 11 at PC=2afc5c30c60e 
> SP=7fff87c03e50.  Backtrace:
> /usr/lib64/libpsm_infinipath.so.1[0x2b42bb65b60e]
> /usr/lib64/libpsm_infinipath.so.1[0x2b42bb66c4de]
> /usr/lib64/libpsm_infinipath.so.1(ips_ptl_shared_poll+0x230)[0x2b42bb66b4d0]
> /usr/lib64/libpsm_infinipath.so.1(psmi_poll_internal+0x50)[0x2b42bb66b080]
> /usr/lib64/libpsm_infinipath.so.1(ips_proto_fini+0x1c0)[0x2b42bb65f190]
> /usr/lib64/libpsm_infinipath.so.1[0x2afc5c30c60e]
> /usr/lib64/libpsm_infinipath.so.1[0x2afc5c31d4de]
> /usr/lib64/libpsm_infinipath.so.1(ips_ptl_shared_poll+0x230)[0x2afc5c31c4d0]
> /usr/lib64/libpsm_infinipath.so.1(psmi_poll_internal+0x50)[0x2afc5c31c080]
> /usr/lib64/libpsm_infinipath.so.1(ips_proto_fini+0x1c0)[0x2afc5c310190]
> /usr/lib64/libpsm_infinipath.so.1[0x2b42bb65b480]
> /usr/lib64/libpsm_infinipath.so.1(psm_ep_close+0x1a9)[0x2b42bb651de9]
> /home/mrustaee/PF/openmpi-1.4.2/intel/lib/openmpi/mca_mtl_psm.so[0x2b42bb42d734]
> 
> 3) I defined a PE with old open mpi (without integration of sge):
> pe_name            mpifillamd
> slots              9999
> user_lists         NONE
> xuser_lists        NONE
> start_proc_args    /opt/gridengine/mpi/startmpi.sh $pe_hostfile
> stop_proc_args     /opt/gridengine/mpi/stopmpi.sh
> allocation_rule    $fill_up
> control_slaves     FALSE
> job_is_first_task  FALSE
> urgency_slots      min
> accounting_summary FALSE
> 
> In this case, submitted jobs could not run when requesting a large number of slots.
>  Does that mean users cannot run jobs with large slot requests on a loosely 
> integrated Open MPI?
> 
> 
> Thx 
> 
> 
> 
> _______________________________________________
> users mailing list
> users@gridengine.org
> https://gridengine.org/mailman/listinfo/users

