Hi,

> On 14.11.2018 at 01:06, ad...@genome.arizona.edu wrote:
> 
> We have a cluster with gridengine 6.5u2 and are noticing strange behavior when
> running MPI jobs. Our application finishes, yet its processes continue to
> run and use up the CPU. We configured a parallel environment for MPI as
> follows:
> 
> pe_name            mpi
> slots              500
> user_lists         NONE
> xuser_lists        NONE
> start_proc_args    NONE
> stop_proc_args     NONE
> allocation_rule    $round_robin
> control_slaves     TRUE
> job_is_first_task  FALSE
> urgency_slots      min
> accounting_summary FALSE
> 
> Then we ran our application "Maker" like this:
> qsub -cwd -N <NAME> -b y -V -pe mpi <CPUs> /opt/mpich-install/bin/mpiexec maker <maker options>

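Which version of MPICH are you using? Maybe it's not tightly integrated with SGE, so the slave processes are not started under SGE's control and are not killed when the job ends.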
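A minimal sketch of how one might check this, under stated assumptions: the job script below (hypothetical name maker_job.sh; the 16 slots are chosen only for illustration) first prints the MPICH build details and then, assuming the installation uses MPICH's Hydra process manager, asks Hydra explicitly to start its remote processes through SGE (qrsh -inherit) instead of ssh, so that Grid Engine can track them and clean them up when the job ends.

```shell
#!/bin/sh
# Hypothetical job script maker_job.sh (a sketch, not a tested recipe).
# Assumes /opt/mpich-install is an MPICH build with the Hydra process manager.
#$ -cwd
#$ -V
#$ -pe mpi 16

# Print build details: Hydra reports its version and the launchers it supports
# (look for "sge" in that list).
/opt/mpich-install/bin/mpiexec --version

# Ask Hydra to use the SGE launcher and resource manager kernel, so remote
# ranks are started via "qrsh -inherit" under the PE (control_slaves TRUE).
# Append your usual Maker options after "maker".
/opt/mpich-install/bin/mpiexec -launcher sge -rmk sge -np "$NSLOTS" maker
```

Submit it with "qsub maker_job.sh". If "mpiexec --version" does not report Hydra (e.g. an old MPD-based MPICH2), the -launcher and -rmk options will not be available.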

-- Reuti


> It seems to run fine and qstat shows it running. Once it has completed,
> qstat is empty again and we have the desired output. However, the "maker"
> processes keep running on the compute nodes until I log in to each node
> and "kill -9" them. We did not have this problem when running
> mpiexec directly with Maker, or when running Maker in stand-alone mode (without
> MPI), so I guess it is a problem with our qsub command or parallel
> environment? Any ideas?
> 
> Thanks,
> -- 
> Chandler / Systems Administrator
> Arizona Genomics Institute
> www.genome.arizona.edu
> _______________________________________________
> users mailing list
> users@gridengine.org
> https://gridengine.org/mailman/listinfo/users

