Hi,

> On 14.11.2018 at 01:06, ad...@genome.arizona.edu wrote:
>
> We have a cluster with gridengine 6.5u2 and are noticing strange behavior when
> running MPI jobs. Our application will finish, yet the processes continue to
> run and use up the CPU. We configured a parallel environment for MPI as
> follows:
>
> pe_name            mpi
> slots              500
> user_lists         NONE
> xuser_lists        NONE
> start_proc_args    NONE
> stop_proc_args     NONE
> allocation_rule    $round_robin
> control_slaves     TRUE
> job_is_first_task  FALSE
> urgency_slots      min
> accounting_summary FALSE
>
> Then we ran our application "Maker" like this:
>
> qsub -cwd -N <NAME> -b y -V -pe mpi <CPUs> /opt/mpich-install/bin/mpiexec maker <maker options>

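Since tight integration depends on the PE setting control_slaves, it can be worth confirming what the cluster actually reports for the PE. Below is a small sketch, not a definitive check: `pe_control_slaves` is a hypothetical helper name, and it assumes the "key value" layout that `qconf -sp <pe_name>` prints, as in the listing quoted above. Note that control_slaves TRUE is necessary but not sufficient; the MPI launcher also has to start its remote processes through Grid Engine.

```shell
# Hypothetical helper (not part of Grid Engine): reads PE settings in the
# "key value" layout printed by `qconf -sp <pe_name>` on stdin and prints
# the value of control_slaves, or "unset" if the key is missing.
pe_control_slaves() {
  awk '$1 == "control_slaves" { value = $2 }
       END { if (value == "") print "unset"; else print value }'
}

# Usage on a cluster host (assumes qconf is in PATH and the PE is named mpi):
#   qconf -sp mpi | pe_control_slaves
```

If this prints anything other than TRUE, Grid Engine is not set up to control the slave tasks of jobs in that PE.
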
Which version of MPICH are you using? Maybe it's not tightly integrated.

-- Reuti

> It seems to run fine and qstat will show it running. Once it has completed,
> qstat is empty again and we have the desired output. However, the "maker"
> processes continue to run on the compute nodes until I log in to each node
> and "kill -9" them. We did not have this problem when running
> mpiexec directly with Maker, or running Maker in stand-alone mode (without
> MPI), so I guess it is a problem with our qsub command or parallel
> environment? Any ideas?
>
> Thanks,
> --
> Chandler / Systems Administrator
> Arizona Genomics Institute
> www.genome.arizona.edu
> _______________________________________________
> users mailing list
> users@gridengine.org
> https://gridengine.org/mailman/listinfo/users