We use Sun Grid Engine (SGE) to manage a small cluster of about fifty compute nodes. Users submit their jobs from a separate manager node using the `qsub` command.
Recently we have run into a problem. A user submits a job, and SGE dispatches it to a free compute node. Suppose the returned job ID is 1000. Running `qstat -j 1000` shows the job state as r, meaning SGE considers the job to be running, and it also shows which machine the job is running on, say machine1. But when we ssh to machine1 and run `top` to see the resource usage of the processes belonging to the job, nothing shows up. We expected to see the CPU usage of the submitted job, but there is none. We also tried logging in to that node with `qrsh`, and again found no information about any processes belonging to the job.

A second problem: when a user submits multiple jobs at the same time, SGE dispatches them to only one or a few compute nodes, even though many other compute nodes are free or lightly loaded. What could be wrong with SGE? We would expect SGE to preferentially dispatch newly submitted jobs to the free compute nodes.

Could anyone offer some advice or references? Thanks!
_______________________________________________ users mailing list [email protected] https://gridengine.org/mailman/listinfo/users
