Hello,

We have an aggregated cluster (a ScaleMP vSMP system) with 192 cores and 2 TB of RAM, and we are having trouble with this simple submission loop:

 for i in $(seq 1 300); do qsub simple.sh; done

Usually it hangs after roughly 120 submitted jobs: the sge_shepherd processes all run at 100% CPU load and simple.sh is never executed. How can I solve this?
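One workaround that might be worth trying (it does not explain why the shepherds spin) is to collapse the 300 qsub calls into a single array job with qsub's -t option; each task then finds its index in $SGE_TASK_ID. Note that every running task still gets its own sge_shepherd, so this can only help if the submission burst itself is the trigger. The helper name submit_array and the QSUB override used for dry runs are hypothetical, for illustration only.

```shell
#!/bin/sh
# Hypothetical helper: submit simple.sh once as an array job of N tasks
# instead of N separate jobs. Inside simple.sh, $SGE_TASK_ID holds 1..N.
# QSUB may be overridden for a dry run (e.g. QSUB=echo); defaults to qsub.
submit_array() {
    n="${1:-300}"
    ${QSUB:-qsub} -t "1-$n" simple.sh
}
```

Usage: `submit_array 300` replaces the whole for loop with one qsub call.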

The second problem we need help with:
we need to run the shell scripts submitted with qsub under "numactl --physcpubind", so that each job is bound to a specific core (because of the huge size of this aggregated machine). But I can't figure out how to put numactl in front of the submitted script automatically, so that users don't have to deal with it or work out which cores are free. Any suggestions? qsub mostly expects a script to be submitted, which is what makes this tricky for us.
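A possible approach, sketched under assumptions and not tested on vSMP: Grid Engine's queue parameter starter_method names a program that is run instead of the job shell and receives the job script and its arguments, so a small wrapper there can prefix every job with numactl. This sketch assumes jobs request a core with "-binding env linear:1" (for example as a default in the cluster's sge_request file), so that Grid Engine picks a free core and reports it in SGE_BINDING as space-separated processor numbers; it also assumes shell_start_mode is posix_compliant, so SGE_STARTER_SHELL_PATH names the job shell. The script name numa_starter.sh and the helper to_cpulist are hypothetical.

```shell
#!/bin/sh
# numa_starter.sh (hypothetical name): a Grid Engine starter_method that
# runs every batch job under numactl so users never call numactl themselves.
# ASSUMPTION: jobs request "-binding env linear:1", so SGE exports the chosen
# OS processor number(s) in SGE_BINDING (assumed space-separated).

# Turn "0 1 5" (or "0,1,5") into "0,1,5", the form --physcpubind expects.
to_cpulist() {
    printf '%s\n' "$1" | tr -s ' ,' ',,' | sed 's/^,//; s/,$//'
}

# SGE passes the job script and its arguments to the starter; do nothing
# when run without arguments (e.g. while testing to_cpulist).
if [ "$#" -gt 0 ]; then
    # Shell configured or requested for the job (posix_compliant mode assumed).
    shell="${SGE_STARTER_SHELL_PATH:-/bin/sh}"
    cpus=$(to_cpulist "${SGE_BINDING:-}")
    if [ -n "$cpus" ] && command -v numactl >/dev/null 2>&1; then
        exec numactl --physcpubind="$cpus" "$shell" "$@"
    fi
    # No binding information or no numactl: start the job unbound.
    exec "$shell" "$@"
fi
```

The wrapper would be enabled per queue, e.g. with `qconf -mq all.q`, by setting starter_method to the script's full path. Letting Grid Engine choose the core, rather than the wrapper, keeps track of which cores are already in use.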


The facts:
We run Son of Grid Engine (SoGE) 8.1.1.
We use CentOS 6.2 on the system.
All CPUs are 6-core Xeons with HT disabled (16 nodes, 32 sockets, 192 cores), and each node has 128 GB of RAM.

Thanks,
Adam

--
Adam Podstawka
Leibniz-Institut DSMZ-Deutsche Sammlung von Mikro-
organismen und Zellkulturen GmbH
Inhoffenstraße 7 B
38124 Braunschweig
Germany
http://www.dsmz.de

Director: Prof. Dr. Jörg Overmann
Local court: Braunschweig HRB 2570
Chairman of the supervisory board: MR Dr. Axel Kollatschny

DSMZ - A member of the Leibniz Association (WGL)
_______________________________________________
users mailing list
users@gridengine.org
https://gridengine.org/mailman/listinfo/users
