Hello,
we have an aggregated cluster (ScaleMP vSMP system) with 192 cores and
2 TB of RAM, and we are having trouble with a simple submit loop:
for i in `seq 1 300`; do qsub simple.sh; done
It usually hangs after roughly 120 submitted jobs: the sge_shepherd
processes all sit at 100% CPU load and simple.sh is never executed.
How can I solve this?
The second problem, where I would also need help:
we need to use "numactl --physcpubind" for the shell scripts submitted
with qsub. Each job has to run bound to a specific core (because of the
huge size of this aggregated machine), but I don't see how to put numactl
in front of the submitted script so that users don't have to bother with
it or with finding out which cores are still free. Any suggestions? The
difficulty is that qsub mostly expects a script to be submitted.
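
To make the question more concrete, here is an untested sketch of the kind
of wrapper I mean. It assumes (from my reading of the queue_conf and qsub
man pages, so please correct me) that a queue's starter_method is called
with the job script and its arguments, and that submitting with
"-binding env ..." exports the chosen CPU numbers in SGE_BINDING. The
function names are made up for illustration, and the whitespace-separated
format of SGE_BINDING is also an assumption.

```shell
#!/bin/sh
# Hypothetical starter_method wrapper: prefixes the job with numactl.
# Assumption: SGE_BINDING holds whitespace-separated OS CPU numbers.

# Turn "4 5" into "4,5", the list form numactl --physcpubind expects.
binding_to_cpulist() {
    printf '%s\n' "$1" |
        sed -e 's/[[:space:]][[:space:]]*/,/g' -e 's/^,//' -e 's/,$//'
}

# Start the job command, pinned to the given CPUs if a binding was set.
run_job() {
    if [ -n "${SGE_BINDING:-}" ]; then
        exec numactl --physcpubind="$(binding_to_cpulist "$SGE_BINDING")" "$@"
    fi
    # No binding information available: start the job unpinned.
    exec "$@"
}

# Only start something when a job command was actually passed in.
if [ "$#" -gt 0 ]; then
    run_job "$@"
fi
```

The idea would be to set this script as starter_method for the queue and
let users submit with something like "qsub -binding env linear:1 simple.sh",
so the scheduler picks the free core and the user never sees numactl.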
The facts:
we have SoGE 8.1.1
we use CentOS 6.2 on the system
all CPUs are Xeons with 6 cores (HT disabled) (16 nodes, 32 sockets, 192
cores) and 128 GB RAM per node
Thanks,
Adam
--
Adam Podstawka
Leibniz-Institut DSMZ-Deutsche Sammlung von Mikro-
organismen und Zellkulturen GmbH
Inhoffenstraße 7 B
38124 Braunschweig
Germany
http://www.dsmz.de
Director: Prof. Dr. Jörg Overmann
Local court: Braunschweig HRB 2570
Chairman of the supervisory board: MR Dr. Axel Kollatschny
DSMZ - A member of the Leibniz Association (WGL)