[gridengine users] sge_shepherd 100% cpuload problem and running all jobs with numactl ?

A. Podstawka Fri, 24 Aug 2012 03:18:12 -0700

Hello,

we have an aggregated cluster (ScaleMP vSMP system) with 192 cores and 2 TB of RAM, and we have some trouble with a simple submit loop:


 for i in `seq 1 300`; do qsub simple.sh; done

It mostly hangs after roughly 120 submitted jobs: the sge_shepherd processes are all at 100% CPU load and simple.sh isn't executed. How could I solve this?


The second problem, where I would need help:

We need to use "numactl --physcpubind" for the shell scripts submitted to qsub. They need to run bound to a specific core (due to the huge size of this aggregated machine), but I don't see how I can put numactl in front of the submitted script, so that the users don't need to bother with it or with working out which cores are unused. Any suggestions? After all, qsub mostly expects a script to be submitted.
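
To make the idea concrete, here is a rough sketch of the kind of wrapper meant above. It only builds and prints the command line that would pin a job to given cores. The function names (cores_to_cpulist, numactl_cmdline) are made up for illustration, and where the list of free cores would come from is an assumption left open, since that is exactly the unsolved part.

```shell
#!/bin/sh
# Hypothetical wrapper sketch: prefix a job command with numactl.
# The core list is assumed to arrive as a space-separated string of
# OS core ids, e.g. "4 5"; how to obtain it is not solved here.

# Turn "4 5" into "4,5", the comma-separated form --physcpubind accepts.
cores_to_cpulist() {
    echo "$1" | tr -s ' ' ',' | sed 's/^,//; s/,$//'
}

# Print the command line that would pin the job to the given cores.
numactl_cmdline() {
    cores=$(cores_to_cpulist "$1")
    shift
    echo "numactl --physcpubind=$cores $*"
}

# Example: show how simple.sh would be started on core 5.
numactl_cmdline "5" ./simple.sh
```

In a real wrapper the last step would presumably run the command (for example via exec) instead of printing it, and it would have to be invoked automatically for every job so that users never see it.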



The facts:
we have SoGE 8.1.1
we use CentOS 6.2 on the system

All CPUs are Xeons with 6 cores (HT disabled) (16 nodes, 32 sockets, 192 cores) and 128 GB RAM


thanks
Adam

--
Adam Podstawka
Leibniz-Institut DSMZ-Deutsche Sammlung von Mikro-
organismen und Zellkulturen GmbH
Inhoffenstraße 7 B
38124 Braunschweig
Germany
http://www.dsmz.de

Director: Prof. Dr. Jörg Overmann
Local court: Braunschweig HRB 2570
Chairman of the supervisory board: MR Dr. Axel Kollatschny

DSMZ - A member of the Leibniz Association (WGL)
_______________________________________________
users mailing list
users@gridengine.org
https://gridengine.org/mailman/listinfo/users
