Re: [gridengine users] sge_shepherd 100% cpuload problem and running all jobs with numactl ?

A. Podstawka Fri, 24 Aug 2012 03:51:18 -0700

Hi William,

On 24.08.2012 12:38, William Hay wrote:

On 24 August 2012 11:16, A. Podstawka <adam.podsta...@dsmz.de> wrote:

Hello,


we have an aggregated cluster (ScaleMP vSMP system) with 192 cores and
2 TB of RAM, and we have some trouble with a simple submission loop:

   for i in `seq 1 300`; do qsub simple.sh; done

Usually it hangs after roughly 120 submitted jobs: the sge_shepherd
processes all sit at 100% CPU load and simple.sh is never
executed. How can I solve this?

I'd start with an strace of the shepherd to see what it was up to...

OK, will try. Nice tip, I hadn't thought of strace.
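A minimal sketch of the strace suggestion, assuming root access on the execution host and that strace, pgrep and timeout are installed; the TRACE_SECONDS variable and the /tmp log location are hypothetical choices. It attaches to every running sge_shepherd for a limited time and writes one log per process, which should show whether the busy shepherds keep repeating the same system call or spin without making any (pure user-space looping).

```shell
#!/bin/sh
# Sketch: attach strace to every running sge_shepherd on this exec host.
# Assumes root, plus strace, pgrep and timeout (coreutils) in PATH.
# TRACE_SECONDS and the /tmp log location are hypothetical choices.

shepherd_pids() {
    # pgrep matches the process name; "|| true" keeps an empty result
    # (no shepherds, or pgrep missing) from counting as an error.
    pgrep sge_shepherd || true
}

trace_shepherds() {
    for pid in $(shepherd_pids); do
        log="/tmp/shepherd.$pid.strace"
        echo "tracing sge_shepherd pid $pid -> $log"
        # -f follows forked children, -tt adds microsecond timestamps;
        # timeout stops each trace so the function returns on its own.
        timeout "${TRACE_SECONDS:-30}" strace -f -tt -o "$log" -p "$pid" &
    done
    wait    # wait for all background traces to finish
}
```

Running trace_shepherds while the hang is reproducible and then reading the /tmp/shepherd.<pid>.strace files is usually enough to see what the shepherds are waiting on.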


The second problem we have, where I would need help:
we need to use "numactl --physcpubind" for the shell scripts submitted to

You could use a starter_method.  As a recent version of Grid Engine

OK, will look at it.
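A minimal starter_method sketch, assuming (as far as I understand queue_conf(5)) that Grid Engine runs the starter instead of the configured job shell and passes the job script and its arguments as "$@", and that the job script can be exec'd directly. The CPU list is taken from SGE_BINDING, which is filled in when submitting with "-binding env ..." (its exact format is an assumption here), or from a hypothetical site variable NUMA_CPUS; without either, the job runs unbound. The script name numa_starter.sh is a placeholder.

```shell
#!/bin/sh
# numa_starter.sh - hypothetical starter_method (the name is a placeholder).
# Assumption: Grid Engine runs this instead of the job shell and passes the
# job script plus its arguments as "$@"; the script is exec'd directly.

to_cpu_list() (
    # Runs in a subshell so the changes to IFS and "set -f" do not leak.
    # Splits on whitespace and re-joins with commas: "0 1 2" -> "0,1,2".
    set -f
    set -- $1
    IFS=,
    echo "$*"
)

numa_start() {
    # SGE_BINDING is assumed to hold the cores picked by "qsub -binding env ...";
    # NUMA_CPUS is a hypothetical site-wide fallback. With neither, run unbound.
    cpus=$(to_cpu_list "${SGE_BINDING:-${NUMA_CPUS:-}}")
    if [ -n "$cpus" ]; then
        exec numactl --physcpubind="$cpus" "$@"
    fi
    exec "$@"
}

# Only start something when Grid Engine actually handed over a job.
if [ "$#" -gt 0 ]; then
    numa_start "$@"
fi
```

The starter would be enabled per queue, e.g. with "qconf -mq all.q" and setting starter_method to the absolute path of the script. Deriving the CPU list per job keeps concurrent jobs from being pinned to the same cores.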

Another idea of mine was a "wrapper" for qsub, so the original qsub would be called afterwards from the wrapper with an extra script with numactl in it. Just an idea, but I prefer native functions, so thanks (:
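The qsub wrapper idea can be sketched without generating an extra script by submitting numactl itself as a binary job (qsub -b y) with the job script as its argument. This assumes the job script sits on a filesystem shared with the execution hosts, since a binary job is not copied to the spool directory, and that numactl is installed on every execution host; qsub_numa and the QSUB override are hypothetical names.

```shell
#!/bin/sh
# qsub_numa - hypothetical qsub wrapper: submits numactl as a binary job
# (qsub -b y) that runs the job script pinned to the given CPUs.
# Assumes the script lives on a filesystem shared with the exec hosts and
# that numactl is installed there. QSUB=echo gives a dry run.

qsub_numa() {
    if [ "$#" -lt 2 ]; then
        echo "usage: qsub_numa CPULIST SCRIPT [ARGS...]" >&2
        return 2
    fi
    cpus=$1
    script=$2
    shift 2
    case $script in
        /*) ;;                        # already an absolute path
        *) script="$(pwd)/$script" ;; # the exec host needs a full path
    esac
    # Everything after "numactl" is passed through as the job's command line.
    "${QSUB:-qsub}" -b y numactl --physcpubind="$cpus" "$script" "$@"
}
```

For example, "qsub_numa 0-3 simple.sh" submits the equivalent of "qsub -b y numactl --physcpubind=0-3 $PWD/simple.sh". Every job submitted this way gets the same fixed CPU list, which is one reason a per-job starter_method or the built-in binding is the more natural fit.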

though I think
SoGE has the ability to bind cores itself, so you may not need to.

I have tried it and the binding does not seem to work, but because of the sge_shepherd problem I can't say this for certain.
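To tell whether the built-in binding works independently of the shepherd problem, a small job script can report the CPU affinity it actually received. This sketch assumes Linux execution hosts (it reads Cpus_allowed_list from /proc) and uses "qsub -binding linear:1", which requests one core per job, as the example request; check_binding.sh is a placeholder name.

```shell
#!/bin/sh
# check_binding.sh - placeholder job script that reports its CPU affinity.
# Submit once with and once without binding, e.g.
#   qsub -binding linear:1 check_binding.sh
#   qsub check_binding.sh
# and compare the "allowed" lines in the job output files.

allowed_cpus() {
    # Linux exposes a process's affinity as Cpus_allowed_list in /proc.
    awk '/^Cpus_allowed_list:/ { print $2 }' "/proc/${1:-self}/status"
}

report_binding() {
    echo "host:        $(uname -n)"
    echo "SGE_BINDING: ${SGE_BINDING:-<unset>}"
    echo "allowed:     $(allowed_cpus "$$")"
}

report_binding
```

If the allowed list for the job submitted with -binding is the same as for the one submitted without it, the binding was not applied on that host.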



thanks
Adam
_______________________________________________
users mailing list
users@gridengine.org
https://gridengine.org/mailman/listinfo/users
