> On 29.02.2016 at 23:27, Ben Daniel Pere <ben.p...@gmail.com> wrote:
>
> It's the other way round. The load used in the load_formula is already
> adjusted. You adjust individual values, not the result of any computation
> already made with them.
>
> The computed load_formula will then be used to sort the machines.
>
> Oh, load formula is just for machine priority? So I do see the sense in
> normalizing this load by the number of cores (otherwise we'll kill machines
> with 24 cores while machines with 56 cores are barely doing anything)

With the current setting of the load_formula you should observe the opposite.
An empty 56-core machine shows up as -56, which is a lower value than the -24
of an empty 24-core machine, so it is picked first. These machines are then
filled up to 32 cores, at which point all machines have the same load_formula
value of -24; only then are the other machines also used for further
scheduling.

Are these quad-CPU machines without hyperthreading, to reach a total of 56
cores?

-- Reuti

> - and I suppose that's exactly what the default "np_load_avg" does.. awesome!
>
> > we basically have 2 kinds of queue - a workhorse queue "all.q" which has 1
> > slot per core and an interactive queue which also has 1 slot per core but
> > gets a better priority. we set the load_thresholds to 1.3 to allow 30%
> > oversubscription to ensure interactive jobs can always run.. we never ever
> > put our nodes in alarm mode, we use zabbix to monitor machine's health and
> > we automatically take it out of the cluster (by disabling all of its
> > queues) in cases of "mess" (disk failures, out of space, mounting issues,
> > stuff like that).
>
> Are these interactive jobs generating load, or is it used only to allow users
> to peek on a machine?
>
> yes they're generating load, but there aren't many of them and they are
> usually very short (seconds to minute-ish), absolutely all our tasks are
> single-threaded, 100% cpu taking.. we work super hard to relieve other
> bottlenecks (filesystem, databases, etc) - doesn't always work perfectly but
> for most of our tasks, cpu is our only boundary.
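As a rough illustration of the fill order described above, here is a minimal
Python sketch. It assumes (inferred from the -56 and -24 figures, not taken
from the actual configuration) that each host's load_formula value is the
negative of its free slot count, and that every single-slot job goes to the
host with the lowest value. The host names and the fill_order function are
hypothetical, and a real scheduler run depends on further settings that this
sketch ignores.

```python
def fill_order(cores_per_host, jobs):
    """Place single-slot jobs one at a time on the host with the lowest load value.

    Assumption: load value = -(free slots), as suggested by the -56 / -24
    figures in this thread. This is not the real scheduler algorithm.
    """
    free = dict(cores_per_host)  # remaining free slots per host
    placements = []
    for _ in range(jobs):
        # Lowest load value == most free slots; ties are broken by host name.
        host = min(free, key=lambda h: (-free[h], h))
        if free[host] == 0:
            break  # every host is full
        free[host] -= 1
        placements.append(host)
    return placements, free


if __name__ == "__main__":
    # Hypothetical hosts: one 56-core and one 24-core machine.
    placements, free = fill_order({"big56": 56, "small24": 24}, jobs=32)
    print(set(placements))  # only the 56-core host has been used so far
    print(free)             # both hosts now have 24 free slots
```

With these assumptions the first 32 jobs all land on the 56-core host, and
only from that point on do both hosts take turns.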
>
> Our cluster is 50 execution hosts, each with 128-256GB RAM and 24-56 cores,
> and we have some "support" hardware like an fhgfs cluster for information not
> on local disks, mysql servers, etc - we intend to double the size of the
> cluster this year and we're preparing by making use of our "shared"
> resources (database, fhgfs-storage) more efficient and by looking at our sge
> configuration and trying to figure out what we're doing wrong =) the most
> common complaint in our halls is that the cluster isn't responsive enough so
> we've created a cluster task force that tried to tackle some issues - I'm a
> software engineer but helping with fhgfs and sge configuration as well, so
> you're probably going to hear a lot from me soon ;)
_______________________________________________
users mailing list
users@gridengine.org
https://gridengine.org/mailman/listinfo/users