IMHO, switch to local spool directories.

On May 6, 2012, at 2:14 PM, Chris Jewell <[email protected]> wrote:

> Hi All,
> 
> Apologies for cross-posting -- not sure which list is the most active these 
> days…?
> 
> I'm currently having a real issue with our shared SGE_ROOT directory, which 
> also contains the spool directories.  It is XFS-formatted on the server, which 
> also hosts the sgemaster daemon, and is shared via NFSv4.
> 
> The cluster has 108 processors, spread over 11 execution nodes, wired up with 
> 1GE.  Under heavy fast scheduling (i.e. *large* task arrays of very short jobs) 
> we are experiencing server crashes: spinning rpciod and nfsd processes, both 
> on the clients and on the server, cause very high loadavg and alarm states, 
> put sgeexecd into uninterruptible sleep, and make machines fall over.
> 
> I would have thought that the NFSv4 shared directory would cope with this 
> load, since the cluster is not massive.  However, we have our scheduling 
> delay set to 0, so I'm wondering if this is causing the issue.  I'd like to 
> check your collective experience on this one, before changing the cluster 
> config to use local spool dirs.
> 
> Many thanks,
> 
> Chris
> --
> Dr Chris Jewell
> Department of Statistics
> University of Warwick
> Coventry
> CV4 7AL
> UK
> Tel: +44 (0)24 7615 0778
> 
> _______________________________________________
> users mailing list
> [email protected]
> https://gridengine.org/mailman/listinfo/users
