My $0.02:

SGE can run 100% local without NFS. The main thing you lose in this configuration, in my experience, is the easy troubleshooting of going into a central $SGE_ROOT/$SGE_CELL/ and seeing all of the various node spool and message files. That's annoying but not a dealbreaker, especially given what you are experiencing.

That said, I do a ton of SGE work with classic spooling on EMC Isilon storage, including environments that push close to 1 million jobs/month in throughput, and we've never seen a catastrophic loss of jobs or spool data. Most of those are without Bright, although I know of at least one group running Bright on 1,000 cores on top of Isilon storage, and they've not seen anything like this either.

If you go 100% local, my recommendation would be to put the whole $SGE_ROOT on the local nodes. The time it would take to winnow it down to the minimal file set is not worth it relative to the size of the whole tree.
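A rough sketch of what that staging could look like on each exec host (the NFS path is the one from this thread; the local destination and the use of rsync are my assumptions, not a Bright- or SGE-blessed procedure):

```shell
# Sketch only: stage the whole $SGE_ROOT onto each node's local disk.
# SGE_ROOT_LOCAL is an assumed path -- pick whatever fits your provisioning.
SGE_ROOT_NFS=/cm/shared/apps/sge/2011.11
SGE_ROOT_LOCAL=/cm/local/apps/sge/2011.11

if [ -d "$SGE_ROOT_NFS" ]; then
    mkdir -p "$SGE_ROOT_LOCAL"
    # -a preserves permissions and symlinks; --delete keeps the copy
    # in sync if you re-run this after upgrading SGE on the NFS side
    rsync -a --delete "$SGE_ROOT_NFS/" "$SGE_ROOT_LOCAL/"
fi

# Point the daemons at the local copy (sge_execd restart required afterwards)
export SGE_ROOT="$SGE_ROOT_LOCAL"
```

You'd run something like this via your image/provisioning mechanism or pdsh so every node ends up with an identical local copy.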

-Chris


Peskin, Eric <mailto:[email protected]>
November 12, 2014 at 8:26 AM
All,

Does SGE have to use NFS or can it work locally on each node?
If parts of it have to be on NFS, what is the minimal subset?
How much of this changes if you want redundant masters?

We have a cluster running CentOS 6.3, Bright Cluster Manager 6.0, and SGE 2011.11. Specifically, SGE is provided by a Bright package: sge-2011.11-360_cm6.0.x86_64

Twice, we have lost all the running SGE jobs when the cluster failed over from one head node to the other. =( That's not supposed to happen. Since then, we have also had many individual jobs get lost. The latter situation correlates with messages in the system logs saying


That file lives on an NFS mount on our Isilon storage.
Surely, the executables don't have to be on NFS?
Interestingly, we are using local spooling: the spool directory on each node is /cm/local/apps/sge/var/spool, which is, indeed, local.
But the $SGE_ROOT, /cm/shared/apps/sge/2011.11, lives on NFS.
Does any of it need to?
Maybe just the var part would need to: /cm/shared/apps/sge/var ?
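One quick way to confirm which of these paths are actually NFS-backed on a given node (a sketch using GNU stat, with the paths from above; "nfs" vs. a local type like "ext2/ext3" in the output tells you where each directory really lives):

```shell
# Print the filesystem type backing each SGE directory on this host
for d in /cm/local/apps/sge/var/spool /cm/shared/apps/sge/2011.11; do
    if [ -e "$d" ]; then
        echo "$d: $(stat -f -c %T "$d")"
    else
        echo "$d: (not present on this host)"
    fi
done
```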

Thanks,
Eric



_______________________________________________
users mailing list
[email protected]
https://gridengine.org/mailman/listinfo/users
