Hi Dale,

We are trying to determine where the spool directory should reside based on performance versus ease of administration.  Can somebody explain how ease of administration would be made easier?

Here is a short answer:

When the spool directory is shared, it is far easier for an administrator to troubleshoot node-specific job issues. This is because you can see and access all of the spool/<nodename>/messages files in one convenient location without having to hop to a specific machine.

When spool is not shared your spool data and messages are on local disk on the compute nodes. This means that you have to connect to that node in order to read or examine the files.
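
To make the shared-spool convenience concrete, here is a minimal sketch in Python that reads the tail of every node's messages file from a single machine. It assumes the spool/<nodename>/messages layout described above; the function name and the spool path you pass in are hypothetical, so adjust them to your install.

```python
from collections import deque
from pathlib import Path


def tail_node_messages(spool_root, lines=5):
    """Return {nodename: last `lines` lines of <spool_root>/<nodename>/messages}.

    Assumes one subdirectory per node, each holding a 'messages' file.
    """
    result = {}
    for messages in sorted(Path(spool_root).glob("*/messages")):
        with messages.open(errors="replace") as handle:
            # A deque with maxlen keeps only the final `lines` lines read.
            tail = deque(handle, maxlen=lines)
        result[messages.parent.name] = [line.rstrip("\n") for line in tail]
    return result
```

With a shared spool, a single call to tail_node_messages() on the shared path covers every node. With local spooling the same function only ever sees the node it runs on, so you would have to run it on each node in turn.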

More detail ...


The decision to do shared or not-shared generally revolves around the power of your NFS server, what else is talking on that same network/subnet/vlan/wire and, probably more importantly, how many jobs you might be running through your system during a day. The number of jobs entering and exiting the system is the real factor in how often and how hard your spool share is getting hit. Some of my pharma clusters run hours-long jobs and might only do a few hundred or a few thousand jobs per day. Another biotech cluster of similar size might be doing 150,000 jobs per day running short chemical simulations.
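
As a rough illustration of the difference between those two workloads, the sketch below converts jobs per day into an average rate of jobs per second. It is only arithmetic on the figures above. It says nothing about how many spool operations each job causes, and real load is burstier than a daily average suggests.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def jobs_per_second(jobs_per_day):
    """Average job rate, assuming jobs are spread evenly over 24 hours."""
    return jobs_per_day / SECONDS_PER_DAY


# Figures taken from the two example clusters described above.
for jobs_per_day in (1_000, 150_000):
    rate = jobs_per_second(jobs_per_day)
    print(f"{jobs_per_day:>7} jobs/day is about {rate:.3f} jobs/s on average")
```

The long-job cluster averages well under one job per second, while the 150,000-job cluster averages a little under two per second around the clock.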

My gut answer is usually to do shared-spool first and only move away from that if performance demands it. Changing the spooling location post-install is not a huge deal.

I'm also a classic spooling zealot. I hate Berkeley DB spooling, and even on the 2000-core cluster that does 150,000 jobs per day we still use classic spooling on an NFS-shared SGE root and spool. We are, however, using Isilon scale-out NAS for the NFS, and that means we have no real performance issues at all.

My $.02

-Chris



_______________________________________________
users mailing list
users@gridengine.org
https://gridengine.org/mailman/listinfo/users
