Re: [gridengine users] SGE and NFS

Bill Bryce Wed, 12 Nov 2014 09:00:58 -0800

more suggestions....

Since you are using Bright Cluster Manager to manage the configuration of Grid 
Engine, you should talk to Bright support and make sure there are no unwanted 
side effects caused by changing the configuration.  Several of our customers 
forget that Bright is managing the Grid Engine cluster and modify the Grid 
Engine configuration directly - then 10 minutes later Bright 'rewrites' the 
config.


If everything is local you can't have a shadow master, since the qmaster spool 
directory lives only on the local Grid Engine master.  The accounting files 
will also be written on the master - so accessing them from other machines 
won't work unless you copy things around.

Regards, 

Bill.

On Nov 12, 2014, at 11:31 AM, Chris Dagdigian <[email protected]> wrote:

> my $.02
> 
> SGE can run 100% local without NFS - the main thing (in my experience) that 
> you lose in this config is the easy troubleshooting ability of going into a 
> central $SGE_ROOT/$SGE_CELL/ and seeing all of the various node spool and 
> message files. It's annoying but not a dealbreaker especially after seeing 
> what you are experiencing.
> 
> That said, I do a ton of SGE work with classic spooling on EMC Isilon storage 
> - some environments that do close to 1 million jobs/month in throughput and 
> we've never seen a catastrophic loss of jobs or spool data. Most are without 
> Bright although I know of at least one group running Bright on 1000 cores 
> sitting on top of Isilon storage and they've not seen anything like this 
> either.
> 
> If you go 100% local my recommendation would just be to put the whole 
> $SGE_ROOT out on the local nodes. The time it would take to winnow down to 
> the minimal file set is not worth it relative to the size of the whole thing.
> 
> -Chris
> 
> 
>> Peskin, Eric <mailto:[email protected]>
>> November 12, 2014 at 8:26 AM
>> All,
>> 
>> Does SGE have to use NFS or can it work locally on each node?
>> If parts of it have to be on NFS, what is the minimal subset?
>> How much of this changes if you want redundant masters?
>> 
>> We have a cluster running CentOS 6.3, Bright Cluster Manager 6.0, and SGE 
>> 2011.11. Specifically, SGE is provided by a Bright package: 
>> sge-2011.11-360_cm6.0.x86_64
>> 
>> Twice, we have lost all the running SGE jobs when the cluster failed over 
>> from one head node to the other. =( Not supposed to happen.
>> Since then, we have also had many individual jobs get lost. The latter 
>> situation correlates with messages in the system logs saying
>> 
>> 
>> That file lives on an NFS mount on our Isilon storage.
>> Surely, the executables don't have to be on NFS?
>> Interestingly, we are using local spooling; the spool directory on each node 
>> is /cm/local/apps/sge/var/spool , which is indeed local.
>> But the $SGE_ROOT , /cm/shared/apps/sge/2011.11 lives on NFS.
>> Does any of it need to?
>> Maybe just the var part would need to: /cm/shared/apps/sge/var ?
>> 
>> Thanks,
>> Eric
>> 
>> 
>> 
>> _______________________________________________
>> users mailing list
>> [email protected]
>> https://gridengine.org/mailman/listinfo/users
