Okay, got it! Thanks for the information!

From: Bill Bryce [mailto:[email protected]]
Sent: Friday, June 03, 2016 6:23 AM
To: Coleman, Marcus [JRDUS Non-J&J]
Cc: [email protected]
Subject: Re: [gridengine users] Shadow master

Hi Marcus,

There are several things you need to do:

1) Make sure that $SGE_ROOT is shared between the master and all potential shadow_master machines (the master needs to run a shadow_master as well). This is how the new qmaster takes over if things go wrong on the real qmaster: the shadow_master starts a new qmaster, which simply reads the configuration, opens the spooling and starts rebuilding the state of the system. (Most smaller clusters just put $SGE_ROOT on a shared filesystem like NFS; larger clusters use an appliance like NetApp or Isilon, or something similar.)

2) Run the shadow_master installation on the qmaster and on all hosts that will be shadow masters. There is an installation script, '$SGE_ROOT/inst_sge -sm', which does all sorts of things including installing shadow masters. (Note: if you google you will probably find instructions saying to create a shadow_master file in $SGE_ROOT/default and put your host names in there. Yes, this does work, but it skips a few checks that make sure the shadow_master will work.)

3) The shadow master daemon needs to be running on the qmaster machine, and all machines that could be masters (all potential backup machines) need to be running a shadow_master too. If you ran the ‘inst_sge -sm’ above, it also installed and started the shadow daemon for you. Make sure you run inst_sge -sm on all shadow master candidate machines.

You make your changes using ‘qconf’. Really try to avoid making any changes to the system by editing the files; while it is possible, it is much better to do it with ‘qconf’. You don’t need to create queues on the shadow_masters.

It is a pretty simple system. The shadow_master daemons are really just heartbeat daemons. They talk to each other, and the one on the master pings the qmaster to see if it is alive. If it is dead, they ‘wait a bit’ and start the process of migrating the master. The shadow_master daemon on the master candidate machine launches a new qmaster, which promptly takes over, writes its name into the act_qmaster file and becomes the master of the cluster.

So you see, the shadow_master doesn’t run anything - it starts a new qmaster, which reads the configuration, opens spooling and rebuilds state, and the cluster runs as usual.

Regards,
Bill.

On Jun 2, 2016, at 7:35 PM, Coleman, Marcus [JRDUS Non-J&J] <[email protected]> wrote:

Another question I have about the shadow master: do I need to configure/create the same queues that I have set up on the master? How will the shadow master run the jobs without queues?

From: Coleman, Marcus [JRDUS Non-J&J]
Sent: Thursday, June 02, 2016 4:19 PM
To: '[email protected]'
Subject: Shadow master

Hi All

My main question is: do we make all the configuration changes on the slave or the master?

Does the “shadow_masters” file need to be on the slave or the master? I have the file on both.

Does the shadowd need to be running on the slave or the master? I have the daemon running on the slave.

I have the $SGE_ROOT/$SGE_CELL/common and /spool directories shared via NFS in FSTAB…

Is this correctly configured, or am I missing something…

_______________________________________________
users mailing list
[email protected]
https://gridengine.org/mailman/listinfo/users

William Bryce | VP Products
Univa Corporation, Toronto
E: [email protected] | D: 647-9742841 | Toll-Free (800) 370-5320
W: Univa.com | FB: facebook.com/univa.corporation | T: twitter.com/Grid_Engine
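
For anyone who wants to see what the files mentioned in this thread look like from a script, here is a minimal Python sketch. It is not part of Grid Engine and not how the shadow daemon itself works; it only reads the two text files discussed above and reports which host is currently the qmaster and which listed hosts could take over. It assumes the files live at $SGE_ROOT/$SGE_CELL/common/act_qmaster and $SGE_ROOT/$SGE_CELL/common/shadow_masters, that shadow_masters holds one host name per line, and that the cell is called "default" unless told otherwise. The function names are made up for illustration.

```python
import os
from pathlib import Path


def common_dir(sge_root, sge_cell="default"):
    # Assumed layout: $SGE_ROOT/$SGE_CELL/common on the shared filesystem.
    return Path(sge_root) / sge_cell / "common"


def read_act_qmaster(sge_root, sge_cell="default"):
    # act_qmaster holds the name of the host currently acting as qmaster.
    return (common_dir(sge_root, sge_cell) / "act_qmaster").read_text().strip()


def read_shadow_masters(sge_root, sge_cell="default"):
    # Assumed format: one host name per line; blank lines are skipped.
    path = common_dir(sge_root, sge_cell) / "shadow_masters"
    if not path.exists():
        return []
    return [line.strip() for line in path.read_text().splitlines() if line.strip()]


def failover_candidates(sge_root, sge_cell="default"):
    # Hosts listed in shadow_masters other than the current qmaster.
    # Names are compared case-insensitively; short names and fully
    # qualified names are NOT reconciled here.
    active = read_act_qmaster(sge_root, sge_cell).lower()
    return [h for h in read_shadow_masters(sge_root, sge_cell) if h.lower() != active]


if __name__ == "__main__":
    root = os.environ.get("SGE_ROOT")
    if root is None:
        print("SGE_ROOT is not set; nothing to check")
    else:
        cell = os.environ.get("SGE_CELL", "default")
        try:
            print("active qmaster:", read_act_qmaster(root, cell))
            print("failover candidates:", failover_candidates(root, cell))
        except FileNotFoundError as err:
            print("could not read cell files:", err)
```

Run it on any host that mounts the shared $SGE_ROOT. An empty candidate list suggests that no backup host is listed, although it says nothing about whether the shadow daemon is actually running there. As noted above, real changes should still go through ‘inst_sge -sm’ and ‘qconf’ rather than edits to these files.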
