Thank you. I will try this and get back to you. Is any other step missing here for the migration?

Thank you,
Purvesh

On Mon, 24 Apr 2023 at 12:08, Ole Holm Nielsen <ole.h.niel...@fysik.dtu.dk> wrote:

> On 4/24/23 08:09, Purvesh Parmar wrote:
> > Thank you. However, because this is a change of data center, and the
> > server hostnames and FQDNs contain the data center name, I am required
> > to change both the hostnames and the IP addresses to match the new DC
> > names.
>
> Could your data center be persuaded to introduce DNS CNAME aliases for the
> old names to point to the new DC names?
>
> If you're forced to use new DNS names only, then it's simple to change DNS
> names of compute nodes and partitions in slurm.conf:
>
>    NodeName=...
>    PartitionName=xxx Nodes=...
>
> as well as the slurmdbd server name:
>
>    AccountingStorageHost=...
>
> What I have never tried before is to change the DNS name of the slurmctld
> host:
>
>    ControlMachine=...
>
> The critical aspect here is that you need to stop all batch jobs, plus
> slurmdbd and slurmctld. Then you can back up (tar-ball) and transfer the
> Slurm state directories:
>
>    StateSaveLocation=/var/spool/slurmctld
>
> However, I don't know whether the name of the ControlMachine is hard-coded
> in the StateSaveLocation files.
>
> I strongly suggest that you try to make a test migration of the cluster to
> the new DC to find out if it works or not. Then you can always make
> multiple attempts without breaking anything.
>
> Best regards,
> Ole
>
> > On Mon, 24 Apr 2023 at 11:25, Ole Holm Nielsen
> > <ole.h.niel...@fysik.dtu.dk> wrote:
> >
> > > On 4/24/23 06:58, Purvesh Parmar wrote:
> > > > Thank you, but the hostnames are changing as well as the IP
> > > > addresses: the Slurm server name, the database server name, and
> > > > the slurmd compute node names.
> > >
> > > I suggest that you talk to your networking people and request that
> > > the old DNS names be created in the new network's DNS for your Slurm
> > > cluster.
> > > Then Ryan's solution will work. Changing DNS names is a very simple
> > > matter!
> > >
> > > My 2 cents,
> > > Ole
> > >
> > > > On Mon, 24 Apr 2023 at 10:04, Ryan Novosielski
> > > > <novos...@rutgers.edu> wrote:
> > > >
> > > > > I think it’s easier than all of this. Are you actually changing
> > > > > names of all of these things, or just IP addresses? If they all
> > > > > resolve to an IP now and you can bring everything down and change
> > > > > the hosts files or DNS, it seems to me that if the names aren’t
> > > > > changing, that’s that. I know that “scontrol show cluster” will
> > > > > show the wrong IP address but I think that updates itself.
> > > > >
> > > > > The names of the servers are in slurm.conf, but again, if the
> > > > > names don’t change, that won’t matter. If you have IPs there, you
> > > > > will need to change them.
> > > > >
> > > > > On Apr 23, 2023, at 14:01, Purvesh Parmar
> > > > > <purveshp0...@gmail.com> wrote:
> > > > >
> > > > > > Hello,
> > > > > >
> > > > > > We have Slurm 21.08 on Ubuntu 20. We have a cluster of 8 nodes.
> > > > > > All Slurm communication happens over the 192.168.5.x network
> > > > > > (LAN). However, as per requirement, we are now migrating the
> > > > > > cluster to other premises, where we have 172.16.1.x (LAN). I
> > > > > > have to migrate the entire setup, including SLURMDBD (mariadb),
> > > > > > SLURMCTLD and SLURMD. The cluster network is also changing from
> > > > > > 192.168.5.x to 172.16.1.x, and each node will be assigned an IP
> > > > > > address from the 172.16.1.x network.
> > > > > >
> > > > > > The cluster has been running for the last 3 months and it is
> > > > > > required to maintain the old usage stats as well.
> > > > > >
> > > > > > Is the procedure below correct?
> > > > > >
> > > > > > 1) Stop slurm
> > > > > > 2) Suspend all the queued jobs
> > > > > > 3) Back up the slurm database
> > > > > > 4) Change the slurm & munge configuration, i.e. munge conf,
> > > > > >    mariadb conf, slurmdbd.conf, slurmctld.conf, slurmd.conf (on
> > > > > >    compute nodes), gres.conf, service file
> > > > > > 5) Later, update the slurm database by executing the command
> > > > > >    below for all the nodes:
> > > > > >
> > > > > >       sacctmgr modify node where node=old_name set name=new_name
> > > > > >
> > > > > >    Also, I think the slurm server name and slurmdbd server name
> > > > > >    are also required to be updated. How to do it, still checking.
> > > > > > 6) Finally, start slurmdbd, slurmctld on the server and slurmd
> > > > > >    on the compute nodes
> > > > > >
> > > > > > Please help and guide for above.
> > > > > >
> > > > > > Regards,
> > > > > >
> > > > > > Purvesh Parmar
> > > > > > INHAIT
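
For the slurm.conf part of the rename that Ole describes (NodeName, the Nodes= list of a partition, AccountingStorageHost, ControlMachine), here is a minimal Python sketch of how the host names could be rewritten in one pass. It is only an illustration, not a tested migration tool. Assumptions: the function name rename_hosts and the dc1-*/dc2-* host names are hypothetical; host names are written out as plain comma-separated lists (bracketed ranges such as node[01-08] are left untouched); and SlurmctldHost is included alongside ControlMachine in case the config uses the newer key. It only edits text, so it should be run on a copy of slurm.conf and the result reviewed before any daemon is started.

```python
import re

# Keys whose values are host names. The first four come from the thread;
# SlurmctldHost is added as an assumption (newer name for ControlMachine).
HOST_KEYS = ("NodeName", "Nodes", "AccountingStorageHost",
             "ControlMachine", "SlurmctldHost")

# Matches KEY=value for the keys above, at a word boundary so that
# e.g. MaxNodes=... is not touched.
KEY_PATTERN = re.compile(r"\b(%s)=(\S+)" % "|".join(HOST_KEYS), re.IGNORECASE)


def rename_hosts(conf_text, mapping):
    """Return conf_text with old host names replaced by new ones.

    mapping is a dict {old_name: new_name}. Names that are not in the
    mapping (including bracketed range expressions) are left as they are.
    """
    def replace_value(match):
        key, value = match.group(1), match.group(2)
        hosts = [mapping.get(host, host) for host in value.split(",")]
        return "%s=%s" % (key, ",".join(hosts))

    out = []
    for line in conf_text.splitlines(keepends=True):
        if line.lstrip().startswith("#"):
            out.append(line)  # leave comment lines alone
        else:
            out.append(KEY_PATTERN.sub(replace_value, line))
    return "".join(out)


if __name__ == "__main__":
    # Hypothetical old/new names, for illustration only.
    example = (
        "ControlMachine=dc1-ctl\n"
        "AccountingStorageHost=dc1-db\n"
        "NodeName=dc1-n1,dc1-n2 CPUs=8\n"
        "PartitionName=batch Nodes=dc1-n1,dc1-n2 Default=YES\n"
    )
    names = {"dc1-ctl": "dc2-ctl", "dc1-db": "dc2-db",
             "dc1-n1": "dc2-n1", "dc1-n2": "dc2-n2"}
    print(rename_hosts(example, names), end="")
```

This does not answer the open question in the thread about whether the old controller name is stored in the StateSaveLocation files, so a test migration as suggested above is still the safer way to find out.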