I will +1 Paul's recommendation and temper it with the fact that we're still running Slurm 14.11.11 (at least for now).
In early testing of archive-with-purge and restore functionality, ahead of an upgrade to a newer Slurm version, I found that large archive files crash the restore process and cannot be restored using sacctmgr. For example:

[root@dev db]# sacctmgr archive load file=/state/partition1/slurm_archive/db/job_archive_2015-04-01T00\:00\:00_2016-05-31T23\:59\:59
sacctmgr: error: slurmdbd: Getting response to message type 1460
sacctmgr: error: slurmdbd: DBD_ARCHIVE_LOAD failure: No error
 Problem loading archive file: Unspecified error
[root@dev db]# ls -lh /state/partition1/slurm_archive/db/job_archive_2015-04-01T00\:00\:00_2016-05-31T23\:59\:59
-rw------- 1 slurm slurm 223M Jun 1 06:02 /state/partition1/slurm_archive/db/job_archive_2015-04-01T00:00:00_2016-05-31T23:59:59

Other, smaller archives can be restored:

[root@dev db]# sacctmgr archive load file=/state/partition1/slurm_archive/db/resv_archive_2015-04-01T00\:00\:00_2017-04-30T23\:59\:59
sacctmgr: SUCCESS
Would you like to commit changes? (You have 30 seconds to decide)
(N/y): y
[root@dev db]# ls -lh /state/partition1/slurm_archive/db/resv_archive_2015-04-01T00\:00\:00_2017-04-30T23\:59\:59
-rw------- 1 slurm slurm 331K Jun 1 08:44 /state/partition1/slurm_archive/db/resv_archive_2015-04-01T00:00:00_2017-04-30T23:59:59

The system in question has 32GB of RAM. I don't know how large an archive has to be to trigger the failure, or whether the problem exists in newer versions, but I would recommend testing a restore before you purge if you think you will want to restore the archives (even into another, non-production DB).

Trevor

> On Jun 8, 2017, at 8:54 AM, Paul Edmon wrote:
>
> We use purge here without much of a problem. Though I will caution that if
> you have a large database that you have never purged before (when we
> initially did it we had 3 years of data accrued and we purged down to 6
> months), you should do it in stages. So do maybe a few months at a time to
> walk it up to the level you need.
> You can also have the purge archive to
> disk, which can be handy if you want to maintain historical info. The
> purge itself runs monthly at midnight on the 1st of the month.
> -Paul Edmon-
>
> On 06/08/2017 11:36 AM, Rohan Gadalkar wrote:
>> Hello Dr. Loris,
>>
>> I will go through the document and try to work on it. I'll reply to you
>> as soon as I have worked on this.
>>
>>
>> Thanks for the example.
>>
>>
>> Regards,
>> Rohan
>>
>> On Thu, Jun 8, 2017 at 3:17 PM, Loris Bennett wrote:
>>
>> Rohan Gadalkar writes:
>>
>> > Re: [slurm-dev] Re: understanding of Purge in Slurmdb.conf
>> >
>> > Hello Dr. Loris,
>> >
>> > I want to understand how the purge works.
>> >
>> > I mentioned parameters such as PurgeJobAfter, which I did not understand.
>> >
>> > How do I use these parameters in slurmdbd.conf?
>> >
>> > Is there any way that you can help me understand this?
>>
>> I'm not sure about that, but if, for example, you write
>>
>>     PurgeJobAfter=6
>>
>> in your slurmdbd.conf, all database entries referring to jobs which are
>> older than 6 months will be deleted at the beginning of each month.
>>
>> Disclaimer: I haven't used these settings - I am repeating what it
>> says in the documentation.
>>
>> Cheers,
>>
>> Loris
>>
>> --
>> Dr. Loris Bennett (Mr.)
>> ZEDAT, Freie Universität Berlin
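For reference, a minimal slurmdbd.conf sketch that combines the suggestions in this thread: archive purged records to disk (Paul), give PurgeJobAfter as a number of months (Loris), and begin a staged purge rather than cutting straight down to 6 months. The archive path is borrowed from Trevor's example; the starting value of 30 months, the step size, and the particular record types listed are assumptions for illustration, not settings anyone here reported using. Please check the option names against the slurmdbd.conf man page for your installed version.

```ini
# Hypothetical slurmdbd.conf excerpt; all values are illustrative assumptions.

# Write records to files in ArchiveDir when they are purged,
# so they can later be reloaded with "sacctmgr archive load".
ArchiveDir=/state/partition1/slurm_archive/db
ArchiveJobs=yes
ArchiveSteps=yes
ArchiveResvs=yes

# Purge ages are given in months. First stage of a staged purge:
# keep 30 months, then lower these values by a few months per
# monthly purge (restarting slurmdbd after each change) until 6.
PurgeJobAfter=30
PurgeStepAfter=30
PurgeResvAfter=30
```

Given Trevor's report that a 223M job archive would not load under 14.11.11, it may be worth trying sacctmgr archive load against a non-production database with the first archive files produced before lowering the purge values further. Purging in smaller stages presumably also keeps each individual archive file smaller.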
