I will +1 Paul's recommendation, tempered by the fact that we're still 
running 14.11.11 (at least for now).

In early testing of archive-with-purge and restore functionality, ahead of an 
upgrade to a newer Slurm version, I found that large archive files crash the 
restore process and cannot be restored using sacctmgr.

For example...

[root@dev db]# sacctmgr archive load file=/state/partition1/slurm_archive/db/job_archive_2015-04-01T00\:00\:00_2016-05-31T23\:59\:59
sacctmgr: error: slurmdbd: Getting response to message type 1460
sacctmgr: error: slurmdbd: DBD_ARCHIVE_LOAD failure: No error
 Problem loading archive file: Unspecified error

[root@dev db]# ls -lh /state/partition1/slurm_archive/db/job_archive_2015-04-01T00\:00\:00_2016-05-31T23\:59\:59
-rw------- 1 slurm slurm 223M Jun  1 06:02 /state/partition1/slurm_archive/db/job_archive_2015-04-01T00:00:00_2016-05-31T23:59:59

Other smaller archives can be restored...

[root@dev db]# sacctmgr archive load file=/state/partition1/slurm_archive/db/resv_archive_2015-04-01T00\:00\:00_2017-04-30T23\:59\:59
sacctmgr: SUCCESS
Would you like to commit changes? (You have 30 seconds to decide)
(N/y): y

[root@dev db]# ls -lh /state/partition1/slurm_archive/db/resv_archive_2015-04-01T00\:00\:00_2017-04-30T23\:59\:59
-rw------- 1 slurm slurm 331K Jun  1 08:44 /state/partition1/slurm_archive/db/resv_archive_2015-04-01T00:00:00_2017-04-30T23:59:59

The system in question has 32 GB of RAM.

I don't know how large the archive(s) have to be to cause the failure, or 
whether this problem exists in newer versions, but I would recommend testing 
before you purge if you think you will want to restore the archives (even to 
another, non-production DB).
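
As a starting point for that kind of testing, here is a minimal sketch that 
lists archive files above a size threshold, so the likely problem cases can be 
tried first on a non-production DB. The function name and the 100 MB default 
are assumptions made for illustration: all I actually observed is that a 223M 
file failed and a 331K file loaded on 14.11.11, so the real limit is unknown. 
It also assumes a find that accepts -maxdepth and the M size suffix (GNU 
findutils does).

```shell
#!/bin/sh
# Sketch only: flag Slurm archive files larger than a chosen size.
# ARCHIVE_DIR defaults to the directory used in the examples above.
# LIMIT_MB is a hypothetical threshold, not a known Slurm limit.
ARCHIVE_DIR="${ARCHIVE_DIR:-/state/partition1/slurm_archive/db}"
LIMIT_MB="${LIMIT_MB:-100}"

# Print the path of every regular file named like *_archive_* directly
# under directory $1 whose size exceeds $2 MiB.
flag_large_archives() {
    find "$1" -maxdepth 1 -type f -name '*_archive_*' -size +"$2"M -print
}

# Only scan if the directory exists on this host.
if [ -d "$ARCHIVE_DIR" ]; then
    flag_large_archives "$ARCHIVE_DIR" "$LIMIT_MB"
fi
```

Anything this prints is worth a trial "sacctmgr archive load" against a test 
DB before relying on it as your only copy of the purged records.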

Trevor


> On Jun 8, 2017, at 8:54 AM, Paul Edmon <[email protected]> wrote:
> 
> We use purge here without much of a problem.  Though I will caution that if 
> you have a large database that you have never purged before (when we 
> initially did it we had 3 years of data accrued and we purged down to 6 
> months), you should do it in stages.  So do maybe a few months at a time to 
> walk it up to the level you need.  You can also have the purge archive to 
> disk, which can be handy if you want to maintain historical info.  The 
> purge itself runs monthly at midnight on the 1st of the month.
> -Paul Edmon-
> 
> On 06/08/2017 11:36 AM, Rohan Gadalkar wrote:
>> Hello Dr. Loris,
>> 
>> I will go through the document and try to work on it. I'll reply as soon 
>> as I have worked on this.
>> 
>> 
>> Thanks for the example.
>> 
>> 
>> Regards,
>> Rohan
>> 
>> On Thu, Jun 8, 2017 at 3:17 PM, Loris Bennett <[email protected]> 
>> wrote:
>> 
>> Rohan Gadalkar <[email protected]> writes:
>> 
>> > Re: [slurm-dev] Re: understanding of Purge in Slurmdb.conf
>> >
>> > Hello Dr. Loris,
>> >
>> > I want to understand how the purge works.
>> >
>> > I mentioned the points like PurgeJobAfter etc; which I did not understand.
>> >
>> > How do I use these parameters in slurmdbd.conf?
>> >
>> > Is there any way that you can make me understand this ??
>> 
>> I'm not sure about that, but if, for example, you write
>> 
>> PurgeJobAfter=6
>> 
>> in your slurmdbd.conf, all database entries referring to jobs which are
>> older than 6 months will be deleted at the beginning of each month.
>> 
>> Disclaimer: I haven't used these settings - I am repeating what it
>> says in the documentation.
>> 
>> Cheers,
>> 
>> Loris
>> 
>> --
>> Dr. Loris Bennett (Mr.)
>> ZEDAT, Freie Universität Berlin         Email [email protected]
>> 
> 
