I'm chasing my tail here... OF COURSE files are "disappearing" from the corpus directory: they get updated with today's/this week's content. They don't get renamed or deleted; they get overwritten with logs from today. I've been looking in the wrong place.
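One way to confirm this kind of behaviour (a file overwritten in place rather than deleted) is to snapshot the corpus directory before and after a run and compare the two. The following is a minimal Python sketch. It is not part of the SpamAssassin automc tooling, and the function names are hypothetical. It records each regular file's size and modification time, so a file that is rewritten with new content shows up as "modified" instead of vanishing from the listing.

```python
import os


def snapshot(directory):
    """Map each regular file in `directory` to its (size, mtime_ns)."""
    snap = {}
    with os.scandir(directory) as entries:
        for entry in entries:
            if entry.is_file(follow_symlinks=False):
                st = entry.stat(follow_symlinks=False)
                snap[entry.name] = (st.st_size, st.st_mtime_ns)
    return snap


def diff_snapshots(before, after):
    """Classify files as deleted, added, or modified between two snapshots."""
    deleted = sorted(before.keys() - after.keys())
    added = sorted(after.keys() - before.keys())
    # Present in both listings but with a different size or mtime:
    # overwritten in place rather than removed.
    modified = sorted(
        name for name in before.keys() & after.keys()
        if before[name] != after[name]
    )
    return {"deleted": deleted, "added": added, "modified": modified}
```

For example, call snapshot() on /usr/local/spamassassin/automc/rsync/corpus before a corpus-hourly run and again afterwards, then pass both results to diff_snapshots(). Comparing (size, mtime) is cheap but would miss a rewrite that happens to preserve both; hashing the file contents would catch that at the cost of reading every file.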
Looks like corpus-hourly shouldn't be working from the corpus directory when re-calculating the class files for previous days, but I clearly need to have a break and relax.

Paul

On Sat, 25 May 2019 at 18:05, Paul Stead <paul.st...@gmail.com> wrote:
> The 14:05 run has finished; here's the before and after in terms of output
> on ruleqa (attached).
>
> I saw files disappear from /usr/local/spamassassin/automc/rsync/corpus
> from 18 May, but I still can't find the trigger that is removing these files.
>
> Will come back to this later if no one has any ideas.
>
> On Sat, 25 May 2019 at 17:54, Paul Stead <paul.st...@gmail.com> wrote:
>
>> TL;DR:
>> Any pointers on what might be clearing up the old or "invalid" files in
>> /usr/local/spamassassin/automc/rsync/corpus?
>>
>> ----
>>
>> I'm working on the assumption that something is cleaning up the
>>
>> /usr/local/spamassassin/automc/rsync/corpus
>>
>> directory underneath the corpus-hourly script, though so far I've been
>> unable to identify what. There seem to be a lot of superfluous scripts
>> hanging around in the svn directories.
>>
>> As far as I can tell it isn't the corpus-hourly cron job, nor the
>> /usr/local/bin/checkMasscheckContribs.sh script.
>>
>> During my investigations I've noticed that the hourly run does seem to take
>> more than an hour, so two instances can run at the same time:
>>
>> automc    7749 13.9  0.1  40632  19040 ?  RN  15:05   3:27
>>   /usr/bin/perl -w
>>   /usr/local/spamassassin/automc/svn/masses/rule-qa/corpus-hourly
>>   --dir=/usr/local/spamassassin/automc/rsync/corpus
>> automc    8708 99.7  0.8 164560 145008 ?  RN  15:09  20:10
>>   /usr/bin/perl -w ./hit-frequencies -TxpaP -o
>>   /usr/local/spamassassin/automc/tmp/spam.log.25383
>>   /usr/local/spamassassin/automc/tmp/ham.log.25383
>> automc   25383  9.3  0.1  38880  17480 ?  SN  14:05   7:56
>>   /usr/bin/perl -w
>>   /usr/local/spamassassin/automc/svn/masses/rule-qa/corpus-hourly
>>   --dir=/usr/local/spamassassin/automc/rsync/corpus
>>
>> I'm not 100% sure that this is causing a problem. I see some protection
>> against this for the running files, but I'm not sure about the resulting
>> class files that are output.
>>
>> Paul
>>
>> On Sat, 25 May 2019 at 13:00, Paul Stead <paul.st...@gmail.com> wrote:
>>
>>> I'm investigating the problem with the disappearing corpus; see the bug
>>> report here:
>>>
>>> https://bz.apache.org/SpamAssassin/show_bug.cgi?id=7715
>>>
>>> Whilst that is an issue, I've realised it might not be the whole story.
>>>
>>> I'm on the system but I can't find the process that is "cleaning" up the
>>> directory at
>>>
>>> /usr/local/spamassassin/automc/rsync/corpus
>>>
>>> At first I thought it was the hourly script, but I don't think this is
>>> true.
>>>
>>> I've checked through the cron.d run scripts and just can't seem to find it.
>>> I've a feeling something is deleting logs from the corpus directory
>>> prematurely, which then stops them being captured by the hourly run when
>>> they should be; the window is less than an hour.
>>>
>>> It's possible this script has code to figure out whether it's running in
>>> UTC or needs an offset, similar to the one in the bug.
>>>
>>> It seems that the script knows whether it is running a nightly or a weekly
>>> and doesn't run the nightly on a Saturday.
>>>
>>> Hope you might have an idea of which script I'm referring to?
>>>
>>> I've "fixed" my problem by moving my corpus check so that it completes
>>> after 10:00 UTC. This will likely fix everyone's, but I'd like to make sure
>>> that when we say "mass check after 09:00 UTC" we mean it.
>>>
>>> Paul
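The overlap noted in the thread (the 14:05 corpus-hourly run still going when the 15:05 one started) is commonly handled with an exclusive, non-blocking lock: a run that cannot take the lock skips that hour instead of running alongside the previous one. Below is a generic Python sketch of that pattern using fcntl.flock (Unix only). It is not code from corpus-hourly, which is a Perl script with its own protection for the running files; the lock path and the function names are hypothetical.

```python
import fcntl
import os


def acquire_run_lock(lock_path):
    """Try to take an exclusive, non-blocking flock on lock_path.

    Returns the open file object (the lock is held while it stays open),
    or None if another run already holds the lock.
    """
    # Append mode: opening must not truncate the current holder's PID.
    fh = open(lock_path, "a+")
    try:
        fcntl.flock(fh.fileno(), fcntl.LOCK_EX | fcntl.LOCK_NB)
    except BlockingIOError:
        fh.close()
        return None
    # We own the lock now, so it is safe to replace the file's contents.
    fh.seek(0)
    fh.truncate()
    fh.write(f"{os.getpid()}\n")
    fh.flush()
    return fh


def run_exclusively(lock_path, work):
    """Run work() only if no other run holds the lock.

    Returns True if work() ran, False if this run was skipped.
    """
    lock = acquire_run_lock(lock_path)
    if lock is None:
        return False  # previous run still in progress: skip this hour
    try:
        work()
        return True
    finally:
        lock.close()  # closing the descriptor releases the flock
```

Opening the lock file in append mode rather than "w" avoids wiping the PID written by the current holder before the lock is actually acquired. Because the lock belongs to the open file, it is also released if the process dies, so a crashed run cannot block later ones. Perl's built-in flock with LOCK_EX | LOCK_NB from the Fcntl module offers the same mechanism.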