I'm chasing my tail here....

OF COURSE files are "disappearing" from the corpus directory: they get
updated with today's/this week's content. They don't get renamed or
deleted, they get overwritten with logs from today. I've been looking in
the wrong place.
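
For anyone wanting to check the same thing, here is a rough sketch (Python,
not part of the SpamAssassin scripts; the names snapshot and compare are my
own invention) that records size and mtime for each file in a directory and
then reports which names were removed, added, or overwritten in place
between two snapshots. It assumes that size or mtime changes whenever a log
is replaced.

```python
import os
import tempfile


def snapshot(directory):
    """Map each regular file name in `directory` to (size, mtime_ns)."""
    result = {}
    with os.scandir(directory) as entries:
        for entry in entries:
            if entry.is_file():
                st = entry.stat()
                result[entry.name] = (st.st_size, st.st_mtime_ns)
    return result


def compare(before, after):
    """Classify file names as removed, added, or replaced in place."""
    removed = sorted(set(before) - set(after))
    added = sorted(set(after) - set(before))
    # Same name in both snapshots but different size/mtime: overwritten.
    replaced = sorted(
        name for name in before.keys() & after.keys()
        if before[name] != after[name]
    )
    return {"removed": removed, "added": added, "replaced": replaced}


if __name__ == "__main__":
    # Self-contained demo on a temporary directory (hypothetical file names).
    with tempfile.TemporaryDirectory() as tmp:
        for name in ("ham.log", "gone.log"):
            with open(os.path.join(tmp, name), "w") as fh:
                fh.write("last week\n")
        before = snapshot(tmp)
        # Overwrite one file in place and delete the other.
        with open(os.path.join(tmp, "ham.log"), "w") as fh:
            fh.write("today's results, different length\n")
        os.remove(os.path.join(tmp, "gone.log"))
        print(compare(before, snapshot(tmp)))
```

Taking one snapshot of the corpus directory before an upload window and one
after should list the logs under "replaced" rather than "removed" if they
are being overwritten and not deleted.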

Looks like corpus-hourly shouldn't be working from the corpus directory
when re-calculating the class files for previous days, but I clearly need
to have a break and relax.


Paul

On Sat, 25 May 2019 at 18:05, Paul Stead <paul.st...@gmail.com> wrote:

> The 14:05 run has finished, here's the before and after in terms of output
> on ruleqa (attached)
>
> I saw files from 18 May disappear from
> /usr/local/spamassassin/automc/rsync/corpus but still can't find the
> trigger that is removing them.
>
> Will come back to this later if no one has any ideas.
>
> On Sat, 25 May 2019 at 17:54, Paul Stead <paul.st...@gmail.com> wrote:
>
>> TL;DR:
>> Any pointers on what might be clearing up the old or "invalid" files in
>> /usr/local/spamassassin/automc/rsync/corpus?
>>
>> ----
>>
>> I'm working on the assumption that something is cleaning up the
>>
>> /usr/local/spamassassin/automc/rsync/corpus
>>
>> directory underneath the corpus-hourly script, though I've so far been
>> unable to determine what. There seem to be a lot of superfluous scripts
>> hanging around in the svn directories.
>>
>> As far as I can tell it isn't the corpus-hourly cron, nor the
>> /usr/local/bin/checkMasscheckContribs.sh script.
>>
>> During my investigations I've noticed that the hourly run seems to take
>> more than an hour, so two processes can be running at the same time:
>>
>> automc    7749 13.9  0.1  40632 19040 ?        RN   15:05   3:27 /usr/bin/perl -w /usr/local/spamassassin/automc/svn/masses/rule-qa/corpus-hourly --dir=/usr/local/spamassassin/automc/rsync/corpus
>> automc    8708 99.7  0.8 164560 145008 ?       RN   15:09  20:10 /usr/bin/perl -w ./hit-frequencies -TxpaP -o /usr/local/spamassassin/automc/tmp/spam.log.25383 /usr/local/spamassassin/automc/tmp/ham.log.25383
>> automc   25383  9.3  0.1  38880 17480 ?        SN   14:05   7:56 /usr/bin/perl -w /usr/local/spamassassin/automc/svn/masses/rule-qa/corpus-hourly --dir=/usr/local/spamassassin/automc/rsync/corpus
>>
>> I'm not 100% sure that this is causing a problem. I see some protection
>> against this for the running files, but I'm not sure about the resulting
>> class files that are output.
>>
>> Paul
>>
>> On Sat, 25 May 2019 at 13:00, Paul Stead <paul.st...@gmail.com> wrote:
>>
>>> I'm investigating the problem with the disappearing corpus - see the bug
>>> report here -
>>>
>>> https://bz.apache.org/SpamAssassin/show_bug.cgi?id=7715
>>>
>>> Whilst that is an issue, I've realised this might not be everything
>>> involved.
>>>
>>> I'm on the system but I can't find the process that is "cleaning" up the
>>> directory at
>>>
>>> /usr/local/spamassassin/automc/rsync/corpus
>>>
>>> At first I thought it was the hourly script but I don't think this is
>>> true.
>>>
>>> I've checked through the cron.d run scripts and just can't seem to find
>>> it. I've a feeling something is deleting logs from the corpus directory
>>> prematurely, which then stops them being captured during the hourly run
>>> when they should be - it's a matter of less than an hour.
>>>
>>> It's possible this script has code to figure out if it's running at UTC
>>> or needs an offset similar to the one in the bug.
>>>
>>> It seems that the script is aware if it is running a nightly or weekly
>>> and doesn't run the nightly on a Saturday.
>>>
>>> Hope you might have an idea of which script I'm referring to?
>>>
>>> I've "fixed" my problem by moving my corpus check to make sure it
>>> completes after 10:00 UTC - this will likely fix everyone's, but I'd like
>>> to make sure that when we say mass check after 09:00 UTC we mean it.
>>>
>>> Paul
>>>
>>
