Hah, do not be too hard on yourself. There are like 4 people on the planet
who have really dug into these scripts, so I appreciate you working on it.

On Sat, May 25, 2019, 16:09 Paul Stead <paul.st...@gmail.com> wrote:

> I'm chasing my tail here....
>
> OF COURSE files are "disappearing" from the corpus directory: they get
> updated with today's/this week's content. They don't get renamed or deleted,
> they get replaced with logs from today. I've been looking in the wrong place.
>
> Looks like corpus-hourly shouldn't be working from the corpus directory
> when re-calculating the class files for previous days, but I clearly need to
> take a break and relax.
>
>
> Paul
>
> On Sat, 25 May 2019 at 18:05, Paul Stead <paul.st...@gmail.com> wrote:
>
> > The 14:05 run has finished; here's the before and after in terms of output
> > on ruleqa (attached).
> >
> > I saw files disappear in /usr/local/spamassassin/automc/rsync/corpus
> > from 18 May but still can't find the trigger that is removing these files.
> >
> > Will come back to this later if no one has any ideas
> >
> > On Sat, 25 May 2019 at 17:54, Paul Stead <paul.st...@gmail.com> wrote:
> >
> >> TL;DR:
> >> Any pointers on what might be clearing up the old or "invalid" files in
> >> /usr/local/spamassassin/automc/rsync/corpus?
> >>
> >> ----
> >>
> >> I'm working on the assumption that something is cleaning up the
> >>
> >> /usr/local/spamassassin/automc/rsync/corpus
> >>
> >> directory underneath the corpus-hourly script, though so far I've been
> >> unable to identify what. There seem to be a lot of superfluous scripts
> >> hanging around in the svn directories.
> >>
> >> As far as I can tell it isn't the corpus-hourly cron, nor the
> >> /usr/local/bin/checkMasscheckContribs.sh script.
> >>
> >> During my investigations I've noticed that the hourly run does seem to
> >> take more than an hour, so two instances can be running at the same time:
> >>
> >> automc    7749 13.9  0.1  40632 19040 ?        RN   15:05   3:27
> >> /usr/bin/perl -w
> >> /usr/local/spamassassin/automc/svn/masses/rule-qa/corpus-hourly
> >> --dir=/usr/local/spamassassin/automc/rsync/corpus
> >> automc    8708 99.7  0.8 164560 145008 ?       RN   15:09  20:10
> >> /usr/bin/perl -w ./hit-frequencies -TxpaP -o
> >> /usr/local/spamassassin/automc/tmp/spam.log.25383
> >> /usr/local/spamassassin/automc/tmp/ham.log.25383
> >> automc   25383  9.3  0.1  38880 17480 ?        SN   14:05   7:56
> >> /usr/bin/perl -w
> >> /usr/local/spamassassin/automc/svn/masses/rule-qa/corpus-hourly
> >> --dir=/usr/local/spamassassin/automc/rsync/corpus
> >>
> >> I'm not 100% sure that this is causing a problem. I see some protection
> >> against this for the running files, but I'm not sure about the resulting
> >> class files that are output.
> >>
> >> Paul
> >>
> >> On Sat, 25 May 2019 at 13:00, Paul Stead <paul.st...@gmail.com> wrote:
> >>
> >>> I'm investigating the problem with the disappearing corpus - see the bug
> >>> report here -
> >>>
> >>> https://bz.apache.org/SpamAssassin/show_bug.cgi?id=7715
> >>>
> >>> Whilst that is an issue, I've realised this might not be everything
> >>> involved.
> >>>
> >>> I'm on the system but I can't find the process that is "cleaning" up the
> >>> directory at
> >>>
> >>> /usr/local/spamassassin/automc/rsync/corpus
> >>>
> >>> At first I thought it was the hourly script but I don't think this is
> >>> true.
> >>>
> >>> I've checked through the cron.d run scripts and just can't seem to find it.
> >>> I've a feeling something is deleting logs from the corpus directory
> >>> prematurely, which then stops them being captured during the hourly run
> >>> when they should be. It's a case of < 1 hour.
> >>>
> >>> It's possible this script has code to figure out if it's running at UTC
> >>> or needs an offset similar to the one in the bug.
> >>>
> >>> It seems that the script is aware if it is running a nightly or weekly
> >>> and doesn't run the nightly on a Saturday.
> >>>
> >>> Hope you might have an idea of which script I'm referring to?
> >>>
> >>> I've "fixed" my problem by moving my corpus check to make sure it
> >>> completes after 10:00 UTC. This will likely fix everyone's, but I'd like to
> >>> make sure that when we say mass check after 09:00 UTC we mean it.
> >>>
> >>> Paul
> >>>
> >>
>
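
On the overlapping corpus-hourly runs visible in the ps output quoted above: one common way to keep a slow cron job from stacking up on itself is an exclusive, non-blocking lock taken at startup. Below is a minimal sketch in Python. It is not what corpus-hourly actually does (that is a Perl script and, per the thread, already has some protection of its own); the lock path and the function names are hypothetical, and it assumes a Unix host where flock is available.

```python
import fcntl
import os
import tempfile


def try_lock(lock_path):
    """Return an open file holding an exclusive lock, or None if it is already held."""
    handle = open(lock_path, "a")
    try:
        # LOCK_NB makes the call fail immediately instead of waiting for the holder.
        fcntl.flock(handle.fileno(), fcntl.LOCK_EX | fcntl.LOCK_NB)
    except BlockingIOError:
        handle.close()
        return None
    return handle


def release_lock(handle):
    """Release the lock and close the underlying file."""
    fcntl.flock(handle.fileno(), fcntl.LOCK_UN)
    handle.close()


if __name__ == "__main__":
    # Hypothetical lock location, chosen only for illustration.
    lock_path = os.path.join(tempfile.gettempdir(), "corpus-hourly.lock")
    lock = try_lock(lock_path)
    if lock is None:
        print("previous run still active, skipping this one")
    else:
        try:
            print("lock acquired, the hourly work would run here")
        finally:
            release_lock(lock)
```

With a guard like this the 15:05 invocation would exit straight away while the 14:05 run is still going, rather than both writing output for the same period. The kernel drops the lock when the process exits, so a crashed run does not leave a stale lock behind.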
