On 2018-03-20 13:27, Dave Jones wrote:
On 03/20/2018 01:19 PM, Kevin A. McGrail wrote:
That is an interesting question because you are right, they are supposed
to be immutable. Dave, is something happening in an 18-hour window as he
describes?

From what I learned trying to reconstruct everything about 10 months
ago, there are 2 updates based on the SVN commit numbers that should
generate 2 sets of files that are immutable. They come from the rule
promotions and the nightly masscheck jobs, which work together like a
tick-tock.

The tick is the rule promotion, which has a dependency on 3 days of good
masschecks. Its 72_scores.cf will be behind, since it is based on what is
in the SVN trunk when the cron job runs. It used to not have a
72_scores.cf in it, but I added it into the script logic once I
discovered it was missing. The reason behind this is that rules are no
longer distributed with SA, so a first sa-update needs to be run. If the
latest ruleset didn't have a 72_scores.cf on a fresh installation, then
scoring could be really off until the tock cycle of sa-update.

The tock is the nightly masscheck jobs, which have several dependencies,
like enough contributors and a sufficient number of ham/spam in the
corpora. This generates a new 72_scores.cf using the garescorer C program.

This is why it takes at least 2 days (tick, tock) for any rule updates
to roll out. If we had more development activity with rules that needed
to go out faster, I would like to get to maybe an 8-hour cycle (two
4-hour runs). We just don't have the need at this point in time.

From a rule generation point of view, this makes some sense, thanks.
However, the result currently is that both tick and tock publish
rulesets with the same number but different 72_scores.cf files, so in
reality there is no immutable "ruleset 1827165", and which set you get
depends on the time of day sa-update happens to run.
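
One way to check this locally is sketched below in Python. It is only a rough illustration, not how sa-update itself works: digest_files and changed_files are hypothetical helper names, and it assumes the contents of two downloads of the same numbered ruleset have already been read into filename -> bytes mappings.

```python
import hashlib


def digest_files(files):
    """Map each filename to the SHA-256 hex digest of its contents."""
    return {name: hashlib.sha256(data).hexdigest() for name, data in files.items()}


def changed_files(first, second):
    """Return the sorted names of files that differ between two downloads.

    A file counts as changed if its digest differs or if it is present
    in only one of the two downloads.
    """
    a, b = digest_files(first), digest_files(second)
    return sorted(name for name in a.keys() | b.keys() if a.get(name) != b.get(name))
```

If a numbered ruleset really were immutable, changed_files would return an empty list for any two downloads carrying the same number.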

A lot of the scores have changed a fair amount when comparing the two
versions of 1827165: several differ by 0.5-1 point, and a few by more
than that.
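
A comparison of that kind could be done roughly along these lines. This is a minimal sketch, assuming the two 72_scores.cf versions are available as text and that the relevant lines have the form "score RULE_NAME v1 [v2 v3 v4]"; parse_scores, score_deltas and the 0.5 default threshold are illustrative choices, not part of SpamAssassin.

```python
def parse_scores(text):
    """Collect {rule_name: [scores]} from 'score NAME v1 [v2 v3 v4]' lines."""
    scores = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and indentation
        parts = line.split()
        if len(parts) < 3 or parts[0] != "score":
            continue
        try:
            values = [float(p) for p in parts[2:]]
        except ValueError:
            continue  # skip anything that is not a plain numeric score line
        if len(values) == 1:
            values = values * 4  # a single score applies to all four score sets
        scores[parts[1]] = values
    return scores


def score_deltas(old_text, new_text, threshold=0.5):
    """Return {rule: largest absolute change} for rules that moved >= threshold."""
    old, new = parse_scores(old_text), parse_scores(new_text)
    deltas = {}
    for rule in old.keys() & new.keys():
        if len(old[rule]) != len(new[rule]):
            continue  # unexpected shape; leave it out rather than guess
        delta = max(abs(a - b) for a, b in zip(old[rule], new[rule]))
        if delta >= threshold:
            deltas[rule] = delta
    return deltas
```

Rules that appear in only one of the two files are ignored here; listing them separately would be a small extension.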

Are these intended to publish on alternating days? Or, if they publish
twice a day (which is the current situation), why isn't the number being
incremented in some fashion?