This seems to make sense from here, speaking as a sysadmin. (As a mirror operator, I don't care.)

On 2018-03-26 14:46, Kevin A. McGrail wrote:
Until we have a good score file, why publish the file anyway? DNS should
stay with the last known good file.

I would generate the tick file somewhere temporary, then update it with
the tock data and publish/update DNS.

The mirrors shouldn't even see the temporary file.
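
The staging idea above could be sketched roughly as follows. This is only an
illustration, not how the actual update scripts work: the function names
(stage_bundle, add_tock_data, publish_bundle) and the directory layout are
hypothetical, the staging directory is assumed to be outside the mirrored
tree but on the same filesystem so the final rename is atomic, and the DNS
update step is not shown.

```python
import os


def stage_bundle(staging_dir, name, tick_data):
    """Write the tick data to a file in the staging area (hypothetical layout)."""
    staged_path = os.path.join(staging_dir, name)
    with open(staged_path, "wb") as handle:
        handle.write(tick_data)
    return staged_path


def add_tock_data(staged_path, tock_data):
    """Append the tock data to the staged file before anything is published."""
    with open(staged_path, "ab") as handle:
        handle.write(tock_data)
        handle.flush()
        os.fsync(handle.fileno())


def publish_bundle(staged_path, publish_dir):
    """Move the finished file into the published tree in one rename.

    os.replace is atomic only when both paths are on the same filesystem,
    which is an assumption of this sketch.
    """
    final_path = os.path.join(publish_dir, os.path.basename(staged_path))
    os.replace(staged_path, final_path)
    return final_path
```

With this shape, anything syncing from the published directory sees either
the previous file or the complete new one, never the interim tick-only file.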

Thoughts?

KAM


--
Kevin A. McGrail
Asst. Treasurer & VP Fundraising, Apache Software Foundation
Chair Emeritus Apache SpamAssassin Project
https://www.linkedin.com/in/kmcgrail - 703.798.0171

On Sun, Mar 25, 2018 at 12:07 PM, Dave Jones <da...@apache.org> wrote:

On 03/20/2018 06:18 PM, Bill Cole wrote:

On 20 Mar 2018, at 16:18, Dave Jones wrote:
[...]

I thought they were different numbers. They should be.  The SVN version
number is shared across all Apache projects using SVN so hours later there
should be a different SVN commit number between the tick and tock.


But mkupdate-with-scores doesn't use the SVN revision number. It uses the
highest revision number in the 9th single-space-delimited token of the
first line of any of the trunk/rulesrc/scores/scores-set* files. That is
more than slightly fragile, but it isn't broken yet.
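
To make the fragility concrete, here is a minimal Python sketch of the
selection rule as described above. It is not the mkupdate-with-scores code
itself; the function names and the sample header line in the usage note are
made up for illustration, and the real first-line format is not shown here.

```python
import glob
import os


def revision_from_first_line(line):
    """Return the 9th single-space-delimited token as an int, or None.

    Splitting on exactly one space means a doubled space shifts every
    later token by one position, which is the fragile part.
    """
    tokens = line.rstrip("\n").split(" ")
    if len(tokens) < 9:
        return None
    token = tokens[8]
    return int(token) if token.isdigit() else None


def highest_score_revision(scores_dir):
    """Return the highest revision found across scores-set* files, or None."""
    revisions = []
    for path in sorted(glob.glob(os.path.join(scores_dir, "scores-set*"))):
        with open(path, encoding="utf-8", errors="replace") as handle:
            revision = revision_from_first_line(handle.readline())
        if revision is not None:
            revisions.append(revision)
    return max(revisions) if revisions else None
```

For a made-up line such as "a b c d e f g h 1827654 j" the first function
returns 1827654; with one extra space after "a" the 9th token becomes "h"
and it returns None.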

The goal of this seems to be to get a score set that is evolved using the
current active rule set. From that perspective, it makes sense to tag the
later update bundle with the revision number of the active rule set. It seems
less sensible to be releasing the earlier (run-nightly) update bundle at
all, since it is essentially an interim state between logically coherent
sets of rules and scores.

Which raises the issue of whether anyone has ever tried a rigorous
comparison between the GA and Perceptron rescoring models. If the reason
for publishing an interim update and a revised one after an 18-hour gap
is how long GA is taking to run, we may want to examine how much accuracy
we are really buying with GA.


I looked into this more this morning and now I am not sure I want to
change anything without others' consensus.  A year or so ago, we didn't
have consistent masscheck updates like we have had for the past 8+ months, so only
the "tick" would happen regularly for basic rule updates with the same/last
72_scores.cf.  Now that we have a consistent "tock" this issue has been
brought to light.

I could remove the 72_scores.cf from the "tick" like it was before but
that still leaves a hole for fresh installs of SA that run an sa-update to
receive the first ruleset that may not contain a 72_scores.cf.  If
another sa-update is never run, then that SA instance could have some very
odd scoring with many rules defaulting to 1.0.

Thoughts?  Leave it as is until we can examine the GA rescorer? I
understand we have the same ruleset updating twice a day now but it seems
to have been OK the past 8+ months.

Dave



