Hi,

some weeks ago I happened to look at the web server access log of sa-update.fossies.org, a mirror for the SpamAssassin rules update files, and was surprised to find that at noon (midday) the log file was already much larger than the roughly expected half of a complete daily log.

Out of curiosity I plotted the number of GET requests for update files (tarballs) per hour and saw an interesting pattern with a large peak between 6 and 7 a.m. (GMT+2). The main reason is probably the publication time (mostly between 5 and 6 a.m. GMT+2) plus the delay until users' sa-update scripts run. But the shape of the curves, with some curious (?) minima, is a little surprising to me, although it is constant and reproducible.

A simple example text plot for a single day is attached (more accurate plots are available at the URL given below).
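For anyone who wants to reproduce the hourly counts, here is a minimal Python sketch, not necessarily how the plots were produced. It assumes the access log uses the Apache common/combined format and that update tarballs are requested as paths ending in ".tar.gz"; the regular expression and the function names are hypothetical illustrations.

```python
import re
from collections import Counter

# Assumption: Apache common/combined log format, e.g.
# 1.2.3.4 - - [13/Sep/2018:06:12:34 +0200] "GET /1843541.tar.gz HTTP/1.1" 200 12345
LOG_RE = re.compile(
    r'\[\d{2}/\w{3}/\d{4}:(?P<hour>\d{2}):\d{2}:\d{2} [+-]\d{4}\] '
    r'"GET (?P<path>\S+) HTTP/[\d.]+"'
)


def tarball_gets_per_hour(lines):
    """Count GET requests for *.tar.gz update files per hour (log time zone)."""
    counts = Counter()
    for line in lines:
        m = LOG_RE.search(line)
        if m and m.group("path").endswith(".tar.gz"):
            counts[int(m.group("hour"))] += 1
    return counts


def text_plot(counts, width=60):
    """Print one bar of '#' characters per hour, scaled to the busiest hour."""
    peak = max(counts.values(), default=0) or 1
    for hour in range(24):
        n = counts[hour]
        print(f"{hour:02d} {n:6d} {'#' * (n * width // peak)}")
```

Usage, e.g.: text_plot(tarball_gets_per_hour(open("access_log"))). The hours are those of the log's own time zone (GMT+2 here), so the peak shows up where the log puts it.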

But more interesting, and more "irritating", was that during the main update period I often found a large number of entries (at least 100-1000) with HTTP status 404 ("Not Found"). That motivated me to write a simple script to find the cause by monitoring the update status and update times of newly published rules update files.

First I checked the local web log files, assuming that a 404 response for an update file means that an external client already knew about a new file that the local mirror sa-update.fossies.org did not yet have, i.e. had not yet fetched via rsync.
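In practice this check amounts to finding, for each update version, the first request answered with 404 and the first answered with 200. A minimal Python sketch, assuming the Apache log format and that tarballs are named <version>.tar.gz on the mirror (both assumptions; the function name is hypothetical):

```python
import re
from datetime import datetime

# Assumption: Apache log format and tarballs named <version>.tar.gz, e.g.
# 1.2.3.4 - - [13/Sep/2018:06:12:34 +0200] "GET /1843541.tar.gz HTTP/1.1" 404 0
LOG_RE = re.compile(
    r'\[(?P<ts>[^\]]+)\] "GET /(?:\S*/)?(?P<ver>\d+)\.tar\.gz HTTP/[\d.]+" (?P<status>\d{3})'
)


def first_requests(lines):
    """Return {version: {status: earliest timestamp}} for 404 and 200 tarball requests."""
    result = {}
    for line in lines:
        m = LOG_RE.search(line)
        if not m or m.group("status") not in ("404", "200"):
            continue
        ts = datetime.strptime(m.group("ts"), "%d/%b/%Y:%H:%M:%S %z")
        status = int(m.group("status"))
        per_version = result.setdefault(m.group("ver"), {})
        # Keep only the earliest timestamp per (version, status).
        if status not in per_version or ts < per_version[status]:
            per_version[status] = ts
    return result
```

The gap between the first 404 and the first 200 for a version is then roughly the time window in which clients already knew about a file the mirror did not have yet.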

Additionally I checked the local DNS server (of the server provider) and the DNS servers I found to be responsible for the domain spamassassin.org:

ns2.pccc.com.
ns2.ena.com.
c.auth-ns.sonic.net.
b.auth-ns.sonic.net.
a.auth-ns.sonic.net.

via the command

dig @<server> 3.3.3.updates.spamassassin.org txt +short
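The same query can be scripted for all five servers. A minimal Python sketch that simply shells out to dig; it assumes dig is installed and that the TXT record holds only the numeric update version in quotes, as +short prints it. The helper names are hypothetical.

```python
import subprocess

# Servers and query name as listed above.
NAME_SERVERS = [
    "ns2.pccc.com", "ns2.ena.com",
    "c.auth-ns.sonic.net", "b.auth-ns.sonic.net", "a.auth-ns.sonic.net",
]
QNAME = "3.3.3.updates.spamassassin.org"


def parse_txt_short(output):
    """Return the first numeric TXT value from `dig +short` output, else None."""
    for line in output.splitlines():
        value = line.strip().strip('"')
        if value.isdigit():
            return int(value)
    return None


def query_version(server, qname=QNAME, timeout=10):
    """Ask one name server directly for the published update version."""
    proc = subprocess.run(
        ["dig", "@" + server, qname, "txt", "+short"],
        capture_output=True, text=True, timeout=timeout,
    )
    return parse_txt_short(proc.stdout)
```

Usage, e.g.: for ns in NAME_SERVERS: print(ns, query_version(ns)). Running this periodically and recording when each server's value changes gives per-server "first seen" times.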

You can find the plots and an excerpt of the script output at

https://fossies.org/~schleusener/sa-update.mirror_analysis/
 User: sa
 PW: update

The main reason for the 404 errors seems to be that the mirroring script on sa-update.fossies.org is run as a cron job only every 10 minutes.

A better approach would probably be to query the original name servers (because of the TTL, the local name server answers with information that may be up to one hour old) and to start an rsync job only if the response shows that a new file is available.
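Such a polling step could look roughly like the following Python sketch. The state file path, the rsync source URL and the choice of ns2.ena.com as the queried server are hypothetical placeholders; the point is only that the cheap DNS query can run often (e.g. every minute) while rsync runs only when the published version changes.

```python
import subprocess
from pathlib import Path

# Hypothetical placeholders: adapt to the real mirror setup.
NAME_SERVER = "ns2.ena.com"
QNAME = "3.3.3.updates.spamassassin.org"
STATE_FILE = Path("/var/tmp/sa-mirror.last-version")
RSYNC_CMD = ["rsync", "-a", "rsync://rsync.example.invalid/sa-updates/", "/srv/sa-update/"]


def published_version():
    """Query the name server directly, bypassing the caching local resolver."""
    out = subprocess.run(
        ["dig", "@" + NAME_SERVER, QNAME, "txt", "+short"],
        capture_output=True, text=True, timeout=10,
    ).stdout.strip().strip('"')
    return int(out) if out.isdigit() else None


def needs_sync(current, last_seen):
    """Sync only if a version is published and differs from the last mirrored one."""
    return current is not None and current != last_seen


def check_and_sync():
    """Run rsync only when DNS announces a version that has not been mirrored yet."""
    last = int(STATE_FILE.read_text()) if STATE_FILE.exists() else None
    current = published_version()
    if not needs_sync(current, last):
        return False
    subprocess.run(RSYNC_CMD, check=True)
    STATE_FILE.write_text(str(current))
    return True
```

The version is recorded only after rsync succeeds (check=True raises otherwise), so a failed transfer is simply retried on the next run.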

If all mirror servers synchronized at least every 10 minutes, another idea would be to set or change the DNS TXT entry only 10 minutes after the release (availability) of a new update file.

Additionally I found that the synchronization of the above DNS servers seems to be delayed by a few minutes. The "best" DNS server seems to be "ns2.ena.com", since it is always the first one to provide the new versions.

Maybe this behaviour is somewhat related to the current thread "repeated sa-update problems" on the users list.

Looking at the published data again, I found it difficult to read, so I condensed it and also attached it to this mail as a text file.

Another "problem" I found is that some clients downloaded the identical update tarball several times a day (the top IP roughly 300 times). That is pointless (an HTTP HEAD request or a DNS query would be sufficient), but it is probably bearable.
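For illustration, a check that transfers only headers instead of the whole tarball. This minimal Python sketch (standard library only) compares the Last-Modified header from an HTTP HEAD request with a previously stored value; the helper names are hypothetical, and the DNS version query mentioned above remains the lighter alternative.

```python
import urllib.request


def head_request(url):
    """Build an HTTP HEAD request: the server returns headers only, no body."""
    return urllib.request.Request(url, method="HEAD")


def is_unchanged(url, known_last_modified, timeout=10):
    """True if the server reports the same Last-Modified value as last time."""
    with urllib.request.urlopen(head_request(url), timeout=timeout) as resp:
        return resp.headers.get("Last-Modified") == known_last_modified
```

A client would download the tarball only when is_unchanged(...) returns False.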

Regards

Jens

--
FOSSIES - The Fresh Open Source Software archive
mainly for Internet, Engineering and Science
https://fossies.org/
Discrepant dates of sa-update file changes (extracted from local web server log
files and different DNS servers)

Time in seconds until a new rules update file (i.e. a new update version)
becomes visible on the different DNS servers, or is first requested on the
mirror server "sa-update.fossies.org" with HTTP status code 404 (Not Found) or
200 (OK). The monitoring period was 13-20 September 2018.

Type Source              13.09. 14.09. 15.09. 16.09. 17.09. 18.09. 19.09. 20.09.
==== =================== ====== ====== ====== ====== ====== ====== ====== ======
 DNS ns2.ena.com              0      0      0      0      0      0      0      0
 DNS a.auth-ns.sonic.net    243    393    162    324    497    150    267    289
 DNS b.auth-ns.sonic.net    347    393    150    486    439    451    209     46
 DNS c.auth-ns.sonic.net    382    300     81    324    196    220    510    231
---- ------------------- ------ ------ ------ ------ ------ ------ ------ ------
 DNS local                 2245    300   1704    498   1967     81   1159    405
---- ------------------- ------ ------ ------ ------ ------ ------ ------ ------
 LOG 404 (Not Found)         11     11      0      0     11      0      0      0
 LOG 200 (Ok)               208    393    416    544    208    150    545    358

Since the DNS server ns2.ena.com was always the first one to show a new version,
its publication time was taken as the reference point (zero) for the time
differences shown.
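The table values are plain differences of first-seen timestamps. A minimal Python
sketch of that arithmetic, using made-up timestamps purely for illustration (not
the measured data); the function name is hypothetical:

```python
from datetime import datetime, timezone


def relative_delays(first_seen, reference="ns2.ena.com"):
    """Seconds by which each source lags the reference source (zero point)."""
    t0 = first_seen[reference]
    return {src: int((t - t0).total_seconds()) for src, t in first_seen.items()}


# Hypothetical first-seen times for one day (illustration only).
example = {
    "ns2.ena.com": datetime(2018, 9, 13, 3, 30, 0, tzinfo=timezone.utc),
    "a.auth-ns.sonic.net": datetime(2018, 9, 13, 3, 32, 0, tzinfo=timezone.utc),
    "LOG 200 (Ok)": datetime(2018, 9, 13, 3, 35, 30, tzinfo=timezone.utc),
}
```

Here relative_delays(example) gives 0, 120 and 330 seconds respectively.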

The "DNS local" values are probably caused by caching with the DNS TTL of 3600
seconds.

The "404 (Not Found)" values are probably strongly correlated with the appearance
of a new version on the DNS server ns2.ena.com (i.e. the first clients immediately
try to fetch that version from the mirror servers).

The "200 (Ok)" values (new update file available on the mirror server) are
probably determined by the rsync cron job interval of 600 seconds.
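A back-of-the-envelope model of that explanation: if a file is published at a
random moment within the 600-second cron cycle, the wait until the next rsync
run is between 0 and 600 seconds (about 300 on average), plus the rsync transfer
time itself. The measured "200 (Ok)" values (150 to 545 seconds) fall within that
range. A minimal Python sketch, assuming the cron job starts exactly every 600
seconds; the function name is hypothetical:

```python
CRON_INTERVAL = 600  # seconds between rsync cron runs (from the text above)


def wait_for_next_run(seconds_since_last_run, interval=CRON_INTERVAL):
    """Seconds from publication until the next cron-triggered rsync starts."""
    return (interval - seconds_since_last_run % interval) % interval


# Average wait if publication times are spread evenly over the cycle.
mean_wait = sum(wait_for_next_run(s) for s in range(CRON_INTERVAL)) / CRON_INTERVAL
```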

On the DNS servers (a|b|c).auth-ns.sonic.net, information about new versions
seems to be delayed by roughly 0-500 seconds.
