That is an interesting question because you are right: they are supposed to
be immutable. Dave, is something happening in the 18-hour window he
describes?

--
Kevin A. McGrail
Asst. Treasurer & VP Fundraising, Apache Software Foundation
Chair Emeritus Apache SpamAssassin Project
https://www.linkedin.com/in/kmcgrail - 703.798.0171

On Tue, Mar 20, 2018 at 2:04 PM, Dave Warren <d...@thedave.ca> wrote:

> When I first set up my mirror, I was told "Because the items are release
> artifacts, they are never altered or removed, just added.", so I configured
> my caching around this design. This does not seem to be the case in
> general; it wasn't just a one-off problem with 1827131.
>
> Looking at the original 1827131 issue, I was suspicious: rsync ran over
> 200 times between when I first got the file, and when rsync later updated
> the file, so it seemed unlikely that an incomplete file somehow slipped
> through. rsync doesn't normally allow this, but even if it did, it should
> have been fixed within 5 minutes, not 18 hours later.
>
> From logs, I can tell that 1827131 and 1827165 were each written, then ~18
> hours later, updated. With 1827165 I was able to capture the original files
> including the .sha1, which rules out any sort of incomplete copy as the
> .sha1 wouldn't match if there was any sort of error or truncation during
> the copy process.
>
> tl;dr:
>
> My concern is this: Is it expected behaviour that the files are first
> published, then later an updated version of the same update number is
> published?
>
> And second: Given that the scores are modified between these two versions,
> what is the impact on SpamAssassin users who obtain the first vs the
> second? There are several score differences, some significant.
>
>
>
> On 2018-03-20 11:27, Kevin A. McGrail wrote:
>
>> Not sure what you mean. The original ticket was about 1827131.tar.gz.
>>
>> On Tue, Mar 20, 2018 at 1:10 PM, Dave Warren <d...@thedave.ca> wrote:
>>
>>> Interestingly, the file really is changing and wasn't just a poorly timed
>>> copy. Check this out:
>>>
>>> Date: Mon, 19 Mar 2018 02:38:08 -0600 (MDT), the files were created:
>>>
>>> .d..t...... ./
>>>    >f+++++++++ 1827165.tar.gz
>>>    >f+++++++++ 1827165.tar.gz.asc
>>>    >f+++++++++ 1827165.tar.gz.sha1
>>>
>>> And the .sha1 hash validates (which obviously wouldn't happen if I had an
>>> incomplete copy).
>>>
>>> # sha1sum 1827165.tar.gz;cat 1827165.tar.gz.sha1
>>> a3abb2aad004a3401acfad9167e77b0ca31ef9c4  1827165.tar.gz
>>> a3abb2aad004a3401acfad9167e77b0ca31ef9c4 /usr/local/spamassassin/automc/tmp/stage/3.4.2/update.tgz
>>>
>>>
>>> Date: Mon, 19 Mar 2018 20:48:17 -0600 (MDT), the files were updated:
>>>
>>>    >f.st...... 1827165.tar.gz
>>>    >f..t...... 1827165.tar.gz.asc
>>>    >f.st...... 1827165.tar.gz.sha1
>>>
>>> And once again, the .sha1 hash validates the new file:
>>>
>>> # sha1sum 1827165.tar.gz;cat 1827165.tar.gz.sha1
>>> ea74b1eb682bbb25c2028ffe01a8e20bd1943885  1827165.tar.gz
>>> ea74b1eb682bbb25c2028ffe01a8e20bd1943885 /usr/local/spamassassin/automc/tmp/mkupdate-with-scores/1827165.tar.gz
>>>
>>>
>>> I don't know if any of this is actually a problem, but it's not what I
>>> expected to see.
>>>
>>> If anyone is curious, I placed copies of the files named -first and
>>> -second as appropriate, including uncompressed copies of the .tar.gz files.
>>> The files are here: https://mirrors.razx.cloud/sa-update-backup/1827165/
>>>
>>> This is curiosity more than anything else at this stage; I will leave my
>>> caching less aggressive so that updated files are picked up.
>>>
>>>
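The situation Dave describes, where the same update number is published twice with different content, can be detected on the mirror side by remembering the digest last seen for each update. A minimal sketch, assuming the mirror already reads the digest from the first field of each update's .sha1 file; `UpdateDigestTracker` and `record` are hypothetical names, not part of sa-update or any mirror tooling:

```python
from typing import Dict


class UpdateDigestTracker:
    """Remember the most recent SHA-1 digest seen for each update number.

    Hypothetical mirror-side helper: call record() after each rsync run with
    the digest from the update's .sha1 file (first field). A True result
    means the same update number now has different content, so any cached
    copy of that update should be invalidated rather than served.
    """

    def __init__(self) -> None:
        # update number (e.g. "1827165") -> last digest seen, lowercase hex
        self._seen: Dict[str, str] = {}

    def record(self, update: str, digest: str) -> bool:
        digest = digest.strip().lower()
        previous = self._seen.get(update)
        self._seen[update] = digest
        # Only a change from an earlier, different digest counts as republished.
        return previous is not None and previous != digest


if __name__ == "__main__":
    tracker = UpdateDigestTracker()
    # Digests taken from the two 1827165.tar.gz.sha1 files quoted above.
    print(tracker.record("1827165", "a3abb2aad004a3401acfad9167e77b0ca31ef9c4"))
    print(tracker.record("1827165", "ea74b1eb682bbb25c2028ffe01a8e20bd1943885"))
```

Fed the two 1827165 digests in order, the first call returns False (first sighting) and the second returns True, which is the point at which a cache keyed only on the file name would start serving stale content.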
>>> On 2018-03-19 13:36, Kevin A. McGrail wrote:
>>>
>>>> I would guess you caught it mid-copy and it arose because of the caching.
>>>> Just a guess, but glad we know what's going on.
>>>>
>>>> On Mon, Mar 19, 2018, 15:09 Dave Warren <d...@thedave.ca> wrote:
>>>>
>>>>> Howdy. I'm on this list.
>>>>>
>>>>> Okay, so this is a bit odd: it looks like the file 1827131.tar.gz was
>>>>> actually modified by rsync many hours after the initial write:
>>>>>
>>>>> Date: Sun, 18 Mar 2018 02:36:30 -0600 (MDT)
>>>>> .d..t...... ./
>>>>>    >f+++++++++ 1827131.tar.gz
>>>>>    >f+++++++++ 1827131.tar.gz.asc
>>>>>    >f+++++++++ 1827131.tar.gz.sha1
>>>>>
>>>>> My cron runs every 5 minutes (with up to 220 seconds of variability). I
>>>>> see "MIRROR.CHECK" being updated (at 03:18, 04:21, 05:23, 06:18, etc.),
>>>>> confirming rsync was running.
>>>>>
>>>>>
>>>>> 1827131.tar.gz is modified just over 18 hours later:
>>>>>
>>>>> Date: Sun, 18 Mar 2018 20:47:39 -0600 (MDT)
>>>>>    >f.st...... 1827131.tar.gz
>>>>>    >f..t...... 1827131.tar.gz.asc
>>>>>    >f.st...... 1827131.tar.gz.sha1
>>>>>
>>>>> I was under the impression that the .tar.gz files were immutable, but
>>>>> looking through my rsync logs, this is definitely not the case: I see
>>>>> the files being created and later updated nearly daily. It doesn't
>>>>> happen every day, though; for example, 1826189.tar.gz was created on
>>>>> March 8th and never modified. The only reference to it is here:
>>>>>
>>>>> 8 Mar 2018 19:46:40 -0700 (MST)
>>>>> .d..t...... ./
>>>>>    >f+++++++++ 1826189.tar.gz
>>>>>    >f+++++++++ 1826189.tar.gz.asc
>>>>>    >f+++++++++ 1826189.tar.gz.sha1
>>>>>
>>>>>
>>>>> Because I believed these files were immutable, they were being cached
>>>>> without verifying whether the on-disk source had changed. For the
>>>>> moment, I will cache less aggressively, which should resolve the
>>>>> problem.
>>>>>
>>>>>
>>>>> Can anyone confirm why the files are being modified? Is this
>>>>> intentional/expected?
>>>>>
>>>>>
>>>>>
>>>>>
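The log excerpts above use rsync's --itemize-changes format: `>f+++++++++` marks a newly received file, an `s` in the fourth character (as in `>f.st......`) means the file's size changed, and `t` means its modification time changed. A rough sketch of scanning such a log for files that were created and later rewritten, assuming rsync 3.x's 11-character itemize strings appear somewhere on each line; `classify` and `republished` are hypothetical helpers, not part of rsync:

```python
import re
from collections import defaultdict
from typing import Iterable, List, Optional, Tuple

# An rsync 3.x itemized change string is 11 characters, YXcstpoguax:
#   Y = update type ('>' received, '.' not updated, ...)
#   X = file type ('f' file, 'd' directory, 'L' symlink, ...)
#   remaining 9 = attribute columns; '+' in all of them means a new item.
# The leading (?:^|\s) keeps the match from starting in the middle of a word.
ITEMIZE = re.compile(r"(?:^|\s)([<>ch.*])([fdLDS])([.+?a-zA-Z]{9})\s+(\S.*)$")


def classify(line: str) -> Optional[Tuple[str, str]]:
    """Return (file name, kind) for an itemized regular-file line, else None.

    kind is "created" for a new file, "content" when the checksum (c) or
    size (s) column is set, and "attrs" otherwise (e.g. only the time changed).
    """
    m = ITEMIZE.search(line)
    if m is None or m.group(2) != "f":
        return None
    attrs, name = m.group(3), m.group(4).strip()
    if attrs == "+" * 9:
        return name, "created"
    if attrs[0] == "c" or attrs[1] == "s":
        return name, "content"
    return name, "attrs"


def republished(lines: Iterable[str]) -> List[str]:
    """Names of files that were created and later had their content changed."""
    history = defaultdict(list)
    for line in lines:
        result = classify(line)
        if result is not None:
            name, kind = result
            history[name].append(kind)
    return sorted(
        name
        for name, kinds in history.items()
        if "created" in kinds and "content" in kinds[kinds.index("created") + 1:]
    )
```

Run over the 1827131 lines quoted above, this would flag 1827131.tar.gz and its .sha1 (size and time changed) but not the .asc, whose line only shows a time change; 1826189.tar.gz, created once and never touched, is not flagged.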
>>>>> On 2018-03-19 07:52, Dave Jones wrote:
>>>>>
>>>>>> I found an email address in the SA archives from 2013. Hopefully this
>>>>>> makes it to him.
>>>>>>
>>>>>> On 03/19/2018 08:33 AM, Dave Jones wrote:
>>>>>>
>>>>>>> Is Dave Warren on this list? If no response, does anyone have an old
>>>>>>> email with his contact info so I can ask him how his rsyncs are set up?
>>>>>>>
>>>>>>> Dave
>>>>>>>
>>>>>>> On 03/19/2018 08:26 AM, bugzilla-dae...@bugzilla.spamassassin.org wrote:
>>>>>>>
>>>>>>>> https://bz.apache.org/SpamAssassin/show_bug.cgi?id=7566
>>>>>>>>
>>>>>>>> Dave Jones <da...@apache.org> changed:
>>>>>>>>
>>>>>>>>               What    |Removed                     |Added
>>>>>>>> ----------------------------------------------------------------------------
>>>>>>>>                     CC|                            |da...@apache.org
>>>>>>>>
>>>>>>>> --- Comment #2 from Dave Jones <da...@apache.org> ---
>>>>>>>> I guess I can add logic to our hourly script to check sha1 values on
>>>>>>>> the latest tar.gz to catch rsync'ing issues.
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>
>>>
>>
>
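Dave Jones's idea of having the hourly script check the sha1 of the latest tar.gz could look roughly like this. It is a sketch under several assumptions: updates are named `<number>.tar.gz` with a sibling `<number>.tar.gz.sha1`, the highest number is the latest, and only the digest (first field) is compared, since the .sha1 files quoted in this thread name a build-host path rather than the mirrored file. `latest_update` and `sha1_matches` are hypothetical names, not the project's actual script:

```python
import hashlib
import re
from pathlib import Path
from typing import Optional

# Assumed naming: "<number>.tar.gz" plus a sibling "<number>.tar.gz.sha1".
UPDATE_NAME = re.compile(r"^(\d+)\.tar\.gz$")


def latest_update(directory: Path) -> Optional[Path]:
    """Return the highest-numbered <number>.tar.gz in directory, or None."""
    candidates = [p for p in directory.iterdir() if UPDATE_NAME.match(p.name)]
    if not candidates:
        return None
    return max(candidates, key=lambda p: int(UPDATE_NAME.match(p.name).group(1)))


def sha1_matches(tarball: Path) -> bool:
    """True if the tarball's SHA-1 equals the first field of its .sha1 file.

    Only the first field is compared because the .sha1 files seen in this
    thread name a build-host path, not the mirrored file name.
    """
    sha1_file = tarball.with_name(tarball.name + ".sha1")
    if not sha1_file.is_file():
        return False
    fields = sha1_file.read_text().split()
    if not fields:
        return False
    digest = hashlib.sha1()
    with tarball.open("rb") as f:
        # Hash in 64 KiB chunks so large archives are not read into memory.
        for chunk in iter(lambda: f.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest() == fields[0].lower()


if __name__ == "__main__":
    import sys

    latest = latest_update(Path(sys.argv[1] if len(sys.argv) > 1 else "."))
    if latest is None:
        print("no update archives found")
    elif sha1_matches(latest):
        print(f"{latest.name}: sha1 OK")
    else:
        print(f"{latest.name}: sha1 MISMATCH")
        sys.exit(1)
```

This only catches truncated or corrupted copies. When an update is republished, as 1827165 was, the new tarball and its new .sha1 agree, so the check passes both times; spotting that case needs a record of earlier digests.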
