Neil,

I'm not aware of this problem being reported for ListS3.  That does not
mean there are no issues; many users might not notice, or may have come to
accept, some variance in the accuracy of ListS3.  If you can get ListS3 to
reproduce it, that would be great :).

We did recently hear a report of similar behavior in ListGCSBucket, a
similarly implemented processor that performs the same list operation for
Google Cloud Storage.  In my brief experience troubleshooting
ListGCSBucket, the issue appeared to be that GCS reported different last
modified timestamps in different list API responses, despite what I
believed to be a single write.  I rationalized that as a product of
eventual consistency, since the write and list operations were taking place
within a few seconds of each other.  That explanation would not make sense
for a 10-week-old file.

One outcome of the ListGCSBucket episode was the finding that placing a
DetectDuplicate processor after the list processor to check for unique keys
can be an effective workaround.
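
To illustrate the idea behind that workaround (this is not NiFi code, just
a minimal Python sketch of the logic): remember an identifier for every
listing already passed downstream and drop any listing whose identifier has
been seen before.  The names Listing and drop_duplicates are hypothetical,
the in-memory set stands in for whatever cache the duplicate check uses,
and building the identifier from key plus ETag is my assumption.

```python
from typing import Iterable, Iterator, NamedTuple, Set, Tuple


class Listing(NamedTuple):
    """One entry from a bucket listing (hypothetical shape)."""
    key: str
    etag: str
    last_modified: str


def drop_duplicates(
    listings: Iterable[Listing], seen: Set[Tuple[str, str]]
) -> Iterator[Listing]:
    """Yield only listings whose (key, etag) pair has not been seen before.

    `seen` persists between calls, so a file that is re-listed later with
    the same key and ETag is dropped even if its reported timestamp differs.
    """
    for item in listings:
        identifier = (item.key, item.etag)
        if identifier in seen:
            continue  # already emitted once; treat as a duplicate
        seen.add(identifier)
        yield item
```

With this identifier, a re-listed file whose content (ETag) is unchanged is
filtered out, while a genuinely updated file gets a new ETag and still
passes through.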

Thanks,

James

On Wed, Dec 6, 2017 at 11:24 AM, Neil Derraugh <
neil.derra...@intellifylearning.com> wrote:

> I have a slowly changing S3 bucket.  It has about 10 files in it.
>
> Prior to today the bucket's most recently modified file was modified
> on September 15, 2017 2:54:40 PM.
>
> One of the files just got updated today (December 6, 2017 4:58:22 PM) and
> ListS3 emitted it properly.  But it also (re-)emitted that file last
> modified on September 15, 2017 2:54:40 PM.  I checked the etags from
> September and today on the spurious file and they match.  Confusing
> behavior.
>
> Anybody seen anything like this before, or know why it happened?
>
> Thanks,
> Neil
>
