Hi, I think I found a critical pagination bug in the ListS3 processor. I just upgraded from 1.28 to 2.8. Everything runs well with 1.28; the problem occurs with 2.8.
The problem is independent of the "Listing Strategy" but depends on the "List Type": it occurs only with "List Type = List Objects V1". "List Objects V2" works correctly, so it can be used as a workaround.

If a bucket has fewer than 1000 objects (or the listing is restricted to fewer than 1000 objects with a "Prefix"), everything works fine. But if the bucket has more than 1000 objects and "List Objects V1" is used, ListS3 fetches the first 1000 objects, then the same first 1000 again, and again. The loop never stops. With TRACE-level logging turned on, it is easy to see that after 1000 objects the listing starts over with the first object instead of continuing with the next ones. I assume the ContinuationToken is not passed correctly to subsequent requests, causing an infinite "while(isTruncated())" loop for buckets with more than 1000 objects.

I have this problem with two completely different AWS buckets: one is private, the other is the publicly available openalex bucket. I added a test case (see the attached JSON). With "List Objects V1" the loop runs indefinitely; if you change to "List Objects V2", you get the correct number of 5997 FlowFiles. You may need to add a proxy.

Can anyone confirm this problem? How should I proceed? (I don't have a Jira account.)

Thanks
Martin

------------------------------------------------------------------------------
FIZ Karlsruhe - Leibniz-Institut für Informationsinfrastruktur GmbH.
Registered office: Eggenstein-Leopoldshafen, Mannheim Local Court HRB 101892.
Managing Director: Prof. Dr. Wolfram Horstmann.
Chair of the Supervisory Board: MinR’in Dr. Elise Grauer.
FIZ Karlsruhe is certified with the "audit berufundfamilie" seal.
Attachment: Test.json
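The suspected failure mode can be illustrated with a small simulation. This is a hedged sketch, not NiFi's ListS3 code and not the AWS SDK: `FakeBucket`, `list_objects_v1` and `list_all_v1` are hypothetical names. It assumes S3's documented V1 behaviour: at most 1000 keys per response, an `IsTruncated` flag, and a `Marker` request parameter (V1's counterpart of V2's continuation token). Without a delimiter, a V1 response may omit `NextMarker`, so the last returned key is used as the next marker. If the marker is never advanced, every request returns the first page again, which matches the endless loop described in the report.

```python
from dataclasses import dataclass
from typing import List, Optional

PAGE_SIZE = 1000  # S3 returns at most 1000 keys per list request


@dataclass
class PageV1:
    keys: List[str]
    is_truncated: bool


class FakeBucket:
    """Hypothetical in-memory stand-in for a bucket; NOT the AWS SDK."""

    def __init__(self, keys: List[str]) -> None:
        self._keys = sorted(keys)  # S3 lists keys in ascending key order

    def list_objects_v1(self, marker: Optional[str] = None) -> PageV1:
        # V1 semantics: return keys strictly after `marker`, at most PAGE_SIZE.
        remaining = [k for k in self._keys if marker is None or k > marker]
        return PageV1(keys=remaining[:PAGE_SIZE],
                      is_truncated=len(remaining) > PAGE_SIZE)


def list_all_v1(bucket: FakeBucket, advance_marker: bool = True,
                max_requests: int = 100) -> List[str]:
    """Page through a V1 listing; advance_marker=False mimics the suspected bug."""
    listed: List[str] = []
    marker: Optional[str] = None
    for _ in range(max_requests):  # guard so the buggy variant terminates
        page = bucket.list_objects_v1(marker)
        listed.extend(page.keys)
        if not page.is_truncated:
            return listed
        if advance_marker:
            # Without a delimiter, NextMarker may be absent, so use the last
            # key of this page as the Marker for the next request.
            marker = page.keys[-1]
    raise RuntimeError(f"listing did not finish after {max_requests} requests "
                       f"({len(listed)} keys collected)")


if __name__ == "__main__":
    bucket = FakeBucket([f"obj-{i:05d}" for i in range(2500)])
    print(len(list_all_v1(bucket)))  # 2500
    try:
        list_all_v1(bucket, advance_marker=False, max_requests=5)
    except RuntimeError as err:
        print(err)  # listing did not finish after 5 requests (5000 keys collected)
```

A bucket with fewer than 1000 keys finishes in a single request even with `advance_marker=False`, which is consistent with the observation that small buckets (or narrow prefixes) are unaffected.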
