Hi,

With http.redirect.max set to a value greater than 0, the fetcher may fetch and 
parse duplicates. This is especially likely for fetch lists with many domains, 
even with just a few (10+) records per domain/host queue.

Assuming there's only one thread per queue, how can we use http.redirect.max 
and still prevent duplicates from being fetched and parsed?

I'm not a big fan of keeping a map of fetched records in memory, as it would 
blow up the heap. Nor can we safely remove a record from the fetch queue, since 
the queue feeder may not have finished and duplicates may still enter a queue.
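To make the memory concern concrete, here is a rough Java sketch. It is not Nutch code: the names RedirectDedupSketch, FetchedFingerprints and fetchAll are hypothetical, and the redirect map stands in for real HTTP responses. It assumes a single thread per queue and keeps only a 64-bit fingerprint per fetched URL, so the cost per entry no longer grows with URL length, although the set still grows with the number of distinct URLs in a queue.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

class RedirectDedupSketch {

    // Hypothetical per-queue record of fetched URLs, storing 64-bit
    // fingerprints instead of full URL strings (not a Nutch API).
    static final class FetchedFingerprints {
        private final Set<Long> seen = new HashSet<>();

        // Returns true if the URL was not seen before; records it as fetched.
        boolean markIfNew(String url) {
            return seen.add(fingerprint(url));
        }

        // First 8 bytes of the URL's MD5 digest, packed into a long.
        static long fingerprint(String url) {
            try {
                byte[] d = MessageDigest.getInstance("MD5")
                        .digest(url.getBytes(StandardCharsets.UTF_8));
                long v = 0;
                for (int i = 0; i < 8; i++) {
                    v = (v << 8) | (d[i] & 0xFF);
                }
                return v;
            } catch (NoSuchAlgorithmException e) {
                throw new IllegalStateException(e);
            }
        }
    }

    // Processes one queue, following at most maxRedirects redirects per record
    // (so up to maxRedirects + 1 fetches per record). If dedup is non-null,
    // a URL whose fingerprint was already recorded is skipped, which also
    // stops the redirect chain. Returns the URLs in the order fetched.
    static List<String> fetchAll(List<String> queue, Map<String, String> redirects,
                                 int maxRedirects, FetchedFingerprints dedup) {
        List<String> fetched = new ArrayList<>();
        for (String url : queue) {
            String current = url;
            for (int hop = 0; hop <= maxRedirects && current != null; hop++) {
                if (dedup != null && !dedup.markIfNew(current)) {
                    break; // already fetched (directly or via a redirect)
                }
                fetched.add(current);
                current = redirects.get(current); // null: no redirect
            }
        }
        return fetched;
    }

    public static void main(String[] args) {
        List<String> queue = List.of("http://example.org/a",
                "http://example.org/b", "http://example.org/c");
        // a and b both redirect to c, which is also in the queue itself.
        Map<String, String> redirects = Map.of(
                "http://example.org/a", "http://example.org/c",
                "http://example.org/b", "http://example.org/c");
        System.out.println(fetchAll(queue, redirects, 1, null));
        System.out.println(fetchAll(queue, redirects, 1, new FetchedFingerprints()));
    }
}
```

In the example, records a and b both redirect to c, so without the set c is fetched three times, and with it only once. A fingerprint collision would wrongly skip an unfetched URL; a primitive long set would trim the boxing overhead, and a Bloom filter would cap memory outright at the price of false positives.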

Any thoughts?

Thanks,
Markus