Re: http.redirect.max and duplicate fetch/parse

Markus Jelsma Tue, 18 Oct 2011 06:01:13 -0700

That sounds creepy indeed. It would still need a similar amount of RAM plus 
network overhead. Would a Bloom filter be useful at all? It takes a lot less 
space and I can live with a probabilistic approach.
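
To make the Bloom filter idea concrete, here is a minimal sketch in Python. It is not Nutch code: the class name, the `add_if_absent` method, and the choice of SHA-256 with double hashing are assumptions made for illustration. It assumes each fetcher thread asks the filter about a URL just before fetching it, including redirect targets.

```python
import hashlib
import math


class BloomFilter:
    """Fixed-size membership test: false positives possible, false negatives not."""

    def __init__(self, expected_items: int, false_positive_rate: float) -> None:
        # Standard sizing: m = -n * ln(p) / (ln 2)^2 bits, k = (m / n) * ln 2 hashes.
        self.num_bits = max(
            8,
            math.ceil(-expected_items * math.log(false_positive_rate) / math.log(2) ** 2),
        )
        self.num_hashes = max(1, round(self.num_bits / expected_items * math.log(2)))
        self.bits = bytearray((self.num_bits + 7) // 8)

    def _positions(self, url: str):
        # Double hashing: derive k bit positions from two halves of one digest.
        digest = hashlib.sha256(url.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1  # force odd so the stride is never zero
        for i in range(self.num_hashes):
            yield (h1 + i * h2) % self.num_bits

    def add_if_absent(self, url: str) -> bool:
        """Record url. Return True if it was definitely not recorded before."""
        is_new = False
        for pos in self._positions(url):
            byte, mask = pos >> 3, 1 << (pos & 7)
            if not self.bits[byte] & mask:
                is_new = True
                self.bits[byte] |= mask
        return is_new


# Hypothetical usage: skip a URL when the filter says it was probably fetched already.
seen = BloomFilter(expected_items=1_000_000, false_positive_rate=0.01)
for url in ["http://example.org/a", "http://example.org/b", "http://example.org/a"]:
    if seen.add_if_absent(url):
        print("fetch", url)
    else:
        print("skip (probably a duplicate)", url)
```

Sized for one million URLs at a 1% false-positive rate, the bit array is roughly 1.2 MB. The cost is that a false positive causes a URL that was never fetched to be skipped as a duplicate. The sketch is also not thread-safe, so with several fetcher threads the call to `add_if_absent` would need a lock around it.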


On Tuesday 18 October 2011 01:45:20 Sergey A Volkov wrote:
> Hi
> 
> I think some external key-value storage may replace the map. They are fast
> enough and the overhead will be insignificant (for many threads).
> But this is a very creepy solution.
> 
> Sergey Volkov.
> 
> On Tue 18 Oct 2011 03:15:33 AM MSK, Markus Jelsma wrote:
> > Anyone?
> > 
> >> Hi,
> >> 
> >> With a > 0 value for http.redirect.max there's a possibility for
> >> fetching and parsing duplicates, this is especially true for fetch
> >> lists with many domains, even with just a few (+10) records per
> >> domain/host queue.
> >> 
> >> Assuming there's only one thread per queue, how can we use
> >> http.redirect.max and prevent fetch and parse of duplicates?
> >> 
> >> I'm not a big fan of keeping a map of fetched records in memory as it'll
> >> blow up the heap. We can also not safely remove a record from the fetch
> >> queue as the queue feeder may not have finished and duplicates may still
> >> enter a queue.
> >> 
> >> Any thoughts?
> >> 
> >> Thanks,
> >> Markus

-- 
Markus Jelsma - CTO - Openindex
http://www.linkedin.com/in/markus17
050-8536620 / 06-50258350
