That sounds creepy indeed. It would still need a similar amount of RAM, plus network overhead. Would a Bloom filter be useful at all? It takes a lot less space, and I can live with a non-deterministic (probabilistic) approach.
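To make the Bloom filter idea concrete, here is a minimal sketch in plain Java. It is not Nutch code: the class name UrlBloomFilter and the method addAndCheck are hypothetical, and the hashing (FNV-1a with a mixing step, then double hashing) is just one reasonable choice. With the standard sizing formula, a 1% false-positive rate costs about 9.6 bits per URL, so roughly 12 MB for 10 million URLs.

```java
import java.nio.charset.StandardCharsets;
import java.util.BitSet;

/** Hypothetical helper (not part of Nutch): a fixed-size Bloom filter of URLs. */
final class UrlBloomFilter {
    private final BitSet bits;
    private final int numBits;
    private final int numHashes;

    UrlBloomFilter(int expectedUrls, double falsePositiveRate) {
        // Standard sizing: m = -n * ln(p) / (ln 2)^2 bits, k = (m / n) * ln 2 hashes.
        double ln2 = Math.log(2);
        double m = -expectedUrls * Math.log(falsePositiveRate) / (ln2 * ln2);
        this.numBits = Math.max(64, (int) Math.ceil(m));
        this.numHashes = Math.max(1, (int) Math.round((double) numBits / expectedUrls * ln2));
        this.bits = new BitSet(numBits);
    }

    /**
     * Marks the URL as seen. Returns true if it was possibly seen before
     * (may be a false positive), false if it is definitely new.
     * Synchronized so several fetcher threads could share one filter.
     */
    synchronized boolean addAndCheck(String url) {
        long h = hash64(url);
        int h1 = (int) h;
        int h2 = (int) (h >>> 32) | 1; // odd step for double hashing
        boolean possiblySeen = true;
        for (int i = 0; i < numHashes; i++) {
            int idx = Math.floorMod(h1 + i * h2, numBits);
            if (!bits.get(idx)) {
                possiblySeen = false;
                bits.set(idx);
            }
        }
        return possiblySeen;
    }

    /** 64-bit FNV-1a over the UTF-8 bytes, followed by a final bit-mixing step. */
    private static long hash64(String s) {
        long hash = 0xcbf29ce484222325L;
        for (byte b : s.getBytes(StandardCharsets.UTF_8)) {
            hash ^= (b & 0xff);
            hash *= 0x100000001b3L;
        }
        hash ^= hash >>> 33;
        hash *= 0xff51afd7ed558ccdL;
        hash ^= hash >>> 33;
        return hash;
    }
}
```

The fetcher would call addAndCheck on a redirect target and skip it when the result is true. The trade-off is that a false positive means a URL that was never fetched gets skipped, so the acceptable rate depends on how many missed redirects one can tolerate.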
On Tuesday 18 October 2011 01:45:20 Sergey A Volkov wrote:
> Hi
>
> I think some external key-value storage may replace map. They are fast
> enough and overhead will be unsignificant (for many threads)
> But this is very creepy solution.
>
> Sergey Volkov.
>
> On Tue 18 Oct 2011 03:15:33 AM MSK, Markus Jelsma wrote:
> > Anyone?
> >
> >> Hi,
> >>
> >> With a > 0 value for http.redirect.max there's a possibility for
> >> fetching and parsing duplicates, this is especially true for fetch
> >> lists with many domains, even with just a few (+10) records per
> >> domain/host queue.
> >>
> >> Assuming there's only one thread per queue, how can we use
> >> http.redirect.max and prevent fetch and parse of duplicates?
> >>
> >> I'm not a big fan of keeping a map of fetched records in memory as it'll
> >> blow up the heap. We can also not safely remove a record from the fetch
> >> queue as the queue feeder may not have finished and duplicates may still
> >> enter a queue.
> >>
> >> Any thoughts?
> >>
> >> Thanks,
> >> Markus

--
Markus Jelsma - CTO - Openindex
http://www.linkedin.com/in/markus17
050-8536620 / 06-50258350

