Re: High Capacity (Distributed) Crawler

2003-06-10 Thread Leo Galambos
Otis Gospodnetic wrote: What interface do you need for Lucene? Will you use PUSH (= the robot will modify Lucene's index) or PULL (= the engine will get deltas from the robot) mode? Tell me what you need and I will do my best. I'd imagine one would want to use it in the PUSH mode (

Re: High Capacity (Distributed) Crawler

2003-06-10 Thread Otis Gospodnetic
Leo,

> The first beta is done (without NIO). It needs, however, further
> testing. Unfortunately, I could not find enough servers which I may
> hit.

Nice. Pretty much any site is a candidate, as long as you are nice to it. You could, for instance, hit all dmoz URLs. Or you could extract a set

Re: High Capacity (Distributed) Crawler

2003-06-09 Thread Leo Galambos
Hi Otis. The first beta is done (without NIO). It needs, however, further testing. Unfortunately, I could not find enough servers that I may hit. I wanted to commit the robot as a part of egothor (it will use it in PULL mode), but we have nice weather here, so I lost any motivation to play

Re: High Capacity (Distributed) Crawler

2003-06-09 Thread Otis Gospodnetic
Leo,

Have you started this project? Where is it hosted? It would be nice to see a few alternative implementations of a robust and scalable Java web crawler with the ability to index whatever it fetches.

Thanks,
Otis

--- Leo Galambos <[EMAIL PROTECTED]> wrote:
> Hi.
>
> I would like to write $S