On Thu, May 12, 2011 at 2:24 PM, webdev1977 <[email protected]> wrote:
> I was saying that based on what the previous poster stated.  Also the fact
> that I have read through quite a bit of posts stating that the problem with
> crawling in a vertical environment has to do with the way fetcher2 was
> built.  The fetches are grouped by domain name and if you have a lot of urls
> from the same domain then you are not able to do quick mapreduce jobs.
>
It is true that the more domains you have, the faster you will be able
to crawl. But I don't agree that this has anything to do with the way
fetcher2 was built. Any polite fetcher has to cap the number of
concurrent requests it sends to a single host, so the same limit
applies whichever fetcher you use.
If you need to crawl ~1000 domains or so (fairly typical for a
vertical search), you should be able to run quite a few threads
spread across all of them without having to allow more than five
concurrent threads per host.
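
To put rough numbers on that, here is a back-of-the-envelope sketch in
Python. It is not Nutch code: the function names, the 200-thread pool and
the 2 seconds per request are all assumptions picked for illustration. It
only shows that the per-host cap, not the fetcher design, bounds how many
threads can do useful work.

```python
def usable_threads(total_threads, num_hosts, threads_per_host):
    """Threads that can be busy at once under a per-host concurrency cap.

    Hypothetical helper: at most threads_per_host threads may talk to any
    one host, so the rest of the pool idles when there are too few hosts.
    """
    return min(total_threads, num_hosts * threads_per_host)


def max_fetch_rate(total_threads, num_hosts, threads_per_host,
                   seconds_per_request):
    """Upper bound on pages per second for the whole crawl.

    seconds_per_request is an assumed average covering latency plus any
    politeness delay between requests.
    """
    if seconds_per_request <= 0:
        raise ValueError("seconds_per_request must be positive")
    busy = usable_threads(total_threads, num_hosts, threads_per_host)
    return busy / seconds_per_request


if __name__ == "__main__":
    # Assumed setup: 200 fetcher threads, 5 threads per host, 2 s per request.
    for hosts in (1, 10, 1000):
        rate = max_fetch_rate(200, hosts, 5, 2.0)
        print(f"{hosts:>5} hosts -> at most {rate:.1f} pages/s")
```

With those assumed numbers, a single host keeps only 5 of the 200 threads
busy, 10 hosts keep 50 busy, and ~1000 hosts keep the whole pool busy with
the per-host limit still at five.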
