Re: Nutch not crawling all URLs

Sebastian Nagel Mon, 13 Dec 2021 15:41:33 -0800

Hi Ayhan,

do you mean this question?
https://stackoverflow.com/questions/69352136/nutch-does-not-crawl-sites-that-allows-all-crawler-by-robots-txt


Sebastian

On 12/13/21 20:59, Ayhan Koyun wrote:
> Hi,
> 
> as I wrote before, it seems that I am not the only one who cannot crawl all
> the URLs in seed.txt. I have not really found a solution. I collected 450
> domains, and Nutch will not or cannot crawl approximately 200 of them. I want
> to know why this happens. Is there a way to force crawling of these sites?
> 
> It would be great to get a satisfying answer that explains why this happens
> and, if possible, how to solve it.
> 
> Thanks in advance
> 
> Ayhan
> 
>
