Hi Ayhan,

you mean?
https://stackoverflow.com/questions/69352136/nutch-does-not-crawl-sites-that-allows-all-crawler-by-robots-txt

Sebastian

On 12/13/21 20:59, Ayhan Koyun wrote:
> Hi,
>
> as I wrote before, it seems that I am not the only one who cannot crawl all
> the URLs in seed.txt. I couldn't really find a solution. I collected 450
> domains, and Nutch will not or cannot crawl approximately 200 of them. I want
> to know why this happens. Is there a way to force crawling of these sites?
>
> It would be great to get a satisfying answer, to know why this happens and
> maybe how to solve it.
>
> Thanks in advance
>
> Ayhan