Hello Markus,

I am receiving HTTP status 202 for the Chinese site and HTTP status 403 for the German site, and the crawl stops without fetching a single URL.
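
To see whether these status codes come from the servers themselves rather than from the crawler, the same requests can be reproduced outside Nutch. Below is a minimal Python sketch using only the standard library; it is not part of Nutch. The helper name `fetch_status` and any user-agent strings passed to it are hypothetical, and the idea that the response depends on the User-Agent header is only an assumption worth ruling out, not a confirmed cause.

```python
import urllib.error
import urllib.request


def fetch_status(url, user_agent, timeout=10.0):
    """Send one GET request and return the HTTP status code.

    `user_agent` is sent as the User-Agent header so that responses can be
    compared between a crawler-style agent and a browser-style agent.
    """
    request = urllib.request.Request(url, headers={"User-Agent": user_agent})
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            # 2xx responses (including 202) arrive here.
            return response.status
    except urllib.error.HTTPError as error:
        # urllib raises HTTPError for 4xx/5xx responses; the code is kept on the exception.
        return error.code
```

Calling `fetch_status("https://www.bfarm.de/EN/Home/_node.html", agent)` once with the agent string configured for the crawl and once with a browser-style string would show whether the 403 changes with the agent.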

Thanks and Regards
Raj Chidara
Mobile: +91-7680929509

----- Original Message -----
From: Markus Jelsma (markus.jel...@openindex.io)
Date: 23-11-2022 16:52
To: user@nutch.apache.org
Subject: Re: Few websites not crawling

Hello,

The German site is crawlable, but it produces awful URLs with a ;jsessionid=<> parameter attached. The Chinese site is all JavaScript; it requires the HtmlUnit or Selenium protocol plugin to work at all, and there is no guarantee that it will.

Regards,
Markus

On Wed 23 Nov 2022 at 11:07, Raj Chidara <raj.chid...@ddismart.com> wrote:

> I am not able to crawl these websites. They do not have a robots.txt file.
> Can anyone suggest a solution for this?
>
> https://www.cmde.org.cn/
>
> https://www.bfarm.de/EN/Home/_node.html
>
> Thanks and Regards
>
> Raj Chidara