Hello Raj,

This site loads its content via JavaScript, so you need a protocol plugin that supports it. HtmlUnit does not seem to work with this site, but Selenium does. Please change the protocol plugin accordingly in your plugin.includes configuration directive.
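
As a rough sketch, the change in conf/nutch-site.xml could look something like the following. Only the swap to protocol-selenium is the point here; the rest of the plugin list is an assumption based on a typical Nutch 1.x setup, so keep whatever your own installation already lists.

```xml
<!-- conf/nutch-site.xml: sketch only, adapt the plugin list to your own setup -->
<property>
  <name>plugin.includes</name>
  <!-- protocol-selenium takes the place of the usual protocol plugin (e.g. protocol-http);
       every other entry in this value is assumed, not taken from your configuration -->
  <value>protocol-selenium|urlfilter-regex|parse-(html|tika)|index-(basic|anchor)|indexer-solr|scoring-opic|urlnormalizer-(pass|regex|basic)</value>
</property>
```

The Selenium plugin also needs a browser and driver on the machine that runs the fetcher; the related selenium.* properties should be described in your version's nutch-default.xml.
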
I tested it with our own parser, as I have no Nutch here at the moment. But Nutch has support for Selenium, so it should work, even though the bundled version is a bit outdated.

Regards,
Markus

On Sat, 17 Dec 2022 at 10:28, Raj Chidara <raj.chid...@ddismart.com> wrote:

> Hi
>
> I am not able to crawl this site https://www.ich.org/. Can anyone
> suggest a solution for this? This site does not have a robots.txt file. When
> I try to check robots.txt, the site is shown as under construction and
> returns response status 200. Could that be the reason for the issue?
>
> Thanks and Regards
> Raj Chidara