Already unsubscribed. Why do I still get this email?

Thanks
Steven

On Mon, Jan 30, 2023 at 7:06 AM Markus Jelsma <markus.jel...@openindex.io> wrote:

> Yes, remove the other protocol-* plugins from the configuration. With all
> three active, it is not deterministic which one will do the work.
>
> On Mon, 30 Jan 2023 at 12:50, Raj Chidara <raj.chid...@ddismart.com> wrote:
>
> > Hello Markus
> >
> > Sorry for the duplicate question. I added the selenium plugin in
> > conf/nutch-default.xml and included the following:
> >
> > <name>plugin.includes</name>
> > <value>protocol-http|protocol-httpclient|protocol-selenium|urlfilter-(regex|validator)|parse-(html|tika)|index-(basic|anchor)|indexer-solr|scoring-opic|urlnormalizer-(pass|regex|basic)</value>
> >
> > The site is still not being crawled. Are there any additional steps to
> > follow for installing selenium? Please suggest.
> >
> > Thanks and Regards
> >
> > Raj Chidara
> >
> > ----- Original Message -----
> > From: Markus Jelsma (markus.jel...@openindex.io)
> > Date: 30-01-2023 16:26
> > To: user@nutch.apache.org
> > Subject: Re: Siet is not crawling
> >
> > Hello Raj,
> >
> > I think the same question about the same site was asked here some time
> > ago. Anyway, this site loads its content via JavaScript. You will need a
> > protocol plugin that supports it, either protocol-htmlunit or
> > protocol-selenium, instead of protocol-http or any other.
> >
> > Change the configuration for plugin.includes, and it should work.
> >
> > Markus
> >
> > On Mon, 30 Jan 2023 at 10:39, Raj Chidara <raj.chid...@ddismart.com> wrote:
> >
> > > Hello,
> > >
> > > Nutch is not able to crawl this site. Are there any Nutch configuration
> > > changes required for this site?
> > >
> > > https://www.ich.org/
> > >
> > > Thanks and Regards
> > >
> > > Raj Chidara
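
For reference, the advice in this thread (keep a single protocol-* plugin, here protocol-selenium) might look roughly like the sketch below. Two things are assumptions rather than anything stated in the thread: that the override goes in conf/nutch-site.xml instead of editing conf/nutch-default.xml directly, and that no other plugin in the list needs to change. The remaining entries are copied as posted. Any browser or driver setup that protocol-selenium may require is not covered here.

```xml
<!-- Sketch only. Assumed location: conf/nutch-site.xml, whose values override conf/nutch-default.xml. -->
<property>
  <name>plugin.includes</name>
  <!-- protocol-http and protocol-httpclient are removed so that protocol-selenium is the only protocol plugin active. -->
  <value>protocol-selenium|urlfilter-(regex|validator)|parse-(html|tika)|index-(basic|anchor)|indexer-solr|scoring-opic|urlnormalizer-(pass|regex|basic)</value>
</property>
```

With only one protocol plugin listed, there is no ambiguity about which plugin handles the fetch, which is the point made in the reply above.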