Yes, remove the other protocol-* plugins from the configuration. With all
three active, it is not deterministic which one will actually fetch the pages.
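For reference, here is a minimal sketch of that change. It assumes the override goes in conf/nutch-site.xml, the usual place for local overrides, whose values take precedence over nutch-default.xml. It also assumes the rest of the plugin list from the message below stays unchanged. Only protocol-selenium is left as the protocol plugin. The Selenium browser and WebDriver settings are not shown and still need to be configured separately.

```xml
<?xml version="1.0"?>
<configuration>
  <!-- Sketch: local override in conf/nutch-site.xml (assumed location).
       protocol-http and protocol-httpclient are removed so that only
       protocol-selenium handles fetching. -->
  <property>
    <name>plugin.includes</name>
    <value>protocol-selenium|urlfilter-(regex|validator)|parse-(html|tika)|index-(basic|anchor)|indexer-solr|scoring-opic|urlnormalizer-(pass|regex|basic)</value>
  </property>
</configuration>
```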

On Mon, 30 Jan 2023 at 12:50, Raj Chidara <raj.chid...@ddismart.com> wrote:

>
> Hello Markus
>   Sorry for the duplicate question. I added the Selenium plugin in
> conf/nutch-default.xml and included the following:
>
> <name>plugin.includes</name>
>
> <value>protocol-http|protocol-httpclient|protocol-selenium|urlfilter-(regex|validator)|parse-(html|tika)|index-(basic|anchor)|indexer-solr|scoring-opic|urlnormalizer-(pass|regex|basic)</value>
>
> The site is still not being crawled. Are there any additional steps to
> follow when installing Selenium? Please advise.
>
>
> Thanks and Regards
>
> Raj Chidara
>
> ----- Original Message -----
> From: Markus Jelsma (markus.jel...@openindex.io)
> Date: 30-01-2023 16:26
> To: user@nutch.apache.org
> Subject: Re: Site is not crawling
>
> Hello Raj,
>
> I think the same question about the same site was asked here some time ago.
> Anyway, this site loads its content via JavaScript. You will need a
> protocol plugin that supports it, either protocol-htmlunit or
> protocol-selenium, instead of protocol-http or any other.
>
> Change the configuration for plugin.includes, and it should work.
>
> Markus
>
> On Mon, 30 Jan 2023 at 10:39, Raj Chidara <raj.chid...@ddismart.com> wrote:
>
> >
> > Hello,
> >
> >   Nutch is not able to crawl this site. Are any Nutch configuration
> > changes required for this site?
> >
> > https://www.ich.org/
> >
> >
> > Thanks and Regards
> >
> > Raj Chidara
> >
> >
> >
>
>
