Hello Markus,
  Sorry for the duplicate question.  I added the selenium plugin in
conf/nutch-default.xml and included the following:

<name>plugin.includes</name>
  
<value>protocol-http|protocol-httpclient|protocol-selenium|urlfilter-(regex|validator)|parse-(html|tika)|index-(basic|anchor)|indexer-solr|scoring-opic|urlnormalizer-(pass|regex|basic)</value>

The site is still not being crawled.  Are there any additional steps required
to install selenium?  Please advise.


Thanks and Regards

Raj Chidara

----- Original Message -----
From: Markus Jelsma (markus.jel...@openindex.io)
Date: 30-01-2023 16:26
To: user@nutch.apache.org
Subject: Re: Site is not crawling

Hello Raj,

I think the same question about the same site was asked here some time ago.
Anyway, this site loads its content via JavaScript. You will need a
protocol plugin that supports it, either protocol-htmlunit, or
protocol-selenium, instead of protocol-http or any other.

Change the configuration for plugin.includes, and it should work.
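
A minimal sketch of that change, assuming the override goes inside the
<configuration> element of conf/nutch-site.xml (the usual place for local
settings, so that nutch-default.xml stays untouched). The plugin list is only
an illustration built from the value quoted in this thread; the point is that
protocol-selenium takes the place of protocol-http and protocol-httpclient
rather than being listed next to them.

```xml
<!-- conf/nutch-site.xml: values here override those in nutch-default.xml -->
<property>
  <name>plugin.includes</name>
  <!-- protocol-selenium replaces protocol-http and protocol-httpclient -->
  <value>protocol-selenium|urlfilter-(regex|validator)|parse-(html|tika)|index-(basic|anchor)|indexer-solr|scoring-opic|urlnormalizer-(pass|regex|basic)</value>
</property>
```

The Selenium-based plugin also needs a browser and driver it can start on the
crawl machine; the plugin's own README is the place to check for the exact
setup steps and driver settings.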

Markus

On Mon, 30 Jan 2023 at 10:39, Raj Chidara <raj.chid...@ddismart.com> wrote:

>
> Hello,
>
>   Nutch is not able to crawl this site.  Are there any Nutch configuration
> changes required for this site?
>
> https://www.ich.org/
>
>
> Thanks and Regards
>
> Raj Chidara
>
>
>
