Hello Markus

Now I have removed all the other protocol-* plugins and kept only protocol-selenium.
This time it crawled a few pages. However, no content is read from the pages: all
pages show only the text "Home".

Thanks and Regards

Raj Chidara

---- On Mon, 30 Jan 2023 18:35:06 +0530 Markus Jelsma <markus.jel...@openindex.io> wrote ----

Yes, remove the other protocol-* plugins from the configuration. With all
three active, it is not deterministic which one is going to do the work.
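A minimal sketch of that change, assuming the override is placed in conf/nutch-site.xml rather than conf/nutch-default.xml (nutch-default.xml itself asks not to be edited directly; entries copied into nutch-site.xml take precedence). The plugin list is the one from the configuration quoted further down, with protocol-http and protocol-httpclient removed so that protocol-selenium is the only protocol plugin left. Any browser or driver settings protocol-selenium needs are not shown here and depend on the Nutch version in use.

```xml
<?xml version="1.0"?>
<!-- conf/nutch-site.xml: site-specific overrides of nutch-default.xml -->
<configuration>
  <property>
    <name>plugin.includes</name>
    <!-- protocol-selenium is the only protocol-* plugin; protocol-http and
         protocol-httpclient were removed so no other plugin can take over fetching -->
    <value>protocol-selenium|urlfilter-(regex|validator)|parse-(html|tika)|index-(basic|anchor)|indexer-solr|scoring-opic|urlnormalizer-(pass|regex|basic)</value>
  </property>
</configuration>
```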
 
On Mon, 30 Jan 2023 at 12:50, Raj Chidara
<mailto:raj.chid...@ddismart.com> wrote:
 
> 
> Hello Markus,
> Sorry for the duplicate question. I added the selenium plugin in
> conf/nutch-default.xml and included the following:
> 
> <property>
>   <name>plugin.includes</name>
>   <value>protocol-http|protocol-httpclient|protocol-selenium|urlfilter-(regex|validator)|parse-(html|tika)|index-(basic|anchor)|indexer-solr|scoring-opic|urlnormalizer-(pass|regex|basic)</value>
> </property>
>  
> 
> The site is still not being crawled. Are there any additional steps to
> follow to install Selenium? Please advise.
> 
> 
> Thanks and Regards 
> 
> Raj Chidara 
> 
> ----- Original Message ----- 
> From: Markus Jelsma (mailto:markus.jel...@openindex.io) 
> Date: 30-01-2023 16:26 
> To: mailto:user@nutch.apache.org 
> Subject: Re: Site is not crawling
> 
> Hello Raj, 
> 
> I think the same question about the same site was asked here some time ago.
> Anyway, this site loads its content via JavaScript. You will need a
> protocol plugin that supports it, either protocol-htmlunit or
> protocol-selenium, instead of protocol-http or any other.
> 
> Change the configuration for plugin.includes, and it should work. 
> 
> Markus 
> 
> On Mon, 30 Jan 2023 at 10:39, Raj Chidara
> <mailto:raj.chid...@ddismart.com> wrote:
> 
> > 
> > Hello, 
> > 
> > Nutch is not able to crawl this site. Are any Nutch configuration
> > changes required for this site?
> > 
> > https://www.ich.org/ 
> > 
> > 
> > Thanks and Regards 
> > 
> > Raj Chidara 
> > 
> > 
> > 
> 
>
