How to parse specific html tag in nutch+solr while crawling

Vishal Sharma Thu, 27 Nov 2014 08:59:31 -0800

I searched Google for this but found nothing useful. I would appreciate any help.

Is there a way to parse specific HTML tags while crawling with Nutch and
then index them into Solr?


For example, I don't want the whole HTML page to go into the content node.
I would like to parse the h1 and h2 tags into separate nodes.
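
To make the goal concrete, here is a minimal sketch of the extraction I have
in mind. It is plain Python using only the standard library, not Nutch or
Solr code; the class name HeadingExtractor, the function extract_headings,
and the output keys "h1" and "h2" are hypothetical names chosen for
illustration. It only shows the kind of per-tag output I would like to end
up with as separate nodes instead of one content blob.

```python
from html.parser import HTMLParser


class HeadingExtractor(HTMLParser):
    """Collects the text of <h1> and <h2> elements into separate lists.

    Illustration only: the keys "h1" and "h2" are hypothetical field names.
    """

    def __init__(self):
        super().__init__()
        self.fields = {"h1": [], "h2": []}
        self._current = None  # heading tag that is currently open, if any
        self._buffer = []     # text fragments seen inside that heading

    def handle_starttag(self, tag, attrs):
        # Start buffering when an h1 or h2 opens; other tags are ignored.
        if tag in self.fields:
            self._current = tag
            self._buffer = []

    def handle_data(self, data):
        # Keep text only while inside an open heading.
        if self._current is not None:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        # On the matching close tag, normalize whitespace and store the text.
        if tag == self._current:
            text = " ".join("".join(self._buffer).split())
            if text:
                self.fields[tag].append(text)
            self._current = None


def extract_headings(html):
    """Return a dict mapping "h1" and "h2" to lists of heading texts."""
    parser = HeadingExtractor()
    parser.feed(html)
    parser.close()
    return parser.fields
```

For a page with one h1 and two h2 elements, extract_headings would return
one list per tag, and each list is what I would want indexed separately from
the main page content.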



