I searched Google for this as well but found nothing useful. I would appreciate any help.

Is there a way to parse specific HTML tags while crawling with
Nutch and then index them into Solr?

For example, I don't want the whole HTML page to go into the content node. I would
like to parse the h1 and h2 tags into separate nodes.
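
To make the desired result concrete, here is a minimal sketch in plain Python of the kind of splitting I mean. It is not Nutch code and does not use any Nutch or Solr API; the class name, the function name, and the field names "h1" and "h2" are only placeholders I made up for illustration.

```python
from html.parser import HTMLParser


class HeadingExtractor(HTMLParser):
    """Collects the text of <h1> and <h2> elements into separate lists."""

    def __init__(self):
        super().__init__()
        # Hypothetical field names; one list of strings per heading level.
        self.fields = {"h1": [], "h2": []}
        self._current = None  # heading tag currently being read, if any
        self._buffer = []     # text pieces seen inside that heading

    def handle_starttag(self, tag, attrs):
        if tag in self.fields:
            self._current = tag
            self._buffer = []

    def handle_data(self, data):
        if self._current is not None:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == self._current:
            # Join the pieces and collapse runs of whitespace.
            text = " ".join("".join(self._buffer).split())
            self.fields[tag].append(text)
            self._current = None


def extract_headings(html):
    """Return a dict mapping 'h1' and 'h2' to the heading texts found."""
    parser = HeadingExtractor()
    parser.feed(html)
    parser.close()
    return parser.fields


sample = (
    "<html><body><h1>Title</h1><p>Body text</p>"
    "<h2>Sub <em>one</em></h2></body></html>"
)
print(extract_headings(sample))
```

What I am looking for is the equivalent of this inside the Nutch parse step, so that each heading level ends up in its own field in Solr instead of being merged into content.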



Vishal Sharma
