I searched Google for this but found nothing useful, so I'd appreciate any help. Is there a way to parse specific HTML tags while crawling with Nutch and then index them into Solr?
For example, I don't want the whole HTML page to go into the content field. I would like the h1 and h2 tags parsed into separate fields.

Vishal Sharma
TL, SFDC
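One possible direction, offered as a sketch rather than a confirmed answer: in Nutch 1.x the parse-html plugin hands each `HtmlParseFilter` a DOM `DocumentFragment` of the page, so a custom filter could walk that DOM and collect the text of the wanted heading tags. The core of that walk is shown below using only the JDK's `org.w3c.dom` API. The class name `HeadingExtractor` and the helpers `extractHeadings` and `parseXhtml` are hypothetical, and `parseXhtml` exists only to build a DOM from well-formed test markup (real crawled HTML would come from Nutch's own parser, not from this method).

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

public class HeadingExtractor {

    /** Collects the text of every element whose tag is in wantedTags, grouped by tag. */
    public static Map<String, List<String>> extractHeadings(Node root, List<String> wantedTags) {
        Map<String, List<String>> result = new LinkedHashMap<>();
        for (String tag : wantedTags) {
            result.put(tag, new ArrayList<>());
        }
        walk(root, wantedTags, result);
        return result;
    }

    // Depth-first traversal of the DOM tree.
    private static void walk(Node node, List<String> wantedTags, Map<String, List<String>> result) {
        if (node.getNodeType() == Node.ELEMENT_NODE) {
            // Some HTML parsers report upper-case tag names (e.g. "H1"), so normalise.
            String name = node.getNodeName().toLowerCase();
            if (wantedTags.contains(name)) {
                // Collapse runs of whitespace so each heading becomes one clean string.
                String text = node.getTextContent().trim().replaceAll("\\s+", " ");
                if (!text.isEmpty()) {
                    result.get(name).add(text);
                }
            }
        }
        NodeList children = node.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            walk(children.item(i), wantedTags, result);
        }
    }

    /** Test helper only: parses well-formed XHTML with the JDK XML parser. */
    public static Document parseXhtml(String xhtml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xhtml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse markup", e);
        }
    }

    public static void main(String[] args) {
        Document doc = parseXhtml("<html><body><h1>Welcome</h1><p>Intro</p>"
                + "<h2>Features</h2><div><h2>  Pricing\n plans </h2></div></body></html>");
        System.out.println(extractHeadings(doc, List.of("h1", "h2")));
        // prints {h1=[Welcome], h2=[Features, Pricing plans]}
    }
}
```

Inside a real Nutch plugin, the resulting lists would presumably be stored in the parse metadata, and a companion indexing filter would copy them into Solr fields (for example, hypothetical `h1` and `h2` fields, which would also need to be declared in the Solr schema). The plugin descriptor and that indexing step are not shown here.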

