[ https://issues.apache.org/jira/browse/NUTCH-1458?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Julien Nioche resolved NUTCH-1458.
----------------------------------

    Resolution: Duplicate

> Support for raw HTML field added to Solr
> ----------------------------------------
>
>                 Key: NUTCH-1458
>                 URL: https://issues.apache.org/jira/browse/NUTCH-1458
>             Project: Nutch
>          Issue Type: New Feature
>          Components: indexer, parser
>    Affects Versions: 1.5.1
>            Reporter: Max Dzyuba
>              Labels: html, nutch, raw, solr
>             Fix For: 1.9
>
>
> At the moment, the “content” field holds only the parsed text from the page.
> It would be nice to have a separate field in Solr document that would hold
> raw HTML from the crawled page.

--
This message was sent by Atlassian JIRA
(v6.2#6252)