I'm uploading .htm files to be extracted. Some of these files are "include" files that contain snippets of HTML rather than fully formed HTML documents.
solr-cell stores the raw HTML for these items rather than extracting the text. Is there any way I can get Solr to HTML-encode this content before storing it? At the moment, when the highlighted snippets are retrieved via search, I have to parse each snippet and HTML-encode the bits of HTML that were indexed, whilst *not* encoding the tags that were added by the highlighter, which is messy and time-consuming.

Thanks!
Mark
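
P.S. To illustrate the client-side step, here is a rough sketch of one possible workaround, not a fix on the Solr side. It assumes the highlighter can be told (for example through the `hl.simple.pre` and `hl.simple.post` parameters) to wrap matches in sentinel markers that never occur in the indexed content. The marker strings and the function name below are hypothetical, chosen only for this example.

```python
import html

# Hypothetical sentinel markers. Assumption: the highlighter is configured to
# emit these around matches instead of real tags such as <em>...</em>.
HL_PRE = "@@HL_START@@"
HL_POST = "@@HL_END@@"


def encode_snippet(snippet, open_tag="<em>", close_tag="</em>"):
    """HTML-encode a highlighted snippet while keeping the highlight markup.

    Everything that came from the index is escaped first; the sentinel
    markers contain no characters that html.escape touches, so they survive
    and can then be swapped for the real highlight tags.
    """
    escaped = html.escape(snippet)
    return escaped.replace(HL_PRE, open_tag).replace(HL_POST, close_tag)


if __name__ == "__main__":
    raw = '<div class="x">@@HL_START@@solr@@HL_END@@ rocks</div>'
    print(encode_snippet(raw))
```

This avoids parsing the snippet at all, but it only works if the sentinel strings can never appear in the stored documents.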