HTML encode extracted docs

Mark Roberts Mon, 08 Mar 2010 05:50:46 -0800

I'm uploading .htm files to be extracted. Some of these files are "include" 
files that contain snippets of HTML rather than fully formed HTML documents.


Solr Cell stores the raw HTML for these items, rather than extracting the text. 
Is there any way I can get Solr to HTML encode this content before storing it?

At the moment, the problem is that when the highlighted snippets are 
retrieved via search, I need to parse each snippet and HTML encode the bits of 
HTML that were indexed, whilst *not* encoding the bits that were added by the 
highlighter, which is messy and time consuming.
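
To make the post-processing step concrete, here is a minimal sketch in Python of one way to avoid parsing the snippet. It assumes the highlighter can be told to wrap matches in unique placeholder tokens instead of real tags (Solr's hl.simple.pre and hl.simple.post request parameters allow this). The token values and the encode_snippet helper are hypothetical names for illustration, not part of Solr.

```python
import html

# Hypothetical placeholder tokens, assumed never to occur in indexed content.
# They would be passed to Solr as hl.simple.pre and hl.simple.post.
HL_PRE = "@@HL_START@@"
HL_POST = "@@HL_END@@"


def encode_snippet(snippet, open_tag="<em>", close_tag="</em>"):
    """HTML encode a highlighted snippet, then restore the highlight markup.

    Everything that came from the index is escaped; only the placeholder
    tokens inserted by the highlighter are turned back into real tags.
    """
    escaped = html.escape(snippet, quote=True)
    return escaped.replace(HL_PRE, open_tag).replace(HL_POST, close_tag)


if __name__ == "__main__":
    raw = '<div class="x">@@HL_START@@solr@@HL_END@@ cell</div>'
    print(encode_snippet(raw))
```

Because the tokens contain no characters that html.escape touches, they survive the encoding pass untouched and can be swapped for tags afterwards. This only helps on the retrieval side; it does not change what gets stored.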

Thanks!
Mark
