Look at HTMLStripCharFilter. It accepts HTML as its source text and strips the HTML tags only for tokenization into terms; the stored value keeps all of the HTML tags. So you can search on the actual text terms, and the HTML will still be in the returned field value for highlighting.

See:
http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.HTMLStripCharFilterFactory
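
As a rough sketch of what that could look like in schema.xml (the type name "text_html" is made up for illustration, the tokenizer and lowercase filter are just one common choice, and "content" is the field name from the question). This assumes the raw HTML is what actually reaches the field at index time:

```xml
<!-- Hypothetical field type: HTML tags are stripped only during analysis -->
<fieldType name="text_html" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <!-- Removes HTML markup from the character stream before tokenization -->
    <charFilter class="solr.HTMLStripCharFilterFactory"/>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

<!-- stored="true" keeps the original HTML in the returned field value -->
<field name="content" type="text_html" indexed="true" stored="true"/>
```

The char filter only affects the indexed terms, so the stored value is returned exactly as it was submitted.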

-- Jack Krupansky

-----Original Message----- From: Divyanand Tiwari
Sent: Monday, February 18, 2013 7:28 AM
To: solr-user@lucene.apache.org
Subject: How can i instruct the Solr/ Solr Cell to output the original HTML document which was fed to it.?

Hi everyone, I am new to Solr and cannot find a way to get back
the original HTML document with the hits highlighted in it. What
configuration do I need, and where, to instruct Solr Cell/Tika so that it
does not strip the HTML tags from the document in the content field?

Any support would be greatly appreciated.

Awaiting your quick reply.

Thank you!!!
--
Regards,
Divyanand Tiwari
