The HTMLStripCharFilter strips the HTML only from the *indexed* terms;
it does not affect the *stored* field.
If you don't want HTML in the stored field, can you just strip it out
before passing it to Solr?
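Something like the following is a minimal sketch of stripping on the client side. The class name StoredFieldCleaner and the method stripHtml are made up for illustration, and the regex is a naive tag matcher (it does not decode entities or handle comments and scripts), so a real HTML parser is probably a better choice for anything beyond simple markup.

```java
import java.util.regex.Pattern;

// Hypothetical helper, not part of Solr or SolrJ.
public class StoredFieldCleaner {
    // Naive tag matcher: anything between '<' and the next '>'.
    private static final Pattern TAG = Pattern.compile("<[^>]*>");

    /** Returns the input with tags removed and runs of whitespace collapsed. */
    public static String stripHtml(String html) {
        String noTags = TAG.matcher(html).replaceAll(" ");
        return noTags.replaceAll("\\s+", " ").trim();
    }

    public static void main(String[] args) {
        String raw = "<center>content</center>";
        String clean = stripHtml(raw);
        System.out.println(clean);
        // With SolrJ on the classpath, the cleaned value would then be
        // added to the document instead of the raw markup, e.g.:
        //   SolrInputDocument doc = new SolrInputDocument();
        //   doc.addField("text", clean);
    }
}
```

The stored field then holds only the cleaned text, and the char filter on the field still takes care of anything that slips through at index time.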
On Nov 11, 2009, at 8:07 PM, aseem cheema wrote:
Hey Guys,
How do I add HTML/XML documents using SolrJ such that they do not
bypass the HTML char filter?
SolrJ escapes the HTML/XML value of a field, and that makes it bypass
the HTML char filter. For example, <center>content</center>, if added
using SolrJ to a field with HTMLStripCharFilter on it, is not
stripped of the center tags. But if I check in analysis.jsp, it does get
stripped. When I look at the SolrJ XML feed, it looks like this:
<add><doc boost="1.0"><field name="id">http://haha.com</field><field
name="text">&lt;center&gt;content&lt;/center&gt;</field></doc></add>
Any help is highly appreciated. Thanks.
--
Aseem