Hi All,

I am trying to index some text that contains embedded HTML elements and am
getting the following error:

FATAL: Solr returned an error #400 Unexpected close tag </PROD_DESC>;
expected </br>.

My set up is as follows:

schema.xml:

<fieldType name="html_text" class="solr.TextField" indexed="true">
  <analyzer>
    <charFilter class="solr.HTMLStripCharFilterFactory"/>
    <tokenizer class="solr.StandardTokenizerFactory"/>
  </analyzer>
</fieldType>

<field name="PROD_DESC" type="html_text" indexed="false" stored="true"/>
<field name="DOCUMENTID" type="string" indexed="true" stored="true" required="true"/>

XML snippet:
<PRODUCT><DOCUMENTID>1</DOCUMENTID><PROD_DESC>Bose's best bookshelf speakers
are updated to provide an even more spacious, natural listening experience.
They're great for stereo, or as a front- or rear-channel solution for home
theater.<br><br>Learn more about Bose products and proprietary technologies
in our Bose Store.</PROD_DESC></PRODUCT>

According to the documentation, HTMLStripCharFilterFactory should strip the
<br> tags.

Is there anything I am missing?
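A possible cause, offered as a guess rather than a confirmed diagnosis: HTMLStripCharFilterFactory only runs during analysis, after Solr has already parsed the XML update message. An unclosed <br> makes that message malformed XML, which would match the "Unexpected close tag </PROD_DESC>; expected </br>" error. The sketch below assumes the same PRODUCT/PROD_DESC structure is what actually gets posted, and wraps the description in a CDATA section so the XML parser passes the markup through as plain text:

```xml
<!-- CDATA keeps the XML parser from treating <br> as an element; the text,
     tags included, reaches Solr as the literal PROD_DESC value -->
<PRODUCT><DOCUMENTID>1</DOCUMENTID><PROD_DESC><![CDATA[Bose's best bookshelf speakers
are updated to provide an even more spacious, natural listening experience.
They're great for stereo, or as a front- or rear-channel solution for home
theater.<br><br>Learn more about Bose products and proprietary technologies
in our Bose Store.]]></PROD_DESC></PRODUCT>
```

Escaping each tag as &lt;br&gt; instead of using CDATA should have the same effect. Two further points may be worth checking. First, the char filter only changes the indexed tokens, never the stored value, so the stored PROD_DESC will still contain the <br> tags. Second, because PROD_DESC is declared with indexed="false", the analyzer (and therefore the char filter) would not run on that field at all.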


--
View this message in context: 
http://lucene.472066.n3.nabble.com/HTML-Indexing-error-tp3918174p3918174.html
Sent from the Solr - User mailing list archive at Nabble.com.
