I think you will have to write an UpdateRequestProcessor to strip out the HTML tags.

http://wiki.apache.org/solr/UpdateRequestProcessor

As of Solr 4.0, you can also write update processors in scripting languages such as Python, Ruby, and JavaScript.
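
A rough sketch of the tag-stripping logic such a processor would need, written as standalone Python 3 using only the standard library. The names strip_html and _TextExtractor are made up for illustration, and the wiring into Solr's update chain is not shown; a script running inside Solr may need adjusting for the scripting engine in use. It also keeps the text inside script and style elements, so treat it as a starting point only.

```python
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Hypothetical helper: keeps text nodes and drops the tags around them."""

    def __init__(self):
        # convert_charrefs=True decodes entities such as &amp; into plain text
        super().__init__(convert_charrefs=True)
        self._chunks = []

    def handle_data(self, data):
        # Called for every run of text between tags
        self._chunks.append(data)

    def text(self):
        # Join the collected text and collapse runs of whitespace
        return " ".join("".join(self._chunks).split())


def strip_html(value):
    """Return the text content of an HTML fragment (illustrative name)."""
    parser = _TextExtractor()
    parser.feed(value)
    parser.close()  # flush any text still buffered by the parser
    return parser.text()
```

For example, strip_html("<p>Fast &amp; <b>light</b> laptop</p>") gives "Fast & light laptop", which is the value you would then put back into the description field before the document is indexed.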

-----Original Message-----
From: Pratyul Kapoor
Sent: Friday, October 26, 2012 3:56 AM
To: solr-user@lucene.apache.org
Subject: Filtering HTML content in Solr 4.0.0

Hi,

I am using Solr 4.0.0. I have HTML content as the description of a product.
If I index it without any filtering, I get errors on search.
How can I filter HTML content?

Pratyul
