SolrJ and HTMLStripCharFilterFactory

Indika Tantrigoda Fri, 26 Mar 2010 22:13:55 -0700

Hello to all,

I've been working with Solr for a few weeks and have gotten indexing and
searching to work.
However, I am having trouble indexing HTML content using
HTMLStripCharFilterFactory.


My schema.xml looks like this:

  <fieldType name="text" class="solr.TextField" positionIncrementGap="100">
      <analyzer type="index">
         <tokenizer class="solr.HTMLStripWhitespaceTokenizerFactory"/>
      ------
  --------/>

I am indexing the HTML content using SolrJ as the client (with Spring
as the framework).

However, when I search for all documents, the HTML markup is still present
in my text field.

But when I run an analysis in the Solr admin panel with HTML content, it
shows the tokens extracted properly, with the HTML tags removed.
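
For reference, the char-filter form named in the subject line would look
roughly like the sketch below. It is an illustration under assumptions, not
the schema quoted above: the field type name text_html is hypothetical, and
the sketch pairs solr.HTMLStripCharFilterFactory with
solr.WhitespaceTokenizerFactory.

  <!-- Hypothetical field type: strip HTML in a char filter, then tokenize on whitespace -->
  <fieldType name="text_html" class="solr.TextField" positionIncrementGap="100">
      <analyzer type="index">
         <!-- Removes HTML markup from the character stream before tokenization -->
         <charFilter class="solr.HTMLStripCharFilterFactory"/>
         <tokenizer class="solr.WhitespaceTokenizerFactory"/>
      </analyzer>
  </fieldType>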

I found a similar issue at
http://www.mail-archive.com/solr-user@lucene.apache.org/msg28736.html
but I am still unable to get it working. I am using Solr 1.4.

Any help regarding this is much appreciated.

Thanks in advance.

Regards,
Indika
