The char filter is not broken. There's a bug in 4.7 in the analysis UI:

https://issues.apache.org/jira/browse/SOLR-5800

It was unclear to me whether the fix would be part of a 4.7.1 release.
I hope so, as it would probably save people a lot of time otherwise
spent thinking their analyzers are broken.
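
One way to check the analysis chain without going through the UI is to
query the field analysis request handler directly. The following is only
a rough sketch: it assumes the handler is registered at /analysis/field
in solrconfig.xml (as in the stock example config), and the host, port,
core name "collection1" and the sample HTML input are placeholders, not
values taken from this thread.

```python
from urllib.parse import urlencode


def field_analysis_url(base_url, core, field_type, text):
    """Build a URL for Solr's field analysis handler.

    Assumes the handler is mapped to /analysis/field in solrconfig.xml.
    """
    params = {
        "analysis.fieldtype": field_type,  # field type whose analyzers to run
        "analysis.fieldvalue": text,       # text fed to the index-time chain
        "wt": "json",                      # response format
    }
    return f"{base_url}/{core}/analysis/field?{urlencode(params)}"


if __name__ == "__main__":
    # Hypothetical local instance and core name; adjust for your setup.
    print(field_analysis_url(
        "http://localhost:8983/solr",
        "collection1",
        "text_en_splitting",
        "<b>fox</b> jumps",
    ))
```

Opening the printed URL in a browser or with curl should list the output
of each analysis stage. If the markup is already gone after the
HTMLStripCharFilterFactory stage there, the char filter itself is doing
its job and only the UI rendering is affected.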


Sent from my Windows Phone

From: Dmitry Kan
Sent: 3/15/2014 1:58 PM
To: solr-user@lucene.apache.org
Subject: [solr 4.7.0] analysis page: issue with HTMLStripCharFilterFactory

Hello,

The following type does not get analyzed properly on the solr 4.7.0
analysis page:

    <fieldType name="text_en_splitting" class="solr.TextField"
               positionIncrementGap="100" autoGeneratePhraseQueries="true">
      <analyzer type="index">
        <charFilter class="solr.HTMLStripCharFilterFactory"/>
        <!-- <tokenizer class="solr.WhitespaceTokenizerFactory"/> -->
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.StopFilterFactory"
                ignoreCase="true"
                words="lang/stopwords_en.txt"/>
        <filter class="solr.WordDelimiterFilterFactory"
                generateWordParts="1" generateNumberParts="1"
                catenateWords="1" catenateNumbers="1" catenateAll="0"
                splitOnCaseChange="1"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.KeywordMarkerFilterFactory"
                protected="protwords.txt"/>
        <filter class="solr.PorterStemFilterFactory"/>
      </analyzer>
      <analyzer type="query">
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
                ignoreCase="true" expand="true"/>
        <filter class="solr.StopFilterFactory"
                ignoreCase="true"
                words="lang/stopwords_en.txt"/>
        <filter class="solr.WordDelimiterFilterFactory"
                generateWordParts="1" generateNumberParts="1"
                catenateWords="0" catenateNumbers="0" catenateAll="0"
                splitOnCaseChange="1"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.KeywordMarkerFilterFactory"
                protected="protwords.txt"/>
        <filter class="solr.PorterStemFilterFactory"/>
      </analyzer>
    </fieldType>

Example text:
fox jumps

Screenshot:
http://pbrd.co/1lEVEIa

This works fine in solr 4.6.1.

-- 
Dmitry
Blog: http://dmitrykan.blogspot.com
Twitter: http://twitter.com/dmitrykan
