I'm not sure if this is a bug, but it does break searches that work fine in 4.7.2 when we put the same config and index on 4.9.1.
Here's a slightly redacted bit of text that has been sent to the index and is also used as a phrase query:

RRR-COLECCION: COLECCIÓN: Gracita Morales foobar

Here are the final positions and terms that 4.7.2 yields for this on query analysis:

1 rrr-coleccion
1 rrr
2 coleccion
2 rrrcoleccion
3 coleccion
4 gracita
5 morales
6 foobar

This is what 4.9.1 does with it:

1 rrr-coleccion
2 rrr
2 coleccion
2 rrrcoleccion
3 coleccion
4 gracita
5 morales
6 foobar

In both versions, this is what the index analysis generates:

1 rrr
2 coleccion
3 coleccion
4 gracita
5 morales
6 bleh

Remember that it's a phrase query. As you can see, only the query analysis from 4.7.2 matches. I'm not an expert, but the 4.9.1 WDF position output seems wrong: "rrr" ends up at position 2 instead of position 1. The difference in these positions happens on the WordDelimiterFilter step.

I'm going to try my fieldType on the 5.2.1 example to see what it does and whether the problem has already been fixed. Unfortunately, due to a third-party component that has not been tested with anything newer, I cannot upgrade beyond 4.9.1 at this time.

This is the fieldType present in both versions. The 4.7 config has a luceneMatchVersion of LUCENE_47; the 4.9.1 config has LUCENE_4_9.
<fieldType name="genText" class="solr.TextField" sortMissingLast="true" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.ICUTokenizerFactory" rulefiles="Latn:Latin-break-only-on-whitespace.rbbi"/>
    <filter class="solr.PatternReplaceFilterFactory"
            pattern="^(\p{Punct}*)(.*?)(\p{Punct}*)$"
            replacement="$2" />
    <filter class="solr.WordDelimiterFilterFactory"
            splitOnCaseChange="1"
            splitOnNumerics="1"
            stemEnglishPossessive="1"
            generateWordParts="1"
            generateNumberParts="1"
            catenateWords="1"
            catenateNumbers="1"
            catenateAll="0"
            preserveOriginal="1" />
    <filter class="solr.ICUFoldingFilterFactory"/>
    <filter class="solr.CJKBigramFilterFactory" outputUnigrams="true"/>
    <filter class="solr.LengthFilterFactory" min="1" max="512"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.ICUTokenizerFactory" rulefiles="Latn:Latin-break-only-on-whitespace.rbbi"/>
    <filter class="solr.PatternReplaceFilterFactory"
            pattern="^(\p{Punct}*)(.*?)(\p{Punct}*)$"
            replacement="$2" />
    <filter class="solr.WordDelimiterFilterFactory"
            splitOnCaseChange="1"
            splitOnNumerics="1"
            stemEnglishPossessive="1"
            generateWordParts="1"
            generateNumberParts="1"
            catenateWords="0"
            catenateNumbers="0"
            catenateAll="0"
            preserveOriginal="0" />
    <filter class="solr.ICUFoldingFilterFactory"/>
    <filter class="solr.CJKBigramFilterFactory" outputUnigrams="false"/>
    <filter class="solr.LengthFilterFactory" min="1" max="512"/>
  </analyzer>
</fieldType>

Thanks,
Shawn
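
To make the position mismatch concrete, here is a minimal Python sketch of why the phrase stops matching. It is not Lucene code and only approximates phrase matching: it assumes that terms stacked at one query position act as alternatives, and it compares absolute positions rather than relative offsets. The names (stack_by_position, phrase_matches, QUERY_472, QUERY_491, INDEX) are made up for illustration. Position 6 is left out because the redacted final term differs between the listings above (foobar vs. bleh).

```python
# Illustrative model only, not Lucene's implementation.
# Assumption: terms sharing a query position are alternatives, and the
# phrase matches when every query position shares at least one term
# with the same index position.

def stack_by_position(tokens):
    """Group (position, term) pairs into {position: set of terms}."""
    stacked = {}
    for position, term in tokens:
        stacked.setdefault(position, set()).add(term)
    return stacked


def phrase_matches(query_tokens, index_tokens):
    """Return True if each query position overlaps the same index position."""
    query = stack_by_position(query_tokens)
    index = stack_by_position(index_tokens)
    return all(query[pos] & index.get(pos, set()) for pos in query)


# Positions 1-5 copied from the analysis output listed in the post.
QUERY_472 = [(1, "rrr-coleccion"), (1, "rrr"), (2, "coleccion"),
             (2, "rrrcoleccion"), (3, "coleccion"), (4, "gracita"),
             (5, "morales")]
QUERY_491 = [(1, "rrr-coleccion"), (2, "rrr"), (2, "coleccion"),
             (2, "rrrcoleccion"), (3, "coleccion"), (4, "gracita"),
             (5, "morales")]
INDEX = [(1, "rrr"), (2, "coleccion"), (3, "coleccion"),
         (4, "gracita"), (5, "morales")]

print(phrase_matches(QUERY_472, INDEX))  # True
print(phrase_matches(QUERY_491, INDEX))  # False
```

With the 4.9.1 positions, the only term left at position 1 is "rrr-coleccion", which the index side never has there, so under these assumptions the phrase cannot line up.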