Solr 4.4, enablePositionIncrements=true and PhraseQueries

Ronald K. Braun Wed, 21 Aug 2013 13:38:49 -0700

Hello,

I'm working on an upgrade from Solr 1.4.1 to 4.4.  One of my field
analyzers uses a stop word filter (StopFilterFactory), and as of 4.4 it is
no longer permitted to set enablePositionIncrements to false there.  As a
consequence, some hand-constructed phrase queries (basically generated via
calls to SolrPluginUtils.parseQueryStrings on field:value text snippets)
now seem to fail where they worked under 1.4.1, because (I think) of the
"gaps" that stop word removal leaves in the phrase query positions.


By way of example, I have indexed text of the form "Old Ones" and query
text of the form "The Old Ones".  Debug output shows my phrase query being
generated as field:"? Old Ones", which does not seem to match the indexed
source text "Old Ones", presumably because there is no initial token to
"fill the gap".

With enablePositionIncrements set to false (tested by temporarily setting
luceneMatchVersion to LUCENE_43 in solrconfig.xml to bypass the 4.4
restriction), it does what I expect (and what 1.4.1 does): it ignores the
stop words outright and generates the query field:"Old Ones", which
matches my source text.

Is there a way to configure phrase queries to ignore gaps, or otherwise
ignore positioning information for missing/removed tokens?  Fiddling with
slop is not a viable option -- I need exact sequential matching on my
token sequences apart from stopword presence.  One workaround that
occurred to me is adding a position normalizer filter that resets the term
positions to be sequential, but I'm hoping there is some other
configuration option that restores backwards-compatible phrase matching,
given the neutering of enablePositionIncrements.
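
To make the position normalizer idea concrete, here is a minimal sketch of
what I have in mind.  It is plain Java, not Lucene code: the Token class and
the normalize/positions methods are hypothetical stand-ins I made up for
illustration.  I assume a real version would extend Lucene's TokenFilter and
adjust PositionIncrementAttribute in much the same way, but I have not
written or tested that.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class PositionNormalizerSketch {

    /** Hypothetical stand-in for an analyzed token: term text plus position increment. */
    static final class Token {
        final String term;
        final int positionIncrement;

        Token(String term, int positionIncrement) {
            this.term = term;
            this.positionIncrement = positionIncrement;
        }
    }

    /**
     * Clamps every increment greater than 1 down to 1, which closes the gaps
     * left by removed stop words. Increments of 0 or 1 are left untouched.
     */
    static List<Token> normalize(List<Token> tokens) {
        List<Token> out = new ArrayList<>();
        for (Token t : tokens) {
            int inc = t.positionIncrement > 1 ? 1 : t.positionIncrement;
            out.add(new Token(t.term, inc));
        }
        return out;
    }

    /** Turns increments into absolute positions, counting from -1 so the first token lands on 0. */
    static List<Integer> positions(List<Token> tokens) {
        List<Integer> result = new ArrayList<>();
        int pos = -1;
        for (Token t : tokens) {
            pos += t.positionIncrement;
            result.add(pos);
        }
        return result;
    }

    public static void main(String[] args) {
        // "The Old Ones" after stop word removal with position increments kept:
        // "the" is gone, so "old" carries an increment of 2.
        List<Token> analyzed = Arrays.asList(new Token("old", 2), new Token("ones", 1));

        System.out.println(positions(analyzed));            // [1, 2]
        System.out.println(positions(normalize(analyzed))); // [0, 1]
    }
}
```

Run over the query side, this would turn the positions behind
field:"? Old Ones" back into the sequential ones behind field:"Old Ones",
which is the 1.4.1 behavior I am after.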

Thanks!

Ron
