Interesting. First, an apology for an error in my e-book: it says that the enablePositionIncrements parameter of the stop filter defaults to "false", but it actually defaults to "true".

The question mark in your parsed query represents a position increment. With position increments enabled, the stop filter leaves a "hole" at the position of each stop word it removes, and the question mark is that hole. All bets are off as to what a phrase query does when the phrase starts with a hole. I think the basic idea is that there must be some term in the index at that position that can be "skipped".

In your case you don't want position increments, so add the enablePositionIncrements="false" parameter to the stop filter, and be sure to reindex your data.

This is actually a change in behavior, which occurred as a side effect of LUCENE-4963 in 4.4. The default for enablePositionIncrements was false, but that release changed it to true.

I suspect that I wrote that section of my e-book before 4.4 came out. Unfortunately, the change is not well documented: there is nothing in the Javadoc. This is another example of an underlying change in Lucene that impacts Solr users without being well highlighted for them. Sorry about that.

In any case, try adding enablePositionIncrements="false", reindex, and see what happens.
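
As a minimal sketch of that change, the stop filter in schema.xml might look something like the following. The field type name, the tokenizer, and the stopwords.txt file are assumptions for illustration and are not taken from your schema; the enablePositionIncrements attribute is the only part that matters here. Depending on your luceneMatchVersion, newer releases may refuse the "false" setting, so check this against the version you are running.

```xml
<!-- Hypothetical field type; only enablePositionIncrements="false" is the suggested change. -->
<fieldType name="text_url_words" class="solr.TextField">
  <analyzer>
    <!-- Tokenizer is an assumption; keep whatever your schema already uses. -->
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <!-- Remove stop words without leaving a positional "hole" behind. -->
    <filter class="solr.StopFilterFactory"
            ignoreCase="true"
            words="stopwords.txt"
            enablePositionIncrements="false"/>
  </analyzer>
</fieldType>
```

After reindexing, the parsedquery_toString output in the debug section should show whether the leading "?" is gone.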

-- Jack Krupansky

-----Original Message-----
From: heaven
Sent: Monday, August 25, 2014 3:37 AM
To: solr-user@lucene.apache.org
Subject: Re: Help with StopFilterFactory

A valid search:
http://pastie.org/pastes/9500661/text?key=rgqj5ivlgsbk1jxsudx9za
An invalid search:
http://pastie.org/pastes/9500662/text?key=b4zlh2oaxtikd8jvo5xaww

What I found weird is that the valid query has:
"parsedquery_toString": "+(url_words_ngram:\"twitter com zer0sleep\")"
And the invalid one has:
"parsedquery_toString": "+(url_words_ngram:\"? twitter com zer0sleep\")"

So the "https" part was replaced with a "?".



--
View this message in context: http://lucene.472066.n3.nabble.com/Help-with-StopFilterFactory-tp4153839p4154957.html
Sent from the Solr - User mailing list archive at Nabble.com.
