Hi, please help me figure out what's going on. I have the following field type:

<fieldType name="words_ngram" class="solr.TextField" omitNorms="false">
  <analyzer type="index">
    <tokenizer class="solr.PatternTokenizerFactory" pattern="[^\d\w]+" />
    <filter class="solr.StopFilterFactory" words="url_stopwords.txt" ignoreCase="true" />
    <filter class="solr.LowerCaseFilterFactory" />
    <filter class="solr.EdgeNGramFilterFactory" minGramSize="2" maxGramSize="20" />
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.PatternTokenizerFactory" pattern="[^\d\w]+" />
    <filter class="solr.StopFilterFactory" words="url_stopwords.txt" ignoreCase="true" />
    <filter class="solr.LowerCaseFilterFactory" />
  </analyzer>
</fieldType>

And the following string is indexed:
http://plus.google.com/111950520904110959061/profile

Here is what the analyzer shows:
http://img607.imageshack.us/img607/5074/fn1.png

Then I run the following query:
fq=type:Site&
sort=score desc&
q=https\\:\\/\\/plus.google.com\\/111950520904110959061\\/profile&
fl=* score&
qf=url_words_ngram&
defType=edismax&
start=0&
rows=20&
mm=1

And I get no results.

These queries do match:
1. https://plus.google
2. https://plus.google.com
3. 11195052090

And these do not:
1. https://plus.google.com/111950520904110959061/profile
2. 111950520904110959061/profile
3. 111950520904110959061

The reason is that "111950520904110959061" is 21 characters long, while my max
gram size is set to 20. I tried increasing the max gram size to 200 and it works,
but is there any way to match this query without doing that? The query analyzer
shows exact matches at the PT, SF and LCF stages. Or does it work such that the
index contains only the output of the last filter factory (ENGTF in my
example)? If so, is there an option to preserve the original tokens as well?
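
To make my reasoning concrete, here is a rough Python sketch of what I believe
the index analyzer produces. It is only a simulation, not Solr code: the helper
names (tokenize, edge_ngrams, token_hits) are made up, and it skips the stop
filter because the contents of url_stopwords.txt are not shown here. It checks
each query token against the indexed terms and says nothing about how edismax
combines the tokens.

```python
import re

def tokenize(text):
    # Split on runs of non-word characters, like pattern="[^\d\w]+",
    # then lowercase (the stop filter is deliberately left out).
    return [t.lower() for t in re.split(r"[^\d\w]+", text) if t]

def edge_ngrams(token, min_gram=2, max_gram=20):
    # Leading-edge grams from min_gram up to max_gram characters;
    # anything beyond max_gram characters is never emitted.
    upper = min(len(token), max_gram)
    return [token[:n] for n in range(min_gram, upper + 1)]

indexed = "http://plus.google.com/111950520904110959061/profile"
index_terms = {g for tok in tokenize(indexed) for g in edge_ngrams(tok)}

def token_hits(query):
    # The query analyzer has no n-gram filter, so each query token
    # must be equal to one of the indexed terms.
    return {tok: tok in index_terms for tok in tokenize(query)}

print(token_hits("11195052090"))
print(token_hits("111950520904110959061/profile"))
```

In this sketch the 11-digit prefix is found among the indexed terms, while the
full 21-digit token is not, because the longest gram kept for it is 20
characters.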

So that for minGramSize="1", maxGramSize="5" and the indexed string "awesomeness" I'd have:
"a", "aw", "awe", "awes", "aweso", "awesomeness"

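In other words, the behaviour I am after could be sketched like this in Python.
This is purely an illustration of the desired output, not an existing Solr
option as far as I know; the function name edge_ngrams_keep_original is
hypothetical.

```python
def edge_ngrams_keep_original(token, min_gram=1, max_gram=5):
    # Usual leading-edge grams, capped at max_gram characters.
    upper = min(len(token), max_gram)
    grams = [token[:n] for n in range(min_gram, upper + 1)]
    # Additionally keep the full token when it is longer than max_gram,
    # so an exact match on the original token remains possible.
    if len(token) > max_gram:
        grams.append(token)
    return grams

print(edge_ngrams_keep_original("awesomeness"))
# ['a', 'aw', 'awe', 'awes', 'aweso', 'awesomeness']
```
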
Best,
Alex



