Tokenization at query time

Andrea Gazzarini Mon, 12 Aug 2013 02:14:50 -0700

Hi all,
I have a field (among others) in my schema defined like this:


<fieldtype name="mytype" class="solr.TextField" positionIncrementGap="100">
    <analyzer>
        <tokenizer class="solr.KeywordTokenizerFactory" />
        <filter class="solr.LowerCaseFilterFactory" />
        <filter class="solr.WordDelimiterFilterFactory"
            generateWordParts="0"
            generateNumberParts="0"
            catenateWords="0"
            catenateNumbers="0"
            catenateAll="1"
            splitOnCaseChange="0" />
    </analyzer>
</fieldtype>

<field name="myfield" type="mytype" indexed="true"/>

Basically, both at index and query time the field value is normalized like this:


Mag. 778 G 69 => mag778g69
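The normalization above can be approximated in a few lines of Python. This is a simplified model, not Solr code: `normalize` is a hypothetical helper that mimics KeywordTokenizer (the whole value is one token), LowerCaseFilter, and WordDelimiterFilter with only catenateAll=1 (split into alphanumeric parts, then emit just their concatenation). It assumes ASCII input and ignores WordDelimiterFilter details such as possessive stripping and its exact character classes.

```python
import re


def normalize(value: str) -> str:
    """Hypothetical approximation of the 'mytype' analyzer chain (ASCII only)."""
    # KeywordTokenizer: the entire field value is a single token.
    token = value
    # LowerCaseFilter.
    token = token.lower()
    # WordDelimiterFilter with generate*Parts=0 and catenateAll=1:
    # split on non-alphanumeric characters and letter/digit boundaries,
    # then emit only the concatenation of all parts.
    parts = re.findall(r"[a-z]+|[0-9]+", token)
    return "".join(parts)


print(normalize("Mag. 778 G 69"))  # mag778g69
```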

Now, in my solrconfig I'm using a search handler like this:

<requestHandler ....>
    ...
    <str name="defType">dismax</str>
    ...
    <str name="mm">100%</str>
    <str name="qf">myfield^3000</str>
    <str name="pf">myfield^30000</str>

</requestHandler>
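For reference, `mm=100%` means that every optional clause produced from the query string must match. The sketch below is a simplified model of how a percentage `mm` value turns into a number of required clauses: positive percentages round down, and negative ones give the share of clauses allowed to be missing. `required_clauses` is a hypothetical name, and the other `mm` forms (plain integers, conditional specs such as `3<90%`) are not modeled.

```python
def required_clauses(mm: str, optional_clauses: int) -> int:
    """Simplified model of a percentage 'mm' value such as '100%' or '-25%'."""
    if not mm.endswith("%"):
        raise ValueError("only percentage specs are modeled here")
    percent = int(mm[:-1])
    if percent >= 0:
        # Positive percentage: that share of the clauses, rounded down.
        return optional_clauses * percent // 100
    # Negative percentage: that share (rounded down) may be missing.
    return optional_clauses - optional_clauses * (-percent) // 100


# With four query chunks, 100% requires all four clauses to match.
print(required_clauses("100%", 4))  # 4
```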

What I'm expecting is that if I index a document with a value for myfield "Mag. 778 G 69", I will be able to retrieve this document by querying any of the following:


1. Mag. 778 G 69
2. mag 778 g69
3. mag778g69
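The expectation rests on the fact that, if each query string reached the analyzer as a single value, all three forms would collapse to the indexed term. This hypothetical check reuses a simplified, ASCII-only approximation of the analyzer chain (not Solr code) to illustrate it.

```python
import re


def normalize(value: str) -> str:
    # Simplified model of KeywordTokenizer + LowerCase + WordDelimiter(catenateAll=1).
    return "".join(re.findall(r"[a-z0-9]+", value.lower()))


indexed_term = normalize("Mag. 778 G 69")
for query in ["Mag. 778 G 69", "mag 778 g69", "mag778g69"]:
    # Analyzed as one unit, every form yields the same single term.
    print(query, "->", normalize(query), normalize(query) == indexed_term)
```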

But that doesn't work: I'm able to get the document if and only if I use the "normalized" form: mag778g69

After a bit of debugging, I see that, even though I used a KeywordTokenizer in my field type declaration, SOLR is doing something like this:

+((DisjunctionMaxQuery((myfield:mag^3000.0)~0.1) DisjunctionMaxQuery((myfield:778^3000.0)~0.1) DisjunctionMaxQuery((myfield:g^3000.0)~0.1) DisjunctionMaxQuery((myfield:69^3000.0)~0.1))~4) DisjunctionMaxQuery((myfield:mag778g69^30000.0)~0.1)

That is, it is tokenizing the original query string (mag + 778 + g + 69), and obviously querying the field for separate tokens doesn't match anything (at least, this is what I think).
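The parsed query above is consistent with the query parser splitting the input on whitespace before the field analyzer runs, so the KeywordTokenizer only ever sees one chunk at a time. The sketch below models that assumption with hypothetical helpers (not Solr's actual parser): the `qf` part becomes one required group of per-chunk clauses under `mm=100%`, while the `pf` clause, which does receive the whole string, is optional and can only affect the score.

```python
import re


def normalize(value: str) -> str:
    # Simplified model of the 'mytype' analyzer chain (ASCII only).
    return "".join(re.findall(r"[a-z0-9]+", value.lower()))


def dismax_matches(query: str, indexed_terms: set) -> bool:
    """Hypothetical model of the parsed query for qf=myfield with mm=100%."""
    # Whitespace split happens BEFORE analysis: one clause per chunk.
    clauses = [normalize(chunk) for chunk in query.split()]
    # Chunks that analyze to nothing produce no clause.
    clauses = [c for c in clauses if c]
    matched = sum(1 for c in clauses if c in indexed_terms)
    # mm=100%: every clause in the required group must match. The pf clause
    # is optional, so it cannot make a document match on its own.
    return bool(clauses) and matched == len(clauses)


index = {normalize("Mag. 778 G 69")}  # {'mag778g69'}
for q in ["Mag. 778 G 69", "mag 778 g69", "mag778g69"]:
    print(q, dismax_matches(q, index))
# Mag. 778 G 69 False
# mag 778 g69 False
# mag778g69 True
```

Under this model only the already-normalized form matches, which is the behaviour reported above.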


Could anybody please explain this to me?

Thanks in advance
Andrea