KeywordTokenizerFactory with whitespace

Aleksander Akerø Wed, 29 Jan 2014 05:50:22 -0800

Hi

According to the Solr documentation, solr.KeywordTokenizerFactory should not
do any tokenizing at all, but to me it seems to be splitting on whitespace,
e.g. on a space.


For example, I have the value "FE 009" stored in the index in the field
"number", and what I search for is the exact same string, "FE 009" (without
quotes). But the search also returns results like "EE 009", "ED 009" and
similar ones. Why is that?

I'm using the extended DisMax query parser, and "number" is the only
defined field in the qf parameter.


I want exact matches, but need to ignore case. Hence the use of
"solr.LowerCaseFilterFactory", and that is why I don't use the default
"string" fieldType.

This is the fieldType definition:
        <fieldType name="keyword" class="solr.TextField" positionIncrementGap="100">
            <analyzer type="index">
                <tokenizer class="solr.KeywordTokenizerFactory"/>
                <filter class="solr.LowerCaseFilterFactory"/>
            </analyzer>
            <analyzer type="query">
                <tokenizer class="solr.KeywordTokenizerFactory"/>
                <filter class="solr.LowerCaseFilterFactory"/>
            </analyzer>
        </fieldType>

and this the field:
        <field name="number" type="keyword" indexed="true" stored="true" required="false"/>

Later, if I get this to work, I would also like to add
"solr.EdgeNGramFilterFactory" to get trailing or leading wildcard matches,
e.g. return "FE 009-1" and "FE 009-2" as well as "FE 009" when searching for
"FE 009". Would this be a way to do it?

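A rough sketch of the kind of field type I have in mind for that, assuming a
hypothetical name "keyword_prefix" and guessed gram sizes (neither is in my
current schema). The n-gram filter is applied only at index time, so the
query string would stay a single lowercased token:

```xml
<!-- Sketch only: the name "keyword_prefix" and the gram sizes are assumptions, not tested values. -->
<fieldType name="keyword_prefix" class="solr.TextField" positionIncrementGap="100">
    <analyzer type="index">
        <tokenizer class="solr.KeywordTokenizerFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <!-- Emits leading prefixes of the whole value, e.g. "f", "fe", "fe ", ... up to "fe 009-1" -->
        <filter class="solr.EdgeNGramFilterFactory" minGramSize="1" maxGramSize="25"/>
    </analyzer>
    <analyzer type="query">
        <!-- No n-grams here: the query is matched as one lowercased token against the indexed prefixes -->
        <tokenizer class="solr.KeywordTokenizerFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
    </analyzer>
</fieldType>
```

As far as I understand, front edge n-grams like this would only cover the
trailing-wildcard case ("FE 009" matching "FE 009-1"), and values longer than
maxGramSize would not be indexed in full.
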
*Aleksander Akerø*
Systemkonsulent
Mobil: 944 89 054
E-post: aleksan...@gurusoft.no

*Gurusoft AS*
Telefon: 92 44 09 99
Østre Kullerød
www.gurusoft.no
