PatternTokenizer question

j philoon Tue, 24 Nov 2009 07:16:03 -0800

I have defined a comma-delimited pattern tokenizer as follows:
    <fieldType name="text_comma" class="solr.TextField" positionIncrementGap="100">
      <analyzer>
        <tokenizer class="solr.PatternTokenizerFactory" pattern=","/>
        <filter class="solr.LowerCaseFilterFactory"/>
      </analyzer>
    </fieldType>


    <field name="commafld" type="text_comma" indexed="true" stored="true"/>

This appears to work fine at index time: if I add a document whose commafld
value is "word1,WORD2,word 3", I see the expected terms in the index:
"word1", "word2", and "word 3".
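
To make the expectation concrete, here is a rough model of what I understand the analyzer to do at index time. It is plain Python, not Solr code; the function name `comma_analyze` is made up for illustration, and dropping empty tokens is an assumption of the sketch.

```python
import re

def comma_analyze(text):
    """Approximate the text_comma analyzer: split on ',' and lowercase."""
    # Stand-in for PatternTokenizerFactory with pattern=",":
    # every comma is treated as a token delimiter.
    tokens = re.split(",", text)
    # Stand-in for LowerCaseFilterFactory; empty tokens are dropped
    # (an assumption of this sketch).
    return [t.lower() for t in tokens if t]

print(comma_analyze("word1,WORD2,word 3"))
```

The point is that the space inside "word 3" survives, because only commas act as delimiters.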

When I query, I expect the same tokenization to take place, so that the
query 'commafld:(word 3)' would match the term "word 3".  However, I find I
have to submit the query as 'commafld:("word 3")'.  That is, it seems as if
whitespace tokenization is taking place at query time rather than the
comma-delimited tokenization.
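
The behavior I am seeing would be consistent with the sketch below. It is only a guess, assuming the query text is split on whitespace before the field's analyzer sees each piece, with quoted phrases passed through whole. I have not confirmed this against the Solr query parser, and the names `comma_analyze` and `query_terms` are hypothetical.

```python
import re

def comma_analyze(text):
    """Approximate the text_comma analyzer: split on ',' and lowercase."""
    return [t.lower() for t in re.split(",", text) if t]

def query_terms(query_text):
    """Model of the observed query behavior (an assumption, not Solr code)."""
    terms = []
    # Assumed step: split into quoted phrases or whitespace-separated chunks
    # before any field analysis happens.
    for quoted, bare in re.findall(r'"([^"]*)"|(\S+)', query_text):
        chunk = quoted if quoted else bare
        # Each chunk is then analyzed separately with the comma analyzer.
        terms.extend(comma_analyze(chunk))
    return terms

print(query_terms('word 3'))
print(query_terms('"word 3"'))
```

Under that assumption the unquoted query yields the two terms "word" and "3", neither of which is in the index, while the quoted form yields the single term "word 3".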

Am I misunderstanding what should be happening, or making some basic mistake?
Thanks.
-- 
View this message in context: 
http://old.nabble.com/PatternTokenizer-question-tp26497675p26497675.html
Sent from the Solr - User mailing list archive at Nabble.com.
