Sorry, previous post got sent prematurely.

Here is the complete post:

This is easy if I only need to define a custom field to identify the
desired patterns (numbers, in my case).

For example, I could define a field thus:
    <!-- A text field that identifies numerical entities -->
    <fieldType name="text_num" class="solr.TextField" >
      <analyzer>
        <tokenizer class="solr.PatternTokenizerFactory"
                   pattern="\s*[0-9][0-9-]*[0-9]?\s*" group="0"/>
      </analyzer>
    </fieldType>

Input:
hello, world bye 123-45 abcd 5555 sdfssdf --- aaa

Output:
123-45 , 5555
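
As a quick sanity check of the pattern outside Solr, here is a minimal sketch in Python. It assumes that Python's re module treats this particular pattern the same way as java.util.regex (which the tokenizer uses), and that group="0" means each whole match becomes a token. The helper name number_tokens is made up for illustration and is not part of Solr.

```python
import re

# The pattern from the fieldType above, copied verbatim.
PATTERN = re.compile(r"\s*[0-9][0-9-]*[0-9]?\s*")

def number_tokens(text):
    """Hypothetical helper: return each whole match (group 0), whitespace stripped."""
    return [m.group(0).strip() for m in PATTERN.finditer(text)]

text = "hello, world bye 123-45 abcd 5555 sdfssdf --- aaa"

# Raw matches still carry the whitespace that the \s* parts consume.
raw = [m.group(0) for m in PATTERN.finditer(text)]
print(raw)                  # [' 123-45 ', ' 5555 ']
print(number_tokens(text))  # ['123-45', '5555']
```

The raw matches include the surrounding spaces, so if that matters for the indexed tokens, dropping the leading and trailing \s* from the pattern may be worth trying.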

However, I also want to retain the behavior of the default text_general
field, that is, to recognize the usual text tokens (hello, world, bye,
etc.). What is the best way to achieve this?
I've looked at PatternCaptureGroupFilterFactory
(http://lucene.apache.org/core/4_7_0/analyzers-common/org/apache/lucene/analysis/pattern/PatternCaptureGroupFilterFactory.html),
but I suspect that it too only sees the tokens produced by the preceding
tokenizer (which for text_general is StandardTokenizerFactory).

Thanks
