Sorry, previous post got sent prematurely. Here is the complete post:
This is easy if I only need to define a custom field that identifies the desired patterns (numbers, in my case). For example, I could define a field thus:

<!-- A text field that identifies numerical entities -->
<fieldType name="text_num" class="solr.TextField">
  <analyzer>
    <tokenizer class="solr.PatternTokenizerFactory" pattern="\s*[0-9][0-9-]*[0-9]?\s*" group="0"/>
  </analyzer>
</fieldType>

Input: hello, world bye 123-45 abcd 5555 sdfssdf --- aaa
Output: 123-45, 5555

However, I also want to retain the behavior of the default text_general field, that is, to recognize the usual text tokens (hello, world, bye, etc.). What is the best way to achieve this?

I've looked at PatternCaptureGroupFilterFactory ( http://lucene.apache.org/core/4_7_0/analyzers-common/org/apache/lucene/analysis/pattern/PatternCaptureGroupFilterFactory.html ), but I suspect that it too is subject to the behavior of the prior tokenizer (which for text_general is StandardTokenizerFactory).

Thanks
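
P.S. To make the expected text_num output concrete, here is a rough Python sketch of what the pattern picks out of the sample input. It only emulates the regex with Python's re module and is not Solr itself. The helper name numeric_tokens is made up for illustration, and stripping the surrounding whitespace from each match is my assumption about how the tokens should look.

```python
import re

# Same regex as the pattern attribute of the text_num tokenizer above.
PATTERN = re.compile(r"\s*[0-9][0-9-]*[0-9]?\s*")

def numeric_tokens(text):
    """Return every whole match (group 0), with surrounding whitespace stripped.

    Hypothetical helper: it mimics group="0" by taking the full match,
    and the strip() call is an assumption, not documented Solr behavior.
    """
    return [m.group(0).strip() for m in PATTERN.finditer(text)]

tokens = numeric_tokens("hello, world bye 123-45 abcd 5555 sdfssdf --- aaa")
print(tokens)  # ['123-45', '5555']
```

The ordinary words (hello, world, bye, ...) never appear in this list, which is exactly the part of text_general's behavior I would like to keep alongside the numeric tokens.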