Hi! I'm getting unexpected results with the solr.SimplePatternSplitTokenizerFactory. The pattern I use is taken from an example in the SOLR documentation, and I do not understand where I made a mistake or why it does not work as expected. If we take the example input "operative", the analyzer shows that during indexing the input gets split into the tokens "ope", "a" and "ive"; that is, the tokenizer splits at the characters "r" and "t", and not at the expected whitespace characters (CR, TAB). Just to be sure, I also tried using more than one backslash in the pattern (e.g. \t and \\t), but this did not change how the input is tokenized during indexing.
What am I missing? SOLR version used is 7.5.0. The definition of the field type in the schema is as follows:

    <fieldType name="text_custom" class="solr.TextField" positionIncrementGap="100" multiValued="true">
      <analyzer type="index">
        <tokenizer class="solr.SimplePatternSplitTokenizerFactory" pattern="[ \t\r\n]+"/>
        <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" />
        <filter class="solr.LowerCaseFilterFactory"/>
      </analyzer>
      <analyzer type="query">
        <tokenizer class="solr.SimplePatternSplitTokenizerFactory" pattern="[ \t\r\n]+"/>
        <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" />
        <filter class="solr.SynonymGraphFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
        <filter class="solr.LowerCaseFilterFactory"/>
      </analyzer>
    </fieldType>

Many thanks in advance for any help you can provide!
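
For anyone trying to reproduce this, one variant that may be worth testing is sketched below. It rests on an unconfirmed assumption: that the tokenizer's pattern syntax treats a backslash as escaping the next character literally, so that \t, \r and \n stand for the letters "t", "r" and "n" rather than control characters. That would be consistent with "operative" being split at "r" and "t". Under that assumption, XML character references (&#9; for TAB, &#13; for CR, &#10; for LF) would place the actual whitespace characters into the pattern:

```xml
<!-- Sketch only: assumes \t, \r and \n are NOT interpreted as control-character escapes by the pattern syntax. -->
<!-- &#9; = TAB, &#13; = CR, &#10; = LF; the leading space in the character class is kept from the original pattern. -->
<tokenizer class="solr.SimplePatternSplitTokenizerFactory" pattern="[ &#9;&#13;&#10;]+"/>
```

If this turns out to help, the same change would presumably be needed in both the index and the query analyzer, since both use the same pattern.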