On 5/2/23 13:16, Bill Tantzen wrote:
> This tokenizer splits the text field into tokens, treating whitespace and punctuation as delimiters. Delimiter characters are discarded, with the following exceptions: Periods (dots) that are not followed by whitespace are kept as part of the token, including Internet domain names.
I checked on a dev version (9.3.0-SNAPSHOT) and StandardTokenizer does indeed do exactly what the docs say.
The analysis definition in the fieldType probably contains more than just the StandardTokenizer: most likely one or more filters that DO break up terms on a period.
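
To illustrate what I mean, here is a made-up fieldType (the name "text_example" and the attribute choices are my own assumptions, not taken from your schema). In a chain like this, the tokenizer would keep a domain name such as example.com as one token, but the word delimiter filter that follows would, as far as I know, split it on the period:

```xml
<!-- Hypothetical fieldType for illustration only; not from the poster's schema -->
<fieldType name="text_example" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <!-- Keeps periods that are not followed by whitespace -->
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <!-- Splits tokens on intra-word delimiters, which include periods -->
    <filter class="solr.WordDelimiterGraphFilterFactory"
            generateWordParts="1" generateNumberParts="1"/>
    <!-- Graph filters need flattening at index time -->
    <filter class="solr.FlattenGraphFilterFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.WordDelimiterGraphFilterFactory"
            generateWordParts="1" generateNumberParts="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```

The Analysis screen in the admin UI shows the output of each stage in the chain, so it should tell you which component is actually splitting your terms.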
Thanks, Shawn
