Re: Search term automatically split at non-alphanumeric

Ciprian Dimofte - Opensolr.com Sat, 27 Dec 2025 13:42:24 -0800

Hi Scott,

This is a classic Solr text analysis issue. The default tokenizer (usually 
StandardTokenizer or ClassicTokenizer) treats # as a delimiter, so soul#person 
gets split into two separate tokens: soul and person.


Where to look:
Your field type definition in schema.xml (or managed-schema) - specifically the 
<analyzer> section for your text field.

Options to fix it:

1. Use WhitespaceTokenizer - Only splits on whitespace, so soul#person stays
   as a single token. In your field type, change the tokenizer to:
   solr.WhitespaceTokenizerFactory
2. Use PatternTokenizer with a custom regex - Gives you fine-grained control
   over which characters split tokens.
3. Add a WordDelimiterGraphFilter with specific settings - You can configure
   exactly which characters cause splits. Filters run after the tokenizer, so
   this only helps when the tokenizer itself keeps the # (for example
   WhitespaceTokenizer). Set splitOnNumerics="0", splitOnCaseChange="0",
   generateWordParts="0", generateNumberParts="0", catenateWords="1",
   preserveOriginal="1". With preserveOriginal="1" the unsplit soul#person
   token is kept.
4. Use a MappingCharFilter - Map # to something that won't cause a split
   before tokenization.
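
As a rough illustration of option 1, a field type along these lines should
keep soul#person as one token. This is a minimal sketch, not a drop-in config:
the names text_ws_exact and content are made up for the example, and the
LowerCaseFilter is my assumption so that matching stays case-insensitive.

```xml
<!-- Hypothetical field type name; pick whatever fits your schema. -->
<fieldType name="text_ws_exact" class="solr.TextField" positionIncrementGap="100">
  <!-- A single <analyzer> with no type attribute is used for both
       indexing and querying. -->
  <analyzer>
    <!-- Splits on whitespace only, so soul#person stays one token. -->
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <!-- Assumption: lowercase so Soul#Person and soul#person match. -->
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

<!-- Hypothetical field using the type above. -->
<field name="content" type="text_ws_exact" indexed="true" stored="true"/>
```

One trade-off to keep in mind: with whitespace-only splitting, punctuation
attached to a word (a trailing comma, for instance) stays in the token as
well, so you may want extra filters depending on your content.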

Documentation links:

- Tokenizers:
  https://solr.apache.org/guide/solr/latest/indexing-guide/tokenizers.html
- Filters:
  https://solr.apache.org/guide/solr/latest/indexing-guide/filters.html

Important: Whatever you change at index time, apply the same analysis at query
time as well, then reindex your data. Documents already in the index keep
their old tokens until they are reindexed.
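
If your field type declares separate index and query analyzers, both need the
change. A sketch of what that might look like, again with a hypothetical type
name (text_ws_exact) and with the lowercase filter as my own assumption:

```xml
<!-- Hypothetical field type; index and query chains kept identical. -->
<fieldType name="text_ws_exact" class="solr.TextField" positionIncrementGap="100">
  <!-- Applied when documents are indexed. -->
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <!-- Applied to the user's search terms; same tokenizer so that
       soul#person is not split here either. -->
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```

The Analysis screen in the Solr Admin UI is a convenient way to check how a
value such as soul#person is tokenized by each chain before you reindex.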

Ciprian

Opensolr SRL
Your Path to Ai Search
https://opensolr.com

> On 27 Dec 2025, at 23:36, Scott Derrick <[email protected]> wrote:
> 
> Hi,
> 
>     I just noticed that when searching for a term that has an embedded 
> non-alphanumeric, the default schema for solr splits it into multiple terms.
> 
>     The example was soul#person, which caused a search for soul or person.  
> The behavior we want would be the equivalent of "soul#person".  We don't want 
> the user to have to enter their search term in quotes.
> 
>     Looking for directions to the specific documentation so I can get this 
> fixed...
> 
> thanks
> 
> Scott
