Hi All,

This is likely a rudimentary question, but I can’t seem to find a 
straightforward answer in the forums or the documentation: is there a way to 
protect tokens from ANY analysis? I know that filters like 
KeywordMarkerFilterFactory protect tokens from stemming, but we have some terms 
that we don’t want even our tokenizer to touch. Mostly, these are IBM-specific 
acronyms, such as IT:ibm. In this case, we would want to preserve the colon and 
the capitalization (otherwise “it” would be removed as a stopword).
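
To make the problem concrete, here is a toy sketch in Python of the behavior described above. It is not Solr code and does not reflect our actual analysis chain: the stopword list, the PROTECTED set, and the analyze() function are all hypothetical names made up for illustration. It only shows how splitting on punctuation, lowercasing, and stopword removal destroy a term like IT:ibm, and what "protected from any analysis" would mean.

```python
import re

STOPWORDS = {"it", "the", "a", "is"}  # hypothetical stopword list
PROTECTED = {"IT:ibm"}                # hypothetical set of terms to leave untouched


def analyze(text, protected=frozenset()):
    """Toy analysis chain: split on whitespace, then on punctuation,
    lowercase each piece, and drop stopwords. Protected terms bypass
    every step and are emitted exactly as written."""
    tokens = []
    for chunk in text.split():
        if chunk in protected:
            tokens.append(chunk)  # keep the colon and the capitalization
            continue
        for piece in re.split(r"[^A-Za-z0-9]+", chunk):
            piece = piece.lower()
            if piece and piece not in STOPWORDS:
                tokens.append(piece)
    return tokens


print(analyze("contact IT:ibm today"))             # ['contact', 'ibm', 'today']
print(analyze("contact IT:ibm today", PROTECTED))  # ['contact', 'IT:ibm', 'today']
```

Without protection, "IT" is lowercased to "it" and removed as a stopword, so only "ibm" survives. The second call is the outcome we are after.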

Any advice is appreciated!

Thank you,
Audrey

--
Audrey Lorberfeld
Data Scientist, w3 Search
IBM
audrey.lorberf...@ibm.com
