Hi All,

This is likely a rudimentary question, but I can’t find a straightforward answer on the forums or in the documentation: is there a way to protect tokens from ANY analysis? I know filters like KeywordMarkerFilterFactory protect tokens from stemming, but we have some terms we don’t want even our tokenizer to touch. Mostly these are IBM-specific acronyms, such as IT:ibm. For these, we want to preserve both the colon and the capitalization (otherwise “it” would be removed as a stopword).
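To make the problem concrete, here is a toy model in Python. It is not Solr's analyzer and uses no Solr API. It assumes a hypothetical chain that splits on non-alphanumeric characters, lowercases, and then drops stopwords. That chain breaks IT:ibm into "it" and "ibm" and then discards "it". It also shows the behavior we are after, where terms in a hypothetical PROTECTED set bypass every step. The input is pre-split on whitespace, so this is only a sketch.

```python
import re

# Hypothetical stopword list, for illustration only.
STOPWORDS = {"it", "the", "a", "of"}

# Hypothetical set of terms that should bypass every analysis step.
PROTECTED = {"IT:ibm"}


def naive_analyze(text):
    """Toy chain: split on non-alphanumerics, lowercase, drop stopwords."""
    tokens = re.split(r"[^A-Za-z0-9]+", text)
    return [t.lower() for t in tokens if t and t.lower() not in STOPWORDS]


def analyze_with_protection(text):
    """Emit protected terms verbatim; send everything else through naive_analyze."""
    out = []
    for chunk in text.split():  # assumes protected terms are whitespace-delimited
        if chunk in PROTECTED:
            out.append(chunk)  # untouched: colon and case preserved
        else:
            out.extend(naive_analyze(chunk))
    return out


print(naive_analyze("IT:ibm support"))            # ['ibm', 'support']
print(analyze_with_protection("IT:ibm support"))  # ['IT:ibm', 'support']
```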
Any advice is appreciated!

Thank you,
Audrey

--
Audrey Lorberfeld
Data Scientist, w3 Search
IBM
audrey.lorberf...@ibm.com