You can plug in customized grammars, but the simplest approach is to configure a MappingCharFilter before your tokenizer, with mappings like: "c++" => "cplusplus"
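
A rough sketch of what that could look like in schema.xml. The field type name (text_icu_mapped) and the mapping file name (mapping-symbols.txt) are made up for illustration, and the lowercase filter is just one plausible choice of downstream filter, so adjust to your own setup:

```xml
<!-- Sketch only: the field type name and mapping file name are assumptions. -->
<fieldType name="text_icu_mapped" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <!-- Char filters rewrite the raw text before the tokenizer sees it,
         so "C++" has already become "cplusplus" when tokenization starts. -->
    <charFilter class="solr.MappingCharFilterFactory" mapping="mapping-symbols.txt"/>
    <tokenizer class="solr.ICUTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

<!-- Contents of mapping-symbols.txt, placed in the conf directory:
     "C++" => "cplusplus"
     "c++" => "cplusplus"
-->
```

The mapping is matched case-sensitively and runs before any lowercasing, which is why both "C++" and "c++" are listed. Because the same analyzer is used at index and query time here, a search for "C++" is rewritten the same way and matches the indexed "cplusplus" token. Existing documents need to be reindexed for the change to take effect.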

On Tue, Apr 10, 2012 at 11:50 AM, Demian Katz <demian.k...@villanova.edu> wrote:
> It has been brought to my attention that ICUTokenizerFactory drops tokens
> like the ++ in "The C++ Programming Language." Is there any way to persuade
> it to preserve these types of tokens?
>
> thanks,
> Demian

--
lucidimagination.com