You can actually plug in customized grammars and the like, but
the simplest approach is to configure MappingCharFilter before your
tokenizer, with mappings like: "c++" => "cplusplus"
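
As a rough sketch of what that could look like in schema.xml (the field
type name "text_icu" and the file name "mapping-cplusplus.txt" are made
up for illustration, and the rest of your analyzer chain is omitted):

```xml
<!-- Hypothetical field type: the char filter rewrites the raw text
     before ICUTokenizerFactory ever sees it. -->
<fieldType name="text_icu" class="solr.TextField">
  <analyzer>
    <!-- "mapping-cplusplus.txt" is an assumed file name in your conf/ directory -->
    <charFilter class="solr.MappingCharFilterFactory" mapping="mapping-cplusplus.txt"/>
    <tokenizer class="solr.ICUTokenizerFactory"/>
    <!-- ...your existing token filters would follow here... -->
  </analyzer>
</fieldType>
```

The mapping file itself would then contain lines such as:

```
# mapping-cplusplus.txt (assumed name)
"c++" => "cplusplus"
"C++" => "cplusplus"
```

The char filter runs before any lowercasing, so the match is
case-sensitive and both spellings need an entry. The same field type
should be used at index and query time so that both sides produce the
same "cplusplus" token.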

On Tue, Apr 10, 2012 at 11:50 AM, Demian Katz <demian.k...@villanova.edu> wrote:
> It has been brought to my attention that ICUTokenizerFactory drops tokens 
> like the ++ in "The C++ Programming Language."  Is there any way to persuade 
> it to preserve these types of tokens?
>
> thanks,
> Demian



-- 
lucidimagination.com
