Modify the StandardTokenizerFactory to concatenate all words

2013-11-05 Thread Kevin
Currently I'm using StandardTokenizerFactory, which tokenizes the words based on spaces. For Toy Story it will create the tokens toy and story. Ideally, I would want to extend the functionality of StandardTokenizerFactory to create the tokens toy, story, and toy story. How do I do that?
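
To make the desired output concrete, here is a minimal sketch in plain Python of the token stream being asked for: every single word plus every run of adjacent words. It is not Solr code. The function name tokens_with_shingles and its max_shingle_size parameter are made up for illustration, and splitting on whitespace and lowercasing are simplifying assumptions rather than what StandardTokenizerFactory actually does.

```python
def tokens_with_shingles(text, max_shingle_size=2):
    """Return single-word tokens followed by adjacent-word combinations.

    Hypothetical helper for illustration only: splits on whitespace and
    lowercases, which is a simplification of a real analysis chain.
    """
    words = text.lower().split()
    tokens = list(words)  # the single-word tokens come first
    # Add every run of 2..max_shingle_size adjacent words, joined by a space.
    for size in range(2, max_shingle_size + 1):
        for start in range(len(words) - size + 1):
            tokens.append(" ".join(words[start:start + size]))
    return tokens


if __name__ == "__main__":
    print(tokens_with_shingles("Toy Story"))  # ['toy', 'story', 'toy story']
```

In Lucene and Solr, combinations of adjacent tokens like these are called shingles, and a token filter for them (solr.ShingleFilterFactory) already exists, so it may be worth checking whether adding that filter after the tokenizer is enough before modifying the tokenizer factory itself.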

Re: Modify the StandardTokenizerFactory to concatenate all words

2013-11-05 Thread Benson Margulies
How would you expect to recognize that 'Toy Story' is a thing?

On Tue, Nov 5, 2013 at 6:32 PM, Kevin glidekensing...@gmail.com wrote:

Currently I'm using StandardTokenizerFactory which tokenizes the words based on spaces. For Toy Story it will create tokens toy and story. Ideally, I would