Re: Tokenize a dictionary of phrases

2011-08-22 Thread Erick Erickson
Hmmm, would it work for your case to use synonyms? If you set expand=false and have this in your synonyms file:

quick brown => quickbrown

it might do what you want.

Best
Erick

On Sun, Aug 21, 2011 at 3:53 PM, Xiyang Chen wrote:
> Hi,
>
> I have a dictionary of multi-word phrases and I'd like to

Re: Tokenize a dictionary of phrases

2011-08-21 Thread govind bhardwaj
Hi Xiyang,

You should use KeywordAnalyzer(), as it treats the entire string (a multi-word phrase in your case) as a single token, without splitting it into its constituent words.

Thanks,
Govind

On Mon, Aug 22, 2011 at 1:23 AM, Xiyang Chen wrote:
> Hi,
>
> I have a dictionary of multi-word phrases and I'd like to a

Tokenize a dictionary of phrases

2011-08-21 Thread Xiyang Chen
Hi,

I have a dictionary of multi-word phrases and I'd like to analyze documents such that anything that appears in the dictionary will be treated as one single token. For example, if the dictionary contains "brown fox", then the sentence

The quick brown fox jumps over the lazy dog.

Will be tok
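
To make the requested behavior concrete, here is a minimal sketch in plain Python of greedy longest-match phrase tokenization. It is not Lucene or Solr code, and it is not taken from the thread: the function name `tokenize_with_phrases`, the lowercasing, the punctuation stripping, and the choice to emit a matched phrase as one space-joined token are all assumptions made for illustration.

```python
import re


def tokenize_with_phrases(text, phrases):
    """Split text into lowercase word tokens, merging dictionary phrases into single tokens.

    Hypothetical helper for illustration only; not part of any Lucene/Solr API.
    """
    # Store each dictionary phrase as a tuple of lowercase words for fast lookup.
    phrase_set = {tuple(p.lower().split()) for p in phrases}
    max_len = max((len(p) for p in phrase_set), default=1)

    # Assumption: tokens are runs of word characters, lowercased.
    words = re.findall(r"\w+", text.lower())

    tokens = []
    i = 0
    while i < len(words):
        # Try the longest candidate first so longer phrases win over shorter ones.
        for n in range(min(max_len, len(words) - i), 1, -1):
            if tuple(words[i:i + n]) in phrase_set:
                tokens.append(" ".join(words[i:i + n]))
                i += n
                break
        else:
            # No multi-word phrase starts here; emit the single word.
            tokens.append(words[i])
            i += 1
    return tokens


print(tokenize_with_phrases("The quick brown fox jumps over the lazy dog.", ["brown fox"]))
```

With the dictionary containing only "brown fox", the words "brown" and "fox" come out as one token and every other word stays separate. In a real index the same idea would have to live inside the analysis chain, which is what the replies in this thread are about.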