On Mon, Nov 29, 2010 at 5:30 PM, Jacob Elder <jel...@locamoda.com> wrote:
> The problem is that the field is not guaranteed to contain just a single
> language. I'm looking for some way to pass it first through CJK, then
> Whitespace.
>
> If I'm totally off-target here, is there a recommended way of dealing with
> mixed-language fields?

Maybe you should consider a tokenizer like StandardTokenizer, which works reasonably well for most languages.
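
For illustration, a field type along these lines would use it in schema.xml. This is only a sketch: the name "text_mixed" is made up, and the lowercase filter is my own assumption rather than anything from your setup, so adjust both to your schema.

```xml
<!-- Hypothetical field type; "text_mixed" is an illustrative name. -->
<fieldType name="text_mixed" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <!-- A single tokenizer handles the whole field, whatever languages it contains. -->
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <!-- Assumed extra step: lowercasing for case-insensitive matching. -->
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```

Note that an analyzer chain takes exactly one tokenizer followed by any number of filters, which is why CJK and Whitespace cannot simply be run one after the other. It is worth checking the output for your own mixed-language samples on the analysis page of the Solr admin UI before committing to it.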