On Mon, Nov 29, 2010 at 5:30 PM, Jacob Elder <jel...@locamoda.com> wrote:
> The problem is that the field is not guaranteed to contain just a single
> language. I'm looking for some way to pass it first through CJK, then
> Whitespace.
>
> If I'm totally off-target here, is there a recommended way of dealing with
> mixed-language fields?
>

Maybe you should consider a tokenizer like StandardTokenizer, which
works reasonably well for most languages.
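
To illustrate why a single general-purpose tokenizer can cope with a mixed-language field, here is a minimal Python sketch. It is not Lucene's StandardTokenizer and does not reproduce its rules; it only mimics the rough idea of emitting each Han ideograph as its own token while keeping other word runs whole. The `tokenize` name and the restriction to the U+4E00..U+9FFF block are assumptions made for brevity.

```python
import re

# Assumption: only the CJK Unified Ideographs block (U+4E00..U+9FFF) is
# treated as "CJK" here. A real tokenizer covers far more scripts and rules.
_TOKEN = re.compile(
    r"[\u4e00-\u9fff]"        # a single Han ideograph is one token
    r"|[^\W\u4e00-\u9fff]+"   # otherwise, a run of non-Han word characters
)

def tokenize(text):
    """Split mixed-language text in a single pass.

    Han ideographs come out one per token; letters, digits and underscores
    in other scripts are grouped into runs; whitespace and punctuation are
    dropped.
    """
    return _TOKEN.findall(text)

print(tokenize("Hello 世界 foo"))  # ['Hello', '世', '界', 'foo']
```

The point is that one pass handles both kinds of text, so the field does not need to go through a CJK tokenizer and then a whitespace tokenizer.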
