Hi, I remember reading on this list a while ago that Solr only tokenizes on whitespace, even when using CJKAnalyzer. That would make Solr unusable for Chinese or any other language that doesn't use whitespace as a word separator.
1) I remember reading about a workaround, but unfortunately I can't find the post that mentioned it. Could someone give me pointers on how to address this issue?

2) Let's say I have fixed that issue and have properly analyzed and indexed the Chinese documents. My documents are in multiple languages, so I plan to use a separate field for each language: text_en, text_zh, text_ja, text_fr, etc., with each field associated with the appropriate analyzer. My problem now is how to deal with the query string. I don't know what language a query is in, so I can't select the appropriate analyzer for it. If I just run the standard analyzer over the query string, any query in Chinese won't be tokenized correctly. Would the whole system still work in that case?

This must be a pretty common use case, handling multi-language search. What is the recommended way of dealing with this problem?

Thanks,
Andy
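
P.S. To make question 2 concrete, here is roughly the schema.xml setup I have in mind — the field and type names are just my placeholders, and I haven't verified this exact config against a running Solr instance:

```xml
<!-- Sketch of per-language fields in schema.xml (names are hypothetical). -->
<types>
  <!-- CJK text: use Lucene's CJKAnalyzer directly on the field type. -->
  <fieldType name="text_cjk" class="solr.TextField">
    <analyzer class="org.apache.lucene.analysis.cjk.CJKAnalyzer"/>
  </fieldType>
  <!-- English text: standard tokenizer plus lowercasing and stemming. -->
  <fieldType name="text_english" class="solr.TextField">
    <analyzer>
      <tokenizer class="solr.StandardTokenizerFactory"/>
      <filter class="solr.LowerCaseFilterFactory"/>
      <filter class="solr.PorterStemFilterFactory"/>
    </analyzer>
  </fieldType>
</types>
<fields>
  <!-- One field per language, each bound to its own analyzer chain. -->
  <field name="text_en" type="text_english" indexed="true" stored="true"/>
  <field name="text_zh" type="text_cjk" indexed="true" stored="true"/>
</fields>
```

The open question is then which of these fields (and which analyzer) to run an incoming query string through.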