I don't think the text_all field would work very well for a multilingual setup. Any reason you can't use edismax to search over a bunch of language-specific fields instead?
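Something like this, roughly — the field names here are just placeholders for whatever per-language fields langid maps to in your schema:

```
q=your query
&defType=edismax
&qf=text_en text_fr text_de text_ja
```

That way each language's field keeps its own language-appropriate analysis chain, instead of forcing everything through one compromise tokenizer.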
Regards,
   Alex.
Personal website: http://www.outerthoughts.com/
Current project: http://www.solr-start.com/ - Accelerating your Solr proficiency

On Thu, Jun 19, 2014 at 8:31 AM, Allison, Timothy B. <talli...@mitre.org> wrote:
> All,
>
> In one index I'm working with, the setup is the typical langid mapping to
> language-specific fields. There is also a text_all field that everything is
> copied to. The documents can contain a wide variety of languages, including
> non-whitespace languages. We'll be using the ICUTokenFilter in the analysis
> chain, but what should we use for the tokenizer for the "text_all" field? My
> inclination is to go with the ICUTokenizer. Are there any reasons to prefer
> the StandardTokenizer or another tokenizer for this field?
>
> Thank you.
>
> Best,
>
> Tim
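P.S. If you do keep the catch-all field, here is a minimal sketch of what an ICUTokenizer-based text_all fieldType might look like (illustrative only — the filter choices are assumptions, not a drop-in schema; the ICU factories live in the analysis-extras module):

```xml
<!-- Catch-all field type: ICUTokenizer handles non-whitespace scripts
     (CJK, Thai, etc.) better than StandardTokenizer's generic rules. -->
<fieldType name="text_all" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.ICUTokenizerFactory"/>
    <!-- Folds case, accents, and compatibility forms across scripts -->
    <filter class="solr.ICUFoldingFilterFactory"/>
  </analyzer>
</fieldType>
```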