I don't think the text_all field would work very well for a multilingual
setup. Is there any reason you can't use edismax to search across the
language-specific fields instead?
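
For instance, an edismax request over per-language fields might look roughly like this (just a sketch; the field names text_en, text_fr, text_ja stand in for whatever your langid chain actually maps to):

```
# edismax searching several language-specific fields instead of text_all
q=search terms here
&defType=edismax
&qf=text_en text_fr text_ja
&tie=0.1
```

Each field keeps its own language-appropriate analysis chain, which is the main thing a single copy-all field gives up.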

Regards,
   Alex.
Personal website: http://www.outerthoughts.com/
Current project: http://www.solr-start.com/ - Accelerating your Solr proficiency


On Thu, Jun 19, 2014 at 8:31 AM, Allison, Timothy B. <talli...@mitre.org> wrote:
> All,
>
> In one index I’m working with, the setup is the typical langid mapping to 
> language-specific fields.  There is also a text_all field that everything is 
> copied to.  The documents can contain a wide variety of languages, including 
> non-whitespace languages.  We’ll be using the ICUTokenFilter in the analysis 
> chain, but what should we use as the tokenizer for the “text_all” field?  My 
> inclination is to go with the ICUTokenizer.  Are there any reasons to prefer 
> the StandardTokenizer or another tokenizer for this field?
>
> Thank you.
>
>        Best,
>
>               Tim
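
That said, if you do keep text_all, the ICUTokenizer seems like the right call for mixed scripts. A field type might be declared roughly like this in schema.xml (a sketch only; the folding filter in the chain is my assumption, not something from your setup):

```xml
<!-- Sketch of a script-aware catch-all field type; names are illustrative. -->
<fieldType name="text_all" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <!-- ICUTokenizer segments text per Unicode script, so CJK and other
         non-whitespace scripts get sensible token boundaries, unlike
         purely whitespace-driven tokenization. -->
    <tokenizer class="solr.ICUTokenizerFactory"/>
    <!-- Assumed filter: Unicode-aware case and accent folding. -->
    <filter class="solr.ICUFoldingFilterFactory"/>
  </analyzer>
</fieldType>
```

The ICU analysis components live in a contrib module, so the relevant jars need to be on Solr's classpath.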
