Re: multi-language searching with Solr

Mike Klaas Wed, 07 May 2008 15:51:26 -0700

I don't really see how that would help, no. All the benefits from using separate indices would be gained by using one field per language, ISTM.

By the way, there are tools available that make field-per-language stuff much easier, especially if there are many fields. By using dynamic fields, you don't have to explicitly declare all fields:


<dynamicField name="*_fr" type="text_fr" ... />
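A minimal client-side sketch of the field-per-language idea, assuming each document's language is already known before indexing and that fields follow a hypothetical <field>_<lang> naming convention (the "title" and "body" names are illustrative, not from any real schema). The second helper is only a simplified approximation of how a pattern like *_fr matches field names (one wildcard at the start or the end), not Solr's own code:

```python
from typing import Dict, Iterable


def to_language_fields(doc: Dict[str, str], lang: str,
                       text_fields: Iterable[str] = ("title", "body")) -> Dict[str, str]:
    """Rename free-text fields to '<field>_<lang>' so a '*_<lang>' dynamic field covers them.

    Fields not listed in text_fields (e.g. 'id') are copied unchanged.
    The field names and the naming convention are hypothetical.
    """
    wanted = set(text_fields)
    out: Dict[str, str] = {}
    for name, value in doc.items():
        if name in wanted:
            out[f"{name}_{lang}"] = value
        else:
            out[name] = value
    return out


def matches_dynamic_field(field_name: str, pattern: str) -> bool:
    """Simplified dynamic-field match: a single '*' at the start or end of the pattern."""
    if pattern.startswith("*"):
        return field_name.endswith(pattern[1:])
    if pattern.endswith("*"):
        return field_name.startswith(pattern[:-1])
    return field_name == pattern


doc = {"id": "42", "title": "Le renard brun", "body": "Le renard brun a sauté"}
fr_doc = to_language_fields(doc, "fr")
print(fr_doc)
print(all(matches_dynamic_field(f, "*_fr") for f in fr_doc if f != "id"))
```

Adding a 21st language then means adding one more field type and one *_xx dynamic field, rather than declaring every field again.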

-Mike

On 7-May-08, at 12:46 PM, Gereon Steffens wrote:

I have the same requirement, and from what I understand, the distributed search feature will help implement this by having one shard per language. Am I right?
Gereon


Mike Klaas wrote:
On 5-May-08, at 1:28 PM, Eli K wrote:
Wouldn't this impact both indexing and search performance, and the size of the index? It is also probable that I will have more than one free-text field later on, and with at least 20 languages this approach does not seem very manageable. Are there other options for making this work with stemming?
If you want stemming, then you have to execute one query per language anyway, since the stemming will be different in every language. This is a fundamental requirement: you somehow need to track the language of every token if you want correct multi-language stemming. The easiest way to do this would be to split each language into its own field. But there are other options: you could prefix every indexed token with the language:
en:The en:quick en:brown en:fox en:jumped ...
fr:Le fr:brun fr:renard fr:vite fr:a fr:sauté ...
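A rough sketch of the token-prefixing alternative, assuming a single hypothetical field named "text" whose analyzer splits only on whitespace (so "en:quick" survives as one token). In practice this would live in the index and query analyzers, with each language's stemmer applied before the prefix is added; here the stemming step is deliberately left out. The same prefix has to be applied on the query side, which is why you still end up with one query per language:

```python
from typing import List


def prefix_tokens(text: str, lang: str) -> List[str]:
    """Prefix each whitespace-separated token with its language code.

    Simplification: a real analyzer would also lowercase and stem each
    token with the language's own stemmer before prefixing it.
    """
    return [f"{lang}:{tok}" for tok in text.split()]


def query_for_language(query: str, lang: str, field: str = "text") -> str:
    """Build a clause whose terms can only match tokens indexed for one language.

    ':' is special in Lucene query syntax, so it is backslash-escaped inside
    each prefixed term. 'text' is a hypothetical field name.
    """
    terms = [tok.replace(":", "\\:") for tok in prefix_tokens(query, lang)]
    return f"{field}:({' '.join(terms)})"


print(" ".join(prefix_tokens("The quick brown fox jumped", "en")))
# en:The en:quick en:brown en:fox en:jumped
print(query_for_language("quick fox", "en"))
# text:(en\:quick en\:fox)
```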
Separate fields seems easier to me, though.
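With separate fields, the "one query per language" can still go out as a single request: one clause per language, OR-ed together. A minimal sketch, assuming hypothetical body_en, body_fr, ... fields whose types use language-specific analyzers (for example through *_xx dynamic fields as suggested at the top of the thread). With the standard query parser, each fielded clause is analyzed by that field's own query analyzer, so the terms get that language's stemming. The escape list is a simplified set of Lucene query-syntax characters:

```python
from typing import Iterable

# Characters with special meaning in Lucene query syntax (simplified list).
LUCENE_SPECIAL = set('+-&|!(){}[]^"~*?:\\/')


def escape_term(term: str) -> str:
    """Backslash-escape query-syntax characters in a single term."""
    return "".join("\\" + c if c in LUCENE_SPECIAL else c for c in term)


def multilang_query(text: str, langs: Iterable[str], base_field: str = "body") -> str:
    """OR together one clause per language, each against '<base_field>_<lang>'.

    'body' is a hypothetical base field name.
    """
    terms = " ".join(escape_term(t) for t in text.split())
    return " OR ".join(f"{base_field}_{lang}:({terms})" for lang in langs)


print(multilang_query("brown fox", ["en", "fr"]))
# body_en:(brown fox) OR body_fr:(brown fox)
```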
-Mike
