Re: multi-language searching with Solr

Gereon Steffens Wed, 07 May 2008 12:46:47 -0700

I have the same requirement, and from what I understand, the distributed search feature will help implement this by having one shard per language. Am I right?


Gereon



Mike Klaas wrote:

On 5-May-08, at 1:28 PM, Eli K wrote:
Wouldn't this impact both indexing and search performance and the size
of the index?
It is also probable that I will have more than one free-text field
later on and with at least 20 languages this approach does not seem
very manageable.  Are there other options for making this work with
stemming?
If you want stemming, then you have to execute one query per language anyway, since the stemming will be different in every language.
This is a fundamental requirement: you somehow need to track the language of every token if you want correct multi-language stemming. The easiest way to do this would be to split each language into its own field. But there are other options: you could prefix every indexed token with the language:
en:The en:quick en:brown en:fox en:jumped ...
fr:Le fr:brun fr:renard fr:vite fr:a fr:sauté ...

Separate fields seem easier to me, though.

-Mike
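
For illustration, here is a minimal sketch of the two options described above: prefixing every token with its language code, and routing text into a per-language field. It is plain Python, not Solr code; the function names and the `text_<lang>` field naming are assumptions made up for this example, and no stemming is performed.

```python
def prefix_tokens(text, lang):
    """Tag each whitespace-separated token with its language code."""
    return [f"{lang}:{token}" for token in text.split()]


def field_per_language(text, lang, base="text"):
    """Alternative: put the text into a language-specific field.

    The field name pattern (e.g. text_en, text_fr) is hypothetical.
    """
    return {f"{base}_{lang}": text}


if __name__ == "__main__":
    docs = [("en", "The quick brown fox jumped"), ("fr", "Le renard brun")]
    for lang, text in docs:
        print(" ".join(prefix_tokens(text, lang)))
        print(field_per_language(text, lang))
```

With either layout the language of each token stays recoverable, so a language-specific stemmer could be applied at both index and query time; the per-field variant keeps that choice in the field name rather than inside every token.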
