Hi, Jörg.

At the Tibetan Himalayan Digital Library, we are working with XML files whose fields might be in Tibetan, Chinese, Nepalese, or English. Our Solr schema.xml file looks like this:

<dynamicField name="*_eng" type="string" indexed="true" stored="true" multiValued="true"/>
<dynamicField name="*_chi" type="string" indexed="true" stored="true" multiValued="true"/>
<dynamicField name="*_tib" type="string" indexed="true" stored="true" multiValued="true"/>
<dynamicField name="*_nep" type="string" indexed="true" stored="true" multiValued="true"/>

I run all of our XML data through an XSL transformation that puts it into Solr-indexable form, figures out what language each field is in, and gives the field an appropriate name, e.g., "location_eng" or "formalname_tib". So far this is working very well for us.
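A minimal XSLT 1.0 sketch of just the renaming step. It assumes, unlike the THDL transform (which detects the language itself), that each source element already carries an xml:lang attribute holding a three-letter code such as eng or tib; the input element names (record, location, formalname) are hypothetical. It emits Solr's <add><doc><field> update format:

```xml
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" indent="yes"/>

  <!-- Hypothetical input:
       <record>
         <location xml:lang="eng">Lhasa</location>
         <formalname xml:lang="tib">...</formalname>
       </record> -->
  <xsl:template match="/record">
    <add>
      <doc>
        <!-- For every child element tagged with a language, build a field
             named elementname_lang, e.g. location_eng, so it matches one of
             the *_eng / *_chi / *_tib / *_nep dynamic fields -->
        <xsl:for-each select="*[@xml:lang]">
          <field name="{local-name()}_{@xml:lang}">
            <xsl:value-of select="."/>
          </field>
        </xsl:for-each>
      </doc>
    </add>
  </xsl:template>
</xsl:stylesheet>
```

Because the suffix is just a naming convention, no schema change is needed when a new source element appears; the dynamic fields pick it up automatically.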

Currently, we assign all fields, regardless of language, to the type string, defined as

<fieldtype name="string" class="solr.StrField" sortMissingLast="true"/>

This does exact string matching very well, but because solr.StrField indexes each value as a single, untokenized term, it doesn't do any stop-word removal, stemming, or anything fancy. We are toying with the idea of a custom Tibetan indexer to better break Tibetan text into discrete words, but for this particular project (which deals mostly with proper names rather than long passages of text) this hasn't been a problem yet, and the solution above seems to be doing the trick.
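For comparison, a minimal sketch of a language-aware type that does stop-word removal and stemming. It assumes a Solr release that includes solr.SnowballPorterFilterFactory (which takes a language attribute such as "English", "German", or "French") and a stopword list named stopwords_eng.txt in the conf directory; the type name text_eng and the file name are hypothetical, and this is not what THDL currently runs:

```xml
<!-- Hypothetical English text type: tokenize, lowercase, drop stop words, stem -->
<fieldtype name="text_eng" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords_eng.txt"/>
    <filter class="solr.SnowballPorterFilterFactory" language="English"/>
  </analyzer>
</fieldtype>

<!-- Would replace the existing *_eng line, pointing it at the analyzed type -->
<dynamicField name="*_eng" type="text_eng" indexed="true" stored="true" multiValued="true"/>
```

The same pattern with language="German" or language="French" would cover German or French fields. Snowball has no Tibetan or Chinese stemmer, so those fields would still need a different tokenizer, which is where a custom Tibetan indexer would come in.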

I hope this helps.

Good luck!

Bess

On Jan 16, 2007, at 10:23 AM, Jörg Pfründer wrote:

Hello,

Does anyone have experience with internationalization (internationalisation) in Solr?

How do you set up a multilingual index? Should we use dynamic fields like text_en, text_fr, text_es?

Is there a GermanPorterFilterFactory or FrenchPorterFilterFactory?

Thank you very much.

Jörg Pfründer



Elizabeth (Bess) Sadler
Head, Technical and Metadata Services
Digital Scholarship Services
Box 400129
Alderman Library
University of Virginia
Charlottesville, VA 22904

[EMAIL PROTECTED]
(434) 243-2305

