See below.

But also search the archives for "multilanguage"; this topic has been discussed many times before. Lucid Imagination maintains a Solr-powered (of course) searchable list at: http://www.lucidimagination.com/search/
On Wed, Oct 20, 2010 at 9:03 AM, Jakub Godawa <jakub.god...@gmail.com> wrote:

> Hi everyone! (my first post)
>
> I am new, but really curious about the usefulness of lucene/solr for document
> search from web applications. I use Ruby on Rails to create one, with the
> plugin "acts_as_solr_reloaded" that makes the connection between web app and
> solr easy.
>
> So I am at a point where I know that a good solution is to prepare
> multi-language documents with fields like:
> question_en, answer_en,
> question_fr, answer_fr,
> question_pl, answer_pl... etc.
>
> I need to create an index that would work with 6 languages: English, French,
> German, Russian, Ukrainian and Polish.
>
> My questions are:
> 1. Is it doable to have just one search field that behaves like Google's for
> all those documents? It can be an option to indicate a language to search.
>

This depends on what you mean by doable. Are you going to allow a French user to search an English document (and so on)? But the real answer is "yes, you can if you ...". There'll be tradeoffs.

Take a look at the dismax handler. It's kind of hard to grok all at once, but you can cause it to search across multiple fields. That is, the user types "language", and you can turn it into a complex query under the covers like
lang_en:language lang_fr:language lang_ru:language, etc.
You can also apply boosts. Note that this has obvious problems with, say, Russian. Half your job will be figuring out what will satisfy the user.

You could also have a *different* dismax handler defined for each language. Say the user was coming from Spanish: consider a browseES handler. See solrconfig.xml for the default dismax handler. The Solr book mentioned below describes this.

> 2. How should I begin changing the solr/conf/schema.xml (or other) file to
> tailor it to my needs?
> As I am a real rookie here, I am still a bit confused
> about "fields", "fieldTypes" and their connection with a particular field
> (e.g. answer_fr) and the "tokenizers" and "analyzers". If someone can
> provide a basic step-by-step tutorial on how to make it work in two
> languages I would be more than happy.
>

You have several choices here:
- The books "Lucene in Action" and "Solr 1.4 Enterprise Search Server" both discuss this.
- Spend some time on the solr/admin/analysis page. That page allows you to see pretty much exactly what each step in an analyzer chain accomplishes.

> 3. Are all those languages supported (officially/unofficially) by
> lucene/solr?
>

See: http://lucene.apache.org/java/3_0_2/api/all/org/apache/lucene/analysis/Analyzer.html
Remember that Solr is built on Lucene, so these analyzers are available.

>
> Thank you for help,
> Jakub Godawa.
>

Best
Erick
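
For question 1, here is a minimal sketch of what the "complex query under the covers" could look like as dismax request parameters. It is written in Python only for illustration; the same parameters can be sent from any client. The field names follow the question_xx/answer_xx pattern from the original post, while the helper name `dismax_params`, the boost value, the "boost the user's own language" policy, and the localhost URL are all assumptions, not anything prescribed by Solr.

```python
from urllib.parse import urlencode

# Language suffixes for the six languages in the question
# (field naming assumed to follow question_xx / answer_xx).
LANGUAGES = ["en", "fr", "de", "ru", "uk", "pl"]


def dismax_params(user_query, preferred_lang=None, boost=2.0):
    """Build request parameters that search every language field at once."""
    fields = []
    for lang in LANGUAGES:
        # Hypothetical policy: weight the user's own language more heavily.
        weight = boost if lang == preferred_lang else 1.0
        fields.append(f"question_{lang}^{weight}")
        fields.append(f"answer_{lang}^{weight}")
    return {
        "defType": "dismax",     # select the dismax query parser
        "q": user_query,         # raw text as typed by the user
        "qf": " ".join(fields),  # fields to search, with per-field boosts
    }


params = dismax_params("language", preferred_lang="fr")
# The base URL is an assumption; adjust host, port and path to your install.
url = "http://localhost:8983/solr/select?" + urlencode(params)
print(url)
```

The same `qf` list could instead be set as a default on a handler in solrconfig.xml, which is closer to the one-handler-per-language idea above; building it per request just makes the boosts easy to experiment with.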