What leads you to believe that the user is not interested in occurrences of the French phrase in English text? I mean, we English-speakers and writers like to use French phrases to show how sophisticated we are! It's part of our... raison d'être. If I do a Google search for "raison d'être", it doesn't mysteriously show me only French documents.

So, usually, this needs to be a user preference: the user's preferred language, and whether they want to search across documents in all languages or just a subset of languages. Then, on the results page, you can show each document's language and a button to re-run the query restricted to a specific language.
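
As a rough sketch of the results-page side, and assuming language_s (the langid.langField from your config) is indexed, the handler could facet on that field so the page can list the languages found:

```xml
<!-- Sketch only: facet on the detected-language field so the UI can
     show which languages matched. Assumes language_s is indexed. -->
<requestHandler name="/select" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="defType">edismax</str>
    <str name="qf">AllChamp_ar AllChamp_en AllChamp_fr</str>
    <str name="facet">true</str>
    <str name="facet.field">language_s</str>
    <str name="facet.mincount">1</str>
  </lst>
</requestHandler>
```

The "restrict to this language" button would then just re-issue the same query with a filter added, e.g. fq=language_s:fr.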

If you really need to do this query language detection, the best approach is to do it within your application layer (you can use the Google code for language detection) and then send the query to the appropriate query request handler. You would have a separate query request handler for each language, each with settings optimized for that language, such as the language-specific fields to use for the "qf" parameter.
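
A minimal sketch of what the per-language handlers might look like, using the AllChamp_* fields from your schema. The handler names (/select_fr, /select_en, /select_ar) are hypothetical; pick whatever your application will route to:

```xml
<!-- Sketch only: one handler per language; the names are illustrative.
     The application detects the query language and picks the handler. -->
<requestHandler name="/select_fr" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="echoParams">explicit</str>
    <int name="rows">10</int>
    <str name="defType">edismax</str>
    <str name="qf">AllChamp_fr</str>
  </lst>
</requestHandler>

<requestHandler name="/select_en" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="echoParams">explicit</str>
    <int name="rows">10</int>
    <str name="defType">edismax</str>
    <str name="qf">AllChamp_en</str>
  </lst>
</requestHandler>

<requestHandler name="/select_ar" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="echoParams">explicit</str>
    <int name="rows">10</int>
    <str name="defType">edismax</str>
    <str name="qf">AllChamp_ar</str>
  </lst>
</requestHandler>
```

With something like this, a query detected as French goes to /select_fr, so only the text_fr analysis (including its stopwords) is applied and the English fields are never consulted.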

-- Jack Krupansky

-----Original Message-----
From: benjelloun
Sent: Friday, July 4, 2014 10:52 AM
To: solr-user@lucene.apache.org
Subject: multilingual search

Hello,

I need to detect the language of my fields. Then, when I search with the
"/select" request handler, how can I have the search detect the language
of the query words and choose which field_langid to use?

My configuration:

<updateRequestProcessorChain name="langid">
  <processor class="org.apache.solr.update.processor.LangDetectLanguageIdentifierUpdateProcessorFactory">
    <lst name="defaults">
      <bool name="langid">true</bool>
      <str name="langid.fl">NomDocument,ContenuDocument,Postit</str>
      <str name="langid.langField">language_s</str>
      <str name="langid.whitelist">en,fr,ar</str>
      <str name="langid.fallback">fr</str>
      <float name="langid.threshold">0.6</float>
      <bool name="langid.map">true</bool>
      <bool name="langid.map.individual">true</bool>
      <bool name="langid.map.keepOrig">true</bool>
    </lst>
  </processor>
  <!-- Not in the original snippet (presumably omitted): a chain needs
       RunUpdateProcessorFactory for documents to actually be indexed. -->
  <processor class="solr.LogUpdateProcessorFactory"/>
  <processor class="solr.RunUpdateProcessorFactory"/>
</updateRequestProcessorChain>

<field name="AllChamp_ar" type="text_ar" multiValued="true" indexed="true"
required="false" stored="false"/>
<field name="AllChamp_fr" type="text_fr" multiValued="true" indexed="true"
required="false" stored="false"/>
<field name="AllChamp_en" type="text_en" multiValued="true" indexed="true"
required="false" stored="false"/>

<dynamicField name="*_en" type="text_en" indexed="true" stored="false"
required="false" multiValued="true"/>
<dynamicField name="*_fr" type="text_fr" indexed="true" stored="false"
required="false" multiValued="true"/>
<dynamicField name="*_ar" type="text_ar" indexed="true" stored="false"
required="false" multiValued="true"/>

<copyField source="*_ar" dest="AllChamp_ar"/>
<copyField source="*_fr" dest="AllChamp_fr"/>
<copyField source="*_en" dest="AllChamp_en"/>

<requestHandler name="/select" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="echoParams">explicit</str>
    <int name="rows">10</int>
    <str name="defType">edismax</str>
    <str name="qf">AllChamp^2.0 AllChamp_ar^2.0 AllChamp_en^2.0 AllChamp_fr^5.0</str>
  </lst>
</requestHandler>

Example of a search in the Solr Admin: "nous présentons", which is French,
and "nous" is in stopwords_fr.
But when I search for "nous présentons", I get matches on "nous" because I
have some English docs which contain "nous".

This is just one example for one language. I don't want to add the
stopwords_fr entries to stopwords_en.
What I want is to detect the language before the select search, then choose
the field_langid to search.

Best regards,
Anass BENJELLOUN

--
View this message in context: http://lucene.472066.n3.nabble.com/multilingual-search-tp4145639.html
Sent from the Solr - User mailing list archive at Nabble.com.
