Indeed, a custom Solr search component that adapts the incoming query to its
detected language can work as well. Add it to the search components before the
"query" component, have it call the language detection code on the "q"
parameter, and then modify the "qf" parameter based on the language
detected.
Two possible approaches come to mind:
1. Modify the qf parameter directly by either adding the "_xx" language
suffix to each field in qf, or replacing the "xx" for any qf fields that
already have an "_xx" suffix.
2. Have separate "qf_xx" parameters which are customized for specific
languages and then copy the language-specific "qf_xx" parameter to the main
qf parameter based on the language that is detected.
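
The two approaches above can be sketched outside Solr as plain string and
dictionary manipulation. This is only an illustration, not a Solr component:
the function names retarget_qf and select_qf are hypothetical, and the set of
language suffixes is assumed to be the en/fr/ar whitelist quoted later in this
thread.

```python
import re

# Assumption: only these suffixes count as language suffixes (the whitelist
# from the configuration quoted in this thread).
KNOWN_LANGS = ("en", "fr", "ar")
LANG_SUFFIX = re.compile(r"_(?:%s)$" % "|".join(KNOWN_LANGS))


def retarget_qf(qf, lang):
    """Approach 1: point every qf field at its "_<lang>" variant, keeping boosts."""
    rewritten = []
    for term in qf.split():
        field, caret, boost = term.partition("^")
        base = LANG_SUFFIX.sub("", field)  # drop an existing "_xx" suffix, if any
        rewritten.append(f"{base}_{lang}{caret}{boost}")
    return " ".join(rewritten)


def select_qf(params, lang):
    """Approach 2: copy a prepared "qf_<lang>" parameter over "qf", if present."""
    result = dict(params)  # leave the caller's parameters untouched
    specific = params.get(f"qf_{lang}")
    if specific:
        result["qf"] = specific
    return result
```

In a real search component the same logic would run on the request parameters
before the "query" component sees them.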
-- Jack Krupansky
-----Original Message-----
From: Paul Libbrecht
Sent: Friday, July 4, 2014 11:36 AM
To: solr-user@lucene.apache.org
Subject: Re: multilingual search
To do just what Jack described, I often write a solr query component that
does "query expansion".
Based on parameters that I can treat as language hints (e.g. the language of
the environment the user searches in, or the browser's Accept-Language
header), I reformulate the query so that it searches the fields for these
languages, in order of preference.
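
A rough sketch of that kind of expansion, assuming the hint is an
Accept-Language header and that each language has one catch-all field named
like the AllChamp_xx fields quoted further down. The function names, the
halving of boosts and the default top boost are all illustrative assumptions,
not anything prescribed by Solr.

```python
def parse_accept_language(header):
    """Return bare language codes from an Accept-Language value, best first."""
    ranked = []
    for position, item in enumerate(header.split(",")):
        tag, _, params = item.strip().partition(";")
        tag = tag.strip()
        if not tag or tag == "*":
            continue
        quality = 1.0
        params = params.strip()
        if params.startswith("q="):
            try:
                quality = float(params[2:])
            except ValueError:
                quality = 0.0
        # "fr-FR" and "fr" both map to the bare code "fr".
        ranked.append((-quality, position, tag.split("-")[0].lower()))
    ranked.sort()
    ordered = []
    for _, _, lang in ranked:
        if lang not in ordered:
            ordered.append(lang)
    return ordered


def expand_query_fields(languages, supported=("en", "fr", "ar"),
                        base_field="AllChamp", top_boost=4.0):
    """Build a qf value over the per-language fields, in preference order."""
    wanted = [lang for lang in languages if lang in supported]
    if not wanted:
        # No usable hint: search every language field with equal weight.
        return " ".join(f"{base_field}_{lang}" for lang in supported)
    # Halve the boost at each step down the preference list (arbitrary choice).
    return " ".join(f"{base_field}_{lang}^{top_boost / 2 ** rank:g}"
                    for rank, lang in enumerate(wanted))
```

Languages the index does not support are simply dropped from the expansion.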
I am sure that this produces some noise, e.g. because the search corpus is
not spread evenly across languages, but I have to accept it.
There are many other examples besides Jack's fine "raison d'être" example
(I particularly like the way he describes the motivation for using it; I can
almost hear people trying to articulate it carefully! ;-)).
Other examples of language cross-use include the "gallicisms" e.g. in
German: http://de.wikipedia.org/wiki/Liste_von_Gallizismen or other
languages linked there.
E.g. "direction", which has different meanings in French (where it can mean
the management staff) and in English (where it can mean the teacher's
instruction); "demonstration" too; and "sitting" (which is an English word
used in French).
paul
On 4 Jul 2014, at 17:15, "Jack Krupansky" <j...@basetechnology.com> wrote:
What leads you to believe that the user is not interested in occurrences
of the French phrase in English text? I mean, we English-speakers and
writers like to use French phrases to show how sophisticated we are! It's
part of our... raison d'être. If I do a Google search for "raison d'être",
it doesn't mysteriously show me only French documents.
So, usually, it needs to be a user preference - the user's preferred
language, and whether they want to search across documents in all
languages or just a subset of languages. And then, on the results page you
can show the language and a button to restrict a re-query to the specific
language.
If you really need to do this query language detection, the best approach
is to do it within your application layer (you can use the Google code for
language detection) and then send the query to the appropriate query
request handler, with a separate query request handler for each language
that optimizes the settings for that language, such as the
language-specific fields to use for the "qf" parameter.
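
As an illustration of that application-layer routing, here is a minimal
sketch. The handler names (/select_en, /select_fr, /select_ar), the base URL
used in the usage lines and the build_search_url helper are hypothetical, and
the detector is passed in as a function so that any language detection
library can be plugged in.

```python
from urllib.parse import urlencode

# Hypothetical per-language request handlers; each would be defined in
# solrconfig.xml with its own language-specific "qf" defaults.
HANDLERS = {"en": "/select_en", "fr": "/select_fr", "ar": "/select_ar"}
DEFAULT_HANDLER = "/select"  # used when the language is unknown or unsupported


def build_search_url(base_url, query, detect_language):
    """Return the URL of the request handler matching the query's language.

    detect_language is any callable that maps the query text to a language
    code such as "fr", or to None when it cannot decide.
    """
    lang = detect_language(query)
    handler = HANDLERS.get(lang, DEFAULT_HANDLER)
    return f"{base_url}{handler}?{urlencode({'q': query})}"


# Usage with a stand-in detector that always answers "fr":
url = build_search_url("http://localhost:8983/solr/docs",
                       "nous présentons", lambda text: "fr")
print(url)
```

Falling back to the generic handler keeps the search working when detection
fails, which matters for short queries where detection is least reliable.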
-- Jack Krupansky
-----Original Message-----
From: benjelloun
Sent: Friday, July 4, 2014 10:52 AM
To: solr-user@lucene.apache.org
Subject: multilingual search
Hello,
What I need to do is detect the language of my fields. Then, when I search
with the "/select" request handler, how can I make the search detect the
language of the query words and choose which field_langid to use?
My configuration:
<updateRequestProcessorChain name="langid">
  <processor class="org.apache.solr.update.processor.LangDetectLanguageIdentifierUpdateProcessorFactory">
    <lst name="defaults">
      <bool name="langid">true</bool>
      <str name="langid.fl">NomDocument,ContenuDocument,Postit</str>
      <str name="langid.langField">language_s</str>
      <str name="langid.whitelist">en,fr,ar</str>
      <str name="langid.fallback">fr</str>
      <float name="langid.threshold">0.6</float>
      <bool name="langid.map">true</bool>
      <bool name="langid.map.individual">true</bool>
      <bool name="langid.map.keepOrig">true</bool>
    </lst>
  </processor>
</updateRequestProcessorChain>
<field name="AllChamp_ar" type="text_ar" multiValued="true" indexed="true"
       required="false" stored="false"/>
<field name="AllChamp_fr" type="text_fr" multiValued="true" indexed="true"
       required="false" stored="false"/>
<field name="AllChamp_en" type="text_en" multiValued="true" indexed="true"
       required="false" stored="false"/>
<dynamicField name="*_en" type="text_en" indexed="true" stored="false"
              required="false" multiValued="true"/>
<dynamicField name="*_fr" type="text_fr" indexed="true" stored="false"
              required="false" multiValued="true"/>
<dynamicField name="*_ar" type="text_ar" indexed="true" stored="false"
              required="false" multiValued="true"/>
<copyField source="*_ar" dest="AllChamp_ar"/>
<copyField source="*_fr" dest="AllChamp_fr"/>
<copyField source="*_en" dest="AllChamp_en"/>
<requestHandler name="/select" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="echoParams">explicit</str>
    <int name="rows">10</int>
    <str name="defType">edismax</str>
    <str name="qf">AllChamp^2.0 AllChamp_ar^2.0 AllChamp_en^2.0 AllChamp_fr^5.0</str>
  </lst>
</requestHandler>
Example of a search in Solr Admin: "nous présentons", which is in French,
and "nous" is in stopwords_fr.
But when I search for "nous présentons", I get matches on "nous" because I
have some English docs that contain "nous".
This is just one example for one language. I don't want to add stopwords_fr
to stopwords_en.
What I want is to detect the language before the select search and then
choose the field_langid to search.
Best regards,
Anass BENJELLOUN