Edismax query using different strings for different fields

David Zimmermann Fri, 05 Jun 2020 07:11:01 -0700

I could use some advice on how to handle a particular cross-language search 
with Solr. I posted it on Stack Overflow two months ago, but did not get a 
solution.
I have documents in 3 languages (English, German, French). For simplicity let's 
assume it's just two languages (English and German). The documents are 
standardised in the sense that they contain the same parts (text_part1 and 
text_part2), just the language they are written in is different. The language 
of the documents is known. In my index schema I use one core with different 
fields for each language.

For a German document the index will look something like this:

* text_part1_en: empty
* text_part2_en: empty
* text_part1_de: German text
* text_part2_de: Another German text

For an English document it will be the other way around.

What I want to achieve: A user entering a query in English should receive both
English and German documents that are relevant to his search. Further
conditions are:

* I want results with hits in text_part1 and text_part2 to be higher ranked
than results with hits only in one field (tie value > 0).
* The queries will not be single words, but full sentences (stop word
removal needed and partial hits [only a few words out of the sentences] must be
valid).
* English and German documents must end up in one ranking. I need to be
able to compare the relevance of an English document to the relevance of a
German document.
* The text parts need to stay separate, because I want to boost the importance
of one part (let's say part1) over the other.
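
To make these conditions concrete, here is a minimal sketch (in Python) of how
they might map onto standard edismax request parameters. The boost and tie
values and the helper name build_edismax_params are assumptions chosen only for
illustration. Stop word removal is not shown, since that would be handled by
the field analyzers in the schema rather than by a request parameter. Note that
this sketch sends the same query string to every field, so it does not address
the per-language query strings.

```python
from urllib.parse import urlencode

# Hypothetical values, chosen only for illustration.
PART1_BOOST = 2
TIE = 0.1


def build_edismax_params(query, languages=("en", "de")):
    """Build edismax request parameters covering both text parts per language."""
    fields = []
    for lang in languages:
        # part1 is weighted higher than part2 via a qf field boost
        fields.append(f"text_part1_{lang}^{PART1_BOOST}")
        fields.append(f"text_part2_{lang}")
    return {
        "defType": "edismax",
        "q": query,
        "qf": " ".join(fields),
        # tie > 0 lets non-best-matching fields add to the score
        "tie": str(TIE),
        # mm=1: matching a single query word is enough for a partial hit
        "mm": "1",
    }


params = build_edismax_params("A sentence in English")
print(urlencode(params))
```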

My general approach so far has been to get a German translation of the user's
query by sending it to a translation API. Then I want to use an edismax query,
since it seems to fulfill all of my requirements. The problem is that I cannot
manage to search for the German query in the German fields and the English
query in the English fields only. The Solr edismax
documentation<https://lucene.apache.org/solr/guide/6_6/the-extended-dismax-query-parser.html>
states that it supports the full Lucene query parser syntax, but I can't find
a way to address different fields with different inputs. I tried:

q=text_part1_en: (A sentence in English) text_part1_de: (Ein Satz auf Deutsch)
text_part2_en: (A sentence in English) text_part2_de: (Ein Satz auf Deutsch)
qf=text_part1_en text_part2_en text_part1_de text_part2_de

This syntax should be in line with what MatsLindh wrote in this
thread<https://stackoverflow.com/questions/53371028/different-search-term-on-different-fields-using-edismax-query-parser-in-solr>.
I tried different versions of writing this q, but whatever I do Solr always
searches for the full q string in all four fields given by qf, which totally
messes up the result. Am I just making mistakes in the query syntax or is it
even possible to do what I'm trying to do using edismax?
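
One direction that might be worth trying, shown here only as an untested
sketch: keep the standard lucene parser at the top level and nest one edismax
subquery per language, each with its own qf and its own query string passed by
parameter dereferencing (v=$name). The parameter names q_en and q_de, the
helper build_nested_params, and the boost and tie values are assumptions made
up for illustration. Whether nested subqueries are permitted, and whether the
resulting scores are comparable across languages, would need to be checked
against the Solr version in use.

```python
from urllib.parse import urlencode


def build_nested_params(query_en, query_de, part1_boost=2, tie=0.1):
    """Sketch: one edismax subquery per language, combined by the lucene parser."""

    def subquery(lang, param_name):
        qf = f"text_part1_{lang}^{part1_boost} text_part2_{lang}"
        # v=$name refers to another request parameter, so the user's text
        # does not have to be escaped inside the q string itself.
        return f"_query_:\"{{!edismax qf='{qf}' tie={tie} v=${param_name}}}\""

    return {
        "defType": "lucene",
        # Two optional clauses: a document can match either language.
        "q": f"{subquery('en', 'q_en')} {subquery('de', 'q_de')}",
        "q_en": query_en,  # hypothetical parameter name
        "q_de": query_de,  # hypothetical parameter name
    }


nested = build_nested_params("A sentence in English", "Ein Satz auf Deutsch")
print(urlencode(nested))
```

With this layout the English string would only be parsed against the _en
fields and the German string only against the _de fields, while tie and the
part1 boost still apply inside each subquery.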

Any help would be highly appreciated.
