I could use some advice on how to handle a particular cross-language search 
with Solr. I posted the question on Stack Overflow two months ago, but did not 
get a solution.
I have documents in 3 languages (English, German, French). For simplicity let's 
assume it's just two languages (English and German). The documents are 
standardised in the sense that they contain the same parts (text_part1 and 
text_part2), just the language they are written in is different. The language 
of the documents is known. In my index schema I use one core with different 
fields for each language.

For a German document the index will look something like this:

  *   text_part1_en: empty
  *   text_part2_en: empty
  *   text_part1_de: German text
  *   text_part2_de: Another German text

For an English document it will be the other way around.
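
To make the layout concrete, here is a minimal Python sketch of how such documents could be built before indexing. The helper name `make_doc` and the `id` field are only assumptions for illustration; the text fields follow the naming shown above.

```python
def make_doc(doc_id, lang, part1, part2, langs=("en", "de")):
    """Build an index document with one pair of text fields per language.

    Only the fields of the document's own language are filled;
    the fields of every other language stay empty.
    """
    doc = {"id": doc_id}  # "id" is an assumed unique key field
    for code in langs:
        own_language = code == lang
        doc[f"text_part1_{code}"] = part1 if own_language else ""
        doc[f"text_part2_{code}"] = part2 if own_language else ""
    return doc


german_doc = make_doc("doc-1", "de", "German text", "Another German text")
english_doc = make_doc("doc-2", "en", "English text", "Another English text")
```

Adding the third language (French) would only mean passing `langs=("en", "de", "fr")`.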

What I want to achieve: a user entering a query in English should receive both 
English and German documents that are relevant to their search. Further 
conditions are:

  *   I want results with hits in both text_part1 and text_part2 to rank higher 
than results with hits in only one field (tie value > 0).
  *   The queries will not be single words, but full sentences (stop word 
removal needed and partial hits [only a few words out of the sentences] must be 
valid).
  *   English and German documents must be merged into one ranking. I need to be 
able to compare the relevance of an English document to the relevance of a 
German document.
  *   The text parts need to stay separate, because I want to boost the importance 
of one (let's say part1) over the other.
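
To illustrate the first condition: as far as I understand it, a dismax-style query scores a document as the best per-field score plus the tie value times the sum of the remaining per-field scores. The following is only a sketch of that formula in Python, not Solr code, and the scores are made-up numbers.

```python
def dismax_score(field_scores, tie=0.0):
    """Best field score plus `tie` times the sum of the other field scores."""
    if not field_scores:
        return 0.0
    best = max(field_scores)
    return best + tie * (sum(field_scores) - best)


# Made-up per-field scores for two documents.
hits_in_both_parts = [2.0, 1.5]   # matches in text_part1 and text_part2
hits_in_one_part = [2.0]          # match in text_part1 only

score_without_tie = dismax_score(hits_in_both_parts, tie=0.0)
score_with_tie = dismax_score(hits_in_both_parts, tie=0.5)
```

With tie = 0 both documents get the same score, so the second matching field does not count; with tie > 0 the document matching both parts ranks higher.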

My general approach so far has been to get a German translation of the user's 
query by sending it to a translation API. Then I want to use an edismax query, 
since it seems to fulfill all of my requirements. The problem is that I cannot 
manage to search for the German query in the German fields and the English 
query in the English fields only. The Solr edismax 
documentation<https://lucene.apache.org/solr/guide/6_6/the-extended-dismax-query-parser.html>
 states that it supports the full Lucene query parser syntax, but I can't find 
a way to address different fields with different inputs. I tried:

q=text_part1_en: (A sentence in English) text_part1_de: (Ein Satz auf Deutsch) 
text_part2_en: (A sentence in English) text_part2_de: (Ein Satz auf Deutsch)
qf=text_part1_en text_part2_en text_part1_de text_part2_de


This syntax should be in line with what MatsLindh wrote in this 
thread<https://stackoverflow.com/questions/53371028/different-search-term-on-different-fields-using-edismax-query-parser-in-solr>.
 I tried different versions of writing this q, but whatever I do, Solr always 
searches for the full q string in all four fields given by qf, which totally 
messes up the results. Am I just making mistakes in the query syntax, or is it 
simply not possible to do what I'm trying to do using edismax?
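
To make the attempt reproducible, here is a minimal Python sketch of how the request parameters could be assembled. The function name, the boost of 2.0 on the part1 fields and the tie of 0.1 are assumptions chosen for illustration, and the sketch does not escape query syntax characters that may occur in a full sentence.

```python
from urllib.parse import urlencode


def build_edismax_params(query_en, query_de, part1_boost=2.0, tie=0.1):
    """Assemble edismax request parameters with one fielded clause per field."""
    queries = {"en": query_en, "de": query_de}
    # One clause per (part, language), e.g. text_part1_en:(A sentence ...)
    clauses = [
        f"text_part{part}_{lang}:({text})"
        for part in (1, 2)
        for lang, text in queries.items()
    ]
    # part1 fields get a boost over part2 fields (boost value is an assumption).
    qf = [
        f"text_part{part}_{lang}" + (f"^{part1_boost}" if part == 1 else "")
        for part in (1, 2)
        for lang in queries
    ]
    return {
        "defType": "edismax",
        "q": " ".join(clauses),
        "qf": " ".join(qf),
        "tie": str(tie),
    }


params = build_edismax_params("A sentence in English", "Ein Satz auf Deutsch")
query_string = urlencode(params)  # ready to append to the select URL
```

Unlike the q shown above, this variant puts no whitespace between the colon and the opening parenthesis of each clause.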

Any help would be highly appreciated.
