Hi,

We're testing a large multilingual index with a separate _LANG field for each language, and we use dismax to query them all. Users provide language preferences, explicitly or implicitly, which we use for either additive or multiplicative boosting on the language of the document.

Additive boosting is not adequate, however, because it cannot overcome the extremely high IDF values for the same word in another language, so regardless of the preference, foreign documents are returned. Multiplicative boosting solves this problem but has a downside of its own: the multiplicative boost is so strong that the standard qf=field^boost can no longer be used to prefer documents in another language above the preferred language. We do use the def function (boost=def(query($qq),.3)) to prevent a single boost query from returning 0, and thus a product of 0 across all boost queries, but it doesn't help that much.
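As a rough illustration of the effect described above, the following Python sketch compares additive and multiplicative boosting on a toy example. Everything in it is an assumption made for illustration: the classic TF-IDF style formula 1 + ln(N / (df + 1)), the document counts, the boost values, and the field names content_en and content_nl are all hypothetical and do not come from the actual index.

```python
import math

def idf(doc_freq: int, doc_count: int) -> float:
    """Classic TF-IDF style inverse document frequency (assumed formula)."""
    return 1.0 + math.log(doc_count / (doc_freq + 1))

def rank(scores: dict) -> list:
    """Return field names ordered from highest to lowest score."""
    return sorted(scores, key=scores.get, reverse=True)

# Hypothetical statistics: one shared index of 1,000,000 documents where the
# term is common in the English field and rare in the Dutch field.
DOC_COUNT = 1_000_000
base = {
    "content_en": idf(50_000, DOC_COUNT),  # frequent term, low IDF
    "content_nl": idf(200, DOC_COUNT),     # rare term, high IDF
}

PREFERRED = "content_en"   # the user's preferred language (assumed)
ADDITIVE_BOOST = 2.0       # hypothetical additive boost
MULTIPLICATIVE_BOOST = 3.0 # hypothetical multiplicative boost
QF_BOOST_NL = 1.2          # hypothetical qf-style boost on the other language

# Additive boosting: a constant is added to the preferred language's score.
additive = {
    f: s + (ADDITIVE_BOOST if f == PREFERRED else 0.0) for f, s in base.items()
}

# Multiplicative boosting: the preferred language's score is multiplied.
multiplicative = {
    f: s * (MULTIPLICATIVE_BOOST if f == PREFERRED else 1.0)
    for f, s in base.items()
}

# Multiplicative boosting combined with a modest field boost on the other
# language, in the spirit of qf=field^boost.
multiplicative_with_qf = dict(multiplicative)
multiplicative_with_qf["content_nl"] *= QF_BOOST_NL

if __name__ == "__main__":
    for label, scores in [
        ("base", base),
        ("additive", additive),
        ("multiplicative", multiplicative),
        ("multiplicative + qf", multiplicative_with_qf),
    ]:
        print(label, rank(scores), {f: round(s, 3) for f, s in scores.items()})
```

With these made-up numbers, the additive boost leaves the foreign-language field ranked first, while the multiplicative boost puts the preferred language first and keeps it there even after the modest field boost on the other language. Real Solr scoring involves more factors (term frequency, norms, the similarity in use), so this only shows the general shape of the problem.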
This all comes down to IDF differences between the languages; even common words such as country names like `india` show large differences in IDF. Is there anyone with hints or experiences to share about skewed IDF in such an index?

Thanks,
Markus