Hi, I need to index an internationalized field ('name') with Solr. For this purpose, I want to use dynamic fields, so that my Solr documents contain e.g. 'name_en', 'name_de' and 'name_fr'.
When querying the index, I need to know which language a match was found in. For this, I want to use Solr highlighting. My problem is that the highlighting seems to work inconsistently, which breaks my use case.

The configuration for my dynamic field '*_en' is as follows:

    <dynamicField name="*_en" type="text_en" indexed="true" stored="true" multiValued="false"/>

The field type 'text_en' is configured as follows:

    <fieldType name="text_en" class="solr.TextField" positionIncrementGap="100">
      <analyzer type="index">
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <!-- in this example, we will only use synonyms at query time
        <filter class="solr.SynonymFilterFactory" synonyms="index_synonyms.txt" ignoreCase="true" expand="false"/>
        -->
        <!-- Case insensitive stop word removal. -->
        <filter class="solr.StopFilterFactory" ignoreCase="true" words="lang/stopwords_en.txt" />
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.EnglishPossessiveFilterFactory"/>
        <filter class="solr.KeywordMarkerFilterFactory" protected="protwords.txt"/>
        <!-- Optionally you may want to use this less aggressive stemmer instead of PorterStemFilterFactory:
        <filter class="solr.EnglishMinimalStemFilterFactory"/>
        -->
        <filter class="solr.PorterStemFilterFactory"/>
      </analyzer>
      <analyzer type="query">
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
        <filter class="solr.StopFilterFactory" ignoreCase="true" words="lang/stopwords_en.txt" />
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.EnglishPossessiveFilterFactory"/>
        <filter class="solr.KeywordMarkerFilterFactory" protected="protwords.txt"/>
        <!-- Optionally you may want to use this less aggressive stemmer instead of PorterStemFilterFactory:
        <filter class="solr.EnglishMinimalStemFilterFactory"/>
        -->
        <filter class="solr.PorterStemFilterFactory"/>
      </analyzer>
    </fieldType>

My index contains the
following document:

    <doc>
      <int name="id">25</int>
      <str name="name_it">Note Test</str>
      <str name="description_it"/>
      <str name="name_en">Note Test Translation</str>
      <str name="description_en"/>
      <long name="_version_">1504065955969368064</long>
    </doc>

The query

    defType=edismax&q=Translation&hl=on&hl.fl=name_*

returns the above document but does not highlight anything. The query

    defType=edismax&q=name_en:Translation&hl=on&hl.fl=name_*

returns the above document AND highlights 'Translation' as expected.

Since 'Translation' does not occur in any other field, I do not understand how the match could have occurred on a field other than 'name_en' (which would explain why 'name_en' is not highlighted).

I already tried the suggestions from:

    http://stackoverflow.com/questions/23755097/solr-highlighting-hl-simple-pre-post-doesnt-appear-sometime
    http://lucene.472066.n3.nabble.com/Urgent-Highlighting-not-working-as-expected-td3983755.html
    http://stackoverflow.com/questions/9842886/why-is-this-simple-solr-highlighting-attempt-failing

None of them worked.

Moreover, when I run

    defType=edismax&q=Note&hl=on&hl.fl=name_*

the result is

    <doc>
      <int name="id">25</int>
      <str name="name_it">Note Test</str>
      <str name="description_it"/>
      <str name="name_en">Note Test Translation</str>
      <str name="description_en"/>
      <long name="_version_">1504067222466723840</long>
    </doc>
    <doc>
      <int name="id">27</int>
      <str name="name_de">Note Test child</str>
      <str name="description_de"/>
      <long name="_version_">1504067222528589824</long>
    </doc>

However, the highlighting only contains fields of document 25 but not of document 27:

    <lst name="highlighting">
      <lst name="25">
        <arr name="name_it">
          <str><em>Note</em> Test</str>
        </arr>
        <arr name="name_en">
          <str><em>Note</em> Test Translation</str>
        </arr>
      </lst>
      <lst name="27"/>
    </lst>

I really do not understand what is happening here or what I can do to make the highlighting consistent. Also, is my approach of using 'name_en', 'name_de', ... for localized field indexing reasonable, or is there a preferable way?
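In case it helps with reproducing this, here is a minimal sketch of how the three requests above can be built. It is only an illustration: the base URL (host, port and the core name 'mycore') is a placeholder and not my actual setup, and the helper name build_highlight_query is made up for this example. Only the parameters defType, q, hl and hl.fl are taken from the queries quoted above.

```python
from urllib.parse import urlencode

# Placeholder endpoint: host, port and core name are assumptions,
# not the real deployment.
SOLR_SELECT_URL = "http://localhost:8983/solr/mycore/select"


def build_highlight_query(q, hl_fl="name_*"):
    """Return a select URL for an edismax query with highlighting enabled.

    The parameters mirror the queries quoted in this mail:
    defType=edismax, hl=on and hl.fl=name_*.
    """
    params = [
        ("defType", "edismax"),
        ("q", q),
        ("hl", "on"),
        ("hl.fl", hl_fl),
    ]
    # urlencode percent-encodes reserved characters such as ':' and '*'.
    return SOLR_SELECT_URL + "?" + urlencode(params)


if __name__ == "__main__":
    # The three queries described above.
    for q in ("Translation", "name_en:Translation", "Note"):
        print(build_highlight_query(q))
```

Each printed URL can be opened in a browser or fetched with any HTTP client, which makes it easy to compare the highlighting sections of the three responses side by side.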
Thank you for your help and best regards,
Moritz Becker

Software Development
curecomp Software Services GmbH
Hafenstrasse 47-51
4020 Linz
web: www.curecomp.com
e-Mail: m.bec...@curecomp.com