Hi All,

I have an author suggester (search component and the related request handler) defined in solrconfig:

  <searchComponent name="suggest" class="solr.SuggestComponent">
    <!-- All suggester component must have different filepath to avoid write lock issues -->
    <lst name="suggester">
      <str name="name">author</str>
      <str name="lookupImpl">AnalyzingInfixLookupFactory</str>
      <str name="dictionaryImpl">DocumentDictionaryFactory</str>
      <str name="field">BOOK_productAuthor</str>
      <str name="suggestAnalyzerFieldType">short_text_hu</str>
      <str name="indexPath">suggester_infix_author</str>
      <str name="buildOnStartup">false</str>
      <str name="buildOnCommit">false</str>
      <str name="minPrefixChars">2</str>
    </lst>
  </searchComponent>

  <requestHandler name="/suggesthandler" class="solr.SearchHandler" startup="lazy">
    <lst name="defaults">
      <str name="suggest">true</str>
      <str name="suggest.count">10</str>
      <str name="suggest.dictionary">author</str>
    </lst>
    <arr name="components">
      <str>suggest</str>
    </arr>
  </requestHandler>

The author field has only minimal text processing at query and index time, based on the following definition:

  <fieldType name="short_text_hu" class="solr.TextField" positionIncrementGap="100" multiValued="true">
    <analyzer type="index">
      <charFilter class="solr.HTMLStripCharFilterFactory"/>
      <tokenizer class="solr.ClassicTokenizerFactory"/>
      <filter class="solr.StopFilterFactory" words="stopwords_hu.txt" ignoreCase="true"/>
      <filter class="solr.LowerCaseFilterFactory"/>
    </analyzer>
    <analyzer type="query">
      <tokenizer class="solr.ClassicTokenizerFactory"/>
      <filter class="solr.StopFilterFactory" words="stopwords_hu.txt" ignoreCase="true"/>
      <filter class="solr.LowerCaseFilterFactory"/>
    </analyzer>
  </fieldType>
  <fieldType name="string" class="solr.StrField" sortMissingLast="true" docValues="true"/>
  <fieldType name="strings" class="solr.StrField" sortMissingLast="true" docValues="true" multiValued="true"/>
  <fieldType name="text_ar" class="solr.TextField" positionIncrementGap="100">
    <analyzer>
      <tokenizer class="solr.StandardTokenizerFactory"/>
      <filter class="solr.LowerCaseFilterFactory"/>
      <filter class="solr.StopFilterFactory" words="lang/stopwords_ar.txt" ignoreCase="true"/>
      <filter class="solr.ArabicNormalizationFilterFactory"/>
      <filter class="solr.ArabicStemFilterFactory"/>
    </analyzer>
  </fieldType>

When I use queries with only ASCII characters, the results are correct:

  "Al":{
    "term":"<b>Al</b>exandre Dumas",
    "weight":0,
    "payload":""}

When I try a Hungarian author name with a special character, "Jó", I get nothing:

  "author":{
    "Jó":{
      "numFound":0,
      "suggestions":[]}}

When I try it with three letters, "Józ", it works again:

  "author":{
    "Józ":{
      "numFound":10,
      "suggestions":[{
          "term":"Bajza <b>Józ</b>sef",
          "weight":0,
          "payload":""},
        {
          "term":"Eötvös <b>Józ</b>sef",
          "weight":0,
          "payload":""},
        {
          "term":"Eötvös <b>Józ</b>sef",
          "weight":0,
          "payload":""},
        {
          "term":"Eötvös <b>Józ</b>sef",
          "weight":0,
          "payload":""},
        {
          "term":"<b>Józ</b>sef Attila",
          "weight":0,
          "payload":""}..

Any idea how a longer string can have more matches than a shorter one? It is inconsistent. What can I do to fix it? It would result in a poor customer experience: users would feel that sometimes they need 2 and sometimes 3 characters to get suggestions.

Thanks in advance,
Roland
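
P.S. For anyone who wants to reproduce this, here is a minimal sketch of how the three requests above could be built. The base URL (host, port and the core name "books") is a hypothetical placeholder, not my real setup. Only the /suggesthandler path and the prefixes come from the configuration and examples above, and suggest.q is the standard Solr suggester query parameter. The sketch only builds the URLs, so it also shows how the non-ASCII prefix is percent-encoded as UTF-8 before it reaches Solr.

```python
from urllib.parse import urlencode

# Hypothetical base URL: host, port and core name are assumptions,
# replace them with the values of your own installation.
SOLR_CORE_URL = "http://localhost:8983/solr/books"


def build_suggest_url(prefix, core_url=SOLR_CORE_URL):
    """Return the URL of a suggest request sent to /suggesthandler.

    The handler defaults (suggest=true, suggest.count=10,
    suggest.dictionary=author) come from solrconfig, so only the
    prefix and the response format are passed here.
    """
    params = {
        "suggest.q": prefix,  # the text typed by the user so far
        "wt": "json",         # ask for a JSON response
    }
    # urlencode percent-encodes non-ASCII characters as UTF-8 bytes.
    return f"{core_url}/suggesthandler?{urlencode(params)}"


if __name__ == "__main__":
    # The three prefixes from the examples above.
    for prefix in ("Al", "Jó", "Józ"):
        print(prefix, "->", build_suggest_url(prefix))
```

Sending the printed URLs with curl or a browser should make it easy to compare the two-character and three-character cases side by side.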