Hi Erick,

Thanks for your advice. I have already removed stop word filtering from the field definition used by the suggester and it works great. I will consider removing it from the processing of the other fields as well. I have only 7000 docs with an index size of 18 MB so far, so the memory footprint is not a key issue for me.
Best,
Roland

Erick Erickson <erickerick...@gmail.com> wrote (on Wed, Jul 31, 2019, 14:24):

> Roland:
>
> Have you considered just not using stopwords anywhere? Largely they’re a holdover
> from a long time ago when every byte counted. Plus using stopwords has “interesting”
> issues with things like highlighting and phrase queries and the like.
>
> Sure, not using stopwords will make your index larger, but so will a copyfield…
>
> Your call of course, but stopwords are over-used IMO.
>
> I’m stealing Walter Underwood’s thunder here ;)
>
> Best,
> Erick
>
> > On Jul 30, 2019, at 2:11 PM, Szűcs Roland <szucs.rol...@bookandwalk.hu> wrote:
> >
> > Hi Furkan,
> >
> > Thanks for the suggestion, I always forget the most effective debugging tool,
> > the analysis page.
> >
> > It turned out that "Jó" was a stop word and it was eliminated during the
> > text analysis. What I will do is create a new field type without
> > stop word removal and use it like this:
> > <str name="suggestAnalyzerFieldType">short_text_hu_without_stop_removal</str>
> >
> > Thanks again
> >
> > Roland
> >
> > Furkan KAMACI <furkankam...@gmail.com> wrote (on Tue, Jul 30, 2019, 16:17):
> >
> >> Hi Roland,
> >>
> >> Could you check the Analysis tab
> >> (https://lucene.apache.org/solr/guide/8_1/analysis-screen.html) and tell
> >> how the term is analyzed for both query and index?
> >>
> >> Kind Regards,
> >> Furkan KAMACI
> >>
> >> On Tue, Jul 30, 2019 at 4:50 PM Szűcs Roland <szucs.rol...@bookandwalk.hu> wrote:
> >>
> >>> Hi All,
> >>>
> >>> I have an author suggester (search component and the related request
> >>> handler) defined in solrconfig:
> >>> <searchComponent name="suggest" class="solr.SuggestComponent">
> >>>   <!-- All suggester component must have different filepath to avoid write lock issues-->
> >>>   <lst name="suggester">
> >>>     <str name="name">author</str>
> >>>     <str name="lookupImpl">AnalyzingInfixLookupFactory</str>
> >>>     <str name="dictionaryImpl">DocumentDictionaryFactory</str>
> >>>     <str name="field">BOOK_productAuthor</str>
> >>>     <str name="suggestAnalyzerFieldType">short_text_hu</str>
> >>>     <str name="indexPath">suggester_infix_author</str>
> >>>     <str name="buildOnStartup">false</str>
> >>>     <str name="buildOnCommit">false</str>
> >>>     <str name="minPrefixChars">2</str>
> >>>   </lst>
> >>> </searchComponent>
> >>>
> >>> <requestHandler name="/suggesthandler" class="solr.SearchHandler" startup="lazy">
> >>>   <lst name="defaults">
> >>>     <str name="suggest">true</str>
> >>>     <str name="suggest.count">10</str>
> >>>     <str name="suggest.dictionary">author</str>
> >>>   </lst>
> >>>   <arr name="components">
> >>>     <str>suggest</str>
> >>>   </arr>
> >>> </requestHandler>
> >>>
> >>> The author field has only minimal text processing at query and index time,
> >>> based on the following definition:
> >>> <fieldType name="short_text_hu" class="solr.TextField" positionIncrementGap="100" multiValued="true">
> >>>   <analyzer type="index">
> >>>     <charFilter class="solr.HTMLStripCharFilterFactory"/>
> >>>     <tokenizer class="solr.ClassicTokenizerFactory"/>
> >>>     <filter class="solr.StopFilterFactory" words="stopwords_hu.txt" ignoreCase="true"/>
> >>>     <filter class="solr.LowerCaseFilterFactory"/>
> >>>   </analyzer>
> >>>   <analyzer type="query">
> >>>     <tokenizer class="solr.ClassicTokenizerFactory"/>
> >>>     <filter class="solr.StopFilterFactory" words="stopwords_hu.txt" ignoreCase="true"/>
> >>>     <filter class="solr.LowerCaseFilterFactory"/>
> >>>   </analyzer>
> >>> </fieldType>
> >>> <fieldType name="string" class="solr.StrField" sortMissingLast="true" docValues="true"/>
> >>> <fieldType name="strings" class="solr.StrField" sortMissingLast="true" docValues="true" multiValued="true"/>
> >>> <fieldType name="text_ar" class="solr.TextField" positionIncrementGap="100">
> >>>   <analyzer>
> >>>     <tokenizer class="solr.StandardTokenizerFactory"/>
> >>>     <filter class="solr.LowerCaseFilterFactory"/>
> >>>     <filter class="solr.StopFilterFactory" words="lang/stopwords_ar.txt" ignoreCase="true"/>
> >>>     <filter class="solr.ArabicNormalizationFilterFactory"/>
> >>>     <filter class="solr.ArabicStemFilterFactory"/>
> >>>   </analyzer>
> >>> </fieldType>
> >>>
> >>> When I use queries with only ASCII characters, the results are correct:
> >>> "Al":{
> >>> "term":"<b>Al</b>exandre Dumas", "weight":0, "payload":""}
> >>>
> >>> When I try it with a Hungarian author name containing a special character:
> >>> "Jó":"author":{
> >>> "Jó":{ "numFound":0, "suggestions":[]}}
> >>>
> >>> When I try it with three letters, it works again:
> >>> "Józ":"author":{
> >>> "Józ":{ "numFound":10, "suggestions":[
> >>> { "term":"Bajza <b>Józ</b>sef", "weight":0, "payload":""},
> >>> { "term":"Eötvös <b>Józ</b>sef", "weight":0, "payload":""},
> >>> { "term":"Eötvös <b>Józ</b>sef", "weight":0, "payload":""},
> >>> { "term":"Eötvös <b>Józ</b>sef", "weight":0, "payload":""},
> >>> { "term":"<b>Józ</b>sef Attila", "weight":0, "payload":""}..
> >>>
> >>> Any idea how it can happen that a longer string has more matches than a
> >>> shorter one? It is inconsistent. What can I do to fix it, as it would
> >>> result in a poor customer experience? They would feel that sometimes they
> >>> need 2 and sometimes 3 characters to get suggestions.
> >>>
> >>> Thanks in advance,
> >>> Roland
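
For readers landing on this thread, the fix described at the top amounts to analyzing suggestions with a field type that has no stop filter. Below is a minimal sketch, assuming the new type is simply a copy of short_text_hu with the two StopFilterFactory lines dropped. The name short_text_hu_without_stop_removal comes from the thread; the exact definition Roland ended up with was not posted, so the body is an assumption.

```xml
<!-- Sketch only: short_text_hu from the thread, minus stop word removal.
     The analyzer chain below is assumed, not quoted from Roland's final config. -->
<fieldType name="short_text_hu_without_stop_removal" class="solr.TextField"
           positionIncrementGap="100" multiValued="true">
  <analyzer type="index">
    <charFilter class="solr.HTMLStripCharFilterFactory"/>
    <tokenizer class="solr.ClassicTokenizerFactory"/>
    <!-- No StopFilterFactory here, so a short token such as "Jó" survives analysis -->
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.ClassicTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
<!-- The suggester then points at it via suggestAnalyzerFieldType, as shown in the thread. -->
```

Since the suggester in the thread sets buildOnStartup and buildOnCommit to false, it would probably need an explicit rebuild (for example a request with suggest.build=true) before the new analysis takes effect.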