Hi guys,
I just wrote a blog post that complements Erick's post and explains in detail, with practical examples, all the main Lookup implementations:
http://alexbenedetti.blogspot.co.uk/2015/07/solr-you-complete-me.html

I think this can be useful for Edwin to finally fix the config for the FreeTextSuggester (which I finally clarified, Erick, thanks to Mike's answer on the dev list, plus deep code analysis and testing :) ).

Cheers

2015-06-27 23:51 GMT+01:00 Alessandro Benedetti <benedetti.ale...@gmail.com>:

> Thanks, Erick, I didn't have time to go through the code again.
> But I will forward this to the dev list.
> Thank you for your time!
>
> Cheers
>
> 2015-06-27 16:19 GMT+01:00 Erick Erickson <erickerick...@gmail.com>:
>
>> Alessandro:
>>
>> Going to have to defer to Mike McCandless et al., they're the
>> authorities here. Don't quite know whether they monitor this list,
>> consider the dev list?
>>
>> Best,
>> Erick
>>
>> On Fri, Jun 26, 2015 at 4:53 AM, Alessandro Benedetti
>> <benedetti.ale...@gmail.com> wrote:
>> > Up. Can anyone kindly take a look at my considerations about the
>> > FreeText Suggester?
>> > I am curious to have more insight.
>> > Eventually I will analyse the code in depth to understand my errors.
>> >
>> > Cheers
>> >
>> > 2015-06-19 11:53 GMT+01:00 Alessandro Benedetti <benedetti.ale...@gmail.com>:
>> >
>> >> Actually the documentation is not clear enough.
>> >> Let's try to understand this suggester.
>> >>
>> >> *Building*
>> >> This suggester builds an FST that it uses to provide the autocomplete
>> >> feature by running prefix searches on it.
>> >> The terms it uses to generate the FST are the tokens produced by the
>> >> "suggestFreeTextAnalyzerFieldType".
>> >>
>> >> And this should be correct.
>> >> So, to keep it simple, if we have a shingle token filter [1-3] in our
>> >> analysis (we produce unigrams as well), from these original field values:
>> >> "mp3 ipod"
>> >> "mp3 player"
>> >> "mp3 player ipod"
>> >> "player of Real"
>> >>
>> >> -> we produce this list of possible suggestions in our FST:
>> >>
>> >> <mp3>
>> >> <player>
>> >> <ipod>
>> >> <real>
>> >> <of>
>> >>
>> >> <mp3 ipod>
>> >> <mp3 player>
>> >> <player ipod>
>> >> <player of>
>> >> <of real>
>> >>
>> >> <mp3 player ipod>
>> >> <player of real>
>> >>
>> >> From the documentation I read:
>> >>
>> >>> "ngrams: The max number of tokens out of which singles will be make
>> >>> the dictionary. The default value is 2. Increasing this would mean you
>> >>> want more than the previous 2 tokens to be taken into consideration
>> >>> when making the suggestions."
>> >>
>> >> This confuses me, as I was not expecting this param to affect the
>> >> suggestion dictionary.
>> >> So I would like a clarification here from our masters :)
>> >> At this point let's see what happens at query time.
>> >>
>> >> *Query Time*
>> >> As I understand it, the ngrams param makes the suggester consider only
>> >> the last N-1 tokens the user typed, split on the space separator.
>> >>
>> >>> "Builds an ngram model from the text sent to {@link #build} and
>> >>> predicts based on the last grams-1 tokens in the request sent to
>> >>> {@link #lookup}. This tries to handle the "long tail" of suggestions
>> >>> for when the incoming query is a never before seen query string."
>> >>
>> >> Example: grams=3 should consider only the last 2 tokens
>> >>
>> >> special mp3 p -> mp3 p
>> >>
>> >> Then this query is analysed using the
>> >> "suggestFreeTextAnalyzerFieldType".
>> >> We produce 3 tokens:
>> >> <mp3>
>> >> <p>
>> >> <mp3 p>
>> >>
>> >> And we run the prefix matching on the FST.
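
For readers who want to check the lists above by hand, here is a small Python sketch of the understanding described in the Building and Query Time sections. It only models what the message describes (lowercased whitespace tokens, shingle sizes 1 to 3, keep the last grams-1 query tokens). It is not the FreeTextSuggester implementation, and the names shingles, build_dictionary and last_tokens are hypothetical, chosen only for this illustration.

```python
def shingles(text, min_size=1, max_size=3):
    """Return all word n-grams of text with min_size..max_size tokens."""
    tokens = text.lower().split()
    out = []
    for size in range(min_size, max_size + 1):
        for start in range(len(tokens) - size + 1):
            out.append(" ".join(tokens[start:start + size]))
    return out


def build_dictionary(values, max_size=3):
    """Collect the distinct shingles of every field value (assumed model)."""
    entries = set()
    for value in values:
        entries.update(shingles(value, 1, max_size))
    return entries


def last_tokens(query, grams):
    """Keep only the last grams-1 whitespace-separated tokens of the query."""
    keep = grams - 1
    tokens = query.split()
    return " ".join(tokens[-keep:]) if keep > 0 else ""


values = ["mp3 ipod", "mp3 player", "mp3 player ipod", "player of Real"]
dictionary = build_dictionary(values)
print(sorted(dictionary))

trimmed = last_tokens("special mp3 p", 3)
print(trimmed)            # mp3 p
print(shingles(trimmed))  # ['mp3', 'p', 'mp3 p']
```

With the four field values from the example, the dictionary holds the same 12 entries listed above, and the query "special mp3 p" with grams=3 is trimmed to "mp3 p" before being expanded into the three tokens.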
>> >>
>> >> *Conclusion*
>> >> My understanding is surely wrong at some point, as the behaviour I get
>> >> is different.
>> >> Can we discuss this, clarify it and eventually put it in the official
>> >> documentation?
>> >>
>> >> Cheers
>> >>
>> >> 2015-06-19 6:40 GMT+01:00 Zheng Lin Edwin Yeo <edwinye...@gmail.com>:
>> >>
>> >>> I'm implementing an auto-suggest feature in Solr, and I'd like to
>> >>> achieve the following:
>> >>>
>> >>> For example, if the user enters "mp3", Solr might suggest "mp3 player",
>> >>> "mp3 nano" and "mp3 music".
>> >>> When the user enters "mp3 p", the suggestions should narrow down to
>> >>> "mp3 player".
>> >>>
>> >>> Currently, when I type "mp3 p", the suggester returns only words that
>> >>> start with the letter "p", and I'm getting results like "plan",
>> >>> "production", etc. It does not take the "mp3" token into
>> >>> consideration.
>> >>>
>> >>> I'm using Solr 5.1 and below is my configuration:
>> >>>
>> >>> In solrconfig.xml:
>> >>>
>> >>> <searchComponent name="suggest" class="solr.SuggestComponent">
>> >>>   <lst name="suggester">
>> >>>
>> >>>     <str name="lookupImpl">FreeTextLookupFactory</str>
>> >>>     <str name="indexPath">suggester_freetext_dir</str>
>> >>>
>> >>>     <str name="dictionaryImpl">DocumentDictionaryFactory</str>
>> >>>     <str name="field">Suggestion</str>
>> >>>     <str name="weightField">Project</str>
>> >>>     <str name="suggestFreeTextAnalyzerFieldType">suggestType</str>
>> >>>     <int name="ngrams">5</int>
>> >>>     <str name="buildOnStartup">false</str>
>> >>>     <str name="buildOnCommit">false</str>
>> >>>   </lst>
>> >>> </searchComponent>
>> >>>
>> >>>
>> >>> In schema.xml:
>> >>>
>> >>> <fieldType name="suggestType" class="solr.TextField"
>> >>>            positionIncrementGap="100">
>> >>>   <analyzer type="index">
>> >>>     <charFilter class="solr.PatternReplaceCharFilterFactory"
>> >>>                 pattern="[^a-zA-Z0-9]" replacement=" " />
>> >>>     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>> >>>     <filter class="solr.ShingleFilterFactory" minShingleSize="2"
>> >>>             maxShingleSize="6" outputUnigrams="false"/>
>> >>>   </analyzer>
>> >>>   <analyzer type="query">
>> >>>     <charFilter class="solr.PatternReplaceCharFilterFactory"
>> >>>                 pattern="[^a-zA-Z0-9]" replacement=" " />
>> >>>     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>> >>>     <filter class="solr.ShingleFilterFactory" minShingleSize="2"
>> >>>             maxShingleSize="6" outputUnigrams="true"/>
>> >>>   </analyzer>
>> >>> </fieldType>
>> >>>
>> >>>
>> >>> Is there anything that I configured wrongly?
>> >>>
>> >>>
>> >>> Regards,
>> >>> Edwin
>> >>>
>> >>
>> >> --
>> >> --------------------------
>> >>
>> >> Benedetti Alessandro
>> >> Visiting card : http://about.me/alessandro_benedetti
>> >>
>> >> "Tyger, tyger burning bright
>> >> In the forests of the night,
>> >> What immortal hand or eye
>> >> Could frame thy fearful symmetry?"
>> >>
>> >> William Blake - Songs of Experience -1794 England