Re: Auto-suggest in Solr

Alessandro Benedetti Fri, 10 Jul 2015 18:31:08 -0700

Hi guys,
I just wrote a blog post that builds on Erick's post and explains in detail,
with practical examples, all the main Lookup implementations:


http://alexbenedetti.blogspot.co.uk/2015/07/solr-you-complete-me.html

I think this can help Edwin finally fix the config for the
FreeTextSuggester (which I have finally clarified, Erick, thanks to Mike's
answer on the dev list and some deep code analysis and testing :) )

Cheers

2015-06-27 23:51 GMT+01:00 Alessandro Benedetti <benedetti.ale...@gmail.com>:

> Thanks, Erick, I didn't have time to go through the code again.
> But I will forward this to the dev list.
> Thank you for your time!
>
> Cheers
>
> 2015-06-27 16:19 GMT+01:00 Erick Erickson <erickerick...@gmail.com>:
>
>> Alessandro:
>>
>> Going to have to defer to Mike McCandless et al., they're the
>> authorities here. I don't quite know whether they monitor this list;
>> consider the dev list?
>>
>> Best,
>> Erick
>>
>> On Fri, Jun 26, 2015 at 4:53 AM, Alessandro Benedetti
>> <benedetti.ale...@gmail.com> wrote:
>> > Up. Could anyone kindly take a look at my considerations regarding the
>> > FreeText Suggester?
>> > I am curious to get more insight.
>> > Eventually I will analyse the code in depth to understand my errors.
>> >
>> > Cheers
>> >
>> > 2015-06-19 11:53 GMT+01:00 Alessandro Benedetti <benedetti.ale...@gmail.com>:
>> >
>> >> Actually, the documentation is not clear enough.
>> >> Let's try to understand this suggester.
>> >>
>> >> *Building*
>> >> This suggester builds an FST that it uses to provide the autocomplete
>> >> feature by running prefix searches on it.
>> >> The terms it uses to generate the FST are the tokens produced by the
>> >> "suggestFreeTextAnalyzerFieldType".
>> >>
>> >> And this should be correct.
>> >> So, to keep it simple, if we have a shingle token filter [1-3] in our
>> >> analysis (so we also produce unigrams), from these original field values:
>> >> "mp3 ipod"
>> >> "mp3 player"
>> >> "mp3 player ipod"
>> >> "player of Real"
>> >>
>> >> -> we produce this list of possible suggestions in our FST:
>> >>
>> >> <mp3>
>> >> <player>
>> >> <ipod>
>> >> <real>
>> >> <of>
>> >>
>> >> <mp3 ipod>
>> >> <mp3 player>
>> >> <player ipod>
>> >> <player of>
>> >> <of real>
>> >>
>> >> <mp3 player ipod>
>> >> <player of real>
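
To make the list above concrete, here is a minimal Python sketch of the shingle expansion it describes. It is not Solr code: the `shingles` helper is hypothetical, and it assumes a whitespace tokenizer plus lowercasing (the list shows "real" for the value "Real", so some lowercasing step is implied).

```python
def shingles(text, min_size=1, max_size=3):
    """Return every run of min_size..max_size consecutive tokens in text."""
    # Assumption: whitespace tokenization and lowercasing, nothing else.
    tokens = text.lower().split()
    out = []
    for size in range(min_size, max_size + 1):
        for start in range(len(tokens) - size + 1):
            out.append(" ".join(tokens[start:start + size]))
    return out


field_values = ["mp3 ipod", "mp3 player", "mp3 player ipod", "player of Real"]

# Collect the distinct shingles over all field values.
dictionary = set()
for value in field_values:
    dictionary.update(shingles(value))

# Print unigrams first, then bigrams, then trigrams.
for entry in sorted(dictionary, key=lambda s: (s.count(" "), s)):
    print(entry)
```

Under those assumptions the sketch yields the same 12 entries as the list: 5 unigrams, 5 bigrams and 2 trigrams.
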
>> >>
>> >> From the documentation I read :
>> >>
>> >>> "ngrams: The max number of tokens out of which singles will be make the
>> >>> dictionary. The default value is 2. Increasing this would mean you want
>> >>> more than the previous 2 tokens to be taken into consideration when
>> >>> making the suggestions."
>> >>
>> >>
>> >> This confuses me, as I was not expecting this param to affect the
>> >> suggestion dictionary.
>> >> So I would like a clarification here from our masters :)
>> >> At this point let's see what happens at query time.
>> >>
>> >> *Query Time*
>> >> As I understand it, the ngrams param makes the suggester consider only the
>> >> last N-1 tokens the user typed, split on the space separator.
>> >>
>> >>> "Builds an ngram model from the text sent to {@link #build} and predicts
>> >>> based on the last grams-1 tokens in the request sent to {@link #lookup}.
>> >>> This tries to handle the "long tail" of suggestions for when the
>> >>> incoming query is a never before seen query string."
>> >>
>> >>
>> >> Example: grams=3 should consider only the last 2 tokens:
>> >>
>> >> special mp3 p -> mp3 p
>> >>
>> >> Then this query is analysed using the "suggestFreeTextAnalyzerFieldType".
>> >> We produce 3 tokens:
>> >> <mp3>
>> >> <p>
>> >> <mp3 p>
>> >>
>> >> And we run the prefix matching on the FST.
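
The query-time model described above can be sketched in Python as follows. This only illustrates the reasoning in the quoted message (which its author says does not match the behaviour he observed), not what FreeTextSuggester actually does. The helper names and the small `dictionary` set are hypothetical, and a plain set stands in for the FST.

```python
def last_tokens(query, grams):
    """Keep only the last grams-1 whitespace-separated tokens of the query."""
    tokens = query.lower().split()
    keep = grams - 1
    return tokens[-keep:] if keep > 0 else []


def shingles(tokens, max_size):
    """Unigrams plus every run of 2..max_size consecutive tokens."""
    out = []
    for size in range(1, max_size + 1):
        for start in range(len(tokens) - size + 1):
            out.append(" ".join(tokens[start:start + size]))
    return out


def prefix_matches(prefix, dictionary):
    """Naive stand-in for a prefix search on the FST."""
    return sorted(entry for entry in dictionary if entry.startswith(prefix))


# Hypothetical dictionary: the shingles listed in the *Building* section.
dictionary = {
    "mp3", "player", "ipod", "real", "of",
    "mp3 ipod", "mp3 player", "player ipod", "player of", "of real",
    "mp3 player ipod", "player of real",
}

context = last_tokens("special mp3 p", grams=3)   # ['mp3', 'p']
analysed = shingles(context, max_size=3)          # ['mp3', 'p', 'mp3 p']
print(context, analysed)
print(prefix_matches("mp3 p", dictionary))        # ['mp3 player', 'mp3 player ipod']
```

In this model the "special" token is dropped before analysis, and the longest token, "mp3 p", is the one that narrows the suggestions to entries starting with "mp3 p".
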
>> >>
>> >> *Conclusion*
>> >> My understanding is surely wrong at some point, as the behaviour I get
>> >> is different.
>> >> Can we discuss and clarify this, and eventually put it in the official
>> >> documentation?
>> >>
>> >> Cheers
>> >>
>> >> 2015-06-19 6:40 GMT+01:00 Zheng Lin Edwin Yeo <edwinye...@gmail.com>:
>> >>
>> >>> I'm implementing an auto-suggest feature in Solr, and I'd like to achieve
>> >>> the following:
>> >>>
>> >>> For example, if the user enters "mp3", Solr might suggest "mp3 player",
>> >>> "mp3 nano" and "mp3 music".
>> >>> When the user enters "mp3 p", the suggestion should narrow down to
>> >>> "mp3 player".
>> >>>
>> >>> Currently, when I type "mp3 p", the suggester returns only words that
>> >>> start with the letter "p", such as "plan", "production", etc.; it does
>> >>> not take the "mp3" token into consideration.
>> >>>
>> >>> I'm using Solr 5.1 and below is my configuration:
>> >>>
>> >>> In solrconfig.xml:
>> >>>
>> >>> <searchComponent name="suggest" class="solr.SuggestComponent">
>> >>>   <lst name="suggester">
>> >>>     <str name="lookupImpl">FreeTextLookupFactory</str>
>> >>>     <str name="indexPath">suggester_freetext_dir</str>
>> >>>     <str name="dictionaryImpl">DocumentDictionaryFactory</str>
>> >>>     <str name="field">Suggestion</str>
>> >>>     <str name="weightField">Project</str>
>> >>>     <str name="suggestFreeTextAnalyzerFieldType">suggestType</str>
>> >>>     <int name="ngrams">5</int>
>> >>>     <str name="buildOnStartup">false</str>
>> >>>     <str name="buildOnCommit">false</str>
>> >>>   </lst>
>> >>> </searchComponent>
>> >>>
>> >>>
>> >>> In schema.xml:
>> >>>
>> >>> <fieldType name="suggestType" class="solr.TextField"
>> >>>            positionIncrementGap="100">
>> >>>   <analyzer type="index">
>> >>>     <charFilter class="solr.PatternReplaceCharFilterFactory"
>> >>>                 pattern="[^a-zA-Z0-9]" replacement=" "/>
>> >>>     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>> >>>     <filter class="solr.ShingleFilterFactory" minShingleSize="2"
>> >>>             maxShingleSize="6" outputUnigrams="false"/>
>> >>>   </analyzer>
>> >>>   <analyzer type="query">
>> >>>     <charFilter class="solr.PatternReplaceCharFilterFactory"
>> >>>                 pattern="[^a-zA-Z0-9]" replacement=" "/>
>> >>>     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>> >>>     <filter class="solr.ShingleFilterFactory" minShingleSize="2"
>> >>>             maxShingleSize="6" outputUnigrams="true"/>
>> >>>   </analyzer>
>> >>> </fieldType>
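
To see which terms the two analyzers in this field type would emit, the chain can be approximated in Python. This is only a rough sketch under stated assumptions, not Solr's implementation: the `analyze` helper is hypothetical, it mimics the char filter with a regex, the tokenizer with `str.split()`, and the shingle filter with a simple sliding window. It also assumes that no shingle is emitted when the input has fewer tokens than minShingleSize.

```python
import re


def analyze(text, min_shingle=2, max_shingle=6, output_unigrams=False):
    """Approximate the suggestType analysis chain from the schema above."""
    # PatternReplaceCharFilterFactory: non-alphanumerics become spaces.
    cleaned = re.sub(r"[^a-zA-Z0-9]", " ", text)
    # WhitespaceTokenizerFactory: split on whitespace (no lowercasing here).
    tokens = cleaned.split()
    # ShingleFilterFactory: optional unigrams, then runs of 2..6 tokens.
    out = list(tokens) if output_unigrams else []
    for size in range(min_shingle, max_shingle + 1):
        for start in range(len(tokens) - size + 1):
            out.append(" ".join(tokens[start:start + size]))
    return out


# Index analyzer (outputUnigrams="false").
print(analyze("mp3-player (new)"))
# ['mp3 player', 'player new', 'mp3 player new']
print(analyze("mp3"))
# []

# Query analyzer (outputUnigrams="true").
print(analyze("mp3 p", output_unigrams=True))
# ['mp3', 'p', 'mp3 p']
```

In this approximation the index side never contains single words, while the query side also emits the unigrams "mp3" and "p". That asymmetry is worth checking against what the suggester is actually built from.
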
>> >>>
>> >>>
>> >>> Is there anything that I configured wrongly?
>> >>>
>> >>>
>> >>> Regards,
>> >>> Edwin
>> >>>
>> >>
>> >>
>> >>
>> >> --
>> >> --------------------------
>> >>
>> >> Benedetti Alessandro
>> >> Visiting card : http://about.me/alessandro_benedetti
>> >>
>> >> "Tyger, tyger burning bright
>> >> In the forests of the night,
>> >> What immortal hand or eye
>> >> Could frame thy fearful symmetry?"
>> >>
>> >> William Blake - Songs of Experience -1794 England
>> >>
