--- On Tue, 3/30/10, Andrzej Bialecki <a...@getopt.org> wrote:
> From: Andrzej Bialecki <a...@getopt.org>
> Subject: Re: SOLR-1316 How To Implement this autosuggest component ???
> To: solr-user@lucene.apache.org
> Date: Tuesday, March 30, 2010, 9:59 AM
>
> On 2010-03-30 15:42, Robert Muir wrote:
> > On Mon, Mar 29, 2010 at 11:34 PM, Andy<angelf...@yahoo.com> wrote:
> >
> >> Reading through this thread and SOLR-1316, there seems to be a lot of
> >> different ways to implement auto-complete in Solr. I've seen the mentions
> >> of:
> >>
> >> EdgeNGrams
> >> TermsComponent
> >> Faceting
> >> TST
> >> Patricia Tries
> >> RadixTree
> >> DAWG
> >>
> >
> > Another idea is you can use the Automaton support in the lucene flexible
> > indexing branch: to query the index directly with a DFA that represents
> > whatever terms you want back.
> > The idea is that there really isn't much gain in building a separate Pat,
> > Radix Tree, or DFA to do this when you can efficiently intersect a DFA with
> > the existing terms dictionary.
> >
> > I don't really understand what autosuggest needs to do, but if you are doing
> > things like looking for mispellings you can easily build a DFA that
> > recognizes terms within some short edit distance with the support thats
> > there (the LevenshteinAutomata class), to quickly get back candidates.
> >
> > You can intersect/concatenate/union these DFAs with prefix or suffix DFAs if
> > you want too, don't really understand what the algorithm should do, but I'm
> > happy to try to help.
> >
>
> The problem is a bit more complicated. There are two issues:
>
> * simple term-level completion often produces wrong results for
> multi-term queries (which are usually rewritten as "weak" phrase queries),
>
> * the weights of suggestions should not correspond directly to IDF in
> the index - much better results can be obtained when they correspond to
> the frequency of terms/phrases in the query logs ...
>
> TermsComponent and EdgeNGrams, while simple to use, suffer from both issues.
>

Thanks.

I actually have 2 use cases for autosuggest:

1) The "normal" one - I want to suggest search terms to users after they've typed a few letters, just like Google Suggest. It looks like SOLR-1316 is the best option for this use case. Right?

2) I have a field "city" whose values are entered by users. When a user is entering his city, I want to make suggestions based on the cities that other users have already entered, in order to reduce the chances of duplication.

What method would you recommend for this use case?
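
To make use case 2 concrete, here is a minimal sketch of the behaviour being asked for, written in plain Java rather than against any Solr or Lucene API. The class name CitySuggester and its methods are hypothetical and are not part of SOLR-1316. It assumes that case and surrounding whitespace should be ignored, and that suggestions should be ranked by how often a value has been entered (in the spirit of the point above about weighting by usage frequency rather than IDF).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper for illustration only; not a Solr component.
class CitySuggester {
    // Normalized city name -> number of times users have entered it.
    private final TreeMap<String, Integer> counts = new TreeMap<>();

    // Assumption: "Boston", " boston " and "BOSTON" count as the same city.
    private static String normalize(String s) {
        return s.trim().toLowerCase(Locale.ROOT);
    }

    // Record one user-entered city value.
    void add(String city) {
        String key = normalize(city);
        if (!key.isEmpty()) {
            counts.merge(key, 1, Integer::sum);
        }
    }

    // Return up to `limit` known cities starting with `prefix`,
    // most frequently entered first, ties broken alphabetically.
    List<String> suggest(String prefix, int limit) {
        List<String> result = new ArrayList<>();
        String p = normalize(prefix);
        if (p.isEmpty() || limit <= 0) {
            return result;
        }
        // In a sorted map, every key that starts with p lies in the
        // half-open range [p, p + '\uffff').
        Map<String, Integer> range =
                counts.subMap(p, true, p + Character.MAX_VALUE, false);
        List<Map.Entry<String, Integer>> hits = new ArrayList<>(range.entrySet());
        hits.sort((a, b) -> {
            int byCount = Integer.compare(b.getValue(), a.getValue());
            return byCount != 0 ? byCount : a.getKey().compareTo(b.getKey());
        });
        for (Map.Entry<String, Integer> e : hits) {
            if (result.size() == limit) {
                break;
            }
            result.add(e.getKey());
        }
        return result;
    }
}
```

Calling add() once per stored "city" value and then suggest("san", 5) returns the matching normalized names. The sketch only does prefix matching; it does not catch misspelled near-duplicates, which is where something like the edit-distance approach mentioned above would come in.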