Re: SOLR-1316 How To Implement this autosuggest component ???

Andrzej Bialecki Wed, 31 Mar 2010 04:06:38 -0700

On 2010-03-31 06:14, Andy wrote:



--- On Tue, 3/30/10, Andrzej Bialecki<a...@getopt.org>  wrote:

From: Andrzej Bialecki<a...@getopt.org>
Subject: Re: SOLR-1316 How To Implement this autosuggest component ???
To: solr-user@lucene.apache.org
Date: Tuesday, March 30, 2010, 9:59 AM
On 2010-03-30 15:42, Robert Muir wrote:

On Mon, Mar 29, 2010 at 11:34 PM, Andy <angelf...@yahoo.com> wrote:

Reading through this thread and SOLR-1316, there seems to be a lot of
different ways to implement auto-complete in Solr. I've seen the mentions
of:

EdgeNGrams
TermsComponent
Faceting
TST
Patricia Tries
RadixTree
DAWG


Another idea is you can use the Automaton support in the lucene flexible
indexing branch to query the index directly with a DFA that represents
whatever terms you want back.
The idea is that there really isn't much gain in building a separate Pat,
Radix Tree, or DFA to do this when you can efficiently intersect a DFA with
the existing terms dictionary.

I don't really understand what autosuggest needs to do, but if you are doing
things like looking for misspellings you can easily build a DFA that
recognizes terms within some short edit distance with the support that's
there (the LevenshteinAutomata class), to quickly get back candidates.


You can intersect/concatenate/union these DFAs with prefix or suffix DFAs if
you want too. I don't really understand what the algorithm should do, but I'm
happy to try to help.
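
A rough sketch of the edit-distance idea above. It does not use Lucene's LevenshteinAutomata class. Instead it swaps in a plain-Python trie (standing in for the terms dictionary) that is walked with a Levenshtein dynamic-programming row, which returns the same candidates as intersecting an edit-distance automaton with the dictionary. The names build_trie and fuzzy_candidates are hypothetical, and terms are assumed to be non-empty strings.

```python
def build_trie(terms):
    """Build a nested-dict trie; the key None marks the end of a term."""
    root = {}
    for term in terms:
        node = root
        for ch in term:
            node = node.setdefault(ch, {})
        node[None] = term
    return root


def fuzzy_candidates(trie, word, max_edits):
    """Return the sorted terms within max_edits (Levenshtein) of word."""
    results = []
    first_row = list(range(len(word) + 1))

    def walk(node, ch, prev_row):
        # Extend the DP table by one row for the trie edge labelled ch.
        row = [prev_row[0] + 1]
        for i in range(1, len(word) + 1):
            cost = 0 if word[i - 1] == ch else 1
            row.append(min(row[i - 1] + 1,          # skip a char of word
                           prev_row[i] + 1,         # extra char in the term
                           prev_row[i - 1] + cost)) # match or substitute
        if None in node and row[-1] <= max_edits:
            results.append(node[None])
        # Prune: if every cell exceeds the limit, no longer term can match.
        if min(row) <= max_edits:
            for nxt, child in node.items():
                if nxt is not None:
                    walk(child, nxt, row)

    for ch, child in trie.items():
        if ch is not None:
            walk(child, ch, first_row)
    return sorted(results)
```

For example, with the terms solr, solar, sold, soil and lucene, a lookup for "solr" with max_edits=1 returns solar, sold and solr; "soil" is two edits away and is pruned.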


The problem is a bit more complicated. There are two issues:

* simple term-level completion often produces wrong results for
multi-term queries (which are usually rewritten as "weak" phrase queries),

* the weights of suggestions should not correspond directly to IDF in
the index - much better results can be obtained when they correspond to
the frequency of terms/phrases in the query logs ...

TermsComponent and EdgeNGrams, while simple to use, suffer from both issues.
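
To make the two issues concrete, here is a minimal sketch that suggests whole queries (so multi-term phrases stay intact) and ranks them by how often they occur in a query log rather than by index statistics. It is not the SOLR-1316 code; the function names and the in-memory list standing in for a query log are assumptions for illustration.

```python
from collections import Counter


def build_suggestions(query_log):
    """Count whole queries, lowercased and with whitespace collapsed."""
    return Counter(" ".join(q.lower().split()) for q in query_log if q.strip())


def suggest(counts, prefix, limit=5):
    """Return up to limit logged queries starting with prefix, most frequent first."""
    prefix = " ".join(prefix.lower().split())
    matches = [(q, n) for q, n in counts.items() if q.startswith(prefix)]
    # Sort by descending frequency, then alphabetically for a stable order.
    matches.sort(key=lambda item: (-item[1], item[0]))
    return [q for q, _ in matches[:limit]]
```

A linear scan like this is only reasonable for a small top-N list; a trie or TST avoids scanning every entry per keystroke.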


Thanks.

I actually have 2 use cases for autosuggest:

1) The "normal" one - I want to suggest search terms to users after they've 
typed a few letters. Just like Google suggest. Looks like for this use case SOLR-1316 is 
the best option. Right?

Hopefully, yes - it depends on how you intend to populate the TST. If you populate it from the main index, then (unless you have indexed phrases) there won't be any benefit over the TermsComponent. It may be faster, but it will take more RAM. If you populate it from a list of top-N queries, then SOLR-1316 is the way to go.
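
As an illustration of the second option, here is a small ternary search tree populated from a list of (query, weight) pairs. This is only a sketch of the data structure, not the SOLR-1316 implementation; the class and function names are hypothetical, and keys and prefixes are assumed to be non-empty strings.

```python
class Node:
    __slots__ = ("ch", "lo", "eq", "hi", "weight")

    def __init__(self, ch):
        self.ch = ch
        self.lo = self.eq = self.hi = None
        self.weight = None  # set only on the node where a stored key ends


def insert(node, key, weight, i=0):
    """Insert a non-empty key; returns the (possibly new) subtree root."""
    ch = key[i]
    if node is None:
        node = Node(ch)
    if ch < node.ch:
        node.lo = insert(node.lo, key, weight, i)
    elif ch > node.ch:
        node.hi = insert(node.hi, key, weight, i)
    elif i + 1 < len(key):
        node.eq = insert(node.eq, key, weight, i + 1)
    else:
        node.weight = weight
    return node


def _collect(node, path, out):
    """Gather (key, weight) pairs for every key stored under node."""
    if node is None:
        return
    _collect(node.lo, path, out)
    if node.weight is not None:
        out.append((path + node.ch, node.weight))
    _collect(node.eq, path + node.ch, out)
    _collect(node.hi, path, out)


def complete(root, prefix, limit=5):
    """Return up to limit stored keys starting with prefix, heaviest first."""
    if not prefix:
        return []
    node, i = root, 0
    while node is not None:
        ch = prefix[i]
        if ch < node.ch:
            node = node.lo
        elif ch > node.ch:
            node = node.hi
        elif i + 1 < len(prefix):
            node, i = node.eq, i + 1
        else:
            break  # node matches the last character of prefix
    if node is None:
        return []
    out = []
    if node.weight is not None:
        out.append((prefix, node.weight))
    _collect(node.eq, prefix, out)
    out.sort(key=lambda kv: (-kv[1], kv[0]))
    return [key for key, _ in out[:limit]]
```

Usage: start with root = None, call root = insert(root, query, count) for each of the top-N queries, then call complete(root, typed_prefix) on each keystroke.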

2) I have a field "city" with values that are entered by users. When a user is 
entering his city, I want to make suggestion based on what cities have already been 
entered so far by other users -- in order to reduce chances of duplication. What method 
would you recommend for this use case?

If the "city" field is not analyzed then TermsComponent is easiest to use. If it is analyzed, but the vast majority of cities are single terms, then TermsComponent is ok too. If you want to assign different priorities to suggestions (other than a simple IDF based priority), or have many city names consisting of multiple tokens, then use SOLR-1316.
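
For the simple TermsComponent case, a prefix lookup on the "city" field is a single HTTP request. The sketch below only builds the request URL, using the terms, terms.fl, terms.prefix, terms.limit and terms.sort parameters. The host, port and the /terms handler path are assumptions about the local setup (the handler has to be registered in solrconfig.xml), and terms_url is a hypothetical helper name.

```python
from urllib.parse import urlencode


def terms_url(base_url, field, prefix, limit=10):
    """Build a TermsComponent request URL for prefix suggestions on one field."""
    params = [
        ("terms", "true"),
        ("terms.fl", field),          # field to read terms from
        ("terms.prefix", prefix),     # only terms starting with this prefix
        ("terms.limit", str(limit)),  # maximum number of terms returned
        ("terms.sort", "count"),      # most frequent terms first
        ("wt", "json"),
    ]
    return base_url + "?" + urlencode(params)
```

Note that terms.prefix is matched against the indexed terms as they are stored, so on a non-analyzed field the user's input has to match the stored case.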


--
Best regards,
Andrzej Bialecki     <><
 ___. ___ ___ ___ _ _   __________________________________
[__ || __|__/|__||\/|  Information Retrieval, Semantic Web
___|||__||  \|  ||  |  Embedded Unix, System Integration
http://www.sigram.com  Contact: info at sigram dot com
