On 2010-03-30 15:42, Robert Muir wrote:
On Mon, Mar 29, 2010 at 11:34 PM, Andy<angelf...@yahoo.com> wrote:
Reading through this thread and SOLR-1316, there seems to be a lot of
different ways to implement auto-complete in Solr. I've seen the mentions
of:
EdgeNGrams
TermsComponent
Faceting
TST
Patricia Tries
RadixTree
DAWG
Another idea is you can use the Automaton support in the lucene flexible
indexing branch: to query the index directly with a DFA that represents
whatever terms you want back.
The idea is that there really isn't much gain in building a separate Pat,
Radix Tree, or DFA to do this when you can efficiently intersect a DFA with
the existing terms dictionary.
I don't really understand what autosuggest needs to do, but if you are doing
things like looking for mispellings you can easily build a DFA that
recognizes terms within some short edit distance with the support thats
there (the LevenshteinAutomata class), to quickly get back candidates.
You can intersect/concatenate/union these DFAs with prefix or suffix DFAs if
you want too, don't really understand what the algorithm should do, but I'm
happy to try to help.
The problem is a bit more complicated. There are two issues:
* simple term-level completion often produces wrong results for
multi-term queries (which are usually rewritten as "weak" phrase queries),
* the weights of suggestions should not correspond directly to IDF in
the index - much better results can be obtained when they correspond to
the frequency of terms/phrases in the query logs ...
TermsComponent and EdgeNGrams, while simple to use, suffer from both issues.
--
Best regards,
Andrzej Bialecki <><
___. ___ ___ ___ _ _ __________________________________
[__ || __|__/|__||\/| Information Retrieval, Semantic Web
___|||__|| \| || | Embedded Unix, System Integration
http://www.sigram.com Contact: info at sigram dot com