On Tue, Mar 30, 2010 at 9:59 AM, Andrzej Bialecki <a...@getopt.org> wrote:

> The problem is a bit more complicated. There are two issues:

Somehow I guessed this was the case, since admittedly I don't understand what it
should do!

> * simple term-level completion often produces wrong results for multi-term
> queries (which are usually rewritten as "weak" phrase queries),

Yeah, this seems obvious to me. But I don't understand how these other data
structures address this problem. They are just indexing "single terms" too.

> * the weights of suggestions should not correspond directly to IDF in the
> index - much better results can be obtained when they correspond to the
> frequency of terms/phrases in the query logs ...

This makes sense too. Again, I'm not really suggesting a solution to the
entire problem, only a quick way to prune the search space directly from the
index to get back candidates for individual terms (e.g. get the top-25
terms with edit distance <= 1 or 2 for each term).
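For illustration, here is a minimal Java sketch of that kind of per-term candidate enumeration, assuming an in-memory trie of index terms, each carrying a hypothetical frequency weight. It walks the trie while computing one row of the Levenshtein table per character and abandons a branch as soon as every cell in the row exceeds the edit budget, so most of the dictionary is never visited. This is a plain trie walk, not Lucene's own automaton-based code, and the class and method names are made up.

```java
import java.util.*;

/**
 * Hypothetical sketch: a trie over index terms, walked with one row of the
 * Levenshtein DP table per trie level. A branch is abandoned as soon as the
 * smallest value in its row exceeds maxEdits, which prunes the search space.
 */
public class FuzzyTermTrie {
    private static final class Node {
        final TreeMap<Character, Node> children = new TreeMap<>();
        String term;  // non-null if a term ends at this node
        long freq;    // assumed weight, e.g. the term's document frequency
    }

    private final Node root = new Node();

    public void add(String term, long freq) {
        Node node = root;
        for (char c : term.toCharArray()) {
            node = node.children.computeIfAbsent(c, k -> new Node());
        }
        node.term = term;
        node.freq = freq;
    }

    /** Returns up to topK terms within maxEdits of query, highest freq first. */
    public List<String> similar(String query, int maxEdits, int topK) {
        List<Node> hits = new ArrayList<>();
        // Row for the empty trie prefix: distance from "" to each query prefix.
        int[] firstRow = new int[query.length() + 1];
        for (int i = 0; i <= query.length(); i++) firstRow[i] = i;
        for (Map.Entry<Character, Node> e : root.children.entrySet()) {
            walk(e.getValue(), e.getKey(), query, firstRow, maxEdits, hits);
        }
        hits.sort(Comparator.comparingLong((Node n) -> n.freq).reversed()
                .thenComparing(n -> n.term));
        List<String> out = new ArrayList<>();
        for (int i = 0; i < Math.min(topK, hits.size()); i++) out.add(hits.get(i).term);
        return out;
    }

    private void walk(Node node, char c, String query, int[] prevRow,
                      int maxEdits, List<Node> hits) {
        int n = query.length();
        int[] row = new int[n + 1];
        row[0] = prevRow[0] + 1;
        int rowMin = row[0];
        for (int i = 1; i <= n; i++) {
            int cost = query.charAt(i - 1) == c ? 0 : 1;
            row[i] = Math.min(Math.min(row[i - 1] + 1,   // insertion
                                       prevRow[i] + 1),  // deletion
                              prevRow[i - 1] + cost);    // substitution or match
            rowMin = Math.min(rowMin, row[i]);
        }
        if (node.term != null && row[n] <= maxEdits) hits.add(node);
        // Prune: no extension of this prefix can get back under maxEdits.
        if (rowMin <= maxEdits) {
            for (Map.Entry<Character, Node> e : node.children.entrySet()) {
                walk(e.getValue(), e.getKey(), query, row, maxEdits, hits);
            }
        }
    }

    public static void main(String[] args) {
        FuzzyTermTrie trie = new FuzzyTermTrie();
        trie.add("lucene", 120);
        trie.add("lucent", 15);
        trie.add("licence", 40);
        trie.add("lenin", 7);
        System.out.println(trie.similar("lucen", 1, 25)); // [lucene, lucent]
    }
}
```

Both hits are one edit away from "lucen"; "licence" and "lenin" are three edits away, and their branches are cut off before the walk reaches the end of either word. Ranking by a stored frequency is only a placeholder here; per the quoted point above, query-log frequencies may be the better weight.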

After that point, you need to do a lot of additional processing (using query
logs, working at the phrase level, etc.).
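One possible shape for that phrase-level pass is the Java sketch below, assuming the per-term candidate lists already exist and that the query log has been reduced to a hypothetical map from whole query strings to counts. It forms every combination of one candidate per query position and keeps only phrases that actually occur in the log, ranked by log frequency rather than by index statistics. The names are illustrative, and a real system would need to bound the combinatorial growth (for example with a beam search) instead of enumerating everything.

```java
import java.util.*;

/**
 * Hypothetical sketch of a phrase-level second pass: per-term candidate lists
 * are combined into whole-query suggestions, which are then ranked by how often
 * each phrase occurs in a query log.
 */
public class PhraseRescorer {
    /** Builds every combination of one candidate per query position. */
    static List<String> combine(List<List<String>> perTerm) {
        List<String> phrases = new ArrayList<>(List.of(""));
        for (List<String> candidates : perTerm) {
            List<String> next = new ArrayList<>();
            for (String prefix : phrases) {
                for (String c : candidates) {
                    next.add(prefix.isEmpty() ? c : prefix + " " + c);
                }
            }
            phrases = next;
        }
        return phrases;
    }

    /** Keeps phrases seen in the log, most frequent first, ties alphabetical. */
    static List<String> rank(List<List<String>> perTerm, Map<String, Long> queryLog, int topK) {
        List<String> seen = new ArrayList<>();
        for (String p : combine(perTerm)) {
            if (queryLog.getOrDefault(p, 0L) > 0) seen.add(p);
        }
        seen.sort(Comparator.comparingLong((String p) -> queryLog.get(p)).reversed()
                .thenComparing(Comparator.<String>naturalOrder()));
        return new ArrayList<>(seen.subList(0, Math.min(topK, seen.size())));
    }

    public static void main(String[] args) {
        List<List<String>> perTerm = List.of(
                List.of("apache", "apaches"),
                List.of("lucene", "lucent"));
        Map<String, Long> queryLog = Map.of(
                "apache lucene", 900L,
                "apache solr", 400L,
                "lucent technologies", 50L);
        System.out.println(rank(perTerm, queryLog, 10)); // [apache lucene]
    }
}
```

Here "lucent" survives the term-level step, but no logged query pairs it with "apache", so the phrase pass drops it.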

Again, I still don't know if this would even be a good fit; I'm just suggesting a
way for an individual term to get back an enumeration of similar terms very
quickly, which could be one part of the larger overall algorithm.

Robert Muir
