On Tue, Mar 30, 2010 at 9:59 AM, Andrzej Bialecki <a...@getopt.org> wrote:

>
> The problem is a bit more complicated. There are two issues:
>

Somehow I guessed this was the case, as admittedly I don't understand what it
should do!


>
> * simple term-level completion often produces wrong results for multi-term
> queries (which are usually rewritten as "weak" phrase queries),
>

Yeah, this seems obvious to me. But I don't understand how these other data
structures address this problem. They are just indexing "single terms" too,
correct?


> * the weights of suggestions should not correspond directly to IDF in the
> index - much better results can be obtained when they correspond to the
> frequency of terms/phrases in the query logs ...
>

This makes sense too. Again, I'm not really suggesting a solution to the
entire problem, only a quick way to prune the search space directly from the
index to get back candidates for individual terms (e.g. get the top-25
terms with edit distance <= 1 or 2 for each term).
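
To make the per-term step concrete, here is a minimal sketch of what "top-N
terms within edit distance k" could look like. It is only an illustration:
it scans a plain in-memory dict linearly instead of pruning via the index,
and the names (within_edit_distance, similar_terms, term_dict) and the
ranking by distance then document frequency are my own assumptions, not
anything that exists in Lucene.

```python
import heapq


def within_edit_distance(a, b, max_dist):
    """Return the Levenshtein distance of a and b if it is <= max_dist, else None."""
    # The length difference is a lower bound on the distance.
    if abs(len(a) - len(b)) > max_dist:
        return None
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            cur.append(min(prev[j] + 1,          # deletion
                           cur[j - 1] + 1,       # insertion
                           prev[j - 1] + cost))  # substitution or match
        # Every alignment passes through this row, so if the whole row is
        # already over budget the final distance must be too.
        if min(cur) > max_dist:
            return None
        prev = cur
    return prev[-1] if prev[-1] <= max_dist else None


def similar_terms(term, term_dict, max_dist=2, top_n=25):
    """Return up to top_n (candidate, distance) pairs from term_dict.

    term_dict is a hypothetical mapping of term -> document frequency.
    Candidates are ordered by distance, then by descending frequency.
    """
    scored = []
    for cand, freq in term_dict.items():
        d = within_edit_distance(term, cand, max_dist)
        if d is not None:
            scored.append((d, -freq, cand))
    return [(cand, d) for d, _, cand in heapq.nsmallest(top_n, scored)]
```

For example, with term_dict = {"lucene": 50, "lucent": 5, "lucerne": 7,
"solr": 40}, calling similar_terms("lucene", term_dict, max_dist=1) keeps
"lucene", "lucerne" and "lucent" and drops "solr".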

After that point, you need to do a lot of additional processing, via query
logs, at phrase level, etc.
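
As a rough sketch of the simplest form of that later processing, the
candidates for a term could be re-ordered by how often they appear in
logged queries rather than by index statistics. This is term-level only
(no phrase handling), and rerank_by_query_log plus the whitespace
tokenization are assumptions for illustration, not an existing API.

```python
from collections import Counter


def rerank_by_query_log(candidates, logged_queries):
    """Order candidate terms by their frequency in logged queries.

    candidates: iterable of candidate terms (hypothetical input).
    logged_queries: iterable of raw query strings (hypothetical input).
    Ties are broken alphabetically so the result is deterministic.
    """
    counts = Counter()
    for query in logged_queries:
        # Naive tokenization: lowercase and split on whitespace.
        counts.update(query.lower().split())
    return sorted(candidates, key=lambda term: (-counts[term], term))
```

A real system would presumably run the log through the same analyzer as
the index and score at phrase level as well.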

Again, I still don't know if this would even be a good fit. I'm just
suggesting a way for an individual term to get back an enumeration of similar
terms very quickly, which could be one portion of the larger overall algorithm.

-- 
Robert Muir
rcm...@gmail.com
