On Tue, Mar 30, 2010 at 9:59 AM, Andrzej Bialecki <a...@getopt.org> wrote:
> The problem is a bit more complicated. There are two issues:

Somehow I guessed this was the case, as admittedly I don't understand what it
should do!

> * simple term-level completion often produces wrong results for multi-term
> queries (which are usually rewritten as "weak" phrase queries),

Yeah, this seems obvious to me. But I don't understand how these other data
structures address this problem. They are just indexing "single terms" too,
correct?

> * the weights of suggestions should not correspond directly to IDF in the
> index - much better results can be obtained when they correspond to the
> frequency of terms/phrases in the query logs ...

This makes sense too. Again, I'm not really suggesting a solution to the
entire problem, only a quick way to prune the search space directly from the
index to get back candidates for individual terms (e.g. get the top-25 terms
with edit distance <= 1 or 2 for each term). After that point, you still need
to do a lot of additional processing: via query logs, at the phrase level,
etc.

Again, I still don't know if this would even be a good fit. I'm just
suggesting a way for an individual term to get back an enumeration of similar
terms very quickly, which could be one portion of the overall larger
algorithm.

-- 
Robert Muir
rcm...@gmail.com
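
To make the per-term step above concrete, here is a minimal Python sketch of "get the top-N terms within edit distance <= 1 or 2". It is only an illustration under stated assumptions: it scans a small in-memory dictionary by brute force rather than enumerating terms directly from the index, and the names `edit_distance`, `similar_terms`, and `term_freqs`, along with the frequencies, are hypothetical. The ranking weight is a plain frequency count; per the discussion above, in practice it would more likely come from query logs than from index statistics.

```python
from collections import Counter


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance via the standard two-row dynamic program."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute (or match)
        prev = curr
    return prev[-1]


def similar_terms(term, term_freqs, max_edits=2, top_n=25):
    """Return up to top_n (candidate, distance) pairs within max_edits.

    Candidates are ordered by edit distance, then by descending frequency.
    This is a brute-force scan, not an index-backed enumeration.
    """
    candidates = []
    for cand, freq in term_freqs.items():
        if abs(len(cand) - len(term)) > max_edits:
            continue  # the length gap alone already exceeds the edit budget
        d = edit_distance(term, cand)
        if d <= max_edits:
            candidates.append((d, -freq, cand))
    candidates.sort()
    return [(cand, d) for d, _, cand in candidates[:top_n]]


# Hypothetical term dictionary; the frequencies are made up for illustration.
term_freqs = Counter({"lucene": 120, "lucent": 15, "lucerne": 4,
                      "license": 60, "solr": 90})

print(similar_terms("lucene", term_freqs, max_edits=1))
```

The candidates returned here would only be the input to the later phrase-level and query-log processing mentioned above, not a finished suggestion list.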