Re: IDF in Distributed Search

Walter Underwood Fri, 11 Apr 2008 21:14:55 -0700

Global IDF does not require another request/response.
It is nearly free if you return the right info.


Have each server return its total number of docs and the df for
each query term in its original response. The merging node sums the
doc counts and dfs, recomputes the idf, and re-ranks.
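
As a rough illustration of the summing step, here is a minimal Python sketch. The response layout (`num_docs`, `df`, and a per-hit `tf` map) is hypothetical and is not Solr's wire format. The scoring is a deliberately simplified tf-idf, with an idf of the form 1 + ln(N / (df + 1)) assumed only for the example.

```python
import math
from collections import Counter


def global_idf(shard_responses):
    """Sum per-shard doc counts and dfs, then recompute idf per term.

    Each response is a hypothetical dict:
      {"num_docs": int, "df": {term: int}, "hits": [{"id": str, "tf": {term: int}}]}
    """
    total_docs = sum(r["num_docs"] for r in shard_responses)
    df = Counter()
    for r in shard_responses:
        df.update(r["df"])  # adds the per-term counts across shards
    # Assumed idf form for illustration: 1 + ln(N / (df + 1)).
    return {t: 1.0 + math.log(total_docs / (n + 1)) for t, n in df.items()}


def rerank(shard_responses):
    """Re-score every returned hit with the global idf and sort best first."""
    idf = global_idf(shard_responses)
    scored = []
    for r in shard_responses:
        for hit in r["hits"]:
            # Simplified score: sum of sqrt(tf) * idf over the query terms.
            score = sum(math.sqrt(tf) * idf.get(t, 0.0)
                        for t, tf in hit["tf"].items())
            scored.append((score, hit["id"]))
    scored.sort(key=lambda s: (-s[0], s[1]))
    return scored
```

No second round trip is needed here: everything the merge uses arrives with the first response from each server.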

See this post for an efficient way to do it:

  
http://wunderwood.org/most_casual_observer/2007/04/progressive_reranking.html

This works best if you treat the results from each server as
a queue and refill just that queue when it is exhausted. All the
good results might be from one server.
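
The queue idea could be sketched roughly as below, again in Python. `fetch_page` is a hypothetical callable standing in for a request to one server, and the scores are assumed to be already comparable across servers (for example, after re-scoring with global idf). This is one possible reading of the approach, not the code from the linked post.

```python
from collections import deque


class ShardQueue:
    """Buffered results from one server, refilled only when it runs dry."""

    def __init__(self, fetch_page, page_size=10):
        # fetch_page(offset, limit) -> list of (score, doc_id), best first.
        # Hypothetical stand-in for a request to a single server.
        self.fetch_page = fetch_page
        self.page_size = page_size
        self.offset = 0
        self.buffer = deque()
        self.exhausted = False

    def peek(self):
        """Return the best unread result, fetching another page if needed."""
        if not self.buffer and not self.exhausted:
            page = self.fetch_page(self.offset, self.page_size)
            self.offset += len(page)
            self.buffer.extend(page)
            if len(page) < self.page_size:
                self.exhausted = True  # a short page means nothing is left
        return self.buffer[0] if self.buffer else None

    def pop(self):
        return self.buffer.popleft()


def merge_top(queues, k):
    """Take the k best results overall, always pulling from the best head."""
    results = []
    while len(results) < k:
        best = None
        for q in queues:
            head = q.peek()
            if head is not None and (best is None or head[0] > best.peek()[0]):
                best = q
        if best is None:
            break  # every server is exhausted
        results.append(best.pop())
    return results
```

If one server holds all the good results, only that server gets asked for more pages; the others are each queried once.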

wunder

On 4/11/08 8:50 PM, "Yonik Seeley" <[EMAIL PROTECTED]> wrote:

> On Fri, Apr 11, 2008 at 11:39 PM, Otis Gospodnetic
> <[EMAIL PROTECTED]> wrote:
>>  So, I'd like to see what it would take to add distributed IDF info to Solr's
>> distributed search.
>>  Here are some questions to get the discussion going:
>>  - Is anyone already working on it?
>>  - Does anyone plan on working on it in the very near future?
>>  - Does anyone already have thoughts how and where dist. idf could be plugged
>> in?
>>  - There is a mention of dist idf and performance cost up there - any idea
>> how costly dist idf would
> 
> It's relatively easy to implement, but the performance cost is not
> negligible since it adds another search "phase" (another
> request-response).  It should be optional of course (globalidf=true),
> so there is no reason not to add this feature.
> 
> I also left room for this stage (ResponseBuilder.STAGE_PARSE_QUERY),
> which is ordered before query execution.
> 
> -Yonik
