On Sat, Dec 11, 2010 at 2:18 AM, bbarani <bbar...@gmail.com> wrote:
> Also, if I try to sort the query result from shards, will sorting happen
> on the consolidated data or on each individual core's data?

Both. To find the top 10 docs by any sort, each shard returns its own
top 10, and those candidates are then merged and sorted to find the
overall top 10.
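
A rough Python sketch of that two-step merge, for illustration only. This
is not Solr's actual code; the function names and the sample scores are
made up, and each shard is modelled as a plain list of (score, doc_id)
pairs.

```python
import heapq

def top_k_local(docs, k):
    """Top k (score, doc_id) pairs from a single shard, best first."""
    return heapq.nlargest(k, docs)

def merge_top_k(shards, k):
    """Collect each shard's top k, then pick the overall top k from those."""
    candidates = []
    for docs in shards:
        candidates.extend(top_k_local(docs, k))
    return heapq.nlargest(k, candidates)

# Hypothetical data: three shards with made-up scores.
shards = [
    [(0.9, "a1"), (0.4, "a2"), (0.7, "a3")],
    [(0.8, "b1"), (0.95, "b2")],
    [(0.1, "c1")],
]

print(merge_top_k(shards, 2))
```

Since any doc in the overall top k must also be in its own shard's top k,
the merged result matches what a sort over the consolidated data would give.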

> I am just trying to figure out best possible way to implement distributed
> search without affecting the search relevancy.

The "IDF" part of the relevancy score is the only place where
distributed search scoring won't "match up" with non-distributed
scoring, because the document frequency used for a term is local to
each core instead of global.  If you distribute your documents fairly
randomly across the shards, this won't matter.
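
To make that concrete, here is a small Python sketch comparing per-shard
IDF with the IDF computed over all shards combined. It assumes the classic
Lucene-style formula idf = 1 + ln(numDocs / (docFreq + 1)); the shard
sizes and document frequencies are invented for the example.

```python
import math

def idf(doc_freq, num_docs):
    """Assumed classic Lucene-style IDF: 1 + ln(numDocs / (docFreq + 1))."""
    return 1.0 + math.log(num_docs / (doc_freq + 1))

def local_idfs(shards):
    """IDF as each shard computes it from its own counts only."""
    return [idf(df, n) for df, n in shards]

def global_idf(shards):
    """IDF computed from counts summed across all shards."""
    total_df = sum(df for df, _ in shards)
    total_docs = sum(n for _, n in shards)
    return idf(total_df, total_docs)

# Hypothetical shards as (docs containing the term, total docs).
balanced = [(50, 1000), (50, 1000)]   # term spread evenly
skewed = [(99, 1000), (1, 1000)]      # term concentrated on one shard

for name, shards in (("balanced", balanced), ("skewed", skewed)):
    print(name, [round(x, 3) for x in local_idfs(shards)],
          round(global_idf(shards), 3))
```

With the balanced split, every shard's local IDF lands close to the global
value, so scores stay comparable. With the skewed split, the same term gets
very different weights depending on which shard a doc lives on.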

There is a patch in the works to add global idf, but I think that even
when it's committed, it will default to off because of the higher cost
associated with it.

-Yonik
http://www.lucidimagination.com
