As the comments suggest, it's not a bug, but just the best we can do for now since our priority queues don't support removal of arbitrary elements. I guess we could rebuild the current priority queue when we detect a duplicate, but that would have an obvious performance impact. Any other suggestions?
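For what it's worth, a rough sketch of what the "switch to the Java5 PriorityQueue" comment is getting at: java.util.PriorityQueue does allow removing an arbitrary element via remove(Object), but that removal is a linear scan, so doing it once per duplicate carries the performance cost mentioned above. The Doc class and replaceDuplicate method here are illustrative only, not Solr's actual code:

```java
import java.util.Comparator;
import java.util.PriorityQueue;

public class PQRemoveSketch {
    // Hypothetical entry: a doc's uniqueKey plus its (shard-local) score.
    static final class Doc {
        final String id;
        final float score;
        Doc(String id, float score) { this.id = id; this.score = score; }
    }

    // Simulates replacing a previously queued doc when a duplicate
    // uniqueKey arrives from another shard with a different score.
    static int replaceDuplicate() {
        PriorityQueue<Doc> pq =
            new PriorityQueue<>(Comparator.comparingDouble((Doc d) -> d.score));
        Doc first = new Doc("urlA", 1.2f);
        pq.add(first);
        pq.add(new Doc("urlB", 0.8f));
        pq.remove(first);              // arbitrary removal works, but is O(n)
        pq.add(new Doc("urlA", 1.9f)); // same uniqueKey, different idf/score
        return pq.size();
    }

    public static void main(String[] args) {
        System.out.println(replaceDuplicate()); // queue still holds 2 docs
    }
}
```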
-Yonik
http://www.lucidimagination.com

On Wed, Jul 21, 2010 at 3:13 AM, Li Li <fancye...@gmail.com> wrote:
> In QueryComponent.mergeIds, it will remove documents that have a
> duplicated uniqueKey. The current implementation uses the first one
> encountered:
>
>     String prevShard = uniqueDoc.put(id, srsp.getShard());
>     if (prevShard != null) {
>       // duplicate detected
>       numFound--;
>       collapseList.remove(id + "");
>       docs.set(i, null); // remove it
>       // For now, just always use the first encountered since we can't currently
>       // remove the previous one added to the priority queue.  If we switched
>       // to the Java5 PriorityQueue, this would be easier.
>       continue;
>       // make which duplicate is used deterministic based on shard
>       // if (prevShard.compareTo(srsp.shard) >= 0) {
>       //   TODO: remove previous from priority queue
>       //   continue;
>       // }
>     }
>
> It iterates over the responses with
>
>     for (ShardResponse srsp : sreq.responses)
>
> but the order of sreq.responses may differ between runs -- shard1's
> result and shard2's result may swap positions. So when a uniqueKey
> (such as a url) occurs in both shard1 and shard2, which one is used
> is unpredictable, and the scores of the two docs differ because of
> different idf. So the same query can return different results.
> One possible solution is to sort the ShardResponse list by shard name.
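To make Li Li's suggestion concrete, here is a minimal sketch of the idea (not Solr's actual mergeIds code): sort the shard responses by shard name before merging, so the winning duplicate no longer depends on network arrival order. The merge method, and representing a response as a {shardName, uniqueKey} pair, are simplifications for illustration; the tie-break mirrors the commented-out prevShard.compareTo idea from the quoted code:

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class DeterministicMergeSketch {
    // Each response row is {shardName, uniqueKey}; returns uniqueKey -> winning shard.
    static Map<String, String> merge(List<String[]> responses) {
        List<String[]> sorted = new ArrayList<>(responses);
        // Deterministic processing order, independent of which shard answered first.
        sorted.sort(Comparator.comparing(r -> r[0]));
        Map<String, String> uniqueDoc = new HashMap<>();
        for (String[] r : sorted) {
            String shard = r[0], id = r[1];
            String prevShard = uniqueDoc.put(id, shard);
            if (prevShard != null && prevShard.compareTo(shard) <= 0) {
                // Duplicate: deterministically keep the doc from the
                // lexicographically smaller shard name.
                uniqueDoc.put(id, prevShard);
            }
        }
        return uniqueDoc;
    }

    public static void main(String[] args) {
        // "urlA" exists on both shards; shard2 happens to respond first.
        List<String[]> arrivalOrder = List.of(
            new String[]{"shard2", "urlA"},
            new String[]{"shard1", "urlA"});
        System.out.println(merge(arrivalOrder).get("urlA")); // shard1 wins either way
    }
}
```

Of course this sidesteps the real difficulty Yonik mentions: once the wrong duplicate is already in the priority queue, it still has to be removed or the queue rebuilt.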