As the comments suggest, it's not a bug, but just the best we can do for now since our priority queues don't support removal of arbitrary elements. I guess we could rebuild the current priority queue when we detect a duplicate, but that would have an obvious performance impact. Any other suggestions?
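For what it's worth, a rough sketch of what the "switch to the Java5 PriorityQueue" comment is getting at: java.util.PriorityQueue does allow removing an arbitrary element via remove(Object), but that removal is a linear scan, so doing it once per duplicate carries the performance cost mentioned above. The Doc class and replaceDuplicate method here are illustrative only, not Solr's actual code:

```java
import java.util.Comparator;
import java.util.PriorityQueue;

public class PQRemoveSketch {
    // Hypothetical entry: a doc's uniqueKey plus its (shard-local) score.
    static final class Doc {
        final String id;
        final float score;
        Doc(String id, float score) { this.id = id; this.score = score; }
    }

    // Simulates replacing a previously queued doc when a duplicate
    // uniqueKey arrives from another shard with a different score.
    static int replaceDuplicate() {
        PriorityQueue<Doc> pq =
            new PriorityQueue<>(Comparator.comparingDouble((Doc d) -> d.score));
        Doc first = new Doc("urlA", 1.2f);
        pq.add(first);
        pq.add(new Doc("urlB", 0.8f));
        pq.remove(first);              // arbitrary removal works, but is O(n)
        pq.add(new Doc("urlA", 1.9f)); // same uniqueKey, different idf/score
        return pq.size();
    }

    public static void main(String[] args) {
        System.out.println(replaceDuplicate()); // queue still holds 2 docs
    }
}
```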
-Yonik
http://www.lucidimagination.com

On Wed, Jul 21, 2010 at 3:13 AM, Li Li <fancye...@gmail.com> wrote:
> In QueryComponent.mergeIds, it will remove documents that have a
> duplicated uniqueKey. The current implementation uses the first one
> encountered:
>
>     String prevShard = uniqueDoc.put(id, srsp.getShard());
>     if (prevShard != null) {
>       // duplicate detected
>       numFound--;
>       collapseList.remove(id + "");
>       docs.set(i, null); // remove it
>       // For now, just always use the first encountered since we can't currently
>       // remove the previous one added to the priority queue.  If we switched
>       // to the Java5 PriorityQueue, this would be easier.
>       continue;
>       // make which duplicate is used deterministic based on shard
>       // if (prevShard.compareTo(srsp.shard) >= 0) {
>       //   TODO: remove previous from priority queue
>       //   continue;
>       // }
>     }
>
> It iterates over the responses with
>
>     for (ShardResponse srsp : sreq.responses)
>
> but the order of sreq.responses may differ between runs -- shard1's
> result and shard2's result may swap positions. So when a uniqueKey
> (such as a url) occurs in both shard1 and shard2, which one is used
> is unpredictable, and the scores of the two docs differ because of
> different idf. So the same query can return different results.
> One possible solution is to sort the ShardResponse list by shard name.
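To make Li Li's suggestion concrete, here is a minimal sketch of the idea (not Solr's actual mergeIds code): sort the shard responses by shard name before merging, so the winning duplicate no longer depends on network arrival order. The merge method, and representing a response as a {shardName, uniqueKey} pair, are simplifications for illustration; the tie-break mirrors the commented-out prevShard.compareTo idea from the quoted code:

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class DeterministicMergeSketch {
    // Each response row is {shardName, uniqueKey}; returns uniqueKey -> winning shard.
    static Map<String, String> merge(List<String[]> responses) {
        List<String[]> sorted = new ArrayList<>(responses);
        // Deterministic processing order, independent of which shard answered first.
        sorted.sort(Comparator.comparing(r -> r[0]));
        Map<String, String> uniqueDoc = new HashMap<>();
        for (String[] r : sorted) {
            String shard = r[0], id = r[1];
            String prevShard = uniqueDoc.put(id, shard);
            if (prevShard != null && prevShard.compareTo(shard) <= 0) {
                // Duplicate: deterministically keep the doc from the
                // lexicographically smaller shard name.
                uniqueDoc.put(id, prevShard);
            }
        }
        return uniqueDoc;
    }

    public static void main(String[] args) {
        // "urlA" exists on both shards; shard2 happens to respond first.
        List<String[]> arrivalOrder = List.of(
            new String[]{"shard2", "urlA"},
            new String[]{"shard1", "urlA"});
        System.out.println(merge(arrivalOrder).get("urlA")); // shard1 wins either way
    }
}
```

Of course this sidesteps the real difficulty Yonik mentions: once the wrong duplicate is already in the priority queue, it still has to be removed or the queue rebuilt.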