a bug of solr distributed search

Li Li Wed, 21 Jul 2010 00:14:08 -0700

The bug is in QueryComponent.mergeIds. It removes any document whose
uniqueKey duplicates that of another document. The current implementation
keeps the first one encountered:
          String prevShard = uniqueDoc.put(id, srsp.getShard());
          if (prevShard != null) {
            // duplicate detected
            numFound--;
            collapseList.remove(id+"");
            docs.set(i, null);//remove it.
            // For now, just always use the first encountered since we can't currently
            // remove the previous one added to the priority queue. If we switched
            // to the Java5 PriorityQueue, this would be easier.
            continue;
            // make which duplicate is used deterministic based on shard
            // if (prevShard.compareTo(srsp.shard) >= 0) {
            //  TODO: remove previous from priority queue
            //  continue;
            // }
          }


It iterates over the shard responses with
for (ShardResponse srsp : sreq.responses)
but the order of sreq.responses may differ from one request to the next.
That is, shard1's result and shard2's result may swap positions.
So when a uniqueKey (such as a url) occurs in both shard1 and shard2,
which copy will be used is unpredictable. The scores of these 2 docs
are different because the idf differs between the shards.
So the same query can return different results.
One possible solution is to sort the ShardResponse objects by shard name.
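
A minimal sketch of that idea, not a patch against Solr. ShardResult is a
hypothetical stand-in for ShardResponse (only a shard name and the ids it
returned), and DeterministicMerge.mergeIds is an illustrative name, not the
real QueryComponent method. The responses are sorted by shard name before
the "first encountered wins" rule is applied, so arrival order no longer
matters.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class DeterministicMerge {

    // Hypothetical stand-in for Solr's ShardResponse: a shard name plus
    // the uniqueKey values that shard returned.
    static final class ShardResult {
        final String shard;
        final List<String> ids;

        ShardResult(String shard, List<String> ids) {
            this.shard = shard;
            this.ids = ids;
        }
    }

    // Returns uniqueKey -> name of the shard whose copy is kept.
    static Map<String, String> mergeIds(List<ShardResult> responses) {
        // Copy, then sort by shard name so the iteration order is fixed
        // regardless of the order in which the responses arrived.
        List<ShardResult> ordered = new ArrayList<>(responses);
        ordered.sort(Comparator.comparing((ShardResult r) -> r.shard));

        Map<String, String> uniqueDoc = new LinkedHashMap<>();
        for (ShardResult r : ordered) {
            for (String id : r.ids) {
                // Same "first encountered wins" rule as before, but "first"
                // now always means the lowest shard name.
                uniqueDoc.putIfAbsent(id, r.shard);
            }
        }
        return uniqueDoc;
    }

    public static void main(String[] args) {
        ShardResult s1 = new ShardResult("shard1", Arrays.asList("url-a", "url-b"));
        ShardResult s2 = new ShardResult("shard2", Arrays.asList("url-a", "url-c"));

        // Both arrival orders resolve the duplicate "url-a" the same way.
        System.out.println(mergeIds(Arrays.asList(s1, s2)));
        System.out.println(mergeIds(Arrays.asList(s2, s1)));
    }
}
```

This only makes the choice of duplicate repeatable. It does not make the
scores from different shards comparable.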
