Hi, We are experimenting a parallel approach to issue a large OR-Boolean query, e.g., keywords:(1 OR 2 OR 3 OR ... OR 102400), against several solr shards.
The way we are trying is to break the large query into smaller ones, e.g., the example above can be broken into 10 small queries: keywords:(1 OR 2 OR 3 OR ... OR 1024), keywords:(1025 OR 1026 OR 1027 OR ... OR 2048), etc Now each shard will get 10 requests and the master will merge the results coming back from each shard, similar to the regular distributed search. There are a few issues with this approach, though: Issue #1: sorting: ShardFieldSortedHitQueue.lessThan() checks if two documents are from the same shard; if yes, it will just compare the orderInShard. However, in our approach, two documents from the same shard could be results of two different small queries, in that case, we can't just compare the orderInShard. There is a simple solution to this issue: just change the if statement to : if (docA.shard == docB.shard && docA.sortFieldValues == docB.sortFieldValues) Can someone make this change for Solr1.4 so that we don't have to customize it? It does NOT impact the normal case. Issue #2: number of matching documents found: Since multiple responses from the same shard could contain duplicated documents, the sum of number of documents found from each response will be greater than the real number of documents found. Unless we ask each shard to return all the matching records for every small query and then check duplicates, we can't get the accurate number of documents found. Issue #3: faceting counts Due to the same limitation as in issue #2, the faceting count is always greater than the correct one. I don't have any idea how this issue can be resolved.