[ https://issues.apache.org/jira/browse/SOLR-303?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12567953#action_12567953 ]
patrick o'leary commented on SOLR-303:
--------------------------------------

It looks pretty good. I really need the ShardDoc classes to be split up into public classes so I can use them.
It would also be fantastic to open up QueryComponent: my component only needs to override a few functions, and it would be much cleaner to extend QueryComponent than to duplicate the code.

Also, through testing, it might be worthwhile to cover a few negative edge cases, e.g. duplicate documents in different shards. As systems get larger this becomes a real possibility.
Only fixed hash indexing can ensure you don't get duplicates, but if you want an extensible environment that might not be an option.

It took me a while to realize I had duplicated documents during indexing. They cause NPEs in the query response writers, so the problem is not obvious or easy to track down.

A solution would be to maintain a map of unique field values while adding the ShardDocs to the priority queue, and to continue on duplicates.
You might also want to put some logic in there to ensure the same shard's doc is used for each duplicate doc, simply because the scores for identical docs will differ across shards, and the result could otherwise change depending on which shard responds first. This should eliminate that.

So something like this in QueryComponent.mergeIds:

{code}
Map<Object, String> uniqueDoc = new HashMap<Object, String>();
for (ShardResponse srsp : sreq.responses) {
  SolrDocumentList docs = srsp.rsp.getResults();
  ................
  ................
  // go through every doc in this response, construct a ShardDoc, and
  // put it in the priority queue so it can be ordered.
  for (int i=0; i<docs.size(); i++) {
    SolrDocument doc = docs.get(i);
    ..................
    ..................
    Object uniqueField = doc.getFieldValue(uniqueKeyField.getName());
    if (!uniqueDoc.containsKey(uniqueField)) {
      // first time this unique key is seen: remember which shard it came from
      shardDoc.setId(uniqueField);
      uniqueDoc.put(uniqueField, shardDoc.shard);
    } else {
      // duplicate: the document was already counted once
      numFound--;
      if (uniqueDoc.get(uniqueField).compareTo(shardDoc.shard) > 0) {
        continue;
      }
    }
    ..........................
    queue.insert(shardDoc);
  } // end for-each-doc-in-response
} // end for-each-response
{code}

> Distributed Search over HTTP
> ----------------------------
>
>                 Key: SOLR-303
>                 URL: https://issues.apache.org/jira/browse/SOLR-303
>             Project: Solr
>          Issue Type: New Feature
>          Components: search
>            Reporter: Sharad Agarwal
>            Assignee: Yonik Seeley
>         Attachments: distributed.patch, distributed.patch, distributed.patch,
> distributed.patch, distributed.patch, distributed.patch, distributed.patch,
> distributed.patch, distributed_pjaol.patch, fedsearch.patch, fedsearch.patch,
> fedsearch.patch, fedsearch.patch, fedsearch.patch, fedsearch.patch,
> fedsearch.patch, fedsearch.stu.patch, fedsearch.stu.patch
>
>
> Searching over multiple shards and aggregating results.
> Motivated by http://wiki.apache.org/solr/DistributedSearch

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
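
For reference, the duplicate handling proposed in the comment above can be tried outside Solr with a small standalone sketch. It is only an illustration under assumptions: ShardHit and ShardMergeSketch are hypothetical names, not Solr classes, and the rule "the lexicographically smallest shard name wins" is one possible way to make the choice independent of which shard responds first. It also differs from the snippet in the comment in that it collects one hit per unique key in a map before ordering, rather than filtering while inserting into the priority queue.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for Solr's ShardDoc; not the real class.
final class ShardHit {
    final Object id;      // value of the unique key field
    final String shard;   // name of the shard that returned the document
    final float score;    // score as computed on that shard

    ShardHit(Object id, String shard, float score) {
        this.id = id;
        this.shard = shard;
        this.score = score;
    }
}

final class ShardMergeSketch {
    /**
     * Keeps one hit per unique id. When the same id arrives from several
     * shards, the copy from the lexicographically smallest shard name is
     * kept, so the result does not depend on response order.
     */
    static List<ShardHit> mergeUnique(List<ShardHit> hits) {
        Map<Object, ShardHit> byId = new HashMap<>();
        for (ShardHit hit : hits) {
            ShardHit kept = byId.get(hit.id);
            if (kept == null || hit.shard.compareTo(kept.shard) < 0) {
                byId.put(hit.id, hit);
            }
        }
        List<ShardHit> merged = new ArrayList<>(byId.values());
        // Highest score first, then by shard name.
        merged.sort(Comparator.comparingDouble((ShardHit h) -> -h.score)
                .thenComparing(h -> h.shard));
        return merged;
    }
}
```

The number of dropped duplicates is hits.size() minus the size of the returned list, which is the amount by which a numFound total would need to be reduced.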