Chris M. Hostetter created SOLR-15293:
-----------------------------------------

             Summary: Deprecate/remove overwriteDupes option from SignatureUpdateProcessorFactory
                 Key: SOLR-15293
                 URL: https://issues.apache.org/jira/browse/SOLR-15293
             Project: Solr
          Issue Type: Sub-task
      Security Level: Public (Default Security Level. Issues are Public)
            Reporter: Chris M. Hostetter


The design of the {{overwriteDupes}} option of
SignatureUpdateProcessorFactory is only viable in single-shard use cases,
and even then it currently doesn't work because UpdateCommand
"options" are not included when Shard Leaders write updates to the tlog or
forward them to other replicas (SOLR-8030). With multiple shards it can never
be viable w/o broadcasting a "Delete By Query" to every replica on every
document add/update (SOLR-3473), which is vastly less efficient than the current
low-level {{updateDocument(Term,...)}} support provided by IndexWriter for
replacing documents by uniqueKey.

I think in general we should remove the {{overwriteDupes}} option completely. 
If SignatureUpdateProcessorFactory is used to generate a synthetic uniqueKey 
field then the existing Solr/Lucene behavior of routing the document to the 
correct shard and replacing any prior instances of that doc will work fine.

The functionality of SignatureUpdateProcessorFactory should be constrained 
*solely* to generating a signature – if that signature is put in the unique key 
field, then de-duplication will happen automatically.
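A toy, stdlib-only Java sketch of why putting the signature in the uniqueKey field is enough. Everything here is hypothetical and illustrative, not Solr or Lucene API: the class {{SignatureDedupSketch}}, the use of MD5 as the signature (Solr ships its own signature implementations), routing by {{String.hashCode()}} modulo the shard count (not Solr's actual document router), and a per-shard {{HashMap.put}} standing in for IndexWriter's {{updateDocument(Term,...)}} replace-by-key behavior. Case (a) keeps the signature out of the uniqueKey, so two identical documents with different ids both survive (possibly on different shards); case (b) uses the signature as the uniqueKey, so the duplicate routes to the same shard and replaces the earlier copy.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class SignatureDedupSketch {

    /** Hex MD5 over the field values (illustrative stand-in, not Solr's Signature API). */
    static String signature(String... fieldValues) {
        try {
            MessageDigest md = MessageDigest.getInstance("MD5");
            for (String v : fieldValues) {
                md.update(v.getBytes(StandardCharsets.UTF_8));
                md.update((byte) 0); // separator so ("ab","c") and ("a","bc") differ
            }
            StringBuilder hex = new StringBuilder();
            for (byte b : md.digest()) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Hypothetical cluster: each shard maps uniqueKey -> document body. */
    static class Cluster {
        final List<Map<String, String>> shards = new ArrayList<>();

        Cluster(int numShards) {
            for (int i = 0; i < numShards; i++) {
                shards.add(new HashMap<>());
            }
        }

        /** Route by a hash of the uniqueKey, then replace any prior doc with that key. */
        void add(String uniqueKey, String body) {
            int shard = Math.floorMod(uniqueKey.hashCode(), shards.size());
            // Map.put stands in for IndexWriter.updateDocument(Term, ...):
            // the same key always reaches the same shard and overwrites the old doc.
            shards.get(shard).put(uniqueKey, body);
        }

        int totalDocs() {
            int n = 0;
            for (Map<String, String> s : shards) {
                n += s.size();
            }
            return n;
        }
    }

    public static void main(String[] args) {
        String title = "Hello";
        String text = "Same content";

        // (a) Signature kept in a separate field; uniqueKey is an unrelated id.
        Cluster a = new Cluster(4);
        a.add("doc-1", "sig=" + signature(title, text));
        a.add("doc-2", "sig=" + signature(title, text));
        System.out.println("separate signature field: " + a.totalDocs() + " docs"); // 2

        // (b) Signature used as the uniqueKey: the duplicate replaces the original.
        Cluster b = new Cluster(4);
        b.add(signature(title, text), title + "|" + text);
        b.add(signature(title, text), title + "|" + text);
        System.out.println("signature as uniqueKey: " + b.totalDocs() + " docs"); // 1
    }
}
```

No extra cross-shard coordination is needed in case (b): the deterministic key alone guarantees both copies meet on the same shard, which is the point the issue makes about not needing {{overwriteDupes}}.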



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
