Jira ticket added: https://issues.apache.org/jira/browse/SOLR-1908
Wiki entry updated: http://wiki.apache.org/solr/Deduplication
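
A minimal sketch of the schema change from point 1 of the quoted reply, assuming every other attribute of the sig field stays exactly as in the original post (only indexed is flipped):

```xml
<!-- Only indexed changed (false to true), per point 1 of the quoted reply.
     The remaining attributes are carried over unchanged from the original
     post; keeping them as-is is an assumption, not a recommendation. -->
<field name="sig" type="string" stored="true" indexed="true"
       multiValued="true" />
```

With overwriteDupes=true the processor presumably has to find existing documents by their signature, which would only work against an indexed field.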

On Tuesday 11 May 2010 17:24:10 Mark Miller wrote:
> 1. You need to set the sig field to indexed.
> 2. This should be added to the wiki
> 3. Want to make a JIRA issue? This is not very friendly behavior (when
> you have the sig field set to indexed=false and overwriteDupes=true it
> should likely complain)
>
> > List,
> >
> >
> > I've stumbled upon an issue with the deduplication mechanism. It either
> > deletes all documents or does nothing at all and it depends on the
> > overwriteDupes setting, resp. true and false.
> >
> > I use a slightly modified configuration:
> >
> > <updateRequestProcessorChain name="dedupe">
> >   <processor
> >     class="org.apache.solr.update.processor.SignatureUpdateProcessorFactory">
> >     <bool name="enabled">true</bool>
> >     <str name="signatureField">sig</str>
> >     <bool name="overwriteDupes">true</bool>
> >     <str name="fields">content</str>
> >     <str name="signatureClass">org.apache.solr.update.processor.Lookup3Signature</str>
> >   </processor>
> >   <processor class="solr.LogUpdateProcessorFactory" />
> >   <processor class="solr.RunUpdateProcessorFactory" />
> > </updateRequestProcessorChain>
> >
> >
> > <field name="sig" type="string" stored="true" indexed="false"
> >   multiValued="true" />
> >
> > After importing new documents i (only with overwriteDupes=false) can
> > clearly see the correct signatures. Most documents have a distinct
> > signature and some share the same because the content field's value is
> > identical for those documents.
> >
> >
> > Anyway, why does it delete all my documents? Any clues? The wiki is not
> > very helpful on this subject.
> >
> >
> > Cheers.
> >
> >
> > Markus Jelsma - Technisch Architect - Buyways BV
> > http://www.linkedin.com/in/markus17
> > 050-8536620 / 06-50258350
>

Markus Jelsma - Technisch Architect - Buyways BV
http://www.linkedin.com/in/markus17
050-8536620 / 06-50258350