Thanks Mark,
I already fixed it in the meantime and quickly went on with the usual stuff; I know, bad me =). I'll file a Jira report tomorrow and update the wiki on this subject. I can also file another ticket for another current topic on this subject: a proper use case for the update handler to return information on which documents were rejected due to dedupe. I think updating the wiki with links to those new Jira tickets would be a good idea for other readers, wouldn't it?

Cheers,

-----Original message-----
From: Mark Miller <markrmil...@gmail.com>
Sent: Tue 11-05-2010 17:25
To: solr-user@lucene.apache.org;
Subject: Re: Dedupe and overwriteDupes setting

1. You need to set the sig field to indexed.
2. This should be added to the wiki
3. Want to make a JIRA issue? This is not very friendly behavior (when you have the sig field set to indexed=false and overwriteDupes=true it should likely complain)

--
- Mark

http://www.lucidimagination.com

On 5/11/10 4:13 AM, Markus Jelsma wrote:
> List,
>
> I've stumbled upon an issue with the deduplication mechanism. It either
> deletes all documents or does nothing at all, depending on the
> overwriteDupes setting (true and false, respectively).
>
> I use a slightly modified configuration:
>
> <updateRequestProcessorChain name="dedupe">
>   <processor class="org.apache.solr.update.processor.SignatureUpdateProcessorFactory">
>     <bool name="enabled">true</bool>
>     <str name="signatureField">sig</str>
>     <bool name="overwriteDupes">true</bool>
>     <str name="fields">content</str>
>     <str name="signatureClass">org.apache.solr.update.processor.Lookup3Signature</str>
>   </processor>
>   <processor class="solr.LogUpdateProcessorFactory" />
>   <processor class="solr.RunUpdateProcessorFactory" />
> </updateRequestProcessorChain>
>
> <field name="sig" type="string" stored="true" indexed="false" multiValued="true" />
>
> After importing new documents I can clearly see the correct signatures
> (only with overwriteDupes=false). Most documents have a distinct signature
> and some share the same one because the content field's value is identical
> for those documents.
>
> Anyway, why does it delete all my documents? Any clues? The wiki is not very
> helpful on this subject.
>
> Cheers.
>
> Markus Jelsma - Technisch Architect - Buyways BV
> http://www.linkedin.com/in/markus17
> 050-8536620 / 06-50258350
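
For readers finding this thread in the archive: a minimal sketch of the change Mark describes in point 1, assuming the dedupe chain and the rest of the schema stay exactly as posted. Only the indexed attribute differs from the field definition quoted above; this is an illustration, not a tested configuration.

```xml
<!-- schema.xml: same sig field as in the quoted message, now indexed -->
<!-- Assumption: every other attribute is left as originally posted -->
<field name="sig" type="string" stored="true" indexed="true" multiValued="true" />
```

With overwriteDupes=true the signature field has to be searchable so that existing documents with the same signature can be found, which is presumably why indexed="false" misbehaves here.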