1. You need to set the sig field to indexed="true".
2. This should be added to the wiki.
3. Want to open a JIRA issue? This is not very friendly behavior: when the sig field has indexed="false" and overwriteDupes=true, Solr should probably complain.
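
For reference, a minimal sketch of the corrected field definition, assuming the rest of the schema stays as in the quoted message below. The only change is indexed="true"; every other attribute is copied unchanged from the original and may still need adjusting for your setup:

```xml
<!-- Sketch only: same field as in the quoted schema, with indexed switched to
     "true" so that overwriteDupes can find earlier documents by signature. -->
<field name="sig" type="string" stored="true" indexed="true" multiValued="true" />
```

After changing the schema, the existing documents would most likely need to be reindexed so that their signatures become searchable.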



--
- Mark

http://www.lucidimagination.com


On 5/11/10 4:13 AM, Markus Jelsma wrote:
List,


I've stumbled upon an issue with the deduplication mechanism. Depending on
the overwriteDupes setting, it either deletes all documents (true) or does
nothing at all (false).

I use a slightly modified configuration:

   <updateRequestProcessorChain name="dedupe">
     <processor class="org.apache.solr.update.processor.SignatureUpdateProcessorFactory">
       <bool name="enabled">true</bool>
       <str name="signatureField">sig</str>
       <bool name="overwriteDupes">true</bool>
       <str name="fields">content</str>
       <str name="signatureClass">org.apache.solr.update.processor.Lookup3Signature</str>
     </processor>
     <processor class="solr.LogUpdateProcessorFactory" />
     <processor class="solr.RunUpdateProcessorFactory" />
   </updateRequestProcessorChain>


   <field name="sig" type="string" stored="true" indexed="false" multiValued="true" />

After importing new documents, I can clearly see the correct signatures (only
with overwriteDupes=false). Most documents have a distinct signature, and some
share the same one because the content field's value is identical for those
documents.


Anyway, why does it delete all my documents? Any clues? The wiki is not very
helpful on this subject.


Cheers.


Markus Jelsma - Technisch Architect - Buyways BV
http://www.linkedin.com/in/markus17
050-8536620 / 06-50258350
