Hey Andrew, just wondering if you ever managed to run TextProfileSignature-based deduplication. I would appreciate it if you could send me the code fragment for it from solrconfig.xml.
I currently have something like this, but I'm not sure if I am doing it right:

<updateRequestProcessorChain name="dedupe">
  <processor class="org.apache.solr.update.processor.SignatureUpdateProcessorFactory">
    <bool name="enabled">true</bool>
    <str name="signatureField">signature</str>
    <bool name="overwriteDupes">true</bool>
    <str name="fields">title,author,abstract</str>
    <str name="signatureClass">org.apache.solr.update.processor.TextProfileSignature</str>
    <str name="minTokenLen">3</str>
  </processor>
  <processor class="solr.LogUpdateProcessorFactory" />
  <processor class="solr.RunUpdateProcessorFactory" />
</updateRequestProcessorChain>

--
Thanks in advance,
-Ali
--
View this message in context: http://lucene.472066.n3.nabble.com/Filtering-near-duplicates-using-TextProfileSignature-tp479039p880044.html
Sent from the Solr - User mailing list archive at Nabble.com.