Hi all, I've followed the instructions at http://wiki.apache.org/solr/Deduplication and got the basic dedupe field working. However, it still treats values that differ only in case or whitespace as different, even though I've defined the type of the fields used for dedupe, as well as the signature field, as follows in schema.xml:

<fieldType autoGeneratePhraseQueries="true" class="solr.TextField" name="text_ws_lower" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
<field name="name" type="text_ws_lower"/>
<field name="signatureField" type="text_ws_lower"/>

and in solrconfig.xml:

<updateRequestProcessorChain name="dedupe">
  <processor class="org.apache.solr.update.processor.SignatureUpdateProcessorFactory">
    <bool name="enabled">true</bool>
    <bool name="overwriteDupes">false</bool>
    <str name="signatureField">signatureField</str>
    <str name="fields">name</str>
    <str name="signatureClass">org.apache.solr.update.processor.Lookup3Signature</str>
  </processor>
  <processor class="solr.LogUpdateProcessorFactory" />
  <processor class="solr.RunUpdateProcessorFactory" />
</updateRequestProcessorChain>

I know a possible solution is to lowercase and remove whitespace from the field "name" before submitting documents to Solr, but are there any alternatives so that, given the following data

Name: JOHN SMITH and jOhn SMITh

the documents end up with the same value in signatureField?

Thanks heaps

Cheers
tinman
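
For reference, the client-side workaround mentioned above (lowercasing and removing whitespace before submitting) could look roughly like the sketch below. It assumes documents are prepared in Python before being sent to Solr; `normalize_name` is a hypothetical helper and not part of Solr or any Solr client.

```python
def normalize_name(name: str) -> str:
    """Lowercase a value and remove all whitespace from it.

    Hypothetical client-side helper: apply it to the "name" value
    before the document is submitted, so that values differing only
    in case or whitespace become the same string.
    """
    # str.split() with no arguments splits on any run of whitespace,
    # so joining the pieces removes spaces, tabs and newlines alike.
    return "".join(name.split()).lower()


if __name__ == "__main__":
    for raw in ("JOHN SMITH", "jOhn SMITh"):
        print(repr(raw), "->", repr(normalize_name(raw)))
```

Both sample names from the question reduce to the same string, so a signature computed over the normalized value would match for the two documents.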