Thanks for the reply. I have already seen the wiki page; that feature seems to be aimed more at record matching.
On Sat, Jan 3, 2015 at 7:39 PM, Jack Krupansky <jack.krupan...@gmail.com> wrote:
> First, see if you can get your requirements to align to the de-dupe feature
> that Solr already has:
> https://cwiki.apache.org/confluence/display/solr/De-Duplication
>
> -- Jack Krupansky
>
> On Sat, Jan 3, 2015 at 2:54 AM, Amit Jha <shanuu....@gmail.com> wrote:
>
> > I am trying to find duplicate records based on distance and phonetic
> > algorithms. Can I utilize Solr for that? I have the following fields and
> > conditions to identify exact or possible duplicates.
> >
> > 1. Fields
> >    prefix
> >    suffix
> >    firstname
> >    lastname
> >    email(primary_email1, email2, email3)
> >    phone(primary_phone1, phone2, phone3)
> > 2. Conditions:
> >    Two records are said to be exact duplicates if
> >
> >    1. IsExactMatchFunction(record1_prefix, record2_prefix) AND
> >       IsExactMatchFunction(record1_suffix, record2_suffix) AND
> >       IsExactMatchFunction(record1_firstname, record2_firstname) AND
> >       IsExactMatchFunction(record1_lastname, record2_lastname) AND
> >       IsExactMatchFunction(record1_primary_email, record2_primary_email) OR
> >       IsExactMatchFunction(record1_primary_phone, record2_primary_phone)
> >
> >    Two records are said to be possible duplicates if
> >
> >    1. IsExactMatchFunction(record1_prefix, record2_prefix) OR
> >       IsExactMatchFunction(record1_suffix, record2_suffix) OR
> >       IsExactMatchFunction(record1_firstname, record2_firstname) AND
> >       IsExactMatchFunction(record1_lastname, record2_lastname) AND
> >       IsExactMatchFunction(record1_primary_email, record2_primary_email) OR
> >       IsExactMatchFunction(record1_primary_phone, record2_primary_phone)
> >    ELSE
> >    2. IsFuzzyMatchFunction(record1_firstname, record2_firstname) AND
> >       IsExactMatchFunction(record1_lastname, record2_lastname) AND
> >       IsExactMatchFunction(record1_primary_email, record2_primary_email) OR
> >       IsExactMatchFunction(record1_primary_phone, record2_primary_phone)
> >    ELSE
> >    3. IsFuzzyMatchFunction(record1_firstname, record2_firstname) AND
> >       IsExactMatchFunction(record1_lastname, record2_lastname) AND
> >       IsExactMatchFunction(record1_any_email, record2_any_email) OR
> >       IsExactMatchFunction(record1_any_phone, record2_any_phone)
> >
> > IsFuzzyMatchFunction() will perform the distance and phonetic algorithm
> > calculations and compare the result with a predefined threshold.
> >
> > For example:
> >
> > If the threshold defined for firstname is 85, IsFuzzyMatchFunction()
> > returns "true" if and only if one of the algorithms (distance or
> > phonetic) returns a similarity score >= 85.
> >
> > Can I use Solr to perform this job? Or can you suggest how I should
> > approach this problem? I have seen Duke (a deduplication API), but I
> > cannot use Duke out of the box.
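The matching rules in the question can be sketched in plain application code. The following Python is a minimal, hypothetical sketch, not Solr or Duke functionality. `Record`, `is_exact_match`, `is_fuzzy_match` and `classify` are made-up names. Several choices are assumptions not stated in the thread: the distance score is normalized Levenshtein similarity on a 0 to 100 scale; American Soundex stands in for the unnamed phonetic algorithm (scoring 100 when codes agree, else 0); exact matching is case- and whitespace-insensitive; each rule's trailing "email OR phone" is read as one grouped alternative; and possible-duplicate rule 1 is read as "(prefix OR suffix OR firstname) AND lastname AND contact".

```python
from dataclasses import dataclass, field
from typing import List

# Hypothetical sketch; scoring and rule grouping are assumptions, see lead-in.


def levenshtein(a: str, b: str) -> int:
    """Dynamic-programming edit distance (insert, delete, substitute cost 1)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute
        prev = cur
    return prev[-1]


def distance_score(a: str, b: str) -> float:
    """Edit-distance similarity scaled to 0..100 (assumed normalization)."""
    if not a and not b:
        return 100.0
    return 100.0 * (1 - levenshtein(a, b) / max(len(a), len(b)))


_SOUNDEX_GROUPS = {"1": "BFPV", "2": "CGJKQSXZ", "3": "DT",
                   "4": "L", "5": "MN", "6": "R"}
_SOUNDEX = {ch: d for d, letters in _SOUNDEX_GROUPS.items() for ch in letters}


def soundex(name: str) -> str:
    """American Soundex: first letter plus three digits, e.g. Robert -> R163."""
    letters = [c for c in name.upper() if c.isalpha()]
    if not letters:
        return ""
    code, last = letters[0], _SOUNDEX.get(letters[0], "")
    for c in letters[1:]:
        digit = _SOUNDEX.get(c, "")
        if digit and digit != last:
            code += digit
        if c not in "HW":  # vowels reset the run; H and W do not
            last = digit
    return (code + "000")[:4]


def is_exact_match(a: str, b: str) -> bool:
    return a.strip().lower() == b.strip().lower()


def is_fuzzy_match(a: str, b: str, threshold: float = 85) -> bool:
    """True iff the distance score or the phonetic score reaches threshold."""
    a, b = a.strip().lower(), b.strip().lower()
    code_a = soundex(a)
    phonetic = 100.0 if code_a and code_a == soundex(b) else 0.0
    return distance_score(a, b) >= threshold or phonetic >= threshold


@dataclass
class Record:
    prefix: str = ""
    suffix: str = ""
    firstname: str = ""
    lastname: str = ""
    emails: List[str] = field(default_factory=list)  # emails[0] is primary
    phones: List[str] = field(default_factory=list)  # phones[0] is primary


def _same_nonempty(a: str, b: str) -> bool:
    return bool(a.strip()) and is_exact_match(a, b)


def _primary(values: List[str]) -> str:
    return values[0] if values else ""


def _any_shared(xs: List[str], ys: List[str]) -> bool:
    return any(_same_nonempty(x, y) for x in xs for y in ys)


def classify(r1: Record, r2: Record, threshold: float = 85) -> str:
    primary_contact = (_same_nonempty(_primary(r1.emails), _primary(r2.emails))
                       or _same_nonempty(_primary(r1.phones), _primary(r2.phones)))
    any_contact = _any_shared(r1.emails, r2.emails) or _any_shared(r1.phones, r2.phones)
    prefix = is_exact_match(r1.prefix, r2.prefix)
    suffix = is_exact_match(r1.suffix, r2.suffix)
    first_exact = is_exact_match(r1.firstname, r2.firstname)
    first_fuzzy = is_fuzzy_match(r1.firstname, r2.firstname, threshold)
    last = is_exact_match(r1.lastname, r2.lastname)

    if prefix and suffix and first_exact and last and primary_contact:
        return "exact"
    if (prefix or suffix or first_exact) and last and primary_contact:  # rule 1
        return "possible"
    if first_fuzzy and last and primary_contact:  # rule 2
        return "possible"
    if first_fuzzy and last and any_contact:  # rule 3
        return "possible"
    return "distinct"
```

For example, "Jon" and "John" have a distance score of 75 but share the Soundex code J500, so `is_fuzzy_match("Jon", "John")` is true at threshold 85. Comparing every pair is quadratic in the number of records, so in practice one would usually compare only pairs that share a blocking key, such as the same lastname or Soundex code.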