Thanks for the reply. I have already seen the wiki. My requirement is more
like record matching.
On Sat, Jan 3, 2015 at 7:39 PM, Jack Krupansky wrote:
> First, see if you can get your requirements to align to the de-dupe feature
> that Solr already has:
> https://cwiki.apache.org/confluence/display/solr/De-D
First, see if you can get your requirements to align to the de-dupe feature
that Solr already has:
https://cwiki.apache.org/confluence/display/solr/De-Duplication
-- Jack Krupansky
On Sat, Jan 3, 2015 at 2:54 AM, Amit Jha wrote:
> I am trying to find out duplicate records based on distance and
One possible "match" is to use Python's FuzzyWuzzy:
https://github.com/seatgeek/fuzzywuzzy
http://chairnerd.seatgeek.com/fuzzywuzzy-fuzzy-string-matching-in-python/
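For anyone who wants to try the idea without installing anything, here is a rough sketch of distance-plus-phonetic matching on names. It is only an illustration under some assumptions: it uses the standard library's difflib as a stand-in for a FuzzyWuzzy-style ratio, a plain American Soundex for the phonetic side, and the helper names (normalize, similarity, soundex, classify) and the threshold of 85 are made up for this example. It is not how Solr's de-dupe feature works.

```python
from difflib import SequenceMatcher

# Soundex digit for each consonant; vowels, h, w and y have no code.
_SOUNDEX_CODES = {
    ch: digit
    for digit, letters in {
        "1": "bfpv", "2": "cgjkqsxz", "3": "dt",
        "4": "l", "5": "mn", "6": "r",
    }.items()
    for ch in letters
}


def normalize(text):
    """Lowercase and collapse whitespace so trivial differences do not count."""
    return " ".join(text.lower().split())


def similarity(a, b):
    """Return a 0-100 similarity score (stdlib stand-in for a fuzzy ratio)."""
    return round(100 * SequenceMatcher(None, normalize(a), normalize(b)).ratio())


def soundex(name):
    """Return the 4-character American Soundex code, or "" if no letters."""
    letters = [c for c in name.lower() if c.isalpha()]
    if not letters:
        return ""
    code = letters[0].upper()
    prev = _SOUNDEX_CODES.get(letters[0], "")
    for ch in letters[1:]:
        if ch in "hw":
            continue  # h and w do not separate letters with the same code
        digit = _SOUNDEX_CODES.get(ch, "")
        if digit and digit != prev:
            code += digit
        prev = digit  # a vowel resets prev, so a repeated code is kept
    return (code + "000")[:4]


def classify(a, b, threshold=85):
    """Label a pair of names as "exact", "possible", or "none".

    The threshold is an arbitrary value chosen for this sketch.
    """
    if normalize(a) == normalize(b):
        return "exact"
    code_a = soundex(a)
    phonetic_match = bool(code_a) and code_a == soundex(b)
    if similarity(a, b) >= threshold or phonetic_match:
        return "possible"
    return "none"
```

With this sketch, classify("Smith", "Smyth") comes back as "possible" because the two Soundex codes agree even though the string similarity is below the threshold. In practice the threshold and the choice of phonetic algorithm would need tuning against real records.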
> Date: Sat, 3 Jan 2015 13:24:17 +0530
> Subject: De Duplication using Solr
> From: shanuu@g
I am trying to find duplicate records based on distance and phonetic
algorithms. Can I use Solr for that? I have the following fields and
conditions for identifying exact or possible duplicates.
1. Fields
prefix
suffix
firstname
lastname
email(primary_email1, email2, email3)
phone(primary_phone1,