Re: De Duplication using Solr

2015-01-03 Thread Amit Jha
Thanks for the reply. I have already seen the wiki. It is more like record matching. On Sat, Jan 3, 2015 at 7:39 PM, Jack Krupansky wrote: > First, see if you can get your requirements to align to the de-dupe feature > that Solr already has: > https://cwiki.apache.org/confluence/display/solr/De-D

Re: De Duplication using Solr

2015-01-03 Thread Jack Krupansky
First, see if you can get your requirements to align to the de-dupe feature that Solr already has: https://cwiki.apache.org/confluence/display/solr/De-Duplication -- Jack Krupansky On Sat, Jan 3, 2015 at 2:54 AM, Amit Jha wrote: > I am trying to find out duplicate records based on distance and

RE: De Duplication using Solr

2015-01-03 Thread steve
One possible way to do the "match" is Python's FuzzyWuzzy: https://github.com/seatgeek/fuzzywuzzy http://chairnerd.seatgeek.com/fuzzywuzzy-fuzzy-string-matching-in-python/ > Date: Sat, 3 Jan 2015 13:24:17 +0530 > Subject: De Duplication using Solr > From: shanuu@g
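
To give a rough idea of the kind of fuzzy scoring suggested above, here is a minimal sketch. It does not call FuzzyWuzzy itself; it uses `difflib` from the Python standard library as a stand-in so it runs without extra packages. The helper names `ratio` and `token_sort_ratio` and the 0-100 scale are assumptions made for illustration, and the scores will not necessarily equal what FuzzyWuzzy reports.

```python
from difflib import SequenceMatcher


def ratio(a: str, b: str) -> int:
    """Similarity of two strings on a 0-100 scale (hypothetical helper)."""
    return round(100 * SequenceMatcher(None, a, b).ratio())


def token_sort_ratio(a: str, b: str) -> int:
    """Like ratio(), but ignores case and word order."""
    def normalize(s: str) -> str:
        # Lowercase, split on whitespace, sort the tokens, rejoin.
        return " ".join(sorted(s.lower().split()))

    return ratio(normalize(a), normalize(b))


if __name__ == "__main__":
    print(ratio("Amit Jha", "Amit Jah"))
    print(token_sort_ratio("Jha Amit", "amit jha"))
```

A score near 100 would flag a possible duplicate; where to put the cut-off is a tuning decision that depends on the data.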

De Duplication using Solr

2015-01-02 Thread Amit Jha
I am trying to find duplicate records based on distance and phonetic algorithms. Can I use Solr for that? I have the following fields and conditions to identify exact or possible duplicates. 1. Fields: prefix, suffix, firstname, lastname, email (primary_email1, email2, email3), phone (primary_phone1,
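
As an illustration of what "distance and phonetic" matching on the name fields could look like outside Solr, here is a minimal Python sketch. It assumes American Soundex as the phonetic algorithm and Levenshtein edit distance as the distance measure; the question does not say which algorithms are intended. The `classify` rule and its `max_distance` threshold are hypothetical, since the actual conditions are cut off in this archive. Only `firstname` and `lastname` from the field list are used.

```python
def soundex(name: str) -> str:
    """American Soundex code (one letter plus three digits)."""
    codes = {}
    for letters, digit in (("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
                           ("L", "4"), ("MN", "5"), ("R", "6")):
        for ch in letters:
            codes[ch] = digit
    letters_only = [c for c in name.upper() if c.isalpha()]
    if not letters_only:
        return ""
    result = letters_only[0]
    prev = codes.get(letters_only[0], "")
    for ch in letters_only[1:]:
        if ch in "HW":
            continue  # H and W do not separate letters with the same code
        code = codes.get(ch, "")
        if code and code != prev:
            result += code
        prev = code  # a vowel resets prev, so a repeated code counts again
    return (result + "000")[:4]


def levenshtein(a: str, b: str) -> int:
    """Edit distance: insertions, deletions and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]


def classify(rec_a: dict, rec_b: dict, max_distance: int = 2) -> str:
    """Return 'exact', 'possible' or 'distinct' (hypothetical rule)."""
    fa, la = rec_a["firstname"].strip().lower(), rec_a["lastname"].strip().lower()
    fb, lb = rec_b["firstname"].strip().lower(), rec_b["lastname"].strip().lower()
    if (fa, la) == (fb, lb):
        return "exact"
    phonetic = soundex(fa) == soundex(fb) and soundex(la) == soundex(lb)
    close = levenshtein(fa, fb) + levenshtein(la, lb) <= max_distance
    return "possible" if phonetic or close else "distinct"


if __name__ == "__main__":
    a = {"firstname": "Jon", "lastname": "Smith"}
    b = {"firstname": "John", "lastname": "Smyth"}
    print(classify(a, b))
```

The same idea extends to the email and phone fields, where an exact match after normalization is probably a stronger signal than a fuzzy one.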