Detecting (Almost) Matches for DeDuping?

2007-01-14 Thread Matthew Reinbold
I have a dataset that I've put together from a number of client files. To this point I've been able to easily build a set of ColdFusion tools for using the data, but there is a de-duping process that I need to do that I just don't know how to approach. The data has a series of first and last

Re: Detecting (Almost) Matches for DeDuping?

2007-01-14 Thread Jacob Munson
I did some poking around one day for stuff like this and came across an algorithm called Soundex that helps you know if two names are the same, even though they might have slightly different spelling. I just did a search, and found that Ben Forta wrote a UDF for doing this. Not sure if it will

Re: Detecting (Almost) Matches for DeDuping?

2007-01-14 Thread Claude Schneegans
"an algorithm called Soundex" Yeah, but Soundex is not a panacea either. All that matters with Soundex is the first syllable; after that, just about anything will match. It might detect Martine and Matinez as being the same, but Martin, Martinovitch and Martinelli as well. So anyway, some human
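For anyone who wants to see the over-matching for themselves, here is a minimal sketch of standard American Soundex in Python. It is not Ben Forta's UDF and not ColdFusion; the function name `soundex` and the table `CODES` are my own choices for illustration. Because the code keeps the first letter plus at most three digits, everything after the first few consonants is thrown away.

```python
# Minimal sketch of standard American Soundex (illustrative, not the UDF
# mentioned in the thread). Consonant groups map to the digits 1 through 6.
CODES = {}
for digit, letters in enumerate(("BFPV", "CGJKQSXZ", "DT", "L", "MN", "R"), start=1):
    for ch in letters:
        CODES[ch] = str(digit)


def soundex(name):
    """Return the 4-character Soundex code for name, or "" if it has no letters."""
    letters = [c for c in name.upper() if c.isalpha()]
    if not letters:
        return ""
    result = letters[0]
    prev = CODES.get(letters[0], "")  # the first letter's code also suppresses repeats
    for ch in letters[1:]:
        if ch in "HW":
            continue  # H and W do not separate consonants with the same code
        code = CODES.get(ch, "")
        if code and code != prev:
            result += code
        prev = code  # a vowel resets prev, so a repeated code after it is kept
    return (result + "000")[:4]  # pad with zeros, then truncate to 4 characters


for n in ("Martin", "Martinovitch", "Martinelli"):
    print(n, soundex(n))
```

All three names come out with the same code, because only M plus the digits for r, t and n survive the truncation.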

Re: Detecting (Almost) Matches for DeDuping?

2007-01-14 Thread Robertson-Ravo, Neil (RX)
-Original Message- From: Matthew Reinbold To: CF-Talk Sent: Sun Jan 14 16:28:16 2007 Subject: Detecting (Almost) Matches for DeDuping?

Re: Detecting (Almost) Matches for DeDuping?

2007-01-14 Thread Matthew Reinbold
Thanks for all the quick responses. SoundEx is interesting, but it only finds names that sound the same, like Johnson and Jonson. However, if a misspelling causes the two names to be phonetically different, like Johnson and Jihnson, I don't believe it will find that match. I agree, if there's
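One technique aimed at typos rather than pronunciation is edit distance (Levenshtein distance), which counts the single-character insertions, deletions and substitutions needed to turn one string into another. Nobody in the thread proposed it, so treat the following Python as a sketch of an alternative, not as anyone's actual solution; `levenshtein`, `is_near_match` and the `max_edits` threshold are names and values assumed here for illustration.

```python
def levenshtein(a, b):
    """Number of single-character edits needed to turn string a into string b."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute, or keep if equal
        prev = curr
    return prev[-1]


def is_near_match(a, b, max_edits=1):
    """Hypothetical helper: case-insensitive match within max_edits edits."""
    return levenshtein(a.strip().lower(), b.strip().lower()) <= max_edits


print(levenshtein("Johnson", "Jihnson"))
print(is_near_match("Johnson", "Jonson"))
```

A threshold this tight will still flag distinct short names that differ by one letter, so the candidate pairs would need a human review pass.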

Re: Detecting (Almost) Matches for DeDuping?

2007-01-14 Thread Robertson-Ravo, Neil (RX)
To: CF-Talk Sent: Sun Jan 14 17:31:51 2007 Subject: Re: Detecting (Almost) Matches for DeDuping?