-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Brandon, Nicholas (UK) wrote:
> I have a table with first_name and a last_name column. I would like to
> find similar duplicates

BTW there exists a whole algorithm for doing this with part influenced
by the medical industry (matching patient records) as well as the US
census (same problem).

Mathematical detail: http://en.wikipedia.org/wiki/Jaro-Winkler

There are Python and C implementations in
http://bitpim.svn.sourceforge.net/viewvc/bitpim/trunk/bitpim/src/native/strings/

I originally wrote the code to detect similarities between contact
records coming from cell phones, Outlook, Evolution etc.

Roger
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.6 (GNU/Linux)

iD8DBQFIk6TKmOOfHg372QQRAvFMAKCo/Fi2Tnfn9h7i/1l70jeceHVzFQCgm4ku
hGvf5V3boWJ71umj9Kofct0=
=xAnN
-----END PGP SIGNATURE-----
_______________________________________________
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users

Reply via email to