Re: [SQL] How to find double entries

2008-04-19 Thread Jean-David Beyer
Andreas wrote: > Hi, > > how can I find double entries in varchar columns where the content is not > 100% identical because of a spelling error or the person considered it > "looked nicer" that way? > > I'd like to identify and then merge records of e.g. 'google', 'gogle', > 'guugle' Then I wa

Re: [SQL] How to find double entries

2008-04-16 Thread Craig Ringer
Vivek Khera wrote: > > On Apr 15, 2008, at 11:23 PM, Tom Lane wrote: >> What's really a duplicate sounds like a judgment call here, so you >> probably shouldn't even think of automating it completely. > > I did a consulting gig about 10 years ago for a company that made > software to normalize st

Re: [SQL] How to find double entries

2008-04-16 Thread Vivek Khera
On Apr 15, 2008, at 11:23 PM, Tom Lane wrote: > What's really a duplicate sounds like a judgment call here, so you > probably shouldn't even think of automating it completely. I did a consulting gig about 10 years ago for a company that made software to normalize street addresses and names. Lit

Re: [SQL] How to find double entries

2008-04-15 Thread Volkan YAZICI
On Wed, 16 Apr 2008, Andreas <[EMAIL PROTECTED]> writes: > how can I find double entries in varchar columns where the content is > not 100% identical because of a spelling error or the person > considered it "looked nicer" that way? > > I'd like to identify and then merge records of e.g. 'google'

Re: [SQL] How to find double entries

2008-04-15 Thread Tena Sakai
: [EMAIL PROTECTED] on behalf of Andreas Sent: Tue 4/15/2008 8:15 PM To: pgsql-sql@postgresql.org Subject: [SQL] How to find double entries Hi, how can I find double entries in varchar columns where the content is not 100% identical because of a spelling error or the person considered it "l

Re: [SQL] How to find double entries

2008-04-15 Thread Tom Lane
Andreas <[EMAIL PROTECTED]> writes: > I'd like to identify and then merge records of e.g. 'google', 'gogle', > 'guugle' > Then I want to match abbreviations like 'A-Company Ltd.', 'a company > ltd.', 'A-Company Limited' > Is there a way to do this? > It would be OK just to list candidates up

Re: [SQL] How to find double entries

2008-04-15 Thread Craig Ringer
Andreas wrote: > Hi, > > how can I find double entries in varchar columns where the content is > not 100% identical because of a spelling error or the person considered > it "looked nicer" that way? When doing some near-duplicate elimination as part of converting a legacy data set to PostgreSQL I

[SQL] How to find double entries

2008-04-15 Thread Andreas
Hi, how can I find double entries in varchar columns where the content is not 100% identical because of a spelling error or the person considered it "looked nicer" that way? I'd like to identify and then merge records of e.g. 'google', 'gogle', 'guugle' Then I want to match abbreviations