Re: [SQL] How to find double entries

2008-04-19 Thread Jean-David Beyer
Andreas wrote: > Hi, > > how can I find double entries in varchar columns where the content is not > 100% identical because of a spelling error or the person considered it > "looked nicer" that way? > > I'd like to identify and then merge records of e.g. 'google', 'gogle', > 'guugle' Then I wa

Re: [SQL] How to find double entries

2008-04-16 Thread Craig Ringer
Vivek Khera wrote: > > On Apr 15, 2008, at 11:23 PM, Tom Lane wrote: >> What's really a duplicate sounds like a judgment call here, so you >> probably shouldn't even think of automating it completely. > > I did a consulting gig about 10 years ago for a company that made > software to normalize st

Re: [SQL] How to find double entries

2008-04-16 Thread Vivek Khera
On Apr 15, 2008, at 11:23 PM, Tom Lane wrote: What's really a duplicate sounds like a judgment call here, so you probably shouldn't even think of automating it completely. I did a consulting gig about 10 years ago for a company that made software to normalize street addresses and names. Lit

Re: [SQL] How to find double entries

2008-04-15 Thread Volkan YAZICI
On Wed, 16 Apr 2008, Andreas <[EMAIL PROTECTED]> writes: > how can I find double entries in varchar columns where the content is > not 100% identical because of a spelling error or the person > considered it "looked nicer" that way? > > I'd like to identify and then merge records of e.g. 'google'

Re: [SQL] How to find double entries

2008-04-15 Thread Tena Sakai
Hi, A recent Linux Magazine article (http://www.linux-mag.com/id/5679) mentioned Full-Text Search Integration. I know nothing about it, but it sounded interesting to me. You might want to check it out. Regards, Tena Sakai [EMAIL PROTECTED] -Original Message- From:

Re: [SQL] How to find double entries

2008-04-15 Thread Tom Lane
Andreas <[EMAIL PROTECTED]> writes: > I'd like to identify and then merge records of e.g. 'google', 'gogle', > 'guugle' > Then I want to match abbreviations like 'A-Company Ltd.', 'a company > ltd.', 'A-Company Limited' > Is there a way to do this? > It would be OK just to list candidates up

Re: [SQL] How to find double entries

2008-04-15 Thread Craig Ringer
Andreas wrote: > Hi, > > how can I find double entries in varchar columns where the content is > not 100% identical because of a spelling error or the person considered > it "looked nicer" that way? When doing some near-duplicate elimination as part of converting a legacy data set to PostgreSQL I
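
For readers skimming this thread, here is a minimal sketch of the "list candidates, then decide by hand" approach the original question asks about. It is not taken from any of the replies above. It assumes the column values have already been fetched into a Python list, uses the standard-library `difflib.SequenceMatcher` as the similarity measure, and relies on a hypothetical `ABBREVIATIONS` map and a hypothetical threshold of 0.6, both of which would need tuning against real data.

```python
import re
from difflib import SequenceMatcher
from itertools import combinations

# Hypothetical abbreviation map (an assumption, not from the thread).
ABBREVIATIONS = {"limited": "ltd"}


def normalize(name: str) -> str:
    """Lowercase, turn punctuation into spaces, and unify known abbreviations."""
    words = re.sub(r"[^a-z0-9]+", " ", name.lower()).split()
    return " ".join(ABBREVIATIONS.get(word, word) for word in words)


def candidate_pairs(names, threshold=0.6):
    """Return (name_a, name_b, score) for pairs whose normalized forms look similar.

    The result is only a list of candidates for manual review; nothing is merged.
    """
    pairs = []
    for a, b in combinations(names, 2):
        score = SequenceMatcher(None, normalize(a), normalize(b)).ratio()
        if score >= threshold:
            pairs.append((a, b, round(score, 2)))
    # Most similar pairs first, so a reviewer sees the likeliest duplicates on top.
    return sorted(pairs, key=lambda pair: -pair[2])


if __name__ == "__main__":
    sample = ["google", "gogle", "guugle", "A-Company Ltd.", "A-Company Limited"]
    for a, b, score in candidate_pairs(sample):
        print(f"{score:.2f}  {a!r} ~ {b!r}")
```

Comparing every pair is quadratic in the number of rows, so this only suits small tables or a pre-filtered subset. The output is meant for a human to review, in line with the remark in the thread that what counts as a duplicate is a judgment call.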