Re: [GENERAL] Need magic for identifieing double adresses

2010-09-23 Thread Octavio Alvarez
On Thu, 16 Sep 2010 06:22:15 -0700, Andreas maps...@gmx.net wrote: It's not only typos to catch. There is variation in how things are written that is not necessarily wrong, e.g. "Miller's Bakery", "Bakery Miller", "Bakery Miller, Ltd.", "Bakery Miller and sons", "Bakery Smith (formerly Miller)" and the

Re: [GENERAL] Need magic for identifieing double adresses

2010-09-17 Thread John DeSoi
On Sep 15, 2010, at 10:40 PM, Andreas wrote: I need to clean up a lot of contact data because of a merge of customer lists that used to be kept separate. I already know that there are double entries within the lists and they do overlap, too. Relevant fields could be name, street, zip,

Re: [GENERAL] Need magic for identifieing double adresses

2010-09-16 Thread Sam Mason
On Thu, Sep 16, 2010 at 04:40:42AM +0200, Andreas wrote: I need to clean up a lot of contact data because of a merge of customer lists that used to be kept separate. I already know that there are double entries within the lists and they do overlap, too. Relevant fields could be name,

Re: [GENERAL] Need magic for identifieing double adresses

2010-09-16 Thread Andreas
On 16.09.2010 13:18, Sam Mason wrote: On Thu, Sep 16, 2010 at 04:40:42AM +0200, Andreas wrote: I need to clean up a lot of contact data because of a merge of customer lists that used to be kept separate. What to do depends on how much data you have; a few thousand and you can do lots of

Re: [GENERAL] Need magic for identifieing double adresses

2010-09-16 Thread Sam Mason
On Thu, Sep 16, 2010 at 03:22:15PM +0200, Andreas wrote: We are talking about nearly 500,000 records with considerable overlap. Another thing to consider is whether each one contains unique entries, and hence whether you can do a best match between datasets. FULL OUTER JOIN is your friend here, but

Re: [GENERAL] Need magic for identifieing double adresses

2010-09-16 Thread Bill Thoen
On 9/16/2010 5:18 AM, Sam Mason wrote: On Thu, Sep 16, 2010 at 04:40:42AM +0200, Andreas wrote: I need to clean up a lot of contact data because of a merge of customer lists that used to be kept separate. I already know that there are double entries within the lists and they do

[GENERAL] Need magic for identifieing double adresses

2010-09-15 Thread Andreas
Hi, I need to clean up a lot of contact data because of a merge of customer lists that used to be kept separate. I already know that there are double entries within the lists and they do overlap, too. Relevant fields could be name, street, zip, city, phone. Is there a way to do something

Re: [GENERAL] Need magic for identifieing double adresses

2010-09-15 Thread Darren Duncan
Andreas wrote: I need to clean up a lot of contact data because of a merge of customer lists that used to be kept separate. I already know that there are double entries within the lists and they do overlap, too. Relevant fields could be name, street, zip, city, phone. Is there a way to do

Re: [GENERAL] Need magic for identifieing double adresses

2010-09-15 Thread Gary Chambers
Andreas, Relevant fields could be name, street, zip, city, phone. Is there a way to do something like this with PostgreSQL? I fear this will still need a lot of manual sorting and searching even when potential peers get automatically identified. One of the techniques I use to increase the