Re: [GENERAL] Removing duplicate records from a bulk upload (rationale behind selecting a method)

2014-12-12 Thread Jonathan Vanasco
On Dec 8, 2014, at 9:35 PM, Scott Marlowe wrote: > select a,b,c into newtable from oldtable group by a,b,c; > > On pass, done. This is a bit naive, but couldn't this approach potentially be faster (depending on the system)? SELECT a, b, c INTO duplicate_records FROM ( SELECT a, b, c,

Re: [GENERAL] Removing duplicate records from a bulk upload (rationale behind selecting a method)

2014-12-12 Thread Daniel Begin
soon with useful results Daniel From: Marc Mamin [mailto:m.ma...@intershop.de] Sent: December-12-14 14:25 To: John McKown; Daniel Begin Cc: Tom Lane; Scott Marlowe; Andy Colson; PostgreSQL General Subject: AW: [GENERAL] Removing duplicate records from a bulk upload (rationale behind se

Re: [GENERAL] Removing duplicate records from a bulk upload (rationale behind selecting a method)

2014-12-12 Thread Marc Mamin
From: pgsql-general-ow...@postgresql.org >[mailto:pgsql-general-ow...@postgresql.org] On Behalf Of Marc Mamin >Sent: December-12-14 06:41 >To: Daniel Begin; 'Tom Lane'; 'Scott Marlowe' >Cc: 'Andy Colson'; pgsql-general@postgresql.or

Re: [GENERAL] Removing duplicate records from a bulk upload (rationale behind selecting a method)

2014-12-12 Thread John McKown
shared full table scan on oldtable. > > HTH > > Marc Mamin > > > > > >Best regards, > >Daniel > > > >-Original Message- > >From: pgsql-general-ow...@postgresql.org > >[mailto:pgsql-general-ow...@postgresql.org] On Behalf Of Tom Lan

Re: [GENERAL] Removing duplicate records from a bulk upload (rationale behind selecting a method)

2014-12-12 Thread Daniel Begin
4 06:41 To: Daniel Begin; 'Tom Lane'; 'Scott Marlowe' Cc: 'Andy Colson'; pgsql-general@postgresql.org Subject: Re: [GENERAL] Removing duplicate records from a bulk upload (rationale behind selecting a method) >Thank Tom, >I understand that the rationale behind

Re: [GENERAL] Removing duplicate records from a bulk upload (rationale behind selecting a method)

2014-12-12 Thread Marc Mamin
lowe >Cc: Andy Colson; Daniel Begin; pgsql-general@postgresql.org >Subject: Re: [GENERAL] Removing duplicate records from a bulk upload >(rationale behind selecting a method) > >Scott Marlowe writes: >> If you're de-duping a whole table, no need to create indexes, as it'

Re: [GENERAL] Removing duplicate records from a bulk upload (rationale behind selecting a method)

2014-12-09 Thread Daniel Begin
-Original Message- From: pgsql-general-ow...@postgresql.org [mailto:pgsql-general-ow...@postgresql.org] On Behalf Of Tom Lane Sent: December-08-14 21:52 To: Scott Marlowe Cc: Andy Colson; Daniel Begin; pgsql-general@postgresql.org Subject: Re: [GENERAL] Removing duplicate records from a bulk upload

Re: [GENERAL] Removing duplicate records from a bulk upload (rationale behind selecting a method)

2014-12-08 Thread Tom Lane
Scott Marlowe writes: > If you're de-duping a whole table, no need to create indexes, as it's > gonna have to hit every row anyway. Fastest way I've found has been: > select a,b,c into newtable from oldtable group by a,b,c; > On pass, done. > If you want to use less than the whole row, you can

Re: [GENERAL] Removing duplicate records from a bulk upload (rationale behind selecting a method)

2014-12-08 Thread Scott Marlowe
If you're de-duping a whole table, no need to create indexes, as it's gonna have to hit every row anyway. Fastest way I've found has been: select a,b,c into newtable from oldtable group by a,b,c; On pass, done. If you want to use less than the whole row, you can use select distinct on (col1, col

Re: [GENERAL] Removing duplicate records from a bulk upload (rationale behind selecting a method)

2014-12-08 Thread Andy Colson
t the selection of an approach:-| Daniel -Original Message- From: pgsql-general-ow...@postgresql.org [mailto:pgsql-general-ow...@postgresql.org] On Behalf Of Andy Colson Sent: December-08-14 11:39 To: pgsql-general@postgresql.org Subject: Re: [GENERAL] Removing duplicate records from a bulk u

Re: [GENERAL] Removing duplicate records from a bulk upload (rationale behind selecting a method)

2014-12-08 Thread John R Pierce
On 12/8/2014 1:59 PM, Daniel Begin wrote: Thanks for your answers Andy; I will keep in mind the procedure you proposed. About the fields required to find duplicate records, all of them are required (5-9) depending on the table. these tables have no field that would normally be designated 'prim

Re: [GENERAL] Removing duplicate records from a bulk upload (rationale behind selecting a method)

2014-12-08 Thread Daniel Begin
ginal Message- From: pgsql-general-ow...@postgresql.org [mailto:pgsql-general-ow...@postgresql.org] On Behalf Of Andy Colson Sent: December-08-14 11:39 To: pgsql-general@postgresql.org Subject: Re: [GENERAL] Removing duplicate records from a bulk upload On 12/8/2014 10:30 AM, Andy Colson wrote:

Re: [GENERAL] Removing duplicate records from a bulk upload

2014-12-08 Thread Andy Colson
On 12/8/2014 10:30 AM, Andy Colson wrote: On 12/7/2014 9:31 PM, Daniel Begin wrote: I have just completed the bulk upload of a large database. Some tables have billions of records and no constraints or indexes have been applied yet. About 0.1% of these records may have been duplicated during the

Re: [GENERAL] Removing duplicate records from a bulk upload

2014-12-08 Thread Andy Colson
On 12/7/2014 9:31 PM, Daniel Begin wrote: I have just completed the bulk upload of a large database. Some tables have billions of records and no constraints or indexes have been applied yet. About 0.1% of these records may have been duplicated during the upload and I need to remove them before ap

[GENERAL] Removing duplicate records from a bulk upload

2014-12-07 Thread Daniel Begin
I have just completed the bulk upload of a large database. Some tables have billions of records and no constraints or indexes have been applied yet. About 0.1% of these records may have been duplicated during the upload and I need to remove them before applying constraints. I understand ther