On Dec 8, 2014, at 9:35 PM, Scott Marlowe wrote:
> select a,b,c into newtable from oldtable group by a,b,c;
>
> One pass, done.
This is a bit naive, but couldn't this approach potentially be faster
(depending on the system)?
SELECT a, b, c INTO duplicate_records FROM ( SELECT a, b, c,
soon with useful results
Daniel
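Daniel's quoted query is cut off mid-subquery, so the original cannot be recovered; one common shape for this "collect the duplicates first" approach in PostgreSQL uses a window function (the row_number() form below is an assumption, not the original query):

```sql
-- Hypothetical reconstruction: collect every row beyond the first
-- occurrence of each (a, b, c) combination into duplicate_records.
SELECT a, b, c
INTO duplicate_records
FROM (
    SELECT a, b, c,
           row_number() OVER (PARTITION BY a, b, c) AS rn
    FROM oldtable
) t
WHERE rn > 1;
```

Whether this beats a single GROUP BY pass depends on the duplicate ratio and available work_mem; with only ~0.1% duplicates the subquery still scans and sorts the whole table.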
From: Marc Mamin [mailto:m.ma...@intershop.de]
Sent: December-12-14 14:25
To: John McKown; Daniel Begin
Cc: Tom Lane; Scott Marlowe; Andy Colson; PostgreSQL General
Subject: AW: [GENERAL] Removing duplicate records from a bulk upload
(rationale behind selecting a method)
From: pgsql-general-ow...@postgresql.org
>[mailto:pgsql-general-ow...@postgresql.org] On Behalf Of Marc Mamin
>Sent: December-12-14 06:41
>To: Daniel Begin; 'Tom Lane'; 'Scott Marlowe'
>Cc: 'Andy Colson'; pgsql-general@postgresql.org
shared full table scan on oldtable.
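Marc's message is truncated here, but the fragment suggests splitting the de-duplication into several queries that each cover a disjoint slice of the table, so the work can run in parallel sessions. A hedged sketch of one such slice (hashtext() and the four-way split are illustrative assumptions, not Marc's actual proposal):

```sql
-- Illustrative sketch: de-duplicate one hash-slice of the table per query,
-- so several sessions can each work on a disjoint subset of rows.
SELECT a, b, c
INTO newtable_part0
FROM oldtable
WHERE (hashtext(a::text) & 3) = 0   -- slice 0 of 4
GROUP BY a, b, c;
```

Each slice query still reads the whole table, but rows of a given (a, b, c) group always land in the same slice, so no duplicates survive across partitions.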
>
> HTH
>
> Marc Mamin
>
>
> >
> >Best regards,
> >Daniel
> >
> >-Original Message-
> >From: pgsql-general-ow...@postgresql.org
> >[mailto:pgsql-general-ow...@postgresql.org] On Behalf Of Tom Lane
Sent: December-12-14 06:41
To: Daniel Begin; 'Tom Lane'; 'Scott Marlowe'
Cc: 'Andy Colson'; pgsql-general@postgresql.org
Subject: Re: [GENERAL] Removing duplicate records from a bulk upload
(rationale behind selecting a method)
>Thanks Tom,
>I understand that the rationale behind
-Original Message-
From: pgsql-general-ow...@postgresql.org
[mailto:pgsql-general-ow...@postgresql.org] On Behalf Of Tom Lane
Sent: December-08-14 21:52
To: Scott Marlowe
Cc: Andy Colson; Daniel Begin; pgsql-general@postgresql.org
Subject: Re: [GENERAL] Removing duplicate records from a bulk upload
Scott Marlowe writes:
> If you're de-duping a whole table, no need to create indexes, as it's
> gonna have to hit every row anyway. Fastest way I've found has been:
> select a,b,c into newtable from oldtable group by a,b,c;
> One pass, done.
> If you want to use less than the whole row, you can use select
> distinct on (col1, col
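Scott's "select distinct on" suggestion is cut off above; a sketch of the form it takes in PostgreSQL (the column names, and which row survives, are assumptions):

```sql
-- Keep one row per (col1, col2) combination; the remaining columns
-- come from whichever row DISTINCT ON happens to encounter first.
SELECT DISTINCT ON (col1, col2) *
INTO newtable
FROM oldtable;
```

Without an ORDER BY on the DISTINCT ON columns (plus a tie-breaker), the surviving row is indeterminate; add one if it matters which duplicate is kept.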
t the selection of an
approach:-|
Daniel
-Original Message-
From: pgsql-general-ow...@postgresql.org
[mailto:pgsql-general-ow...@postgresql.org] On Behalf Of Andy Colson
Sent: December-08-14 11:39
To: pgsql-general@postgresql.org
Subject: Re: [GENERAL] Removing duplicate records from a bulk upload
On 12/8/2014 1:59 PM, Daniel Begin wrote:
Thanks for your answers Andy; I will keep in mind the procedure you proposed.
About the fields required to find duplicate records, all of them are required
(5-9, depending on the table). These tables have no field that would normally
be designated 'primary key'.
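With no natural key, PostgreSQL's system column ctid can stand in as a row identifier for deleting duplicates in place. A sketch of that common idiom (a, b, c stand for the 5-9 comparison fields; for nullable columns, IS NOT DISTINCT FROM would be needed instead of =):

```sql
-- Delete every row except the one with the smallest ctid
-- in each group of identical (a, b, c) values.
DELETE FROM oldtable o
USING oldtable k
WHERE o.ctid > k.ctid
  AND o.a = k.a AND o.b = k.b AND o.c = k.c;
```

On billion-row tables this self-join without indexes is expensive, which is why the thread leans toward rewriting the distinct rows into a new table instead.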
On 12/8/2014 10:30 AM, Andy Colson wrote:
On 12/7/2014 9:31 PM, Daniel Begin wrote:
I have just completed the bulk upload of a large database. Some tables have
billions of records and no constraints or indexes have been applied yet. About
0.1% of these records may have been duplicated during the upload and I need to
remove them before applying constraints.
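Before committing to a removal method, the ~0.1% duplicate estimate can be checked directly (the column list is an assumption; it would be the 5-9 fields that define a duplicate):

```sql
-- List each duplicated combination and how many copies exist.
SELECT a, b, c, count(*) AS copies
FROM oldtable
GROUP BY a, b, c
HAVING count(*) > 1;
```

This is itself a full scan plus sort, so on billion-row tables it costs roughly as much as one de-duplication pass.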
I understand ther