[SQL] A DISTINCT problem removing duplicates

2008-12-09 Thread Richard Huxton
The scenario is - a list of documents, some of which may be (near) duplicates of others, one document being in many duplicate-sets and a duplicate-set containing many documents. We want to see a list with only one document (any one) from each duplicate set. There's an example script attached. So:
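A minimal sketch of the query being asked for, under stated assumptions. Richard's attached script is not reproduced here, so the schema is hypothetical: a `documents` table plus a `doc_dup(dup_set, doc_id)` link table for the many-to-many relation. The `dup_set` name is borrowed from Tom Lane's reply. The sketch runs on SQLite through Python's `sqlite3` module rather than PostgreSQL. It keeps every document that belongs to no duplicate set, then adds one document from each set. Here "any one" is taken to mean the lowest `doc_id`.

```python
import sqlite3

# Hypothetical schema (not from the original script): documents(doc_id) and
# doc_dup(dup_set, doc_id), a many-to-many link between documents and sets.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE documents (doc_id INTEGER PRIMARY KEY);
    CREATE TABLE doc_dup (dup_set INTEGER NOT NULL,
                          doc_id  INTEGER NOT NULL,
                          PRIMARY KEY (dup_set, doc_id));
    INSERT INTO documents VALUES (1), (2), (3), (4), (5), (6);
    -- document 2 belongs to two duplicate sets; document 6 belongs to none
    INSERT INTO doc_dup VALUES (10, 1), (10, 2), (20, 2), (20, 3), (30, 4), (30, 5);
""")

# Documents in no duplicate set, plus the lowest doc_id of each set.
# UNION (not UNION ALL) drops a document picked by more than one branch.
ONE_PER_SET = """
    SELECT doc_id FROM documents d
    WHERE NOT EXISTS (SELECT 1 FROM doc_dup x WHERE x.doc_id = d.doc_id)
    UNION
    SELECT MIN(doc_id) FROM doc_dup GROUP BY dup_set
    ORDER BY 1
"""
result = [row[0] for row in conn.execute(ONE_PER_SET)]
print(result)  # [1, 2, 4, 6]
```

Because document 2 sits in two sets, set 10 contributes document 1 and set 20 contributes document 2. The output therefore still holds two documents that are near-duplicates of each other. That overlapping, many-to-many case is one reason a plain GROUP BY does not fully answer the question.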

Re: [SQL] A DISTINCT problem removing duplicates

2008-12-09 Thread Tom Lane
Richard Huxton <[EMAIL PROTECTED]> writes:
> Anyone got anything more elegant?

Seems to me that no document should have an empty dup_set. If it's not a match to any existing document, then immediately assign a new dup_set number to it.

regards, tom lane
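A minimal sketch of Tom Lane's suggestion, under assumptions. The tables `documents` and `doc_dup(dup_set, doc_id)` and the helpers `add_document` and `one_per_dup_set` are hypothetical. The sketch also assumes the caller already knows which duplicate sets a newly matched document joins, because the thread does not describe the matching step. When a document matches nothing, it immediately gets a fresh `dup_set` of its own. Every document then belongs to at least one set, and "one document per set" reduces to a plain GROUP BY. The sketch uses SQLite via Python's `sqlite3`. Its `MAX(dup_set) + 1` allocation stands in for what would more naturally be a sequence (`nextval()`) in PostgreSQL, and `MAX() + 1` is not safe under concurrent inserts.

```python
import sqlite3

def make_db():
    # Hypothetical schema: documents(doc_id), doc_dup(dup_set, doc_id).
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE documents (doc_id INTEGER PRIMARY KEY);
        CREATE TABLE doc_dup (dup_set INTEGER NOT NULL,
                              doc_id  INTEGER NOT NULL,
                              PRIMARY KEY (dup_set, doc_id));
    """)
    return conn

def add_document(conn, doc_id, matched_sets=()):
    # matched_sets: dup_set ids the new document joins (assumed to be
    # supplied by some separate matching step not shown here).
    conn.execute("INSERT INTO documents (doc_id) VALUES (?)", (doc_id,))
    sets = list(matched_sets)
    if not sets:
        # No match: give the document a brand-new dup_set of its own,
        # so no document is ever left without a set.
        (new_set,) = conn.execute(
            "SELECT COALESCE(MAX(dup_set), 0) + 1 FROM doc_dup").fetchone()
        sets = [new_set]
    conn.executemany("INSERT INTO doc_dup (dup_set, doc_id) VALUES (?, ?)",
                     [(s, doc_id) for s in sets])

def one_per_dup_set(conn):
    # Every document has a set, so the lowest doc_id per set is enough.
    return [row[0] for row in conn.execute(
        "SELECT DISTINCT MIN(doc_id) FROM doc_dup GROUP BY dup_set ORDER BY 1")]

conn = make_db()
add_document(conn, 1)        # no match: new set 1
add_document(conn, 2, [1])   # duplicate of document 1
add_document(conn, 3)        # no match: new set 2
add_document(conn, 4, [2])   # duplicate of document 3
add_document(conn, 5)        # no match: new set 3
print(one_per_dup_set(conn))  # [1, 3, 5]
```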

Re: [SQL] A DISTINCT problem removing duplicates

2008-12-09 Thread Richard Huxton
Tom Lane wrote:
> Richard Huxton <[EMAIL PROTECTED]> writes:
>> Anyone got anything more elegant?
>
> Seems to me that no document should have an empty dup_set. If it's not
> a match to any existing document, then immediately assign a new dup_set
> number to it.

That was my initial thought too,

Re: [SQL] A DISTINCT problem removing duplicates

2008-12-09 Thread Tom Lane
Richard Huxton <[EMAIL PROTECTED]> writes:
> Tom Lane wrote:
>> Richard Huxton <[EMAIL PROTECTED]> writes:
>>> Anyone got anything more elegant?
>>
>> Seems to me that no document should have an empty dup_set. If it's not
>> a match to any existing document, then immediately assign a new dup_set

Re: [SQL] A DISTINCT problem removing duplicates

2008-12-09 Thread Richard Huxton
Tom Lane wrote:
> Richard Huxton <[EMAIL PROTECTED]> writes:
>> Tom Lane wrote:
>>> Richard Huxton <[EMAIL PROTECTED]> writes:
>>>> Anyone got anything more elegant?
>>> Seems to me that no document should have an empty dup_set. If it's not
>>> a match to any existing document, then immediately as