Tom Lane wrote:
> Richard Huxton <[EMAIL PROTECTED]> writes:
>> Anyone got anything more elegant?
>
> Seems to me that no document should have an empty dup_set. If it's not
> a match to any existing document, then immediately assign a new dup_set
> number to it.
That was my initial thought too,
Richard Huxton <[EMAIL PROTECTED]> writes:
> Anyone got anything more elegant?
Seems to me that no document should have an empty dup_set. If it's not
a match to any existing document, then immediately assign a new dup_set
number to it.
regards, tom lane
The scenario is - a list of documents, some of which may be (near)
duplicates of others, one document being in many duplicate-sets and a
duplicate-set containing many documents.
We want to see a list with only one document (any one) from each
duplicate set. There's an example script attached.
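
To make the scenario concrete, here is a minimal sketch in Python rather than SQL. It is not the attached script. The sample (doc_id, dup_set) pairs, the set labels, and the function name one_per_dup_set are all made up for illustration, and "lowest doc_id wins" is just one arbitrary way of picking "any one" document from a set.

```python
# Hypothetical sample data: (doc_id, dup_set) pairs.
# A document may belong to several duplicate sets (doc 2 is in both A and B).
memberships = [
    (1, "A"), (2, "A"),
    (2, "B"), (3, "B"),
    (4, "C"),
]

def one_per_dup_set(pairs):
    """Return {dup_set: representative doc_id}, keeping the lowest doc_id per set."""
    reps = {}
    for doc_id, dup_set in pairs:
        # First document seen for this set, or a lower id than the current pick.
        if dup_set not in reps or doc_id < reps[dup_set]:
            reps[dup_set] = doc_id
    return reps

reps = one_per_dup_set(memberships)

# The list to display: each chosen document once, even if it represents several sets.
shown = sorted(set(reps.values()))
```

With this sample data, documents 1 and 2 both end up in the list even though they share set A, because 2 is also the pick for set B. That overlap between sets is what makes the problem awkward.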
So: