> I'm new to SQL, and SQLite, and I find myself needing to
> identify duplicate records in a SQLite table (there are
> about 2K duplicates in a 36K row table). Any suggestions
> or magic SQL queries appreciated :-)
Here are a couple of strategies I've used successfully for finding and
potentially removing duplicate records. It assumes a primary key field that
is *separate* from the fields being checked for duplication. This example
also assumes that I only want to check for duplication on two of the fields
in the table: field1 and field2.
Assume a table such as follows, that might included duplicate records.
create table checkedtable
(
priKey integer primary key,
field1 char (10),
field2 char (10),
<other possible fields>
);
and a temporary table to hold the duplicates
create temp table dups
(
field1,
field2,
reccount,
keyvalue
);
The first method is probably the simplest:
insert into dups (field1, field2, keyvalue)
select field1, field2, max(priKey)
from checkedtable
group by field1, field2
I don't know if this one works on SQLite, as I've never had reason to try
it. (I've learned a fair bit about DB design in the last couple of years
before starting with SQLite, so duplication that needs correction is less of
a problem now :) It does work on SQL Server 2000, though.
The second method is one that I used before I understood that GROUP BY could
be used for more than counting. :-)
Get all the records that have duplicate information and put them into the
temp table:
insert into dups (field1, field2, reccount)
select field1, field2, count(priKey)
from checkedtable
group by field1, field2
having count(priKey) > 1
Now, get the key for one of them:
update dups
set keyvalue = checkedtable.priKey
from checkedtable
where checkedtable.field1 = checkedtable.field1
and checkedtable.field2 = checkedtable.field2
Using either of these methods, the temporary table now contains key values
that point to one of each set of duplicated records in the original table.
It can be used to delete them, archive them, change them, whatever, by
joining back to the original.
Note that the code presented here is off the top of my head, and not tested,
so there may be syntactic errors that I didn't catch while writing it. With
any luck, there aren't any logic errors. Hopefully, it will give you a
starting point.
Brad