> I'm new to SQL, and SQLite, and I find myself needing to > identify duplicate records in a SQLite table (there are > about 2K duplicates in a 36K row table). Any suggestions > or magic SQL queries appreciated :-)
Here are a couple of strategies I've used successfully for finding and potentially removing duplicate records. It assumes a primary key field that is *separate* from the fields being checked for duplication. This example also assumes that I only want to check for duplication on two of the fields in the table: field1 and field2. Assume a table such as follows, that might included duplicate records. create table checkedtable ( priKey integer primary key, field1 char (10), field2 char (10), <other possible fields> ); and a temporary table to hold the duplicates create temp table dups ( field1, field2, reccount, keyvalue ); The first method is probably the simplest: insert into dups (field1, field2, keyvalue) select field1, field2, max(priKey) from checkedtable group by field1, field2 I don't know if this one works on SQLite, as I've never had reason to try it. (I've learned a fair bit about DB design in the last couple of years before starting with SQLite, so duplication that needs correction is less of a problem now :) It does work on SQL Server 2000, though. The second method is one that I used before I understood that GROUP BY could be used for more than counting. :-) Get all the records that have duplicate information and put them into the temp table: insert into dups (field1, field2, reccount) select field1, field2, count(priKey) from checkedtable group by field1, field2 having count(priKey) > 1 Now, get the key for one of them: update dups set keyvalue = checkedtable.priKey from checkedtable where checkedtable.field1 = checkedtable.field1 and checkedtable.field2 = checkedtable.field2 Using either of these methods, the temporary table now contains key values that point to one of each set of duplicated records in the original table. It can be used to delete them, archive them, change them, whatever, by joining back to the original. Note that the code presented here is off the top of my head, and not tested, so there may be syntactic errors that I didn't catch while writing it. With any luck, there aren't any logic errors. Hopefully, it will give you a starting point. Brad