--- Klint Gore <[EMAIL PROTECTED]> wrote:
> On Wed, 26 Jan 2005 21:25:19 -0500, "D. Richard Hipp"
> <[EMAIL PROTECTED]> wrote:
> > On Wed, 2005-01-26 at 17:45 -0800, Clark Christensen
> wrote:
> > > I'm new to SQL, and SQLite, and I find myself needing
> to
> > > identify duplicate records in a SQLite table (there
> are
> > > about 2K duplicates in a 36K row table). Any
> suggestions
> > > or magic SQL queries appreciated :-)
> > >
> >
> > SELECT * FROM table EXCEPT SELECT DISTINCT * FROM
> table;
>
> Does that actually work?
>
> sqlite> create table firstnames (fname varchar(5));
> sqlite> insert into firstnames values ('bob');
> sqlite> insert into firstnames values ('bob');
> sqlite> insert into firstnames values ('bill');
> sqlite> select * from firstnames except select distinct *
> from firstnames;
> sqlite>
> sqlite> select fname from firstnames group by fname
> having count(*) > 1;
> bob
> sqlite>
>
> klint.
>
>
+---------------------------------------+-----------------+
> : Klint Gore : "Non rhyming
> :
> : EMail : [EMAIL PROTECTED] : slang - the
> :
> : Snail : A.B.R.I. : possibilities
> :
> : Mail University of New England : are useless"
> :
> : Armidale NSW 2351 Australia : L.J.J.
> :
> : Fax : +61 2 6772 5376 :
> :
>
+---------------------------------------+-----------------+
>
Many thanks to all who replied with help. As has been
pointed out, the EXCEPT method wasn't successful :-(
Seemed almost too good to be true.
I managed to find and eliminate the duplicates using a
combination of the GROUP BY method Klint suggested, and a
temp table.
While possibly not clear without having the schema for
table DMI_SKU, here's what I finally was able to do after
lots of trial and error (gotta love transactions).
create temp table TEMP_DUPES (MPN text, DMI text, MDESC
text, RID integer, DCOUNT integer);
-- Get all duplicates that are dupes by mpn, dmi, and
modeldesc into temp table
begin transaction;
insert into temp_dupes
select ModelPartNumber MPN, dmistring DMI, modeldesc
MDESC, rowid RID, count(1) DCOUNT
from dmi_sku
group by MPN, DMI, MDESC
HAVING count(*) >1;
--examine all duplicates
select * from dmi_sku where modelpartnumber in (select mpn
from temp_dupes);
delete from dmi_sku
where
modelpartnumber in (select mpn from temp_dupes) and
rowid not in (select rid from temp_dupes);
--inspect the results again
select ModelPartNumber, ModelDesc, DMIString, rowid
from dmi_sku
where modelpartnumber in (select mpn from temp_dupes);
--and, of course...
commit;
Fortunately, this is a one-time build-up of a dataset for
another project, so fast, elegant code wasn't necessary.
Also, many thanks to DRH for an excellent product in
SQLite.
-Clark